Analysing Quantitative Data 2020-21

12.
DATA ANALYSIS
 Learning outcomes
 By the end of this chapter, you should be able to:
 • identify the main issues that you need to consider when preparing
 quantitative data for analysis and when analysing these data by
 computer;
 • recognise different types of data and understand the implications of data
 type for subsequent analyses;
 • create a data matrix and code data for analysis by computer;
 • select the most appropriate tables and diagrams to explore and illustrate
 different aspects of your data;
 • select the most appropriate statistics to describe individual variables and
 to examine relationships between variables and trends in your data;
 • interpret the tables, diagrams and statistics that you use correctly.
12. ANALYSING QUANTITATIVE DATA
 12.1 Introduction
 Data analysis is the stage that follows data collection.
 Quantitative data in a raw form, that is, before these data have
been processed and analysed, convey very little meaning to most
people. These data, therefore, need to be processed to make them
useful, that is, to turn them into information.
 Statistical thinking, in which models for decision making under
uncertainty are built from data, treats data as crude information
rather than as knowledge in themselves.
 The sequence from data to knowledge is: from data to
information, from information to facts and, finally, from facts to
knowledge, which is the ultimate output of any business and
management research.
 Data analysis is the stage responsible for transforming the
data collected into knowledge, in line with the research
questions and objectives.
 To be useful these data need to be analysed and interpreted.
Quantitative analysis techniques assist you in this process.
They range from creating simple tables or diagrams that show
the frequency of occurrence and using statistics such as indices
to enable comparisons, through establishing statistical
relationships between variables to complex statistical
modeling.
 12.2 Preparing, inputting and checking data
 If you intend to undertake quantitative analysis we recommend
that you consider the:
 type of data (scale of measurement);
 format in which your data will be input to the analysis
software;
 impact of data coding on subsequent analyses (for different
data types);
 methods you intend to use to check data for errors.
 Data types
 Quantitative data can be divided into two distinct groups:
categorical and numerical.
 (i) Categorical data refer to data whose values cannot be
measured numerically but can be either classified into sets
(categories) according to the characteristics that identify or
describe the variable or placed in rank order (Berman Brown and
Saunders 2008).
 They can be further sub-divided into descriptive and ranked.
 For instance, cars can be categorized into hatchback, saloon and
estate. These are known as descriptive data or nominal data as
it is impossible to define the category numerically or to rank it.
 Rather, these data simply count the number of occurrences in each
category of a variable.
 Descriptive data where there are only two categories are known as
dichotomous data, as the variable is divided into two categories,
such as the variable gender being divided into female and male.
 Ranked (or ordinal) data are a more precise form of categorical
data. In such instances you know the relative position of each case
within your data set, although the actual numerical measures (such
as scores) on which the position is based are not recorded.
 Rating or scale questions, such as where a respondent is asked to
rate how strongly she or he agrees with a statement, collect ranked
(ordinal) data.
 (ii) Numerical data, which are sometimes termed
‘quantifiable’, are those whose values are measured or counted
numerically as quantities (Berman Brown and Saunders 2008).
 There are two possible ways of sub-dividing numerical data:
into interval or ratio data and, alternatively, into continuous
or discrete data.
 The Celsius temperature scale is a good example of an interval
scale. Although the difference between, say, 20°C and 30°C is
10°C it does not mean that 30°C is one and a half times as
warm.
 For ratio data, you can also calculate the relative
difference or ratio between any two data values for a
variable.
 For instance, if a multinational company makes a profit
of $300 000 in one year and $600 000 the following
year, we can say that profits have doubled.
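 The difference between interval and ratio data can be checked with a little arithmetic. The sketch below is illustrative only: the Kelvin conversion and the variable names are additions, not part of the original material.

```python
# Ratio data: profit has a true zero, so ratios are meaningful.
profit_year_1 = 300_000
profit_year_2 = 600_000
profit_ratio = profit_year_2 / profit_year_1  # 2.0, i.e. profits doubled


def celsius_to_kelvin(celsius):
    """Convert Celsius (interval scale) to Kelvin (a scale with a true zero)."""
    return celsius + 273.15


# Interval data: the naive Celsius ratio misleads because 0 °C is arbitrary.
naive_ratio = 30 / 20  # 1.5, yet 30 °C is not one and a half times as warm
kelvin_ratio = celsius_to_kelvin(30) / celsius_to_kelvin(20)  # roughly 1.03

print(profit_ratio, naive_ratio, round(kelvin_ratio, 3))
```

 On the Kelvin scale the two temperatures differ by only about 3 per cent, which is why ratios calculated on the Celsius scale should not be interpreted.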
 Data layout
 For other data collection methods, you will have to prepare and
enter your data for computer analysis. You therefore need to
be clear about the precise data layout requirements of your
analysis software.
 Virtually all analysis software will accept your data if they are
entered in table format. This table is called a data matrix.
EXAMPLE OF A SIMPLE DATA MATRIX LAYOUT
Respondent  Id  Variable 1  Variable 2  Variable 3  Variable 4
Jesica      1   10          8           16          9
Jamila      2   16          12          14          7
Tom         3   21          7           18          5
Talib       4   19          13          11          4
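 The same data matrix can be held in software as one record per case. The following is a minimal sketch using only plain Python; the column names are hypothetical labels chosen for illustration, and your analysis software may expect a different layout.

```python
# Each row is one case (respondent); each column is one variable.
columns = ["respondent", "id", "variable_1", "variable_2", "variable_3", "variable_4"]
rows = [
    ("Jesica", 1, 10, 8, 16, 9),
    ("Jamila", 2, 16, 12, 14, 7),
    ("Tom", 3, 21, 7, 18, 5),
    ("Talib", 4, 19, 13, 11, 4),
]

# Build the matrix as a list of dictionaries keyed by column name.
data_matrix = [dict(zip(columns, row)) for row in rows]

# Reading down a column gives every value of one variable.
variable_1 = [case["variable_1"] for case in data_matrix]
print(variable_1)  # [10, 16, 21, 19]
```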
 Coding
 Almost all data types are recorded using numerical codes, as
this enables you to enter the data quickly using the numeric
keypad on your keyboard and with fewer errors. It also makes
subsequent analyses easier.
Type of item  Sold (code = 1)  Not sold (code = 2)  Not sure (code = 3)
onion         2                1                    1
cabbage       3                2                    1
spinach       1                1                    1
garlic        3                3                    2
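 Coding can be sketched as a simple lookup. The code book below uses the three response codes from the table (sold = 1, not sold = 2, not sure = 3); the raw responses are invented for illustration.

```python
# Code book: each response category is given a numeric code.
CODE_BOOK = {"sold": 1, "not sold": 2, "not sure": 3}

# Hypothetical raw responses, invented for this example.
raw_responses = ["sold", "not sure", "not sold", "sold"]

coded = [CODE_BOOK[response] for response in raw_responses]
print(coded)  # [1, 3, 2, 1]

# A reverse lookup turns codes back into labels when presenting results.
LABELS = {code: label for label, code in CODE_BOOK.items()}
decoded = [LABELS[code] for code in coded]
```

 Keeping the code book in one place means the same codes are used for entry, checking and labelling.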
 Entering data
 Once your data have been coded, you can enter them into the
computer. Increasingly, data analysis software contains
algorithms that check the data for obvious errors as they are
entered.
 Despite this, it is essential that you take considerable care to
ensure that your data are entered correctly. When entering data
the well-known maxim ‘rubbish in, rubbish out’ certainly
applies!
 More sophisticated analysis software allows you to attach
individual labels to each variable and the codes associated with
each of them.
 Checking for errors
 No matter how carefully you code and subsequently enter data
there will always be some errors. The main methods to check data
for errors are as follows:
 Look for illegitimate codes
 Look for illogical relationships
 Check that rules in filter questions are followed.
 Important: For each possible error, you need to discover whether
it occurred at coding or data entry and then correct it.
 Data checking is very time consuming and so is often not
undertaken. Beware: not doing it is very dangerous and can result
in incorrect results from which false conclusions are drawn.
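 The three checks listed above can be partly automated. The sketch below is one possible approach under stated assumptions: the variables, the codes (gender coded 1 or 2; employed coded 1 = yes, 2 = no) and the cases are all hypothetical.

```python
# Hypothetical coded cases. 'employed' is a filter question: 'years_in_job'
# should be answered only by respondents who are employed.
cases = [
    {"id": 1, "gender": 1, "age": 34, "employed": 1, "years_in_job": 10},
    {"id": 2, "gender": 3, "age": 29, "employed": 1, "years_in_job": 4},
    {"id": 3, "gender": 2, "age": 22, "employed": 1, "years_in_job": 30},
    {"id": 4, "gender": 2, "age": 41, "employed": 2, "years_in_job": 5},
]

LEGITIMATE_GENDER_CODES = {1, 2}


def check_case(case):
    """Return a list of the problems found in one case."""
    problems = []
    # Illegitimate code: a value that is not in the code book.
    if case["gender"] not in LEGITIMATE_GENDER_CODES:
        problems.append("illegitimate gender code")
    # Illogical relationship: nobody can hold a job for longer than their age.
    if case["years_in_job"] is not None and case["years_in_job"] > case["age"]:
        problems.append("years in job exceeds age")
    # Filter rule: respondents who are not employed should have skipped this.
    if case["employed"] == 2 and case["years_in_job"] is not None:
        problems.append("filter question rule not followed")
    return problems


errors = {case["id"]: check_case(case) for case in cases if check_case(case)}
print(errors)
```

 Each flagged case still has to be traced back to the original questionnaire to see whether the error arose at coding or at data entry.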
 12.3 Exploring and presenting data
 Once your data have been entered and checked for errors, you
are ready to start your analysis. We have found Tukey’s (1977)
exploratory data analysis (EDA) approach useful in these
initial stages.
 This approach emphasises the use of diagrams to explore and
understand your data, emphasising the importance of using
your data to guide your choices of analysis techniques.
 As you would expect, we believe that it is important to keep
your research question(s) and objectives in mind when
exploring your data.
 Advantages of EDA
 The exploratory data analysis approach allows you flexibility to
introduce previously unplanned analyses to respond to new
findings.
 It formalises the common practice of looking for other
relationships in data, which your research was not initially
designed to test, and hence may suggest other fruitful avenues
for analysis.
 Computers make this relatively easy and quick.
 Process of analysis:
 Begin exploratory analysis by looking at individual variables
and their components. The key aspects you may need to
consider will be guided by your research question(s) and
objectives, and are likely to include (Sparrow 1989):
 specific values;
 highest and lowest values;
 trends over time;
 proportions;
 distributions.
 Once you have explored these, you can then begin to compare
and look for relationships between variables, considering in
addition (Sparrow 1989):
 conjunctions (the point where values for two or more variables
intersect);
 totals;
 interdependence and relationships.

 Checklist for giving each diagram and table a relevant structure
and clear labelling, to avoid possible misinterpretation
 Designing your diagrams and tables
 For both diagrams and tables
 ✔ Does it have a brief but clear and descriptive title?
 ✔ Are the units of measurement used stated clearly?
 ✔ Are the sources of data used stated clearly?
 ✔ Are there notes to explain abbreviations and
 unusual terminology?
 ✔ Does it state the size of the sample?

 For diagrams
 ✔ Does it have clear axis labels?
 ✔ Are bars and their components in the same logical
 sequence?
 ✔ Is more dense shading used for smaller areas?
 ✔ Have you avoided misrepresenting or distorting
 the data?
 ✔ Is a key or legend included (where necessary)?
 For tables
 ✔ Does it have clear column and row headings?
 ✔ Are columns and rows in a logical sequence?

 Exploring and presenting individual variables
 To show specific values
 The simplest way of summarizing data for individual variables
so that specific values can be read is to use a table (frequency
distribution).
 For categorical data, the table summarizes the number of cases
(frequency) in each category. For variables where there are
likely to be a large number of categories (or values for
numerical data), you will need to group the data into categories
that reflect your research question(s) and objectives.
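 A frequency distribution can be produced with a few lines of code. This is a minimal sketch using Python's standard library; the ten car types are invented for illustration.

```python
from collections import Counter

# Hypothetical car types recorded for ten cases.
car_types = ["hatchback", "saloon", "estate", "hatchback", "hatchback",
             "saloon", "estate", "hatchback", "saloon", "hatchback"]

frequencies = Counter(car_types)
total = sum(frequencies.values())

# Print the frequency distribution with percentages, most frequent first.
for category, count in frequencies.most_common():
    print(f"{category:<10} {count:>3} {100 * count / total:5.1f}%")
```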
 To show highest and lowest values
 Diagrams can provide visual clues, although both categorical and
numerical data may need grouping. For categorical and discrete data,
bar charts and pictograms are both suitable. Generally, bar charts
provide a more accurate representation and should be used for
research reports, whereas pictograms convey a general
impression and can be used to gain an audience’s attention.
For continuous data, a histogram is normally used.
 To show a trend and proportion
 Trends can only be presented for variables containing numerical
longitudinal data. The most suitable diagram for exploring the trend
is a line graph (Anderson et al. 1999) in which your data values for
each time period are joined with a line to represent the trend.
 To show proportions, use a pie chart.
 To show the distribution of values
 Prior to using many statistical tests it is necessary to establish the
distribution of values for variables containing numerical data.
 This can be seen by plotting either a frequency polygon or a
histogram for continuous data or a frequency polygon or bar chart
for discrete data.
 If your diagram shows a bunching to the left and a long tail to the
right, the data are positively skewed. If the converse is true,
the data are negatively skewed. If your data are equally
distributed either side of the highest frequency then they are
symmetrically distributed. A special form of the symmetric
distribution, in which the data can be plotted as a bell-shaped
curve, is known as the normal distribution.
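 Skewness can also be checked numerically as a complement to the diagram. The sketch below uses the moment coefficient of skewness, which is one common measure and is not named in the original material; the data sets are invented.

```python
def skewness(values):
    """Moment coefficient of skewness.

    Positive: long tail to the right. Negative: long tail to the left.
    Zero: values spread symmetrically around the mean.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


right_tailed = [1, 1, 2, 2, 2, 3, 3, 4, 9]   # bunched left, long right tail
left_tailed = [-x for x in right_tailed]     # mirror image
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_tailed) > 0, skewness(left_tailed) < 0, skewness(symmetric))
```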
 Comparing variables
 To show specific values and interdependence
 As with individual variables, the best method of finding
specific data values is a contingency table or cross-tabulation,
which also enables you to examine interdependence between
the variables.
 Contingency table: number of insurance claims by gender, 2008
Number of claims  Male   Female  Total
0                 1,345  1,557   2,902
1                 75     78      153
2                 35     29      64
3                 4      2       6
Total             1,459  1,666   3,125

 For variables where there are likely to be a large number of
categories (or values for numerical data), you may need to
group the data to prevent the table from becoming too large.
 Most statistical analysis software allows you to add totals, and
row and column percentages when designing your table.
Statistical analyses such as chi square can also be undertaken at
the same time.
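 The totals, column percentages and chi square statistic for the insurance claims table can be calculated directly. This is a sketch of the arithmetic only, assuming the standard formula in which the expected count is row total × column total ÷ grand total; statistical software would also report degrees of freedom and a probability value.

```python
# Observed counts: rows = number of claims (0, 1, 2, 3); columns = (male, female).
observed = [
    [1345, 1557],
    [75, 78],
    [35, 29],
    [4, 2],
]

row_totals = [sum(row) for row in observed]
column_totals = [sum(column) for column in zip(*observed)]
grand_total = sum(row_totals)

# Column percentages: share of each gender in each claims category.
column_percentages = [
    [100 * cell / column_totals[j] for j, cell in enumerate(row)]
    for row in observed
]

# Chi square: sum over all cells of (observed - expected)^2 / expected.
chi_square = 0.0
for i, row in enumerate(observed):
    for j, cell in enumerate(row):
        expected = row_totals[i] * column_totals[j] / grand_total
        chi_square += (cell - expected) ** 2 / expected

print(row_totals, column_totals, grand_total, round(chi_square, 2))
```

 The row for three claims has expected counts below five, so in practice you might combine it with the row for two claims before relying on the test.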
 To compare highest and lowest values
 Comparisons of variables that emphasise the highest and
lowest rather than precise values are best explored using
a multiple bar chart (Anderson et al. 1999), also known
as a compound bar chart.
 As for a bar chart, continuous data – or data where there
are many values or categories – need to be grouped.
Within any multiple bar chart you are likely to find it
easiest to compare between adjacent bars.
 To compare proportions
 Comparison of proportions between variables uses either
a percentage component bar chart or two or more pie
charts. Either type of diagram can be used for all data
types, provided that continuous data, and data where
there are more than six values or categories, are grouped.
 Percentage component bar charts are more
straightforward to draw than comparative pie charts
when using most spreadsheets. Within your percentage
component bar chart, comparisons will be easiest
between adjacent bars.
 To compare trends and conjunctions
 The most suitable diagram to compare trends for two or
more numerical (or occasionally ranked) variables is a
multiple line graph where one line represents each
variable (Henry 1995).
 You can also use multiple bar charts in which bars for the
same time period are placed adjacent.
 If you need to look for conjunctions in the trends – that
is, where values for two or more variables intersect – this
is where the lines on a multiple line graph cross.
 To compare totals
 Comparison of totals between variables uses a variation of the
bar chart. A stacked bar chart can be used for all data types
provided that continuous data and data where there are more
than six possible values or categories are grouped.
 As with percentage component bar charts, the design of the
stacked bar chart is dictated by the totals you want to
compare. For example, sales for each quarter can be
stacked to give totals which can be compared between
companies.
 To compare proportions and totals
 To compare both proportions of each category or value and the
totals for two or more variables it is best to use comparative
proportional pie charts for all data types. For each
comparative proportional pie chart the total area of the pie
chart represents the total for that variable.
 By contrast, the angle of each segment represents the relative
proportion of a category within the variable.
 Because of the complexity of drawing comparative
proportional pie charts, they are rarely used for exploratory
data analysis, although they can be used to good effect in
research reports.
 To show the relationship between cases for variables
 You can explore possible relationships between ranked and
numerical data variables by plotting one variable against
another. This is called a scatter graph or scatter plot, and
each cross (point) represents the values for one case.
 Convention dictates that you plot the dependent variable –
that is, the variable that changes in response to changes in
the other (independent) variable – against the vertical axis.
 The strength of the relationship is indicated by the closeness
of the points to an imaginary straight line.
 The strength of this relationship can be assessed statistically
using techniques such as correlation or regression.
 12.4 Describing data using statistics
 Descriptive statistics enable you to describe (and compare)
variables numerically. Your research question(s) and
objectives, although limited by the type of data, should guide
your choice of statistics. Statistics to describe a variable focus
on two aspects:
 the central tendency;
 the dispersion.
 12.4.1 Describing the central tendency
 There are three ways of describing the central tendency of data
for business and management research, whether collected from a
sample or from an entire population.
 They are the mean, median and mode.
 The mean is a value, often known as the average, that
includes all data values in its calculation.
 The median is the middle value or mid-point after the data
have been ranked.
 The mode is the value that occurs most frequently.
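 The three measures can be calculated with Python's standard `statistics` module. The values below are hypothetical and chosen so that the three measures differ.

```python
import statistics

# Hypothetical number of insurance claims made by nine customers.
values = [2, 3, 3, 5, 7, 3, 1, 4, 8]

mean = statistics.mean(values)      # uses every value: 36 / 9 = 4
median = statistics.median(values)  # middle value once ranked: 3
mode = statistics.mode(values)      # most frequent value: 3

print(mean, median, mode)
```

 Here the mean is pulled above the median by the two largest values, which is typical of positively skewed data.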
 12.4.2 Describing the dispersion
 It is important to describe how the data values are
dispersed around the central tendency.
 Two of the most frequently used ways of describing the
dispersion are the:
 difference within the middle 50 per cent of values (inter-
quartile range);
 extent to which values differ from the mean (standard
deviation).
 To state the difference between values
 The simplest measure is the difference between the lowest and
the highest values, that is, the range.
 A more frequently used statistic is the inter-quartile
range.
 The median divides the range into two. The range can be
further divided into four equal sections called quartiles.
 The lower quartile is the value below which a quarter of
your data values will fall; the upper quartile is the value
above which a quarter of your data values will fall.
 The remaining half of your data values will fall between
the lower and upper quartiles. The difference between
the upper and lower quartiles is the inter-quartile range
(Morris 2003). As a consequence, it is concerned only
with the middle 50 per cent of data values and ignores
extreme values.
 Values that split your distribution into 100 equal parts are
called percentiles.
 The lower quartile is the 25th percentile and the upper
quartile the 75th percentile.
 Another alternative is to divide the range into 10 equal
parts called deciles.
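 The range, quartiles, deciles and percentiles can be computed as follows. This is a sketch using `statistics.quantiles` from the standard library (Python 3.8 or later); the choice of `method="inclusive"` is an assumption, and other conventions give slightly different cut points for small data sets.

```python
import statistics

# Hypothetical data values, already ranked for readability.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9]

value_range = max(values) - min(values)

# Three cut points split the ranked data into four equal sections.
lower_q, median, upper_q = statistics.quantiles(values, n=4, method="inclusive")
inter_quartile_range = upper_q - lower_q

deciles = statistics.quantiles(values, n=10, method="inclusive")       # 9 cut points
percentiles = statistics.quantiles(values, n=100, method="inclusive")  # 99 cut points

print(value_range, lower_q, upper_q, inter_quartile_range)
```

 The 25th and 75th entries of `percentiles` (indices 24 and 74) coincide with the lower and upper quartiles.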
 To describe and compare the extent by which values
differ from the mean
 Conceptually and statistically in research it is important
to look at the extent to which the data values for a
variable are spread around their mean, as this is what
you need to know to assess its usefulness as a typical
value for the distribution.
 If your data values are all close to the mean, then the
mean is more typical than if they vary widely. To
describe the extent of spread of numerical data you use
the standard deviation.
 You may need to compare the relative spread of data between
distributions of different magnitudes (e.g. one may be
measured in hundreds of tonnes, the other in billions of
tonnes). To make a meaningful comparison you will need to
take account of these different magnitudes. A common way of
doing this is:
 to divide the standard deviation by the mean;
 then to multiply your answer by 100.
 This results in a statistic called the coefficient of variation
(Diamantopoulos and Schlegelmilch 1997).
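 The two steps can be written as a short function. In this sketch the data are invented and the sample standard deviation (`statistics.stdev`) is assumed; use `statistics.pstdev` instead if your data cover the entire population.

```python
import statistics


def coefficient_of_variation(values):
    """Standard deviation divided by the mean, multiplied by 100."""
    return statistics.stdev(values) / statistics.mean(values) * 100


# Hypothetical outputs measured on very different scales.
hundreds_of_tonnes = [90, 100, 110]
billions_of_tonnes = [9_000_000_000, 10_000_000_000, 11_000_000_000]

cv_small = coefficient_of_variation(hundreds_of_tonnes)
cv_large = coefficient_of_variation(billions_of_tonnes)

print(round(cv_small, 1), round(cv_large, 1))
```

 Both distributions have a coefficient of variation of about 10, showing that their relative spread is the same even though their standard deviations differ enormously.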
 Assessing the strength of relationship
 If your data set contains ranked or numerical data, it is likely
that, as part of your exploratory data analysis, you will already
have plotted the relationship between cases for these ranked or
numerical variables using a scatter graph.
 Such relationships might include those between weekly sales
of a new product and those of a similar established product, or
age of employees and their length of service with the company.
 These examples emphasise the fact that your data can contain
two sorts of relationship:
 those where a change in one variable is accompanied by a
change in another variable but it is not clear which variable
caused the other to change, a correlation;
 those where a change in one or more (independent) variables
causes a change in another (dependent) variable, a cause-and-
effect relationship.
 To assess the strength of relationship between pairs of
variables
 A correlation coefficient enables you to quantify the strength
of the linear relationship between two ranked or numerical
variables.
 This coefficient (usually represented by the letter r) can take on
any value between -1 and +1. A value of +1 represents a perfect
positive correlation. This means that the two variables are
precisely related and that, as values of one variable increase,
values of the other variable will increase.
 By contrast, a value of -1 represents a perfect negative
correlation. Again, this means that the two variables are
precisely related; however, as the values of one variable increase
those of the other decrease.
 Correlation coefficients between -1 and +1 represent weaker
positive and negative correlations, a value of 0 meaning the
variables are perfectly independent. Within business research it
is extremely unusual to obtain perfect correlations.
 For ranked data, the two rank correlation coefficients used most
widely in business and management research are Spearman’s rank
correlation coefficient (Spearman’s rho) and Kendall’s rank
correlation coefficient (Kendall’s tau). Where data are being used
from a sample, both these rank correlation coefficients assume
that the sample is selected at random and the data are ranked
(ordinal).
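 The calculation behind a correlation coefficient can be sketched in plain Python. The first function computes r for numerical data (commonly called Pearson's product moment correlation coefficient, a name not used in the original material); the second computes Spearman's rho as the same calculation applied to ranks. The sales figures are invented, and the ranking helper assumes there are no tied values.

```python
from math import sqrt


def pearson_r(x, y):
    """Correlation coefficient r for paired numerical values."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / sqrt(spread_x * spread_y)


def ranks(values):
    """Rank values from 1 (smallest) upwards; assumes no tied values."""
    order = sorted(values)
    return [order.index(value) + 1 for value in values]


def spearman_rho(x, y):
    """Spearman's rho: r calculated on the ranks of the data."""
    return pearson_r(ranks(x), ranks(y))


# Hypothetical weekly sales of a new product and a similar established product.
new_product = [10, 12, 15, 19, 24]
established = [40, 38, 35, 30, 20]

r = pearson_r(new_product, established)       # close to -1
rho = spearman_rho(new_product, established)  # -1: the rank orders are reversed

print(round(r, 3), round(rho, 3))
```

 Here rho is exactly -1 because the ranks are perfectly reversed, whereas r is slightly weaker because the relationship is not perfectly linear.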
 To assess the strength of a cause-and-effect relationship
between variables
 The coefficient of determination (sometimes known as the
regression coefficient) enables you to assess the strength of
relationship between a numerical dependent variable and one
or more numerical independent variables.
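 For a single independent variable, the coefficient of determination can be obtained from a least-squares regression line. The following is a minimal sketch under that assumption; the employee data are invented, and cases with several independent variables would need multiple regression in statistical software.

```python
def simple_linear_regression(x, y):
    """Return the least-squares intercept, slope and coefficient of determination."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    predicted = [intercept + slope * a for a in x]
    # Share of the variation in y that the regression line accounts for.
    residual_ss = sum((b - p) ** 2 for b, p in zip(y, predicted))
    total_ss = sum((b - mean_y) ** 2 for b in y)
    return intercept, slope, 1 - residual_ss / total_ss


# Hypothetical data: age of employees (independent) and
# length of service in years (dependent).
age = [25, 30, 35, 40, 45]
length_of_service = [2, 5, 9, 12, 17]

intercept, slope, r_squared = simple_linear_regression(age, length_of_service)
print(round(intercept, 2), round(slope, 2), round(r_squared, 3))
```

 A value close to 1 means that most of the variation in the dependent variable is accounted for by the independent variable; it does not by itself prove cause and effect.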
Thank you for your attention
THE END
