Excel Handbook
Data Analysis
Introduction
This handbook is designed to instruct program staff on how to set up data entry processes and
perform simple analyses of data collected through surveys, course evaluations, or by observation
or other record keeping.
Throughout this handbook, a common example is used: data representing the results from six
surveys completed by fictional participants of a fictional training program. A copy of the
completed surveys can be found in Appendix I. The reader may find it helpful to review the
surveys before continuing with the rest of the handbook. In fact, it might be beneficial to pull
out the six surveys and refer to them periodically while reviewing the handbook.
The individual conducting the data analysis is referred to in this handbook as the analyst. This
person may be a program staff member, volunteer, board member or other stakeholder willing to
accomplish this task. There is no job description for this analyst. He or she needs only to have a
basic understanding of Microsoft Excel, know how to perform calculations using the contents of
multiple cells, and be familiar with formulas. Reminders about using Excel are found in text
boxes throughout the handbook.
Good luck!
Excel for Data Analysis was written by National Research Center, Inc.
3005 30th Street, Boulder, Colorado 80301
Phone: 303-444-7863 Fax: 303-444-1145 www.n-r-c.com
Copyright 2003 by National Research Center, Inc. All rights reserved.
Before beginning the data entry, it is advisable to put a unique identifier on each survey or data
form. This will allow the analyst to keep track of his/her progress and will also make it easier to
track down and correct any data entry errors. This identifier does not associate the survey with a
particular person; it only makes it easier to find a specific survey at a later date. The surveys do
not need to be in any particular order: begin at the top of the stack with 1 and number
consecutively.
Setting up the Worksheet
To set up a worksheet for data entry, the analyst will use the first row (row 1) for the question or
question part labels. Dedicate the first column (column A) to the IDs. Thus, the analyst will put
the label ID in cell A1. Cell B1 would contain the label q1 (for question #1) or whatever is
appropriate for the first question or field of data. Cell C1 would contain the label q2 (or
whatever is appropriate), and so on.
Each survey will then be entered into one row; the first survey in row 2 (ID #1), the second
survey in row 3, and so on.
Reminder: Cell References
Cells in an Excel spreadsheet are referred to by the intersection of the column and row in
which they appear. In the example used for this handbook, the cell that contains the label ID
is cell A1, because it is in the first column (A) and the first row (1). The cell that contains the
answer to question #1 of the third survey entered is B4 (the 2nd column and the 4th row).
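The naming rule in this reminder (column letter first, then row number) can be sketched in Python for readers who want to check a reference. This is an illustration only: `cell_reference` is a hypothetical helper, not an Excel feature, and columns are numbered from 1 (A = 1), continuing past Z as AA, AB and so on, as Excel does.

```python
def cell_reference(column, row):
    """Return the A1-style reference for a 1-based column number and row number.

    Column 1 -> "A", 26 -> "Z", 27 -> "AA", following Excel's lettering scheme.
    """
    if column < 1 or row < 1:
        raise ValueError("column and row must both be 1 or greater")
    letters = ""
    while column > 0:
        column, remainder = divmod(column - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return f"{letters}{row}"


# The ID label sits in the first column and the first row.
print(cell_reference(1, 1))  # A1
# Question #1 of the third survey: second column, fourth row (survey 1 is in row 2).
print(cell_reference(2, 4))  # B4
```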
Entering Single-Response, Closed-Ended Questions
A closed-ended question is one in which the respondent chooses an answer by marking a box or
circling a number from a given list of possible responses. A single-response question is one in
which the respondent is to choose only one answer from the list.
Question #1 (shown below) from the example survey represents a single-response, closed-ended
question.
1) How many of the training sessions did you attend?
1 to 2
3 to 4
5
6 or more
When entering and analyzing data, it is easiest to work with numbers. To do this, a number is
assigned to each possible response option:
1 to 2 = 1,
3 to 4 = 2,
5 = 3, and
6 or more = 4.
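The same lookup can be written as a small Python dictionary, which may be handy if responses are first typed as text and converted to codes later. A minimal sketch: `Q1_CODES` and `encode_q1` are hypothetical names, and leaving a skipped question blank (None) is an assumption.

```python
# Numeric codes for question #1, as assigned in the handbook.
Q1_CODES = {
    "1 to 2": 1,
    "3 to 4": 2,
    "5": 3,
    "6 or more": 4,
}


def encode_q1(answer):
    """Return the numeric code for a question #1 answer, or None if it was left blank."""
    if answer is None or answer == "":
        return None  # a skipped question stays blank in the worksheet
    return Q1_CODES[answer]


print(encode_q1("3 to 4"))     # 2
print(encode_q1("6 or more"))  # 4
```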
There are two ways the data could be entered from a question of this type. In the first method, a
number is assigned to each response, as with a single-response question. However, more than
one column is assigned to the question. The number of columns assigned should equal the highest
number of answers the analyst believes a respondent may give; if necessary, assign as many
columns as there are possible responses (in case a respondent checks every box). In the example
at left, three columns were assigned to question #2, and the answers were entered as shown.
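The first method can be sketched in Python: reserve a fixed number of columns and place the codes a respondent checked into them. This is an illustration under assumptions: `spread_into_columns` is a hypothetical helper, None stands for a blank cell, codes are filled from the left, and the codes shown are made up rather than taken from the example surveys.

```python
def spread_into_columns(codes, n_columns):
    """Place the codes a respondent checked into n_columns cells, left to right.

    Unused cells are returned as None (blank). An error is raised if the respondent
    checked more boxes than there are columns, meaning too few columns were reserved.
    """
    if len(codes) > n_columns:
        raise ValueError(f"{len(codes)} answers but only {n_columns} columns reserved")
    return list(codes) + [None] * (n_columns - len(codes))


# Hypothetical respondent who checked the options coded 1 and 4;
# three columns are reserved for question #2, as in the handbook's example.
print(spread_into_columns([1, 4], 3))  # [1, 4, None]
```

The error check reflects the advice above: if a respondent could check more boxes than there are reserved columns, more columns are needed.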
analysis, when calculating the percent of respondents giving each answer. The example below
shows how the data could be entered for question #2 using this approach.
An open-ended question is one in which respondents are invited to answer in their own words,
rather than from a list of responses. Question #6 on the fictional survey represents an
open-ended question.
6) Do you have any other comments you would like to make about this training?
________________________________________________________________________
________________________________________________________________________
Depending on the type of open-ended question asked, the analyst may or may not wish to enter
these responses into the dataset at the same time as the other questions. The responses could
instead be typed later into an appendix for a report, or they could be read and assigned codes;
that is, similar answers could be grouped into categories. Each category or code could be
assigned a number, and these codes entered into the dataset in a manner similar to the examples
shown above.
For this fictional survey, the answers to Question #6 were deemed short enough to enter verbatim
into the dataset, as shown in the example below:
The answers were entered into the dataset as written in by respondents, as shown below, but then
codes were assigned: 1=Latino/a; 2=Asian; 3=White/Caucasian.
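When write-in answers are coded, writing the coding rules down explicitly helps ensure that the same wording always receives the same code. A minimal Python sketch, assuming the three codes above; the alternate spellings in `SYNONYMS` and the function name `code_write_in` are hypothetical, and anything unrecognized comes back as None so the analyst can review it by hand.

```python
# Codes assigned to the written-in answers (1=Latino/a, 2=Asian, 3=White/Caucasian).
RACE_CODES = {
    "latino/a": 1,
    "asian": 2,
    "white/caucasian": 3,
}

# Hypothetical alternate spellings mapped onto the categories above; in practice the
# analyst builds this list by reading the actual answers.
SYNONYMS = {
    "latino": "latino/a",
    "latina": "latino/a",
    "white": "white/caucasian",
    "caucasian": "white/caucasian",
}


def code_write_in(answer):
    """Return the numeric code for a written-in answer, or None if blank or unrecognized."""
    key = answer.strip().lower()
    key = SYNONYMS.get(key, key)
    return RACE_CODES.get(key)


print(code_write_in("Latina"))   # 1
print(code_write_in(" White "))  # 3
print(code_write_in(""))         # None
```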
Creating a Codebook
The examples above showed how the data entry would occur for each type of question.
Generally, the analyst will want to set up the data entry spreadsheet before beginning the data
entry. By knowing how to enter each type of question, the analyst can determine which
questions will be entered into each column, being sure to reserve the first column for the IDs.
Appendix II shows the codebook for the fictional survey used as an example in this handbook.
The ID is in column A (shown with a circle around it) and question #1 is in column B. Question
#2, using the first version of multiple-response data entry, is in columns C through E, while
question #2 using the second version of multiple-response data entry is in columns F through K
(in this example, the others were ignored). The three parts of question #3 are in columns L, M
and N, and so on. The codebook also shows the numeric equivalents assigned to each question
response.
It is a good idea to keep this codebook. It will serve as a customized guide to data entry, and to
the analysis of the data once the dataset has been created. The example below shows the
entered data for the surveys shown in Appendix I.
(Note: the columns for the open-ended questions were narrowed so that all the columns would fit.)
Calculating an Average
Calculating the average of a range of cells is a fairly simple procedure within Excel, and
appropriate for certain types of data. For example, in the fictional survey for our training
program, one of the questions asks respondents to report their annual household income. The
average annual income of participants could be calculated and reported.
The AVERAGE function would be used to make this calculation. As shown in the table below, to
create this formula, first type an equals sign (=), followed by the function name and then the
range of cells in parentheses; for example, =AVERAGE(AH2:AH7).
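For readers who want to check a result outside Excel, a minimal Python sketch of the same calculation follows. It assumes blank cells are represented as None and, like Excel's AVERAGE, leaves them out of both the total and the count; `excel_style_average` is a hypothetical name, and the income figures are made up, not the survey data.

```python
def excel_style_average(values):
    """Average the numbers in a list, skipping blanks (None) as AVERAGE skips empty cells."""
    numbers = [v for v in values if isinstance(v, (int, float))]
    if not numbers:
        raise ZeroDivisionError("no numeric values to average (Excel shows #DIV/0!)")
    return sum(numbers) / len(numbers)


# Hypothetical household incomes from six surveys; one respondent left the question blank.
incomes = [20000, None, 35000, 15000, 40000, 30000]
print(excel_style_average(incomes))  # 28000.0
```

Note that the blank is not treated as zero: the total of 140,000 is divided by 5 answers, not by 6 surveys.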
To get a count of the number of responses to each of the other possible answers, use the same
formula, but change the criterion each time. (See the formulas in cells C14, C15, and C16.)
In this example, no participants attended 1 to 2 sessions, three participants attended 3 to 4
sessions, two participants attended 5 sessions, and one participant attended 6 or more sessions.
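The same frequency count can be sketched in Python. The list below uses codes that reproduce the counts reported above (no 1s, three 2s, two 3s and one 4); the order of the surveys is an assumption, and `countif` is a hypothetical helper covering only the simple case of a single numeric criterion.

```python
def countif(values, criterion):
    """Count entries equal to `criterion`, like =COUNTIF(range, criterion) with a plain number."""
    return sum(1 for v in values if v == criterion)


# Question #1 codes for the six surveys (1 = 1 to 2, 2 = 3 to 4, 3 = 5, 4 = 6 or more).
q1 = [2, 3, 2, 4, 3, 2]

for code, label in [(1, "1 to 2"), (2, "3 to 4"), (3, "5"), (4, "6 or more")]:
    print(f"{label}: {countif(q1, code)}")
```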
To find the proportion (percent) of respondents who attended 6 or more sessions, the analyst
would divide the number who gave that answer by the total number who answered the question.
The SUM function can be used to total the number of respondents who answered the question.
In the example above, the formula would be =SUM(B13:B16). In the table below, that formula was
entered into cell B11.
To determine the proportion of people giving that answer, the contents of cell B16 would be
divided by the contents of cell B11. As shown below, the result is displayed in cell B22. The
formulas for calculating the proportion giving each answer to question #1 are also shown.
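The SUM-then-divide steps can be sketched in Python using the counts for question #1 reported above. The helper name `proportions` is hypothetical; each division mirrors formulas such as =B16/B11.

```python
def proportions(counts):
    """Divide each count by the total number of respondents who answered."""
    total = sum(counts.values())  # like =SUM(B13:B16) in cell B11
    if total == 0:
        raise ZeroDivisionError("nobody answered the question (Excel would show #DIV/0!)")
    return {label: count / total for label, count in counts.items()}


# Counts for question #1 reported in the handbook (cells B13 through B16).
q1_counts = {"1 to 2": 0, "3 to 4": 3, "5": 2, "6 or more": 1}

for label, share in proportions(q1_counts).items():
    print(f"{label}: {share:.0%}")
```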
The analyst may then wish to also calculate the proportion of youth served who are 15-17 years old.
If the contents of cell C2 were copied to cell C3, the formula would look like this: =B3/B6. This is
because the cell references in this formula are relative references. The value in cell C2 was
calculated by dividing the number in the same row, one column to the left, by the number three rows
below, one column to the left, so Excel assumes the same thing should happen in the cell to which the
formula is copied. However, cell B6 is blank, so the formula in cell C3 would return a #DIV/0! error
instead of a valid proportion. This can be fixed by editing the formula after it has been copied, so
that the denominator refers to B5. But if the formula is then copied to cell C4, the denominator
would again have to be changed manually to refer to the cell that contains the total number of youth
served. If these manual changes were not made, the formulas in column C would look like the formulas
in column D in the table below.
If, however, an absolute reference were used for the row that contains the total number of youth
served, the denominator would always refer to row 5 when the formula was copied (for example,
=B2/B$5 in cell C2 would become =B3/B$5 in cell C3). The dollar sign ($) indicates an absolute
reference. In this example, it is used only for the row designation, not for the column designation;
it can be used for the row, the column, or both. Excel treats every cell reference as relative unless
a dollar sign is added manually. Knowing how to use relative and absolute references can greatly
speed up the creation of spreadsheets in Excel.
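How Excel adjusts a formula when it is copied down a column can be modeled in a few lines of Python. This is a simplified sketch, not Excel's own logic: `copy_formula_down` is a hypothetical helper that handles only plain references such as B2, $B2, B$5 and $B$5 and ignores sheet names, named ranges and function names that end in digits.

```python
import re

# Matches simple A1-style references such as B2, $B2, B$5 or $B$5.
# Group 1: optional $ plus column letters; group 2: optional $ before the row; group 3: row.
CELL_REF = re.compile(r"(\$?[A-Z]+)(\$?)(\d+)")


def copy_formula_down(formula, rows):
    """Return `formula` as it would read after being copied `rows` rows further down.

    Relative row numbers are shifted; rows marked absolute with $ (e.g. B$5) stay fixed.
    """
    def shift(match):
        column, dollar, row = match.group(1), match.group(2), int(match.group(3))
        if dollar:
            return f"{column}${row}"    # absolute row: unchanged
        return f"{column}{row + rows}"  # relative row: moves with the copy

    return CELL_REF.sub(shift, formula)


print(copy_formula_down("=B2/B5", 1))   # =B3/B6  (denominator drifts to a blank cell)
print(copy_formula_down("=B2/B$5", 1))  # =B3/B$5 (denominator stays on the total)
```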
The method used to calculate the results for a multiple-response question depends on the
approach used to enter the data.
If the data have been entered using the first approach described, where a numeric code is
assigned to each possible response but more than one column is designated for entry of the
results (as in columns D, E and F in the table below), then the counts and proportions can be
calculated in much the same way as for a single-response question. The change is in the range
of cells included in the count: instead of covering only one column, it covers multiple columns.
In this example, the number of people who said they heard of the program through the
neighborhood newsletter would be determined using the formula:
=COUNTIF(D2:F7,1)
The calculation of the percent of respondents who heard of the program through the
neighborhood newsletter also changes slightly. Instead of dividing the number of respondents
giving a specific answer by the sum of cells F13 through F17 (which would be the total number of
responses, not the number of respondents answering the question), the denominator is the total
number of respondents answering the question.
To determine this, the number of valid answers entered in column D (the first column used for
question #2) needs to be examined. This can be done using the COUNT function. This formula is
not shown in the table below, but would be entered in cell D11 as follows:
=COUNT(D2:D7)
The COUNT function counts the cells in the specified range that contain numbers; blank cells are
skipped. In this case, every respondent gave at least one answer, so the total is 6, the same as
the number of returned surveys. The same formula (with the cell range adjusted to each column)
was used in cells E11 and F11. The numbers displayed there show how many people gave two or
more answers (4 people, see cell E11) and how many gave three answers (1 person, see cell F11).
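A Python sketch of this first approach follows. The answers are hypothetical, arranged so that, as in the text, all six respondents gave at least one answer, four gave two or more, and one gave three. Code 1 stands for the neighborhood newsletter; the other codes are placeholders, and `count_numbers` is a hypothetical helper.

```python
def count_numbers(cells):
    """Count cells that hold a number, as =COUNT(...) does; blanks (None) are skipped."""
    return sum(1 for cell in cells if isinstance(cell, (int, float)))


# Hypothetical question #2 answers entered the first way: up to three codes per
# respondent in columns D, E and F, filled from the left (None = blank cell).
rows = [
    [1, 3, None],
    [2, None, None],
    [1, 2, 4],
    [3, 1, None],
    [4, None, None],
    [2, 5, None],
]

# Like =COUNTIF(D2:F7,1): how often code 1 appears anywhere in the three columns.
newsletter = sum(1 for row in rows for cell in row if cell == 1)

# Like =COUNT(D2:D7), =COUNT(E2:E7) and =COUNT(F2:F7): answers in each column.
answered = [count_numbers(row[i] for row in rows) for i in range(3)]

respondents = answered[0]  # anyone with at least one answer has column D filled
print(f"newsletter: {newsletter} of {respondents} = {newsletter / respondents:.0%}")
print(f"one or more: {answered[0]}, two or more: {answered[1]}, three: {answered[2]}")
```

Dividing by the respondent count rather than by the total number of codes entered is what makes the percentages add to more than 100%.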
When reporting the percentages for a multiple-response question, it should be noted that they
can add to more than 100%, because respondents can give more than one answer.
If the answers to question #2 were entered as shown in columns H through M, where each
possible answer was assigned to a column, and a 1 was used to designate when a box was
checked, then a slightly different approach is needed to create the frequency distribution.
First, to get the total number of respondents who gave an answer, column H is used. In this
instance, a 1 was entered in column H if a respondent gave no answer to the question, and a 2
was entered if a respondent gave at least one answer. The formula in cell H11 (not shown in the
table below), =COUNTIF($H$2:$H$7,2), counts the number of valid answers to question #2. This
formula was copied to cells I11, J11, K11, L11 and M11. Because the range is an absolute
reference, every copy still counts column H, so each column shows the same total.
To determine the number of people who indicated each potential source of familiarity with the
training, the number of 1s in each column was counted using the COUNTIF function. The formula
for cell M13 (the number of respondents indicating they heard of the program by word of mouth)
is shown in cell N13. A similar formula was used for each of the other responses.
Next, to determine the proportion of respondents each of those counts represented, the counts
were divided by the number of valid responses to question #2. As shown in cell M19, 33% of
respondents reported they had heard of the training by word of mouth. The formula used to
make that calculation is shown in cell N19. A similar formula was used for each of the other
responses.
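A Python sketch of this second approach follows. The data are hypothetical: column H holds 2 if the respondent checked at least one box and 1 if none, and columns I through M hold 1 when that box was checked and None (blank) otherwise. Column M is word of mouth, arranged here to give the 33% reported above; what the other columns stand for is a placeholder, and `checked_share` is a hypothetical helper.

```python
# Hypothetical question #2 data entered the second way: columns H, I, J, K, L, M.
columns = ["H", "I", "J", "K", "L", "M"]
rows = [
    [2, 1, None, None, None, 1],
    [2, None, 1, None, None, None],
    [2, 1, 1, None, 1, None],
    [2, None, None, 1, None, 1],
    [2, None, None, None, 1, None],
    [2, 1, None, 1, None, None],
]


def checked_share(rows, index):
    """Return (boxes checked in column `index`, respondents who answered the question)."""
    valid = sum(1 for row in rows if row[0] == 2)        # like =COUNTIF($H$2:$H$7,2)
    checked = sum(1 for row in rows if row[index] == 1)  # e.g. =COUNTIF(M2:M7,1)
    return checked, valid


for index, name in enumerate(columns[1:], start=1):
    checked, valid = checked_share(rows, index)
    print(f"column {name}: {checked} of {valid} = {checked / valid:.0%}")
```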
Again, when reporting the percentages for a multiple-response question, it should be noted that
they can add to more than 100%, because respondents can give more than one answer.
The table below displays the functions used to perform the analyses described in this
handbook. The examples all refer to the spreadsheet shown in Appendix III.
Function | What goes in the parentheses | Example | Value displayed | What it means
ROWS | range of cells | =ROWS(B2:B7) | 6 | 6 surveys were returned
AVERAGE | range of cells containing the values to be averaged | =AVERAGE(AH2:AH7) | $29,000 |
MIN | range of cells containing the values to be examined | =MIN(AH2:AH7) | $15,000 |
MAX | range of cells containing the values to be examined | =MAX(AH2:AH7) | $57,000 |
COUNTIF | range of cells to be examined, value to be counted | =COUNTIF(B$2:B$7,3) | 2 | 2 people gave an answer of 5 times to question #1
SUM | range of cells to be totaled | =SUM(B13:B16) | 6 | 6 people answered question #1
COUNT | range of cells to be examined | =COUNT(E2:E7) | |
(division) | [cell reference1]/[cell reference2] | =B15/B$11 | 33% | 33% of respondents gave an answer of 5 times to question #1
Staff can write a cover memo or report to accompany the annotated instrument that explains the
methods used to obtain the data and interprets the results.
An example of an annotated instrument can be found in Appendix IV.
The term annotated instrument was coined by, and is used by, staff at National Research Center, Inc. It is
NOT a commonly used evaluation term, but one that we think is descriptive.
In general, when the named range of cells will be used for creating pivot tables, it is a good idea to name
the range Database. This is the default name used by Excel in the pivot table wizard. The Define
Name dialogue box above shows that the name Database has been typed in. The field labeled Refers
to: shows that Database will refer to the cells starting at A1 and going to W7 in the worksheet labeled
Data Entry. These are the cells that contain the data entered for the fictional survey.
Once a range of cells has been defined, pivot tables can be created from those data. It is easiest
to create the pivot tables on another worksheet within the workbook.
Before setting up the pivot table, the analyst should click the cell where the pivot table is to
appear. To set up a pivot table, go to the Data menu, then select PivotTable and PivotChart
Report. The PivotTable and PivotChart Wizard will walk the analyst through the rest of the setup.
In the example below, the pivot table will be placed in cell B4.
Sometimes it is useful to analyze the data based on certain respondent characteristics; for
example, satisfaction ratings by gender or program attended. One of the easiest ways to generate
a table like this is through the use of a PivotTable.
The example to the right shows the PivotTable layout and resulting table for a crosstabulation of
the results to question #5 ("How would you rate the overall quality of this training?") by the
gender of the respondent. (Of course, crosstabulations are recommended only with datasets
larger than the one created for these examples, with a sufficient number of cases within each
subgroup examined.)
In this PivotTable layout, q9 (gender) is placed in the column area, while q5 (quality rating) is
placed in the row area. ID is again used for the data area.
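The same kind of count can be sketched in Python with `collections.Counter`, as a way to check a small crosstabulation by hand. The (gender, rating) pairs are hypothetical: the ratings reproduce the overall distribution reported for question #5 in the annotated instrument (1 = poor, 2 = fair, 3 = good, 4 = excellent), but how they pair with gender is made up. This is comparable to a PivotTable that counts IDs, not a reproduction of Excel's PivotTable feature.

```python
from collections import Counter

# Hypothetical (gender, q5 rating) pairs for six respondents.
responses = [
    ("Female", 4), ("Male", 3), ("Female", 4),
    ("Male", 1), ("Female", 3), ("Male", 4),
]

# Number of respondents in each (rating, gender) cell: q5 in the rows, gender in the columns.
table = Counter((rating, gender) for gender, rating in responses)

genders = sorted({gender for gender, _ in responses})
print("rating", *genders)
for rating in range(1, 5):
    print(rating, *(table[(rating, g)] for g in genders))  # missing cells count as 0
```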
The analysis in the previous example could also be performed using the average quality rating,
on a scale from 1 to 4, where 4 = excellent and 1 = poor.
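A Python sketch of the average-rating version follows, using hypothetical (gender, rating) pairs on the 1 to 4 scale (4 = excellent, 1 = poor). The pairing with gender is made up for illustration and is not taken from the example surveys.

```python
from collections import defaultdict

# Hypothetical (gender, q5 rating) pairs; 1 = poor ... 4 = excellent.
responses = [
    ("Female", 4), ("Male", 3), ("Female", 4),
    ("Male", 1), ("Female", 3), ("Male", 4),
]

# Group the ratings by gender, then average each group.
ratings_by_gender = defaultdict(list)
for gender, rating in responses:
    ratings_by_gender[gender].append(rating)

averages = {g: sum(r) / len(r) for g, r in ratings_by_gender.items()}

for gender in sorted(averages):
    print(gender, round(averages[gender], 2))
```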
[Three items rated Very Poor / Poor / Good / Very Good; item wording and column placement not recoverable: 17%, 67%, 17%; 25%, 50%, 25%; 0%, 80%, 0%]
4) Rate the extent to which you agree or disagree with each of the following statements.
[Statement wording and column placement not recoverable]: 20%, 60%, 20%
This training will help improve the quality of life for my family: 0% Strongly Disagree, 17% Disagree, 50% Agree, 33% Strongly Agree
5) How would you rate the overall quality of this training?
17% Poor
0% Fair
33% Good
50% Excellent
6) Do you have any other comments you would like to make about this training?
I think we spent too much time reviewing the background information.
I had a lot of fun. I thought Angela was great.
This was great! I will definitely apply what I learned at work and at home!
7) What is your race?
50% Latino/a
17% Asian
33% White
8) How long have you lived in Colorado?
17% 6 years
50% 7 years
33% 8 years
9) What is your gender?
50% Female
50% Male