
Excel for Data Analysis

3005 30th Street


Boulder, CO 80301
303-444-7863
www.n-r-c.com


Table of Contents
Introduction
Data Entry in Excel
  Unique IDs
  Setting up the Worksheet
  Entering Single-Response, Closed-Ended Questions
  Entering Multiple-Response Questions
  Entering Open-Ended Questions
  Creating a Codebook
Analyzing the Data
  Calculating an Average
  Creating a Frequency Distribution for a Single-Response Question
  Creating a Frequency Distribution for a Multiple-Response Question
  Functions and formulas used for simple descriptive analyses in Excel
Presenting the Results: One Quick Idea
Using Pivot Tables for Basic and Advanced Analyses
  Creating PivotTables (Basic Analyses)
  Crosstabulation of Data Using PivotTables (Advanced Analyses)
APPENDIX I: Example Completed Surveys for Data Entry
APPENDIX II: Example Codebook
APPENDIX III: Example Analysis, with Formulas
APPENDIX IV: Example of an Annotated Instrument

Introduction
This handbook is designed to instruct program staff on how to set up data entry processes and
perform simple analyses of data collected through surveys, course evaluations, or by observation
or other record keeping.
Throughout this handbook, a common example is used: data representing the results from six
surveys completed by fictional participants of a fictional training program. A copy of the
completed surveys can be found in Appendix I. The reader may find it helpful to review the
surveys before continuing with the rest of the handbook. In fact, it might be beneficial to pull
out the six surveys and refer to them periodically while reviewing the handbook.
The individual conducting the data analysis is referred to in this handbook as the analyst. This
person may be a program staff member, volunteer, board member or other stakeholder willing to
accomplish this task. There is no job description for this analyst. He or she needs only to have a
basic understanding of Microsoft Excel, know how to perform calculations using the contents of
multiple cells, and be familiar with formulas. Reminders about using Excel are found in text
boxes throughout the handbook.
Good luck!

The Staff of NRC

Excel for Data Analysis was written by National Research Center, Inc.
3005 30th Street, Boulder, Colorado 80301
Phone: 303-444-7863 Fax: 303-444-1145 www.n-r-c.com
Copyright 2003 by National Research Center, Inc. All rights reserved.

Excel for Data Analysis


National Research Center Inc.

Page 1
3005 30th St. Boulder, CO 80301 (303) 444-7863

Data Entry in Excel


Before a data set can be analyzed, the data must be entered into an electronic file, creating an
electronic dataset. This can be done fairly simply using Microsoft Excel.
Unique IDs

Before beginning the data entry, it is advisable to put a unique identifier on each survey or data
form. This will allow the analyst to keep track of his/her progress, and will also make it easier to
track down and set straight any data entry errors. This identifier is not one that actually
associates or identifies the survey with a particular person; rather, it is only to make it easier to
find a specific survey at a later date. The surveys do not need to be in any particular order; just
begin at the top of the stack with 1, and number consecutively.
Setting up the Worksheet

To set up a worksheet for data entry, the analyst will use the first row (row 1) for the question or
question part labels. Dedicate the first column (column A) to the IDs. Thus, the analyst will put
the label ID in cell A1. Cell B1 would contain the label q1 (for question #1) or whatever is
appropriate for the first question or field of data. Cell C1 would contain the label q2 (or
whatever is appropriate), and so on across row 1.
Each survey will then be entered into one row; the first survey in row 2 (ID #1), the second
survey in row 3, and so on.
Reminder: Cell References
Cells in an Excel spreadsheet are referred to by the intersection of the Column and Row in
which they appear. In the example used for this handbook, the cell that contains the label ID
is cell A1, because it is in the first column (A) and the first row (1). The cell that contains the
answer to question #1 of the third survey entered is B4 (the 2nd column and the 4th row).
Entering Single-Response, Closed-Ended Questions

A closed-ended question means that the respondent chooses an answer by marking a box or
circling a number from a given list of possible responses. A single-response question means
that the respondent is to only choose one answer from the list.
Question #1 (shown below) from the example survey represents a single-response, closed-ended
question.
1) How many of the training sessions did you attend?
1 to 2
3 to 4
5
6 or more

When entering and analyzing data, it is easiest to work with numbers. To do this, a number is
assigned to each possible response option:
1 to 2 = 1,
3 to 4 = 2,
5 = 3, and
6 or more = 4.
Thus, since the respondent to the first survey said they
attended 5 sessions, a 3 would be entered as the answer.
The example to the right shows how the answers to
question #1 would be entered for all six fictional surveys
(from Appendix I).
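
For readers who want to double-check their coding outside Excel, the numeric assignment described above can be sketched in Python. This is only an illustration: the names Q1_CODES and code_q1 are hypothetical and are not part of the handbook or of Excel.

```python
# Hypothetical sketch: translate the printed response options for question #1
# into the numeric codes listed above (1 to 2 = 1, 3 to 4 = 2, 5 = 3, 6 or more = 4).
Q1_CODES = {"1 to 2": 1, "3 to 4": 2, "5": 3, "6 or more": 4}


def code_q1(marked_option):
    """Return the numeric code for a marked option, or None if the question was skipped."""
    if marked_option is None:
        return None  # a skipped question stays blank in the worksheet
    return Q1_CODES[marked_option]


# The respondent to the first survey marked "5", so a 3 is entered.
first_survey_code = code_q1("5")
print(first_survey_code)
```

Keeping the mapping in one place mirrors the purpose of the codebook: the same code is always entered for the same answer.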

Entering Multiple-Response Questions

Question #2 from the fictional survey is a multiple-response question, meaning that
respondents could give more than one answer to the question; in this example, they may have
heard of the program from multiple sources.
2) How did you hear about this training? (Please check all that apply.)
Neighborhood newsletter
Bulletin boards in community buildings
Flyers
Your child's school
Word of mouth
Other

There are two ways the data could be entered from a question of this type. In the first method, a
number is assigned to each response, similar to a single-response question. However, more than
one column is assigned to the question. The number of columns assigned should be as many as
the highest number of answers the analyst believes that the respondent may give; if necessary,
assign as many columns as there are possible responses (in case a respondent checks every box).
In the example at left, 3 columns were assigned to question #2, and the answers entered as shown.

The second approach to multiple-response questions is to assign a column to each possible
response. For the example question #2 (shown on the previous page), the following columns
would be assigned:
q2a: Neighborhood newsletter
q2b: Bulletin boards in community buildings
q2c: Flyers
q2d: Your child's school
q2e: Word of mouth
If a response was marked, place a 1 in the assigned column. If no response was given, leave it
blank, or place a 0 in the column. With this method, it is harder to know if a respondent
skipped a question altogether. The analyst may wish to have a column before q2a where he/she
marks whether or not the question was left blank (1=blank, 2=not blank). This will help in the
analysis, when calculating the percent of respondents giving each answer. The example below
shows how the data could be entered for question #2 using this approach.
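
The second approach can also be sketched in Python as a rough cross-check. The function name q2_row and the column name q2_blank (for the extra "was the question left blank" column) are hypothetical; the 1=blank, 2=not blank coding follows the text above.

```python
# Hypothetical sketch: build one worksheet row for question #2 using one
# column per possible response, plus a flag column placed before q2a.
Q2_COLUMNS = ["q2a", "q2b", "q2c", "q2d", "q2e"]


def q2_row(checked):
    """checked is the set of column names whose boxes the respondent marked."""
    row = {"q2_blank": 1 if not checked else 2}  # 1 = blank, 2 = not blank
    for column in Q2_COLUMNS:
        row[column] = 1 if column in checked else 0  # 1 = box checked
    return row


# A respondent who checked "Neighborhood newsletter" and "Word of mouth":
print(q2_row({"q2a", "q2e"}))
# A respondent who skipped the question altogether:
print(q2_row(set()))
```

The flag column is what later lets the analyst divide by the number of people who actually answered the question, rather than by the number of surveys returned.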

Reminder: Freeze Panes


Freezing the panes allows the labels at the top of the worksheet and the IDs at the left of the worksheet
to be always visible. To freeze the panes, put the cursor in the cell where the panes should break
(usually B2). Then select Window from the menu bar, and then the option to Freeze Panes. This
option works as a toggle; that is, if this option is selected again, the panes will unfreeze. (If the panes
are frozen, the menu option will read Unfreeze Panes.) Using this option is quite helpful where there
are many variables (columns) or cases (surveys, records of data in rows).

Entering Open-Ended Questions

An open-ended question is one in which respondents are invited to answer in their own words,
rather than from a list of responses. Question #6 on the fictional survey represents an
open-ended question.
6) Do you have any other comments you would like to make about this training?
________________________________________________________________________
________________________________________________________________________

Depending on the type of open-ended question asked, the analyst may or may not wish to enter
these responses into the dataset at the same time as the other questions are entered. These
questions could be entered later into an appendix for a report, or they could be read and assigned
codes; that is, like answers could be grouped into categories. Each category or code could be
assigned a number, and these codes entered into the dataset in a manner similar to the examples
shown above.
For this fictional survey, the answers to Question #6 were deemed short enough to enter verbatim
into the dataset, as shown in the example below:

However, the answers to Question #7 were considered appropriate for coding.


7) What is your race? ____________________

The answers were entered into the dataset as written in by respondents, as shown below, but then
codes were assigned: 1=Latino/a; 2=Asian; 3=White/Caucasian.

Creating a Codebook

The examples above showed how the data entry would occur for each type of question.
Generally, the analyst will want to set up the data entry spreadsheet before beginning the data
entry. By knowing how to enter each type of question, the analyst can determine which
questions will be entered into each column, being sure to reserve the first column for the IDs.
Appendix II shows the codebook for the fictional survey being used as an example in this
handbook. The ID is in column A (shown with a circle around it), question #1 is in column B,
question #2, using the first version of multiple-response data entry, is in columns C through E,
while question #2 using the second version of multiple-response data entry is in columns F
through K (in this example, the others were ignored), the three parts of question #3 are in
columns L, M and N, and so on. This codebook also shows the numeric equivalents assigned to
each question response.
It is a good idea to hang on to this codebook. It will serve as a customized guide in data entry,
and in the analysis of the data once the dataset has been created. The example below shows the
entered data for the surveys shown in Appendix I.

(Note: the columns for the open-ended questions were shrunk to allow all the columns to show.)

Reminder: Wrapping Text


Sometimes the text entered into a cell is too long for it to display in its entirety. To turn on text wrapping
(the text will automatically move to the next line if it runs out of room), highlight the cells to be formatted,
then choose Format from the menu bar, and then Cells.
Click on the Alignment tab, and check the box labeled Wrap text. Click the OK button to apply the
formatting.
The text should be wrapped in the cell. Note that wrapping text will change the height of the rows.

Analyzing the Data


Now that the data collected by the program has been entered into an electronic dataset, the
analyst is ready to start analyzing the information to get answers to the questions posed. This
next section will demonstrate how to use formulas and functions within Excel to produce the
statistics or summaries of the information needed.
Reminder: Formulas
Formulas are used to perform calculations within a spreadsheet. To insert a formula, as opposed to a
number or text, type an equals sign (=) in the cell where the calculation is to be performed, and then
type in the rest of the formula. A formula can perform mathematical calculations or execute a wide variety
of functions (see below for more on functions). To add or subtract, use the plus (+) or minus (-) symbol.
To multiply, use an asterisk (*) and to divide use a slash (/). Use parentheses as necessary to indicate
the desired order of operations.
For example, if the analyst wanted to know how many seconds there were in 3 hours, he or she could
type in the formula: =3*60*60. The result displayed in the cell would be 10,800.
There might have been a cell somewhere on the page that had a value of 3 to indicate three hours; for
the sake of an example, this cell is T21. To know how many seconds that represented, use the same
formula as above, but exchange the 3 for the cell reference: =T21*60*60. If the number of hours in cell
T21 changed, the result of the formula would also change.
Reminder: Functions and Referring to a range of cells
Functions can be used within formulas to perform special calculations or manipulations. There are a
large number and variety of functions that can be used in Excel. Some of the functions are mathematical,
some are logical, some are statistical, and others serve yet more purposes.
All functions begin in a similar fashion: the equals sign (=), the function, immediately followed by an open
parenthesis, the references on which the function should operate each separated by a comma (a different
number of references are needed for each function), and a close parenthesis.
For example, the SUM function can be used to add the values of several cells.
Some functions will refer to a range of cells. For example, if an analyst wanted to total the number of
youth served in the table below, a formula could be used like that found in cell B5: =B2+B3+B4.
Alternatively, the SUM function could be used, referring to the range of cells to be summed, like
this: =SUM(B2:B4). The colon indicates that a range of cells is being referred to, starting with (and
including) the cell to the left of the colon, and ending with (and including) the cell to the right of the colon.
The function SUM indicates what is to be done with this range of cells: total all the values together.

Calculating an Average

Calculating the average of a range of cells is a fairly simple procedure within Excel, and
appropriate for certain types of data. For example, in the fictional survey for our training
program, one of the questions asks respondents to report their annual household income. The
average annual income of participants could be calculated and reported.
The function AVERAGE would be used to make this calculation. As shown in the table
below, to create this formula an equals sign (=) is first typed, followed by the function, with the
range of cells following the function in parentheses.
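
As a rough cross-check outside Excel, the same calculation can be sketched in Python. The income values below are invented for illustration and are not taken from the example surveys; the function name excel_average is hypothetical. The sketch assumes, as Excel's AVERAGE does, that blank cells are left out of the calculation rather than treated as zero.

```python
from statistics import mean

# Hypothetical incomes for six surveys; None stands for a cell left blank.
incomes = [15000, 22000, None, 57000, 31000, 20000]


def excel_average(cells):
    """Average only the cells that hold numbers, skipping blanks."""
    values = [cell for cell in cells if isinstance(cell, (int, float))]
    return mean(values)


average_income = excel_average(incomes)
print(average_income)
```

Because the blank is skipped, the total is divided by five respondents, not six. This is why the average describes only those who gave an answer.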

Reminder: Formatting cells


In many of the spreadsheet examples shown in this handbook, some of the cells are formatted as
numbers, and some are formatted as percents. You will want to format the cells appropriately. To format
a cell or group of cells, highlight the cells you wish to format, then choose Format from the menu bar,
and then Cells. A dialogue box will open, with a number of formatting options. You can format the
alignment of the cell contents, the cell shading or border, or the Number. If you choose the Number
tab, you will be presented with a list of types of number formats, such as currency, percentage, etc.
Choose the type, and then decide how many decimals you want. The highlighted cells will be formatted
according to the specifications you choose.

Creating a Frequency Distribution for a Single-Response Question

Creating a frequency distribution, or a count and/or proportion of respondents giving each
response to a question, is an intuitively easy process. However, doing it within Excel for a large
number of cases is actually a multi-step procedure.
The first step is to count how many respondents gave each response. There is a function within
Excel that will help automate this step: COUNTIF. To use this function, specify two items:
- What range of cells contains the answers to the question of interest, and
- Which particular answer should be counted (the criterion).
The function is set up as: =COUNTIF(range of cells, criterion). To know how many people
attended the training program just one or two times, the analyst would want to count how many
times 1 (the number assigned to the response "1 to 2" for question #1) was entered as the
answer to question #1. The data for question #1 are in column B, and specifically in rows 2
through 7. The formula to enter to find out how many respondents said they attended one or two
sessions would be:
=COUNTIF(B2:B7,1)
The results can be seen in the table below in cell B13. The formula is shown to the right in cell C13.

To get a count of the number of responses to each of the other possible answers, use the same
formula, but change the criterion each time. (See the formulas in cells C14, C15, and C16.)
In this example, no participants attended 1 to 2 sessions, three participants attended 3 to 4
sessions, two participants attended 5 sessions, and one participant attended 6 or more sessions.
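
The counting step can be sketched in Python as a sanity check on the COUNTIF results. The helper name countif is hypothetical, and the order of the six answers is assumed: only the first answer (3) and the overall counts are given in the text.

```python
def countif(cells, criterion):
    """Count how many cells equal the criterion, like =COUNTIF(range, criterion)."""
    return sum(1 for cell in cells if cell == criterion)


# Codes entered in cells B2 through B7 (order assumed for illustration).
q1_answers = [3, 2, 2, 3, 2, 4]

# One count per possible response code, as in cells B13 through B16.
q1_counts = {code: countif(q1_answers, code) for code in (1, 2, 3, 4)}
print(q1_counts)
```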

To know the proportion (percent) of respondents attending 6 or more times, the analyst would
want to divide the number who gave that answer by the total number of those who answered the
question. The SUM function can be used to total the number of respondents who answered that
question. In the example above, the formula would be: =SUM(B13:B16). In the table below,
that formula was entered into cell B11.
To determine the proportion of people giving that answer, the contents of cell B16 would need to
be divided by cell B11. As shown below, the result is displayed in cell B22. The formulas
for calculating the proportion giving each answer to question #1 are also
shown.
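
The division step can be sketched the same way. This is a minimal illustration using the counts reported above for question #1; the variable names are hypothetical.

```python
# Counts for question #1, as reported in the text.
q1_counts = {"1 to 2": 0, "3 to 4": 3, "5": 2, "6 or more": 1}

# Total number who answered the question, like =SUM(B13:B16).
total_answering = sum(q1_counts.values())

# Each count divided by the total, shown as a whole-number percent.
q1_percents = {
    answer: round(100 * count / total_answering)
    for answer, count in q1_counts.items()
}
print(q1_percents)
```

Rounded percents will not always add to exactly 100, so a total of 99% or 101% in a report is usually a rounding effect rather than an error.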

Reminder: Absolute versus relative cell references


In a formula, a cell reference can be made in a relative or an absolute manner. For example, looking
at the table below, if the analyst wanted to calculate a percent, he or she might create a formula in cell C2
which would display the proportion of youth served who are 12-14 years old. That formula would be:
=B2/B5, which would divide the value of B2 (12) by the value of B5 (112).

The analyst may then wish to also calculate the proportion of youth served who are 15-17 years old. If
the contents of cell C2 were copied to cell C3, the formula would look like this: =B3/B6. This is because
in Excel the cell references in this formula are relative references; that is, Excel has assumed that
because in cell C2 the calculated number was derived by dividing the number in the same row and one
column to the left by the number three rows below and one column to the left, the same thing should
happen in the cell to which the formula is copied. However, cell B6 is blank, so the formula in cell C3
would divide by zero and display an error (#DIV/0!). This can be fixed by changing the formula after it has been
copied, so that the denominator refers to B5. But, if the formula is then copied to cell C4, the
denominator would again have to be manually changed in the formula to refer to the correct cell that
contains the total number of youth served. If this manual change was not made, the formulas in
column C would look like the formulas in column D in the table below.
If, however, an absolute reference was used to refer to the row that contains the total number of youth
served (for example, =B2/B$5 in cell C2), when the formula was copied, the denominator would always
refer to row 5. The dollar sign ($) is used to indicate an absolute reference. In this example, it is only
used for the row designation, not for the column designation. It can be used for both the row and
column designation, or only one or the other.
Excel defaults to assuming that all cell references are relative, unless the change is made manually.
Knowing how to use relative and absolute references can greatly speed up creation of spreadsheets in
Excel.
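
To make the copying behavior concrete, here is a small Python sketch that imitates what happens to a division formula when it is copied down a column. It is a simplified model written for this illustration, not how Excel is implemented: the function names are hypothetical, and only row shifts are handled.

```python
import re


def copy_reference(ref, row_offset):
    """Shift the row of a reference such as 'B2' unless the row is locked with '$'."""
    col_lock, column, row_lock, row = re.fullmatch(
        r"(\$?)([A-Z]+)(\$?)(\d+)", ref
    ).groups()
    if not row_lock:  # relative row: moves with the formula
        row = str(int(row) + row_offset)
    return f"{col_lock}{column}{row_lock}{row}"


def copy_formula_down(numerator, denominator, row_offset):
    """Return the division formula as it would read after being copied down."""
    top = copy_reference(numerator, row_offset)
    bottom = copy_reference(denominator, row_offset)
    return f"={top}/{bottom}"


print(copy_formula_down("B2", "B5", 1))   # relative denominator drifts to B6
print(copy_formula_down("B2", "B$5", 1))  # absolute row stays on the total
```

With the relative reference, the denominator drifts off the total row as soon as the formula is copied; with B$5, only the numerator changes.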

Creating a Frequency Distribution for a Multiple-Response Question

The approach to be used to calculate the results to a multiple-response question depends upon the
approach used to enter the data.
If the data have been entered using the first approach described, where a numeric
assignment is made for each possible response, but more than one column is designated for entry
of the results (as in columns D, E and F in the table below), then the counts and proportions can
be calculated in a manner quite similar to that of a single-response question. The change would
be in the definition of the range of cells to include in the count. Instead of covering only one
column, it would cover multiple columns. In this example, the number of people who said they
heard of the program through the neighborhood newsletter would be determined using the
formula:
=COUNTIF(D2:F7,1)
Calculating the percent of respondents who heard of the program through the neighborhood
newsletter would also be changed slightly. Instead of dividing the number of respondents giving
a specific answer by the sum of the cells F13 through F17 (which would be the total number of
responses, not respondents answering the question), the denominator is the total number of
respondents answering the question.
To determine this, the number of valid answers entered in column D (the first answer column) would
need to be counted. This can be done using the COUNT function. This formula is not shown in the
table below, but would be entered in cell D11 as follows:
=COUNT(D2:D7)
This function counts the cells in the specified range that contain numbers, so blank cells are not counted. In this
case, every respondent gave at least one answer, so the total is 6, the same as the number of
returned surveys. This same formula (with the correct cell range specification) was used in
cells E11 and F11. The numbers displayed there designate the number of people who gave two
or more answers (4 people, see cell E11) or three answers (1 person, see cell F11).
It should be noted when reporting the percentages to a multiple response question that the
percents will add to more than 100%, as respondents can give more than one answer.
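
The difference between counting responses and counting respondents can be sketched in Python. The data below are hypothetical, arranged only so that six people gave a first answer, four gave a second, and one gave a third, as described above. The helper names are also hypothetical.

```python
# Each tuple holds one respondent's answers in columns D, E and F.
# None stands for a blank cell. Codes: 1 = neighborhood newsletter, etc.
q2_answers = [
    (1, 5, None),
    (3, None, None),
    (1, 2, 4),
    (2, 5, None),
    (4, None, None),
    (1, 3, None),
]


def countif_all(rows, criterion):
    """Count a code across all three columns, like =COUNTIF(D2:F7, criterion)."""
    return sum(1 for row in rows for cell in row if cell == criterion)


def count_column(rows, index):
    """Count the non-blank numeric cells in one column, like =COUNT(D2:D7)."""
    return sum(1 for row in rows if row[index] is not None)


newsletter_count = countif_all(q2_answers, 1)
respondents = count_column(q2_answers, 0)  # everyone who gave a first answer
newsletter_share = newsletter_count / respondents
print(newsletter_count, respondents, newsletter_share)
```

Dividing by the count of first answers (respondents) rather than by all filled cells (responses) is what allows the percents to add to more than 100%.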

If the answers to question #2 were entered as shown in columns H through M, where each
possible answer was assigned to a column, and a 1 was used to designate when a box was
checked, then a slightly different approach is needed to create the frequency distribution.
First, to get the total number of respondents who gave an answer, column H needs to be
appropriately analyzed. In this instance, a 1 was entered if a respondent gave no answer to the
question, and a 2 was entered if a respondent gave at least one answer. The formula in
cell H11 (not shown in the table below) was =COUNTIF($H$2:$H$7,2), to count the number of
valid answers to question #2. This formula was copied to cells I11, J11, K11, L11 and M11.
To determine the number of people who indicated each potential source of familiarity with the
training, the number of 1 responses in each column was counted, using the COUNTIF
function. The formula for cell M13 (the number of respondents indicating they heard of the
program by word of mouth) is shown in cell N13. A similar formula was used for each of the
other responses.
Next, to determine the proportion of respondents each of those counts represented, the counts
were divided by the number of valid responses to question #2. As shown in cell M19, 33% of
respondents reported they had heard of the training by word of mouth. The formula used to
make that calculation is shown in cell N19. A similar formula was used for each of the other
responses.
Again, it should be noted when reporting the percentages to a multiple response question that the
percents will add to more than 100%, as respondents can give more than one answer.
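
The same calculation for the one-column-per-response layout can be sketched in Python. The rows below are hypothetical, chosen only so that two of six respondents checked "Word of mouth", matching the 33% mentioned above. The first value in each row plays the role of column H (1 = left blank, 2 = gave at least one answer).

```python
# Each row: (answered flag, q2a, q2b, q2c, q2d, q2e), as in columns H through M.
q2_rows = [
    (2, 1, 0, 0, 0, 1),
    (2, 0, 0, 1, 0, 0),
    (2, 1, 1, 0, 1, 0),
    (2, 0, 1, 0, 0, 1),
    (2, 0, 0, 0, 1, 0),
    (2, 1, 0, 1, 0, 0),
]

# Number of valid answers to question #2, like =COUNTIF($H$2:$H$7,2).
valid_answers = sum(1 for row in q2_rows if row[0] == 2)

# Number who checked "Word of mouth" (the last column), counting the 1s.
word_of_mouth = sum(1 for row in q2_rows if row[5] == 1)

word_of_mouth_percent = round(100 * word_of_mouth / valid_answers)
print(valid_answers, word_of_mouth, word_of_mouth_percent)
```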

PivotTables cannot be used to calculate the frequency distribution of multiple response
questions.

Reminder: Functions Revisited


SUM is only one of a large number of functions available in Excel. Some of the functions are
mathematical, some are logical, some are statistical, and others serve yet more purposes.
All functions begin in a similar fashion: the function, immediately followed by an open parenthesis, the
references on which the function should operate each separated by a comma (a different number of
references are needed for each function), and a close parenthesis. The functions needed for simple
descriptive analyses in Excel are shown below.

Functions and formulas used for simple descriptive analyses in Excel

The table below displays the functions used to perform the analyses described in this
handbook. The examples all refer to the spreadsheet and examples shown in Appendix III.

Calculate: the number of surveys completed
By: counting the number of rows of data entered (regardless of whether some cells/rows are blank)
Using the function or formula: ROWS
Arguments: range of cells for which the number of rows should be counted
Example: =ROWS(B2:B7)
Value displayed: 6
What it means: 6 surveys were returned

Calculate: the average rating or answer of those who responded
By: calculating the average of the ratings or answers given by those who gave an answer
Using the function or formula: AVERAGE
Arguments: range of cells containing the values to be averaged
Example: =AVERAGE(AH2:AH7)
Value displayed: $29,000
What it means: The average annual income as reported for question #10

Calculate: the lowest number given as an answer
By: examining the values in a range of cells, and finding the lowest value
Using the function or formula: MIN
Arguments: range of cells containing the values to be examined
Example: =MIN(AH2:AH7)
Value displayed: $15,000
What it means: The lowest annual income as reported for question #10

Calculate: the highest number given as an answer
By: examining the values in a range of cells, and finding the highest value
Using the function or formula: MAX
Arguments: range of cells containing the values to be examined
Example: =MAX(AH2:AH7)
Value displayed: $57,000
What it means: The highest annual income as reported for question #10

Calculate: the number of respondents who gave a specific answer*
By: counting the number of responses of a certain type within a range of cells
Using the function or formula: COUNTIF
Arguments: 1) the range of cells to be examined; 2) the value to be counted
Example: =COUNTIF(B$2:B$7,3)
Value displayed: 2
What it means: 2 people gave an answer of "5 times" to question #1

Calculate: the total number of respondents who answered the question**
By: adding the number of people who gave a valid answer to a question
Using the function or formula: SUM
Arguments: range of cells to be totaled
Example: =SUM(B13:B16)
Value displayed: 6
What it means: 6 people answered question #1

Calculate: the total number of respondents who answered the question
By: counting the number of nonblank answers
Using the function or formula: COUNT
Arguments: range of cells to be examined
Example: =COUNT(E2:E7)
Value displayed: 4
What it means: 4 people gave two or more answers to question #2 (as column E contains the second answer people gave to question #2)

Calculate: the proportion (percent) of respondents who gave a specific answer
By: dividing the number of people who gave a specific answer by the total number of people who answered the question
Using the function or formula: (division) [cell reference1]/[cell reference2]
Arguments: cell reference1 is the cell reference of the numerator; cell reference2 is the cell reference of the denominator
Example: =B15/B$11
Value displayed: 33%
What it means: 33% of respondents gave an answer of "5 times" to question #1

*This is used for each row or part of a frequency distribution.
** Or the sum of any list of numbers.
Presenting the Results: One Quick Idea


Once the frequency distributions of the data set have been produced, how will the analyst and
other program staff share this information with others? The Excel spreadsheet is not very pretty.
One idea is to create an "annotated instrument"; that is, typing the results into a blank
questionnaire [1]. Most evaluation forms or surveys have been created using word processing
software such as Word or WordPerfect, and thus are well-suited to this approach. A new file
should be created from the electronic version of the survey. The check boxes can then be
replaced with the proportion of respondents giving each answer. For example:
1) How many of the training sessions did you attend?
0% 1 to 2
50% 3 to 4
33% 5
17% 6 or more

Staff can write a cover memo or report to accompany the annotated instrument that explains the
methods used to obtain the data and interprets the results.
An example copy of an annotated instrument can be found in Appendix IV.

[1] The term "annotated instrument" is one created by and used by staff at National Research Center, Inc. It is NOT a
commonly used evaluation term, but one that we think is descriptive.
Using Pivot Tables for Basic and Advanced Analyses


Pivot tables are an analytic tool at the disposal of the Excel user. They take a bit of time to set
up, but can be very powerful. Pivot tables can be used as an alternate way to create frequency
distributions, although they cannot be used for multiple response questions. They can also be
used to create crosstabulations of data. For example, the analyst might wish to know whether
males and females have a different response to a training, or whether younger respondents feel
more positively about staff than older respondents.
A useful first step before creating a pivot table is to name the range of cells that will be used for
the analyses. This range of cells should include the first row with the variable names.
Reminder: Naming a Range of Cells
To name a range of cells, highlight all the columns and rows that make up the database. Choose Insert
from the menu bar, select Name, and then Define.

In general, when the named range of cells will be used for creating pivot tables, it is a good idea to name
the range Database. This is the default name used by Excel in the pivot table wizard. The Define
Name dialogue box above shows that the name Database has been typed in. The field labeled Refers
to: shows that Database will refer to the cells starting at A1 and going to W7 in the worksheet labeled
Data Entry. These are the cells that contain the data entered for the fictional survey.

Once a range of cells has been defined, pivot tables can be created from those data. It is easiest
to create the pivot tables on another worksheet within the workbook.


Reminder: Worksheets within a Workbook (or Spreadsheet)


An Excel file is often referred to as a spreadsheet. Strictly speaking, the file is a workbook made up of a
group of worksheets. By default, a new workbook in Excel usually contains three worksheets, labeled
Sheet1, Sheet2, and Sheet3. The note below was entered in cell B7 on Sheet2. To see a
different worksheet, simply click on the tab of the worksheet to be viewed. To rename a worksheet,
double-click its tab and type a new name. Worksheet names are limited to 31 characters.


Creating a PivotTable (Basic Analyses)

Before setting up the pivot table, the analyst should place the cursor in the cell where the
pivot table is to be generated. To set up a pivot table, go to the Data menu, then select
PivotTable and PivotChart Report... The PivotTable and PivotChart Wizard will walk the analyst
through the rest of the setup. In the example below, the pivot table will be placed in cell B4.

The PivotTable and PivotChart Wizard


Once PivotTable and PivotChart Report... has been selected from the Data menu, the PivotTable
and PivotChart Wizard will start, displaying a series of dialogue boxes. The first dialogue
box is shown below as Step 1 of 3. (Note: Different versions of Excel will have slightly different
PivotTable and PivotChart Wizard dialogue boxes, but the steps to follow are the same or
similar.)
Step 1: Two questions are asked in Step 1 of the Wizard. For the most part, the analyst will
select the Wizard's default options. In answer to the first question, the data to be analyzed is
an Excel list or database. In answer to the second question, a PivotTable will be created.
(Note: PivotCharts are not discussed in this handbook, but the analyst may wish to try this
option.)
Click Next to continue on to the next step of the Wizard.

Step 2: In Step 2, the Wizard asks for the location of the data to be used in the PivotTable.
The name Database is automatically inserted as the answer. If another named range is
desired, it can be typed into the field. If the range of cells to be used has not been named, it
can be selected by clicking on the Browse button.
Click Next to continue on to the next step of the Wizard.
Step 3: In Step 3, the Wizard asks where the PivotTable should be placed. The default is the
location of the cursor when the Wizard was started.
At this point, the analyst will choose the data to be displayed in the PivotTable by clicking on
the Layout button. When this button is clicked, another dialogue box is displayed.
Layout: The Layout dialogue box displays all the variables or fields available for display in
the PivotTable. These fields are shown as a series of buttons in the right half of the dialogue
box. If there are a large number of fields, the scroll button below the fields can be used to
show additional field buttons. In the left half of the dialogue box, a blank template is shown.
To select a field for display, simply drag the field from the right into one of the areas on the left.
To create a pivot table that displays the frequency of training attendances, the button q1 (How
many of the training sessions did you attend?) would be dragged into the row area, so that the
values in q1 will be listed vertically as rows. A field is also needed for the data section. It does
not really matter what button is dragged into the data section, as it will be used simply as a
counter. However, it should be a field that has no missing data; the ID field is ideal for this
situation. As shown above, the ID field was dragged into the data area. Usually by default the
field in the data area will be shown as a Count.
If a different summary is desired, double-click the button, and a dialogue box displaying various
options will be displayed.
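
Why the counter field should have no missing data can be illustrated outside Excel. A Count summary tallies only non-blank cells, so a counter field that contains blanks will undercount respondents. The sketch below uses Python purely for illustration. The field names id, q1 and q5 follow the handbook, but the six records (including the blank q5) and the function name count_by are hypothetical.

```python
from collections import Counter

# Hypothetical records; None marks a question the respondent left blank.
records = [
    {"id": 1, "q1": 2, "q5": 4},
    {"id": 2, "q1": 2, "q5": None},
    {"id": 3, "q1": 2, "q5": 3},
    {"id": 4, "q1": 3, "q5": 4},
    {"id": 5, "q1": 3, "q5": 1},
    {"id": 6, "q1": 4, "q5": 4},
]


def count_by(records, row_field, counter_field):
    """Count non-blank values of counter_field within each value of row_field."""
    tally = Counter()
    for rec in records:
        if rec[row_field] is not None and rec[counter_field] is not None:
            tally[rec[row_field]] += 1
    return dict(sorted(tally.items()))


by_id = count_by(records, "q1", "id")  # complete field: every respondent is counted
by_q5 = count_by(records, "q1", "q5")  # field with a blank: one respondent is lost
print(by_id)
print(by_q5)
```

With the ID field as the counter, all six respondents appear in the counts. With q5 as the counter, the respondent who skipped q5 drops out of the q1 frequencies.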


PivotTable Field: The Field dialogue box shown to the left is displayed if a button in the data
portion of the template is double-clicked. In this example, the data summary chosen is Count.
In addition, if the Options>> button is clicked, more options for the display of the data are
shown.

In this instance, it would be appropriate to display the information as a proportion, so the
option of showing the data as: % of column was selected.

Format Cells: To choose a number format for the data display, click on the Number button in the
PivotTable Field dialogue box. A Format Cells dialogue box will be displayed, from which an
appropriate number format can be selected.


After this, click the OK buttons until the Step 3 dialogue box is showing again. At this point,
if the Finish button is clicked, the PivotTable will be displayed. In this example, the
PivotTable will appear as shown to the right:
Note that when using the PivotTable method for this question, the value 1 (1 to 2 sessions) is
not listed because no one selected this response in the survey.
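
A PivotTable lists only the values that actually occur in the data, so an answer nobody chose still has to be added back as 0% when the results are written up. The Python sketch below illustrates the idea and is not the handbook's method. The code labels are an assumption: only code 1 (1 to 2 sessions) is stated in the text, and codes 2 to 4 are assumed to follow the answer order of the question. The six responses are hypothetical.

```python
from collections import Counter

# Assumed codebook for q1; only code 1 is confirmed by the text.
labels = {1: "1 to 2", 2: "3 to 4", 3: "5", 4: "6 or more"}

# Hypothetical q1 responses for six respondents.
q1 = [2, 2, 2, 3, 3, 4]

observed = Counter(q1)  # like the PivotTable, this has no entry for code 1
total = len(q1)

# Walk the codebook rather than the data so unselected answers show up as 0%.
percent = {labels[code]: round(100 * observed[code] / total) for code in labels}

print(sorted(observed))
print(percent)
```

Looping over the codebook instead of the observed values is what restores the missing category.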

Crosstabulation of Data Using PivotTables (Advanced Analyses)

Sometimes it is useful to analyze the data based on certain respondent characteristics; for
example, satisfaction ratings by gender or program attended. One of the easiest ways to generate
a table like this is through the use of a PivotTable.
The example to the right shows the PivotTable layout and resulting table for a
crosstabulation of the results to question 5, "How would you rate the overall quality of this
training?", by the gender of the respondent. (Of course, crosstabulations are recommended with
larger datasets than the one created for these examples, with a sufficient number of cases
within each subgroup examined.)
This PivotTable layout (q9, gender, is placed in the column area, while q5, quality rating, is
placed in the row area; ID is again used for the data section) produces:

Females (1) gave more positive answers than did males (2).
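
The logic of a crosstabulation (one field for the rows, one for the columns, and a count in each cell) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the handbook's data: the (gender, q5) pairs below are hypothetical, with gender coded 1 = female and 2 = male and q5 coded from 1 = poor to 4 = excellent, and the function name crosstab is invented for the example.

```python
from collections import defaultdict

# Hypothetical (gender, q5) pairs: gender 1 = female, 2 = male;
# q5 runs from 1 = poor to 4 = excellent.
responses = [(1, 4), (1, 4), (1, 3), (2, 4), (2, 3), (2, 1)]


def crosstab(pairs):
    """Return {row value: {column value: count}} for (column, row) pairs."""
    table = defaultdict(lambda: defaultdict(int))
    for col, row in pairs:
        table[row][col] += 1
    return {row: dict(cols) for row, cols in sorted(table.items())}


table = crosstab(responses)

# Print one line per rating: rating, count of females, count of males.
for rating, by_gender in table.items():
    print(rating, by_gender.get(1, 0), by_gender.get(2, 0))
```

As in the PivotTable, a rating that no respondent gave (here, rating 2) does not appear as a row.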


The analysis in the previous example could also be performed using the average quality rating,
on a scale from 1 to 4, where 4 = excellent and 1 = poor.

This PivotTable layout (q9, gender, is placed in the column area, while q5, quality rating, is
placed in the data area; the type of data summary was changed to Average, and the Number
formatting was changed to a number with two decimal places) produces:

Again, this shows that females (1) gave higher quality ratings than did males (2).
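
The Average summary by group amounts to collecting the ratings for each gender and taking the mean of each list. A minimal Python sketch follows. The (gender, q5) pairs are hypothetical values chosen for illustration (gender 1 = female, 2 = male; q5 from 1 = poor to 4 = excellent), and average_by_group is an invented helper name.

```python
from statistics import mean

# Hypothetical (gender, q5) pairs: gender 1 = female, 2 = male;
# q5 runs from 1 = poor to 4 = excellent.
responses = [(1, 4), (1, 4), (1, 3), (2, 4), (2, 3), (2, 1)]


def average_by_group(pairs):
    """Return the mean value per group, rounded to two decimal places."""
    groups = {}
    for group, value in pairs:
        groups.setdefault(group, []).append(value)
    return {group: round(mean(values), 2) for group, values in sorted(groups.items())}


averages = average_by_group(responses)
print(averages)
```

Rounding to two decimal places mirrors the Number formatting chosen in the PivotTable Field dialogue box.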


APPENDIX I: Example Completed Surveys for Data Entry


The following pages show the completed surveys from six participants in a fictional training
program. These were used for all the examples in this handbook.


APPENDIX II: Example Codebook


APPENDIX III: Example Analysis, with Formulas


APPENDIX IV: Example of an Annotated Instrument


The next page shows an example of an annotated instrument for the training program, using the
example data from the previous appendices.


Training Evaluation: Annotated Instrument


1) How many of the training sessions did you attend?
0% 1 to 2
50% 3 to 4
33% 5
17% 6 or more
2) How did you hear about this training? (Please check all that apply.)
33% Neighborhood newsletter
50% Your child's school
17% Bulletin boards in community buildings
33% Word of mouth
50% Flyers
0% Other
3) Please rate the following aspects of the training:
                                                                            Very                          Very
                                                                            Poor      Poor      Good      Good
The instructor's knowledge of the topic ..............................        0%       17%       67%       17%
The instructor's presentation style/skills ...........................        0%       25%       50%       25%
The handouts or take-home materials ..................................       20%        0%       80%        0%

4) Rate the extent to which you agree or disagree with each of the following statements.
                                                                        Strongly                      Strongly
                                                                        Disagree  Disagree     Agree     Agree
I would strongly recommend this training for my friend ...............        0%       20%       60%       20%
This training will help improve the quality of life for my family ....        0%       17%       50%       33%

                                                                            Poor      Fair      Good Excellent
5) How would you rate the overall quality of this training? ..........       17%        0%       33%       50%

6) Do you have any other comments you would like to make about this training?
I think we spent too much time reviewing the background information.
I had a lot of fun. I thought Angela was great.
This was great! I will definitely apply what I learned at work and at home!
7) What is your race?
50% Latino/a
17% Asian
33% White
8) How long have you lived in Colorado?
17% 6 years
50% 7 years
33% 8 years
9) What is your gender?
50% Female
50% Male


10) What is your annual household income?
Average annual income = $29,000
33% less than $20,000
33% $20,000 to $29,999
17% $30,000 to $39,999
17% $40,000 or more
11) Is your child enrolled in the free lunch program?
50% Yes
50% No

Thank you for your answers!

