
KoboToolbox Excel Data Analyser v1.23
User Guide v0.1 140925

Quick start guide


There are three steps needed to configure the Excel Data Analyser before you can analyse
data in it.[1]

1. On the Config sheet, choose the language in which you want the user interface to appear.
2. Import your data by copying the following three Excel sheets into the workbook: the ‘survey’
and ‘choices’ sheets from the form itself (either created directly with XLSForm, or exported to
Excel from the KoBo form manager), and the sheet containing the collected data (exported
to Excel from the KoBo project manager). On the Config sheet, type in the names you have
given to these three sheets to establish the necessary links.
3. Finally, choose on the Config sheet the language in which you want to see the graphical
outputs (choosing from the languages coded into the form itself).

Once you have completed these steps, you will be able to proceed to the different function sheets
(‘CHOICE’, ‘UNIQUE’ etc.) to visualise and analyse your data.

Table of Contents
Quick start guide
Introduction
Setup
Analysis overview
Common functionality
Function details
Appropriate use

Key
[! TROUBLESHOOTING]
[ EXAMPLE ]
[ RECOMMENDED PRACTICE ]

Introduction

The Excel Data Analyser can analyse the data produced by any ODK-compatible form and dataset,
including but not limited to those generated by the online KoboToolbox system. To do so, it
interprets the data itself in conjunction with the form definition, from which it obtains the
natural-language equivalents (such as question and option labels) as well as information on data
types. This allows it to decide which types of visualisations and analyses are appropriate for each
question, and allows the user to interact with the data in any of the languages included with the
form definition, without having to refer to the underlying data codes or question identifiers.

Due to Excel limitations (and the fact that the analyser is written in pure Excel, without any VBA
macros or plugins), the different types of analysis functions are provided on different sheets,
named ‘CHOICE’, ‘UNIQUE’, ‘VALUE’ etc. Each question in a form may therefore be analysable using
more than one of these functions. To make the Data Analyser easier to use, however, each function
only allows the appropriate questions to be analysed, preventing a user from trying to analyse a
question with an inapplicable function (e.g. the ‘UNIQUE’ function cannot be used on multi-select
questions, so these will not be available to select from within this function).

Users should first set up and configure the Data Analyser correctly for a particular dataset,
referring either to the ‘Quick start guide’ above or to the more detailed setup instructions below.
Once it is set up, users can make an informed choice (or rely on trial and error initially) to decide
which analysis function is the most appropriate, useful or informative for each question. They then
proceed to the chosen function, select the question from the drop-down list, and produce the
appropriate visualisations.

[1] These configuration steps are likely to be automated in a new release of KoBo Toolbox before
the end of 2014.

All visualisations can be customised and augmented through a number of different options and
disaggregations. They have also been designed to be directly exportable (to Word for example) in
a format suitable for a report, with minimal reformatting required.

Because the analyser is written in Excel rather than a sequential programming language, errors will
show up not as a popup error message, but as #REF! or #VALUE! errors in cells, or empty charts.
Throughout this user guide, troubleshooting advice is provided to help to resolve these problems.

Setup
Step 1: choose your user interface language
The user interface of the analyser tool itself is currently available in three languages: English,
French and Spanish. On the Config sheet, in the ‘Main Settings’ area at the top left, choose the
language you prefer from the drop-down box labelled ‘1 – Language / Langue / Idioma’.

Step 2a: import your data


As explained in the introduction above, the analyser needs to access both the data itself and the
form definition file. If you are using KoBoToolbox to design forms and collect data, you can
obtain the necessary files as follows:

• DATA: In the ‘Projects’ view of KoBoToolbox, click on the appropriate project in the list to
open it, and then select ‘Download data’, followed by ‘XLS’. Save the downloaded file.
• FORM: In the ‘Projects’ view of KoBoToolbox, click on the appropriate project in the list
to open it. In the white box under the ‘Form’ heading, there is a small ‘download’ icon to the
right. Click on this, followed by ‘XLS’. Save the downloaded file.[2]

If you are using an alternate ODK-based system, consult its documentation for how to obtain the
relevant Excel data and form files.

[2] Alternatively, the same download icon is also available alongside the form name in the ‘Forms’
view of KoBoToolbox, though if you download it from there, you must ensure that you have made no
alterations to the form since you deployed it as your survey project. The form file must match the
data file exactly.

Once you have downloaded these files, open both of them in Excel. Also open a blank, pristine
copy of the Data Analyser[3] in Excel. To copy the appropriate sheets, first switch to the form
file. This will include several sheets, including one named ‘survey’ and one named ‘choices’.
Right-click on the tab of the ‘survey’ sheet (1) and select ‘Move or Copy…’ (2). In the subsequent
dialog, select the Data Analyser from the drop-down box (3), tick the ‘Create a copy’ option (4),
and then click ‘OK’ (5).

[Figure: copying a sheet with ‘Move or Copy…’, steps (1) to (5) marked]

This will copy the sheet into the Data Analyser. Switch back to the form file and repeat the
process for the ‘choices’ sheet; then switch to the data file and do the same for your data sheet.
Once you have done this, you can close the form and data files and switch to the Data Analyser
permanently.

[! My data file is on several sheets. This occurs when your form includes repeating sections. At
present, the Data Analyser cannot handle data from repeating sections and will analyse only the
questions in the main body of the form. Copy over only the sheet containing the main data.]

Step 2b: link to the imported sheets


On the Config sheet, in the ‘Main Settings’ area at the top left where it says ‘2 – Sheet Names’,
there are three text boxes into which you can write the names of the three sheets you have
imported in the previous step. For example, if your data is contained on a sheet called
‘AfricaSurvey1’, then simply type ‘AfricaSurvey1’ into the text box labelled ‘Data’. Do the same for
the survey and choices sheets (which, unless you have changed the names, are called ‘survey’
and ‘choices’ by default). This tells the Data Analyser where to find the information it is looking for.

[Figure: the Config sheet, with the areas used in Steps 1, 2b, 3a and 3b marked]

Step 3a: choose your graphical output language


Once you have linked your sheets in the previous step, the drop-down box next to ‘3 – Survey
Language’ on the Config sheet will allow you to choose between the different languages contained
in your survey. If your survey does not contain multiple languages (and you haven’t provided a
language label), the only choice in the drop-down box will be ‘Default’.

[3] It is recommended that you do not copy data from a new survey into an earlier copy of the
Analyser which you have already used for a previous survey. Download a clean copy from
www.humanitarianresponse.info/applications/kobotoolbox.

The language you choose here will not change the language of the Data Analyser’s user interface,
but it will change the language of the questions you can select on each of the analysis function
sheets, as well as the language of the graphical outputs (charts). This allows a user to visualise
data and produce charts in any of the languages defined in a survey.
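The languages offered in the ‘3 – Survey Language’ drop-down come from the form definition itself. The Python sketch below illustrates one way such a list could be derived from the column headings of an XLSForm ‘survey’ sheet. It is not how the Data Analyser works internally, since the analyser is pure Excel. The function name `survey_languages` is hypothetical. The sketch assumes the XLSForm convention of ‘label::<language>’ headings and treats a plain ‘label’ heading as ‘Default’.

```python
def survey_languages(headers):
    """Return the survey languages implied by XLSForm column headings.

    Hypothetical helper for illustration only.
    'label::English' yields 'English'; a plain 'label' column yields 'Default'.
    Languages are returned in order of first appearance, without duplicates.
    """
    languages = []
    for header in headers:
        heading = header.strip()
        if heading == "label":
            language = "Default"
        elif heading.startswith("label::"):
            language = heading[len("label::"):].strip()
        else:
            continue  # 'type', 'name', 'hint::English' etc. do not define languages
        if language and language not in languages:
            languages.append(language)
    return languages


print(survey_languages(["type", "name", "label::English", "label::Français", "hint::English"]))
# ['English', 'Français']
print(survey_languages(["type", "name", "label", "hint"]))
# ['Default']
```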

Step 3b: add translations for graphic elements


In order to display a chart in the language of the survey – Ukrainian, for example, or Sango – the
Data Analyser relies as far as possible on the translations already available in the form itself.
However, a chart inevitably contains elements whose translations cannot be found in the form
definition: for example, a bar chart legend explaining what the chart is showing, or the unit of
analysis (household, informant, site etc.) that each data point represents.

These missing translations can be filled in on the ‘Translations for graphic elements’ table on the
Config sheet. Translations for English, French and Spanish are already filled in, so if you have
chosen one of these as the graphical output language in the previous step, you do not need to do
anything (unless you need to adjust the unit of analysis, which by default is set to ‘informant’).

• If your survey contains only one, unlabelled language, a new column labelled ‘Default’ will
appear in this table. If the default language happens to be English, French or Spanish, you can
simply copy-paste the entire column of translations for that language (e.g. English) into the
‘Default’ column[4].
• If your survey contains English, French or Spanish but the language label(s) you used differ
from the labels for these languages in the Data Analyser, then your language labels will appear
as separate columns in this table. Simply copy-paste across the appropriate column of
translations[5].
• If your survey contains another language, you can manually complete the translations in the
appropriate column, using the English versions as a guide. You can also refer to the charts
themselves to see how these different labels appear on them, which will help you choose the
appropriate translation.

[ Even if your survey contains only one language, identify it by using ‘label::English’ (and
‘hint::English’ etc.) as your column headings, rather than just ‘label’ and ‘hint’.]

[ If your survey contains multiple languages, identify each language in the column headings by its
own name, e.g. use ‘Français’ instead of ‘French’. This will make it easier for enumerators to
select the correct language when conducting the surveys. ]

Analysis overview
Once setup has been completed, you can proceed to the data analysis. There are currently six
analytical functions available, three which analyse individual questions, and three which analyse
sets of questions together. The quick overview below explains the purpose and applicability of
each; further detail on each (including examples of their use) is provided in the subsequent
sections.

[4] Alternatively, you may modify your form definition directly on the imported ‘survey’ and
‘choices’ sheets, changing the ‘label’ column heading to ‘label::English’ etc., so that the Data
Analyser can identify the language used and apply the appropriate labels.
[5] Alternatively, as above, you can rename the appropriate columns in your form definition, e.g.
from ‘label::Spanish’ to ‘label::Español’.

Analysing individual questions
• CHOICE. This uses a bar chart to visualise the frequency of different responses to single- and
multi-select questions (i.e. what percentage of respondents selected each available option). It
can also be used with numerical questions (integer or decimal) to show the frequency of number
ranges defined by the user, e.g. what percentage of responses lie between 1 and 3, and so on.
• UNIQUE. This uses a pie chart to visualise the frequency of different responses to single-select
questions, or the frequency of number ranges for numerical questions. Its functionality is
identical to that of CHOICE except that it does not analyse multi-select questions (their
percentages would not add up to 100%, so a pie chart would be an inappropriate visualisation).
• VALUE. This calculates a single value from a question. If the question is numerical, it
calculates a numerical function such as the average or the maximum. If the question is single-
or multi-select, it can either calculate the percentage value of one particular option, or (if the
options are represented by numbers, such as a severity scale) again apply numerical functions.
These values are then broken down by the categories of a second question, and the value for each
category is visualised as a bar chart.
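The Python sketch below shows the arithmetic behind CHOICE (and UNIQUE) and VALUE. It illustrates the calculations described above. It is not the analyser's own implementation, which uses worksheet formulas. The function names are hypothetical, and the sketch makes two assumptions:

- A multi-select answer is stored in a single cell as space-separated option codes.
- A blank cell means the question was not answered.

The multi-select example shows why UNIQUE excludes these questions: the percentages add up to more than 100%.

```python
from collections import Counter


def option_percentages(responses, multi_select=False):
    """CHOICE-style frequencies: % of respondents selecting each option code.

    Returns (percentages, sample_size). Blank responses are skipped; a
    multi-select response is assumed to be space-separated codes.
    """
    answered = [r.strip() for r in responses if r and r.strip()]
    counts = Counter()
    for response in answered:
        codes = response.split() if multi_select else [response]
        counts.update(set(codes))  # each option counted at most once per respondent
    n = len(answered)
    return {code: 100 * count / n for code, count in counts.items()}, n


def mean_by_category(values, categories):
    """VALUE-style breakdown: average of a numeric question per category.

    Returns {category: (mean, sample_size)}; pairs with a missing value or
    a blank category are skipped.
    """
    groups = {}
    for value, category in zip(values, categories):
        if value is None or not category:
            continue
        groups.setdefault(category, []).append(value)
    return {cat: (sum(vals) / len(vals), len(vals)) for cat, vals in groups.items()}


single, n = option_percentages(["A", "B", "A", "C"])
print(n, single)  # 4 {'A': 50.0, 'B': 25.0, 'C': 25.0}

multi, n = option_percentages(["water food", "water", "shelter water"], multi_select=True)
print(n, round(sum(multi.values()), 1))  # 3 166.7

print(mean_by_category([2, 4, 6, 3], ["urban", "urban", "rural", ""]))
# {'urban': (3.0, 2), 'rural': (6.0, 1)}
```

The sample size returned alongside each VALUE category corresponds to the per-point sample size that the charts show in parentheses.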

Analysing sets of questions


The Data Analyser supports two different types of question sets. Essentially, each is a series of
consecutive single- (or multi-) select questions, all with the same set of options for the
respondent to choose from.

• RANK. A ranking question asks the respondent to select which of a set of options is ranked in
first place, which in second, which in third etc. The corresponding function uses a bar chart to
visualise the ranked responses, attributing different (customisable) weights to each rank. This
can analyse both single- and multi-select question sets, the latter corresponding to ‘grouping by
tiers’.
• SCORE. A scoring question asks the respondent to attribute a ‘score’ to each of a set of options.
The corresponding function uses a bar chart to visualise the combined score for each option. This
can analyse single-select question sets.
• COMPARE. Rather than combining answers by rank weight or score as in the RANK or SCORE
functions, COMPARE shows multiple series of data on a bar chart, and can be used to analyse both
rank-type and score-type question sets, as well as any other repeating single-select or
multi-select question sets.
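The weighting used by RANK and the totals used by SCORE can be illustrated with the following Python sketch. The weights (3, 2, 1) are hypothetical examples of the customisable rank weights. The sketch combines scores by summing them, which is itself an assumption. The data layout is also assumed: one answer per rank, with space-separated codes for multi-select ‘tiers’. The function names are illustrative only.

```python
from collections import defaultdict


def rank_totals(respondents, weights):
    """RANK-style weighted totals for a ranking question set.

    respondents: one list per respondent, one entry per rank (1st, 2nd, ...).
    Each entry holds space-separated option codes: one code for a
    single-select set, possibly several for a multi-select set ('tiers').
    weights: the weight given to each rank, in rank order (hypothetical values).
    """
    totals = defaultdict(float)
    for answers in respondents:
        for weight, answer in zip(weights, answers):
            for code in (answer or "").split():
                totals[code] += weight  # every option in a tier gets that tier's weight
    return dict(totals)


def score_totals(respondents):
    """SCORE-style combined score: sum of the scores each option received.

    respondents: one dict per respondent mapping option code -> score
    (None where the respondent gave no score).
    """
    totals = defaultdict(float)
    for scores in respondents:
        for code, score in scores.items():
            if score is not None:
                totals[code] += score
    return dict(totals)


ranked = [
    ["water", "food", "shelter"],
    ["food", "water", ""],          # third rank left blank
    ["water food", "shelter", ""],  # multi-select: water and food tied first
]
print(rank_totals(ranked, weights=[3, 2, 1]))
# {'water': 8.0, 'food': 8.0, 'shelter': 3.0}

print(score_totals([{"water": 5, "food": 3}, {"water": 4, "food": None}]))
# {'water': 9.0, 'food': 3.0}
```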

Common functionality
All functions follow the same layout, illustrated in the figure below.

Main analysis (1)


The large chart on the left analyses responses from the entire available dataset. It has two
components: at the top is a drop-down list from which you can select the question you wish to
analyse; below it is the chart itself, which will appear as soon as you have selected your question.

[! The drop-down list is empty / Most of the questions are missing. This may occur because
you haven’t completed all the setup steps correctly. It may also occur because your form
definition includes blank rows, which the Data Analyser interprets as the end of the file. Make
sure you have entirely removed all blank rows from both the ‘survey’ and ‘choices’ imported
sheets.]

[! The particular question I want to analyse isn’t available. This may be because the question
is not a supported data type for the particular function. For example, text questions are not
analysable by the Data Analyser[6]. This may also occur in particularly long surveys, as the Data
Analyser is currently limited to 200 questions. Questions beyond this limit will not be available
for analysis.]
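The two troubleshooting notes above describe how questions can drop out of a function's drop-down list. The Python sketch below mimics that behaviour under three assumptions:

- Reading of the ‘survey’ sheet stops at the first blank row.
- Only rows whose type is in a supported set are offered. The set used here (single-select, multi-select, integer and decimal) is hypothetical, and the real set differs per function.
- The 200-question limit applies to the first 200 rows of the sheet.

This is an illustration, not the analyser's formulas, and the function name is hypothetical.

```python
SUPPORTED_TYPES = {"select_one", "select_multiple", "integer", "decimal"}  # hypothetical set
MAX_QUESTIONS = 200  # limit stated in this guide


def available_questions(survey_rows, supported=SUPPORTED_TYPES, limit=MAX_QUESTIONS):
    """Names of questions a function could list in its drop-down.

    survey_rows: (type, name) pairs from the 'survey' sheet, in sheet order.
    Assumes the limit applies to the first `limit` rows of the sheet.
    """
    names = []
    for row_type, name in survey_rows[:limit]:
        row_type, name = (row_type or "").strip(), (name or "").strip()
        if not row_type and not name:
            break  # a blank row is read as the end of the form definition
        parts = row_type.split()  # e.g. 'select_one yes_no' -> ['select_one', 'yes_no']
        if parts and parts[0] in supported:
            names.append(name)
    return names


rows = [
    ("text", "comments"),             # text questions are not analysable
    ("select_one yes_no", "has_water"),
    ("integer", "age"),
    ("", ""),                         # blank row: everything below is ignored
    ("select_multiple needs", "needs"),
]
print(available_questions(rows))  # ['has_water', 'age']
```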

[! The question box is highlighted in red. This occurs when the question you have selected is
no longer on the list of available questions – most likely because you changed the graphical
output language on the Config sheet, as a result of which the same question now appears in a
different language. Simply open the drop-down again and reselect the question.]

The chart shows the question title and a brief subtitle explaining what the chart shows. This
explanation is editable in the ‘Translations for graphic elements’ section of the Config sheet. All
charts also show the number of underlying data records included in each analysis, e.g. ‘156
informants’ or ‘22 sites’ (the units are also editable on the Config sheet). Where this sample size
varies within the chart itself (as it does on the VALUE function), it is also shown in parentheses
next to each individual data point.

On charts showing percentages, the vertical scale is automatically set to show 0 - 100%.
Otherwise, the maximum value is chosen dynamically depending on the data itself.

[! The chart is blank. This can occur because the configuration isn’t complete: check for any
options, either in the question-choosing area above the chart or in the advanced settings below
it, that appear in red, and change your selection so that they no longer appear in red. Charts may
also fail to appear because the data is unreadable. This is a particular problem with numerical
data which may have been ‘stored as text’ by Excel. In this case, go to your data sheet, select all
the data in question (a small green triangle in the top-left corner of a cell may indicate where
numbers are being stored as text) and bulk-apply the proposed correction.]

[6] To analyse text questions, you will need to categorise the responses manually, essentially
converting the text question into a single- or multi-select question. To make this available in the
Data Analyser, you must amend the form definition to change the question’s data type on the
‘survey’ sheet and to insert the newly defined categories into your ‘choices’ sheet; then change
the data itself by replacing the text with the category codes.

[! The chart takes a long time to display. Large datasets take longer to analyse, and depending
on your computer’s processor speed Excel can take some time to recalculate its formulas. If
this is still occurring despite a reasonably-sized dataset, check in ‘Advanced Settings’ on the
Config sheet whether the column heading specified in ‘Reference Column-Data’ actually exists
in your dataset. If not, the Data Analyser is unable to determine the dataset’s length and
defaults to the ‘Maximum Row’ value (normally set to 10,000) instead, with a resulting impact
on calculation speed. To resolve, either change the reference column from its default ‘_uuid’ (a
column which is always present in data downloaded from Kobo Toolbox but may not be
present in datasets from other ODK platforms) to any column present in your dataset which
has no blank entries; or manually change the ‘Maximum Row’ value to the actual length of
your dataset.]
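The role of the reference column can be illustrated with a short Python sketch. It assumes that the dataset length is taken as the number of non-blank entries in that column, which would explain why the column must have no blank entries. The function name and data layout are hypothetical. ‘_uuid’ and the default ‘Maximum Row’ of 10,000 are the values given above.

```python
def dataset_length(header, rows, reference_column="_uuid", maximum_row=10_000):
    """Number of data rows the analysis would cover (illustrative assumption).

    If the reference column exists, count its non-blank entries; otherwise
    fall back to maximum_row, so every row up to that limit is processed.
    """
    if reference_column not in header:
        return maximum_row
    col = header.index(reference_column)

    def is_filled(row):
        value = row[col] if col < len(row) else None
        return value is not None and str(value).strip() != ""

    return sum(1 for row in rows if is_filled(row))


header = ["start", "_uuid", "age"]
rows = [["09:00", "a1", 34], ["09:05", "b2", 51], ["09:10", "c3", None]]
print(dataset_length(header, rows))                          # 3
print(dataset_length(["start", "age"], rows))                # 10000: '_uuid' missing
print(dataset_length(header, rows, reference_column="age"))  # 2: a blank entry undercounts
```

The last call shows why a reference column with blank entries is a poor choice: rows after the count would silently be left out.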

[! Negative values don’t appear. The minimum value of the chart’s vertical axis is fixed at 0, so
negative values will not appear. However, the chart is fully configurable with normal Excel
functionality: via the ‘Chart Tools – Layout’ ribbon > Axes > Primary Vertical Axis > More
Options, the minimum value can be switched to ‘Fixed’ and specified manually. Note that when you
later use the same function to analyse a question without negative values, you will need to change
this setting back to ‘Auto’.]

Advanced settings (2)


Several of the advanced settings are available on all or most of the functions and are explained
here. Function-specific advanced settings are covered in the ‘Function details’ section further on.

• Options to exclude. Any options you wish to exclude from your analysis can be hidden by typing
the corresponding option codes into the ‘Excluded codes’ box. These codes are the underlying XML
values provided by the ‘name’ column of the ‘choices’ sheet, and are listed for reference in the
‘All codes’ list. If the meaning of these codes is unclear, the plain-text equivalent can be looked
up on the ‘choices’ sheet or ascertained through trial and error[7]. The main chart’s totals,
percentages and sample sizes are all recalculated automatically upon such an exclusion.

[ You wish to exclude from your analysis all respondents replying ‘none’ or ‘do not know’ to
a particular single-select question. Your option codes are ‘A B C D K N’ where ‘K’ and ‘N’
represent ‘do not know’ and ‘none’ respectively. Type ‘K N’ into the ‘Excluded Codes’ box and
these two options will disappear. If your original dataset of 100 informants included 40 who
replied either ‘K’ or ‘N’, your remaining dataset will comprise 60 informants, and the displayed
sample size and percentages are adjusted accordingly. ]
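The recalculation in this example can be sketched in Python as follows. The split of the 100 informants across options A to N is invented for illustration; only the total of 40 ‘K’ or ‘N’ answers comes from the example. The function name is hypothetical, and the sketch covers single-select questions only.

```python
from collections import Counter


def exclude_codes(responses, excluded_box):
    """Drop respondents whose single-select answer is an excluded code.

    excluded_box: the text typed into the 'Excluded codes' box, e.g. "K N".
    Returns (sample_size, percentages) recalculated on the remaining answers.
    """
    excluded = set(excluded_box.split())
    kept = [r for r in responses if r not in excluded]
    if not kept:
        return 0, {}
    counts = Counter(kept)
    return len(kept), {code: 100 * c / len(kept) for code, c in counts.items()}


# Invented split of 100 informants; 25 + 15 = 40 answered 'K' or 'N'.
responses = ["A"] * 20 + ["B"] * 15 + ["C"] * 15 + ["D"] * 10 + ["K"] * 25 + ["N"] * 15
n, pct = exclude_codes(responses, "K N")
print(n)  # 60
print({code: round(p, 1) for code, p in pct.items()})
# {'A': 33.3, 'B': 25.0, 'C': 25.0, 'D': 16.7}
```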

• Wrap labels. Excel is not particularly good at dynamically readjusting the space and visibility of
chart labels, so some labels may end up severely truncated or missing altogether. This option,
which forces a line break between each word in the label, may help. If it does not, see the
troubleshooting note below.
• Place options in descending order. If this is unticked, options appear in the order in which
they were placed in the form itself. If it is ticked, options are reordered into descending order of
frequency, so that the most commonly chosen options appear first (on the left).
• Number grouping. Numerical data can be analysed using a categorical function such as CHOICE,
UNIQUE or COMPARE, by defining ranges into which the numbers will be grouped. The range
thresholds must first be specified by creating models in the ‘Thresholds for number grouping
models’ section on the Config sheet. One model, ‘A’, is already completed. New models are created
by writing the series of range thresholds into the appropriate column. Once created, the different
models can be applied in the ‘Advanced settings’ area of the function sheet itself.

[7] The ‘human-readable’ naming convention guidelines provided on the humanitarianresponse.info
website will also help with this.

[ You wish to analyse a numerical question asking respondents’ age to show what percentage of
respondents are 17 or under, 18-25, 26-55 and 56 or over. In a free column on the Config sheet
(let’s say column ‘B’) you type in ‘0’, ‘17’, ‘25’ and ‘55’, leaving the last three rows blank. You
return to the function sheet and choose model ‘B’ under ‘Number grouping’ in the advanced
settings. The ranges are applied.]
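The grouping in this example can be reproduced with a short Python sketch. It assumes whole-number values. Each threshold is treated as the inclusive upper end of its range, which matches the age example: 0 and 17 give ‘17 or under’, 25 gives ‘18-25’, and so on. Values below the first threshold are ignored. That rule, and any boundary handling beyond this example, are assumptions, and the function names are hypothetical.

```python
from bisect import bisect_left
from collections import Counter


def range_labels(thresholds):
    """Labels for the ranges of a threshold model (needs at least two
    thresholds; assumes whole numbers)."""
    labels = [f"{thresholds[0]}-{thresholds[1]}"]
    labels += [f"{low + 1}-{high}" for low, high in zip(thresholds[1:], thresholds[2:])]
    labels.append(f"{thresholds[-1] + 1}+")
    return labels


def group_percentages(values, thresholds):
    """Percentage of values falling in each range of the model."""
    labels = range_labels(thresholds)
    upper_bounds = thresholds[1:]  # each range includes its upper threshold
    counts = Counter()
    for value in values:
        if value < thresholds[0]:
            continue  # below the first threshold: not in any range
        counts[labels[bisect_left(upper_bounds, value)]] += 1
    n = sum(counts.values())
    if n == 0:
        return {label: 0.0 for label in labels}
    return {label: 100 * counts[label] / n for label in labels}


model_b = [0, 17, 25, 55]  # the column typed in on the Config sheet
print(range_labels(model_b))  # ['0-17', '18-25', '26-55', '56+']
print(group_percentages([12, 17, 18, 25, 30, 55, 56, 80], model_b))
# {'0-17': 25.0, '18-25': 25.0, '26-55': 25.0, '56+': 25.0}
```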

Some common issues:

[! The chart labels are (still) unreadable. In this case, you can modify the chart using normal
Excel functionality, for example by rotating the labels to a vertical orientation and/or dragging
the bottom axis of the chart upwards to make more space for them. Note that when you use the same
function to analyse another question with shorter labels, the manual changes you have made will
persist until you change them back.]

[! A box is highlighted in red, or greyed out. Any box highlighted in red indicates that the box’s
current contents (or lack thereof) are an incorrect configuration which will prevent the chart
from displaying. To fix this, select an appropriate value from the drop-down list of the
highlighted box. A greyed-out option indicates that this option is not relevant for the current
configuration – e.g. it is a setting pertaining to numerical questions only and the currently
selected question is not a numerical question. These options therefore have no effect when
greyed-out and can be ignored.]

Disaggregating data (3)


All functions allow the visualised data to be disaggregated. The large chart on the left shows the
overall situation, applying the chosen function to the entire dataset. If you choose any other
single- or multi-select question as a disaggregator, the smaller charts on the right apply the
same function to the data subsets defined by the disaggregating question, i.e. they filter the main
chart by the responses to this question.

To maintain readability at the smaller scale, the full labels that appear on the main chart are not
repeated on the smaller disaggregation charts. The data in each small chart is always presented in
the same order as on the main chart, and the sequential numbers (1, 2, 3 etc.) on the small chart
axes allow quick reference to the labels on the main chart. The vertical axis scale is adjusted
automatically so that the main chart and the small charts always share the same maximum value.

[ You are analysing the single-select question ‘what is your greatest need?’ as a bar chart showing
the percentage of respondents selecting each option. You wish to see how responses differ between
men and women. An earlier question asked ‘what is your gender?’ with response options ‘Male’,
‘Female’ and ‘No answer’. You select this question as your disaggregation. While the main chart
continues to show the percentages for the entire dataset, the first three smaller charts now show
the responses from men, from women and from those who chose ‘No answer’, respectively. Each
smaller chart also shows the number of respondents in its subset, and the three sizes together add
up to the total shown on the main chart.]

Although any single- or multi-select question can be chosen as a basis for disaggregation, the
survey’s introductory questions defining gender, age, respondent type (local govt / health worker
etc.), location (admin1 / admin2 etc.) and site type (urban/rural etc.) are the ones most often
used for this purpose.
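The disaggregation in the example above amounts to filtering the responses by each answer to the disaggregating question and recalculating the percentages for each subset. Below is a minimal Python sketch. It assumes a single-select disaggregator and that every respondent answered both questions. The answers below are invented, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def percentages(answers):
    """Percentage of answers equal to each option code."""
    counts = Counter(answers)
    return {code: 100 * c / len(answers) for code, c in counts.items()}


def disaggregate(responses, disaggregator):
    """Main-chart percentages plus one subset per disaggregating answer.

    Returns (main, {category: (subset_percentages, subset_size)}).
    """
    subsets = defaultdict(list)
    for answer, category in zip(responses, disaggregator):
        subsets[category].append(answer)
    small = {cat: (percentages(ans), len(ans)) for cat, ans in subsets.items()}
    return percentages(responses), small


need = ["water", "food", "water", "shelter", "water", "food"]
gender = ["Male", "Female", "Female", "Male", "No answer", "Female"]
main, small = disaggregate(need, gender)
for category, (pct, size) in small.items():
    print(category, size, {code: round(p, 1) for code, p in pct.items()})
# Male 2 {'water': 50.0, 'shelter': 50.0}
# Female 3 {'food': 66.7, 'water': 33.3}
# No answer 1 {'water': 100.0}
print(sum(size for _, size in small.values()) == len(need))  # True
```

With a single-select disaggregator the subsets do not overlap, which is why their sizes add up to the main chart's total.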

Although only six small charts (or two on the COMPARE function) are visible at any one time, data
can be disaggregated by up to 1,000 categories, enough for larger datasets to be disaggregated by
a country’s Admin2 areas. The scroll bar on the right allows you to scroll down through the
different charts.

[! The small charts are empty, or missing altogether. Check to ensure that you have selected
a disaggregation question and that the box is not blank. If you had previously scrolled down,
you may need to scroll back up to ‘Page 1’ as the charts further down may not have any data in
them. The ‘white curtain’ (explained below) may also need to be rolled back.]

Exporting charts
All charts are customised to have a consistent, elegant and easy-to-read design suitable for direct
exporting and integration into reports with no or minimal reformatting required.

To export the main chart only, simply select it with a single mouse click and copy it. When pasting
into Word or PowerPoint, the best results are achieved by selecting ‘Paste Special’ and then
choosing ‘Picture (Enhanced Metafile)’.

To export the main chart together with the smaller disaggregation charts, first use the ‘white
curtain’ to hide any empty charts. This is a small, hidden white rectangle positioned directly to
the right of the right-most small chart on the bottom row. Find it by clicking on the narrow strip
of white area between the end of this chart’s axes and the grey border; its border and corner
handles will light up when you have selected it. Drag the middle-left handle to extend the
rectangle over the unwanted charts so that they are hidden. Then select the entire chart area: do
not select the charts individually, but instead select the single large merged Excel cell behind
the charts (look around the edges of the white area, or in the space between the main chart and the
smaller charts, for where the mouse cursor turns into the Excel square cross, then click there).
Once the cell is selected, copy it as before and paste it as an Enhanced Metafile into Word or
PowerPoint.

[ You may frequently wish to change the title of the exported chart, which by default is the
question text itself (which may not be appropriate for all reports). If this is the case, do not try
to change the title in the Excel chart prior to export, as this change will persist and is hard to
undo. Instead, paste into Word as an Enhanced Metafile, then use the ‘Crop’ functionality to
crop out the unwanted title. Create a new textbox with your desired title and position it as
appropriate. Group the textbox together with the imported metafile, or for best results use a
Drawing Canvas onto which you place both the metafile and the textbox. This method also
allows you to format the title according to your document’s stylesheet.]
