1 - Analyzing Data
Marc-Andrea Fiorina
September 27, 2023
During the training, find all materials in our shared OneDrive: bit.ly/rrf23-materials
Outline of data analysis process
• Exploratory data analysis: the research team will look for patterns in the data descriptively
• Final data analysis: the research team will select and compile the main results for research outputs, then polish and structure them
Note: For projects with pre-analysis plans, the main specifications will be
pre-defined, so the exploratory phase has fewer implications for final outputs.
Exporting “raw” reproducible outputs
Raw outputs
• Results are exported to files that can be used as inputs for papers and reports
• Self-standing tables and graphs
• Accessible formats
  • PNG
  • EPS
  • TEX
  • XLSX
Exploratory analysis outputs can be compiled dynamically with various tools
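As a minimal sketch of a "raw" output, the Python below (used only as a neutral, runnable illustration; the training itself uses Stata and R) writes one hypothetical summary-statistics table to two self-standing files: a CSV that opens in Excel, standing in for XLSX because writing XLSX needs a third-party library, and a LaTeX tabular fragment. The values, file names, and function names are all hypothetical.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical summary statistics, as an analysis script might produce them.
SUMMARY = [
    {"variable": "Household size", "mean": 4.62, "sd": 2.11, "n": 1200},
    {"variable": "Years of schooling", "mean": 6.35, "sd": 3.48, "n": 1187},
]
FIELDS = ["variable", "mean", "sd", "n"]


def export_csv(rows, path):
    """Write the table as CSV, which opens directly in Excel."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


def export_tex(rows, path):
    """Write a self-standing LaTeX tabular that a report can \\input."""
    lines = [r"\begin{tabular}{lrrr}", r"\hline", r"Variable & Mean & SD & N \\", r"\hline"]
    for row in rows:
        lines.append(
            f"{row['variable']} & {row['mean']:.2f} & {row['sd']:.2f} & {row['n']} \\\\"
        )
    lines += [r"\hline", r"\end{tabular}"]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


# Usage: export both formats to a (temporary) outputs folder.
out_dir = Path(tempfile.mkdtemp())
export_csv(SUMMARY, out_dir / "summary_stats.csv")
export_tex(SUMMARY, out_dir / "summary_stats.tex")
```

Because both files are generated from the same object by code, re-running the script regenerates them, and a paper can pull in the .tex file with \input instead of copy-pasting numbers.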
Compile “final” results in dynamic documents
Exploratory data analysis
Structure of analysis workflow
Stage One: Exploratory analysis
Exploratory Analysis: Code organization
Example of exploratory analysis code
Example of exploratory analysis output
Final data analysis
Structure of analysis workflow
Stage Two: Final outputs
Code organization
• The data sets and variables you need should already have been created during exploratory analysis
• This means you should not be subsetting data or generating new variables at this stage
• You should also remove, or run quietly, all commands that print to the console (including regress); Stata in particular is very slow at printing to the Results window
• If you think you need to change the data, think carefully about why...
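The principle that final-analysis code only reads from the analysis data can be sketched as below. This is Python used for illustration, not the deck's Stata workflow; the dataset, variable names, and the difference_in_means helper are hypothetical. The function returns its result instead of printing it, and a deep-copy comparison acts as a cheap check that nothing was subset or generated along the way.

```python
import copy
import statistics

# Hypothetical analysis dataset: cleaned and constructed during exploratory
# analysis, so the final-analysis code only reads from it.
ANALYSIS_DATA = [
    {"treatment": 1, "income": 350.0},
    {"treatment": 1, "income": 410.0},
    {"treatment": 0, "income": 300.0},
    {"treatment": 0, "income": 280.0},
]


def difference_in_means(data, outcome, group):
    """Return mean(outcome | group == 1) - mean(outcome | group == 0).

    The function neither modifies `data` nor prints: results are returned
    so they can be exported, not displayed in the console."""
    treated = [row[outcome] for row in data if row[group] == 1]
    control = [row[outcome] for row in data if row[group] == 0]
    return statistics.mean(treated) - statistics.mean(control)


snapshot = copy.deepcopy(ANALYSIS_DATA)
results = {"diff_income": difference_in_means(ANALYSIS_DATA, "income", "treatment")}

# Cheap guard: the final analysis must leave the analysis data exactly as it was.
assert ANALYSIS_DATA == snapshot, "final analysis should not change the data"
```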
Using analysis data: Final outputs
Final analysis: Script organization
Example of final analysis code
Final analysis: Professional expectations
Example of final analysis output
Automating outputs from statistical software
All results must be automatically exported from code
Automating outputs from code
All analysis results, including those from the final analysis, are first exported as “raw” outputs
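One way to extend automatic export to numbers quoted in the text (for example, a sample size mentioned in the abstract) is to write each one as a LaTeX macro. The sketch below is a hypothetical Python illustration of that idea, not a DIME tool; the macro names and values are invented, and LaTeX macro names must consist of letters only.

```python
import tempfile
from pathlib import Path

# Hypothetical scalar results computed elsewhere in the analysis code.
# Keys become LaTeX macro names, so they must contain letters only.
SCALARS = {
    "SampleSize": 1200,
    "TreatmentEffect": 0.0842,
}


def export_tex_macros(scalars, path, digits=3):
    """Write each result as a LaTeX macro, e.g. \\newcommand{\\SampleSize}{1200}.

    The manuscript can \\input this file and write \\SampleSize in the text,
    so re-running the code updates every cited number."""
    lines = []
    for name, value in scalars.items():
        # Round floats for display; write integers as they are.
        text = f"{value:.{digits}f}" if isinstance(value, float) else str(value)
        lines.append(f"\\newcommand{{\\{name}}}{{{text}}}")
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


out_file = Path(tempfile.mkdtemp()) / "scalars.tex"
export_tex_macros(SCALARS, out_file)
```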
Exporting figures in Stata
• Always use graph export for final outputs. PNG is the most common format, but some publishers may require EPS or TIF
• Detailed information about creating visualizations is available on the DIME Wiki at https://dimewiki.worldbank.org/Stata_Coding_Practices:_Visualization
• Depending on your use case, use the nodraw option whenever possible: it avoids rendering each graph on screen, which speeds up the code
Exporting tables in Stata
• DIME Analytics commands like iebaltab have built-in support for common export formats. Use these!
• estout covers most table-export needs
  • It can easily export both summary statistics and regression tables
  • It also supports extensive customization, and exports to both Excel and LaTeX
• In Stata 17, table and collect export have new functions and syntax
• In recent versions, putexcel and putdocx can also be useful
Exporting tables in Stata
• You may save results as a dataset in various ways (such as svmat), format them in that dataset, and then export them to Excel with export excel, to CSV with export delimited, or to LaTeX with dataout
• You can create matrices and export them using mat2txt or outwrite. This tends to make the code harder to read, and there are easier ways to export tables in most formats.
Exporting tables in Stata
If you need to create a table with a very particular format, consider writing it
manually using file write. If you do this, make sure you have very clear
comments and organization so readers can easily locate the important statistics
and econometrics and ignore the formatting commands.
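A minimal sketch of that layout, written in Python rather than Stata's file write purely so it is self-contained and runnable: the statistics sit in one clearly labeled block and the formatting in another, so a reader can check the numbers without wading through LaTeX. The estimates and labels are hypothetical, and the star thresholds follow a common convention rather than anything prescribed in this training.

```python
import tempfile
from pathlib import Path

# ---------------------------------------------------------------------------
# 1. STATISTICS: hypothetical estimates (label, coefficient, std. error, p-value).
#    In a real script these would come from the estimation step.
# ---------------------------------------------------------------------------
ESTIMATES = [
    ("Treatment", 0.084, 0.031, 0.007),
    ("Female head", -0.020, 0.025, 0.420),
]
N_OBS = 1200


# ---------------------------------------------------------------------------
# 2. FORMATTING: nothing below this line computes a statistic.
# ---------------------------------------------------------------------------
def stars(p):
    """Common significance convention: *** p<0.01, ** p<0.05, * p<0.1."""
    if p < 0.01:
        return "***"
    if p < 0.05:
        return "**"
    if p < 0.1:
        return "*"
    return ""


def write_regression_table(estimates, n_obs, path):
    """Write a one-column LaTeX regression table line by line."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\\begin{tabular}{lc}\n\\hline\n")
        f.write(" & (1) \\\\\n\\hline\n")
        for label, coef, se, p in estimates:
            f.write(f"{label} & {coef:.3f}{stars(p)} \\\\\n")  # coefficient + stars
            f.write(f" & ({se:.3f}) \\\\\n")  # standard error on the next row
        f.write(f"\\hline\nObservations & {n_obs} \\\\\n")
        f.write("\\hline\n\\end{tabular}\n")


out_file = Path(tempfile.mkdtemp()) / "regression_table.tex"
write_regression_table(ESTIMATES, N_OBS, out_file)
```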
Resources for outputs in R
• For R users, the stargazer and huxtable packages are the easiest way to export formatted regression and summary statistics tables to LaTeX (and HTML)
• Use modelsummary and gtsummary where appropriate. More information at https://github.com/RRMaximiliano/r-latex-tables-sum-stats
• Creating custom tables is also much easier in R, since you can combine objects into data frames and matrices and export them with one of these packages, or even with write.csv
• You can find sample code and examples in our DIME R training repository at https://github.com/worldbank/dime-r-training
Next steps and resources
What next?
• If you follow the steps outlined in this lesson, most of the data work involved in
the last step of the research process – publication – will already be done.
• Your analysis code will be organized in a reproducible way, so all you will need to do to release a replication package is a final round of code review.
• This will allow you to focus on what matters: writing up your results into a
compelling story.
DIME resources
External resources