Getting Started With Stata Presentation


Frank Hubers



Getting started
Stata interface
Creating a workfile
Summary statistics
More info

Files used in Stata:

Datafile (*.dta)
Do-files (*.do)
Log-files (*.log)
Graphs (*.gph)

NEVER change anything permanently in your original dataset!

Do-files allow you to work with data without changing your original dataset
A do-file is a set of commands (a program)
A log-file lists all your actions and outputs (tables)
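
As a rough illustration of how a do-file and a log-file fit together, here is a minimal sketch. The file name example.log is a placeholder, and the built-in auto dataset stands in for your own data.

```stata
* example.do: a minimal do-file skeleton (file names are hypothetical)
capture log close                      // close any log left open by an earlier run
log using "example.log", replace text  // start recording commands and output

sysuse auto, clear                     // built-in example dataset shipped with Stata
summarize price mpg                    // this output is recorded in example.log

log close
```

Running the do-file again overwrites the log (because of replace) but never touches the dataset on disk.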

The value . means a missing value (no observation). Stata treats . as larger than any number!
Use the Help function often!

Important syntax, used often: the if qualifier in combination with commands such as gen and list



gen var=1 if age==12

list if age>12

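Also: because . is the greatest value, age>12 is also true for observations where age is missing

To exclude missing values, add var<. to the condition (for example: list if age>12 & age<.)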
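
The behaviour of missing values in if conditions can be checked with a small made-up dataset. This is only a sketch: the variables id, age and teen are invented for the illustration.

```stata
clear
input id age
1 8
2 12
3 15
4 .
end

count if age > 12              // counts ids 3 and 4: missing is "larger" than 12
count if age > 12 & age < .    // counts id 3 only
gen teen = 1 if age > 12 & age < .
```

An equivalent way to write the second condition is age > 12 & !missing(age).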

Do-file editor
Variables manager

Use a do-file to create a new workfile from your raw data

Basic functions:
Import data

use (*.dta) or insheet (delimited text files such as csv; from Stata 13 onwards, import delimited replaces insheet)

Make new variables

gen or egen

Label variables and values

label var (variables)

label values + label define (values)
Can also be conducted with Variable Manager

Data cleaning

destring / Data-editor / encode

drop or keep

Save as a new file

save newname.dta, replace
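
Put together, the steps above might look like the following sketch. It assumes the built-in auto dataset as a stand-in for your raw data; the variable names price_k and mean_mpg and the file name auto_clean.dta are placeholders.

```stata
* Hypothetical workflow: file and variable names are placeholders
sysuse auto, clear                      // stands in for: use rawdata.dta, clear
gen price_k = price / 1000              // gen: new variable from an expression
egen mean_mpg = mean(mpg), by(foreign)  // egen: group-level mean
label var price_k "Price in thousands of dollars"
drop if missing(rep78)                  // data cleaning: drop incomplete rows
save auto_clean.dta, replace            // new file; the raw data stay untouched
```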

Create a new dummy variable, e.g. variable

Old with value 0 = younger than 10 years old; value
1 = 10 years or older
Label the variable, so you won't forget the
Label the values, for the same reason
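
A possible way to build and label such a dummy is sketched below. The data are made up, the variable is written in lower case (old) because Stata names are case-sensitive, and the label name oldlbl is an arbitrary choice.

```stata
clear
input age
7
10
15
.
end

gen old = (age >= 10) if age < .   // 1 if 10 or older, 0 if younger, missing stays missing
label var old "Aged 10 years or older"
label define oldlbl 0 "Younger than 10" 1 "10 years or older"
label values old oldlbl
tab old, missing
```

The if age < . part matters: without it, observations with a missing age would be coded as 1.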

drop or keep are used to get rid of unwanted variables

sort is used to sort the observations by the values of a variable

order is used to order variables
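
A short sketch of these four commands, assuming the built-in auto dataset (the choice of variables is arbitrary):

```stata
sysuse auto, clear
keep make price mpg foreign     // keep only these variables
drop if price > 10000           // with an if condition, drop removes observations
sort price                      // ascending by price
order foreign make              // move foreign and make to the front
```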

Stata sometimes reads variables as strings when they actually are numeric
Solution: destring

Sometimes you want to apply string variables (e.g. codes) in a regression
Stata doesn't allow the use of string variables in regressions
Solution: encode gives a numeric code to every value. (Note that these codes have no numeric meaning.)
We now have a raw and a clean dataset. Keep them both.
Start a new do-file to conduct analyses on your clean dataset

First analysis of your data will often involve using the following commands:

sum         summary statistics of a variable (add the option detail for more information)
tab         frequency table
tabstat     table with summary statistics
histogram   visual display of the distribution
list        all values per observation
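
For illustration, the commands listed above could be run as follows. This sketch assumes the built-in auto dataset; the variables are chosen arbitrarily.

```stata
sysuse auto, clear
sum price, detail                 // adds percentiles, skewness and kurtosis
tab foreign                       // frequency table
tabstat price mpg, stat(mean sd n) by(foreign)
histogram price, frequency        // bar heights show counts
list make price in 1/5            // first five observations
```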

Use tab var1 treatment to compare differences in var1 between treatment and control
Use tabstat var1, by(treatment) to compare differences in var1 between treatment and control

Chi-square test can be applied via the tab command:

tab var1 treatment, chi2


ttest var1, by(treatment)
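
A minimal sketch of both tests, assuming the built-in auto dataset, where foreign plays the role of the two-group treatment variable:

```stata
sysuse auto, clear
tab rep78 foreign, chi2          // chi-square test of independence
ttest mpg, by(foreign)           // two-sample t test of mpg across the two groups
```

Note that the grouping variable of ttest goes inside by(); two variables written side by side are not accepted.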

Standard OLS: reg

reg depv indepv controlv

Instrumental variables: ivregress

ivregress 2sls depv controlv (indepv = instrumentvar), first
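
To see the syntax in action, here is a sketch on simulated data. Everything in it (the variables z, u, w, x, y, the coefficients and the seed) is an assumption made for the illustration.

```stata
clear
set seed 12345
set obs 500
gen z = rnormal()                  // instrument
gen u = rnormal()                  // unobserved confounder
gen w = rnormal()                  // exogenous control
gen x = 0.8*z + u + rnormal()      // endogenous regressor, driven by z and u
gen y = 2*x + w + u + rnormal()    // true effect of x on y is 2
ivregress 2sls y w (x = z), first  // option first also shows the first stage
```

Because u affects both x and y, a plain reg y x w would be biased, while the 2SLS estimate should land close to the true value.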

Fixed effect regression

xtset panelvar
xtreg depvar indepvar covar, fe
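
A fixed effects regression can be sketched on a simulated panel. The variables id, year, alpha, x and y and all numbers are invented for this example.

```stata
clear
set seed 2024
set obs 50
gen id = _n
gen alpha = rnormal()              // unit-specific effect
expand 5                           // 5 periods per unit
bysort id: gen year = _n
gen x = alpha + rnormal()          // x is correlated with the unit effect
gen y = 1.5*x + alpha + rnormal()
xtset id year                      // declare panel and time variable
xtreg y x, fe
```

Here xtset is given a time variable as well; for xtreg, fe the panel variable alone is enough.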

If you want to use robust standard errors or cluster your
standard errors, end your reg command with:
Cluster: , cluster(var)
Robust: , robust
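
As a sketch, both options on the built-in auto dataset (clustering on rep78 is only for demonstration; five clusters are far too few in practice):

```stata
sysuse auto, clear
reg price mpg weight, robust             // heteroskedasticity-robust standard errors
reg price mpg weight, cluster(rep78)     // standard errors clustered on rep78
```

The same can be written as vce(robust) and vce(cluster rep78).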

By default, a variable is treated as continuous in a standard

regression. If your variable is a factor variable (e.g.
ethnicity: 1 Dutch 2 German 3 Malian) use i.
Example: reg score treatment i.etnicity

Interaction terms: Use ##.

Note: the default in an interaction is factor (not
continuous). Use c. in case of a continuous variable
Example: reg score treatment etnicity##c.age
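
The same notation can be tried on the built-in auto dataset, as a rough sketch (rep78 and foreign stand in for a categorical variable, mpg for a continuous one):

```stata
sysuse auto, clear
reg price mpg i.rep78             // rep78 enters as a set of dummies
reg price i.foreign##c.mpg        // main effects plus the foreign x mpg interaction
```

With ##, Stata adds the main effects automatically; a single # would include the interaction term only.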

For nice tables, use the outreg2 command

If not yet installed:
findit outreg2
ssc install outreg2

reg Y X
outreg2 using example, see tex excel word

Use the help function in Stata!

E.g. help ivregress

Getting started with Stata booklet

Internet sources and wikis:
