Getting Started With Stata Presentation


Frank Hubers

1. Getting started
2. Stata interface
3. Creating a workfile
4. Summary statistics
5. Regressions
6. More info

Files used in Stata:

Data files (*.dta)
Do-files (*.do)
Log-files (*.log)
Graphs (*.gph)

NEVER change anything permanently in your original dataset!

Do-files allow you to work with data without changing your original dataset
A do-file is a set of commands (a program)
A log-file lists all your actions and outputs (tables)

The value . means a missing observation. Stata treats . as larger than any number!
Use the Help function often!

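Important and often used: the if qualifier, in combination with

==
<=
>=
<
>

E.g.

gen var=1 if age==12
list if age>12

Also: the value . is the greatest value, so age>12 is also true when age is missing
Use if var<. to restrict a command such as replace to non-missing values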
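
A minimal sketch of how if conditions interact with missing values. The four ages typed in below and the variable name teen are made up for illustration; they do not come from the slides.

```stata
* Toy data: four observations, one with a missing age (hypothetical values)
clear
input age
8
12
15
.
end

* Flag observations where age is exactly 12; all other rows get missing
gen teen = 1 if age == 12

* Missing (.) is larger than any number, so this count includes the missing row
count if age > 12

* Adding "& age < ." keeps only the non-missing ages above 12
count if age > 12 & age < .
```

The first count reports 2 (the age of 15 and the missing value), the second reports 1.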

Commands
Data-editor
Do-file editor
Variables manager

Use a do-file to create a new workfile from your raw data

Basic functions:

Import data
use (*.dta) or insheet (other formats; from Stata 13 onwards, import delimited replaces insheet)

Make new variables
gen or egen
replace

Label variables and values
label var (variables)
label define + label values (values)
Can also be done with the Variables Manager

Data cleaning
destring / Data-editor / encode
drop or keep
sort

Save as a new file
save newname.dta, replace
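
The steps above could be put together in a do-file roughly as follows. This is only a sketch: it uses the auto dataset that ships with Stata as a stand-in for your own raw data, and the file names cleaning.log and auto_clean.dta and the variable price_1000 are assumptions chosen for illustration.

```stata
* Sketch of a cleaning do-file (dataset, file and variable names are assumptions)

* Close any log left open from an earlier run, then start a new one
capture log close
log using cleaning.log, replace text

* Import data (stand-in for: use rawdata.dta, clear)
sysuse auto, clear

* Make a new variable and label it
gen price_1000 = price / 1000
label var price_1000 "Price in thousands of dollars"

* Data cleaning: keep only what is needed, then sort
keep make price price_1000 mpg foreign
sort make

* Save under a new name so the raw file stays untouched
save auto_clean.dta, replace

log close
```

Because the result is saved under a new name, running the do-file again always starts from the unchanged raw data.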

Create a new dummy variable, e.g. variable Old with value 0 = younger than 10 years old; value 1 = 10 years or older
Label the variable, so you won't forget the meaning
Label the values, for the same reason
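
One possible way to build and label such a dummy is sketched below. The ages are invented, the variable is written in lower case (old) because Stata names are case-sensitive, and the label name oldlbl is an arbitrary choice.

```stata
* Toy data: ages of six hypothetical children, one of them missing
clear
input age
4
9
10
13
7
.
end

* Dummy: 0 = younger than 10, 1 = 10 years or older.
* The "if age < ." part keeps missing ages missing instead of coding them as 1.
gen old = (age >= 10) if age < .

* Label the variable and its values
label var old "Child is 10 years or older"
label define oldlbl 0 "Younger than 10" 1 "10 or older"
label values old oldlbl

* Check the result, showing the missing category as well
tab old, missing
```

Without the if age < . condition, the child with a missing age would be counted as 10 or older, because missing is treated as the largest value.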

drop or keep are used to get rid of unwanted variables (or, combined with if, unwanted observations)

sort is used to sort the observations by the values of a variable

order is used to order the variables
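
A short sketch of these four commands, again using the auto dataset shipped with Stata as an assumed stand-in for your own data; the choice of variables and the price cut-off are arbitrary.

```stata
sysuse auto, clear

* drop removes the listed variables; keep retains only the listed ones
drop headroom trunk
keep make price mpg weight foreign

* Combined with if, drop (or keep) works on observations instead
drop if price > 10000 & price < .

* sort reorders the observations (rows) by price
sort price

* order moves the listed variables (columns) to the front
order foreign make

list in 1/5
```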

Stata sometimes reads variables as strings when they actually are numeric
Solution: destring

Sometimes you want to use string variables (e.g. codes) in a regression
Stata doesn't allow the use of string variables in regressions
Solution: encode gives a numeric code to every value. (Note that these codes have no numeric meaning!)
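
Both solutions can be sketched on a small made-up dataset. The variable names (income_str, region, income, region_code) and the values are hypothetical.

```stata
* Toy data: a numeric variable read in as text, and a genuine string variable
clear
input str6 income_str str8 region
"1200" "north"
"850" "south"
"n/a" "north"
"2300" "east"
end

* destring converts text that is really numeric;
* force turns non-numeric entries such as "n/a" into missing
destring income_str, generate(income) force

* encode maps each distinct string to an integer code with a value label
encode region, generate(region_code)

* Codes follow alphabetical order (east=1, north=2, south=3): arbitrary numbers
list income region region_code, nolabel
```

Because the codes are arbitrary, an encoded variable would normally enter a regression as a factor variable rather than as a number.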

We now have a raw and a clean dataset. Keep them both.
Start a new do-file to conduct analyses on your clean dataset

A first analysis of your data will often involve the following commands:

sum          summary information about a variable (add , detail for more)
tab          frequency table
tabstat      table with summary statistics
histogram    visual display of the distribution
list         lists the values for each observation

Use tab var1 treatment to compare the distribution of var1 between treatment and control
Use tabstat var1, by(treatment) to compare summary statistics of var1 between treatment and control
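
The descriptive commands above might be used as in the following sketch. It assumes the auto dataset shipped with Stata, with the 0/1 variable foreign playing the role of a treatment indicator purely for illustration.

```stata
sysuse auto, clear

* Mean, standard deviation, minimum and maximum
sum price

* Adds percentiles, skewness and kurtosis
sum price, detail

* One-way frequency table, then a two-way table by group
tab rep78
tab rep78 foreign

* Summary statistics of price and mpg for each group
tabstat price mpg, by(foreign) statistics(mean sd n)

* Distribution of price
histogram price

* Values for the first five observations
list make price in 1/5
```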

A chi-square test can be requested via the tab command:
tab var1 treatment, chi2

T-test

ttest var1, by(treatment)
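
A small sketch of both tests, assuming the auto dataset shipped with Stata and treating foreign as a stand-in for the treatment indicator.

```stata
sysuse auto, clear

* Chi-square test of independence between two categorical variables
tab rep78 foreign, chi2

* Two-sample t-test: compares mean mpg between the two groups of foreign
ttest mpg, by(foreign)
```

Note that the grouping variable goes inside by(); listing two variables without by() is not valid ttest syntax.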

Standard OLS: reg

reg depv indepv controlv

Instrumental variables: ivregress

ivregress 2sls depv controlv (indepv=instrumentvar), first

Fixed effects regression

xtset panelvar
xtreg depvar indepvar covar, fe

Notes:
If you want robust or clustered standard errors, end your reg command with:
Cluster: , cluster(var)
Robust: , robust
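
The three regression commands and the standard-error options can be sketched as follows. The datasets are assumptions: auto ships with Stata, while nlswork is an example panel that webuse downloads from the Stata website (internet access needed). The instrument in the IV line is chosen only to show the syntax and is not a defensible instrument.

```stata
* OLS with robust standard errors
sysuse auto, clear
reg price mpg weight, robust

* 2SLS: mpg is treated as endogenous and instrumented by displacement
* (syntax illustration only; the instrument has no economic justification)
ivregress 2sls price weight (mpg = displacement), first

* Fixed effects with standard errors clustered at the panel level
webuse nlswork, clear
xtset idcode
xtreg ln_wage age tenure, fe vce(cluster idcode)
```

vce(cluster idcode) is the current way of writing the cluster option; the first option of ivregress prints the first-stage regression.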

By default, a variable is treated as continuous in a standard regression. If your variable is a factor (categorical) variable (e.g. ethnicity: 1 Dutch, 2 German, 3 Malian), use i.var
Example: reg score treatment i.etnicity

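Interaction terms: use ## (main effects plus interaction; a single # gives the interaction only).
Note: variables in an interaction are treated as factor variables (not continuous) by default. Use c. for a continuous variable
Example: reg score treatment etnicity##c.age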
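
Factor variables and interactions can be tried out on the auto dataset shipped with Stata. This is a sketch under that assumption; the variables are picked only because they are categorical (rep78, foreign) or continuous (weight).

```stata
sysuse auto, clear

* i. turns rep78 (coded 1-5) into a set of dummies; the lowest level is the base
reg price weight i.rep78

* ## adds both main effects and the interaction; c. marks weight as continuous
reg price i.foreign##c.weight

* The same model written out with # (interaction term only)
reg price i.foreign c.weight i.foreign#c.weight
```

The last two commands fit the same model, which shows that ## is shorthand for the main effects plus the # interaction.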

For nice tables, use the outreg2 command

If not yet installed:
findit outreg2 (then click the link to install)
or: ssc install outreg2

Command:
reg Y X
outreg2 using example, see tex excel word

Use the help function in Stata!
E.g. help ivregress

Getting started with Stata booklet (Blackboard)
Internet sources and wikis:
Recommended:
http://data.princeton.edu/stata/default.html
