
Basics of STATA

Anusha Nath
Delhi School of Economics
Winter Semester 2008-09

Running STATA
STATA Windows
When STATA is started, there are four
windows that open on the screen:
1. Command
2. Results
3. Review
4. Variables

Running STATA contd


Graphical User Interface (GUI)
Since version 8.0, STATA has had a GUI that
allows almost all commands to be accessed
by clicking on the menus Data, Graphics, or
Statistics and making the relevant selections.
STATA then behaves as if the corresponding
command had been typed.

Running STATA contd


Do-files
Instead of typing commands into the Command window one by one,
we can create a file that contains all the commands needed to carry
out a particular data analysis. This can be done using STATA's Do-file
Editor. The steps are:

Step 1: Open the Do-file Editor by clicking its icon on the toolbar or
selecting Do from the File menu.
Step 2: Type all the commands required for the analysis into the file.
Step 3: Run the commands as a batch by using the command:
do dofilename
The do-file can be saved for use in a future STATA session. Note that
STATA automatically saves do-files with the extension .do
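
As a minimal sketch, a do-file might look like the following. The file name analysis.do is hypothetical, and the auto dataset (an example dataset shipped with STATA and loaded with sysuse) is used purely for illustration; neither comes from these notes.

```stata
* analysis.do : hypothetical example do-file
* start with an empty dataset in memory
clear
* load an example dataset that ships with STATA
sysuse auto
* show basic information about the variables
describe
* summary statistics for two of the variables
summarize price mpg
```

With the file saved as analysis.do in the working directory, typing do analysis in the Command window would run all of these commands in order.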


Running STATA contd


Log files

A log file keeps a record of all the commands and output of a particular
STATA session.

To open a log file, you can either go to the File menu, select Log and
then Begin, or simply type the following command:
log using logfilename
where logfilename is the name you want to give to the log file. In current
versions STATA saves the log in its own SMCL format and appends the extension
.smcl by default; to get a plain-text file, give the name an explicit .log
extension or add the text option.

Once you begin a log file, everything displayed in the Results window will,
by default, be recorded in the file.


Running STATA contd


Log files contd

To close the log file, simply type:


log close

To add new output to an existing (but closed) log file, we can use the
following command:
log using logfilename, append

To erase the contents of an existing log file and overwrite it with new output,
we can use the following command:
log using logfilename, replace

To list the last n commands typed, we can use the following
command:
#review n
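
The log commands can be combined in a short session such as the sketch below. The log name session1 is hypothetical, the text option is added so that a plain-text session1.log is written, and the auto example dataset is an assumption used only to produce some output.

```stata
* open a plain-text log, overwriting any earlier session1.log
log using session1, replace text
sysuse auto, clear
summarize price
* stop recording
log close
* reopen the same log later and add new output at the end
log using session1, append text
summarize mpg
log close
```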


Running STATA contd


Getting Help

If you know the name of the command you want to find out
more about, type the following:
help command
If the command name is not known and you want to
search for the appropriate command, type the
following:
search keyword
STATA is not sensitive to punctuation, capitalisation
or abbreviations in keywords.

Running STATA contd


Closing STATA
STATA can be closed in three ways:
Click on the [X] button in the top right-hand corner of
the STATA screen
Select Exit from the File menu
Type the following command in the Command
window:
exit, clear

Setting Memory
By default, STATA allocates 1 megabyte of memory
to its data areas. To work with larger datasets, it
becomes necessary to increase the memory. This
can be done through the following command:
set memory memsize
where memsize can be 2m, 100m etc.
Note that the memory can be set only if there is
no data currently loaded in the STATA spreadsheet.
(From STATA 12 onwards memory is managed automatically
and this command is no longer needed.)


Feeding Data into STATA

The STATA spreadsheet can be opened by either
clicking on the data editor icon on the menu or
typing the command:
edit
The editor must be closed before any analysis can
be carried out.
If you want to look at your data afterwards, either
click on the data browser icon or type the following
in the command window:
browse
Again, this window must be closed before resuming
analysis.

Importing Data from Excel to STATA

In order to import data from Excel, one needs to
make sure that the data is stored in a manner that
can be read by STATA. In particular, the following
things need to be kept in mind:
The first line in the worksheet should contain the variable
names.
Variable names should be at most 32 characters long (eight
in very old releases of STATA) and should not begin with a
special character or number.
The data should begin on the second line.

Importing Data contd

Missing numeric data should not be coded
with a dot or space; leave the cell empty instead.
There should not be any commas in the data
because STATA treats them as
delimiters and hence will not read the data
properly.
The cells at the end of each line should be
non-empty.
The file should be saved in .csv (comma-separated
values) format.

Importing Data contd


Once the data is saved in the proper format in
Excel, the following command can be used to
read it into STATA:
insheet using filename.csv
(From STATA 13 onwards, the command import delimited
supersedes insheet.)


Saving Changes in Data


To save data that has just been fed into the data
editor as filename, use the following command:
save filename
If you have already created the file and want to save
changes to the existing dataset, the
following command is used:
save filename, replace
If the file already exists and the replace option is not
used, an error message will appear. Note that all data is
saved with the extension .dta

Reading Data

If the data is saved in the working directory,
the command used to read the data is:

use filename

If the file is not stored in the current STATA
directory, then the complete path must be
specified:

use c:\user\data\filename

Reading Data Contd

Before reading data into STATA, all data
already stored in memory needs to be
cleared. This can be done either by using the
command clear before the use command or
by using the option clear as follows:
use filename, clear


Examining Data

To make sure that all the data is there in the format
you want, you can type the following command:
describe or d
This will provide basic information about the file and
the variables in the data.
To list the data on the result screen, you can type the
command list or l. To list the data for only two
variables, say varname1 and varname2, we can type the
following command:
list varname1 varname2
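
The saving, reading and examining steps fit together as in the following sketch. The file name mydata is hypothetical, and the auto example dataset shipped with STATA stands in for data typed into the editor; both are assumptions made only for illustration.

```stata
* load an example dataset in place of hand-entered data
sysuse auto, clear
* write mydata.dta to the working directory, overwriting any earlier copy
save mydata, replace
* read the file back in, clearing whatever is in memory
use mydata, clear
* check the variables and look at the first five observations
describe
list make price in 1/5
```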

Types of Variables

There are two kinds of variables in STATA: numeric
and string.

There are different types of numeric variables: byte,
int, long, float and double. These differ in the range and
precision of the values they can hold, and hence in
the amount of space they take up in a file.

STATA assumes that variables defined are numeric
unless specified otherwise.

Most statistical commands work only on numeric variables,
hence string variables that are to be used in an analysis
need to be changed to numeric variables (for example with
the destring or encode commands).

STATA Commands Syntax


The basic language syntax in STATA is:
[by varlist:] command [varlist] [=exp] [if exp] [in range] [weight]
[using filename] [, options]
Description of each component:
1. [by varlist:] instructs STATA to repeat the command for each
combination of values in the list of variables varlist.
2. command is the name of the command and can often be
abbreviated (e.g: l for list)


STATA Commands Syntax contd


[by varlist:] command [varlist] [=exp] [if exp] [in range] [weight]
[using filename] [, options]
Description contd
3. [varlist] is the list of variables to which the command applies.
4. [=exp] is an expression, for example the value assigned to a new
variable by generate.
5. [if exp] restricts the command to that subset of the observations
that satisfies the logical expression exp.
6. [in range] restricts the command to those observations whose
indices lie in the specified range.

STATA Commands Syntax contd


[by varlist:] command [varlist] [=exp] [if exp] [in range] [weight]
[using filename] [, options]
Description contd
7. [weight] allows weights to be associated with observations
8. [using filename] specifies the filename to be used
9. [, options] are specific to the command used (a comma is
needed only if options are actually used)
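
To see how the components of the syntax combine in practice, consider the sketch below. It assumes the auto example dataset shipped with STATA (with variables foreign, price, mpg and make); the variable name price_k is hypothetical.

```stata
sysuse auto, clear
* by varlist: command varlist if exp
* (bysort sorts the data by foreign first, as by requires)
bysort foreign: summarize price mpg if price > 5000
* command varlist in range, options
list make price in 1/5, clean
* command newvar = exp
generate price_k = price/1000
```

The first command repeats summarize separately for each value of foreign, the second uses both a range and an option, and the third shows the [=exp] component.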


Expressions in STATA
Logical Expressions
The logical expressions are evaluated as 1 if true and 0 if false.
The logical operators used are:
& and
| or
! not
~ not
The relational operators are:
== equal to
< less than
<= less than or equal to
> greater than
>= greater than or equal to
!= not equal to

Expressions in STATA contd


Logical Expressions contd
Hence the expression
if (y!=2 & z>x)|x==1
reads as:
if y is not equal to 2 and z is greater than x
or if x equals 1
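
The expression can be checked on a few hand-typed observations, as in this sketch. The data values and the variable name flag are made up for illustration.

```stata
clear
input x y z
1 2 0
0 3 5
0 2 5
3 1 2
end
* flag is 1 where the expression is true and 0 where it is false
generate byte flag = (y!=2 & z>x) | x==1
list
```

Here flag equals 1 for the first observation (x equals 1) and the second (y is not 2 and z exceeds x), and 0 for the other two.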

Expressions in STATA contd


Algebraic expressions

Algebraic expressions use the usual arithmetic
operators +, -, *, / and ^ for addition, subtraction,
multiplication, division and powering.

STATA also has many mathematical functions such
as sqrt(), exp(), log() etc. and statistical functions
such as normprob() and chiprob() for probabilities under
the normal and chi-squared distributions and invnorm() etc.
for inverse cumulative distribution functions. (Newer
releases name the normal functions normal() and invnormal().)

Expressions in STATA contd


Algebraic expressions contd
For example:
invnorm(uniform()) + 2
Here invnorm(uniform()) returns a (different) draw
from the standard normal distribution for each
observation, while the second term adds the
value 2 to each such draw.

Inserting Comments
To make STATA ignore any comments
inserted in the do-file, we can use the
following options:

Type * at the start of the comment line

Type /* at the beginning of the comment
and */ at the end of the comment
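
Both comment styles are shown in the short do-file sketch below, which assumes the auto example dataset shipped with STATA.

```stata
* This whole line is a comment and is ignored
sysuse auto, clear
/* A comment written this way
   can run over several lines */
summarize price /* it can also sit inside a command */ mpg
```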

Observation Indices
Each observation has an index associated with it. The system variable _n
takes on the value of the running index and _N is equal to the
total number of observations. For example, x[3] refers to the third
observation of variable x and
x[_n-1]
refers to the previous observation of a variable x. We can hence
calculate percentage change in a variable x by using the
following expression:
((x[_n]-x[_n-1])/(x[_n-1]))*100
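
As a small worked example, the expression can be used with generate on three made-up observations. The values and the variable name pctchange are hypothetical.

```stata
clear
input x
100
110
99
end
* the first observation has no predecessor, so its result is missing
generate pctchange = ((x[_n]-x[_n-1])/(x[_n-1]))*100
list
```

The second observation gives a change of 10 percent (100 to 110) and the third a change of -10 percent (110 to 99).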


Observation Ranges
We can refer to a range of observations either by
using if with a logical expression involving _n or by
using the in qualifier:
in f/l
where f/l is used to specify a range of indices. f
refers to the first value in the range while l refers to
the last value in the range. For example,
list x in 5/12
will list the fifth to the twelfth observations of variable
x.

Generating Variables

New variables can be generated in STATA using the
commands generate or egen (extended generate).

The command generate (or gen) equates a new variable to
an expression which is evaluated for each observation. For
example,
gen x=1
creates a new variable called x and sets it equal to 1.

When generate is used together with if exp or in range, the
remaining observations are set to missing.

Generating Variables contd


For example:
gen percent= 100*(old-new)/old if old>0
This expression generates the variable percent and sets it equal to
the percentage decrease from old to new where old is positive
and equal to missing otherwise.

To change the missing values in the variable percent to zeros,
we can use the replace command as follows:
replace percent = 0 if old<=0


Generating Variables contd


The above two commands can be replaced by
a single command as follows:
gen percent = cond(old>0, 100*(old-new)/old,0)

where cond() evaluates to the second
argument if the first argument is true and to
the third argument otherwise.
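
The behaviour of cond() can be checked on three made-up observations, as in this sketch; the values of old and new are hypothetical.

```stata
clear
input old new
100 80
0 5
50 60
end
* percentage decrease where old is positive, zero otherwise
generate percent = cond(old>0, 100*(old-new)/old, 0)
list
```

The three observations give 20, 0 and -20 respectively: the second is set to zero because old is not positive.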


Generating Variables contd


The command egen provides extensions to generate,
as some of its functions accept a variable list as an
argument, whereas the functions used with generate can only take
simple expressions as arguments. For example, we
can form the average of 100 variables m1 to m100
using:
egen average = rmean(m1-m100)
where missing values are ignored. (In newer releases of
STATA this function is called rowmean().)
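
A smaller version of this example, with three variables instead of 100, shows how missing values are handled. The data are made up, and the sketch uses the newer function name rowmean() rather than rmean().

```stata
clear
input m1 m2 m3
1 2 3
4 . 6
end
* row-wise mean over m1 to m3, ignoring missing values
egen average = rowmean(m1-m3)
list
```

The second observation has a missing m2, so its average is taken over the two remaining values (4 and 6) and equals 5.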


Graphs in STATA

Scatter Plots:
scatter yvar xvar, options
Type help scatter to see which options you can
include with scatter. In general, for plots
requiring an x-axis and a y-axis, the command
twoway (short for graph twoway) can be used:
twoway (scatter yvar xvar), options
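
For instance, both forms of the command can be tried on the auto example dataset shipped with STATA. The dataset and the axis titles are assumptions chosen for illustration.

```stata
sysuse auto, clear
* short form
scatter price mpg
* equivalent twoway form, with axis titles supplied as options
twoway (scatter price mpg), ytitle(Price) xtitle(Mileage)
```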

Graphs in STATA contd


Consider the following command:
twoway (scatter ug um), ytitle(government expenditure) xtitle(military expenditure)
This gives the following graph:


[Figure: scatter plot of government expenditure (y-axis, 0 to 1000) against military expenditure (x-axis, 0 to 300)]


Graphs in STATA contd

Line Graphs:
The syntax for this is similar to that for the scatter
plot. To get a line graph of government
against military expenditure, we type:
twoway (line ug um), ytitle(government expenditure) xtitle(military expenditure)

[Figure: line graph of government expenditure (y-axis, 0 to 1000) against military expenditure (x-axis, 0 to 300)]


Graphs in STATA contd

Histograms:
The syntax for the command is:
histogram varname [if] [in] [weight] [, [continuous_opts | discrete_opts] options]
Type help histogram to check out the options
available.

Exercise: Drawing from a Normal Distribution
Question:
Generate a sample of 100 observations from
a Normal Distribution with mean 100 and
variance 400. Is it approximately normally
distributed?


Drawing from a Normal Distribution contd


Step 1: Clear all data currently stored in
memory
clear
Step 2: Since we want to generate a sample of
100 observations, we set the number of observations to
100.
set obs 100

Drawing from a Normal Distribution contd


Step 3: Generate a variable taking values from the normal
distribution with a mean of 100 and variance of 400 (that is, a
standard deviation of 20).
gen x=uniform()
gen normalvar=invnormal(x)*20+100
Step 4: Draw a histogram to see if the sample is approximately
normally distributed.
hist normalvar, norm
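
Put together as a single do-file, the four steps might look like the sketch below. The set seed line and the seed value are additions not in the original steps, included only so that the same sample is drawn on every run; the variable name u is also hypothetical.

```stata
* Step 1: empty the memory
clear
* Step 2: create 100 empty observations
set obs 100
* arbitrary seed (an assumption) to make the draws reproducible
set seed 12345
* Step 3: uniform draws, then transform to normal with mean 100 and sd 20
generate u = uniform()
generate normalvar = invnormal(u)*20 + 100
summarize normalvar
* Step 4: histogram with a normal density curve overlaid
histogram normalvar, normal
```

The summarize output gives a quick numerical check: the sample mean and standard deviation should be close to, though not exactly, 100 and 20.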


[Figure: Histogram of Sample from Normal Distribution, with density on the y-axis (0.005 to 0.02) and normalvar on the x-axis (60 to 140)]

