Stata
OF CLIMATE CHANGE
INTERACTIVE USE OF STATA
• Interactive use means that Stata commands are
initiated within Stata.
• A graphical user interface (GUI) for Stata is
available. It gives access to almost all Stata
commands through drop-down menus.
• Stata also allows users to type commands directly
to execute a particular task.
• The standard procedure in Stata, however, is to
collect the commands needed into one file, called
a do-file, that can be run with or without
interactive use.
BASICS IN STATA
• Like most software, Stata ships with example
data sets that new users can use as a
starting point in learning Stata.
– An example of such a data set is auto.dta.
• To access the example data:
– Click File/Example Datasets/… Example datasets
installed with Stata
• Select the data set auto.dta
– Interactive users can instead type the command
• sysuse auto
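As a minimal sketch of the command route, the following loads the example data and checks what arrived. The clear option is an addition here: it assumes any data already in memory can be discarded.

```stata
* Load the example data set shipped with Stata, replacing any data in memory
sysuse auto, clear

* Confirm what was loaded: number of observations and variables
describe, short
```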
DATA MANAGEMENT
• To describe the variables in the data set type:
– describe or des
– To describe specific variables, add the variable names to the
command.
• Eg: des mpg
• NB: Stata is case-sensitive; command names must be typed in lower case.
• If you wish to obtain summary statistics for the variables type:
• summarize, detail
• sum, detail
• su, detail
• su, d
– You can drop the option detail if you only want the basic
summary statistics.
– You can summarize specific variables
• sum varlist, detail
• Eg: sum mpg, detail
– sum mpg
– su mpg
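The commands above can be tried on auto.dta as follows. This is only a sketch; the last line assumes you want to reuse a result, which summarize leaves in r() for the last variable listed.

```stata
sysuse auto, clear

* Basic summary statistics (obs, mean, sd, min, max) for one variable
summarize mpg

* The detail option adds percentiles, skewness and kurtosis
summarize mpg price, detail

* Results of the last summarize are stored in r(); here they refer to price
display "mean price = " r(mean)
```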
• If you are only interested in a subset of your data, you can inspect it using
filters. E.g. if you are only interested in cars within a particular price or
mileage range you can type:
– sum if price>=3000 & price<=4400
– sum if mpg>=16 & mpg<=23
• And then you can contrast these with
– sum if price>=3000 | price<=4400
– sum if mpg>=16 | mpg<=23
Because | means "or", these last two conditions hold for every observation, so
they summarize the whole data set.
• Interpretation of Logical Operators in Stata.
>           greater than
>=          greater than or equal to
<           less than
<=          less than or equal to
==          equal to
!= or ~=    not equal to
&           and
|           or
.           missing value (a value, not an operator; e.g. rep78==.)
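To see the difference between the operators in numbers rather than summary tables, here is a short sketch using count on auto.dta. The choice of rep78 for the missing-value line is an illustrative assumption.

```stata
sysuse auto, clear

* "and": both conditions must hold, so only mid-range cars are counted
count if mpg >= 16 & mpg <= 23

* "or": at least one condition holds for every car, so all are counted
count if mpg >= 16 | mpg <= 23

* Missing values: rep78 is not recorded for some cars
count if missing(rep78)
```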
• The usual arithmetic operators (+,-,*,/) are
applicable in STATA.
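• tab make, sum(mpg) gives the mean, standard deviation, and frequency of
mpg for each car model.
• tab make, sum(price) mean gives only the mean price for each car model.
• tab foreign weight, sum(price) gives the same statistics for price in each
cell of a two-way table. A two-way table is most readable when both
variables are categorical (e.g. rep78 in place of weight).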
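A small sketch of the same idea with a grouping variable that has only two categories, which keeps the output short. Using foreign as the grouping variable is an assumption made for illustration.

```stata
sysuse auto, clear

* Mean, standard deviation and frequency of price by car origin
tabulate foreign, summarize(price)

* Means only
tabulate foreign, summarize(price) means
```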
tabstat
This command gives summary statistics for a set of continuous
variables, optionally for each value of a categorical variable.
The syntax is:
tabstat varlist [if exp] [in range], stat(statname [...]) by(varname)
where
varlist is a list of continuous variables
statname is a type of statistic
varname is a categorical variable.
Example:
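A minimal sketch of a tabstat call, assuming the auto.dta example data used earlier. The variables and statistics chosen are illustrative.

```stata
sysuse auto, clear

* Mean, standard deviation and count of price and mpg, by car origin
tabstat price mpg, stat(mean sd n) by(foreign)
```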
table
This command can create many types of tables. It is probably the most
flexible and useful of all the table commands in Stata. The syntax is:
table rowvar colvar [if exp] [in range], c(clist) [row col]
where
rowvar is the categorical row variable
colvar is the categorical column variable
clist is a list of statistics and variables
row is an option to include a summary row
col is an option to include a summary column
Examples:
• table foreign, c(mean rep78 sd rep78 median rep78) – table of rep78
statistics by foreign
• table foreign rep78, c(mean mpg) – table of average mpg by foreign and
rep78
• table foreign, c(mean rep78 mean mpg) – table of average rep78 and
mpg by foreign
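A sketch of these calls on auto.dta. One assumption is worth flagging: the c() (contents) syntax shown above is the form used up to Stata 16, and Stata 17 redesigned table. The sketch therefore runs each call under version control, in line with the version 11 header used in this document's do-file.

```stata
sysuse auto, clear

* Run table with its pre-Stata 17 syntax via version control
version 11: table foreign, c(mean rep78 sd rep78 median rep78)

* Two-way table of mean mpg, with a summary row and column
version 11: table foreign rep78, c(mean mpg) row col
```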
MODIFYING DATA FILES
• This section describes a number of commands that are used to
modify and combine data files in Stata.
rename, drop, keep
rename
This command renames variables. Syntax:
rename oldname newname
• Eg: rename mpg mile_per_gallon
drop
This command deletes observations or variables.
drop if price>=4000
drop if foreign==1
keep
This command deletes everything except the specified observations or
variables. It accepts either an if condition or a variable list, not both,
so selecting on both takes two steps.
keep if price<=3000
keep if foreign==1
keep mpg rep78 headroom trunk
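The three commands can be chained as in the sketch below. The price cut-off and the variables dropped or kept are illustrative assumptions, not values from the text.

```stata
sysuse auto, clear

* rename: give mpg a more descriptive name
rename mpg mile_per_gallon

* drop: remove the foreign cars (observations), then one variable
drop if foreign == 1
drop turn

* keep: retain only cheaper cars, then only selected variables
keep if price <= 5000
keep make price mile_per_gallon rep78
describe
```

Because drop and keep change the data in memory permanently, it is usually safer to run them from a do-file on a freshly loaded copy of the data.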
PRESENTING DATA WITH GRAPHS
• In Stata, graphs are primarily made with the graph command, followed by
numerous subcommands for controlling the type and format of graph. In
addition to graph, there are many other commands that draw graphs.
graph
twoway
bar
pie
matrix
connect( )
msymbol( )
histogram
scatter
http://www.stata.com/support/faqs/graphics/piechart.html
graph
This command generates numerous types of graphs and
diagrams. The syntax is:
graph graphtype [varlist] [if exp] [in range] [, options]
where
graphtype is the type of graph
varlist is the list of variables to graph
if limits the observations included to those satisfying the
condition exp
in limits the observations included to a range of
observation numbers
options control the look of the graph
• graph bar income, over(sexhead) over(locality)
(income, sexhead and locality belong to a different data set, not auto.dta)
Histograms
histogram income, by(sexhead) normal bin(20)
histogram income, by(locality) normal bin(20)
histogram mpg, by(foreign) normal bin(20)
NB: bin() sets the number of bins (bars) in the histogram; normal
overlays a normal density curve.
Scatter Plots
scatter mpg price
scatter mpg price, by(foreign)
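The graph commands above can be run on auto.dta as sketched here. The bar chart line is an illustrative addition that substitutes auto.dta variables for income, sexhead and locality.

```stata
sysuse auto, clear

* Histogram of mpg with 20 bins and a normal density overlay, by origin
histogram mpg, by(foreign) normal bin(20)

* Scatter plot of mpg (y axis) against price (x axis), by origin
scatter mpg price, by(foreign)

* Bar chart of mean mpg for each origin
graph bar (mean) mpg, over(foreign)
```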
• PIE CHARTS
In Stata, pie and bar charts are drawn using the sum of the variables
specified. Therefore, any zero values will not appear in the chart, as they
sum to zero and make no difference to the sum of any other values. If you
have a categorical variable that contains labeled integers (for example, 0
or 1, or 1 upwards), and you want a pie or bar chart, you presumably want
to show counts or frequencies of those integer values. To create pie charts,
first run the variable through tabulate to produce a set of indicator
variables:
Eg:
tab foreign, gen(f)
graph pie f1 f2
Try:
tabulate rep78, generate(r)
graph pie r1 r2 r3 r4 r5
graph bar (sum) r1 r2 r3 r4 r5
(graph bar plots means by default, so (sum) is specified to show counts)
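A short sketch of the indicator-variable approach on auto.dta. The count at the end is an addition, included only to show that each indicator sums to the frequency of its category.

```stata
sysuse auto, clear

* One 0/1 indicator per category of foreign: f1 (Domestic), f2 (Foreign)
tabulate foreign, generate(f)

* Each slice is the sum of an indicator, i.e. the count in that category
graph pie f1 f2

* The same counts, checked directly
count if f2 == 1
```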
• Do-file Editor
A Do-file is a file that stores a Stata program (a set of
commands) so that you can edit it and run it later.
The Do-file Editor is like a simplified word processor for
writing Stata programs. Why use the Do-file Editor
rather than the Command window or the menu
system?
– It makes it easier to check and fix errors,
– it allows you to run the commands later,
– it lets you show others how you got your result, and
– it allows you to collaborate with others on the analysis.
– In general, any time you are running more than 5-10
commands to get a result, it is easier and safer to use a Do-
file to store the commands.
• LOG FILES
• You can click on File/Log to begin or close a log file (Suspend
and Resume are to temporarily turn off and on the log).
• You can use “log” commands in the Command window
• You can use “log” commands in a Do-file.
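The log commands might be used as in the following sketch. The file name example_session.log is a hypothetical placeholder, and the text option (plain-text rather than SMCL format) is an assumption about the preferred output.

```stata
* Close any log left open by an earlier run (no error if none is open)
capture log close

* Start a plain-text log; the file name is a hypothetical example
log using "example_session.log", text replace

sysuse auto, clear
summarize price

* Temporarily suspend and resume logging, then close the log
log off
log on
log close
```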
OPENING STATA FILES (.dta)
To open a Stata file:
use filename, clear
Eg: use "G:\fenergydata.dta", clear
use varlist using filename, clear [loads only the listed variables]
Alternatively you can use the drop-down menu bar to open the data
– File/Open/… (select the data)
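A self-contained sketch of both forms of use. It first saves a copy of auto.dta so that there is a file to open. The name auto_copy.dta is a hypothetical placeholder written to the current working directory.

```stata
sysuse auto, clear

* Save a copy to the working directory; the file name is a hypothetical example
save "auto_copy.dta", replace

* Load the whole file, then only three of its variables
use "auto_copy.dta", clear
use make price mpg using "auto_copy.dta", clear
describe
```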
IMPORTING EXCEL DATA
To import data from Excel, first save the worksheet as a text file in CSV
(comma-separated) or tab-delimited format. The command for importing
such text files is “insheet using”
– insheet using filename, clear
– Eg: insheet using "C:\Users\myjumens\Desktop\fenergydata.csv"
• NB: In Stata 13 and later, insheet has been superseded by import delimited,
although insheet still works.
• Alternatively you can use the drop-down menu bar to import the data.
– File/import/ASCII data created by spreadsheet/ …… (select the data)
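A round-trip sketch of the import step. It writes a small CSV file from auto.dta with outsheet and reads it back with insheet. The file name auto_example.csv is a hypothetical placeholder, and the comma option simply makes the delimiter explicit.

```stata
sysuse auto, clear

* Write a comma-separated file so there is something to import
* (the file name is a hypothetical example)
outsheet make price mpg using "auto_example.csv", comma replace

* Read it back; clear discards the data currently in memory
insheet using "auto_example.csv", comma clear
describe
```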
CODING QUESTIONNAIRES INTO STATA
• In the Variables Manager:
– Type the variable label.
– Click on Manage to display a new dialog box.
– Click Apply to add your commands into the system.
• Creating Value Labels
– Click on Create Label.
– Type in the label corresponding to each value assigned.
– Click on Add.
• Note that you can create the value labels
for all the questions before exiting the
Manage Value Labels dialog box.
• Assign the value labels you have created to their
corresponding questions, or variables, in the
Variables Manager.
• Exit the Variables Manager dialog box and go
back to the Data Editor.
• You can now type in the coded responses.
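The same steps can also be carried out with commands, which is convenient in a do-file. The sketch below is illustrative only: the variable q1, its label text, and the value label yesno are hypothetical names, not taken from any questionnaire in the text.

```stata
clear
set obs 3

* q1 is a hypothetical yes/no questionnaire item, coded 0/1
generate byte q1 = .
label variable q1 "Owns a car?"

* Define a value label and attach it to the variable
label define yesno 0 "No" 1 "Yes"
label values q1 yesno

* Enter coded responses, then tabulate them with their labels
replace q1 = 1 in 1
replace q1 = 0 in 2/3
tabulate q1
```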
MICROECONOMETRIC
REGRESSION ANALYSIS
• Ordinary Least Squares
• Probit Models
• Logit Models
• Ordered Probit/Logit Models
• Multinomial Logit Models
• Tobit Models
Ordinary Least Squares
Like most statistical packages, Stata allows users
to run basic regressions such as OLS.
The syntax is:
regress depvar indepvars
Eg: regress gpa tuce psi
reg gpa tuce psi
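The gpa, tuce and psi variables are not part of Stata's example data. The sketch below therefore assumes auto.dta instead and regresses price on mpg and weight purely to illustrate the syntax.

```stata
sysuse auto, clear

* OLS of price on fuel economy and weight
regress price mpg weight

* Estimation results are stored in e(); coefficients are available via _b[]
display "N = " e(N) "   R-squared = " e(r2)
display "coefficient on mpg = " _b[mpg]
```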
LOGIT AND PROBIT MODELS
• Probit and logit models are among the most
widely used members of the family of
generalized linear models in the case of binary
dependent variables.
• This group of models allows researchers to
analyse outcomes whose dependent variable
is binary (0, 1).
– Eg: yes/no; married or not married; foreign or
domestic
PROBIT MODEL
Let us examine whether a new method of teaching
economics, PSI, significantly influences performance in later
economics courses using the probit model. The dependent
variable used is GRADE, which indicates whether a student’s
grade in an intermediate macroeconomics course was higher
than that in the principles course.
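The probit model is specified as:
$$\Pr(\text{GRADE}=1 \mid \text{GPA},\text{PSI},\text{TUCE}) = \Phi(\beta_0 + \beta_1\,\text{GPA} + \beta_2\,\text{PSI} + \beta_3\,\text{TUCE})$$
where $\Phi$ is the standard normal cumulative distribution function.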
• Estimation of Probit Model
probit grade gpa psi tuce
• The basic probit command reports coefficient estimates
and their standard errors. These coefficients
are the index coefficients: all we can read from them is the
direction of the effect and the partial effect on the probit
index/score. They do not correspond to the average
partial effects on the probability.
• Let’s try to interpret the results:
– Tuce: a one-unit increase in tuce increases the probit
index by 0.05 standard deviations.
– But are we concerned with the probit index? No.
• In analysing binary choice models the parameters of
interest are not the index coefficients but the
marginal/partial effects.
Marginal Effects
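• NB: The marginal effects are interpreted in the same way as the
coefficients of the linear regression model: the change in the probability
that the outcome equals 1 for a one-unit change in the regressor.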
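A sketch of estimation followed by marginal effects. Since the GRADE data are not shipped with Stata, it assumes auto.dta with foreign as the binary dependent variable. The margins command used here is available from Stata 11 onward, and the regressors are illustrative.

```stata
sysuse auto, clear

* Probit of car origin (foreign = 1, domestic = 0) on mpg and weight
probit foreign mpg weight

* Average marginal effects on Pr(foreign = 1)
margins, dydx(*)

* For comparison, a logit on the same specification
logit foreign mpg weight
```

The probit output reports index coefficients, while the margins output reports changes in probability, which is the quantity discussed above.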
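Starting with do-files
version 11          // interpret the commands below as Stata 11 would
set matsize 400     // maximum number of variables in an estimation
clear               // remove any data in memory
set mem 1000        // memory allocation; only needed in Stata 11 and earlier
capture log close   // close any open log; no error if none is open
set more off        // do not pause output at --more--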