Stata Basics

Lecture 1
Log File
Do File
Do-files are a very important feature of Stata. A do-file lets you save all your commands in a small text
file that you can run later to reproduce all your results. The do-file itself has to be written manually.
• You can open a do-file window from Window > Do-file Editor > New Do-file Editor, or by clicking
the shortcut button on the toolbar.
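As a rough illustration of what a do-file can look like, here is a minimal sketch. It assumes the example dataset auto that ships with Stata, and the file names lecture1.do and lecture1.log are hypothetical.

```stata
* lecture1.do: a minimal do-file sketch (file names are hypothetical)
clear all
capture log close                      // close any log left open by an earlier run
log using lecture1.log, replace text   // record commands and output in a text log

sysuse auto, clear                     // example dataset shipped with Stata
describe                               // list variables and their storage types
summarize price mpg

log close
```

Saving these lines as lecture1.do and typing "do lecture1.do" in the Command window reruns every step and rewrites the log.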
Help
Importing and Viewing data in Stata
1. You can copy and paste data into Stata from an Excel file (.xls extension) using the Data Editor.
2. You can open the Data Editor from Data > Data Editor > Data Editor (Edit). You can also type “ed”
in the Command window, or click the shortcut button on the toolbar.
3. Import data: File > Import > Excel spreadsheet (important: treat the first row as variable names).
4. For a Stata data file (.dta extension): File > Open, then select the file.
Data Browser
We usually view data in the Data Browser. It looks the same as the Data Editor, but it does not allow you
to make any changes to the data. You can open the Data Browser from the button on the toolbar or simply
type “br” in the Command window.
Variable types
Stata has two broad categories of variables:
1. String: this type of variable can contain numbers, letters and other symbols.
• String variables are stored in “str” format. They are highlighted in red in the Data Browser and
cannot be used in regressions unless converted to numeric with the destring or encode command.
2. Numeric: this type of variable can contain only numbers.
• The numeric storage types are byte, int, long, float, and double.
Converting between strings and numbers
If the data contains string variables that hold numbers, an easy way to convert them to numeric variables
is the “destring” command:
destring year month date, replace
This converts the string variables named year, month and date to numeric variables, assuming the strings
really contain only numbers.
To convert back from numeric to string:
tostring year month date, replace
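The round trip can be tried on a tiny made-up dataset. This is only a sketch: the two observations and their values are invented for illustration, and "clear" discards whatever data is currently in memory.

```stata
* Build a two-row dataset with numbers stored as strings (values are made up)
clear
input str4 year str2 month
"2019" "01"
"2020" "12"
end

destring year month, replace   // both variables become numeric
describe year month            // storage type is no longer str

tostring year month, replace   // back to string
list                           // note: "01" comes back as "1"
```

The leading zero in "01" is lost on the way through a numeric type, so a round trip does not always restore the original strings.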
Change variable name and label (a label is simply a description of a variable)
The following commands can be used:
rename OldVariableName NewVariableName
label variable VariableName "label that you want to give"
Make sure the variable name does not contain any spaces or symbols and that the label is written between
double quotes.
Dropping a variable
You can use the drop command to drop one or more variables:
drop VariableName1 VariableName2
keep works the other way round: you list the variables that you want to retain, and all other variables
are dropped:
keep VariableName3
Sorting data by a variable
The sort command arranges your dataset in ascending order of a given variable:
sort VariableName
The data can also be sorted by several variables by listing them after sort.
Saving do/data files: use the save icon on the toolbar.
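The renaming, labelling, dropping, sorting and saving steps above can be sketched in one short sequence. It assumes Stata's bundled auto dataset; the new variable name, the label text and the output file name are hypothetical.

```stata
sysuse auto, clear
rename mpg miles_per_gallon
label variable miles_per_gallon "Mileage in miles per gallon"

keep make price miles_per_gallon foreign   // every other variable is dropped
drop foreign                               // drop one more variable
sort price                                 // ascending order of price
list in 1/3                                // the three cheapest cars

save auto_small.dta, replace               // command equivalent of the save icon
```

The save command does from the Command window or a do-file what the save icon does from the toolbar.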
Logical and Relational Operators in Stata
Logical Operators        Relational Operators
~    not                 >          Greater than
|    or                  <          Less than
&    and                 ==         Equal to
                         >=         Greater than or equal to
                         <=         Less than or equal to
                         != or ~=   Not equal to
Count command
It counts the observations that satisfy specified conditions. A simple count returns the total number of
observations in the data set. We can also combine the count command with an “if” condition to narrow
our search.
count if VariableName==1
count if gender=="Male"
count if VariableName==1 & gender=="Male"
• The first command counts the number of observations where “VariableName” is equal to 1.
• The second command does the same for “gender” but puts the value in double quotes, because
gender is a string variable.
• The last command combines both of these conditions.
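A short sketch of these count commands on Stata's bundled auto dataset, which is an assumption here because the handout does not name a dataset. In that dataset, foreign is numeric and make is a string.

```stata
sysuse auto, clear
count                                   // total number of observations
count if foreign == 1                   // numeric condition, no quotes
count if make == "AMC Concord"          // string condition, needs double quotes
count if foreign == 0 & price > 10000   // two conditions combined with &
```

After each call the result is also stored in r(N), which is convenient inside do-files.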
Summarize Command
This command generates summary statistics for a given variable or a set of variables.
• A simple summarize command followed by a variable or a list of variables displays the number
of observations, mean, standard deviation, minimum value, and maximum value.
sum VariableName
sum VariableName1 VariableName2
• We can generate additional summary statistics by using the “detail” option:
sum VariableName1 VariableName2, detail
• We can also combine this command with “if” to specify a condition, e.g.
sum VariableName1 if VariableName2==20
sum VariableName1 if VariableName2>50
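The sketch below runs these forms of summarize on the bundled auto dataset (an assumption, not part of the handout). It also shows one point worth knowing about "greater than" conditions: Stata treats a missing numeric value as larger than any number, so rep78 > 3 also selects cars whose rep78 is missing unless they are excluded explicitly.

```stata
sysuse auto, clear
summarize price mpg                  // obs, mean, std. dev., min, max
summarize price, detail              // adds percentiles, skewness, kurtosis
summarize price if foreign == 1      // restricted to one group

* rep78 has some missing values; compare the number of observations used
summarize price if rep78 > 3                      // includes missing rep78
summarize price if rep78 > 3 & !missing(rep78)    // excludes missing rep78
```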
Tabulate Command
• A simple tabulate command followed by a single variable displays each distinct value of the
variable along with its frequency, percentage and cumulative percentage. We can use the
“missing” option to include the number of missing observations as well.
tab VariableName1
tab VariableName1, missing
• A tabulate command followed by two variables generates a cross-tabulation of those variables.
We can use both string and numeric variables with the tabulate command.
tab VariableName1 VariableName2
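For a concrete feel, here is a sketch on the bundled auto dataset (an assumption), where rep78 has a few missing values so the effect of the "missing" option is visible.

```stata
sysuse auto, clear
tabulate rep78               // frequencies of non-missing values only
tabulate rep78, missing      // adds a row for the missing values
tabulate foreign rep78       // cross-tabulation of two variables
```

The total at the bottom of the first table is smaller than the number of observations in the dataset; the second table accounts for the difference.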
Generate Command
• It is used to create new variables in Stata.
gen VariableNew=1
This command generates a new variable named “VariableNew” that equals 1 in every observation.
Note that we use a single equals sign here rather than a double one: a single “=” assigns a value,
whereas a double “==” is used in an “if” condition to test for equality. Stata commands are
case-sensitive, so the command must be typed in lower case (gen, not Gen).
Generating a dummy variable:

• We can combine the generate command with an “if” condition as follows:
gen Dummy=1 if gender=="Male"
This command generates a new variable named “Dummy” that equals 1 where gender is male. The
remaining observations are left missing (displayed as dots). To fill them in, we use the “replace”
command:
replace Dummy=0 if Dummy!=1
or
replace Dummy=0 if gender=="Female"
replace Dummy=. if gender==""
The first command replaces all remaining values with zero, which here denotes female. Remember,
however, that “gender” itself may be missing for some observations (an empty string for a string
variable, a dot for a numeric one), and the first form sets those observations to zero as well. The last
command resets them to missing. The second form never changes them in the first place.
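Full set of commands, as an example:
gen training = 1 if skilled == "yes"
replace training = 0 if skilled == "no"
Observations where skilled is neither "yes" nor "no" (including missing values) remain missing in
training, so no further replace command is needed for them.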
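The dummy-variable steps can be checked on a small made-up dataset. This is a sketch only: the four observations are invented, and the one-line version at the end is an alternative that does not appear in the handout.

```stata
* Four made-up observations, one with gender missing (empty string)
clear
set obs 4
gen str6 gender = "Male"
replace gender = "Female" in 2
replace gender = "" in 3

* Two-step version, as described above
gen male = 1 if gender == "Male"
replace male = 0 if gender == "Female"

* One-line alternative: a logical expression evaluates to 1 (true) or 0 (false);
* the if condition keeps observations with missing gender as missing
gen male2 = (gender == "Male") if gender != ""

list
```

Both variables end up identical: 1 for males, 0 for females, and missing where gender is unknown.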
Generating a log term of a variable:
gen NewVariableName=log(VariableName)
or
gen NewVariableName=ln(VariableName)
e.g. gen lwage=log(wage)
In Stata both log() and ln() return the natural logarithm.
Generating a square term of a variable:
gen NewVariableName=VariableName^2
e.g. gen expersq=exper^2
Generating an interaction term of two variables:
gen NewVariable=VariableName1*VariableName2
e.g. gen genderxregion=gender*region
Both variables must be numeric for the multiplication to work.
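The three transformations can be sketched on the bundled auto dataset (an assumption); the new variable names are made up for illustration.

```stata
sysuse auto, clear
gen lprice = ln(price)                  // natural log of price
gen weightsq = weight^2                 // squared term
gen foreignxweight = foreign*weight     // interaction of a 0/1 dummy with weight
summarize lprice weightsq foreignxweight
```

Because foreign is a 0/1 dummy, the interaction term equals weight for foreign cars and zero for domestic ones.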
Regression Commands
The basic regression command in Stata is
reg y x1 x2 x3
where y is the dependent variable and x1, x2 and x3 are the independent variables.
The command can also be accessed from the menu:
Statistics > Linear models and related > Linear regression
It can be combined with an “if” condition as well:
reg wage age educ if maritalstatus==1 & gender==0
Post estimation commands
• After the regression output, several post-estimation commands can be used, for example to
generate the estimated/fitted values of the dependent variable or the residual/error term:
predict fitted, xb
predict error, residual
In the above commands, “fitted” and “error” are the names of the new variables that contain the
linear prediction/fitted values and the residuals from the estimated regression model, respectively.
Please note that post-estimation commands only work after a regression model has been estimated.
The command can also be accessed from the menu:
Statistics > Postestimation > Prediction, residuals etc.
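A sketch of a regression followed by the two predict commands, using the bundled auto dataset (an assumption; the handout's wage example data is not available here).

```stata
sysuse auto, clear
regress price mpg weight            // price on mpg and weight
predict fitted, xb                  // linear prediction (fitted values)
predict error, residuals            // residuals = price - fitted
list price fitted error in 1/5

regress price mpg weight if foreign == 0   // same model, domestic cars only
```

predict always refers to the most recently estimated model, so the two predict lines above use the first regression, not the restricted one that follows.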
Correlation between variables
corr VariableName1 VariableName2
Typing “corr” by itself produces a correlation matrix for all variables in the dataset. If you specify the list
of variables, a correlation matrix for just those variables is displayed.
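A brief sketch on the bundled auto dataset (an assumption). It also shows that corr uses only observations with no missing value in any of the listed variables, so the sample shrinks when a variable with missing values is included.

```stata
sysuse auto, clear
corr price mpg weight      // all observations are complete for these variables
corr price rep78           // rep78 has missing values, so fewer observations are used
```

The number of observations used is printed above each correlation matrix as (obs=...).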
Exporting Regression Output
Another useful feature of Stata is that regression output can be exported to Excel or Word format as
tables of the kind generally shown in published papers. This is done with the user-written outreg2
command, which is installed once and then run after a regression:
ssc install outreg2
outreg2 using filename.xls
outreg2 using filename.doc
Output from several regressions can be combined in the same table if the same file name is given each
time, followed by the append option:
outreg2 using filename.xls, append
