
Stata Basics

Lecture 1

Log File

Do File

Do-files are a very important feature of Stata. A do-file lets you save all your commands in a small text file
(which you write yourself) that you can rerun later to reproduce all your results.
• You can open a Do-file Editor window from Window > Do-file Editor > New Do-file Editor, or
from the toolbar by clicking its shortcut button.
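
As a minimal sketch of what a do-file might contain, the lines below could be saved in a text file and run in one go. The file name example.do and the use of the auto dataset (which ships with Stata) are assumptions made for illustration only.

```stata
* example.do : a hypothetical do-file; run it with "do example.do"
clear all
sysuse auto, clear        // load an example dataset that ships with Stata
describe                  // list the variables in memory
summarize price mpg       // summary statistics for two variables
```

Rerunning the file repeats every step in the same order, which is what makes the results reproducible.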

Help

Importing and Viewing data in Stata

1. You can copy and paste data into Stata from an Excel file (.xls or .xlsx extension) using the Data Editor.
2. You can open the Data Editor from Data > Data Editor > Data Editor (Edit). You can also type “ed”
(short for edit) in the Command window, or click the shortcut button on the toolbar.
3. Import data: File > Import > Excel spreadsheet (important: select the option that treats the first row as variable names).
4. For a Stata data file (.dta extension): File > Open, then select the file.
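
The menu steps above also have command equivalents. The sketch below first writes two small files so that it can run on its own; the file names auto_demo.xlsx and auto_demo.dta are hypothetical, and the auto dataset is used only because it ships with Stata.

```stata
* Assumption: auto_demo.xlsx and auto_demo.dta are hypothetical file names,
* created here in the current working directory so the example is self-contained
sysuse auto, clear
export excel using "auto_demo.xlsx", firstrow(variables) replace
save "auto_demo.dta", replace

* Import the Excel file, reading the first row as variable names
import excel using "auto_demo.xlsx", firstrow clear

* Open a Stata data file
use "auto_demo.dta", clear
```

The clear option drops whatever data are currently in memory before loading the new file.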

Data Browser

We usually view data in the Data Browser. It looks the same as the Data Editor, but it does not allow you to make any
changes to the data. You can open the Data Browser from the button on the toolbar or simply type “br” in the Command window.

Variable types

Stata has two broad categories of variables:

1. String: this type of variable can contain numbers, letters, and other symbols.
• String variables are stored in “str” format. They are highlighted in red in the Data Editor and cannot be
used in regressions unless converted to numeric with the destring or encode command.
2. Numeric: this type of variable can contain only numbers.
• The numeric storage types are byte, int, long, float, and double.

Converting between strings and numbers

If the data contain string variables that hold numbers, an easy way to convert them to numeric is to
use the “destring” command:

destring year month date, replace

This converts the string variables named year, month and date to numeric variables, assuming the strings
really contain only numbers. If a variable contains non-numeric characters, destring leaves it unchanged and says so.

To convert back from numeric to string:

tostring year month date, replace
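
A small worked example of the round trip, using a made-up three-row dataset typed in with input (the values are illustrative only):

```stata
clear
input str4 year str2 month
"2019" "1"
"2020" "12"
"2021" "7"
end

destring year month, replace   // strings -> numeric
describe year month            // storage types are now numeric
summarize year                 // works because year is numeric

tostring year month, replace   // numeric -> strings again
describe year month
```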

Change variable name and label (label is simply a description of a variable)

The following commands can be used:

rename OldVariableName NewVariableName

label variable VariableName "label that you want to give"

Make sure the variable name does not contain spaces or symbols (underscores are allowed) and that the label is
written between straight double quotes.
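
For instance, on the auto dataset that ships with Stata (the new name and the label text are chosen here purely for illustration):

```stata
sysuse auto, clear
rename mpg miles_per_gallon
label variable miles_per_gallon "Fuel economy in miles per gallon"
describe miles_per_gallon      // shows the new name and its label
```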

Dropping a variable

You can use the drop command to remove any variable:

drop VariableName1 VariableName2

keep is the counterpart of drop: you list the variables that you want to retain, and all
other variables are dropped:

keep VariableName3
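
A short sketch of the difference between the two, again assuming the auto dataset as example data:

```stata
sysuse auto, clear
drop headroom trunk        // remove two variables, keep the rest
keep make price mpg        // retain only these three, drop everything else
describe                   // three variables remain
```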

Sorting data by a variable

The sort command arranges your dataset in ascending order of a given variable:

sort VariableName

(You can also sort by several variables at once, e.g. sort VariableName1 VariableName2.)
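
As an illustration on the auto dataset (an assumption, used because it ships with Stata), sorting by two variables orders the data by the first and breaks ties with the second:

```stata
sysuse auto, clear
sort foreign price              // ascending by foreign, then by price within foreign
list make foreign price in 1/5  // first five rows after sorting
```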

Saving do/data files: Using the save icon.

Logical and Relational Operators in Stata

Logical Operators      Relational Operators

~    not               >           Greater than
|    or                <           Less than
&    and               ==          Equal to
                       >=          Greater than or equal to
                       <=          Less than or equal to
                       != or ~=    Not equal to

Count command

It counts the observations that satisfy specified conditions. A simple count returns the total number of
observations in the dataset. We can also combine the count command with the if qualifier to narrow our search.

count if VariableName==1

count if gender=="Male"

count if VariableName==1 & gender=="Male"

• The first command simply counts the number of observations where “VariableName” is
equal to 1.
• The second command does the same for “gender” but puts the value in
double quotes: this is because gender is a string variable.
• The last command combines both of these conditions.
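
The following sketch combines count with several of the logical and relational operators. It assumes the auto dataset shipped with Stata, in which foreign and price are numeric, make is a string, and rep78 has some missing values.

```stata
sysuse auto, clear
count                                   // all observations
count if foreign == 1                   // numeric condition
count if make == "Volvo 260"            // string condition, in double quotes
count if foreign == 1 & price > 10000   // both conditions must hold
count if foreign == 1 | price > 10000   // either condition may hold
count if rep78 != .                     // observations where rep78 is not missing
```

After each count, the result is also stored in r(N), which is convenient inside do-files.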

Summarize Command

This command is used to generate summary statistics for a given variable or a set of variables.

• A simple summarize command followed by a variable or a list of variables will display the number
of observations, mean, standard deviation, minimum value, and maximum value.

sum VariableName
sum VariableName1 VariableName2

• We can also generate additional summary statistics (such as percentiles, skewness, and kurtosis) by
using the “detail” option as follows:

sum VariableName1 VariableName2, detail

• We can also combine this command with if to restrict it to observations that satisfy a condition,

e.g. sum VariableName1 if VariableName2==20

sum VariableName1 if VariableName2>50
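
A runnable sketch of these variants, assuming the auto dataset as example data. One point worth illustrating is that Stata treats a missing numeric value as larger than any number, so a condition using > also picks up missing values unless they are excluded.

```stata
sysuse auto, clear
summarize price mpg
summarize price, detail
summarize price if foreign == 1

* Missing numeric values count as larger than any number,
* so exclude them explicitly when using > or >=
summarize price if rep78 > 3 & !missing(rep78)

return list    // stored results such as r(mean) and r(N)
```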

Tabulate Command

• A simple tabulate command followed by a single variable will display each distinct value
along with its frequency, percentage, and cumulative percentage. We can use the “missing” option to
include missing values in the table and so find out the number of missing observations as well.

tab VariableName1
tab VariableName1, missing

• The tabulate command followed by two variables will generate a cross-tab of those variables. We can
use both string and numeric variables with the tabulate command.

tab VariableName1 VariableName2
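
For example, on the auto dataset (assumed here because it ships with Stata and its rep78 variable has a few missing values):

```stata
sysuse auto, clear
tabulate rep78              // one-way table, missing values left out
tabulate rep78, missing     // same table with a row for missing values
tabulate foreign rep78      // two-way cross-tab
```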

Generate Command

• It is used to create new variables in Stata.

gen VariableNew=1

This command will generate a new variable named “VariableNew” in which every observation
equals 1. Stata commands are case sensitive, so gen must be typed in lowercase. Note that we have used a
single “equals” sign rather than a double one: a single sign assigns a value, whereas the double “equals”
sign (==) is used in an if condition to test for equality.

Generating a dummy variable:

• We can combine the generate command with the “if” condition as follows:

gen Dummy=1 if gender=="Male"

This command will generate a new variable named “Dummy” that has a value of 1
where gender is male. It leaves the rest of the observations blank (missing,
displayed as dots). To fill those blanks, we use the “replace” command:

replace Dummy=0 if Dummy!=1
or
replace Dummy=0 if gender=="Female"

replace Dummy=. if gender==""

The first command replaces the remaining values with zero, which denotes female in the variable.
Note that the first version (Dummy!=1) also sets observations with a missing gender to zero, while the
second version (gender=="Female") leaves them missing. Remember that the variable “gender” may have
missing values; because gender is a string variable, a missing value is an empty string ("") rather than
a dot. The last command sets Dummy back to missing for those observations.
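
The steps above can also be collapsed into a single line. This is a sketch of one common alternative, not the only way to do it, and the four-row dataset is made up for illustration.

```stata
clear
input str6 gender
"Male"
"Female"
""
"Male"
end

* (gender == "Male") evaluates to 1 when true and 0 when false;
* the if qualifier leaves Dummy missing where gender is empty
gen Dummy = (gender == "Male") if gender != ""
list
```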

Full set of command example:

gen training = 1 if skilled == "yes"
replace training = 0 if skilled == "no"
replace training = . if skilled == ""

Generating a log term of a variable:

gen NewVariableName=log(VariableName)
or
gen NewVariableName=ln(VariableName)

(In Stata, log() and ln() both return the natural logarithm.)

e.g. gen lwage=log(wage)

Generating a square term of a variable:

gen NewVariableName=VariableName^2

e.g. gen expersq=exper^2

Generating an interaction term of two variables:

gen NewVariable=VariableName1*VariableName2

e.g. gen genderxregion=gender*region

(Both variables must be numeric, for example 0/1 dummies, for the multiplication to work.)
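
The three transformations can be tried together on the auto dataset (an assumption; the new variable names are illustrative):

```stata
sysuse auto, clear
gen lprice = ln(price)                // log term
gen weightsq = weight^2               // square term
gen foreignxweight = foreign*weight   // interaction of a 0/1 dummy with weight
summarize lprice weightsq foreignxweight
```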

Regression Commands

The basic regression command in Stata is

reg y x1 x2 x3

where y is the dependent variable and x1, x2 and x3 are the independent variables.

The command can also be accessed from the menu:

Statistics > Linear models and related > Linear regression

This can be combined with the if qualifier as well:

reg wage age educ if maritalstatus==1 & gender==0
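
A runnable version of the same idea, assuming the auto dataset in place of the wage data referred to above:

```stata
sysuse auto, clear
regress price mpg weight                   // full sample
regress price mpg weight if foreign == 0   // domestic cars only
display _b[mpg]                            // coefficient on mpg from the last regression
display e(N)                               // number of observations used
```

The coefficients and sample size of the most recent regression stay available in _b[] and e() until another estimation command is run.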

Post estimation commands

• After the regression output, several post-estimation commands can be used, for example to
generate the estimated/fitted values of the dependent variable or the residual/error term:

predict fitted, xb
predict error, residuals

In the above commands, “fitted” and “error” are the names of the new variables containing the
linear prediction/fitted values and the residuals from the estimated regression model, respectively.

Please note that post-estimation commands only work after a regression model has been estimated.

These commands can also be accessed from the menu:

Statistics > Postestimation > Prediction, residuals, etc.
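
As a sketch on the auto dataset (an assumption), the residual produced by predict should match the dependent variable minus the fitted value, up to rounding:

```stata
sysuse auto, clear
regress price mpg weight
predict fitted, xb             // fitted values
predict error, residuals       // residuals
gen check = price - fitted     // residuals computed by hand
summarize error check          // the two should agree up to rounding
```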

Correlation between variables

corr VariableName1 VariableName2

Typing “corr” by itself produces a correlation matrix for all variables in the dataset. If you specify the list
of variables, a correlation matrix for just those variables is displayed.
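
For example, using the auto dataset as stand-in data:

```stata
sysuse auto, clear
correlate price mpg weight     // correlation matrix for three variables
correlate price mpg
display r(rho)                 // correlation between the first two variables listed
```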

Exporting Regression Output

Another useful feature of Stata is that it allows you to export regression output to Excel or Word
format as tables of the kind generally shown in published papers. This is done with the user-written
outreg2 command, which is installed once from SSC and then run after a regression:

ssc install outreg2

outreg2 using filename.xls

outreg2 using filename.doc

Stata can also combine the output of multiple regressions in the same table if the same file name is given for the
regression output to be combined, followed by append:

outreg2 using filename.xls, append
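
Put together, a two-column table might be built as sketched below. The file name results_demo.xls and the auto dataset are assumptions for illustration, and the install step needs an internet connection.

```stata
* Assumption: results_demo.xls is a hypothetical output file name
ssc install outreg2, replace              // user-written command; install once
sysuse auto, clear
regress price mpg
outreg2 using results_demo.xls, replace   // start a new table with model 1
regress price mpg weight
outreg2 using results_demo.xls, append    // add model 2 as a second column
```

Using replace on the first call overwrites any earlier file of the same name, so rerunning the do-file does not keep adding columns to an old table.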
