
Useful Stata Commands

Intro to Stata
describe Provides descriptive information about the data currently in memory
codebook varname Provides detailed coding and labeling information about a variable
set more off Tells Stata to display all results, even if the results occupy more than one screen
search keyword Searches Stata’s resources for desired commands and information
help command_name Describes command syntax and options
findit keyword Searches Stata’s resources and the web for commands that can be installed into Stata
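A minimal sketch of these commands in a first session. It assumes the auto example dataset that ships with Stata; the variable foreign and the search keyword are only illustrations.

```stata
* Load an example dataset shipped with Stata (assumption for illustration)
sysuse auto, clear

* Show all output without pausing at each screenful
set more off

* Overview of the variables in memory
describe

* Coding and labeling details for one variable
codebook foreign

* Look up syntax for a command, and search by keyword
help summarize
search skewness
```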
Descriptive Statistics
tab varname Produces a one-way frequency distribution
tab1 varname1 varname2 … Produces one-way frequency distributions for several variables
summarize varname, detail Produces detailed information about interval level variables
sktest varname Tests to see if a variable is significantly skewed
histogram varname Creates a histogram of a variable (add the discrete option for nominal or ordinal variables)
scatter varname1 varname2 Creates a scatterplot of two variables
The Submit button Instructs Stata to run a graphing command without closing the current dialog window
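As a rough illustration, the descriptive commands above could be run as follows. The auto dataset and the variables rep78, foreign, price, and mpg are assumptions chosen only because they ship with Stata.

```stata
sysuse auto, clear

* One-way frequency tables
tab rep78
tab1 rep78 foreign

* Detailed summary (percentiles, skewness, kurtosis) of an interval variable
summarize price, detail

* Skewness and kurtosis test
sktest price

* Histogram of an ordinal variable, one bar per distinct value
histogram rep78, discrete

* Scatterplot: first variable on the y axis, second on the x axis
scatter price mpg
```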
Transforming Variables
recode, generate ( ) Creates a new variable by translating or combining codes of an existing variable
xtile, nquantiles ( ) Creates a new variable by collapsing an existing variable into categories containing
approximately equal numbers of cases
generate Creates a new variable from the codes of one or more existing variables
label variable Labels a variable
label define Creates and names a value label that connects a set of numeric codes to a set of text labels
label values Labels the values of a variable using a previously defined label
drop Deletes variables (or, with if or in, observations) from a dataset
replace Replaces the values of an existing variable
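The transformation commands are easier to read with concrete arguments. The sketch below assumes the auto dataset; the new variable names (rep_good, price_q, price_k) and the label name goodlbl are hypothetical.

```stata
sysuse auto, clear

* Collapse the 5-point repair record into two categories in a new variable
recode rep78 (1/3 = 0) (4/5 = 1), generate(rep_good)

* Four groups with approximately equal numbers of cases
xtile price_q = price, nquantiles(4)

* New variable computed from an existing one, then labeled
generate price_k = price / 1000
label variable price_k "Price in thousands of dollars"

* Define a value label, then attach it to a variable
label define goodlbl 0 "Fair or worse" 1 "Good or better"
label values rep_good goodlbl

* Overwrite the values of an existing variable
replace price_k = round(price_k, 0.1)

* Remove a variable that is no longer needed
drop price_q
```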
Making Comparisons
tabulate dep_var indep_var, column Produces a cross tabulation with column percentages
tabulate indep_var, summarize(dep_var) Produces a mean comparison table
graph bar (mean) dep_var, over(indep_var) Produces a bar chart of the mean value of a dependent variable for each value of an independent variable
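A short sketch of the three comparison commands, assuming the auto dataset with rep78 and mpg as dependent variables and foreign as the independent variable (illustrative choices only).

```stata
sysuse auto, clear

* Cross tabulation with column percentages
tabulate rep78 foreign, column

* Mean comparison: mean of mpg for each value of foreign
tabulate foreign, summarize(mpg)

* Bar chart of the same means
graph bar (mean) mpg, over(foreign)
```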
Making Controlled Comparisons
bysort control_var: tabulate dep_var indep_var, col Produces, for each value of the control variable, a cross tabulation of the dependent variable and independent variable with column percentages. You can also add the missing option (abbreviated m) to show the missing values.
tabulate control_var indep_var, summarize(dep_var) Produces a breakdown table showing mean values of the dependent variable for each combination of the control variable and independent variable
graph bar dep_var, over(control_var) over(indep_var) Produces a bar chart showing the relationship between the dependent variable and the independent variable for each value of the control variable
if A command qualifier that selectively applies a Stata command to a subset of cases
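One way to try the controlled comparisons, assuming the nlsw88 example dataset that ships with Stata. Treating married as the control variable, collgrad as the independent variable, and union or wage as the dependent variable is purely illustrative.

```stata
sysuse nlsw88, clear

* Cross tabulation with column percentages, separately for each value of married
bysort married: tabulate union collgrad, column

* Same tables, with missing values shown as their own category
bysort married: tabulate union collgrad, column missing

* Mean wage for each combination of married and collgrad
tabulate married collgrad, summarize(wage)

* Bar chart of mean wage across the same combinations
graph bar (mean) wage, over(married) over(collgrad)

* The if qualifier restricts a command to a subset of cases
tabulate union collgrad if south == 1, column
```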
Making Inferences About Sample Means
ttest varname = testvalue Performs a one sample t-test
ttest varname, by (group_var) Performs a two sample t-test
robvar varname, by (group_var) Tests the assumption that the two groups have equal variances
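A brief sketch of the t-test commands, assuming the auto dataset. The test value 20 is arbitrary, and the final line shows the unequal option of ttest as one possible follow-up if robvar casts doubt on equal variances.

```stata
sysuse auto, clear

* One-sample t-test against a hypothesized mean (documented syntax uses ==)
ttest mpg == 20

* Two-sample t-test comparing domestic and foreign cars
ttest mpg, by(foreign)

* Robust tests of equal variances across the two groups
robvar mpg, by(foreign)

* Two-sample t-test that does not assume equal variances
ttest mpg, by(foreign) unequal
```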
Chi Square and Measures of Association
(tabulate option), chi2 Reports the chi-square test of statistical significance
(tabulate option), taub Reports the value of Kendall’s tau-b
(tabulate option), V Reports the value of Cramér’s V (Somers’ d is not a tabulate option; the user-written somersd command reports it)
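The tabulate options can be combined in one command. This sketch assumes the auto dataset; rep78 and foreign are illustrative variables.

```stata
sysuse auto, clear

* Cross tabulation with chi-square test, Cramér's V, and Kendall's tau-b
tabulate rep78 foreign, column chi2 V taub
```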

1 MGSOG Poverty & Inequality 2012_Stata Commands v1_zn


Correlation and Linear Regression
correlate varlist Reports Pearson’s correlation coefficients
regress dep_var indep_var(s) Performs bivariate regression and multiple regression
graph twoway (scatter dep_var indep_var) Creates a scatterplot of the relationship between two variables
graph twoway (scatter dep_var indep_var) (lfit dep_var indep_var) Superimposes a linear regression prediction line on a scatterplot
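A minimal worked example of the correlation and regression commands, assuming the auto dataset with price as the dependent variable. Note that Stata command names are lowercase.

```stata
sysuse auto, clear

* Pearson correlation matrix
correlate price mpg weight

* Bivariate, then multiple regression
regress price mpg
regress price mpg weight

* Scatterplot, then scatterplot with a fitted regression line on top
graph twoway (scatter price mpg)
graph twoway (scatter price mpg) (lfit price mpg)
```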
Dummy Variables and Interaction Effects
xi: regress dep_var i.indep_var Automatically creates dummy independent variables and performs dummy variable regression
char varname[omit] # Overrides Stata’s default for defining the omitted category of a dummy variable
test varname1=varname2 Tests the null hypothesis that two regression coefficients are equal to each other
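A sketch of dummy variable regression with xi, assuming the auto dataset and rep78 as the categorical predictor. The dummy names _Irep78_4 and _Irep78_5 follow xi's naming pattern (prefix _I, variable name, category value).

```stata
sysuse auto, clear

* xi creates dummies for rep78, omitting the lowest category by default
xi: regress price i.rep78 mpg

* Make category 3 the omitted (reference) category instead
char rep78[omit] 3
xi: regress price i.rep78 mpg

* Test whether the coefficients for categories 4 and 5 are equal
test _Irep78_4 = _Irep78_5
```

In Stata 11 and later, factor-variable notation gives the same model without the xi prefix, for example regress price ib3.rep78 mpg.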
Logistic Regression
logit dep_var indep_var(s) Reports logistic regression coefficients and maximum likelihood
iteration history for logistic regression models
logistic dep_var indep_var(s), [coef] Reports odds ratios for logistic regression models; with the coef option, reports logistic regression coefficients instead
predict newvar Creates a new variable containing predicted probabilities of the
dependent variable for each value of the independent variable (s)
quietly Command prefix that asks Stata to perform a command but not to display the output in the Results window
tabstat dep_var1 dep_var2, by(indep_var) Displays means of one or more dependent variables for each value of an independent variable
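The logistic regression commands could be tried like this. The sketch assumes the auto dataset with the binary variable foreign as the dependent variable; p_foreign is a hypothetical name for the new variable.

```stata
sysuse auto, clear

* Coefficients on the log-odds scale, with the iteration history
logit foreign mpg weight

* Same model reported as odds ratios, then as coefficients
logistic foreign mpg weight
logistic foreign mpg weight, coef

* Predicted probability that foreign == 1 for each observation
predict p_foreign

* Run a model without printing its output
quietly logit foreign mpg

* Means of several variables for each value of foreign
tabstat mpg weight, by(foreign)
```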
Poverty & Inequality Commands
clorenz clorenz can produce the following distributional curves for a given list of variables: Lorenz curves, generalised Lorenz curves, concentration curves, generalised concentration curves, and deficit share curves.
glcurve glcurve draws the generalized Lorenz curve of a variable x and/or generates two new variables containing the
generalized Lorenz ordinates for x; i.e., GL(p) at each p = F(x).
inequal7 inequal7 computes a series of inequality measures of the variables in varlist.
ineqdeco ineqdeco estimates a range of inequality and related indices commonly used by economists,
plus decompositions of a subset of these indices by population subgroup.
sumdist sumdist estimates distributional summary statistics commonly used by income distribution
analysts, complementing those available via pctile, xtile, and summarize, detail.
ineqerr ineqerr computes three indices of inequality - Gini coefficient, Theil entropy measure and
Variance of Logs - and bootstrap estimates of their sampling variances.
povdeco povdeco estimates three poverty indices from the Foster, Greer and Thorbecke (1984) class,
FGT(a), plus related statistics (such as mean income amongst the poor). You must supply the
poverty line value(s), either as a single number # in pline(#), or as the name of the variable
containing the values, zvar, in varpl(zvar). Example: povdeco varname, pl(xxx)
poverty poverty can calculate several poverty measures. Poverty line: 1/2 of the median value.
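These are user-written commands, so they must be installed before first use. The sketch below assumes internet access, assumes the packages are obtained from the SSC archive, and uses wage from the nlsw88 example dataset purely as a stand-in for an income variable. The poverty line of half the median mirrors the "1/2 of median value" note above; it is a choice made for the example, not a requirement of povdeco.

```stata
* One-time installation from SSC (requires internet access)
ssc install ineqdeco
ssc install povdeco
ssc install glcurve

sysuse nlsw88, clear

* Inequality indices for the stand-in income variable
ineqdeco wage

* Poverty line set to half the median, stored in a local macro
quietly summarize wage, detail
local z = r(p50) / 2

* FGT poverty indices at that poverty line
povdeco wage, pline(`z')

* Generalized Lorenz curve
glcurve wage
```

Because the poverty line is held in a local macro, the summarize and povdeco lines need to be run together, for example from the same do-file.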

Stata has four types of weights: fweight, pweight, aweight, and iweight. Of these, the most important are:
• Frequency weights (fweight), which indicate how many observations in the population are represented by each observation in the sample, must take integer values.
• Analytic weights (aweight) are especially useful when working with data that contain averages (e.g. average income per capita in a household). The weighting variable is proportional to the number of persons over which the average was computed (e.g. number of members of a household). Technically, analytic weights are in inverse proportion to the variance of an observation (i.e. a higher weight means that the observation was based on more information and so is more reliable in the sense of having less variance).
Further information on weights may be obtained by typing help weight.
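A small worked example may make the difference between the two weight types concrete. The three-row dataset and its variable names (avg_income, hh_size, n_copies) are invented for illustration.

```stata
clear
input avg_income hh_size n_copies
100 2 3
200 4 5
400 1 2
end

* Frequency weights: each row counts as n_copies identical observations,
* so the mean is (100*3 + 200*5 + 400*2) / 10 = 210 over 10 observations
summarize avg_income [fweight = n_copies]

* Analytic weights: each row is an average over hh_size persons,
* so the mean is (100*2 + 200*4 + 400*1) / 7 = 200 over 3 observations
summarize avg_income [aweight = hh_size]
```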
