Shazam To Stata

For Previous Users of SHAZAM: This document is designed to show the Stata equivalents to a variety of Shazam commands.

In each case, the Shazam command is followed by an explanation of the corresponding command(s) to use in Stata. Stata has a comprehensive set of reference manuals that discuss these commands, their options, and much more. Commands are printed here in capitals for emphasis; Stata itself is case-sensitive and expects its commands in lowercase.

PAR

Both Shazam and Stata keep all data and calculations in the computer's main memory, and the size of the memory area used by the program can be set manually. The PAR command sets memory in units of 1K (1024 bytes), so PAR 1000 would set memory at roughly 1 MB. The equivalent Stata command is:

    SET MEMORY 1m

(Stata 12 and later manage memory automatically, so this command is needed only in older releases.)

FILE (Unit) Filename

This command assigns a number (Unit) to a file (Filename) from which Shazam will read or to which it will write. In Stata, the file from which data are to be read is specified by:

    USE Filename.dta, clear

The clear option discards any dataset already in memory, so the data file can be loaded again when the command file is rerun interactively. The output can be written to a file by using the command:

    LOG USING Filename.out, text replace

The text and replace options tell Stata the format to use for the output file and allow the file to be replaced in subsequent runs of the same command file. To reduce the amount of information in the log file, logging can be turned off and back on with the LOG off and LOG on commands, respectively.

SAMPLE 1 n

This command tells Shazam which contiguous (sub)set of observations to use in its calculations. In Stata, this is handled on a case-by-case basis as a qualifier on the specific command. See OLS below for an example.

READ (Filename) A B C

Shazam uses this command to assign variable names to the numbers in the dataset. In Stata, the variable names are established when the dataset is created, and the information is stored in the data file. This command is therefore unnecessary in Stata, and the variable names can be seen in the Variables window.

STAT A B C/pcor

This command calculates descriptive statistics for the variables indicated. The pcor option calculates a correlation matrix of the variables.
Stata uses the command

    SUMMARIZE A B C

to calculate the descriptive statistics and a separate command,

    CORRELATE A B C

to calculate the correlation matrix.

GENR A = P + Q

This is used in Shazam to generate new variables from existing variables. The Stata command is similar:

    GENERATE A = P + Q

or, abbreviated,

    GEN A = P + Q

OLS A B C / RSTAT AUXRSQ

Ordinary least squares regressions are calculated in Shazam with the OLS command followed by a slash (/) and options. The first variable listed (A) is the dependent variable, and the remaining variables are the independent variables in the regression. The basic form of the OLS command in Stata is

    REGRESS A B C

which calculates the regression coefficients, standard errors, p-values, and 95% confidence intervals for the coefficients, as well as the $R^2$ and F-statistic for the model as a whole.

Regressions can be run on sub-samples of the data by adding an if clause or an in clause to the command. An if clause does the job of Shazam's SKIPIF command, with the logic reversed: if keeps the observations that satisfy the condition, whereas SKIPIF drops them. For example,

    REGRESS A B C if C==2

would run the regression using only the observations in which C equals 2. Note that a double equal sign is used to test for equality. One could also have

    REGRESS A B C if C<=2
    REGRESS A B C if C==D

The corresponding Shazam commands would be SKIPIF(C.GT.2) and SKIPIF(C.NE.D), respectively.

To restrict the regression to a contiguous subset of the data, use an in clause to limit the command to a particular range of observations. This is equivalent to Shazam's SAMPLE command. For example,

    REGRESS A B C in 5/50

is equivalent to running the regression on observations 5 through 50 (SAMPLE 5 50).

Two common options in Shazam are RSTAT, which reports residual statistics as a check for autocorrelation, and AUXRSQ, which calculates the $R^2$ values for the auxiliary regressions as a check for multicollinearity. The Stata equivalent of the AUXRSQ option is to type the command

    VIF

immediately following the REGRESS command. This calculates the variance inflation factors from the auxiliary regressions. The factors relate to Shazam's $R^2$ results in that $\mathrm{VIF} = 1/(1 - R^2)$. Thus, an $R^2$ of 0.80 corresponds to a VIF of 5.
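
The sub-sample qualifiers and the VIF check described for OLS can be tried out with a short do-file. What follows is only a sketch under stated assumptions: it uses the auto dataset that ships with Stata in place of the placeholder variables A, B, and C, and it spells the VIF command as estat vif, the form used by Stata 9 and later (older releases accept plain vif).

```stata
* Illustrative only: the dataset and variable names are assumptions.
sysuse auto, clear

* Full-sample regression; price is the dependent variable.
regress price mpg weight

* Variance inflation factors for the regression just estimated.
estat vif

* Keep only observations meeting a condition; == tests for equality.
regress price mpg weight if foreign == 0

* Keep a contiguous block of observations (Shazam: SAMPLE 5 50).
regress price mpg weight in 5/50

* Arithmetic behind the VIF formula: 1/(1 - R2) evaluated at R2 = 0.80.
display 1/(1 - 0.80)
```

Commands are typed in lowercase here because Stata is case-sensitive.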

With time series data, we may want to test for autocorrelation. Shazam uses the RSTAT option to calculate residual statistics, including the Durbin-Watson statistic. In Stata, one must first designate the data as time series with the TSSET command, issued before the REGRESS command and naming the variable that holds the time index:

    TSSET X

(where X is the variable representing the time index). Follow the REGRESS command with the command

    DWSTAT

to calculate the Durbin-Watson statistic for the regression.

DIAGNOS HET

This is used in Shazam to calculate several chi-square statistics designed to diagnose heteroskedasticity. In Stata, these tests are done step by step, by entering the command for each specific test one wishes to use. The commands are entered immediately after the REGRESS command. Some common chi-square tests are

    HETTEST     (the Cook-Weisberg test)
    WHITETST    (White's test)

It is also quite easy in Stata to examine the residuals to see whether they are approximately normally distributed. Enter the following two commands directly after the REGRESS command:

    PREDICT RSTD, rstudent
    KDENSITY RSTD, normal

The first command generates a variable, RSTD, which holds the studentized residuals from the regression. The second command plots a kernel density estimate of these residuals with a normal density overlaid, and displays a graph that can be saved or printed.

OLS A B C/HETCOV

The HETCOV option in Shazam uses White's heteroskedasticity-consistent covariance matrix to adjust for heteroskedasticity. The equivalent option in Stata is

    REGRESS A B C, robust

AUTO A B C

This Shazam command executes a Cochrane-Orcutt regression to take autoregressive errors into account. The Stata commands (assuming a time index variable of X) are

    TSSET X
    PRAIS A B C, corc

The Durbin-Watson statistic is automatically reported. The robust option can also be used with the PRAIS command. If the corc option is omitted, the regression uses the Prais-Winsten method of correcting the model for autocorrelation.

2SLS A B C (C D E)

This is a two-stage least-squares model in Shazam in which A and B are endogenous and C, D, and E are exogenous variables. It would be estimated in Stata by using the command

    IVREG A (B = D E) C

which specifies D and E as instruments for B in this instrumental variables regression.

PROBIT A B C
LOGIT A B C
TOBIT A B C or TOBIT A B C/upper

The Stata commands for estimating models with binary dependent variables (PROBIT and LOGIT) and censored dependent variables (TOBIT) are, respectively,

    PROBIT A B C
    LOGIT A B C
    TOBIT A B C, ll(#)

where ll (double el) specifies a lower limit of #. Upper-level censoring is specified with ul(#).

SKIPIF

Shazam uses this command to skip observations that meet certain criteria. In Stata, the observations to be used are specified as a qualifier on a command, not as a separate command. See the example in the discussion of OLS.

STOP

This must be the last command in a Shazam file. There is no requirement that a Stata do-file end with any particular command. However, good practice suggests ending a Stata file with

    CLEAR

in order to purge the data space in preparation for additional work.
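
The commands discussed in this document can be strung together into a single do-file. The following is a minimal sketch rather than a definitive template. The bundled datasets (sp500 and auto), the log file name, the index variable t, and the choice of regressors are all assumptions made for illustration. It also uses the spellings found in Stata 9 and later, where several of the older commands became estat subcommands (estat dwatson for DWSTAT, estat hettest for HETTEST) and, from Stata 10, ivregress 2sls took the place of IVREG. White's test is run here with the built-in estat imtest, white.

```stata
* Sketch of a complete do-file; dataset, file, and variable names are assumptions.
capture log close
sysuse sp500, clear
log using shazam_to_stata.log, text replace

* Descriptive statistics and correlations (Shazam: STAT ... / PCOR).
summarize close volume
correlate close volume

* Declare a gap-free time index before any time-series command.
generate t = _n
tsset t

* OLS followed by post-estimation checks.
regress close volume
estat dwatson            // Durbin-Watson statistic
estat hettest            // Breusch-Pagan / Cook-Weisberg test
estat imtest, white      // White's test

* Studentized residuals compared with a normal density.
predict rstd, rstudent
kdensity rstd, normal

* Heteroskedasticity-robust standard errors (Shazam: OLS ... / HETCOV).
regress close volume, robust

* Cochrane-Orcutt estimation (Shazam: AUTO); omit corc for Prais-Winsten.
prais close volume, corc

* Cross-section examples on a second bundled dataset.
sysuse auto, clear
ivregress 2sls price weight (mpg = displacement gear_ratio)
probit foreign mpg weight
logit foreign mpg weight
tobit mpg weight, ll(17)

* Close the log and purge the data space.
log close
clear
```

The // comments are valid inside a do-file but not when commands are typed one at a time in the Command window.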
