All by Itself Means
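Internally, Stata equates true and false with one and zero. That means a variable containing
only ones and zeros can serve as an if condition all by itself, and you can put ! (logical not) in
front of it to reverse the condition.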
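A minimal sketch of both forms, assuming the auto example data set that ships with Stata, where the indicator variable foreign is 0 for domestic cars and 1 for foreign cars:

```stata
* Load the example data (an assumption; any data set with a 0/1 variable works)
sysuse auto, clear

* foreign by itself is true wherever foreign is 1
browse if foreign

* ! (logical not) reverses it, so this shows the cars where foreign is 0
browse if !foreign
```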
This makes for simple and readable code. Just be careful: anything other than zero will also be
interpreted as true, including missing.
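To see the pitfall in action, here is a small sketch, again assuming the auto data set, where rep78 takes the values 1 through 5 and is missing for a few cars:

```stata
sysuse auto, clear

* rep78 is never zero, so every observed value counts as true,
* and so does every missing value
count if rep78

* Compare with the number of cars whose rep78 is actually observed
count if rep78 < .
```

The first count includes the cars with a missing rep78, so it is larger than the second.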
Combining Conditions
You can combine conditions with & (logical and) or | (logical or). The character used for logical
or is called the "pipe" character and you type it by pressing Shift-Backslash, the key right above
Enter. Try:
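A sketch of such a command, assuming the auto data set with its mpg and price variables:

```stata
sysuse auto, clear

* Both conditions must be true for a car to be shown
browse if mpg>25 & price<5000
```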
This shows you cars that get more than 25 miles per gallon and cost less than $5,000 (in 1978
dollars). In set theory terms it is the intersection of the two sets. Now try:
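The same sketch with the pipe in place of the ampersand, under the same assumptions (auto data set, mpg and price variables):

```stata
sysuse auto, clear

* Only one of the two conditions needs to be true
browse if mpg>25 | price<5000
```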
This shows you cars that get more than 25 miles per gallon or cost less than $5,000. A car must
meet only one of the two conditions to be shown. In set theory terms it is the union of the two
sets.
All the conditions to be combined must be complete. If you wanted to list the cars that have a 1
or a 2 for rep78 you should not use:
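Assuming the auto data set, the tempting but incomplete version would be something like:

```stata
sysuse auto, clear

* Not recommended: the part after the pipe is not a complete condition
browse if rep78==1 | 2
```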
(Why this does what it does is left as an exercise for the reader, but it's not what you want.)
Instead you should use:
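A sketch of the complete version, assuming the auto data set:

```stata
sysuse auto, clear

* Each side of the pipe is a complete condition on rep78
browse if rep78==1 | rep78==2

* An equivalent shorthand using Stata's inlist() function
browse if inlist(rep78, 1, 2)
```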
Missing Values
If you have missing values in your data, you need to keep them in mind when writing if
conditions. Internally, missing values are stored using the 27 largest possible numbers, starting
with the generic missing value (.) and the extended missing values (.a, .b, etc.) after that in
alphabetical order, so the following inequalities hold:
any observed value < . < .a < .b < .c ... < .x < .y < .z
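These orderings can be checked directly with display, which prints 1 for a true expression and 0 for a false one. A quick sketch:

```stata
* Each of these prints 1 because the comparison is true
display 1000000 < .
display . < .a
display .a < .z

* This prints 0: the generic missing value is not greater than an extended one
display . > .a
```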
If you want a list of cars that are known to have good repair records, you won't get it with:
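A sketch of the naive condition, assuming the auto data set:

```stata
sysuse auto, clear

* Also shows cars whose rep78 is missing, because missing is greater than 3
browse if rep78>3
```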
An easy shortcut is to think of missing values as (positive) infinity. Since infinity is greater
than 3, cars with a missing value for rep78 are included in the list. So add a second condition to
exclude them:
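With the second condition added, the command might look like this sketch (auto data set assumed):

```stata
sysuse auto, clear

* rep78<. is true only for observed (nonmissing) values
browse if rep78>3 & rep78<.
```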
Why <. rather than !=. ? In this data set it makes no difference. But if the data set included
extended missing values, the condition !=. would not exclude them. The condition <. excludes
them because extended missing values are greater than the generic missing value. Thus using <.
ensures you're excluding all missing values.
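The difference only shows up once extended missing values are present. The sketch below creates some artificially; the recode of rep78 to .a is purely hypothetical and is there only to make the contrast visible:

```stata
sysuse auto, clear

* Hypothetical recode for illustration: store the missing values as .a
replace rep78 = .a if rep78 == .

* !=. only rules out the generic missing value, so the .a cars are counted
count if rep78>3 & rep78!=.

* <. rules out every missing value, extended or not
count if rep78>3 & rep78<.

* The missing() function is another way to write the same exclusion
count if rep78>3 & !missing(rep78)
```

The first count is larger than the other two, which agree with each other.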
Exercise: Browse domestic cars that get more than 25 miles per gallon and are known to
have good repair records (rep78 greater than 3). Then browse foreign cars that cost less
than $5,000 and are not known to have poor repair records (poor meaning rep78 less than or equal to 3).
Include the variables used in the conditions so you can spot-check your results. Explain
why you handled missing values the way you did in both cases.
Options
Options change how a command works. They go after any variable list or if condition, following
a comma. The comma means "everything after this is options" so you only type one comma no
matter how many options you're using.
Consider:
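A sketch of a command that fits the discussion below, assuming the auto data set and its make and foreign variables:

```stata
sysuse auto, clear

* foreign is displayed using its value labels, Domestic and Foreign
browse make foreign
```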
We know that value labels have been applied to the foreign variable, so the words "Domestic"
and "Foreign" are not the actual values. We can see the values instead of the labels by adding the
nolabel option:
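Under the same assumptions (auto data set, make and foreign variables), adding the option could look like:

```stata
sysuse auto, clear

* The option follows the comma; foreign now appears as 0 and 1
browse make foreign, nolabel
```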
Options must always be one word. Here the words "no" and "label" are combined because
otherwise Stata would think they were two different options.
Many options require additional information, such as a number or a variable they apply to. This
additional information goes in parentheses directly after the option name. To illustrate that, we
need to use a command other than browse, because nolabel is the only option it has.
The list command is very similar to browse, but it just lists the data in the Results window. If you
have a log open, the list output will be stored in the log, which is sometimes useful. Try:
list make
The string() option tells the list command to truncate string variables after a given number of
characters, with the number going in the parentheses:
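For example, assuming the auto data set, and with 5 chosen arbitrarily as the cutoff:

```stata
sysuse auto, clear

* Show only the first 5 characters of make (5 is an arbitrary choice)
list make, string(5)
```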
You might use the string() option to save space, or if the first part of the string contains all the
information you really need. But it's mostly here as an example of the "additional information
goes in parentheses" syntax you'll use regularly.
Stata reuses option names wherever it makes sense. Thus many commands take a nolabel option
that prompts them to ignore value labels. Other common options include gen() to create a new
variable (with the name of the new variable going in parentheses), by() to act on groups, and
vce() to tell regression commands how to estimate the variance-covariance matrix.
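As a rough illustration of these three options, here is a sketch using the auto data set. The particular commands and the new variable names (the stub rep and meanprice) are assumptions chosen for the example:

```stata
sysuse auto, clear

* gen(): tabulate creates one indicator variable per value of rep78,
* named with the stub rep (rep1, rep2, and so on)
tabulate rep78, gen(rep)

* by(): egen computes the mean price separately for each value of foreign
egen meanprice = mean(price), by(foreign)

* vce(): ask regress for robust standard errors
regress price mpg, vce(robust)
```

In each case the additional information goes in parentheses directly after the option name, just as with string().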