Netcourse 101: Answers To Exercises in Lesson 2
Introduction to Stata
1.
Create a dataset of 10 observations on x containing 1, 2, ..., 10. Here is a quick way to do this:
. clear
. set obs 10
obs was 0, now 10
. gen x = _n
. list
+----+
| x |
|----|
1. | 1 |
2. | 2 |
3. | 3 |
4. | 4 |
5. | 5 |
|----|
6. | 6 |
7. | 7 |
8. | 8 |
9. | 9 |
10. | 10 |
+----+
Explain why, if you type the following, you observe the output shown:
. list if x==4
+---+
| x |
|---|
4. | 5 |
+---+
Answer:
You type list if x==4 and see an x value that is apparently 5 because the value label xlbl, attached to x, maps 4 to
5. The variable x in the fourth observation contains the numeric value 4, but the value label says to Stata, "When you see 4,
do not display 4; display 5."
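To see the effect for yourself, the session below rebuilds the dataset and attaches a value label that maps 4 to 5. The label name xlbl comes from the answer above, but the exact commands that created it are not shown in the exercise, so treat this as one plausible reconstruction rather than the original steps.

```stata
* Rebuild the 10-observation dataset from the exercise
clear
set obs 10
generate x = _n

* Define a value label that displays the value 4 as "5",
* then attach it to x
label define xlbl 4 "5"
label values x xlbl

* With labels (the default), observation 4 appears to hold 5
list if x==4

* Without labels, the stored value, 4, is revealed
list if x==4, nolabel
```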
2.
You have a dataset containing string variable x. You made a mistake when you input the data (we will not worry
about how this happened), and although the variable is stored as a string, it was supposed to be stored as a
numeric variable. As a sample, you can create the following:
. clear
. input str8 x
x
1. 5
2. 2
3. 8
4. 10
5. 3
6. end
You now wish to correct the mistake. Explain why you do not want to type encode x, gen(newx). Explain
what you should do.
Answer:
You do not want to type encode x, gen(newx) because that would lead to the problem we just explained in
Exercise 1: newx will be an integer variable taking the five values 1 to 5, assigned in alphabetical order of the
strings. Each value carries a value label reproducing the original string, so if you give the command list, the values
appear to be correct. However, typing list, nolab (to list without value labels) reveals the true values of the new variable.
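The session below is a minimal sketch that recreates the exercise data and shows the encode pitfall directly. Because encode numbers the distinct strings in sorted order, "10" sorts first (its first character, 1, precedes 2), so it receives code 1, and "5" receives code 4.

```stata
* Recreate the string variable from the exercise
clear
input str8 x
5
2
8
10
3
end

* encode assigns codes 1, 2, ... in alphabetical order of the strings
encode x, generate(newx)

list            // value labels make newx look correct
list, nolabel   // the underlying codes are 1 to 5, not the original numbers
```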
There are two correct solutions to this problem. You could use the destring command, or you could use the
real() function with the generate command.
. destring x, replace
x has all characters numeric; replaced as byte
The destring command converts a single variable or an entire varlist from string to numeric. Instead of the
replace option, you can specify generate(newvarlist). This leaves the x variable unchanged and stores the
numeric result in a new variable.
The real() function takes a string and attempts to interpret it as if it were a number. For instance, real("2")
is 2. If the interpretation is unsuccessful, real() returns missing; real("alpha") is missing.
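Both correct solutions can be compared side by side. This sketch assumes the exercise data again; the new variable names realx and numx are our own, chosen for illustration.

```stata
clear
input str8 x
5
2
8
10
3
end

* real() interprets each string as a number; strings that cannot be
* interpreted become missing (.)
generate realx = real(x)

* destring with generate() leaves x untouched and creates numx
destring x, generate(numx)

list, nolabel
display real("2")       // 2
display real("alpha")   // . (missing)
```

Unlike encode, both approaches store the actual numbers, so no value label is involved and list with or without nolabel shows the same values.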
3.
Why do you think Stata's authors made value labeling a two-step procedure? To label the values of a variable,
you must first create a value label (label define) and then associate the value label with the variable (label values).
Answer:
The two-step method allows you to use the same value label with more than one variable, which saves memory
and saves you typing. For instance, if your data also contained a variable q2 with a yes or no response, you could
attach the existing value label to q2 with label values, without defining the label again.
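Here is a small sketch of that reuse. The variables q1 and q2, the 0 = no, 1 = yes coding, and the label name yesno are all hypothetical, invented for this illustration.

```stata
* Hypothetical data: two yes/no questions coded 0 = no, 1 = yes
clear
input q1 q2
1 0
0 0
1 1
end

* Step 1: define the value label once
label define yesno 0 "no" 1 "yes"

* Step 2: attach the same label to as many variables as you like
label values q1 yesno
label values q2 yesno

list
```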
4.
It was casually mentioned that you could type gen himpg = mpg>=20 to create a variable that is 1 when
mpg>=20 and 0 otherwise. There were no missing values in our data, but what would be the contents of himpg
if mpg did have missing values? (Hint: Load the auto data, change some of the mpg values to missing, and then
try the command.)
What would be the right way to perform the command gen himpg = mpg>=20 in the presence of missing mpg
values?
Answer:
When you ran the experiment, you discovered that himpg is 1 wherever mpg is missing (mpg==.).
Stata does that because it treats a missing value as larger than any nonmissing number, in effect as positive
infinity. Because infinity>=20, the expression mpg>=20 is true when mpg is missing.
The right way is to type
. gen himpg = mpg>=20 if mpg<.
The if mpg<. on the end restricts generate to evaluating mpg>=20 only in those observations for which mpg is
not missing. Where the if condition is not satisfied, himpg is set to missing. Rewriting the expression so that it is
false where mpg is missing would not solve the problem; it would merely switch himpg from 1 to 0 in those
observations, when the result there should be missing.
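The experiment suggested in the hint can be run as below. The choice of observations 1 to 3 is arbitrary, and the variable himpg_zero is a hypothetical example of testing for missing inside the expression (mpg>=20 & mpg<.), included to show the 1-to-0 switch described above.

```stata
sysuse auto, clear

* For the experiment, make the first three mpg values missing
replace mpg = . in 1/3

* Naive version: missing counts as larger than 20, so himpg_bad is 1 there
generate himpg_bad = mpg>=20

* Hypothetical variant testing for missing inside the expression:
* it yields 0, not missing, where mpg is missing
generate himpg_zero = mpg>=20 & mpg<.

* Correct version: evaluate only where mpg is nonmissing; generate
* leaves himpg missing everywhere else
generate himpg = mpg>=20 if mpg<.

list mpg himpg_bad himpg_zero himpg in 1/5
```

Writing if mpg<. rather than if mpg!=. also excludes the extended missing values .a through .z, which Stata sorts above the system missing value; if !missing(mpg) is an equivalent spelling.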
5.
In the brief discussion on generate and replace, the example
. generate dense = 0
. replace dense = 1 if lbperin>16
was given. How does this differ from typing generate dense = lbperin>16?
Answer:
It does not differ. In particular, do not think the two-step construction somehow gets around the missing-value
problem mentioned in Exercise 4. If lbperin contained missing values, then lbperin>16 would evaluate to 1 (true)
wherever lbperin==., in either the replace or the generate statement.
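A quick check with made-up data confirms this. The three lbperin values below, including one missing value, are invented for illustration, as are the names dense1 and dense_ok.

```stata
* Hypothetical density values, including one missing
clear
input lbperin
10
20
.
end

* Two-step construction from the lesson
generate dense = 0
replace dense = 1 if lbperin>16

* One-step construction
generate dense1 = lbperin>16

* The fix from Exercise 4: leave dense_ok missing where lbperin is missing
generate dense_ok = lbperin>16 if lbperin<.

list
```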
6.
In the discussion on string variables, to make a copy of the string variable make into a new variable, y, we typed
. generate y = make
Because generate is smart, we allowed it to decide the best storage type. What would happen if we typed
. generate str4 y = make
Is this an error?
Answer:
It might be a mistake, but it is not an error in that Stata will not complain. The str4 variable y will contain the first
four characters of make.
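You can see the truncation by creating both versions side by side. The name y4 is our own, used here only so the two copies can coexist.

```stata
sysuse auto, clear

* Let generate choose the storage type: y is a full copy of make
generate y = make

* Force str4: longer values are cut to their first four characters
generate str4 y4 = make

list make y y4 in 1/3
```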
7.
We extracted the first word from string variable make by typing
. generate manuf = word(make, 1)
The string variable make contains the make and model of each car. Show how to create another string variable,
model, that contains the second word of make.
Answer:
Just as word(make, 1) returns the first word of make, word(make, 2) returns the second. If there is no second
word, word() returns a missing string (""). Thus the solution is
. generate model = word(make, 2)
Remember that one observation of make contains only Subaru, so there is no second word. This means that model
will be missing ("") for Subaru.
Some models in the make variable are more than one word long. If you wanted the entire model, not just its second
word, you would use the substr() function.
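One way to do that, sketched below, is to combine substr() with strpos(), which locates the first space; this particular combination and the name fullmodel are our own choices, not necessarily the lesson's. Because strpos() returns 0 when there is no space, the if qualifier is needed; without it, substr(make, 1, .) would copy all of make for Subaru.

```stata
sysuse auto, clear

generate manuf = word(make, 1)
generate model = word(make, 2)   // "" when make has only one word

* Everything after the first space: the full, possibly multiword, model.
* A length of . tells substr() to take the rest of the string.
generate fullmodel = substr(make, strpos(make, " ") + 1, .) if strpos(make, " ") > 0

list make manuf model fullmodel if make == "Subaru" | wordcount(make) > 2
```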
8.
Show how to use manuf and the variable created in Exercise 7 to make a new variable, mandm, that contains
make and model.
Answer: