NetCourse® 101

Introduction to Stata

Answers to Exercises in Lesson 2


1.
Create a dataset of 10 observations on x containing 1, 2, ..., 10. Here is a quick way to do this:

. clear

. set obs 10
obs was 0, now 10

. gen x = _n

. list

     +----+
     |  x |
     |----|
  1. |  1 |
  2. |  2 |
  3. |  3 |
  4. |  4 |
  5. |  5 |
     |----|
  6. |  6 |
  7. |  7 |
  8. |  8 |
  9. |  9 |
 10. | 10 |
     +----+

Now create a value label, and associate it with variable x by typing

. label define xlbl 1 "3" 4 "5"


. label values x xlbl

Explain why, if you type the following, you observe the output shown:

. list if x==4

     +---+
     | x |
     |---|
  4. | 5 |
     +---+

Answer:

You type list if x==4 and see an x value that is apparently 5, because the value label xlbl maps 4 to 5.
In the fourth observation, x contains the numeric value 4, but the value label says to Stata, "When you see 4,
do not display 4; display 5."
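
One way to confirm this, sketched here as a continuation of the session above, is to list the observation without its value label and then display the label's contents. The nolabel option and the label list command are standard Stata; the assert line is only an added check and is not part of the original exercise.

. list if x==4, nolabel

. label list xlbl

. * added check: the stored value in observation 4 is still 4
. assert x == 4 in 4

The first command shows the stored value 4 rather than the label text 5, the second shows the mappings held in xlbl, and assert stays silent because the condition holds.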

2.
You have a dataset containing string variable x. You made a mistake when you input the data (we will not worry
about how this happened), and although the variable is stored as a string, it was supposed to be stored as a
numeric variable. As a sample, you can create the following:

. clear

. input str8 x

             x
  1. 5
  2. 2
  3. 8
  4. 10
  5. 3
  6. end

Type describe to examine these data; you will find that x is stored as a string.

You now wish to correct the mistake. Explain why you do not want to type encode x, gen(newx). Explain
what you should do.

Answer:

You do not want to type encode x, gen(newx) because that will lead to the problem we just explained in
Exercise 1; that is, newx will be an integer variable with five values, 1 to 5, assigned in the sort order of the
strings rather than by their numeric content. Each value will carry the original string as its label, so if you
give the command list, the values will appear to be correct. However, typing list, nolab (to list without value
labels) will reveal the true values of the new variable.

To see this, try the following:

. encode x, gen(newx)        generates newx
. describe                   reveals that newx is an integer
. list                       shows the value labels for newx
. list, nolab                displays the true values for newx
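
A compact way to see the mismatch is sketched below. It assumes a fresh session; the gen line using word() is only a shortcut for re-creating the five strings without input and is not from the original exercise. encode assigns codes in the sort order of the strings, and as a string "10" sorts before "2".

. clear
. set obs 5
. * shortcut (assumption): rebuild the strings 5, 2, 8, 10, 3
. gen str8 x = word("5 2 8 10 3", _n)
. encode x, gen(newx)
. label list newx
. * the string "10" received code 1, and "5" received code 4
. assert newx == 1 if x == "10"
. assert newx == 4 if x == "5"

Both assert commands run silently, which confirms that the stored codes do not match the numbers the strings represent.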

There are two correct solutions to this problem. You could use the destring command, or you could use the
real() function with the generate command.

. destring x, replace
x has all characters numeric; replaced as byte

The destring command is written for converting a single variable or an entire varlist from string to
numeric. An alternative to the replace option would be to specify generate(newvarlist). This would leave the
x variable unchanged and create a new variable that is numeric.

To use the generate command, you would type

. gen newx = real(x)

The real() function takes a string and attempts to interpret it as if it were a number. For instance, real("2")
is 2. If the interpretation is unsuccessful, real() returns missing; real("alpha") is missing.
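
The two approaches can be compared side by side. The sketch below assumes a fresh session and a hypothetical three-observation sample in which one string is not a number; the variable name xnum is likewise made up for illustration. The force option of destring converts what it can and sets nonnumeric entries to missing, which matches what real() does.

. clear
. set obs 3
. * hypothetical sample: two numeric strings and one that is not
. gen str8 x = word("5 10 alpha", _n)
. gen newx = real(x)
. * generate() leaves x untouched; force is needed because of "alpha"
. destring x, generate(xnum) force
. list
. assert newx == xnum
. assert newx == . in 3

Without force, destring would refuse to convert x here because it contains nonnumeric characters, whereas real() quietly returns missing for that observation.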

3.
Why do you think Stata's authors made value labeling a two-step procedure? To label the values of a variable,
you must first create a value label, and then associate the value label with the variable:

. label define yesno 0 no 1 yes


. label values q1 yesno

A simpler syntax would have been

. label values q1 0 no 1 yes


Answer:

The two-step method allows you to use the same value label with more than one variable, which saves memory
and spares you from retyping the same definition. For instance, if you had variable q2 in your data that also
had a yes or no response, you could type

. label values q2 yesno

The value label yesno is now used for both q1 and q2.
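
A further consequence of sharing one label, sketched below with hypothetical data for q1 and q2, is that a change to the label is made once and shows up in every variable that uses it. The modify option of label define is standard Stata; the data values are invented for illustration.

. clear
. set obs 4
. * hypothetical responses for q1 and q2
. gen q1 = mod(_n, 2)
. gen q2 = _n > 2
. label define yesno 0 no 1 yes
. label values q1 yesno
. label values q2 yesno
. * change the label once; both variables pick up the new text
. label define yesno 1 "YES", modify
. list

After the modify step, list displays YES wherever q1 or q2 equals 1, although neither variable was touched.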

4.
It was casually mentioned that you could type gen himpg = mpg>=20 to create a variable that is 1 when
mpg>=20 and 0 otherwise. There were no missing values in our data, but what would be the contents of himpg
if mpg did have missing values? (Hint: Load the auto data, change some of the mpg values to missing, and then
try the command.)

What would be the right way to perform the command gen himpg = mpg>=20 in the presence of missing mpg
values?

Answer:

When you ran the experiment, you discovered that himpg is 1 when mpg is missing (mpg==.).

Stata does that because it treats numeric missing values as larger than any nonmissing number, in effect as
positive infinity. Because infinity>=20, the statement mpg>=20 is true when mpg is missing.

Regardless of the reason, the workaround is

. gen himpg = mpg>=20 if mpg<.

The if mpg<. on the end restricts generate to evaluating the statement mpg>=20 only for those observations
in which mpg is not missing. Where the if condition is not satisfied, himpg is set to missing. The statement

. gen himpg = mpg>=20 & mpg<.

would not solve the problem. This would merely switch himpg from being 1 to being 0 where mpg was missing.
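
The experiment from the hint can be sketched as follows. Setting the first five values of mpg to missing is an arbitrary choice, and the variable names bad and himpg2 are made up for illustration; !missing(mpg) is an alternative way of writing mpg<. for a numeric variable.

. sysuse auto, clear
. replace mpg = . in 1/5
. gen bad = mpg>=20
. gen himpg = mpg>=20 if mpg<.
. gen himpg2 = mpg>=20 if !missing(mpg)
. tabulate bad himpg, missing
. assert bad == 1 if mpg == .
. assert himpg == . if mpg == .
. assert himpg == himpg2

The table shows the five observations with missing mpg counted as 1 in bad but as missing in himpg, and the assert commands confirm the same thing silently.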

5.
In the brief discussion on generate and replace, the example

. generate dense = 0
. replace dense = 1 if lbperin>16

was offered. How does this differ from

. generate dense = lbperin>16

Answer:

It does not differ. In particular, do not think the two-step construction would somehow get around the missing-
value problem mentioned in Exercise 4. If lbperin contained missing values, then lbperin>16 would
evaluate to 1 (true) where lbperin==. in either the replace or generate statements.
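
This can be checked directly. The sketch below assumes that lbperin is pounds per inch and builds it from the auto data as weight/length; that definition, the choice of which values to set to missing, and the name dense2 are assumptions made for illustration.

. sysuse auto, clear
. * assumption: lbperin is pounds per inch, built here as weight/length
. gen lbperin = weight/length
. replace lbperin = . in 1/5
. generate dense = 0
. replace dense = 1 if lbperin>16
. generate dense2 = lbperin>16
. assert dense == dense2
. assert dense == 1 if lbperin == .

Both assert commands pass silently: the two constructions agree everywhere, including the observations where lbperin is missing.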

6.
In the discussion on string variables, to make a copy of the string variable make into a new variable, y, we typed

. generate y = make

Because generate is smart, we allowed it to decide the best storage format. What would happen if we typed

. generate str4 y = make

Is this an error?

Answer:

It might be a mistake, but it is not an error in that Stata will not complain. The str4 variable y will contain the first
four characters of make.
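
A quick way to see the truncation, assuming the auto data, is to list a few observations and compare y with the first four characters of make; the assert line is an added check rather than part of the original answer.

. sysuse auto, clear
. generate str4 y = make
. list make y in 1/3
. * added check: y holds the first four characters of make
. assert y == substr(make, 1, 4)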

7.
We extracted the first word from string variable make by typing

. gen manuf = word(make, 1)

String variable make contains the make and model of each car. Show how to create another string variable,
model, that contains the second word of make.

Answer:

If the first word is

word(make, 1)

then the second word is

word(make, 2)

If there is no second word, word() returns an empty string (""), which is how Stata represents a missing string. Thus the solution is

. gen model = word(make, 2)

Remember that make had Subaru in one of its observations, so there is no second word. This means that model
will be missing for Subaru.

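Some models in the make variable are more than one word long. If you wanted the entire model, not just the
second word, you could use the substr() function:

. gen model = substr(make, strpos(make, " ") + 1, .) if strpos(make, " ") > 0

The if qualifier is needed because strpos() returns 0 when make contains no space (as with Subaru); without
it, substr() would start at position 1 and copy the whole of make into model.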

8.
Show how to use manuf and the variable created in Exercise 7 to make a new variable, mandm, that contains
make and model.

Answer:

. generate mandm = manuf + " " + model

That is, the + operator, when applied to strings, concatenates them.
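
One detail worth checking is the Subaru observation, where model is empty, so the concatenation ends with the blank that was meant to separate the two words. The sketch below, which assumes the auto data and is a variation on the answer rather than a replacement for it, wraps the expression in trim() to drop that trailing blank.

. sysuse auto, clear
. gen manuf = word(make, 1)
. gen model = word(make, 2)
. * trim() drops the trailing blank left when model is empty
. generate mandm = trim(manuf + " " + model)
. assert mandm == "Subaru" if make == "Subaru"

For observations that do have a second word, trim() changes nothing, because the only blank lies between manuf and model.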

© Copyright 2019 StataCorp LP.
