North Korea


Even statistics go wrong in North Korea

The Green Cow February 28, 2013

This month we make a trip to a somewhat unusual place: North Korea. Since 1945, when the northern part of the Korean peninsula was occupied by the (former) Soviet Union, the country has evolved into a totalitarian, Stalinist dictatorship [1]. We do not hear a lot about this country, as it has become pretty isolated. In the winter 2012-2013 issue of Political Science Quarterly, an interesting article was published about it [2]. The article describes some aspects of the North Korean penal system and includes data gathered in two surveys conducted among refugees in China and in South Korea. In total, over 1600 refugees participated between 2004 and 2008. I will try to explain how to interpret the table with statistical data included in the article.

The analysis
The article contains a table describing different statistical models that predict the chances of being arrested, apparently by means of a probit regression. That sounds rather fancy and complicated, but I will try to explain it without going into the technical details. The basic idea is that people can either be arrested or not be arrested. Statistical models are used here to try to find out what kind of people get arrested. For example, we might want to know whether men have a higher chance of getting arrested than women. One thing is important to know about this kind of model: it gives no direct chance of being arrested, but a score from which the actual probability can be calculated [3]. Such a model also needs a reference group for each categorical variable. What is a reference category? And what is a categorical variable? To start with the last question: a categorical variable is a variable which can only take a fixed number of values. Gender is an example of a categorical variable: one is either a man or a woman. This variable can take two values. In contrast with categorical variables, there are continuous variables. An example of a continuous variable is age. Theoretically, age can take any positive value. So it makes perfect sense to say that the mean age of a certain group of people is 35.6. This again stands in contrast to categorical variables: there is no such thing as a mean gender. So, as I said, for a categorical variable a reference group is needed when a statistical model is used. Basically, the reason is that the model needs a group to which the other groups can be compared.

[1] See Wikipedia and its sources.
[2] Original article.
[3] Some technicalities.
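To make the step from score to probability a bit more concrete, here is a minimal sketch in Python. It assumes the models really are probit models, in which case the score is turned into a probability by the standard normal cumulative distribution function. The intercept and the coefficient for men are invented for illustration and are not taken from the article; women are the hypothetical reference group, so they get no coefficient of their own.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Standard normal CDF: maps a probit score to a probability between 0 and 1."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


# Hypothetical numbers, made up for this example (NOT from the article's table).
INTERCEPT = -1.0  # score of the reference group (here: women)
COEF_MALE = 0.4   # shift in the score for men, relative to the reference group


def arrest_probability(is_male: bool) -> float:
    """Probability of arrest implied by the hypothetical probit score."""
    score = INTERCEPT + (COEF_MALE if is_male else 0.0)
    return normal_cdf(score)


p_women = arrest_probability(False)
p_men = arrest_probability(True)
print(f"women (reference group): {p_women:.3f}")
print(f"men:                     {p_men:.3f}")
```

With these made-up numbers the probabilities come out at roughly 16% for women and 27% for men. Note that the coefficient of 0.4 is not itself a probability, and not a difference in probabilities either: it only shifts the score relative to the reference group.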

The table
I will try to guide you through the table describing the statistical models. First it is important to understand what the numbers mean. Give it a try and look at the table. Although the text reads "The probability of being arrested is highly correlated with involvement in private-market activities (...)", the numbers in the table are clearly not probabilities. A lot of the numbers are negative and/or greater than one. As most of you will know, probabilities should always be between zero and one (by their very nature). The numbers are probably probit index numbers (this is not clearly stated in the table or the text). The interpretation of such a number is not intuitive [4]. For readers of the article, actual probabilities would have been easier to interpret. These are easy to calculate from the probit index number for someone who is used to it. The numbers in parentheses are robust standard errors. These errors give an indication of how precisely the probit index number is estimated. For example, the standard error of 0.166 reported for Class: wavering means that we can be about 95% confident that the actual number lies somewhere between -0.020 and 0.630. The stars next to the numbers refer to P-values, indicating statistical significance [5].

Next, I would like to come back to the reference categories for categorical variables. There are clearly categorical variables in the table. Look at occupation. Three categories of this variable are included in the table: professional, housewife, and laborer. Looks pretty innocent, doesn't it? Looking at the table, it is clear that a number is provided for each of these categories. So we know that there must be another category, not mentioned in the table, which is the reference category. (In standard models, the reference category does not get a number.) For the other variables we see the same.

Finally, I would like to draw your attention to the different models (1 to 4). As you can see, they differ in which time period is included. It is clear neither from the table nor from the text why this is done. Normally, these time periods would have been included in one model as a categorical variable, time. I suspect that interactions between time and the other variables were included in the models. This would be the way to find trends over time. However, if this suspicion is correct, it is a bad way of reporting. Interpretation of the table is not possible without additional information, and this information is not provided anywhere in the complete text.

Figure 1: Original article: Stephan Haggard and Marcus Noland, "Economic Crime and Punishment in North Korea", Political Science Quarterly 2012, Vol. 127 (No. 4): pp. 659-683.

[4] If you want to know a bit more, you can start here.
[5] That is, the probability of finding an estimate at least this far from zero if the true value were zero. Notice the values corresponding to the number of stars. Normally something is considered to be significant when P<0.05.
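For those who want to check the interval quoted above, here is a small sketch in Python of how such a 95% interval and a P-value are usually obtained from an estimate and its standard error. It rests on two assumptions of mine: the usual normal approximation (estimate plus or minus 1.96 standard errors), and an estimate of about 0.305, which is simply the midpoint of the quoted interval and was not read from the table. Only the 0.166 and the interval bounds come from the text above; whether the authors computed their stars this way is not stated.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def wald_summary(estimate: float, std_error: float, z_crit: float = 1.96):
    """Return (lower, upper, p_value) under a normal approximation."""
    lower = estimate - z_crit * std_error
    upper = estimate + z_crit * std_error
    z = estimate / std_error
    # Two-sided P-value: chance of an estimate at least this far from zero
    # if the true value were zero.
    p_value = 2.0 * (1.0 - normal_cdf(abs(z)))
    return lower, upper, p_value


STD_ERROR = 0.166                # the number quoted in the text
ESTIMATE = (-0.020 + 0.630) / 2  # assumption: midpoint of the quoted interval

lower, upper, p_value = wald_summary(ESTIMATE, STD_ERROR)
print(f"95% interval: {lower:.3f} to {upper:.3f}")
print(f"two-sided P-value: {p_value:.3f}")
```

Under these assumptions the interval matches the -0.020 to 0.630 quoted above. Because the interval includes zero, the P-value ends up slightly above the usual 0.05 threshold.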

Conclusion
The conclusion of this story is basically that you should not be too impressed by fancy and complicated tables with statistical details. The reporting of the statistics in this article is not very good. More details should have been provided to make it possible to interpret the table. For an audience of non-statisticians, it would have been even better to provide the actual probabilities of being arrested. These are far easier to interpret and give a better impression of what is going on in North Korea.
