SDA Lab 4
RULE OF THUMB
If a viewer can’t interpret the story in 10-15 seconds, then it’s time to SIMPLIFY.
A basic Stata graph command has three parts:
1) The graph command
2) The graph type
3) The name of the variable we want to plot
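As a minimal sketch of how these three parts fit together, the lines below use the auto example dataset that ships with Stata. The variable price belongs to that dataset, and the box plot is chosen only for illustration.

```stata
* Load Stata's built-in example data so the sketch runs on its own
sysuse auto, clear

* graph = the graph command, box = the graph type, price = the variable
graph box price
```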
Now let’s open Stata’s example dataset ‘auto’ with the following command.
sysuse auto, clear
Histograms:
Visualizing continuous variables with one-way graphs:
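* Basic histogram; the y-axis shows density by default
histogram price
* Overlay a normal density curve with the same mean and standard deviation
histogram price, normal
* Show frequencies (counts) on the y-axis instead of density
histogram price, freq
* Summary statistics to read alongside the histogram
sum price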
Q: Open data1_sample.dta and keep only the observations with household size greater than or
equal to 1 and rent below 2000.
use data1_sample.dta, clear
keep if hhsize >= 1 & rent < 2000
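One caveat when filtering with inequalities: Stata treats a numeric missing value as larger than any number, so a condition such as hhsize >= 1 is also true for observations where hhsize is missing. The sketch below shows a stricter filter. It assumes data1_sample.dta is in the working directory and that hhsize and rent are numeric.

```stata
use data1_sample.dta, clear

* See how many observations have no recorded household size
count if missing(hhsize)

* Keep households of size 1 or more with rent under 2000,
* explicitly dropping observations where hhsize is missing
keep if hhsize >= 1 & !missing(hhsize) & rent < 2000
```

The condition rent < 2000 already excludes observations with missing rent, so no extra check is needed for that variable.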
Bar Graphs:
Visualize data using the following graph commands
Eeman Qureshi SDA Lab 4
Q: Provide the command to view the mean of income for each state in separate bar graphs:
graph bar income, by(state)
To reduce clutter and improve clarity, show all states in a single graph with the over() option:
graph bar income, over(state)
Something’s not right about the above visualization. The legend and the axes of a graph are
extremely important, so you must make sure they are clearly visible.
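The figure itself is not reproduced here, so the exact problem is an assumption. A common one with many categories is overlapping labels on the category axis. The sketch below shows two possible ways to make the axis readable, assuming data1_sample.dta is loaded. The axis title text is only an example.

```stata
* Angle the state labels so they do not overlap, and label the numeric axis
graph bar income, over(state, label(angle(45))) ytitle("Mean income")

* Alternatively, horizontal bars leave more room for long category labels
graph hbar income, over(state) ytitle("Mean income")
```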
Q: Provide the command below to get the bar graph of the variable 'income' by state and
gender:
graph hbar income, over(state) by(sex)
The over() option places all groups in the same plot, whereas by() splits the graph into
separate panels, one per group.
graph hbar income, over(state) over(sex)
The above command shows both sex and state within a single plot.
Just as a statistic such as sum can be specified in parentheses before the variable name, we can
request other statistics for the graph to show, such as median or mean.
Q: Suppose we wanted to see the mean and median of income together for each of the states in
the same graph?
graph hbar (median) income (mean) income, over(state)
*OR
graph hbar income (median) income, over(state)
Note: When we run the same command for sum and mean together, only the sum bars are visible.
Both statistics share one axis, and the sum is so much larger than the mean that the mean bars
are too short to see.
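One way to see both statistics despite the difference in scale is to draw them as separate graphs and place them side by side. This is a minimal sketch, assuming data1_sample.dta is loaded. The graph names g_sum and g_mean are arbitrary labels chosen for this example.

```stata
* Draw each statistic in its own graph so each gets its own axis scale
graph hbar (sum) income, over(state) title("Sum of income") name(g_sum, replace)
graph hbar (mean) income, over(state) title("Mean of income") name(g_mean, replace)

* Place the two stored graphs next to each other
graph combine g_sum g_mean
```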
Box Plots:
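A box plot shows the distribution of a variable: the line in the middle of the box is the median,
and the box spans the interquartile range (25th to 75th percentile). The whiskers do not show the
standard deviation. In Stata they extend to the adjacent values, the most extreme observations
within 1.5 times the interquartile range of the box. Points beyond the whiskers are plotted
individually as outside values.
*The noout option (short for nooutsides) hides these outside values from the plot, giving a clearer
picture of the bulk of the distribution. It does not remove them from the data.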
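The effect of the option can be sketched with a pair of commands. The example assumes data1_sample.dta is loaded. The choice of income grouped by sex is only for illustration.

```stata
* Box plot of income for each sex, outside values included
graph box income, over(sex)

* Same plot with outside values hidden (they remain in the data)
graph box income, over(sex) nooutsides
```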
Scatterplots:
Scatterplots are used to visualize the relationship between two variables. In other words, they
show the bivariate relationship between variables.
You always write the y-variable before the x-variable in the scatterplot command:
graph twoway scatter rent size
Plot rent against size separately for each state:
graph twoway scatter rent size, by(state)
You can also change the shape of the scatter points using the msymbol() option. The lowercase
codes below draw small markers; the uppercase versions (D, O, T) draw the same shapes at full size:
• diamond (d)
• circle (o)
• triangle (t)
• plus (+)
Options: To change the markers on the graph from dots to triangles, use the msymbol() option:
graph twoway scatter rent size, msymbol(t)
Likewise, you can change the marker color using the mcolor() option (as provided in your do-files):
graph twoway scatter rent size, mcolor(lime) msymbol(t)
You can also change the intensity of the color by multiplying the color name by a number. Values
below 1 (such as 0.1 to 0.9) lighten the color, with smaller values giving a paler shade; values
above 1 darken it.
graph twoway scatter rent size, mcolor(lime*0.3) msymbol(t)
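To compare intensities directly, the sketch below draws the same scatterplot with a lightened and a darkened version of the color and shows them side by side. It assumes data1_sample.dta is loaded. The multipliers 0.3 and 1.5 and the graph names light and dark are arbitrary choices for this example.

```stata
* Multiplier below 1: a paler version of lime
graph twoway scatter rent size, mcolor(lime*0.3) msymbol(t) name(light, replace)

* Multiplier above 1: a darker version of lime
graph twoway scatter rent size, mcolor(lime*1.5) msymbol(t) name(dark, replace)

* Show both versions next to each other
graph combine light dark
```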
Questions
Q1: As discussed earlier, a histogram shows the distribution of a continuous variable.
Using the 'histogram' command, draw the histogram of the variable ‘income’.
Q2: As you may have noticed, the y-axis of the histogram of income shows density. Can you
change the y-axis to show the frequency of the values of 'income' instead? Using the
Stata help on histogram, redraw the histogram with frequency on the y-axis and
10 bins.
Q3: Draw a single graph showing the distribution of rent for each state, and comment
on which state has the widest distribution of income.
Q4: Draw a scatterplot of rent (y-axis) against size (x-axis). The points should be
red triangles.