
Visualizations in R

• One of the first steps of any data analysis project is exploratory data
analysis.
• This involves exploring a dataset in three ways:
• 1. Summarizing a dataset using descriptive statistics.
• 2. Visualizing a dataset using charts.
• 3. Identifying missing values.
• By performing these three actions, you can gain an understanding of
how the values in a dataset are distributed and detect any
problematic values before running a hypothesis test or fitting a
statistical model.
• The easiest way to perform exploratory data analysis in R is by using
functions from the tidyverse packages.
• library(tidyverse)

• #load diamonds dataset (ships with ggplot2)
• data(diamonds)
• #view first six rows
• head(diamonds)
• #summarize every column
• summary(diamonds)
• #create boxplot of price, grouped by cut
• ggplot(data=diamonds, aes(x=cut, y=price)) +
    geom_boxplot(fill="steelblue")
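
• The boxplot above compares the price distribution across cut levels. As a rough numeric companion, the sketch below computes the same kind of per-group summary with dplyr. The object name price_by_cut and the choice of median and IQR are illustrative assumptions, not part of the original slides.

```r
library(ggplot2)  # provides the diamonds dataset
library(dplyr)    # provides group_by() and summarise()

# one row per cut level: group size, median price, interquartile range
price_by_cut <- diamonds %>%
  group_by(cut) %>%
  summarise(
    n            = n(),
    median_price = median(price),
    iqr_price    = IQR(price)
  )

print(price_by_cut)
```

• The median and IQR correspond to the center line and box height that geom_boxplot() draws for each cut.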


• #create histogram of values for price
• ggplot(data=diamonds, aes(x=price)) +
    geom_histogram(fill="steelblue", color="black") +
    ggtitle("Histogram of Price Values")
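
• When no bin setting is given, geom_histogram() falls back to 30 bins and prints a message suggesting that you pick a binwidth. A minimal sketch of setting it explicitly follows; the value 500 (in price units) is an assumption chosen only for illustration.

```r
library(ggplot2)

# binwidth = 500 is an illustrative assumption; try several values
p <- ggplot(data = diamonds, aes(x = price)) +
  geom_histogram(binwidth = 500, fill = "steelblue", color = "black") +
  ggtitle("Histogram of Price Values (binwidth = 500)")

print(p)
```

• Narrower bins show more detail but more noise, so it is worth comparing a few widths before settling on one.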

• #create scatterplot of carat vs. price, colored by cut
• ggplot(data=diamonds, aes(x=carat, y=price, color=cut)) +
    geom_point()

• #display number of rows and columns
• dim(diamonds)





• Step 4: Identify Missing Values
• We can use the following code to count the total number of missing
values in each column of the dataset:
• #count total missing values in each column
• sapply(diamonds, function(x) sum(is.na(x)))
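
• The diamonds dataset contains no missing values, so the call above returns zero for every column. The sketch below shows an equivalent count with colSums() and then injects a few NA values into a copy to show what a non-zero result looks like. The copy diamonds_demo and the choice of rows are hypothetical, for illustration only.

```r
library(ggplot2)  # provides the diamonds dataset

data(diamonds)

# same per-column counts as the sapply() call, via colSums()
na_counts <- colSums(is.na(diamonds))
print(na_counts)

# hypothetical copy with three missing prices injected for demonstration
diamonds_demo <- as.data.frame(diamonds)
diamonds_demo$price[c(1, 5, 10)] <- NA

demo_counts <- colSums(is.na(diamonds_demo))
print(demo_counts["price"])

# share of rows with no missing value in any column
complete_share <- mean(complete.cases(diamonds_demo))
print(complete_share)
```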
• Data exploration is about the journey to find a message in your data.
The analyst is trying to put together the pieces of a puzzle.
• Data presentation is about sharing the solved puzzle with people who
can take action on the insights. Authors of data presentations need to
guide an audience through the content with a purpose and point of
view.

• The goal of data exploration is often to ask a better question. The
process of finding better questions leads to new insights and a better
understanding of how your business works.
• Data presentations are about guiding decision-makers to make
smarter choices. By this point, much of the learning (through data
exploration) should already be done, leaving the equally difficult task
of communicating the insights and the actions that should result.
• In all these ways, data exploration and data presentation are different
b
