
Scatter plot

A scatter plot (also called a scatterplot, scatter graph, scatter chart, scattergram, or scatter diagram)[3] is a type of plot or mathematical diagram using Cartesian coordinates to display values for typically two variables for a set of data. If the points are coded (color/shape/size), one additional variable can be displayed. The data are displayed as a collection of points, each having the value of one variable determining the position on the horizontal axis and the value of the other variable determining the position on the vertical axis.[4]
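The mapping from data to points, including a coded third variable, can be sketched in a few lines of Python. This is a minimal illustration rather than a plotting library: the `Point` class, the `to_points` helper, and the linear rescaling of the third variable into a marker-size range (`min_size`, `max_size`, with arbitrarily chosen defaults) are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Point:
    """One plotted marker: position from two variables, optional size from a third."""
    x: float
    y: float
    size: Optional[float] = None


def to_points(xs: Sequence[float], ys: Sequence[float],
              zs: Optional[Sequence[float]] = None,
              min_size: float = 4.0, max_size: float = 20.0) -> list[Point]:
    """Hypothetical helper mapping paired observations to scatter-plot points.

    xs sets the horizontal position and ys the vertical position. If zs is
    given, it is linearly rescaled into [min_size, max_size] and used as the
    marker size, i.e. the "coded" additional variable.
    """
    if len(xs) != len(ys) or (zs is not None and len(zs) != len(xs)):
        raise ValueError("every variable needs one value per observation")
    if zs is None:
        return [Point(x, y) for x, y in zip(xs, ys)]
    lo, hi = min(zs), max(zs)
    span = hi - lo
    points = []
    for x, y, z in zip(xs, ys, zs):
        # A constant third variable gets the middle of the size range.
        t = 0.5 if span == 0 else (z - lo) / span
        points.append(Point(x, y, min_size + t * (max_size - min_size)))
    return points
```

For instance, `to_points([1, 2, 3], [2, 4, 5], zs=[10, 20, 30])` places markers at (1, 2), (2, 4) and (3, 5) with sizes 4.0, 12.0 and 20.0. Coding the third variable by color or shape would follow the same pattern, mapping each value to a palette entry or glyph instead of a size.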

Overview
A scatter plot can be used either when one continuous variable is under the control of the experimenter and the other depends on it, or when both continuous variables are independent. If one variable is a parameter that is systematically incremented and/or decremented, it is called the control parameter or independent variable and is customarily plotted along the horizontal axis. The measured or dependent variable is customarily plotted along the vertical axis. If no dependent variable exists, either type of variable can be plotted on either axis, and a scatter plot will illustrate only the degree of correlation (not causation) between two variables.

Infobox: Scatter plot. One of the Seven Basic Tools of Quality. First described by John Herschel.[1] Purpose: to identify the type of relationship (if any) between two quantitative variables.
Figure: Waiting time between eruptions and the duration of the eruption for the Old Faithful Geyser in Yellowstone National Park, Wyoming, USA. This chart suggests there are generally two types of eruptions: short-wait-short-duration and long-wait-long-duration.

A scatter plot can suggest various kinds of correlations between variables with a certain confidence interval. For example, to study weight against height, weight would be plotted on the y-axis and height on the x-axis. Correlations may be positive (rising), negative (falling), or null (uncorrelated). If the pattern of dots slopes from lower left to upper right, it indicates a positive correlation between the variables being studied. If the pattern of dots slopes from upper left to lower right, it indicates a negative correlation. A line of best fit (alternatively called a 'trendline') can be drawn to study the relationship between the variables. An equation for the correlation between the variables can be determined by established best-fit procedures. For a linear correlation, the best-fit procedure is known as linear regression and is guaranteed to generate a correct solution in a finite time. No universal best-fit procedure is guaranteed to generate a correct solution for arbitrary relationships. A scatter plot is also very useful when we wish to see how two comparable data sets agree and to show nonlinear relationships between variables. The ability to do this can be enhanced by adding a smooth line such as LOESS.[5] Furthermore, if the data are represented by a mixture model of simple relationships, these relationships will be visually evident as superimposed patterns.
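As a sketch of the linear case: the least-squares trendline has a closed-form solution, which is why linear regression finishes in finite time, and the sign of Pearson's correlation coefficient r matches the rising (positive) or falling (negative) patterns described above. The function names and the 0.1 cut-off used to call a correlation "null" are illustrative assumptions, not a statistical test, and the sketch does not compute confidence intervals.

```python
import math
from typing import Sequence


def _centered_sums(xs: Sequence[float], ys: Sequence[float]):
    """Means and centered sums of squares/cross-products of the data pairs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return mx, my, sxx, syy, sxy


def linear_fit(xs: Sequence[float], ys: Sequence[float]) -> tuple[float, float]:
    """Least-squares trendline y = slope * x + intercept (closed form)."""
    mx, my, sxx, _, sxy = _centered_sums(xs, ys)
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    slope = sxy / sxx
    return slope, my - slope * mx


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient, between -1 and 1."""
    _, _, sxx, syy, sxy = _centered_sums(xs, ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant variable has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def describe(r: float, tol: float = 0.1) -> str:
    """Label the pattern; tol is an arbitrary illustrative cut-off."""
    if r > tol:
        return "positive"
    if r < -tol:
        return "negative"
    return "null"
```

With xs = [1, 2, 3, 4] and ys = [2, 4, 6, 8], `linear_fit` returns (2.0, 0.0) and `pearson_r` returns 1.0, a perfectly rising pattern; reversing ys gives r = -1.0, which `describe` labels "negative".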
The scatter diagram is one of the seven basic tools of quality
control.[6]

Scatter charts can be built in the form of bubble, marker, and/or line charts.[7]

Example
Figure: A 3D scatter plot allows the visualization of multivariate data. This scatter plot takes multiple scalar variables and uses them for different axes in phase space. The different variables are combined to form coordinates in the phase space, and they are displayed using glyphs and coloured using another scalar variable.[2]

For example, to display a link between a person's lung capacity and how long that person could hold their breath, a researcher would choose a group of people to study, then measure each one's lung capacity (first variable) and how long that person could hold their breath (second variable). The researcher would then plot the data in a scatter plot, assigning "lung capacity" to the horizontal axis and "time holding breath" to the vertical axis.

A person with a lung capacity of 400 cl who held their breath for 21.7 s would be represented by a single dot on the scatter plot at the point (400, 21.7) in the Cartesian coordinates. The scatter plot of all the people in the study would enable the researcher to obtain a visual comparison of the two variables in the data set and would help to determine what kind of relationship there might be between the two variables.
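A text-only rendering makes the mapping from measurements to dot positions concrete. The sketch below assumes a hypothetical `ascii_scatter` helper that draws on a character grid instead of using a real plotting library; it scales lung capacity onto the columns and breath-holding time onto the rows, with longer times nearer the top.

```python
from typing import Sequence


def ascii_scatter(points: Sequence[tuple[float, float]],
                  width: int = 40, height: int = 10,
                  mark: str = "*") -> list[str]:
    """Hypothetical helper: render (x, y) pairs as text rows.

    x grows to the right and y grows upward, as on a Cartesian scatter plot.
    """
    if not points:
        raise ValueError("need at least one point")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in points:
        # Scale each value to a column/row index; a zero range maps to 0.
        col = 0 if x_hi == x_lo else round((x - x_lo) / (x_hi - x_lo) * (width - 1))
        row = 0 if y_hi == y_lo else round((y - y_lo) / (y_hi - y_lo) * (height - 1))
        # Grid row 0 is printed first (the top line), so flip the vertical axis.
        grid[height - 1 - row][col] = mark
    # Left edge stands for the vertical axis, last line for the horizontal axis.
    return ["|" + "".join(cells) for cells in grid] + ["+" + "-" * width]


# Lung capacity (cl) vs. breath-holding time (s). Only (400, 21.7) comes from
# the text; the other pairs are hypothetical values for illustration.
for line in ascii_scatter([(400, 21.7), (350, 18.0), (500, 30.5), (450, 24.0)]):
    print(line)
```

On a small 5-by-3 grid, `ascii_scatter([(400, 21.7), (600, 40.0)], width=5, height=3)` puts the (400, 21.7) dot in the bottom-left corner and the second, hypothetical (600, 40.0) dot in the top-right corner.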

Scatter plot matrices


For a set of data variables (dimensions) $X_1, X_2, \ldots, X_k$, the scatter plot matrix shows all the pairwise scatter plots of the variables on a single view, with multiple scatter plots in a matrix format. For $k$ variables, the scatter plot matrix will contain $k$ rows and $k$ columns. A plot located at the intersection of the $i$th row and $j$th column is a plot of variables $X_i$ versus $X_j$.[8] This means that each row and each column corresponds to one dimension, and each cell shows a scatter plot of two dimensions.
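The indexing rule can be made explicit with a small helper that lays out the cells without drawing anything. It assumes the usual "Y versus X" reading, so the cell in row i and column j puts Xj on the horizontal axis and Xi on the vertical axis; the function name and the dictionary-based cell format are hypothetical. Diagonal cells literally pair a variable with itself, which is why many tools show a variable label or histogram there instead.

```python
from typing import Sequence


def scatter_matrix(names: list[str],
                   data: dict[str, Sequence[float]]) -> list[list[dict]]:
    """Hypothetical helper: lay out the k x k cells of a scatter plot matrix.

    Cell [i][j] is the plot of X_i versus X_j: X_j on the horizontal axis,
    X_i on the vertical axis.
    """
    n = len(data[names[0]])
    if any(len(data[name]) != n for name in names):
        raise ValueError("every variable needs one value per observation")
    grid = []
    for row_name in names:          # row i -> X_i (vertical axis)
        row = []
        for col_name in names:      # column j -> X_j (horizontal axis)
            points = list(zip(data[col_name], data[row_name]))
            row.append({"x": col_name, "y": row_name, "points": points})
        grid.append(row)
    return grid
```

For three variables the result has 3 rows of 3 cells, and each pair of variables appears twice, once above and once below the diagonal with the axes swapped.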

A generalized scatter plot matrix[9] offers a range of displays of paired combinations of categorical and quantitative variables. A mosaic plot, fluctuation diagram, or faceted bar chart may be used to display two categorical variables. Other plots are used for one categorical and one quantitative variable.
Figure: Visualization of 3D data along with the corresponding scatterplot matrix.
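A generalized scatter plot matrix chooses a panel type for each pair of variables. The sketch below assumes each variable is tagged as categorical or quantitative and returns a panel name for every cell: a scatter plot for two quantitative variables and a mosaic plot for two categorical ones (a fluctuation diagram or faceted bar chart would serve equally well). The text only says "other plots" for mixed pairs, so the box plot used here is an assumption, as is showing a label on the diagonal; the function names and the variable names in the test are hypothetical.

```python
from typing import Literal

# Each variable is tagged by type; the tags are supplied by the caller.
Kind = Literal["categorical", "quantitative"]


def panel_for(row_kind: Kind, col_kind: Kind) -> str:
    """Choose a display for one off-diagonal cell of a generalized pairs plot."""
    if row_kind == "quantitative" and col_kind == "quantitative":
        return "scatter plot"
    if row_kind == "categorical" and col_kind == "categorical":
        # A fluctuation diagram or faceted bar chart would also fit here.
        return "mosaic plot"
    # ASSUMPTION: the text only says "other plots" for mixed pairs;
    # side-by-side box plots are one common choice.
    return "box plot"


def generalized_pairs_layout(kinds: dict[str, Kind]) -> list[list[str]]:
    """Return a k x k grid of panel names, rows and columns in the dict's order."""
    names = list(kinds)
    layout = []
    for i, row_name in enumerate(names):
        row = []
        for j, col_name in enumerate(names):
            if i == j:
                # ASSUMPTION: the diagonal just shows the variable's label.
                row.append("label")
            else:
                row.append(panel_for(kinds[row_name], kinds[col_name]))
        layout.append(row)
    return layout
```

Keeping the panel choice in its own function means a different display for mixed pairs can be swapped in without touching the layout logic.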

See also
Data and information visualization
Rug plot
Bar graph
Line chart
Scagnostics
Dot plot (statistics)

References
1. Friendly, Michael; Denis, Dan (2005). "The early origins and development of the scatterplot". Journal of the History of the Behavioral Sciences. 41 (2): 103–130. doi:10.1002/jhbs.20078 (https://doi.org/10.1002%2Fjhbs.20078). PMID 15812820 (https://pubmed.ncbi.nlm.nih.gov/15812820).
2. Visualizations that have been created with VisIt (https://wci.llnl.gov/codes/visit/gallery.html)
at wci.llnl.gov. Last updated: November 8, 2007.
3. Jarrell, Stephen B. (1994). Basic Statistics (Special pre-publication ed.). Dubuque, Iowa:
Wm. C. Brown Pub. p. 492. ISBN 978-0-697-21595-6. "When we search for a relationship
between two quantitative variables, a standard graph of the available data pairs (X,Y), called
a scatter diagram, frequently helps..."
4. Utts, Jessica M. (2005). Seeing Through Statistics (3rd ed.). Thomson Brooks/Cole. pp. 166–167. ISBN 0-534-39402-7.
5. Cleveland, William (1993). Visualizing Data (https://archive.org/details/visualizingdata00will). Murray Hill, N.J.: AT&T Bell Laboratories; Summit, N.J.: Hobart Press. ISBN 978-0963488404.
6. Tague, Nancy R. (2004). "Seven Basic Quality Tools" (http://www.asq.org/learn-about-quality/seven-basic-quality-tools/overview/overview.html). The Quality Toolbox. Milwaukee, Wisconsin: American Society for Quality. p. 15. Retrieved 2010-02-05.
7. "Scatter Chart - AnyChart JavaScript Chart Documentation" (https://web.archive.org/web/20160201084227/http://docs.anychart.com/7.9.0/Basic_Charts_Types/Scatter_Chart). AnyChart. Archived from the original (http://docs.anychart.com/7.9.0/Basic_Charts_Types/Scatter_Chart) on 1 February 2016. Retrieved 3 February 2016.
8. Scatter Plot Matrix (http://www.itl.nist.gov/div898/handbook/eda/section3/scatplma.htm) at
itl.nist.gov.
9. Emerson, John W.; Green, Walton A.; Schoerke, Barret; Crowley, Jason (2013). "The
Generalized Pairs Plot". Journal of Computational and Graphical Statistics. 22 (1): 79–91.
doi:10.1080/10618600.2012.694762 (https://doi.org/10.1080%2F10618600.2012.694762).
S2CID 28344569 (https://api.semanticscholar.org/CorpusID:28344569).

External links
Media related to Scatterplots at Wikimedia Commons
What is a scatterplot? (http://www.psychwiki.com/wiki/What_is_a_scatterplot%3F)
Correlation scatter-plot matrix for ordered-categorical data (http://www.r-statistics.com/2010/04/correlation-scatter-plot-matrix-for-ordered-categorical-data/) – Explanation and R code
Density scatterplot for large datasets (http://www.r-bloggers.com/ggplot2-for-big-data/)
(hundreds of millions of points)

Retrieved from "https://en.wikipedia.org/w/index.php?title=Scatter_plot&oldid=1139942287"
