Graphing Skills

Written by Matthew Donald

Updated: Friday, 14 October 2016

Independent Variables

Title Formation

Discontinuous Data in a Column Graph

Continuous Data in a Column Graph

Discontinuous Data in a Scatter Graph

Continuous Data in a Scatter Graph

Interpolation and Extrapolation

Data points

Line of best fit

‘Hinging’

Increments on the axes

Using the grid

Gradients

Curve of best fit

Starting a Graph

The independent variable is the most important factor when deciding what type of graph to use and
what type of information you can extract. The dependent variable is analysed as a function of the
ind. variable and is therefore secondary to it. Thorough analysis of how and why the data is collected
will help you make the right choice.

To make life easy for the scientists who read your report, proper labelling must be applied.
A title on a table or a graph is never unnecessary, so put one in. A title should be clear,
meaningful and free of assumptions. It should also include the ind. and dep. variables in some form.

For example, a person measures the height of radish seedlings every day. A good title would be
“Daily height of radish seedlings”. This title makes no assumptions or inferences. A less correct title
would be “How fast radish seedlings grow”. This assumes that height is the only measurement
needed to analyse growth, and it implies that speed was measured when only height and time were
recorded.

The first question when building the body of a graph should be ‘is the data continuous or
discontinuous?’ To determine this, try to measure in between the values of the ind. variable. For
example, between apples and bananas there are no half apples or semi-bananas, yet 15g is halfway
between 10g and 20g.

If measurements can be made within the data set of the independent variable then it is continuous
data. If no value can be found between the data, it is discontinuous data.

Each type of graph has a set of rules based on how the data can be processed. The following
information relates to the two most dominant types: column graphs and scatter graphs (line graphs).

Discontinuous Data in a Column Graph

This data was collected by taking 1 piece of each type of fruit, pureeing, filtering and evaporating the
filtrate to find a mass of dissolved solids. The assumption is that this mass is the mass of sugar in
the piece of fruit.

We can’t take measurements between the variables but we can use this information to create an
order of highest to lowest. “The results show that the order from most sugar to least is durian, apple,
banana and then carrot.” This could be very useful information when establishing diet/health routines.

Some like to create a second, separate graph with the ind. variables ordered as they are quoted
above but it is not necessary if discussed.

No trend can be found in this type of data (without reordering) as a pattern can’t be seen between
individual samples. This means no trend line can be drawn and no interpolation or extrapolation can
be made.

Continuous Data in a Column Graph

This data was collected by weighing out a slice of apple (e.g. 10g), pureeing, filtering and evaporating
the filtrate to find a mass of dissolved solids. The assumption is that this mass is the mass of sugar in
the piece of fruit. This was repeated for other masses (e.g. 20g, 30g, and 40g).

Notice that the columns are now touching. This is a good way to observe whether the data is
continuous or discontinuous.

The data is now such that we can predict masses of sugar in between the data measured. If the mass
of sugar is required for a 15g piece of apple, the trend is first identified by a line of best fit and a
reading is then taken by ruling (broken line only) from 15g up to the trend line and across to the
y-axis. This is called interpolation, which means ‘between the data’.

Notice the trend line does not have to go through every dot and that it does not have to go through
the first dot or last. (We will discuss this further in a later section.) A result of 2.9g is found from the

Alternatively, predictions can be made outside of the data. This is called extrapolation, and it means
ruling the trend line beyond the data (make sure you use a dashed/broken line).

If data is required for “how much sugar in 50g of apple?”, firstly we need to extend the graph area to
include the extended range. Next an extension of the line of best fit is required (broken line). Finally,
repeat the interpolation section above and rule from 50g through to the y-axis to read off 9.3g.

Note: You must use all the data available when forming the trend line to maximise the reliability and
accuracy of the reading.
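
The ruling procedure can also be checked numerically. The following is a minimal sketch in Python, assuming a least-squares line stands in for the hand-ruled line of best fit. The masses and sugar readings are illustrative values only (not the measurements behind the graph), and the function names fit_line and predict are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares line of best fit; returns (gradient, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    gradient = sxy / sxx
    intercept = mean_y - gradient * mean_x
    return gradient, intercept


def predict(x, gradient, intercept, xs):
    """Read a value off the trend line and report how it was obtained."""
    kind = "interpolation" if min(xs) <= x <= max(xs) else "extrapolation"
    return gradient * x + intercept, kind


# Illustrative data only (assumed): apple mass (g) and dissolved solids (g).
apple_mass = [10, 20, 30, 40]
sugar_mass = [2.0, 3.8, 5.7, 7.4]

# All four points are used to form the trend line.
m, c = fit_line(apple_mass, sugar_mass)

for mass in (15, 50):
    value, kind = predict(mass, m, c, apple_mass)
    print(f"{mass} g of apple -> {value:.1f} g of sugar ({kind})")
```

The label returned by predict is a reminder that a reading taken beyond the measured range is an extrapolation and belongs on the broken part of the line.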

Extrapolation and interpolation give values that were not directly measured and therefore form an
inference (or calculated guess) from the experiment. Their use is therefore limited to general ideas
and developmental measurements, i.e. planning future experiments or method improvements. Only
direct measurements are considered “reliable beyond reasonable error”.

Discontinuous Data in a Scatter Graph (also called a Line Graph)

Just as in Discontinuous Data in a Column Graph above, no data can be found between the
measured ind. variables. The following graph has been used to trick students for years.

The data looks continuous; however, you cannot have half a proton in an atom. The graph is
discontinuous and is treated as such. This data shows the average number of neutrons found in
the nucleus of the first 8 elements.

Again, no data can be interpolated or extrapolated as the data is not continuous. However, you can
form trends and assign lines of best fit in cases where numbers are used instead of names. These
results can be a meaningful extraction of information, but of a more general form: “As the number of
protons increases, the number of neutrons increases.” One could not use 1.5 protons to find the
number of neutrons.

Dot-to-dot (line) graphs are used widely in economics and weather studies, even when the data is
continuous. This is because people can find value in how the seasonal fluctuations happen, not
just in the overall trend. This results in the graph looking like this:

Teachers see this as difficult to read and find it near impossible to tell the dot-to-dot from the line of
best fit. They have rules that state if a graph is not easily read and understood, it is malformed.
However, there are examples of graphs that make this work. The use of colour and line thickness

Note: Applying a trend line when the data is discontinuous introduces a significant level of error. Have
someone check your graph to make sure it is easy to use and makes sense before accepting it as well

Continuous Data in a Scatter Graph (Line Graph)

This is by far the best and most useful type of graph when analysing experimental results. It is not
always possible to have these graphs, but when it is, they show you trends, predictions,
orders, rates, and even equations that govern the system.

This data was taken from a selection of young Australians and compares the maturity levels,
measured in mental years, to a physical age. Psychologists map these people continuously to find
any deviation from the trend of 1:1.

Data can be interpolated and extrapolated from this graph and predictions can be made for the
future mental development of young Australians. If the following question is asked, “What mental
age would a 5 ½ year old have in this sample group?” a simple interpolation could be made.

One should expect a 5 ½ year old to be developing a maturity of a 5 ¾ year old.

Extrapolation is the same: “Where should a 9 year old be at next year?”

Extrapolation shows an age of 9.2 years of maturity.

Further analysis of the graph could reveal a gradient that would give a magnitude to the rate of
increase of maturity with increasing age. This could be used to calculate future mental development
well outside of the graph area or to formulate a theory for all mental development of young people
in Australia.

Extra Points of Interest

Data points can be lost when placing a line of best fit. See here:

Marks are awarded for the plotting of data in the right spot. Missing data will lose you marks. If your
line of best fit covers data and it can’t be seen, you will lose marks for not plotting data. Best
practice is to use crosses:

The data points are still seen easily even when covered. When plotting multiple sets of data from
multiple ind. variables, a variety of acceptable crosses and plots are used:

With such a variety of data points, a key is important (necessary).

The line of best fit does not have to touch any data points; it is an average of the vertical and
horizontal values of each point. Manual placement (i.e. using a ruler) has a level of error to consider;
however, a good check is to compare it against software such as Excel.
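
As a rough illustration of that software check, the sketch below assumes the software fits a least-squares line, which always passes through the point made of the average x value and the average y value. The data points and the hand-ruled gradient and intercept are illustrative assumptions, and the function names are hypothetical.

```python
def least_squares(points):
    """Return (gradient, intercept, centre) for a least-squares line."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    gradient = sxy / sxx
    return gradient, mean_y - gradient * mean_x, (mean_x, mean_y)


def squared_error(points, gradient, intercept):
    """Sum of squared vertical distances from the points to a line."""
    return sum((y - (gradient * x + intercept)) ** 2 for x, y in points)


# Illustrative data only (assumed values).
points = [(1.5, 1.0), (2.0, 1.6), (3.0, 2.1), (4.0, 3.2), (5.0, 3.6)]
fitted_m, fitted_c, centre = least_squares(points)

# Hypothetical gradient and intercept read off a hand-ruled line.
ruled_m, ruled_c = 0.8, -0.2

print("fitted line:", fitted_m, fitted_c)
print("fitted error:", squared_error(points, fitted_m, fitted_c))
print("ruled error: ", squared_error(points, ruled_m, ruled_c))
```

A ruled line whose error is close to the fitted error is a good placement; a much larger error suggests the ruler should be moved.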

The line of best fit must not go outside of the data: you have not made any measurements
there, so extending the line would be an inference.

Example: The smallest data value above has a magnitude of 1.5 on the x-axis and therefore the line
of best fit should not extend lower than this value. Also, the smallest data value has a magnitude of
~1 on the y-axis and therefore the line of best fit should not extend lower than this value.

The same applies for the largest data value.

‘Hinging’ the line of best fit on the (0, 0) coordinate so that it always passes through this point is not
always required. It is required, however, when an equation is given that implies a y-intercept of zero
at x=0, or when data collected at (0, 0) is, without a doubt, a legitimate data point. It is best not to
‘hinge’ unless you are certain, and then to talk about the reasons why you did not hinge, e.g. error,
uncontrolled variables.

Variables should be written in logical increments on the axes such that reading off the data values
is easy and straightforward. Each increment should be the same size and should be easily
divisible into minor lines for detailed data extraction.

This axis shows an increase of 0.1 for each increment. This makes it easy to break each one up into
minor increments of 10 × 0.01 or 5 × 0.02.

This axis increases by a steady 1.1 per increment; however, it is not easy to use. Dividing 1.1 by 5
or 10 to make minor increments is messy and hard. It is to be avoided.

This axis has a value placed on it for every point measured. Reading off the data values would
be very easy. However, if we were to extrapolate or interpolate data we would have to work very
hard to find the values. This graph has inconsistent increments and would be time
consuming to divide into minor increments. Not acceptable.
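
The three axes above can be told apart with a simple check. This is only a sketch: the rule that a step with a single significant digit of 1, 2 or 5 is "easy to divide" is an assumed rule of thumb, the axis labels are illustrative, and check_axis is a hypothetical helper.

```python
from decimal import Decimal


def check_axis(labels):
    """Return (uniform, easy_to_divide) for axis labels given as strings."""
    values = [Decimal(label) for label in labels]
    steps = {b - a for a, b in zip(values, values[1:])}
    if len(steps) != 1:
        # The increments are not all the same size.
        return False, False
    step = next(iter(steps))
    # Assumed rule of thumb: a step whose only significant digit is 1, 2 or 5
    # splits cleanly into 5 or 10 minor increments.
    digits = step.normalize().as_tuple().digits
    return True, digits in ((1,), (2,), (5,))


print(check_axis(["0.0", "0.1", "0.2", "0.3"]))   # steady steps of 0.1
print(check_axis(["0", "1.1", "2.2", "3.3"]))     # steady steps of 1.1
print(check_axis(["1.5", "2.0", "3.0", "4.5"]))   # a label at every data point
```

Labels are passed as strings and handled with Decimal so that steps such as 0.1 compare exactly rather than picking up floating-point rounding.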

Using the grid is very important. This refers to the amount of space you use when graphing data.

The following data is graphed on a standard scale. The y-axis is on a scale from 0 to 12 and it looks
like there is next to no apparent trend.

However, when the data is graphed using a scale from 11 to 11.5 we see a distinct difference
between measurements.

Society is constantly tricked by these graphical ‘illusions’. Therefore, all graphs must utilise at least
60% of the graph area both above and below the data.
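
One way to check a chosen scale before drawing is sketched below. It reads the 60% rule as "the data should span at least 60% of the axis length", which is an assumption about how the rule is applied; the readings are illustrative and axis_usage is a hypothetical helper.

```python
def axis_usage(data, axis_min, axis_max):
    """Fraction of the axis length spanned by the data."""
    return (max(data) - min(data)) / (axis_max - axis_min)


# Illustrative readings only (assumed values between 11 and 11.5).
readings = [11.05, 11.2, 11.3, 11.45]

for low, high in ((0, 12), (11, 11.5)):
    usage = axis_usage(readings, low, high)
    verdict = "acceptable" if usage >= 0.6 else "too compressed"
    print(f"scale {low} to {high}: {usage:.0%} of the axis used ({verdict})")
```

On the 0 to 12 scale the data occupies only a sliver of the axis, which is why the trend is hidden; the 11 to 11.5 scale spreads the same readings across most of the grid.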

Gradients are really useful information that can be extracted from continuous data. If the data is
linear (straight line), it is a simple matter of choosing a rise and a run.
$$\text{gradient} = \frac{\text{rise}}{\text{run}} = \frac{\text{change in } y \text{ axis}}{\text{change in } x \text{ axis}}$$
When a report is solely reliant on the formation of a gradient to support the hypothesis, it is very
important to show how the calculation was done (show evidence).

Take this data which was measured with the addition of masses to a vertical spring scale/balance. A
trend line was drawn that best fits the data.

A manual way of showing working and taking measurements is to draw on the triangle of change:

The numbers here are not needed but are a welcome addition. They show the rise and run chosen. The
calculation would be:

Another way to show the rise and run chosen is by a straight extraction of numbers, although you
must detail to the reader the range used. Even if a value is zero, it should be written in the equation, like this:

$$\text{Gradient} = \frac{49 - 10}{4 - 0} = 9.75$$

It is a scientist’s goal to maximise the accuracy of the measurements and extracted data from their
experiment. This means when creating a gradient, one must use the maximum length of trend line to
find the change in each axis.
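
The rise-over-run calculation can be written out as a small Python sketch. The end points (0, 10) and (4, 49) are an assumption: the rise of 49 − 10 and the result of 9.75 are quoted above, and a run of 4 − 0 is inferred from them. The function name gradient is hypothetical.

```python
def gradient(point_a, point_b):
    """Gradient of the trend line through two (x, y) points read off the line."""
    (x1, y1), (x2, y2) = point_a, point_b
    rise = y2 - y1   # change in y axis
    run = x2 - x1    # change in x axis
    return rise / run


# Points taken from the two ends of the trend line to maximise its length.
print(gradient((0, 10), (4, 49)))  # 9.75
```

Keeping both end points explicit, including the zero, shows the reader exactly which range of the trend line was used.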

The graph on the left gives a gradient of 9 and the graph on the right gives a gradient of 9.75 yet
both methods of calculation are correct. The error formed here comes from the reading of the graph
and the measuring tools.

Let us say that the force meter has an error of ±0.5N. If the percentage error is calculated by taking
the instrument error over the total, the left graph would show:
$$\text{percentage error} = \frac{\text{error of measurement}}{\text{total measurement}} = \frac{0.5}{9} \approx 0.06 \ (6\%)$$
The percentage error on the right graph would show:
$$\text{percentage error} = \frac{\text{error of measurement}}{\text{total measurement}} = \frac{0.5}{39} \approx 0.013 \ (1.3\%)$$
By increasing the size of measurement we decrease the size of error. This is why teachers take marks
off for small ranges in gradients.
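
The comparison can be reproduced with a short Python sketch, using the ±0.5N instrument error and the two rises of 9 and 39 quoted above. The function name percentage_error is hypothetical.

```python
def percentage_error(instrument_error, total_measurement):
    """Instrument error as a percentage of the total measurement."""
    return instrument_error / total_measurement * 100


for label, rise in (("left", 9), ("right", 39)):
    print(f"{label} graph: {percentage_error(0.5, rise):.1f}%")
```

This prints 5.6% for the left graph (6% when rounded to a whole number) and 1.3% for the right graph, so the longer section of trend line carries roughly a quarter of the error.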

Curves of Best Fit

Data can present in a way that doesn’t fit a linear (straight) trend. In this case a curve of best fit can
and should be used. The rules for curves are unwritten but somehow enforced. The following is a set
of graphs that we will analyse and make some general rules that help encompass the overriding rule
of, “if it makes analysis valid and easier, then it is correct.”

In mathematics, this trend would be described as such: as the values in the x axis increase, the
values in the y axis increase, at an increasing rate. This is a valid interpretation of the data and the
curve fits the data best, so, you would have to say this is a good standard.
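
A worded description like this can be checked against values read off the curve at equally spaced x values. The sketch below uses first and second differences for that purpose; the readings are illustrative assumptions and describe_trend is a hypothetical helper.

```python
def describe_trend(ys):
    """Describe equally spaced readings using first and second differences."""
    first = [b - a for a, b in zip(ys, ys[1:])]
    second = [b - a for a, b in zip(first, first[1:])]
    if all(d > 0 for d in first):
        if all(d > 0 for d in second):
            return "increasing at an increasing rate"
        if all(d < 0 for d in second):
            return "increasing at a decreasing rate"
        return "increasing"
    return "no single direction"


# Illustrative values read off a curve at x = 1, 2, 3, 4, 5 (assumed).
print(describe_trend([1, 2, 4, 7, 11]))
```

It is meant for points read off the smooth curve rather than the raw data, since scatter in individual measurements would break the pattern of differences.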

We can see that, firstly, this curve is hard to draw and, secondly, a worded description of the trend
would be very difficult to write: as the values in the x-axis increase, the values in the y-axis increase
then decrease then increase again, at a changing rate…

This is not valid and does not make data analysis easy. We would have to say that there is something
not right in our data or our method. There is argument to say that this could be the correct trend
with a critical change occurring at a particular point. In high school, we tend not to go here so don’t
expect it.

This graph has a trend that is similar to the first curve of best fit above and is described the same
way. It does not touch all the points and it doesn’t have to! It actually shows clearly that there is a
data point that could be an outlier, or a measurement that had a significant percentage error. This
would prompt a repeat experiment in which more measurements are made at the ind. variable value
in question, in order to confirm it as an anomaly.
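
A simple way to pick out such a point is to compare each measurement with the value read off the curve at the same ind. variable. The sketch below flags any point whose distance from the curve is more than three times the typical distance; the factor of three, the data and the function name flag_outliers are all assumptions made for illustration.

```python
from statistics import median


def flag_outliers(measured, fitted, factor=3.0):
    """Return the positions of points that sit unusually far from the curve."""
    residuals = [abs(m - f) for m, f in zip(measured, fitted)]
    typical = median(residuals)
    return [i for i, r in enumerate(residuals)
            if typical > 0 and r > factor * typical]


# Illustrative values only (assumed): measurements and curve readings.
measured = [1.1, 2.1, 3.9, 9.5, 11.2, 16.1]
fitted = [1.0, 2.0, 4.0, 7.0, 11.0, 16.0]

print(flag_outliers(measured, fitted))  # [3]
```

A flagged position is only a candidate: the repeat measurements decide whether it is a true anomaly.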


