
Graphing Statistical Data

When a data set contains a large number of values, drawing conclusions from an ordered array or a
stem-and-leaf plot is often difficult, and graphs or charts are needed instead. Several graphs can be used
to display numerical data visually, including the histogram, the frequency polygon, and the cumulative
frequency polygon (ogive).

In this section, we discuss several graphical methods used for interval data. The most important of these
is the histogram, a powerful technique for summarizing interval data that also helps explain an important
aspect of probability.

Histogram. A histogram is a graph in which the classes are marked on the horizontal axis (x-axis) and the
class frequencies on the vertical axis (y-axis). The heights of the bars represent the class frequencies,
and the bars are drawn adjacent to each other. Note, however, that the histogram shows only the frequency
of each class and sacrifices whatever information is contained in the individual observations.
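As a rough illustration, the sketch below builds the class frequencies that a histogram plots as bar heights. It assumes hypothetical quiz scores and class limits (10-19, 20-29, ...), and it draws each bar as a row of asterisks instead of calling a plotting library, so the bars run horizontally rather than vertically. `class_frequencies` is an illustrative helper name, not a standard function.

```python
from typing import List, Tuple


def class_frequencies(data: List[int], classes: List[Tuple[int, int]]) -> List[int]:
    """Count the observations in each class; both class limits are inclusive."""
    return [sum(1 for x in data if lower <= x <= upper) for lower, upper in classes]


# Hypothetical quiz scores and class limits, for illustration only.
scores = [12, 15, 18, 21, 22, 25, 27, 28, 31, 33, 34, 36, 38, 41, 42, 45, 47]
classes = [(10, 19), (20, 29), (30, 39), (40, 49)]

freqs = class_frequencies(scores, classes)
for (lower, upper), f in zip(classes, freqs):
    # Each row is one class; the number of asterisks is the bar height (class frequency).
    print(f"{lower}-{upper} | {'*' * f} ({f})")
```

With these scores the class frequencies are 3, 5, 5, and 4. The individual scores (12, 15, 18, ...) no longer appear in the output, which is the loss of information a histogram accepts.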

Frequency Polygon. A frequency polygon is a graph that displays the data using points connected by line
segments. The frequencies are represented by the heights of the points plotted at the class midpoints. The
vertical axis represents the class frequencies, while the horizontal axis represents the class midpoints.
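A minimal sketch of the points a frequency polygon plots: each class midpoint, (lower limit + upper limit) / 2, paired with the class frequency. The classes and frequencies (3, 5, 5, 4) are hypothetical, and `polygon_points` is an illustrative helper name.

```python
from typing import List, Tuple


def polygon_points(classes: List[Tuple[int, int]], freqs: List[int]) -> List[Tuple[float, int]]:
    """Pair each class midpoint with its frequency; these are the polygon's vertices."""
    midpoints = [(lower + upper) / 2 for lower, upper in classes]
    return list(zip(midpoints, freqs))


# Hypothetical grouped frequency distribution.
classes = [(10, 19), (20, 29), (30, 39), (40, 49)]
freqs = [3, 5, 5, 4]

points = polygon_points(classes, freqs)
print(points)  # [(14.5, 3), (24.5, 5), (34.5, 5), (44.5, 4)]
```

Joining these points with line segments gives the polygon. Many textbooks also add a point with frequency 0 one class width below the first midpoint and one class width above the last, so the polygon closes on the horizontal axis; this sketch leaves that step out.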

Cumulative Frequency Polygon (Ogive). A cumulative frequency polygon or ogive (read as oh’-jive) is a
graph that displays the cumulative frequencies for the classes in a frequency distribution. The vertical
axis represents the cumulative frequency of the distribution while the horizontal axis represents the
upper class boundaries (real upper limits) of the frequency distribution.
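The ogive's points can be computed in a similar way. This sketch assumes hypothetical classes with whole-number limits, so each upper class boundary (real upper limit) is taken as the upper limit plus 0.5; the cumulative frequencies are running totals of the class frequencies. `ogive_points` is a hypothetical name.

```python
from itertools import accumulate
from typing import List, Tuple


def ogive_points(classes: List[Tuple[int, int]], freqs: List[int]) -> List[Tuple[float, int]]:
    """Pair each upper class boundary with the cumulative frequency up to that class.

    Assumes whole-number data, so the real upper limit is the upper limit + 0.5.
    """
    cumulative = accumulate(freqs)  # running totals: f1, f1 + f2, ...
    return [(upper + 0.5, c) for (_, upper), c in zip(classes, cumulative)]


# Hypothetical grouped frequency distribution.
classes = [(10, 19), (20, 29), (30, 39), (40, 49)]
freqs = [3, 5, 5, 4]

points = ogive_points(classes, freqs)
print(points)  # [(19.5, 3), (29.5, 8), (39.5, 13), (49.5, 17)]
```

The last cumulative frequency equals the total number of observations (17 here). Textbooks often also start the ogive at the lower boundary of the first class (9.5) with a cumulative frequency of 0.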

Pareto Chart. A Pareto chart is a graph used to represent a frequency distribution for categorical
(nominal-level) data. The frequencies are displayed by the heights of vertical bars, which are arranged in
order from highest to lowest.
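A Pareto chart needs only the category counts sorted from highest to lowest. A minimal sketch using Python's `collections.Counter`, whose `most_common()` method returns (category, count) pairs in that order; the complaint categories are hypothetical, and the bars are drawn as text rather than with a plotting library.

```python
from collections import Counter

# Hypothetical nominal-level data: the type of each customer complaint received.
complaints = [
    "late delivery", "damaged item", "late delivery", "wrong item",
    "late delivery", "damaged item", "billing error",
]

# most_common() sorts by count, highest first; ties keep first-seen order.
pareto = Counter(complaints).most_common()
for category, count in pareto:
    print(f"{category:<14} {'#' * count} ({count})")
```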

Bar Chart (Bar Graph). A bar chart is similar to a histogram. The bases of the rectangles are arbitrary
intervals whose centers are the category codes, and the height of each rectangle represents the frequency
of that category. Bar charts are used for categorical (nominal-level) data.

Pie Chart (Circle Graph). A pie chart is a circle divided into portions that represent the relative
frequencies (or percentages) of the data belonging to different categories. The data in a pie chart should
be categorical (nominal-level).
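For a pie chart, each category's slice has a central angle equal to its relative frequency times 360 degrees. The sketch below computes the percentage and the angle for each category; the survey responses are hypothetical and `pie_slices` is an illustrative name.

```python
from collections import Counter
from typing import Dict, List, Tuple


def pie_slices(observations: List[str]) -> Dict[str, Tuple[float, float]]:
    """Map each category to (percentage, central angle in degrees)."""
    counts = Counter(observations)
    n = len(observations)
    return {category: (100 * f / n, 360 * f / n) for category, f in counts.items()}


# Hypothetical responses from 20 students: usual way of getting to school.
responses = ["bus"] * 6 + ["car"] * 2 + ["walk"] * 4 + ["bicycle"] * 8

slices = pie_slices(responses)
for category, (percent, angle) in slices.items():
    print(f"{category:<8} {percent:5.1f}%  {angle:6.1f} degrees")
```

Here the bicycle slice is 40% of the circle (144 degrees), and the angles of all slices add up to 360 degrees.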

Time Series Graph. A time series graph represents data that occur over a specific period of time under
observation. It also shows trends or patterns of increase or decrease over that period.

Pictograph (Pictogram). A pictograph immediately suggests the nature of the data being shown. It combines
the attention-getting quality of pictures with the accuracy of a bar chart: appropriate pictures arranged in
a row (sometimes in a column) represent the quantities being compared.

Scatter Plot. A scatter plot is used to examine possible relationships between two numerical variables.
One variable is plotted on the x-axis and the other on the y-axis.

A z-score describes the position of one observation relative to the others in a data set. For example,
suppose we want to know how a student's score of 42 on a 50-point quiz compares with the scores of the
other students in the class. The mean and the standard deviation of the scores can be used to compute a
z-score, which measures the relative standing of a measurement in a data set.

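A z-score measures the distance between an observation and the mean, in units of standard deviation. The
following formulas show how to compute the z-score for a data value x in a population and in a sample:

Population: z = (x - μ) / σ, where μ is the population mean and σ is the population standard deviation.

Sample: z = (x - x̄) / s, where x̄ is the sample mean and s is the sample standard deviation.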
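Applying the z-score formula to the quiz example requires the class's scores, which the text does not give, so the sketch below uses a small hypothetical set of scores. It relies on the standard library's `statistics.stdev` (sample standard deviation, divisor n - 1) and `statistics.pstdev` (population standard deviation, divisor n); `z_score` is a hypothetical helper.

```python
from statistics import mean, pstdev, stdev
from typing import List


def z_score(x: float, data: List[float], sample: bool = True) -> float:
    """Distance of x from the mean of data, in standard deviation units.

    sample=True uses the sample standard deviation s; sample=False treats
    data as the whole population and uses sigma.
    """
    sd = stdev(data) if sample else pstdev(data)
    return (x - mean(data)) / sd


# Hypothetical scores on the 50-point quiz (mean = 38).
scores = [30, 34, 38, 42, 46]

z_sample = z_score(42, scores)                    # (42 - 38) / s
z_population = z_score(42, scores, sample=False)  # (42 - 38) / sigma
print(round(z_sample, 2), round(z_population, 2))  # 0.63 0.71
```

A positive z-score means the observation lies above the mean; with these made-up scores, 42 is about 0.63 sample standard deviations above the class mean.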

Regression analysis is a simple statistical tool used to model the dependence of a variable on one (or
more) explanatory variables. This functional relationship may then be formally stated as an equation,
with associated statistical values that describe how well the equation fits the data.

Simple linear regression is the least squares estimator of a linear regression model with a single
predictor (one independent variable). The least squares method determines the regression equation by
minimizing the sum of the squared vertical distances between the actual y-values and the predicted
y-values. In other words, simple linear regression fits a straight line through the set of n points in
such a way that the sum of squared residuals of the model is as small as possible. This method gives what is
generally known as the "best-fitting" line. The difference between an observed value and its predicted
value is called the residual, and the mean of the residuals is always zero. Points that fall outside the
overall pattern of the other points are known as outliers.
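The least squares line can be computed directly from the means of x and y. The sketch below uses the standard closed-form estimates b1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and b0 = ȳ - b1x̄, then checks that the residuals average to zero. The study-hours data are hypothetical and `least_squares` is an illustrative name.

```python
from typing import List, Tuple


def least_squares(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Return (b0, b1) for the least squares line y-hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx           # slope
    b0 = y_bar - b1 * x_bar  # intercept: the line passes through (x_bar, y_bar)
    return b0, b1


# Hypothetical data: hours studied (x) and quiz score (y).
hours = [1, 2, 3, 4, 5]
scores = [20, 26, 29, 36, 39]

b0, b1 = least_squares(hours, scores)
print(f"y-hat = {b0:.2f} + {b1:.2f}x")  # y-hat = 15.60 + 4.80x

# Residual = observed y minus predicted y.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(hours, scores)]
print([round(r, 2) for r in residuals])  # [-0.4, 0.8, -1.0, 1.2, -0.6]
mean_residual = sum(residuals) / len(residuals)  # 0 up to floating-point rounding
```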
In a scatterplot, points whose removal greatly changes the regression line are called influential points
(influential scores). In some cases, these are points with extreme x-values. An influential point may have a
small residual and still have a greater effect on the regression line than points with larger residuals but
average x-values.
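To see an influential point at work, the sketch below fits a least squares line with and without one hypothetical point at an extreme x-value. The five base points lie exactly on a line with slope 1; adding the point (20, 5) pulls the slope down to about 0.10, even though that point ends up with a smaller residual (about -0.53) than the point at x = 1 (about -1.70). The data and the `fit_line` helper are hypothetical.

```python
from typing import List, Tuple


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Least squares (b0, b1) for the line y-hat = b0 + b1 * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Hypothetical data: five points on a perfect line, plus one point with an extreme x-value.
x = [1, 2, 3, 4, 5]
y = [2, 3, 4, 5, 6]
x_all = x + [20]
y_all = y + [5]

b0_without, b1_without = fit_line(x, y)
b0_with, b1_with = fit_line(x_all, y_all)
print(f"slope without the extreme point: {b1_without:.3f}")  # 1.000
print(f"slope with the extreme point:    {b1_with:.3f}")     # 0.096

# Residuals of the fit that includes the extreme point; the last one belongs to (20, 5).
residuals = [yi - (b0_with + b1_with * xi) for xi, yi in zip(x_all, y_all)]
print([round(r, 2) for r in residuals])  # [-1.7, -0.8, 0.11, 1.01, 1.91, -0.53]
```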

The fitted regression line is ŷ = b0 + b1x, with slope b1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and intercept
b0 = ȳ - b1x̄, where:

ŷ (y-hat) = predicted or fitted value of y

x = the value of any particular observation of the independent variable

y = the value of any particular observation of the dependent variable

b1 = slope of the regression line

b0 = intercept of the regression line

x̄ = mean of the independent variable

ȳ = mean of the dependent variable
