
PERCENTILE AND PERCENTILE RANKS

- percentile - compares a person's rank with that of the reference group


- 96th percentile - the person scores higher than 96% of the reference group / sample
- i.e. 96% of the students score below this person
- a percentile is a score/value ( it gives a general idea of where someone or something is positioned in the group )

Quartile - division into 4 parts
Percentile - division into 100 parts
Decile - division into 10 parts

Percentile Ranks
•The PR of a given score is the number representing the percentage of cases in the group lying below that score, while the percentile is the score below which a given % of cases lies.
• The distinction between a percentile and a PR will be clear if one remembers that in calculating the percentile one starts with a certain percentage of N, then counts that percentage into the distribution, and the point reached is the required percentile.

An example of percentile - the 20th percentile is the value below which 20% of the observations may be found; in other words, an individual at the 20th percentile scores above 20% of the observations.
If a person with an IQ of 120 is at the 91st percentile, it indicates that the person's IQ is higher than that of 91% of the people in the group.

To find a percentile, one starts with a certain percentage of N and then counts that percentage into the distribution; the point reached is the required percentile.

The procedure - the procedure for computing a PR is the reverse of this. Here we begin with the individual score and determine the percentage of scores which lie below it. Eg - If this percentage is 20, then the score has a PR of 20.

Eg - If we find that P45 is 94.5 it means that 45 percent of the cases scored below 94.5.
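The two procedures can be sketched in Python for ungrouped scores. This is a minimal sketch under simplifying assumptions: the function names `percentile_rank` and `percentile` are hypothetical, ties are ignored, and no interpolation is done. Textbook formulas for grouped data interpolate within class intervals, so results can differ slightly from them.

```python
import math


def percentile_rank(scores, x):
    """PR of x: the percentage of cases in the group lying below x."""
    below = sum(1 for s in scores if s < x)
    return 100 * below / len(scores)


def percentile(scores, p):
    """P_p: start with p% of N, count that many cases into the sorted
    distribution and return the next score, so that (ignoring ties)
    p% of the cases lie below it."""
    ordered = sorted(scores)
    k = math.floor(p * len(ordered) / 100)    # number of cases counted in
    return ordered[min(k, len(ordered) - 1)]  # clamp so P100 stays in range


scores = [10, 20, 30, 40, 50]
print(percentile_rank(scores, 30))  # 40.0 (2 of the 5 cases lie below 30)
print(percentile(scores, 40))       # 30 (40% of N = 2 cases counted in)
```

Going from a percentage to a score (percentile) and from a score to a percentage (PR) act as inverse operations here. For these untied scores, `percentile_rank(scores, percentile(scores, 40))` gives back 40.0.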

Uses of percentile
- the most frequent application of percentiles is in psychological testing.
- standardised tests usually report norms in terms of percentile values of raw scores. The percentile system enables us to compare the standing of a student in various tests. Knowing that the student has scored 52 in maths and 55 on a science test doesn't help us understand or estimate their ability. But if we know that the two scores correspond to the 84th and the 92nd percentile respectively, we can compare their standings in the two subjects. Similarly, percentiles are useful in comparing one individual to the rest of the group.
- percentile norms are suitable for many types of tests, such as aptitude, achievement, personality and intelligence tests.

Limitations and disadvantages


- percentile norms have no meaning if the norm group or reference group is not taken into consideration. For example - if student A has a PR of 60 on a vocabulary test, it doesn't convey much unless the reference group is specified.
- percentile units are unequal in length. Percentile ranks crowd together near the middle of a distribution and spread out at its extremes, so equal differences in PR do not correspond to equal differences in score.
For eg - 1st - 39
2nd - 28
3rd - 27
- because of this inequality, it is not correct to combine the PRs of an individual on two or more tests to find an average percentile rank.
- percentiles are not well suited to the computation of the mean and correlation ( these can't be studied using percentiles ).
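The unequal size of percentile units can be illustrated with Python's `statistics.NormalDist`. For illustration only, this assumes that scores are normally distributed and expressed as z-scores. Equal one-SD steps in score then produce very unequal steps in PR, which is also why averaging PRs misleads.

```python
from statistics import NormalDist

# Assumption: scores follow a normal distribution, expressed as z-scores
normal = NormalDist(mu=0, sigma=1)


def pr(z):
    """Percentile rank of a z-score: percentage of cases lying below it."""
    return 100 * normal.cdf(z)


# Equal one-SD steps in score (0->1, 1->2, 2->3) give very unequal PR gains
gains = [pr(z + 1) - pr(z) for z in range(3)]
print([round(g, 1) for g in gains])  # [34.1, 13.6, 2.1]

# Averaging PRs (z = 0 on one test, z = 2 on another, same scale assumed)
# is not the same as the PR of the average score
print(round((pr(0) + pr(2)) / 2, 1))  # 73.9
print(round(pr(1), 1))                # 84.1
```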

A measure of variability is an indicator showing the extent to which scores tend to scatter or spread around a measure of central tendency. The quantitative measures of variability are the range, quartile deviation, average deviation and standard deviation. All these measures represent distances rather than points, and the larger they are, the greater the variability.
Quartile deviation
- quartiles are points which divide the distribution into four equal parts ( points on the number line )
Q1 is defined as the point below which 25 percent of the scores lie
Q2 is the median - 50%
Q3 - 75%
Q4 - 100%
The quartile deviation is one half the distance between the first quartile and the third quartile, and it can therefore also be called the semi-interquartile range.
Formula - Q = (Q3 - Q1) / 2 ( Q1 is located by counting N/4 cases into the distribution and Q3 by counting 3N/4 cases )
- quartiles are points and not quarters
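Here is a short sketch of the formula. It uses `statistics.quantiles` from Python's standard library to locate Q1, Q2 and Q3. That function interpolates between ungrouped scores (its default 'exclusive' method), which is an assumption here. It is not the grouped-data N/4 and 3N/4 counting procedure, so values may differ slightly from hand calculations.

```python
from statistics import quantiles


def quartile_deviation(scores):
    """Q = (Q3 - Q1) / 2, the semi-interquartile range."""
    q1, _median, q3 = quantiles(scores, n=4)  # three points dividing the scores into four parts
    return (q3 - q1) / 2


scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(quantiles(scores, n=4))      # [2.5, 5.0, 7.5]
print(quartile_deviation(scores))  # 2.5
```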

Merits of QD
- as it is independent of the values of extreme scores in a series or data set, Q is a more representative measure of variability than the range.
- quartiles are useful in indicating the skewness of a distribution. If the distribution is symmetrical, Q1 and Q3 are equidistant from Q2 (the median). When the distribution is skewed, the distances are unequal.
- as the quartile deviation measures the average distance of the quartile points from the median, it is a good index of the score density in the middle of the distribution. If the scores are packed closely together, the QD will be very small. If they are widely scattered, the QD will be large.
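The first two merits can be checked on small data sets that are hypothetical (made up for illustration). The sketch again uses `statistics.quantiles`, with the same interpolation caveat; the helper name `spread_summary` is illustrative.

```python
from statistics import quantiles


def spread_summary(scores):
    """Range, QD, and the distances of Q1 and Q3 from the median (Q2)."""
    q1, q2, q3 = quantiles(scores, n=4)
    return {
        "range": max(scores) - min(scores),
        "QD": (q3 - q1) / 2,
        "Q2-Q1": q2 - q1,
        "Q3-Q2": q3 - q2,
    }


# Hypothetical data: one extreme score changes the range but not the QD
print(spread_summary([1, 2, 3, 4, 5, 6, 7, 8, 9]))    # range 8, QD 2.5
print(spread_summary([1, 2, 3, 4, 5, 6, 7, 8, 900]))  # range 899, QD 2.5

# Hypothetical positively skewed data: Q3 lies much farther from Q2 than Q1 does
print(spread_summary([1, 2, 2, 3, 3, 3, 10, 15, 20]))  # Q2-Q1 = 1.0, Q3-Q2 = 9.5
```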

Limitations
- since it is based on the middle 50% of the scores, it gives no information about the upper and lower 25% of the scores
- like the median, the QDs of two distributions ( two different data sets ) cannot be combined
- methods of determining the quartile deviation of ungrouped data are not satisfactory.
- quartile measures computed from series which are too short to warrant grouping are of limited use

Range
Range is a rough or approximate indicator of variability, as two distributions can have the same range but very different variability (spread).
Group A   Group B
10        10
20        47
30        48
40        49
50        43
60        45
70        44
80        10
90

Therefore it is an unreliable measure and can't be used in every situation.


It should not be used when two distributions have a very different number of cases.
It shouldn't be used for comparing two distributions where the units of measurement are not the same. For example, if the heights of two groups are given in inches and centimetres respectively, comparing their ranges would be meaningless.
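To see why the range alone can mislead, the sketch below compares Group A from the table with a hypothetical Group B. These are made-up scores, not the Group B listed above, chosen to have exactly the same range but tightly bunched middle scores. The QD, computed with `statistics.quantiles` as an assumed method, reveals the difference.

```python
from statistics import quantiles


def score_range(scores):
    return max(scores) - min(scores)


def quartile_deviation(scores):
    q1, _median, q3 = quantiles(scores, n=4)
    return (q3 - q1) / 2


group_a = [10, 20, 30, 40, 50, 60, 70, 80, 90]  # Group A from the table
group_b = [10, 47, 48, 49, 50, 51, 52, 53, 90]  # hypothetical group with the same range

print(score_range(group_a), score_range(group_b))                # 80 80
print(quartile_deviation(group_a), quartile_deviation(group_b))  # 25.0 2.5
```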
AVERAGE DEVIATION
It is also called the mean deviation. It is the mean of the deviations of the scores in a series, taken from their mean.
- when a score coincides with the mean, its deviation is zero. The deviation of a score from the true mean is represented by 'x'
x = X - M ( X = the score, M = the mean )

AD - formula: AD = Σ|x| / N ( the sum of the absolute deviations divided by the number of scores )

- we do not take the signs of the deviations into consideration; we treat them as absolute values, that is, we disregard the direction and consider only the magnitude
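Here is a minimal sketch of the AD formula in Python (the function name is hypothetical). It also shows why the signs are dropped: signed deviations from the mean always sum to zero.

```python
def average_deviation(scores):
    """AD = sum(|x|) / N, where x = X - M is each score's deviation from the mean."""
    n = len(scores)
    mean = sum(scores) / n                           # M
    deviations = [score - mean for score in scores]  # x = X - M, signs kept
    return sum(abs(x) for x in deviations) / n       # signs dropped: magnitude only


scores = [2, 4, 6, 8, 10]
print(sum(s - 6 for s in scores))  # 0: signed deviations from the mean (6) cancel out
print(average_deviation(scores))   # 2.4
```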

merits
- it is the simplest measure of variability that takes into account all the fluctuations of the scores in a series.
- it is the most meaningful measure, even to people untrained in statistics

limitations
- as the AD is based upon the deviations of all the scores, it may be inflated by a single extreme score. But when a series is long, this limitation is not very serious.
- this measure is not commonly used in further mathematical work, which is why it is infrequently used in research. The standard deviation, which also takes into account all the scores, is the most stable index of variability and is customarily used.
CORRELATION
- A coefficient of correlation tells us to what extent two variables are related and how variation in one variable goes with variation in the other variable.
- knowledge of covariation makes prediction possible. In the natural sciences, correlation tends to be perfect or nearly perfect. For ex - a column of mercury in a thermometer rises or falls according to the temperature. Similarly, the relationship between the pressure and volume of a gas at a given temperature is perfect.
- In the biological and social sciences, the correlation is rarely perfect. For ex - height and weight vary together, though certainly not perfectly. Tall people generally weigh more than short people, but the correlation is not perfect because there are people who are tall and thin, or short and on the heavier side. Similarly, the relationship between IQ and academic performance is significant and positive but not perfect. (Perfect correlation = +1 or -1).

DEGREES OF CORRELATION
(Explain the different types of correlation with examples)
- the relation between two variables is expressed by an index known as the coefficient of correlation. It is used to represent the relationship between two variables. The index may take any value from -1 to +1.
- if two variables are perfectly and positively related, the coefficient will be +1. If there is a perfect negative relationship, the coefficient is -1. If there is absolutely no correlation between the variables, the coefficient is 0.
- if the relationship is positive but not perfect, the coefficient will be less than +1, e.g. .51, .81 etc. A correlation can also be negative, that is, a high score on one trait may be associated with a low score on another. In such a case the relationship is said to be inverse. Negative correlations which are not perfect lie between 0 and -1, for ex - -.24, -.89, -.91 etc. As the absolute value of the number increases, it indicates a greater degree of correlation, i.e. a stronger correlation. The strength/magnitude of a correlation is determined by how close the number is to 1, ignoring the sign: the closer it is to 1, the stronger it is.
- the sign of the coefficient indicates the direction (whether the relationship is positive or negative). For ex - school achievement and absence from school show a negative correlation. Self-confidence and submissiveness also show a negative correlation.

HOW DO WE INTERPRET THE CORRELATION COEFFICIENT?


A sum can be given for part a, and part b can be a six-mark theory question related to the sum.
When the coefficient is interpreted, two things are considered: the sign and the number. The sign of the coefficient indicates the direction of the relationship. A positive correlation coefficient indicates a direct relationship, i.e. there is a tendency for the two series to vary in the same direction, while a negative coefficient indicates an inverse relationship, i.e. the two variables tend to vary in opposite directions. A coefficient of -.82 denotes just as strong a relationship as a coefficient of +.82: +.82 indicates direct covariance and -.82 indicates inverse covariance. Both have equal predictive value.
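The point that -.82 and +.82 are equally strong can be checked numerically. The sketch below uses hypothetical data and an illustrative helper, `pearson_r`, which computes r as the mean product of z-scores. Reversing the direction of one variable flips the sign of r but leaves its magnitude unchanged.

```python
from statistics import fmean, pstdev


def pearson_r(xs, ys):
    """r as the mean of the products of paired z-scores (population SDs)."""
    mx, my = fmean(xs), fmean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    return fmean(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys))


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]                        # hypothetical scores that rise with x
r_direct = pearson_r(x, y)                 # about +0.8
r_inverse = pearson_r(x, [-v for v in y])  # about -0.8: same size, opposite sign
print(round(r_direct, 2), round(r_inverse, 2))  # 0.8 -0.8
```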
Another equally important but far more difficult thing to interpret is the magnitude or size of the coefficient. The size of the coefficient indicates the degree of closeness of the relationship, just as the sign indicates the direction of the relationship. The minimum coefficient is zero, which indicates no relationship whatsoever. From this minimum value, the coefficients increase in both directions until -1 is reached at one limit and +1 at the other. Both -1 and +1 indicate an equally close relationship and both are perfect. Their one important difference is their direction. Before one can describe a relationship as high or low, strong or weak, one has to decide why the coefficient was computed.
What would be a large coefficient for one purpose would be regarded as a moderate one for another. Interpretation is therefore largely a relative matter. A coefficient of +0.65 between an intelligence test and an achievement test in school may be considered very high, whereas the same value between scores on two parallel forms of the same test would be considered only moderate. Therefore it is important to consider the purpose of computing the coefficient while interpreting it.
SPEARMAN RHO and PEARSON R – RANK METHOD
Spearman's rank order method is the best method for computing the correlation when only the ranks are available or given to us. It is designated by the Greek letter rho (ρ). When there are no ties, rho is equal to r computed from the ranks. But when there are tied positions, the value of rho is slightly different from r. For practical purposes, rho may be taken as a close approximation of Pearson's r. The rank difference / rank order method provides a quick and convenient way of estimating the relationship between two variables, but there are certain differences between the two methods. The PPMM method deals with the size of the scores as well as with their positions in the series. The rank difference method, on the other hand, takes into account only the positions in the series. Eg, individuals who score 80, 78 and 60 on a test would be ranked 1st, 2nd and 3rd respectively, although the difference between 80 and 78 is much smaller than the difference between 78 and 60. Accuracy may be lost in translating scores into ranks, especially when there are a number of ties. In spite of its mathematical disadvantages, rho provides a quick and convenient way of estimating the correlation when N is small or when we only have ranks. Neither of the two methods can be used when the relationship between the two variables is non-linear.
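The rank-difference method can be sketched with the usual formula rho = 1 - 6ΣD² / (N(N² - 1)), where D is the difference between each person's two ranks. This is a minimal sketch that assumes there are no tied scores; tied ranks would need averaged ranks, which it does not handle. The data and the helper names are hypothetical.

```python
def ranks(scores):
    """Rank 1 = highest score. Assumes no ties."""
    ordered = sorted(scores, reverse=True)
    return [ordered.index(s) + 1 for s in scores]


def spearman_rho(xs, ys):
    """rho = 1 - 6 * sum(D^2) / (N * (N^2 - 1))."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


print(ranks([80, 78, 60]))  # [1, 2, 3]: equal rank steps despite gaps of 2 and 18 points

test_1 = [80, 78, 60, 55, 40]  # hypothetical scores on two tests
test_2 = [70, 75, 50, 45, 30]
print(round(spearman_rho(test_1, test_2), 2))  # 0.9
```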

Pearson's product-moment method (PPMM)


- It is the most widely used and the best measure of correlation. It was developed by the statistician Karl Pearson around 1900. The product-moment coefficient of correlation is essentially a ratio which expresses the degree to which changes in one variable are accompanied by changes in the second variable. For ex - we may want to find the correlation between scores on a reading test and a vocabulary test, that is, we want to determine the relationship between reading and vocabulary. The major assumption underlying the use of PPMM and the rank order method (the other method of calculation - Spearman) is the assumption of linearity, which means the relationship between the two sets of scores can be described by a straight line. If the data tend to follow some curve, then these methods are not applicable. When the relationship is linear, the graphical representation is a straight line. This happens when an increase in one variable is accompanied by a consistent increase / decrease in the other variable. Sometimes, however, the direction of the relationship may differ at different levels of the variable that we are assessing. For ex - physical strength tends to increase with age up to early adulthood, and then it begins to decline, especially from middle age to old age (non-linear). The relationship between anxiety and performance also tends to be non-linear. Most psychological tests, when administered to a large sample, display linear or approximately linear relationships.
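The sketch below writes the product-moment formula in deviation form, r = Σxy / √(Σx² Σy²), where x and y are deviations from the respective means (as with x = X - M in the average deviation section). The first data set is hypothetical reading and vocabulary scores. The second uses hypothetical age and strength scores that follow an inverted-U curve: the relationship is strong, yet r comes out near zero, which is why the linearity assumption matters.

```python
import math


def pearson_r(xs, ys):
    """Product-moment r from deviations: sum(xy) / sqrt(sum(x^2) * sum(y^2))."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    dx = [x - mx for x in xs]  # x = X - Mx
    dy = [y - my for y in ys]  # y = Y - My
    sxy = sum(a * b for a, b in zip(dx, dy))
    return sxy / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


# Hypothetical, roughly linear data: reading vs vocabulary scores
reading = [12, 15, 18, 20, 25]
vocabulary = [30, 34, 37, 42, 48]
print(round(pearson_r(reading, vocabulary), 2))  # 0.99

# Hypothetical inverted-U data: strength rises, then falls, with age
age = [10, 20, 30, 40, 50, 60, 70]
strength = [30, 50, 60, 62, 60, 50, 30]
print(abs(pearson_r(age, strength)) < 1e-9)  # True: r is about 0 despite a clear relationship
```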
REGRESSION

2 TYPES - simple and multiple regression


- definition - regression is defined as the analysis of the relationship among variables for the
purpose of understanding how one variable may predict another.
- regression is used for prediction; a predictive relationship by itself does not establish causation
- IV - the predictor variable (x)
- DV - the outcome variable (y)
- FORMULA --> y = a + bx
- a = intercept
- b = slope of the line
- x = predictor
- y = outcome
Purpose of regression - finding (predicting) the outcome

Simple regression
- only two variables - IV and DV
- involves one IV as the predictor variable and one DV as the outcome variable.
- simple regression analysis results in an equation for a regression line
- the regression line is the line of best fit, that is, the straight line that comes closest to the points on the scatterplot of x and y in the least-squares sense (it minimises the sum of the squared vertical distances of the points from the line) (FORMULA --> y = a + bx). This equation is the equation of a straight line and also the equation of the regression line. In the formula, a and b are the regression coefficients: a stands for the intercept, that is, the constant indicating where the line crosses the y-axis, and b stands for the slope of the line.
- the regression line, represented by specific values of a and b, is fitted to the points of the scatterplot. The values of a and b can be determined through simple algebraic calculation, that is, through the formula
- the primary use of regression in testing is to predict one score or variable from another.
Example - suppose a dean at a college of dentistry wishes to predict the GPA an applicant might have after the first year of college. He or she would accumulate data on current students' scores on the college entrance exam and their end-of-first-year GPA. These data would then be used to predict the GPA (y - outcome) from the score on the dental college entrance exam (x - predictor).
GPA = a + b(entrance exam score)
- using the regression line, the likely value of y, that is the GPA, can be predicted for specific values of x (college entrance exam score) by plugging the x value into the equation
- with a = 0.82 and b = 0.03, a student with an entrance exam score of 50 would be expected to have a GPA of about 2.3 (0.82 + 0.03 × 50 = 2.32)
- a student with an entrance exam score of 85 would be expected to have a GPA of 3.37 (0.82 + 0.03 × 85 = 3.37)
- this prediction can also be done graphically by tracing a particular entrance exam score on the x-axis up to the regression line and then straight across to the y-axis (predicted GPA). Of course, students who get an entrance exam score of 50 do not all get the same GPA; this is called error in prediction. Each of these students would be predicted to get the same GPA based on the entrance exam score, but they obtain different GPAs. This error in the prediction of 'y' from 'x' is represented by the standard error of estimate.
- the higher the correlation between x and y, the more accurate the prediction and the smaller the standard error of estimate.
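The steps in simple regression can be sketched in Python. The fitting function uses the standard least-squares formulas, b = Σxy / Σx² (x and y as deviations from the means) and a = My - bMx, and the small data set it is tried on is hypothetical. The prediction step uses a = 0.82 and b = 0.03 from the dean example. The standard error of estimate uses the common textbook formula SE = SDy√(1 - r²), with a hypothetical GPA standard deviation of 0.5. All function names are illustrative.

```python
import math


def fit_line(xs, ys):
    """Least-squares a and b for y = a + bx."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx    # slope
    a = my - b * mx  # intercept: the line passes through (Mx, My)
    return a, b


def predict(x, a, b):
    """Plug a predictor value into y = a + bx."""
    return a + b * x


def standard_error_of_estimate(sd_y, r):
    """Typical size of prediction errors; shrinks to 0 as r approaches +1 or -1."""
    return sd_y * math.sqrt(1 - r ** 2)


# Hypothetical entrance exam scores (x) and first-year GPAs (y)
a, b = fit_line([40, 50, 60, 70, 80], [2.0, 2.4, 2.6, 3.1, 3.3])
print(round(a, 2), round(b, 3))  # 0.7 0.033

# Dean example from the notes: a = 0.82, b = 0.03
print(round(predict(50, 0.82, 0.03), 2))  # 2.32
print(round(predict(85, 0.82, 0.03), 2))  # 3.37

# Higher correlation -> smaller standard error of estimate (hypothetical SD of GPA = 0.5)
for r in (0.0, 0.5, 0.8):
    print(r, round(standard_error_of_estimate(0.5, r), 3))
```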
MULTIPLE REGRESSION (12m; 6m for each type)
Suppose the dean suspects that the prediction of GPA will be enhanced by another test score, for example, a score on a test of fine motor skills, which is also used as a predictor. The use of more than one score to predict 'y' requires the use of multiple regression.
Multiple regression takes into account the intercorrelations among all the variables involved. Here, the correlations between the predictor scores and what is being predicted are the correlations of the entrance exam and of the fine motor skills test with GPA in the first year of dental college.
Predictors that correlate highly with the predicted variable are generally given more weight. This means that their regression coefficients are larger. The multiple regression equation also takes into account the correlations among the predictor scores. In this case, it takes into account the correlation between the dental college entrance exam scores and the scores on the fine motor skills test. If many predictors are used, and one is not correlated with any of the others but is correlated with the predicted scores, then that predictor is given relatively more weight because it is providing unique information.
In contrast, if two predictor scores are highly correlated with each other, they could be providing redundant information. If both were kept in the regression equation, each might be given less weight so that they would share the prediction of "y". More predictor variables are not necessarily better. If two predictors are providing the same information, the person using the regression equation may decide to use only one of them for the sake of efficiency.
Example: If the dean observes that the dental college entrance exam scores and the scores on the test of fine motor skills are highly correlated with each other, and that each of these scores correlates about the same with GPA, the dean might decide to use only one predictor variable, because nothing is gained by adding the second. On the other hand, if the second predictor (FMS) is highly correlated with GPA but not highly correlated with the entrance exam score, then it would also be used as a predictor, because the information it provides is different from that given by the entrance exam scores.
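The effect of redundancy on the weights can be sketched with the standard two-predictor formulas for standardized regression weights (beta weights): β1 = (ry1 - ry2·r12) / (1 - r12²), β2 = (ry2 - ry1·r12) / (1 - r12²), and R² = β1·ry1 + β2·ry2. These formulas are not given in the notes, and the correlations used below are hypothetical. Predictor 1 stands for the entrance exam, predictor 2 for the fine motor skills test, and y for first-year GPA.

```python
def two_predictor_weights(r_y1, r_y2, r_12):
    """Standardized weights and R^2 for predicting y from two predictors."""
    denom = 1 - r_12 ** 2
    beta_1 = (r_y1 - r_y2 * r_12) / denom
    beta_2 = (r_y2 - r_y1 * r_12) / denom
    r_squared = beta_1 * r_y1 + beta_2 * r_y2  # proportion of variance in y predicted
    return beta_1, beta_2, r_squared


# Hypothetical: each predictor correlates .60 with GPA
unique = two_predictor_weights(0.6, 0.6, 0.0)     # predictors unrelated to each other
redundant = two_predictor_weights(0.6, 0.6, 0.9)  # predictors highly intercorrelated

print([round(v, 2) for v in unique])     # [0.6, 0.6, 0.72]
print([round(v, 2) for v in redundant])  # [0.32, 0.32, 0.38]
print(round(0.6 ** 2, 2))                # 0.36: R^2 with the entrance exam alone
```

When the predictors are unrelated, the second one doubles the predicted variance (R² rises from .36 to .72). When they are highly intercorrelated, each weight drops to about .32 and the second predictor adds only about .02 to R², which matches the redundancy described above.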
