Quartile - division into 4
Percentile - division into 100
Decile - division into 10
Percentile Ranks
• The PR of a given score is the percentage of cases in the group lying below that score, while a percentile is the score below which a given percentage of cases lies.
• The distinction between a percentile and a PR will be clear if one remembers that, in calculating a percentile, one starts with a certain percentage of N, counts that many cases into the distribution, and the point reached is the required percentile.
For example, the 20th percentile is the value below which 20% of the observations may be found. Equivalently, an individual scoring at the 20th percentile scores above 20% of the observations.
If a person with an IQ of 120 is at the 91st percentile, the person's IQ is higher than that of 91% of the people in the group.
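A quick numerical check of this example, assuming (this is not stated in the notes) that IQ scores are normally distributed with mean 100 and standard deviation 15. The percentage of the group below 120 is then the area under the normal curve to the left of 120, which Python's standard-library `statistics.NormalDist` can compute.

```python
from statistics import NormalDist

# Assumption: IQ ~ Normal(mean = 100, SD = 15); the notes do not state this.
iq_scale = NormalDist(mu=100, sigma=15)

# Proportion of the group scoring below 120 (area to the left of 120).
proportion_below = iq_scale.cdf(120)
pr_120 = round(100 * proportion_below)

print(round(proportion_below, 3))  # about 0.909
print(pr_120)                      # 91
```

Under that assumption z = (120 - 100) / 15, about 1.33, and roughly 91% of the group falls below 120, consistent with the 91st percentile. A test scaled with a different mean or SD would give a different figure.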
The procedure for computing a PR is the reverse of the procedure for computing a percentile. Here we begin with the individual score and determine the percentage of scores that lie below it. E.g. if this percentage is 20, the score has a PR of 20.
E.g. if we find that P45 is 94.5, it means that 45 percent of the cases scored below 94.5.
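A minimal Python sketch of the two procedures, assuming a small hypothetical set of ungrouped scores. `percentile_rank` goes from a score to the percentage of cases below it; `percentile` goes the other way, counting p% of N into the sorted distribution. The step rule used in `percentile` (take the lowest observed score with at least p% of the cases below it) is an assumption for ungrouped data. Textbook formulas for grouped data interpolate within class intervals and can give slightly different values.

```python
def percentile_rank(scores, score):
    """PR: percentage of cases lying strictly below `score`."""
    below = sum(1 for s in scores if s < score)
    return 100 * below / len(scores)


def percentile(scores, p):
    """P_p: lowest observed score with at least p% of the cases below it.

    Simple step rule for ungrouped data (an assumption). Returns None when
    no observed score has that many cases below it (e.g. p = 100).
    """
    needed = p * len(scores) / 100  # count p% of N into the distribution
    for s in sorted(set(scores)):
        if sum(1 for t in scores if t < s) >= needed:
            return s
    return None


scores = [55, 60, 62, 65, 70, 72, 75, 80, 85, 90]  # hypothetical scores, N = 10
print(percentile(scores, 60))        # 75: six of the ten cases lie below 75
print(percentile_rank(scores, 75))   # 60.0: the reverse procedure
print(percentile(scores, 20))        # 62
```

On these data the two functions undo each other (P60 = 75 and the PR of 75 is 60), which is the reverse relationship described above.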
Uses of percentiles
- the most frequent application of percentiles is in psychological testing.
- standardised tests usually report norms in terms of the percentile values of raw scores. The percentile system enables us to compare the standing of a student across various tests. Knowing that a student has scored 52 on a maths test and 55 on a science test does not by itself help us understand or estimate their ability. But if we know that the two scores correspond to the 84th and the 92nd percentile respectively, we can compare their standings in the two subjects. Similarly, percentiles are useful in comparing one individual to the rest of the group.
- percentile norms are suitable for many types of tests such as aptitude, achievement,
personality, and intelligence
A measure of variability is an indicator of the extent to which scores tend to scatter or spread around a measure of central tendency. The quantitative measures of variability are the range, quartile deviation, average deviation and standard deviation. All these measures represent distances rather than points, and the larger they are, the greater the variability.
Quartile deviation
- quartiles are points on the score scale (the number line) which divide the distribution into four equal parts.
Q1 is defined as the point below which 25 percent of the scores lie
Q2 is the median - 50%
Q3 - 75%
Q4 - 100%
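The quartile deviation (Q) is one half the distance between the first quartile and the third quartile, and therefore it is also called the semi-interquartile range:
Q = (Q3 - Q1) / 2
where Q1 is the point below which N/4 of the cases lie and Q3 is the point below which 3N/4 of the cases lie.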
- quartiles are points and not quarters
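A minimal Python sketch of Q = (Q3 - Q1) / 2 for ungrouped, hypothetical scores. It uses the standard-library `statistics.quantiles` with `method="inclusive"` to locate Q1, Q2 and Q3. That interpolation rule is one of several conventions and is assumed here, so the points may differ slightly from those given by grouped-data formulas based on N/4 and 3N/4.

```python
from statistics import quantiles


def quartile_deviation(scores):
    """Return Q1, Q2 (median), Q3 and the semi-interquartile range Q."""
    # 'inclusive' linear interpolation is an assumed convention for ungrouped data.
    q1, q2, q3 = quantiles(scores, n=4, method="inclusive")
    return q1, q2, q3, (q3 - q1) / 2


q1, q2, q3, qd = quartile_deviation([10, 20, 30, 40, 50, 60, 70, 80, 90])
print(q1, q2, q3, qd)  # 30.0 50.0 70.0 20.0
```

The function returns three points (Q1, Q2, Q3), not quarters of the data, in line with the note that quartiles are points.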
Merits of QD
- as it is independent of the extreme scores in a series or data set, Q is a more representative measure of variability than the range.
- quartiles are useful in indicating the skewness of a distribution. If the distribution is symmetrical, Q1 and Q3 are equidistant from Q2 (the median). When the distribution is skewed, these distances are unequal.
- as Q measures the average distance of the quartile points from the median, it is a good index of score density in the middle of the distribution. If the scores are packed closely together, the QD will be very small; if they are widely scattered, the QD will be large.
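The skewness point can be illustrated with a short Python sketch on hypothetical data, using the same assumed `statistics.quantiles` interpolation rule. For a symmetrical set of scores Q2 - Q1 equals Q3 - Q2; for a set with a long upper tail, the upper distance is larger.

```python
from statistics import quantiles


def quartile_distances(scores):
    """Return (Q2 - Q1, Q3 - Q2); these are equal for a symmetrical distribution."""
    q1, q2, q3 = quantiles(scores, n=4, method="inclusive")
    return q2 - q1, q3 - q2


symmetric = [10, 20, 30, 40, 50, 60, 70, 80, 90]
skewed = [10, 12, 14, 16, 18, 20, 30, 50, 90]  # hypothetical, long upper tail

print(quartile_distances(symmetric))  # (20.0, 20.0)
print(quartile_distances(skewed))     # (4.0, 12.0): positive skew
```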
Limitations
- since it is based on the middle 50% of the scores, it gives no information about the upper and lower 25% of the scores
- like the median, the QDs of two distributions (two different data sets) cannot be combined
- methods of determining the quartile deviation of ungrouped data are not satisfactory
- quartile measures computed from series too short to warrant grouping are of limited use
Range
Range is the difference between the highest and the lowest score. It is a rough or approximate indicator of variability, as two distributions can have the same range but very different variability (spread).
Group A   Group B
10        10
20        47
30        48
40        49
50        43
60        45
70        44
80        10
90
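The point that the range is only a rough indicator can be checked with a short Python sketch. `group_a` is the Group A column above; `group_b` is a hypothetical set (not the Group B column above) built to have the same range of 80 but with most scores bunched near the middle. The range comes out identical, while the quartile deviation (same assumed `statistics.quantiles` rule) and the standard deviation (population form, `statistics.pstdev`) show the difference in spread.

```python
from statistics import pstdev, quantiles


def score_range(scores):
    """Range: highest score minus lowest score."""
    return max(scores) - min(scores)


def quartile_deviation(scores):
    """Q = (Q3 - Q1) / 2, using an assumed interpolation convention."""
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    return (q3 - q1) / 2


group_a = [10, 20, 30, 40, 50, 60, 70, 80, 90]  # evenly spread
group_b = [10, 48, 49, 50, 50, 50, 51, 52, 90]  # hypothetical: bunched, two extremes

for name, g in (("A", group_a), ("B", group_b)):
    print(name, score_range(g), quartile_deviation(g), round(pstdev(g), 2))
# A 80 20.0 25.82
# B 80 1.0 18.89
```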
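Average deviation (AD)
AD = Σ|x| / N, where x = X - M is the deviation of each score X from the mean M, and N is the number of scores
- we do not take the signs of the deviations into consideration; we treat them as absolute values, that is, we disregard the direction and consider only the magnitude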
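A minimal Python sketch of the AD formula, with deviations taken from the mean and hypothetical scores. It also shows why the signs are disregarded: signed deviations from the mean always sum to zero, so they would give no measure of spread.

```python
def average_deviation(scores):
    """AD = sum(|X - M|) / N: mean absolute deviation from the mean M."""
    n = len(scores)
    mean = sum(scores) / n
    return sum(abs(x - mean) for x in scores) / n


scores = [2, 4, 6, 8, 10]  # hypothetical scores
mean = sum(scores) / len(scores)
print(sum(x - mean for x in scores))         # 0.0: signed deviations cancel out
print(average_deviation(scores))             # 2.4
print(average_deviation([2, 4, 6, 8, 50]))   # 14.4: one extreme score inflates AD
```

The last line anticipates the first limitation below: replacing a single score with an extreme one raises the AD from 2.4 to 14.4 in this short series.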
merits
- it is the simplest measure of variability that takes into account all the fluctuations of the scores in a series.
- it is the most meaningful measure even to people untrained in statistics
limitations
- as the AD is based upon the deviations of all the scores, it may be inflated by a single extreme score, but when a series is long this limitation is not very serious
- because it ignores the signs of the deviations, the AD does not lend itself readily to further mathematical treatment, which is why it is infrequently used in research. The standard deviation, which also takes into account all the scores, is the most stable index of variability and is customarily used.
CORRELATION
- a coefficient of correlation tells us to what extent two variables are related, that is, how variation in one variable goes with variation in the other
- knowledge of covariation makes prediction possible. In the natural sciences, correlation tends to be perfect or nearly perfect. For example, a column of mercury in a thermometer rises or falls according to the temperature. Similarly, the relationship between the pressure and volume of a gas at a given temperature is perfect.
- in the biological and social sciences, correlation is rarely perfect. For example, height and weight vary together, though certainly not perfectly. Tall people generally weigh more than short people, but the correlation is not perfect because some people are tall and thin, or short and heavy. Similarly, the relationship between IQ and academic performance is significant and positive but not perfect. (Perfect correlation = +1 or -1.)
DEGREES OF CORRELATION
(Explain the different types of correlation with examples)
- the relation between two variables is expressed by an index known as the coefficient of correlation. The index may take any value from -1 to +1.
- if two variables are perfectly and positively related, the coefficient is +1. If the relationship is perfectly negative, the coefficient is -1. If there is absolutely no correlation between the variables, the coefficient is 0.
- if the relationship is positive but not perfect, the coefficient will be less than 1, e.g. .51, .81. A correlation can also be negative, that is, a high score on one trait may be associated with a low score on another; in such a case the relationship is said to be inverse. Negative correlations that are not perfect lie between 0 and -1, e.g. -.24, -.89, -.91. The strength (magnitude) of a correlation is given by its absolute value, ignoring the sign: the closer the number is to 1 (or -1), the stronger the correlation.
- the sign of the coefficient indicates the direction (positive or negative). For example, school achievement and absence from school show a negative correlation, as do self-confidence and submissiveness.
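The notes do not name a particular coefficient; the most common one is Pearson's product-moment r, sketched below in Python for small hypothetical data sets. It computes r = Σxy / √(Σx²·Σy²) on deviation scores (x = X - Mx, y = Y - My) and shows a perfect positive, a perfect negative and a strong but imperfect correlation. Python 3.10+ also provides `statistics.correlation`, which computes the same quantity.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson r from deviation scores: sum(xy) / sqrt(sum(x^2) * sum(y^2))."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    dx = [x - mx for x in xs]  # deviations of X from its mean
    dy = [y - my for y in ys]  # deviations of Y from its mean
    sxy = sum(a * b for a, b in zip(dx, dy))
    return sxy / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


x = [1, 2, 3, 4, 5]  # hypothetical data
print(pearson_r(x, [3, 5, 7, 9, 11]))  # 1.0  (perfect positive)
print(pearson_r(x, [10, 8, 6, 4, 2]))  # -1.0 (perfect negative)
print(pearson_r(x, [2, 1, 4, 3, 5]))   # 0.8  (strong, not perfect)
```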
Simple regression
- only two variables - IV and DV
- involves one IV as the predictor variable and one DV as the outcome variable.
- simple regression analysis results in an equation for a regression line
- the regression line is the line of best fit, that is, the straight line that comes closest to the points on the scatterplot of x and y in the least-squares sense (it minimizes the sum of the squared vertical distances of the points from the line). Formula: y = a + bx. This is the equation of a straight line and also the equation of the regression line. In the formula, a and b are the regression coefficients: a is the intercept, a constant indicating where the line crosses the y-axis, and b is the slope of the line.
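- the regression line, represented by specific values of a and b, is fitted to the points of the scatterplot. The values of a and b can be determined through simple algebraic calculation, using the least-squares formulas b = r(sy/sx) and a = My - bMx, where r is the correlation between x and y, sx and sy are their standard deviations, and Mx and My are their means.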
- the primary use of regression in testing is to predict one score or variable from another.
Example - suppose the dean of a college of dentistry wishes to predict the GPA an applicant might have after the first year of college. He or she would accumulate data about current students' scores on the college entrance exam and their end-of-first-year GPA. These data would then be used to predict GPA (y, the outcome) from the score on the dental college entrance exam (x, the predictor).
GPA = a + b(entrance exam score)
- using the regression line, the likely value of y (GPA) can be predicted for a specific value of x (entrance exam score) by plugging the x value into the equation. With a = 0.82 and b = 0.03:
- a student with an entrance exam score of 50 would be expected to have a GPA of 0.82 + 0.03(50) = 2.32, or about 2.3
- a student with an entrance exam score of 85 would be expected to have a GPA of 0.82 + 0.03(85) = 3.37
- this prediction can also be done graphically: locate the particular entrance exam score on the x-axis, trace it up to the regression line, and then go straight across to the y-axis (predicted GPA). Of course, the students who get an entrance exam score of 50 do not all get the same GPA; this is called error in prediction. Each of these students would be predicted to get the same GPA based on the entrance exam score, but they actually obtain different GPAs. This error in the prediction of y from x is represented by the standard error of estimate.
- the higher the correlation between x and y, the more accurate the prediction and the smaller the standard error of estimate.
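A minimal Python sketch of the simple-regression steps above. The data are small hypothetical numbers, not the dean's data. `fit_line` uses the least-squares formulas in deviation form (b = Σxy / Σx², which equals r(sy/sx); a = My - bMx). `standard_error_of_estimate` takes the root mean square of the prediction errors with N in the denominator; some texts divide by N - 2 instead. The last lines reproduce the two GPA predictions from the example coefficients a = 0.82 and b = 0.03.

```python
from math import sqrt


def fit_line(x, y):
    """Least-squares intercept a and slope b for y = a + bx."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx      # slope
    a = my - b * mx    # intercept: the line passes through (Mx, My)
    return a, b


def standard_error_of_estimate(x, y, a, b):
    """Root mean square of the prediction errors (N in the denominator)."""
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    return sqrt(sum(e * e for e in residuals) / len(residuals))


# Hypothetical toy data (x = predictor, y = outcome); here r = 0.8.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
a, b = fit_line(x, y)
see = standard_error_of_estimate(x, y, a, b)
print(round(a, 2), round(b, 2), round(see, 3))  # 0.6 0.8 0.849

# Perfectly correlated data: every point lies on the line, so the error is 0.
x_perfect, y_perfect = [1, 2, 3, 4, 5], [3, 5, 7, 9, 11]
a_p, b_p = fit_line(x_perfect, y_perfect)
print(round(standard_error_of_estimate(x_perfect, y_perfect, a_p, b_p), 3))  # 0.0

# Predictions with the example's coefficients (a = 0.82, b = 0.03).
for score in (50, 85):
    print(score, round(0.82 + 0.03 * score, 2))  # 50 2.32, then 85 3.37
```

The two data sets illustrate the last point above: with r = 0.8 the standard error of estimate is about 0.85, and with r = 1 it drops to zero.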
MULTIPLE REGRESSION (12m; 6m for each type)
Suppose the dean suspects that the prediction of GPA would be improved by another test score, for example a score on a test of fine motor skills, which is then also used as a predictor. The use of more than one score to predict y requires multiple regression.
Multiple regression takes into account the intercorrelations among all the variables involved. Here, the correlations between the predictor scores and what is being predicted are the correlations of the entrance exam and of the fine motor skills test with GPA in the first year of dental college.
Predictors that correlate highly with the predicted variable are generally given more weight, which means that their regression coefficients are larger. The multiple regression equation also takes into account the correlations among the predictor scores, in this case the correlation between the dental college entrance exam scores and the scores on the fine motor skills test. If many predictors are used, and one is not correlated with any of the others but is correlated with the predicted scores, that predictor is given relatively more weight because it is providing unique information.
In contrast, if two predictor scores are highly correlated with each other, they could be providing redundant information. If both were kept in the regression equation, each might be given less weight so that they would share the prediction of y. More predictor variables are not necessarily better: if two predictors are providing the same information, the person using the regression equation may decide to use only one of them for the sake of efficiency.
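For two predictors the weights can be written in terms of the three correlations. The Python sketch below uses the standard formulas for standardized (z-score) regression weights, β1 = (ry1 - ry2·r12) / (1 - r12²) and β2 = (ry2 - ry1·r12) / (1 - r12²), with hypothetical correlations in which each predictor correlates 0.6 with GPA. When the predictors are uncorrelated, each keeps its full weight; when they are highly intercorrelated, the weights shrink and the two share the prediction of y.

```python
def beta_weights(r_y1, r_y2, r_12):
    """Standardized regression weights for two predictors.

    r_y1, r_y2: correlation of each predictor with the outcome y.
    r_12: correlation between the two predictors (must not be +1 or -1).
    """
    denom = 1 - r_12 ** 2
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    return beta1, beta2


# Hypothetical: entrance exam and fine motor skills each correlate 0.6 with GPA.
print([round(w, 3) for w in beta_weights(0.6, 0.6, 0.0)])  # [0.6, 0.6]: unrelated predictors
print([round(w, 3) for w in beta_weights(0.6, 0.6, 0.8)])  # [0.333, 0.333]: redundant predictors
```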
Example: if the dean observes that the dental college entrance exam scores and the scores on the test of fine motor skills are highly correlated with each other, and that each of these scores correlates about the same with GPA, the dean might decide to use only one predictor variable, because nothing is gained by adding the second. On the other hand, if the second predictor (fine motor skills) is highly correlated with GPA but not highly correlated with the entrance exam score, it would also be used as a predictor, because the information it provides is different from that given by the entrance exam scores.
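The dean's decision can be framed in terms of R², the proportion of variance in GPA accounted for by the predictors. This Python sketch uses the standard two-predictor formula R² = (ry1² + ry2² - 2·ry1·ry2·r12) / (1 - r12²) with hypothetical correlations. Adding a second predictor that is highly correlated with the first barely raises R²; adding one that is nearly uncorrelated with the first raises it substantially.

```python
def r_squared_two_predictors(r_y1, r_y2, r_12):
    """Squared multiple correlation R^2 for two predictors, from the three correlations."""
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


r_y1 = 0.6           # hypothetical: entrance exam with GPA
r_y2 = 0.6           # hypothetical: fine motor skills test with GPA
single = r_y1 ** 2   # R^2 using the entrance exam alone

for r_12 in (0.9, 0.1):  # predictors highly vs weakly intercorrelated
    both = r_squared_two_predictors(r_y1, r_y2, r_12)
    print(r_12, round(single, 3), round(both, 3), round(both - single, 3))
```

With these assumed values, R² rises from 0.36 to about 0.379 when r12 = 0.9 (little is gained), but to about 0.655 when r12 = 0.1, matching the reasoning in the example.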