
RAUF AHMED (BS 3rd YEAR MORNING)

RESEARCH METHODOLOGY
ASSIGNMENT SUBMITTED TO SIR SHAHAN

VALIDITY AND RELIABILITY


Reliability refers to the consistency of a measure. Psychologists consider three
types of consistency: over time (test-retest reliability), across items (internal
consistency), and across different researchers (inter-rater reliability).

Test-Retest Reliability

When researchers measure a construct that they assume to be consistent across
time, the scores they obtain should also be consistent across time. This is
what test-retest reliability means. For example, intelligence is generally
thought to be stable over time: a person who is highly intelligent today will
be highly intelligent next week, so a good measure of intelligence should give
that person nearly the same score next week as it does today. Clearly, a
measure that produces highly inconsistent scores over time cannot be a good
measure of a construct that is supposed to be stable.

Assessing test-retest reliability requires using the measure on a group of
people at one time, using it again on the same group of people at a later
time, and then looking at the test-retest correlation between the two sets of
scores. This is typically done by graphing the data in a scatterplot and
computing the correlation coefficient. Figure 4.2 shows the correlation
between two sets of scores of several university students on the Rosenberg
Self-Esteem Scale, administered two times a week apart. The correlation
coefficient for these data is +.95. In general, a test-retest correlation of
+.80 or greater is considered to indicate good reliability.
Figure 4.2 Test-Retest Correlation Between Two Sets of Scores of Several College Students on the
Rosenberg Self-Esteem Scale, Given Two Times a Week Apart
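A test-retest correlation of this kind is simply a Pearson correlation between the two administrations. As a minimal sketch, one could compute it as follows; the five students' scores below are invented for illustration, not the data behind Figure 4.2:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

# Hypothetical self-esteem scores for five students, one week apart
time1 = [22, 25, 18, 30, 27]
time2 = [21, 26, 19, 29, 26]
print(round(pearson_r(time1, time2), 2))  # ~.98 for these invented scores
```

A value at or above +.80, as here, would be taken as evidence of good test-retest reliability.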

Again, high test-retest correlations make sense when the construct being
measured is assumed to be consistent over time, which is the case for
intelligence, self-esteem, and the Big Five personality dimensions. But other
constructs are not assumed to be stable over time. The very nature of mood,
for example, is that it changes. So a measure of mood that produced a low
test-retest correlation over a period of a month would not be a cause for
concern.

Internal Consistency

Another kind of reliability is internal consistency, which is the consistency
of people's responses across the items on a multiple-item measure. In general,
all the items on such measures are supposed to reflect the same underlying
construct, so people's scores on those items should be correlated with each
other. On the Rosenberg Self-Esteem Scale, people who agree that they are a
person of worth should tend to agree that they have a number of good
qualities. If people's responses to the different items are not correlated
with each other, then it no longer makes sense to claim that they all measure
the same underlying construct. This is as true for behavioural and
physiological measures as for self-report measures. For example, people might
make a series of bets in a simulated game of roulette as a measure of their
level of risk seeking. This measure would be internally consistent to the
extent that individual participants' bets were consistently high or low across
trials.

Like test-retest reliability, internal consistency can only be assessed by
collecting and analyzing data. One approach is to look at a split-half
correlation. This involves splitting the items into two sets, such as the
first and second halves of the items or the even- and odd-numbered items. Then
a score is computed for each set of items, and the relationship between the
two sets of scores is examined. For example, Figure 4.3 shows the split-half
correlation between several university students' scores on the even-numbered
items and their scores on the odd-numbered items of the Rosenberg Self-Esteem
Scale. The correlation coefficient for these data is +.88. A split-half
correlation of +.80 or greater is generally considered good internal
consistency.

Figure 4.3 Split-Half Correlation Between Several College Students’ Scores on the Even-Numbered Items
and Their Scores on the Odd-Numbered Items of the Rosenberg Self-Esteem Scale
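A split-half correlation can be computed by summing each person's odd-numbered items and even-numbered items separately and then correlating the two half-scores. A minimal sketch with invented responses (a made-up 6-item scale, not the Figure 4.3 data):

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

def split_half(responses):
    """responses: one list per person, item scores in item order.
    Returns the correlation between odd-item and even-item half-scores."""
    odd = [sum(person[0::2]) for person in responses]   # items 1, 3, 5, ...
    even = [sum(person[1::2]) for person in responses]  # items 2, 4, 6, ...
    return pearson_r(odd, even)

# Hypothetical responses of four students to a 6-item scale (1-4 agreement)
data = [
    [4, 4, 3, 4, 4, 3],
    [2, 3, 2, 2, 3, 2],
    [3, 3, 3, 4, 3, 3],
    [1, 2, 1, 2, 2, 1],
]
print(round(split_half(data), 2))  # high, since each person answers consistently
```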
Perhaps the most common measure of internal consistency used by researchers in
psychology is a statistic called Cronbach's α (the Greek letter alpha).
Conceptually, α is the mean of all possible split-half correlations for a set
of items. For example, there are 252 ways to split a set of 10 items into two
sets of five, and Cronbach's α would be the mean of those 252 split-half
correlations. This is not how α is actually computed, but it is a correct way
of interpreting the meaning of this statistic. Again, a value of +.80 or
greater is generally taken to indicate good internal consistency.
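In practice, Cronbach's α is computed from the item variances and the variance of the total scores rather than by averaging split-half correlations. A minimal sketch of the standard formula, using invented responses to a made-up 6-item scale:

```python
from math import comb

def cronbach_alpha(responses):
    """Cronbach's alpha from per-person lists of item scores."""
    k = len(responses[0])  # number of items
    def var(xs):           # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)
    item_vars = [var([p[i] for p in responses]) for i in range(k)]
    total_var = var([sum(p) for p in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical responses of four students to a 6-item scale (1-4 agreement)
data = [
    [4, 4, 3, 4, 4, 3],
    [2, 3, 2, 2, 3, 2],
    [3, 3, 3, 4, 3, 3],
    [1, 2, 1, 2, 2, 1],
]
print(round(cronbach_alpha(data), 2))  # high: items rise and fall together
print(comb(10, 5))                     # 252 ways to pick one half of 10 items
```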

Interrater Reliability

Many behavioural measures involve significant judgment on the part of an
observer or a rater. Inter-rater reliability is the extent to which different
observers are consistent in their judgments. For example, if you were
interested in measuring university students' social skills, you could make
video recordings of them as they interacted with another student whom they are
meeting for the first time. Then you could have two or more observers watch
the videos and rate each student's level of social skills. To the extent that
each participant does, in fact, have some level of social skills that can be
detected by an attentive observer, different observers' ratings should be
highly correlated with each other. Inter-rater reliability would also have
been measured in Bandura's Bobo doll study. In this case, the observers'
ratings of how many acts of aggression a particular child committed while
playing with the Bobo doll should have been highly positively correlated.
Inter-rater reliability is often assessed using Cronbach's α when the
judgments are quantitative, or an analogous statistic called Cohen's κ (the
Greek letter kappa) when they are categorical.
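For categorical judgments, Cohen's κ compares the observed agreement between two raters with the agreement expected by chance. A minimal sketch, with two hypothetical observers coding ten children's play episodes as aggressive ("A") or not ("N"); the codings below are invented:

```python
def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two raters' categorical judgments of the same cases."""
    n = len(rater1)
    categories = set(rater1) | set(rater2)
    # Proportion of cases on which the raters actually agree
    p_observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Agreement expected by chance, from each rater's category proportions
    p_chance = sum(
        (rater1.count(c) / n) * (rater2.count(c) / n) for c in categories
    )
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical aggressive ("A") / not aggressive ("N") codings of ten episodes
r1 = ["A", "A", "N", "A", "N", "N", "A", "N", "A", "N"]
r2 = ["A", "A", "N", "A", "N", "A", "A", "N", "A", "N"]
print(round(cohens_kappa(r1, r2), 2))  # agreement well above chance
```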

VALIDITY

Validity is the extent to which the scores from a measure represent the
variable they are intended to. But how do researchers make this judgment? We
have already considered one factor that they take into account: reliability.
When a measure has good test-retest reliability and internal consistency,
researchers should be more confident that the scores represent what they are
supposed to. There has to be more to it, however, because a measure can be
extremely reliable but have no validity whatsoever. As an absurd example,
imagine someone who believes that people's index finger length reflects their
self-esteem and therefore tries to measure self-esteem by holding a ruler up
to people's index fingers. Although this measure would have extremely good
test-retest reliability, it would have absolutely no validity. The fact that
one person's index finger is a centimetre longer than another's would indicate
nothing about which of them has higher self-esteem.

Discussions of validity usually divide it into several distinct "types." But a
good way to interpret these types is that they are other kinds of evidence, in
addition to reliability, that should be taken into account when judging the
validity of a measure. Here we consider three basic kinds: face validity,
content validity, and criterion validity.

Face Validity

Face validity is the extent to which a measurement method appears "on its
face" to measure the construct of interest. Most people would expect a
self-esteem questionnaire to include items about whether they see themselves
as a person of worth and whether they think they have good qualities. So a
questionnaire that included these kinds of items would have good face
validity. The finger-length method of measuring self-esteem, on the other
hand, seems to have nothing to do with self-esteem and therefore has poor face
validity. Although face validity can be assessed quantitatively, for example
by having a large sample of people rate a measure in terms of whether it
appears to measure what it is intended to, it is usually assessed informally.

Face validity is at best a very weak kind of evidence that a measurement
method measures what it is supposed to. One reason is that it is based on
people's intuitions about human behaviour, which are frequently wrong. Many
established measures in psychology work quite well despite lacking face
validity. The Minnesota Multiphasic Personality Inventory-2 (MMPI-2) measures
many personality characteristics and disorders by having people decide whether
each of 567 different statements applies to them, and many of those statements
have no obvious relationship to the construct they measure. For example, the
items "I enjoy detective or mystery stories" and "The sight of blood doesn't
frighten me or make me sick" both measure the suppression of aggression. In
this case, it is not the literal content of the items that matters, but the
extent to which the pattern of participants' responses matches that of people
who suppress their aggression.

Content Validity

Content validity is the extent to which a measure "covers" the construct of
interest. For example, if a researcher conceptually defines test anxiety as
involving both sympathetic nervous system activation (leading to nervous
feelings) and negative thoughts, then his measure of test anxiety should
include items about both nervous feelings and negative thoughts. Or consider
that attitudes are usually defined as involving thoughts, feelings, and
actions. By this conceptual definition, a person has a positive attitude
toward exercise to the extent that he or she thinks positive thoughts about
exercising, feels good about exercising, and actually exercises. So to have
good content validity, a measure of people's attitudes toward exercise would
have to reflect all three of these aspects. Like face validity, content
validity is not usually assessed quantitatively. Instead, it is assessed by
carefully checking the measurement method against the conceptual definition of
the construct.

Criterion Validity

Criterion validity is the extent to which people's scores on a measure are
correlated with other variables (known as criteria) that one would expect them
to be correlated with. For example, people's scores on a new measure of test
anxiety should be negatively correlated with their performance on an important
school exam. If it were found that people's scores were in fact negatively
correlated with their exam performance, then this would be a piece of evidence
that these scores really represent people's test anxiety. But if it were found
that people scored equally well on the exam regardless of their test anxiety
scores, then this would cast doubt on the validity of the measure.
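Checking criterion validity again comes down to computing a correlation, this time hoping it comes out negative. A sketch with invented numbers (the anxiety scores, exam scores, and scales are all made up for illustration):

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

# Hypothetical test-anxiety scores and exam scores for six students
anxiety = [35, 20, 28, 12, 30, 15]
exam = [60, 82, 75, 88, 64, 78]
print(round(pearson_r(anxiety, exam), 2))  # strongly negative for these data
```

A clearly negative correlation like this one would count as evidence for the criterion validity of the anxiety measure.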

A criterion can be any variable that one has reason to think should be
correlated with the construct being measured, and there will usually be many
of them. For example, test anxiety scores should be negatively correlated with
exam performance and course grades and positively correlated with general
anxiety during exams. Or imagine that a researcher develops a new measure of
physical risk taking. People's scores on this measure should be correlated
with the number of speeding tickets they have received and the number of
broken bones they have had from "extreme" activities such as snowboarding and
rock climbing. When the criterion is measured at the same time as the
construct, criterion validity is referred to as concurrent validity; when the
criterion is measured at some point in the future (after the construct has
been measured), it is referred to as predictive validity (because scores on
the measure have "predicted" a future outcome).

Criteria can also include other measures of the same construct. For example,
one would expect a new measure of test anxiety to be positively correlated
with existing established measures of the same construct. This is known as
convergent validity.

Assessing convergent validity requires collecting data using the measure.
Researchers John Cacioppo and Richard Petty did this when they created their
self-report Need for Cognition Scale, which measures how much people value and
engage in thinking (Cacioppo & Petty, 1982) [1]. In a series of studies, they
showed that people's scores were positively correlated with their scores on a
standardized academic achievement test, and negatively correlated with their
scores on a measure of dogmatism (which represents a tendency toward
obedience). In the years since it was created, the Need for Cognition Scale
has been used in literally hundreds of studies and has been shown to be
correlated with a wide variety of other variables, including the effectiveness
of an advertisement, interest in politics, and juror decisions (Petty,
Briñol, Loersch, & McCaslin, 2009).

Discriminant Validity

Discriminant validity, on the other hand, is the extent to which scores on a
measure are not correlated with measures of variables that are conceptually
distinct. For example, self-esteem is a general attitude toward the self that
is fairly stable over time. It is not the same as mood, which is how good or
bad one happens to be feeling right now. So people's scores on a new measure
of self-esteem should not be very highly correlated with their moods. If the
new measure of self-esteem were highly correlated with a measure of mood, it
could be argued that the new measure is not really measuring self-esteem; it
is measuring mood instead.

When they created the Need for Cognition Scale, Cacioppo and Petty also
provided evidence of discriminant validity by showing that people's scores
were not correlated with certain other variables. For example, they found only
a weak correlation between people's need for cognition and a measure of their
cognitive style, the extent to which they tend to think analytically by
breaking ideas into smaller parts or holistically in terms of "the big
picture." They also found no correlation between people's need for cognition
and measures of their test anxiety and their tendency to respond in socially
desirable ways. All these low correlations provide evidence that the measure
reflects a conceptually distinct construct.
