Evaluation of Studies Rasha

10- Mark Essay question

Paper 1
Core Studies
Evaluation of the studies
EVALUATION FOR PILIAVIN

One strength of the Piliavin study is that it was high in internal validity as the participants were unaware
that they were a part of a field experiment. The experiment took place on the A and D trains of the 8th
avenue IND and the stooges were covert, so the participants did not know their behaviours were being
observed. This means that their behaviour would be more natural and free of demand characteristics.
This improves the internal validity as the researchers were measuring genuine responses, rather than
behaviour shaped by social desirability bias or by participants acting in the way they thought the
researchers wanted (demand characteristics).

On the other hand, it could be argued that generalisability is low, despite the large sample size of
around 4,450 participants, because the participants were only from one specific area of New York near
Harlem, and travelled on the subway at the same time of day (11am-3pm) on weekdays. This might mean
that the participants had similar occupations or routines because they travelled at that time of day, and
they may have had similar experiences living in New York, meaning that their altruism towards each other
may differ from those living in other areas. Therefore the findings may not be able to be generalised to
other populations in the world as their behaviours and upbringings may be different to those in the study.

However, another strength of the Piliavin et al. study is the use of both qualitative and quantitative
data. The 2 observers collected quantitative data (for example, the total number of people in the critical
area or the race and gender of the first helper) which allowed numerical comparisons to be made between
the conditions and is more objective and less open to bias. The observers also collected qualitative
data by noting down any remarks made by nearby passengers; for example, women were
recorded saying ‘It’s for men to help him’ or ‘I wish I could help him - I’m not strong enough’. This is a
strength because it allows conclusions to be drawn about why women were less likely to help than men
were, and it may have informed the cost-reward model that was used to explain
whether or not a person would help, depending on whether the rewards outweigh the costs.
Therefore, the use of both qualitative and quantitative data in this study is a strength because it allows
for comparisons to be made between groups, and gives reason behind those differences in results.

On the other hand, another weakness of the Piliavin et al. study is that it is low in ethics. Participants
never gave informed consent to participate in the study and neither were they debriefed after the study
was completed. This is highly unethical, especially since the procedure may have been distressing for
some people as participants are not told that the victim is a stooge, and they may therefore end up leaving
the study wondering if the victim was okay, especially if he received no help. Without a debriefing,
participants may not only have been left psychologically harmed, but they were also never told that their
data would be used in the study, nor given the right to withdraw from the experiment. The train journey was
about 7.5 minutes long and the victim collapsed roughly 70 seconds into it, meaning that he could have been
on the floor for around 6 minutes if he did not receive help. This may have distressed many people, as
suggested by the comments recorded by the observers (‘You feel so bad that you don’t know what to do’).
Therefore, this study was highly unethical because participants did not give informed consent, were not
given the right to withdraw and were not debriefed after the study.
EVALUATION FOR YAMAMOTO

One strength of the study by Yamamoto et al. is that reliability was high because procedures were
standardised. For example, they used the exact same 7 tools for each experiment which were the
straw, stick, chain, belt, rope, hose and brush. This ensures that the experiment can be easily
replicated by other psychologists who should find the same results when following the highly detailed
procedure. It can be seen that the procedure is highly detailed as precise measurements were given,
for example, the exact dimensions of the experimental booths including the fact that the slot between
the two booths was approximately 1m off the ground.

On the other hand, one weakness of this study is that generalisability was quite low because only five
chimpanzees from the same institute were used. This is a small sample, and since only one species of
primate was used, the results may not be generalisable to other primates. In addition, since they all
came from the same institute and were often used for these types of behavioural testing, they are not
wild chimpanzees and may have been conditioned to behave altruistically through other behavioural
experiments. Another reason why generalisability is quite low is that they used mother-child pairs,
so the helping behaviour may reflect a maternal or family instinct rather than pure altruism.
However, it could be argued that we can generalise to the wider chimpanzee population, as theory of
mind is seen as a biological or evolutionary factor, and therefore the small sample size is less of a
problem.

Another weakness of the study by Yamamoto et al. is that the validity is quite low as a repeated
measures design was used which could lead to order effects affecting the results, despite the use of
counterbalancing. In addition, because the sample was so small, the individual characteristics of one
or two chimpanzees may have played a key role in determining the results, which would be less likely
with a larger sample. For example, Pal gave the right tool 100% of the
time in the second condition which may not be the case for most chimpanzees. Ecological validity is
also low because it was conducted in an artificial setting which would not be the chimpanzees’ normal
environment.

On the other hand, another strength of this study is that it is high in ethics because only a small
number of chimpanzees were used. Only five chimpanzees participated in the study, therefore it did not
cause unnecessary stress to too many animals. Also, they were housed in the institute that they were
being tested in and the handlers were very professional, which minimised the amount of stress that the
chimpanzees were under. In addition, since the chimpanzees were used to participating in other
studies, their stress levels would not be as high as those of, for example, a chimpanzee from the wild,
for whom the environment would be very unfamiliar. Also, no aversive stimuli were used and therefore
ethics is higher.
EVALUATION FOR MILGRAM

One strength of Milgram is that there is high internal validity due to high levels of controls that were
put in place. For example, Milgram predetermined the responses of the learner (e.g. pounding on the
wall and then no response after 315V). This ensured that every participant experienced the same
situation, so the dependent variable (the response of the participants) was less likely to be affected
by extraneous variables.

On the other hand, Milgram’s study has low ecological validity as the participants were not in a
natural setting due to the method of a lab experiment. Most people would not normally go to Yale
University and be required by an experimenter to shock someone, and therefore the results may not be
applicable to real life situations.

In addition, Milgram’s experiment had low generalisability because the sample used was 40 males
who lived in the New Haven area. Since they were all males, generalisability would be low because you
cannot generalise the findings to females. In addition, since all the participants lived in the New Haven
area, they may have had similar upbringings and experiences. This means
that the results cannot be generalised to other populations as they may have different behaviours.

On the other hand, it can also be argued that generalisability is higher because further
studies investigated how females behave in this situation and how other populations globally behave
too. These found that they behave in similar ways to the participants in the original study and
therefore, it could be argued that generalisability is higher. In addition, Milgram used
participants from different professions which increases generalisability because the data will not be
distorted by one profession that is perhaps more submissive than another.
Canli et al Evaluation

One strength of Canli et al is that it was high in internal validity. For example, high levels of control (e.g.
the length of time that each scene was presented and the length of time between each scene)
prevented extraneous variables affecting the DV. This increases our ability to infer cause and effect
(that the valence of the images affected the amygdala which enhanced memory of those scenes). In
addition, internal validity is high because the randomisation of the images reduces order effects, as it
minimises the risk of the emotional impact of one scene carrying over into the ratings of the next.

On the other hand, generalisability is low because the sample used was particularly small, consisting
of just 10 females who were all right-handed. Therefore, it is unknown whether or not we can generalise
the findings of this study to men or left-handed people. Also, women as a group were not
represented proportionately because it was a volunteer sample, and
the volunteers may have had similar upbringings and experiences. As a result, we cannot even
conclude that all women experience emotions in the same way.

However, another strength of the Canli et al. study is that it was high in reliability. For example,
participants experienced the exact same procedures due to the repeated measures design of the study
(e.g. same valence scores and length of times that each image was viewed for). This means that we
can replicate this study more easily and test for reliability. Also, the fact that a repeated measures
design was used means that the results are unlikely to be down to individual differences, which Canli et
al’s previous study may have suffered from as it used an independent measures design.

On the other hand, another weakness of the Canli et al study is that it is low in external validity. For
example, participants had to lie still within the fMRI scanner while the measurements were being taken
of the brain. This unnatural situation could have affected the emotional impact of the scenes, so the
responses may not reflect how people react to emotional events in everyday life.

Furthermore, the absence of positive emotional stimuli reduces the generalisability of this research in
terms of its practical application. Since positive emotional stimuli were not used, we cannot be sure that
it is only negative highly-arousing emotional stimuli that increase your likelihood to remember
something, and therefore, more research needs to be done to investigate whether or not this is the
case.
EVALUATION Dement and Kleitman

One strength of the study by Dement and Kleitman is the use of quantitative data. Dement and
Kleitman collected both qualitative and quantitative data, but the use of quantitative data is a strength
as it allowed them to calculate correlations between two variables; for example, they found that the
length of the REM period was positively correlated with the number of words in participants’ dream
narratives (ranging from r=0.4 to r=0.7). The use of quantitative data also allows us to check the
consistency of results; for example, dreams were recalled on about 79% of awakenings from REM sleep
but only about 7% of awakenings from NREM sleep, allowing us to draw the conclusion that dreams are
much more likely to be recalled from REM sleep than from NREM sleep.

On the other hand, one weakness of this study is that it used self-reports to collect qualitative data,
which is much more subjective and open to researcher bias. For example, in order to test the second
research question, participants had to
describe the contents of their dreams, producing qualitative data. However, this data may have been
interpreted differently by the researcher in order to fit in with the pattern that they were looking for.
Therefore, this method of data collection (self-reports) is more open to bias and the results
may not be as valid.

However, another strength of this study is that it is high in internal validity due to the many
standardised procedures that were put in place. For example, participants were told to come to the
sleep lab just before their normal bedtime and not to drink any alcohol or caffeinated beverages on the
day of the experiment. In addition, the participants had the wires of the EEG gathered in a ponytail at
the back of their head to allow for them to be more comfortable and have less restricted movement.
This makes the study high in internal validity because the participants' sleep would be less likely to be
affected by extraneous variables, like sleep patterns being disturbed by caffeine or
discomfort, which could have affected the results of the experiment.

Nevertheless, another weakness of this study is that it lacks external validity because the participants
had to sleep in a sleep lab which would not be their normal environment to sleep in. Participants may
feel more uncomfortable sleeping in a sleep lab than in their homes, especially since they are being
observed and their data recorded, so the results may differ from those that would be found if the study
were carried out in the participants’ natural environment. This might also mean that the results of this
study are not
generalisable to real life because it took place in such a highly controlled environment, which does not
reflect the real world.
Evaluation of Schachter and Singer

One strength of the study by Schachter and Singer is that it used standardised procedures which
increases the internal validity of the study. One standardised procedure used was that the stooge in both
conditions had to follow a script in order for the interactions to be the same for all participants. For
example, in the angry condition, the stooge would flick through the questionnaire and exclaim ‘Boy, this
is a long one’ before even filling it out. The stooge would also have to complete the questionnaire at the
same pace as the participant so that the participant could relate to his remarks. The questions on the
questionnaire were also standardised, and in the anger condition, they became increasingly insulting as the
questionnaire went on. For example, the final question was ‘With how many men (other than your father) has
your mother had extramarital relationships?’ with the options being 4 and under, 5-9, 10 and over. These
standardised procedures increased internal validity as the content of the questions and the way that the
stooge acted was the same for all participants in that condition.

On the other hand, one weakness of this study is the use of a self-report. Participants knew that their
answers would be seen by the researcher, and therefore it was reported that some were reluctant to
record the true extent of their emotions, especially in the anger condition, because they did not want to
lose their course credit. This is an example of demand characteristics because participants wanted to
please the researchers by giving the answers that they thought the researchers wanted, in order to gain
their course credit and not offend them by reporting that they were angry or frustrated. This lowers the
internal validity of the study because the results may not be a true representation of the emotional arousal
of the participants.

However, another strength of this study is that it used an independent measures design, meaning that
order effects would not affect the participants’ behaviour. An independent measures design was also
necessary, as it would have been impossible for participants to complete all conditions. Participants
were randomly assigned to one of 7 conditions (for example, EpiInf in the euphoria condition), and did
not take part in any other condition. This is a strength because participants could not become bored
with the experimental task, which might have made their emotions less intense than they would
have been when experiencing a condition for the first time. Therefore, internal validity is increased
because all participants started from the same baseline, as none of them had experienced any of the
conditions beforehand.

On the other hand, another weakness of this study is that it is low in ethics. Participants were deceived
as they were told that the aim of the experiment was to investigate the effects of a vitamin compound called
Suproxin on vision, and that they were being injected with Suproxin. In fact they
were being injected with epinephrine or a placebo (which they had not consented to), and the real aim was to
test the two-factor theory of emotion (how they would label the reason behind their state of physiological
arousal). In addition to this deception, participants in the EpiMis condition were also misled about the
effects of the drug that they were being injected with. These deceptions are highly unethical and since
participants were deceived, they could not give informed consent. Therefore, this study has very low
ethics.
Laney et al Evaluation Essay

One strength of the study by Laney et al. is the experimental design used. Laney et al. used an independent
measures design which increases the internal validity as order effects are avoided. Laney et al. also
randomly allocated participants to each group, increasing internal validity because the individual
differences between participants are likely to balance out between the two groups, and also the groups
are not affected by researcher bias. If a repeated measures design had been used, order effects such as
practice effects may have affected the results, and it would not have worked because the participants
would have figured out the aim of the study, leaving the results open to demand characteristics.
Therefore, the independent measures design was a strength because it eliminated order effects and any
bias that could have been caused by participants figuring out the real aim of the study.

On the other hand, one weakness of this study is that it used self-reports. Self-reports are often subjective
and open to bias which decreases the internal validity of the study. For example, participants may have
been aware of the aims of the experiment, and therefore their responses would be biased by demand
characteristics - they may alter their answers to fit what the researchers are looking for. In addition,
participants may be embarrassed about their usual eating habits, and therefore may change their answers
to appear healthier. This means that social desirability could affect the results, and therefore the
implantation of the positive false memory of liking asparagus is not the only variable that will affect the
responses. Therefore, self-reports lower internal validity because they are open to bias.

Another strength of the Laney et al. study, however, is that it is high in reliability because of the
standardised measures used. For example, all of the questionnaires used the same questions in the
same order. In the Food History Inventory, the participants were asked to rate how likely it was that
each event had happened to them before the age of 10 on a scale of 1 to 8, where 1 = definitely did not
happen and 8 = definitely did happen. The scale used was kept the same for
each of the 24 food items, and the critical item (asparagus) was always kept in the 16th position. These
standardised procedures increase the reliability of Laney et al.’s study as the procedure was kept the
same for all of the participants. Laney et al. even tested the reliability of the study by replicating the study
at a different university, yielding similar results, and therefore showing that this study is highly reliable.

However, another weakness of this study is that it lacked ecological validity and mundane realism because
it was performed in a lab and the participants did not eat anything throughout the experiment - they
only recorded their feelings towards asparagus via questionnaires. Laney et al. did attempt to combat
this through the use of the Restaurant Questionnaire made to look like a menu. However, even with this
questionnaire, mundane realism and ecological validity are very low because the situation did not represent
what participants would normally experience (for example, going out and physically buying groceries).
This means that the findings may not generalise to real-life eating behaviours, as pen-and-paper tasks
do not accurately represent real life.
EVALUATION FOR BANDURA ET AL.

One weakness of the Bandura et al. study is that it is low in generalisability. Though 72 children were
used in total, only 6 children performed in each condition, meaning that individual differences may have
affected the results, despite the matched pairs design. In addition, all of the children were from the same
nursery in Stanford and therefore would have had similar upbringings and experiences in terms of
exposure to aggressive behaviour. Also, because all participants came from the same nursery,
they are likely to be of similar ethnic and cultural backgrounds, so the results may not
be generalisable to other cultures with different teachings and levels of aggression.

Psychologists have also argued that Bandura’s study has a number of ethical issues. It is unclear
whether the parents gave consent for their children to be used in the study. In the
original paper, Bandura thanks the director and head teacher of the Stanford University Nursery School,
however, there is no mention of any parental consent, which is concerning from an
ethical standpoint. In addition, the children were not given the right to withdraw or any information
about what the study entailed, and given the nature of the experiment, children may have wanted to
withdraw but couldn’t. In fact, in the original paper, Bandura states that ‘It was necessary for the
experimenter to remain in the room during the experimental session; otherwise a number of the children
(...) would leave before the termination of the session.’ This suggests that some children may have wanted to
withdraw from the study but, with the experimenter present, felt unable to do so. Due
to the nature of the aggression that the children were exposed to, ethics are further lowered as the
children were not protected from possible psychological harm caused by watching the aggressive model
beat up the Bobo doll. Some critics have also suggested that children may have shown increased levels of
aggression after the study, and have described the procedure as ‘aggressive training’. Therefore, the
ethics of this study were very low, although, given the lack of formal ethical guidelines at the time, it
is not so surprising that there is little mention of ethics in the study itself.

However, Bandura used a lab experiment which is a strength as it controlled extraneous variables which
may have affected the dependent variable. For example, Bandura used the same toys in each room that
were placed in the same positions and the actions of the models were standardised. This means that the
experiment had high internal validity and we can deduce a cause-and-effect relationship between the
independent variable (the condition of the model, i.e. aggressive, non-aggressive, no model) and the
behaviour of the children. In addition, though a lab experiment would usually have low ecological validity,
Bandura’s experiment has higher ecological validity because the children were used to playing with toys
and the whole experiment was described to the children as a game - therefore, they would not suspect
that they were in an experiment and they would just be performing their normal behaviour, which was
playing with toys.

Another strength of the Bandura et al. study is that it has a lot of application in the real world. Bandura et
al. found that children are highly susceptible to learning from observation, and the findings supported the
social learning theory. This can be used to teach parents that they should not swear or act aggressively
in front of their children, as their children may then be likely to imitate that behaviour. Instead, they should
display positive and helpful behaviour, so that their children may imitate that behaviour and become less
aggressive as a result. This study may also be applicable to violent video games and movies, as children
watching these violent actions may become more violent as a result, as social learning theory would
predict, though more research would need to be done in this area as video games and movies may not
necessarily have the same effect as real-life modelling.
EVALUATION of Saavedra and Silverman

One weakness of Saavedra and Silverman is that it was a case study, meaning that
generalisability is very low. The only participant in this study was a 9-year-old Hispanic-American boy
who had an unusual phobia of buttons. This means that the study is low in generalisability because the
sample is not representative of the target population (anyone with a phobia). The findings may therefore
not be generalisable to people with more extreme phobias, people who have had their phobias for longer
than the boy’s roughly 4 years, or adults.

Another weakness of this study is that the data was mainly based on self-reports. Despite having
one quantitative measure of progress other than self-reports (number of buttons manipulated in the
exposure therapy), most of the data gathered was the ratings of fear and disgust given by the boy on the
‘Feelings Thermometer’. This decreases the validity of the study because the boy may have responded to
demand characteristics, as he understood what the researchers were investigating. This is
even more problematic because the boy may have developed a relationship with the researchers over
the course of his treatment, and he may have therefore rated his fear/disgust on the ‘Feelings
Thermometer’ in the way he thought the researchers wanted. Therefore, the study lacks validity as the
ratings may have been shaped by the boy’s wish to support what the researchers were looking for.

On the other hand, one strength of the Saavedra and Silverman study is that it was high in
ecological validity. The 9-year-old boy in this study had to hug his mother whilst she was wearing buttons
as part of the in vivo exposure therapy. This means that the study has high ecological validity as the boy
performed tasks that were familiar to him, and that he would have carried out outside of the therapist’s
office.

Another strength of this study is that ethics were very high. Both the mother and the boy provided
informed consent to participate in the assessment and intervention procedures. This is highly ethical as
both the mother and the boy were made fully aware of the procedures that the boy would need to undergo,
which is especially important as the therapies would have been very distressing for the boy. Despite the
distress that the therapies caused, this study is still highly ethical as it aimed to decrease the boy’s phobia
of buttons in the long term, so even though the boy was exposed to distressing stimuli in the short term,
the therapies allowed him to overcome his fear and helped him to return to school in the long term, giving
him a better quality of life overall. In addition, the boy's identity was never revealed, and therefore
confidentiality is preserved in this study. Therefore, this study is highly ethical as informed consent was
given by both the boy and the mother, the study benefited the boy long term (despite causing him some
distress) and the confidentiality of the boy and his mother was maintained.
EVALUATION FOR PEPPERBERG

One strength of Pepperberg’s research is that it has high validity. For example, the trainer who tested
Alex had not been working with him during training. Instead, the researcher who trained Alex stood in the
corner of the room with her back turned to the objects being tested and interpreted Alex’s responses,
because the person who tested Alex may not have understood some of them. As a result,
researcher bias was limited as the tester could not be criticised for ‘cueing’ Alex to respond in a particular
way. In addition, a student was asked to choose the question order and materials used in the study which
again removes any researcher bias.

However, one weakness of Pepperberg’s study is that it lacks generalisability. For example, the study
was a case study of one African grey parrot called Alex who had undergone previous cognitive testing
and had been kept in captivity for at least 10 years before the study. Therefore, it is difficult to generalise
his behaviour to wild parrots, as they may display different behaviours.

Another weakness of Pepperberg’s study is that it is low in ethics. To some extent, psychologists argue
that Alex suffered with boredom as a result of repetitive testing over a 2 year period. Furthermore, Alex
was confined to a wire cage during the night (approximately 62 x 62 x 73 cm) and was observed to be
plucking his own feathers when he was bored. This, together with the fact that he was kept in a situation
that is foreign to his natural environment, makes ethics in this study low.

In contrast, one strength of this study is that Pepperberg used quantitative data when collecting the
‘same/different’ question responses. This is a strength as it allows Pepperberg to make an objective
analysis of whether Alex could comprehend abstract concepts. Furthermore, the use of objective data
allowed the researchers to make comparisons between novel and familiar objects, and therefore
to establish whether Alex could use the rules of same/different beyond the training
materials.

Finally, this study has a lot of real-life applications. Despite the findings not necessarily being
generalisable to other parrots, we can use the training methods of operant conditioning, continuous
reinforcement and social learning to try and shape the behaviours of other animals, in zoos for example.
Zoo keepers can use observation and imitation to introduce new animals to groups more easily by
encouraging role models to show the new member what behaviour is appropriate.
