Psyc 315 - Lesson 2 - 11 - 09
Frequency tables:
Histograms:
● The histogram plots the values of the variable, from minimum to
maximum, on the x-axis and the proportion of scores in the data set having
each of these values on the y-axis
● Histograms vs Bar Graphs:
● Because there’s a natural ordering to the values of a quantitative
variable, they must be placed in this natural order on the x-axis of the
histogram → in a bar graph, on the other hand, the order of values
on the x-axis is arbitrary
● There’s no space between the bars of a histogram as there is between the
bars of a bar graph → this convention was adopted to convey the fact
that the levels of the variable have a natural order (when the bars touch,
it conveys the continuity of adjacent values of the variable)
Example:
● The values of this variable are continuous → which means that they are
real numbers that may have infinitely many digits to the right of the
decimal place
● In this example, we have 60 scores and would like to have about 10
intervals
● The largest score we have in the example is 96.9 and the lowest score is
38.9 → range = 96.9 - 38.9 = 58
● When we divide the range by the approximate number of intervals we'd like
to have, we find that we'd be aiming for an interval width of approximately
58/10 = 5.8 → because our interval width should be an intuitive number, we
will have to choose one close to 5.8 → for this example, 5 and 10 would be
good choices
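The arithmetic in this example can be checked with a short Python sketch. The function names are hypothetical, and the list of "intuitive" candidate widths is an assumption that reuses the examples given in these notes (5, 10, 20, 100).

```python
def approximate_width(min_score, max_score, target_intervals):
    """Range divided by the number of intervals we would like to have."""
    return (max_score - min_score) / target_intervals


def nearby_intuitive_widths(approx, candidates=(5, 10, 20, 100)):
    """Return the candidate widths just below and just above the approximate width.

    The candidate list is an assumption taken from the examples in the notes.
    """
    below = max((c for c in candidates if c <= approx), default=None)
    above = min((c for c in candidates if c >= approx), default=None)
    return below, above


# Values from the example: minimum 38.9, maximum 96.9, about 10 intervals.
approx = approximate_width(38.9, 96.9, 10)
print(round(approx, 1))                 # approximate width, 5.8
print(nearby_intuitive_widths(approx))  # the two nearby intuitive choices, (5, 10)
```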
Defining intervals:
● We aim to have 5 to 20 intervals - a small number of intervals is appropriate
when we have a small number of scores, and a large number of intervals is
appropriate when we have a large number of scores
● Interval width must be an intuitive integer value (integers like 5, 10, 20 and
100 are intuitive)
● Interval width depends on the number and range of scores in the dataset →
to choose an interval width we first determine the range, which is defined
as the maximum score minus the minimum score (range = max - min)
● Approximate interval width = range / approximate number of intervals
we'd like to have
● Choose an intuitive number that is close to the approximate interval
width
● Lower score limits should be multiples of interval width
● E.g: if we chose an interval width of 10, then lower score limits should
be multiples of 10 (30, 40, 50, etc.)
● Once we have chosen our interval width, we list our intervals from highest
to lowest, making sure that the real limits of the highest interval capture the
maximum score in our dataset and the real limits of the lowest interval
capture the minimum score in our dataset
● For our example, we’ve chosen an interval width of 10 and found that the
maximum score is 96.9 → therefore, the highest score limits are 90 - 99,
and because the minimum score is 38.9, the lowest score limits are 30 -
39
● Interval midpoint: simply the point midway between the lower and
upper real limits
● Once the real limits have been defined, we can count the number of scores
falling in each interval → these counts are listed in the frequency column
(f)
● Cumulative f, p and P are computed in exactly the same way as for a
discrete quantitative variable
● When assigning scores to intervals, we face a small problem: what to do
with scores like 79.5 that fall simultaneously on the upper real limit of one
interval and the lower real limit of the next interval
● The short answer is that in the real world of data collection this
happens very rarely
● But the rule is: we will always put a score that falls on a real limit
boundary in the higher interval
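The steps above can be sketched in Python. This is a minimal illustration, not the course's own procedure: the function name and the ten scores are hypothetical (only 38.9, 96.9 and 79.5 appear in these notes), and it assumes integer score limits, so that real limits sit 0.5 below the lower score limit and 0.5 above the upper score limit.

```python
import math


def grouped_frequency_table(scores, width):
    """Build a grouped frequency table, listed from highest interval to lowest.

    Assumption: integer score limits, so real limits are score limits -/+ 0.5
    (e.g. score limits 30-39 give real limits 29.5 to 39.5).
    """
    def lower_score_limit(score):
        # Lower score limits are multiples of the interval width. Adding 0.5
        # sends a score on a real-limit boundary (e.g. 79.5) to the higher interval.
        return math.floor((score + 0.5) / width) * width

    top = lower_score_limit(max(scores))
    bottom = lower_score_limit(min(scores))
    n = len(scores)

    rows = []
    for lower in range(top, bottom - 1, -width):
        upper = lower + width - 1
        lower_rl, upper_rl = lower - 0.5, upper + 0.5
        # A score belongs to the interval if lower_rl <= score < upper_rl.
        f = sum(1 for s in scores if lower_rl <= s < upper_rl)
        rows.append({
            "score_limits": (lower, upper),
            "real_limits": (lower_rl, upper_rl),
            "midpoint": (lower_rl + upper_rl) / 2,
            "f": f,
            "p": f / n,
        })

    # Cumulative columns are accumulated from the lowest interval upward.
    cumulative = 0
    for row in reversed(rows):
        cumulative += row["f"]
        row["cum_f"] = cumulative
        row["P"] = cumulative / n
    return rows


# Hypothetical scores for illustration only.
scores = [38.9, 45.2, 52.0, 61.3, 67.8, 72.4, 79.5, 84.1, 88.6, 96.9]
table = grouped_frequency_table(scores, 10)
for row in table:
    print(row["score_limits"], row["midpoint"], row["f"],
          round(row["p"], 2), row["cum_f"], round(row["P"], 2))
```

With these scores, 79.5 is counted in the interval 80 - 89 rather than 70 - 79, which is the boundary rule stated above.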
● The image above plots our sixty final exam scores as six different
histograms
● The histograms correspond to grouped frequency distributions having
interval widths of 1) 40, 2) 20, 3) 10, 4) 5, 5) 2 and 6) 1
● The x-axis in each panel shows the interval midpoints → those are used
because they require less space than the real limits or the score limits
● The y-axis shows the proportion of scores falling in each interval
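To see how interval width changes the number of bars, here is a rough Python sketch using the minimum (38.9) and maximum (96.9) from the example. The function name is hypothetical, and it assumes integer score limits with real limits 0.5 beyond them and lower score limits that are multiples of the width, so the counts may not match the panels in the lecture figure exactly.

```python
import math


def interval_count(min_score, max_score, width):
    """Number of intervals needed to capture both the minimum and maximum score."""
    # Lower score limit of the interval whose real limits contain each score.
    lowest = math.floor((min_score + 0.5) / width) * width
    highest = math.floor((max_score + 0.5) / width) * width
    return (highest - lowest) // width + 1


counts = {w: interval_count(38.9, 96.9, w) for w in (40, 20, 10, 5, 2, 1)}
for width, count in counts.items():
    print(f"width {width:>2}: {count} intervals")
```

Under these assumptions a width of 10 gives 7 intervals (30 - 39 up to 90 - 99), while a width of 1 gives 59, which is why the narrow-interval panels look so much more jagged.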
Video 4: Probability
● Subjective probability: subjective judgment about the likelihood of a
specific outcome
● Objective probability: numerical expressions of the likelihood of some
event occurring
● Frequentist, or long-run, probability (e.g: outcomes, events, proportions
and sampling experiments)
Coin flipping:
● Heads and tails, the two sides of a coin, can be considered two values of a
qualitative variable → flipping a coin is a way of selecting one of these
values randomly
● Sampling experiment: the random selection of a value of the variable
● Outcome: the value of a variable obtained through random selection in a
sampling experiment
● Success: a specified outcome of interest
● If we define one of the outcomes as a success, we can count the number
of successes in a given number of sampling experiments
● Then, we can calculate the proportion of successes in a given number
of sampling experiments by dividing the number of successes by the
number of sampling experiments
● E.g: p = 14/25 = 0.56
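The proportion of successes can be sketched in Python as follows. The function name and the seed are arbitrary choices for illustration; the first call reproduces the 14/25 example, and the second simulates 25 new flips.

```python
import random


def proportion_of_successes(outcomes, success):
    """Number of successes divided by the number of sampling experiments."""
    return sum(1 for outcome in outcomes if outcome == success) / len(outcomes)


# The example from the notes: 14 heads in 25 flips.
example = ["heads"] * 14 + ["tails"] * 11
print(proportion_of_successes(example, "heads"))  # 0.56

# A simulated series of 25 sampling experiments (seed chosen arbitrarily).
rng = random.Random(315)
flips = [rng.choice(["heads", "tails"]) for _ in range(25)]
p = proportion_of_successes(flips, "heads")
print(p)
```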
Rolling a die:
● By posing the question this way, we define the event as a score falling in
the interval 80 - 89 → the p column of the table shows that the answer is 0.20
● Drawing a single score from this distribution is a sampling experiment
→ if we repeat this sampling experiment infinitely many times with
replacement, then 20% of the time the score will fall in the interval 80 -
89
● The probability of a random score falling outside the interval 80 - 89
is 80%; that is, the probability is 0.8 that a randomly chosen score will
fall outside the interval.
● This is the same as saying that the probability is 0.2 that a randomly
chosen score will fall in the interval 80 - 89
● We are connecting intervals in a frequency table with the events in a
sampling experiment
● When we state the probability that a randomly chosen score will fall in a
given interval, we mean the proportion of times a randomly chosen
score would fall in that interval in an infinite series of identical sampling
experiments
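The long-run idea can be illustrated with a small simulation. This is a sketch under stated assumptions: the 60 scores are replaced by a stand-in list that records only whether each score is in the interval 80 - 89 (0.20 × 60 = 12 of them), and the function name and seed are hypothetical.

```python
import random

# Stand-in for the 60 scores: True means "falls in the interval 80 - 89".
# p = 0.20 with 60 scores implies 12 scores in the interval and 48 outside it.
in_interval = [True] * 12 + [False] * 48

rng = random.Random(0)  # arbitrary seed so the run is repeatable


def long_run_proportion(n_draws):
    """Proportion of draws landing in the interval, sampling with replacement."""
    draws = rng.choices(in_interval, k=n_draws)
    return sum(draws) / n_draws


for n in (10, 100, 10_000, 100_000):
    print(n, long_run_proportion(n))

# The complementary event: a score falling outside the interval.
p_inside = 0.20
p_outside = 1 - p_inside
print(round(p_outside, 2))  # 0.8
```

With few draws the proportion can be far from 0.20; as the number of sampling experiments grows, it settles close to the value in the p column.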
Probability distribution
● This frequency table for a qualitative variable and the corresponding bar
graph are probability distributions
● If we repeatedly choose individuals at random from this
distribution with replacement, then the probability of drawing a
student whose reference is Clinical would be 0.29
● The frequency table for a discrete quantitative variable and a
corresponding histogram are also probability distributions
● If we repeatedly choose scores at random from this distribution,
with replacement, then the probability of drawing a score of 7
would be 0.40
● Each histogram is a probability distribution
● Each bar represents an interval, and the bar heights show the proportion
of individuals in each interval → in each panel, each bar shows the
probability that a score will fall in the corresponding interval with
repeated random sampling with replacement
● A: denotes any event → such as a coin coming up heads, a die coming up 4,
etc.
● p(A): to express the probability of event A occurring
● The constraint that the probability of event A must be between 0 and 1
can be expressed as:
● 0 ≤ p(A) ≤ 1
● E.g: Coin flips → A: the outcome is “heads”, so p(A) = 0.5 → the
probability of any event must be between 0 and 1
● The six histograms on the left of this figure depict the same distribution of
scores in a very large population
● each histogram has a different interval width → as interval width
decreases, the proportion of scores in each interval also decreases
● The bars in the six panels of the figure to the right show the probability
density associated with each interval → this is obtained by dividing the
proportion of scores in each interval by the interval width
● As interval width decreases toward 0, probability density converges to a
single fixed value, so probability density is defined at a point → this
yields a probability density function
● The smooth line in each panel of the figure to the right shows this
probability density function
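A numerical sketch of this convergence is given below. The notes do not say which population the figure uses, so the normal distribution and its mean and standard deviation here are assumptions chosen purely for illustration; the function names are hypothetical too.

```python
import math

MEAN, SD = 70.0, 12.0  # hypothetical population parameters, not from the lecture


def normal_cdf(x):
    """Proportion of the assumed normal population at or below x."""
    return 0.5 * (1.0 + math.erf((x - MEAN) / (SD * math.sqrt(2.0))))


def normal_pdf(x):
    """Probability density function of the assumed normal population."""
    z = (x - MEAN) / SD
    return math.exp(-0.5 * z * z) / (SD * math.sqrt(2.0 * math.pi))


def proportion_and_density(center, width):
    """Proportion of scores in an interval, and that proportion divided by width."""
    lower, upper = center - width / 2, center + width / 2
    proportion = normal_cdf(upper) - normal_cdf(lower)
    return proportion, proportion / width


# Intervals centred on 70 with the six widths used in the notes.
results = [(w, *proportion_and_density(70.0, w)) for w in (40, 20, 10, 5, 2, 1)]
for width, proportion, density in results:
    print(f"width {width:>2}: proportion {proportion:.4f}, density {density:.5f}")
print(f"density function at 70: {normal_pdf(70.0):.5f}")
```

The proportion column keeps shrinking as the interval narrows, while the density column approaches the value of the density function at the centre of the interval.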
EXCEL VIDEOS:
3) We can next compute the proportion of scores having each value, as we did
for qualitative variables
● we sum the number of scores having each value of the variable; then
divide each raw frequency by this sum to produce a proportion
● Column “p”: =(a value from column f)/(the sum → make the sum
an absolute reference) → then drag to the other cells in the
column
● Start by using the MIN and MAX functions to determine the range
● The interval width will be 5
● Because the maximum score in the data set is 96.9 and the interval width is
5, the score limits of the top interval are 95 to 99
● therefore, we can enter 95 as the lower score limit (Lower SL column)
of the top interval → the upper score limit will be the lower score limit
+ the interval width - 1 (=(lower score limit)+(interval width)-1)
● make the interval width an absolute reference (command-t)
● We can now determine the score limits of the second highest interval by
subtracting the interval width from the lower and upper limits of the
interval above
● =(the cell above in the Lower SL column)-(interval width, made an
absolute reference) → drag the cell to the right to determine
the upper score limit
● drag everything down to calculate the score limits of the remaining
intervals
● Now, the main thing that remains is to compute the raw frequency counts
(COUNTIFS, which accepts more than one criterion)
● The first criterion is that the score must be greater than or equal to the
lower real limit of the corresponding interval; the second is that it must
be less than the upper real limit
● =COUNTIFS((select the whole range of scores and make it absolute),
">="&(Lower RL of the highest interval), (re-enter the same range and
make it absolute), "<"&(Upper RL of the highest interval)) → drag it down
● Then, we sum the number of scores in each interval,
then divide each raw frequency by this sum to produce a proportion
● =(f of the highest interval)/(sum → make it absolute)
● We start by putting 1 in the top cell of column “P”, because the
proportion of scores at or below the highest interval is 1
● in the next cell, P is the cumulative proportion of the interval above
minus the proportion in the interval above
● =(the top value in column “P”)-(the top value in column “p”) → then
drag it all down
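For readers who want to check the spreadsheet logic outside Excel, the same top-down steps can be sketched in Python. The function names and the four example proportions are hypothetical; only the top lower score limit (95) and interval width (5) come from these notes.

```python
def score_limits_top_down(top_lower, width, n_intervals):
    """List (lower, upper) score limits from the highest interval downward."""
    rows = []
    lower = top_lower
    for _ in range(n_intervals):
        rows.append((lower, lower + width - 1))  # upper = lower + width - 1
        lower -= width                            # next interval: subtract the width
    return rows


def cumulative_proportions_top_down(p_column):
    """Column P, given column p listed from the highest interval downward.

    The top cell is 1; each cell below is the P above minus the p above.
    """
    P = [1.0]
    for p_above in p_column[:-1]:
        P.append(P[-1] - p_above)
    return P


# Top interval 95 - 99 with an interval width of 5, as in the notes.
limits = score_limits_top_down(95, 5, 13)
print(limits[0], limits[-1])

# Hypothetical p column, for illustration only.
p_column = [0.1, 0.2, 0.3, 0.4]
P_column = cumulative_proportions_top_down(p_column)
print([round(value, 2) for value in P_column])
```

A useful check on the result is that the bottom cell of column P should equal the bottom cell of column p, since nothing lies below the lowest interval.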