MisBIE


All resources:

- Material
- MisBIE desktop folder
- Prism

Graph 1, MisBIE workbook 1 (12 sec vs. 55 sec interval—what’s the effect?)

10/28/2020
In the future, when creating graphs:
- ALWAYS label axes
- Create a horizontal line which represents the actual time value or, better yet, make the y-axis the ratio of estimated time / actual time
- Instead of finding the average (mean) of a data set, we typically find the median
● It’s more representative of a typical value because it is the middle value of the sorted data, so a few extreme values can’t pull it up or down
● The mean is often a bit misleading because 1 extreme outlier can pull the entire average in a particular direction
● So, for example, if the middle bar in the Mi006 Round 2 graph were a mean, that would be misrepresentative; instead, it’s the median
● ANOVA/T-test?
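A quick Python sketch of why the median is preferred, using made-up ratios (hypothetical data, not from the study) where one value mimics an extreme outlier like Mi015’s 4.8:

```python
from statistics import mean, median

# Hypothetical ratios (estimate / actual) for one task; 4.8 plays the extreme outlier
ratios = [0.90, 0.95, 1.00, 1.05, 4.8]

print(f"mean   = {mean(ratios):.2f}")    # pulled upward by the single outlier
print(f"median = {median(ratios):.2f}")  # stays at the middle value of the sorted data
```

The single outlier moves the mean to about 1.74, while the median stays at 1.00, right where most of the values are.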

Deliverable for next week:


Create an x-y graph with the 12 different tasks of one round as the x-axis and the ratios as the y-axis
● Ratios = the estimate divided by the actual time (not the difference between the two)
● Dividing is more accurate than subtracting because the size of the error grows as the trial gets longer (e.g. if the trial is 12 seconds the margin of error is approximately 2 seconds, but if it’s 60 seconds the margin is closer to 7), so dividing keeps the ratios comparable no matter the length of the trial
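A small sketch of the ratio vs. difference point above, with hypothetical estimates chosen to match the rough error sizes mentioned (about 2 s off on a 12-second trial, about 7 s off on a 60-second trial). The function names are illustrative, not from any existing workbook:

```python
def ratio(estimate, actual):
    """Estimate divided by actual time; 1.0 is a perfect estimate."""
    return estimate / actual

def difference(estimate, actual):
    """Estimate minus actual time, in seconds."""
    return estimate - actual

# Hypothetical (estimate, actual) pairs in seconds
trials = [(10, 12), (53, 60)]
for est, act in trials:
    print(f"{act:>2} s trial: difference = {difference(est, act):+} s, "
          f"ratio = {ratio(est, act):.2f}")
```

The differences (-2 s vs. -7 s) look very different, while the ratios (0.83 vs. 0.88) are close, which is why the ratio is the fairer way to compare trials of different lengths.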
There should be two separate graphs (one for each round), as well as a third that combines the two by relating the inverse of round 2 to round 1
What’s the purpose of this graph?
To ascertain whether the participants’ accuracy improved throughout the tasks: was there a
learning curve?
Did using a strategy have an impact on the accuracy of the participants? Compare Mississippi vs.
1-1000 vs. fingers vs. counting in head vs. tapping

Why do some individuals have so much variance and others not at all?
Did some individuals appear not as motivated to participate? (There are outliers such as 180, 120, 90, etc. that could be the result of giving up and guessing)

Understanding the study:


The closer the value is to 1, the better the estimate (1 is a perfect ratio: 12 seconds estimated / 12 seconds actual = 1)
Round 1 premise - “tell me how much time has passed when I click this buzzer starting now”
Thus, the less time the individual says has passed, the “older” they are
Meaning, if the trial is 12 seconds and when the buzzer rings the person says 8 seconds have
passed, they experience time more slowly and are “older”
If the person says 16 seconds, they experience time more quickly and are “younger”
>1 = young

Round 2 premise - “tell me when 10 seconds have passed starting now”


This is REVERSE LOGIC
Thus, the more time the individual says has passed, the older they are
Meaning, if the trial is 12 seconds and the individual says “now” after 16, they experience time
more slowly and are “older”
If the person says “now” after 8 seconds, they experience time more quickly and are “younger”
<1=young

If you take the inverse of round 2, the values should now also be >1 (if younger) and thus comparable to round 1
● That’s what the inverse does—it allows us to relate round 2 to round 1 because now
they’re in the same playing field, the same terms

Average 2 inverse = same logic as average 1
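A sketch of the round 1 / round 2 conversion. It assumes the round 2 ratio is produced time / target time, which matches the examples above (saying “now” after 8 s on a 12-second trial gives < 1 = young); the function names are hypothetical:

```python
def round1_ratio(reported_s, elapsed_s):
    """Round 1: participant reports how long the interval felt. > 1 = "younger"."""
    return reported_s / elapsed_s

def round2_ratio(produced_s, target_s):
    """Round 2 (assumed form): time the participant let pass / target time. < 1 = "younger"."""
    return produced_s / target_s

def round2_inverse(produced_s, target_s):
    """Flip round 2 so it reads like round 1: > 1 = "younger", < 1 = "older"."""
    return 1 / round2_ratio(produced_s, target_s)

# Examples for a 12-second trial, matching the scenarios above
print(round1_ratio(16, 12))    # says 16 s passed: above 1, "younger"
print(round2_inverse(8, 12))   # says "now" after 8 s: above 1, "younger"
print(round2_inverse(16, 12))  # says "now" after 16 s: below 1, "older"
```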

Slower = time contraction (each sec is less time in R1 terms)


Faster = time expansion (each sec is more time ...)
On a box plot, the thick line represents either the mean or the median* (typically the median), and the edges of the box on either side mark the 25th and 75th percentiles
*Could also calculate SEM (standard error of the mean)
Box plots are good b/c:
- They nicely portray the distribution: whether there are outliers, whether the data is skewed or symmetrical, how variable it is, and how tight it is
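A sketch of the numbers a box plot is built from, using Python’s statistics.quantiles. The 1.5 × IQR outlier rule (Tukey’s fences) is a common convention and is assumed here; Prism and Excel may use slightly different quartile methods, so their values can differ a little. The data are hypothetical:

```python
from statistics import quantiles

def box_summary(values):
    """Numbers behind a box plot, plus outliers by the 1.5 x IQR rule (an assumed convention)."""
    data = sorted(values)
    # Quartiles via linear interpolation; other tools may use a slightly different method
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1                      # interquartile range: the height of the box
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {"min": data[0], "q1": q1, "median": med, "q3": q3,
            "max": data[-1], "iqr": iqr, "outliers": outliers}

# Hypothetical ratios for one trial
summary = box_summary([0.8, 0.9, 0.95, 1.0, 1.05, 1.1, 2.3])
print(summary)
```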

AVERAGE ROUND 1 BY GROUP graph:


We see that the healthy participants are, on average, above 1 (they are younger), whereas those with the 3243 mutation are below. Those with the deletion have a similar distribution to the healthy participants, and the data collected on participants with MELAS is somewhat inconclusive since there are only two data points to reference. 3243 is contracted (i.e. old) and healthy is expanded (i.e. young).

r² = goodness of fit of the line of best fit (how much of the variation in y the line explains)


In biology, with data collected from humans quite literally off the street (in vivo), a correlation of
0.5 is great.
AGE graph:
Healthy r^2 (to access, go to family and click simple linear regression) = 0.1977
Terrible!
We see that for the healthy participants, as age increases so too does the average ratio of time estimation, which shouldn’t happen. In fact, it should be the opposite: as individuals get older, the ratio should decrease and become less than 1 because their cells are aging.
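● The MELAS r² is 1.00, a perfect fit, but only because there are just two MELAS participants and a line always passes exactly through two points, so this value doesn’t tell us much. In the graph, the 50-year-old participant’s ratio is approximately 0.8 and the 32-year-old’s is approximately 0.9.
● The deletion graph also isn’t bad: 0.3009.
● The 3243 graph is 0.1047; there’s essentially no correlation.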
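A sketch of how r² is computed for a simple linear regression (it is the square of Pearson’s r), which also makes the two-point MELAS case concrete. The ages and ratios below are approximations read off the description above, not the real data:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def r_squared(xs, ys):
    """For a simple linear regression, r^2 is the square of Pearson's r."""
    return pearson_r(xs, ys) ** 2

# Two approximate (age, ratio) points, like the two MELAS participants
print(r_squared([32, 50], [0.9, 0.8]))  # ~1.0: two points always lie on a line
```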

AS PEOPLE GET OLDER, THEIR RATIO SHOULD DECREASE. INSTEAD, IT INCREASES. WHY?
Time Estimation Data Exploration, Round Ratios (is there a learning curve throughout all the
trials?)

11/05/2020
Don’t include task 1 when analyzing the data; treat it as a practice trial. It was the first of 24 tasks, so it makes sense that the participants were nervous and did “poorly” (their results suggest that they all experienced considerable time contraction).

The smaller the ratio, the older the person: if the trial is 12 seconds, and the participant guesses
10, the ratio is 0.83333 and they’re old. If they guess 15, the ratio is 1.25 and the participant is
young.
The participants don’t improve over time, meaning there is no learning curve. However, they
stay consistent and probably adhere to their methods of counting.
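One possible way to check for a learning curve numerically (a sketch, not the method used in these notes): for each participant, drop task 1 and fit a least-squares line to the distance from a perfect ratio, |ratio - 1|, against task number. A clearly negative slope would mean the estimates are getting closer to 1; a slope near 0 means no learning curve. The function name and data are hypothetical:

```python
from statistics import mean

def learning_slope(ratios):
    """Least-squares slope of |ratio - 1| vs. task number, skipping task 1 (practice).

    `ratios` holds one participant's ratios for tasks 1..12 in order.
    """
    tasks = list(range(2, len(ratios) + 1))      # task numbers 2..12
    errors = [abs(r - 1) for r in ratios[1:]]    # distance from a perfect ratio
    mx, my = mean(tasks), mean(errors)
    num = sum((x - mx) * (e - my) for x, e in zip(tasks, errors))
    den = sum((x - mx) ** 2 for x in tasks)
    return num / den

# Hypothetical participant: poor practice task, then steady at 0.9
steady = [0.6] + [0.9] * 11
print(learning_slope(steady))  # no improvement after task 1
```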

For both rounds 1 and 2 (inverse), there are a couple of outliers: about 4 in round 1 that are much “younger,” and in round 2, 5 that are “younger” and 1 that is much “older.” However, the overwhelming majority of the participants remain within a narrow margin throughout all 12 tasks of each round.

For round 2, there’s one participant who starts the round as the second “youngest” and then only worsens throughout the remaining 11 tasks, reaching a maximum ratio of 2.27 on the tenth task. This is a healthy individual, so perhaps they just lost patience and started experiencing time much more quickly.
Melissa: was there anything that caught your attention about this participant (Mi014)?

What was it about trial 4 (25 secs) of round 1 that really threw off Mi015? His ratio was 4.8.

Prism graphs, learning curve rounds 1 and 2 (same as previous Excel graphs)

How to make an XY plot graph of 12 trials on Prism:


1) Click “new data table” in left side table
2) Make sure “XY” is checked
3) Delete everything in the spreadsheet
4) Click on the blue matrix with the red arrow (under “Change,” in the lower left-hand corner)
5) Click on “Table format: XY”
6) Select “Enter and plot a single y-value” for each point
7) Label “X” “Trials”
8) Write 1-12 in X column or command-shift-t (transpose) trials from Excel in the X column
9) Put in participants (e.g. Mi006) into the y row (groups) with command-shift-t
10) Copy all data from Excel and command-shift-t into X1
11) Rename title in left side table
12) A “Change graph type” dialog will appear; click the scatter plot with lines connecting each point
13) Label the axes by double clicking on them
14) Make the x-axis tick maximum 12 since there are 12 total trials: to do this, double click on either axis, go into the X-axis tab, uncheck “automatically determine the range and interval,” change min to 1 and max to 12, and change “major ticks interval” to 1.
15) Do the same for the y-axis ticks except make the range 0 to 5, since the ratio doesn’t exceed 4.8 in either round.
16) Change colors by clicking on the red, green, and blue venn diagram under “Change” and
selecting “colors”
17) Change the thickness of the axes (double click on either axis, go to “frame and origin,”
and select ¼ point on thickness of axes—we do this to draw attention toward the actual
graph)
18) If the frame style is set to “plain frame,” select “no frame”
19) Change thickness of points (double click on any point, go to “Appearance,” select
“change ALL data sets” under “Data Set,” change the border thickness and thickness to ¼
point, change shape to unfilled circle)
20) To unbolden the dots themselves, hit command A and then command B twice
21) Draw a dotted 1-point thickness line at 1 to illustrate the perfect ratio—hit the line under
“Draw” and make a decent estimate

Reference Melissa’s data -


What does it show? How does it represent the correlation between tasks?
People perform differently on different tasks.
Tasks - e.g. Q1:Q1 = 1.00 and Q1:Q3 = 0.36
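A sketch of how a task-by-task correlation matrix like Melissa’s could be computed. It assumes Pearson correlation across participants (her matrix may have used a different method, e.g. Spearman), and the 3 tasks × 4 participants of data are hypothetical. Each diagonal entry (e.g. Q1:Q1) is always 1.00, since a task correlates perfectly with itself:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def task_correlation_matrix(ratios_by_task):
    """ratios_by_task[i][p] = participant p's ratio on task i.

    Returns matrix[i][j] = correlation between tasks i and j across participants.
    """
    k = len(ratios_by_task)
    return [[pearson_r(ratios_by_task[i], ratios_by_task[j]) for j in range(k)]
            for i in range(k)]

# Hypothetical data: 3 tasks x 4 participants
tasks = [
    [0.80, 0.95, 1.10, 0.90],  # Q1
    [0.82, 0.97, 1.05, 0.88],  # Q2
    [1.00, 0.85, 0.90, 1.20],  # Q3
]
for row in task_correlation_matrix(tasks):
    print(" ".join(f"{r:5.2f}" for r in row))
```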

Longer time tasks (60 seconds) are more forgiving: a given error in seconds changes the ratio less, so the ratios vary less and there’s more correlation.
Meaning, if a participant guesses 3 seconds short on a task that’s 12 seconds long, their ratio is 0.75, but if they guess 3 seconds short on a task that’s 60 seconds, it’s 0.95. For the ratio to be the same (0.75), the participant would have to be 15 seconds off on the 60-second trial.

Deliverable for next week:

Create a bar graph, box plot, or violin plot where the x-axis groups the trials by length in seconds (10 secs, 30 secs, etc.) and the y-axis is the ratio (|estimate/actual|)
What are we trying to find out?
Does a participant’s success depend on the length (in secs) of the trial? How does the length
affect their performance?

- Create one for each round and all the trials


- Use the inverse for round 2 again
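A sketch of the data preparation for this deliverable: group each round’s ratios by trial length and invert round 2 so both rounds read “> 1 = younger.” The (length, ratio) record format and the numbers are assumptions for illustration:

```python
from collections import defaultdict
from statistics import median

def ratios_by_length(records, round_number):
    """Group ratios by trial length in seconds.

    records: iterable of (trial_length_s, ratio) pairs for one round.
    Round 2 ratios are inverted so both rounds share the "> 1 = younger" scale.
    """
    groups = defaultdict(list)
    for length_s, r in records:
        groups[length_s].append(1 / r if round_number == 2 else r)
    return dict(sorted(groups.items()))  # shortest trials first

# Hypothetical round 2 records
round2 = [(10, 0.80), (10, 1.25), (30, 0.90), (30, 1.05), (60, 1.00), (60, 0.95)]
for length_s, values in ratios_by_length(round2, round_number=2).items():
    print(f"{length_s:>2} s: median ratio = {median(values):.2f} (n = {len(values)})")
```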

Why a violin plot?


A violin plot is an enhanced, more informative version of a box plot (box plot + KDE, kernel density estimation), and it is especially insightful when comparing two distributions of data. While the box itself doesn’t change shape with outliers or unusual distributions, violin plots highlight them. Box plots work well with a normal distribution (a bell curve), but if the distribution is stranger or more erratic, they aren’t as meaningful, since they always take the shape of a box. A violin plot is like a density graph turned on its side and mirrored, so where the graph is thicker, there’s a greater frequency and, similarly, where it’s thinner, there’s a lesser frequency.

In the graph to the right, the two groups have practically the same median, but the first one has a much wider spread. There are several outliers pulling it upward and increasing the mean, whereas the second group is clustered in the same region and has very few outliers.
- A violin plot has a box plot inside with a median, interquartile range (25th and 75th
percentile markers), and whiskers which represent variability outside the interquartile
range
- Bimodal distribution: there are two modes (peaks), meaning the values cluster around two different points, so the violin bulges in both of those regions
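A minimal sketch of the KDE part of a violin plot, assuming a Gaussian kernel and a hand-picked bandwidth of 0.05 (plotting tools usually choose the bandwidth automatically). The hypothetical ratios form two clusters, so the estimated density is high near both and low in between, which is the bimodal bulge a violin would show:

```python
from math import exp, pi, sqrt

def gaussian_kde(data, bandwidth):
    """Return a function estimating the probability density of `data`.

    Each data point contributes a Gaussian bump of width `bandwidth`;
    the density is their average. A violin plot draws this curve
    mirrored on both sides of a vertical axis.
    """
    n = len(data)
    norm = 1 / (n * bandwidth * sqrt(2 * pi))

    def density(x):
        return norm * sum(exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)

    return density

# Hypothetical bimodal ratios: one cluster near 0.85, another near 1.3
ratios = [0.82, 0.85, 0.86, 0.88, 1.28, 1.30, 1.33]
kde = gaussian_kde(ratios, bandwidth=0.05)
for x in (0.85, 1.05, 1.30):
    print(f"density at {x:.2f}: {kde(x):.2f}")
```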

● How would I overlay a swarm plot? Is that the same as putting all the data points on the
graph?
● What’s the difference between a regular and truncated violin plot?
● If I were to create a violin plot whose variable was the presence of mitochondrial disease or age, how could I take into consideration the number of people of each disease or age bracket who participated? In the sample of data I have, the number of participants in each category was not consistent, and that, I would expect, sways the data.
● How do I create box plots inside the violin plots?

Observations (from graphs below):


● With the longer time intervals (50, 51, 55, 60 seconds), the participants do better since a
small error is less impactful than for a smaller time interval. Being 2 seconds off on a
12-second trial will result in a worse ratio than on a 55-second trial (0.83 vs. 0.96).
● The first trial (12 seconds) of the first round suggests that almost everyone experiences
time very slowly but this is most likely an exception to the rule since this was the first
trial and the participants weren’t yet accustomed to the task.
● For round 1, nearly all of the outliers lie above the 75th percentile, so the less consistent participants may have made jumpy decisions and become impatient, resulting in time expansion. For round 2, the same is true (plus a couple of outliers below the 25th percentile), except that, in general, everyone was more consistent.
● From the graph of round 2, it seems as if the longer time intervals contained more
outliers. This was probably due to people losing track of time, giving up, guessing, losing
patience, etc.
● The violin plot is handy because although the median and quartiles are essentially the same throughout, the shape of the distribution varies considerably depending on the time interval.
● What was it about the 25-sec round 1 trial that caused such a notable outlier?

11/15/2020
Does the length of the interval affect the participants’ performance? How so?
There’s a trend in both rounds that suggests more time contraction amongst the shorter time
intervals than the longer ones, but it’s not a significant trend.

CoV = coefficient of variation = standard deviation / mean


Is the data consistent? Does it have a normal distribution (adhere to the shape of a bell curve)?

High percentage = bounces around a lot, low = consistent

Round 2 CoV:
60-sec trials < 30-sec trials < 10-sec trials

There’s more room for error (relative to the trial length) during a shorter task.
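A sketch of the CoV calculation, using the sample standard deviation (an assumption; Excel’s STDEV.S and STDEV.P give slightly different results) and hypothetical ratios for a 10-second and a 60-second trial:

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, as a percentage."""
    return 100 * stdev(values) / mean(values)

# Hypothetical ratios for two trial lengths
short_trial = [0.70, 0.90, 1.20, 1.00]   # 10 s
long_trial = [0.95, 1.00, 1.05, 0.98]    # 60 s
print(f"10 s CoV: {coefficient_of_variation(short_trial):.1f}%")
print(f"60 s CoV: {coefficient_of_variation(long_trial):.1f}%")
```

A high percentage means the ratios bounce around a lot; a low one means the participants were consistent.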

- Round 2:
● As opposed to having 5 10-second plots, 4 30-second plots, etc., create multiple y
values for every x

- Put points on each violin plot to better visualize the distribution.

Alongside Melissa’s matrix (which shows how each trial correlates with the others), one way to test whether the trials differ from one another would be to perform a one-way ANOVA.
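A sketch of what a one-way ANOVA computes: the F statistic, i.e. the variance between the group means divided by the variance within the groups. Turning F into a p-value requires the F distribution with (k - 1, n - k) degrees of freedom, which Prism or a library such as SciPy (scipy.stats.f_oneway) handles. The groups below are hypothetical:

```python
from statistics import mean

def one_way_anova_f(*groups):
    """F statistic for a one-way ANOVA: between-group / within-group mean squares."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # How far each group mean sits from the grand mean, weighted by group size
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # How far each value sits from its own group mean
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical ratios for three trial lengths
print(one_way_anova_f([0.80, 0.85, 0.90], [0.88, 0.93, 0.97], [0.95, 0.99, 1.02]))
```

A large F means the group means differ by more than the scatter within each group would explain.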

A p-value ≤ 0.05 is considered significant; the smaller the p-value, the stronger the evidence against the null hypothesis.
What is a null hypothesis? It is mutually exclusive with the alternative hypothesis. A null hypothesis states that the results of a study are random or accidental and aren’t caused by something systematic. An alternative hypothesis is what we hope for: it states that there’s a real relationship or causal effect within our data.

Gaussian distribution = normal distribution


What we’d expect with our data. We hope it’s not random.

A non-parametric test doesn’t assume the data falls under a normal distribution.
Multiple comparisons: compare the mean of each trial to the mean of every other trial
individually
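A sketch of one common nonparametric alternative to the one-way ANOVA, the Kruskal-Wallis test, which compares ranks instead of raw values and so doesn’t assume a normal distribution. Whether this is the test Prism ran here is an assumption. H is computed without the tie correction, and its p-value would come from a chi-square distribution with k - 1 degrees of freedom (not computed here). The last lines list every pair of trial lengths, i.e. the comparisons a multiple-comparisons step would make. The data are hypothetical:

```python
from itertools import combinations

def average_ranks(values):
    """Rank values 1..n (smallest = 1); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic (without the tie correction)."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)

# Hypothetical ratios grouped by trial length
by_length = {
    "10 s": [0.78, 0.85, 0.90, 0.82],
    "30 s": [0.88, 0.93, 0.97, 0.91],
    "60 s": [0.95, 0.99, 1.02, 0.97],
}
print("H =", round(kruskal_wallis_h(*by_length.values()), 3))

# Multiple comparisons: every pair of trial lengths is compared individually
pairs = list(combinations(by_length, 2))
print(pairs)
```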
- Wizard wand—borrow the style of another graph and translate it into this one

As time ↑, median ↑ and → 1, just as we’d expect

Significance threshold = 0.05

As the gap in length between two trials ↑, |difference between trials| ↑


E.g. the variation between the 10 sec trial and the 60 sec trial is much greater than between 10 sec and 30 sec or 30 sec and 60 sec

As time ↑, variation ↓

Deliverable for next week:


Create a diagram that creatively details how we can interpret time contraction (older) vs.
expansion (younger)
- Use illustrations to convey what each score/estimate of each task means
- Difference between rounds 1 and 2

Rough sketch:
Diagram questions:

*Is there a learning curve throughout each round?

*How does round 1 compare to round 2?

*How does the individual's counting method affect their performance?

*What's the distribution of each round? Is there considerable variance?

*How does age affect performance?

*How does the length of the trial affect the participants' performance?

*How does the presence of a mitochondrial disease affect performance?

What counting methods did the participants use?

- Counted in head

- Tapped

- 1 1 thousand

- Mississippi

- Used fingers

- Visualized a clock

Who are the participants?

- Between the ages of 20 and 52

- Healthy individuals and those with one of three mitochondrial diseases:

1. Deletion

2. 3243 Mutation

3. MELAS
I created this hopefully visually pleasing infographic which briefly summarizes what the study is
about and what the results of the study mean.

Our findings:

● There wasn’t a learning curve throughout either round, which makes sense b/c we didn’t tell the participants how accurate they were after each round, so there was no way for them to judge their performance and improve as a result
● The participants stayed consistent, meaning they probably adhered to the same counting
methods
● Trial 1 of round 1 suggests that the participants experience time really slowly, but since
this was the first trial, it makes sense they did so “poorly”—they were nervous,
unaccustomed to the task, panicked, etc.
● There are a couple outliers in round 1 that appear very young, but still, the majority of the
participants remain in the same region: just below 1, meaning they, on average,
experience time more slowly
● Round 2 is much more consistent than round 1 (could be b/c of how this round was structured: 3 sets of 10 secs, 30 secs, and 60 secs)
● For round 2, there’s one participant who starts the round as the second “youngest” and then only worsens throughout the remaining 11 tasks, reaching a maximum ratio of 2.27 on the tenth task. This is a healthy individual, so perhaps they just lost patience and started experiencing time much more quickly; uncontrollable factors like this should be considered
● Longer time tasks (60 seconds) are more forgiving, since a given error in seconds changes the ratio less, and thus there’s more correlation
● For round 2 (see violin plot), it seems as if the longer time intervals contained more
outliers (this was probably due to people losing track of time, giving up, guessing, losing
patience) but the median is closer to 1
● Melissa’s matrix: longer time trials correlate more to longer time trials (60 to 55) than to
shorter ones (60 to 12)
○ After a nonparametric test, we found that as time ↑, median ↑ and → 1
○ As the gap in length between trials ↑, |difference between trials| ↑, and as time ↑, variation ↓
● Age graph: there’s nearly no correlation, which suggests that age isn’t a determining factor in how people perceive time (in fact, contrary to expectation, the healthy participants’ ratio ↑ slightly as age ↑)
● Disease graphs: everyone’s a little older, especially the 3243 group
○ The healthy individuals have more variation and outliers
○ The MELAS group were fairly accurate, although there were only 2 participants
○ The deletion group was most similar to the healthy one: kind of all over the place but still with a median close to 1
● An ANOVA test indicated that the differences in our data are unlikely to be due to chance alone
REMINDER:
Put in P-value for graph
