Data Exploration

Kate Stankiewicz

Period 6
9/18/15

Question: How many times have you hit the whip, or at least tried, in the last month?

Conclusion:
This data represents how many times a high school statistics student hit the whip in August. The sample is intended to represent high school students; however, it includes only statistics students because they are the ones who took the survey. The unit is one attempt at the dance move, because counting attempts is the easiest way to observe how popular the move is among the students.
This data was collected through a Google survey that was posted in a LASA Statistics Facebook group. I wanted to collect this data because a lot of my friends always try to do it and it's a popular dance right now, even just as a joke. Personally, I have only tried to do it a couple of times, but I wanted to see how many times some people would admit they tried to hit the whip.
42 people answered the survey, giving a smaller sample size than I'd hoped for. Because the sample was so small, the results were very weird. None of the measures of center or spread gives a meaningful analysis on its own, because the majority of the students whipped between 0 and 100 times in the last month, but there were 7 who whipped much more than that. These students who danced more could be considered outliers; however, they weren't outliers when compared to each other. The minimum of the data is 0, the first quartile is 1, the median is 18, the third quartile is 70, and the maximum is 500. This makes the range 500, which is really large, but the IQR, 69, is much smaller. The outliers can be found by multiplying the IQR by 1.5 and adding the result, 103.5, to the third quartile, which makes anything greater than 173.5 an outlier. The mean is about 84, which is a lot higher than the median. It's a less accurate center of the data because the data is skewed right. The standard deviation is almost 150, which means that a z-score of -1 would correspond to a value less than 0, which isn't possible, so this number is pretty much useless. The variance is the standard deviation squared, which is about 22,327.
After eliminating the outliers (those who hit the whip more than 100 times, which left everyone who whipped less than 173.5 times), the data looked more evenly distributed, but still very positively skewed. For this new data set, which I named few, the minimum is still 0, the first quartile is still 1, but the median is now 9, the third quartile is now 29, and the maximum is 100. The range is therefore 100, and the IQR is 28. The mean and standard deviation of the results are still not the best way to find the center and spread, but they are about 20.5 and 26.66 respectively. A z-score less than about -0.77 isn't possible, because it would correspond to a negative count. The variance is now significantly smaller, about 710.7. To find all of these measurements, I plugged the data set into different functions in RStudio. You can find what I typed into the script window below in the Work section.
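
The summary of few can be reproduced without the original Whip data frame. The sketch below assumes the 35 values can be recovered by subtracting 100 from the transform(few + 100) printout in the Work section; the vector is retyped by hand here for illustration, and names such as z_min are my own.

```r
# Assumption: values recovered from the transform(few + 100) printout, minus 100.
few <- c(6, 0, 10, 80, 0, 40, 1, 2, 70, 5, 1, 20, 25, 1, 0, 1, 2, 0,
         50, 50, 100, 3, 30, 9, 0, 18, 28, 10, 18, 0, 1, 35, 20, 75, 8)

five <- fivenum(few)           # min, lower hinge, median, upper hinge, max
iqr_few <- five[4] - five[2]   # spread of the middle half
mean_few <- mean(few)
sd_few <- sd(few)
var_few <- var(few)            # equals sd_few^2

# Smallest possible z-score: the one that belongs to a count of 0.
z_min <- (0 - mean_few) / sd_few

print(five)
print(round(c(mean = mean_few, sd = sd_few, var = var_few, z_min = z_min), 2))
```

The value of z_min is negative, which is why the bound on possible z-scores sits below the mean rather than above it.
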
After adding 100 to each of the numbers in my data, the minimum became 100 and the maximum became 200, as expected. The first quartile is 101, the median is 109, and the third quartile is 129. The mean became about 120.5, but the standard deviation stayed approximately 26.66. The mean and median both increased by 100, while the standard deviation stayed the same because the differences between the numbers in the data remained constant.
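
Why the standard deviation does not move can be seen directly: adding a constant shifts every value and the mean by the same amount, so each deviation from the mean is unchanged. A small sketch, assuming the few vector is retyped from the Work printout (values minus 100); the helper names are illustrative.

```r
# Assumption: values recovered from the transform(few + 100) printout, minus 100.
few <- c(6, 0, 10, 80, 0, 40, 1, 2, 70, 5, 1, 20, 25, 1, 0, 1, 2, 0,
         50, 50, 100, 3, 30, 9, 0, 18, 28, 10, 18, 0, 1, 35, 20, 75, 8)
shifted <- few + 100

# Measures of position all move by exactly 100.
shift_in_summary <- fivenum(shifted) - fivenum(few)
shift_in_mean <- mean(shifted) - mean(few)

# Deviations from the mean are unchanged, so the spread is unchanged.
same_deviations <- isTRUE(all.equal(shifted - mean(shifted), few - mean(few)))
same_sd <- isTRUE(all.equal(sd(shifted), sd(few)))

print(shift_in_summary)
print(shift_in_mean)
print(c(same_deviations = same_deviations, same_sd = same_sd))
```
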
If the data is increased by 50% instead of adding 100 to each number, the histogram continues to be positively skewed, but spread out over a much larger range. The minimum stays 0 because 50% of 0 is 0, so nothing is added to it. On the other hand, the first quartile increases to 1.5, the median increases to 13.5, the third quartile increases to 43.5, and the maximum becomes 150. This makes the range also grow by 50%, to 150. The IQR changes to 42. Again, because the data isn't close to normal, the mean and standard deviation aren't the best ways to find the center and spread. However, the mean becomes about 30.8 and the standard deviation becomes almost 40. All of these are 50% increases over the measures of the original data, because every value was multiplied by 1.5.
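
Increasing every value by 50% is the same as multiplying by 1.5, and a multiplication rescales both center and spread. The sketch below compares the rescaled data with the original; it assumes the few vector retyped from the Work printout, and hinge_spread is a hypothetical helper that mirrors the fivenum-based IQR used in this report.

```r
# Assumption: values recovered from the transform(few + 100) printout, minus 100.
few <- c(6, 0, 10, 80, 0, 40, 1, 2, 70, 5, 1, 20, 25, 1, 0, 1, 2, 0,
         50, 50, 100, 3, 30, 9, 0, 18, 28, 10, 18, 0, 1, 35, 20, 75, 8)
scaled <- few * 1.5   # same as few + few * 0.5

# Hypothetical helper: IQR taken from the hinges that fivenum() reports.
hinge_spread <- function(x) {
  f <- fivenum(x)
  f[4] - f[2]
}

ratio_mean <- mean(scaled) / mean(few)
ratio_sd <- sd(scaled) / sd(few)
ratio_iqr <- hinge_spread(scaled) / hinge_spread(few)
ratio_var <- var(scaled) / var(few)   # variance scales by 1.5^2

print(fivenum(scaled))
print(c(mean = ratio_mean, sd = ratio_sd, iqr = ratio_iqr, var = ratio_var))
```

The variance is the one measure that does not grow by 50%: it is in squared units, so it grows by a factor of 1.5 squared.
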
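Theoretically, if the data had a normal distribution with the mean and standard deviation of few, approximately 42.6% of the data would be more than 5 units above the mean. Only about 7.4% of the data would be between 3 units under the mean and 2 units over. To be in the top 10% of most whips hit, someone would have had to hit the whip about 34 times more than the mean, or roughly 55 times in total.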
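
These normal-model figures can be sketched with pnorm and qnorm. The block assumes a normal distribution whose mean and standard deviation are the values reported for few (about 20.54 and 26.66); that pairing is my reading of the numbers, and the variable names are illustrative.

```r
# Assumption: normal model using the mean and sd reported for `few`.
mu <- 20.54286
sigma <- 26.65945

# Share of values more than 5 units above the mean.
p_above_5 <- pnorm(mu + 5, mean = mu, sd = sigma, lower.tail = FALSE)

# Share of values between 3 units below and 2 units above the mean.
p_between <- pnorm(mu + 2, mean = mu, sd = sigma) -
  pnorm(mu - 3, mean = mu, sd = sigma)

# Cutoff for the top 10%: the 90th percentile of the model.
cutoff_top10 <- qnorm(0.90, mean = mu, sd = sigma)
above_mean <- cutoff_top10 - mu   # distance of that cutoff from the mean

print(round(c(p_above_5 = p_above_5, p_between = p_between), 3))
print(round(c(cutoff_top10 = cutoff_top10, above_mean = above_mean), 1))
```

Under these assumptions the 90th percentile sits roughly 34 whips above the mean, so the cutoff itself is the mean plus that distance.
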
In conclusion, the center of the whip data is 18, meaning that the typical student surveyed whipped around that many times in a month. I believe that this is because of the popular song "Watch Me" by Silento. When students hear this song they have to whip at least 5 times, so if they listen to the song about 3 times, they'll have whipped almost 18 times. However, there were many students who whipped over 100 times and almost as many who didn't whip at all.

Work:

> hist(Whip$whip, main = "Whip", xlab = "Number of Whips Hit", breaks = 10)
> few <- Whip[Whip$whip <= 100, ]
> many <- Whip[Whip$whip > 100, ]
> fivenum(Whip$whip)
[1]   0   1  18  70 500
> mean(Whip$whip)
[1] 84.28571
> sd(Whip$whip)
[1] 149.4237
> 69 * 1.5
[1] 103.5
> 103.5 + 70
[1] 173.5
> hist(few, main = "Few Whips", xlab = "Number of Whips Hit")
> fivenum(few)
[1]   0   1   9  29 100
> mean(few)
[1] 20.54286
> sd(few)
[1] 26.65945
> boxplot(few, horizontal = TRUE, main = "Few Whips", xlab = "Number of Whips Hit")
> stem(few)

  The decimal point is 1 digit(s) to the right of the |

   0 | 0000001111122356890088
   2 | 005805
   4 | 000
   6 | 05
   8 | 0
  10 | 0

> transform(few + 100)
   X_data
1     106
2     100
3     110
4     180
5     100
6     140
7     101
8     102
9     170
10    105
11    101
12    120
13    125
14    101
15    100
16    101
17    102
18    100
19    150
20    150
21    200
22    103
23    130
24    109
25    100
26    118
27    128
28    110
29    118
30    100
31    101
32    135
33    120
34    175
35    108
> newfew <- transform(few, more = few + 100)
> hist(newfew$more, main = "Few Whips + 100", xlab = "Number of Whips Hit")
> fivenum(newfew$more)
[1] 100 101 109 129 200
> mean(newfew$more)
[1] 120.5429
> sd(newfew$more)
[1] 26.65945
> boxplot(newfew$more, horizontal = TRUE, main = "Few Whips + 100", xlab = "Number of Whips Hit")
> stem(newfew$more)

  The decimal point is 1 digit(s) to the right of the |

  10 | 0000001111122356890088
  12 | 005805
  14 | 000
  16 | 05
  18 | 0
  20 | 0

> transform(few + few * .5)
   X_data
1     9.0
2     0.0
3    15.0
4   120.0
5     0.0
6    60.0
7     1.5
8     3.0
9   105.0
10    7.5
11    1.5
12   30.0
13   37.5
14    1.5
15    0.0
16    1.5
17    3.0
18    0.0
19   75.0
20   75.0
21  150.0
22    4.5
23   45.0
24   13.5
25    0.0
26   27.0
27   42.0
28   15.0
29   27.0
30    0.0
31    1.5
32   52.5
33   30.0
34  112.5
35   12.0
> alot <- transform(few, alot = few + few * .5)
> hist(alot$alot, main = "Few Whips Increased 50%", xlab = "Number of Whips Hit")
> fivenum(alot$alot)
[1]   0.0   1.5  13.5  43.5 150.0
> mean(alot$alot)
[1] 30.81429
> sd(alot$alot)
[1] 39.98917
> boxplot(alot$alot, horizontal = TRUE, main = "Few Whips Increased 50%", xlab = "Number of Whips Hit")
> stem(alot$alot)

  The decimal point is 1 digit(s) to the right of the |

   0 | 00000022222335892455
   2 | 77008
   4 | 253
   6 | 055
   8 | 
  10 | 53
  12 | 0
  14 | 0

Graphs:
