Using Excel To Do Precision Journalism
24/4/2013
A tutorial by Steve Doig, journalism professor at ASU's Cronkite School and Pulitzer-winning data
journalist, based on his workshop, Excel for Journalists. The workshop is part of the School of Data
Journalism 2013 at the International Journalism Festival.
Steve Doig at the School of Data Journalism in Perugia. Image by Lucy Chambers
Microsoft Excel is a powerful tool that will handle most tasks that are useful for a journalist who
needs to analyze data to discover interesting patterns. These tasks include:
Sorting
Filtering
Using math and text functions
Pivot tables
Introduction to Excel
Excel will handle large amounts of data that is organized in table form, with rows and columns. The
columns (which are labeled A, B, C and so on) list the variables (like Name, Age, Number of Crimes, etc.).
Typically, the first row holds the names of the variables. The rest of the rows are for the individual
records or cases being analyzed. Each cell (like A1) holds a piece of data.
http://datadrivenjournalism.net/resources/using_excel_to_do_precision_journalism
Modern versions of Excel will hold as many as 1,048,576 records with as many as 16,384 variables!
An Excel spreadsheet also will hold multiple tables on separate sheets, which are tabbed on the
bottom of the page.
Sorting
One of the most useful abilities of Excel is to sort the data into a more revealing order. Too often,
we are given lists that are in alphabetical order, which is useful only for finding a particular record in
a long list. In journalism, we usually are more interested in extremes: the most, the least, the
biggest, the smallest, the best, the worst.
Consider the data used in this workshop, a list of the provinces of Italy showing the number of
various kinds of crimes reported during a recent year. Here is how it looks sorted in alphabetical
order of province name:
Far more interesting would be to sort it in descending order of the total number of crimes, with the
most crime-ridden city at the top of the list:
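The same reordering can be sketched outside Excel. The following minimal Python example is only an illustration: the province names, the "Provincia" column label and the figures are invented placeholders, not the workshop data. It sorts whole records by total crimes, largest first:

```python
# Invented placeholder records; not the workshop data.
rows = [
    {"Provincia": "Province A", "Delitti in totale": 12000},
    {"Provincia": "Province B", "Delitti in totale": 48000},
    {"Provincia": "Province C", "Delitti in totale": 7500},
]

# Sort complete records by total crimes, largest first
# (roughly what the Z-A button does for one column).
by_total = sorted(rows, key=lambda r: r["Delitti in totale"], reverse=True)

for r in by_total:
    print(r["Provincia"], r["Delitti in totale"])
```

Because each record is sorted as a unit, every total stays attached to its province.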
There are two methods of sorting. The first method is quick and can be used for sorting by a single
variable. Put the cursor in the column you wish to sort by (Delitti in totale in this case) and then
click the Z-A button:
But beware! Put the cursor in the column, but DO NOT select the column letter (C, in this case) and
then sort. Consider the example below:
Doing that will sort ONLY the data in that column, thereby disordering your data! Notice well how
this can happen!
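To see why sorting a single selected column is dangerous, here is a small Python sketch of the same mistake. The names and figures are invented for illustration; the point is only the difference between sorting one column alone and sorting the rows as units:

```python
# Invented placeholder data: one list per spreadsheet column.
names = ["Province A", "Province B", "Province C"]
totals = [12000, 48000, 7500]

# Wrong: sorting one column on its own breaks the link between
# each name and its figure (like selecting only column C).
broken = list(zip(names, sorted(totals, reverse=True)))

# Right: sort the rows as units so each figure stays with its province.
intact = sorted(zip(names, totals), key=lambda pair: pair[1], reverse=True)

print(broken)
print(intact)
```

In the broken version, Province A is paired with Province B's total, which is exactly the kind of silent error that can end up in a published story.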
The other method of sorting is for when you want to sort by more than one variable. For instance,
suppose we wish to sort the crime data first by Territorio in alphabetical order, but then by Delitti in
totale in descending order within each Territorio. To do that, go to the toolbar, click on Data and
then Sort, and then choose the variables by which you wish to sort. Then click OK.
The result will be this:
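A two-level sort like this can be sketched in Python with a compound sort key. The rows below are invented placeholders (Territorio, province, total crimes), not the workshop figures:

```python
# Invented placeholder rows: (Territorio, province, Delitti in totale).
rows = [
    ("Toscana", "Province D", 9000),
    ("Lazio", "Province A", 12000),
    ("Toscana", "Province C", 30000),
    ("Lazio", "Province B", 48000),
]

# First key: Territorio in A-Z order.
# Second key: the negated total, so larger totals come first within each Territorio.
ordered = sorted(rows, key=lambda r: (r[0], -r[2]))

for territorio, province, total in ordered:
    print(territorio, province, total)
```

Negating the number is one simple way to mix an ascending text key with a descending numeric key in a single pass.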
Filtering
Sometimes you want to examine only particular records from a large collection of data. For that, you
can use Excel's Filter tool. On the toolbar, go to Data > Filter > Autofilter. Small buttons will appear
at the top of each column:
Suppose we wish to see only the records from the Territorio of Lazio. Click on the button on the
Territorio column and choose Lazio from the list. This is the result:
Notice that you are now seeing only rows 36, 44, 78, 80 and 104.
More complicated filters are possible. For instance, suppose you wish to see only records in which
Delitti in totale is greater than or equal to 50,000. Click on the button and choose Custom
Filter:
You could also, for instance, choose records in which Delitti in totale is greater than 50,000 and
Omicidi is less than or equal to 25.
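The three filters described above can be sketched in Python as simple conditions applied to each record. The records here are invented for illustration; only the column names and thresholds come from the text:

```python
# Invented placeholder records using the column names from the text.
rows = [
    {"Territorio": "Lazio", "Delitti in totale": 62000, "Omicidi": 30},
    {"Territorio": "Lazio", "Delitti in totale": 8000, "Omicidi": 2},
    {"Territorio": "Toscana", "Delitti in totale": 55000, "Omicidi": 12},
]

# Simple filter: only records from Lazio.
lazio = [r for r in rows if r["Territorio"] == "Lazio"]

# Custom filter: total crimes greater than or equal to 50,000.
high = [r for r in rows if r["Delitti in totale"] >= 50000]

# Combined filter: more than 50,000 crimes AND 25 or fewer homicides.
both = [r for r in rows if r["Delitti in totale"] > 50000 and r["Omicidi"] <= 25]

print(len(lazio), len(high), len(both))
```

As in Excel, filtering only hides the records that fail the test; the original list is left unchanged.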
Functions
Excel has many built-in functions useful for performing math calculations and working with dates
and text. For instance, assume that we wish to calculate the total number of crimes in all the
provinces. To do this, we would go to the bottom of Column C, skip a row, and then enter this
formula in Cell C106: =SUM(C2:C104). The equals sign (=) is necessary for all functions. The colon
(:) means all the numbers from Cell C2 to Cell C104. The result is this:
(The reason for skipping a row is to separate the sum from the main table so that the table can be
sorted without pulling the sum into the table during the sorting operation. This way the sum will stay
at the bottom of the column.)
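For comparison, a rough Python equivalent of summing a column is shown below. The figures are invented; the total is kept in its own variable, separate from the list, for the same reason the Excel total is kept apart from the table:

```python
# Invented placeholder values standing in for cells C2 to C104.
totals = [12000, 48000, 7500]

# Comparable to =SUM(C2:C104): add up every value in the range.
grand_total = sum(totals)

print(grand_total)
```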
Often you will want to do a calculation on each row of your data table. For instance, you might want
to calculate the crime rate (the number of crimes per 100,000 population), which would let you
compare the crime problem in cities of different sizes. To do this, we would create a new variable
called Crime Rate in Column L, the first empty column. Then, in Cell L2, we would enter this
formula:
=(C2/J2)*100000. This divides the total crimes by the population, then multiplies the result by
100,000. (Notice that there are no spaces and no thousands separators used in the formula.) Here
is the result:
It would be very tedious to repeat writing that calculation in each of 103 rows of data. Happily, Excel
has a way to rapidly copy a formula down a column of cells. To do that, you carefully move the cursor
(normally a big fat white cross) to the bottom right corner of the cell containing the formula. When it
is in the right spot, the cursor will change to a small black cross. At that point, you can double-click
and the formula will copy down the column until it reaches a blank cell in the column to the left. This
would be the result:
Notice that the formula changes for each row, so that Row 6 is =(C6/J6)*100000.
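The per-row calculation can be sketched in Python as a loop that applies the same formula to every record, much as the copied formula adjusts its row number. The figures and the "Popolazione" label are assumptions made up for this example:

```python
# Invented placeholder records; "Popolazione" is a hypothetical column name.
rows = [
    {"Provincia": "Province A", "Delitti in totale": 12000, "Popolazione": 400000},
    {"Provincia": "Province B", "Delitti in totale": 48000, "Popolazione": 2400000},
]

# Same arithmetic as =(C2/J2)*100000, applied to each row in turn.
for r in rows:
    r["Crime Rate"] = (r["Delitti in totale"] / r["Popolazione"]) * 100000

# Rank by rate, highest first.
by_rate = sorted(rows, key=lambda r: r["Crime Rate"], reverse=True)

for r in by_rate:
    print(r["Provincia"], round(r["Crime Rate"]))
```

In this made-up case the province with fewer crimes in total has the higher rate, which is the reason for comparing rates rather than raw counts.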
Now, if we sort by Crime Rate in descending order, we see the cities with the worst crime problems:
and sorting in ascending order, the least crime:
Here are some other useful Excel functions that can be used in similar ways:
(You can add, subtract, multiply or divide by using the symbols +, -, * and /.)
=AVERAGE calculates the arithmetic mean of a column or row of numbers
=MEDIAN finds the middle value of a column or row of numbers
=COUNT tells you how many items there are in a column or row
=MAX tells you the largest value in a column or row
=MIN tells you the smallest value in a column or row
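As a rough cross-check of what these five functions compute, here is a short Python sketch using the standard library. The values are invented for illustration:

```python
from statistics import mean, median

# Invented placeholder values standing in for a column of numbers.
values = [7500, 12000, 48000, 9000, 30000]

average = mean(values)    # comparable to =AVERAGE
middle = median(values)   # comparable to =MEDIAN
count = len(values)       # comparable to =COUNT when every item is a number
largest = max(values)     # comparable to =MAX
smallest = min(values)    # comparable to =MIN

print(average, middle, count, largest, smallest)
```

Here the mean is well above the median because one large value pulls it up, a common reason to report the median for skewed data.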
There are also a variety of text functions that can join and cut apart text strings. For instance:
If "Steve" is in Cell B2 and "Doig" is in Cell C2, then =B2&" "&C2 will produce "Steve Doig". And
=C2&", "&B2 will produce "Doig, Steve". Other text functions include:
=SEARCH: this will find the start of a desired string of text in a larger string.
=LEN: this will tell you how many characters are in a text string.
=LEFT: this will extract however many characters you specify starting from the left.
=RIGHT: this will extract characters starting from the right.
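The text operations above have close counterparts in most languages. This Python sketch mirrors them on the name used in the example; the variable names are just for illustration:

```python
first, last = "Steve", "Doig"

# Joining text, comparable to using & in an Excel formula.
full = first + " " + last            # "Steve Doig"
reversed_name = last + ", " + first  # "Doig, Steve"

# SEARCH counts positions from 1 and ignores case; Python's find
# counts from 0, so lowercase both sides and add 1 to match.
position = full.lower().find("doig") + 1

length = len(full)    # comparable to =LEN
left_part = full[:5]  # comparable to =LEFT(text, 5)
right_part = full[-4:]  # comparable to =RIGHT(text, 4)

print(full, reversed_name, position, length, left_part, right_part)
```

Combining a search position with LEFT or RIGHT is the usual way to split a "Last, First" field into separate columns.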
You can also do date arithmetic, such as calculating the number of days or years between two
dates, or hours, minutes and/or seconds between two times. For instance, to calculate on April 24,
2010, the age in years of someone whose birthdate is in cell B2, you could use this formula:
=(DATE(2010,4,24)-B2)/365.25. The first part of the formula calculates the number of days between
the two dates, then that is divided by 365.25 (the .25 accounts for leap years) to produce the years.
Another useful date function is =WEEKDAY, which will tell you on which day of the week a chosen
date falls. For instance, =WEEKDAY(DATE(1948,4,21)) returns a 4 (by default, Sunday counts as 1),
which means I was born on a Wednesday.
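The two date calculations can be checked with a short Python sketch using the standard datetime module. The dates are the ones from the formulas above; the conversion to Excel's default weekday numbering (Sunday = 1) is spelled out because Python numbers the days differently:

```python
from datetime import date

birth = date(1948, 4, 21)
as_of = date(2010, 4, 24)

# Days between the two dates, divided by 365.25 to approximate years.
age_years = (as_of - birth).days / 365.25

# Python's isoweekday() runs Monday=1 ... Sunday=7.
# Excel's default WEEKDAY runs Sunday=1 ... Saturday=7, so convert.
excel_weekday = birth.isoweekday() % 7 + 1

print(int(age_years), excel_weekday)
```

Dividing by 365.25 is an approximation; it is fine for ages in whole years but can be off by a day near a birthday.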
Excel offers well over 200 functions in a variety of categories beyond just math, dates and text:
financial, engineering, database, logical, statistical, etc. But it is unlikely that you will need to be
familiar with more than a dozen or so functions, unless you are a journalist with a very specialized
beat such as economics.
Pivot Tables
One of Excel's best tricks is the ability to summarize data that is in categories. The tool that does
this is called a pivot table, which creates an interactive cross-tabulation of the data by category.
To create a pivot table, every column of your data must have a variable label; in fact, it is always
good practice to put in a variable label anytime you insert or add a new column. First, you make
sure your cursor is on some cell in the table. Then go to the toolbar and click on Data > PivotTable
Report. A window will pop up called the PivotTable Wizard. Just hit Next > Next > Finish on the
three steps of the wizard.
This will open a new sheet that looks like this:
To build a pivot table, you should visualize the piece of paper that would answer your question. Our
example data shows 103 provinces in the 20 Territorios of Italy. Imagine that you wanted to know
the total number of crimes in each Territorio. The piece of paper that would answer that question
would list each Territorio, with the total number of crimes next to each name.
To build this pivot table, we would use the mouse to pick up Territorio from the list of variables in
the floating box to the right, and place it in the Drop Row Fields Here box. We would then take the
Delitti in totale variable and put it in the Drop Data Items Here box. This would be the result:
If you click the cursor into the Total column and hit the Z-A button to sort, you will get this:
It is possible to make very complicated pivot tables, with multiple subtotals. But I recommend
making a new pivot table for each question you want to answer; several simple tables are easier to
understand than one very complicated table that tries to answer many questions at once.
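The simple one-question pivot table described above, total crimes per Territorio, amounts to grouping and summing. A minimal Python sketch of that idea, with invented figures, might look like this:

```python
from collections import defaultdict

# Invented placeholder rows: (Territorio, Delitti in totale) per province.
rows = [
    ("Lazio", 12000),
    ("Toscana", 9000),
    ("Lazio", 48000),
    ("Toscana", 30000),
]

# Group by Territorio and add up the totals, like a one-field pivot table.
totals = defaultdict(int)
for territorio, crimes in rows:
    totals[territorio] += crimes

# Rank the summary, largest total first (the Z-A sort on the Total column).
ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)

for territorio, total in ranked:
    print(territorio, total)
```

Each new question (for example, homicides per Territorio) would get its own small summary like this one, in keeping with the advice to prefer several simple tables.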
The button on the variable list opens up a box that will let you make a variety of other
choices about how to summarize and display the result:
Other Excel Tips
Excel will import data that comes in a variety of formats other than the native *.xls that Excel uses.
For instance, Excel can readily import text files in which the data columns are separated by
commas, tabs, or other characters, like this:
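As a rough illustration of what such a delimited text file contains, the Python sketch below reads a small comma-separated sample with the standard csv module. The sample text is invented; a real file would be opened from disk instead of from a string:

```python
import csv
import io

# Invented sample of comma-separated text: a header row, then one line per record.
text = (
    "Provincia,Territorio,Delitti in totale\n"
    "Province A,Lazio,12000\n"
    "Province B,Lazio,48000\n"
)

# DictReader uses the first row as the variable names.
rows = list(csv.DictReader(io.StringIO(text)))

# Values arrive as text, so convert the numeric column before doing math.
for r in rows:
    r["Delitti in totale"] = int(r["Delitti in totale"])

print(rows)
```

For a tab-separated file, passing delimiter="\t" to csv.DictReader handles the different separator.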
If you find a web page with data in table format (rows and columns), Excel can open it as a
spreadsheet.
Excel also will let you format your data to make it more readable. For instance, Format > Cells >
Number will allow you to put thousands separators in your numbers, like this:
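The effect of a thousands separator is easy to show in Python as well. This small sketch formats the Excel row limit mentioned earlier; note that formatting changes only how the number is displayed, not its value:

```python
# The row limit of modern Excel versions, as a plain number.
records = 1048576

# The "," format option inserts thousands separators for display.
formatted = f"{records:,}"

print(formatted)
```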
Finding Data
Government agencies are starting to make some of their data available in Excel or other formats.
For instance, ISTAT.IT has very comprehensive data about Italian demographics, economy, crime,
etc. Many of their tables can be downloaded directly as Excel files.
One trick to find interesting data would be to use Google and add these search terms: site:.it
filetype:xls.
Note: The slides from this workshop are available on SlideShare.
Need Help?
Feel free to send me an email at steve.doig[at]asu.edu. I will be glad to give you advice if I can.
Comments
I've been finding google spreadsheets can be much more powerful than Excel
because they have collaboration features, revision control, and commenting.
Also investigators at @Propublica are doing interesting things such as using csv
files/google spreadsheets to power their CMS, which I think is brilliant.
Data Driven Journalism was created by the European Journalism Centre (EJC) with partial funding from the Dutch Ministry of Education, Culture and Science.