
1/20/2016

Using Excel to Do Precision Journalism | Resources | Data Driven Journalism


24/4/2013

Using Excel to Do Precision Journalism

A tutorial by Steve Doig, journalism professor at ASU's Cronkite School and Pulitzer-winning data journalist, based on his workshop, Excel for Journalists. The workshop is part of the School of Data Journalism 2013 at the International Journalism Festival.


Steve Doig at the School of Data Journalism in Perugia. Image by Lucy Chambers

Microsoft Excel is a powerful tool that will handle most tasks that are useful for a journalist who needs to analyze data to discover interesting patterns. These tasks include:
Sorting
Filtering
Using math and text functions
Pivot tables


Introduction to Excel

Excel will handle large amounts of data that is organized in table form, with rows and columns. The columns (which are labeled A, B, C and so on) list the variables (like Name, Age, Number of Crimes, etc.). Typically, the first row holds the names of the variables. The rest of the rows are for the individual records or cases being analyzed. Each cell (like A1) holds a piece of data.


http://datadrivenjournalism.net/resources/using_excel_to_do_precision_journalism


Modern versions of Excel will hold as many as 1,048,576 records with as many as 16,384 variables! An Excel spreadsheet also will hold multiple tables on separate sheets, which are tabbed on the bottom of the page.


Sorting
One of the most useful abilities of Excel is to sort the data into a more revealing order. Too often, we are given lists that are in alphabetical order, which is useful only for finding a particular record in a long list. In journalism, we usually are more interested in extremes: the most, the least, the biggest, the smallest, the best, the worst.
Consider the data used in this workshop, a list of the provinces of Italy showing the number of various kinds of crimes reported during a recent year. Here is how it looks sorted in alphabetical order of province name:


Far more interesting would be to sort it in descending order of the total number of crimes, with the most crime-ridden city at the top of the list:

There are two methods of sorting. The first method is quick and can be used for sorting by a single variable. Put the cursor in the column you wish to sort by ("Delitti in totale" in this case) and then click the Z-A button:


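But beware! Put the cursor in a cell in the column, but DO NOT select the whole column by clicking its column letter (C, in this case) before you sort. Consider the example below:

Selecting the column letter and then sorting will sort ONLY the data in that column, thereby disordering your data: the values no longer sit on the rows of the provinces they belong to. Notice well how this can happen!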

The other method of sorting is for when you want to sort by more than one variable. For instance, suppose we wish to sort the crime data first by "Territorio" in alphabetical order, but then by "Delitti in totale" in descending order within each Territorio. To do that, go to the toolbar, click on Data and then Sort, and then choose the variables by which you wish to sort. Then click OK.

The result will be this:
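
For readers who also script their analysis, the two sorting methods can be sketched in a few lines of Python. This is only an illustration, not part of the workshop: the place names are real, but the numbers are invented, and "Provincia" is an assumed name for the province column.

```python
# Hypothetical rows: real place names, invented numbers.
rows = [
    {"Territorio": "Lazio", "Provincia": "Roma", "Delitti in totale": 900},
    {"Territorio": "Lombardia", "Provincia": "Milano", "Delitti in totale": 950},
    {"Territorio": "Lazio", "Provincia": "Latina", "Delitti in totale": 120},
    {"Territorio": "Lombardia", "Provincia": "Bergamo", "Delitti in totale": 200},
]

# Single-variable sort, like the Z-A button: largest total first.
# Whole rows move together, so each total stays with its province.
by_total = sorted(rows, key=lambda r: r["Delitti in totale"], reverse=True)

# Two-variable sort, like Data > Sort: Territorio ascending (A-Z),
# then "Delitti in totale" descending within each Territorio.
# Negating the number reverses its order inside an ascending sort.
by_region_then_total = sorted(
    rows, key=lambda r: (r["Territorio"], -r["Delitti in totale"])
)

for r in by_region_then_total:
    print(r["Territorio"], r["Provincia"], r["Delitti in totale"])
```

Because each record is sorted as a unit, this sketch cannot produce the disordered result that comes from sorting a single selected column in Excel.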

Filtering
Sometimes you want to examine only particular records from a large collection of data. For that, you can use Excel's Filter tool. On the toolbar, go to Data > Filter > Autofilter. Small buttons will appear at the top of each column:

Suppose we wish to see only the records from the territorio of Lazio. Click on the button on the "Territorio" column and choose "Lazio" from the list. This is the result:

Notice that you now are seeing only rows 36, 44, 78, 80 and 104.
More complicated filters are possible. For instance, suppose you wish to see only records in which "Delitti in totale" is greater than or equal to 50,000. Click on the button and choose "Custom Filter":

You could also, for instance, choose records in which "Delitti in totale" is greater than 50,000 and "Omicidi" is less than or equal to 25.
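
The three filters described above amount to simple conditions tested on every row. As a rough sketch in Python (the figures are invented for illustration, and "Provincia" is an assumed column name):

```python
# Hypothetical rows: real place names, invented numbers.
rows = [
    {"Territorio": "Lazio", "Provincia": "Roma", "Delitti in totale": 90000, "Omicidi": 30},
    {"Territorio": "Lazio", "Provincia": "Latina", "Delitti in totale": 12000, "Omicidi": 4},
    {"Territorio": "Lombardia", "Provincia": "Milano", "Delitti in totale": 95000, "Omicidi": 20},
    {"Territorio": "Lombardia", "Provincia": "Bergamo", "Delitti in totale": 50000, "Omicidi": 6},
]

# Choosing "Lazio" from the Territorio button.
lazio = [r for r in rows if r["Territorio"] == "Lazio"]

# Custom filter: "Delitti in totale" greater than or equal to 50,000.
at_least_50k = [r for r in rows if r["Delitti in totale"] >= 50000]

# Two conditions at once: more than 50,000 crimes AND at most 25 homicides.
combined = [
    r for r in rows
    if r["Delitti in totale"] > 50000 and r["Omicidi"] <= 25
]

print([r["Provincia"] for r in combined])
```

As in Excel, filtering only hides rows from view: the original `rows` list is left unchanged.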

Functions
Excel has many built-in functions useful for performing math calculations and working with dates and text. For instance, assume that we wish to calculate the total number of crimes in all the provinces. To do this, we would go to the bottom of Column C, skip a row, and then enter this formula in Cell C106: =SUM(C2:C104). The equals sign (=) is necessary at the start of every formula. The colon (:) means all the numbers from Cell C2 to Cell C104. The result is this:

(The reason for skipping a row is to separate the sum from the main table so that the table can be sorted without pulling the sum into the table during the sorting operation. This way the sum will stay at the bottom of the column.)
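
To see why the skipped row matters, here is a simplified model in Python. It assumes that a sort covers the unbroken run of filled cells and stops at the first empty one. The helper `contiguous_block` and the three totals are made up for this illustration.

```python
# Column C as a list of cell values; None stands for an empty cell.
# The three totals are invented for illustration.
totals = [1200, 300, 4500]

grand_total = sum(totals)  # the same job =SUM(C2:C104) does for the real range


def contiguous_block(cells):
    """Return the cells up to (not including) the first empty one.

    Hypothetical helper: a simplified stand-in for the block of data
    that gets sorted together.
    """
    block = []
    for value in cells:
        if value is None:  # the first empty cell ends the table
            break
        block.append(value)
    return block


with_gap = totals + [None, grand_total]   # sum kept apart by a blank row
without_gap = totals + [grand_total]      # sum directly under the data

print(sorted(contiguous_block(with_gap), reverse=True))     # sum stays out
print(sorted(contiguous_block(without_gap), reverse=True))  # sum is pulled in
```

With the blank row, only the three data values are reordered. Without it, the grand total is treated as one more record and rises to the top of a descending sort.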
Often you will want to do a calculation on each row of your data table. For instance, you might want to calculate the crime rate (the number of crimes per 100,000 population), which would let you compare the crime problem in cities of different sizes. To do this, we would create a new variable called "Crime Rate" in Column L, the first empty column. Then, in Cell L2, we would enter this formula: =(C2/J2)*100000. This divides the total crimes by the population, then multiplies the result by 100,000. (Notice that there are no spaces and no thousands separators used in the formula.) Here is the result:


It would be very tedious to repeat writing that calculation in each of 103 rows of data. Happily, Excel has a way to rapidly copy a formula down a column of cells. To do that, you carefully move the cursor (normally a big fat white cross) to the bottom right corner of the cell containing the formula. When it is in the right spot, the cursor will change to a small black cross. At that point, you can double-click and the formula will copy down the column until it reaches a blank cell in the column to the left. This would be the result:

Notice that the formula changes for each row, so that Row 6 is =(C6/J6)*100000.
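
Copying the formula down the column is the spreadsheet version of a loop: the same calculation is applied to every record, each time using that record's own values. A minimal Python sketch, with invented figures and an assumed "Popolazione" column standing in for Column J:

```python
# Hypothetical rows: real place names, invented crime and population figures.
rows = [
    {"Provincia": "Roma", "Delitti in totale": 90000, "Popolazione": 4000000},
    {"Provincia": "Latina", "Delitti in totale": 12000, "Popolazione": 500000},
    {"Provincia": "Bergamo", "Delitti in totale": 50000, "Popolazione": 1000000},
]

# Same arithmetic as =(C2/J2)*100000, repeated for each row.
for r in rows:
    r["Crime Rate"] = (r["Delitti in totale"] / r["Popolazione"]) * 100000

for r in rows:
    print(r["Provincia"], round(r["Crime Rate"]))
```

In this made-up data, Roma has the most crimes but the lowest rate, which is exactly the kind of difference the per-100,000 calculation is meant to expose.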
Now, if we sort by Crime Rate in descending order, we see the cities with the worst crime problems:

And sorting in ascending order shows the cities with the least crime:

Here are some other useful Excel functions that can be used in similar ways:
(You can add, subtract, multiply or divide by using the symbols +, -, * and /)
=AVERAGE: calculates the arithmetic mean of a column or row of numbers
=MEDIAN: finds the middle value of a column or row of numbers
=COUNT: tells you how many numbers there are in a column or row
=MAX: tells you the largest value in a column or row
=MIN: tells you the smallest value in a column or row
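
For comparison, the same five summaries are available in Python's standard library. The column of numbers below is invented, so treat this as a sketch of what each function reports rather than real crime figures.

```python
import statistics

# One column of invented numbers.
column = [1200, 300, 4500, 800, 2700]

print(statistics.mean(column))    # like =AVERAGE: arithmetic mean
print(statistics.median(column))  # like =MEDIAN: middle value when sorted
print(len(column))                # like =COUNT: how many numbers there are
print(max(column))                # like =MAX: largest value
print(min(column))                # like =MIN: smallest value
```

Here the mean (1900) is well above the median (1200) because one large value pulls the average up, which is a common reason to report the median.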
There are also a variety of text functions that can join and cut apart text strings. For instance: if "Steve" is in Cell B2 and "Doig" is in Cell C2, then =B2&" "&C2 will produce "Steve Doig". And =C2&", "&B2 will produce "Doig, Steve". Other text functions include:
=SEARCH: this will find the start of a desired string of text in a larger string.
=LEN: this will tell you how many characters are in a text string.
=LEFT: this will extract however many characters you specify starting from the left.
=RIGHT: this will extract characters starting from the right.
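
The text operations above have close Python equivalents. In this sketch, the `search` helper is a hypothetical stand-in written for this example: it counts positions from 1 and ignores upper and lower case, as Excel's SEARCH does, but returns 0 where Excel would show a #VALUE! error.

```python
first, last = "Steve", "Doig"

full_name = first + " " + last    # like =B2&" "&C2
last_first = last + ", " + first  # like =C2&", "&B2


def search(needle, haystack):
    """1-based position of needle in haystack, ignoring case.

    Hypothetical helper imitating =SEARCH; returns 0 when the text
    is not found.
    """
    return haystack.lower().find(needle.lower()) + 1


print(full_name)                  # Steve Doig
print(last_first)                 # Doig, Steve
print(search("doig", full_name))  # 7
print(len(full_name))             # 10, like =LEN
print(full_name[:5])              # Steve, like =LEFT with 5 characters
print(full_name[-4:])             # Doig, like =RIGHT with 4 characters
```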
You can also do date arithmetic, such as calculating the number of days or years between two dates, or hours, minutes and/or seconds between two times. For instance, to calculate on April 24, 2010, the age in years of someone whose birthdate is in cell B2, you could use this formula: =(DATE(2010,4,24)-B2)/365.25. The first part of the formula calculates the number of days between the two dates, then that is divided by 365.25 (the .25 accounts for leap years) to produce the years.
Another useful date function is =WEEKDAY, which will tell you on which day of the week a chosen date falls. For instance, =WEEKDAY(DATE(1948,4,21)) returns a 4, which means I was born on a Wednesday (by default, Excel numbers the days from Sunday = 1 to Saturday = 7).
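
Both date calculations can be checked with Python's `datetime` module. The function names `age_in_years` and `excel_weekday` are invented for this sketch. The second converts Python's Monday = 1 numbering to Excel's default of Sunday = 1.

```python
from datetime import date


def age_in_years(birthdate, on_date):
    """Days between the two dates divided by 365.25, as in the formula."""
    return (on_date - birthdate).days / 365.25


def excel_weekday(d):
    """Day of the week numbered Sunday = 1 ... Saturday = 7.

    isoweekday() counts Monday = 1 ... Sunday = 7, so shift it by one.
    """
    return d.isoweekday() % 7 + 1


birthdate = date(1948, 4, 21)

print(int(age_in_years(birthdate, date(2010, 4, 24))))  # whole years: 62
print(excel_weekday(birthdate))                         # 4, a Wednesday
```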
Excel offers well over 200 functions in a variety of categories beyond just math, dates and text: financial, engineering, database, logical, statistical, etc. But it is unlikely that you will need to be familiar with more than a dozen or so functions, unless you are a journalist with a very specialized beat such as economics.

Pivot Tables
One of Excel's best tricks is the ability to summarize data that is in categories. The tool that does this is called a pivot table, which creates an interactive cross-tabulation of the data by category.
To create a pivot table, every column of your data must have a variable label; in fact, it is always good practice to put in a variable label any time you insert or add a new column. First, you make sure your cursor is on some cell in the table. Then go to the toolbar and click on Data > Pivot Table Report. A window will pop up called the Pivot Table Wizard. Just hit Next > Next > Finish on the three steps of the wizard.
This will open a new sheet that looks like this:

To build a pivot table, you should visualize the piece of paper that would answer your question. Our example data shows 103 provinces in the 20 Territorios of Italy. Imagine that you wanted to know the total number of crimes in each Territorio. The piece of paper that would answer that question would list each Territorio, with the total number of crimes next to each name.
To build this pivot table, we would use the mouse to pick up "Territorio" from the list of variables in the floating box to the right, and place it in the "Drop Row Fields Here" box. We would then take the "Delitti in totale" variable and put it in the "Drop Data Items Here" box. This would be the result:

If you click the cursor into the Total column and hit the Z-A button to sort, you will get this:

It is possible to make very complicated pivot tables, with multiple subtotals. But I recommend making a new pivot table for each question you want to answer; several simple tables are easier to understand than one very complicated table that tries to answer many questions at once.
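
The simple pivot table built above, one row per Territorio with its crimes summed and then sorted Z-A, is a group-and-sum operation. A minimal Python sketch of that idea, using invented numbers and an assumed "Provincia" column:

```python
from collections import defaultdict

# Hypothetical rows: real place names, invented numbers.
rows = [
    {"Territorio": "Lazio", "Provincia": "Roma", "Delitti in totale": 900},
    {"Territorio": "Lazio", "Provincia": "Latina", "Delitti in totale": 120},
    {"Territorio": "Lombardia", "Provincia": "Milano", "Delitti in totale": 950},
    {"Territorio": "Lombardia", "Provincia": "Bergamo", "Delitti in totale": 200},
    {"Territorio": "Umbria", "Provincia": "Perugia", "Delitti in totale": 150},
]

# "Territorio" plays the part of the row field and "Delitti in totale"
# the data item: add up the crimes for each category.
totals = defaultdict(int)
for r in rows:
    totals[r["Territorio"]] += r["Delitti in totale"]

# Sort the summary in descending order of the total, like the Z-A button.
ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)

for territorio, total in ranked:
    print(territorio, total)
```

Each question gets its own small summary this way, in keeping with the advice to prefer several simple tables over one complicated one.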

The button on the variable list opens up a box that will let you make a variety of other choices about how to summarize and display the result:


Other Excel Tips
Excel will import data that comes in a variety of formats other than the native *.xls that Excel uses. For instance, Excel can readily import text files in which the data columns are separated by commas, tabs, or other characters, like this:

If you find a web page with data in table format (rows and columns), Excel can open it as a spreadsheet.
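
Delimited text files are simple enough that most tools can read them. As an illustration, Python's `csv` module turns the same small table, written once with commas and once with tabs, into identical records. The helper `read_table` and the sample values are made up for this sketch.

```python
import csv
import io

# The same invented table, once comma-separated and once tab-separated.
comma_text = "Provincia,Delitti in totale\nRoma,900\nLatina,120\n"
tab_text = "Provincia\tDelitti in totale\nRoma\t900\nLatina\t120\n"


def read_table(text, delimiter):
    """Parse delimited text into a list of records keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return list(reader)


from_commas = read_table(comma_text, ",")
from_tabs = read_table(tab_text, "\t")

print(from_commas[0]["Provincia"])               # Roma
print(int(from_commas[1]["Delitti in totale"]))  # 120
```

Every value arrives as text, so numbers have to be converted (here with `int`) before doing arithmetic on them.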
Excel also will let you format your data to make it more readable. For instance, Format > Cells > Number will allow you to put thousands separators in your numbers, like this:

Finding Data
Government agencies are starting to make some of their data available in Excel or other formats. For instance, ISTAT.IT has very comprehensive data about Italian demographics, economy, crime, etc. Many of their tables can be downloaded directly as Excel files.
One trick to find interesting data would be to use Google and add these search terms: site:.it filetype:xls.

Note: The slides from this workshop are available on SlideShare.

Need Help?
Feel free to send me an email at steve.doig[at]asu.edu. I will be glad to give you advice if I can.


Comments



Chris Blow

3 years ago

I've been finding google spreadsheets can be much more powerful than Excel
because they have collaboration features, revision control, and commenting.
Also investigators at @Propublica are doing interesting things such as using csv
files/google spreadsheets to power their CMS, which I think is brilliant.

Data Driven Journalism was created by the European Journalism Centre (EJC) with partial funding from the Dutch Ministry of Education, Culture and Science.
