
Visualizing Relationships

Journalistic problems in a digital age

Summary
1. Introduction
2. The problem we are solving
3. Involved issues
4. Problems we found
5. The Challenge

Who are we?


Mariano Blejman is a technology editor and youth editor at the Argentine newspaper Página/12, and co-founder of Hacks/Hackers Buenos Aires. @blejman

Marcos Vanetta is a biomedical engineer, software developer at 3PillarGlobal and hacker at Hacks/Hackers Buenos Aires. @malev

Hacks/Hackers Buenos Aires

The problem
1976: A dictatorship started in Argentina. 30,000 people were kidnapped and disappeared.
1985: The first trials took place in Argentina. They judged the bad guys, but then we had to stop.
2003: The courts started judging the bad guys again.
2012: There is a large amount of judicial documents. No one can read all of them.

Involved issues
Semantic Analytics
Ontology
Data Mining
Social Network Analysis
Visualizations

Who was dealing with documents?


DocumentCloud, Overview, OpenCalais, NLTK, GATE

First approach
Read all the documents
Software solution based on regular expressions
Ruby, Padrino and a MySQL database
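require 'docsplit'
require 'tmpdir'

# Extract the plain text of a PDF with Docsplit (no OCR) and clean it.
def self.extract_plain_text(path)
  # File name without its last extension, e.g. "a.b.pdf" -> "a.b".
  basename = File.basename(path).split('.')[0..-2].join('.')
  tmp_dir = Dir.tmpdir
  Docsplit.extract_text(path, :output => tmp_dir, :ocr => false)
  # Docsplit writes <basename>.txt into the output directory.
  text = File.read(File.join(tmp_dir, "#{basename}.txt"))
  self.clean_text(text)
end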
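To give an idea of what a regular-expression approach can look like once the plain text is available, here is a minimal sketch. It is not the project's actual code: the patterns `NAME_PATTERN` and `DATE_PATTERN` and the helpers `extract_names` and `extract_dates` are hypothetical, and the sketch assumes names appear as two or more capitalized words and dates as dd/mm/yyyy.

```ruby
require 'date'

# Hypothetical patterns for illustration only.
# Two or more consecutive capitalized words; \p{Lu} and \p{Ll} also cover
# accented Spanish letters such as "Á" or "é".
NAME_PATTERN = /(?<!\p{L})\p{Lu}\p{Ll}+(?:\s+\p{Lu}\p{Ll}+)+/
# Dates written as dd/mm/yyyy.
DATE_PATTERN = %r{\b(\d{1,2})/(\d{1,2})/(\d{4})\b}

# Return the distinct candidate names found in the text.
def extract_names(text)
  text.scan(NAME_PATTERN).uniq
end

# Return the dates found in the text, skipping impossible ones (e.g. 31/02/1977).
def extract_dates(text)
  text.scan(DATE_PATTERN).map do |day, month, year|
    y, m, d = year.to_i, month.to_i, day.to_i
    Date.new(y, m, d) if Date.valid_date?(y, m, d)
  end.compact
end

sample = "Juan Pérez was detained on 24/03/1976 and seen by María López on 02/04/1976."
names = extract_names(sample)
dates = extract_dates(sample)
```

A pattern this simple also matches place names such as "Buenos Aires" and any capitalized word pair at the start of a sentence, so its output is a list of candidates rather than confirmed people.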

The problems we found


Converting text from PDF files
Extracting entities from documents
Parsing dates and addresses
Co-reference resolution of names
How to store relations
Contextual information about documents
Confidence in data on a crowdsourcing platform
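One possible way to store relations together with the document they come from is sketched below. This is an assumption for illustration, not how mapa76 or analice.me store data: the `Relation` struct and the helpers `build_relations` and `edges_with_context` are hypothetical, and a "relation" here only means that two names appear in the same document.

```ruby
# Hypothetical record: two names mentioned in the same document.
Relation = Struct.new(:source, :target, :document)

# Build one relation per pair of names that co-occur in a document.
# `mentions` maps a document id to the names found in it.
def build_relations(mentions)
  mentions.flat_map do |document, names|
    # Sorting makes each pair order-independent: (A, B) and (B, A) are the same edge.
    names.uniq.sort.combination(2).map do |a, b|
      Relation.new(a, b, document)
    end
  end
end

# Group relations by pair so each edge keeps the documents that support it.
def edges_with_context(relations)
  relations.group_by { |r| [r.source, r.target] }
           .transform_values { |rs| rs.map(&:document) }
end

mentions = {
  "doc-1" => ["Juan Pérez", "María López"],
  "doc-2" => ["María López", "Juan Pérez", "Ana Díaz"]
}
edges = edges_with_context(build_relations(mentions))
```

Keeping the list of supporting documents on each edge lets a reader go from a line in a network graph back to the source text, which also helps when contributors on a crowdsourcing platform need to check a relation.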

Visualizing relationships over time

What do we have now?


Prototype for a single (and local) use case: mapa76
Platform for different use cases: analice.me

The visualizations that we imagined

Visualizations that we found

The #mozfest challenge


Find a big journalistic issue that involves:
Lots of documents with unstructured data
Lots of data to find inside them
Which relationships do you want to find?

The #mozfest challenge


Propose at least one new visualization to find relationships (it could be maps, timelines, network graphs, treemaps, bars or anything you can imagine).
We want a poster!
We want post-its!
We want you (to work for us)
