WRG BigData Presentation DonMoody Nov2012
1.0: Overview/Agenda
1.2
'Set the table'
Framing the Issue (Perspectives on BD)
How 'big' is data getting? (fun exercise)
A few BD tools/technologies
Is BD a real biz issue or just hype?
BD's current popularity wave
Real legal concerns or just Chicken Little?
Handouts
1.3:
2.0:
2.1 This is old hat!! (See innumerable "beer and diapers" examples from the 1990s, even before Target & pregnancy)
2.2 Framing the threshold question: "Do I have to worry about Big Data legal concerns?" from two points of view:
    Small-to-Medium Enterprises (SMEs) (or lawyers representing them)
    Large Enterprises/Fortune 1000: "I know I'm a big company, I know data, and (if applicable) I know I'm in a heavily privacy-regulated area like health/HIPAA, financial/GLBA, education/FERPA, video history (VPPA), kids (COPPA), but when do I have to worry about separate BD legal issues?"
BD legal issues typically center around privacy but can also include:
    False/deceptive/unfair business practices
    e-Discovery
    IP
2.3 "Big data" is a relative term (& somewhat misleading) due to the 3 V's:
2.4
Better: "Lotta Data"
    Not limited to large enterprises or large individual file sizes (e.g. trillions of small text entries)
Better Still: "Lotta Messy Data"
    Lack of structure is a huge concern (e.g. 80% of data worldwide is unstructured)
    Imposing order on chaos (e.g. pattern recognition) is a key goal of 'big' data analytics
2.5 Perspectives
Large Enterprises/Fortune 1000: When data is big enough or detailed enough for:
    Temptation to de-anonymize
    Likelihood of unintended pattern recognition (exceeds reasonable consumer expectations and/or what the privacy policy says) (FTC)
3.0:
3.1
Measurement scales (with pragmatic examples and some fun facts/tidbits for each)
Getting 'bigger': ('big' but still not expensive!)
Terabytes (enterprise servers, large HDDs)
    U.S. Library of Congress had over 235 terabytes of data in 2011
    100 terabytes uploaded to Facebook per day
    3-terabyte Seagate HDD available on Amazon for $120 (as of 11/01/2012)
    AT&T claims to have the largest single, unique database (1.9 trillion rows) at 312 terabytes
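To make the "big but still not expensive" point concrete, the sketch below works through the arithmetic using only the figures quoted on this slide. It assumes decimal terabytes and counts raw drive purchases only, with no redundancy, servers, power, or staff; the helper names `drives_needed` and `raw_storage_cost` are illustrative, not from the presentation.

```python
import math

# Figures quoted on the slide (Amazon price as of 11/01/2012).
DRIVE_CAPACITY_TB = 3
DRIVE_PRICE_USD = 120


def drives_needed(dataset_tb, drive_tb=DRIVE_CAPACITY_TB):
    """Number of whole drives required to hold dataset_tb terabytes."""
    return math.ceil(dataset_tb / drive_tb)


def raw_storage_cost(dataset_tb):
    """Cost in USD of the bare drives only (an assumption: no redundancy or hardware)."""
    return drives_needed(dataset_tb) * DRIVE_PRICE_USD


cost_per_tb = DRIVE_PRICE_USD / DRIVE_CAPACITY_TB

# Library of Congress (235 TB) and the AT&T database (312 TB), per the slide.
loc_cost = raw_storage_cost(235)
att_cost = raw_storage_cost(312)

print(f"Cost per TB: ${cost_per_tb:.0f}")
print(f"235 TB: {drives_needed(235)} drives, ${loc_cost}")
print(f"312 TB: {drives_needed(312)} drives, ${att_cost}")
```

Under these assumptions the raw media for a Library of Congress-sized collection costs under $10,000, which is within reach of many small-to-medium enterprises.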
Definitely 'big':
Petabytes (supercomputers, large virtual "drives")
    The total file size of the movie Avatar (incl. encoding for 3-D, IMAX, HD, etc.) constituted over 1 petabyte of data, roughly equivalent to a 32-year-long MP3 song.
    In 2008, eBay, Walmart and BofA were considered data storage leaders with 4 PB, 2.5 PB and 1.5 PB respectively
    Now, however, Facebook reportedly has over 30 petabytes of data in a massive Hadoop cluster
    IBM put together a 120-petabyte (120 million gigabyte) data cluster (virtual drive) using over 200,000 smaller HDDs, equivalent to 1 trillion files or 2 billion hour-long MP3s
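The IBM cluster figures can be sanity-checked with a few lines of arithmetic. This is a rough sketch that assumes decimal units (1 PB = 10^15 bytes) and treats "over 200,000" drives as exactly 200,000, so the per-drive figure is an upper bound on the average; all variable names are illustrative.

```python
KB = 10**3
MB = 10**6
GB = 10**9
PB = 10**15

cluster_bytes = 120 * PB

# "120 million gigabyte": convert petabytes to gigabytes.
cluster_gb = cluster_bytes // GB

# Average drive size if exactly 200,000 drives were used (upper bound on the average).
max_avg_drive_gb = cluster_bytes // 200_000 // GB

# Size per item implied by the "1 trillion files" and "2 billion hour-long MP3s" comparisons.
implied_file_kb = cluster_bytes // 10**12 // KB
implied_mp3_mb = cluster_bytes // (2 * 10**9) // MB

print(f"{cluster_gb:,} GB total")
print(f"<= {max_avg_drive_gb} GB per drive on average")
print(f"{implied_file_kb} KB per file, {implied_mp3_mb} MB per hour-long MP3")
```

The implied 60 MB per hour-long MP3 corresponds to an encoding rate in the neighborhood of 128 kbit/s, so the slide's comparisons are internally consistent under these assumptions.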
Exabytes (largest individual data sets globally)
Zettabytes (total data currently on Earth projected to be 2.7 ZB)
Excellent video demonstration from Univ. of Utah Prof. Chris Johnson @ TEDx Salt Lake City 2011: http://www.youtube.com/watch?v=5UxC9Le1eOY
log/use data
CONCLUSION