Big Data and Data Ethics
∙ Autonomous Vehicles
CMU, Stanford, Google, Tesla
Source: https://www.businessinsider.in/There-Was-Something-Different-About-The-Vatican-Crowd-In-2005/articleshow/21257038.cms
Evolution of Big Data – Smart Phones
Source: https://siliconangle.com/blog/2016/02/04/data-rich-more-people-have-access-to-the-internet-than-water/
Evolution of Data – Smart Phone Data Usage
Source: https://www.statista.com/statistics/752731/worldwide-average-monthly-smartphone-cellular-data-usage/
Data Size Projections – How Big Is a Zettabyte?
Byte (8 bits)
Kilobyte (1000 bytes)
Megabyte (1 000 000 bytes)
Gigabyte (1 000 000 000 bytes)
Terabyte (1 000 000 000 000 bytes)
Petabyte (1 000 000 000 000 000 bytes)
Exabyte (1 000 000 000 000 000 000 bytes)
Zettabyte (1 000 000 000 000 000 000 000 bytes)
Yottabyte (1 000 000 000 000 000 000 000 000 bytes)
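Each decimal unit in the list above is 1000 times the previous one. As a minimal sketch (the helper name `humanize_bytes` and the unit abbreviations are illustrative assumptions, not part of the slides), a raw byte count can be expressed in the largest fitting unit:

```python
# Decimal (SI) units from the list above; each is 1000x the previous one.
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

def humanize_bytes(n: int) -> str:
    """Express a byte count in the largest unit that keeps the value below 1000."""
    value = float(n)
    for unit in UNITS:
        # Stop at the first unit that fits, or at the largest unit available.
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:g} {unit}"
        value /= 1000

print(humanize_bytes(10**21))      # 1 ZB
print(humanize_bytes(7 * 10**12))  # 7 TB
```

A zettabyte is therefore a billion terabytes, which is why the projections are hard to picture in everyday terms.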
Data Size Projection
Evolution of Big Data
• Together, these factors generated volumes of data that could not be stored,
managed and analyzed in conventional relational and data warehouse systems
• The data had to be stored more cheaply and in a way that allowed it to be
analyzed quickly
• Companies like Yahoo, Google and Microsoft put a lot of money into research and
opened the results to the open source community
• This resulted in the birth of Hadoop, a software framework that stores data on
distributed commodity hardware and processes it using the MapReduce model
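The MapReduce model mentioned above can be illustrated with the classic word count. This is a single-process sketch in plain Python, not Hadoop's API; the function names and the sample input are assumptions made for illustration. In a real cluster, the map and reduce steps would run in parallel on the nodes that hold the data.

```python
from collections import defaultdict

def map_phase(document: str):
    """Map: emit a (word, 1) pair for every word in one input split."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)

# Each string stands in for a block of a file stored on a different node.
splits = ["big data big ideas", "data on commodity hardware"]

mapped = [pair for split in splits for pair in map_phase(split)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts["big"], counts["data"])  # 2 2
```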
Evolution of Big Data – Artificial Intelligence/Machine
Learning
• As data volumes grew, it was no longer possible to analyze all of this data
manually
• Companies started to use Artificial Intelligence to make sense of it
• As hardware became faster, companies began using Machine
Learning/Deep Learning to create predictive models
Big Data/AI Use cases:
Customer profiling and Recommendation
• The most common use case, which almost everybody has experienced, is the
recommendation engine found on various commercial websites
• Used by most e-commerce and subscription websites such as Amazon, eBay and
Netflix
• Also used by social media websites to serve content based on your profile
• Data from the user base is profiled regularly, user actions are watched in real
time, and recommendations are made according to user interests
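The profiling-and-recommendation idea above can be sketched with a very small "users like you also bought" recommender. The purchase histories and the overlap-based scoring are illustrative assumptions; production engines use far richer signals and models.

```python
from collections import Counter

# Hypothetical purchase histories; real systems would use far larger logs.
histories = {
    "alice": {"laptop", "mouse", "keyboard"},
    "bob": {"laptop", "mouse"},
    "carol": {"laptop", "monitor"},
    "dave": {"mouse", "keyboard"},
}

def recommend(user: str, top_n: int = 2) -> list[str]:
    """Suggest items owned by users whose history overlaps with `user`'s."""
    own = histories[user]
    scores = Counter()
    for other, items in histories.items():
        if other == user:
            continue
        overlap = len(own & items)   # shared items act as a similarity weight
        for item in items - own:     # only suggest items the user lacks
            scores[item] += overlap
    return [item for item, score in scores.most_common(top_n) if score > 0]

print(recommend("bob"))  # ['keyboard', 'monitor']
```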
Big Data/AI use cases: Connected Vehicles
Source: http://www.qstarz.com/Products/GPS%20Products/CR-Q1100Vbc-F.html
Big Data/AI Applications – Smart Meters
• Fraud Detection
Credit card companies use big data/AI to find anomalies in transactions in real time
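One simple way to flag an anomalous transaction is to compare its amount with the customer's usual spending. The sketch below uses a z-score test; the threshold of 3 standard deviations and the sample amounts are assumptions for illustration, and real fraud systems combine many more features.

```python
from statistics import mean, stdev

def is_anomalous(history: list[float], amount: float, threshold: float = 3.0) -> bool:
    """Flag a transaction whose amount is far from the customer's usual spend."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # All past amounts are identical, so any different amount stands out.
        return amount != mu
    return abs(amount - mu) / sigma > threshold

past = [42.0, 38.5, 51.0, 47.2, 40.3, 44.9]  # hypothetical past amounts
print(is_anomalous(past, 45.0))    # False
print(is_anomalous(past, 900.0))   # True
```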
Big Data Challenges - Security
• Security and privacy are a myth
• Only 2 things can save you from privacy intrusion:
YOU!
Regulations
• Europe is very serious about introducing stringent privacy regulations
• It recently introduced the GDPR, which gives the public the “right to be forgotten”
Big Data Challenges: Misuse of the data
• Companies know more about you than they should
• Consumers provide this data in exchange for using products for free
• Companies analyze the data and reach out to potential targets using
recommended pages and products.
• This has the potential to change the fabric of society
Big Data/AI Challenges: Quick Reaction
In 2013, the Associated Press (AP) Twitter account was hacked and a fake tweet
was posted from the account. It briefly wiped about $130 billion off the stock
market.
Source: https://www.telegraph.co.uk/finance/markets/10013768/Bogus-AP-tweet-about-explosion-at-the-White-House-wipes-billions-off-US-markets.html
Big Data Challenges: ML/AI not always accurate
● National/Local Data: Data from Places and Things
○ Transportation
○ Telecom/Smartphone/Communication/WiFi
○ Banking
○ Entertainment
○ Shopping
● Global Data – Data from Unplanned Events (Black Swans)
○ Earthquakes/Tsunami
○ Typhoons/Hurricanes/Cyclones
○ Fire
○ Flooding
What to Collect?
Data Relevance: Big Data Parameters
Volume – Size and Scale
■ Up to 40,000 sensors in the Airbus A380
■ 7 TB per day
Velocity – Data Rate and Streaming Data
■ Sensor data collected at millisecond intervals
■ Type and number of sensors
Variety – Cross Media Data
■ Sensors
■ Images and videos
■ Text data
■ Relational business data
Validity – Reliability
■ Poor data quality
■ Missing data
■ Data collected doesn't suit targeted use cases
Value – Usefulness and Importance
■ Data per se is not valuable
■ How to extract real value from data?
Necessary Conditions for Collection
and Use of Big Data
● Infrastructure
Right Information
To the Right People
At the Right Time
In the Right Language
In the Right Medium: Voice, Video and/or Text
In the Right Level of Detail
Data to Knowledge: Machine Learning
Machine Learning is the Key to Unlocking Big Data
Machine Learning - Complexity and Automation Levels
• Data Analytics
Role of Machine Learning in Big Data
Anomaly Detection
Healthy Individuals vs. Persons with Potential Problems
Classification
Clustering into Groups of Similar Populations
Failure Prediction
Sensor-based Prediction of Future Health Problems
Forecasting
Predict Life Expectancy for Insurance
Companies Using Behavioral analytics
● With services like Hulu and Netflix competing for viewers’ attention, Time
Warner collects user data such as
○ how frequently customers tune in,
○ the effect of bandwidth on consumer behavior,
○ customer engagement and
○ peak usage times
● Customer complaints and PR crises have become more difficult to handle thanks to
social media.
● Nestle created a 24/7 monitoring centre to listen to all of the conversations about the
company and its products on social media.
● The company will actively engage with those that post about them online in order to
mitigate damage and build customer loyalty.
● McDonald’s tracks vast amounts of data in order to improve operations and boost the
customer experience.
● The company looks at factors such as
○ the design of the drive-thru,
○ information provided on the menu,
○ wait times,
○ the size of orders and
○ ordering patterns
Problems with Big Data Machine Learning
● Learning
○ Use domain user annotations as labels, together with sensor data and
business data, to train machine learning models
● Prediction
○ Apply the learned model to new data
● Feedback
○ Ask domain users to annotate patterns and anomalies
● Recommendation
○ Recommend steps that the domain user should take
● Action 1
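The learning, prediction, feedback and recommendation steps above can be sketched as a small loop. Everything here is a simplifying assumption made for illustration: a single sensor feature, a threshold model placed midway between the class means, faulty readings being the higher ones, and the hypothetical action names.

```python
from statistics import mean

def learn(readings, labels):
    """Learning: fit a one-feature threshold model from annotated readings."""
    normal = [r for r, lab in zip(readings, labels) if lab == "normal"]
    faulty = [r for r, lab in zip(readings, labels) if lab == "faulty"]
    return (mean(normal) + mean(faulty)) / 2  # midpoint between class means

def predict(threshold, reading):
    """Prediction: apply the learned model to a new reading."""
    return "faulty" if reading > threshold else "normal"

def recommend(prediction):
    """Recommendation: suggest a next step to the domain user."""
    return "schedule inspection" if prediction == "faulty" else "no action"

# Hypothetical temperature readings annotated by a domain user.
readings = [61.0, 63.5, 60.2, 88.0, 91.5]
labels = ["normal", "normal", "normal", "faulty", "faulty"]

threshold = learn(readings, labels)
print(recommend(predict(threshold, 85.0)))  # schedule inspection

# Feedback: the user annotates a new reading and the model is retrained.
readings.append(72.0)
labels.append("faulty")
threshold = learn(readings, labels)
```

After the feedback step the threshold moves lower, because the newly annotated reading shows that faults can occur at lower temperatures than the first examples suggested.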
● Social responsibility means that individuals and companies have a duty to act in the
best interests of their environment and society.
● The concept of social responsibility protects the interests of other members of society,
such as workers, consumers and the community as a whole.
● The objective of managers in making business decisions is not merely to maximize
profits but also to serve and protect the interests of other members.
Social Responsibility of business
● Social responsibility is related to the concept of ethics.
● Ethics is the discipline that deals with moral duties and
obligations.
● Social responsibility implies that corporate enterprises should
follow business ethics and work not only to maximise their
profits or shareholders’ value but also to promote the interests
of other stakeholders and society.
Social Responsibility of business
Most often, employees are afraid to speak out against illegal activity being
carried out in the organization, for reasons such as:
Threat to life
Lost friendships
Artificial Intelligence Technology
… Intelligent Amplifiers": Use of AI to augment human intelligence
Ethics in data science
● Data has been integrated into every aspect of our lives: the
friends and business connections we are asked to make, the
shopping circulars we receive in the mail, the news we see, and
the songs we play.
● Data is collected from us at every turn: every trace of our
online presence, and sometimes even traces of our physical
presence.
● We’ve gained some advantages from data, but we’ve also seen
the damage that the misuse of data has caused.
Facebook Case Study
Ethics in data science
● Data-type check
● Simple range and constraint check
● Code and cross-reference check
● Structured check
● Consistency check
● Range check
● Criteria
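Several of the checks listed above can be sketched in a single validation function. The record fields (`age`, `country`, `ordered`, `delivered`), the age bounds and the lookup table are hypothetical, chosen only to show one instance of each kind of check.

```python
from datetime import date

VALID_COUNTRY_CODES = {"US", "IN", "DE"}  # hypothetical reference table

def validate(record: dict) -> list[str]:
    """Return the names of the checks that `record` fails."""
    errors = []
    # Data-type check: age must be an integer.
    if not isinstance(record.get("age"), int):
        errors.append("data-type")
    # Range check: age must fall within plausible bounds.
    elif not 0 <= record["age"] <= 120:
        errors.append("range")
    # Code and cross-reference check: country must exist in the lookup table.
    if record.get("country") not in VALID_COUNTRY_CODES:
        errors.append("cross-reference")
    # Consistency check: a delivery cannot precede its order.
    ordered, delivered = record.get("ordered"), record.get("delivered")
    if ordered and delivered and delivered < ordered:
        errors.append("consistency")
    return errors

record = {"age": 150, "country": "XX",
          "ordered": date(2024, 5, 2), "delivered": date(2024, 5, 1)}
print(validate(record))  # ['range', 'cross-reference', 'consistency']
```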
Algorithmic Fairness
● Any code of data ethics will tell you that you shouldn’t collect
data from experimental subjects without informed consent.
● But that code won’t tell you how to implement “informed
consent.”
● Informed consent is easy when you’re interviewing a few
dozen people in person for a psychology experiment.
● Informed consent means something different when someone
clicks an item in an online catalog
Doing Good Data Science
● Once you have given your data to a service, you must be able
to understand what is happening to your data. Can you control
how the service uses your data?
● All too often, users have no effective control over how their
data is used.
● They are given all-or-nothing choices, or a convoluted set of
options that make controlling access overwhelming and
confusing.
● It’s often impossible to reduce the amount of data collected,
or to have data deleted later.
Example