
Digital Business Information Systems

UMMDF7-15-M

Presentation by

Dr. Frano Barbic


Module Leader
AI and Big Data
Learning objectives

1. Introduce the concepts of big data and artificial intelligence.
2. Identify the characteristics of big data.
3. Examine how big data has grown over the last few years.
4. Explain the importance of using big data in business organizations.
5. Give examples of what you can accomplish with big data.
DEFINING BIG DATA
Big Data
VIDEO
A punched card is a piece of card
stock that stores digital data using
punched holes. Punched cards were
once common in data processing and
the control of automated machines.
Margaret Hamilton was director of
the Software Engineering Division of
the MIT Instrumentation Laboratory.
Bill Gates in 1994 showing a CD-ROM
could hold more info than a stack of
330,000 pieces of paper.
What is Big Data?

• Big data can be defined as datasets whose size or type is beyond the ability of traditional relational databases to capture, manage and process the data with low latency (IBM, 2020).

• Big data refers to data that is too big to fit on a single server,
too unstructured to fit into a row-and-column database, or
too continuously flowing to fit into a static data warehouse.

Big Data is characterized by the three Vs:

Volume refers to the vast amount of data generated and collected by organizations from various sources. It represents the sheer size or magnitude of data that exceeds the capacity of traditional database systems to manage and process efficiently.

Velocity refers to the speed at which data is generated, collected, processed, and analyzed. It represents the rate at which data streams into a system or platform.

Variety refers to the diverse types and formats of data that are generated and collected by organizations.

Unlike traditional structured data found in relational databases, big data encompasses a wide range of data
types, including structured, semi-structured, and unstructured data.
Data Volume

• Data volume is increasing exponentially

• 44x increase from 2009 to 2020: from 0.8 zettabytes (ZB) to 35 ZB

Exponential increase in collected/generated data
Reference: Ruoming Jin http://www.cs.kent.edu/~jin/BigData/index.html
Key technologies and approaches used to address
volume in big data include:
1. Distributed file systems: These systems, such as Hadoop Distributed File System (HDFS), allow data to be
distributed across multiple nodes in a cluster, enabling scalable storage of massive datasets.

2. NoSQL databases: Unlike traditional relational databases, NoSQL databases are designed to handle large volumes
of unstructured or semi-structured data efficiently. Examples include MongoDB, Cassandra, and Apache CouchDB.

3. Data compression and deduplication: Techniques like data compression and deduplication help reduce the storage
footprint of large datasets without sacrificing data integrity or accessibility.

4. Data warehousing: Data warehousing solutions provide centralized repositories for storing and managing large
volumes of structured and semi-structured data, enabling organizations to perform complex analytics and
generate insights.

5. Cloud storage and computing: Cloud platforms offer scalable storage and computing resources on-demand,
allowing organizations to efficiently manage and analyze large volumes of data without investing in costly
infrastructure.
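The storage savings behind point 3 can be illustrated with Python's standard zlib module; the repetitive log data below is invented for illustration, and real systems would compress at the file-system or database layer:

```python
import zlib

# A repetitive machine-generated log, typical of high-volume big data.
log_lines = "2020-01-01 INFO request served in 12ms\n" * 10_000
raw = log_lines.encode("utf-8")

# Compress at the highest level; repetitive data shrinks dramatically.
compressed = zlib.compress(raw, level=9)

ratio = len(raw) / len(compressed)
print(f"raw: {len(raw)} bytes, compressed: {len(compressed)} bytes, "
      f"ratio: {ratio:.0f}x")

# Lossless: decompressing restores the data exactly, so no integrity is lost.
assert zlib.decompress(compressed) == raw
```

Deduplication works the same way at a coarser granularity: identical blocks or files are stored once and referenced many times.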
Exponential growth of data in today's digital age:

• 12+ TBs of tweet data every day
• 25+ TBs of log data every day
• ? TBs of data every day
• 4.6 billion camera phones worldwide
• Billions of RFID tags (1.3B in 2005)
• 100s of millions of GPS-enabled devices sold annually
• 2+ billion people on the Web by end 2011
• 76 million smart meters in 2009… How many in 2020?
Speed (Velocity)

• Data is being generated fast and needs to be processed fast

• Online Data Analytics

• Late decisions ➔ missed opportunities

• Examples
• E-Promotions: based on your current location, your purchase history and what you like ➔ send promotions right now for the store next to you
• Healthcare monitoring: sensors monitoring your activities and body ➔ any abnormal measurement requires an immediate reaction
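The healthcare-monitoring example can be sketched as a simple streaming check. The heart-rate feed, window size and threshold below are invented for illustration; a real monitoring system would use clinically validated rules:

```python
from collections import deque

def monitor(readings, window=5, threshold=25):
    """Flag readings that deviate from the recent moving average
    by more than `threshold`, as they arrive in the stream."""
    recent = deque(maxlen=window)
    alerts = []
    for t, value in enumerate(readings):
        if len(recent) == window:
            baseline = sum(recent) / window
            if abs(value - baseline) > threshold:
                alerts.append((t, value))  # react immediately, mid-stream
        recent.append(value)
    return alerts

# Simulated heart-rate stream: steady around 70 bpm, one abnormal spike.
stream = [70, 72, 71, 69, 70, 71, 70, 140, 71, 70]
print(monitor(stream))  # -> [(7, 140)]: the spike is flagged as it occurs
```

The key velocity point is that each reading is evaluated the moment it arrives, rather than being stored and analyzed later in a batch.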
Importance of processing and analyzing data in real-time or near real-time

• Social media and networks (all of us are generating data)
• Scientific instruments (collecting all sorts of data)
• Sensor technology and networks (measuring all kinds of data)
• Mobile devices (tracking all objects all the time)
Real-Time Analytics/Decision Requirement

• Product recommendations that are relevant & compelling
• Learning why customers switch to competitors and their offers, in time to counter
• Improving the marketing effectiveness of a promotion while it is still in play
• Friend invitations to join a game or activity that expands business
• Preventing fraud as it is occurring & preventing more proactively
Data types
• In the context of big data, variety refers to the diverse types and formats of data that are
generated and collected by organizations. Unlike traditional structured data found in relational
databases, big data encompasses a wide range of data types, including structured, semi-
structured, and unstructured data.

• Structured data: This type of data is organized in a tabular format with a predefined schema,
where each data element is assigned to a specific field or column. Examples of structured
data include transaction records, customer information, and financial data.

• Unstructured data: Unstructured data does not have a predefined schema or format and can
exist in various forms, such as text documents, emails, social media posts, videos, images, and
audio recordings. Analyzing unstructured data requires advanced natural language processing
(NLP), text mining, and image recognition techniques.

• Semi-structured data: Semi-structured data falls somewhere between structured and unstructured data and typically contains some organizational structure but lacks the strict schema of structured data. Examples include XML files, JSON documents, log files, and web server logs.
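The difference between a strict relational schema and semi-structured data can be shown with Python's standard json module: in the invented records below, fields vary between entries and values nest, which a fixed row-and-column table cannot express directly:

```python
import json

# Semi-structured customer records (invented): the second record has an
# email but no orders; the first has nested order data but no email.
raw = """
[{"id": 1, "name": "Ada", "orders": [{"sku": "A1", "qty": 2}]},
 {"id": 2, "name": "Bo", "email": "bo@example.com"}]
"""

customers = json.loads(raw)

# Records can still be traversed despite their differing shapes.
for c in customers:
    print(c["name"], "->", c.get("email", "no email on file"))
```

A relational load would force these records into one schema (with NULLs or extra tables); semi-structured stores keep the varying shape as-is.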
Variety (Complexity)
• Relational Data (Tables/Transaction/Legacy Data)
• Text Data (Web)
• Semi-structured Data (XML)
• Graph Data
• Social Network, Semantic Web (RDF), …

• Streaming Data
• You can only scan the data once

• A single application can be generating/collecting many types of data

• Big Public Data (online, weather, finance, etc.)

To extract knowledge ➔ all (or most of) these types of data need to be linked together
Key technologies and approaches used to address
variety in big data include
1. NoSQL databases: NoSQL (Not Only SQL) databases are designed to handle diverse data types and provide flexible schemas to accommodate varying data structures. Examples include MongoDB, Cassandra, and Apache HBase.

2. Hadoop ecosystem: The Hadoop ecosystem, including tools like Apache Hadoop, Apache Spark, and Apache Hive, supports the processing and analysis of structured, semi-structured, and unstructured data at scale.

3. Data integration and ETL (Extract, Transform, Load) tools: Data integration platforms enable organizations to ingest, transform, and consolidate data from disparate sources into a unified format for analysis. Examples include Informatica, Talend, and Apache NiFi.

4. Text mining and NLP: Text mining and NLP techniques enable organizations to extract valuable insights from unstructured text data, such as customer reviews, social media posts, and emails.

5. Data lakes: Data lakes are centralized repositories that store raw, unprocessed data from various sources in its native format, allowing organizations to perform analytics and derive insights across diverse data types.
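A toy illustration of the text mining mentioned in point 4: counting term frequencies across a few invented customer reviews with the standard library. Real NLP pipelines add stop-word lists, stemming, and statistical or neural models:

```python
import re
from collections import Counter

# Invented customer reviews (unstructured text).
reviews = [
    "Great battery life, great screen.",
    "Battery died after a week. Terrible battery.",
    "Screen is sharp; battery life is great.",
]

# Tokenize to lowercase words and drop very short (stop-ish) words.
words = re.findall(r"[a-z]+", " ".join(reviews).lower())
counts = Counter(w for w in words if len(w) > 3)

# The most frequent terms hint at what customers talk about most.
print(counts.most_common(3))  # -> [('battery', 4), ('great', 3), ('life', 2)]
```

Even this crude frequency count surfaces "battery" as the dominant theme, which is the kind of signal text mining extracts at scale.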
Characterization of Big Data (3 Vs): Volume, Velocity, Variety
Big Data Challenges: Hard to Manage!

• The challenges include capture, transformation, storage, search, sharing, transfer, analysis, and visualization.

• The trend to larger data sets is due to the additional information derivable from analysis of a single large set of related data, as compared to separate smaller sets.
Adoption of Technology

Faster-than-ever adoption: 100 million users in less than 2 months.
Machine Learning
• Machine learning is a subset of artificial intelligence (AI) that enables
computers to learn from data and improve their performance on a task
without being explicitly programmed.

• In essence, it is the process of teaching a computer system to recognize patterns and make predictions or decisions based on data, without the need for explicit instructions.

• The core idea behind machine learning is to develop algorithms that can analyze data, identify patterns, and make decisions or predictions based on those patterns. These algorithms are trained on large datasets, where they learn from examples and iteratively improve their performance over time.
Machine learning algorithms
1. Supervised learning: In supervised learning, the algorithm is trained on a labeled dataset, where
each example is associated with a target output. The algorithm learns to map inputs to outputs by
minimizing the difference between its predictions and the true labels. Common tasks in
supervised learning include classification (e.g., spam detection, image recognition) and regression
(e.g., predicting house prices, stock prices).

2. Unsupervised learning: In unsupervised learning, the algorithm is trained on an unlabeled dataset, where there are no predefined target outputs. The goal is to discover hidden patterns or structures within the data, such as clusters or associations. Unsupervised learning algorithms are commonly used for tasks such as clustering (e.g., customer segmentation) and dimensionality reduction (e.g., feature extraction).

3. Reinforcement learning: In reinforcement learning, the algorithm learns by interacting with an environment and receiving feedback in the form of rewards or penalties. The goal is to learn a policy that maximizes cumulative rewards over time. Reinforcement learning is often used in scenarios where an agent must make sequential decisions, such as playing games (e.g., AlphaGo) or controlling autonomous vehicles.
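The supervised train-then-predict loop described in point 1 can be sketched with a hand-rolled nearest-centroid classifier. The labeled pet measurements are invented, and real projects would use a library such as scikit-learn rather than this minimal version:

```python
def train(examples):
    """Supervised learning in miniature: average the feature vectors of
    each label in the (features, label) training pairs into one centroid."""
    sums, counts = {}, {}
    for x, label in examples:
        s = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            s[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in s] for label, s in sums.items()}

def predict(centroids, x):
    """Classify x as the label of the nearest centroid (squared distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist(centroids[label]))

# Labeled training data: (height_cm, weight_kg) -> species (invented).
data = [((20, 4), "cat"), ((25, 5), "cat"), ((60, 25), "dog"), ((70, 30), "dog")]
model = train(data)
print(predict(model, (22, 4.5)))  # -> cat
print(predict(model, (65, 28)))   # -> dog
```

The algorithm "learns" only in the sense of summarizing labeled examples into centroids; more data moves the centroids and changes future predictions, with no hand-written classification rules.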
Machine Learning Applications
• Machine learning has a wide range of applications across various domains, including:

• Natural language processing (NLP): Machine learning algorithms are used to analyze and
understand human language, enabling applications such as sentiment analysis, language
translation, and chatbots.

• Computer vision: Machine learning models can analyze and interpret visual data, allowing
computers to recognize objects, detect anomalies, and understand scenes in images or
videos.

• Healthcare: Machine learning is used in medical imaging for tasks such as diagnosing
diseases from medical images (e.g., X-rays, MRIs) and predicting patient outcomes based
on electronic health records.

• Finance: Machine learning algorithms are applied in financial services for tasks such as
fraud detection, credit scoring, and algorithmic trading.
AI and Machine Learning

Big Data
• Data growth is exponential: the increase spans volume, velocity and variety
• Social media and the Internet of Things are the main contributors to data growth
Data Warehouses and the Cloud
In computing, a data warehouse is a central
location and permanent storage area for data
and information held around your business in
different systems.

Data warehouses are used to:

• help companies make better business decisions, by having access to all the information from all business systems,
• facilitate systems integration,
• share data between all business applications, as well as automate communication between staff, customers, suppliers and/or other external organisations.
The purpose of Data Warehouse

• A data warehouse gives you control of your data and your business
processes in one place.

• You can report across many data sets and join this information together.

• For example, you can merge a CRM customer record with its corresponding
finance record and report across all of your customers using both these data sets.

• Other data sources can be uploaded into a data warehouse, including ERP, e-Commerce shops, public portals, stock management, amongst others. This means you can automate business processes using data held across all your systems.
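The CRM-plus-finance reporting example above can be sketched with Python's built-in sqlite3 module standing in for a warehouse; the table names, customers and balances are invented:

```python
import sqlite3

# In-memory store holding data loaded from two source systems.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE crm (customer_id INTEGER, name TEXT);
    CREATE TABLE finance (customer_id INTEGER, balance REAL);
    INSERT INTO crm VALUES (1, 'Acme Ltd'), (2, 'Globex');
    INSERT INTO finance VALUES (1, 1200.50), (2, -300.00);
""")

# Report across both data sets by joining on the shared customer key.
rows = db.execute("""
    SELECT crm.name, finance.balance
    FROM crm JOIN finance ON crm.customer_id = finance.customer_id
    ORDER BY crm.name
""").fetchall()
print(rows)  # -> [('Acme Ltd', 1200.5), ('Globex', -300.0)]
```

The point of the warehouse is exactly this join: neither source system alone can answer "what is each customer's balance", but the merged copy can.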
What are the advantages of a Data Warehouse?

• Keeping all information in a central and secure location
• Reporting across all business data; helping to make better-informed decisions
• Controlling integration between systems; it can act as the conduit for
passing information between systems
• Monitoring changes to information and then acting upon this e.g. giving
people notifications or updating other systems
• Being GDPR-compliant by knowing all information you hold about your
contacts
• Digitalising business processes, removing human intervention when
completing data-driven tasks.
Cloud

"The cloud" refers to servers that are accessed over the Internet, and the software and databases that run on those servers.

Cloud servers are located in data centers all over the world.
Cloud Computing

• Cloud computing refers to the delivery of computing services over the internet, including storage, processing power, and software applications, without the need for on-site infrastructure or physical hardware.

• Cloud computing allows businesses to access computing resources and services on-demand, enabling them to scale their operations quickly and efficiently without the need for significant upfront investment in hardware or infrastructure.

• E.g. Amazon Web Services (AWS), Microsoft Azure, Google Cloud Platform (GCP), …
Key components of cloud computing

1. Infrastructure as a Service (IaaS): IaaS provides virtualized computing resources over the internet, including virtual machines, storage, and networking infrastructure. Businesses can rent these resources on-demand, allowing them to scale their infrastructure up or down as needed without the need for physical hardware.

2. Platform as a Service (PaaS): PaaS provides a platform for developing, testing, and deploying applications over the internet, without the need to manage the underlying infrastructure. PaaS offerings typically include development tools, databases, and middleware, allowing developers to focus on building and deploying applications rather than managing infrastructure.

3. Software as a Service (SaaS): SaaS delivers software applications over the internet on a subscription basis, allowing users to access and use the software from any device with an internet connection. Examples of SaaS applications include email, customer relationship management (CRM), and productivity tools like Microsoft Office 365 and Google Workspace.
Big Data for business strategy
Big Data for customer interactions
Ten Practical Big Data Benefits

EXAMPLES OF WHAT YOU CAN ACCOMPLISH WITH BIG DATA

Inspired by an article from the Big Data Insight Group.

Reference: http://datascienceseries.com/stories/ten-practical-big-data-benefits
Dialogue with Consumers

• Today’s consumers are a tough nut to crack.

• They look around a lot before they buy, talk to their entire social network about their purchases, demand to be treated as unique and want to be sincerely thanked for buying your products.

• Big Data allows you to profile these increasingly vocal and demanding consumers in a far-reaching manner so that you can engage in an almost one-on-one, real-time conversation with them.
• This is not actually a luxury. If you don’t treat them the way they want, they will leave you in the blink of an eye.
Dialogue with Consumers
Example

• When any customer enters a bank, Big Data tools allow the clerk to check
his/her profile in real-time and learn which relevant products or services (s)he
might advise.

• Big Data will also have a key role to play in uniting the digital and physical shopping spheres: a retailer could suggest an offer via a mobile device, on the basis of a consumer indicating a certain need on social media.
Re-develop your products

• Big Data can also help you understand how others perceive your products so
that you can adapt them, or your marketing, if need be.

• On top of that, Big Data lets you test thousands of different variations of
computer-aided designs in the blink of an eye so that you can check how
minor changes in, for instance, material affect costs, lead times and
performance.
• You can then raise the efficiency of the production process accordingly.

Perform risk analysis

• Success not only depends on how you run your company. Social and
economic factors are crucial for your accomplishments as well.

• Predictive analytics, fueled by Big Data, allows you to scan and analyze newspaper reports or social media feeds so that you permanently keep up to speed on the latest developments in your industry and its environment.
Keeping your data safe

• You can map the entire data landscape across your company with Big Data
tools, thus allowing you to analyze the threats that you face internally.

• With real-time Big Data analytics you can, for example, flag up any situation where 16-digit numbers (potentially credit card data) are stored or emailed out and investigate accordingly.
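The 16-digit flagging described above can be sketched with a regular expression. The pattern and sample email are illustrative only; a production scanner would add checks such as the Luhn digit test to cut false positives:

```python
import re

# Matches runs of 16 digits, optionally grouped in fours by spaces or
# dashes, and not embedded inside a longer digit run.
CARD_PATTERN = re.compile(r"(?<!\d)(?:\d{4}[ -]?){3}\d{4}(?!\d)")

def flag_possible_card_numbers(text):
    """Return any 16-digit sequences found, for a human to investigate."""
    return CARD_PATTERN.findall(text)

email_body = "Invoice ref 12345. Please charge 4111 1111 1111 1111 as agreed."
print(flag_possible_card_numbers(email_body))  # -> ['4111 1111 1111 1111']
```

Run over outbound email or stored files in real time, matches like this become the alerts the slide describes: a candidate card number was stored or sent, so investigate.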

Create new revenue streams

• The insights that you gain from analyzing your market and its consumers with
Big Data are not just valuable to you. You can sell them….

• One of the more impressive examples comes from Shazam, the song
identification application.
• It helps record labels find out where music sub-cultures are arising by monitoring the
use of its service, including the location data that mobile devices so conveniently
provide.
• The record labels can then find and sign up promising new artists or remarket their
existing ones accordingly.
Customize your website in real time

• Big Data analytics allows you to personalize the content or look and feel of
your website in real time to suit each consumer entering your website,
depending on, for instance, their sex, nationality or from where they ended
up on your site.
• The best-known example is probably offering tailored recommendations:
• Amazon’s use of real-time, item-based, collaborative filtering (IBCF) to fuel its
‛Frequently bought together’ and ‛Customers who bought this item also bought’
features
• LinkedIn suggesting ‛People you may know’ or ‛Companies you may want to follow’.
And the approach works: Amazon generates about 20% more revenue via this method.
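Item-based "frequently bought together" recommendations of the kind described above can be sketched with simple co-occurrence counting over invented purchase baskets; Amazon's actual IBCF system is far more sophisticated (similarity weighting, massive scale), so this is only the core idea:

```python
from collections import Counter
from itertools import combinations

# Invented purchase baskets (one list of items per order).
baskets = [
    ["phone", "case", "charger"],
    ["phone", "case"],
    ["phone", "charger"],
    ["laptop", "case"],
]

# Count how often each pair of items appears in the same basket.
pair_counts = Counter()
for basket in baskets:
    for a, b in combinations(sorted(set(basket)), 2):
        pair_counts[(a, b)] += 1

def frequently_bought_together(item, top=2):
    """Rank the items that co-occur most often with `item`."""
    scores = Counter()
    for (a, b), n in pair_counts.items():
        if a == item:
            scores[b] += n
        elif b == item:
            scores[a] += n
    return [other for other, _ in scores.most_common(top)]

print(frequently_bought_together("phone"))
```

Because the pair counts come straight from order data, the recommendations update automatically as shopping behaviour changes, with no manually curated rules.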

Reducing maintenance costs

• Traditionally, factories estimate that a certain type of equipment is likely to wear out after so many years.
• Consequently, they replace every piece of that technology within that many years, even devices that have much more useful life left in them. Big Data tools do away with such impractical and costly averages.
• The massive amounts of data that they access and use and their unequalled
speed can spot failing grid devices and predict when they will give out.
• The result: a much more cost-effective replacement strategy for the utility
and less downtime, as faulty devices are tracked a lot faster.

Offering tailored healthcare

• Healthcare is still using generalized approaches.

• When someone is diagnosed with cancer, they usually undergo one therapy, and if that doesn’t work, the doctors try another, etc.
• But what if a cancer patient could receive medication that is tailored to
his individual genes?
• This would result in a better outcome, less cost, less frustration and less
fear.

Offering enterprise-wide insights

• Previously, if business users needed to analyze large amounts of varied data, they had to ask their IT colleagues for help, as they themselves lacked the technical skills for doing so.
• Often, by the time they received the requested information, it was no longer useful or even correct.

• With Big Data tools, the technical teams can do the groundwork and then build
repeatability into algorithms for faster searches. In other words, they can
develop systems and install interactive and dynamic visualization tools that
allow business users to analyze, view and benefit from the data.
Making our cities smarter

• To help them deal with the consequences of their fast expansion, an increasing number of smart cities are indeed leveraging Big Data tools for the benefit of their citizens and the environment.

• The city of Oslo in Norway, for instance, reduced street lighting energy consumption by 62% with a smart solution. Since the Memphis Police Department started using predictive software in 2006, it has been able to reduce serious crime by 30%.

Making our cities smarter

• The city of Portland, Oregon, used technology to optimize the timing of its
traffic signals and was able to eliminate more than 157,000 metric tonnes of
CO2 emissions in just six years – the equivalent of taking 30,000 passenger
vehicles off the roads for an entire year.

• The smart city project of Rivas Vaciamadrid in Spain – Ecopolis – has realized
energy savings of 35% and a 50% reduction in ICT spending through a winning
combination of smart grid and energy management, access control, air quality
monitoring, traffic management, IPTV, etc.

Elon Musk’s chilling warning over ‘unfriendly’ AI robots
VIDEO
