
Introduction to Advanced Analytics

● Definition: Advanced analytics refers to the use of sophisticated mathematical and statistical techniques to extract insights, make predictions, and guide decision-making from large and complex datasets.

● Types of Analytics:
● Descriptive Analytics: Involves summarizing historical data to understand
past trends and patterns.
● Predictive Analytics: Utilizes statistical models and machine learning
algorithms to forecast future trends and outcomes.
● Prescriptive Analytics: Goes beyond predictions to recommend actions
that optimize outcomes based on predictive insights.

● Data Visualization Techniques:


● Tools: Tableau and Power BI are popular tools used for data visualization.
● Purpose: These tools help in creating interactive and visually engaging
representations of data, making it easier for stakeholders to comprehend
and derive insights.

What Are the Four Types of Analytics and How Do You Use Them?

Analytics is a broad term covering four different pillars in the modern analytics model:
descriptive, diagnostic, predictive, and prescriptive. Each type of analytics plays a role
in how your business can better understand what your data reveals and how you can
use those insights to drive business objectives. In this blog we will discuss what each
type of analytics provides to a business, when to use it and why, and how they all play a
critical role in your organization’s analytics maturity.

As organizations collect more data, understanding how to utilize it becomes paramount, driving the need for nuanced data analysis and interpretation. Data without analytics doesn’t make much sense, but analytics is a broad term that can mean a lot of different things depending on where you sit on the data analytics maturity model.

Modern analytics tends to fall into four distinct categories: descriptive, diagnostic, predictive, and prescriptive. How do you know which kind of analytics you should use, when you should use it, and why?

Understanding the what, why, when, where, and how of your data analytics through
data analysis helps to drive better decision making and enables your organization to
meet its business objectives.

Four Types of Analytics

The four types of analytics maturity — descriptive, diagnostic, predictive, and prescriptive analytics — each answer a key question about your data’s journey.

Descriptive Analytics

What is Descriptive Analytics?

Descriptive analytics answers the question, “What happened?” This type of analytics is by far the most commonly used by customers, providing reporting and analysis centered on past events. It helps companies understand things such as:

● How much did we sell as a company?
● What was our overall productivity?
● How many customers churned in the last quarter?

Descriptive analytics is used to understand overall performance at an aggregate level and is by far the easiest place for a company to start, as data tends to be readily available to build reports and applications.

It’s extremely important to build core competencies first in descriptive analytics before attempting to advance upward in the data analytics maturity model. Core competencies include things such as:

● Data modeling fundamentals and the adoption of basic star schema best practices (a minimal sketch follows this list),
● Communicating data with the right visualizations, and
● Basic dashboard design skills.
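
To make the star schema idea concrete, here is a minimal, hypothetical sketch in Python with pandas: a fact table of sales joined to a product dimension table and aggregated, the way a BI layer would query it. Table and column names are illustrative assumptions.

import pandas as pd

# Dimension table: one row per product (descriptive attributes)
dim_product = pd.DataFrame({
    "product_id": [1, 2],
    "category": ["Grocery", "Electronics"],
})

# Fact table: one row per sale, keyed to the dimension by product_id
fact_sales = pd.DataFrame({
    "product_id": [1, 1, 2],
    "units": [3, 5, 1],
    "revenue": [30.0, 50.0, 400.0],
})

# A typical star-schema query: join facts to dimensions, then aggregate
report = (
    fact_sales.merge(dim_product, on="product_id")
              .groupby("category")[["units", "revenue"]]
              .sum()
)
print(report)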

How Do You Get Started with Descriptive Analytics?

It’s likely you’ve adopted some form of descriptive analytics internally, whether that be
static P&L statements, PDF reports, or reporting within an analytics tool. For a true
descriptive analytics program to be implemented, the concepts of repeatability and
automation of tasks must be top of mind. Repeatability means a data process is standardized and can be regularly applied with minimal effort (think a weekly sales report); automation means complex tasks (VLOOKUPs, merging of Excel spreadsheets, etc.) run with little to no manual intervention. The most
effective means to achieve this is to adopt a modern analytics tool which can help
standardize and automate those processes on the back end and allow for a consistent
reporting framework on the front end for end users.
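
As a concrete illustration of repeatability and automation, here is a minimal sketch in Python with pandas that turns raw transactions into a standardized weekly sales report. The file name and columns are assumptions; in practice the script would be scheduled (e.g., via cron) rather than run by hand.

import pandas as pd

# Load raw transactions (assumed columns: order_date, revenue)
sales = pd.read_csv("sales.csv", parse_dates=["order_date"])

# Standardized weekly rollup: total revenue and order count per week
weekly = (
    sales.set_index("order_date")
         .resample("W")["revenue"]
         .agg(["sum", "count"])
         .rename(columns={"sum": "revenue", "count": "orders"})
)

# Write the report so downstream dashboards pick it up automatically
weekly.to_csv("weekly_sales_report.csv")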

Despite only being the first pillar of analytics, descriptive analytics also tends to be where most organizations stop in the analytics maturity model. While extremely useful in framing historical indicators and trends, descriptive analytics tends to lack a tangible call to action or inference on why something occurred, which leads us to the next pillar of analytics: diagnostic analytics.

The model’s five levels, from lowest to highest maturity, are:

CHAOTIC (low value + high risk): No formal analytics structure; no analytics management, or it occurs in silos; employees’ analytic capabilities vary; KPIs are undefined and based on ad hoc, chaotic metrics; low or mixed confidence in reports.

REACTIVE (slightly low value + slightly high risk): No formal integrated enterprise reporting; analytics management is departmentalized; employees’ analytic competencies vary; KPIs focus on the past, asking what happened last week/month/year.

DEFINED (medium value + medium risk): No formal integrated enterprise reporting; analytics management is departmentalized; employee analytics competencies vary; KPIs focus on present and real-time metrics, asking what is happening today and why; defined key metrics are shown within dashboards and scorecards.

MANAGED (slightly high value + slightly low risk): Integrated enterprise analytics; analytics management and processes exist; KPIs focus on future outcomes, measuring leading indicators that suggest prescriptive changes (e.g. how does changing x, y, and z impact outcomes?); internal and external data are leveraged.

OPTIMIZED (high value + low risk): Integrated enterprise analytics, including advanced analytics; strong analytics management and processes; KPIs focus on predictions, providing strong confidence levels for taking business actions; data science is operationalized; machine learning and AI analysis are used to project KPIs.
The analytics maturity model, which has five levels, demonstrates where an organization is in its ability to make data-driven decisions, as well as act on them.

Diagnostic Analytics

What is Diagnostic Analytics?

Diagnostic analytics, just like descriptive analytics, uses historical data to answer a
question. But instead of focusing on “the what”, diagnostic analytics addresses the

critical question of why an event or anomaly occurred within your data. Diagnostic analytics also happens to be the most overlooked and skipped step within the analytics maturity model. Anecdotally, I see most customers attempting to go from “what happened” to “what will happen” without ever taking the time to address the “why did it happen” step. This type of analytics helps companies answer questions such as:

● Why did our company sales decrease in the previous quarter?
● Why are we seeing an increase in customer churn?
● Why is a specific basket of products vastly outperforming its prior-year sales figures?

Diagnostic analytics tends to be more accessible and to fit a wider range of use cases than machine learning/predictive analytics. You might even find that it solves some business problems you had earmarked for predictive analytics use cases.

How Do You Get Started with Diagnostic Analytics?

Being at the diagnostic analytics phase likely means you’ve adopted a modern analytics
tool. Most modern analytics tools contain a variety of search-based, or lightweight
artificial intelligence capabilities. These features allow for detailed insights a layer
deeper (for example: the Key Drivers visualization in Power BI, or Qlik’s search-based
insight functionality). To be clear, these are an effective lightweight means to address
diagnostic analytics use cases but are not a means to a full-scale implementation.
Software vendors like Sisu have built their core business around addressing diagnostic
analytics use cases (what they call “augmented analytics”) and are a great bet.

Diagnostic analytics is an important step in the maturity model that unfortunately tends
to get skipped or obscured. If you cannot infer why your sales decreased 20% in 2020,
then jumping to predictive analytics and trying to answer “what will happen to sales in
2021” is a stretch in advancing upward in the analytics maturity model.
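
A lightweight diagnostic pass can often be run directly in code before reaching for specialized tooling. The hypothetical sketch below breaks a year-over-year sales decline down by region to see which segments drove it; the file, column names, and years are assumptions.

import pandas as pd

sales = pd.read_csv("sales.csv", parse_dates=["order_date"])
sales["year"] = sales["order_date"].dt.year

# Revenue per region per year, side by side
by_region = sales.pivot_table(index="region", columns="year",
                              values="revenue", aggfunc="sum")

# Regions with the largest drop explain most of the "why"
by_region["change"] = by_region[2021] - by_region[2020]
print(by_region.sort_values("change").head())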

Predictive Analytics

What is Predictive Analytics?

Predictive analytics is a form of advanced analytics that determines what is likely to happen based on historical data using machine learning. The historical data that comprises the bulk of descriptive and diagnostic analytics is used as the basis for building predictive analytics models. Predictive analytics helps companies address use cases such as:

● Predicting maintenance issues and part breakdowns in machines.
● Determining credit risk and identifying potential fraud.
● Predicting and avoiding customer churn by identifying signs of customer dissatisfaction.

How Do You Get Started with Predictive Analytics?

At the outset of any predictive analytics build, three core elements need to be established:

● Identify a problem to solve,
● Define what it is you want to predict, and
● State what you will achieve by doing so.

To start you should collect existing data, organize data in a useful way to allow for data
modeling, cleanse your data and review overall quality, and finally determine your
modeling objective.

While modeling takes up the spotlight in predictive analytics, data prep is a crucial step
that needs to happen first. This is why organizations with a rock-solid foundation in
descriptive and diagnostic analytics are better equipped to handle predictive analytics.
Simply put, the time and effort to prep, transform, and ensure data quality for
retrospective reporting has already taken place. The groundwork should be relatively
well laid to quickly identify and leverage data for the modeling phase. I always
encourage customers with well-defined KPIs and business logic in a specific business
reporting area (think sales reporting for example) to use that as the first predictive
analytics use case. The goal is to derive value quickly, and there is no better place to
start than an area where you know data is well defined and of high quality.
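
As a sketch of such a first use case, here is a minimal churn model in Python with scikit-learn. The file and feature names are illustrative assumptions, not a production pipeline; the report at the end surfaces the precision, recall, and F1 metrics discussed later in this unit.

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Assumed columns: tenure_months, support_tickets, monthly_spend, churned
df = pd.read_csv("customers.csv")
X = df[["tenure_months", "support_tickets", "monthly_spend"]]
y = df["churned"]  # 1 = churned, 0 = retained

# Hold out 20% of customers to evaluate the model honestly
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))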

Predictive analytics is the opening to the next step—prescriptive analytics.

Prescriptive Analytics

What is Prescriptive Analytics?

Prescriptive analytics is the fourth and final pillar of modern analytics. Prescriptive analytics pertains to true guided analytics, where your analytics prescribes or guides you toward a specific action to take. It is effectively the merging of descriptive,
diagnostic, and predictive analytics to drive decision making. Existing scenarios or
conditions (think your current fleet of freight trains) and the ramifications of a decision or
occurrence (parts breakdown on the freight trains) are applied to create a guided
decision or action for the user to take (proactively buy more parts for preventative
maintenance).
Prescriptive analytics requires strong competencies in descriptive, diagnostic, and predictive analytics, which is why it tends to be found in highly specialized industries (oil and gas, clinical healthcare, finance, and insurance, to name a few) where use cases are well defined. Prescriptive analytics helps to address use cases such as:

● Automatic adjustment of product pricing based on anticipated customer demand and external factors.
● Flagging select employees for additional training based on incident reports in the field.

The primary aim of prescriptive analytics is to take the educated guess or assessment out of data analytics and streamline the decision-making process.

How Do You Get Started with Prescriptive Analytics?

Prescriptive analytics is commonly considered the merging of descriptive, diagnostic, and predictive analytics. Getting started isn’t so much a step-by-step list but rather the time and effort up front to build your competencies within the analytics maturity curve.

Simply put, there is no starting point in prescriptive analytics without the requisite first
three pillars of modern analytics being established first. If you’re ready for prescriptive
analytics, then quantifying your call to action and the underlying criteria will be the first
requirement. For example: if the use case is to call corrective action for an employee (i.e., additional training based on poor performance), then the factors that necessitate this action must be firmly established and the action itself must be clearly defined.
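
Using the freight-train example above, a prescriptive layer can be sketched as explicit rules applied on top of a predictive model’s output. The function and thresholds below are hypothetical; the point is that the triggering criteria and the resulting action are both stated explicitly in code.

# Turn a predicted failure risk into a clearly defined action.
def recommend_action(failure_probability: float, parts_in_stock: int) -> str:
    if failure_probability > 0.8:
        return "Schedule immediate preventative maintenance"
    if failure_probability > 0.5 and parts_in_stock < 10:
        return "Order replacement parts now"
    return "No action; continue monitoring"

# e.g., a predictive model scored a brake assembly's failure risk at 0.62
print(recommend_action(0.62, parts_in_stock=4))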

Moving through the data analytics maturity model shouldn’t be a race. Knowing how each kind of analytics helps you better understand your data, and how to use it to move your business objectives forward, is key to realizing the return on investment in data and analytics.

Enhancing Analytics with Generative AI

While descriptive, diagnostic, predictive, and prescriptive analytics form the backbone of
traditional data analysis and business decision-making, the introduction of generative AI
represents a paradigm shift in how we interact with and leverage data. This evolution is
not about replacing traditional analytics but about enriching and optimizing them through innovative integration.

What is Generative AI?

Generative AI, a branch of artificial intelligence, employs machine learning models to create novel content or data. Rather than simply analyzing existing information, it generates original, realistic outputs that enhance business problem solving and decision-making capabilities. It enables greater automation, more personalized customer experiences, and a deeper level of creativity in extracting insights from data.

“Data visualization will become less relevant as people increasingly consume and
interact with data via natural language, blurring the lines between operational and
analytical systems even further.” – Analytics8 CTO, Patrick Vinton, on The Pros and
Cons of Gen AI

How Do You Get Started with Generative AI?

Implementing generative AI in your organization involves understanding its unique capabilities and how they complement rather than replace your existing analytics framework. Leverage generative AI to get more insights from your descriptive, diagnostic, predictive, and prescriptive analytics, enhance overall decision-making, and drive innovation.

Here are some steps to get started:

1. Align with Business Vision and Goals: Start by integrating generative AI with
your overarching business objectives. It should build upon and complement the
insights provided by traditional analytics methods.
2. Evaluate Current Analytics Infrastructure: Assess how your current analytics
capabilities can be augmented with generative AI. This involves identifying
potential use cases where it can add significant value.
3. Involve Stakeholders in Use Case Definition: Engage with key stakeholders to
define meaningful applications of generative AI. Ensure these applications
enhance decision-making processes and operational efficiency.
4. Plan for Technology and Talent: Consider the technology infrastructure and the
talent required to implement and sustain generative AI initiatives. This planning is
crucial for a successful transition to more advanced, AI-driven analytics.

Business intelligence tools to visualize and
analyze your data
There’s a broad spectrum of business intelligence (BI) tools out there, from highly technical
and powerful platforms to user-friendly and lightweight dashboard builders. Choosing the
right one depends on where your business is today, where you want it to end up, who needs
access to the data, your tech stack, and so on.

Selecting the right business intelligence tool is a lengthy, involved process that requires buy-in from many stakeholders. But the first step is getting a lay of the land and understanding what the bigger players in BI have to offer.

To help you pick the right solution, we’ve rounded up ten business intelligence tools and
explored what types of businesses they’re best suited for.

1. Power BI

Microsoft’s Power BI is a business intelligence tool whose main differentiator is the fact
that it sits within the larger Microsoft ecosystem, integrating with Excel, Azure, Access,
and more. While these integrations are a plus for many, Power BI has proven to be difficult
for some to learn—and it has some unique quirks.

Who Is Power BI for?

Power BI is great for larger companies full of spreadsheet junkies that are already heavily
invested in Microsoft’s ecosystem.

It’s not the easiest business intelligence tool to set up and maintain, so teams with little
capacity to spare may want to look elsewhere. Once set up, it has a reputation for being

more malleable than other massive business intelligence tools. One user on Quora used
this comparison: Tableau is like iOS, while Power BI is like Android. The result is good
modeling functionality that, again, requires some expertise to set up and utilize.

Unique features

● Seamless integration with other Microsoft products.


● DAX (Data Analysis Expressions), Power BI’s proprietary language for modeling data.
It’s powerful but takes some getting used to.

Strengths

● If you’re fully fluent in Excel, you won’t have to work too hard to understand Power
BI. As Alainia Conrad of SelectHub says, “Users with experience in [Excel] will be
able to adapt to [Power BI].”
● Users of Power BI tend to praise its ability to handle the flow of data as well as its
modeling capabilities. In their comparison between Power BI and Tableau, user
Grovbolle on Reddit says, “Power BI is very strong on the data modeling and
infrastructure, a bit less on visualization.”

Weaknesses

● Power BI has a steep learning curve, and it requires expert knowledge to set up.
“You need to work with an IT person to interface Power BI with existing systems,”
says Shreshthi Mehta in her TrustRadius review. On G2, Xinito L. says, “This is not
an application for amateurs.”
● It also has some quirks that’ll cost you time, as the good people of Reddit are quick to point out in r/PowerBI.

2. Chartio

Chartio (Hey there! 👋) is a business intelligence tool focused on making data accessible
to anyone. What sets it apart from the pack is how affordable it is and how easy it is to set
up and use, even for people with no coding experience.

Who is Chartio for?

Chartio is for any company that needs an affordable yet powerful business intelligence tool
that anyone can use.

The biggest, most complex companies, like New York Shipping Exchange (NYSHEX), find it
“super easy to deploy” and assert that “the users get immediate value out of it.” At the
same time, the scrappiest of startups, like a student-run content curation startup, can
afford it and use it to improve their product.

Unique features

● Visual SQL, a proprietary language that allows anyone to query data without knowing
how to code in SQL.
● The visual form of querying allows for an intuitive drag-and-drop user interface used
to build queries.
● Dynamic dashboards that can automatically update based on your queries.
● In-dashboard commenting for collaboration and presentations.
● Top-level security and HIPAA compliance for the most sensitive data.

Strengths

● There’s no need for SQL knowledge to dive into data, thanks to Visual SQL. The end
business user with no coding knowledge can get the answers they need without
going through the development, data, or IT teams.
● Chartio has a reputation for simple, fast deployment. While he was SVP of
engineering at Chartio, Arjun Anand led the charge for setting up Chartio. About his
experience, he says, “With Chartio, it only took a day to get everything going, and
then a week to get the initial dashboards to show what we could accomplish.”
● Because it’s so easy for end business users to set up and use, Chartio frees up data
and development teams to do the work they should be doing. In his G2 review, Dan D.
says, “Chartio’s SQL GUI (graphical user interface) enables end users to help
themselves, freeing up huge amounts of resources that can be redirected onto other
projects.”

Weaknesses

● Chartio prioritizes end usability above most other things. This means that there are
a few features it needs to develop in order to satisfy the most hardcore of data
analysts.
● Jason Harris at Panoply explains it this way: “While [Chartio] may not have all the
functionality that your data analysts are looking for, it’s well suited to business
users.” On G2, one executive in financial services says, “[Chartio] has relatively
fewer features than the other primary tools out there, like Tableau. I think in part,
that’s what enables it to be user-friendly, so it’s a trade-off rather than a downside.”

3. Looker

Looker is a powerful tool for modeling data using its proprietary language, LookML, though it has limited visualization capabilities and data inputs. Google Cloud Platform acquired it in 2019, moving it into the Google walled garden, which means it may get combined with Google Data Studio in the near future.

Who is Looker for?

Looker is built for data teams that need powerful modeling capabilities above all else.

These teams also need the patience to learn Looker’s proprietary language, LookML. That said, Looker has a robust library of analytics code called Looker Blocks, which can speed up repetitive workflows once up and running.

Unique features

● Looker Blocks®, pre-built analytics code that provides a jumping-off point for your
own data modeling.
● LookML, Looker’s proprietary data modeling language.

Strengths

● LookML provides a powerful modeling layer that some veterans enjoy. On Reddit,
user rlaxx1 says, “The modeling layer allows you to basically turn SQL into
object-oriented code.”
● Presets like Looker Blocks® can help your team get off the ground quicker if they know SQL inside and out.

Weaknesses

● What Looker gains in modeling capabilities, it loses in its ability to manipulate and
visualize data. On Quora, Bill Ulammandakh says, “Expect to be able to do maybe 1%
of what you can do in Excel in terms of data manipulation.”
● Also, despite its strengths in modeling, some find it time-consuming to prepare data for Looker: “You do need a preparation software before you use it, which means you’re not able to cleanse and prepare your data before connecting to a data source.”

4. Google Data Studio

Google Data Studio is a data visualization tool from Google with easy integrations to the
entire Google ecosystem, from Google Analytics to Google Sheets to BigQuery. The amount
of integration and the fact that it’s free make it easier for more people to get into, but its
visualizations and formatting are often lacking.

Also, Google’s acquisition of Looker makes some people wonder about the future of Google
Data Studio.

Who is Google Data Studio for?

Google Data Studio is good for people who have bought into the Google ecosystem and
want to visualize data quickly.

Google Data Studio has perhaps the lowest bar of entry for the business intelligence tools
listed here. But it lacks deeper data functionality, relying on other Google services like
BigQuery to fill in the gaps.

Unique features

● Seamless connection to Google products makes it easy to connect data across Google’s ecosystem.
● A large library of reporting templates suited for many use cases.

Strengths

● If you have a Google account, you can start right away.


● Because it’s easy to get Google Data Studio up and running, it has a large number of
users, from students to hobbyists to companies alike. This user base provides great
community support.

Weaknesses

● Google Data Studio is part of the Google walled garden, so third-party integrations
will always be an issue because Google wants you to only use Google products with
Google Data Studio.
● Also, even with its low bar of entry, Google Data Studio can make it difficult to
format reports. Matthew O. on G2 says, “I’d like to be able to use a highlight tool to
mark important KPIs in my tables, but I’m not able to do that. Or… when I want to
have bold text in certain places, there’s no way to do it.”

5. Tableau Desktop

Tableau Desktop is the standalone data visualization tool from Tableau, a legacy giant
among other business intelligence tools. Tableau was one of the first BI tools to lower the
bar for entry into data visualization but still remains out of reach for the average business
user due to its older feature set designed for large, expert data teams.

Who is Tableau Desktop for?

Data scientists and analysts who need the power to create custom, dynamic charts and
complex visualizations.

Tableau’s older feature set makes it robust, but not very agile. It’s often used as a
base-level data tool that only a few people in the company know how to use well.

Unique features

● Tableau allows teams to join data from multiple databases.


● Its depth of features make it useful to data scientists, analysts, and developers alike,
but not the average business user.

Strengths

● Tableau has a vast user base that provides a lot of community support.
● It’s very flexible in how you manipulate and use data, making it a powerful data
visualization tool. One Capterra reviewer says, “The quality and variety of graphics
that can be created with Tableau is vast, and that’s the best part of it.”

Weaknesses

● Tableau is very difficult to pick up for most business users, which leads to situations
as described by user adventuringraw on Reddit: “The Tableau guy in my squad is in

HIGH demand, there’s multiple teams fighting over him. God help him if he ever
wants to do something other than Tableau, haha.”
● While Tableau has some very good legacy BI features, like visualization, it lacks
some important features other business intelligence tools have innovated that have
become fairly commonplace. Tristan Handy, CEO & founder of Fishtown Analytics,
puts it this way: “Tableau, for all its impressive visualization capabilities, can’t really
deal with production data: its drag-and-drop capabilities just don’t allow users to
express the complicated business logic that is required in real-world BI.”

Machine Learning for Big Data

● Overview of Machine Learning Algorithms:

● Supervised Learning: Involves training models using labeled data to make predictions or classify new data.
● Unsupervised Learning: Focuses on finding patterns and structures in unlabeled data through techniques like clustering and association.

● Reinforcement Learning: A type of learning where an agent learns to
make decisions through trial and error, receiving feedback from its
environment.

● Distributed Machine Learning using Spark MLlib:

● Spark MLlib: A scalable machine learning library built on Apache Spark,


designed to handle large-scale data processing and machine learning
tasks.
● Advantages: Enables parallel processing of data across multiple nodes,
facilitating faster computations and analysis of big data.

● Feature Engineering and Model Evaluation in Big Data Context:

● Feature Engineering: Involves selecting, transforming, and creating new features from raw data to improve model performance.
● Model Evaluation: Refers to assessing the accuracy, reliability, and generalization ability of machine learning models using various evaluation metrics such as accuracy, precision, recall, and F1 score.
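
Here is a minimal PySpark sketch tying these bullets together: distributed data loading, feature engineering with a vector assembler, model training with Spark MLlib, and evaluation on held-out data. The input path, column names, and label are assumptions.

from pyspark.sql import SparkSession
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.feature import VectorAssembler

spark = SparkSession.builder.appName("mllib-sketch").getOrCreate()

# Assumed: a CSV with numeric feature columns and a binary "label"
df = spark.read.csv("hdfs:///data/customers.csv",
                    header=True, inferSchema=True)

# Feature engineering: pack raw columns into a single feature vector
assembler = VectorAssembler(
    inputCols=["tenure_months", "support_tickets", "monthly_spend"],
    outputCol="features")
data = assembler.transform(df).select("features", "label")

# Training runs in parallel across the cluster's nodes
train, test = data.randomSplit([0.8, 0.2], seed=42)
model = LogisticRegression().fit(train)

# Model evaluation: area under the ROC curve on held-out data
auc = BinaryClassificationEvaluator().evaluate(model.transform(test))
print(f"Test AUC: {auc:.3f}")
spark.stop()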

What is Big Data and Machine Learning

Big Data and Machine Learning have become the reason behind the success of various
industries. Both these technologies are becoming popular day by day among all data
scientists and professionals. Big data is a term used to describe large, hard-to-manage volumes of structured and unstructured data, whereas machine learning is a subfield of Artificial Intelligence that enables machines to automatically learn and improve from experience and past data.

Both machine learning and big data technologies are used together by most companies because it is difficult for companies to manage, store, and process the collected data efficiently; in such cases, machine learning helps them.

Before going in deep with these two most popular technologies, i.e., Big Data and
Machine Learning, we will discuss a quick introduction to big data and machine
learning. Further, we will discuss the relationship between big data and machine
learning. So, let's start with the introduction to Big data and Machine Learning.

What is Big Data?

Big Data is defined as large or voluminous data that is difficult to store and also
cannot be handled manually with traditional database systems. It is a collection of
structured as well as unstructured data.

Big data is a very vast field for anyone who is looking to make a career in the IT
industry.

Challenges in Big Data

Big data involves the tremendous growth and collection of structured as well as unstructured data. Almost all companies use this technology to run their business and to store, process, and extract value from bulk amounts of data. Hence, it is becoming a challenge for them to use the collected data in the most efficient way. A few challenges in using big data are as follows:

○ Capturing
○ Curating
○ Storing
○ Searching
○ Sharing
○ Transferring
○ Analyzing
○ Visualization

5V's in Big Data

Big data is defined by the 5V's: Volume, Variety, Velocity, Veracity, and Value. Let's discuss each term individually.

○ Volume (Huge volume of data)

Data is the core of any technology, and the huge volume of data flowing through a system makes it necessary to adopt a dynamic storage system. Nowadays, data comes from various sources such as social media sites, e-commerce platforms, news sites, financial transactions, etc., and it is becoming mandatory to store it in the most efficient manner. With the passing of time, storage costs are gradually decreasing, permitting storage of the collected data. The gravitas that the term big data owns is because of its volume.
○ Variety (Different formats of data from various sources)

Data can be structured as well as unstructured and comes from various sources. It can
be audio, video, text, emails, transactions, and many more. Due to various formats of
data, storing, managing, and organizing the data becomes a big challenge for
organizations. Although storing raw data is not difficult, converting unstructured data into a structured format and making it accessible for business use is practically complex for IT experts.

○ Velocity (velocity at which data is processed)

Rendering and sorting data are necessary to control data flows. Further, the ability to process data with high accuracy and speed is also necessary for storing, managing, and organizing data in an efficient manner. Smart sensors, smart metering, and RFID tags make it necessary to deal with huge data influxes in almost real time. Sorting, assessing, and storing such deluges of data in a timely fashion becomes necessary for most organizations.

○ Veracity (Accuracy)

In general, veracity refers to the accuracy of data sets. But when it comes to big data, it is not limited to accuracy alone; it also tells us how trustworthy the data source is. Further, it determines the reliability of the data and how meaningful it is for analysis. In one line, we can say veracity is the quality and consistency of data.

○ Value (Meaningful data)

Value in big data refers to the meaningfulness or usefulness of the stored data for your business. In big data, data is stored in both structured and unstructured formats, but regardless of its volume, it is usually not meaningful on its own. Hence, we need to convert it into a useful format for the business requirements of organizations. For example, data with missing or corrupt values or missing key structured elements is not useful for companies trying to provide better customer service, create marketing campaigns, etc.; it leads to reduced revenue and profit in their businesses.

Sources of data in Big Data

Big data comes in various formats, structured as well as unstructured, and from many different sources. The main sources of big data are of the following types:

○ Social Media

Data is collected from various social media platforms such as Facebook, Twitter, Instagram, WhatsApp, etc. Although the data collected from these platforms can be anything (text, audio, video, etc.), the biggest challenge is to store, manage, and organize this data in an efficient way.

○ Online cloud platforms:

There are various online cloud platforms, such as Amazon AWS, Google Cloud, IBM
cloud, etc., that are also used as a source of big data for machine learning.

○ Internet of things:

The Internet of Things (IoT) is a source of continuous device data and a platform offering cloud facilities, including data storage and processing. Recently, cloud-based ML models have become popular: input data is invoked from the client end, processed by machine learning algorithms (such as an artificial neural network) on cloud servers, and the output is returned to the client.

○ Online Web pages:

Nowadays, every second, thousands of web pages are created and uploaded over the
internet. These web pages can be in the form of text, images, videos, etc. Hence, these
web pages are also a source of big data.

Difference between Big Data and Machine Learning

With the rise of big data, the use of machine learning has also increased across all industries. The key differences between machine learning and big data are as follows:

○ Machine learning is used to predict future data based on applied input and past experience, whereas big data is large or voluminous data that is difficult to store and cannot be handled with traditional database systems.
○ Machine learning can be categorized mainly as supervised, unsupervised, semi-supervised, and reinforcement learning, whereas big data can be categorized as structured, unstructured, and semi-structured data.
○ Machine learning helps to analyze input datasets with the use of various algorithms, whereas big data helps in analyzing, storing, managing, and organizing huge volumes of unstructured data sets.
○ Machine learning uses tools such as NumPy, Pandas, Scikit-learn, TensorFlow, and Keras, whereas big data uses tools such as Apache Hadoop and MongoDB.
○ In machine learning, machines or systems learn from training data and are used to predict future results using various algorithms, whereas big data mainly deals in extracting raw data and looking for patterns that help build strong decision-making ability.
○ Machine learning works with limited-dimensional data, so it is relatively easier to recognize features, whereas big data works with high-dimensional data, so recognizing features is more complex.
○ An ideal machine learning model does not require human intervention, whereas big data requires human intervention because it mainly deals with huge amounts of high-dimensional data.
○ Machine learning is useful for better customer service, product recommendations, personal virtual assistance, email spam filtering, automation, speech/text recognition, etc., whereas big data is helpful in areas as diverse as stock market analysis, medicine & healthcare, agriculture, gambling, environmental protection, etc.
○ The scope of machine learning is to make automated learning machines with improved quality of predictive analysis, faster decision making, and more robust cognitive analysis, whereas the scope of big data is very vast: it will not be limited to handling voluminous data but will be used to optimize stored data for easy analysis.

Big data with Machine Learning

Big Data and Machine Learning each have their own advantages; they aren't competing or mutually exclusive concepts. Although both are very crucial individually, when combined they provide the opportunity to achieve some incredible results. When talking about the 5V's in big data, machine learning models help to deal with them and predict accurate results. Similarly, while developing machine learning models, big data helps by supplying high-quality data and improving learning methods through the larger, more varied samples it provides to analytics teams.

It is no secret that almost all major organizations, such as Google, Amazon, IBM, and Netflix, have already discovered the power of big data analytics enhanced by machine learning.

Machine Learning is a very crucial technology, and with big data, it has become more
powerful for data collection, data analysis, and data integration. All big organizations
use machine learning algorithms for running their business properly.

We can apply machine learning algorithms to every element of Big data operation,
including:

○ Data Labeling and Segmentation


○ Data Analytics
○ Scenario Simulation

In machine learning algorithms, we need multiple varieties of data to train a machine and predict accurate results. However, it sometimes becomes difficult to manage this bulk data, so managing and analyzing big data is a challenge. Further, unstructured data is useless until it is well interpreted. Thus, to use this information, there is a need for talent, algorithms, and computing infrastructure.

Machine learning enables machines or systems to learn from past experience, use data received from big data, and predict accurate results. This leads to improved business operations and better customer relationship management. Big data helps machine learning by providing a variety of data, so machines can learn from more, and more varied, training samples.

In such ways, businesses can accomplish their dreams and get the benefit of big data
using ML algorithms. However, for using the combination of ML and big data,
companies need skilled data scientists.

How to apply Machine Learning in Big data
Applying Machine Learning in Big Data involves using ML algorithms to enhance data
processing, analysis, and integration. Let's elaborate on each element and provide
examples:

1. Data Segmentation:

● Definition: Data segmentation involves dividing large datasets into smaller, more manageable segments based on certain criteria or patterns.
● Example: In marketing, machine learning algorithms can segment customer data based on demographics, behavior, and preferences to personalize marketing campaigns. For instance, an e-commerce platform can use clustering algorithms to group customers into segments for targeted promotions (a minimal sketch follows this section).

2. Data Analytics:

● Definition: Data analytics refers to the process of examining datasets to uncover insights, trends, and patterns that can drive decision-making.
● Example: Financial institutions use machine learning algorithms for fraud detection by analyzing transactional data to identify unusual patterns or anomalies that may indicate fraudulent activities. These algorithms continuously learn from new data to improve accuracy over time.

3. Simulation:

● Definition: Simulation involves creating models or simulations based on data to predict outcomes or simulate scenarios.
● Example: Manufacturing companies utilize machine learning-based simulations to optimize production processes. By analyzing historical data on equipment performance, machine learning models can predict maintenance needs, optimize production schedules, and reduce downtime.

By integrating machine learning at each stage of Big Data operations, organizations can
extract valuable insights, identify patterns, and make data-driven decisions. These
insights are then packaged into actionable formats for stakeholders to understand and
act upon, contributing to a more agile and data-driven business environment.
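
As a sketch of the segmentation example in item 1 above, the following Python snippet clusters customers with scikit-learn's KMeans. The file and feature columns are illustrative assumptions.

import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Assumed columns describing customer behavior
customers = pd.read_csv("customers.csv")
features = customers[["age", "orders_per_month", "avg_basket_value"]]

# Scale features so no single column dominates the distance metric
scaled = StandardScaler().fit_transform(features)

# Group customers into four segments for targeted promotions
customers["segment"] = KMeans(n_clusters=4, n_init=10,
                              random_state=42).fit_predict(scaled)
print(customers.groupby("segment")[["age", "orders_per_month",
                                    "avg_basket_value"]].mean())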

Conclusion:
The synergy between machine learning and big data heralds a new era of data
analytics, predictive insights, and data-driven decision-making. As organizations
continue to amass vast datasets, the marriage of scalable big data infrastructure with
the intelligence of machine learning algorithms becomes increasingly essential. This
convergence not only enables the extraction of meaningful insights from historical data
but also empowers organizations to predict future trends, optimize processes in real
time, and make decisions grounded in data-driven intelligence. The transformative
impact of this intersection is poised to reshape industries, redefine business strategies,
and unlock unprecedented opportunities for innovation and growth.

Case Studies

● Walmart's Utilization of Big Data:


● Explores how Walmart harnesses big data analytics for demand
forecasting, inventory optimization, and personalized marketing strategies.
● Uber's Big Data Transformation:
● Examines how Uber leverages big data for dynamic pricing, route
optimization, driver-partner matching, and customer experience
enhancement.
● Netflix's Big Data Revolution:
● Analyzes Netflix's use of big data for content recommendation, audience
segmentation, content production decisions, and user experience
enhancement.
● eBay's Big Data Transformation:
● Highlights eBay's adoption of big data analytics for fraud detection, market
trend analysis, customer behavior prediction, and business strategy
optimization.

Taking Walmart as an example, we can outline a programmatic approach:

Step 1: Understanding the Problem


● Identify the specific problem or challenge that Walmart wants to address using
prescriptive analytics. For example, optimizing inventory levels to reduce
stockouts while minimizing holding costs.

Step 2: Data Collection and Integration
● Collect relevant data from various sources such as sales transactions, inventory
levels, supplier data, weather data, and customer demographics.
● Integrate the data into a centralized data warehouse or data lake for analysis.

Step 3: Data Preprocessing


● Clean the data by handling missing values, removing duplicates, and
standardizing formats.
● Perform data transformation and aggregation as needed for analysis.

Step 4: Analysis and Modeling


● Use advanced analytics techniques such as machine learning algorithms and
optimization models to analyze the data.
● Build predictive models to forecast demand for products based on historical
sales data and external factors like promotions or seasonal trends.
● Develop optimization models to determine the optimal inventory levels and
replenishment strategies considering factors like lead times, supplier constraints,
and cost constraints.

Step 5: Prescriptive Recommendations


● Based on the analysis results, generate prescriptive recommendations for
Walmart. This could include:
● Optimal inventory reorder points and quantities for different products and
locations.
● Supplier selection and negotiation strategies to optimize costs and lead
times.
● Pricing strategies based on demand forecasts and competitor analysis.
● Promotional strategies to maximize sales and minimize excess inventory.

Step 6: Implementation and Monitoring


● Implement the prescriptive recommendations in Walmart's operations and
systems.

● Continuously monitor key performance indicators (KPIs) such as inventory
turnover, stockout rates, and profitability to assess the effectiveness of the
prescriptive analytics solution.
● Iteratively refine the models and recommendations based on real-time data and
feedback.

Sample Python Code Structure:

# A minimal runnable sketch of the six steps above, using synthetic
# data. Column names, the demand model, and thresholds are
# illustrative assumptions, not Walmart's actual logic.
import numpy as np
import pandas as pd

# Step 1: Understanding the Problem
# Objective: set reorder points that reduce stockouts while
# minimizing holding costs.

# Step 2: Data Collection and Integration
# Synthesize a year of daily demand per product; in practice this
# would be integrated from sales, inventory, and supplier systems.
rng = np.random.default_rng(42)
sales = pd.DataFrame({
    "product_id": np.repeat(["A", "B"], 365),
    "units_sold": rng.poisson(lam=20, size=730),
})

# Step 3: Data Preprocessing
# Clean the data: drop duplicates and missing values, enforce types.
sales = sales.drop_duplicates().dropna()
sales["units_sold"] = sales["units_sold"].astype(int)

# Step 4: Analysis and Modeling
# A simple per-product demand estimate (mean and standard deviation)
# stands in for a full predictive model here.
stats = sales.groupby("product_id")["units_sold"].agg(["mean", "std"])

# Step 5: Prescriptive Recommendations
# Classic reorder-point rule: expected demand over the lead time
# plus safety stock for demand variability.
LEAD_TIME_DAYS = 7
SERVICE_FACTOR = 1.65  # roughly a 95% service level
stats["reorder_point"] = (
    stats["mean"] * LEAD_TIME_DAYS
    + SERVICE_FACTOR * stats["std"] * np.sqrt(LEAD_TIME_DAYS)
)
print(stats)

# Step 6: Implementation and Monitoring
# In production: push reorder points to inventory systems and track
# KPIs (stockout rate, turnover) to iteratively refine the model.

