Unit 3: BDA Notes
● Types of Analytics:
○ Descriptive Analytics: Involves summarizing historical data to understand past trends and patterns.
○ Predictive Analytics: Utilizes statistical models and machine learning algorithms to forecast future trends and outcomes.
○ Prescriptive Analytics: Goes beyond predictions to recommend actions that optimize outcomes based on predictive insights.
What Are the Four Types of Analytics and How Do You Use Them?
Analytics is a broad term covering four different pillars in the modern analytics model:
descriptive, diagnostic, predictive, and prescriptive. Each type of analytics plays a role
in how your business can better understand what your data reveals and how you can
use those insights to drive business objectives. In this blog we will discuss what each
type of analytics provides to a business, when to use it and why, and how they all play a
critical role in your organization’s analytics maturity.
Understanding the what, why, when, where, and how of your data analytics through
data analysis helps to drive better decision making and enables your organization to
meet its business objectives.
Descriptive Analytics
Descriptive analytics answers the question, “What happened?” This type of analytics is by far the most commonly used by customers, providing reporting and analysis centered on past events. It helps companies understand what has already happened in the business.
It’s extremely important to build core competencies first in descriptive analytics before
attempting to advance upward in the data analytics maturity model. Core competencies
include things such as:
● Data modeling fundamentals and the adoption of basic star schema best practices,
● Communicating data with the right visualizations, and
● Basic dashboard design skills.
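To make the star-schema idea concrete, here is a minimal sketch in Python/pandas: one fact table of sales measures joined to two dimension tables on surrogate keys. All table and column names are hypothetical.

```python
# Minimal star-schema sketch: one fact table joined to two dimension
# tables on surrogate keys (all names are hypothetical).
import pandas as pd

# Dimension tables: descriptive attributes keyed by an ID.
dim_product = pd.DataFrame({
    "product_id": [1, 2],
    "product_name": ["Widget", "Gadget"],
    "category": ["Hardware", "Hardware"],
})
dim_date = pd.DataFrame({
    "date_id": [20240101, 20240102],
    "calendar_date": ["2024-01-01", "2024-01-02"],
})

# Fact table: numeric measures plus foreign keys into the dimensions.
fact_sales = pd.DataFrame({
    "date_id": [20240101, 20240101, 20240102],
    "product_id": [1, 2, 1],
    "units_sold": [10, 4, 7],
    "revenue": [100.0, 80.0, 70.0],
})

# A typical descriptive-analytics query: join facts to dimensions,
# then aggregate a measure by a descriptive attribute.
report = (
    fact_sales
    .merge(dim_product, on="product_id")
    .merge(dim_date, on="date_id")
    .groupby("product_name", as_index=False)["revenue"].sum()
)
print(report)
```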
It’s likely you’ve adopted some form of descriptive analytics internally, whether that be
static P&L statements, PDF reports, or reporting within an analytics tool. For a true
descriptive analytics program to be implemented, the concepts of repeatability and
automation of tasks must be top of mind. Repeatability in that a data process is
standardized and can be regularly applied with minimal effort (think a weekly sales
report), and automation in that complex tasks (VLOOKUPs, merging of Excel
spreadsheets, etc.) are automated—requiring little to no manual intervention. The most
effective means to achieve this is to adopt a modern analytics tool which can help
standardize and automate those processes on the back end and allow for a consistent
reporting framework on the front end for end users.
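As a concrete illustration of repeatability and automation, here is a minimal Python sketch of a weekly sales report: the pandas merge plays the role of a VLOOKUP, and the script can be scheduled (e.g. with cron) so it runs with no manual intervention. File names and columns are hypothetical.

```python
# Sketch of an automated weekly sales report (hypothetical files/columns).
import pandas as pd

def build_weekly_report(sales_path: str, customers_path: str) -> pd.DataFrame:
    sales = pd.read_csv(sales_path)          # one row per transaction: customer_id, order_date, amount
    customers = pd.read_csv(customers_path)  # lookup table: customer_id, region

    # VLOOKUP equivalent: left-join customer attributes onto sales rows.
    enriched = sales.merge(customers, on="customer_id", how="left")

    # Standardized weekly aggregation: identical logic on every run.
    enriched["week"] = pd.to_datetime(enriched["order_date"]).dt.to_period("W")
    return enriched.groupby(["week", "region"], as_index=False)["amount"].sum()

if __name__ == "__main__":
    print(build_weekly_report("sales.csv", "customers.csv"))
```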
Diagnostic Analytics
Diagnostic analytics, just like descriptive analytics, uses historical data to answer a
question. But instead of focusing on “the what”, diagnostic analytics addresses “the why”: it answers the question, “Why did it happen?”
Diagnostic analytics tends to be more accessible and fit a wider range of use cases
than machine learning/predictive analytics. You might even find that it solves some
business problems you earmarked for predictive analytics use cases.
Being at the diagnostic analytics phase likely means you’ve adopted a modern analytics
tool. Most modern analytics tools contain a variety of search-based or lightweight artificial intelligence capabilities. These features allow for detailed insights a layer
deeper (for example: the Key Drivers visualization in Power BI, or Qlik’s search-based
insight functionality). To be clear, these are an effective lightweight means to address
diagnostic analytics use cases but are not a means to a full-scale implementation.
Software vendors like Sisu have built their core business around addressing diagnostic
analytics use cases (what they call “augmented analytics”) and are a great bet.
Diagnostic analytics is an important step in the maturity model that unfortunately tends
to get skipped or obscured. If you cannot infer why your sales decreased 20% in 2020,
then jumping to predictive analytics and trying to answer “what will happen to sales in
2021” is a stretch in advancing upward in the analytics maturity model.
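As a lightweight illustration (not a full-scale implementation), the sketch below uses pandas to break a year-over-year sales decline down by segment, the kind of “why did it happen?” drill-down diagnostic analytics is about. The data is made up.

```python
# Diagnostic sketch: which segments drove the overall sales decline?
import pandas as pd

sales = pd.DataFrame({
    "year":    [2019, 2019, 2019, 2020, 2020, 2020],
    "segment": ["Retail", "Online", "Wholesale"] * 2,
    "revenue": [500.0, 300.0, 200.0, 480.0, 320.0, 0.0],
})

by_segment = sales.pivot_table(index="segment", columns="year", values="revenue")
by_segment["change"] = by_segment[2020] - by_segment[2019]
by_segment["pct_change"] = by_segment["change"] / by_segment[2019] * 100

# Rank segments by their contribution to the overall drop.
print(by_segment.sort_values("change"))
```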
Predictive Analytics
At the outset of any predictive analytics build, a few core elements need to be established. To start, you should collect existing data, organize it in a way that supports data modeling, cleanse it and review its overall quality, and finally determine your modeling objective.
While modeling takes up the spotlight in predictive analytics, data prep is a crucial step
that needs to happen first. This is why organizations with a rock-solid foundation in
descriptive and diagnostic analytics are better equipped to handle predictive analytics.
Simply put, the time and effort to prep, transform, and ensure data quality for
retrospective reporting has already taken place. The groundwork should be relatively
well laid to quickly identify and leverage data for the modeling phase. I always
encourage customers with well-defined KPIs and business logic in a specific business
reporting area (think sales reporting for example) to use that as the first predictive
analytics use case. The goal is to derive value quickly, and there is no better place to
start than an area where you know data is well defined and of high quality.
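A minimal sketch of that first predictive use case, assuming the monthly sales data has already been prepped and cleansed; the model is deliberately simple, and a real build would add feature engineering and held-out validation.

```python
# Minimal predictive sketch: fit a simple trend model on historical
# monthly sales and forecast the next period (hypothetical data).
import numpy as np
from sklearn.linear_model import LinearRegression

months = np.arange(1, 13).reshape(-1, 1)          # feature: month index
sales = np.array([100, 104, 110, 108, 115, 120,
                  118, 125, 130, 128, 135, 140])  # target: revenue

model = LinearRegression().fit(months, sales)

# Forecast month 13; a real build would validate on held-out data.
forecast = model.predict(np.array([[13]]))
print(f"Forecast for month 13: {forecast[0]:.1f}")
```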
Prescriptive Analytics
Prescriptive analytics is the fourth, and final pillar of modern analytics. Prescriptive
analytics pertains to true guided analytics where your analytics is prescribing or guiding
you toward a specific action to take. It is effectively the merging of descriptive,
diagnostic, and predictive analytics to drive decision making. Existing scenarios or
conditions (think your current fleet of freight trains) and the ramifications of a decision or
occurrence (parts breakdown on the freight trains) are applied to create a guided
decision or action for the user to take (proactively buy more parts for preventative
maintenance).
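A toy sketch of that freight-train scenario in Python: a predicted breakdown probability (the output of the predictive layer) is combined with the current inventory state to prescribe an action. The thresholds and values are hypothetical.

```python
# Prescriptive sketch: turn a prediction plus current conditions into
# a guided action (all thresholds and values are hypothetical).

def recommend_parts_order(failure_prob: float, parts_in_stock: int,
                          min_stock: int = 5) -> str:
    # Predicted risk is high and spares are low: prescribe a proactive purchase.
    if failure_prob > 0.7 and parts_in_stock < min_stock:
        return "Order parts now for preventative maintenance"
    if failure_prob > 0.7:
        return "Schedule preventative maintenance with existing stock"
    return "No action needed"

print(recommend_parts_order(failure_prob=0.85, parts_in_stock=2))
```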
Prescriptive analytics requires strong competencies in descriptive, diagnostic, and
predictive analytics which is why it tends to be found in highly specialized industries (oil
and gas, clinical healthcare, finance, and insurance to name a few) where use cases
are well defined. The primary aim of prescriptive analytics is to take the educated guess or assessment out of data analytics and streamline the decision-making process.
Simply put, there is no starting point in prescriptive analytics without the requisite first
three pillars of modern analytics being established first. If you’re ready for prescriptive
analytics, then quantifying your call to action and the underlying criteria will be the first
requirement. For example, if the use case is to call for corrective action for an employee (i.e., additional training based on poor performance), then the factors that necessitate this action must be firmly established and the action itself must be clearly defined.
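As a sketch of what “quantifying the call to action” might look like, the Python below makes both the triggering criteria and the resulting action explicit. The metrics and thresholds are invented for illustration.

```python
# Sketch: explicit criteria and an explicit action for the
# corrective-action example (metrics/thresholds are hypothetical).
from typing import Optional

CRITERIA = {
    "error_rate_max": 0.05,       # above this, flag quality
    "tickets_per_day_min": 8.0,   # below this, flag throughput
}

def corrective_action(error_rate: float, tickets_per_day: float) -> Optional[str]:
    flags = []
    if error_rate > CRITERIA["error_rate_max"]:
        flags.append("quality")
    if tickets_per_day < CRITERIA["tickets_per_day_min"]:
        flags.append("throughput")
    # The action itself is clearly defined, not left to interpretation.
    return f"Assign additional training ({', '.join(flags)})" if flags else None

print(corrective_action(error_rate=0.08, tickets_per_day=6.0))
```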
Moving through the data analytics maturity model shouldn’t be a race. Knowing how
each kind of analytics helps you better understand your data and how to use it to move
your business objectives forward is key to realizing the return on investment in data and
analytics.
While descriptive, diagnostic, predictive, and prescriptive analytics form the backbone of
traditional data analysis and business decision-making, the introduction of generative AI
represents a paradigm shift in how we interact with and leverage data. This evolution is
not about replacing traditional analytics but enriching them and optimizing them through
innovative integration.
“Data visualization will become less relevant as people increasingly consume and
interact with data via natural language, blurring the lines between operational and
analytical systems even further.” – Analytics8 CTO, Patrick Vinton, on The Pros and
Cons of Gen AI
1. Align with Business Vision and Goals: Start by integrating generative AI with
your overarching business objectives. It should build upon and complement the
insights provided by traditional analytics methods.
2. Evaluate Current Analytics Infrastructure: Assess how your current analytics
capabilities can be augmented with generative AI. This involves identifying
potential use cases where it can add significant value.
3. Involve Stakeholders in Use Case Definition: Engage with key stakeholders to
define meaningful applications of generative AI. Ensure these applications
enhance decision-making processes and operational efficiency.
4. Plan for Technology and Talent: Consider the technology infrastructure and the
talent required to implement and sustain generative AI initiatives. This planning is
crucial for a successful transition to more advanced, AI-driven analytics.
Selecting the right business intelligence tool is a lengthy, involved process that requires
buy-in from many stakeholders. But the first step is getting a lay of the land and understanding what the bigger players in BI have to offer.
To help you pick the right solution, we’ve rounded up ten business intelligence tools and
explored what types of businesses they’re best suited for.
1. Power BI
Microsoft’s Power BI is a business intelligence tool whose main differentiator is the fact
that it sits within the larger Microsoft ecosystem, integrating with Excel, Azure, Access,
and more. While these integrations are a plus for many, Power BI has proven to be difficult
for some to learn—and it has some unique quirks.
Power BI is great for larger companies full of spreadsheet junkies that are already heavily
invested in Microsoft’s ecosystem.
It’s not the easiest business intelligence tool to set up and maintain, so teams with little capacity to spare may want to look elsewhere.
Unique features
Strengths
● If you’re fully fluent in Excel, you won’t have to work too hard to understand Power
BI. As Alainia Conrad of SelectHub says, “Users with experience in [Excel] will be
able to adapt to [Power BI].”
● Users of Power BI tend to praise its ability to handle the flow of data as well as its
modeling capabilities. In their comparison between Power BI and Tableau, user
Grovbolle on Reddit says, “Power BI is very strong on the data modeling and
infrastructure, a bit less on visualization.”
Weaknesses
● Power BI has a steep learning curve, and it requires expert knowledge to set up.
“You need to work with an IT person to interface Power BI with existing systems,”
says Shreshthi Mehta in her TrustRadius review. On G2, Xinito L. says, “This is not
an application for amateurs.”
● It also has some quirks that’ll cost you time, as the good people of Reddit are quick to point out in r/PowerBI.
2. Chartio
Chartio (Hey there! 👋) is a business intelligence tool focused on making data accessible to anyone. What sets it apart from the pack is how affordable it is and how easy it is to set up and use, even for people with no coding experience.
Chartio is for any company that needs an affordable yet powerful business intelligence tool
that anyone can use.
Unique features
● Visual SQL, a proprietary language that allows anyone to query data without knowing
how to code in SQL.
● The visual form of querying allows for an intuitive drag-and-drop user interface used
to build queries.
● Dynamic dashboards that can automatically update based on your queries.
● In-dashboard commenting for collaboration and presentations.
● Top-level security and HIPAA compliance for the most sensitive data.
Strengths
● There’s no need for SQL knowledge to dive into data, thanks to Visual SQL. The end
business user with no coding knowledge can get the answers they need without
going through the development, data, or IT teams.
● Chartio has a reputation for simple, fast deployment. While he was SVP of
engineering at Chartio, Arjun Anand led the charge for setting up Chartio. About his
experience, he says, “With Chartio, it only took a day to get everything going, and
then a week to get the initial dashboards to show what we could accomplish.”
● Because it’s so easy for end business users to set up and use, Chartio frees up data
and development teams to do the work they should be doing. In his G2 review, Dan D.
says, “Chartio’s SQL GUI (graphical user interface) enables end users to help
themselves, freeing up huge amounts of resources that can be redirected onto other
projects.”
Weaknesses
3. Looker
Looker is a powerful tool for modeling data using its proprietary language, LookML, but it has limited visualization capabilities and data inputs. Google acquired it in 2019, moving it into the Google walled garden, which means it may get combined with Google Data Studio in the near future.
Looker is built for data teams that need powerful modeling capabilities above all else.
These teams also need the patience to learn Looker’s proprietary language, LookML. That
said, Looker has a robust library of analytics code called Looker Blocks, which can speed
up repetitive workflows once up and running.
Unique features
● Looker Blocks®, pre-built analytics code that provides a jumping-off point for your
own data modeling.
● LookML, Looker’s proprietary data modeling language.
Strengths
Weaknesses
● What Looker gains in modeling capabilities, it loses in its ability to manipulate and
visualize data. On Quora, Bill Ulammandakh says, “Expect to be able to do maybe 1%
of what you can do in Excel in terms of data manipulation.”
● Also, despite its strengths in modeling, some find it time-consuming to prepare data
for Looker. “You do need a preparation software before you use it, which means you’re not able to cleanse and prepare your data before connecting to a data source,” says one reviewer.
4. Google Data Studio
Google Data Studio is a data visualization tool from Google with easy integrations to the
entire Google ecosystem, from Google Analytics to Google Sheets to BigQuery. The amount
of integration and the fact that it’s free make it easier for more people to get into, but its
visualizations and formatting are often lacking.
Also, Google’s acquisition of Looker makes some people wonder about the future of Google
Data Studio.
Google Data Studio is good for people who have bought in to the Google ecosystem and
want to visualize data quickly.
Google Data Studio has perhaps the lowest bar of entry for the business intelligence tools
listed here. But it lacks deeper data functionality, relying on other Google services like
BigQuery to fill in the gaps.
Unique features
Strengths
Weaknesses
● Google Data Studio is part of the Google walled garden, so third-party integrations
will always be an issue because Google wants you to only use Google products with
Google Data Studio.
● Also, even with its low bar of entry, Google Data Studio can make it difficult to
format reports. Matthew O. on G2 says, “I’d like to be able to use a highlight tool to
mark important KPIs in my tables, but I’m not able to do that. Or… when I want to
have bold text in certain places, there’s no way to do it.”
5. Tableau Desktop
Tableau Desktop is the standalone data visualization tool from Tableau, a legacy giant
among other business intelligence tools. Tableau was one of the first BI tools to lower the
bar for entry into data visualization but still remains out of reach for the average business
user due to its older feature set designed for large, expert data teams.
Tableau Desktop is for data scientists and analysts who need the power to create custom, dynamic charts and complex visualizations.
Tableau’s older feature set makes it robust, but not very agile. It’s often used as a
base-level data tool that only a few people in the company know how to use well.
Unique features
Strengths
● Tableau has a vast user base that provides a lot of community support.
● It’s very flexible in how you manipulate and use data, making it a powerful data
visualization tool. One Capterra reviewer says, “The quality and variety of graphics
that can be created with Tableau is vast, and that’s the best part of it.”
Weaknesses
● Tableau is very difficult to pick up for most business users, which leads to situations, as described by user adventuringraw on Reddit, where a single person on the team ends up being “the Tableau guy.”
Big Data and Machine Learning
Big Data and Machine Learning have become key drivers of success across many industries, and both technologies are increasingly popular among data scientists and professionals. Big data is a term used to describe large, hard-to-manage volumes of structured and unstructured data, whereas machine learning is a subfield of Artificial Intelligence that enables machines to automatically learn and improve from experience and past data.
Before going deeper into these two technologies, we will start with a quick introduction to big data and machine learning, and then discuss the relationship between them.
Big Data is defined as large or voluminous data that is difficult to store and cannot be handled manually with traditional database systems. It is a collection of structured as well as unstructured data, and it is a vast field for anyone looking to make a career in the IT industry.
Big data is growing tremendously as a collection of structured and unstructured data. Almost all companies use this technology to run their business and to store, process, and extract value from bulk amounts of data, so using the collected data in the most efficient way becomes a challenge. The main challenges in working with big data are as follows:
○ Capturing
○ Curating
○ Storing
○ Searching
○ Sharing
○ Transferring
○ Analyzing
○ Visualization
Big data is defined by the 5 V’s: Volume, Variety, Velocity, Veracity, and Value. Let’s discuss each term individually.
○ Variety
Data can be structured as well as unstructured and comes from various sources. It can be audio, video, text, emails, transactions, and many more. Due to these varied formats, storing, managing, and organizing the data becomes a big challenge for organizations. Although storing raw data is not difficult, converting unstructured data into a structured format and making it accessible for business use is practically complex for IT experts.
○ Velocity (Speed)
Velocity refers to the speed at which data is generated and must be processed. Rendering and data sorting are necessary to control data flows, and processing data with high accuracy and speed is also necessary for handling big data.
○ Veracity (Accuracy)
In general, veracity refers to the accuracy of data sets. But when it comes to big data, it is not limited to accuracy alone: it also tells us how trustworthy the data source is, and it determines the reliability of the data and how meaningful it is for analysis. In one line, we can say veracity is the quality and consistency of data.
○ Value
Value in big data refers to the meaningfulness or usefulness of stored data for your business. In big data, data is stored in both structured and unstructured formats, but regardless of its volume, it is usually not meaningful in its raw form. Hence, we need to convert it into a useful format for the business requirements of organizations. For example, data with missing or corrupt values, or missing key structured elements, is not useful for providing better customer service, creating marketing campaigns, and so on; it ultimately reduces revenue and profit.
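A small pandas sketch of turning raw records into “value”: rows with missing keys or corrupt fields are screened out so only usable data feeds downstream analysis. The columns and validity rules are hypothetical.

```python
# Sketch: screen out missing/corrupt records so only usable data
# feeds analysis (columns and rules are hypothetical).
import pandas as pd

raw = pd.DataFrame({
    "customer_id": [1, 2, None, 4],
    "email": ["a@x.com", None, "c@x.com", "not-an-email"],
    "spend": [120.0, -5.0, 80.0, 60.0],
})

clean = raw.dropna(subset=["customer_id", "email"])  # drop missing keys
clean = clean[clean["email"].str.contains("@")]      # drop corrupt emails
clean = clean[clean["spend"] >= 0]                   # drop corrupt values

print(f"{len(clean)} of {len(raw)} records usable for analysis")
```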
Big data can come in various formats, structured as well as unstructured, from many different sources. The main sources of big data are the following:
○ Social Media
Data is collected from various social media platforms such as Facebook, Twitter,
Instagram, Whatsapp, etc. Although data collected from these platforms can be
anything like text, audio, video, etc., the biggest challenge is to store, manage and
organize these data in an efficient way.
○ Cloud Platforms
There are various online cloud platforms, such as Amazon AWS, Google Cloud, and IBM Cloud, that also serve as sources of big data for machine learning.
○ Internet of Things (IoT)
The Internet of Things (IoT) is another major source of big data, offering cloud facilities that include data storage and processing. Recently, cloud-based ML models have been gaining popularity: input data is sent from the client, machine learning algorithms (such as an artificial neural network, or ANN) are run on cloud servers, and the output is returned to the client.
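A minimal sketch of that client/cloud round trip in Python, using only the standard library; the endpoint URL and payload shape are placeholders, not a real service.

```python
# Sketch of the cloud-based ML pattern: client sends input features to a
# hosted model endpoint and receives the prediction back.
import json
import urllib.request

def predict_via_cloud(features: list) -> dict:
    payload = json.dumps({"instances": [features]}).encode("utf-8")
    req = urllib.request.Request(
        "https://ml.example.com/v1/models/demo:predict",  # placeholder endpoint
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:  # server runs the model and replies
        return json.load(resp)

# Example call (would fail without a real endpoint):
# print(predict_via_cloud([1.0, 2.0, 3.0]))
```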
○ Web Pages
Nowadays, thousands of web pages are created and uploaded to the internet every second. These web pages can contain text, images, videos, and more, so they are also a source of big data.
With the rise of big data, the use of machine learning has also increased in all
industries. The table below shows the differences between machine learning and big data:

| Machine Learning | Big Data |
| --- | --- |
| It helps to analyze input datasets with the use of various algorithms. | It helps in analyzing, storing, managing, and organizing huge volumes of unstructured data sets. |
| The scope of machine learning is to make automated learning machines with improved quality of predictive analysis, faster decision making, more robust cognitive analysis, etc. | The scope of big data is very vast, as it is not limited to handling voluminous data; it is also used for optimizing data stored in a structured format to enable easy analysis. |
Big Data and Machine Learning both have their own advantages; they are not competing concepts, nor are they mutually exclusive. Although both are very crucial individually, when combined they provide the opportunity to achieve some incredible results. When talking about the 5 V’s of big data, machine learning models help to deal with them and predict accurate results. Similarly, while developing machine learning models, big data supplies the volume and variety of training data those models need.
It is no secret that almost all organizations, such as Google, Amazon, IBM, Netflix,
etc., have already discovered the power of big data analytics enhanced by machine
learning.
Machine Learning is a very crucial technology, and with big data, it has become more
powerful for data collection, data analysis, and data integration. All big organizations
use machine learning algorithms for running their business properly.
We can apply machine learning algorithms to every element of a big data operation, including data segmentation, data analytics, and simulation (each listed below).
In machine learning, we need multiple varieties of data to train a machine and predict accurate results. However, it sometimes becomes difficult to manage such bulky data, so managing and analyzing big data is a challenge. Further, this unstructured data is useless until it is well interpreted. Thus, to use the information, there is a need for talent, algorithms, and computing infrastructure.
Machine learning enables machines or systems to learn from past experience, use the data received from big data, and predict accurate results. This leads to improved business operations and better customer relationship management. Big data helps machine learning by providing a variety of data, so machines can learn from more samples of training data.
In these ways, businesses can accomplish their goals and get the benefit of big data using ML algorithms. However, to use the combination of ML and big data, companies need skilled data scientists.
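To see why more samples help, the sketch below trains the same scikit-learn model on growing slices of a synthetic dataset and reports test accuracy at each size; the data is synthetic and exact numbers will vary.

```python
# Sketch: accuracy of the same model at increasing training-set sizes.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for n in (50, 500, 3750):  # growing slices of the training data
    model = LogisticRegression(max_iter=1000).fit(X_train[:n], y_train[:n])
    print(f"trained on {n:>5} samples -> accuracy {model.score(X_test, y_test):.3f}")
```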
1. Data Segmentation
2. Data Analytics
3. Simulation
By integrating machine learning at each stage of Big Data operations, organizations can
extract valuable insights, identify patterns, and make data-driven decisions. These
insights are then packaged into actionable formats for stakeholders to understand and
act upon, contributing to a more agile and data-driven business environment.
Case Studies