Business Analytics Toolkit Portfolio
Md Khaled Hasan
Table of Contents
1 SAP Lumira
1.1 Introduction
1.2 Dataset
1.3 Implementation of the tool and Findings
1.3.1 What is the trend of population increase from 1971 to 2019?
1.3.2 Which are the top provinces with the highest number of people in the 65 Years and Older group in 2015-2019?
1.3.3 What is the age group-wise proportion of the population in 2015-2019?
1.3.4 What is the gender-wise population trend? What is the impact of gender on the population aged 65 Years and Older?
1.3.5 Which province has the highest number of populations in 2015-2019? Which gender dominates each of the provinces?
1.4 Conclusion and Personal Reflection
3.3 Implementation of the tool and Findings
3.3.1 Which were the top 3 revenue-earning products, and what year they made the highest revenue?
3.3.2 What was the associated gross margin ratio of those products? In which country is the ratio higher?
3.3.3 Which division had a higher gross margin ratio in that country, and what are the top 5 customers with the highest gross margin in that division?
3.3.4 What are the total net sales of those 5 customers?
5 Tableau
5.1 Introduction
5.2 Dataset
5.3 Implementation of the tool and Findings
5.3.1 What are the top products in terms of total quantity sold and average price?
5.3.2 What is the top product among the previously found products? Which area earned the highest revenue from the top products?
5.3.3 Which is the highest revenue earning distribution channel in that area for all the top products? What is the highest amount of revenue earned from one product in any distribution channel?
5.3.4 What are the top CO2 emitting countries from 1991-2011?
5.3.5 What is the year-wise CO2 emission trend in the top countries from 1991-2011?
5.4 Conclusion and Personal Reflection
7 SAP HANA
7.1 Introduction
7.2 Dataset
7.3 Methodology
7.3.1 Step-1: Creating the database table
7.3.2 Step-2: Data provisioning
7.3.3 Step-3: Creating calculation view
References
Chapter 1
SAP Lumira
1.1. Introduction
SAP Lumira is a data visualization and analytics software that can manipulate, edit, analyze, and visualize data to gain useful insights for strategic decisions. It also helps to publish and share the developed visualizations and dashboards among team members (SAP, 2021). The dataset can be acquired from different data sources and imported into SAP Lumira from different databases, including SAP HANA Live Data and SQL, or as an Excel spreadsheet or CSV file. The acquired data elements exist in three different forms.
- Measures: Numerical data that are usually aggregated and calculated based on sum, average, count, min, max, etc.
- Dimensions: All kinds of categorical data fall under this type. It can be nominal, ordinal, or interval data.
- Hierarchies: These represent the relationship between correlated entities. A common example is geographical location (country and province).
Outline
SAP Lumira is a visualization intelligence tool that helps to create visual stories from datasets. This tool will be used to access, transform, and create some exploratory visualizations from a dataset.
Main Task
In this exercise, I will use SAP Lumira to manipulate and visualize a population dataset using different kinds of charts and graphs. For the data manipulation, I will create a few calculated fields and a hierarchy, and filters will be applied. The location hierarchy will be created for spatial representation of the data.
1.2. Dataset
A Canadian population dataset will be analyzed using SAP Lumira to gain insight into demography. The dataset has 16 attributes, including year, geographical location (province), age, sex, population, etc. Some of the attributes will be derived from age using the calculation function to get the age group-wise population. Some of the questions that will be answered through the analysis are as follows.
14 Years, 15-64 Years, and 65 Years and Older) and population were then used as
the measures, and the ‘Line Chart’ was generated to show the trend.
• Filter: The data was filtered using ‘Age Data Category = Age (Years)’, ‘Geography
= Canada’ and ‘Sex = Both sexes’.
• Findings:
- The trend shows that the total population has been increasing.
- Both the ‘15-64 Years’ and ‘65 Years and Older’ groups have been increasing, although the ‘0-14 Years’ group has not increased.
- Interestingly, after 2015, the number of aged people (65 Years and Older) has kept increasing relative to the ‘0-14 Years’ group.
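The derived age-group dimensions and the filters described above can be sketched in plain Python. This is only an illustration of the idea, not the SAP Lumira calculation itself: the field names (`geography`, `sex`, `age`, `population`), the sample rows, and the helpers `age_group` and `population_by_group` are assumptions made for this example.

```python
from collections import defaultdict


def age_group(age: int) -> str:
    """Map an age in years to one of the three groups used in the analysis."""
    if age <= 14:
        return "0-14 Years"
    if age <= 64:
        return "15-64 Years"
    return "65 Years and Older"


# Hypothetical rows for illustration only; the real dataset has 16 attributes.
rows = [
    {"year": 2019, "geography": "Canada", "sex": "Both sexes", "age": 10, "population": 400},
    {"year": 2019, "geography": "Canada", "sex": "Both sexes", "age": 40, "population": 500},
    {"year": 2019, "geography": "Canada", "sex": "Both sexes", "age": 70, "population": 300},
    {"year": 2019, "geography": "Ontario", "sex": "Males", "age": 70, "population": 100},
]


def population_by_group(rows, geography="Canada", sex="Both sexes"):
    """Apply the Geography and Sex filters, then sum population per (year, age group)."""
    totals = defaultdict(int)
    for row in rows:
        if row["geography"] == geography and row["sex"] == sex:
            totals[(row["year"], age_group(row["age"]))] += row["population"]
    return dict(totals)


trend = population_by_group(rows)
```

Plotting the values of `trend` per year for each group would give the same kind of line chart as the one used for this question.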
1.3.2. Which are the top provinces with the highest number of people in the 65 Years and
Older group in 2015-2019?
• Pre-processing: To determine the age group-wise proportion of the population,
three newly derived dimensions (0-14 Years, 15-64 Years, and 65 Years and Older)
were used as the measures, and a ‘Stacked Column Chart’ was created.
• Filter: The data was filtered using ‘Geography = All Provinces and Territories’ and
‘Ref_Date = 2015-2019’.
• Rank: Measure (65 Years and Older), Value (Top 5), Apply on (Geography).
• Findings: Ontario has the highest population in the ‘65 Years and Older’ age
category, which is followed by Quebec, British Columbia, Alberta, and Manitoba.
Fig 1.2: Top 5 provinces with People Age 65 Years and Older (2015-2019)
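The ‘Rank’ step (Top 5 on a measure, applied on Geography) amounts to sorting the provinces by the measure and keeping the first five. A minimal sketch follows; the helper `top_n` and the population figures are hypothetical placeholders chosen only so that the ordering matches the finding above, and they are not values from the dataset.

```python
def top_n(totals: dict, n: int = 5) -> list:
    """Return the n keys with the largest values, largest first."""
    return sorted(totals, key=totals.get, reverse=True)[:n]


# Placeholder values (arbitrary units), NOT taken from the Canadian dataset.
seniors_by_province = {
    "Alberta": 20,
    "British Columbia": 30,
    "Manitoba": 8,
    "Nova Scotia": 7,
    "Ontario": 70,
    "Quebec": 50,
    "Saskatchewan": 6,
}

top_5 = top_n(seniors_by_province, n=5)
```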
1.3.4. What is the gender-wise population trend? What is the impact of gender on the
population aged 65 Years and Older?
• Pre-processing: The gender-wise population trend from 1990 to 2019 is illustrated using a bar chart on the primary axis of the graph. This trend is then compared with people aged ‘65 Years and Older’ using a line chart on the secondary axis.
• Filter: The data was filtered using ‘Sex = Females and Males’ and ‘Ref_Date =
>1989’.
• Findings: The bar chart in the primary axis shows a continuously increasing trend of the total population. The number of females is marginally higher than the number of males. Among seniors, females have always outnumbered males. Interestingly, there is a slight jump in the increasing trend of females in 2001, which widens the difference between males and females.
Fig 1.4: Gender-wise population trend with gender impact on the elderly group
1.3.5. Which province has the highest number of populations in 2019? Which gender
dominates each of the provinces?
• Pre-processing: The map layout is used to illustrate the geographical locations.
Two layers were created, one of which shows different provinces of Canada, and
another one shows the bubble of the dominant gender according to the
population size. The bigger bubble size indicates the higher population.
• Filter: The data was filtered using ‘Sex = Females and Males’ and ‘Ref_Date = 2019’.
• Findings: According to the bubble size, Ontario is the biggest province in terms of
the population where ‘Male’ is the dominant gender. It is closely followed by ‘Male’
dominant Quebec. ‘Male’ is also the dominant gender across the Maritime
provinces.
Fig 1.5: Province-wide population and dominant gender in every province
• Application stability: SAP Lumira sometimes crashes when working with a larger number of data components, so it is not fully stable. It is better to save the work regularly.
• Limited export option: The storyboards can only be exported as PDF, and
sometimes it limits the size of a chart or graph.
In comparison to other visualization software, SAP Lumira lags behind in some respects; for instance, it has limitations in exporting reports compared to SAP Analytics Cloud, Tableau, and PowerBI. Additionally, users cannot put comments or annotations on the working files, which is possible in other tools (Self-Service BI tools, 2020). Besides, it is good for visualization, but unlike SAP Analytics Cloud, it does not support predictive analysis or other advanced features. In summary, SAP Lumira is a decent option for data visualization. However, to go deeper into business analysis, analysts may need to switch to other available BI tools.
Chapter 2
SAP BEx Query Designer and SAP BusinessObjects Analysis
2.1. Introduction
SAP Business Explorer (BEx) Query Designer is a tool that helps to create queries and retrieve data based on those queries from SAP Business Warehouse (BW). The data can be analyzed by defining queries for InfoProviders (SAP Help Portal, 2021). It helps in data modelling for online analytical processing (OLAP) systems. Data modelling is especially helpful in data warehousing and in creating data cubes.
SAP BusinessObjects Analysis facilitates multidimensional OLAP analysis. Using the MS Excel edition, the data can be efficiently analyzed and visualized using different charts. It also helps to detect specific trends and outliers.
Outline
SAP BEx Query Designer will be utilized as a bridge between SAP BW and the front-end reporting tool. It will help to define queries and extract the right data for analysis. SAP BusinessObjects Analysis is useful for multi-dimensional analysis of OLAP sources.
Main Task
In this exercise, I am using SAP BEx Query Designer to create queries and then analyzing the data in SAP BusinessObjects Analysis for MS Excel.
2.2. Dataset
The dataset is collected from an Infocube in SAP NetWeaver BW. Its source was provided during the class tutorials. The data contains Global Bike Company’s business transactions, including information on time, location, customer, division, product, cost of goods, net sales, discount, revenue, etc.
The data from GBI Infocube was explored, and the query was created using the SAP
Business Explorer (BEx) Query Designer. The process started with modelling the data, and
then the key figures and dimensions were explored. The query was created using the
drag-and-drop feature in BEx Query Designer. Primarily in this query, material, sales
organization, and customer were selected as dimensions, and cost of goods, revenue,
discount, net sales, and sales quantity were selected as measures. The designed query was
then used for further analysis through SAP BusinessObjects Analysis (Edition: MS Office
Excel). The analyses helped to answer the following research questions.
• Findings:
- Overall, the revenue decreased from 2007 to 2011, and it had the lowest
revenue in 2009.
- After the introduction of E-Bikes in 2010, the revenue gained some positive
momentum.
- Touring Bikes is the highest revenue earning product category, followed by
Offroad Bikes and Road Bikes. Accessories and Trend Bikes did not show a
significant change in their revenue over this time frame.
2.3.2. How does the discount impact the overall revenue? Does this trend vary in different countries?
• Pre-processing: Two key measures (revenue and discount) and two dimensions
(calendar year and country) were selected. First, a line chart was plotted to
determine the trend of total revenue versus discount, and then another combo
chart was created to compare the same trend in terms of country. In both charts,
revenue and discount were assigned in the primary and secondary axis,
respectively.
• Filter: The data was filtered using ‘Measures = Revenue and Discount’.
• Findings:
- The total revenue versus discount (Fig 2.2) chart illustrates a similar year-
wise trend. With the decrease in discount (from 2007 to 2009), total revenue
also decreased. Then, the revenue again increased in 2010 with the increase
in discount. The launching of E-Bikes (Fig 2.1) added extra revenue in total
business in 2010 and 2011.
Fig 2.2: Total revenue vs discount trend (2007-2011)
- The country-wise analysis (Fig 2.3) shows that the United States had higher
discounts every year than Germany in terms of revenue-discount pattern.
2.3.3. What are the top 3 products in each country that got the highest total discount of all
time?
• Pre-processing: Top 3 products with the highest total discount in each country
were filtered based on creating a rule. Then, the advanced calculation option was
utilized to create a new measure called ‘Discount-Revenue Ratio’. It represents
the amount of discount in terms of the revenue earned by each product, and it is
represented as %. This measure is used in the secondary axis by creating a line
chart.
• Filter: The data was filtered using the rule ‘Measures = Discount (Top N = 3)’ for
each country.
• Findings:
- Among the top 3 products with the highest discount, Professional Road Bike
(Shimano) and Professional Touring Bike (Silver) are common in both
countries. Another one is Men’s Off Road Bike Fully in Germany and Deluxe
Touring Bike in the United States.
- Professional Touring Bike (Silver) in the United States has the highest
Discount-Revenue Ratio (Fig 2.4) among all the top 3 products in both
countries.
- Overall, the total Discount-Revenue Ratio is higher in the United States than
in Germany, which concurs with the previous finding in Fig 2.3.
Fig 2.4: Country-wise top 3 products with the highest total discount
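The ‘Top N = 3’ rule per country and the calculated ‘Discount-Revenue Ratio’ measure can be sketched as follows. This is a plain-Python illustration rather than the SAP BusinessObjects Analysis feature itself: the records use generic product names and invented amounts, and the helper `top_discounted` is hypothetical.

```python
from collections import defaultdict

# Hypothetical (country, product, revenue, discount) records, not the GBI data.
records = [
    ("DE", "Product A", 1000.0, 40.0),
    ("DE", "Product B", 800.0, 30.0),
    ("DE", "Product C", 600.0, 25.0),
    ("DE", "Product D", 500.0, 10.0),
    ("US", "Product A", 1200.0, 50.0),
    ("US", "Product B", 700.0, 35.0),
    ("US", "Product C", 650.0, 20.0),
    ("US", "Product D", 400.0, 28.0),
]


def top_discounted(records, n=3):
    """Per country, keep the n products with the highest total discount.

    Each kept item is (product, discount, discount as % of revenue).
    """
    by_country = defaultdict(list)
    for country, product, revenue, discount in records:
        ratio = round(100.0 * discount / revenue, 2)
        by_country[country].append((product, discount, ratio))
    return {
        country: sorted(items, key=lambda item: item[1], reverse=True)[:n]
        for country, items in by_country.items()
    }


result = top_discounted(records)
```

Note that the ranking uses the absolute discount, while the ratio is only reported alongside it, so a product with a high ratio but a small discount (such as the hypothetical ‘Product D’ in DE) can still fall outside the top 3.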
2.3.4. Which division in each country has got the highest discount? What are the top three
customers in the division (found from the previous question) in terms of discounts for each
calendar year?
• Pre-processing: A cross tab was generated using the country and division in terms of discount amount. Then, the top three customers were filtered by creating a rule. The output was presented using a bar chart.
• Filter: The data was filtered using the rule ‘Measures = Discount (Top N = 3)’ for customers in the BI division.
Country  Division  Discount $
US       AS        71,562.82
Fig 2.5: Country-wise highest discount in each division
• Findings:
- In both countries, the highest discount is allocated to the BI division. The BI division’s total discount in the US and Germany is 8.46 and 9.13 million, respectively (Fig 2.5).
- From 2007 to 2010, the top 3 customers that got the highest discount in the BI division were Bavaria Bikes, Silicon Valley Bikes, and Beantown Bikes. However, in 2011, Rädlelland replaced Silicon Valley Bikes (Fig 2.6).
Fig 2.6: Top 3 customer with the highest discount in the BI division (2007-2011)
2.3.5. In each division, what are the three bottom-performing products in terms of revenue
and what are their discount amount? Did they get a good discount to perform well?
• Pre-processing: The three bottom-performing products were identified in terms of
revenue by creating a cross tab, and also their associated discount was analyzed.
These measures further helped to identify if a good discount was given on those
products. The revenue-discount ratio was calculated for these products (Fig 2.8)
using the revenue and discount measures.
• Filter: The data was filtered using […] higher, values between 3.0 and 3.11 were average, and values < 3.0 were marked as lower-discounted products.
Fig 2.7: Bottom 3 performing products with Revenue-Discount Ratio
• Findings:
- Repair Kit, T-shirt, and Water Bottle are the bottom three performing
products in AS division. Simultaneously, City Bike Max, E-Bike Tailwind, and
Fixed Gear Bike Plus performed poorly in the BI division (Fig 2.7).
- The discount ratio for all products is 3.11, and the previously identified
bottom-performing products were evaluated considering it as standard. As
Fig 2.8 shows, City Bike Max and Fixed Gear Bike Plus had a higher discount,
but these products did not perform well. In contrast, E-Bike Tailwind had a
discount less than the overall standard discount. If more discounts were
given, it might have performed better. Products in the AS division had
discounts around the standard level but could not perform well.
Division  Material               Revenue $        Discount $      Revenue-Discount Ratio
AS        Repair Kit             268,634.24       8,321.92        3.10
AS        T-shirt                252,889.85       7,717.49        3.05
AS        Water Bottle           166,629.58       5,117.12        3.07
AS        Result                 4,680,229.91     142,933.86      3.05
BI        City Bike Max          1,175,840.87     40,224.48       3.42
BI        E-Bike Tailwind        8,046,390.46     215,103.70      2.67
BI        Fixed Gear Bike Plus   187,446.68       7,162.41        3.82
BI        Result                 565,821,518.67   17,596,520.10   3.11
Overall Result                   570,501,748.58   17,739,453.97   3.11
Fig 2.8: Discount status indication of the bottom three performing products
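The ratios in the table can be reproduced by expressing the discount as a percentage of the revenue, and the status bands can then be applied on top. The sketch below uses the revenue and discount figures from the table; the function names and the reading that ‘higher’ means above the overall ratio of 3.11 are my assumptions, since the tool's own rule definition is not shown here.

```python
# (division, material, revenue $, discount $) copied from the Fig 2.8 table.
rows = [
    ("AS", "Repair Kit", 268_634.24, 8_321.92),
    ("AS", "T-shirt", 252_889.85, 7_717.49),
    ("AS", "Water Bottle", 166_629.58, 5_117.12),
    ("BI", "City Bike Max", 1_175_840.87, 40_224.48),
    ("BI", "E-Bike Tailwind", 8_046_390.46, 215_103.70),
    ("BI", "Fixed Gear Bike Plus", 187_446.68, 7_162.41),
]

STANDARD = 3.11  # overall ratio reported in the table
FLOOR = 3.0      # lower bound of the 'average' band


def discount_ratio(revenue: float, discount: float) -> float:
    """Discount as a percentage of revenue, rounded to two decimals."""
    return round(100.0 * discount / revenue, 2)


def status(ratio: float) -> str:
    """Band a ratio; 'higher' = above STANDARD is an assumed reading."""
    if ratio > STANDARD:
        return "higher"
    if ratio >= FLOOR:
        return "average"
    return "lower"


report = {}
for _division, material, revenue, discount in rows:
    ratio = discount_ratio(revenue, discount)
    report[material] = (ratio, status(ratio))

overall = discount_ratio(570_501_748.58, 17_739_453.97)
```

Under these assumptions, City Bike Max and Fixed Gear Bike Plus fall in the ‘higher’ band, E-Bike Tailwind in the ‘lower’ band, and the three AS products in the ‘average’ band, which agrees with the findings above.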
- It cannot generate an interactive dashboard, which is one of the weakest
features. This can be easily done in SAP Analytics Cloud, Tableau, or
PowerBI.
- The map feature is not that strong. It only shows the geographical location
(e.g., country), but putting an additional layer (or chart) on that is not
possible.
- It does not have the option to create custom colours on the chart. Only a
few colourful and monochromatic options are available.
- SAP BusinessObjects Analysis is a good option for BI reporting, but for data
discovery and visualization, Tableau or PowerBI could be a better option. So
it would be better to use a combination of tools in business analysis rather
than choosing a particular one.
Chapter 3
SAP Predictive Analytics for Visualization
3.1. Introduction
SAP Predictive Analytics is a business intelligence software that enables users to process, analyze, and visualize larger datasets to make effective business decisions. As the name indicates, it is also a powerful tool for predictive modelling and forecasting future events. Starting from the initial data manipulation and data modelling, it can help to conduct statistical analysis and data mining tasks (O'Donnell, 2016). Some of the prominent features are:
- It is a very intuitive, user-friendly (drag-and-drop), and code-free tool with the option to add R scripts.
- Data from a wide range of sources, including SAP HANA Server, SAP BW, SQL databases, and simple spreadsheets, can be easily imported.
- The interface is very organized and user-friendly. Major functionalities are categorized under the Prepare, Predict, Visualize, Compose, and Share tabs, making navigation easier.
Outline
SAP Predictive Analytics provides a data solution from the very basic stage of data acquisition to building a prediction model based on historical and current data. Due to the limited scope of this exercise, only the exploratory analysis is conducted. However, in future exercises, the advanced methodologies, including prediction modeling, will be utilized.
Main Task
In this exercise, I am using slicing and dicing techniques to filter data from a larger dataset and bring more granularity to the data. Some of the techniques used include sorting, filtering, ranking, and aggregations.
3.2. Dataset
For this analysis, the sales data of the wholesale division of Global Bike Company is used, which was provided during the class tutorial. The dataset contains 171010 tuples having 23 dimensions, of which 10 are used as measures. Some of the measures that are used in the analysis include Cost (USD), Discount (USD), Gross margin (USD), Revenue (USD), and Sales quantity. A few calculated variables will also be created using these measures.
Fig 3.1: Top 3 Revenue earning Products
• Findings:
- The top 3 revenue-earning products were Professional Touring Bike Silver,
Deluxe Touring Bike Silver, and the Road Bike Carbon Shimano (Fig 3.1).
- Revenues for the products were 218.24, 207.05, and 202.39, respectively.
- The top 3 products made the highest revenue in 2019, followed by 2017
and 2018, respectively (Fig 3.2).
- Further dicing of the data showed that Professional Touring Bike Silver
earned the highest revenue among the top 3 products in 2019. It has been
consistent in all the top 4 revenue earning years.
3.3.2. What was the associated gross margin ratio of those products? In which country is the
ratio higher?
• Pre-processing: A new measure, ‘Gross Margin Ratio’, is created using three pre-
existing measures ‘GrossMarginUSD’, ‘RevenueUSD’, and ‘DiscountUSD’. A
crosstab is generated using the ‘Gross Margin Ratio’ and ‘ProductDecsr’.
• Filter: The data is filtered using ‘ProductDecsr = Deluxe Touring Bike-Silver,
Professional Touring Bike-Silver, and Road Bike Carbon Shimano’.
• Findings:
- The gross margin ratios of Deluxe Touring Bike-Silver, Professional Touring Bike-Silver, and Road Bike Carbon Shimano were 40.97%, 41.09%, and 43.65%, respectively. Interestingly, the gross margin ratio of Road Bike Carbon Shimano was the highest (43.65%), although its revenue was the lowest among the top 3 products (Fig 3.4).
- In terms of the country, Germany had a higher gross margin ratio than the
United States. Also, in both countries, Road Bike Carbon Shimano had the
highest gross margin ratio.
Fig 3.4: Product- and Country-wise Gross Margin Ratio (top 3 products)
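The exact formula behind the ‘Gross Margin Ratio’ measure is not spelled out above; it is only stated that it combines ‘GrossMarginUSD’, ‘RevenueUSD’, and ‘DiscountUSD’. One plausible reading, shown here purely as an assumption, is gross margin divided by net sales (revenue minus discount), expressed as a percentage. The function name and the sample figures are hypothetical.

```python
def gross_margin_ratio(gross_margin_usd: float, revenue_usd: float, discount_usd: float) -> float:
    """Gross margin as a percentage of net sales.

    ASSUMPTION: net sales = revenue - discount. The source names the three
    input measures but does not give the formula, so treat this as a sketch.
    """
    net_sales = revenue_usd - discount_usd
    if net_sales == 0:
        raise ValueError("net sales is zero; the ratio is undefined")
    return round(100.0 * gross_margin_usd / net_sales, 2)


# Hypothetical figures, chosen only to make the arithmetic easy to follow.
example = gross_margin_ratio(gross_margin_usd=400.0, revenue_usd=1_030.0, discount_usd=30.0)
```

If the measure was actually defined differently (for example, gross margin over revenue with the discount handled elsewhere), only the `net_sales` line would need to change.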
3.3.3. Which division had a higher gross margin ratio in that country, and what are the top
5 customers with the highest gross margin in that division?
• Pre-processing: The previously created ‘Gross Margin Ratio’ measure is used to
determine the higher value division. A pie chart is created to visualize the
comparison between the two divisions.
• Filter: The data was filtered using ‘Country = DE’, as in the previous analysis, we
found Germany was the country with the higher gross margin ratio.
• Findings:
- Compared to the BI (44.47%) division, AS (55.53%) division had a higher
gross margin ratio in Germany. Surprisingly, this trend was similar in both
the countries.
- The top 5 customers in the division were Velodrom, Ostseerad, Fahrpott,
Drahtesel, and Rädlelland (Fig 3.5).
Fig 3.6: Net sales of the top 5 customers
• The chart formatting options are harder to find in SAP Predictive Analytics than in SAP Analytics Cloud and Tableau.
• There is no sorting option when the Trellis chart is used.
• SAP Predictive Analytics has more limited colour variation and palettes than Tableau.
• Data label formatting is not user-friendly.
• It is not an ideal tool to construct business dashboards.
However, considering its other advanced features, including working with big data, data mining techniques, and predictive model building, it should be a must-have tool in the business world.
Chapter 4
SAP Analytics Cloud
4.1. Introduction
SAP Analytics Cloud is a platform that allows users to conduct business analysis, augmented and predictive analysis, and planning in a common cloud environment (SAP Analytics Cloud, n.d.). This tool can be used across different devices because it is cloud-based. One of the most powerful functionalities in this tool is the ‘Smart Discovery’ option. This AI-driven approach helps analysts and executives to make appropriate data-driven decisions for future business planning. The key capabilities of this tool are:
- It is easy to find the ‘Key Influencers’ by using the Smart Discovery option.
- The ‘Simulation’ function provides the opportunity to find how a particular influencer impacts the outcome and to do better future planning.
Outline
This analysis will cover many of the functionalities of SAP Analytics Cloud.
Main Task
In this exercise, I am going to explore the ERP simulation game data and conduct some analysis and data visualization. I will follow a particular pathway, which is thematically presented below.
[Pathway diagram: Revenue, Product, Quantity Sold]
4.3.1. What were the top 5 revenue-earning products? Was there any relationship between
revenue and quantity sold?
• Pre-processing: From the dataset (ERPSIM.xlsx), the ‘Revenue’ is selected as a
measure and the ‘Product’ as the dimension. ‘Product’ is also used in the Color
parameter. The findings are sorted highest to lowest by revenue.
• Rank: The products are ranked by selecting the ‘Top 5’ option in the rank.
• Findings:
- The top 5 revenue-earning products were 500g Nut Muesli, 1kg Original Muesli, 1kg Nut Muesli, 500g Raisin Muesli, and 500g Blueberry Muesli (Fig 4.1). Their revenues were 24.45, 22.86, 22.42, 18.38, and 15.94, respectively.
- There was a linear relationship between the revenue and the quantity sold (Fig 4.2). Revenue increased as the quantity increased, which implies that the average unit prices of the top 5 revenue-earning products were similar. Further analysis of the average unit price indicated that the highest average unit price was 4.64 for 500g Blueberry Muesli and the lowest was 4.04 for 1kg Original Muesli.
4.3.2. Which team had the highest market share of the highest selling product? What was
the revenue in different distribution channels for that product sold by that team?
• Pre-processing: The market share is represented using a pie chart. The revenue is
used as the measure, and the colour indicates the team.
• Filter: The data is filtered using ‘Product = 500g Nut Muesli’, as it was the highest-
selling product, and then ‘Team = NN’, as the highest-selling team.
Fig 4.2: Relationship between Revenue and Quantity Sold
• Findings:
- Team NN had the highest market share (21.63%) of the 500g Nut Muesli
(Fig 4.3), which was 5.29 million in value.
- Team NN earned the highest revenue from the Convenience Stores (3.83
million) distribution channel and earned 1.46 million from Grocery Chains
(Fig 4.3). However, they did not have any sales in Hypermarkets.
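As a quick consistency check, the market share and channel figures above can be recomputed with a few lines of Python. The numbers are the rounded values reported in the findings (in millions), so small differences from the charted values are expected; the variable names are mine.

```python
# Figures reported above for Team NN and 500g Nut Muesli (millions).
nn_revenue = 5.29
nn_share_percent = 21.63

# Total product revenue implied by the share; rounding in the inputs
# means this only approximates the total shown in the chart.
implied_total = round(nn_revenue / (nn_share_percent / 100.0), 2)

channels = {"Convenience Stores": 3.83, "Grocery Chains": 1.46, "Hypermarkets": 0.0}
channel_total = round(sum(channels.values()), 2)

# Share of Team NN's revenue for this product contributed by each channel.
channel_share = {
    name: round(100.0 * value / channel_total, 1) for name, value in channels.items()
}
```

The two channel revenues add up to Team NN's 5.29 million, and the implied product total comes out close to the 24.45 reported for 500g Nut Muesli in section 4.3.1.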
4.3.3. What were the revenues of different areas in each distribution channel? Which area
had the highest revenue?
• Pre-processing: In this step, further dicing of the previously found results is done.
Here the revenues are analyzed in each area for the previously found distribution
channels. A Marimekko chart is created using the ‘Area’ in colour, ‘Distribution
Channel’ in the dimensions, and ‘Revenue’ in the measures.
• Filter: The data is filtered using ‘Product = 500g Nut Muesli’, as it was the highest-
selling product, and then ‘Team = NN’, as the highest-selling team.
• Findings:
- The revenues of the North, South, and West areas in the Convenience Stores channel were 1.73, 1.05, and 1.05 million, respectively. In the Grocery Chains channel, they were 0.37, 0.48, and 0.60 million, respectively. Interestingly, North was the highest revenue-earning area in the Convenience Stores, whereas it was the lowest-earning area in the Grocery Chains (1.73 vs 0.37 million).
- The North area in the Convenience Stores distribution channel had the highest revenue (1.73 million) (Fig 4.4). It implies that convenience stores in the North area were the best place to sell 500g Nut Muesli (average unit price 4.17), so they could also be a good place to sell other lower-priced products.
Fig 4.4: Revenue per Area and Distribution Channel (highest revenue product & team)
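The dicing step behind the Marimekko chart is a two-way breakdown of revenue by distribution channel and area. The sketch below rebuilds that breakdown from the rounded figures quoted in the findings (in millions) and looks up the best and worst cells; it is an illustration in plain Python, not the SAP Analytics Cloud chart logic, and the variable names are my own.

```python
from collections import defaultdict

# Revenue (millions) for Team NN, 500g Nut Muesli, as quoted in the findings.
revenue = {
    ("Convenience Stores", "North"): 1.73,
    ("Convenience Stores", "South"): 1.05,
    ("Convenience Stores", "West"): 1.05,
    ("Grocery Chains", "North"): 0.37,
    ("Grocery Chains", "South"): 0.48,
    ("Grocery Chains", "West"): 0.60,
}

# Cell with the highest revenue across both channels.
best_cell = max(revenue, key=revenue.get)

# Lowest-earning area within the Grocery Chains channel only.
grocery = {area: v for (channel, area), v in revenue.items() if channel == "Grocery Chains"}
weakest_grocery_area = min(grocery, key=grocery.get)

# Marginal totals per channel and per area.
channel_totals = defaultdict(float)
area_totals = defaultdict(float)
for (channel, area), value in revenue.items():
    channel_totals[channel] += value
    area_totals[area] += value
channel_totals = {k: round(v, 2) for k, v in channel_totals.items()}
area_totals = {k: round(v, 2) for k, v in area_totals.items()}
```

The Grocery Chains total from these rounded cells is 1.45, slightly below the 1.46 million reported in section 4.3.2, which is consistent with rounding of the individual values.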
4.3.4. In which round that area (from the previous question) had the highest revenue? What
was the highest revenue earning day for a particular product in that round? What were the
top 2 key influencers for that sale?
• Pre-processing: Here, the objective is to determine which was the best performing
round for the North area. The round and revenue are used in the dimension and
measures field, respectively. A bar chart is created, and a colour palette is used for
better visualization.
In the second part, the highest revenue earning day is found for that particular
round. A heatmap is generated to highlight both the day and the associated
product that earned the highest revenue.
• Filter: Previously used filters remain the same, and a new filter, ‘Area = North’, is
added.
Fig 4.5: Round-wise Revenue Trend (highest revenue area, distribution channel & team)
• Findings:
- Interestingly, the North area earned the highest revenue in rounds 7 and 8
(Fig 4.5). In both these rounds, the revenue was 0.92 million.
- Also, in these rounds, the highest combined revenue earned by a particular product was on day 29 (Fig 4.6). Cumulatively, on day 29 of rounds 7 and 8, 500g Nut Muesli earned 93 thousand.
Fig 4.6: Highest Revenue earning Day for any Products (during top-earning Rounds)
- The top 3 key influencers are Quantity, Price, and Distribution Channel (Fig 4.7). Quantity has a strongly positive impact on the revenue, whereas Price and Distribution Channel have a weakly positive impact. The revenue can further increase by changing the distribution channel from Convenience Stores to Grocery Chains.
Fig 4.7: Top 3 key influencers on the Revenue of 500g Nut Muesli
Every business user needs to adopt multiple tools to complete a business analysis task. SAP Analytics Cloud is a complete package that enables analysts to perform basic exploration, prediction, and presentation of data in a single platform (Anto, 2020). Also, as the entire tool works in the cloud, there is no need to worry about the local computer’s working capacity. Considering its powerful business analysis capability, it should be a must-have tool for organizations.
Chapter 5
Tableau
5.1. Introduction
Tableau is one of the leading data visualization tools in the business intelligence field. Due to its easy-to-understand, deployable, and highly scalable features, it has been recognized as a Leader in the Gartner Magic Quadrant for many years (Ajenstat, 2021). This tool can be deployed in the cloud or on-premise and can be used with data from different sources, including file-based, web, cloud-based, relational database, and OLAP data sources. In addition to faster analysis and visualization, Tableau can help with data collaboration, real-time data analysis, analysis of big data volumes, and add-on scripting with other statistical languages (e.g., R or Python). Below are some features that can justify why Tableau should be the tool of choice.
- Quick and interactive visualization: The simple drag and drop functionality and toggle between rows and
Outline
This analysis will mainly cover data exploration and visualization. No data manipulation or predictive modeling will be conducted.
Main Task
In this exercise, the ERP simulation game data is re-used for further in-depth analysis and visualization. The pathway is changed from the previous one (page 9) and is illustrated below.
[Pathway diagram: Price]
5.2. Dataset
In the previous analysis with SAP Analytics Cloud, I used the ERPSIM dataset and
conducted some analytical tasks following a particular pathway (page 9). I felt that more
analysis could be done to gain more insights and generate business acumen by analyzing
that dataset from a different perspective. Hence, I decided to use the ERPSIM dataset
again in this chapter, but in a different pathway (illustrated above).
In the last part of the exercise, another dataset from the World Bank on CO2 emission is
used. This dataset contains four dimensions- country name, region, year, and CO2
emission per capita, and it has a total of 11127 tuples.
• Filter: The top 4 products found in the first analysis are then filtered to see how the average price and quantity sold change.
• Findings:
- The top 4 products with the highest quantity sold were 500g Nut Muesli,
1kg Original Muesli, 1kg Nut Muesli, and 500g Raisin Muesli (Fig 5.1), and
their average prices were 4.11, 4.04, 4.22, and 4.19, respectively.
- The maximum, minimum, and average price of all 12 products were 5.08
(1kg Strawberry Muesli), 3.91 (500g Original Muesli), and 4.47. The average
quantity sold of all the products was 3.75 million (Fig 5.1).
- However, for the top 4 products, the average quantity sold increased to 5.66 million and the average price dropped to 4.14. The analysis further revealed that when the product price was less than 4.22, the quantity sold was much higher. The difference in average price between 500g Nut Muesli (4.11) and 1kg Nut Muesli (4.22) was only 0.11 (Fig 5.2).
Fig 5.2: Product-wise Quantity & Avg. Price Product-wise Quantity & Avg. Price (Top 4)
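The average price of the top 4 products and the price gap quoted above can be checked directly from the reported per-product averages. The snippet below is a small sketch using only those four values; it takes a simple (unweighted) mean, which is an assumption on my part about how the 4.14 figure was obtained.

```python
from statistics import mean

# Average prices of the top 4 products, as reported in the findings.
top4_avg_price = {
    "500g Nut Muesli": 4.11,
    "1kg Original Muesli": 4.04,
    "1kg Nut Muesli": 4.22,
    "500g Raisin Muesli": 4.19,
}

# Simple mean of the four product averages (not weighted by quantity).
avg_price_top4 = round(mean(list(top4_avg_price.values())), 2)

# Price gap between the two Nut Muesli pack sizes.
nut_muesli_gap = round(
    top4_avg_price["1kg Nut Muesli"] - top4_avg_price["500g Nut Muesli"], 2
)
```

The simple mean matches the 4.14 reported for the top 4 products, and the gap matches the 0.11 difference between the two Nut Muesli pack sizes.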
5.3.2. What is the top product among the previously found products? Which area earned the
highest revenue from the top products?
• Pre-processing: ‘Area’ is used as the dimension, and ‘Revenue’ and ‘Quantity’ are used as measures. A stacked column chart is created to illustrate the revenue earned and quantity sold (in %) for the top 4 products. It is further broken down by ‘Area’. The findings are sorted by revenue (descending).
• Filter: The data is filtered using ‘Product = 1kg Nut Muesli, 1kg Original Muesli,
500g Nut Muesli, and 500g Raisin Muesli’.
• Findings:
- 500g Nut Muesli is the top-selling product in terms of revenue (27.75%) and
quantity (27.46%).
- The South area earned the highest total revenue from the top 4 products.
Interestingly, 1kg Original Muesli and 1kg Nut Muesli earned more revenue
than 500g Nut Muesli in this area. In contrast, 500g Nut Muesli earned the
most revenue in the North area (Fig 5.3).
5.3.3. Which is the highest revenue earning distribution channel in that area for all the top
products? What is the highest amount of revenue earned from one product in any
distribution channel?
• Pre-processing: The finding from the previous analysis is diced further in this step.
The highest revenue earning distribution channel (in terms of revenue %) is analyzed
and presented using a pie chart.
• Filter: ‘Product = 1kg Nut Muesli, 1kg Original Muesli, 500g Nut Muesli, and 500g
Raisin Muesli’ and ‘Area = South’.
Fig 5.3: Area-wise Revenue & Quantity (%) of Top 4 Products
• Findings:
- The highest revenue earning distribution channel was Grocery Chains
(36.89%), followed by Hypermarkets (36.14%) and Convenience Stores
(26.97%).
Fig 5.4: Revenue Market Share (%) of Distribution Channels of Top 4 Products in South area
- In round 7, 1kg Original Muesli earned the highest revenue from the
Hypermarkets distribution channel. Hypermarkets did not sell any 500g
category products, whereas Convenience Stores did not sell 1kg category
products (Fig 5.5).
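The channel shares shown in the pie chart are each channel's revenue divided by the total revenue of the filtered data. A minimal sketch of that calculation follows; the revenue amounts in `example` are hypothetical placeholders (the underlying ERPSIM totals are not given here), while `reported_shares` holds the percentages quoted in the findings.

```python
def revenue_share(revenue_by_channel):
    """Return each channel's share of total revenue, in percent."""
    total = sum(revenue_by_channel.values())
    return {ch: 100.0 * rev / total for ch, rev in revenue_by_channel.items()}

# Hypothetical revenue totals, for illustration only (not from the dataset).
example = {
    "Grocery Chains": 400.0,
    "Hypermarkets": 350.0,
    "Convenience Stores": 250.0,
}
shares = revenue_share(example)
for channel, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{channel}: {pct:.2f}%")

# Sanity check on the percentages reported in the findings: they should
# add up to 100% if the three channels cover all revenue in the South area.
reported_shares = [36.89, 36.14, 26.97]
print(f"Sum of reported shares: {sum(reported_shares):.2f}%")
```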
Fig 5.5: Distribution Channel & Round-wise Revenue of Top 4 Products
5.3.4. What are the top CO2 emitting countries from 1991-2011?
• Pre-processing: The average CO2 per capita is considered as the measure to find
the highest CO2 emitting countries. A geographic map is created using Longitude
and Latitude.
• Filter: Two filters are incorporated – ‘Year’ = 1991-2011, and ‘Country Name’ is
filtered by the field where Avg. CO2 per capita >= 18.
Fig 5.6: Geographic representation of top CO2 (per capita) emitting countries
• Findings:
- The top CO2 (per capita) emitting countries were - Qatar, Kuwait, United
Arab Emirates, Aruba, Bahrain, Luxembourg, United States, and Brunei (Fig
5.6).
- Interestingly, the highest CO2 emitting cluster was in the Middle East region
(red marked), where 4 of the 8 countries (Qatar, Kuwait, United Arab Emirates,
and Bahrain) were located.
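The two filters used in this step (a year range and a threshold on the average CO2 per capita) amount to a group-by followed by a conditional filter. The sketch below reproduces that logic in plain Python; the function name, the tuple layout, and the sample rows are assumptions made for illustration and are not taken from the World Bank dataset.

```python
from collections import defaultdict

def top_emitters(rows, start_year, end_year, threshold):
    """Average CO2 per capita per country within a year range,
    keeping only countries whose average is >= threshold.

    rows: iterable of (country, year, co2_per_capita) tuples.
    Returns a list of (country, average) sorted by average, descending.
    """
    values = defaultdict(list)
    for country, year, co2 in rows:
        if start_year <= year <= end_year and co2 is not None:
            values[country].append(co2)
    averages = {c: sum(v) / len(v) for c, v in values.items()}
    selected = {c: avg for c, avg in averages.items() if avg >= threshold}
    return sorted(selected.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical rows, for illustration only.
rows = [
    ("Country A", 1990, 50.0),  # outside the year range, ignored
    ("Country A", 1991, 20.0),
    ("Country A", 2011, 30.0),
    ("Country B", 1995, 10.0),
    ("Country B", 2005, 12.0),
    ("Country C", 2000, 18.0),
]
result = top_emitters(rows, 1991, 2011, 18)
print(result)
```

With the sample rows, Country B is dropped because its average falls below the threshold, while Country C is kept because the condition is inclusive (>= 18).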
5.3.5. What is the year-wise CO2 emission trend in the top countries from 1991-2011?
• Pre-processing: A line chart is created to show the trend using the average CO2
per capita as a measure and ‘Year’ as a dimension. Further, ‘Country Name’ is used
as the Color Marks for country-wise representation.
• Filter: Two filters are incorporated – ‘Year’ = 1991-2011, and ‘Country Name’ is
filtered by the field where Avg. CO2 per capita >= 18.
• Findings:
- Among the top CO2 (per capita) emitting countries, Qatar held the top
position, emitting 44.02 metric tons of CO2 per capita in 2011, and it was on
an increasing trend. The CO2 emission of Brunei showed a similar pattern.
However, the other countries were on a stagnant trend (Fig 5.7).
- Rapid urbanization and industrialization in these countries might be the
reason behind the CO2 emission.
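One way to make the distinction between an "increasing" and a "stagnant" trend less subjective is to fit a straight line to each country's yearly values and compare the slopes. The sketch below is an assumption about how this could be quantified, not the method used in the line chart; the two series are hypothetical.

```python
def trend_slope(years, values):
    """Ordinary least-squares slope of values against years."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    sxx = sum((x - mean_x) ** 2 for x in years)
    return sxy / sxx

# Hypothetical per-capita series, for illustration only.
years = [2008, 2009, 2010, 2011]
rising = [30.0, 32.0, 34.0, 36.0]   # grows by 2 units per year
flat = [20.0, 20.0, 20.0, 20.0]     # no change

print(trend_slope(years, rising))
print(trend_slope(years, flat))
```

A clearly positive slope corresponds to an increasing trend, and a slope near zero to a stagnant one.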
Chapter 6
SAP Predictive Analytics for Data Mining
6.1. Introduction
The capability of SAP Predictive Analytics is not only confined within the boundary of
basic data analysis and visualization; it is also an ideal tool for data mining, predictive
analysis, and forecasts of future events. Besides the simple drag and drop visualization,
its functional areas include: automated analysis, expert analysis, model and data
management, predictive scoring, and social media analytics (Uhlig, 2020). It mainly has
two different features for the data mining tasks, which are as follows.
- Automated Analytics: It automates the data preparation, predictive modelling, and the
user does not require formal knowledge of data science.
- Expert Analytics: This feature is ideal for data science experts. Different algorithms
can be easily used, and the embedded R-programming language further facilitates the
functionalities of the algorithms.

Outline
In the previous chapter of SAP Predictive Analytics, the exploratory analysis and
visualization were performed. This chapter is focused on advanced data analysis and data
mining techniques which include: knowledge discovery, machine learning, and prediction
modeling.

Main Task
In this exercise, I am using 5 different data mining techniques in different datasets. The
techniques include: clustering, association analysis, time-series analysis, regression
analysis, and classification trees.

6.2. Dataset
For this analysis, different datasets are used in different techniques. For instance,
information about the passengers on Titanic is used in the association analysis technique.
A total of 5 different datasets are used, which is further described in the Methodology
and Analysis part. All the datasets are provided during the class tutorial.
clusters? What strategy can increase the sales in the cluster with the highest number of
stores?
Finding: Cluster 1 has the highest density among the three clusters; its darker colour
represents the high density (Fig 6.1). The within-cluster sum of squares is also the lowest
in cluster 1 (15.24), which concurs with the previous finding: in k-means clustering, the
lower the within-cluster sum of squares, the more similar the data points inside that
cluster. Out of 150 stores, 50 belong to cluster 1 (Fig 6.1). Clusters 2 and 3 have 38 and
62 stores, respectively.
Although cluster 1 has the highest density, the sales turnover and profit margin are the
lowest in this cluster.
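The within-cluster sum of squares referred to above is the sum of squared distances between each point in a cluster and that cluster's centroid. A minimal sketch of the calculation is given below; the function name and the sample points are hypothetical, and the cluster labels are taken as given rather than produced by running k-means.

```python
def within_cluster_ss(points, labels):
    """Return {cluster_label: within-cluster sum of squares}."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    wss = {}
    for label, members in clusters.items():
        dims = len(members[0])
        # Centroid = per-dimension mean of the cluster's points.
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dims)]
        # Sum of squared Euclidean distances to the centroid.
        wss[label] = sum(
            sum((p[d] - centroid[d]) ** 2 for d in range(dims)) for p in members
        )
    return wss

# Hypothetical two-dimensional points, for illustration only.
points = [(1.0, 1.0), (1.0, 3.0), (10.0, 10.0), (10.0, 20.0)]
labels = [1, 1, 2, 2]
wss = within_cluster_ss(points, labels)
print(wss)
```

In this toy example, cluster 1 is the tighter of the two, so its value is much smaller than that of cluster 2, which mirrors the reasoning used for the store clusters.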
Fig 6.3 illustrates that cluster 1 has the smallest staff size compared to clusters 2 and 3.
Hence, by increasing the number of staff and reducing the store size, cluster 1 may
increase its sales turnover and profit margin.
The rule {Class=1st, Sex=Female} => {Survived=Yes} is the most dependable rule. It
has a confidence of 97%, a support of 0.06, and a lift of 3.01. Although the rule
{Class=2nd, Age=Child} => {Survived=Yes} has 100% confidence and a lift of 3.10, its
support is only 0.01, which is the lowest (Fig 6.5).
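Support, confidence, and lift follow the standard definitions: support is the fraction of records containing both sides of the rule, confidence is that count divided by the count of records matching the left-hand side, and lift is confidence divided by the overall frequency of the right-hand side. The sketch below computes the three measures on a small hypothetical set of records; it does not use the Titanic data itself, and the function name is illustrative.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for antecedent => consequent."""
    a, c = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if a <= t)          # left-hand side matches
    n_c = sum(1 for t in transactions if c <= t)          # right-hand side matches
    n_ac = sum(1 for t in transactions if (a | c) <= t)   # both sides match
    if n_a == 0 or n_c == 0:
        raise ValueError("antecedent or consequent never occurs")
    support = n_ac / n
    confidence = n_ac / n_a
    lift = confidence / (n_c / n)
    return support, confidence, lift

# Hypothetical records, for illustration only (10 in total).
transactions = (
    [{"Class=1st", "Sex=Female", "Survived=Yes"}] * 2
    + [{"Class=1st", "Sex=Male", "Survived=No"}] * 1
    + [{"Class=3rd", "Sex=Male", "Survived=Yes"}] * 2
    + [{"Class=3rd", "Sex=Male", "Survived=No"}] * 5
)
support, confidence, lift = rule_metrics(
    transactions, {"Class=1st", "Sex=Female"}, {"Survived=Yes"}
)
print(support, confidence, lift)
```

The toy rule has perfect confidence but covers only 2 of 10 records, which illustrates why a rule with 100% confidence and very low support is treated as less dependable than one with slightly lower confidence and higher support.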
Fig 6.5: Bubble chart of the association rules
Fig 6.6: Sales forecasting in Visualization tab
Finding: The top 3 products with the highest predicted sales quantity are Air Pump, Road
Bike Alu Shimano, and Men’s Off Road Bike Hard Tail SRAM, with predicted sales
quantities of 111634, 66895, and 65874, respectively (Fig 6.7).
Finding: The top 4 influencers are Overloads (60+), AssetType (Voltage Transformer),
PMLate (N), and Repairs (Rebuild+1) (Fig 6.9).
The probability of failure of equipment with the stated values is 60%, with a score of status
of 0.28.
Chapter 7
SAP HANA
7.1. Introduction
SAP HANA is a column-oriented and in-memory database system that utilizes the RAM to
store compressed data rather than storing it on disk drives. Enterprises use this system
as it enables the analysts to query a large amount of data in real-time. Due to its
capability to use in-memory storage, the system can work faster than other tools during
information extraction (Eliav, 2021). This tool can help to stay ahead of the competition
due to its faster processing capacity and real-time output, which can help business
decision-makers to make the right decision at the right time.

Outline
Two tools will be utilized in this exercise: SAP HANA for data modeling and SAP
Predictive Analytics for data analysis and visualization. The main focus is creating a
data model and populating the model with appropriate information from different flat
files.
Fig 7.2: Product attribute table
[Figure: process flow covering the SAP HANA catalogue, adding virtual tables, data provisioning, establishing and executing the flowgraph, and creating the data model for product and sales data]
[Figure panels: the process starts in SAP HANA Editor; mapping for the country and sales organization table; executing the steps and checking the final output table; created calculation view for customer, product, and sales data]
Fig 7.4: Process of creating calculation view for the customer table
7.4. Analysis using SAP Predictive Analytics
7.4.1. Which country earned the most net revenue in GBI? Which are the top 5 cities in terms
of net revenue in that particular country?
The USA earned the most net revenue (51.6%) for GBI, whereas Germany earned 48.4%. The
top 5 cities in the USA were Boston, Palo Alto, Chicago, New York City, and Denver.
Fig 7.6: Top net revenue earning country and cities for GBI
7.4.2. Show the year-wise net revenue trend in the top 5 cities. Which products are driving
the net revenue?
As we saw in the previous analysis, Boston was the leading city, followed by Palo Alto.
However, the year-wise trend revealed that Palo Alto did not earn any revenue after 2010,
and Boston was on a decreasing trend. Chicago, by contrast, gained revenue momentum
from 2009 and earned almost the same revenue as Boston in 2011. This indicates that
Chicago could in future earn higher revenue than the other cities (Fig 7.7).
Fig 7.7: Year-wise Net Revenue trend of top 5 cities in the USA
Top products driving the net revenue in the top cities were Road Bike Carbon Shimano,
Professional Touring Bike Silver, Men’s Off Road Bike Fully, Deluxe Touring Bike Silver,
Road Bike Alu Shimano, and Men’s Off Road Bike Hard Tail SRAM (Fig 7.8).
Fig 7.8: Top products driving Net Revenue in the top 5 cities
References
Ajenstat, F. (2021, February 18). 9 years a leader in Gartner magic quadrant for analytics
and business intelligence platforms. Retrieved March 12, 2021, from
https://www.tableau.com/about/blog/2021/2/tableau-9-years-leader-gartner-magic-quadrant
Anto, A. (2020, August 28). An overview of SAP Analytics Cloud. Retrieved April 9, 2021,
from https://www.zarantech.com/blog/an-overview-of-sap-analytics-cloud/
Eliav, R. (2021, March 08). What is SAP HANA? Retrieved April 08, 2021, from
https://www.panaya.com/blog/sap/what-is-sap-hana/
Hart, M. (2015, August 16). SAP Predictive Analytics - benefits and features. Retrieved
April 09, 2021, from
https://sap.walkme.com/sap-predictive-analytics-benefits-and-features/
O'Donnell, J. (2016, February 29). What is SAP Predictive Analytics? - definition from
whatis.com. Retrieved March 11, 2021, from
https://searchsap.techtarget.com/definition/SAP-Predictive-Analytics
Post, T. (2020, July 19). Tableau public: Pros and CONS (Straight Talk Review). Retrieved
April 09, 2021, from
https://dashboardfox.com/blog/tableau-public-pros-and-cons-straight-talk-review/
SAP. (2021). SAP Lumira - Data Visualization and Analytics Software.
https://www.sap.com/canada/products/lumira.html.
SAP Analytics Cloud: BI, planning, and predictive analysis tools. (n.d.). Retrieved March
12, 2021, from https://www.sap.com/canada/products/cloud-analytics.html
SAP Business Objects vs SAS Business Intelligence: Who’s the winner? (2021, February
03). Retrieved April 09, 2021, from
https://www.experfy.com/blog/bigdata-cloud/sap-business-objects-vs-sas-business-intelligence-comparison/
SAP HANA Pros and Cons - the good and the bad of database technology. (2019, May
18). Retrieved April 09, 2021, from
https://data-flair.training/blogs/sap-hana-pros-and-cons/
SAP Help Portal. (2021). SAP Business Explorer: BEx Query Designer.
https://help.sap.com/viewer/73e6551e26244281884fd2fa36cdb678/7.5.7/en-US/9d76563cc368b60fe10000000a114084.html.
Self-Service BI tools - An in-depth COMPARISON -TABLEAU vs Power BI Vs Qlik sense vs
Tibco Spotfire Vs SAP Lumira Vs SAP Analytics Cloud. (2020, March 17). Retrieved
April 09, 2021, from
https://visualbi.com/blogs/business-intelligence/data-discovery/self-service-bi-tools-comparison-tableau-power-bi-qlik-spotfire-sap-lumira-sap-analytics-cloud/
Tableau. (n.d.). Business intelligence and analytics software. Retrieved March 12, 2021,
from https://www.tableau.com/#platform
Technologies, M. (2018, November 13). Why Tableau is the best BI tool? Retrieved
March 12, 2021, from https://mindmajix.com/tableau-best-bi-tool
Uhlig, S. (2020, May 20). Machine learning with sap predictive analytics - possibilities
and limitations. Retrieved April 08, 2021, from
https://www.nextlytics.com/blog/machine-learning-with-sap-predictive-analytics