Business Analytics Toolkit Portfolio

Md Khaled Hasan

Table of Contents
1 SAP Lumira
1.1 Introduction
1.2 Dataset
1.3 Implementation of the tool and Findings
1.3.1 What is the trend of population increase from 1971 to 2019?
1.3.2 Which are the top provinces with the highest number of people in the 65 Years and Older group in 2015-2019?
1.3.3 What is the age group-wise proportion of the population in 2015-2019?
1.3.4 What is the gender-wise population trend? What is the impact of gender on the population aged 65 Years and Older?
1.3.5 Which province has the highest number of populations in 2019? Which gender dominates each of the provinces?
1.4 Conclusion and Personal Reflection

2 SAP BEx Query Designer and SAP BusinessObjects Analysis
2.1 Introduction
2.2 Dataset
2.3 Implementation of the tool and Findings
2.3.1 What is the product category-wise revenue from 2007-2011?
2.3.2 How does the discount impact the overall revenue? Does this trend vary in different countries?
2.3.3 What are the top 3 products in each country that got the highest total discount of all time?
2.3.4 Which division in each country has got the highest discount? What are the top three customers in the division (found from the previous question) in terms of discounts for each calendar year?
2.3.5 In each division, what are the three bottom-performing products in terms of revenue and what are their discount amount? Did they get a good discount to perform well?
2.4 Conclusion and Personal Reflection

3 SAP Predictive Analytics for Visualization
3.1 Introduction
3.2 Dataset
3.3 Implementation of the tool and Findings
3.3.1 Which were the top 3 revenue-earning products, and in what year did they make the highest revenue?
3.3.2 What was the associated gross margin ratio of those products? In which country is the ratio higher?
3.3.3 Which division had a higher gross margin ratio in that country, and what are the top 5 customers with the highest gross margin in that division?
3.3.4 What are the total net sales of those 5 customers?
3.4 Conclusion and Personal Reflection

4 SAP Analytics Cloud
4.1 Introduction
4.2 Dataset
4.3 Implementation of the tool and Findings
4.3.1 What were the top 5 revenue-earning products? Was there any relationship between revenue and quantity sold?
4.3.2 Which team had the highest market share of the highest selling product? What was the revenue in different distribution channels for that product sold by that team?
4.3.3 What were the revenues of different areas in each distribution channel? Which area had the highest revenue?
4.3.4 In which round that area (from the previous question) had the highest revenue? What was the highest revenue earning day for a particular product in that round? What were the top 2 key influencers for that sale?
4.4 Conclusion and Personal Reflection

5 Tableau
5.1 Introduction
5.2 Dataset
5.3 Implementation of the tool and Findings
5.3.1 What are the top products in terms of total quantity sold and average price?
5.3.2 What is the top product among the previously found products? Which area earned the highest revenue from the top products?
5.3.3 Which is the highest revenue earning distribution channel in that area for all the top products? What is the highest amount of revenue earned from one product in any distribution channel?
5.3.4 What are the top CO2 emitting countries from 1991-2011?
5.3.5 What is the year-wise CO2 emission trend in the top countries from 1991-2011?
5.4 Conclusion and Personal Reflection

6 SAP Predictive Analytics for Data Mining
6.1 Introduction
6.2 Dataset
6.3 Methodology and Analysis
6.3.1 Clustering
6.3.2 Association Analysis
6.3.3 Time Series Analysis
6.3.4 Regression Analysis
6.3.5 Classification Trees
6.4 Conclusion and Personal Reflection

7 SAP HANA
7.1 Introduction
7.2 Dataset
7.3 Methodology
7.3.1 Step-1: Creating the database table
7.3.2 Step-2: Data provisioning
7.3.3 Step-3: Creating calculation view
7.4 Analysis using SAP Predictive Analytics
7.4.1 Which country earned the most net revenue in GBI? Which are the top 5 cities in terms of net revenue in that particular country?
7.4.2 Show the year-wise net revenue trend in the top 5 cities. Which products are driving the net revenue?
7.5 Conclusion and Personal Reflection

References

Chapter 1
SAP Lumira
1.1. Introduction
SAP Lumira is data visualization and analytics software that can manipulate, edit,
analyze, and visualize data to yield useful insights for strategic decisions. It also
helps to publish and share the developed visualizations and dashboards among team
members (SAP, 2021). Data can be acquired from different sources and imported into
SAP Lumira, including SAP HANA Live Data, SQL databases, Excel spreadsheets, and CSV
files. The acquired data elements exist in three different forms.
- Measures: Numerical data that are usually aggregated and calculated based on sum,
average, count, min, max, etc.
- Dimensions: All kinds of categorical data fall under this type. It can be nominal,
ordinal, or interval data.
- Hierarchies: These represent the relationship between correlated entities. One
common example is geographical location: country and province.

Outline
SAP Lumira is a visualization intelligence tool that helps to create visual stories
from datasets. This tool will be used to access, transform, and create some exploratory
visualizations from a dataset.

Main Task
In this exercise, I will use SAP Lumira to manipulate and visualize a population
dataset using different kinds of charts and graphs. For the data manipulation, I will
create a few calculated fields and a hierarchy, and apply filters. The location
hierarchy will be created for spatial representation of the data.

1.2. Dataset
A Canadian population dataset will be analyzed using SAP Lumira to gain insight into
demography. The dataset has 16 attributes, including year, geographical location
(province), age, sex, population, etc. Some attributes will be derived from age using
the calculation function to get the age group-wise population. Some of the questions
that will be answered through the analysis are as follows.

1.3. Implementation of the tool and Findings


1.3.1. What is the trend of population increase from 1971 to 2019?
• Pre-processing: From the dataset (Population.csv), the ‘Age group’ dimension was
selected, and the ‘Age Data Category’ dimension was created (with two
components: ‘Age Range’ and ‘Age (Years)’). Then the ‘Years Old’ dimension was
derived from ‘Age (Years)’ using the Create Calculation function. It was then converted
to numeric data and relabeled as ‘Age in Years’. Finally, three new dimensions (0-14
Years, 15-64 Years, and 65 Years and Older) were created from the ‘Age in Years’
dimension using the Create Calculation function. The newly derived dimensions (0-14
Years, 15-64 Years, and 65 Years and Older) and population were then used as
the measures, and a ‘Line Chart’ was generated to show the trend.
• Filter: The data was filtered using ‘Age Data Category = Age (Years)’, ‘Geography
= Canada’ and ‘Sex = Both sexes’.
• Findings:
- The total population shows an increasing trend over the whole period.
- Both the ‘15-64 Years’ and ‘65 Years and Older’ groups have been increasing,
although the ‘0-14 Years’ group has not increased.
- Interestingly, after 2015 the number of aged people (65 Years and Older)
has been increasing compared to the ‘0-14 Years’ group.

Fig 1.1: Population trend from 1971 to 2019
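
The age-group derivation described in the pre-processing step can be sketched outside the tool as well. The snippet below is only an illustration in Python, not the Create Calculation formula used in SAP Lumira; the function names and the sample rows are hypothetical and are not taken from Population.csv.

```python
def age_group(age_in_years: int) -> str:
    """Map a single age to one of the three groups used in this chapter."""
    if age_in_years < 0:
        raise ValueError("age must be non-negative")
    if age_in_years <= 14:
        return "0-14 Years"
    if age_in_years <= 64:
        return "15-64 Years"
    return "65 Years and Older"


def population_by_group(rows):
    """Sum the population per age group from (age, population) pairs."""
    totals = {"0-14 Years": 0, "15-64 Years": 0, "65 Years and Older": 0}
    for age, population in rows:
        totals[age_group(age)] += population
    return totals


# Hypothetical sample rows (age in years, population), for illustration only.
sample = [(3, 100), (14, 50), (15, 200), (64, 300), (65, 80), (90, 20)]
totals = population_by_group(sample)
print(totals)
```

Applying the same grouping per year would give the three series plotted in the line chart.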

1.3.2. Which are the top provinces with the highest number of people in the 65 Years and
Older group in 2015-2019?
• Pre-processing: To identify the provinces with the most people aged 65 Years and
Older, the three newly derived dimensions (0-14 Years, 15-64 Years, and 65 Years and
Older) were used as the measures, and a ‘Stacked Column Chart’ was created.
• Filter: The data was filtered using ‘Geography = All Provinces and Territories’ and
‘Ref_Date = 2015-2019’.
• Rank: Measure (65 Years and Older), Value (Top 5), Apply on (Geography).
• Findings: Ontario has the highest population in the ‘65 Years and Older’ age
category, which is followed by Quebec, British Columbia, Alberta, and Manitoba.

Fig 1.2: Top 5 provinces with People Age 65 Years and Older (2015-2019)

1.3.3. What is the age group-wise proportion of the population in 2015-2019?


• Pre-processing: To determine the age group-wise proportion of the population,
three newly derived dimensions (0-14 Years, 15-64 Years, and 65 Years and Older)
were used as the measures, and a ‘Donut Chart’ was created.
• Filter: The data was filtered using ‘Geography = Canada’ and ‘Ref_Date = 2015-2019’.
• Findings: Among the overall population, 67.1% of people fall under the ‘15-64 Years’
age group, representing the major portion of the population. It is followed by the
‘65 Years and Older’ and ‘0-14 Years’ groups with 16.8% and 16.2%, respectively.
Fig 1.3: Age-group wise proportion of the population (2015-2019)

1.3.4. What is the gender-wise population trend? What is the impact of gender on the
population aged 65 Years and Older?
• Pre-processing: The gender-wise population trend from 1990 to 2019 is illustrated
using a bar chart, which is assigned to the primary axis of the graph. This trend is
then compared with people aged ‘65 Years and Older’ using a line chart on the
secondary axis.
• Filter: The data was filtered using ‘Sex = Females and Males’ and ‘Ref_Date =
>1989’.
• Findings: The bar chart in the primary axis shows a continuously increasing trend
of the total population. The number of females is marginally higher than the males.
In the case of seniors, females have always been higher in number. Interestingly,
there is a slight jump in this increasing trend of females in 2001, making the
difference a bit higher between males and females.

Fig 1.4: Gender-wise population trend with gender impact on the elderly group

1.3.5. Which province has the highest number of populations in 2019? Which gender
dominates each of the provinces?
• Pre-processing: The map layout is used to illustrate the geographical locations.
Two layers were created, one of which shows different provinces of Canada, and
another one shows the bubble of the dominant gender according to the
population size. The bigger bubble size indicates the higher population.
• Filter: The data was filtered using ‘Sex = Females and Males’ and ‘Ref_Date = 2019’.
• Findings: According to the bubble size, Ontario is the biggest province in terms of
the population where ‘Male’ is the dominant gender. It is closely followed by ‘Male’
dominant Quebec. ‘Male’ is also the dominant gender across the Maritime
provinces.

Fig 1.5: Province-wide population and dominant gender in every province

1.4. Conclusion and Personal Reflection


SAP Lumira is a good starting tool to get into the world of data analytics. It is very user-
friendly in many ways, but sometimes it is hard to deal with the tool to find the proper
answer. Some of the positive experiences and hardships that I experienced during my
work are discussed below.
• Easy filtering process: The drag-and-drop feature in SAP Lumira is very useful for
easy and fast filtering of the data.
• No coding required: The best part of this application is that it provides
eye-catching and intuitive visualizations without coding.
• Using functions: All the functions are simple enough to understand and use.
Specifically, the example provided with each function makes it easier to use.
Creating the calculation is quite straightforward, which is one of the biggest
strengths of this application.
• Data import: At first sight, it may be difficult for some users to find how to import
a CSV file. Without clicking the “Show More” option, it is impossible to know what
kind of data files can be uploaded.
• Visualization limit: When the data is too large to visualize, it gives a pop-up
message: “Unable to show more than 10000 values”. It would be good to have
some messages suggesting alternate ways to handle those situations.
• Storyboard: All the storyboards keep the same measures, and if the measures are
changed in a new storyboard, the previous ones change as well. It should support
using different measures in different storyboards.
• Application stability: SAP Lumira is not quite stable and sometimes crashes when
handling a larger number of data components. So, it is better to save the work
regularly.
• Limited export option: The storyboards can only be exported as PDF, and
sometimes it limits the size of a chart or graph.
In comparison to other visualization software, SAP Lumira lags behind in some
respects; for instance, it has limitations in exporting reports compared to SAP Analytics
Cloud, Tableau, and PowerBI. Additionally, users cannot put comments or annotations on
the working files, which is possible in other tools (Self-Service BI tools, 2020).
Besides, it is good for visualization, but unlike SAP Analytics Cloud, it does not support
predictive analysis or other advanced features. In summary, SAP Lumira is a decent option
for data visualization. However, to get deeper in the field of business analysis, the analysts
may need to switch to other available BI tools.

Chapter 2
SAP BEx Query Designer and SAP BusinessObjects Analysis
2.1. Introduction
SAP Business Explorer (BEx) Query Designer is a tool that helps to create queries and
retrieve data based on those queries from SAP Business Warehouse (BW). The data can be
analyzed by defining queries for InfoProviders (SAP Help Portal, 2021). It helps in
data modelling for online analytical processing (OLAP) systems. Data modelling is
especially helpful in the case of data warehousing and creating data cubes.
SAP BusinessObjects Analysis facilitates multidimensional OLAP analysis. Using the MS
Excel edition, the data can be efficiently analyzed and visualized using different
charts. It also helps to detect any specific trends and outliers.

Outline
SAP BEx Query Designer will be utilized as a bridge between SAP BW and the front-end
reporting tool. It will help to define queries and extract the right data for analysis.
SAP BusinessObjects Analysis is useful for multi-dimensional analysis of OLAP sources.

Main Task
In this exercise, I am using SAP BEx Query Designer to create queries and then
analyzing the data in SAP BusinessObjects Analysis for MS Excel.

2.2. Dataset
The dataset is collected from an Infocube in the SAP NetWeaver BW. Its source was
provided during the class tutorials. This data contains Global Bike Company’s business
transactions, which include information on time, location, customer, division,
product, cost of goods, net sales, discount, revenue, etc.
The data from GBI Infocube was explored, and the query was created using the SAP
Business Explorer (BEx) Query Designer. The process started with modelling the data, and
then the key figures and dimensions were explored. The query was created using the
drag-and-drop feature in BEx Query Designer. Primarily in this query, material, sales
organization, and customer were selected as dimensions, and cost of goods, revenue,
discount, net sales, and sales quantity were selected as measures. The designed query was
then used for further analysis through SAP BusinessObjects Analysis (Edition: MS Office
Excel). The analyses helped to answer the following research questions.

2.3. Implementation of the tool and Findings


2.3.1. What is the product category-wise revenue from 2007-2011?
• Pre-processing: All the measurements were translated to USD using the currency
translation function to maintain a standard currency unit. The calendar year and
revenue were placed in the column and product category in the rows to create the
cross tab. Then, a clustered column chart was created to visualize the findings.
• Filter: The data was filtered using ‘Measures = Revenue’.

• Findings:
- Overall, the revenue decreased from 2007 to 2011, and it had the lowest
revenue in 2009.
- After the introduction of E-Bikes in 2010, the revenue gained some positive
momentum.
- Touring Bikes is the highest revenue earning product category, followed by
Offroad Bikes and Road Bikes. Accessories and Trend Bikes did not show a
significant change in their revenue over this time frame.

Fig 2.1: Product category-wise revenue trend (2007-2011)

2.3.2. How does the discount impact the overall revenue? Does this trend vary in different
countries?
• Pre-processing: Two key measures (revenue and discount) and two dimensions
(calendar year and country) were selected. First, a line chart was plotted to
determine the trend of total revenue versus discount, and then another combo
chart was created to compare the same trend in terms of country. In both charts,
revenue and discount were assigned in the primary and secondary axis,
respectively.
• Filter: The data was filtered using ‘Measures = Revenue and Discount’.
• Findings:
- The total revenue versus discount (Fig 2.2) chart illustrates a similar year-
wise trend. With the decrease in discount (from 2007 to 2009), total revenue
also decreased. Then, the revenue again increased in 2010 with the increase
in discount. The launching of E-Bikes (Fig 2.1) added extra revenue in total
business in 2010 and 2011.

Fig 2.2: Total revenue vs discount trend (2007-2011)

- The country-wise analysis (Fig 2.3) shows that, relative to revenue, the United
States had higher discounts than Germany in every year.

Fig 2.3: Country-wise revenue vs discount trend (2007-2011)

2.3.3. What are the top 3 products in each country that got the highest total discount of all
time?
• Pre-processing: The top 3 products with the highest total discount in each country
were filtered by creating a rule. Then, the advanced calculation option was
utilized to create a new measure called ‘Discount-Revenue Ratio’. It represents
the amount of discount relative to the revenue earned by each product, expressed
as a percentage. This measure is plotted on the secondary axis as a line chart.
• Filter: The data was filtered using the rule ‘Measures = Discount (Top N = 3)’ for
each country.

• Findings:
- Among the top 3 products with the highest discount, Professional Road Bike
(Shimano) and Professional Touring Bike (Silver) are common to both
countries. The third product is Men’s Off Road Bike Fully in Germany and Deluxe
Touring Bike in the United States.
- Professional Touring Bike (Silver) in the United States has the highest
Discount-Revenue Ratio (Fig 2.4) among all the top 3 products in both
countries.
- Overall, the total Discount-Revenue Ratio is higher in the United States than
in Germany, which concurs with the previous finding in Fig 2.3.

Fig 2.4: Country-wise top 3 products with the highest total discount
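
The ‘Top N = 3’ rule applied per country can be illustrated with a short Python sketch. This is not how SAP BusinessObjects Analysis evaluates the rule internally; the function name is an assumption, and the product labels and discount values below are hypothetical placeholders rather than figures from the GBI Infocube.

```python
from collections import defaultdict


def top_n_per_group(records, n=3):
    """Return, for each country, the n products with the largest total discount.

    records: iterable of (country, product, discount) tuples.
    """
    by_country = defaultdict(list)
    for country, product, discount in records:
        by_country[country].append((product, discount))
    return {
        country: sorted(items, key=lambda item: item[1], reverse=True)[:n]
        for country, items in by_country.items()
    }


# Hypothetical placeholder data, for illustration only.
records = [
    ("DE", "Product A", 5.0),
    ("DE", "Product B", 9.0),
    ("DE", "Product C", 1.0),
    ("DE", "Product D", 7.0),
    ("US", "Product A", 4.0),
    ("US", "Product E", 8.0),
]
top3 = top_n_per_group(records, n=3)
print(top3)
```

Ranking within each country, rather than over the whole dataset, is what allows the two countries to share some products in their top 3 while differing in others.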

2.3.4. Which division in each country has got the highest discount? What are the top three
customers in the division (found from the previous question) in terms of discounts for each
calendar year?
• Pre-processing: A cross tab was generated using the country and division in terms
of discount amount. Then, the top three customers were filtered by creating a rule.
The output was presented using a bar chart for clear visualization and easy
comparison (Fig 2.6).
• Filter: The data was filtered using the rule ‘Measures = Discount (Top N = 3)’ for
customers in the BI division.

Country   Division       Discount $
US        AS              71,562.82
          BI           8,461,404.22
DE        AS              71,371.04
          BI           9,135,115.88
Overall Result        17,739,453.97

Fig 2.5: Country-wise highest discount in each division

• Findings:
- In both countries, the highest discount is allocated to the BI division.
The BI division’s total discount in the US and Germany is approximately 8.46 and
9.14 million USD, respectively (Fig 2.5).
- From 2007 to 2010, the top 3 customers that got the highest discount in the
BI division were Bavaria Bikes, Silicon Valley Bikes, and Beantown Bikes.
However, in 2011 Rädlelland replaced Silicon Valley Bikes (Fig 2.6).

Fig 2.6: Top 3 customers with the highest discount in the BI division (2007-2011)

2.3.5. In each division, what are the three bottom-performing products in terms of revenue
and what are their discount amount? Did they get a good discount to perform well?
• Pre-processing: The three bottom-performing products were identified in terms of
revenue by creating a cross tab, and also their associated discount was analyzed.
These measures further helped to identify if a good discount was given on those
products. The revenue-discount ratio was calculated for these products (Fig 2.8)
using the revenue and discount measures.
• Filter: The data was filtered using the rule ‘Measures = Revenue (Bottom N = 3)’
for both divisions’ products. Conditional formatting was applied based on the
revenue-discount ratio. Ratio values above 3.11 were considered higher, values
between 3.0 and 3.11 average, and values below 3.0 were marked as lower-discounted
products.

Division  Material                   Revenue $      Discount $
AS        Repair Kit                268,634.24        8,321.92
          T-shirt                   252,889.85        7,717.49
          Water Bottle              166,629.58        5,117.12
BI        City Bike Max           1,175,840.87       40,224.48
          E-Bike Tailwind         8,046,390.46      215,103.70
          Fixed Gear Bike Plus      187,446.68        7,162.41
Overall Result                  570,501,748.58   17,739,453.97

Fig 2.7: Bottom 3 performing products with Revenue-Discount Ratio

• Findings:
- Repair Kit, T-shirt, and Water Bottle are the bottom three performing
products in AS division. Simultaneously, City Bike Max, E-Bike Tailwind, and
Fixed Gear Bike Plus performed poorly in the BI division (Fig 2.7).
- The overall discount ratio across all products is 3.11%, and the previously
identified bottom-performing products were evaluated against it as the standard.
As Fig 2.8 shows, City Bike Max and Fixed Gear Bike Plus had a higher discount,
but these products did not perform well. In contrast, E-Bike Tailwind had a
discount below the overall standard. If more discounts were given, it might have
performed better. Products in the AS division had discounts around the standard
level but could not perform well.
Division  Material                   Revenue $      Discount $  Revenue-Discount Ratio
AS        Repair Kit                268,634.24        8,321.92  3.10
          T-shirt                   252,889.85        7,717.49  3.05
          Water Bottle              166,629.58        5,117.12  3.07
          Result                  4,680,229.91      142,933.86  3.05
BI        City Bike Max           1,175,840.87       40,224.48  3.42
          E-Bike Tailwind         8,046,390.46      215,103.70  2.67
          Fixed Gear Bike Plus      187,446.68        7,162.41  3.82
          Result                565,821,518.67   17,596,520.10  3.11
Overall Result                  570,501,748.58   17,739,453.97  3.11

Fig 2.8: Discount status indication of the bottom three performing products
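
The ratio column in Fig 2.8 is consistent with discount divided by revenue, expressed as a percentage. The Python sketch below reproduces it from the revenue and discount figures in the table and applies the same three-level banding as the conditional formatting. It is an illustration only, not the calculation defined in SAP BusinessObjects Analysis; the function names are assumptions.

```python
def discount_ratio_pct(revenue: float, discount: float) -> float:
    """Discount as a percentage of revenue, rounded to two decimals."""
    return round(100.0 * discount / revenue, 2)


def classify(ratio: float, low: float = 3.0, high: float = 3.11) -> str:
    """Band a ratio using the thresholds stated in the filter step."""
    if ratio > high:
        return "higher"
    if ratio < low:
        return "lower"
    return "average"


# (revenue, discount) in USD, copied from Fig 2.8.
products = {
    "Repair Kit": (268_634.24, 8_321.92),
    "T-shirt": (252_889.85, 7_717.49),
    "Water Bottle": (166_629.58, 5_117.12),
    "City Bike Max": (1_175_840.87, 40_224.48),
    "E-Bike Tailwind": (8_046_390.46, 215_103.70),
    "Fixed Gear Bike Plus": (187_446.68, 7_162.41),
}

ratios = {name: discount_ratio_pct(rev, disc) for name, (rev, disc) in products.items()}
status = {name: classify(ratio) for name, ratio in ratios.items()}

for name in products:
    print(f"{name}: {ratios[name]:.2f} ({status[name]})")
```

With these inputs, the three AS products fall in the average band, City Bike Max and Fixed Gear Bike Plus in the higher band, and E-Bike Tailwind in the lower band, which matches the findings above.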

2.4. Conclusion and Personal Reflection


• SAP Business Explorer Query Designer: The user interface is not very user-friendly
in terms of navigation. However, the preview panel is helpful to get an
initial idea about the layout. Free characteristics make it possible to add
further characteristics later, which gives useful flexibility during the analysis
at a later stage.
• SAP BusinessObjects Analysis (MS Excel edition):
- As it is the Excel edition, most of the features are very familiar.
Specifically, an analyst can easily use this tool even with experience of
pivot tables only.
- The drag-and-drop feature makes the filtering process much easier, and it
could be a good starting point to work with multidimensional data.
- Although it is a powerful tool for different exploratory analyses, it is less
effective for visualization than Tableau or PowerBI.
- It cannot generate an interactive dashboard, which is one of the weakest
features. This can be easily done in SAP Analytics Cloud, Tableau, or
PowerBI.
- The map feature is not that strong. It only shows the geographical location
(e.g., country), but putting an additional layer (or chart) on that is not
possible.
- It does not have the option to create custom colours on the chart. Only a
few colourful and monochromatic options are available.
- SAP BusinessObjects Analysis is a good option for BI reporting, but for data
discovery and visualization, Tableau or PowerBI could be a better option. So
it would be better to use a combination of tools in business analysis rather
than choosing a particular one.

To conclude, SAP BusinessObjects enables users to integrate with other enterprise
applications (SAP Business Objects vs SAS Business Intelligence, 2021), and it is easier to
use. However, it lacks complex and advanced business analysis capabilities.

Chapter 3
SAP Predictive Analytics for Visualization
3.1. Introduction
SAP Predictive Analytics is business intelligence software that enables users to
process, analyze, and visualize larger datasets to make effective business decisions.
As the name indicates, it is also a powerful tool for predictive modelling and
forecasting future events. Starting from the initial data manipulation and data
modelling, it can help to conduct statistical analysis and data mining tasks
(O'Donnell, 2016). Some of the prominent features are:
- It is a very intuitive, user-friendly (drag-and-drop), and code-free tool with the
option to add R scripts.
- Data from a wide range of sources, including SAP HANA Server, SAP BW, SQL Database,
and simple spreadsheets, can be easily imported.
- The interface is very organized and user-friendly. The major functionalities are
categorized under the Prepare, Predict, Visualize, Compose, and Share tabs, making
navigation easier.

Outline
SAP Predictive Analytics provides a data solution from the very basic stage of data
acquisition through to building prediction models based on historical and current
data. Due to the limited scope of this exercise, only exploratory analysis is
conducted. However, in future exercises, advanced methodologies, including prediction
modeling, will be utilized.

Main Task
In this exercise, I am using slicing and dicing techniques to filter data from a larger
dataset and bring more granularity to the data. The techniques used include sorting,
filtering, ranking, and aggregation.

3.2. Dataset
For this analysis, the sales data of the wholesale division of Global Bike Company is
used, which was provided during the class tutorial. The dataset contains 171010 tuples
with 23 dimensions, of which 10 are used as measures. Some of the measures used in the
analysis include Cost (USD), Discount (USD), Gross margin (USD), Revenue (USD), and
Sales quantity. A few calculated variables will also be created using these measures.

3.3. Implementation of the tool and Findings


3.3.1. Which were the top 3 revenue-earning products, and in what year did they make the
highest revenue?
• Pre-processing: From the dataset (GlobalBikeSales.xlsx), the ‘ProdDescr’ dimension
is selected, and ‘RevenueUSD’ is used as the measure. ‘ProdDescr’ is also used
in the Legend Color parameter. The findings are sorted in descending order.
• Rank: The products are ranked using the ‘Top 3’ argument in the ranking panel.

Fig 3.1: Top 3 Revenue earning Products

• Findings:
- The top 3 revenue-earning products were Professional Touring Bike Silver,
Deluxe Touring Bike Silver, and the Road Bike Carbon Shimano (Fig 3.1).
- Revenue for the products was 218.24, 207.05, and 202.39, respectively.

Fig 3.2: Year-wise Revenue trend of the top 3 Products

Fig 3.3: Product-wise Revenue in the highest-earning Years

- The top 3 products made the highest revenue in 2019, followed by 2017
and 2018, respectively (Fig 3.2).
- Further dicing of the data showed that Professional Touring Bike Silver
earned the highest revenue among the top 3 products in 2019. It has been
consistent in all the top 4 revenue earning years.
3.3.2. What was the associated gross margin ratio of those products? In which country is the
ratio higher?
• Pre-processing: A new measure, ‘Gross Margin Ratio’, is created using three pre-
existing measures ‘GrossMarginUSD’, ‘RevenueUSD’, and ‘DiscountUSD’. A
crosstab is generated using the ‘Gross Margin Ratio’ and ‘ProdDescr’.
• Filter: The data is filtered using ‘ProdDescr = Deluxe Touring Bike-Silver,
Professional Touring Bike-Silver, and Road Bike Carbon Shimano’.
• Findings:
- The gross margin ratios of Deluxe Touring Bike-Silver, Professional Touring
Bike-Silver, and Road Bike Carbon Shimano were 40.97%, 41.09%, and
43.65%, respectively. Interestingly, Road Bike Carbon Shimano had the highest
gross margin ratio (43.65%), although its revenue was the lowest among the
top 3 products (Fig 3.4).
- In terms of the country, Germany had a higher gross margin ratio than the
United States. Also, in both countries, Road Bike Carbon Shimano had the
highest gross margin ratio.

Fig 3.4: Product- and Country-wise Gross Margin Ratio (top 3 products)
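
The text names the three measures behind ‘Gross Margin Ratio’ but does not state the formula. One plausible reading, shown below as a Python sketch, is gross margin divided by net sales, with net sales taken as revenue minus discount. Both the formula and the function names are assumptions made for illustration, and the input values are hypothetical, not figures from GlobalBikeSales.xlsx.

```python
def net_sales(revenue_usd: float, discount_usd: float) -> float:
    """Assumed definition: net sales = revenue - discount."""
    return revenue_usd - discount_usd


def gross_margin_ratio(gross_margin_usd: float, revenue_usd: float,
                       discount_usd: float) -> float:
    """Assumed definition: gross margin as a percentage of net sales."""
    net = net_sales(revenue_usd, discount_usd)
    if net == 0:
        raise ValueError("net sales is zero; the ratio is undefined")
    return 100.0 * gross_margin_usd / net


# Hypothetical values, chosen only to make the arithmetic easy to follow.
ratio = gross_margin_ratio(gross_margin_usd=41.0, revenue_usd=110.0, discount_usd=10.0)
print(f"{ratio:.2f}%")
```

If the measure in the tool was defined differently (for example, with revenue alone as the denominator), only the body of `gross_margin_ratio` would change.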
3.3.3. Which division had a higher gross margin ratio in that country, and what are the top
5 customers with the highest gross margin in that division?
• Pre-processing: The previously created ‘Gross Margin Ratio’ measure is used to
determine the higher value division. A pie chart is created to visualize the
comparison between the two divisions.
• Filter: The data was filtered using ‘Country = DE’, as in the previous analysis, we
found Germany was the country with the higher gross margin ratio.

• Findings:
- Compared to the BI division (44.47%), the AS division (55.53%) had a higher
gross margin ratio in Germany. Surprisingly, this trend was similar in both
countries.
- The top 5 customers in the division were Velodrom, Ostseerad, Fahrpott,
Drahtesel, and Rädlelland (Fig 3.5).

Fig 3.5: Division-wise gross margin ratio in Germany

3.3.4. What are the total net sales of those 5 customers?


• Pre-processing: To analyze the net sales, a new measure, ‘NetSalesUSD’, is created
using two pre-existing measures, ‘RevenueUSD’ and ‘DiscountUSD’. A crosstab is
generated using ‘NetSalesUSD’ and ‘CustDescr’. Also, a row subtotal is used to
find the total net sales of those customers.
• Filter: The data is filtered using ‘Division = AS’ and ‘CustDescr = Velodrom,
Ostseerad, Fahrpott, Drahtesel, and Rädlelland’.
• Rank: The customers are ranked using the ‘Top 5’ argument in the ranking panel.
• Findings:
- The total net sales of the top 5 customers with the highest gross margin in
AS division of Germany was 2.42 million USD (Fig 3.6).
- Among those 5 customers, Rädlelland had the highest net sales of 837.25
thousand, although its gross margin ratio was the lowest among them (Fig
3.5).

3.4. Conclusion and Personal Reflection


SAP Predictive Analytics is a great tool that can help with both regular business
analysis and heavier data science tasks (Hart, 2015). Compared to other available tools
used in data science, it is much easier to use, especially because it does not require
any coding expertise (Hart, 2015). Despite these advantages, I encountered a few
limitations compared to other similar applications.

Fig 3.6: Net sales of the top 5 customers

• The chart formatting options are harder to find in SAP Predictive Analytics than in
SAP Analytics Cloud and Tableau.
• There is no sorting option when the Trellis chart is used.
• SAP Predictive Analytics has a more limited colour palette than Tableau.
• Data label formatting is not user-friendly.
• It is not an ideal tool to construct business dashboards.

However, considering its other advanced features, including working with big data,
data mining techniques, and predictive model building, it should be a must-have
tool in the business world.

Chapter 4
SAP Analytics Cloud
4.1. Introduction
SAP Analytics Cloud is a platform that allows users to conduct business analysis,
augmented and predictive analysis, and planning in a common cloud environment
(SAP Analytics Cloud, n.d.). Because it is cloud-based, the tool can be used across
different devices. One of its most powerful functionalities is the ‘Smart Discovery’
option. This AI-driven approach helps analysts and executives make appropriate
data-driven decisions for future business planning. The key capabilities of this tool
are-
- The ‘Key Influencers’ are easy to find by using the Smart Discovery option.
- The ‘Simulation’ function provides the opportunity to find how a particular influencer
impacts the outcome and to do better future planning.
- The visualization capacity of this tool is very intuitive and strong.
- Being a cloud-based tool, it does not occupy the processing capacity of the local
computer.

Outline
This analysis will cover many of the functionalities of SAP Analytics Cloud.

Main Task
In this exercise, I am going to explore the ERP simulation game data and conduct some
analysis and data visualization. I will follow a particular pathway, which is presented
thematically below.

Pathway (diagram): Revenue, Product, Quantity Sold, Team, Distribution Channel, Area, Round, Day

4.2. Dataset

The ERPSIM dataset, provided during the class tutorial, is used in this analysis. This
dataset was generated by students at SAP University Alliances member schools, who
played the ERP Simulation Game that leveraged SAP HANA. The dataset contains 6558
tuples with 10 attributes, of which 3 are used as measures. The dimensions
include area, day, distribution channel, product, round, sales order, and team. The
measures are price, quantity, and revenue. Some transformations are done at the very
beginning; for instance, area names are changed from abbreviated form to full form
(e.g., “North” instead of “NO”), and the distribution channel is transformed from
numeric to nominal data (e.g., “10” replaced by “Hypermarkets”).
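
The recoding step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the transformation actually performed inside SAP Analytics Cloud. Only the two mappings quoted in the text ("NO" to "North", "10" to "Hypermarkets") come from the report; the field names `area` and `channel` are assumptions, and any other code is passed through unchanged.

```python
# Only these two mappings are stated in the text; other codes are left as-is.
AREA_NAMES = {"NO": "North"}
CHANNEL_NAMES = {"10": "Hypermarkets"}


def recode(row):
    """Return a copy of one record with abbreviated codes replaced by labels."""
    out = dict(row)
    out["area"] = AREA_NAMES.get(row["area"], row["area"])
    # Channel codes may arrive as numbers, so compare on their string form.
    code = str(row["channel"])
    out["channel"] = CHANNEL_NAMES.get(code, code)
    return out


# Hypothetical sample record (field names are assumptions).
sample = {"area": "NO", "channel": 10, "revenue": 100.0}
print(recode(sample))
```

Keeping the lookup tables separate from the function makes it easy to extend them once the remaining codes are known.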

4.3. Implementation of the tool and Findings

4.3.1. What were the top 5 revenue-earning products? Was there any relationship between
revenue and quantity sold?
• Pre-processing: From the dataset (ERPSIM.xlsx), the ‘Revenue’ is selected as a
measure and the ‘Product’ as the dimension. ‘Product’ is also used in the Color
parameter. The findings are sorted highest to lowest by revenue.
• Rank: The products are ranked by selecting the ‘Top 5’ option in the rank.

Fig 4.1: Top 5 Revenue earning Products

• Findings:
- The top 5 revenue-earning products were 500g Nut Muesli, 1kg Original
Muesli, 1kg Nut Muesli, 500g Raisin Muesli, and 500g Blueberry Muesli (Fig
4.1). Revenues for these products were 24.45, 22.86, 22.42, 18.38, and 15.94
million, respectively.
- There was a linear relationship between the revenue and the quantity sold
(Fig 4.2). Revenue increased as the quantity increased, which implies that the
average unit prices of the top 5 revenue-earning products were very similar.
Further analysis of the average unit price indicated that the highest average
unit price was 4.64 for 500g Blueberry Muesli and the lowest was 4.04 for
1kg Original Muesli.
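
The ranking and the unit-price check above can be sketched outside the tool as follows. This is a minimal illustration on hypothetical rows, not the ERPSIM data; the field names are assumptions, and the average unit price is computed here as total revenue divided by total quantity, which may differ slightly from a simple average of the ‘Price’ column.

```python
from collections import defaultdict


def summarize(rows, top_n=5):
    """Total revenue and quantity per product, ranked by revenue (descending)."""
    revenue = defaultdict(float)
    quantity = defaultdict(float)
    for r in rows:
        revenue[r["product"]] += r["revenue"]
        quantity[r["product"]] += r["quantity"]
    ranked = sorted(revenue, key=revenue.get, reverse=True)[:top_n]
    # Average unit price is quantity-weighted: total revenue / total quantity.
    return [(p, revenue[p], quantity[p], revenue[p] / quantity[p]) for p in ranked]


# Hypothetical rows, not taken from ERPSIM.xlsx.
rows = [
    {"product": "A", "revenue": 40.0, "quantity": 10},
    {"product": "B", "revenue": 90.0, "quantity": 20},
    {"product": "A", "revenue": 44.0, "quantity": 11},
    {"product": "C", "revenue": 12.0, "quantity": 3},
]
for product, rev, qty, unit_price in summarize(rows, top_n=2):
    print(product, rev, qty, round(unit_price, 2))
```

If unit prices are nearly equal across products, revenue plotted against quantity falls close to a straight line, which is the pattern reported in Fig 4.2.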
4.3.2. Which team had the highest market share of the highest selling product? What was
the revenue in different distribution channels for that product sold by that team?
• Pre-processing: The market share is represented using a pie chart. The revenue is
used as the measure, and the colour indicates the team.
• Filter: The data is filtered using ‘Product = 500g Nut Muesli’, as it was the highest-
selling product, and then ‘Team = NN’, as the highest-selling team.

Fig 4.2: Relationship between Revenue and Quantity Sold
• Findings:
- Team NN had the highest market share (21.63%) of the 500g Nut Muesli
(Fig 4.3), which was 5.29 million in value.
- Team NN earned the highest revenue from the Convenience Stores (3.83
million) distribution channel and earned 1.46 million from Grocery Chains
(Fig 4.3). However, they did not have any sales in Hypermarkets.
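
The market-share percentages are simply each team's revenue divided by the total for the filtered product. A minimal sketch, using hypothetical team revenues rather than the ERPSIM figures (the team names other than NN are placeholders):

```python
def market_share(revenue_by_team):
    """Return each team's share of total revenue, in percent."""
    total = sum(revenue_by_team.values())
    return {team: 100.0 * value / total for team, value in revenue_by_team.items()}


# Hypothetical revenues (in millions) for one product; only the idea is from the text.
shares = market_share({"NN": 5.0, "AA": 3.0, "BB": 2.0})
leader = max(shares, key=shares.get)
print(leader, round(shares[leader], 2))
```

The reported figures are internally consistent under this definition: 3.83 + 1.46 = 5.29 million across the two channels, and 5.29 million at a 21.63% share implies a product total of roughly 24.46 million, in line with the 24.45 million reported for 500g Nut Muesli.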

Fig 4.3: Team-wise Market Share (highest revenue earning product)

4.3.3. What were the revenues of different areas in each distribution channel? Which area
had the highest revenue?
• Pre-processing: In this step, further dicing of the previously found results is done.
Here the revenues are analyzed in each area for the previously found distribution
channels. A Marimekko chart is created using the ‘Area’ in colour, ‘Distribution
Channel’ in the dimensions, and ‘Revenue’ in the measures.

• Filter: The data is filtered using ‘Product = 500g Nut Muesli’, as it was the highest-
selling product, and then ‘Team = NN’, as the highest-selling team.
• Findings:
- The revenues of the North, South, and West areas in Convenience Stores were
1.73, 1.05, and 1.05 million, respectively. In the Grocery Chains distribution
channel, they were 0.37, 0.48, and 0.60 million, respectively. Interestingly,
North was the highest-earning area in Convenience Stores but the
lowest-earning area in Grocery Chains (1.73 vs 0.37 million).
- The North area in the Convenience Stores distribution channel had the
highest revenue (1.73 million) (Fig 4.4). This suggests that convenience stores in
the North area were the best place to sell 500g Nut Muesli (average unit price 4.17),
so they could also be a good place to sell other lower-priced products.

Fig 4.4: Revenue per Area and Distribution Channel (highest revenue product & team)

4.3.4. In which round did that area (from the previous question) have the highest revenue? What
was the highest revenue earning day for a particular product in that round? What were the
top 2 key influencers for that sale?
• Pre-processing: Here, the objective is to determine which was the best performing
round for the North area. The round and revenue are used in the dimension and
measures field, respectively. A bar chart is created, and a colour palette is used for
better visualization.
In the second part, the highest revenue earning day is found for that particular
round. A heatmap is generated to highlight both the day and the associated
product that earned the highest revenue.
• Filter: Previously used filters remain the same, and a new filter, ‘Area = North’, is
added.

Fig 4.5: Round-wise Revenue Trend (highest revenue area, distribution channel & team)

• Findings:
- Interestingly, the North area earned the highest revenue in rounds 7 and 8
(Fig 4.5). In both these rounds, the revenue was 0.92 million.
- Also, in these rounds, the highest combined revenue earned by a particular
product was on day 29 (Fig 4.6). Cumulatively, on day 29 of rounds 7 and 8,
500g Nut Muesli earned 93 thousand.

Fig 4.6: Highest Revenue earning Day for any Products (during top-earning Rounds)

- The top 3 key influencers are Quantity, Price, and Distribution Channel (Fig
4.7). Quantity has a strongly positive impact on the revenue, whereas
Price and Distribution Channel have a weakly positive impact. The revenue
can further increase by changing the distribution channel from Convenience
Stores to Grocery Chains.

Fig 4.7: Top 3 key influencers on the Revenue of 500g Nut Muesli

4.4. Conclusion and Personal Reflection


After using SAP Analytics Cloud, I realized the benefits and power of a cloud-based
platform. It provides the opportunity to use different advanced analytical techniques
and helps to create outstanding visualizations. Unlike the previously used analytical
tools, it makes formatting the chart area much easier. During this exercise, I
encountered only a few shortcomings, which are as follows.
• The ‘Smart Discovery’ function requires a large amount of data to provide insights into
the analysis. Without a sufficient volume of data, it cannot identify the
‘Key Influencers’, and the ‘Simulation’ function does not appear.
• I could not find any data mining techniques (e.g., classification, clustering, etc.) in
this tool. Such algorithms might be working in the backend, but how they work is
difficult to understand.
• Although it is a good tool for visualization, its charting options are limited
compared to Tableau and Power BI.

Every business user needs to adopt multiple tools to complete a business analysis
task. SAP Analytics Cloud is a complete package that enables analysts to perform basic
exploration, prediction, and presentation of data in a single platform (Anto, 2020). Also,
as the entire tool is cloud-based, there is no need to worry about the local
computer’s working capacity. Considering its powerful business analysis capability, it
should be a must-have tool for organizations.

Chapter 5
Tableau
5.1. Introduction
Tableau is one of the leading data visualization tools in the business intelligence field.
Due to its easy-to-understand, deployable, and highly scalable features, it has been
recognized as a Leader in the Gartner Magic Quadrant for many years (Ajenstat, 2021).
This tool can be deployed in the cloud or on-premise and can be used with data from
different sources, including file-based, web data, cloud-based, relational database, and
OLAP data sources. In addition to faster analysis and visualization, Tableau can help
with data collaboration, real-time data analysis, analyzing big data volumes, and add-on
scripting with other statistical languages (e.g., R or Python). Below are some features
that can justify why Tableau should be the tool of choice.
- Quick and interactive visualization: The simple drag and drop functionality and the
toggle between rows and columns have made the world of visualization much easier.
- Handling enormous size data: Tableau can handle millions of rows of data without
any problems (Technologies, 2018).
- Dashboard: Intuitive and eye-catching dashboards that can be used on different kinds
of devices (e.g., laptop, mobile, or tablet) can be produced with Tableau.
- Cutting-edge feature: Tableau allows the data to be connected with integrated AI
capability, which helps gain more insight and make effective data-driven decisions
(Tableau, 2021).

Outline
This analysis will mainly cover data exploration and visualization. No data manipulation
or predictive modeling will be conducted.

Main Task
In this exercise, the ERP simulation game data is re-used for further in-depth analysis
and visualization. The pathway is changed from the previous one (Section 4.1) and is
illustrated below.

Pathway (diagram): Price, Product, Quantity Sold, Revenue, Area, Distribution Channel, Round, Day

5.2. Dataset
In the previous analysis with SAP Analytics Cloud, I used the ERPSIM dataset and
conducted some analytical tasks following a particular pathway (Section 4.1). I felt that
more analysis could be done to gain more insights and business understanding by analyzing
that dataset from a different perspective. Hence, I decided to use the ERPSIM dataset
again in this chapter, but along a different pathway (illustrated above).

In the last part of the exercise, another dataset from the World Bank on CO2 emission is
used. This dataset contains four dimensions- country name, region, year, and CO2
emission per capita, and it has a total of 11127 tuples.

5.3. Implementation of the tool and Findings


5.3.1. What are the top products in terms of total quantity sold and average price?
• Pre-processing: ‘Price’ and ‘Quantity’ are used as the measures, and these are then
analyzed for all 12 ‘Products’. A scatterplot is created to find the products with the
highest quantity sold and the highest average price. It also helps to find the
relation between the average price and quantity sold.
• Aggregation: Sum of the ‘Quantity’ and average of the ‘Price’.

Fig 5.1: Product-wise Quantity & Avg. Price

• Filter: The top 4 products found in the first analysis are then filtered to see how the
average price and quantity sold change.
• Findings:
- The top 4 products with the highest quantity sold were 500g Nut Muesli,
1kg Original Muesli, 1kg Nut Muesli, and 500g Raisin Muesli (Fig 5.1), and
their average prices were 4.11, 4.04, 4.22, and 4.19, respectively.
- The maximum, minimum, and average prices across all 12 products were 5.08
(1kg Strawberry Muesli), 3.91 (500g Original Muesli), and 4.47, respectively. The
average quantity sold across all the products was 3.75 million (Fig 5.1).
- However, for the top 4 products, the average quantity sold increased to 5.66
million and the average price dropped to 4.14. The analysis further
revealed that when the product price was 4.22 or less, the quantity sold was
much higher. The difference in average price between 500g Nut Muesli
(4.11) and 1kg Nut Muesli (4.22) was only 0.11 (Fig 5.2).

Fig 5.2: Product-wise Quantity & Avg. Price (Top 4)

5.3.2. What is the top product among the previously found products? Which area earned the
highest revenue from the top products?
• Pre-processing: ‘Area’ is used as the dimension, and ‘Revenue’ and ‘Quantity’ are
used as measures. A stacked column chart is created to illustrate the revenue earned
and quantity sold (in %) for the top 4 products. It is further broken down by
‘Area’. The findings are sorted by revenue (descending).
• Filter: The data is filtered using ‘Product = 1kg Nut Muesli, 1kg Original Muesli,
500g Nut Muesli, and 500g Raisin Muesli’.
• Findings:
- 500g Nut Muesli is the top-selling product in terms of revenue (27.75%) and
quantity (27.46%).
- South area earned the highest total revenue from the top 4 products.
Interestingly, 1kg Original Muesli and 1kg Nut Muesli earned more revenue
than 500g Nut Muesli in this area. In contrast, 500g Nut Muesli earned the
most revenue in the North area (Fig 5.3).
5.3.3. Which is the highest revenue earning distribution channel in that area for all the top
products? What is the highest amount of revenue earned from one product in any
distribution channel?
• Pre-processing: The finding from the previous analysis is further diced in this step. The
highest revenue earning distribution channel (in terms of revenue %) is analyzed
and presented using a pie chart.
• Filter: ‘Product = 1kg Nut Muesli, 1kg Original Muesli, 500g Nut Muesli, and 500g
Raisin Muesli’ and ‘Area = South’.

Fig 5.3: Area-wise Revenue & Quantity (%) of Top 4 Products

• Findings:
- The highest revenue earning distribution channel was Grocery Chains
(36.89%), followed by Hypermarkets (36.14%) and Convenience Stores
(26.97%).

Fig 5.4: Revenue Market Share (%) of Distribution Channels of Top 4 Products in South area

- In round 7, 1kg Original Muesli earned the highest revenue from the
Hypermarkets distribution channel. Hypermarkets did not sell any 500g
category products, whereas Convenience Stores did not sell 1kg category
products (Fig 5.5).

Fig 5.5: Distribution Channel & Round-wise Revenue of Top 4 Products

5.3.4. What are the top CO2 emitting countries from 1991-2011?
• Pre-processing: The average CO2 per capita is considered as the measure to find
the highest CO2 emitting countries. A geographic map is created using Longitude
and Latitude.
• Filter: Two filters are incorporated – ‘Year’ = 1991-2011, and ‘Country Name’ is
filtered by the field where Avg. CO2 per capita >= 18.

Fig 5.6: Geographic representation of top CO2 (per capita) emitting countries

• Findings:
- The top CO2 (per capita) emitting countries were - Qatar, Kuwait, United
Arab Emirates, Aruba, Bahrain, Luxembourg, United States, and Brunei (Fig
5.6).
- Interestingly, the highest CO2 emitting cluster was in the Middle East
(marked in red), where 4 of the 8 countries (Qatar, Kuwait, United Arab
Emirates, and Bahrain) are located.
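
The two filters used here (a year range and a threshold on the average CO2 per capita) can be sketched in plain Python as follows. The record layout and field names are assumptions, and the sample values are illustrative only; they are not World Bank figures.

```python
from collections import defaultdict


def top_emitters(rows, start=1991, end=2011, threshold=18.0):
    """Countries whose mean CO2 per capita over [start, end] is >= threshold."""
    values = defaultdict(list)
    for r in rows:
        if start <= r["year"] <= end and r["co2_per_capita"] is not None:
            values[r["country"]].append(r["co2_per_capita"])
    means = {c: sum(v) / len(v) for c, v in values.items()}
    return sorted((c for c, m in means.items() if m >= threshold),
                  key=lambda c: means[c], reverse=True)


# Hypothetical records; values are illustrative, not World Bank figures.
rows = [
    {"country": "X", "year": 1990, "co2_per_capita": 50.0},  # outside the year range
    {"country": "X", "year": 1995, "co2_per_capita": 20.0},
    {"country": "X", "year": 2005, "co2_per_capita": 24.0},
    {"country": "Y", "year": 2000, "co2_per_capita": 10.0},
    {"country": "Z", "year": 2011, "co2_per_capita": 18.0},
    {"country": "Z", "year": 2012, "co2_per_capita": None},  # missing value
]
print(top_emitters(rows))
```

Note that the threshold is applied to the average over the whole period, so a country with a single very high year does not qualify unless its period mean reaches 18.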
5.3.5. What is the year-wise CO2 emission trend in the top countries from 1991-2011?

• Pre-processing: A line chart is created to show the trend using the average CO2
per capita as a measure and ‘Year’ as a dimension. Further, ‘Country Name’ is used
as the Color Marks for country-wise representation.
• Filter: Two filters are incorporated – ‘Year’ = 1991-2011, and ‘Country Name’ is
filtered by the field where Avg. CO2 per capita >= 18.

Fig 5.7: Year-wise CO2 (per capita) emission trend of the top emitting countries

• Findings:
- Among the top CO2 (per capita) emitting countries, Qatar ranked first,
emitting 44.02 metric tons of CO2 per capita in 2011, and it was on an
increasing trend. Brunei’s CO2 emissions showed a
similar pattern. However, the other countries were on a stagnant trend (Fig 5.7).
- Rapid urbanization and industrialization in these countries might be the
reason behind the high CO2 emissions.

5.4. Conclusion and Personal Reflection


Tableau is undoubtedly one of the most powerful, user-friendly, and fastest tools in the
world of analytics for uncovering hidden insights in data and supporting critical
decisions. Tableau could still improve on a couple of small things. First,
there is no direct way to do ranking in Tableau. Users need to use the rank function
instead, which is not as handy as in SAP Predictive Analytics or SAP Analytics Cloud.
Second, the built-in charting options are limited. For instance, there is no
pre-formatted Trellis chart in Tableau. In addition, Tableau Public does not allow work
to be saved locally; it must be published to a Tableau Public profile. This results in less
security because, as it is a public platform, other people can easily access the work. In
terms of data sources, the Public version does not provide direct access to databases
and is limited to MS Excel, Google Sheets, and different text files (Post, 2020).

Chapter 6
SAP Predictive Analytics for Data Mining
6.1. Introduction
The capability of SAP Predictive Analytics is not confined to basic data analysis and
visualization; it is also an ideal tool for data mining, predictive analysis, and
forecasts of future events. Besides the simple drag and drop visualization, its functional
areas include automated analysis, expert analysis, model and data management,
predictive scoring, and social media analytics (Uhlig, 2020). It mainly has two different
features for the data mining tasks, which are as follows.
- Automated Analytics: It automates the data preparation and predictive modelling, and
the user does not require formal knowledge of data science.
- Expert Analytics: This feature is ideal for data science experts. Different algorithms
can be easily used, and the embedded R programming language further facilitates
the functionalities of the algorithms.

Outline
In the previous chapter on SAP Predictive Analytics, the exploratory analysis and
visualization were performed. This chapter is focused on advanced data analysis and
data mining techniques, which include knowledge discovery, machine learning, and
prediction modeling.

Main Task
In this exercise, I am using 5 different data mining techniques on different datasets. The
techniques include clustering, association analysis, time-series analysis, regression
analysis, and classification trees.

6.2. Dataset

For this analysis, different datasets are used for different techniques. For instance,
information about the passengers on the Titanic is used in the association analysis
technique. A total of 5 different datasets are used, which are further described in the
Methodology and Analysis part. All the datasets were provided during the class tutorial.

6.3. Methodology and Analysis


6.3.1. Clustering
Dataset: A dataset that contains different information of 150 stores of a retail chain is
utilized to cluster the stores based on their attributes. The attributes in the dataset are-
store location (city), sales turnover, store size, staff size, and profit margin.
Methodology: The k-means clustering algorithm is used to find the clusters. The number of
clusters is set to 3. All the attributes (excluding store location) are used for the
analysis. After running, the algorithm creates another column called Cluster Number.
Question: Among the three clusters, which cluster has the highest density and how many
stores belong to that cluster? What are the sales turnover and profit margin in different

clusters? What strategy can increase the sales in the cluster with the highest number of
stores?
Finding: Cluster 1 has the highest density among the three clusters. Its darker colour
represents the high density (Fig 6.1). Also, the within-cluster sum of squares is the
lowest in cluster 1 (15.24), which concurs with the previous finding. In k-means
clustering, the lower the within-cluster sum of squares, the more similar the data points
in that cluster are. Out of 150 stores, 50 stores belong to cluster 1 (Fig 6.1). Clusters 2
and 3 have 38 and 62 stores, respectively.
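
To make the within-cluster sum of squares concrete, the following is a minimal sketch of plain k-means (Lloyd's algorithm) that reports this quantity per cluster. It is not the implementation used by SAP Predictive Analytics; the points, the two starting centroids, and the use of only two clusters and two unscaled attributes are assumptions chosen to keep the example short.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
            for p in points]


def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm; returns (centroids, labels, within-cluster SS list)."""
    centroids = list(centroids)
    for _ in range(iterations):
        labels = assign(points, centroids)
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    labels = assign(points, centroids)
    wss = [0.0] * len(centroids)
    for p, lab in zip(points, labels):
        wss[lab] += sq_dist(p, centroids[lab])
    return centroids, labels, wss


# Hypothetical 2-D points: a tight group and a looser group.
points = [(1.0, 1.0), (1.0, 1.5), (1.5, 1.0), (8.0, 8.0), (9.0, 9.0), (8.0, 9.0)]
centroids, labels, wss = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(labels, [round(w, 3) for w in wss])
```

The tighter group ends up with the smaller within-cluster sum of squares, which is the sense in which the report reads a low value as high similarity among the stores in a cluster.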

Fig 6.1: Highest density cluster and number of stores

Although cluster 1 has the highest density, the sales turnover and profit margin are the
lowest in this cluster.

Fig 6.2: Sales turnover and profit margin in different clusters

Fig 6.3 illustrates that cluster 1 has the smallest staff size compared to clusters 2 and 3.
Hence, by increasing the number of staff and reducing the store size, cluster 1 may
increase its sales turnover and profit margin.

Fig 6.3: Cluster center in terms of different measures

6.3.2. Association Analysis


Dataset: The dataset contains information about the passengers on the Titanic. There are
a total of 2201 tuples of passengers’ data on sex, age (category), passenger class, and
survival status.
Methodology: In this exercise, the Apriori algorithm is used to find out the association
among items. The outcome is presented using three metrics- confidence, support, and
lift. All the attributes are used in the analysis.
Question: What are the association rules derived on survivability based on the passenger
data on RMS Titanic? Which is the most dependable rule, and what is the rationale?
Finding: By applying the R-Apriori algorithm, we derived nine rules from the Titanic
passenger data, which are shown in Fig 6.4.

Fig 6.4: Association rules derived from the Titanic Data

The rule “{Class=1st, Sex=Female} => {Survived=Yes}” is the most dependable rule. It
has a confidence of 97%, along with a support of 0.06 and a lift of 3.01. Although the rule
“{Class=2nd, Age=Child} => {Survived=Yes}” has 100% confidence and a lift of 3.10, its
support is only 0.01, which is the lowest (Fig 6.5).
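
The three metrics can be reproduced from simple counts. The sketch below is illustrative only: the counts (145 first-class female passengers, 141 of them survivors, and 711 survivors overall among 2201 passengers) are assumed from the commonly distributed 2201-row Titanic table and were not read from the file used in this exercise.

```python
def rule_metrics(n_total, n_antecedent, n_consequent, n_both):
    """Support, confidence, and lift for a rule antecedent => consequent."""
    support = n_both / n_total                    # share of all rows matching both sides
    confidence = n_both / n_antecedent            # P(consequent | antecedent)
    lift = confidence / (n_consequent / n_total)  # confidence relative to the base rate
    return support, confidence, lift


# Counts assumed from the commonly distributed 2201-row Titanic table
# (145 first-class females, 141 of them survivors, 711 survivors overall);
# they are not read from the author's file.
support, confidence, lift = rule_metrics(2201, 145, 711, 141)
print(round(support, 2), round(confidence, 2), round(lift, 2))
```

Under these assumed counts the rounded values agree with the reported 0.06, 97%, and 3.01. The same function also shows why a rule with 100% confidence can still be less dependable: if it covers only about 1% of the passengers, it rests on very few cases.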

Fig 6.5: Bubble chart of the association rules

6.3.3. Time Series Analysis


Dataset: The dataset contains the date-wise (from 2007-2019) sales revenue of Global
Bike US. It has five attributes (year, month, date, revenue, and currency) and 171,010
tuples.
Methodology: The sales forecasting of Global Bike US is done using the time series
analysis method. For ease of calculation, Euros are converted to Dollars, and a year-month
time hierarchy is created. Two different methods were used for doing the time series
analysis – ‘predictive calculation’ in the Visualization tab and ‘time-series analysis’ in the
Predict tab. In both of these methods, the Triple Exponential Smoothing algorithm is
chosen as the forecast type, and the forecasting is done for 24 months.
Question: What is the US sales revenue forecast for the years 2020 and 2021 of Global
Bike? Compare different forecasting techniques using time series analysis.
Finding: Forecasting using two different methods resulted in slightly different output (Fig
6.6 & 6.7). For instance, according to the Visualization tab, the highest sales in 2021 will
be in June with a revenue of 13.81 M, whereas, in Predict tab, the revenue changed to
13.70 M.
Time series analysis using the Visualization tab is straightforward and easier to use.
On the other hand, the same analysis in the Predict tab provides more options for
customization (using alpha, beta, and gamma values). More importantly, it provides
information on statistical parameters (R² value, Goodness of Fit, f-value). In this analysis,
the alpha, beta, and gamma values were set to 0.2, 0.1, and 0.1, respectively, as this
resulted in the highest R² value (0.94) and Goodness of Fit (0.944). These are good
indicators of a well-fitting forecast model.
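
The role of alpha, beta, and gamma can be illustrated with a minimal sketch of additive triple exponential smoothing (Holt-Winters). This is not SAP Predictive Analytics' implementation: the initialisation from the first two seasons, the additive form, and the hypothetical quarterly series are all assumptions. Only the parameter values 0.2, 0.1, and 0.1 are taken from the text.

```python
def holt_winters_additive(y, m, alpha, beta, gamma, horizon):
    """Additive triple exponential smoothing with season length m.

    Returns (fitted, forecast): one-step-ahead fitted values for y and
    `horizon` future values. Needs at least two full seasons of data.
    """
    # Simple initialisation from the first two seasons (one common choice).
    level = sum(y[:m]) / m
    trend = (sum(y[m:2 * m]) - sum(y[:m])) / (m * m)
    season = [y[i] - level for i in range(m)]

    fitted = []
    for t, obs in enumerate(y):
        s = season[t % m]
        fitted.append(level + trend + s)  # forecast made before seeing obs
        prev_level = level
        level = alpha * (obs - s) + (1 - alpha) * (level + trend)   # alpha: level
        trend = beta * (level - prev_level) + (1 - beta) * trend    # beta: trend
        season[t % m] = gamma * (obs - level) + (1 - gamma) * s     # gamma: season

    n = len(y)
    forecast = [level + (h + 1) * trend + season[(n + h) % m] for h in range(horizon)]
    return fitted, forecast


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical quarterly series (season length 4), not Global Bike data.
pattern = [13.0, 9.0, 8.0, 10.0]
series = pattern * 3
fitted, forecast = holt_winters_additive(series, m=4, alpha=0.2, beta=0.1,
                                         gamma=0.1, horizon=4)
print([round(v, 2) for v in forecast], round(r_squared(series, fitted), 3))
```

For monthly data such as the series analysed here, the season length would be 12 and the horizon 24. Trying several alpha, beta, and gamma combinations and comparing the resulting R² is one way to mirror the tuning described above.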

Fig 6.6: Sales forecasting in Visualization tab

Fig 6.7: Sales forecasting in Predict tab

6.3.4. Regression Analysis


Dataset: The dataset contains the sales information of Global Bike with all the associated
variables. There are a total of 25 variables in the dataset, and the majority of them are
dimensions. The measures in this dataset are- cost, discount, revenue, and sales quantity.
Also, the dataset contains 171,010 rows or observations.
Methodology: For better inventory control, the sales quantity of Global Bike is predicted
using regression analysis. First, the regression model is created by
applying the Partitioning method and then the Auto Regression algorithm.
Here, the sales quantity is selected as the target variable. After running the algorithm, the
resulting model is saved. Finally, the sales quantity is predicted using the previously built
regression model.
Question: What will be the top 3 products in terms of predicted quantity sales? Which are
the top 2 contributing variables to predict quantities required to meet the sales demand
of Global Bike?

Finding: The top 3 products with the highest predicted sales quantity are Air Pump, Road
Bike Alu Shimano, and Men’s Off Road Bike Hard Tail SRAM, and their predicted sales
quantities are 111634, 66895, and 65874, respectively (Fig 6.8).

Fig 6.8: Top 3 products with highest predicted sales

The top 2 contributing variables to predict the quantities required to meet sales demand
are Product Description and City. Their contribution indices are 0.488 and 0.288,
respectively. The other variables are Category Description, Month, Sales Organization, and
Customer Description (Fig 6.9).

Fig 6.9: Top contributing variables for highest predicted sales

6.3.5. Classification Trees


Dataset: In this exercise, a dataset on equipment failure is used. The dataset has 22
attributes and 1716 tuples. Each of the tuples contains information about the equipment
type, overloads, age, equipment status, repairs, repairing cost, and so on.
Methodology: I am going to create a predictive model for the preventive maintenance of
different equipment. The historical equipment data will facilitate building the model of a
decision tree. This model will classify which equipment has a higher likelihood of having
failure based on its attributes.
The Automated Analytics feature of SAP PA is used in this exercise. The target variable is
the equipment status. The model is first trained using the existing data, and then it is
applied to build the decision tree. Different metrics of the model (Predictive Power,
Confidence, Profit/ROC curve) are evaluated to determine the accuracy, reliability, and
performance of the model.
Question: What are the top 4 influencers of the equipment failure (illustrate using a
decision tree)? What is the probability of failure with Overloads (60+), AssetType (Voltage
Transformer), PMLate (N), AvgRepairCost (20000), Repairs (Rebuild+1), & Age (20)?

Finding: The top 4 influencers are Overloads (60+), AssetType (Voltage Transformer),
PMLate (N), and Repairs (Rebuild+1) (Fig 6.10).

Fig 6.10: Top 4 influencers for equipment failure

The probability of failure of equipment with the stated values is 60%, with a score of status
of 0.28.

6.4. Conclusion and Personal Reflection


Unlike other data science tools such as RStudio and Python, SAP Predictive Analytics
requires no programming. The simple and user-friendly drag and drop
feature helps to build models. Specifically, the Automated Analytics feature helps light
users of data science tools build models, as it does not require setting up different
sophisticated parameters. Also, customizable R code can be easily incorporated through
the R interface (Uhlig, 2020).
However, its range of data science algorithms is limited. For instance, more
complex modelling and deep learning algorithms are not supported (Uhlig, 2020). Besides,
only R code can be integrated; Python is not yet supported,
although it has a larger number of users in the present data analytics world.
In a nutshell, SAP Predictive Analytics is a very handy and quite powerful tool for business
users, which can solve most day-to-day business needs.

Chapter 7
SAP HANA
7.1. Introduction
SAP HANA is a column-oriented, in-memory database system that uses RAM to store
compressed data rather than storing it on disk drives. Enterprises use this system
because it enables analysts to query large amounts of data in real time. Due to its
in-memory storage, the system can work faster than other tools during information
extraction (Eliav, 2021). Its faster processing capacity and real-time output can help a
business stay ahead of the competition by helping decision-makers make the right
decision at the right time.

Outline
Two tools will be utilized in this exercise: SAP HANA for data modeling and SAP Predictive
Analytics for data analysis and visualization. The main focus is creating a data model and
populating the model with appropriate information from different flat files.

Main Task
In this exercise, I will create three different database tables by using appropriate table
definitions in the SAP HANA environment. Then the tables will be populated with the
necessary contents from .csv files. The tables will be joined using a star schema. A few
calculated key measures will also be created for further analysis. Finally, some data
analysis will be done using SAP Predictive Analytics.

7.2. Dataset

The data tables that are modelled in the SAP HANA environment use the customer,
product, and sales data of Global Bike Inc (GBI). The flat files of these datasets are
already located on the SAP HANA server, and the data are integrated into the tables
using the Smart Data Integration Tool. The customer, product, and sales datasets have 7,
8, and 12 attributes, respectively. The attribute CUSTOMER_NUMBER is the primary key of
the customer table and is used to join it to the sales table during the star-joining phase.
Likewise, PRODUCT is the primary key for the referential join between the product and
sales tables.

7.3. Methodology
In this exercise, the sales data of Global Bike Inc. (GBI) is analyzed using SAP Predictive
Analytics. Before the analysis, the data model is prepared in the in-memory database
SAP HANA. The step-wise methodology followed in SAP HANA is discussed first, followed
by the analysis in SAP Predictive Analytics.
7.3.1. Step-1: Creating the database table
Three different tables are defined for the customer, product, and sales transaction
master data. The customer and product tables are shown in Fig. 7.1 and 7.2, respectively.

Fig 7.1: Customer attribute table

Fig 7.2: Product attribute table

7.3.2. Step-2: Data provisioning


In this step, data flows are created to extract data from flat files (of customer, product,
and sales) and transfer that into the database tables. The process is illustrated in Fig. 7.3.

Process diagram (step labels as far as they can be recovered from the figure): SAP HANA Catalogue; Adding Virtual Tables; Creating Data Flowgraph Model; Establishing Provisioning Connection; Executing Data Flowgraph Model; Creating Data Model for Product and Sales Data

Fig 7.3: Data Provisioning Process

7.3.3. Step-3: Creating calculation view


A calculation view is created for each of the customer, product, and sales data sets. Here,
the process of creating the calculation view for the customer data is illustrated (Fig. 7.4).
For the product data, the calculation view is created in the same way as for the customer
data. For the sales data, however, a star schema is created that combines the customer and
product data with the sales data, and the calculation view is built with a star join among
these tables. First, a calculated column (NET_REVENUE) is created by subtracting DISCOUNT
from REVENUE. Then the customer table is joined to the sales table on CUSTOMER_NUMBER
with a referential join, and the product table is joined on PRODUCT, also with a referential
join (Fig 7.5). All fields are mapped except CUSTOMER_NUMBER and PRODUCT. The fields in
the fact table are then assigned the correct type (dimension or measure). Once this step
has been executed, the analysis can be carried out.

[Fig 7.4 panel labels: The process starts in SAP HANA Editor; Adding the customer table,
mapping, and joining country & sales organization table; Mapping for the country and sales
organization table; Executing the steps and checking the final output table; Created
calculation view for customer, product, and sales data]

Fig 7.4: Process of creating calculation view for the customer table

Fig 7.5: Star join among the data tables
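
The star join and the calculated column can be sketched outside SAP HANA as follows. This is an illustration under stated assumptions, not the calculation view itself: SQLite stands in for SAP HANA, an ordinary inner join stands in for the referential join, the view name SALES_STAR and the columns CITY, COUNTRY, PRODUCT_NAME, and YEAR are hypothetical, and the rows are invented toy data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CUSTOMER (CUSTOMER_NUMBER INTEGER PRIMARY KEY, CITY TEXT, COUNTRY TEXT);
CREATE TABLE PRODUCT  (PRODUCT TEXT PRIMARY KEY, PRODUCT_NAME TEXT);
CREATE TABLE SALES    (CUSTOMER_NUMBER INTEGER, PRODUCT TEXT, YEAR INTEGER,
                       REVENUE REAL, DISCOUNT REAL);

-- Toy rows invented for illustration; not GBI data.
INSERT INTO CUSTOMER VALUES (1, 'City A', 'Country X'), (2, 'City B', 'Country Y');
INSERT INTO PRODUCT  VALUES ('P1', 'Bike One'), ('P2', 'Bike Two');
INSERT INTO SALES    VALUES (1, 'P1', 2010, 100.0, 10.0),
                            (2, 'P2', 2011, 200.0, 0.0);

-- Fact table in the centre, one join per dimension table.
-- The join keys are used for joining only and are not exposed in the output.
CREATE VIEW SALES_STAR AS
SELECT c.CITY, c.COUNTRY, p.PRODUCT_NAME, s.YEAR,
       s.REVENUE, s.DISCOUNT,
       s.REVENUE - s.DISCOUNT AS NET_REVENUE   -- calculated column
FROM SALES AS s
JOIN CUSTOMER AS c ON c.CUSTOMER_NUMBER = s.CUSTOMER_NUMBER
JOIN PRODUCT  AS p ON p.PRODUCT = s.PRODUCT;
""")

rows = conn.execute(
    "SELECT CITY, PRODUCT_NAME, NET_REVENUE FROM SALES_STAR ORDER BY CITY"
).fetchall()
print(rows)
```

In this sketch REVENUE, DISCOUNT, and NET_REVENUE play the role of measures, while CITY, COUNTRY, PRODUCT_NAME, and YEAR play the role of dimensions.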

7.4. Analysis using SAP Predictive Analytics
7.4.1. Which country earned the most net revenue in GBI? Which are the top 5 cities in terms
of net revenue in that particular country?
The USA earned the most net revenue for GBI (51.6%), whereas Germany earned 48.4%. The
top 5 cities in the USA were Boston, Palo Alto, Chicago, New York City, and Denver.

Fig 7.6: Top net revenue earning country and cities for GBI
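
The two aggregations behind this finding (share of net revenue per country, then a ranking of cities within the leading country) were produced in SAP Predictive Analytics. A minimal Python sketch of the same logic is given below. The function names are hypothetical and the rows are invented toy data, so the printed numbers do not correspond to the GBI results.

```python
from collections import defaultdict

# Toy (country, city, net_revenue) rows invented for illustration; not GBI data.
rows = [
    ("Country X", "City A", 60.0),
    ("Country X", "City B", 30.0),
    ("Country X", "City A", 10.0),
    ("Country Y", "City C", 80.0),
]


def share_by_country(rows):
    """Return each country's percentage share of total net revenue."""
    totals = defaultdict(float)
    for country, _city, net in rows:
        totals[country] += net
    grand_total = sum(totals.values())
    return {c: round(100 * v / grand_total, 1) for c, v in totals.items()}


def top_cities(rows, country, n=5):
    """Return up to n cities of one country, highest net revenue first."""
    totals = defaultdict(float)
    for c, city, net in rows:
        if c == country:
            totals[city] += net
    return sorted(totals, key=totals.get, reverse=True)[:n]


shares = share_by_country(rows)
leader = max(shares, key=shares.get)
print(shares, top_cities(rows, leader))
```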

7.4.2. Show the year-wise net revenue trend in the top 5 cities. Which products are driving
the net revenue?
As the previous analysis showed, Boston was the leading city, followed by Palo Alto.
However, the year-wise trend reveals that Palo Alto earned no revenue after 2010 and that
Boston was on a decreasing trend. Chicago, by contrast, gained revenue momentum from 2009
and earned almost the same revenue as Boston in 2011. This indicates that Chicago could
overtake the other cities in net revenue in the future (Fig 7.7).

Fig 7.7: Year-wise Net Revenue trend of top 5 cities in the USA
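
A year-wise trend of this kind amounts to pivoting net revenue by city and year and then comparing the values over time. The sketch below shows one simple way to do that in Python. The first-versus-last-year comparison is an assumption chosen for illustration, not the method used by SAP Predictive Analytics, and the rows are invented toy data.

```python
from collections import defaultdict

# Toy (city, year, net_revenue) rows invented for illustration; not GBI data.
rows = [
    ("City A", 2009, 50.0), ("City A", 2010, 40.0), ("City A", 2011, 30.0),
    ("City B", 2009, 10.0), ("City B", 2010, 20.0), ("City B", 2011, 30.0),
]


def yearly_trend(rows):
    """Pivot rows into {city: {year: total net revenue}}, years ascending."""
    trend = defaultdict(lambda: defaultdict(float))
    for city, year, net in rows:
        trend[city][year] += net
    return {city: dict(sorted(years.items())) for city, years in trend.items()}


def direction(series):
    """Label a {year: value} series by comparing its first and last year."""
    years = sorted(series)
    first, last = series[years[0]], series[years[-1]]
    if last > first:
        return "increasing"
    if last < first:
        return "decreasing"
    return "flat"


trend = yearly_trend(rows)
for city, series in trend.items():
    print(city, series, direction(series))
```

In the toy data, City B catches up with City A in the final year, which is the same kind of pattern described above for Chicago and Boston.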

The top products driving the net revenue in the top cities were Road Bike Carbon Shimano,
Professional Touring Bike Silver, Men’s Off Road Bike Fully, Deluxe Touring Bike Silver,
Road Bike Alu Shimano, and Men’s Off Road Bike Hard Tail SRAM (Fig 7.8).

Fig 7.8: Top products driving Net Revenue in the top 5 cities

7.5. Conclusion and Personal Reflection


SAP HANA is a very useful tool for real-time analysis and decision-making, and its fast
processing saves considerable time. Traditional database technologies can handle only one
type of processing (OLTP or OLAP) at a time, whereas SAP HANA can run several processes in
parallel (SAP HANA Pros and Cons, 2019).
However, SAP HANA requires technical competence in some cases. Specifically, data
modelling requires a proper understanding of the schema and its properties, and
defining the tables requires appropriate coding expertise (Eliav, 2021). These are some
of the shortcomings that I experienced in using SAP HANA. Nevertheless, as businesses
now require fast, real-time data processing, SAP HANA is an ideal solution, and its
advantages outweigh its limitations.

References
Ajenstat, F. (2021, February 18). 9 years a leader in Gartner magic quadrant for analytics
and business intelligence platforms. Retrieved March 12, 2021, from
https://www.tableau.com/about/blog/2021/2/tableau-9-years-leader-gartner-magic-quadrant
Anto, A. (2020, August 28). An overview of SAP Analytics Cloud. Retrieved April 9, 2021,
from https://www.zarantech.com/blog/an-overview-of-sap-analytics-cloud/
Eliav, R. (2021, March 08). What is SAP HANA? Retrieved April 08, 2021, from
https://www.panaya.com/blog/sap/what-is-sap-hana/
Hart, M. (2015, August 16). SAP Predictive Analytics - benefits and features. Retrieved
April 09, 2021, from
https://sap.walkme.com/sap-predictive-analytics-benefits-and-features/
O'Donnell, J. (2016, February 29). What is SAP Predictive Analytics? - definition from
whatis.com. Retrieved March 11, 2021, from
https://searchsap.techtarget.com/definition/SAP-Predictive-Analytics
Post, T. (2020, July 19). Tableau public: Pros and CONS (Straight Talk Review). Retrieved
April 09, 2021, from
https://dashboardfox.com/blog/tableau-public-pros-and-cons-straight-talk-review/
SAP. (2021). SAP Lumira - Data Visualization and Analytics Software.
https://www.sap.com/canada/products/lumira.html.
SAP Analytics Cloud: BI, planning, and predictive analysis tools. (n.d.). Retrieved March
12, 2021, from https://www.sap.com/canada/products/cloud-analytics.html
SAP Business Objects vs SAS Business Intelligence: Who’s the winner? (2021, February
03). Retrieved April 09, 2021, from
https://www.experfy.com/blog/bigdata-cloud/sap-business-objects-vs-sas-business-intelligence-comparison/
SAP HANA Pros and Cons - the good and the bad of database technology. (2019, May
18). Retrieved April 09, 2021, from
https://data-flair.training/blogs/sap-hana-pros-and-cons/
SAP Help Portal. (2021). SAP Business Explorer: BEx Query Designer.
https://help.sap.com/viewer/73e6551e26244281884fd2fa36cdb678/7.5.7/en-US/9d76563cc368b60fe10000000a114084.html.
Self-Service BI tools - An in-depth COMPARISON -TABLEAU vs Power BI Vs Qlik sense vs
Tibco Spotfire Vs SAP Lumira Vs SAP Analytics Cloud. (2020, March 17). Retrieved
April 09, 2021, from
https://visualbi.com/blogs/business-intelligence/data-discovery/self-service-bi-tools-comparison-tableau-power-bi-qlik-spotfire-sap-lumira-sap-analytics-cloud/

Tableau. (n.d.). Business intelligence and analytics software. Retrieved March 12, 2021,
from https://www.tableau.com/#platform
Technologies, M. (2018, November 13). Why Tableau is the best BI tool? Retrieved
March 12, 2021, from https://mindmajix.com/tableau-best-bi-tool
Uhlig, S. (2020, May 20). Machine learning with sap predictive analytics - possibilities
and limitations. Retrieved April 08, 2021, from
https://www.nextlytics.com/blog/machine-learning-with-sap-predictive-analytics
