
2021

Huskie Motor
Corporation
INF20016 CASE STUDY ANALYSIS
BRONSON JOHNSON (102094694)
CUONG NGUYEN (102840305)
OSKAR MAIER (102582319)
VAN HOANG (102578923)
Executive summary

This report analyses the performance of Huskie Motor Corporation (HMC) using the data the company
has collected. HMC is an emerging force in the automotive manufacturing industry and has built a
strong brand and sales performance globally, particularly in North America (especially the USA). The
report breaks down that performance by brand, model and sales channel. It then turns the analysis
into insights, such as a recommendation to leave the Canadian market, based on a four-quarter
advance forecast of sales volume and contribution margin. The report also covers other critical
issues, such as consent and privacy in the collection of data, and the errors that had to be cleansed
from the dataset before analysis. Ultimately, the report aims to provide actionable intelligence for
HMC to evolve its business.

Contents
Executive summary
Introduction
Business Overview
Dataset problems
Current situations and Forecast
  Overall performance Analytics
    How is HMC performing globally?
    How are various HMC brands performing?
    How are the various sales channels performing?
    What are the most and least profitable models?
  Financial analytics
    Current contribution margin per model
    Average variable cost per model, and how has that changed over time
    Which model has the most variability in variable costs
    What is the current CM per channel?
  Operations Analytics
    Best performing and low performing models
    Days each model spent on lot prior to sale
  Forecasting analytics
    Recommendations/insights from forecast
Other Critical Issues
Conclusion
References
Appendix
  Table 1: List of errors on the dataset of HMC operations
  Steps taken in the data cleaning process

Introduction

The main objective of this report is to apply Tableau to the data of a large corporation, Huskie
Motor Corporation (HMC), in order to gain greater insight from the data and to make overall
performance and the profitability of the brands HMC manages easier to understand. Incorporating
analysis tools such as Tableau streamlines these processes and, in turn, informs the direction of
HMC and its profitability for shareholders.

The report highlights problems within HMC's existing database that hinder understanding of the data
when analysing figures. Through a range of graphics built in Tableau, the cleaned data is then turned
into a digestible amount of useful information that drives home the key ideas and highlights factors
that may need to be improved. The report concludes with a proposed approach for the future.

Business Overview

Huskie Motor Corporation manufactures motor vehicles, successfully managing three brands with
several models and multiple segments. This process creates a large amount of data, which can be
recorded and stored to help future planning and decision making. Through forecasts and data
analysis, HMC can plan production schedules more accurately and monitor the corporation's market
share in each country it operates in. HMC has created a very precise method of collecting and
recording data based on each vehicle's Vehicle Identification Number (VIN). A VIN is allocated to
every vehicle produced and acts as the key item in collecting data.

It is important to note that the company is not new but a “spin-off” of the older company Blue
Diamond Automotive. It has a relatively new structure and has accumulated a great deal of 2019-2020
sales data from the processes described above. With such data, dealers can put their own in-house
datasets to use with a big data platform in order to optimize their sales, revenue streams,
marketing, margins, operating expenses and more (izmocars 2021). However, the data collected had
issues with accuracy, or veracity, which often go unnoticed in a dataset of this size and can reduce
its quality (Boyd & Crawford 2012, p.669; Sheng, Amankwah-Amoah & Wang 2019, p.321). This report
cleanses the dataset and performs big data analytics to provide a read on current performance and a
view into the future through a forecasting analysis.

Dataset problems

Numerous errors were found in the HMC dataset. The most serious problems are listed and explained
below, along with how each was mitigated in the data cleaning process.

1. There are 3 duplicated values (6 rows) in the VIN # field (see figure 3.1), which indicates a
data entry error, since this field must be unique for each car. To resolve this, after finding the
duplicates in Tableau by scrolling down the column, we deleted both rows of each duplicated pair.

Figure 3.1: Example of a duplicate value in the dataset.

2. The variable cost value is wrongly calculated for all cars (see figure 3.2). This error also
affects other values in the dataset, such as contribution margin, net revenue and after-tax figures.
Therefore, we created new columns to recalculate these values and deleted the old columns.

Figure 3.2: Filtering the ‘Total Variable Cost’ column for values that match the calculated result;
none were found.

3. Some car segments, such as mid-size luxury, sport coupe and micro, are not in the initial
definition. To resolve this, we merged full-size, mid-size and entry-level luxury into one segment
(luxury). We merged sport coupe with sport utility, because the sport utility coupe has emerged as a
type of coupe with features from sport utility vehicles (Elliot 2020). We removed the micro cars,
since this type of car is smaller than a subcompact car (Fullard 2015) and so could not be merged
with any other segment.

Figure 3.3: The result after fixing the segment column.

A more in-depth list of the errors in the dataset is given in Table 1 in the appendix.
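The three fixes above were carried out in Tableau and Excel. Purely as an illustration, the same logic can be sketched in plain Python. The field names, VINs and cost figures below are hypothetical; only the segment names and the variable cost components (labour, material, overhead, freight and warranty, as listed in Table 1) come from this report.

```python
from collections import Counter

# Assumed components of Total Variable Cost (hypothetical field names).
COST_FIELDS = ("labour", "material", "overhead", "freight", "warranty")

# Segment fixes described in the text: luxury variants merge into "Luxury",
# "Sport Coupe" merges into "Sport Utility", and "Micro" rows are removed.
SEGMENT_MAP = {
    "Full-size Luxury": "Luxury",
    "Mid-size Luxury": "Luxury",
    "Entry-level Luxury": "Luxury",
    "Sport Coupe": "Sport Utility",
}
DROPPED_SEGMENTS = {"Micro"}


def make_row(vin, segment, labour, material, overhead, freight, warranty):
    """Build one illustrative sales record (all values are made up)."""
    return {"vin": vin, "segment": segment, "labour": labour, "material": material,
            "overhead": overhead, "freight": freight, "warranty": warranty}


def clean(rows):
    """Apply the three cleaning steps and return new, corrected rows."""
    vin_counts = Counter(row["vin"] for row in rows)
    cleaned = []
    for row in rows:
        if vin_counts[row["vin"]] > 1:          # drop every row of a duplicated VIN
            continue
        if row["segment"] in DROPPED_SEGMENTS:  # drop undefined segments that cannot be merged
            continue
        fixed = dict(row)
        fixed["segment"] = SEGMENT_MAP.get(row["segment"], row["segment"])
        # Recompute the variable cost from its components instead of trusting the old column.
        fixed["total_variable_cost"] = sum(row[field] for field in COST_FIELDS)
        cleaned.append(fixed)
    return cleaned


rows = [
    make_row("VIN001", "Full-size Luxury", 100.0, 200.0, 50.0, 20.0, 10.0),
    make_row("VIN002", "Sport Coupe", 90.0, 180.0, 40.0, 20.0, 10.0),
    make_row("VIN002", "Sport Coupe", 90.0, 180.0, 40.0, 20.0, 10.0),  # duplicated VIN
    make_row("VIN003", "Micro", 60.0, 120.0, 30.0, 15.0, 5.0),
    make_row("VIN004", "Compact", 80.0, 150.0, 35.0, 15.0, 8.0),
]
cleaned = clean(rows)
for row in cleaned:
    print(row["vin"], row["segment"], row["total_variable_cost"])
```

Dropping every row of a duplicated VIN, rather than keeping one copy, mirrors the choice described in item 1; keeping the first occurrence would be an equally simple alternative.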

Current situations and Forecast

Overall performance Analytics

How is HMC performing globally?


Analysis of the dataset provided by HMC shows that the company's global performance varies from
region to region. When evaluating total profit after taxes and operational costs, certain regions
clearly dominate global sales. The North American region, comprising the United States, Canada and
Mexico, provides nearly 60% of the company's profit (see figure 4.1.1). The highest performing
country by far was the United States, which contributed 41% of total profit. Europe was the next
most profitable region, with nearly 22% of total profit and a steady 2.9%-4.1% generated by each
country. Finally, South America was the least profitable market, providing about 19% of the
company's profit.

Looking at tariff rates across the countries (excluding Mexico as an outlier), countries with a
higher tariff rate tend to perform worse, and vice versa. For example, Argentina's tariff rate is one
of the highest (12.58%), but its contribution to profit is the lowest at only 1.9%. Therefore, it
would be wise for HMC to shift its business from high-tariff countries to lower-tariff ones
(particularly in North America and Europe, as in figure 4.1.2).

Figure 4.1.1. Proportion of profit for countries and regions.

Figure 4.1.2. Relationship between tariff rate and profit.

How are various HMC brands performing?
HMC offers customers a choice between three brands: Apechete, Jackson and Tatra. Analysis of brand
performance indicates that the most profitable brands are Apechete and Tatra, each contributing
around 39% of the company's profit. Jackson is the lowest performing brand, bringing in only 22% of
HMC's global profit. Further analysis revealed a striking statistic for the Tatra brand: the data
shows that 36% of the 39% of profit generated was made in North America (see figure 4.1.3).

Any recommendation about brand sales should focus on this anomaly in Tatra sales. The brand sold a
total of 127 cars in Europe and South America combined, compared with 1006 in North America. The
recommendation is to drop Tatra sales in Europe and South America, which would free up capital of
$2,345,166 (see figure 4.1.4) to push into the more profitable brands or the North American Tatra
infrastructure.

Figure 4.1.3: Percentage of total brand contribution after-tax broken down by region

Figure 4.1.4: Variance in percent of total contribution after-tax, sales volume and variable
cost per brand for each region

How are the various sales channels performing?


In terms of sheer sales volume, the fleet option in sales channel L1 performed best, with a total of
1010 cars sold (figure 4.1.6). Breaking the sales down further, in sales channel L2 the
Employee/Partner programs are the peak performer, with a sales volume of 571 cars (figure 4.1.6).
Going deeper still, sales channel L3 shows that leasing is the most popular option, with 901 cars
leased to customers (figure 4.1.6). These statistics may seem disconnected, but the three channel
levels are directly linked. One insight that can be drawn from them is: “of the customers who choose
the fleet option they typically choose to lease a car through an Employee/Partner program.”

The statistics above are measured in the number of cars sold, leased or rented, but when making
recommendations it is important to look at profitability rather than sales volume. Retail, rental
and leasing are the most profitable dimensions across the three sales channels. The fleet option,
even though it sells the most cars (1010 in total), contributes only 26% of the company's total
profit. A sensible recommendation is to shift marketing spend away from fleet and government
customers towards the other customer groups, as fleet and government have the highest variable cost
and bring in the least profit (figure 4.1.5).

Figure 4.1.5: Profit vs marketing of sales channel L1

Figure 4.1.6. Sales volume of all 3 performance channels

What are the most and least profitable models?


HMC currently offers 15 different models. In terms of profitability, all but 2 models are generating
a profit (figure 4.1.7). The most profitable is the Advantage model, which brings in a profit of
$4,136,753 (figure 4.1.7), around 27% of the company's total profit. On the other hand, Jespie is an
unprofitable model, with a loss of $400,629 in total earnings (figure 4.1.8), which reduces the
company's total profit by almost 3%.

As a quick way to improve profit by $640,986.55, HMC could stop selling Jespie and Mortimer.
However, Jespie is the peak performer in the number of cars sold globally. Further investigation
shows that its loss stems from a very high total variable cost of $4,377,573, nearly $800,000 more
than that of the Advantage model. Ultimately, if HMC cannot reduce Jespie's labour, materials,
overhead, freight and warranty costs, the best course of action would be to scrap the model.

Figure 4.1.7 Total percent of contribution for each model in the profit.

Figure 4.1.8: Performance analysis by profit, sales volume, marketing cost and variable cost.

Financial analytics

Current contribution margin per model


A graph of the contribution margin (CM) for all models (figure 4.2.1) displays a large spread
between models and brands. Within all 3 brands, some models have a significantly higher CM than the
rest. “CM is the difference between total revenues (TR) and total variable costs (TVC) or it is
defined by the sum of contribution margins (CM) of individual products” (Potkány 2009). Further
analysis shows that the models with the highest CM are the models with the highest sales volume. The
differences in CM may also be due to the different variable costs incurred on each model, in
addition to the large range of prices across models and variants.

Figure 4.2.1: The contribution margin for each model
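To make the definition concrete, the sketch below sums CM = TR - TVC per model over individual sales records. The model names appear in this report, but the amounts are made up, and treating net revenue as the revenue term is an assumption rather than something the dataset definition confirms.

```python
from collections import defaultdict

# Hypothetical sales records: one entry per car sold (amounts are illustrative only).
sales = [
    {"model": "Advantage", "net_revenue": 50000.0, "total_variable_cost": 32000.0},
    {"model": "Advantage", "net_revenue": 48000.0, "total_variable_cost": 31000.0},
    {"model": "Jespie", "net_revenue": 12000.0, "total_variable_cost": 12500.0},
]


def contribution_margin_by_model(records):
    """Sum (revenue - variable cost) over all cars of each model."""
    totals = defaultdict(float)
    for record in records:
        totals[record["model"]] += record["net_revenue"] - record["total_variable_cost"]
    return dict(totals)


cm_by_model = contribution_margin_by_model(sales)
print(cm_by_model)  # {'Advantage': 35000.0, 'Jespie': -500.0}
```

Because the per-model figure is a sum over cars, a model can rank highly simply by selling many units, which is consistent with the observation that high-CM models tend to be high-volume models.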

Average variable cost per model, and how has that changed over time
From 2019 to 2020, there was very little change in variable cost per model. Each model within every
brand incurred a slight change in variable cost, which may be due to changes in the selling process
and in the prices of materials in different markets. However, within the Apechete brand (figure
4.2.2), the Crux model shows a large decrease of 48% in its variable cost. This may be due to a
decrease in the sales volume of the Crux in the later year. For all other models, the average
variable cost has been largely consistent, with no large changes.

Figure 4.2.2: Average variable cost per model.

Which model has the most variability in variable costs
Figure 4.2.3 displays a box plot of the real variable cost per model. It shows that some models have
a higher variance, which may be due to the wide range of options offered per model. For most models
there is a long upper end: a large number of vehicles have a lower variable cost, while a smaller
number of cars were sold with a higher variable cost.
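A box plot compares spread visually; the same comparison can be made numerically with the interquartile range (IQR), which is the height of each box. The sketch below is a minimal illustration with made-up cost values for two models named in this report, not figures from the HMC dataset.

```python
import statistics

# Hypothetical variable costs per car, grouped by model (illustrative values only).
variable_costs = {
    "Advantage": [30000, 31000, 32000, 45000],
    "Summit": [20000, 20500, 21000, 21500],
}


def interquartile_range(values):
    """Return Q3 - Q1, the spread covered by the box in a box plot."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


iqr_by_model = {model: interquartile_range(costs) for model, costs in variable_costs.items()}
most_variable = max(iqr_by_model, key=iqr_by_model.get)
print(iqr_by_model, most_variable)
```

The IQR is less sensitive to a handful of very expensive, heavily optioned cars than the standard deviation, so it matches what the box itself shows rather than the whiskers.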

What is the current CM per channel?


Figure 4.2.4 shows the aggregate contribution margin of each sales channel to HMC. All three sales
channels have roughly similar contribution margins (approximately $33 million). However, the retail
sale of HMC vehicles is one of the highest categories for contribution margin, as retail has the
largest sales volume. This can be linked with sales channel 3, where the largest market is the
leasing of vehicles, which gives potential customers the opportunity to use an HMC vehicle. This
option allows HMC to access a very large market with a positive outcome.

Figure 4.2.4: Contribution margin for different sales channels.

Operations Analytics

Best performing and low performing models


The best-selling models are shown in figure 4.3.1, with the Jespie model being the highest selling
model by sales volume, with 350 cars sold. The next two models were the Pebble from the Apechete
brand and the Crux from the Jackson brand, making up the company's top three best sellers. On this
data, these models can be considered the most popular among consumers. HMC should therefore focus
its marketing on these models and stock more of them, to ensure there are always cars of those
models available to sell.

The lowest selling model was the Rebel from the Jackson brand with a sales volume of 65, followed by
the Robin from Apechete and the Mortimer from Tatra. This data warrants further investigation into
why these models are performing so poorly. From this analysis alone, however, HMC could either stock
fewer of these models to make room for more popular ones, or remove them from its inventory
completely.

Figure 4.3.2: 4 lowest selling models of HMC.

Days each model spent on lot prior to sale


To better understand how long each model spends on the lot, the data was analysed using the average
number of days on the lot per model. Averaging smooths out external factors, such as public
holidays, that can affect the time a car spends on a lot. As shown in the chart (figure 4.3.3), the
Summit model spends the fewest days on the lot, which suggests that it is popular amongst customers;
however, its sales show that it is actually one of the lowest performing models. From this we can
infer that the model is more popular among customers who browse the car lot before making a
purchase, hence its lower sales volume and fewer days on the lot. This is because few customers go
to purchase a car without a clear idea of what they want, relying on first impressions.

Figure 4.3.3: The average day spent on a lot for each model.

Forecasting analytics
As shown in figure 4.4.1, sales volume is likely to rise from 252 cars in quarter 4 (Q4) of 2020 to
380 cars in Q1 of 2021. It is then projected to decline by just over 40% to 225 cars in the next
quarter, recover to 294 cars in the quarter after that, and fall slightly by the end of 2021.

Figure 4.4.1: The sales volume forecast for 2021.
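The quarter-on-quarter changes quoted for this forecast can be checked with simple arithmetic. The sketch below uses only the car counts stated in the text; the value for Q4 of 2021 is not stated, so it is left out.

```python
def pct_change(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100.0


# Forecast sales volumes (cars) as quoted in the text.
forecast = [("Q4 2020", 252), ("Q1 2021", 380), ("Q2 2021", 225), ("Q3 2021", 294)]

changes = [
    (later[0], round(pct_change(earlier[1], later[1]), 1))
    for earlier, later in zip(forecast, forecast[1:])
]
for quarter, change in changes:
    print(f"{quarter}: {change:+.1f}%")
# Q1 2021: +50.8%
# Q2 2021: -40.8%
# Q3 2021: +30.7%
```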

The contribution margin is projected to follow the same trend as the sales volume, surging from
nearly $4 million to approximately $6 million in Q1 of 2021, then declining to $3.7 million in the
second quarter of the year. It is then expected to rebound to $4.4 million in the next quarter
before dropping slightly by the end of 2021 (as shown in figure 4.4.2).

Figure 4.4.2: The contribution margin forecast for HMC in 4 incoming quarters.

Recommendations/insights from forecast


1. HMC's car sales in Canada are not expected to be profitable in 2021. In this country, both the
sales volume and the contribution margin declined rapidly over 2019 and 2020, and they are expected
to continue falling, with the contribution margin becoming negative for all quarters of 2021. The
sales volume is also expected to fall to zero from the second quarter of 2021 (because a negative
sales volume is impossible), as seen in figure 5.1 below.

Figure 5.1. The forecast for sales volume and contribution margin for sales in Canada.

For these reasons, it is advised that the company leave the Canadian market. If it still wants to
keep its business in Canada, it should discontinue the marketing campaign for segment and
miscellaneous incentives, since without this campaign both the sales volume and the contribution
margin are forecast to stay positive (as shown in figure 5.2).

Figure 5.2: The sales volume and contribution margin forecast in Canada (without the marketing
campaign cited above).

2. HMC's vast array of models provides consumers with many options. However, not all models perform
equally well, and to further optimize the business the inventory needs to be adjusted. Figure 5.3
shows the various models of each brand and how well they are currently performing. Both the
Apechete and Jackson brands have a Crux model, with the latter performing much better. This suggests
it would be more viable to stock the Jackson Crux rather than the Apechete Crux, in order to reduce
the cost of holding cars that do not perform well.

Figure 5.3: The performance of different models across 3 brands (Apechete, Jackson and Tatra) in
gross sales.

Jackson’s Crux also appears more viable to stock because of its popularity and sales numbers, as it
is currently the best performing Jackson model (figure 5.4). More stock would be needed to keep up
with demand for it, hence the importance of freeing up inventory space for more popular cars.

Figure 5.4: The gross sales of different models for the Jackson brand.

Other Critical Issues


The company should be conscious of the ethical use and collection of data, and ensure that it
follows the guidelines set by law. This ensures that no laws are broken and that consumers feel safe
and can trust the transactions they make with the brand. The Australian Privacy Act 1988 is an
important guideline for the company to follow, as it protects the safety and privacy of user data.
The company will need to consult this Act to ensure that no customer rights have been breached
during its data collection and usage (Privacy Act 1988).

Another important issue is data security, especially for transactional and consumer personal data.
The company will need to protect consumer personal data, as the law also covers businesses that hold
such data (Privacy and Data Protection Act 2014). This means HMC will have to set up secure storage
to protect the data from being leaked to outside sources, and firewalls to prevent cyberattacks that
aim to steal consumer data.

However, due to the nature of the dataset, most of these issues do not arise here, as the data is
collected from within the company's own production, sales and operations. These critical issues
still need to be taken into consideration to ensure the safety of the company and of its consumers.

Another critical issue that needs to be addressed is data inconsistency. Some standard practices
that could be used to mitigate this problem are:

• For randomly missing data, if only a small number of rows have missing values, these rows can be
dropped from the dataset. However, this approach is not feasible if the number of rows with missing
values is too large, since it would affect the accuracy of predictions (Gudivada et al. 2017, p.34).

• Duplicate data can be dropped from the database. Other methods could be used to retain these
records, such as inferring missing values or finding false records, but they are more complex and
time-consuming than eliminating all duplicated rows (Lup Low et al. 2001, p.586).

• For inconsistent data, our approach is to create a set of rules based on the definitions in other
tables, and then remove data from the original column if it is not consistent with these
definitions, since such constraints play an important role in maintaining high data quality
(Volkovs et al. 2014, p.244).
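The first and third practices can be sketched as two small checks. This is an illustration only: the 5% threshold for "a small number" of incomplete rows is an assumption, the field names are hypothetical, and the allowed brand and region values are simply the ones named in this report.

```python
# Reference definitions, as they might be taken from lookup tables (values named in this report).
ALLOWED_VALUES = {
    "brand": {"Apechete", "Jackson", "Tatra"},
    "region": {"North America", "Europe", "South America"},
}


def drop_incomplete(rows, required_fields, max_missing_share=0.05):
    """Drop rows with missing required values, but only if they are a small share of the data."""
    complete = [row for row in rows if all(row.get(f) is not None for f in required_fields)]
    missing_share = (len(rows) - len(complete)) / len(rows) if rows else 0.0
    if missing_share > max_missing_share:
        raise ValueError(f"{missing_share:.1%} of rows are incomplete; dropping them is not safe")
    return complete


def rule_violations(row, allowed_values=ALLOWED_VALUES):
    """Return the fields whose value is not in the reference definition."""
    return [field for field, allowed in allowed_values.items() if row.get(field) not in allowed]


# 38 valid rows, one row with a missing value and one row with an undefined brand.
rows = [{"vin": f"V{i}", "brand": "Tatra", "region": "North America", "days_on_lot": 30}
        for i in range(38)]
rows.append({"vin": "V38", "brand": "Tatra", "region": "North America", "days_on_lot": None})
rows.append({"vin": "V39", "brand": "Unknown", "region": "Europe", "days_on_lot": 12})

complete_rows = drop_incomplete(rows, required_fields=("brand", "region", "days_on_lot"))
valid_rows = [row for row in complete_rows if not rule_violations(row)]
print(len(rows), len(complete_rows), len(valid_rows))  # 40 39 38
```

Raising an error when too many rows are incomplete forces a deliberate decision, such as inferring the missing values, instead of silently shrinking the dataset.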

Conclusion
In conclusion, an in-depth analysis of HMC’s raw data has uncovered knowledge that is useful for
driving profits and market share forward. The company should focus its business on countries with
lower tariff rates and reduce its sales in high-tariff markets, as well as in Canada. The difference
in performance between the Advantage and Jespie models is also an important topic. This is reflected
in the financial analytics, where the top sellers are largely responsible for the company's
contribution margin across all sales channels and models. As highlighted earlier, there are also key
issues in data privacy and security and in the high level of data inconsistency, and the company
should mitigate these problems to enhance its data governance and its competitiveness in the future.

Please view attached Infographic for a summary of this report.

References

• Boyd, D & Crawford, K 2012, "Critical questions for Big Data", Information, Communication &
Society, vol. 15, no. 5, pp. 662-679.

• Elliot, H 2020, Half Sports-Car, Half Off-Roader: The Era of the SUV Coupe Has Begun, Bloomberg,
viewed 12 October 2021,
<https://www.bloomberg.com/news/articles/2020-05-13/mercedes-porsche-tout-suv-coupe-as-car-for-the-covid-19-era>.

• Fullard, M 2015, Complete guide to understanding car segments, Gulf News, viewed 12 October 2021,
<https://gulfnews.com/lifestyle/complete-guide-to-understanding-car-segments-1.1595406>.

• Gudivada, V, Apon, A & Ding, J 2017, "Data Quality Considerations for Big Data and Machine
Learning: Going Beyond Data Cleaning and Transformations", International Journal on Advances in
Software, vol. 10, no. 1, pp. 1-20.

• izmocars 2021, Understanding Big Data Analytics for Auto Dealerships, izmocars, viewed 26 October
2021,
<https://www.izmocars.com/article/understanding-big-data-analytics-for-auto-dealerships-1370-en-us.htm>.

• Lup Low, W, Li Lee, M & Wang Ling, T 2001, "A knowledge-based approach for duplicate elimination
in data cleaning", Information Systems, vol. 26, pp. 585-606.

• Privacy Act 1988, the Office of Parliamentary Counsel, Canberra, p. 5,
<https://www.legislation.gov.au/Details/C2014C00076>.

• Privacy and Data Protection Act 2014, the Office of Parliamentary Counsel, Canberra, p. 1,
<https://www.legislation.vic.gov.au/in-force/acts/privacy-and-data-protection-act-2014/027>.

• Volkovs, M, Chiang, F, Szlichta, J & Miller, RJ 2014, "Continuous data cleaning", 2014 IEEE 30th
International Conference on Data Engineering, pp. 244-255.

Appendix

Table 1: List of errors on the dataset of HMC operations:

| Number | Errors | Solutions | Reasons |
|---|---|---|---|
| 1 | 6 rows of the dataset have duplicate values in the VIN # field (as in figure 3.1, part III). | Remove all of these rows from the dataset (shared solution for errors 1 to 7). | Because the number of duplicated rows is small and removing them will have little impact on the analytical result (as cited in part III). |
| 2 | 4 cars sold have a “Summer” package, which does not exist in the requirements. | See error 1. | See error 1. |
| 3 | Package Costs and Option Costs are both null in 3 rows. | See error 1. | See error 1. |
| 4 | Days on Lot is null in 4 rows. | See error 1. | See error 1. |
| 5 | The Seat column is null in 1 row. | See error 1. | See error 1. |
| 6 | The Series column is null in 4 rows. | See error 1. | See error 1. |
| 7 | MoonRoof cost is null in 3 rows (these rows also have null costs for Parking Assist, Keyless Entry Keypad, Remote Start, Premium Radio and Power Mirror, apart from one car whose Remote Start cost is 0). | See error 1. | See error 1. |
| 8 | 664 cars have a wrong Total Fixed Cost value. | Create a new column based on the formula (Depreciation + Engineering + Tooling), then delete the old column. | Because the number of rows with this error is too large, and the data needed to calculate the correct value is already available. |
| 9 | The Total Variable Cost of all (2672) cars does not match the formula (Labour + Material + Overhead + Freight + Warranty). | Create a new column based on the formula, then fix all columns related to it (Contribution Margin, Net Revenue and After-tax) with these new values, and delete all old columns. | Because the data for calculating the correct Total Variable Cost is available, and all related columns are affected by this error. |
| 10 | 1706 cars have a package cost in the sales data that differs from the definition (in the ‘Packages and Cost’ table). | After joining ‘Packages and Cost’ with the sales data, delete the old package cost column and use the cost from the ‘Packages and Cost’ table as the new package cost. | The package cost is already available in the ‘Packages and Cost’ table, and the data in the old column does not follow the definition in this table. |
| 11 | 2 cars are in the Avatar model (which is not the actual model that these cars are sold with). | Change the model of these cars from Avatar to Advantage. | Both of these cars are from the Tatra brand (like Advantage), and they share other characteristics with Advantage: full-size segment, pick-up truck body style and AWD drive configuration. |
| 12 | 2 cars are in the Flower model (which is not in the company’s model list). | Remove both cars from the dataset. | They are in two different brands (Jackson and Tatra), which makes it difficult to determine which model they belong to, since each model belongs to only one brand. |
| 13 | 1 car is in the Cx7 series (which is not defined in the beginning). | Change the Cx7 series to Cx2. | This car has numerous similarities with the Cx2 series: it is in the Chare model, is full-size luxury and has an SUV body style, so we could assume that an error was made when entering the original Cx2 series for the car. |
| 14 | In the Crux model there are 2 cars with C1 and S2 series (while originally it only has 2 series, Cr1 and Cr2). | Remove both cars from the dataset (performed in the Excel file; the result is shown in figure 2). | The car with the C1 series differs in several ways from other cars in the Cr1 series: its segment is full-size luxury, while other cars are either compact or full-size, and it has an SUV body style compared with the other cars’ pick-up truck and sedan. Similarly, the S2 series car is full-size luxury with an SUV body style, while other cars in the Crux model do not have these characteristics. |
| 15 | In the South America region, 3 cars are labelled as sold in Canada and 2 cars are labelled as sold in the USA. | Change the region column for these cars to North America (in the Excel file; the result is shown in figure 3). | Canada and the USA are both in the North America region (which exists in the database), so it is feasible to modify the rows’ data rather than delete them. |
| 16 | There are full-size luxury, mid-size luxury and entry-level luxury segments (which do not exist in the company’s original segments and are present in 63, 62 and 35 cars respectively). | Change the segment for all of these cars to ‘Luxury’. | All of these segments have ‘Luxury’ in their name, which means they could be considered as belonging to the same luxury segment. |
| 17 | 1543 cars have a package cost different from the cost defined for the packages used for them. | Join the ‘Packages and Cost’ table to the sales data, delete the old ‘Package Costs’ column and replace it with the value from the ‘Costs’ column of the ‘Packages and Cost’ table. | Because the cost for each package is already available in the ‘Packages and Cost’ table and each package type has only one cost, so it is possible to replace the old package cost values with new values. |
| 18 | 194 cars are in the Micro segment, which is not in the list of segments of cars sold by HMC. | Delete all of these cars from the dataset. | As explained by Fullard (2015), micro cars are smaller than subcompact cars, so they can be considered a different segment from any other segment sold by HMC and thus could not be merged. |
| 19 | 114 cars are in the Sport Coupe segment (which is not in the initial list of segments). | Change the segment of all of those cars to ‘Sport Utility’. | The Sport Coupe, or Sport Utility Coupe, is emerging as a new type of sport utility vehicle (Elliot 2020). |
| 20 | 71 cars have their segment listed as ‘Mid-size’ instead of ‘Mid-Size’. | Change the spelling for all ‘Mid-size’ cars to ‘Mid-Size’. | This is only a difference in spelling, so we could fix it to prevent data inconsistency when analysing it. |
| 21 | 664 cars have their tariff calculated wrongly. | Recalculate the tariff (multiply the rate from the ‘Tariff rate’ table by the gross sales) and then delete the old tariff column. | Because the tariff rate already exists in the dataset, and the tariff can be calculated using the definition (Gross Sales * Tariff Rate). |
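Errors 8 and 21 are both fixed by recomputing a column from its definition. As an illustration only, the sketch below applies the two formulas from Table 1 to a single made-up record and reports which stored values disagree with them. The field names and amounts are hypothetical; the 12.58% rate is Argentina's tariff rate quoted earlier in this report.

```python
import math


def recompute(row):
    """Return a copy of the row with derived columns rebuilt from their definitions."""
    fixed = dict(row)
    # Total Fixed Cost = Depreciation + Engineering + Tooling
    fixed["total_fixed_cost"] = row["depreciation"] + row["engineering"] + row["tooling"]
    # Tariff = Gross Sales * Tariff Rate
    fixed["tariff"] = row["gross_sales"] * row["tariff_rate"]
    return fixed


def mismatched_fields(row, tolerance=0.01):
    """List derived columns whose stored value differs from the recomputed one."""
    expected = recompute(row)
    return [
        field for field in ("total_fixed_cost", "tariff")
        if not math.isclose(row[field], expected[field], abs_tol=tolerance)
    ]


# Hypothetical record: the stored fixed cost is wrong, the stored tariff is right.
record = {
    "depreciation": 1000.0, "engineering": 500.0, "tooling": 250.0,
    "total_fixed_cost": 1800.0,
    "gross_sales": 40000.0, "tariff_rate": 0.1258, "tariff": 5032.0,
}
errors = mismatched_fields(record)
corrected = recompute(record)
print(errors, corrected["total_fixed_cost"])  # ['total_fixed_cost'] 1750.0
```

Comparing with a small tolerance rather than exact equality avoids flagging rows that differ only by floating-point rounding.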

Steps taken in the data cleaning process

Figure 1: The workflow used for data cleaning.

Figure 2: The result after removing the cars with C1 and S2 series from the Crux model.

Figure 3: The result after removing cars sold in Canada and USA from the South America region and
putting them in North America.

The following figures show how the dataset is cleaned in Tableau Prep Builder.
