



LIM ZHI JIAN (PBS18308161)




Table of Contents

1.0 Executive Summary
2.0 Background Information
3.0 How Do You Improve Vaccine Distribution with Big Data and Data Mining?
4.0 Combat National Football League Injuries with Data Mining
5.0 Need a Movie? Netflix has the Right Recommendation for You
6.0 United Overseas Bank: Data-driven Solutions in Banking
7.0 Advancing Mental Health Care with Predictive Analytics
8.0 Conclusion and Recommendation
9.0 Reference

1.0 Executive Summary

Data mining is the extraction of hidden knowledge from raw data by applying appropriate
techniques within a properly engineered roadmap. It is characterised by the automatic
discovery of patterns, prediction, the creation of actionable information, and a focus on large
data sets. The term has been in use since the 1990s, and in today's digital era data mining is
found in almost every industry, including manufacturing, marketing, banking, insurance, and
many more.

This report presents five case studies from different industries, namely pharmaceuticals,
entertainment, sport, banking, and healthcare, each of which uses different data mining
techniques. The techniques range from descriptive to predictive modelling. The report
discusses in detail how these organizations improve their operational efficiency, revenue,
business processes, customer relationships and experience, and cost-effectiveness through
data mining.

Each case study concludes that data mining and related data technologies have brought new
insights to their industries, as well as improvements in various areas.

2.0 Background Information

According to SAS, data mining has a long history, and the term appeared around the 1990s. Its
foundation comprises three intertwined scientific disciplines: statistics, artificial intelligence,
and machine learning. Data mining is the process of extracting hidden knowledge from large
volumes of raw data, using algorithms and techniques from statistics, machine learning, and
database management. It is also known as knowledge discovery in databases (KDD).

Data mining techniques can be classified into two broad groups: supervised learning and
unsupervised learning. Supervised learning can be further broken down into classification
and regression, and classification in turn has four subgroups: neural networks, Bayesian
networks, decision trees, and support vector machines. Unsupervised learning, by contrast,
has two branches, namely association rules and clustering (Martín et al., 2014).
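
To make one of these branches concrete, the sketch below computes support and confidence, the two measures usually used to judge association rules. It is a minimal illustration, not the Apriori algorithm or any method from Martín et al. (2014); the shopping baskets and the thresholds (min_support=0.3, min_confidence=0.6) are made up.

```python
from itertools import combinations


def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) = support(A and C) / support(A)."""
    both = support(transactions, set(antecedent) | set(consequent))
    base = support(transactions, antecedent)
    return both / base if base else 0.0


def pairwise_rules(transactions, min_support=0.3, min_confidence=0.6):
    """Return (lhs, rhs, confidence) for every two-item rule passing both thresholds."""
    items = sorted({i for t in transactions for i in t})
    rules = []
    for a, b in combinations(items, 2):
        if support(transactions, {a, b}) < min_support:
            continue  # the pair is too rare to be interesting
        for lhs, rhs in ((a, b), (b, a)):
            conf = confidence(transactions, {lhs}, {rhs})
            if conf >= min_confidence:
                rules.append((lhs, rhs, conf))
    return rules


# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
rules = pairwise_rules(baskets)
print(rules)
```

With these baskets, every basket containing butter also contains bread, so the rule butter -> bread has confidence 1.0, while butter and milk appear together too rarely to pass the support threshold.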

The key characteristics of data mining are the automatic discovery of patterns, prediction, the
creation of actionable information, and a focus on large data sets (Rehman, 2017). Data mining
allows retailers, bankers, manufacturers, telecommunications providers, and marketers to
discover relationships across various interrelated domains. This gives them knowledge of how
one factor affects others.

3.0 How Do You Improve Vaccine Distribution with Big Data and Data Mining?

Merck & Co., Inc. is a world-leading human health pharmaceutical company. Improving its
healthcare services has been a challenge for the company. Every year, companies like Merck
are forced to discard stock such as flu vaccines because of temperature fluctuations, since a
specific temperature range must be maintained to ensure the potency and efficacy of the
vaccines. Nitin Kaul, a member of Merck's horizon three technology team, has a story to tell
about how this problem was tackled.

Managing and monitoring the cold chain are big challenges. Sensors are used for monitoring,
but the sensor data alone did not tell the team how to prevent cold chain failures. Nitin's team
chose Microsoft R Server for Hadoop to perform big data analysis and data mining. They had
14 years' worth of data to mine in order to identify the variables that could be correlated with
temperature excursions and so predict future breakdowns.

Nitin and his team then created a predictive model that could determine the probability of a
cold chain breakdown based on the product being shipped, the logistics provider, the origin,
the shipment route, and so on. They then pinpointed failures by testing specific cold chain
hypotheses. For example, refrigerated containers are shipped by air and by sea. One
hypothesis is that a breakdown can occur when a container is disconnected from power on
arrival and then reconnected at the distribution centre. A consistent delay in restoring power
shows up as a large spike in the logger data on certain routes, which allowed the team to
identify where these issues occurred. In real life, however, the model has too many variables
to handle manually.
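
The logger-spike hypothesis above can be checked with a simple rule: scan each shipment's temperature log for consecutive readings outside the allowed range. The sketch assumes a 2 to 8 °C range (a common storage range for refrigerated vaccines) and a minimum run length of two readings; both are illustrative assumptions, and the function name find_excursions is hypothetical, not part of Merck's pipeline.

```python
from typing import List, Tuple


def find_excursions(readings: List[float], low: float = 2.0, high: float = 8.0,
                    min_length: int = 2) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs (inclusive) of runs of consecutive
    readings outside [low, high] that last at least min_length samples."""
    excursions = []
    start = None
    for i, temp in enumerate(readings):
        out_of_range = temp < low or temp > high
        if out_of_range and start is None:
            start = i  # a new out-of-range run begins
        elif not out_of_range and start is not None:
            if i - start >= min_length:
                excursions.append((start, i - 1))
            start = None
    # Close a run that is still open at the end of the log.
    if start is not None and len(readings) - start >= min_length:
        excursions.append((start, len(readings) - 1))
    return excursions


# Hypothetical logger trace in °C: a sustained spike after a power gap,
# followed by a single brief blip that is too short to count.
log = [4.1, 4.3, 4.0, 9.5, 11.2, 10.8, 5.0, 4.6, 8.4, 4.2]
print(find_excursions(log))  # [(3, 5)]
```

Counting such runs per route or per logistics provider is one way to see where excursions cluster, which is the kind of pattern the team looked for.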

To solve this, they created an interactive web application that lets users enter all the
variables for an upcoming shipment. The predictive model then scores the data and tells the
team how high the risk of a temperature excursion is for that shipment. Microsoft R Server
provided the platform for Nitin's team. The accurate predictive model saves time and cost,
allows goods to be transported safely, and achieves the ultimate goal of improving healthcare
quality.
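
The scoring step can be pictured as a logistic model that turns shipment attributes into a probability of temperature excursion. The report does not say which model Merck used; logistic regression is an assumption here, and the coefficients, category names, and risk bands below are hypothetical stand-ins for values a real model would learn from the 14 years of shipment data.

```python
import math

# Hypothetical coefficients; a real model would be fitted on historical shipments.
INTERCEPT = -2.5
WEIGHTS = {
    ("product", "flu_vaccine"): 0.4,
    ("provider", "carrier_b"): 0.8,
    ("origin", "tropical_site"): 0.6,
    ("route", "air_via_hub"): 1.2,
}


def excursion_probability(shipment: dict) -> float:
    """Logistic score: P = 1 / (1 + exp(-(intercept + sum of matching weights)))."""
    z = INTERCEPT + sum(WEIGHTS.get((field, value), 0.0)
                        for field, value in shipment.items())
    return 1.0 / (1.0 + math.exp(-z))


def risk_band(p: float) -> str:
    """Map a probability to the kind of label a web form might display."""
    if p >= 0.5:
        return "high"
    if p >= 0.2:
        return "medium"
    return "low"


upcoming = {"product": "flu_vaccine", "provider": "carrier_b",
            "origin": "tropical_site", "route": "air_via_hub"}
p = excursion_probability(upcoming)
print(f"{p:.2f}", risk_band(p))  # 0.62 high
```

Each matching attribute raises the log-odds by its weight, which keeps the score easy to explain to the people entering shipments in the web tool.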

4.0 Combat National Football League Injuries with Data Mining

According to one statistic, more than 200 National Football League (NFL) players ended the
2018-2019 season on Injured Reserve. Coaches are forced to deal with these injuries and to
lose those players from their rosters. These unfortunate events cost the American football
industry more than half a billion dollars (Walker & Fenn, 2020).

Thus, preventing injuries and predicting future injury risk are top priorities. Dr. Phil
Wagner, founder of Sparta Science, uses data collected from equipment that tests a player's
movement, which plays a crucial role in diagnosing muscle overload. Most interestingly, the
data has the “fortune telling power” of predicting future injuries.

“You get a granular device and then leverage artificial intelligence – machine learning – to
store and make sense of as much as you can stuff in the cloud,” says Wagner. Sparta Science
stores the data it collects in the cloud and uses it to identify injury trends. It employs force
plates that record functional movements. The players' screening exercises, made up of
balancing, planking, and jumping segments, form the data pool. These exercises help to
identify foot and knee issues, lower back and groin problems, and soft tissue and hamstring
injuries, respectively.

Sparta Science has collected data from more than 28,000 players and documented more than
7,000 injuries in its database. It then applies machine learning to this cloud-stored data to
predict future injury risk. This improves the accuracy of decisions made by a football team's
general manager (GM): if a player is at risk, the team will either hold him back until it really
needs him or improve his condition with proper treatment (Young, 2020).
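
One simple way to turn screening data into an injury-risk estimate is k-nearest neighbours: compare a player with the most similar past players and see how many of them were injured. Sparta Science's actual models are not described in the sources, so this technique, the two features, the data, and the 0.5 threshold are all hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical features: (jump_asymmetry_pct, balance_sway_score).
Features = Tuple[float, float]

# Hypothetical past screenings: features and whether the player was injured (1) or not (0).
HISTORY: List[Tuple[Features, int]] = [
    ((2.0, 10.0), 0),
    ((3.0, 12.0), 0),
    ((2.5, 11.0), 0),
    ((12.0, 25.0), 1),
    ((10.0, 22.0), 1),
    ((11.0, 20.0), 0),
    ((14.0, 27.0), 1),
]


def injury_risk(player: Features, k: int = 3) -> float:
    """Fraction of the k most similar past players (Euclidean distance) who were injured."""
    nearest = sorted(HISTORY, key=lambda rec: math.dist(player, rec[0]))[:k]
    return sum(label for _, label in nearest) / len(nearest)


def roster_decision(risk: float, threshold: float = 0.5) -> str:
    """Hypothetical rule mirroring the GM choice described in the text."""
    return "hold back and treat" if risk >= threshold else "clear to play"


print(roster_decision(injury_risk((13.0, 26.0))))  # hold back and treat
print(roster_decision(injury_risk((2.2, 10.5))))   # clear to play
```

In practice the features would be standardized first, because Euclidean distance lets the feature with the largest scale dominate.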

Combined with on-field play data, these data help teams make prescriptions tailored to
individual players, with the aim of improving their performance and reducing injuries.

Machine learning has also allowed Sparta Science to partner with top universities such as
Auburn and Clemson. The data helps athletes understand their body development,
conditioning, and performance improvement over the years before they turn professional.
In short, prevention is better than cure. With the power of data, teams can keep players
healthier, cut rehabilitation costs, and enhance overall performance. Sparta Science has
helped prevent 18% of all injuries for a given NFL team (2019).

5.0 Need a Movie? Netflix has the Right Recommendation for You

Every time you come home and sit down on the sofa, Netflix has just the right shows for your
after-dinner viewing. Ever wondered what the magic behind it is? The answer is data mining.
Netflix applies various data mining techniques to the pool of data it has collected over the
years.
Of course, this is easier said than done. Netflix has put tremendous effort into using
recommendations to personalize the experience as much as possible. Its recommendation
system and ranking system are two keys to its success. The personalization suits a single
lady, a businessman, parents with three kids, and young couples alike. The important
elements are accuracy, diversity, awareness, explanations, freshness, and similarity. Netflix
adapts to subscribers' tastes and explains its recommendations to gain their trust. As with
other personalization systems, freshness and diversity are taken into account, since there are
thousands of shows to “promote” to subscribers. Lastly, similarity is another dimension that
Netflix considers.

Netflix draws on several billion item ratings, popularity data, and search terms for its data
mining process. It also uses metadata such as genres, actors, directors, and reviews.
Impression data is another source: it captures members' interactions with recommendations,
such as scrolls, mouse-overs, clicks, and time spent on a given page. Social data and other
external data are also used (Amatriain, 2013).

With so much data available, Netflix's data science team can apply many data mining
techniques. No single model works best, so the team has implemented many, from clustering
algorithms to linear regression, association rules, Restricted Boltzmann Machines, Singular
Value Decomposition, and more. These models improve and enhance both the content-based
and the collaborative filtering recommendation systems.
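
User-based collaborative filtering, one of the simplest forms of the approach named above, recommends titles that similar users rated highly. The sketch below is a minimal illustration with invented users, titles, and ratings, using cosine similarity on co-rated titles; it is not Netflix's production system, which combines many models (Amatriain, 2013).

```python
import math
from typing import Dict, List

Ratings = Dict[str, Dict[str, float]]  # user -> {title: rating}


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity computed over the titles both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[t] * v[t] for t in common)
    norm_u = math.sqrt(sum(u[t] ** 2 for t in common))
    norm_v = math.sqrt(sum(v[t] ** 2 for t in common))
    return dot / (norm_u * norm_v)


def recommend(ratings: Ratings, user: str, top_n: int = 2) -> List[str]:
    """Rank unseen titles by the similarity-weighted average rating of other users."""
    scores: Dict[str, float] = {}
    weights: Dict[str, float] = {}
    for other, theirs in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], theirs)
        if sim <= 0:
            continue
        for title, r in theirs.items():
            if title in ratings[user]:
                continue  # only recommend titles the user has not rated
            scores[title] = scores.get(title, 0.0) + sim * r
            weights[title] = weights.get(title, 0.0) + sim
    predicted = {t: scores[t] / weights[t] for t in scores}
    return sorted(predicted, key=predicted.get, reverse=True)[:top_n]


# Hypothetical ratings on a 1-5 scale.
ratings = {
    "alice": {"A": 5, "B": 4},
    "bob": {"A": 5, "B": 4, "C": 5},
    "carol": {"A": 1, "B": 5, "D": 4},
}
print(recommend(ratings, "alice"))  # ['C', 'D']
```

Because all ratings are positive, raw cosine similarity stays fairly high even for a user with quite different tastes (carol here); mean-centering each user's ratings first, as in Pearson correlation, is a common fix.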

One of the crucial elements of successful personalized recommendations is contextual
awareness. It improves the performance of the recommendation system and prompts users to
provide feedback, which leads to higher-quality recommendations. There are explicit and
inferred contexts. Explicit contexts include location, language, time of day, and device.
Inferred contexts include binge-watching patterns and viewing companions. To perform
context prediction, Netflix uses deep learning, a subset of machine learning that can discover
features on its own, without engineers having to guide or hand-program them.

Netflix also performs contextual prediction based on sequence classification, which enables
sequence prediction. This allows the recommendation engine to answer a question that
perhaps only a girlfriend could: “Which TV show will he play right now, at this time?” The
answer is based entirely on the user's viewing history (2019).
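
Netflix's sequence models are based on deep learning, but the basic idea of sequence prediction can be shown with a much simpler stand-in: a first-order Markov model that predicts the next title from the one just watched. This swapped-in technique, the viewing histories, and the title names below are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional


def fit_transitions(histories: List[List[str]]) -> Dict[str, Counter]:
    """Count how often each title was watched immediately after another."""
    transitions: Dict[str, Counter] = defaultdict(Counter)
    for history in histories:
        for prev, nxt in zip(history, history[1:]):
            transitions[prev][nxt] += 1
    return transitions


def predict_next(transitions: Dict[str, Counter], last_watched: str) -> Optional[str]:
    """Most frequent follower of the last title, or None if it was never followed."""
    followers = transitions.get(last_watched)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical viewing sessions for one household.
histories = [
    ["news", "drama_ep1", "drama_ep2"],
    ["news", "drama_ep1", "comedy"],
    ["comedy", "news", "drama_ep1", "drama_ep2"],
]
model = fit_transitions(histories)
print(predict_next(model, "drama_ep1"))  # drama_ep2
```

Context could be added by keying the transition counts on a pair such as (time_of_day, last_title) instead of last_title alone.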

In conclusion, collecting volumes of data alone does not get the job done. Optimized models,
appropriate metrics, and the constant evolution of the system are the keys to the best user
experience on Netflix. The team's efforts helped Netflix reach 167 million subscribers
worldwide in 2019, putting it in the No. 1 spot by a wide margin.

6.0 United Overseas Bank: Data-driven Solutions in Banking

United Overseas Bank (UOB) is the third-largest bank in Southeast Asia, with more than
500 offices in 19 countries and territories across Asia Pacific, Western Europe, and North
America. In 2017, UOB set up its Big Data Analytics Centre, the first centralized big data unit
in Singapore. The objective of this investment was to enhance the bank's digital capabilities
and to improve its performance.

Previously, the bank's data sat in separate systems, with limited accessibility and
complicated processes. UOB consolidated these databases into a data warehouse, including
the full range of unstructured data such as voice and text messages. Using solutions from
Cloudera, UOB migrated thousands of data sets onto its platform, covering transaction,
customer, trade, deposit, and loan data, among others. This complete, interconnected data
now gives business functions such as retail banking, asset management, and compliance a
more holistic view of their clients. It also helps to optimize business processes, create a
better customer experience, and provide more insight into financial crime.

With large amounts of data on hand and an appropriate data architecture roadmap, UOB can
use self-service analytics and machine learning capabilities to improve its digital banking,
asset management, compliance, anti-money laundering, and more. First, with the help of AI
and machine learning, UOB created a new recommendation system to understand clients'
preferences, such as their dining and shopping habits, which lets it personalize offers for a
large number of customers and merchants. Second, its advanced anti-money laundering
detection system detects suspicious transactions in a shorter time: before UOB gained its big
data capability, this took 3 months, whereas it now takes 3 weeks or less. Any late or missed
identification matters greatly in anti-money laundering, because the hidden companies and
high-risk individuals involved are difficult to catch.
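
A very simple form of transaction monitoring flags amounts that are far from a customer's usual behaviour. The z-score rule below is a swapped-in illustration, not UOB's AI-based system; the transaction history, the threshold of 3 standard deviations, and the function name are assumptions.

```python
import statistics
from typing import List


def flag_suspicious(history: List[float], new_amounts: List[float],
                    threshold: float = 3.0) -> List[float]:
    """Return new amounts whose z-score against the customer's history exceeds threshold."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)  # sample standard deviation; needs at least 2 values
    if stdev == 0:
        # A perfectly constant history: anything different stands out.
        return [a for a in new_amounts if a != mean]
    return [a for a in new_amounts if abs(a - mean) / stdev > threshold]


# Hypothetical past transaction amounts for one customer.
history = [120.0, 95.0, 130.0, 110.0, 105.0, 125.0]
print(flag_suspicious(history, [115.0, 140.0, 9_800.0]))  # [9800.0]
```

A production system would combine many more signals than transaction size, but the rule shows the basic idea of scoring each transaction automatically against a learned baseline.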

Besides that, UOB combines customer analytics insights from data mining with natural
language processing results to perform market sentiment analysis. This solution allows the
bank to identify the latest market trends and predict potential opportunities that could
eventually lead to new revenue (2019). These results also save customer relationship
managers more than 1,000 hours, as they previously had to review the documents manually.
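
Lexicon-based scoring is one of the simplest ways to measure sentiment in text. The sketch below sums word polarities across headlines to label the overall mood; the mini-lexicon and headlines are invented, and UOB's actual NLP pipeline is not described in the source.

```python
import re
from typing import Dict, List

# Hypothetical mini-lexicon; real systems use far larger lexicons or trained models.
LEXICON: Dict[str, int] = {
    "growth": 1, "strong": 1, "upbeat": 1, "gain": 1,
    "weak": -1, "risk": -1, "decline": -1, "default": -1,
}


def sentiment_score(text: str) -> int:
    """Sum of lexicon polarities for the words in text (case-insensitive)."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(LEXICON.get(w, 0) for w in words)


def market_mood(headlines: List[str]) -> str:
    """Label the combined sentiment of a batch of headlines."""
    total = sum(sentiment_score(h) for h in headlines)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


headlines = [
    "Strong export growth lifts regional outlook",
    "Analysts flag default risk in property sector",
    "Consumer spending shows steady gain",
]
print(market_mood(headlines))  # positive
```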

In conclusion, the big data and data mining capabilities UOB has adopted have brought
tremendous improvements to its business processes and operational efficiency, as well as
more revenue.

7.0 Advancing Mental Health Care with Predictive Analytics

“We are at the forefront of capturing meaningful information about the state of mental illness
to better measure and improve health outcomes,” says Rebecca Comrie, executive director of
performance improvement at Canada's Centre for Addiction and Mental Health (CAMH).
Rebecca is responsible for using big data and predictive analytics to help the hospital achieve
its mission.

According to the Mental Health Commission of Canada, 1 in 5 people in Canada will
experience a mental health problem or illness, a rate higher than the global average. Mental
illness affects people of all backgrounds, ages, income levels, and cultures. CAMH's mission
is to enhance the hospital's capability to treat mental health patients, and it is optimizing
patient care and hospital resources through predictive modelling powered by SAS.

Electronic health records (EHRs) are now common in hospitals, but what other value can
EHRs bring? The team began work on an enterprise analytics strategy whose goal was to
combine EHRs with analytic tools. Rebecca and the team believed that, with the proper tools
and this copious amount of data, they could extract real value from it.

They began the journey with emergency department activity and found that emergency
department visits had increased 82% over the past six years. Next, they built a model based
on population data provided by the Ministry of Health to predict future emergency
department activity. By anticipating the number of patients who would visit the department,
the hospital could adjust staffing and arrange resources such as drugs and beds so that it
would not be overwhelmed.
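
The 82% rise over six years can be converted into an implied constant annual growth rate, r = (1 + 0.82)^(1/6) - 1, which is about 10.5% per year. The sketch below computes this and naively extrapolates it; CAMH's actual model used Ministry of Health population data, so this compounding projection and the 10,000-visit baseline are assumptions for illustration only.

```python
from typing import List


def compound_annual_growth(total_growth: float, years: int) -> float:
    """Annual rate r such that (1 + r) ** years == 1 + total_growth."""
    return (1.0 + total_growth) ** (1.0 / years) - 1.0


def project_visits(current_visits: float, annual_rate: float, years_ahead: int) -> List[int]:
    """Naive projection that assumes the past growth rate simply continues."""
    return [round(current_visits * (1.0 + annual_rate) ** k)
            for k in range(1, years_ahead + 1)]


rate = compound_annual_growth(0.82, 6)
print(f"implied annual growth: {rate:.1%}")  # implied annual growth: 10.5%
print(project_visits(10_000, rate, 3))       # hypothetical baseline of 10,000 visits
```

A population-based model like the one described can capture changes that a fixed growth rate cannot, which is why this extrapolation is only a rough sanity check.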

Similarly, the team used the SAS platform to optimize care for alternate level of care (ALC)
patients. ALC patients are those who still occupy the hospital's intensive care or acute care
resources even though they are not acutely ill or no longer need those services. By applying
the right predictive model, which draws on social determinant data, the team achieved an
accuracy of 80%. By identifying ALC patients at admission, the hospital was able to allocate
resources to the right patients, freeing them up for those who need the services most.
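
An accuracy figure such as 80% is easier to interpret alongside precision and recall. The sketch below uses ten hypothetical admissions to show that a model can reach 80% accuracy while still missing most of the patients who actually become ALC; the data are invented and do not describe CAMH's model.

```python
from typing import Dict, Sequence


def classification_report(actual: Sequence[int], predicted: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision and recall for a binary label (1 = ALC patient)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical admissions: 10 patients, 3 of whom actually became ALC.
actual = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
predicted = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
report = classification_report(actual, predicted)
print(report)
```

Here precision is 1.0 but recall is only about 0.33, which is why accuracy alone can overstate how well a model finds the smaller group of patients.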

CAMH is a public organization that receives government funding, so capital projects must be
justified, and money from private donors is not easily obtained. Once again, CAMH turned to
data analysis, this time to secure government funding. Using SAS predictive analytics, it was
able to forecast items such as bed and staffing needs, the number of patients expected each
year, and emergency department activity.

In short, CAMH has armed itself with a data-driven approach to health treatment and hospital
operations. When the right data mining techniques are paired with proper big data analytics,
streamlined hospital resources, quality improvement, and program planning are only a few
clicks away.

8.0 Conclusion and Recommendation

“When data mining and predictive analytics are done right, the analyses are not a means to a
predictive end; rather, the desired predictions become a means to analytical insight and
discovery,” says Michael Schrage in a Harvard Business Review Insight Center report.

There is no right or wrong in data mining; the result is what matters. Vendors offer fancy and
comprehensive tools to anyone who can afford them, but the purpose behind adopting data
mining and the actionable results it produces are probably what make it worth the money and
effort. The transformation an organization must undergo to take even the first step in data
mining is frequently, but not always, the barrier. Successful adoption of data analytics
depends on an organization's digital capability, manpower, knowledge, and resources.

One should never stop at the results already obtained from data mining: this composite
discipline requires constant evolution and does not end with the results. In the era of the
Fourth Industrial Revolution (Industry 4.0), the field of data science knows no borders.

9.0 Reference
1) Martín, L., Baena, L., Garach, L., López, G., & Oña, J. de. (2014, December 27).
Using Data Mining Techniques to Road Safety Improvement in Spanish Roads.
Retrieved April 2, 2020, from
2) Rehman, N. (2017). Data Mining Techniques Methods Algorithms and
Tools. International Journal of Computer Science and Mobile Computing, 6(7), 227–
3) How do you improve vaccine distribution with big data? (2017, June 21). Retrieved
March 20, 2020, from
4) Walker, T. M., & Fenn, L. (2020, January 29). AP analysis: NFL teams lost over
$500M to injuries in 2019. Retrieved March 21, 2020, from
5) Young, J. (2020, March 1). Tech company Sparta Science combats NFL injuries with
machine learning. Retrieved March 21, 2020, from
6) How Big Data Can Reduce Football Injuries. (2019, February 11). Retrieved from
7) Amatriain, X. (2013). Big & personal: Data and models behind Netflix
recommendations. Proceedings of the 2nd International Workshop on Big Data,
Streams and Heterogeneous Source Mining Algorithms, Systems, Programming
Models and Applications - BigMine 13. doi: 10.1145/2501221.2501222
8) Data Science at Netflix - A Must Read Case Study for Aspiring Data Scientists.
(2019, May 4). Retrieved March 23, 2020, from
9) United Overseas Bank: Customer Success. (2019, October 22). Retrieved March 23,
2020, from
10) Making the Case for Investing in Mental Health in Canada. (2013). Retrieved March
23, 2020, from

