SIM - Chapters - DA T1

1
BIG DATA ANALYTICS
LEARNING OUTCOMES
At the end of this topic, students should be able to:
• Differentiate between big data, data analytics and data science
• Differentiate among machine learning techniques
• Discuss real-world application of big data analytics in related field
INTRODUCTION
Have you ever searched for or bought something on an online platform, say Shopee, and then suddenly
seen tons of advertisements for similar products on Facebook? Have you ever wondered how this could
happen? What happens is that your behaviour is being monitored and predicted using big data analytics
techniques, so that businesses can tailor customized advertising campaigns to different target groups.
In this topic, you will learn the core concepts of big data analytics and be introduced to three common
terms in this field: big data, data analytics, and data science. You will also be exposed to the different
types of machine learning techniques that can be used to achieve different objectives in big data
analytics.
1.1 DATA AND THE NEW TRENDS

In the era of the Fourth Industrial Revolution (IR 4.0), data has become an important asset to an
organization. Claims such as “data is the new oil” and “information is the oil, analytics is the
combustion engine” show how much attention data has been getting.
The importance of data to business can be seen from its significant implications for the global economy.
The nature of data creation has also changed. In the old model, data was generated by media sources
like news, radio and television and consumed by the people. In the new model, data is generated and
consumed by both the media and the people. Data now comes from numerous sources like social media
apps, mobile devices, sensors/IoT devices and many others. All of these have contributed to the
existence of “big data”.
1.2 CONCEPTS IN BIG DATA ANALYTICS

The term Big Data is formally defined by Gartner Inc. [1] as “high-volume, high-velocity and/or
high-variety information assets that demand cost-effective, innovative forms of information processing
that enable enhanced insight, decision making, and process automation”. Simply put, big data means
data that is so huge, fast and varied that it is impractical to process using traditional methods. If you
are an avid user of social media, for example Facebook, you also contribute to this huge generation of
data when you regularly update your profile, post status updates, upload photos and comment on your
friends’ posts. With billions of active Facebook users like you, Facebook receives petabytes of data per
day in the form of structured data (e.g., profile details) and unstructured data (e.g., text, photos,
videos) that can later be used to understand your behaviour. There is also semi-structured data, which
contains a mixture of the two forms mentioned previously.
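To make the three data forms concrete, here is a minimal Python sketch. The records, field names and
values are made up for illustration (they are not real Facebook data), and only the Python standard
library is used.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields (hypothetical profile table).
structured_csv = "user_id,name,age\n1,Aina,21\n2,Ben,34\n"
profiles = list(csv.DictReader(io.StringIO(structured_csv)))

# Semi-structured: each record describes its own fields, and fields may differ per record.
semi_structured_json = '[{"user_id": 1, "likes": ["cats"]}, {"user_id": 2, "city": "Ipoh"}]'
events = json.loads(semi_structured_json)

# Unstructured: free text with no predefined fields; any structure must be extracted.
unstructured_post = "Just bought new running shoes, loving them!"
word_count = len(unstructured_post.split())

print(profiles[0]["name"])       # a named column can be read directly
print(sorted(events[1].keys()))  # keys must be inspected record by record
print(word_count)                # only simple measures are available without more processing
```

Notice that the structured rows can be queried by column name straight away, while the free-text post
needs further processing before it can answer any question about the user.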
Big data is also commonly defined by its characteristic “Vs”. The definition has many variations
(4Vs, 5Vs, 6Vs, 10Vs etc.); for the purpose of learning, Figure 1 shows the 10Vs definition.
Figure 1 10Vs of Big Data [2]
The field of Data Science makes understanding your behaviour easier by processing and analysing
big data using a combination of Computer Science methods, statistics, and the right domain
knowledge. A data science lifecycle typically includes business understanding, data acquisition and
understanding, modelling, and deployment (see Figure 2). Refer to the following video for a
description of Big Data:
**Sample video: https://www.youtube.com/watch?v=bAyrObl7TYE
The details of each process are as follows:

• Business understanding. Understand the problem you are trying to solve, define the business
objective(s), and identify data sources and key variables.
• Data acquisition and understanding. Collect data, fix inconsistencies within the data, and analyse
the data to understand the patterns behind it using data summarization and visualization
tools.
• Modelling. Develop a model that fits the business objective (e.g., prediction, classification)
using machine learning techniques and evaluate the model.
• Deployment. Deploy the model to various end-user applications (e.g., websites, dashboards,
front-end applications) and finalize the project output.
Figure 2 Data Science Lifecycle [3]
Data Analytics is a part of Data Science that examines data to get meaningful information with the
help of specialized tools. Based on Figure 2, the data acquisition and understanding process basically
defines data analytics. Using Facebook as an example, suppose Facebook wants to understand its users’
behaviour (business objective). It can collect the users’ data, clean the data and visualize users’
demographic information (data acquisition and understanding), then predict the users’ interests and
recommend the right articles and notifications on their news feed (modelling and deployment).
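The data acquisition and understanding step described above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: the user records, the field names and the
`clean` helper are hypothetical, and a real project would normally use dedicated data analysis tools.

```python
from collections import Counter
from statistics import mean

# Hypothetical raw user records with typical inconsistencies.
raw_users = [
    {"name": "Aina", "age": "21", "country": "malaysia "},
    {"name": "Ben", "age": "", "country": "Malaysia"},      # missing age
    {"name": "Chen", "age": "34", "country": "SINGAPORE"},  # inconsistent casing
    {"name": "Aina", "age": "21", "country": "malaysia "},  # duplicate row
]

def clean(records):
    """Normalize text fields, convert ages to int (or None), drop exact duplicates."""
    seen = set()
    cleaned = []
    for r in records:
        age = int(r["age"]) if r["age"].strip() else None
        row = (r["name"].strip(), age, r["country"].strip().title())
        if row not in seen:
            seen.add(row)
            cleaned.append({"name": row[0], "age": row[1], "country": row[2]})
    return cleaned

# Fix inconsistencies, then summarize the data to understand it.
users = clean(raw_users)
known_ages = [u["age"] for u in users if u["age"] is not None]
summary = {
    "n_users": len(users),
    "mean_age": mean(known_ages),
    "by_country": Counter(u["country"] for u in users),
}
print(summary)
```

The summary (number of users, average age, users per country) is the kind of demographic information
that would then be visualized before moving on to modelling.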
There are many different roles involved in a data science project, among them Big Data Professional,
Data Analyst and Data Scientist. To further understand this, let’s look at the video on the roles in a
data science team based on several case studies (refer to the video “Roles in Data Science” in ULearnX).
**Sample video: https://www.youtube.com/watch?v=jOB64yOzq2E&t=179s
After you have watched the video, attempt the exercise on “Big Data Analytics Concepts” in ULearnX.
SELF-LEARNING ACTIVITY
Define Big Data, Data Analytics and Data Science. Discuss an example that relates these terms.
1.3 MACHINE LEARNING TECHNIQUES

Based on the Data Science Lifecycle in Section 1.2, the modelling process makes use of machine learning
techniques to learn from data and predict future trends for a certain business objective. Machine learning
trains an algorithm (an algorithm is basically a set of rules or operations that produces some output) so
that it can learn how to make decisions by itself. There are three main ways a machine learning algorithm
can learn: supervised learning, unsupervised learning and reinforcement learning.
For example, suppose you want to teach a child about animals. You have three options:
• Option 1: You show him pictures of animals that you already know and explain their
characteristics to him, so that he can recognize and name animals with similar characteristics
when he goes outside.
• Option 2: You take him outside and let him figure out the different kinds of animals by himself.
In this way, he may use his own judgment to group the animals.
• Option 3: You let him play some sort of animal guessing game, and reward him if he answers
correctly. In this way, he can learn from his mistakes.
If you choose Option 1, he is categorized as a supervised learner: in supervised learning, you provide
the model with labelled data. If you go with Option 2, he is categorized as an unsupervised learner:
unlabelled data are fed to the algorithm, which decides by itself how to group them. If you go with
Option 3, he is categorized as a reinforcement learner, a reward-based learner that learns from its
incorrect selections.
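The three options can also be sketched as toy Python code. Everything below is an illustrative
assumption rather than a standard implementation: the animal measurements are made up, the supervised
learner is a simple nearest-neighbour rule, the unsupervised learner is a two-group k-means on a single
feature, and the reward-based learner is a basic score-keeping guesser.

```python
# Toy animals described by (weight_kg, leg_count); all values are made up.
labelled = [((4.0, 4), "cat"), ((5.0, 4), "cat"), ((0.03, 2), "bird"), ((0.05, 2), "bird")]

def distance(a, b):
    """Squared distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Option 1, supervised: labels are given; predict the label of the nearest known example.
def predict(features):
    return min(labelled, key=lambda pair: distance(pair[0], features))[1]

# Option 2, unsupervised: no labels; split the weights into two groups (k-means, k = 2).
def two_means(values, steps=10):
    lo, hi = min(values), max(values)  # initial group centres
    for _ in range(steps):
        groups = ([], [])
        for v in values:
            nearest = 1 if abs(v - hi) < abs(v - lo) else 0
            groups[nearest].append(v)
        lo, hi = [sum(g) / len(g) for g in groups]  # move centres to the group means
    return groups

# Option 3, reinforcement: guess, receive a reward or penalty, and prefer rewarded guesses.
def play_guessing_game(true_answer, choices, rounds=5):
    scores = {c: 0 for c in choices}
    history = []
    for _ in range(rounds):
        guess = max(choices, key=lambda c: scores[c])  # best score so far, first one on ties
        reward = 1 if guess == true_answer else -1
        scores[guess] += reward
        history.append(guess)
    return history

print(predict((4.5, 4)))
print(two_means([4.0, 5.0, 0.03, 0.05, 4.5]))
print(play_guessing_game("dog", ["cat", "bird", "dog"]))
```

In the guessing game the learner is wrong twice, is penalized each time, and then keeps choosing the
rewarded answer, which mirrors the child learning from his mistakes in Option 3.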
To further understand this topic, let’s have a look at the video on “Machine Learning Case Studies”
in ULearnX.
**Sample videos: https://www.youtube.com/watch?v=hjh1ikznScg ,
https://www.youtube.com/watch?v=ukzFI9rgwfU
After you have watched the video, attempt the exercise on “Machine Learning Techniques” in
ULearnX.
SELF-LEARNING ACTIVITY
1. Give an example of a Machine Learning project/problem. State which class of machine learning the
solution belongs to, and justify your answer.
2. Now that you have learnt about big data analytics, can you give an example of how big data analytics
can be applied in the Process Engineering field? Write your answer in the forum.
SUMMARY
As you have learnt in this topic, the terms big data, data analytics and data science are related to
each other, and there are different types of machine learning techniques that can be used to achieve
specific objectives. The following topic will be on the “Data Acquisition and Understanding” process of
the data science lifecycle, where you will learn how to fix inconsistencies in your data.
KEYWORDS
Big data, data analytics, data science, machine learning
REFERENCES
[1] Gartner IT Glossary (n.d.). Retrieved from http://www.gartner.com/it-glossary/big-data/
[2] towardsdatascience.com (n.d.). Retrieved from https://towardsdatascience.com/big-data-analysis-spark-and-hadoop-a11ba591c057
[3] Microsoft (n.d.). Retrieved from https://docs.microsoft.com/en-us/azure/architecture/data-science-process/lifecycle
