SIM - Chapters - DA T1
LEARNING OUTCOMES
At the end of this topic, students should be able to:
• Differentiate between big data, data analytics and data science
• Differentiate among machine learning techniques
• Discuss real-world applications of big data analytics in a related field
INTRODUCTION
Have you ever been in a situation where you searched for or bought something on an online platform,
say Shopee, and then suddenly saw many advertisements for similar products on Facebook? Have you
ever wondered how this happens? What happens is that your behaviour is being monitored and predicted
using big data analytics techniques, so that businesses can tailor customized advertising campaigns to
different target groups.
In this topic, you will learn the key concepts of big data analytics and be introduced to three common
terms in this field: big data, data analytics and data science. You will also be exposed to the different
types of machine learning techniques that can be used to achieve different objectives in big data
analytics.
The importance of data for businesses can be seen from its significant impact on the global economy.
The nature of data creation has also changed. In the old model, data was generated by media sources
such as news, radio and television and consumed by the people. In the new model, data is generated
and consumed by both the media and the people. Data now comes from numerous sources such as social
media apps, mobile devices, sensors/IoT devices and many others. All of these have contributed to the
emergence of “big data”.
Big data is also commonly defined by its characteristics, known as the “Vs”. Although there are many
variations of this definition (4Vs, 5Vs, 6Vs, 10Vs etc.), for the purpose of learning, the 10Vs definition
is shown in Figure 1.
The field of Data Science makes it easier to understand your behaviour by processing and analysing
big data using a combination of computer science methods, statistics and the right domain knowledge.
A data science lifecycle typically includes business understanding, data acquisition and understanding,
modelling, and deployment (see Figure 2). Refer to the following video for a description of Big Data:
**Sample video: https://www.youtube.com/watch?v=bAyrObl7TYE
Data Analytics is a part of Data Science that examines data to extract meaningful information with the
help of specialized tools. Based on Figure 2, the data acquisition and understanding stage essentially
defines data analytics. Using Facebook as an example, suppose Facebook wants to understand its users’
behaviour (business understanding). It can collect the users’ data, clean the data and visualize the users’
demographic information (data acquisition and understanding), then predict the users’ interests and
recommend the right articles and notifications on their news feed (modelling and deployment).
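The Facebook example follows the lifecycle stages in order. Below is a minimal sketch of that flow in plain Python, assuming a tiny hypothetical click log. The record layout, user IDs, topics, article titles and function names are all illustrative, and the "prediction" is a deliberately simple most-clicked-topic rule standing in for a real machine learning model. It is not how Facebook actually implements its news feed.

```python
from collections import Counter

# Hypothetical raw click records: (user_id, age_group, topic_clicked).
# None marks a missing value that the cleaning step removes.
RAW_RECORDS = [
    ("u1", "18-24", "sports"),
    ("u1", "18-24", "sports"),
    ("u1", "18-24", "music"),
    ("u2", "25-34", "cooking"),
    ("u2", "25-34", None),
    ("u3", None, "travel"),
    ("u3", "25-34", "travel"),
]

# Hypothetical article catalogue, keyed by topic.
ARTICLES = {
    "sports": ["Match highlights"],
    "music": ["New album reviews"],
    "cooking": ["Quick dinner recipes"],
    "travel": ["Budget travel tips"],
}


def clean(records):
    """Data acquisition and understanding: drop records with missing fields."""
    return [r for r in records if None not in r]


def age_profile(records):
    """Summarise demographics: number of distinct users per age group."""
    user_age = {user: age for user, age, _ in records}
    return dict(Counter(user_age.values()))


def predict_interests(records):
    """Modelling (toy rule): a user's interest is the topic they clicked most."""
    clicks = {}
    for user, _, topic in records:
        clicks.setdefault(user, Counter())[topic] += 1
    return {user: counter.most_common(1)[0][0] for user, counter in clicks.items()}


def recommend(interests, catalogue):
    """Deployment: map each predicted interest to articles for the news feed."""
    return {user: catalogue.get(topic, []) for user, topic in interests.items()}


if __name__ == "__main__":
    cleaned = clean(RAW_RECORDS)
    print(age_profile(cleaned))
    print(recommend(predict_interests(cleaned), ARTICLES))
```

Replacing `predict_interests` with a trained model would not change the shape of the pipeline: cleaning still happens before modelling, and deployment only consumes the predictions.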
There are many different roles involved in a data science project, among them Big Data Professional,
Data Analyst and Data Scientist. To understand these roles further, watch the video on the roles in a data
science team based on several case studies (refer to the video “Roles in Data Science” in ULearnX).
**sample video: https://www.youtube.com/watch?v=jOB64yOzq2E&t=179s
After you have watched the video, attempt the exercise on “Big Data Analytics Concepts” in ULearnX.
SELF-LEARNING ACTIVITY
Define Big Data, Data Analytics and Data Science. Discuss an example that relates these terms.
To understand the different types of machine learning techniques, consider an example: suppose you
want to teach a child about animals. You have three options:
• Option 1: You show him pictures of animals that you already know and explain their
characteristics to him, so that he can recognize and name animals with similar characteristics
when he goes outside.
• Option 2: You take him outside and let him figure out the different kinds of animals by himself.
In this way, he may use his own judgment to group the animals.
• Option 3: You let him play some sort of animal guessing game, and reward him if he answers
correctly. In this way, he can learn from his mistakes.
If you choose Option 1, the child learns like a supervised learning model: in supervised learning, you
provide the model with labelled data (here, animals together with their names). If you choose Option 2,
he learns like an unsupervised learning model: unlabelled data are fed to the algorithm, which finds the
groupings by itself. If you choose Option 3, he learns like a reinforcement learning agent, a reward-based
learner that improves by being rewarded for correct choices and learning from incorrect ones.
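The three options can be mimicked in a few lines of Python. This is a toy sketch, not how production systems are built: the animals, the two features (number of legs, number of wings) and parameters such as `epsilon` and `rounds` are hypothetical. It also picks one concrete technique per class purely for illustration: a 1-nearest-neighbour classifier for supervised learning, k-means clustering for unsupervised learning, and an epsilon-greedy reward learner for reinforcement learning. Each class contains many other techniques.

```python
import math
import random
from collections import defaultdict

# Hypothetical features for each animal: (number_of_legs, number_of_wings).
LABELLED = [((4, 0), "cat"), ((2, 2), "bird"), ((0, 0), "fish")]


# Option 1 - supervised learning: the learner is shown labelled examples
# and names a new animal after the closest example it has seen.
def nearest_neighbour(features):
    _, label = min(LABELLED, key=lambda example: math.dist(example[0], features))
    return label


# Option 2 - unsupervised learning: no names are given; k-means groups
# similar animals and labels the groups with numbers it chooses itself.
def k_means(points, k, iterations=10):
    # Start from the first k distinct points (assumes there are at least k).
    centroids = list(dict.fromkeys(points))[:k]
    labels = []
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return labels


# Option 3 - reinforcement learning: the learner guesses, receives a reward
# of 1 for a correct answer and 0 otherwise, and keeps the average reward
# of each (clue, guess) pair. It explores a random guess with probability epsilon.
def guessing_game(answers, rounds=2000, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    names = sorted(set(answers.values()))
    clues = list(answers)
    value = defaultdict(float)  # estimated reward of each (clue, guess)
    count = defaultdict(int)
    for _ in range(rounds):
        clue = rng.choice(clues)
        if rng.random() < epsilon:
            guess = rng.choice(names)  # explore
        else:
            guess = max(names, key=lambda n: value[(clue, n)])  # exploit
        reward = 1.0 if guess == answers[clue] else 0.0
        count[(clue, guess)] += 1
        value[(clue, guess)] += (reward - value[(clue, guess)]) / count[(clue, guess)]
    # The learned policy: the best-rewarded guess for every clue.
    return {clue: max(names, key=lambda n: value[(clue, n)]) for clue in clues}


if __name__ == "__main__":
    print(nearest_neighbour((2, 1)))
    print(k_means([(4, 0), (2, 2), (4, 0), (0, 0), (2, 2)], k=3))
    print(guessing_game(dict(LABELLED)))
```

Note what each learner is given: the classifier sees names (labels), the clustering step only returns group numbers it invented, and the guessing-game learner only ever sees a reward of 1 or 0.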
To further understand this topic, let’s have a look at the video on “Machine Learning Case Studies”
in ULearnX.
**sample videos: https://www.youtube.com/watch?v=hjh1ikznScg ,
https://www.youtube.com/watch?v=ukzFI9rgwfU
After you have watched the video, attempt the exercise on “Machine Learning Techniques” in
ULearnX.
SELF-LEARNING ACTIVITY
1. Give an example of a machine learning project/problem. State which class of machine learning the
solution belongs to, and justify your choice.
2. Now that you have learnt about big data analytics, can you give an example of how big data analytics
can be applied in the Process Engineering field? Write your answer in the forum.
SUMMARY
As you have learnt in this topic, the terms big data, data analytics and data science are related to each
other, and different types of machine learning techniques can be used to achieve specific objectives.
The next topic covers the “Data Acquisition and Understanding” stage of the data science lifecycle,
where you will learn how to fix inconsistencies in your data.
KEYWORD
Big data, data analytics, data science, machine learning
REFERENCES
[1] Gartner IT Glossary (n.d.). Retrieved from http://www.gartner.com/it-glossary/big-data/
[2] towardsdatascience.com (n.d.). Retrieved from https://towardsdatascience.com/big-data-analysis-spark-and-hadoop-a11ba591c057
[3] Microsoft (n.d.). Retrieved from https://docs.microsoft.com/en-us/azure/architecture/data-science-process/lifecycle