M1 2021-DS Pendahuluan-1
M1: INTRODUCTION
Achmad Benny Mutiara
Universitas Gunadarma
2021
NIST Definition of Data Science
Data Scientist
What Is Data Science?
Data Science: Multidisciplinary
Its Life Cycle
Its Components
Skill Sets and Roles of a Data Scientist
Main Applications of Data Science
Applications of Data Science
The Data Science Process
NIST Definition of a Data Scientist
Definitions by NIST Big Data WG (NIST SP1500 - 2015)
A Data Scientist is
a practitioner who has sufficient knowledge in the
overlapping regimes of expertise in business needs,
domain knowledge, analytical skills, and programming
and systems engineering expertise to
manage the end-to-end scientific method process
through each stage in the big data lifecycle.
Legacy: NIST BDWG definition of Data Science
Data science is the empirical synthesis of actionable
knowledge and technologies required to handle data
from raw data through the complete data lifecycle
process.
Roles of a Data Scientist
Characteristics of a Data Scientist
Modern Data Scientist
Career Options for Data Scientists
Typical Data Scientist Projects
Career Path
Data Scientist vs Data Analyst
Data Scientist vs Statistician
Graduate Profiles
Data Science Study Program
Bachelor's (S1) and Master's (S2) Levels
Session 2
List of Graduate Profiles for the Data Science Study Program
Data science professional profiles belong to the family of data-related
occupations. The profiles are defined as an extension of the ESCO
(European Skills, Competences, Qualifications and Occupations)
occupational taxonomy.
The proposed new occupations are placed in four top-level
classification groups:
1) Managers, for managerial roles
2) Professionals, for application developers and infrastructure engineers
3) Technicians and associate professionals, for operators and technicians
4) Clerical support workers, for data curators and data stewards
1. Managers (S2): roles and task descriptions
A. Data science (group) manager, or data analytics department manager
   Proposes, plans and manages functional and technical evolutions of the data science operations within the relevant domain (technical, research, business).
B. Data science infrastructure manager, or research infrastructure data storage facilities manager
   Proposes, plans and manages functional and technical evolutions of the big data infrastructure within the relevant domain (technical, research, business).
C. Research infrastructure manager, or research infrastructure data storage facilities manager
   Proposes, plans and manages functional and technical evolutions of the research infrastructure within the relevant scientific domain.
2. Professionals (Data science professionals): roles and task descriptions
A. Data scientist (S2)
   Data scientists find and interpret rich data sources, manage large amounts of data, merge data sources, ensure consistency of datasets and create visualizations to aid in understanding data. They build mathematical models, present and communicate data insights and findings to specialists and scientists, and recommend ways to apply the data.
B. Data science researcher (S2)
   Applies the scientific discovery research process, including hypothesis formulation and hypothesis testing, to obtain actionable knowledge related to a scientific problem or business process, or to reveal hidden relations between multiple processes.
C. Data science architect, system architect, or applications architect (S1 or S2)
   Designs and maintains the architecture of data science applications and facilities. Creates relevant data models and process workflows.
2. Professionals (Data science professionals), continued: roles and task descriptions
D. Data science (application) programmer/engineer, scientific programmer, or data engineer (S1 or S2)
   Designs, develops and codes large data analytics applications to support scientific or enterprise/business processes.
E. (Big) Data analyst (S1 or S2)
   Analyses a large variety of data to extract information about system, service or organization performance and presents it in usable/actionable form.
F. Business analyst (S1)
   Analyses a large variety of information system data to improve business performance.
2. Professionals (Data science technology professionals): roles and task descriptions
A. Data steward (S1)
   Plans, implements and manages (research) data input, storage, search and presentation. Creates data models for domain-specific data, and supports and advises domain scientists/researchers during the whole research cycle and data management life cycle.
B. Digital data curator, digital curator, digital archivist, or digital librarian (S1)
   Finds, selects, organizes and shares (exhibits) digital data collections; maintains their integrity, up-to-date status, freshness and discoverability.
C. Data librarian (S1)
   Data librarians perform or support one or more of the following: acquisition (collection development), organization (cataloguing and metadata) and the implementation of appropriate user services. They apply traditional librarianship principles and practices to data management, including data citation, digital object identifiers (DOIs), ethics and metadata.
2. Professionals (Data science technology professionals), continued: roles and task descriptions
D. Data archivist, or digital archivist (S1)
   Maintains historically significant collections of datasets, documents, records and other electronic data, and seeks out new items for archiving.
2. Professionals (Database and network professionals): roles and task descriptions
Large-scale (cloud) data storage designers and administrators
A. Large-scale (cloud) database designer (data engineer, data architect) (S1)
   Designs, develops and codes large-scale databases and their use in domain/subject-specific applications according to customer needs.
B. Large-scale (cloud) database administrator
   Designs and implements, or monitors and maintains, large-scale cloud databases.
C. Scientific database administrator (S1)
   Designs and implements, or monitors and maintains, large-scale scientific databases.
3. Technicians and associate professionals: roles and task descriptions
Data infrastructure engineers and technicians
A. Big data facilities operators (D3 or S1)
   Manages daily operation of facilities and resources and responds to customer requests. Includes all operations related to data management and the data life cycle.
B. Large-scale (cloud) data storage operators (D3 or S1)
   Manages daily operation of cloud storage, including operations related to the data life cycle, and responds to requests from storage users.
C. Scientific database operator (D3 to S1)
   Manages daily operation of scientific databases, including operations related to the data life cycle, and responds to requests from database users.
4. Clerical support workers: roles and task descriptions
Data and information entry and access
A. Data entry/access desk/terminal workers (D3)
   Enter data into data management systems, reading it directly from source documents or obtaining it from people/users.
B. Data entry field workers (D3)
   The same work done in the field when collecting data from disconnected sensors or doing direct counting or reading.
C. User support data services (D3)
   Support users in entering their data into governmental service and user-facing applications.
Data Science Professions Family (EDISON Data Science Framework (EDSF))
EDISON: Education for Data Intensive Science to Open New science frontiers
Managers: Chief Data Officer (CDO), Data Science (group/dept) manager,
Data Science infrastructure manager, Research Infrastructure manager
Source: adapted from Ian Goodfellow et al., 2016, and Matthew Mayo, 2016
Machine learning techniques
Machine learning mainly has three types of learning
techniques:
Supervised learning
Unsupervised learning
Reinforcement learning
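To make the difference between the first two learning types concrete, here is a minimal sketch in plain Python. The toy data, the labels and the function names (`fit_nearest_centroid`, `predict`, `two_means`) are assumptions made for illustration and do not come from the slides. The supervised part uses the labels to learn one centroid per class; the unsupervised part sees only the numbers and groups them itself. Reinforcement learning, which learns from rewards through interaction, is not covered by this sketch.

```python
from statistics import mean

# Toy 1-D data (hypothetical): small values are "low", large values are "high".
points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
labels = ["low", "low", "low", "high", "high", "high"]


def fit_nearest_centroid(xs, ys):
    """Supervised: use the labels to compute one centroid per class."""
    return {
        label: mean([x for x, y in zip(xs, ys) if y == label])
        for label in set(ys)
    }


def predict(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))


def two_means(xs, iterations=10):
    """Unsupervised: split unlabeled points into two groups (k-means, k=2)."""
    c0, c1 = min(xs), max(xs)  # simple deterministic initialisation
    group0, group1 = [], []
    for _ in range(iterations):
        group0 = [x for x in xs if abs(x - c0) <= abs(x - c1)]
        group1 = [x for x in xs if abs(x - c0) > abs(x - c1)]
        c0, c1 = mean(group0), mean(group1)
    return sorted(group0), sorted(group1)


centroids = fit_nearest_centroid(points, labels)
prediction = predict(centroids, 4.5)   # uses what was learned from the labels
clusters = two_means(points)           # never sees the labels
print(prediction)
print(clusters)
```

Both parts separate the same points, but only the supervised model can name the groups, because only it was given labels.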
Machine learning task categories
1. Classification
2. Regression
3. Clustering
4. Anomaly detection
5. Association
6. Recommendation
7. Dimensionality reduction
8. Computer Vision
9. Text Analytics
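As a rough illustration of two of the task categories listed above, the sketch below shows regression (item 2) as an ordinary least-squares line fit and anomaly detection (item 4) as a simple standard-deviation rule. The data, the threshold and the function names `fit_line` and `find_anomalies` are assumptions chosen for this example; practical projects would normally rely on a dedicated library and more robust methods.

```python
from statistics import mean, pstdev


def fit_line(xs, ys):
    """Regression: least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    return slope, y_bar - slope * x_bar


def find_anomalies(values, threshold=2.0):
    """Anomaly detection: flag values far from the mean.

    A value is flagged when it lies more than `threshold` population
    standard deviations away from the mean (a hypothetical rule of thumb).
    """
    mu, sigma = mean(values), pstdev(values)
    return [v for v in values if abs(v - mu) > threshold * sigma]


# Hypothetical data lying exactly on the line y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])

# Hypothetical readings with one obvious outlier.
anomalies = find_anomalies([10, 11, 9, 10, 12, 10, 50])

print(slope, intercept)
print(anomalies)
```

Regression predicts a continuous value from inputs, while anomaly detection only decides which observations deviate from the rest.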
The Machine Learning Process
Implementation Tool: MATLAB
• MATLAB https://www.mathworks.com/products/matlab.html
• Commercial software; latest version: R2020a
• Available toolboxes: AI, Data Science, and Statistics
• Statistics and Machine Learning Toolbox
• Deep Learning Toolbox
• Reinforcement Learning Toolbox
• Text Analytics Toolbox
• Predictive Maintenance Toolbox