
DATA SCIENCE LECTURE

M1: INTRODUCTION
Achmad Benny Mutiara
Universitas Gunadarma
2021
NIST Definition of Data Science

NIST definition of data science (2018):


Data science is the extraction of useful knowledge
directly from data through a process of discovery, or
of hypothesis formulation and hypothesis testing.
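The hypothesis-testing route named in this definition can be illustrated with a permutation test. This is a minimal sketch, not part of the NIST definition: the function name `permutation_test` and the page-load-time data are hypothetical, and the permutation test is only one of many possible tests. Under the null hypothesis that both groups come from the same distribution, the group labels are exchangeable, so the p-value estimates how often a random relabelling yields a difference in means at least as large as the observed one.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    H0: both groups come from the same distribution, so the group labels
    are exchangeable. Returns (observed difference, estimated p-value).
    """
    rng = random.Random(seed)  # fixed seed makes the result reproducible
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign the group labels
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


if __name__ == "__main__":
    # Hypothetical page load times (seconds) for two versions of a website
    version_a = [2.1, 2.4, 1.9, 2.3, 2.2, 2.0, 2.5, 2.1]
    version_b = [2.8, 2.6, 3.0, 2.7, 2.9, 2.5, 3.1, 2.8]
    diff, p = permutation_test(version_a, version_b)
    print(f"observed difference = {diff:.4f}, p-value = {p:.4f}")
```

A small p-value (for example, below 0.05) is evidence against the null hypothesis; a large one means the data give no reason to reject it.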
Overview of Data Science
Session 1
What Is Data Science?

[Figure: the Data Scientist in relation to the Programmer, Statistician and Business Analyst roles]
Data Science: A Multidisciplinary Field
Data Science Life Cycle
Data Science Components
Data Scientist Skill Set and Roles
Main Applications of Data Science
Applications of Data Science
Data Science Process
NIST Definition of Data Scientist
Definitions by NIST Big Data WG (NIST SP1500 - 2015)
• A Data Scientist is a practitioner who has sufficient knowledge in the overlapping regimes of expertise in business needs, domain knowledge, analytical skills, and programming and systems engineering expertise to manage the end-to-end scientific method process through each stage in the big data lifecycle.
• Data science is the empirical synthesis of actionable knowledge and technologies required to handle data from raw data through the complete data lifecycle process. (Legacy NIST BDWG definition of Data Science)
Role of the Data Scientist
Characteristics of a Data Scientist
Modern Data Scientist
Data Scientist Career Options
Typical Data Scientist Projects
Career Path
Data Scientist vs Data Analyst
Data Scientist vs Statistician
Graduate Profiles
Data Science Study Program
Bachelor's (S1) and Master's (S2) Levels
Session 2
Graduate Profiles of the Data Science Study Program
Data Science Professional profiles belong to the family of data-related occupations. These profiles are defined as an extension of the ESCO (European Skills, Competences, Qualifications and Occupations) occupation taxonomy.
The proposed new occupations are placed in four top-level classification groups:
1) Managers, for managerial roles
2) Professionals, for application developers and infrastructure engineers
3) Technicians and associate professionals, for operators and technicians
4) Clerical support workers, for data curators and data stewards
1. Managers (S2)
A. Data science (group) manager or data analytics department manager: Proposes, plans and manages functional and technical evolutions of the data science operations within the relevant domain (technical, research, business).
B. Data science infrastructure manager or research infrastructure data storage facilities manager: Proposes, plans and manages functional and technical evolutions of the big data infrastructure within the relevant domain (technical, research, business).
C. Research infrastructure manager (or research infrastructure data storage facilities manager): Proposes, plans and manages functional and technical evolutions of the research infrastructure within the relevant scientific domain.
2. Professionals (data science professionals)
A. Data scientist (S2): Data scientists find and interpret rich data sources, manage large amounts of data, merge data sources, ensure consistency of datasets and create visualizations to aid in understanding data. They build mathematical models, present and communicate data insights and findings to specialists and scientists, and recommend ways to apply the data.
B. Data science researcher (S2): Applies the scientific discovery research process, including hypothesis formulation and hypothesis testing, to obtain actionable knowledge related to a scientific problem or business process, or to reveal hidden relations between multiple processes.
C. Data science architect, system architect or applications architect (S1 or S2): Designs and maintains the architecture of data science applications and facilities. Creates relevant data models and process workflows.
D. Data science (application) programmer/engineer, scientific programmer or data engineer (S1 or S2): Designs, develops and codes large data analytics applications to support scientific or enterprise/business processes.
E. (Big) Data analyst (S1 or S2): Analyses a large variety of data to extract information about system, service or organization performance and presents it in usable/actionable form.
F. Business analyst (S1): Analyses a large variety of data and information systems to improve business performance.
2. Professionals (data science technology professionals)
A. Data steward (S1): Plans, implements and manages (research) data input, storage, search and presentation; creates data models for domain-specific data; supports and advises domain scientists/researchers during the whole research cycle and data management life cycle.
B. Digital data curator, digital curator, digital archivist or digital librarian (S1): Finds, selects, organizes and shares (exhibits) digital data collections, and maintains their integrity, up-to-date status, freshness and discoverability.
C. Data librarian (S1): Data librarians perform or support one or more of the following: acquisition (collection development), organization (cataloguing and metadata) and the implementation of appropriate user services. Data librarians apply traditional librarianship principles and practices to data management, including data citation, digital object identifiers (DOIs), ethics and metadata.
D. Data archivist or digital archivist (S1): Maintains historically significant collections of datasets, documents, records and other electronic data, and seeks out new items for archiving.
2. Professionals (database and network professionals)
Large-scale (cloud) data storage designers and administrators
A. Large-scale (cloud) database designer (data engineer, data architect) (S1): Designs, develops and codes large-scale databases and their use in domain/subject-specific applications according to customer needs.
B. Large-scale (cloud) database administrator: Designs and implements, or monitors and maintains, large-scale cloud databases.
C. Scientific database administrator (S1): Designs and implements, or monitors and maintains, large-scale scientific databases.
3. Technicians and associate professionals
Data infrastructure engineers and technicians
A. Big data facilities operators (D3 or S1): Manage the daily operation of facilities and resources and respond to customer requests. Includes all operations related to data management and the data life cycle.
B. Large-scale (cloud) data storage operators (D3 or S1): Manage the daily operation of cloud storage, including operations related to the data life cycle, and respond to requests from storage users.
C. Scientific database operator (D3 to S1): Manages the daily operation of scientific databases, including operations related to the data life cycle, and responds to requests from database users.
4. Clerical support workers
Data and information entry and access
A. Data entry/access desk/terminal workers (D3): Enter data into data management systems, reading it directly from sources or documents or obtaining it from people/users.
B. Data entry field workers (D3): Do the same work in the field when collecting data from disconnected sensors or doing direct counting or reading.
C. User support data services (D3): Support users in entering their data into governmental services and user-facing applications.
Data Science Professions Family (EDISON Data Science Framework, EDSF)
EDISON: Education for Data Intensive Science to Open New science frontiers
• Managers: Chief Data Officer (CDO), Data Science (group/dept) manager, Data Science infrastructure manager, Research Infrastructure manager
• Professionals: Data Scientist, Data Science Researcher, Data Science Architect, Data Science (applications) programmer/engineer, Data Analyst, Business Analyst, etc.
• Professional (database): Large-scale (cloud) database designers and administrators, scientific database designers and administrators
• Professional and clerical (data handling/management): Data Stewards, Digital Data Curators, Digital Librarians, Data Archivists
• Technicians and associate professionals: Big Data facilities operators, scientific database/infrastructure operators

Icons used: credit to https://www.datacamp.com/community/tutorials/data-science-industry-infographic


Building a Data Science Team
Relationship between Data Science, Big Data, AI, Machine Learning and Deep Learning
Session 3
Relationship between DS, BD, AI, ML and DL Today

Source: adapted from Ian Goodfellow et al., 2016 and Matthew Mayo, 2016
Machine Learning Techniques
Machine learning has three main types of learning techniques:
• Supervised learning
• Unsupervised learning
• Reinforcement learning
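To make the difference between the first two techniques concrete, the following minimal sketch (standard library only, Python 3.8+ for `math.dist`) trains a nearest-centroid classifier from labelled points (supervised) and groups the same points without labels using k-means (unsupervised). The function names and toy data are hypothetical. Reinforcement learning, which learns from rewards received while interacting with an environment rather than from a fixed dataset, is not shown.

```python
import math
import random


def nearest_centroid_fit(points, labels):
    """Supervised: learn one centroid per class from labelled examples."""
    sums, counts = {}, {}
    for (x, y), label in zip(points, labels):
        sx, sy = sums.get(label, (0.0, 0.0))
        sums[label] = (sx + x, sy + y)
        counts[label] = counts.get(label, 0) + 1
    return {lab: (sx / counts[lab], sy / counts[lab]) for lab, (sx, sy) in sums.items()}


def nearest_centroid_predict(centroids, point):
    """Predict the label whose centroid is closest to the point."""
    return min(centroids, key=lambda lab: math.dist(point, centroids[lab]))


def k_means(points, k, iterations=20, seed=0):
    """Unsupervised: group unlabelled points into k clusters (Lloyd's algorithm)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct data points
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[idx].append(p)
        # Move each center to the mean of its cluster; keep it if the cluster is empty
        centers = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


if __name__ == "__main__":
    pts = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
    labs = ["low", "low", "low", "high", "high", "high"]
    centroids = nearest_centroid_fit(pts, labs)  # uses the labels
    print(nearest_centroid_predict(centroids, (1.1, 1.0)))
    centers, clusters = k_means(pts, k=2)  # ignores the labels
    print(clusters)
```

The supervised model needs the `labs` list to learn; k-means only sees the coordinates and discovers the two groups by itself.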
Machine Learning Task Categories

1. Classification
2. Regression
3. Clustering
4. Anomaly detection
5. Association
6. Recommendation
7. Dimensionality reduction
8. Computer Vision
9. Text Analytics
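Three of these task categories can be illustrated in a few lines of plain Python. This is a minimal sketch with hypothetical function names and toy data: regression is shown as ordinary least squares for a single feature, anomaly detection as a simple z-score rule (one of many possible detectors), and association as the support and confidence of a single rule.

```python
from statistics import mean, pstdev


def fit_line(xs, ys):
    """Regression: ordinary least squares fit of y = a + b * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def z_score_anomalies(values, threshold=2.0):
    """Anomaly detection: flag values more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


def support_confidence(transactions, antecedent, consequent):
    """Association: support and confidence of the rule antecedent -> consequent."""
    antecedent = set(antecedent)
    both = antecedent | set(consequent)
    n_ante = sum(1 for t in transactions if antecedent <= set(t))
    n_both = sum(1 for t in transactions if both <= set(t))
    support = n_both / len(transactions)
    confidence = n_both / n_ante if n_ante else 0.0
    return support, confidence


if __name__ == "__main__":
    print(fit_line([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1]))  # approximately a = 0.05, b = 1.99
    print(z_score_anomalies([10, 11, 9, 10, 12, 10, 11, 45]))
    baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
    print(support_confidence(baskets, {"bread"}, {"milk"}))
```

Classification and clustering follow the same pattern; recommendation, dimensionality reduction, computer vision and text analytics usually rely on dedicated libraries.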
Machine Learning Process
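A common way to organise the machine learning process is to split the data, train a model on one part and evaluate it on the held-out part. The sketch below shows that loop in plain Python under simple assumptions: the dataset (hours studied versus a pass label) is hypothetical, and the "model" is just a learned cut-off on one feature, chosen to keep the example short rather than as a recommended method.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of the data, then split it into training and test sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]


def fit_threshold(train):
    """Train: pick the cut-off on one feature that maximises training accuracy."""
    candidates = sorted({x for x, _ in train})

    def accuracy(t):
        return sum((x >= t) == label for x, label in train) / len(train)

    return max(candidates, key=accuracy)


def evaluate(threshold, test):
    """Evaluate: fraction of held-out examples that the cut-off classifies correctly."""
    return sum((x >= threshold) == label for x, label in test) / len(test)


if __name__ == "__main__":
    # Hypothetical data: (hours studied, passed?) where passing needs 5 or more hours
    data = [(hours, hours >= 5) for hours in range(1, 13)]
    train, test = train_test_split(data)
    threshold = fit_threshold(train)
    print("learned threshold:", threshold)
    print("test accuracy:", evaluate(threshold, test))
```

Evaluating on rows the model never saw during training gives a more honest estimate of performance than training accuracy alone.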
Implementation Tool: Matlab
• Matlab https://www.mathworks.com/products/matlab.html
• Commercial; latest version R2020a
• Available toolboxes (AI, Data Science, and Statistics):
• Statistics and Machine Learning Toolbox
• Deep Learning Toolbox
• Reinforcement Learning Toolbox
• Text Analytics Toolbox
• Predictive Maintenance Toolbox

• Matlab books:
https://drive.google.com/drive/folders/1qHLqc2kYrI7REC2UClijIZhrzICmm8AF?usp=sharing
• Deep Learning with Matlab books:
https://drive.google.com/drive/folders/1QuU9tAMPF-XPwM4WmSBRiSYQoj8aA9Wg?usp=sharing
Implementation Tool: RapidMiner
• RapidMiner https://rapidminer.com/
• A data science software platform developed by the company of the same name, providing an integrated environment for data preparation, machine learning, deep learning, text mining, and predictive analytics.
• Used for business and commercial purposes, as well as for research, education, training, rapid prototyping, and application development; it supports all steps of the machine learning process, including data preparation, results visualization, model validation and optimization.
• RapidMiner is developed on an open-core model. RapidMiner Studio Free Edition, which is limited to 1 logical processor and 10,000 data rows, is available under the AGPL license.
RapidMiner Studio 9.7 (https://my.rapidminer.com/nexus/account/index.html#downloads)
Commercial pricing starts at $2,500 and is available from the developer.
• RapidMiner books:
https://drive.google.com/drive/folders/1ln2R4ryr2qj_Iwbk-ZZT_T9wTyvpuhaN?usp=sharing
Implementation Tool: RStudio
Why Use the R Language?
• R is a free, open-source software and programming language developed in 1995 at the University of Auckland as an environment for statistical computing and graphics (Ihaka and Gentleman, 1996).
• Since then, R has become one of the dominant software environments for data analysis and is used by a variety of scientific disciplines, including soil science, ecology, and geoinformatics (Environmetrics CRAN Task View; Spatial CRAN Task View).
• R is particularly popular for its graphical capabilities, but it is also prized for its GIS capabilities, which make it relatively easy to generate raster-based models.
• More recently, R has also gained several packages designed specifically for analyzing soil data.
Implementation Tools: Python, Jupyter, Anaconda
• Python
• Version 3.8.x
• IDE available: Spyder https://www.spyder-ide.org/
• Interactive tool: Jupyter (Project Jupyter exists to develop open-source software, open standards, and services for interactive computing across dozens of programming languages.) https://jupyter.org/
• Toolkit: Anaconda (the open-source Individual Edition (Distribution) is the easiest way to perform Python/R data science and machine learning on a single machine. Developed for solo practitioners, it is the toolkit that equips you to work with thousands of open-source packages and libraries) https://www.anaconda.com/
• Google Colab: Colaboratory, or "Colab" for short, allows you to write and execute Python in your browser, with
• Zero configuration required
• Free access to GPUs
• Easy sharing
• Whether you're a student, a data scientist or an AI researcher, Colab can make your work easier
https://colab.research.google.com/notebooks/intro.ipynb
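After installing Anaconda (or opening a Colab notebook), a quick check can confirm that the interpreter meets the Python 3.8 baseline above and that the usual packages are importable. This is a minimal sketch: the package list is an assumption about a typical data science stack, not a requirement stated by the course, and `check_environment` is a hypothetical helper. It only uses the standard library (`sys`, `importlib.util.find_spec`), so it runs even before anything else is installed.

```python
import importlib.util
import sys

# Assumed typical data science packages; adjust to your own needs
PACKAGES = ["numpy", "pandas", "matplotlib", "sklearn", "jupyter_core"]


def check_environment(packages=PACKAGES, min_version=(3, 8)):
    """Report whether Python is new enough and which packages can be imported."""
    report = {"python_ok": sys.version_info[:2] >= min_version}
    for name in packages:
        # find_spec returns None when a top-level module cannot be found
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, ok in check_environment().items():
        print(f"{name:14s} {'OK' if ok else 'missing'}")
```

`find_spec` locates a module without importing it, so the check stays fast even for heavy packages.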
Jupyter
Anaconda
• Book links
• Big Data and Data Science:
https://drive.google.com/drive/folders/18jbNHjUWsRor8W64oNDxggOd_yWqMzHs?usp=sharing
• Deep Learning and Machine Learning:
https://drive.google.com/drive/folders/1hJ-E5OJhg35R7LC7_bHy99ccoY7CJ3nO?usp=sharing
• Python:
https://drive.google.com/drive/folders/1zqr5GPjQhP96XqKcWMxcmeWiMAAZ1iVx?usp=sharing
scikit-learn
• scikit-learn user guide (Mar 01, 2019):
https://drive.google.com/drive/folders/1rRsU6WdnPUlT3d9NcsTuk2f6N92PLZkc?usp=sharing
Thank You
