
DS: Statistics using R for Data Science

Created by Muhamad Rifki Taufik


ABOUT ME

Hi! My name is Muhamad Rifki Taufik. You may call me Rifki.

MY EVOLUTION

- Mathematics: Bachelor of Science, UNY
- Mathematics and Computer Science, Data Science: Master of Science, PSU
- Lecturer: University of Darussalam Gontor
- Data Scientist: Innovatz Sollution
- Data Scientist: Puslitbang, BMKG

Learning Objective

In the Statistics using R for Data Science course you will:

A. Understand the function and characteristics of data in statistics.
B. Differentiate between descriptive statistics and inferential statistics.
C. Read data in CSV format.
D. Understand measures of central tendency (mode, median, mean) and
measures of spread (range, variance, standard deviation).
E. Understand the correlation between variables.

#Jadijagoandigital
Data Science

Data science is the domain of study that deals with vast
volumes of data using modern tools and techniques to find
unseen patterns, derive meaningful information, and make
business decisions. Data science uses complex machine
learning algorithms to build predictive models [2].

[1] Rizk, Aya & Elragal, Ahmed. (2020). Data science: developing theoretical contributions in information systems via
text analytics. Journal of Big Data, Vol 7 (7). https://doi.org/10.1186/s40537-019-0280-6
Course Definition
STATISTICS - Probability (generalizing from the sample to the population)

Statistic = a value describing a characteristic of the sample
(an estimate of the population value).

Parameter = a value describing a characteristic of the population.
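
The difference between a statistic and a parameter can be sketched in R. This is only an illustration under stated assumptions: the population is simulated, and the names population, sample_values, parameter_mean and statistic_mean are hypothetical, not taken from the course material.

```r
# Hypothetical population: 10,000 simulated values (illustration only).
set.seed(42)
population <- rnorm(10000, mean = 170, sd = 10)

# Parameter: a characteristic of the whole population.
parameter_mean <- mean(population)

# Statistic: the same characteristic computed from a sample,
# used as an estimate of the parameter.
sample_values <- sample(population, size = 100)
statistic_mean <- mean(sample_values)

cat("Parameter (population mean):", parameter_mean, "\n")
cat("Statistic (sample mean):    ", statistic_mean, "\n")
```

In practice the population is rarely fully observed, so only the statistic is available and the parameter has to be inferred from it.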

[2] https://databasetown.com/statistics-for-data-science-descriptive-inferential-statistics/

Chapter #1 Summary :

1. Statistics - the science of processing, presenting, and analyzing data.
2. Data types (measurement scales) - nominal, ordinal, interval, ratio.
3. Types of data analysis - descriptive statistics & inferential statistics.
- descriptive statistics = data analysis that describes the sample data.
- inferential statistics = data analysis that draws conclusions about the population.
4. Reading a dataset with read.csv
syntax = variable_name <- read.csv("path to file", sep=";")
5. Inspecting data types with str to determine each column's measurement scale
syntax = str(variable_name)
6. Converting a data type to character with as.character
7. Converting a column to categorical data (factor) with as.factor
8. Estimating characteristics (mode, median, mean) - use library(pracma)
9. Outlier - an observation that lies far from the rest of the data - use the median
10. Measures of spread (range, variance, standard deviation)
* The smaller the standard deviation, the more accurate the estimate.
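
The Chapter 1 steps can be tried end to end with a short sketch. Everything about the data is an assumption made for illustration: the file is a temporary one written by the script, and the columns id, gender and score are hypothetical. The slides point to the pracma package for the mode; to avoid an extra dependency, this sketch counts values with base R's table() instead.

```r
# Write a small semicolon-separated file so the example is self-contained.
# The file and its columns are hypothetical.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "id;gender;score",
  "1;female;70",
  "2;male;75",
  "3;female;75",
  "4;male;80",
  "5;female;200"
), csv_path)

# Read the dataset; sep = ";" matches the separator used in the file.
students <- read.csv(csv_path, sep = ";")

# Inspect the column types.
str(students)

# Convert columns: id is a label, gender is categorical.
students$id <- as.character(students$id)
students$gender <- as.factor(students$gender)

# Central tendency. Base R has no statistical mode function, so count
# the values with table() and take the most frequent one.
score_mean <- mean(students$score)      # 100, pulled up by the outlier 200
score_median <- median(students$score)  # 75, unaffected by the outlier
score_counts <- table(students$score)
score_mode <- as.numeric(names(score_counts)[which.max(score_counts)])  # 75

# Spread.
score_range <- range(students$score)    # smallest and largest value
score_var <- var(students$score)        # sample variance
score_sd <- sd(students$score)          # sample standard deviation

cat("mean:", score_mean, " median:", score_median, " mode:", score_mode, "\n")
cat("range:", score_range, " variance:", score_var, " sd:", score_sd, "\n")
```

The single score of 200 shows why the median is preferred when outliers are present: it moves the mean from the mid-70s to 100 but leaves the median at 75.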

Chapter #2 SUMMARY :
1. Descriptive analysis - analysis used to build a hypothesis (a preliminary conclusion).
2. summary function - displays summary statistics for each variable.
syntax = summary(variable_name)
3. Visualization - exploratory analysis to see how the data are distributed.
- plot function = bar plot (for factor variables)
- hist function = histogram (for numeric/int variables)
4. Hypothesis testing - decision making based on data analysis.
- null hypothesis (Ho) - the opposite of the theory to be proven
- alternative hypothesis (Ha) - corresponds to the theory to be proven
5. Statistical tests = z-test, t-test, chi-square test, F-test.
6. P-value - the smallest significance level at which Ho would be rejected.
7. Alpha - the significance level, i.e. the accepted error rate (1%, 5%, or 10%).
8. Ho is rejected if p-value < alpha (e.g. 5%), and Ho is not rejected if p-value > alpha.
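
The Chapter 2 workflow can be sketched as follows. The course dataset is not shown in these slides, so this example assumes R's built-in mtcars data instead. The hypothesised mean of 20 and the names car_data, alpha, result and reject_h0 are hypothetical choices made for illustration.

```r
# Built-in dataset used only for illustration.
car_data <- mtcars
car_data$am <- factor(car_data$am, labels = c("automatic", "manual"))

# Summary of a single variable.
print(summary(car_data$mpg))

# Draw to a null PDF device so the sketch also runs non-interactively;
# drop the pdf() and dev.off() lines in an interactive session to see the plots.
pdf(NULL)
plot(car_data$am)   # factor variable  -> bar plot
hist(car_data$mpg)  # numeric variable -> histogram
invisible(dev.off())

# One-sample t-test.
# Ho: the mean mpg equals 20.   Ha: the mean mpg differs from 20.
alpha <- 0.05
result <- t.test(car_data$mpg, mu = 20)

# Decision rule from the summary: reject Ho when p-value < alpha.
reject_h0 <- result$p.value < alpha
cat("p-value:", result$p.value, "\n")
cat(if (reject_h0) "Reject Ho" else "Do not reject Ho", "\n")
```

A result of "Do not reject Ho" means the data do not provide enough evidence against Ho at the chosen alpha; it does not prove that Ho is true.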
ANALYSIS OF RELATIONSHIPS BETWEEN VARIABLES

1. Scatter plot = shows the direction of the relationship
(positive/negative).
2. Correlation analysis = tests whether two variables are
related and how strong the relationship is. Use the function
cor.test/chisq.test/t.test
3. Hypotheses for correlation analysis:
- Ho = there is no relationship between the two
variables
- Ha = there is a relationship between the two variables
4. The correlation coefficient ranges from -1
to 1.
- Strong if close to -1 or 1
- Weak if close to 0
5. Relationship between numeric variables = scatter
plot, cor.test
6. Relationship between categorical variables =
cross-tabulation, chi-square test
7. Relationship between a numeric and a categorical
variable = boxplot, t-test
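
The three pairings listed above can be sketched in R. As an assumption for illustration, the example uses the built-in mtcars data rather than the course dataset, and the names car_data, cor_result, cross_tab, chi_result and t_result are hypothetical.

```r
# Built-in dataset used only for illustration.
car_data <- mtcars
car_data$am <- factor(car_data$am, labels = c("automatic", "manual"))
car_data$cyl <- factor(car_data$cyl)

# Draw to a null PDF device so the sketch also runs non-interactively;
# drop the pdf() and dev.off() lines in an interactive session to see the plots.
pdf(NULL)

# Numeric vs numeric: scatter plot and correlation test.
plot(car_data$wt, car_data$mpg)
cor_result <- cor.test(car_data$wt, car_data$mpg)

# Categorical vs categorical: cross-tabulation and chi-square test.
# Small cell counts trigger a warning that the chi-square approximation
# may be inaccurate; it is suppressed here but worth heeding on real data.
cross_tab <- table(car_data$am, car_data$cyl)
chi_result <- suppressWarnings(chisq.test(cross_tab))

# Numeric vs categorical: boxplot and two-sample t-test.
boxplot(mpg ~ am, data = car_data)
t_result <- t.test(mpg ~ am, data = car_data)

invisible(dev.off())

cat("correlation coefficient:", cor_result$estimate, "\n")
cat("cor.test p-value:  ", cor_result$p.value, "\n")
cat("chisq.test p-value:", chi_result$p.value, "\n")
cat("t.test p-value:    ", t_result$p.value, "\n")
```

In each case Ho states that there is no relationship, so a p-value below the chosen alpha is read as evidence of a relationship. The sign of the correlation coefficient gives the direction and its distance from 0 gives the strength.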

Reference on Statistics using R

● https://www.youtube.com/c/rprogramming101
● https://learningstatisticswithr.com/
● https://book.stat420.org/applied_statistics.pdf
● https://modernstatisticswithr.com/
● https://www.statmethods.net/stats/descriptives.html
● http://www.r-tutor.com/elementary-statistics
● https://bookdown.org/mikemahoney218/LectureBook/basic-statistics-using-r.html
● https://r4ds.had.co.nz/index.html
● https://www.rstudio.com/resources/cheatsheets/
● http://r-tutorial.nl/
● https://bookdown.org/steve_midway/DAR/learning-r.html
● http://scipy-lectures.org/

Thank you

“Everyone is a genius. But if you judge a fish by its ability to climb a
tree, it will live its whole life believing that it is stupid.”

Let’s Get Hands-On
