INTRODUCTION TO THE R LANGUAGE


Analyzing and processing data is a necessary and important task for researchers in many fields, such as biology, geography and mathematics. In the past, software companies developed professional packages such as SPSS, Excel and Stata for data analysis. These are all commercial products, however, priced from a few hundred to a few thousand USD, and not every university or research center can afford them. For that reason, over roughly the last ten years, statisticians around the world have joined forces to develop an open-source tool that anyone can use completely free of charge. That tool is the R language, now one of the languages most widely used by researchers. In Vietnam, for a variety of reasons, the use of R is still new. This document gives an overview of the R language; more advanced material will be provided later.

1. Overview of the R language

In short, R is software for statistical analysis and graphics. In essence, though, R is a general-purpose computer language that can be used for many different goals, from simple calculations, recreational mathematics and matrix computation to complex statistical analyses. Because it is a language, R can also be used to develop specialized software packages for a particular computational problem.

2. Installing and running R

To use R, the first thing to do is install it on the computer. Go to the website http://cran.R-project.org and download R. Once the download has finished, the next step is installation: simply click the downloaded file and follow the instructions on the screen. This is a very simple step that takes only about a minute. When the installation is complete, an icon appears on the desktop. At this point R is ready to use: click the icon and the R console window opens.
Bi Quang H & Nguyn Trung Kin K57 Khoa CNTT - HSPHN

3. Command-line computing in R

R is normally used from the command line, which means that we type commands directly at the prompt (shown as > in the examples). Commands must strictly follow the rules of the R language, and a command is executed as soon as Enter is pressed.

R is case-sensitive: for example, library is different from Library. Another convention is that when a name consists of two words, R usually joins them with a dot instead of a space, as in data.frame, t.test and read.table. Keeping these conventions in mind saves the user a lot of time.

If a command is syntactically correct, R either shows a new prompt or prints a result (depending on the command); if it is not, R prints a short message saying that the command is wrong or not understood. For example, when we type

> x <- rnorm(20)
>

R understands the command and gives us a new prompt. But if we type

> R is great

R does not understand and reports an error (the exact wording depends on the R version):

Error: syntax error

To leave R, click the close button (x) in the corner of the window or type q().

3.1 R syntax

The general form of an R statement is a command or a function call. A function normally takes arguments, so the function name is followed by the arguments we have to supply, for example:

> reg <- lm(y ~ x)

To find out which arguments a function takes, use args(x), where x is the function of interest, for example args(lm).

R is an object-oriented language, which means that data in R are stored in objects. This also affects the way R statements are written. To test whether x equals 5, for instance, we do not write x = 5 as we would in everyday notation; R requires x == 5. In R, x = 5 is an assignment, equivalent to x <- 5. The second form (with <-) is preferred over the first (with =).

Some commonly used symbols in R:

x == y      x is equal to y
x != y      x is not equal to y
y < x       y is less than x
x > y       x is greater than y
x <= y      x is less than or equal to y
x >= y      x is greater than or equal to y
is.na(x)    is x a missing value?
A & B       A and B
A | B       A or B
!           not
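The symbols in the table above can be tried directly at the prompt. The following is only a small sketch: the values of x and y are made up for illustration and do not come from the original text.

```r
# Illustrative values (assumptions, not data from the document)
x <- 5                      # assignment with <-
y <- c(3, 5, NA, 8)         # a vector containing one missing value

eq      <- (y == x)         # element-wise comparison; NA stays NA
missing <- is.na(y)         # TRUE only for the missing element
both    <- (y > 4) & !is.na(y)   # "and" combined with "not"
either  <- (y < 4) | (y > 6)     # "or"

print(eq)
print(missing)
print(both)
print(either)
```

Note that a comparison involving NA yields NA rather than TRUE or FALSE, which is why is.na() is needed to test for missing values.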

In R, everything that follows the # symbol on a line is ignored, because # is reserved for comments added by the user.

3.2 Naming objects in R

Naming an object or a variable in R is quite flexible, since R has fewer restrictions than other software. An object name must be written without spaces. For example, R accepts myobject but not my object:

> myobject <- rnorm(10)
> my object <- rnorm(10)
Error: syntax error in "my object"

A name such as myobject can be hard to read, however, so it is better to separate the words with a dot, as in my.object:

> my.object <- rnorm(10)

An important point is that R distinguishes upper-case from lower-case letters, so My.object is different from my.object. For example:

> My.object.u <- 15
> my.object.L <- 5
> My.object.u + my.object.L
[1] 20

A few things to keep in mind when naming objects in R:

- Do not use a hyphen in a name: my-object is read as my minus object. An underscore, as in my_object, was not allowed in old versions of R, although current versions accept it.
- Do not give an object the same name as a variable inside a data set. For example, if we have a data.frame that contains a variable age, we should not create another object that is also called age, that is, we should not write age <- age. However, if the data.frame is named data, we can refer to the variable age with the $ character, as data$age (the variable age in the data.frame data), and in that case age <- data$age is acceptable.

3.3 Getting help in R

Besides args(), R provides help() so that users can learn the syntax of each function. To find out which arguments the function lm takes, for example, we only need to type:

> help(lm)

or

> ?lm

A window opens that explains how the function is used, often with examples. The command help.start() opens a guide to the whole R system. The function apropos is also very useful: it lists all objects in R whose names contain the characters we are looking for. For example, to find out which functions in R have lm in their name, type:

> apropos("lm")
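The naming rules and help functions described above can be sketched as follows. The data frame patients and its values are hypothetical and serve only as an illustration; args() and apropos() are the base R functions named in the text.

```r
# Hypothetical data frame, used only for illustration
patients <- data.frame(id = 1:3, age = c(34, 51, 67))

# Copy the column 'age' out of the data frame with the $ operator
age <- patients$age

# R is case-sensitive: these are two different objects
My.value <- 15
my.value <- 5
total <- My.value + my.value

# Inspect the arguments of lm(), then search for names containing "lm"
print(args(lm))
found <- apropos("lm")      # the search pattern must be a quoted string
print(head(found))

# help(lm) or ?lm would open the documentation page in an interactive session
```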

4. Working with data in R

4.1 Entering data

To analyze data with R, the data must be available in a form that R can understand and process, namely a data.frame. There are many ways to get data into a data.frame, from typing the values directly to importing them from various sources. The most common ones are described below.

4.1.1 Entering data at the command line

To enter data directly we use the function c(), which creates one column of data. Its syntax is:

> variable_name <- c(element_1, element_2, ..., element_n)

Example 1:

> a <- c(1,2,3,4,5,6,7,8,9,10,11,12)
> b <- c(12,11,10,9,8,7,6,5,4,3,2,1)

These two commands create a column of data with the elements 1 to 12, stored in a variable named a, and a column with the elements 12 down to 1, stored in a variable named b.

R is an object-oriented language, and the variables a and b above are two separate objects. We can combine them into a data frame so that R can process them together later. To do this we use the function data.frame, with the following syntax:

> variable_name <- data.frame(argument_1, argument_2, ..., argument_n)

The arguments are columns of data created with c(). For example:

> ab <- data.frame(a, b)

(where a and b are the variables created in Example 1). This command tells R to combine the two separate columns a and b into a data frame and to store it in the variable ab. To see what is stored in ab, simply type:

> ab

R then displays:

    a  b
1   1 12
2   2 11
3   3 10
4   4  9
5   5  8
6   6  7
7   7  6
8   8  5
9   9  4
10 10  3
11 11  2
12 12  1
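The steps of Example 1 can be reproduced and checked with a few base R functions. This is a minimal sketch; the use of dim(), names() and head() for inspection is an addition that does not appear in the original text.

```r
# Two columns of data, as in Example 1
a <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
b <- c(12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1)

# Combine the two vectors into one data frame
ab <- data.frame(a, b)

print(dim(ab))      # number of rows and columns
print(names(ab))    # column names are taken from the variable names
print(head(ab, 4))  # first four rows only
```

For regular sequences such as these, a <- 1:12 and b <- 12:1 are shorter ways of writing the same values.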

To save these data in a file format suited to R for later use, we use the save command:

> save(variable_name, file="filename.rda")

The file is written to the working directory, which is set with setwd():

> setwd("path_to_the_directory_where_the_file_should_be_saved")

4.1.2 Entering data in a spreadsheet window

We can enter data on age and insulin for 10 patients with a very handy function: edit(data.frame()). With this function R opens a new window containing a grid of columns and rows, much like Excel, and we can type the data into that grid. For example:

> ins <- edit(data.frame())

At this point R does not know which variables we have, so it lists them as var1, var2, and so on. Click the column var1 and rename it by typing a. Click the column var2 and rename it by typing b. Then type the data for each column. When finished, click the close icon in the right corner of the spreadsheet; we then have a data.frame named ins with two variables, a and b.

4.1.3 Importing data from a text file

Suppose we have collected data on age and cholesterol from a study of 50 patients with hypertension. The data are stored in a text file named chol.txt in the directory c:\works\stats. The file is organized as follows: column 1 is the patient id, column 2 is sex, followed by age, body mass index (bmi), HDL cholesterol (hdl), LDL cholesterol (ldl), total cholesterol (tc) and triglycerides (tg).

We want to import these data into R for later analysis. We use the read.table command as follows:

> setwd("c:/works/stats")
> chol <- read.table("chol.txt", header=TRUE)

The first command makes sure that R is working in the directory where the data are stored. The second command asks R to read the data from the file chol.txt (in the directory c:\works\stats) and to put them into the object chol. In this command, header=TRUE asks R to treat the first line of the file as the names of the columns.
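The import and save steps described above can be tried without the original chol.txt file. In the sketch below, the file is a temporary one and its three rows are made up for illustration; they are not the study data. The function load(), which restores a saved object, is not mentioned in the original text and is included here as the natural counterpart of save().

```r
# Write a tiny whitespace-separated file to a temporary location.
# The rows are invented for illustration only.
txt_file <- tempfile(fileext = ".txt")
writeLines(c("id sex age bmi",
             "1 Nam 57 17",
             "2 Nu 64 18",
             "3 Nu 60 18"), txt_file)

# header = TRUE: the first line of the file becomes the column names
chol_demo <- read.table(txt_file, header = TRUE)
print(names(chol_demo))
print(chol_demo)

# Save the object in R's own format, remove it, and load it back
rda_file <- tempfile(fileext = ".rda")
save(chol_demo, file = rda_file)
rm(chol_demo)
load(rda_file)              # restores the object under its original name
print(nrow(chol_demo))
```

Using tempfile() instead of setwd() keeps the sketch independent of any particular directory layout.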

To check the data that have just been read, type:

> chol

or

> names(chol)

R then reports the columns present in the data (names asks which columns the data contain and what they are called):

[1] "id" "sex" "age" "bmi" "hdl" "ldl" "tc" "tg"

We can now save the data in R format for later use with the command:

> save(chol, file="chol.rda")

4.1.4 Importing data from an xls (Excel) file

R can also import data from an Excel xls file in a few simple steps. First, save the xls file in *.csv format so that R can process it. Then use the read.csv() command, whose syntax is:

> variable_name <- read.csv("path_to_the_csv_file", header=TRUE)

The argument header=TRUE tells R that the first row of the file should be used as the column names. After this command we have a standard R object holding the data of the original xls file. The object can be saved for later sessions with the save() command introduced above.

4.2 Processing data

Editing data here does not mean changing the original values (that would be a serious offence, a form of scientific fraud that cannot be accepted); it only means organizing the data so that R can analyze them efficiently. In statistical analysis we often need to pool data into one group, split them into several groups, or convert characters to numbers to make calculation easier. This section covers some basic commands for editing data. We return to the chol data set from section 4.1.3.

> setwd("c:/works/stats")
> chol <- read.table("chol.txt", header=TRUE)
> attach(chol)

4.2.1 Checking for missing values

In research, for many reasons, data cannot always be collected for every subject, or not every variable can be measured for a subject. In such cases the empty entries are regarded as missing values. R represents missing values as NA. Some statistical tests require missing values to be removed before the analysis (because they cannot be used in calculations). R has a very useful command for this, na.omit, which is used as follows:

> chol.new <- na.omit(chol)

With this command we ask R to drop the rows with missing values from the data.frame chol and to put the complete rows into a new data.frame named chol.new. Note that this is only an example, because the chol data contain no missing values.

4.2.2 Splitting data: subset

If, for some reason, we want to analyze men and women separately, we can split chol into two data.frames, called nam (male) and nu (female). To do this we use the command subset(data, cond), where data is the data.frame we want to split and cond is the condition. For example:

> nam <- subset(chol, sex=="Nam")
> nu <- subset(chol, sex=="Nu")

After these two commands we have two new data sets (two data.frames) named nam and nu. Note that in the conditions sex == "Nam" and sex == "Nu" we use == rather than =, because the condition tests for exact equality. Naturally, we can also split the data into other data.frames using conditions on other variables. The following command, for example, creates a new data.frame named old with the patients aged 60 or over:

> old <- subset(chol, age>=60)
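The commands na.omit and subset described above can be tried on a small data frame. The data below are made up for illustration (the original chol.txt is not available here), and one value is deliberately set to NA so that na.omit has something to remove.

```r
# Made-up data with one missing age (illustration only)
demo <- data.frame(id  = 1:6,
                   sex = c("Nam", "Nu", "Nu", "Nam", "Nam", "Nu"),
                   age = c(57, 64, NA, 65, 47, 70))

# Drop every row that contains a missing value
demo.new <- na.omit(demo)

# Split by sex, and select the patients aged 60 or over
nam <- subset(demo.new, sex == "Nam")
nu  <- subset(demo.new, sex == "Nu")
old <- subset(demo.new, age >= 60)

print(nrow(demo.new))
print(nam)
print(old)
```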

The size of the data.frame old from the chol data can be checked with dim:

> dim(old)
[1] 25  8

Or we can create a new data.frame of male patients aged 60 or over:

> n60 <- subset(chol, age>=60 & sex=="Nam")
> dim(n60)
[1] 9 8

4.2.3 Extracting data from a data.frame

The chol data contain 8 variables. We can extract from chol only the variables we need, such as the id, age and total cholesterol (tc). From the names(chol) command we know that id is column 1, age is column 3 and tc is column 7. We can use the following command:

> data2 <- chol[, c(1,3,7)]

Here we tell R that we want columns 1, 3 and 7, and that all the data in these three columns should go into a new data.frame named data2. Note that we use square brackets [] rather than parentheses (), because chol is not a function. The comma in front of c means that we select all rows of the data.frame chol. If we only want the first 10 rows, here with the columns id, sex and tc (columns 1, 2 and 7), the command is:

> data3 <- chol[1:10, c(1,2,7)]
> print(data3)
   id sex  tc
1   1 Nam 4.0
2   2  Nu 3.5
3   3  Nu 4.7
4   4 Nam 7.7
5   5 Nam 5.0
6   6  Nu 4.2
7   7 Nam 5.9
8   8 Nam 6.1
9   9 Nam 5.9
10 10  Nu 4.0

Note that print(arg) simply lists all the data in the data.frame arg. In fact, typing data3 alone gives exactly the same result as print(data3).

4.2.4 Merging two data.frames into one: merge

Suppose we have data stored in two data.frames. The first, named d1, has 3 columns, id, sex and tc:

id sex  tc
 1 Nam 4.0
 2  Nu 3.5
 3  Nu 4.7
 4 Nam 7.7
 5 Nam 5.0
 6  Nu 4.2
 7 Nam 5.9
 8 Nam 6.1
 9 Nam 5.9
10  Nu 4.0

The second, named d2, has 3 columns, id, sex and tg:

id sex  tg
 1 Nam 1.1
 2  Nu 2.1
 3  Nu 0.8
 4 Nam 1.1
 5 Nam 2.1
 6  Nu 1.5
 7 Nam 2.6
 8 Nam 1.5
 9 Nam 5.4
10  Nu 1.9
11  Nu 1.7

The two data sets share the variables id and sex, but d1 has 10 rows whereas d2 has 11. We can merge them into one data.frame with the merge command:

> d <- merge(d1, d2, by="id", all=TRUE)
> d
   id sex.x  tc sex.y  tg
1   1   Nam 4.0   Nam 1.1
2   2    Nu 3.5    Nu 2.1
3   3    Nu 4.7    Nu 0.8
4   4   Nam 7.7   Nam 1.1
5   5   Nam 5.0   Nam 2.1
6   6    Nu 4.2    Nu 1.5
7   7   Nam 5.9   Nam 2.6
8   8   Nam 6.1   Nam 1.5
9   9   Nam 5.9   Nam 5.4
10 10    Nu 4.0    Nu 1.9
11 11  <NA>  NA    Nu 1.7

In the merge command we ask R to combine the data sets d1 and d2 into a new data.frame named d, using the variable id as the key. Patient 11 has no value for tc, so R reports it as NA (short for "not available").

4.2.5 Data coding

In epidemiological data processing we often need to convert a continuous variable into a categorical one. In the diagnosis of osteoporosis, for example, women whose T-score for bone mineral density (BMD) is at or below -2.5 are classified as having osteoporosis, those with a BMD between -2.5 and -1.0 as having osteopenia, and those above -1.0 as normal. Suppose we have BMD values from 10 patients:

-0.92, 0.21, 0.17, -3.21, -1.80, -2.60, -2.00, 1.71, 2.12, -2.11

To enter these data into R we can use the function c:

> bmd <- c(-0.92,0.21,0.17,-3.21,-1.80,-2.60,-2.00,1.71,2.12,-2.11)

To classify the three groups osteoporosis, osteopenia and normal, we can use the codes 1, 2 and 3. In other words, we want to create another variable (call it diagnosis) that takes these three values depending on the value of bmd. To do this we use the commands:

# temporarily set diagnosis equal to bmd
> diagnosis <- bmd
# recode bmd into diagnosis
> diagnosis[bmd <= -2.5] <- 1
> diagnosis[bmd > -2.5 & bmd <= -1.0] <- 2
> diagnosis[bmd > -1.0] <- 3
# combine into a data frame
> data <- data.frame(bmd, diagnosis)
# list the result to check that the commands worked
> data
     bmd diagnosis
1  -0.92         3
2   0.21         3
3   0.17         3
4  -3.21         1
5  -1.80         2
6  -2.60         1
7  -2.00         2
8   1.71         3
9   2.12         3
10 -2.11         2

4.2.5.1 Recoding with replace

Another way to recode data is to use replace, although it is somewhat more verbose. Continuing the example above, we convert bmd into diagnosis as follows:

> diagnosis <- bmd
> diagnosis <- replace(diagnosis, bmd <= -2.5, 1)
> diagnosis <- replace(diagnosis, bmd > -2.5 & bmd <= -1.0, 2)
> diagnosis <- replace(diagnosis, bmd > -1.0, 3)

4.2.5.2 Converting to a factor

In statistical analysis we distinguish between a categorical variable (factor) and an ordinary continuous variable. A factor cannot be used in arithmetic such as addition, subtraction, multiplication or division, whereas a numeric variable can. In the bmd and diagnosis example above, diagnosis is a factor, because the mean of 1 and 2 has no practical meaning, while bmd is a numeric variable. At the moment, however, R still treats diagnosis as numeric. To turn it into a factor we use the function factor:

> diag <- factor(diagnosis)
> diag
[1] 3 3 3 1 2 1 2 3 3 2
Levels: 1 2 3

Note that R now tells us that diag has 3 levels: 1, 2 and 3. If we ask R to compute the mean of diag, R will not do so, because diag is not a numeric variable:

> mean(diag)
[1] NA
Warning message:
argument is not numeric or logical: returning NA in: mean.default(diag)

Of course, we can still compute the mean of diagnosis:

> mean(diagnosis)
[1] 2.3

but the result 2.3 has no practical meaning.

4.2.6 Grouping with cut

A continuous variable can be divided into several groups with the function cut. Suppose, for example, that we have the variable age:

> age <- c(17,19,22,43,14,8,12,19,20,51,8,12,27,31,44)

The lowest age is 8 and the highest is 51. If we want to divide the values into 2 age groups:

> cut(age, 2)
 [1] (7.96,29.5] (7.96,29.5] (7.96,29.5] (29.5,51]   (7.96,29.5] (7.96,29.5] (7.96,29.5] (7.96,29.5]
 [9] (7.96,29.5] (29.5,51]   (7.96,29.5] (7.96,29.5] (7.96,29.5] (29.5,51]   (29.5,51]
Levels: (7.96,29.5] (29.5,51]

cut divides age into 2 groups: group 1 with ages from 7.96 to 29.5, and group 2 from 29.5 to 51. We can count the subjects in each age group with the function table:

> table(cut(age, 2))
(7.96,29.5]   (29.5,51]
         11           4

To divide age into 3 groups labelled low, medium and high:

> cut(age, 3, labels=c("low", "medium", "high"))
 [1] low    low    low    high   low    low    low    low    low    high   low    low    medium medium
[15] high
Levels: low medium high
> ageg <- cut(age, 3, labels=c("low", "medium", "high"))
> table(ageg)
ageg
   low medium   high
    10      2      3

Naturally, we can also divide age into 4 groups (quartiles) by supplying the probabilities 0, 0.25, 0.50, 0.75 and 1:

> cut(age, breaks=quantile(age, c(0, 0.25, 0.50, 0.75, 1)), labels=c("q1", "q2", "q3", "q4"), include.lowest=TRUE)

4.2.7 Grouping with cut2 (Hmisc)

The function cut above divides a variable according to its values, not according to the number of observations, so the groups do not contain equal numbers of observations. In statistical analysis, however, we sometimes need to divide a continuous variable into groups based on its distribution, with equal or nearly equal numbers of observations in each group. For the variable bmd, for example, we can cut the values into groups of similar size with the function cut2 (in the Hmisc package):

> # load the Hmisc package so that cut2 can be used
> library(Hmisc)
> bmd <- c(-0.92,0.21,0.17,-3.21,-1.80,-2.60,-2.00,1.71,2.12,-2.11)
> # divide bmd into 2 groups and store the result in the object group
> group <- cut2(bmd, g=2)
> table(group)
group
[-3.21,-0.92) [-0.92, 2.12]
            5             5

As the example shows, g = 2 means dividing into 2 groups (g = group). R automatically forms group 1 from the bmd values between -3.21 and -0.92, and group 2 from the values between -0.92 and 2.12. Each group contains 5 values. We can also divide the data into 3 groups:

> group <- cut2(bmd, g=3)

With table we can see that there are 3 groups, the first with 4 values and the second and third with 3 values each:

> table(group)
group
[-3.21,-1.80) [-1.80, 0.21) [ 0.21, 2.12]
            4             3             3

5. Command-line calculations in R

R provides a large number of operators and functions for calculation; most commonly used mathematical functions are supported. Many more functions for complex and advanced computation are provided by the numerous extension packages written for R. These packages can be downloaded from:

http://www.cran.r-project.org/

Choose the Packages section to download the packages you need. All of them are completely free. Simple calculations use ordinary arithmetic operators and functions typed at the prompt.
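The list of simple calculation commands that originally followed this paragraph has not survived in the text. As a stand-in, here is a short sketch of common base R arithmetic; the selection of operators and functions is an assumption and does not claim to reproduce the original list.

```r
# Basic arithmetic; the usual operator precedence applies
r_arith <- 7 + 3 * 2
r_power <- 2^10
r_div   <- 17 %/% 5          # integer division
r_mod   <- 17 %% 5           # remainder

# Common mathematical functions
r_sqrt <- sqrt(16)
r_log  <- log(100, base = 10)
r_exp  <- exp(0)

# Functions that summarise a vector
x <- c(2, 4, 6, 8)
r_sum  <- sum(x)
r_mean <- mean(x)
r_max  <- max(x)

print(c(r_arith, r_power, r_div, r_mod))
print(c(r_sqrt, r_log, r_exp))
print(c(r_sum, r_mean, r_max))
```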


6. Programming in R

6.1 Overview of programming in R

For data analysis, R has several advantages over general-purpose high-level programming languages such as C, C++ and Java:

- R can manipulate and store data, including textual data.
- R supports matrix algebra.
- Hash tables and regular expressions are available.
- R supports object-oriented programming.
- R has rich graphics capabilities.

R also provides the same basic control structures as other high-level programming languages, for example if...else, while, for, and so on.

R also has some weaknesses, or gaps, but most of them can be worked around from within R itself:

- R is not a database, but it can connect to database management systems (DBMS).
- R has no graphical user interface of its own, but it can connect to Java and Tcl/Tk.
- Interpreting R code can be very slow, but R can call code written in C or C++.
- R has no spreadsheet view of the data, but it can connect to Excel/MS Office.
- At the console, every R command is executed when Enter is pressed. This is inconvenient when programming, especially when writing a function: a mistake in a single line means starting again from the beginning.
- R comes with no professional or commercial support.

6.2 Basic data types used in R programming

R does not require the type of a variable to be declared when the variable is created; the type is determined when a specific value is assigned to it. The basic data types of R are:

- numeric
- character (used for both single characters and strings)
- logical

Unlike many high-level programming languages, R does not make us choose between a real and an integer type when we write a number: values such as 10 and 1.5 are both stored as numeric. (An integer type exists and can be requested explicitly, for example 10L, but this is rarely necessary.) For example:

> a <- 10
> b <- 1.5

The <- symbol corresponds to the assignment = of other high-level programming languages. We can also write = instead of <-, but <- is the recommended form.
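The way R assigns a type at the moment of assignment can be checked with class() and typeof(). The sketch below is only an illustration; the variable n and the use of the L suffix are additions that do not appear in the original text.

```r
a <- 10      # no declaration needed; the type follows from the value
b <- 1.5
n <- 10L     # the L suffix explicitly requests an integer (an addition here)

print(class(a))       # whole numbers typed without L are numeric
print(typeof(a))      # the underlying storage type
print(class(b))
print(class(n))
print(is.numeric(n))  # integers still count as numeric values

d = 3                 # = also assigns, although <- is the recommended form
print(a + b + n + d)
```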

Characters and strings in R are similar to those in other high-level programming languages, except that R uses the single type character for both. Character values are enclosed in quotation marks. For example:

> a <- "nguyen trung kien"
> b <- "a"

The logical type corresponds to the bool type of other high-level programming languages. It has only two values, TRUE and FALSE. For example:

> a <- ((1+3)==4)
> a
[1] TRUE

6.3 Linear data structures in R

Here we look at two linear data structures in R, the array and the list. An array (in its simplest form, a vector) is created as follows:

> a <- c(1,2,3,4)
> a[3]
[1] 3

A list is also a sequence of elements, but unlike an array its elements can be accessed by the names given to them when the list is created. For example:

> kien <- list(name="kien", class="k57b", age=21)
> kien$name
[1] "kien"
> kien$class
[1] "k57b"
> kien$age
[1] 21

6.4 Basic control structures in R

R provides the same basic control structures as most other high-level programming languages.

6.4.1 Branching

The if...else structure:

Syntax:

if (logical_expression) { block_1 } else { block_2 }

block_1 is executed if the logical expression in parentheses evaluates to TRUE; otherwise block_2 is executed. Example:

if (1 == 0) { print(1) } else { print(2) }

The ifelse function:

Syntax:

ifelse(logical_expression, value_1, value_2)

ifelse works element by element on a vector: where the logical expression is TRUE the result takes value_1, otherwise it takes value_2. Example:

x <- 1:10
ifelse(x<5 | x>8, x, 0)

6.4.2 Loops

Loop with a known number of iterations, for:

Syntax:

for (control_variable in range_of_values) { statements_to_repeat }

Example:

x <- 1:10; z <- NULL
for (i in seq(along=x)) {
  if (x[i] < 5) {
    z <- c(z, x[i]-1)
  } else {
    z <- c(z, x[i]/x[i])
  }
}
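The for loop above builds z one element at a time. As a sketch of how the two structures from this section relate, the same result can be obtained in one step with the vectorised ifelse function. The names z_loop and z_vec are introduced here for illustration only.

```r
x <- 1:10

# Loop version, as in the example above
z_loop <- NULL
for (i in seq_along(x)) {
  if (x[i] < 5) {
    z_loop <- c(z_loop, x[i] - 1)
  } else {
    z_loop <- c(z_loop, x[i] / x[i])
  }
}

# Vectorised version: ifelse evaluates the condition for every element at once
z_vec <- ifelse(x < 5, x - 1, x / x)

print(z_loop)
print(z_vec)
print(all(z_loop == z_vec))   # both approaches give the same values
```

The vectorised form is usually shorter and avoids growing a vector inside a loop, which becomes slow for long inputs.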

Loop with an unknown number of iterations, while:

> while (loop_condition) { statements_to_repeat }

In this structure the statements are repeated as long as the loop condition remains true. Example:

> i = 0
> while (i < 5) {
+   print(i)
+   i = i + 1
+ }

In R a block of statements is enclosed in a pair of braces {}. When a block is opened with {, its statements are therefore not executed immediately after Enter is pressed; the prompt changes to +, and only when the closing } is reached is the block between the braces executed.

6.5 Writing functions in R

The syntax for defining a function in R is:

> myfct <- function(arg1, arg2, ...) { function_body }

The value returned by a function is the value produced by the function body, normally by its last statement; we can also use return() to return the value explicitly.

The syntax for calling a function that has been defined is:

myfct(arg1=..., arg2=...)

Syntax rules:

- General: a function is defined by assigning the result of the keyword function to a name. The arguments are declared inside the parentheses () and separated by commas. The statements that do the work of the function are placed in the function body, between the braces {}. The function must be given a name so that it can be called later.
- Naming: almost any name can be used for a function, but avoid names that clash with functions already built into R.
- Function body: this is where the control statements and the statements that do the work of the function are written. Separate statements are placed on separate lines or separated by a semicolon (;).
- Scope of variables: a variable created inside a function exists only while the function is running, so it cannot be accessed from outside the function. To make it accessible, assign it as a global variable with the operator <<- instead of the usual assignment operator <-.

Example: a function that compares two numbers a and b:

sosanh <- function(a, b) {
  if (a >= b) {
    return(TRUE)
  } else {
    return(FALSE)
  }
}

Example: solving a quadratic equation:

giai <- function(a, b, c) {
  if (a == 0) {
    # linear case: b*x + c = 0
    if (b == 0) {
      result <- "equation has no solution"
    } else {
      result <- -c/b
    }
  } else {
    delta <- b*b - 4*a*c
    if (delta > 0) {
      # parentheses around 2*a are required: /2*a would divide by 2, then multiply by a
      re1 <- (-b + sqrt(delta))/(2*a)
      re2 <- (-b - sqrt(delta))/(2*a)
      result <- c(re1, re2)
    } else if (delta == 0) {
      result <- -b/(2*a)
    } else {
      result <- "equation has no solution"
    }
  }
  return(result)
}
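The scoping rule described above, local variables versus assignment with <<-, can be sketched as follows. The variable counter and the two functions are hypothetical names chosen for illustration.

```r
counter <- 0                      # variable defined outside any function

add_local <- function() {
  counter <- counter + 1          # creates a local copy; the outer variable is untouched
  counter
}

add_global <- function() {
  counter <<- counter + 1         # <<- modifies the variable defined outside the function
  counter
}

r1 <- add_local()
after_local <- counter            # still the original value
r2 <- add_global()
after_global <- counter           # now changed by the function

print(c(r1, after_local, r2, after_global))
```

Because <<- changes state outside the function, it is usually reserved for cases where returning a value is not practical.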
