
TẬP SAN TIN HỌC QUẢN LÝ

Vol. 03, No. 1&2, 2014, pp. 53-73.

BIG DATA: A PANORAMIC VIEW


Lê Thị Quỳnh Nga¹
Nguyễn Mạnh Tuấn¹
Abstract: Given the advantages of Big Data and the significant impact that Big Data and
its related applications have had on achieving a competitive edge, Big Data is
considered a new capability for driving and achieving business value. However, to realize
the full potential of Big Data and to implement a Big Data project successfully, the
problems related to Big Data must be defined. This paper provides an overview
of Big Data, including its definition, its characteristics, and what we need to know
about Big Data from business and technology perspectives. It then outlines some
Big Data problems in research and in practice.
Summary: Given the advantages and strong impact of Big Data and its related
applications, Big Data is regarded as a decisive factor in the growth and
competitive advantage of organizations. However, to succeed in building and
carrying out Big Data projects, the related problems need to be identified so
that ways to resolve them can be found. This paper provides an overview of
Big Data, its applications, and its technical aspects. It also points out some
difficulties that researchers and businesses need to consider.
Keywords: Big Data, applications, Hadoop, data.

PART 1

INTRODUCTION

Today, the growth of the Internet has profoundly changed the way organizations
operate. Web 2.0 applications, social networks, and cloud computing have given
organizations new ways of doing business [1]. In the era of the IoT², sensors
embedded in devices such as mobile phones, cars, and industrial machinery
contribute to generating and transmitting data, leading to an explosion of
collectible data [2]. According to an IDC report, 1.8 ZB³ of data was created
worldwide in 2011, an increase of nearly 9 times in just 5 years [3]. Amid this
explosion, the term Big Data is used to refer to huge data sets, mostly
unstructured⁴, collected from many different sources.
Because of its role in uncovering enormous hidden value, Big Data is seen as an
important new factor that benefits organizations in many different fields
[4, 5]. In a survey by Oracle Corp and Accenture PLC, 57% of finance experts
rated investment in Big Data as a key factor in achieving competitive advantage
[6]. Because of the great benefits Big Data can bring, many organizations have
invested heavily in researching and applying Big Data. According to a Gartner
report, in 2014, 73% of the organizations surveyed had invested or planned to
invest in Big Data projects, up from 64% in 2013 [7].

¹ Faculty of Business Information Systems, University of Economics Ho Chi Minh City

² IoT (Internet of Things) refers to sensors that are embedded in devices and linked
together by computer networks (quoted from
http://www.mckinsey.com/insights/high_tech_telecoms_internet/the_internet_of_things)

³ 1 zettabyte (ZB) = 2^40 gigabytes (GB) ≈ 1,000,000,000,000 GB

⁴ Unstructured data refers to data that does not follow a specific format, such as
images, audio, video, and text
The goal of this study is to give a panoramic view of Big Data, its applications
across fields, and the technical factors related to Big Data. It also raises
several issues to consider when carrying out Big Data projects so that they
succeed.
The study is structured as follows. Part 2: definition and characteristics of
Big Data. Parts 3 and 4: applications of Big Data in several fields, with
specific cases in finance, insurance, and commerce presented in detail. Part 5:
technical aspects of Big Data. Part 6: challenges to be addressed. Part 7:
conclusion.
PART 2

BIG DATA: DEFINITION AND CHARACTERISTICS

Big Data is a term used to describe data sets that are very large, grow quickly,
and are very difficult to collect, store, manage, and analyze with traditional
statistical tools or database applications [5]. The characteristics of Big Data
include Volume, Velocity, Variety, and Value.
(1) Volume: The volume of Big Data is growing strongly every day. According to
an Intel document from September 2013, 1 PB¹ of data is created worldwide every
11 seconds, equivalent to an HD video 13 years long [8]. Facebook has to process
about 500 TB² of data every day [9]. The benefit gained from processing a large
volume of data is the main attraction of Big Data, but it also makes it
difficult to find methods and techniques that can process this volume of data.
(2) Velocity: with the advent of storage techniques, tools, and applications,
data sources are continuously replenished at high speed. McKinsey Global
estimates that the amount of data is growing at 40% per year and will increase
44 times from 2009 to 2020 [6].
(3) Variety: Data is collected from many different sources, such as sensors,
mobile devices, and social networks [4]. Structured, semi-structured, and
unstructured data types exist in many forms, including images, audio, video,
and text.

¹ 1 petabyte (PB) = 2^20 gigabytes (GB) ≈ 1,000,000 GB

² 1 terabyte (TB) = 2^10 gigabytes (GB) ≈ 1,000 GB


(4) Value: this is the most important characteristic of Big Data. It refers to
the process of extracting the great value hidden in huge data sets.
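
The growth figures quoted under Velocity can be cross-checked with simple compound-growth arithmetic. The sketch below is only an illustration: it assumes that growth compounds once per year over the 11 years from 2009 to 2020, and the function name `growth_factor` is hypothetical.

```python
def growth_factor(annual_rate: float, years: int) -> float:
    """Return the total multiplication factor after compounding annually."""
    return (1.0 + annual_rate) ** years

# 40% per year over the 11 years from 2009 to 2020 (assumed compounding period)
factor = growth_factor(0.40, 2020 - 2009)
print(f"Total growth factor: {factor:.1f}x")

# Binary unit relations used in the footnotes (1 TB = 2**10 GB, and so on)
GB_PER_TB = 2 ** 10
GB_PER_PB = 2 ** 20
GB_PER_ZB = 2 ** 40
print(GB_PER_TB, GB_PER_PB, GB_PER_ZB)
```

Under that assumption the factor comes out at roughly 40, which is the same order of magnitude as the 44-fold increase quoted above.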

PART 3

BIG DATA APPLICATIONS

Big Data and its related applications are increasingly widely used in
organizations across different fields, to reduce risk and to support
organizations in managing daily operations and making decisions [7].
Government agencies seek to analyze data to find smarter ways to collect taxes
and to predict unemployment rates and future career trends [8]. Healthcare
businesses are also becoming more proactive in managing and monitoring customers'
health and in designing reasonable product packages to reduce healthcare costs.
The hotel and tourism industry uses data from many sources, such as social
networks, to create personalized vacation packages for customers. Businesses
analyze data to understand customer behavior and to advise customers on product
catalogs and on when and where to buy to get attractive prices.
Many studies have explored the applications of Big Data and the fields in which
Big Data can be applied. For example, Hsinchun, Chiang [10] analyzed several
applications of Big Data, including e-commerce, e-government, science and
technology, healthcare, and public safety and security. O'Leary [4] described
some advantages and obstacles of Big Data and of sensor-based applications on
mobile devices in managing road infrastructure. McKinsey & Company studied the
value that data brings to healthcare, public administration, retail, and
manufacturing in the US. The report states that if Big Data is used creatively
and effectively to improve productivity and work quality, US retailers could
increase profit by more than 60%, US healthcare spending could fall by more than
8%, and the developed economies of Europe could save 149 million euros through
improved operational efficiency [5].
This article summarizes the fields in which Big Data is applied, together with
the techniques used, in Table 1, and gives some specific examples of Big Data
applications in Table 2. Technical aspects of Big Data are presented in more
detail in Part 5.
| Field | Applications | Techniques | Citations |
|---|---|---|---|
| Commerce | Market and customer segmentation; in-store customer behavior analysis; location-based marketing; cross-channel and multi-channel marketing analysis; management of marketing campaigns and loyalty programs; price comparison; supply chain analysis and management | Structured data analysis; social network analysis; text and web analysis; sentiment analysis; anomaly detection | [1, 5, 10] |
| Finance | Real-time customer information analysis; customer relationship management; risk management and analysis; fraud analysis and detection; credit risk analysis and rating | Multimedia analysis; network analysis; social network analysis; text and web analysis; sentiment analysis | [1] |
| Politics, e-government | E-government systems; election and voting systems; analysis of regulations and regulatory compliance; analysis, monitoring, tracking, and detection of fraud, threats, cybersecurity, and intrusions; management of energy consumption and carbon emissions | Content and text analysis; semantic information analysis; social media analysis and monitoring; social network analysis; sentiment analysis | [1, 10] |
| Public security and safety | Crime analysis; high-tech crime; terrorism; cybersecurity | Association rules; data clustering and classification; criminal network analysis; opinion analysis; cyber-attack analysis | [10] |
| Telecommunications | Analysis of consumer behavior and habits; mobile user location analysis; real-time call detail recording; system optimization | Call detail processing; predictive models; sentiment analysis; mobile analysis | [1, 11] |
| Healthcare | Real-time analysis of diagnostic data; analysis of patient care service quality; health insurance fraud detection; optimization of health packages | Text analysis; web analysis; multimedia analysis; network analysis | [1, 10, 12] |
| Transportation | Real-time analysis of weather and traffic data; optimization of transport routes; reduction of traffic congestion | Text analysis; web analysis; multimedia analysis; mobile analysis | [4, 13] |

Table 1: Some fields that exploit Big Data applications


| Field | Company / Event | Goal | Techniques applied | Results | Citation |
|---|---|---|---|---|---|
| Finance | China Merchants Bank (CMB) | Customer relationship management | Activities: point accumulation and redemption of accumulated points; building a system that warns of customers likely to stop using the service | Sold high-interest credit products to 20% of the customers likely to stop using the bank's services; the share of customers abandoning Gold Cards fell by 15% and Sunflower Cards by 7% | [13] |
| Commerce | Amazon | Customer relationship management; increasing sales revenue | Building a recommender system that uses the item-to-item collaborative filtering algorithm¹ | Sales revenue rose 29%, from USD 9.9 billion (Q2 2011) to USD 12.83 billion (Q2 2012) | [14] |
| Transportation and shipping | UPS² using Big Data and IoT applications | Optimizing delivery routes; redesigning drivers' loading and unloading in real time | GPS and sensors that track truck positions; optimization techniques that find the best route | In 2011, the company saved 42.28 million km of driving | [13] |
| Healthcare | Google and the 2009 A/H1N1 flu epidemic | Analyzing flu data in real time to forecast how the epidemic might spread | Analysis of search queries about the flu | Accurately forecast the current flu level in each region of the United States, as well as the timing and areas of spread | [15] |
| Healthcare | Saigon International Obstetrics Hospital (SIH) | Customer relationship management; providing the best service to patients | IBM hardware and software services used to build a model that predicts patient demand and to schedule and coordinate examinations | Patient information is provided in a timely manner, improving the work of the hospital and its doctors | [16] |
| Public administration | Obama's winning election campaign | Finding suitable ways to campaign to voters | Collecting and building a database of potential voters, including biography, preferences, occupation, and friends; applying data mining techniques | Obama and his team campaigned in ways suited to voters, contributing significantly to the final victory | [17] |
| Public security and safety | Los Angeles Police Department using PredPol Inc.'s cloud application for crime forecasting | Determining when and where crimes such as theft and violence may occur | Forecasting the time and place of criminal activity from historical data | Within 6 months, burglaries fell by 33% and violent crimes by 21% | [18] |

Table 2: Some typical examples of Big Data applications

¹ Item-to-item collaborative filtering: this algorithm builds a matrix of similar products by looking for products that are often bought together, in order to recommend to users the companion products that best match the product they have chosen.

² UPS (United Parcel Service of North America, Inc.) is the largest shipping company in the world.
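
The item-to-item collaborative filtering idea described in the footnote to Table 2 (build a matrix of products that are frequently bought together, then recommend the best-matching companions) can be sketched as follows. This is a simplified illustration, not Amazon's production algorithm: it uses raw co-purchase counts rather than a similarity measure, and the basket data and the names `build_co_purchase_matrix` and `recommend` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical purchase baskets (one set of products per order)
baskets = [
    {"phone", "case", "charger"},
    {"phone", "case"},
    {"phone", "charger"},
    {"laptop", "mouse"},
    {"phone", "case", "screen protector"},
]

def build_co_purchase_matrix(orders):
    """Count how often each pair of products appears in the same order."""
    matrix = defaultdict(lambda: defaultdict(int))
    for order in orders:
        for a, b in combinations(sorted(order), 2):
            matrix[a][b] += 1
            matrix[b][a] += 1
    return matrix

def recommend(matrix, product, top_n=2):
    """Return the products most often bought together with `product`."""
    companions = matrix.get(product, {})
    # Sort by count (descending), then by name so that ties are deterministic
    ranked = sorted(companions.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]

matrix = build_co_purchase_matrix(baskets)
print(recommend(matrix, "phone"))
```

Because the matrix is indexed by product rather than by user, a lookup only touches the products related to the one being viewed, which is the property that makes this family of methods attractive for large catalogs.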

PART 4
HIGHLIGHTED APPLICATIONS OF BIG DATA IN FINANCE, COMMERCE, INSURANCE,
AND PUBLIC ADMINISTRATION
4.1 Finance, banking, and insurance
Many surveys have been conducted to determine the role of Big Data in the
operations of organizations. The 2013 Gartner FEI survey emphasized the
importance of BI&A¹ in the work of chief financial officers [19]. With a clear,
overall view into the organization's data, chief financial officers can make
better decisions, increase the organization's operational efficiency,
strengthen the link between finance and overall business operations, and
increase the organization's flexibility. An example from China Merchants Bank
(CMB) in China shows the effectiveness of applying Big Data. To attract
customers, the bank uses point accumulation and point redemption services². The
bank also uses a model that warns of customers likely to stop using its services
in order to build high-interest credit service packages that retain those
customers. At the same time, by analyzing transaction data, potential customers,
namely small businesses, are also identified effectively [13].

¹ BI&A (Business Intelligence and Analytics): refers to the techniques, technologies,
methods, applications, and systems that analyze business data to help managers better
understand the organization's business performance and market conditions, and thereby
make timely decisions to achieve the organization's goals (see Hsinchun, C., Chiang,
R. H. L., & Storey, V. C. (2012). Business intelligence and analytics: from big data
to big impact. MIS Quarterly).

² Point accumulation: multi-times score accumulation. Point redemption: score exchange
in shops.



There are many reasons behind decisions to invest in Big Data projects. This
paper summarizes the typical applications of Big Data in finance, banking, and
insurance and divides them into 3 main groups: risk management, advisory, and
forecasting. In practice, many Big Data applications have been researched and
developed to improve the operational efficiency of credit and insurance
organizations [20].
4.1.1 Risk management
Risk management has improved significantly thanks to Big Data. Previously, the
analysis of risk scenarios depended mainly on analyzing customers, investment
portfolios, and creditworthiness. Now, data from social media makes it possible
to generate new insights into customers' risk profiles. Data obtained from many
unlinked sources increases the ability to detect fraudulent activity earlier
than current methods allow [5].
Understanding risk and how to manage it better is the main concern of insurance
companies. Risk analysis includes assessing the likelihood that a risk occurs
and the cost incurred in each case. Data such as hail, forest fires, storms and
floods, crime, and other factors needs to be exploited to assess risk.
Telecommunications devices and sensors installed in vehicles can collect data
such as location, speed, distance traveled, and vehicle operating status in real
time. This improves risk assessment, from which businesses can create a range of
pricing strategies.
4.1.2 Advisory
Big Data and its related applications allow financial organizations to collect
and organize data such as customer preferences, transaction history, transaction
methods, geographic location, and family information. From there, the advisory
system relies on the bank's business goals and the customers' needs to make
recommendations on cross-selling¹, up-selling², or providing better services to
customers. Through more sophisticated analysis of customer data, organizations
can also create new opportunities by creating new targeted products.
4.1.3 Forecasting
Statistical techniques applied to historical data make it possible to predict
customers' next actions. Big data analytics platforms that use distributed
processing techniques (Map-Reduce) allow financial organizations and banks to
store and process very large volumes of data. As a result, predictive models can
run on entire data sets, which shortens the time needed to extract and discover
valuable hidden information [1].
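
The Map-Reduce style of processing mentioned above can be illustrated with a small single-machine sketch. It is not Hadoop code: the functions `map_phase`, `shuffle`, and `reduce_phase` and the sample transactions are hypothetical, and a real cluster would run the map and reduce steps in parallel across many nodes.

```python
from collections import defaultdict

# Hypothetical transaction records: (customer_id, amount)
transactions = [("c1", 120.0), ("c2", 40.0), ("c1", 30.0), ("c3", 75.5), ("c2", 10.0)]

def map_phase(record):
    """Map: emit a (key, value) pair for one input record."""
    customer_id, amount = record
    return (customer_id, amount)

def shuffle(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return (key, sum(values))

mapped = [map_phase(r) for r in transactions]
grouped = shuffle(mapped)
totals = dict(reduce_phase(k, v) for k, v in grouped.items())
print(totals)
```

Since each map call looks at one record and each reduce call looks at one key, both phases can be split across machines without coordination, which is what lets a model run over an entire data set.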

¹ Cross-selling: a term for introducing products or services related to the product
the customer is buying or has bought. For example, if a customer buys a phone,
persuade the customer to also buy a phone case.

² Up-selling: a term for introducing higher-priced products or services, or upgraded
products and services with additional features.


4.2 Commerce
Analysis of large volumes of data also helps to improve and optimize decision
making, reduce risk, and create added value for businesses. By exploiting big
data analytics platforms, businesses can discover enormous hidden value through
consolidated views of customers' buying behavior. For example, online retailers
can track not only what customers bought but also which items they viewed, what
they did each time they visited the website, and how much they were influenced
by promotions or by comments from other customers. From this they can detect the
characteristics that groups of customers have in common.
In addition, the growth of the Internet, Web 2.0, and mobile devices allows
organizations to interact with customers through many channels alongside
traditional media. Analyzing customer transactions across these channels allows
organizations to understand customer behavior and cluster customers into groups,
so that they can provide products and services that match customer requirements.
Big Data also benefits businesses in sales planning. By comparing different
factors drawn from huge data sources, businesses can optimize product pricing
[13]. Using Big Data in supply chain management allows businesses to optimize
inventory and transportation and to coordinate with suppliers in order to
minimize the gap between supply and demand, control budgets, and improve service
[5, 13].
PART 5

TECHNICAL ASPECTS

5.1 Data flow in Big Data


Big Data systems are usually large and complex. They provide functions that
handle Big Data from its creation to its end. The data flow in Big Data is
usually divided into 4 stages: data generation, data acquisition, data storage,
and data analysis [13]. Figure 1 below describes the technologies related to
these 4 stages of the data flow:

Figure 1: Big Data technology map by data flow [12]

5.1.1 Data generation


Thanks to the rapid development of modern technologies, the sources that
generate data are growing strongly. Indeed, IBM estimates that 90% of the data
in the world today was created in the past 2 years [21]. The causes of this data
explosion are still debated. According to [12], the data explosion is closely
related to the development of information technology and can be divided into 3
stages:
Stage 1: beginning in the 1990s. As digital technology and database systems were
widely adopted, many organizations used them to store their large data, such as
transactions in banking or financial centers and government documents. This data
is structured and is analyzed through relational database systems.
Stage 2: this stage began with the explosion of the Internet. In the late 1990s,
Web 1.0 systems, characterized by search engines and e-commerce, generated a
large amount of semi-structured and/or unstructured data, including web pages
and transaction histories. Since the 2000s, many Web 2.0 applications have
generated abundant user-contributed data from forums, groups, blogs, websites,
and social networks.
Stage 3: driven by mobile devices such as smartphones, tablets, sensors, and
sensor-based Internet-enabled devices.
With this classification, we see that data generation models have developed
rapidly, from passive recording in Stage 1 to active data generation in Stage 2
and automatic data generation in Stage 3. These three types of data are the main
sources of Big Data, and automatically generated data will contribute the most
in the near future.

Table 3 below describes the rapid growth of data and transactions in popular
online services today. This data is very important to businesses: by exploiting
and analyzing it, useful information such as users' habits and preferences can
be identified, and it may even be possible to predict users' behavior and
emotional state.
| Service | Description |
|---|---|
| YouTube | (i) Users upload 100 hours of video every minute; (ii) more than 1 billion users access YouTube every month |
| Facebook | (i) 34,722 Likes every minute; (ii) 100 terabytes (TB) of data uploaded every day; (iii) currently 1.4 billion users |
| Twitter | (i) More than 645 million users; (ii) 175 million tweets every day |
| Google | (i) More than 2 million search queries every minute; (ii) 25 petabytes (PB) processed every day |
| Apple | About 47,000 applications downloaded every minute |
| Flickr | 3,125 users upload new photos every minute |
| WordPress | Nearly 350 new blogs every minute |

Table 3: Growth of popular online services today [22]

Big Data is generated in many different fields. Table 4 lists Big Data sources
from different fields together with the important attributes of their data. It
is easy to see that most of the data sources are unstructured, reach sizes
measured in PB, and require fast, accurate analysis for a large number of users.
| Source | Field | Size | Type | Response time | Number of users | Accuracy | Reference |
|---|---|---|---|---|---|---|---|
| Walmart | Retail | 2.5 PB/hour | Structured | Very fast | 1 million/hour | Very high | [23] |
| Amazon | E-commerce | Several PB/day | Semi-structured | Very fast | 0.5 million/day | Very high | [12] |
| Google search | Search | 25 PB/day | Semi-structured | Fast | 2 million/minute | High | [12] |
| Facebook | Social network | 100 TB/day | Structured, unstructured | Fast | 1.4 billion | High | [24] |
| AT&T | Mobile network | 323 TB | Structured | Fast | Very large | High | [24] |
| SDSS | Science | 20 TB/day | Unstructured | Slow | Small | Very high | [12] |

Table 4: Some typical Big Data sources

5.1.2 Data acquisition


Data acquisition in Big Data usually consists of 3 steps: data collection, data
transmission, and data preprocessing. Collected data sometimes contains
redundant or unnecessary data, which increases storage volume and slows down the
subsequent analysis stage. Preprocessing is therefore indispensable for
efficient data storage and exploitation.
i. Data collection
Data collection is the process of obtaining raw data from real-world objects.
This process needs to be well designed, because the collection method depends
not only on the physical characteristics of the data sources but also on the
goals of the data analysis. As a result, there are many collection methods.
Table 5 describes the most common ones in use today:
| Method | Type | Size | Complexity | Applications | Reference |
|---|---|---|---|---|---|
| Log file | Structured or semi-structured | Small | | Web logs, finance | [12] |
| Sensor | Structured or semi-structured | Medium | High | Video surveillance, environmental research | [12] |
| Web crawler | Mixed | Large | Medium | Search | [12] |
| Libpcap-based or zero-copy | Structured | Medium | Medium | Network monitoring | [13] |
| Mobile | Mixed | Small | Medium | Mobile spyware | [13] |

Table 5: Data collection methods

ii. Data transmission


After the data is collected, it must be transferred to a data center in
preparation for the next steps. The data transmission mechanism can be divided
into 2 stages: transmission over the IP backbone and transmission into the data
center:

Figure 2: Data transmission mechanism [12]

iii. Data preprocessing


Data preprocessing is an important stage of data acquisition that raises the
quality of the data in Big Data systems. According to [13], there are 3 main
methods of data preprocessing, presented in Table 6.
| Technique | Description | Method | Applications |
|---|---|---|---|
| Integration | Combines data from many different sources and gives users a unified view | Data warehouse: also known as ETL (extract, transform, load) [25]; data federation: a virtual database is created to query and aggregate data | Flow processing tools; search tools [26] |
| Cleaning | Identifies inaccurate, incomplete, or invalid data, then corrects or deletes it to raise data quality | Usually 5 steps: define the error types, search for errors, correct the errors, document the error types, and change data entry procedures to reduce future errors [27] | E-commerce [28]; RFID; biology |
| Redundancy elimination | Removes redundant and repeated data | Redundancy detection, data filtering, data compression | Multimedia search; DNA analysis |

Table 6: Main methods of data preprocessing [13]
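
Two of the techniques in Table 6, cleaning and redundancy elimination, can be illustrated with a few lines of code. This is a minimal sketch under stated assumptions: the record layout, the validity rules (non-empty email, age between 0 and 120), and the function names `clean` and `remove_duplicates` are hypothetical and are not taken from the cited sources.

```python
# Hypothetical raw records, as they might arrive from several sources
raw_records = [
    {"id": "1", "email": "An@Example.com ", "age": "34"},
    {"id": "2", "email": "binh@example.com", "age": "-5"},   # invalid age
    {"id": "1", "email": "an@example.com", "age": "34"},     # duplicate of id 1
    {"id": "3", "email": "", "age": "41"},                   # missing email
]

def clean(record):
    """Cleaning: normalize fields and reject invalid or incomplete records."""
    email = record["email"].strip().lower()
    age = int(record["age"])
    if not email or not (0 <= age <= 120):
        return None
    return {"id": record["id"], "email": email, "age": age}

def remove_duplicates(records):
    """Redundancy elimination: keep the first record seen for each id."""
    seen = set()
    unique = []
    for rec in records:
        if rec["id"] not in seen:
            seen.add(rec["id"])
            unique.append(rec)
    return unique

cleaned = [r for r in (clean(rec) for rec in raw_records) if r is not None]
result = remove_duplicates(cleaned)
print(result)
```

Running the cleaning step first means that duplicates which differ only in formatting (letter case, stray whitespace) are normalized before the redundancy check compares them.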

5.1.3 Data storage
The storage mechanism in Big Data also differs from ordinary storage. Because
the amount of data is so large, a storage mechanism is needed that is efficient
both economically and technically and that serves later analysis and statistics.
Three storage approaches are commonly used today: (i) file systems; (ii)
databases; and (iii) programming models.
i. File systems
File-based storage has been developed for a long time and is relatively mature
after long use. Some well-known and widely used technologies are listed in
Table 7:
| Technique | Vendor | Characteristics |
|---|---|---|
| GFS | Google | Can run on inexpensive servers while providing fault tolerance and high performance to a large number of clients. It suits applications with large files and more reads than writes |
| Cosmos | Microsoft | Supports search and advertising |
| Haystack | Facebook | Supports storing a large number of small files |

Table 7: Some data storage technologies

ii. Databases
Database technology has been developed for more than 30 years. Many database
systems have been developed for different kinds of data and are easy to scale.
Traditional relational databases cannot meet the 4V properties of Big Data.
NoSQL databases are therefore becoming the core technology for Big Data, because
they have outstanding advantages that suit Big Data well, such as flexible
schemas, simple and uniform APIs, and support for large amounts of data. There
are currently 3 main types of NoSQL database: key-value, column-oriented, and
document-oriented. Table 8 briefly describes these 3 types together with the
well-known libraries that support them:
| Technique | Characteristics | Libraries | Storage type | Applications |
|---|---|---|---|---|
| Key-value | Built on a simple data model; data is stored as keys and values | Dynamo (Amazon); Voldemort (LinkedIn) | Plug-in; RAM | Healthcare [2]; space [19]; social networks, e-commerce [7] |
| Column-oriented | Stores and processes data by column. Both rows and columns are partitioned and stored on multiple servers to increase scalability | BigTable (Google); Cassandra (Facebook); HBase (Apache) | GFS; disk; HDFS | Google Earth, Analytics [29]; many fields¹; e-government [30], many fields |
| Document-oriented | Supports storing data that is more complex than key-value data. There are no constraints on the schema of the stored documents | SimpleDB (Amazon); MongoDB (10gen); CouchDB (Couchbase) | S3; disk; disk | Logging, online games; biology [31]; many other fields |

Table 8: Types of NoSQL database

¹ Many companies from other fields have adopted and use Cassandra as well as HBase; see
http://planetcassandra.org/companies/ and http://wiki.apache.org/hadoop/Hbase/PoweredBy
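
The three NoSQL data models in Table 8 differ mainly in how much structure the store itself understands. The sketch below mimics each model with plain Python containers. It does not use any real database client: the sample data and the helper names `get_value`, `get_column`, and `find_documents` are hypothetical and only illustrate the shape of the data.

```python
import json

# Key-value: an opaque value looked up by a single key
key_value_store = {"user:42": '{"name": "Lan", "city": "HCMC"}'}

# Column-oriented: each row key maps to column families holding sparse columns
column_store = {
    "user:42": {
        "profile": {"name": "Lan", "city": "HCMC"},
        "activity": {"last_login": "2014-05-01"},
    }
}

# Document-oriented: nested documents without a fixed schema
document_store = [
    {"_id": 42, "name": "Lan", "orders": [{"sku": "A1", "qty": 2}]},
    {"_id": 43, "name": "Minh"},  # a different set of fields is allowed
]

def get_value(store, key):
    """Key-value access: fetch the whole value, then decode it in the client."""
    return json.loads(store[key])

def get_column(store, row_key, family, column):
    """Column-oriented access: address one column inside a column family."""
    return store[row_key][family].get(column)

def find_documents(store, field, value):
    """Document-oriented access: query on a field inside the documents."""
    return [doc for doc in store if doc.get(field) == value]

print(get_value(key_value_store, "user:42")["city"])
print(get_column(column_store, "user:42", "activity", "last_login"))
print(find_documents(document_store, "name", "Minh"))
```

The trade-off shown here is the usual one: the key-value model is the simplest to partition, while the column and document models let the store answer narrower questions without the client decoding whole values.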

Besides the databases above, many projects have been developed to support other
kinds of data, such as graphs (Neo4j, DEX) and PNUTS. Because relational and
NoSQL databases each have their own strengths and weaknesses, combining the two
yields a new kind of database that is as powerful in querying as a relational
database and as flexible and scalable as a NoSQL database. Google is currently
pioneering new databases of this kind, such as Megastore, Spanner, and F1.
No single database suits every situation and every kind of data. The database
should be chosen to fit the specific problem, because each kind has its own
strengths and weaknesses. Often we must trade off read performance against write
performance, synchronous against asynchronous operation, latency against
durability, and data partitioning [11].
iii. Programming models
Although NoSQL has many strengths and suits Big Data, it still has limitations
in querying and analyzing data. Programming models are well suited to
application logic and data analysis. However, traditional parallel models
(Message Passing Interface (MPI) and Open Multi-Processing (OpenMP)) still have
limitations in solving parallel problems at Big Data scale, that is, on hundreds
or even thousands of servers spread over a wide area. Many new parallel
programming models have been proposed for Big Data. The table below compares the
features of the current programming models for Big Data [12]:
| | MapReduce | Dryad | Pregel | GraphLab | Storm | S4 |
|---|---|---|---|---|---|---|
| Description | Large-scale parallel processing | Large-scale parallel processing | Large-scale graph processing | Large-scale data mining and machine learning | Distributed real-time processing | Distributed real-time processing |
| Programming model | Map and Reduce | Directed acyclic graph | Directed graph | Directed graph | Directed acyclic graph | Directed acyclic graph |
| Data handling | Distributed file system | Various storage types | Distributed file system | Memory or disk | Memory | Memory |
| Architecture | Master-slave | Master-slave | Master-slave | Master-slave | Master-slave | Decentralized and symmetric |
| Fault tolerance | Node level | Node level | Checkpoint | Checkpoint | Partial | Partial |

Table 9: Summary of programming models

Note to Table 8: for the many other fields that use MongoDB, see https://www.mongodb.com/industries

From the comparison table above, we observe that: i) although real-time
processing is the focus of current research, most models still concentrate on
batch processing; ii) most models use graphs, because graphs can express more
complex tasks; iii) real-time processing uses memory as the data storage medium
to achieve higher access and processing speeds, whereas batch models use file
systems or disks to store larger data and support more clients; iv) the
architecture of the models is usually master-slave; v) the fault-tolerance
strategies differ.
5.1.4 Data analysis
Data analysis is the most important stage in the Big Data data flow. Its purpose
is to extract useful data and to provide recommendations and decisions.
Depending on the field, data analysis delivers different potential value [5].
However, data analysis is a very broad, frequently changing, and extremely
complex field, so this article focuses only on the methods, architectures, and
tools for Big Data analysis.
i. General analysis methods
Although data types, purposes, and applications differ, some general analysis
methods remain useful across all of them. Below are 3 commonly used types of
analysis:
- Data visualization: a method for conveying information clearly and effectively
through graphical means. Visualization for Big Data is a research area of
current interest [3, 5].
- Statistical analysis: based on statistical theory, which is a branch of applied
mathematics. Statistical analysis can serve 2 purposes in Big Data: description
and inference.
- Data mining: the computational process of discovering patterns in Big Data. At
the 2006 IEEE International Conference on Data Mining, the 10 most commonly used
algorithms were identified as C4.5, K-means, SVM, Apriori, EM, PageRank,
AdaBoost, kNN, Naive Bayes, and CART [18].
Many tools currently support the methods above, including professional, amateur,
commercial, and open-source software. The 5 most widely used tools today are
[8]: Rapid-I RapidMiner/RapidAnalytics (39.2%), R (37.4%), Excel (28%),
Weka / Pentaho (14.3%), Python (13.3%).

Note to Table 9: Checkpoint refers to the technique of saving a snapshot of the
application's state, so that the application can be restored from this snapshot
in case of failure.
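
K-means, one of the data mining algorithms listed above, is short enough to sketch in full. The version below is a plain, single-machine illustration for 2-D points. The sample points, the fixed starting centroids, and the function name `k_means` are assumptions made for this example, and a Big Data deployment would distribute the two steps across many machines.

```python
def k_means(points, centroids, iterations=10):
    """Plain k-means on 2-D points, starting from the given centroids."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        clusters = [[] for _ in centroids]
        for x, y in points:
            distances = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centroids]
            clusters[distances.index(min(distances))].append((x, y))
        # Update step: move each centroid to the mean of its cluster
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                mean_x = sum(p[0] for p in cluster) / len(cluster)
                mean_y = sum(p[1] for p in cluster) / len(cluster)
                new_centroids.append((mean_x, mean_y))
            else:
                new_centroids.append(old)  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, clusters

points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = k_means(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
print(clusters)
```

With these inputs the six points separate into two groups of three, one near the origin and one near (8, 8), and the loop stops as soon as an iteration leaves the centroids unchanged.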
ii. New analysis methods in Big Data
The strong growth of the Internet and of high-tech devices, especially in
business, networking, and science, has spurred research into new Big Data
analysis methods that exploit the hidden value in those fields. Table 10
describes the data analysis methods that are common in Big Data today [12]:
| Analysis field | Data sources | Characteristics | Approaches |
|---|---|---|---|
| Structured data analysis | Customer transactions; scientific experiment data | Structured; small volume; real-time | Data mining; statistical analysis |
| Text analysis | Logs; email; company documents; government rules and regulations; web page content; feedback and opinions | Unstructured; rich text; context-dependent; semantic; language-dependent | Document representation; NLP (natural language processing); information extraction; topic models; summarization; classification; clustering; question answering; opinion mining |
| Web analysis | Many kinds of web pages | Combined text and links; symbolic; metadata | Content mining; structure mining; usage mining |
| Multimedia analysis | Film studios; users; surveillance devices | Images, audio, video; large; redundant data; time-dependent semantics | Summarization; annotation; indexing and retrieval; recommendation; event detection |
| Network analysis | Social networks | Rich content; social relationships; noisy and redundant; fast-evolving | Community detection; network evolution; influence analysis; keyword search; classification; clustering; transfer learning |
| Mobile analysis | Mobile applications; sensors; RFID | Location-based; personal; fragmented information | Monitoring; location mining |

Table 10: Data analysis methods in Big Data

5.2 The Hadoop platform


Hadoop was created in 2005 by Doug Cutting and Mike Cafarella to address the
problems of Big Data. By 2011, Hadoop was widely used in large companies: more
than 50% of the companies in the Fortune 50 use Hadoop. Ventana Research
published rather impressive survey results on the use of Hadoop in enterprises
[9]: about 63% of organizations use Hadoop to manage unstructured Big Data; 94%
of Hadoop users analyze Big Data in ways that were not possible before; 88%
analyze data in more detail; and 82% of enterprises can store more data.
Hadoop is an open-source framework that supports storing and processing Big Data
with different structures (including unstructured data) on commodity servers.
Hadoop has many advantages over other frameworks:
- Scalability: the amount of hardware can be changed without changing data
formats or restarting the system.
- Cost efficiency: it supports parallel storage and processing on commodity
servers.
- Flexibility: it supports any type of data from any source.
- Fault tolerance: missing data and failed analyses are common in Big Data
analysis. Hadoop can recover from failures and detect failures caused by network
congestion.
Hadoop consists of many modules that work together to support every stage of the
Big Data flow, from acquisition to analysis and data management.
| Stage | Module | Description |
|---|---|---|
| Data acquisition | Flume | Collects, aggregates, and moves large amounts of data from different sources to a central store |
| Data acquisition | Sqoop | Makes it easy to import and export data between Hadoop and structured data stores |
| Data storage | HDFS | A distributed file system that can run on commodity servers, based on the design of GFS. It consists of 1 NameNode that manages file metadata and many DataNodes that store the actual data. A file is split into blocks, and the blocks are stored on the DataNodes |
| Data storage | HBase | A column-oriented database based on Google's Bigtable |
| Computation | MapReduce | The computational core for Big Data analysis. The MapReduce framework consists of 1 master and 1 slave on each node. The master schedules tasks for the slaves, monitors them, and re-executes failed tasks. The slaves execute tasks as directed by the master. It has 2 main functions: map and reduce |
| Data analysis | Pig Latin | A language for data processing |
| Data analysis | Hive | Data summarization and ad hoc queries |
| Data analysis | Mahout | A data mining and machine learning library with 4 groups: collaborative filtering, clustering, classification, and parallel pattern mining |
| Management | ZooKeeper | A centralized service for maintaining configuration, naming, distributed synchronization, and group services |
| Management | Chukwa | Monitors the status of the system and can display, monitor, and analyze the collected data |

Table 11: The main modules in Hadoop
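
The HDFS description in Table 11 (a NameNode that holds metadata, DataNodes that hold blocks) can be made concrete with a toy model. The sketch below is not the HDFS API: the block size, the round-robin placement, the replication factor, and the names `split_into_blocks` and `place_blocks` are all assumptions chosen for illustration.

```python
def split_into_blocks(data: bytes, block_size: int):
    """Split a file's bytes into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_blocks(num_blocks, data_nodes, replication=2):
    """Toy NameNode metadata: map each block index to the nodes holding it."""
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            data_nodes[(block_id + r) % len(data_nodes)] for r in range(replication)
        ]
    return placement

file_bytes = b"x" * 25                      # a hypothetical 25-byte "file"
blocks = split_into_blocks(file_bytes, 10)  # toy block size of 10 bytes
metadata = place_blocks(len(blocks), ["dn1", "dn2", "dn3"])
print([len(b) for b in blocks])
print(metadata)
```

The point of the split is that the metadata stays small (one entry per block) while the blocks themselves can be read from different machines at the same time.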

With these advantages, Hadoop is used in many projects, and each company uses it
for its own needs. For example, Yahoo ran Hadoop on 42,000 servers at 4 data
centers in July 2012 to support spam filtering and search. Facebook uses Hadoop
to store and process 100 PB of both structured and unstructured data. Table 12
describes how leading companies use Hadoop and for what purposes:
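Function                  Used by
------------------------  -------------------------------------------
Search                    Yahoo, Amazon, Zvents
Log processing            Facebook, Yahoo, ContextWeb, Joost, Last.fm
Image and video analysis  The New York Times, Eyealike
Data warehousing          Facebook, AOL
Recommendation            Facebook

Table 12: Uses of Hadoop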


Hadoop is designed for batch applications. For many real-time applications,
Storm is a suitable candidate for processing continuous data streams. Storm
can be used for real-time analytics, continuous computation, and similar
tasks. Twitter has recently been developing Summingbird [32], an open-source
project that integrates Hadoop and Storm.
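
The difference between batch and stream processing, and the idea of
combining the two, can be sketched as follows. This is an illustration in
plain Python only: it does not use the Hadoop, Storm, or Summingbird APIs,
and the names batch_count, StreamingCounter, and merged_view are
hypothetical. It assumes the aggregation is a simple count, which can be
merged by addition.

```python
from collections import Counter


def batch_count(events):
    """Batch style: compute counts over the full stored history in one pass."""
    return Counter(events)


class StreamingCounter:
    """Stream style: update counts one event at a time as events arrive."""

    def __init__(self):
        self.counts = Counter()

    def on_event(self, event):
        self.counts[event] += 1


def merged_view(batch_counts, realtime_counts):
    # Counter addition is associative, so partial results from the batch
    # and streaming sides can be combined in any grouping.
    return batch_counts + realtime_counts


if __name__ == "__main__":
    history = ["login", "click", "click"]  # already stored, processed in batch
    live = ["click", "logout"]             # arriving after the last batch run

    stream = StreamingCounter()
    for event in live:
        stream.on_event(event)

    print(merged_view(batch_count(history), stream.counts))
```

The batch result is complete but stale, while the streaming result is fresh
but covers only recent events; adding the two gives an up-to-date total.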
PART 6

CHALLENGES TO BE ADDRESSED

6.1 Technology

Big Data analytics faces many challenges, and research on them is still at an
early stage. Further research is needed to improve the efficiency of
displaying, storing, and analyzing data:
Data transfer: As discussed, transferring large volumes of data is usually
costly and is the bottleneck of Big Data computing. Data transfer is
nevertheless unavoidable in Big Data applications, so making it more
efficient is a key factor in improving Big Data computing.
Processing speed under real-time requirements: The rapid growth in data volume
poses a major challenge for real-time applications. Efficient methods are
therefore needed along the entire data pipeline to meet real-time
requirements.
Big Data platforms: Although Hadoop has become a pillar of Big Data analytics
platforms, it is still under development compared with relational databases,
which have more than 30 years of development behind them. First, Hadoop must
integrate with real-time collection and transfer of Big Data and provide
faster processing beyond the batch-processing model. Second, Hadoop should
offer a concise programming interface and hide the complex processing
underneath. Third, large Hadoop systems contain thousands or even hundreds of
thousands of servers, which means considerable energy consumption, so Hadoop
should include mechanisms for using energy efficiently. A number of studies
that improve Hadoop and address its weaknesses are discussed in [33, 34].
Data security and privacy: This is a very important issue. Real-world
incidents show that not only consumers' personal information and
organizations' confidential information but even national security secrets
can be compromised. Addressing data security through both technical tools and
policies has therefore become extremely urgent. Big Data platforms should
strike a good balance between data access and data processing [5].
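
The data-transfer bottleneck described earlier in this section can be made
concrete with a back-of-the-envelope estimate. The sketch below is
illustrative only: the 1 PB data size and the 10 Gbit/s link speed are
assumed values, not figures from this paper, and the function name
transfer_time_seconds is hypothetical. It ignores latency, protocol
overhead, and retries unless they are folded into the efficiency factor.

```python
def transfer_time_seconds(size_bytes, link_bits_per_second, efficiency=1.0):
    """Estimate the time needed to move size_bytes over a single link.

    efficiency is the fraction of the nominal link speed actually achieved
    (an assumed parameter of this sketch, between 0 and 1).
    """
    if link_bits_per_second <= 0:
        raise ValueError("link speed must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the range (0, 1]")
    return size_bytes * 8 / (link_bits_per_second * efficiency)


if __name__ == "__main__":
    PETABYTE = 10**15          # bytes, decimal definition
    TEN_GBIT = 10 * 10**9      # bits per second (assumed link speed)

    seconds = transfer_time_seconds(PETABYTE, TEN_GBIT)
    print(f"{seconds:.0f} s, about {seconds / 86400:.1f} days")
```

Even on an ideal 10 Gbit/s link, moving one petabyte takes more than nine
days, which is why reducing or avoiding data movement matters so much.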
6.2 Organization
Organizations lack people with deep knowledge of statistics and information
technology, as well as staff with the analytical and managerial skills that
Big Data projects require. Because most valuable data sources lie outside the
organization, managers have difficulty asking the right questions and using
the results of data analysis effectively. Consequently, if an organization
invests heavily in Big Data and collects a large amount of data but does not
exploit it to extract the hidden value of the information, resources are
wasted [8]. According to the McKinsey Global Institute, by 2018 the United
States may face a shortage of 140,000 to 190,000 people with analytical
skills [5].
PART 7

CONCLUSION

Big Data plays an important role in delivering great value, not only to
businesses and other organizations but also to the national economy and its
citizens. The information collected is increasingly transparent, detailed,
and accurate. It helps leaders make sounder and more reasonable decisions and
reduce potential risks, and it helps individuals get better services from
organizations and governments. Nevertheless, this is still a very new field
that raises many problems and challenges for organizations and researchers to
solve.
References
[1] Mohanty, S., et al., Big Data Imperatives. 2013: Apress.
[2] Chui, M., M. Löffler, and R. Roberts. The Internet of Things. 2010 [cited 2014 7/11];
Available from: http://www.mckinsey.com/insights/high_tech_telecoms_internet/the_internet_of_things.
[3] Gantz, J. and D. Reinsel, Extracting value from chaos. IDC iView, 2011: p. 1-12.
[4] O'Leary, D.E., Exploiting Big Data from Mobile Device Sensor-Based Apps:
Challenges and Benefits. MIS Quarterly Executive, 2013. 12(4): p. 179-187.
[5] Manyika, J., et al., Big data: The next frontier for innovation, competition, and
productivity. 2011.
[6] Clayton, R., CFOs Take Notice Big Data May Be Your New Best Friend. Financial
Executive, 2013. 29(10): p. 22-25.
[7] Rivera, J. and R.v.d. Meulen. Gartner Survey Reveals That 73 Percent of
Organizations Have Invested or Plan to Invest in Big Data in the Next Two Years.
2014 [cited 7 11]; Available from: http://www.gartner.com/newsroom/id/2848718.
[8] Luan, D. Big Data là gì và người ta khai thác, ứng dụng nó vào cuộc sống như thế
nào? 2013 [cited 2014 11/10]; Available from: https://www.tinhte.vn/threads/bigdata-la-gi-va-nguoi-ta-khai-thac-ung-dung-no-vao-cuoc-song-nhu-the-nao.2210939/.
[9] Tuan, D.Q. Facebook xử lý hơn 500 TB dữ liệu mỗi ngày. 2012 [cited 2014 10/10];
Available from: http://www.thongtincongnghe.com/article/37841.
[10] Chen, H., R.H.L. Chiang, and V.C. Storey, Business Intelligence and Analytics:
From Big Data to Big Impact. MIS Quarterly, 2012. 36(4): p. 1165-1188.
[11] Qun, . Khai thác Big Data trong lĩnh vực thông tin - viễn thông. 2013 [cited
2014 10/10]; Available from: http://www.pcworld.com.vn/articles/kinh-doanh/giaiphap/2013/09/1234258/khai-thac-big-data-trong-linh-vuc-thong-tin-vien-thong/.
[12] Hu, H., et al., Toward Scalable Systems for Big Data Analytics: A Technology
Tutorial. IEEE Access, 2014. 2: p. 652-687.
[13] Chen, M., S. Mao, and Y. Liu, Big Data: A Survey. Mobile Networks and
Applications, 2014. 19(2): p. 171-209.
[14] Mangalindan, J., Amazon's recommendation secret. Cable News Network. A
Time Warner Company. Online referred to, 2012. 4: p. 2013.
[15] Cook, S., et al., Assessing Google flu trends performance in the United States
during the 2009 influenza virus A (H1N1) pandemic. PLoS ONE, 2011. 6(8): p. e23610.
[16] HP. Ứng dụng của Big Data trong lĩnh vực y tế. 2014 [cited 2014 7/11];
Available from: http://vht.com.vn/ung-dung-cua-big-data-trong-linh-vuc-y-te/.


[17] Lampitt, A., The real story of how Big Data analytics helped Obama win.
InfoWorld, 2013. 14.
[18] Goll, D. Santa Cruz firm PredPol helps predict, prevent property crimes. 2012
[cited 2014 10/11]; Available from:
http://www.bizjournals.com/sanjose/news/2012/06/11/new-santa-cruz-company-toplead-case.html?page=all.
[19] Meulen, R.v.d. Gartner Says Business Intelligence/Analytics Is Top Area for
CFO Technology Investment Through 2014. 2013 [cited 2014 10/11]; Available
from: http://www.gartner.com/newsroom/id/2488616.
[20] Lewis, H. Big data - Time for a lean approach in financial services. 2012 [cited
2014 10/10]; Available from:
http://www2.deloitte.com/content/dam/Deloitte/global/Documents/FinancialServices/dttl-fsi-uk-mi-da-big-data.pdf.
[21] IBM. What is big data? [cited 2014 7/10]; Available from: http://www01.ibm.com/software/data/Big Data/what-is-big-data.html.
[22] Khan, N., et al., Big data: survey, technologies, opportunities, and challenges.
ScientificWorldJournal, 2014. 2014: p. 712826.
[23] Cukier, K., Data, data everywhere: A special report on managing information.
2010: Economist Newspaper.
[24] Wikibon, A Comprehensive List of Big Data Statistics [Online]. 2013.
[25] Lenzerini, M. Data integration: A theoretical perspective. in Proceedings of the
twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database
systems. 2002. ACM.
[26] Cafarella, M.J., A. Halevy, and N. Khoussainova, Data integration for the
relational web. Proceedings of the VLDB Endowment, 2009. 2(1): p. 1090-1101.
[27] Maletic, J.I. and A. Marcus. Data Cleansing: Beyond Integrity Analysis. in IQ.
2000. Citeseer.
[28] Kohavi, R., et al., Lessons and challenges from mining retail e-commerce data.
Machine Learning, 2004. 57(1-2): p. 83-113.
[29] Chang, F., et al., Bigtable: A distributed storage system for structured data. ACM
Transactions on Computer Systems (TOCS), 2008. 26(2): p. 4.
[30] Xie, X.L., Z.X. Sun, and Z. Xiong, The research of the key tech-application
based on HBase for e-government cloud of minority areas. Applied Mechanics and
Materials, 2014. 530: p. 827-831.
[31] Manyam, G., et al., Relax with CouchDB - Into the non-relational DBMS era of
bioinformatics. Genomics, 2012. 100(1): p. 1-7.
[32] Boykin, O., et al., Summingbird: A Framework for Integrating Batch and Online
MapReduce Computations. Proceedings of the VLDB Endowment, 2014. 7(13).
[33] Lee, K.-H., et al., Parallel data processing with MapReduce: a survey. ACM
SIGMOD Record, 2012. 40(4): p. 11-20.
[34] Rao, B.T. and L. Reddy, Survey on improved scheduling in hadoop mapreduce in
cloud environments. arXiv preprint arXiv:1207.0780, 2012.

Author information
Lê Thị Quỳnh Nga,
Faculty of Business Information Systems, University of Economics Ho Chi Minh City,
Email: nga.lethiquynh@ueh.edu.vn
Nguyễn Mạnh Tuấn,
Faculty of Business Information Systems, University of Economics Ho Chi Minh City,
Email: tuannm@ueh.edu.vn
