202406-NMIMS-MBA-BA-Hadoop-Project


NMIMS MBA BA

Big Data Analytics Project

Big Data Technology – Project – Hadoop

Project Groups
The project is to be worked on and submitted in groups of 4 to 5 students.
There should be only one submission per group.

Objectives
This program enables the participants to review and implement the learnings of the course Big Data Analytics Using Hadoop & its components.
The primary objective of the project is to enhance the participant's knowledge of PIG, HIVE & SQOOP.

Dataset
▪ Every group should procure / use its own dataset after searching for one on the internet.
▪ No two groups should use the same data.
▪ The data should be of structured type with at least three continuous numeric columns and two alphanumeric categorical columns (more than two categories in each column).
▪ There should be at least three files of minimum 400 MB each.
▪ The insights to be generated from the dataset are listed under Project Requirements.
▪ There is no need to get your dataset approved by your professor. Marks will be given on the quality of the dataset.
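Before committing to a dataset, a group may want to confirm that it satisfies the column and file rules above. The following is a minimal Python sketch of such a check, not part of the required deliverables. The sample rows, the column names, the helper names and the reading of 400 MB as 400 * 1024 * 1024 bytes are all assumptions made for illustration. A column is treated as numeric when every value parses as a number, which does not prove the column is continuous.

```python
import csv
import io

MIN_FILE_BYTES = 400 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes


def is_number(text):
    """Return True if text parses as a float."""
    try:
        float(text)
        return True
    except (TypeError, ValueError):
        return False


def profile_columns(rows):
    """Split column names into numeric and categorical (more than two categories)."""
    numeric, categorical = [], []
    for name in rows[0].keys():
        values = [row[name] for row in rows]
        if all(is_number(v) for v in values):
            numeric.append(name)
        elif len(set(values)) > 2:
            categorical.append(name)
    return numeric, categorical


def meets_brief(rows, file_sizes):
    """Check the column and file rules from the Dataset section."""
    numeric, categorical = profile_columns(rows)
    big_files = [size for size in file_sizes if size >= MIN_FILE_BYTES]
    return len(numeric) >= 3 and len(categorical) >= 2 and len(big_files) >= 3


# Hypothetical sample rows standing in for the first lines of a real file.
SAMPLE = """price,quantity,discount,region,segment
10.5,3,0.10,North,Retail
20.0,1,0.00,South,Corporate
7.25,8,0.05,East,Home
"""
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
numeric, categorical = profile_columns(rows)
print("numeric:", numeric)
print("categorical:", categorical)
```

On a real dataset the rows would come from a sample of each file, and the sizes passed to `meets_brief` could be read with `os.path.getsize`.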

Project Requirements
▪ Copy the dataset to a working directory of your choice in HDFS
▪ Merge and parse the files using PIG and store as a .CSV file
▪ Read the CSV file using HIVE and provide insights into the dataset
▪ For any three continuous numeric columns provide the following:
o Sum of the numbers in each column
o Min of the numbers in each column
o Average of the numbers in each column
o Max of the numbers in each column
o StdDev of the numbers in each column
o Variance of the numbers in each column
o Count of odd and even numbers in each column
▪ For any two alphanumeric categorical columns provide the following:
o Frequency table of the categories
o Mode of the value in each column
▪ Transfer data from Hadoop to MySQL on local machine.
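The aggregates listed above can be cross-checked on a small sample outside Hadoop before the HIVE queries are trusted on the full files. The sketch below is one possible Python check and is not a substitute for the required HIVE commands. The sample values and function names are hypothetical. Two choices in it are assumptions, since the brief does not settle them: standard deviation and variance use the population form, and a value is truncated to its integer part before being counted as odd or even.

```python
from collections import Counter
import statistics


def numeric_summary(values):
    """Return the aggregates the brief lists for one numeric column."""
    whole = [int(v) for v in values]  # assumption: truncate before testing parity
    odd = sum(1 for n in whole if n % 2 != 0)
    return {
        "sum": sum(values),
        "min": min(values),
        "avg": statistics.mean(values),
        "max": max(values),
        "stddev": statistics.pstdev(values),       # population form (assumption)
        "variance": statistics.pvariance(values),  # population form (assumption)
        "odd": odd,
        "even": len(whole) - odd,
    }


def categorical_summary(values):
    """Return the frequency table and the mode for one categorical column."""
    counts = Counter(values)
    mode, _ = counts.most_common(1)[0]
    return {"frequency": dict(counts), "mode": mode}


# Hypothetical sample columns, small enough to verify by hand.
amounts = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
regions = ["North", "South", "North", "East", "North", "South", "East", "North"]

num = numeric_summary(amounts)
cat = categorical_summary(regions)
print(num)
print(cat)
```

If the HIVE output for the same sample disagrees on standard deviation or variance, the first thing to check is whether the query used the population or the sample variant of the function.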


Output Required
HDFS
▪ All HDFS Commands to create the folder and copy the dataset to HDFS from the local directory
▪ As proof of execution, provide output of "hdfs dfs -ls <hdfs-folder-name>"
PIG
▪ All PIG Commands to extract, transform and load the file.
▪ As proof of execution, provide output of the DUMP command before the CSV file is saved
HIVE
▪ All the HIVE SQL Commands to generate the answers to the above queries
▪ As proof of execution, provide output of the HIVE SQL Commands
SQOOP
▪ The SQOOP command required to transfer all data to a MySQL table where the MySQL DB is deployed on your local file system.
Note: as proof of execution
▪ Provide the output of the MySQL desc statement
▪ Provide the output of the MySQL statement
Select count(*) from <table-name>

Project Report (Word File)
▪ Project Overview
▪ Code & Command Section
▪ Summary
Project Overview
▪ Brief Overview Of The Project
▪ Learning Objective
Code & Command Section
▪ All the code and output sections as stated in Output Required
▪ Clearly mark the type of code being provided using the relevant prompt: Linux> or Grunt> or Hive> or MySQL> etc
Summary
▪ Explain how you used Hadoop for Big Data Analytics
▪ Describe your experience of using Hadoop for analyzing Big Data


Rubric / Evaluation Methodology

| Criterion | Marks | Excellent (90% - 100%) | Good (60% - 80%) | Unsatisfactory (30% - 50%) | Poor (0 - 20%) |
| --- | --- | --- | --- | --- | --- |
| HDFS Commands & Output | 2 | All required HDFS commands are correct or with minute error(s) in code and / or output | All required HDFS commands are correct or with small error(s) in code and / or output | All required HDFS commands are correct or with major error(s) in code and / or output | Not Attempted |
| PIG Commands & Output | 10 | All required PIG commands are correct or with minute error(s) in code and / or output | All required PIG commands are correct or with small error(s) in code and / or output | All required PIG commands are correct or with major error(s) in code and / or output | Not Attempted |
| HIVE Commands & Output | 10 | All required HIVE commands are correct or with minute error(s) in code and / or output | All required HIVE commands are correct or with small error(s) in code and / or output | All required HIVE commands are correct or with major error(s) in code and / or output | Not Attempted |
| Quality Of Data & Insights | 20 | Dataset effectively represents a real business problem. Dataset has good number of features to analyze to help decision making. | Dataset effectively represents a real business problem. Dataset has reasonable number of features to apply predictive model to help decision making. | Dataset effectively represents a real business problem. Dataset has good number of features to apply predictive model to help decision making. | Dataset effectively represents a real business problem. Dataset has few features to apply predictive model to help decision making. |
| Project Overview & Summary | 8 | The project requirements and specifications are accurately met | The project requirements and specifications are acceptably met | The project requirements and specifications are improperly met | The project requirements and specifications are poorly met |
| Total | 50 | | | | |
| Scaled To | 20 | | | | |
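The rubric gives a total of 50 marks scaled to 20 but does not state how the scaling is done. The short Python sketch below assumes simple linear scaling (raw marks multiplied by 20 / 50). The criterion maxima come from the rubric, while the function name and the awarded marks in the example are hypothetical.

```python
# Maximum marks per criterion, as listed in the rubric.
MAX_MARKS = {
    "HDFS Commands & Output": 2,
    "PIG Commands & Output": 10,
    "HIVE Commands & Output": 10,
    "Quality Of Data & Insights": 20,
    "Project Overview & Summary": 8,
}
TOTAL = sum(MAX_MARKS.values())  # 2 + 10 + 10 + 20 + 8 = 50
SCALED_MAX = 20


def scaled_score(awarded):
    """Scale raw rubric marks to the 20-mark range (linear scaling is an assumption)."""
    for criterion, marks in awarded.items():
        if not 0 <= marks <= MAX_MARKS[criterion]:
            raise ValueError(f"marks out of range for {criterion}")
    raw = sum(awarded.values())
    return raw * SCALED_MAX / TOTAL


# Hypothetical marks for one group: raw total 40 out of 50.
example = {
    "HDFS Commands & Output": 2,
    "PIG Commands & Output": 8,
    "HIVE Commands & Output": 9,
    "Quality Of Data & Insights": 15,
    "Project Overview & Summary": 6,
}
print(scaled_score(example))
```

Under this assumption, each raw mark is worth 0.4 of a scaled mark, so the Quality Of Data & Insights criterion alone accounts for 8 of the 20 scaled marks.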
