Download as pdf or txt
Download as pdf or txt
You are on page 1of 16

Capstone

 Projects  Introduc1on  
MAS  on  Data  Science  and  Engineering  
 
Dr.  Ilkay  Al1ntas  
al1ntas@sdsc.edu    
What  is  a  Capstone  Project?  
•  Objec&ve:  To  complete  an  end  to  end  analysis  of  a  large  
(>10GB)  dataset.    
–  Includes    
•  data  collec1on,    
•  ETL,    
•  exploratory  analysis,    
•  model  building,  and    
•  visualiza1on  /  repor1ng.  

•  Products:    
–  Final  report  (preferred  if  publishable  as  a  conference  paper)  
–  Output  data  products  
–  Developed  analy1cal  tools/methods/workflows  (if  applicable)  
Milestones  for  the  Capstone  Project  
•  First  Year:  
–  Fall  and  winter  quarters:  Advisors  give  short  presenta1ons  so  
that  students  can  iden1fy  who  they  want  to  work  with.    
–  Spring  Quarter:  Students  form  teams,  define  project  and  find  
advisor.    

•  Second  Year:  
–  Fall  and  Winter  quarters:  Teams  work  on  their  projects  and  
present  progress  reports.  Alternate  mee1ngs:  once  a  month  for  
2  hours  with  advisor,  once  a  month  for  15  minutes  with  over-­‐
seer.    
–  Spring  quarter:  Teams  finalize  their  projects,  including  
documenta1on  and  final  report.  Teams  make  open  
presenta1ons  to  their  peers,  advisor  and  over-­‐seer,  and  receive  
final  grade.    
Ques&on:  Is  team  work  allowed?  
•  Team  work  is  encouraged  

•  A  project  team  can  consist  of  1-­‐3  students  

•  In  case  of  team  work,  a  full  project  plan  on  


distribu1on  of  tasks  and  roles  should  be  
submi^ed  to  and  approved  by  the  advisor  and  
Dr.  Al1ntas  
Ques&on:  How  much  work  per  week  is  
expected?  
•  A  project  should  require  5-­‐10  hours  per  week  
by  all  project  par1cipants.  

•  The  ac1vi1es  per  week  should  be  discussed  in  


a  regular  mee1ng  with  advisors    
–  Mee1ngs  at  least  twice  month  is  advised  
Ques&on:  Can  I  use  company  data  or  
work  on  a  company  related  project?  
•  Admissible  under  the  following  condi1ons:  

1.  The  input  data  and  the  results  files  will  be  made  
available  to  the  project  advisor.  

2.  The  final  report  on  the  project,  on  which  the  
project  grade  will  be  based,  is  a  public  
document.  
Example  Project  1:  Traffic  Analysis  
•  Combine  several  traffic  informa1on  sources  
–  e.g.,
h^p://www.programmableweb.com/category/all/apis?
keyword=traffic&order=field_popularity)  to  track  the  traffic  in  san  diego  
county.  

•  Analyze  pa^erns  of  traffic  and  accidents  and  create  a  predic1ve  model  
that  can  be  used  to:    

–  Find  the  loca1on  and  cause  of  recurring  problems.  (advise  for  authori1es)  
–  Predict  the  travel  1me  between  two  points  using  different  routes  currently  or  
at  a  1me  in  the  future.  

•  Create  a  web  site  or  web  API  for  dissemina1ng  this  informa1on  

•  Write    a  10  page  report  on  the  project  and  on  the  main  conclusions  
Example  Project  2:  Wildfire  Data  Analysis  
•  Integrate  satellite,  sensor  and  archived  model  data  
–  e.g.:  h^p://wifire.ucsd.edu      

•  Analyze  wildfire  phenomena  before,  during  and  aeer  a  wildfire  


–  What  are  the  wildfire  weather  pa^erns?  
–  What  is  the  fire  risks  for  an  area  based  on  environmental  condi1ons?  

•  Create  a  Kepler  workflow  to  integrate  the  analysis  and  to  


disseminate  the  project  via  the  web  

•  Write    a  10  page  report  on  the  project  and  on  the  main  conclusions  
Conceptualizing    
a  Capstone  Project  
1:  Start  with  the  Applica1on  As  a  
Blackbox    
•  Treat  the  whole  project  
as  a  blackbox  
–  What  is  the  usecase/
applica1on?   Input  data   f   Outputs  
•  What  is  the  ques1on/
phenomena  this  project  
is  solving?  
–  What  is  the  input  data?   My  
–  What  are  the  expected   capstone  
outcomes/data  science  
value?   project  
•  Give  the  project  a  1tle  based  on  ini1al  assessment!  
2:  Conceptualiza1on  of  Data  Science  
and  Analy1cal  Steps  
Conceptual  Steps  for   Bake  Turkey  
•  ...   •  …  
Cooking  a   •  Cook    
•  Make  Cranberry  Sauce  
•  …   •  Cut  Veggies  
Thanksgiving  Meal   •  Chill   •  Prepare   •  Prepare  Stuffing  
•  ….   •  Cook   •  …  

•  …   Make  Side  
Bake  Pie  
Dishes  
SBNL workflow

Quality Evaluation & Data


Big Data Partitioning

Local Learner

Data Quality Local Ensemble


Evaluation Learning
Conceptual  Steps  for  
Insurance  and  Traffic  
Master Learner
Data  Analy1cs  using  Big  
Data  Bayesian  Network  
MasterEnsemble Final BN
Learning Structure

Learning  
3:  Treat  Each  Step  as  a  Separate  Sub-­‐Project    
-­‐  un1l  you  reach  an  atomic  func1onal  step  -­‐  
Find  data     Clean  data   Interpret  results  
Access  data   Integrate  data   Analyze  data   Summarize  results  
Acquire  data   Subset  data   Process  data   Visualize  results  
Move  data   Pre-­‐process  data   Post-­‐process  results  

Some  ques1ons  to  ask:  


•  Where  and  how  do  I  get  the  data?  
•  What  format  is  the  data  in?  
•  How  do  I  integrate  or  subset  datasets?  
•  How  do  I  analyze  the  data  and  what  is  the  analysis  func1on?  
•  What  are  the  parameters  to  customize  each  step?  
•  What  are  the  compu1ng  needs  to  schedule  and  run  each  step?  
•  How  do  I  make  sure  the  results  are  useful  for  the  next  step  or  
as  scien1fic  products?    
4.  Plan  your  Project  
•  Create  a  design  document  and  milestones  
based  on  the  first  three  steps  

•  Agree  on  team  responsibili1es  

•  Create  a  regular  mee1ng  calendar    

•  Iden1fy  poten1al  road  blocks  


Next  Step  1:  Meet  the  advisors!  
•  Presenta1ons  by  poten1al  advisors  

•  A  website  with  all  the  advisors  and  projects  will  be  


created  

•  Arrange  mee1ngs  with  the  advisors  to  discuss  their  


interests  

•  Email  me  (al1ntas@sdsc.edu)  to  ask  if  you  have  any  


ques1ons  
***  Start  email  subject  with  “MAS/DSE  Capstone  14-­‐15:”  ***  
Next  Step  2:  Think  of  a  project  
•  Start  early!!!  

•  Thinks  to  consider:  


–  Data  sources  
–  Analysis  problems  
–  What  MAS/DSE  knowledge  you  will  be  applying  
–  Feasibility  in  a  year  
–  Output  organiza1on  
Ques1ons?  

You might also like