Data Science: Inauguration of the Academic Year 2017-18


Master's Degree in Data Science

University of Milano-Bicocca

1
Departments involved

• Dipartimento di Informatica, Sistemistica e Comunicazione (DISCo)
• Dipartimento di Economia, Metodi quantitativi e Strategie di impresa
• Dipartimento di Statistica e Metodi quantitativi

2
The three stakeholders, students, companies,
teachers: how to boost cooperation among them?

• Courses and Labs
• Early Bird Initiative
• Kaggle

3
Students

4
Statistics on enrolled students - 1
By academic background
[Pie chart. Legend: Economics and marketing, Statistics, Computer science, Mathematics, Physics, Communication sciences, Engineering, Philosophy. Values shown (%): 0.9, 0.9, 0.9, 0.9, 3.7, 0.9, 5.5, 7.3, 38.5, 16.5, 19.3.]

5
Statistics on enrolled students - 2
By geographic area of origin
[Pie chart. Legend: Bicocca, Other Milan, Other Lombardy, North, Centre, South. Values shown (%): 1.1, 9.9, 7.7, 14.3, 55.4, 5.5, 25.3.]

6
The three stakeholders, students, companies,
teachers: how to boost cooperation among them?

Courses and Labs

7
From S. Ceri, EDBT Venice, March 2017
How big is the genome?
• As a string: 700 MByte
• As raw data: 200 GByte
• As called mutations: 125 MByte
How many genomes will be sequenced in 5 years?
• Estimates: order of 5-20 million
• Very big data problem
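
To put these figures in perspective, here is a back-of-the-envelope sketch in Python. It multiplies the per-genome sizes quoted on the slide by the 5-20 million estimate. Decimal units (1 GB = 10^9 bytes) and the helper name `total_bytes` are assumptions made for this illustration, not part of the original slide.

```python
# Per-genome sizes quoted on the slide, in bytes (decimal units assumed).
MB = 10**6
GB = 10**9
PB = 10**15

SIZE_PER_GENOME = {
    "string": 700 * MB,
    "raw data": 200 * GB,
    "called mutations": 125 * MB,
}

# Range of genomes expected to be sequenced in 5 years (slide estimate).
GENOMES_LOW = 5_000_000
GENOMES_HIGH = 20_000_000


def total_bytes(size_per_genome: int, genomes: int) -> int:
    """Total storage for `genomes` genomes (hypothetical helper)."""
    return size_per_genome * genomes


if __name__ == "__main__":
    for name, size in SIZE_PER_GENOME.items():
        low = total_bytes(size, GENOMES_LOW) / PB
        high = total_bytes(size, GENOMES_HIGH) / PB
        print(f"{name}: {low:g} to {high:g} PB")
```

Under these assumptions, raw data alone would come to roughly 1,000 to 4,000 PB (1 to 4 exabytes), while the called mutations would stay in the range of a few petabytes.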
From small data to big data
[Figure. Axes: broadness of observed reality, depth in knowledge of observed reality, time.]

9


10


Courses
• FC – Foundations of Computer Science
• FS – Foundations of Statistics & Probability
• DM&DV – Data management and visualization
• MLDM – Machine Learning & Decision Models
• JSI – Juridical & Social Issues in Information Society
• STDA – Statistical modelling
• DSL – Data Science Lab
• DS – Data Semantics
• IS – Information Systems
• TMS – Text mining and search
• WM&CM – Web marketing & Communication Management
• CYB – Cybersecurity for data science
• IP – Signal and image processing
• TIDS – Technological infrastructures for data science
• ASM – Advanced statistical methodologies for Big Data
• SDM – Streaming data management and time series analysis
• EDS – Economics for Data Science
• EW – Expert Week
• SMA – Social Media Analytics
• SS – Service Science
• BI – Business Intelligence
Data Science Lab in Environment & Physics
• BDGIS – Big Data in Geographical Information Systems
• BDPhis – Big data management and analysis in physics research
Data Science Lab in biosciences
• BDB&B – Big data in biotechnology & biosciences
• MSBD – Making sense of biological data
Data Science Lab in Medicine
• BDM1 – Big Data in Health Care
• BDM2 – Medical imaging & big data
Data Science Lab in Business & Marketing
• BDBF – Big data in Business and Finance
• BDBP – Big data in Behavioural Psychology
Data Science Lab in Public Policies & Services
• BDPHe – Big Data in Public Health
• BDPS – Big Data in Public and Social Services
Labs
• IL – Industry Lab
[Diagram labels: elective groups marked "1 among 2", "1 among 3" and "1 among 4"; columns marked "First year" and "Second year".]

12
Courses
[Figure: curriculum map with each course marked by track. Legend: Common courses, Analytical track, Business track.]

13
Scientific areas
[Figure: curriculum map with each course marked by scientific area. Legend: Computer Science, Statistics, SocioEconomic, Mixed.]

14
DSc – a dynamic, evolving science
[Figure: curriculum map with each course marked by scientific area. Legend: Statistics, Computer Science, SocioEconomic, Mixed.]

15
Four V’s of Big Data

•Volume
•Velocity
•Variety
•Value

16
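Change of Paradigm… in Data Management Systems

Axes: Volume (Small Data to Big Data) and Velocity (long-term changing data to streaming data)
• Small Data, long-term changing data: SQL + Traditional DBMS
• Big Data, long-term changing data: NoSQL + Hadoop + MapReduce (plus: distributed file system)
• Small Data, streaming data: Spark (plus: in-memory processing)
• Big Data, streaming data: Hadoop & Spark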
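
The slide names MapReduce without describing it. As a rough illustration of the programming model only (this is plain Python, not Hadoop's API), the sketch below counts words in three steps: map each document to (word, 1) pairs, group the pairs by key, then reduce each group by summing. All function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document: str) -> list[tuple[str, int]]:
    """Map: emit one (word, 1) pair per word in the document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Reduce: combine the values of each key, here by summing."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents: list[str]) -> dict[str, int]:
    """Run the three phases in sequence on an in-memory list."""
    mapped = chain.from_iterable(map_phase(d) for d in documents)
    return reduce_phase(shuffle(mapped))
```

In a real cluster the map and reduce steps run in parallel on different machines over a distributed file system; this single-process version only shows the data flow.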
17
The four Vs:
1. VOLume
2. VELocity
3. VARiety
4. VALue
[Figure: curriculum map with each course tagged VOL, VEL, VAR or VAL.]
Data types
• Tables
    • Relational (keys, referential integrity, etc.)
    • Weak semantics (e.g. csv)
• Texts
    • Loosely structured
    • Semistructured (e.g. XML)
• Signals (from the Internet of Things)
• Images (e.g. X-ray, security, etc.)
• Graphs
    • Mathematical - syntactic
    • Knowledge - semantic
• Open data & Linked Open Data
• Maps & Remote sensing & Georeferenced data
• Mixed (Web data)
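
To make three of these data types concrete, the sketch below reads the same record as a weakly typed table (CSV), as semistructured text (XML), and then rewrites it as knowledge-graph triples. The record and all function names are invented for illustration; only the Python standard library is used.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical record, invented for illustration only.
CSV_TEXT = "student,background\ns1,Statistics\n"
XML_TEXT = "<student id='s1'><background>Statistics</background></student>"


def from_csv(text: str) -> list[dict[str, str]]:
    """Table with weak semantics: columns are just named strings."""
    return list(csv.DictReader(io.StringIO(text)))


def from_xml(text: str) -> dict[str, str]:
    """Semistructured text: the structure travels with the data."""
    root = ET.fromstring(text)
    return {
        "student": root.attrib["id"],
        "background": root.findtext("background", default=""),
    }


def to_triples(record: dict[str, str]) -> list[tuple[str, str, str]]:
    """Knowledge graph: (subject, predicate, object) statements."""
    subject = record["student"]
    return [(subject, key, value) for key, value in record.items() if key != "student"]
```

Both readers yield the same dictionary, and `to_triples` turns it into the single statement `("s1", "background", "Statistics")`.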

20
Main Data Types
[Figure: curriculum map with each course marked by the main data type it deals with. Legend: Tables & Series, Signals and images, Knowledge graphs, Loosely structured & semistructured texts, Maps & Georeferenced data, Not relevant.]

21
Traditional analysis life cycle vs new analysis life cycle of digital (big) data
Traditional life cycle
• Extract
• Transform
• Load
Big Data life cycle
• Access
• Management
• Visualization
• Analysis
• Diffusion
Cross-cutting activities
• Semantics
• Quality
• Learning
• Value

22
Phases of the life cycle and main feedbacks

1. Access

2. Management

3. Visualization

4. Analysis

5. Diffusion
23
Phases of the life cycle - detail
1. Access & Acquisition
• Search
• Selection
• Acquisition
2. Management
• Filtering
• Quality assessment
• Semantic interpretation & enrichment
• Matching & integration
3. Visualization
4. Analysis
• Descriptive analysis: what happened or what is happening.
• Diagnostic analysis: why it happened or why it is happening.
• Predictive analysis: what will happen.
• Prescriptive analysis: what to do to achieve the goal.
5. Diffusion
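
The five phases can be sketched as a chain of small functions. This is a minimal illustration under stated assumptions: the readings are invented, each phase is reduced to one trivial step, the feedback loops between phases are omitted, and the analysis step is descriptive only.

```python
from statistics import mean

# Hypothetical raw readings; None marks a missing value.
RAW = [12.0, None, 15.0, 11.0, None, 14.0]


def access() -> list:
    """1. Access & acquisition: here, just return the toy source."""
    return list(RAW)


def manage(values: list) -> list[float]:
    """2. Management: filter out missing values (a basic quality step)."""
    return [v for v in values if v is not None]


def visualize(values: list[float]) -> str:
    """3. Visualization: a crude text bar chart, one row per value."""
    return "\n".join("#" * round(v) for v in values)


def analyze(values: list[float]) -> float:
    """4. Analysis: descriptive only (what happened), via the mean."""
    return mean(values)


def diffuse(result: float) -> str:
    """5. Diffusion: package the result for its audience."""
    return f"average reading: {result:.1f}"


if __name__ == "__main__":
    clean = manage(access())
    print(visualize(clean))
    print(diffuse(analyze(clean)))
```

Diagnostic, predictive and prescriptive analysis would replace or extend `analyze`; they are left out here to keep the sketch short.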
24
Main Phases of the Life Cycle
[Figure: curriculum map with each course marked by the life-cycle phase it mainly covers. Legend: Access & Acquisition, Management, Visualization, Analysis, Diffusion & Usage, All.]

25
Main Platforms and languages
[Figure: curriculum map with each course tagged by the platforms and languages it uses. Tags: SQL, NoSQL, R, SAS, Python, KNIME, Hadoop, Spark, RDF & SPARQL, BPMN, Kaggle.]

26
The three stakeholders, students, companies,
teachers: how to boost cooperation among them?

Kaggle

27
Kaggle: a platform
managing data challenges

It allows students to:
• Participate in dataset-specific competitions organized by companies
• Develop data science skills through practical experience on datasets provided by companies
• Get academic credits
• Learn about job offers

• Prof. Stella will provide further details soon

28
Students Portfolio
• https://www.linkedin.com/pulse/building-data-science-portfolio-newcomers-guide-data-scientist

29
The three stakeholders, students, companies,
teachers: how to boost cooperation among them?

Early Bird
Initiative

30
Early Bird Initiative
Opportunities of collaboration for companies

• Training activities
1. Testimonials and Case studies
2. Teaching in the first-year «Data Science Lab» and in the second-year «Industry Lab»
3. Hackathons
4. Certifications
• Internships
• Final thesis

31
Other types of contributions from companies

To students
• Scholarships
• Grants for
1. Internships in Italian companies
2. Internships in European universities or companies (Erasmus programs)
3. Internships in non-European universities or companies (Extra programs)
• Degree awards
Training services
• Access to big data infrastructures
Communication and marketing
• Endorsement
• Donations (with tax benefit)
32
Erasmus and Double Degrees

• Strong effort to establish Erasmus agreements and Double Degrees

• Prof. Pasi will provide further details soon

33
Start-ups

• All students should consider the opportunity to create a startup

• This is one of the topics of the Expert Week

34
Want to know more?

•Access http://datascience.disco.unimib.it/

35
Timetable

See www.disco.unimib.it

36
