Data Science: Inauguration of the Academic Year 2017-18
University of Milano-Bicocca
Departments involved
The three stakeholders, students, companies,
teachers: how to boost cooperation among them?
Courses and Labs
Early Bird Initiative
Kaggle
Students
Statistics on enrolled Students - 1
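By disciplinary area
[Pie chart. Legend: Economics and Marketing, Statistics, Computer Science, Mathematics, Physics, Scienze Com., Engineering, Philosophy. Slice values as extracted, not matched to categories: 0.9, 0.9, 0.9, 0.9, 3.7, 0.9, 5.5, 7.3, 38.5, 16.5, 19.3]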
Statistics on enrolled Students - 2
By geographical area of origin
[Chart. Values as extracted, without category labels: 1.1, 9.9, 7.7, 14.3, 55.4, 5.5, 25.3]
Courses and Labs
From S. Ceri, EDBT Venice, March 2017
How big is the genome?
• As a string: 700 MByte
• As raw data: 200 GByte
• As called mutations: 125 MByte
How many genomes will be sequenced in 5 years?
• Estimates: order of 5-20 million
• Very big data problem
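The figures above can be combined into a rough total-storage estimate. The sketch below multiplies each per-genome size by the low and high genome counts from the slide. It assumes decimal units (1 MB = 10^6 bytes, 1 PB = 10^15 bytes); the function and variable names are illustrative and not part of the original talk.

```python
# Back-of-the-envelope storage estimate for the figures on the slide.
# Assumption: decimal units (1 MB = 10**6 bytes, 1 PB = 10**15 bytes).
MB, GB, PB = 10**6, 10**9, 10**15

SIZE_PER_GENOME = {
    "string": 700 * MB,
    "raw data": 200 * GB,
    "called mutations": 125 * MB,
}
GENOME_COUNTS = (5_000_000, 20_000_000)  # low and high estimates from the slide


def total_petabytes(bytes_per_genome: int, genomes: int) -> float:
    """Total storage in petabytes for `genomes` genomes of the given size."""
    return bytes_per_genome * genomes / PB


if __name__ == "__main__":
    for name, size in SIZE_PER_GENOME.items():
        low, high = (total_petabytes(size, n) for n in GENOME_COUNTS)
        print(f"{name:>16}: {low:,.3f} to {high:,.1f} PB")
```

Under these assumptions, raw data alone for 5 million genomes comes to about 1,000 PB (one exabyte), which is consistent with the slide's "very big data problem".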
From small data to big data
[Figure: axes "Broadness of observed reality" and "Time"]
Courses
[Curriculum map, first year and second year. Fragments legible in the extracted text: CYB – Cybersecurity for data science; IP – Signal and image processing; Data Science Lab in Environment & Physics (BDGIS – Big Data in Geographical Information Systems; BDPhis – Big data management and analysis in physics research); Data Science Lab in biosciences. Choice labels: "1 among 3".]
Courses
[Curriculum map, first year and second year, with the labels "Common courses" and "Analytical track". Fragments legible in the extracted text: CYB – Cybersecurity for data science; Data Science Lab in Environment & Physics (BDGIS – Big Data in Geographical Information Systems; BDPhis – Big data management and analysis in physics research). Choice labels: "1 among 4", "1 among 3".]
Scientific areas
[Curriculum map of first-year and second-year courses, each marked by scientific area: Computer Science, Statistics, SocioEconomic, Mixed. The area of each course, its placement in a track ("Analytical track", "Business track") and the scope of the choice labels ("1 among 2", "1 among 3", "1 among 4") are not recoverable from the extracted text.]
Courses and activities shown
• FC – Foundam. in Comp.Sc.
• FS – Found. in Stat. & Pr.
• DM&DV – Data management and visualization
• DS – Data Semantics
• MLDM – Machine Learning & Decision Models
• JSI – Juridical & Social Issues in Information Society
• STDA – Statistical modelling
• ASM – Advanced statistical methodologies for Big Data
• SDM – Streaming data management and time series analysis
• TMS – Text mining and search
• IS – Information Systems
• TIDS – Technological infrastructures for data science
• CYB – Cybersecurity for data science
• IP – Signal and image processing
• EDS – Economics for Data Science
• SMA – Social Media Analytics
• WM&CM – Web marketing & Communication Management
• SS – Service Science
• BI – Business Intelligence
• DSL – Data Science Lab
• IL – Industry Lab
• EW – Expert Week
Data Science Labs
• Data Science Lab in Environment & Physics: BDGIS – Big Data in Geographical Information Systems; BDPhis – Big data management and analysis in physics research
• Data Science Lab in biosciences: BDB&B – Big data in biotechnology & biosciences; MSBD – Making sense of biological data
• Data Science Lab in Medicine: BDM1 – Big Data in Health Care; BDM2 – Medical imaging & big data
• Data Science Lab in Business & Marketing: BDBF – Big data in Business and Finance; BDBP – Big data in Behavioural Psychology
• Data Science Lab in Public Policies & Services: BDPHe – Big Data in Public Health
DSc – a dynamic, evolving science
[Curriculum map, first year and second year. Fragments legible in the extracted text: Statistics; CYB – Cybersecurity; Data Science Lab in Environment & Physics (BDGIS – Big Data in Geographical Information Systems). Choice labels: "1 among 4", "1 among 3".]
Four V’s of Big Data
• Volume
• Velocity
• Variety
• Value
Change of Paradigm… in Data Management Systems
[Diagram, axis: Volume]
• Big Data: NoSQL + Hadoop + MapReduce (plus: distributed file system); Hadoop & Spark (plus: in-memory processing)
• Small Data: SQL + Traditional DBMS
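The MapReduce model named on this slide can be illustrated with a word count. The sketch below is a single-process illustration in plain Python and does not use the Hadoop or Spark APIs. The function names and the input documents are hypothetical; in a real cluster the map and reduce steps would run in parallel over a distributed file system.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Reduce: collapse each key's values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents: Iterable[str]) -> dict[str, int]:
    """Chain the three phases over a collection of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return reduce_phase(shuffle_phase(pairs))


if __name__ == "__main__":
    docs = ["big data small data", "big volume"]  # hypothetical input
    print(word_count(docs))
```

Because each map call sees only one document and each reduce call sees only one key, both steps can be spread across machines, which is what lets the approach scale with volume.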
[Curriculum map, first year and second year, with each course tagged by one or more of the four V's (VOL, VEL, VAR, VAL). Legend fragments: "2. VELocity", "4. VALue". The first-year lab appears here as DSL1 – Data Science Lab 1. The assignment of tags to individual courses is not recoverable from the extracted text.]
Data types
• Tables
  • Relational (keys, referential integrity, etc.)
  • Weak semantics (e.g. csv)
• Texts
  • Loosely structured
  • Semistructured (e.g. XML)
• Signals (from the Internet of Things)
• Images (e.g. X-ray, security, etc.)
• Graphs
  • Mathematical - syntactic
  • Knowledge - semantic
• Open data & Linked Open Data
• Maps & Remote sensing & Georeferenced data
• Mixed (Web data)
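As a small illustration of three of the data types listed above, the sketch below holds the same fact as a CSV table with weak semantics, as semistructured XML, and as knowledge-graph triples. The record, the element names and the `livesIn` predicate are invented for the example; only Python standard-library modules are used.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical record, invented for illustration.
CSV_TEXT = "name,city\nAda,Milano\n"
XML_TEXT = "<people><person name='Ada'><city>Milano</city></person></people>"

# Table with weak semantics: every value is a string, no keys or types.
rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))

# Semistructured text: nesting and attributes carry part of the structure.
root = ET.fromstring(XML_TEXT)
people = [(p.get("name"), p.findtext("city")) for p in root.iter("person")]

# Knowledge graph: (subject, predicate, object) triples.
triples = {(name, "livesIn", city) for name, city in people}

if __name__ == "__main__":
    print(rows)
    print(people)
    print(triples)
```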
Main Data Types
[Curriculum map, with the legend: Tables & Series, Signals and images, Knowledge graphs. Fragments legible in the extracted text: CYB – Cybersecurity for data science; IP – Signal and image processing; Data Science Lab in Environment & Physics (BDGIS – Big Data in Geographical Information Systems; BDPhis – Big data management and analysis in physics research); Data Science Lab in biosciences. Choice labels: "1 among 4", "1 among 3".]
Phases of the life cycle and main feedbacks
1. Access
2. Management
3. Visualization
4. Analysis
5. Diffusion
Phases of the life cycle - detail
1. Access & Acquisition
   • Search
   • Selection
   • Acquisition
2. Management
   • Filtering
   • Quality assessment
   • Semantic interpretation & enrichment
   • Matching & integration
3. Visualization
4. Analysis
   • Descriptive analysis: what happened or what is happening.
   • Diagnostic analysis: why it happened or why it is happening.
   • Predictive analysis: what will happen.
   • Prescriptive analysis: what to do to achieve the goal.
5. Diffusion
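The four kinds of analysis in phase 4 can be contrasted on a toy example. The sketch below uses invented monthly figures for advertising spend and sales and a hand-written least-squares line fit. Treating the fitted slope as the "diagnostic" answer is a deliberate simplification, since a real diagnostic analysis would need to rule out other causes.

```python
from statistics import mean

# Hypothetical monthly data, invented for illustration.
ad_spend = [10.0, 12.0, 14.0, 16.0, 18.0]
sales = [100.0, 110.0, 120.0, 130.0, 140.0]


def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


# Descriptive: what happened.
average_sales = mean(sales)

# Diagnostic: why it happened (here: how sales move with ad spend).
slope, intercept = fit_line(ad_spend, sales)

# Predictive: what will happen at a planned spend of 20.
predicted_sales = slope * 20.0 + intercept

# Prescriptive: what to do to reach a sales goal of 160.
required_spend = (160.0 - intercept) / slope

if __name__ == "__main__":
    print(average_sales, slope, intercept, predicted_sales, required_spend)
```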
Main Phases of the Life Cycle
[Curriculum map, first year and second year, with life-cycle phase labels; only "Access & Acquisition" is legible in the extracted text. Tools and platforms shown: SAS, Kaggle, Python. Items that appear only on this slide: BDPS – Big Data in Public and Social Services; "Track in Public Health". The placement of these labels on the map is not recoverable from the extracted text.]
Kaggle: a platform for managing data challenges
It allows students to:
• Participate in dataset-specific competitions organized by companies
• Build data science skills through practical experience on datasets provided by companies
• Get academic credits
• Learn about job offers
Students Portfolio
• https://www.linkedin.com/pulse/building-data-science-portfolio-newcomers-guide-data-scientist
Early Bird Initiative
Collaboration opportunities for companies
• Training activities
  1. Testimonials and case studies
  2. Teaching in the first-year «Data Science Lab» and in the second-year «Industry Lab»
  3. Hackathons
  4. Certifications
• Internships
• Final thesis
Other types of contributions from companies
To Students
• Scholarships
• Grants for
  1. Internships in Italian companies
  2. Internships in European universities or companies (Erasmus programs)
  3. Internships in non-European universities or companies (Extra programs)
• Degree Awards
Training services
• Access to big data infrastructures
Communication and Marketing
• Endorsement
• Donations (with tax benefit)
Erasmus and Double Degrees
Start-ups
Want to know more?
• Access http://datascience.disco.unimib.it/
Timetable
See
www.disco.unimib.it