Data Science With SAS and Cloudera
Josh Wills, Senior Director of Data Science
Cloudera
One Definition
versus Another
What I Think I Do
What I Actually Do
Web index
Recommendation systems
Sensor data
Market basket analysis
Online advertising
Enter Hadoop
Map Stage
Embarrassingly parallel
Like a DATA Step
Like PROC SORT
Reduce Stage
Process all of the values that have the same key in a single step
Like PROC MEANS with a BY statement
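The stages above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not Hadoop's API: the function names (map_stage, shuffle_stage, reduce_stage), the sample records, and the mean reducer are all hypothetical. It also assumes that the PROC SORT analogy refers to the sort-by-key step that sits between map and reduce.

```python
from itertools import groupby
from operator import itemgetter


def map_stage(records, mapper):
    # Embarrassingly parallel: each record is handled on its own,
    # much like one pass of a DATA step over the input rows.
    for record in records:
        yield from mapper(record)


def shuffle_stage(pairs):
    # Sort the (key, value) pairs by key so equal keys sit together
    # (the PROC SORT analogy), then collect the values for each key.
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield key, [value for _, value in group]


def reduce_stage(grouped, reducer):
    # Process all values that share a key in a single step,
    # like PROC MEANS with a BY statement.
    for key, values in grouped:
        yield key, reducer(key, values)


def sales_mapper(record):
    # Hypothetical mapper: emit (region, amount) for each input row.
    region, amount = record
    yield region, amount


def mean_reducer(key, values):
    # Hypothetical reducer: average of the values for one key.
    return sum(values) / len(values)


records = [("east", 10.0), ("west", 4.0), ("east", 20.0), ("west", 6.0)]
result = dict(
    reduce_stage(shuffle_stage(map_stage(records, sales_mapper)), mean_reducer)
)
print(result)
```

In a real cluster the map and reduce calls run on many machines and the sort happens across the network; the single-process version only shows how the three steps fit together.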
Apache Hive
SQL-based query language
Going Supernova
Cloudera Impala
SAS LASR
Iterative Algorithms
K-Means Clustering
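As a reminder of why k-means counts as an iterative algorithm, here is a minimal sketch of the standard Lloyd iteration in plain Python. It is not taken from the slides: the function names, the sample points, and the starting centers are assumptions made for illustration. Each loop pass needs a full scan of the data, which is the part that is costly when every pass is a separate MapReduce job.

```python
import math


def assign(points, centers):
    # For each point, find the index of the nearest center.
    return [
        min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
        for p in points
    ]


def update(points, labels, centers):
    # Move each center to the mean of the points assigned to it.
    new_centers = []
    for j, old in enumerate(centers):
        members = [p for p, label in zip(points, labels) if label == j]
        if not members:
            # Keep an empty cluster's center where it was.
            new_centers.append(old)
        else:
            new_centers.append(
                tuple(sum(coord) / len(members) for coord in zip(*members))
            )
    return new_centers


def kmeans(points, centers, max_iter=100):
    # Alternate assignment and update until the centers stop moving.
    centers = [tuple(c) for c in centers]
    labels = assign(points, centers)
    for _ in range(max_iter):
        new_centers = update(points, labels, centers)
        if new_centers == centers:
            break
        centers = new_centers
        labels = assign(points, centers)
    return centers, labels


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centers, labels = kmeans(points, centers=[(0, 0), (10, 10)])
print(centers, labels)
```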
K-Means++
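K-means++ changes only how the starting centers are picked: the first is chosen uniformly at random, and each later one is drawn with probability proportional to its squared distance from the nearest center chosen so far. The sketch below shows that seeding step under stated assumptions; the function name kmeanspp_seed, the rng parameter, and the sample points are hypothetical and not from the slides.

```python
import math
import random


def kmeanspp_seed(points, k, rng=None):
    # rng is an optional random.Random instance, passed in so runs can be repeated.
    rng = rng or random.Random()
    # First center: uniform choice over the data.
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Weight each point by its squared distance to the nearest chosen center.
        weights = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
        if sum(weights) == 0:
            # Every point coincides with a chosen center; fall back to uniform.
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


points = [(0, 0), (0, 0), (10, 10)]
seeds = kmeanspp_seed(points, k=2, rng=random.Random(0))
print(seeds)
```

Because a point that is already a center has weight zero, the two seeds in this small example always land on different locations, whichever point is drawn first.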
Operational Analytics
Question-driven
Interactive
Ad-hoc, post-hoc
Fixed data
Output is embedded into a report or in-database scoring engine
Operational Analytics
Metric-driven
Automated
Systematic
Fluid data
Output is a production system that makes customer-facing decisions
Thank you!
@josh_wills