
Alteryx Inspire Conference

 Field summary is used to investigate data types & statistical distributions


 Scatter plots & plot of means can be used for exploratory data analysis
 Impute tool (handles missing or zero values) with mean as an option
 Are 0 values included in the mean calculation?
 P-value analysis on target variable (the lower the p-value, the more statistically significant the result)
 Association measure (analysis only relevant for linear/logistic regression)
 Create samples tool: creates a training/testing set
 Linear regression (interactive tool provides a breakdown of results). Look especially for the predictors
with the lowest p-values, which indicate the most relevance (statistical significance)
 Intercept value (predicted value of the target if every predictor is zero)
 OLS analysis (spread of errors will reveal model bias)
 Stepwise regression (re-selects predictor variables depending on their significance)
 Oversample tool (selects samples biased to a certain value)
 Log normalisation (for dealing with skewed data)
Log([value]+1); the +1 keeps zero values defined, and regression copes more easily with linearised data
 Confusion matrix will give values of false positives/negatives
 Using the false positives, we can oversample to a 50% split to train the model
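The confusion matrix and 50% split notes can be illustrated with a minimal sketch. The function names, the sample labels and the `target` key are hypothetical. The sketch reaches the 50% split by randomly dropping majority-class records, which is only one way to do it and is an assumption, not a description of how the Oversample tool works internally.

```python
import random

def confusion_counts(actual, predicted, positive=1):
    """Return (tp, fp, fn, tn) for a binary target."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1          # predicted positive, actually positive
        elif p == positive:
            fp += 1          # false positive
        elif a == positive:
            fn += 1          # false negative
        else:
            tn += 1
    return tp, fp, fn, tn

def balance_to_half(rows, target_key, positive=1, seed=0):
    """Randomly drop majority-class rows so positives make up 50% of the result."""
    rng = random.Random(seed)
    pos = [r for r in rows if r[target_key] == positive]
    neg = [r for r in rows if r[target_key] != positive]
    n = min(len(pos), len(neg))
    balanced = rng.sample(pos, n) + rng.sample(neg, n)
    rng.shuffle(balanced)
    return balanced

# Illustrative data only
actual = [1, 0, 0, 1, 0, 0, 0, 1]
predicted = [1, 1, 0, 0, 0, 0, 1, 1]
tp, fp, fn, tn = confusion_counts(actual, predicted)

rows = [{"id": i, "target": t} for i, t in enumerate(actual)]
balanced = balance_to_half(rows, "target")
```

With 3 positive and 5 negative records, the balanced set keeps 3 of each.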

 Decision Tree (in the Tree Classification browse tool, green marks the path to failure and orange
represents success; if the condition at a node is a yes, go to the left, otherwise go to the right).
Accuracy at each node can be shown
 Union tool can also combine model objects together
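Two of the data-preparation notes above, mean imputation (including the question of whether zeros count towards the mean) and the Log([value]+1) transform, can be sketched in a few lines. This is an illustration only: `impute_with_mean`, the `zeros_are_missing` flag and the sample values are hypothetical, and the natural log is assumed for the transform.

```python
import math
from statistics import mean

def impute_with_mean(values, zeros_are_missing=False):
    """Replace missing entries (None, optionally 0) with the mean of the rest."""
    def is_missing(v):
        return v is None or (zeros_are_missing and v == 0)

    observed = [v for v in values if not is_missing(v)]
    fill = mean(observed)
    return [fill if is_missing(v) else v for v in values]

sales = [10, 0, None, 20, 30]

# Zeros treated as real values: the mean is taken over 10, 0, 20, 30
keep_zeros = impute_with_mean(sales)

# Zeros treated as missing: the mean is taken over 10, 20, 30
drop_zeros = impute_with_mean(sales, zeros_are_missing=True)

# log(value + 1) compresses the long right tail of skewed data
logged = [math.log1p(v) for v in drop_zeros]
```

The two calls give different fill values (15 versus 20), so it is worth checking which behaviour the imputation step is configured for before trusting the imputed mean.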

Understanding Time Series

 Always start with a field summary (describe())


 Find any missing periods
 MUST have consecutive periods between beginning and ending periods

 TS Filler fills missing gaps


 Green bar represents population of numeric vs. null values
 TS Plots allows you to analyse time series data in terms of decomposition, auto-correlation and
partial auto-correlation

 Log frequency/sample to look at relative basis over time
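The requirement for consecutive periods can be checked with a small sketch like the one below. It assumes monthly data keyed by the first day of each month; `month_range`, `fill_missing_periods` and the sample series are hypothetical, and gaps are filled with None rather than an estimated value.

```python
from datetime import date

def month_range(start, end):
    """Yield the first day of every month from start to end, inclusive."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield date(year, month, 1)
        month += 1
        if month > 12:
            year, month = year + 1, 1

def fill_missing_periods(series):
    """Return (filled, missing): every month present, gaps set to None."""
    periods = sorted(series)
    expected = list(month_range(periods[0], periods[-1]))
    missing = [p for p in expected if p not in series]
    filled = {p: series.get(p) for p in expected}
    return filled, missing

# Illustrative data only: January 2024 is absent
observed = {
    date(2023, 11, 1): 120,
    date(2023, 12, 1): 135,
    date(2024, 2, 1): 128,
    date(2024, 3, 1): 140,
}
filled, missing = fill_missing_periods(observed)
```

Here `missing` contains only January 2024, and `filled` has five consecutive months with a None placeholder for the gap.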


 Clustering is an unsupervised learning technique
 Udacity (predictive analytics course). Can do

Cache & run workflow (caching up till a certain point in a workflow)


Insights tool – has a built-in viz platform

Putler’s Predictive Analytics Pyramid

 Determine information needed to address problem/issue


 Find & engineer appropriate and meaningful predictors
 Relationship between predictors & target
 Determine type of models needed

Meaningful metrics for prediction

Decision makers can tend to jump to a solution too soon rather than determining what information
is really needed to inform the problem/solution.

Comparing metrics from different types of models

Is a predictor providing signal or creating noise in the model?

Which predictor matters the most when making a prediction?

Different modelling methods use different measures of effect size

How does the predicted value change as the level of a numeric predictor increases, or as the category
changes for a categorical predictor?

For classification models – predicted probability for each possible target class

Regression models (predicted numeric value of target)

Metrics - Regression

1. MAPE (%)
2. RMSE
3. Correlation between actual & predicted values
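As a rough illustration of the three regression metrics, the sketch below computes each one from scratch. The function names and sample values are made up for the example; MAPE is undefined when any actual value is zero.

```python
import math

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error, in the units of the target."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Illustrative data only
actual = [100, 200, 300, 400]
predicted = [110, 190, 330, 380]

print(f"MAPE: {mape(actual, predicted):.1f}%")
print(f"RMSE: {rmse(actual, predicted):.2f}")
print(f"Correlation: {pearson(actual, predicted):.3f}")
```

MAPE is scale-free, which makes it easier to compare across targets, while RMSE stays in the target's own units and penalises large errors more heavily.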

Metrics - Binary or Multi-Class Models

- Area under the receiver operating characteristic curve (AUC): defined for binary targets, though
multi-class extensions exist
- Confusion matrix
- Log-loss (penalises predicted probabilities that are far from the actual class)
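A minimal sketch of two of these metrics for a binary target follows. The function names and sample data are hypothetical. AUC is computed here through its pairwise interpretation (the share of positive/negative pairs in which the positive record gets the higher score), which is fine for small examples but slow on large data.

```python
import math

def log_loss(actual, prob, eps=1e-15):
    """Average negative log-likelihood of the actual class (binary, labels 0/1)."""
    total = 0.0
    for y, p in zip(actual, prob):
        p = min(max(p, eps), 1 - eps)   # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(actual)

def auc(actual, prob):
    """Share of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [p for y, p in zip(actual, prob) if y == 1]
    neg = [p for y, p in zip(actual, prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

# Illustrative data only
actual = [1, 0, 1, 0]
prob = [0.9, 0.2, 0.6, 0.7]

score_auc = auc(actual, prob)
score_ll = log_loss(actual, prob)
```

Log-loss grows quickly as a prediction becomes confidently wrong: predicting 0.01 for a record that is actually positive costs far more than predicting 0.4.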

Partial dependency plot (fitted values across range of a focal predictor)
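The idea behind a partial dependency plot can be sketched as follows: for each value of the focal predictor, overwrite that predictor in every record, score the records, and average the fitted values. `toy_model`, its coefficients and the sample rows are hypothetical stand-ins for a real fitted model.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction over all rows with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        preds = [predict({**row, feature: value}) for row in rows]
        curve.append((value, sum(preds) / len(preds)))
    return curve

def toy_model(row):
    # Hypothetical fitted model, used only to make the example runnable
    return 2.0 * row["age"] + 10.0 * row["is_member"]

# Illustrative data only
rows = [
    {"age": 30, "is_member": 1},
    {"age": 50, "is_member": 0},
]
curve = partial_dependence(toy_model, rows, "age", [20, 40, 60])
```

Plotting the pairs in `curve` gives the partial dependency line for `age`; the other predictors keep their observed values, so their effect is averaged out rather than held at an arbitrary constant.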

Multi-collinearity only starts affecting the model when the number of records is large

Reverse-causality

Efficiency

 Performance
 Memory
 Hard drive space
 Load on servers during production

Develop Efficiency

Caching

 Right-click & cache to avoid re-running workflow

Reduce by sampling

Ctrl+f (in all caps, can search for values within tools)

Can load games (in ‘about’ section)

HIPPO (Highest Paid Person’s Opinion)
