Business Intelligence Software and Techniques: BUAN6324/MIS6324

Spring, 2016 Gregory G. MacDonald, PhD
(Lecture #1)

Agenda for Tonight

Course Preliminaries/Logistics
Motivation for a Data Mining (DM) project
General modeling considerations
Typical phases of a DM project
Form groups (deferred for a few classes yet)
Install the required software
Short tool tutorial using the weather dataset
Assignment for next week

Office Hours
Section 501 (Monday)
Time: 6:00PM-7:00PM, Location: JSOM 3.604

Section 502 (Tuesday)
Time: 6:00PM-7:00PM, Location: JSOM 3.604

Email address:
gregory.macdonald@utdallas.edu

Course TA*
TBD

* See syllabus for more information

Software Tool Notice


This course covers theories and applications of business
intelligence. The focus is on extracting business
intelligence from firms' business data for various
applications, including (but not limited to) customer
segmentation, customer relationship management (CRM),
personalization, online recommendation systems, web
mining and product assortment. The emphasis is placed on
the 'know-how' -- knowing how to extract and apply
business intelligence to improve business decision-making.
Students will also acquire hands-on experience with
business intelligence software using Rattle (an R
package). Use of other toolsets is encouraged.

Why R? It's Moving Up!

[Figure: programming-language rankings, 2015 vs. 2014; dated July 24, 2015]

Open source
4000+ packages
It is free!
Mac, Linux, Windows

Source: http://cacm.acm.org/news/189911-the-2015-top-10-programminglanguages/fulltext

Resources
Special note on the following texts:
Theory: Introduction to Data Mining
Tan, Steinbach, Kumar, 2006

Practice: Data Mining with Rattle and R
Williams, 2011
http://mineriaddatos.wikispaces.com/file/view/Data+Mining+With+Rattle+and+R_+The+Art+of+Excavating+Data+for+Knowledge+Discovery+-+Graham+Williams.pdf

Syllabus Review

Other Topics?

What is Data Mining?

?

What is Data Mining?


Extraction of information and knowledge from data repositories -> Business Intelligence
Typically draws from the following skill sets:
AI, CS, Information Theory, Probability/Stats, Statistical Learning Theory, Business sense (awareness)

What skill sets are represented here?

Why Data Mining?

?

Why Data Mining?


Placement of products in a store
Who is likely to buy product X?
What medical diagnosis is likely to be correct?
Will customer X be a good loan risk?
Is this piece of factory equipment likely to fail in the near future?
Customer churn

Terminology/Concepts
Some Terminology
Features are inputs (independent variables)
Target is the outcome (dependent variable)

General Concepts
Over-fitting
Generalization
Concept to be Learned
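
A minimal sketch of over-fitting versus generalization, using base R only. The synthetic sine-wave data, the sample sizes, and the polynomial degrees are illustrative assumptions, not course material. Here x is the feature (input) and y is the target (outcome).

```r
# Hypothetical data: a noisy sine wave stands in for the "concept to be learned".
set.seed(42)
make_data <- function(n) {
  x <- runif(n, 0, 1)
  data.frame(x = x, y = sin(2 * pi * x) + rnorm(n, sd = 0.3))
}
train <- make_data(30)    # data the model is fitted on
test  <- make_data(200)   # held-out data the model never sees

# Mean squared error of a fitted model on a given data set.
mse <- function(model, data) {
  mean((data$y - predict(model, newdata = data))^2)
}

# Fit a polynomial of the given degree and report both error types.
fit_errors <- function(degree) {
  model <- lm(y ~ poly(x, degree), data = train)
  c(degree = degree, train = mse(model, train), test = mse(model, test))
}

errors <- t(sapply(c(1, 3, 12), fit_errors))
print(round(errors, 3))
```

Training error can only go down as the polynomial gets more flexible. A held-out error that stops improving, or rises, while training error keeps falling is the usual sign of over-fitting.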

Types of Learning
Unsupervised: no target value
Supervised: target value (outcome)
Other forms of learning
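
The difference between the first two types can be sketched with R's built-in iris data. The choice of k-means and linear regression here is an assumption for illustration; the slides do not name specific algorithms.

```r
# Built-in iris data: four numeric measurements plus a Species label.
features <- iris[, 1:4]

# Unsupervised: no target value is supplied; k-means only groups similar rows.
set.seed(1)
clusters <- kmeans(features, centers = 3, nstart = 10)
print(table(Cluster = clusters$cluster, Species = iris$Species))

# Supervised: a target value (Petal.Width) is supplied, and the model
# learns to predict it from an input feature (Petal.Length).
model <- lm(Petal.Width ~ Petal.Length, data = iris)
print(coef(model))
```

The Species column is used only to inspect the clusters after the fact; k-means never sees it.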


Problems with Data


Data from different sources
Missing data: how to handle?
Data imputation: making up data

Irrelevant data: features that are irrelevant to the concept
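
One simple way to "make up" missing values is mean imputation, sketched below in base R. The customers table and the impute_mean helper are hypothetical names for illustration, and mean imputation is only one of several possible strategies.

```r
# Hypothetical customer data with missing values (NA).
customers <- data.frame(
  age    = c(34, NA, 51, 29, NA),
  income = c(52000, 61000, NA, 48000, 75000)
)

# Count the missing values per column before deciding how to handle them.
print(colSums(is.na(customers)))

# Replace each NA with the mean of the observed values in that column.
impute_mean <- function(column) {
  column[is.na(column)] <- mean(column, na.rm = TRUE)
  column
}

imputed <- as.data.frame(lapply(customers, impute_mean))
print(imputed)
```

Imputed values are guesses, so it is worth keeping track of which cells were filled in.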


Problems with Modeling


Target leakage
One or more of the features (input variables) may be a proxy for the target variable

CustID   Status   Amount Past Due
123464   L        $1,200.50
324566   P        0

Class imbalance problem
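
A minimal sketch of why class imbalance matters, using made-up counts (10 failures out of 1000 cases; the class names are hypothetical). A "model" that always predicts the majority class looks accurate while finding none of the rare cases.

```r
# Hypothetical imbalanced target: 990 "ok" cases and 10 "fail" cases.
actual <- factor(c(rep("fail", 10), rep("ok", 990)), levels = c("ok", "fail"))

# A trivial model that always predicts the majority class.
predicted <- factor(rep("ok", length(actual)), levels = c("ok", "fail"))

# Accuracy: share of all cases predicted correctly.
accuracy <- mean(predicted == actual)

# Recall for the rare class: share of actual failures that were caught.
recall_fail <- sum(predicted == "fail" & actual == "fail") / sum(actual == "fail")

print(c(accuracy = accuracy, recall_fail = recall_fail))
```

Accuracy alone can therefore be misleading when one class is rare.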



Assessing Model Goodness

?

Assessing Model Goodness
Accuracy?
Classification Error Rate?
Receiver Operating Characteristic (ROC) Curve
Confusion Matrix
Caution: several different representations appear in the literature (e.g., the transpose of the matrix)
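
A small sketch of these measures in base R, with made-up labels and predictions (the actual and predicted vectors are hypothetical). It also shows the two orientations of the confusion matrix that the caution refers to.

```r
# Hypothetical true labels and model predictions for ten cases.
lv <- c("no", "yes")
actual    <- factor(c("yes", "yes", "no", "no", "yes",
                      "no", "no", "yes", "no", "no"), levels = lv)
predicted <- factor(c("yes", "no", "no", "no", "yes",
                      "yes", "no", "yes", "no", "no"), levels = lv)

# Confusion matrix with actual classes in rows, predictions in columns.
confusion <- table(Actual = actual, Predicted = predicted)
print(confusion)

# Some sources use the transpose: predictions in rows, actuals in columns.
print(t(confusion))

# Accuracy is the share of cases on the diagonal; error rate is the rest.
accuracy   <- sum(diag(confusion)) / sum(confusion)
error_rate <- 1 - accuracy
print(c(accuracy = accuracy, error_rate = error_rate))
```

Always check which axis holds the actual classes before reading a confusion matrix from another source.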

Receiver Operating Characteristic

[Figure: ROC curve with the area under the curve (AUC) labeled]
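
A minimal sketch of how an ROC curve and its AUC can be computed by hand in base R. The labels and scores are made up, and there are no tied scores. Dedicated packages exist for this; the sketch only shows the idea.

```r
# Hypothetical true labels (1 = positive) and model scores.
labels <- c(1, 1, 0, 1, 0, 1, 0, 0)
scores <- c(0.95, 0.85, 0.70, 0.65, 0.50, 0.40, 0.30, 0.10)

# Sweep the threshold from high to low: each case in turn is called positive.
ord <- order(scores, decreasing = TRUE)
tpr <- c(0, cumsum(labels[ord] == 1) / sum(labels == 1))  # true positive rate
fpr <- c(0, cumsum(labels[ord] == 0) / sum(labels == 0))  # false positive rate

# AUC as the area under the (fpr, tpr) curve, using the trapezoid rule.
auc_trapezoid <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)

# Same quantity from ranks: the chance that a random positive case
# scores higher than a random negative case.
n_pos <- sum(labels == 1)
n_neg <- sum(labels == 0)
auc_rank <- (sum(rank(scores)[labels == 1]) - n_pos * (n_pos + 1) / 2) /
  (n_pos * n_neg)

print(c(auc_trapezoid = auc_trapezoid, auc_rank = auc_rank))
plot(fpr, tpr, type = "s", xlab = "False positive rate",
     ylab = "True positive rate", main = "ROC curve (sketch)")
```

An AUC of 0.5 corresponds to random guessing and 1.0 to a perfect ranking of positives above negatives.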


Confusion Matrix
Features: size, tail length, weight, ear length, eye count, food consumed, transportation method
Target: Cat, Dog, Rabbit

Typical DM Process

Determine data required


Clean, Merge and make up
Feature Selection
Model(s) construction/assessment*
Implementation


Typical DM Process

Determine data required
Clean and Merge
[Callout: often > 70% of the project effort]
Feature Selection
Model(s) construction/assessment*
Implementation

* Should you always use the best performing model?

Form Groups
4-5 people in a group, mix of strengths
Group formation via eLearning
Group membership must stay fixed for the semester

Tools Installation
https://www.youtube.com/watch?v=cX532N_XLIs
R - http://www.r-project.org
RStudio (IDE for R) - https://www.rstudio.com
Rattle -

install.packages("rattle")  # one time only
library(rattle)
rattle()

When you begin using Rattle, it will often prompt you to install other R packages

Quick Rattle Tutorial


We will use the default weather dataset
Remember to click Execute in each tab

Assignment
Read the first 3 chapters of Data Mining with Rattle and R, and chapter 1 of Introduction to Data Mining
Complete the install of R, RStudio, and Rattle
Next week: The journey begins! (2 weeks for the Monday class)
Topic: Classification
