
Module 2: Finding Natural Patterns in Data

Introduction

https://vimeo.com/204795912

Course Example – Basketball Player Statistics



Basketball Example and Learning Objectives


Learning Objectives
Using the basketball data example, you will learn how to:
- Import and store data in a table.
- Access and manipulate data stored in a statistical data type.
- Ignore, replace, or delete observations with missing data values.
- Divide a data set into groups according to a chosen measure of similarity.
- Use visualizations to interpret and evaluate the quality of a clustering solution.
Basketball Example
The data set below contains statistics for a selection of basketball players.

bballPlayers.txt bballStats.txt

Finding Natural Patterns in Data



Use unsupervised learning techniques to group observations based on a set of explanatory variables
and discover natural patterns in a data set.
https://vimeo.com/204796757

Normalizing Data – Introduction


Many clustering methods use the distance between observations as a measure of similarity.
Smaller distances indicate more similar observations.
Consider the two players and their statistics shown to the right.
Do they have similar playing styles? Is distance a good measure of similarity?

- These players have played different numbers of games. A better similarity measure would use the statistics averaged over the number of games played, that is, the statistics per game.
- Each statistic has different units and scales. When distance is used as the similarity measure, statistics with wider scales are given more importance.
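To make both shortcomings concrete, here is a small MATLAB sketch with invented season totals for hypothetical players A, B, C, and D (the numbers are illustrative only, not taken from the course data set). It uses norm to compute the Euclidean distance between rows of statistics.

```matlab
% Hypothetical season totals; columns are [games played, points, blocks]
A = [80 1600 40];
B = [40  800 20];

% Shortcoming 1: different numbers of games played.
% On raw totals, A and B look far apart ...
dRaw = norm(A(2:3) - B(2:3));                   % about 800.25
% ... but per game they are identical (20 points and 0.5 blocks per game)
dPerGame = norm(A(2:3)/A(1) - B(2:3)/B(1));     % exactly 0

% Shortcoming 2: different scales.
% C and D played the same number of games; D scores about 6% more points
% but blocks twice as often, yet the points difference dominates the distance.
C = [80 1600 40];
D = [80 1700 80];
diffCD = D(2:3) - C(2:3);                       % [100 40]
shareFromPoints = diffCD(1)^2 / sum(diffCD.^2); % about 0.86 of the squared distance
```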

In this lesson, you will correct these two shortcomings of the distance measure by:
- calculating the statistics per game, and
- normalizing the data so that each variable has zero mean and unit standard deviation.

Binary Singleton Expansion (bsxfun)


Calculating the Statistics per Game:

Given a matrix of statistics, stats, and a vector containing the number of games played, GP, how
can you calculate the player statistics per game?
You will have to divide each row of the stats matrix by the corresponding row of the GP vector.

However, in MATLAB releases before R2016b, using the element-wise division operator ./ generates an error because the dimensions of stats and GP do not agree.
bsxfun
In such cases, you can use the bsxfun function. It expands any singleton dimension of one input to match the size of the other input (as if that input were replicated) and then performs the specified operation element by element.
Consider a small example in which you need to compare a vector with a matrix.

You can use bsxfun as shown below. Note that @gt is a handle to the built-in "greater than" function, gt.
bsxfun works in the following way:
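As a minimal sketch of this mechanism, using made-up values v and M (not part of the course material), bsxfun(@gt, v, M) gives the same result as first replicating the column vector v with repmat until it matches the size of M and then applying > element by element:

```matlab
% Illustrative column vector and matrix (values chosen arbitrarily)
v = [2; 5];
M = [1 3; 6 4];

% bsxfun expands v along its singleton second dimension to match M,
% then applies the 'greater than' function element by element
r1 = bsxfun(@gt, v, M);            % [1 0; 0 1]

% Equivalent explicit version: one copy of v per column of M
vRep = repmat(v, 1, size(M, 2));   % [2 2; 5 5]
r2 = vRep > M;

disp(isequal(r1, r2))              % displays 1 (true)
```

In MATLAB R2016b and later, arithmetic and relational operators perform this expansion automatically, so v > M returns the same result without bsxfun.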

Expansion on One Input


Start by running these commands in the MATLAB Command Window:
>> x = [4;5;6]
>> y = [1 2 3 ; 4 5 6 ; 7 8 9]
>> x > y
Tasks
Task 1
In MATLAB releases before R2016b, trying to compare x and y directly with > results in an error.
The intended calculation is to compare each column of y with x, element by element (row i of y is
compared with x(i)).
Use bsxfun to test whether x is greater than (@gt) each column of y. Assign the
result to a logical array named comp.
Use the bsxfun function with the function handle @gt to compare the values.
>> results = bsxfun(@gt,A,B);
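One possible solution to Task 1, redefining x and y exactly as in the commands above so the snippet runs on its own; comp(i,j) is true when x(i) is greater than y(i,j):

```matlab
x = [4;5;6];
y = [1 2 3 ; 4 5 6 ; 7 8 9];

% Compare x with every column of y: comp(i,j) = x(i) > y(i,j)
comp = bsxfun(@gt, x, y);
% Expected values: [1 1 1; 1 0 0; 0 0 0]
```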

Task 2
Subtract x from each column of y. Assign the result to a variable named z.
Use the bsxfun function with the function handle @minus to subtract the values.
>> results = bsxfun(@minus,A,B);
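One possible solution to Task 2, again with x and y as defined above. Note that y comes first, because the operation is y minus x:

```matlab
x = [4;5;6];
y = [1 2 3 ; 4 5 6 ; 7 8 9];

% Subtract x from every column of y: z(i,j) = y(i,j) - x(i)
z = bsxfun(@minus, y, x);
% Expected values: [-3 -2 -1; -1 0 1; 1 2 3]
```

The order of the inputs matters for @minus: bsxfun(@minus, x, y) would instead compute x(i) - y(i,j), which is the negative of z.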

Normalizing Data

A common way to normalize raw data is to subtract the mean of a variable from each of its
elements and then divide by a measure of spread, such as the standard deviation.

If you just want to do the common normalization to zero mean and unit standard deviation, you
can use the zscore function.
>> Z = zscore(X)
Outputs

Z: Normalized array with mean 0 and standard deviation 1 (for a matrix X, each column is normalized separately)
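The normalization described above can also be written out by hand with bsxfun, which shows what zscore computes. This sketch uses an arbitrary example matrix X (rows are observations, columns are variables) and the optional second and third outputs of zscore, which return the column means and standard deviations it used.

```matlab
% Arbitrary example data: 3 observations (rows) of 2 variables (columns)
X = [2 100; 4 300; 9 200];

% Manual normalization: subtract each column's mean, divide by its std
mu    = mean(X);                  % 1-by-2 row of column means
sigma = std(X);                   % 1-by-2 row of column standard deviations
Zmanual = bsxfun(@rdivide, bsxfun(@minus, X, mu), sigma);

% zscore performs the same computation and can also return mu and sigma
[Z, muZ, sigmaZ] = zscore(X);

% Each column of Z now has mean 0 and standard deviation 1
disp(mean(Z))    % values within rounding error of 0
disp(std(Z))     % values within rounding error of 1
```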

Normalizing Data using MATLAB


Start by running these commands in the MATLAB Command Window:
>> x = [2;2;3]
>> y = [10 14 10 ; 22 18 30 ; 21 18 30]
>> y/x
Tasks
Task 1
As you can see, trying to divide y by x directly results in an error. The intended calculation
is to divide each column of y, element by element, by x (row i of y is divided by x(i)).
Create an array named yDiv in which each column of y is divided (@rdivide) by x, element
by element.
Use the bsxfun function with the function handle @rdivide to divide the values in the
matrix y by those in the vector x.
>> results = bsxfun(@rdivide,matrix,vector);
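One possible solution to Task 1, with x and y redefined as in the commands above:

```matlab
x = [2;2;3];
y = [10 14 10 ; 22 18 30 ; 21 18 30];

% Divide every column of y element-wise by x: yDiv(i,j) = y(i,j) / x(i)
yDiv = bsxfun(@rdivide, y, x);
% Expected values: [5 7 5; 11 9 15; 7 6 10]
```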
Task 2
Use the zscore function to normalize the values in yDiv to zero mean and unit standard deviation. Name
the result yNorm.
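One possible solution to Task 2. The snippet recomputes yDiv so that it runs on its own; zscore works column by column, so every column of yNorm ends up with mean 0 and standard deviation 1:

```matlab
x = [2;2;3];
y = [10 14 10 ; 22 18 30 ; 21 18 30];
yDiv = bsxfun(@rdivide, y, x);

% Normalize each column of yDiv to zero mean and unit standard deviation
yNorm = zscore(yDiv);
% For example, the third column of yDiv is [5; 15; 10] (mean 10, std 5),
% so yNorm(:,3) is [-1; 1; 0]
```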

Normalizing Data Quiz



1. What is the result of the following code? [Code image: normalizing data q1]

2. What is the result of the following code? [Code image: normalizing data q2]


Normalizing Data – Basketball Players


Download the data file before you start


Prepare the data by running these commands in the MATLAB Command Window:


%% Import & initialize data
data = readtable('bballData.txt');   % read the file into a table
data.pos = categorical(data.pos);    % store player positions as a categorical variable

Tasks
Task 1
The data table contains the information and statistics for several basketball players.
In particular, the sixth variable, GP, contains the number of games played. Columns seven
through the end contain player statistics totaled across all games.
Replace the statistics of each player (variables seven onward) with the statistics per game.
To extract the values in the table data into a numeric matrix, use curly braces to index into
columns 7:end.
Y = data{:,7:end};
Then use the bsxfun function with the function handle @rdivide to divide each column of
the extracted data by the values in data.GP.

Y = bsxfun(@rdivide,Y,data.GP);
Finally, index into data using curly braces to replace the old values.

data{:,7:end} = Y;
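The same per-game step can be tried on a plain numeric matrix before applying it to the table. In this sketch, stats and GP are hypothetical names holding invented values for three players (the two columns of stats stand in for two statistics such as total points and total rebounds); they are not drawn from bballData.txt.

```matlab
% Invented season totals for three hypothetical players
stats = [1600 400; 900 450; 300 150];   % one row per player
GP    = [80; 45; 20];                   % games played by each player

% Divide each row of stats by that player's number of games
perGame = bsxfun(@rdivide, stats, GP);  % [20 5; 20 10; 15 7.5]

% Sanity check: multiplying back by GP recovers the season totals
totals = bsxfun(@times, perGame, GP);
```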
Task 2
Shift and scale columns seven through the end of data so that the values in each column are normalized
to zero mean and unit standard deviation.
You will overwrite the values in the table data, so use curly braces to index into columns 7:end.
Then, use the zscore function on that same data.
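Y = data{:,7:end};     % extract the per-game statistics
Y = zscore(Y);         % normalize each column to zero mean, unit standard deviation
data{:,7:end} = Y;     % write the normalized values back into the table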
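Both preparation steps can be combined into a single expression. The sketch below wraps them in an anonymous function with the hypothetical name perGameZ and applies it to invented numbers rather than the course data. On a freshly imported table (before Task 1 has been run), data{:,7:end} = perGameZ(data{:,7:end}, data.GP) would perform both tasks at once; applying it after Task 1 would divide by GP twice.

```matlab
% Hypothetical helper: per-game statistics, then z-score normalization
perGameZ = @(stats, GP) zscore(bsxfun(@rdivide, stats, GP));

% Invented season totals for four hypothetical players
stats = [1600 400; 900 450; 300 150; 1200 240];
GP    = [80; 45; 20; 60];

statsNorm = perGameZ(stats, GP);

% Each column of statsNorm has mean 0 and standard deviation 1
disp(mean(statsNorm))
disp(std(statsNorm))
```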
