Business Analytics and Big Data

This document provides an overview of clustering, an unsupervised machine learning technique used to group unlabeled data points that are similar. It describes k-means clustering, an algorithm that assigns data points to k clusters based on minimizing distances between points and assigned cluster centers. The document includes an example applying k-means to a sample dataset to assign records describing customers to two clusters based on age and years of service attributes.

Uploaded by

Eunice Wong

Worksheet on Clustering

Clustering
It is often useful to partition data without having a training sample; this is also known as
unsupervised learning. For example, in business, it may be important to determine groups
of customers who have similar buying patterns, or in medicine, it may be important to
determine groups of patients who show similar reactions to prescribed drugs. The goal of
clustering is to place records into groups, such that records in a group are similar to each
other and dissimilar to records in other groups. The groups are usually disjoint.

An important aspect of clustering is the similarity function that is used. When the data is
numeric, a similarity function based on distance is typically used. For example, the
Euclidean distance can be used to measure similarity. Consider two n-dimensional data
records as points x and y in n-dimensional space. We can consider the value for the ith
dimension as xi and yi for the two records. The Euclidean distance between points
x=(x1,…,xn) and y=(y1,…,yn) in n-dimensional space is

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
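
As a minimal sketch of the distance defined above, the following Python function computes the Euclidean distance between two records of equal dimension. The function name `euclidean_distance` and the two sample points are illustrative assumptions, not part of the worksheet.

```python
import math

def euclidean_distance(x, y):
    """Euclidean distance between two equal-length numeric records."""
    if len(x) != len(y):
        raise ValueError("records must have the same number of dimensions")
    # Sum the squared per-dimension differences, then take the square root
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

# Two hypothetical 2-dimensional records (not taken from the worksheet)
print(euclidean_distance((0, 0), (3, 4)))  # 5.0
```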

The smaller the distance between two points, the greater their similarity. A classic
clustering algorithm is the following k-means algorithm:

K-means clustering algorithm

This algorithm begins by randomly choosing k records to represent the centroids (means),
m1, ..., mk, of the clusters C1, ..., Ck. Every record is then placed in a cluster based on
the distance between the record and the cluster mean: if the distance between mi and
record rj is the smallest among all cluster means, then record rj is placed in cluster Ci.
Once all records have been initially placed in a cluster, the mean for each cluster is
recomputed. Then the process repeats, by examining each record again and placing it in
the cluster whose mean is closest. Several iterations may be needed, but the algorithm
will converge: it stops when no record changes cluster, although the resulting clusters
can depend on the initial choice of centroids. Consider the following database.

RECORD   Age   Years of Service

1        30    5
2        50    25
3        50    15
4        25    5
5        30    10
6        55    25
Assume that the number of desired clusters k is 2. Let the algorithm choose RECORD 3 as
the initial centroid of cluster C1 and RECORD 6 as the initial centroid of cluster C2.
The remaining records will be assigned to one of those clusters during the first
iteration of the algorithm.

RECORD 1 has a distance from C1 of √(20² + 10²) = 22.4 and a distance from C2 of 32.0,
so it joins cluster C1. RECORD 2 has a distance from C1 of 10.0 and a distance from C2
of 5.0, so it joins cluster C2. RECORD 4 has a distance from C1 of √(25² + 10²) = 26.9
and a distance from C2 of √(30² + 20²) = 36.1, so it joins cluster C1. RECORD 5 has a
distance from C1 of 20.6 and a distance from C2 of 29.2, so it joins cluster C1.

RECORD   Age   Years of Service   Dist. from RECORD 3 (C1)   Dist. from RECORD 6 (C2)

1        30    5                  22.4                       32.0
2        50    25                 10.0                       5.0
3        50    15                 0                          -
4        25    5                  26.9                       36.1
5        30    10                 20.6                       29.2
6        55    25                 -                          0

Thus we have
C1 = {RECORD 1, RECORD 3, RECORD 4, RECORD 5}

C2 = {RECORD 2, RECORD 6}.
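
The first assignment step can be checked with a short Python sketch that recomputes each record's distance to the two initial centroids and places it in the closer cluster. The dictionary layout and variable names are illustrative assumptions; `math.dist` requires Python 3.8 or later.

```python
import math

# (age, years of service) for each record in the worksheet's table
records = {1: (30, 5), 2: (50, 25), 3: (50, 15), 4: (25, 5), 5: (30, 10), 6: (55, 25)}

# Initial centroids: RECORD 3 for C1 and RECORD 6 for C2
centroids = {"C1": records[3], "C2": records[6]}

clusters = {name: [] for name in centroids}
for rec_id, point in records.items():
    distances = {name: math.dist(point, c) for name, c in centroids.items()}
    closest = min(distances, key=distances.get)
    clusters[closest].append(rec_id)
    print(rec_id, {name: round(d, 1) for name, d in distances.items()}, "->", closest)

print(clusters)  # {'C1': [1, 3, 4, 5], 'C2': [2, 6]}
```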

Now, the new means (centroids) for the two clusters are computed. The mean for a cluster,
Ci, is a vector consisting of the mean of the individual dimensions within the cluster.
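
As a hedged illustration of this definition, the sketch below computes the mean vector of a cluster dimension by dimension. The function name `cluster_mean` and the three sample points are hypothetical and deliberately not taken from the worksheet's table, so the tasks below remain open.

```python
def cluster_mean(points):
    """Mean vector of a non-empty cluster: the mean of each dimension."""
    n = len(points)
    # zip(*points) groups the values of each dimension together
    return tuple(sum(values) / n for values in zip(*points))

# Hypothetical cluster of three (age, years of service) records
example = [(20, 2), (30, 4), (40, 9)]
print(cluster_mean(example))  # (30.0, 5.0)
```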

Tasks

1. Determine the mean (vector) for cluster C1

2. Determine the mean (vector) for cluster C2

3. Re-allocate all the records according to the new means calculated in (1) and (2)

4. Determine the mean (vector) for the new clusters

5. Re-allocate all the records according to the new means calculated in (4)
