Welcome to Scribd!

Balanced K-Means Revisited-7

Uploaded by

0% found this document useful (0 votes)

4 views2 pages

This paper revisits the k-means clustering algorithm by formulating it as a two-objective optimization problem that aims to minimize variance within clusters while also minimizing differences in cluster sizes. The proposed balanced k-means algorithm initially focuses on minimizing variance but increases emphasis on balanced cluster sizes in later iterations. This allows the algorithm to produce balanced clusters without requiring a pre-specified balance parameter, instead terminating based on a specified balance criterion. Experimental results on real and synthetic datasets demonstrate that the new approach improves on prior balanced k-means methods in producing balanced clusters efficiently without parameter tuning.

Original Description:

Original Title

Balanced k-means revisited-7

Copyright

Available Formats

PDF, TXT or read online from Scribd

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Report this Document

Copyright:

Available Formats

Download as PDF, TXT or read online from Scribd

Flag for inappropriate content

Download as pdf or txt

0% found this document useful (0 votes)

4 views2 pages

Balanced K-Means Revisited-7

Uploaded by

jefferyleclerc

Copyright:

Available Formats

Download as PDF, TXT or read online from Scribd

Flag for inappropriate content

Download as pdf or txt

Jump to Page

You are on page 1of 2

Search inside document

Applied Computing and Intelligence

3(2): 145–179
DOI: 10.3934/aci.2023008
Received: 06 October 2023
Accepted: 17 October 2023
https://www.aimspress.com/journal/aci
Published: 27 October 2023
Research article
Balanced k-means revisited

Rieke de Maeyer1 , Sami Sieranoja2, * and Pasi Fränti2

1
Saarland Informatics Campus, Saarland University, Saarbrücken, Germany
2
Machine Learning Group, School of Computing, University of Eastern Finland, Joensuu, Finland
* Correspondence: Email: samisi@cs.uef.fi.

Academic Editor: Chih-Cheng Hung

Abstract: The k-means algorithm aims at minimizing the variance within clusters without considering
the balance of cluster sizes. Balanced k-means defines the partition as a pairing problem that enforces the
cluster sizes to be strictly balanced, but the resulting algorithm is impractically slow O(n3 ). Regularized
k-means addresses the problem using a regularization term including a balance parameter. It works
reasonably well when the balance of the cluster sizes is a mandatory requirement but does not generalize
well for soft balance requirements. In this paper, we revisit the k-means algorithm as a two-objective
optimization problem with two goals contradicting each other: to minimize the variance within clusters
and to minimize the difference in cluster sizes. The proposed algorithm implements a balance-driven
variant of k-means which initially only focuses on minimizing the variance but adds more weight to
the balance constraint in each iteration. The resulting balance degree is not determined by a control
parameter that has to be tuned, but by the point of termination which can be precisely specified by a
balance criterion.
Keywords: clustering; k-means; balanced k-means; balanced-constrained; soft balance

1. Introduction

The clustering problem is to partition objects into separate groups (called clusters) so that objects
within one cluster are more similar to each other than objects in different clusters [1]. Since the middle
of the 20th century, thousands of algorithms addressing the clustering problem have been published [2].
One of the most popular clustering algorithms is the k-means algorithm [2, 3]. It was first proposed
by [4] and [5]. This clustering algorithm aims to build k disjoint clusters such that the sum of squared
distances between the data points and their representatives is minimized. The representatives, called
centroids, are determined by the mean of the data points belonging to a cluster. As a distance function,
the Euclidean distance is used. The number of clusters k has to be set by the user.
169

1e13
1.46 RKM 4

Running time in sec

1.44 BKM+
1.42 BKM 3
SSE

1.40
2
1.38
1.36 1
1.34
0
15.0 12.5 10.0 7.5 5.0 2.5 0.0 15.0 12.5 10.0 7.5 5.0 2.5 0.0
SDCS SDCS

1e13
1.46 4
Running time in sec

1.44
1.42 3
SSE

1.40
2
1.38
1.36 1
1.34
0
0.9997 0.9998 0.9999 1.0000 0.9997 0.9998 0.9999 1.0000
Normalized entropy Normalized entropy

1e13
1.46 4
Running time in sec

1.44
1.42 3
SSE

1.40
2
1.38
1.36 1
1.34
0
300 310 320 330 300 310 320 330
Minimum cluster size Minimum cluster size

Figure 6. Results of soft-balanced clustering for the data set S2.

Applied Computing and Intelligence Volume 3, Issue 2, 145–179

Rolls-Royce CanMan Controller
Document59 pages
Rolls-Royce CanMan Controller
Danilo
100% (5)
Windrock FFT
Document67 pages
Windrock FFT
Edwin Casadiego Avila
100% (2)
Question # 2:: Machine Working Hours Millimeters of Target
Document2 pages
Question # 2:: Machine Working Hours Millimeters of Target
bilal butt
No ratings yet
Merged Unit-Model
Document1 page
Merged Unit-Model
Amir Dorian
No ratings yet
Bai Tap NMR
Document31 pages
Bai Tap NMR
An Truong
No ratings yet
Board 1-Daniel-2024 04 30-00 48 06 17
Document5 pages
Board 1-Daniel-2024 04 30-00 48 06 17
daniel.plagbe
No ratings yet
CMK - Machine - Capabilty - Cover - Leak Test
Document6 pages
CMK - Machine - Capabilty - Cover - Leak Test
luisA1923
No ratings yet
Narasimlu Publication 2020
Document13 pages
Narasimlu Publication 2020
sreedhar_vk
No ratings yet
Acero PDF
Document1 page
Acero PDF
Alonso Chura
No ratings yet
Ee225 Lab 2
Document10 pages
Ee225 Lab 2
Sherlin Chand
100% (1)
4.4.2 Capacitors
Document27 pages
4.4.2 Capacitors
Lithmi Karunanayake
No ratings yet
+2.96 Without BRZL
Document20 pages
+2.96 Without BRZL
tata buddy
No ratings yet
California Bearing Ratio Test (CBR) : Highway Laboratory
Document7 pages
California Bearing Ratio Test (CBR) : Highway Laboratory
Ahmed S. ALhayek
No ratings yet
Wind Power System Project
Document12 pages
Wind Power System Project
Rasheed Mutahar Alabbasi
No ratings yet
4 Tecnicas Acotada
Document1 page
4 Tecnicas Acotada
Danny Macz
No ratings yet
Training Slides
Document19 pages
Training Slides
Wendigo
No ratings yet
Quality Assurance (MSN 4213) Assignment 2: Submitted by
Document6 pages
Quality Assurance (MSN 4213) Assignment 2: Submitted by
bilal butt
No ratings yet
DCP Abr Zone 3 - 20201128
Document4 pages
DCP Abr Zone 3 - 20201128
Doni Triatmojo
No ratings yet
Sample 1 2 3 4 5: X Bar Chart
Document3 pages
Sample 1 2 3 4 5: X Bar Chart
imikimi89
No ratings yet
Analysis The Effects of Process Parameters in En24 Alloy Steel During CNC Turning by Using Madm
Document5 pages
Analysis The Effects of Process Parameters in En24 Alloy Steel During CNC Turning by Using Madm
shashikant
No ratings yet
IMX1
Document4 pages
IMX1
prncha
No ratings yet
Neo 21 C 11132 072 RM - 20230304 - 113501 PDF
Document3 pages
Neo 21 C 11132 072 RM - 20230304 - 113501 PDF
Nikhil
No ratings yet
Salib Readthedocs Io en Develop
Document55 pages
Salib Readthedocs Io en Develop
rjees
No ratings yet
IAC Unit III
Document125 pages
IAC Unit III
Sayam
No ratings yet
DMP Kicking
Document12 pages
DMP Kicking
Areej Abdul Hai
No ratings yet
Format Nilai Bing 12 TKJ
Document20 pages
Format Nilai Bing 12 TKJ
Andres Saputra
No ratings yet
Format Nilai 10 TKJ 1
Document20 pages
Format Nilai 10 TKJ 1
Andres Saputra
No ratings yet
CE33 LabRep5
Document11 pages
CE33 LabRep5
Kkezy phew
No ratings yet
Test Safe Distance
Document2 pages
Test Safe Distance
Emirhan Pay
No ratings yet
3 Tecnicas Acotada
Document1 page
3 Tecnicas Acotada
Danny Macz
No ratings yet
Aplikasi Nilai 13 Revisi X ATP
Document15 pages
Aplikasi Nilai 13 Revisi X ATP
Andres Saputra
No ratings yet
Name Nation PL.: X Credit For Highlight Distribution, Base Value Multiplied by 1.1
Document9 pages
Name Nation PL.: X Credit For Highlight Distribution, Base Value Multiplied by 1.1
pedrolamelas
No ratings yet
A Practical Theory of Programming: 2011-1-3 Edition
Document242 pages
A Practical Theory of Programming: 2011-1-3 Edition
asd
No ratings yet
Part A
Document3 pages
Part A
Sami Khan
No ratings yet
Prokon
Document7 pages
Prokon
anwar
No ratings yet
DAHAN Tower Crane Topless Crane Topkit Luffing Crane 5612
Document1 page
DAHAN Tower Crane Topless Crane Topkit Luffing Crane 5612
DahanTowerCrane
No ratings yet
Aplikasi Nilai 13 Revisi X TBSM
Document20 pages
Aplikasi Nilai 13 Revisi X TBSM
Andres Saputra
No ratings yet
Piso Vs Transient Simple
Document33 pages
Piso Vs Transient Simple
Marcelo
No ratings yet
Textbook Linear Discrete Time Systems Zoran M Buchevats Ebook All Chapter PDF
Document53 pages
Textbook Linear Discrete Time Systems Zoran M Buchevats Ebook All Chapter PDF
anna.sprouse710
100% (11)
A Practical Theory of Programming: 2024-6-7 Edition
Document253 pages
A Practical Theory of Programming: 2024-6-7 Edition
hannes07hero
No ratings yet
EML2322L Wheel Motor Speed & Robot Time Calculations Template
Document6 pages
EML2322L Wheel Motor Speed & Robot Time Calculations Template
afavanetto
No ratings yet
POCLLBPR - V2 - BUT Premium - Commercial Live Goals - UK - 2020
Document1 page
POCLLBPR - V2 - BUT Premium - Commercial Live Goals - UK - 2020
amamùra maamar
100% (1)
01 Overview
Document40 pages
01 Overview
hassanalabasi
No ratings yet
Aplikasi Nilai 13 Revisi X TKJ
Document22 pages
Aplikasi Nilai 13 Revisi X TKJ
Andres Saputra
No ratings yet
A Practical Theory of Programming PDF
Document242 pages
A Practical Theory of Programming PDF
John Beveridge
No ratings yet
Fundamental Concepts in EMG Signal Acquisition: Gianluca de Luca
Document34 pages
Fundamental Concepts in EMG Signal Acquisition: Gianluca de Luca
Maged Ahmed
No ratings yet
Dahan Tower Crane Topkit 5613
Document1 page
Dahan Tower Crane Topkit 5613
DahanTowerCrane
No ratings yet
Waiting Line Model: Construction Site
Document18 pages
Waiting Line Model: Construction Site
Erick Bok Cang Yeong
No ratings yet
G e o M e T R y Units:meter, CM
Document3 pages
G e o M e T R y Units:meter, CM
aaaa
No ratings yet
Low Cost Lowpass Filter Design Using Image Parameters: by Richard Kurzrok
Document2 pages
Low Cost Lowpass Filter Design Using Image Parameters: by Richard Kurzrok
Steve188
No ratings yet
Assignments - CPM
Document1 page
Assignments - CPM
mohamed saoody
No ratings yet
Revealing The Behavior of Soliton Build-Up in A Mode-Locked Laser
Document33 pages
Revealing The Behavior of Soliton Build-Up in A Mode-Locked Laser
曾圆果
No ratings yet
Nope
Document14 pages
Nope
Sarah NeoSkyrer
No ratings yet
Kit PPK L1L2 Da T2R Soluções Tecnológicas em Um DJI Air 2S
Document10 pages
Kit PPK L1L2 Da T2R Soluções Tecnológicas em Um DJI Air 2S
Joel Milhomem
No ratings yet
A Practical Theory of Programming: 2004-3-25 Edition
Document6 pages
A Practical Theory of Programming: 2004-3-25 Edition
Anonymous GuQd67
No ratings yet
JuniorMasculino FS Scores
Document1 page
JuniorMasculino FS Scores
pedrolamelas
No ratings yet
OUTPUT2
Document5 pages
OUTPUT2
nitin
No ratings yet
Group5 Laboratory Sheet 2 BSME 1G PHYSENG PDF
Document14 pages
Group5 Laboratory Sheet 2 BSME 1G PHYSENG PDF
peter vander
No ratings yet
Training Slides QuickGuide
Document15 pages
Training Slides QuickGuide
Wendigo
No ratings yet
DSP Applications Using C and the TMS320C6x DSK
From Everand
DSP Applications Using C and the TMS320C6x DSK
Rulph Chassaing
No ratings yet
2023 Data, Analytics, and Artificial Intelligence Adoption Strategy-H
Document4 pages
2023 Data, Analytics, and Artificial Intelligence Adoption Strategy-H
jefferyleclerc
No ratings yet
2023 Data, Analytics, and Artificial Intelligence Adoption Strategy-3
Document3 pages
2023 Data, Analytics, and Artificial Intelligence Adoption Strategy-3
jefferyleclerc
No ratings yet
2023 Data, Analytics, and Artificial Intelligence Adoption Strategy-C
Document10 pages
2023 Data, Analytics, and Artificial Intelligence Adoption Strategy-C
jefferyleclerc
No ratings yet
2023 Data, Analytics, and Artificial Intelligence Adoption Strategy-A
Document7 pages
2023 Data, Analytics, and Artificial Intelligence Adoption Strategy-A
jefferyleclerc
No ratings yet
2 Mapreduce Model Principles
Document7 pages
2 Mapreduce Model Principles
jefferyleclerc
No ratings yet
MapReduce - What It Is, and Why It Is So Popular
Document7 pages
MapReduce - What It Is, and Why It Is So Popular
jefferyleclerc
No ratings yet
SAP HANA PAL - K-Means Algorithm or How To Do Cust... - SAP Community-1E
Document2 pages
SAP HANA PAL - K-Means Algorithm or How To Do Cust... - SAP Community-1E
jefferyleclerc
No ratings yet
Hadoop
Document7 pages
Hadoop
jefferyleclerc
No ratings yet
Paper Dvi
Document7 pages
Paper Dvi
jefferyleclerc
No ratings yet
Balanced K-Means Revisited-5
Document3 pages
Balanced K-Means Revisited-5
jefferyleclerc
No ratings yet
SAP HANA PAL - K-Means Algorithm or How To Do Cust... - SAP Community-1Q
Document2 pages
SAP HANA PAL - K-Means Algorithm or How To Do Cust... - SAP Community-1Q
jefferyleclerc
No ratings yet
Balanced K-Means Revisited-1
Document3 pages
Balanced K-Means Revisited-1
jefferyleclerc
No ratings yet
SAP HANA PAL - K-Means Algorithm or How To Do Cust... - SAP Community-9
Document4 pages
SAP HANA PAL - K-Means Algorithm or How To Do Cust... - SAP Community-9
jefferyleclerc
No ratings yet
SAP HANA PAL - K-Means Algorithm or How To Do Cust... - SAP Community-17
Document3 pages
SAP HANA PAL - K-Means Algorithm or How To Do Cust... - SAP Community-17
jefferyleclerc
No ratings yet
SAP HANA PAL - K-Means Algorithm or How To Do Cust... - SAP Community-O
Document3 pages
SAP HANA PAL - K-Means Algorithm or How To Do Cust... - SAP Community-O
jefferyleclerc
No ratings yet
A Distance-Based Kernel For Classification Via Support Vector Machines - PMC-17
Document1 page
A Distance-Based Kernel For Classification Via Support Vector Machines - PMC-17
jefferyleclerc
No ratings yet
SAP HANA PAL - K-Means Algorithm or How To Do Cust... - SAP Community-16
Document3 pages
SAP HANA PAL - K-Means Algorithm or How To Do Cust... - SAP Community-16
jefferyleclerc
No ratings yet
SAP HANA PAL - K-Means Algorithm or How To Do Cust... - SAP Community-14
Document3 pages
SAP HANA PAL - K-Means Algorithm or How To Do Cust... - SAP Community-14
jefferyleclerc
No ratings yet
Data Visualization Cheat Sheet For Basic Machine Learning Algorithms - by Boriharn K - Mar, 2024 - Towards Data Science
Document3 pages
Data Visualization Cheat Sheet For Basic Machine Learning Algorithms - by Boriharn K - Mar, 2024 - Towards Data Science
jefferyleclerc
No ratings yet
SAP HANA PAL - K-Means Algorithm or How To Do Cust... - SAP Community-P
Document3 pages
SAP HANA PAL - K-Means Algorithm or How To Do Cust... - SAP Community-P
jefferyleclerc
No ratings yet
SAP HANA PAL - K-Means Algorithm or How To Do Cust... - SAP Community-A
Document6 pages
SAP HANA PAL - K-Means Algorithm or How To Do Cust... - SAP Community-A
jefferyleclerc
No ratings yet
SAP HANA PAL - K-Means Algorithm or How To Do Cust... - SAP Community
Document3 pages
SAP HANA PAL - K-Means Algorithm or How To Do Cust... - SAP Community
jefferyleclerc
No ratings yet
Tutorial For K Means Clustering in Python Sklearn - MLK - Machine Learning Knowledge-5
Document3 pages
Tutorial For K Means Clustering in Python Sklearn - MLK - Machine Learning Knowledge-5
jefferyleclerc
No ratings yet
SAP HANA PAL - K-Means Algorithm or How To Do Cust... - SAP Community-4
Document3 pages
SAP HANA PAL - K-Means Algorithm or How To Do Cust... - SAP Community-4
jefferyleclerc
No ratings yet
SAP HANA PAL - K-Means Algorithm or How To Do Cust... - SAP Community-5
Document4 pages
SAP HANA PAL - K-Means Algorithm or How To Do Cust... - SAP Community-5
jefferyleclerc
No ratings yet
Improved K-Means Map Reduce Algorithm For Big Data Cluster Analysis
Document7 pages
Improved K-Means Map Reduce Algorithm For Big Data Cluster Analysis
jefferyleclerc
No ratings yet
K-Means Clustering Optimization Algorithm Based On Mapreduce
Document6 pages
K-Means Clustering Optimization Algorithm Based On Mapreduce
jefferyleclerc
No ratings yet
Fast Scalable K-Means++ Algorithm With Mapreduce
Document2 pages
Fast Scalable K-Means++ Algorithm With Mapreduce
jefferyleclerc
No ratings yet
The Incremental Online K Means Clustering Algorithm and Its Application To Color Quantization
Document42 pages
The Incremental Online K Means Clustering Algorithm and Its Application To Color Quantization
jefferyleclerc
No ratings yet
Analysis of Mapreduce Algorithms: Harini Padmanaban
Document6 pages
Analysis of Mapreduce Algorithms: Harini Padmanaban
jefferyleclerc
No ratings yet
Real Time Speed Estimation
Document7 pages
Real Time Speed Estimation
Raghu Ram
No ratings yet
Virtual User Mail System With Postfix, Dovecot and Roundcube - ArchWiki
Document14 pages
Virtual User Mail System With Postfix, Dovecot and Roundcube - ArchWiki
You You
No ratings yet
MPMC 931x Single Slot 3U VPX CPCI System Product Sheet
Document3 pages
MPMC 931x Single Slot 3U VPX CPCI System Product Sheet
ozgur
No ratings yet
Module 4
Document59 pages
Module 4
Manju M CSE
No ratings yet
Python Inheritence
Document20 pages
Python Inheritence
nithya Anand
No ratings yet
Livecareer Cover Letter
Document6 pages
Livecareer Cover Letter
f5dgrnzh
100% (2)
Data Communication EXP 1 Student Manual
Document10 pages
Data Communication EXP 1 Student Manual
Mohammad Saydul Alam
No ratings yet
2023 IEEE Machine-Generated - Text - A - Comprehensive - Survey - of - Threat - Models - and - Detection - Methods
Document26 pages
2023 IEEE Machine-Generated - Text - A - Comprehensive - Survey - of - Threat - Models - and - Detection - Methods
SmitUpendra
No ratings yet
Sony DVCAM Family 2005/2006
Document32 pages
Sony DVCAM Family 2005/2006
Dave Lang
No ratings yet
FEASA LED Analyser Software
Document2 pages
FEASA LED Analyser Software
tommurphy73
No ratings yet
A10 DS 15102 en PDF
Document16 pages
A10 DS 15102 en PDF
cutemickeygal
No ratings yet
OS03
Document14 pages
OS03
Asghar Khattak FDCP
No ratings yet
1 - SQL-RDBMS-2 Introduction To My SQL SYNC - VF
Document12 pages
1 - SQL-RDBMS-2 Introduction To My SQL SYNC - VF
John
No ratings yet
CN Assignment5 Siddharth
Document4 pages
CN Assignment5 Siddharth
Kunj Bhansali
No ratings yet
Course Hand Out
Document70 pages
Course Hand Out
Udhaya Sankar
No ratings yet
TrackManager 102 Eng
Document18 pages
TrackManager 102 Eng
Luis
No ratings yet
Commerce Act
Document5 pages
Commerce Act
killmobile16
No ratings yet
Manless Advanced Shopping With Smart Cart
Document4 pages
Manless Advanced Shopping With Smart Cart
Editor IJTSRD
No ratings yet
MC25&M25&M56-R-OpenCPU GCC Installation Guide V1.0
Document14 pages
MC25&M25&M56-R-OpenCPU GCC Installation Guide V1.0
Hiral
No ratings yet
Load Balancer: Level 200
Document32 pages
Load Balancer: Level 200
Harish Naik
No ratings yet
Carpooling SRS Document
Document20 pages
Carpooling SRS Document
hari
No ratings yet
Google Analytics Individual Qualification Exam Answers 2021
Document44 pages
Google Analytics Individual Qualification Exam Answers 2021
Shridhar Mak
No ratings yet
Design of Wireless Smart Sensor Module For Infant Incubator Test
Document6 pages
Design of Wireless Smart Sensor Module For Infant Incubator Test
AhmedSaad
No ratings yet
Lenovo LXM WL19CH
Document52 pages
Lenovo LXM WL19CH
Mahmoud Digital-Digital
No ratings yet
Error Code20
Document5 pages
Error Code20
Makhathini Kutlwano
No ratings yet
Itm University Gwalior: Major Project (MCA-401) Car Showroom S
Document14 pages
Itm University Gwalior: Major Project (MCA-401) Car Showroom S
Ajay Singh Tomar
No ratings yet
SOP MVJ
Document2 pages
SOP MVJ
Aayush Tripathi
No ratings yet
f3 Ict Ms
Document3 pages
f3 Ict Ms
Abdifatah Said
No ratings yet
Unit 18: Discrete Mathematics
Document22 pages
Unit 18: Discrete Mathematics
Ryda S
No ratings yet
Chapter 1 - Introduction: Sr. No. Questions
Document27 pages
Chapter 1 - Introduction: Sr. No. Questions
Pratik Vishwakarma
No ratings yet