Facility Location With Clustering Algorithm
INTERNATIONAL UNIVERSITY
DEPARTMENT OF INDUSTRIAL & SYSTEMS ENGINEERING
FACILITY LOCATION PROBLEM
USING CLUSTERING ALGORITHM
By
HUYNH NHAT VINH NGUYEN
Certified by : __________________________________________
MSc. Duong Vo Nhi Anh
Thesis Advisor
Approved by : __________________________________________
Dr. Pham Huynh Tram
Head of ISE Department
ABSTRACT
Identifying the optimum location for a facility is one of the major challenges in a logistics
network. A location is optimum when it optimizes a certain objective, such as
capturing the largest market share, etc. Many decisions for facility location involving
distance objective functions on a spherical surface have been approached using heuristic methods.
This study focuses on the design of the distribution network of DHL
eCommerce in Ho Chi Minh City. It further aims to propose a solution for
distribution center (depot) allocation at the logistics service provider DHL eCommerce
Vietnam.
The potential locations reduced transportation cost in the network design relative to
the current model and also provided insight into considering volume in location analysis
Optimization Modeling.
ACKNOWLEDGMENTS
First of all, I would like to express my gratitude to Mr. Duong Vo Nhi Anh - my advisor
am truly sorry for my repeated lapses in contact with you over this time, but my thesis
would not have been completed in time without your contribution and valuable guidance.
My appreciation also extends to all my friends, those ‘soul mates’ from High school, those
T.A., who helped me survive many courses during 4 years of studying at IU
and spent his precious little time helping me with all the ideas and ‘coding things’ to end
this nightmare.
priceless to me.
TABLE OF CONTENTS
ABSTRACT
ACKNOWLEDGMENTS
TABLE OF CONTENTS
3.5.2. Costs
3.5.3. Volume
REFERENCES
APPENDIX A
APPENDIX B
LIST OF FIGURES
Figure 1.2 Single Allocation (Left) And Extensions To A Pure Hub-And-Spoke Layout:
LIST OF TABLES
Table 1.1 Coverage areas of each depot in Ho Chi Minh City.
Table 3.1 An extract from data collected from more than 18,000 transactions/month.
Table 3.2 DHL Hub & Spoke Operations Model – High-level Operations Cost Factors.
Table 4.1 Results of calculating starting point for each cluster using Center of Gravity.
LIST OF ABBREVIATIONS AND SYMBOLS
DC Distribution Center
LP Linear Programming
Lat. Latitude
Long. Longitude
Fixed cost is defined by the default size of each depot multiplied times
unit: parcels
0 otherwise
CHAPTER 1 INTRODUCTION
1.1. Background
To meet the demand of the fast-growing industry, the number and location of
evaluating these important choices with a simple framework, companies can position
One significant drawback in making network design decisions is
that traditional clustering algorithms cannot reflect real-world conditions. When
designing a network within a relatively small radius, straight-line and curved distances
cannot describe the true geographical distance and therefore misrepresent the efficiency of
the network. When physical boundaries and features in urban areas such as traffic are
of the demand areas. Covering models have proven very useful in solving
facility location problems. A demand point is treated as covered only if a facility is
available to provide the required service to the demand point within a required
A hub-and-spoke logistics network consists of hubs performing transshipment
consignment units), and spokes or depots linking end customers with the hubs.
- Stage 1: Depots pick up shipments from their customers. This is usually carried out in
the form of pickup tours, serving several customers within the same round-trip. While
the number and routing of pick-up tours are left to the depot (and may be subject to
operational decisions as well as route optimization), by the end of this stage, all
shipments of a given time period must be present at the depot where they are readied
- Stage 2: This stage is, in fact, a complex procedure containing the following sub-
stages:
+ Shipping the parcels from the hub to the depots of destination. Note that
usually, the same vehicles perform both depot→hub and hub→depot transport
in subsequent steps. This implies that a balance of inbound and outbound traffic
feasible option. Another operational issue resulting from this arrangement is the
need for sufficient shipping capacity to perform this stage: a sufficient number
of vehicles must be present at the hub by a certain time limit to carry out the
hub→depot step. If needed, additional vehicles must be called in (typically
from the destination depot) to handle the volume scheduled for the given
period.
- Stage 3: Once the shipped parcels are at the disposal of the destination depot,
delivery to the destinations is performed. Again, it may be left to the given depot how
this is carried out—a delivery tour may also be combined with a pickup tour to
shipping from Depot 1 to Hub 1, sorting at Hub 1, shipping from Hub 1 to Hub 2 if
needed, shipping from Hub 2 to Depot 2, and delivery to destinations in the delivery
area of Depot 2.
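The sequence of legs described in the stages above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `parcel_route` and the `hub_of` mapping are hypothetical, and the depot-to-hub assignments are taken from the examples mentioned in this thesis (one hub per depot, i.e. single allocation).

```python
def parcel_route(origin_depot, dest_depot, hub_of):
    """Return the ordered legs a parcel travels in a single-allocation network.

    hub_of maps each depot name to the single hub it is assigned to
    (a hypothetical data structure used only for this illustration).
    """
    origin_hub = hub_of[origin_depot]
    dest_hub = hub_of[dest_depot]
    legs = [
        ("pickup", "sender", origin_depot),         # Stage 1: pickup tour
        ("depot->hub", origin_depot, origin_hub),   # Stage 2: shipping to the hub
    ]
    if origin_hub != dest_hub:
        # Hub-to-hub trunking is only needed when the two depots use different hubs.
        legs.append(("hub->hub", origin_hub, dest_hub))
    legs.append(("hub->depot", dest_hub, dest_depot))   # Stage 2: shipping to the destination depot
    legs.append(("delivery", dest_depot, "recipient"))  # Stage 3: delivery tour
    return legs


# Assignments assumed from the examples in the text.
hub_of = {
    "Binh Thanh": "Sai Gon Hub",
    "Binh Tan": "Sai Gon Hub",
    "Ba Dinh": "Hanoi Hub",
}

for leg in parcel_route("Binh Tan", "Ba Dinh", hub_of):
    print(leg)
```

A parcel between two depots of the same hub skips the hub-to-hub leg, which is why intra-region and cross-region shipments carry different cost factors later in the thesis.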
Hub-and-spoke networks can be classified by the arrangement and number of hubs and
spokes, as well as their connectivity. In a “pure” hub-and-spoke case, the sending and
receiving depots may each be assigned to a single hub (referred to as the single allocation
case), or they may be assigned to several hubs (multiple allocation). In practice, the
means one hub is maintained in each area in the North and South of Vietnam (Hanoi
and Ho Chi Minh City), and the depots in each area are assigned to the respective hub. For
example, Depot Ba Dinh in Hanoi can only be connected to Hanoi Hub yet it cannot
Figure 1.2 Single allocation (left) and an extension to a pure hub-and-spoke layout:
hub-to-hub trunking (right) in a hub-and-spoke network.
1.2. Problem
The distribution network in HCMC currently consists of the central hub and only 2
depots, namely Binh Tan and Binh Thanh. The 2 depots have to cover the demand of
a total of 19 districts in HCMC, so DHL takes more time
to collect parcels from and deliver parcels to a specific destination, which extends total lead time
and raises transportation cost. DHL needs to know if a more efficient and effective
ability of a network to meet requirements in a timely manner, that is, how long it takes
for the network to meet demand. Effectiveness concerns the ability of the network to
deliver requirements to the necessary locations. The current network may not be the
most efficient for meeting recent demand and future plans. Cost and time values
are used to compare the efficiency and effectiveness of the current versus alternative
networks. This research provides DHL with an analysis of the current and potential
depot locations and how efficiently and effectively these locations meet the demand
Table 1.1 Coverage areas of each depot in Ho Chi Minh City.
Depot               Districts
Depot Bình Thạnh    Quận 1, Quận 2, Quận 3, Quận 4, Quận 9, Phú Nhuận, Bình Thạnh, Thủ Đức
Depot Bình Tân      Quận 5, Quận 6, Quận 7, Quận 8, Quận 10, Quận 11, Quận 12, Tân Bình, Tân Phú, Gò Vấp, Bình Tân
1.3. Objective
Following from the problem, the aim of this research is to:
- Determine the optimal number of depots and the location of each depot. The
optimal depot locations are those that minimize the total delivery time of parcels to a
specific destination, the driving-distance-related costs and the facility costs.
CHAPTER 2 LITERATURE REVIEW
There is a large body of literature on the facility location problem. A simple facility
optimization criterion being the minimization of the weighted sum of distances from a
given set of point locations. More complicated problems include the placement of
set of potential facility sites P where a facility can be opened, and a set of demand
points D that must be serviced. The goal is to pick a subset F of facilities to open, to
minimize the sum of distances from each demand point to its closest facility, plus the
been developed for the facility location problem and many of its variants. In the past,
The classical location-allocation problem is the basis of many of the location models
that have been built upon throughout the supply chain design literature. The location-
locations with historical needs and a group of potential DC locations are proposed.
When a DC is located at one of the potential areas, a known fixed cost is incurred.
There is also a known unit delivery cost between each potential DC and each customer
location. The locations of the DC and the shipment pattern between the DC and the
Location-allocation problems with capacity constraints have many variants across
(RCPMP) (Murray and Gerrard 1997), the capacitated p-median problem (CPMP), the
capacitated centered clustering problem (CCCP) (Negreiros and Palhano 2006), the
capacitated single allocation hub location problem (Ernst and Krishnamoorthy 1999),
the single source capacitated plant location problem (Díaz and Fernández 2002),
among many others. They are different from each other in terms of different
requires that facility locations be a subset of the demand point locations (Díaz and
(LP) problem with a linear combination of solution variables (Vinod 1969, ReVelle
(Garey and Johnson 1979). The time cost of a deterministic approach will increase
Therefore, substantial research work has been carried out to develop heuristics to
obtain good approximations of the optimal solution (França et al. 1999, Wu et al.
such as branch-and-bound (Marín and Pelegrín 1997), simulated annealing (SA) (Ernst
and Krishnamoorthy 1999), adaptive tabu search (França et al.1999), set partitioning
(Baldacci et al. 2002), and scatter search (Díaz and Fernández 2005, Scheuerer and
Wendolsky 2006).
The research reported here uses a clustering strategy (which is a special heuristic
constraints. Clustering analysis is one of the most commonly used approaches in data
analysis and has been applied in many application domains such as pattern discovery,
not depend on prior knowledge and can discover natural groupings of data items (Jain
and Dubes 1988, Jain et al. 1999, Han et al. 2001; Guo et al. 2003). Therefore, when
the distribution of demands and thus facilitate the search for an approximate optimal
Both the CAPCLUST method and our proposed method are adapted from the K-means
partitions a set of data items into clusters while ensuring a low internal dissimilarity or
distance. It assumes that the number of clusters (k) is known. The K-means algorithm
consists of three steps (Jain et al. 1999): (1) randomly choosing k cluster centers within
the data space; (2) assigning each data item to the closest cluster center; and (3)
recalculating the cluster centers using the points assigned to each cluster. Steps 2 and 3
problem, Step 2 can be used to allocate demands and Step 3 can be used to optimize
facility locations.
CHAPTER 3 METHODOLOGY
[Figure: methodology flowchart. Define problem → Mathematical model → Data collection → Validation (NO / YES decision) → Model computational process → Recommendation and conclusion → Implementation]
Define problem: Study the company’s actual situation and operating process, and define
the problem. Focusing on the limitation of production affects and interruptions to the
and objectives of the study (minimizing the total cost, such as transportation cost, facility cost
Data collection: Based on the objective of the study, collect data on volume, the
Validation: After applying the data to the mathematical model, validation should be
carried out to make sure that the mathematical model is accurate and the supporting data
is consistent.
Model computational process: Based on the demand and all of the collected data, take
Conclusion and recommendation: Report the result, conclude how effective the
method is in solving the problem, and provide recommendations to improve the system.
Implementation: This step depends on the decision maker, who chooses the set of
3.2. K-means algorithm
Data clustering, or cluster analysis, is the process of grouping data items so that similar
items belong to the same group/cluster. Clustering methods are used to identify groups
of similar objects in multivariate data sets collected from fields such as marketing,
biomedicine and geospatial analysis. There are different types of clustering methods, including:
- Partitioning methods
- Hierarchical clustering
- Fuzzy clustering
- Density-based clustering
- Model-based clustering
One of the simplest and most popular clustering algorithms is called ‘k-means
clustering’, which would split the data into a set of clusters (groups) based on the
distances between each data point and the center location of each cluster.
K-means has a wide range of applications, such as computational biology, business and
marketing, search engines, and social science. K-means clustering can also be applied to
the DHL location planning problem because the
company’s depot location problem resembles the data clustering problem that K-means solves.
The first step when using k-means clustering is to indicate the number of clusters (k)
The algorithm starts by randomly selecting k objects from the data set to serve as the
initial centers for the clusters. The selected objects are also known as cluster means or
centroids.
Next, each of the remaining objects is assigned to its closest centroid, where closest is
defined using the Euclidean distance between the object and the cluster mean. This
After the assignment step, the algorithm computes the new mean value of each cluster.
This step is called the “centroid update”. Now that the centers
have been recalculated, every observation is checked again to see if it might be closer
to a different cluster. All the objects are reassigned using the updated cluster
means.
The cluster assignment and centroid update steps are iteratively repeated until the
cluster assignments stop changing (i.e. until convergence is achieved). That is, the
clusters formed in the current iteration are the same as those obtained in the previous
iteration.
1. Specify the number of clusters (k).
2. Randomly select k objects from the dataset as the initial cluster centers or
means.
3. Assign each observation to its closest centroid, based on the Euclidean distance between the object and the centroid.
4. For each of the k clusters, update the cluster centroid by calculating the new
mean values of all the data points in the cluster. The centroid of the k-th cluster is
a vector of length p containing the means of all variables for the observations in
that cluster.
5. Iteratively minimize the total within-cluster sum of squares. That is, iterate steps 3
and 4 until the cluster assignments stop changing or the maximum number of
iterations is reached.
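The steps listed above can be sketched in plain Python as follows. This is a minimal illustration, not the VBA routine in Appendix A: the function name `kmeans`, the `seed` parameter and the toy points are assumptions made for the example, and squared Euclidean distance is used for the assignment because it gives the same nearest centroid as the Euclidean distance.

```python
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Plain K-means on a list of equal-length numeric tuples.

    Returns (centroids, assignment), where assignment[i] is the index of the
    cluster that points[i] belongs to.
    """
    rng = random.Random(seed)
    # Randomly select k objects as the initial cluster centers.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignment = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its closest centroid.
        new_assignment = []
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            new_assignment.append(dists.index(min(dists)))
        # Stop when the cluster assignments no longer change.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Centroid update step: mean of the points in each cluster.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment


# Toy data: two well-separated groups of three points each.
pts = [(0, 0), (0, 1), (1, 0), (100, 100), (100, 101), (101, 100)]
centroids, assignment = kmeans(pts, k=2)
print(centroids, assignment)
```

In the facility location setting, the points would be destination coordinates and the centroids candidate depot positions.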
K-means is usually run many times, starting with different random centroids each time.
The results can be compared by examining the clusters or by a numeric measure such
as the clusters’ distortion, which is the sum of the squared differences between each
data point and its corresponding centroid. In cluster distortion case, the clustering with
The K-means clustering method searches for the cluster positions that
minimize the distance from the data points to the cluster, while the goal of the company’s
depot planning is to find depot locations that minimize the distance from the
In K-means cluster analysis, given a set of data points (x1, x2, …, xn) where each point is a
d-dimensional real vector, k-means clustering aims to separate the n data points into k clusters.
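In standard notation, the quantity that K-means seeks to minimize is the within-cluster sum of squares, which is also the distortion measure mentioned earlier. The symbols $S_i$ (the set of points in cluster $i$) and $\mu_i$ (its centroid) are introduced here for illustration:

```latex
\min_{S_1,\dots,S_k} \; \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert^2,
\qquad
\mu_i = \frac{1}{|S_i|} \sum_{x \in S_i} x
```

The assignment step lowers this sum for fixed centroids, and the centroid update step lowers it for fixed assignments, which is why the iteration converges to a local minimum.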
Advantages
- With a large number of variables, K-means may be computed faster than hierarchical
- K-means may produce tighter clusters than hierarchical clustering, particularly
- An instance can change cluster (move to another cluster) when the centroids are
recomputed.
Disadvantages
- It is difficult to compare the quality of the clusters generated (e.g. for different initial
- The fixed number of clusters makes it difficult to determine what K should be.
- It does not work well with non-globular clusters (non-circular cluster shapes).
- Different initial partitions can result in different final clusters. It is useful to rerun the
program using the same as well as different K values, to compare the final results.
location to minimize the sum of facilities cost and the sum of the volume of goods at a
- The cost depends on the distance from the warehouse to the destination point;
transport conditions are not considered. Transportation cost is related to distance
only: it equals the distance traveled times a fixed price per unit
distance.
- Each destination point wishes to minimize the cost of acquiring the product.
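The distance-only cost assumption above can be written compactly. The symbols are assumptions introduced for illustration: $d_{ij}$ is the distance from facility $i$ to destination $j$, and $r$ is the fixed price per unit distance.

```latex
C_{ij} = r \, d_{ij}
```

Under this assumption, halving the distance between a depot and a destination halves the transportation cost of serving that destination, regardless of traffic or road conditions.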
[Figure: flowchart of the procedure. Recoverable labels: "Generate driving distance matrix from the set of destination locations latitudes/longitudes"; decision "Is the facility location optimal?" (No branch)]
- Step 1: Generate driving distance matrix from the set of destination locations’
[Figure: K-means flowchart. Start → Calculate centroid → Calculate distance → Group based on minimum distance]
- Step 3: Calculate the starting point of the facility location for each cluster using the Center of
Gravity method and set it as the current facility location. The Center of Gravity method
assumes that cost is directly proportional to distance and volume shipped, that inbound
and outbound transportation costs are equal, and that there are no special shipping
costs for less than full loads. Latitude and longitude coordinates can be used
to calculate the initial facility location centers for each cluster. The following formula
Center of Gravity
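In its common volume-weighted form, the Center of Gravity is the weighted mean of the destination coordinates, with shipped volume as the weight: latitude = Σ(Vᵢ · latᵢ) / Σ Vᵢ, and likewise for longitude. The sketch below assumes this form. The function name `center_of_gravity` is hypothetical, and the sample rows combine district coordinates from Table 4.2 with the 07/2017 volumes from Table 3.3 purely as an illustration, not as the calculation actually performed for the results.

```python
def center_of_gravity(points):
    """Volume-weighted mean of (lat, lon, volume) triples.

    Returns (lat, lon) of the weighted centroid.
    """
    points = list(points)
    total_volume = sum(v for _, _, v in points)
    if total_volume <= 0:
        raise ValueError("total volume must be positive")
    lat = sum(la * v for la, _, v in points) / total_volume
    lon = sum(lo * v for _, lo, v in points) / total_volume
    return lat, lon


# Illustrative input: (latitude, longitude, parcels in 07/2017).
cluster = [
    (10.793508, 106.747026, 775),   # Quận 2
    (10.832679, 106.793046, 648),   # Quận 9
    (10.853974, 106.748966, 1455),  # Thủ Đức
]
cog = center_of_gravity(cluster)
print("CoG latitude, longitude:", cog)
```

Because Thủ Đức carries the largest volume in this sample, the resulting point is pulled toward it compared with the unweighted mean of the three coordinates.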
Figure 3.2 CoG of each district in HCMC.
- Step 4: Calculate the driving distances from the current point calculated in Step 3 to
each destination location of each cluster using the Google Maps in order to have a data
- Step 5: Search the optimal facility locations. All distances are calculated using the
Google Maps driving distances. Let the starting point be the current point calculated in
Step 3. Use the maximal Google Maps driving distance calculated from Step 4 as
First, the sets, indices, input parameters and decision variables used throughout this
research are defined. Then, the objective function and the constraints of the model are
specified (minimize total cost, which is a function of facility setup cost and
transportation cost).
3.4.1. Notations
3.4.1.1. Sets
: Fixed cost is defined by the default size of each depot multiplied times the renting
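$Z = \sum_{i}\sum_{t} F_i\, y_{it} + \sum_{i}\sum_{j}\sum_{t} \left(c_{ijt} + v\, d_{ij}\right) x_{ijt}$ (1)
Here, following the OPL model in Appendix B, $F_i$ is the fixed cost of depot $i$ (fixed_cost), $c_{ijt}$ the fixed outbound cost (fixed_outbound_cost), $v$ the variable outbound cost (variables_outbound_cost), $d_{ij}$ the distance from depot $i$ to customer $j$ (distance), $x_{ijt}$ the parcel flow from depot $i$ to customer $j$ in period $t$, and $y_{it}$ the binary variable for depot $i$ in period $t$.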
3.4.3. Constraints
This ensures that the quantity of parcels x flowing through depot 𝑖 to customer 𝑗 is equal to the
Here x are the associated continuous or integer variables, y is a binary variable
and M is a sufficiently large coefficient. M must be large enough to let the model
This ensures that the delivery capacity of depot 𝑖 is at least the quantity of parcels x
(5)
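Read against the OPL model in Appendix B, the constraints described above can be written as follows. The symbols are assumptions chosen to mirror the OPL identifiers: $x_{ijt}$ for x[i][j][t], $y_{it}$ for y[i][t], $D_{jt}$ for demand[j][t], $Q_i$ for capacity_delivery[i] and $M$ for Big_M. The rows follow the order of Constraints 1 to 4 in the appendix rather than the equation numbers of this chapter.

```latex
\begin{align*}
\sum_{i} x_{ijt} &= D_{jt} && \forall j,\ \forall t
  && \text{(demand of each customer is met)} \\
\sum_{j} x_{ijt} &\le M\, y_{it} && \forall i,\ \forall t
  && \text{(flow only through a depot with } y_{it} = 1\text{)} \\
\sum_{j} x_{ijt} &\le Q_i && \forall i,\ \forall t
  && \text{(delivery capacity of the depot)} \\
y_{it} &\ge y_{i,t-1} && \forall i,\ \forall t > 1
  && \text{(a depot in use stays in use)}
\end{align*}
```

For the second row to be non-restrictive when a depot is in use, $M$ only needs to be at least the largest total demand of any single period.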
3.5. Data collection
3.5.1. Parcels Data
The data required was information on all of the customer address and the products
Table 3.1 An extract from data collected from more than 18,000 transactions/month.
Each of the recipients’ locations was then geocoded into latitude
and longitude coordinates. This was done using a Google Sheets add-in called
ezGeocode.
Figure 3.3 Recipients’ addresses geocoded into coordinates (latitude, longitude).
3.5.2. Costs
Operations costs of a logistics provider comprise 5 main factors:
- Pick-up cost (first-mile) is the cost of couriers and vehicles like trucks or
- Cost at hub/depot is the cost of the whole process of handling, encoding and sorting
there would be costs for the transportation of the parcel from the first hub to the second
For example, if a parcel is sent to Ba Dinh District, Hanoi from HCMC, it will be
transported from Sai Gon Hub to Hanoi Hub for further processing.
then there would be costs for the transportation of the parcel from the hub/depot to
respective shuttle depots for the minimization of the delivery distance and therefore
For example, if a parcel is sent to Vung Tau from HCMC, it will be transported from
- Last-mile cost is the cost of couriers and vehicles like trucks or motorbikes going
to each customer (consumer) location or drop-off point to deliver the parcels.
Table 3.2 DHL Hub & Spoke Operations Model – High-level Operations Cost Factors.
Cost factor    Intra-region (Metro, Semi Urban, Remote)    Cross-region (Metro, Semi Urban, Remote)
Pick-up $ $ $ $ $ $
Hub/Depot $ $ $ $ $ $
Line haul
Air $ $ $
Ground $ $ $
Shuttle $ $$ $$ $ $$ $$$
Last-mile
Re-shuttle $ $$ $ $$
Courier $ $$ $$$ $ $$ $$$
*Note: Within the same cost factor, more $ marks mean a higher cost compared with
other regions.
3.5.3. Volume
Table 3.3 Parcels volume acquired in 12 months.
07/2017 08/2017 09/2017 10/2017 11/2017 12/2017 01/2018 02/2018 03/2018 04/2018 05/2018 06/2018
Quận 1 1386 1272 1414 1376 1381 1372 1389 1399 1291 1255 1296 1408
Quận 2 775 798 804 805 709 772 773 795 785 735 804 764
Quận 3 775 798 804 805 709 772 773 795 785 735 804 764
Quận 4 501 437 367 412 377 463 455 363 437 398 518 512
Quận 5 542 593 559 596 520 610 586 521 550 513 540 590
Quận 6 541 503 524 510 541 508 497 532 480 538 511 550
Quận 7 1001 1138 1087 978 1156 1148 1104 984 1158 967 1031 1063
Quận 8 715 737 754 737 742 712 702 763 712 746 708 760
Quận 9 648 679 635 708 635 649 669 670 662 647 620 664
Quận 10 782 700 730 733 791 696 724 772 751 785 765 754
Quận 11 650 778 762 763 639 721 622 750 738 780 772 776
Quận 12 1088 746 952 731 931 817 773 1098 932 826 774 851
Phú Nhuận 790 727 715 714 728 755 782 742 790 728 785 717
Tân Bình 1772 1820 1842 1844 1457 1840 1490 1883 1578 1699 1549 1577
Tân Phú 1401 1326 1448 1350 1310 1428 1342 1426 1312 1399 1346 1449
Gò Vấp 1576 1321 1319 1427 1600 1330 1339 1579 1329 1589 1394 1542
Bình Thạnh 1544 1441 1519 1430 1473 1431 1518 1442 1522 1587 1441 1599
Bình Tân 1024 1385 1091 1032 1246 1386 1029 1379 1082 1142 1311 1334
Thủ Đức 1455 1514 1422 1472 1467 1562 1508 1456 1581 1457 1409 1550
*Unit: parcels
constraints and variables. CPLEX can translate the mathematical
model of the problem into a standard mathematical formulation through its dedicated modeling
language. CPLEX reads data from Excel and writes the results back to Excel,
which makes it easy for the user to follow and understand how the problem is solved. It
can solve large, realistic optimization problems quickly, which
CHAPTER 4 RESULTS
Using an Excel VBA tool based on the Center of Gravity formulas, 19 CoGs were
calculate the driving distances from the current location point to each destination location
in each cluster.
shared on an open forum. The clustering used the destination locations’ driving
distance matrix to generate K clusters. The results were then plotted and
the centroids identified. These centroids were chosen as potential locations for
further calculation.
4.2.1. Results for 4 Clusters
Table 4.2 Results of performing K-means algorithm – 4 Clusters.
Input Data Result of K-Means
X (Lat.) Y (Long.) X (Lat.) Y (Long.) Centroid
Quận 1 10.776720 106.697331 Quận 12 10.857451 106.640107 1
Quận 2 10.793508 106.747026 Phú Nhuận 10.798941 106.679540 1
Quận 3 10.767929 106.684908 Tân Bình 10.806420 106.650581 1
Quận 4 10.759512 106.703046 Gò Vấp 10.834546 106.669932 1
Quận 5 10.755364 106.669738 Bình Thạnh 10.805133 106.706301 1
Quận 6 10.746871 106.635372 Quận 2 10.793508 106.747026 2
Quận 7 10.737153 106.719134 Quận 9 10.832679 106.793046 2
Quận 8 10.737499 106.659784 Thủ Đức 10.853974 106.748966 2
Quận 9 10.832679 106.793046 Quận 5 10.755364 106.669738 3
Quận 10 10.771274 106.668515 Quận 6 10.746871 106.635372 3
Quận 11 10.765020 106.650189 Quận 8 10.737499 106.659784 3
Quận 12 10.857451 106.640107 Quận 10 10.771274 106.668515 3
Phú Nhuận 10.798941 106.679540 Quận 11 10.765020 106.650189 3
Tân Bình 10.806420 106.650581 Tân Phú 10.784238 106.648603 3
Tân Phú 10.784238 106.648603 Bình Tân 10.766005 106.605664 3
Gò Vấp 10.834546 106.669932 Quận 1 10.776720 106.697331 4
Bình Thạnh 10.805133 106.706301 Quận 3 10.767929 106.684908 4
Bình Tân 10.766005 106.605664 Quận 4 10.759512 106.703046 4
Thủ Đức 10.853974 106.748966 Quận 7 10.737153 106.719134 4
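As a cross-check on the grouping in Table 4.2, the centroid update step can be applied to the table itself. The sketch below takes the unweighted arithmetic mean of the district coordinates in each cluster. The helper name `cluster_means` is hypothetical, and the means it prints are an illustration only: they need not coincide with the centroids reported for the clustering, which was run on the driving distance matrix.

```python
from collections import defaultdict

# (district, latitude, longitude, cluster) copied from the result columns of Table 4.2.
ROWS = [
    ("Quận 12", 10.857451, 106.640107, 1),
    ("Phú Nhuận", 10.798941, 106.679540, 1),
    ("Tân Bình", 10.806420, 106.650581, 1),
    ("Gò Vấp", 10.834546, 106.669932, 1),
    ("Bình Thạnh", 10.805133, 106.706301, 1),
    ("Quận 2", 10.793508, 106.747026, 2),
    ("Quận 9", 10.832679, 106.793046, 2),
    ("Thủ Đức", 10.853974, 106.748966, 2),
    ("Quận 5", 10.755364, 106.669738, 3),
    ("Quận 6", 10.746871, 106.635372, 3),
    ("Quận 8", 10.737499, 106.659784, 3),
    ("Quận 10", 10.771274, 106.668515, 3),
    ("Quận 11", 10.765020, 106.650189, 3),
    ("Tân Phú", 10.784238, 106.648603, 3),
    ("Bình Tân", 10.766005, 106.605664, 3),
    ("Quận 1", 10.776720, 106.697331, 4),
    ("Quận 3", 10.767929, 106.684908, 4),
    ("Quận 4", 10.759512, 106.703046, 4),
    ("Quận 7", 10.737153, 106.719134, 4),
]


def cluster_means(rows):
    """Return {cluster: (mean_lat, mean_lon, member_count)} for the given rows."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for _name, lat, lon, cluster in rows:
        acc = sums[cluster]
        acc[0] += lat
        acc[1] += lon
        acc[2] += 1
    return {c: (s[0] / s[2], s[1] / s[2], s[2]) for c, s in sorted(sums.items())}


means = cluster_means(ROWS)
for c, (lat, lon, n) in means.items():
    print(f"cluster {c}: {n} districts, mean lat={lat:.6f}, mean lon={lon:.6f}")
```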
4.2.2. Results for 6 Clusters
Table 4.4 Results of performing K-means algorithm – 6 Clusters.
Input Data Result of K-Means
X (Lat.) Y (Long.) X (Lat.) Y (Long.) Centroid
Quận 1 10.776720 106.697331 Quận 12 10.85745 106.6401 1
Quận 2 10.793508 106.747026 Tân Bình 10.80642 106.6506 1
Quận 3 10.767929 106.684908 Gò Vấp 10.83455 106.6699 1
Quận 4 10.759512 106.703046 Quận 2 10.79351 106.747 2
Quận 5 10.755364 106.669738 Quận 9 10.83268 106.793 2
Quận 6 10.746871 106.635372 Thủ Đức 10.85397 106.749 2
Quận 7 10.737153 106.719134 Quận 1 10.77672 106.6973 3
Quận 8 10.737499 106.659784 Quận 3 10.76793 106.6849 3
Quận 9 10.832679 106.793046 Quận 10 10.77127 106.6685 3
Quận 10 10.771274 106.668515 Phú Nhuận 10.79894 106.6795 3
Quận 11 10.765020 106.650189 Tân Phú 10.78424 106.6486 3
Quận 12 10.857451 106.640107 Bình Thạnh 10.80513 106.7063 3
Phú Nhuận 10.798941 106.679540 Quận 4 10.75951 106.703 4
Tân Bình 10.806420 106.650581 Quận 7 10.73715 106.7191 4
Tân Phú 10.784238 106.648603 Quận 5 10.75536 106.6697 5
Gò Vấp 10.834546 106.669932 Quận 8 10.7375 106.6598 5
Bình Thạnh 10.805133 106.706301 Quận 11 10.76502 106.6502 5
Bình Tân 10.766005 106.605664 Quận 6 10.74687 106.6354 6
Thủ Đức 10.853974 106.748966 Bình Tân 10.76601 106.6057 6
4.3. CPLEX Results
Table 4.6 Comparison between 3 proposals.
After running the mathematical model in CPLEX, the results are shown in the
table above. In the first option, the total cost for the distribution network with 2
clusters (k=2) and 2 corresponding depots (which is the current network of DHL) is
transportation cost and fixed cost for setting up a facility. The next option, with 4 clusters
(k=4) and 4 corresponding depots, has a total cost of 130,854,256,129.729 VND per
year (about 130.85 billion VND), which saves 62,634,927,115.74 VND (about 62.63 billion VND) relative to the first option, or about
32.37%. The last option is with 6 clusters (k=6) and 6 corresponding depots, total cost
CHAPTER 5 CONCLUSION
Facility location decisions play an important role in the strategic planning and design
flow of materials through the distribution system, and lead to decreased costs and
improved customer service. This paper has focused on the implementation of facility
Given the location of each destination in terms of its coordinates, the requirement at
each destination and shipping costs for the region of interest, the proposed
methodology in this paper is able to determine the optimal location of each facility and
help companies assess the locations of facilities. On top of this, we could locate
location decisions.
clustering method that indirectly optimizes the location-allocation quality under the
individual and overall capacity restrictions. Since the allocation strategy adopted from
the K-means algorithm can ensure a near-optimal allocation result (in terms of the
objective function) when facility locations are fixed, this research focused more on
REFERENCES
[1] Baldacci R., Maniezzo V. and Mingozzi A., 2002. A new method for solving
capacitated location problems based on a set partitioning approach, s.l.: s.n.
[2] Vinod H. D., 1969. Integer programming and the theory of grouping, s.l.: s.n.
[3] Díaz J. A. and Fernández E., 2002. A Branch-and-bound algorithm for the single
source capacitated plant location problem, s.l.: s.n.
[4] Díaz J. A. and Fernández E., 2005. Hybrid scatter search and path relinking for
the capacitated p-median problem, s.l.: s.n.
[5] Ernst A. T. and Krishnamoorthy M., 1999. Solution algorithms for the capacitated
single allocation hub location problem, s.l.: s.n.
[6] França P. M., Sosa N. M. and Pureza V., 1999. An adaptive tabu search
algorithm for the capacitated clustering problem, s.l.: s.n.
[11] Negreiros M. and Palhano A., 2006. The capacitated centred clustering problem,
s.l.: s.n.
[12] ReVelle C. S. and Swain R. W., 1970. Central facilities location, s.l.: s.n.
APPENDIX A
Sub Run()
'Run k-Means
If Not kMeansSelection Then
Call MsgBox("Error: " & Err.Description, vbExclamation, "kMeans Error")
End If
End Sub
kMeansSelection_Error:
kMeansSelection = (Err.Number = 0)
End Function
'Clusters - Number of clusters to reduce records into.
uniqueCentroid = True
For c2 = LBound(Centroid) To c - 1
'Loop Through Record Dimensions and check if all are the same
x=0
y=0
For d2 = LBound(Centroid(c).Dimension) To _
UBound(Centroid(c).Dimension)
x = x + Centroid(c).Dimension(d2) ^ 2
y = y + Centroid(c2).Dimension(d2) ^ 2
Next d2
uniqueCentroid = Not Sqr(x) = Sqr(y)
If Not uniqueCentroid Then Exit For
Next c2
Next c
PassCounter = PassCounter + 1
ClustersStable = True 'Until Proved otherwise
lastCluster = Record(r).Cluster
lowestDistance = 0 'Reset lowest distance
'======================================================
' Calculate Euclidean Distance
'======================================================
' d(p,q) = Sqr((q1 - p1)^2 + (q2 - p2)^2 + (q3 - p3)^2)
'------------------------------------------------------
' X = (q1 - p1)^2 + (q2 - p2)^2 + (q3 - p3)^2
' d(p,q) = X
x=0
y=0
'Loop Through Record Dimensions
For d = LBound(Record(r).Dimension) To _
UBound(Record(r).Dimension)
y = Record(r).Dimension(d) - Centroid(c).Dimension(d)
y=y^2
x=x+y
Next d
'If distance to centroid is lowest (or first pass) assign record to centroid cluster.
If c = LBound(Centroid) Or x < lowestDistance Then
lowestDistance = x
'Assign distance to centroid to record
Record(r).Distance(c) = lowestDistance
'Assign record to centroid
Record(r).Cluster = c
End If
Next c
If ClustersStable Then ClustersStable = Record(r).Cluster = lastCluster
Next r
End If
Next r
kMeans = (Err.Number = 0)
End Function
'Output Headings
With oSheet.Rows(rowNumber)
With .Cells(1)
.Value = "Row Title"
.Font.Bold = True
.HorizontalAlignment = xlCenter
End With
With .Cells(2)
.Value = "Centroid"
.Font.Bold = True
.HorizontalAlignment = xlCenter
End With
End With
'Print by Row
rowNumber = rowNumber + 1 'Blank Row
For r = LBound(Record) To UBound(Record)
oSheet.Rows(rowNumber).Cells(1).Value = Table.Rows(r).Cells(1).Value
oSheet.Rows(rowNumber).Cells(2).Value = Record(r).Cluster
rowNumber = rowNumber + 1
Next r
'Print Centroids
rowNumber = rowNumber + 1
For c = LBound(Centroid) To UBound(Centroid)
With oSheet.Rows(rowNumber).Cells(1)
.Value = "Centroid " & c
.Font.Bold = True
End With
'Loop through cluster dimensions
For d = LBound(Centroid(c).Dimension) To UBound(Centroid(c).Dimension)
oSheet.Rows(rowNumber).Cells(d).Value = Centroid(c).Dimension(d)
Next d
rowNumber = rowNumber + 1
Next c
outputClusters_Error:
outputClusters = (Err.Number = 0)
End Function
Name = Name & " (" & Num & ")"
Wend
APPENDIX B
float fixed_outbound_cost_excel[temp][period]=...;
float fixed_outbound_cost[i in depot][j in customer][t in period] = fixed_outbound_cost_excel[(i-1)*numcustomer + j][t];
float fixed_cost[depot] = ...;
float variables_outbound_cost = ...;
float distance[depot][customer] = ...;
float capacity_delivery[depot] = ...;
float demand[customer][period] = ...;
minimize
sum(i in depot, t in period)(fixed_cost[i]*y[i][t])
+ sum(i in depot, j in customer, t in period)(fixed_outbound_cost[i][j][t] +
variables_outbound_cost*distance[i][j])*x[i][j][t];
subject to
{
// Constraint 1:
forall (j in customer, t in period)
sum(i in depot) x[i][j][t] == demand[j][t];
// Constraint 2:
forall (i in depot, t in period)
sum(j in customer)x[i][j][t] - Big_M*y[i][t] <= 0;
// Constraint 3:
forall (i in depot, t in period)
sum(j in customer)x[i][j][t] <= capacity_delivery[i];
// Constraint 4:
forall (i in depot, t in period:t>1)
y[i][t] >= y[i][t-1];
}
Appendix B-2: Reading the Data Sheet
SheetConnection nguyen("DHL_data.xlsx");