Facility Location With Clustering Algorithm

VIETNAM NATIONAL UNIVERSITY – HOCHIMINH CITY
INTERNATIONAL UNIVERSITY
DEPARTMENT OF INDUSTRIAL & SYSTEMS ENGINEERING
FACILITY LOCATION PROBLEM

USING CLUSTERING ALGORITHM
Submitted in partial fulfillment of the requirements

for the Degree of Bachelor of Engineering in
Industrial and Systems Engineering
Student: HUYNH NHAT VINH NGUYEN

ID: IELSIU14050
Thesis advisor: MSc. DUONG VO NHI ANH
Ho Chi Minh City, Vietnam

August/2018
FACILITY LOCATION PROBLEM
USING CLUSTERING ALGORITHM
By
HUYNH NHAT VINH NGUYEN
Submitted in partial fulfillment of the requirements for the Degree of

Bachelor of Engineering in Industrial and Systems Engineering
International University, Ho Chi Minh City
August/2018
Signature of Student: __________________________________________

Huynh Nhat Vinh Nguyen
Certified by : __________________________________________
MSc. Duong Vo Nhi Anh
Thesis Advisor
Approved by : __________________________________________
Dr. Pham Huynh Tram
Head of ISE Department
ABSTRACT
Identifying the optimum location for a facility is one of the major challenges in logistics
network design. A location is optimum when it optimizes a chosen objective, such as
providing equitable service to customers, minimizing transportation and facility cost, or
capturing the largest market share. Many facility location decisions involving distance
objective functions on a spherical surface have been approached using heuristic
algorithms, branch-and-bound algorithms, approximation algorithms and simulation.
This study focuses on the design of the distribution network of DHL eCommerce in
Ho Chi Minh City. It aims to propose a solution for the allocation of distribution
centers (depots) at the logistics service provider DHL eCommerce Vietnam.
The proposed locations reduce transportation cost in the network design compared with
the current model and also provide insight into considering volume in location analysis,
as volume can serve as a magnifier of business impact.
Keywords: Facility Location, Location-allocation, Clustering Algorithm, K-means,
Optimization Modeling.
ACKNOWLEDGMENTS
First of all, I would like to express my gratitude to Mr. Duong Vo Nhi Anh, my advisor
on this thesis. I am thankful for his recommendations, feedback and encouragement. I
am truly sorry for repeatedly falling out of touch with him during this time; my thesis
would not have been completed on time without his contribution and valuable guidance.
My appreciation also extends to all my friends: those ‘soul mates’ from high school, those
at IU and my lovely colleagues at DHL eCommerce. Special thanks go to my brother
T.A., who helped me survive many courses during four years of study at IU
and spent what little precious time he had helping me with all the ideas and ‘coding things’ to end
this nightmare.
On top of this, I am indebted to my family, whose support and encouragement are
priceless to me.
TABLE OF CONTENTS
ABSTRACT
ACKNOWLEDGMENTS
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS AND SYMBOLS
CHAPTER 1 INTRODUCTION
1.1. Background
1.2. Problem
1.3. Objective
1.4. Scope and Limitation
CHAPTER 2 LITERATURE REVIEW
CHAPTER 3 METHODOLOGY
3.1. Research process
3.2. K-means algorithm
3.3. Solution development
3.4. Mathematical model
3.4.1. Notations
3.4.2. Objective function
3.4.3. Constraints
3.5. Data collection
3.5.1. Parcels Data
3.5.2. Costs
3.5.3. Volume
3.6. IBM ILOG CPLEX
CHAPTER 4 RESULTS
4.1. Center of Gravity Results
4.2. K-Means Results
4.2.1. Results for 4 Clusters
4.2.2. Results for 6 Clusters
4.3. CPLEX Results
CHAPTER 5 CONCLUSION
REFERENCES
APPENDIX A Source Code for K-Means Algorithm Using Excel VBA
APPENDIX B Source Code for Model Using CPLEX
LIST OF FIGURES
Figure 1.1 Typical stages of shipping in a hub-and-spoke network.
Figure 1.2 Single allocation (left) and extension to a pure hub-and-spoke layout: hub-to-hub trunking (right) in a hub-and-spoke network.
Figure 3.1 Location (address) of customers based on latitudes/longitudes.
Figure 3.2 CoG of each district in HCMC.
Figure 3.3 Geocoded recipients’ addresses into coordinates.
Figure 4.1 Results of potential locations – 4 clusters.
Figure 4.2 Results of potential locations – 6 clusters.
LIST OF TABLES
Table 1.1 Coverage areas of each depot in Ho Chi Minh City.
Table 3.1 An extract from data collected from more than 18,000 transactions/month.
Table 3.2 DHL Hub & Spoke Operations Model – High-level Operations Cost Factors.
Table 3.3 Parcels volume acquired in 12 months.
Table 4.1 Results of calculating starting point for each cluster using Center of Gravity.
Table 4.2 Results of performing K-means algorithm – 4 Clusters.
Table 4.3 Coordinates of Centroids generated by K-means algorithm – 4 Clusters.
Table 4.4 Results of performing K-means algorithm – 6 Clusters.
Table 4.5 Coordinates of Centroids generated by K-means algorithm – 6 Clusters.
Table 4.6 Comparison between 3 proposals.
LIST OF ABBREVIATIONS AND SYMBOLS
HCMC Ho Chi Minh City
DC Distribution Center
RCPMP Regionally Constrained P-median Problem
CPMP Capacitated P-median Problem
CCCP Capacitated Centered Clustering Problem
LP Linear Programming
CoG Center of Gravity
Lat. Latitude
Long. Longitude
Fixed outbound last-mile transportation cost from depot 𝑖 to customer demand point 𝑗 in period 𝑡, unit: VND/parcel
Fixed cost, defined as the default size of each depot multiplied by the renting cost/m2 in location 𝑖, unit: VND/m2
Variable outbound last-mile transportation cost, unit: VND/km
Distance from depot 𝑖 to customer demand point 𝑗, unit: km
Delivery capacity of depot 𝑖, unit: parcels
Demand of customer 𝑗 in period 𝑡, unit: parcels
Quantity of parcel x flow through depot 𝑖 to customer 𝑗 in period 𝑡, unit: parcels
1 if depot 𝑖 is recommended by the model to open in period 𝑡, 0 otherwise
CHAPTER 1 INTRODUCTION
1.1. Background
To meet the demand of a fast-growing industry, the number and location of a
company's facilities are important factors in its long-term strategy. Placing
distribution centers in optimal locations leads to efficient utilization of the resources at
these locations. By precisely forecasting demand and optimizing its distribution
network design, a company can make the best use of its resources. By
evaluating these choices with a simple framework, companies can position
themselves to make strategic decisions based on sound reasoning, allowing them to
defend their capital expenses to stakeholders.
One significant drawback when making network design decisions is
that traditional clustering algorithms do not reflect real-world conditions. When
designing a network within a relatively small radius, straight-line and curved distances
do not describe the true travel distance and therefore misrepresent the efficiency of
the network. When physical boundaries and urban features such as traffic are
taken into account, a network that appears optimal may turn out to be inefficient. For
these reasons, it is critical to consider the driving distance between facilities such as
distribution centers (hubs) and demand clusters.
An important consideration in selecting the location of these facilities is the coverage
of the demand areas. Covering models have proven very useful in solving
facility location problems. A demand point is treated as covered only if a facility is
available to provide the required service to it within a required
distance or time.
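The coverage rule above can be illustrated with a minimal sketch. The depot names, demand points, distances and the 5 km radius below are hypothetical values chosen only for illustration; in practice the distances would be driving distances.

```python
from typing import Dict


def covered_points(distance_km: Dict[str, Dict[str, float]], radius_km: float) -> Dict[str, bool]:
    """For each demand point, report whether any facility lies within radius_km."""
    demand_points = {j for row in distance_km.values() for j in row}
    return {
        j: any(row.get(j, float("inf")) <= radius_km for row in distance_km.values())
        for j in sorted(demand_points)
    }


# Hypothetical distances (km) from each depot to each demand point.
distance_km = {
    "depot_A": {"p1": 2.0, "p2": 7.5, "p3": 12.0},
    "depot_B": {"p1": 9.0, "p2": 4.0, "p3": 11.0},
}
coverage = covered_points(distance_km, radius_km=5.0)
print(coverage)
```

A demand point that is not covered by any depot (here `p3`) signals that either the radius or the set of facilities has to change.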
A hub-and-spoke logistics network consists of hubs performing transshipment
operations (i.e., reassembling and redirecting compound shipments of smaller
consignment units), and spokes or depots linking end customers with the hubs.
Typically, shipping processes in a hub-and-spoke network follow this sequence:
- Stage 1: Depots pick up shipments from their customers. This is usually carried out in
the form of pickup tours, serving several customers within the same round-trip. While
the number and routing of pick-up tours are left to the depot (and may be subject to
operational decisions as well as route optimization), by the end of this stage, all
shipments of a given time period must be present at the depot where they are readied
for transport to the hub.
- Stage 2: This stage is, in fact, a complex procedure containing the following sub-
stages:
+ Shipping the parcels from the depots of origin to the hub;
+ Re-arrangement of goods from several depots to assemble new shipments
that contain items with the same depot of destination;
+ Shipping the parcels from the hub to the depots of destination. Note that
usually, the same vehicles perform both depot→hub and hub→depot transport
in subsequent steps. This implies that a balance of inbound and outbound traffic
is needed to minimize deadheading traffic. Delivery time constraints, however,
impose limits on simple load balancing by withholding a certain number of
consignments. Also, imbalance of longer duration or bursts of critical volume
are certain to surpass the dimensions where temporary withholding would be a
feasible option. Another operational issue resulting from this arrangement is the
need for sufficient shipping capacity to perform this stage: a sufficient number
of vehicles must be present at the hub by a certain time limit to carry out the
hub→depot step. If needed, additional vehicles must be called in (typically
from the destination depot) to handle the volume scheduled for the given
period.
- Stage 3: Once the shipped parcels are at the disposal of the destination depot,
delivery to the destinations is performed. Again, it may be left to the given depot how
this is carried out—a delivery tour may also be combined with a pickup tour to
improve vehicle utilization or reduce time lags.
[Figure: Collection → Depot 1 → Hub 1 → (Hub 2) → Depot 2 → Delivery; stages: first-mile, line-haul, last-mile]
Figure 1.1 Typical stages of shipping in a hub-and-spoke network.
The typical stages of shipping in a hub-and-spoke network are: collection by Depot 1,
shipping from Depot 1 to Hub 1, sorting at Hub 1, shipping from Hub 1 to Hub 2 if
needed, shipping from Hub 2 to Depot 2, and delivery to destinations in the delivery
area of Depot 2.
Hub-and-spoke networks can be classified by the arrangement and number of hubs and
spokes, as well as their connectivity. In a “pure” hub-and-spoke case, the sending and
receiving depots may each be assigned to a single hub (referred to as the single allocation
case), or to several hubs (multiple allocation). In practice, the
distribution network of DHL is an extension of a pure hub-and-spoke network: one hub
is maintained in each of the northern and southern regions of Vietnam (Hanoi
and Ho Chi Minh City), and the depots in each region are assigned to the respective hub. For
example, Depot Ba Dinh in Hanoi is connected only to the Hanoi Hub and cannot
ship parcels directly to the Sai Gon Hub.
Figure 1.2 Single allocation (left) and extension to a pure hub-and-spoke layout:
hub-to-hub trunking (right) in a hub-and-spoke network.
1.2. Problem
The distribution network in HCMC currently consists of the central hub and only 2
depots, namely Binh Tan and Binh Thanh. The 2 depots have to cover the demand of
19 districts in HCMC, so DHL takes more time
to collect and deliver parcels, which extends the total lead time
and increases transportation cost. DHL needs to know whether a more efficient and effective
hub-and-spoke network is feasible in Ho Chi Minh City. Efficiency is defined as the
ability of a network to meet requirements in a timely manner, that is, how long it takes
the network to meet demand. Effectiveness concerns the ability of the network to
deliver requirements to the necessary locations. The current network may not be the
most efficient for meeting recent and planned future demand. Cost and time values
are used to compare the efficiency and effectiveness of the current and alternative
networks. This research provides DHL with an analysis of the current and potential
depot locations and of how efficiently and effectively these locations meet the demand
placed on the system, in an effort to find an optimal network.
Table 1.1 Coverage areas of each depot in Ho Chi Minh City.
Depot               Districts
Depot Bình Thạnh    Quận 1, Quận 2, Quận 3, Quận 4, Quận 9, Phú Nhuận, Bình Thạnh, Thủ Đức
Depot Bình Tân      Quận 5, Quận 6, Quận 7, Quận 8, Quận 10, Quận 11, Quận 12, Tân Bình, Tân Phú, Gò Vấp, Bình Tân
1.3. Objective
Following from the problem, the aim of this research is:
- To determine the optimal number of depots and the location of each depot. The
optimal depot locations are those that minimize the total delivery time of parcels to their
destinations, the costs related to driving distance, and the facility costs.
1.4. Scope and Limitation

The study focuses only on the HCMC distribution network.
This thesis mainly concentrates on numerical investigations and simulations.
An experimental study is required in the next stage of this work.
CHAPTER 2 LITERATURE REVIEW
There is an extensive literature on the facility location problem. In the simplest facility
location problem, a single facility is to be located, with the only
optimization criterion being the minimization of the weighted sum of distances from a
given set of point locations. More complicated problems include the placement of
multiple facilities, constraints on the locations of facilities, and more complex
optimization criteria. In a basic formulation, the facility location problem consists of a
set of potential facility sites P where a facility can be opened, and a set of demand
points D that must be serviced. The goal is to pick a subset F of facilities to open so as to
minimize the sum of distances from each demand point to its closest facility, plus the
sum of opening costs of the facilities. A number of approximation algorithms have
been developed for the facility location problem and many of its variants. In the past,
many facility location decisions involving distance objective functions on a spherical
surface have been approached using algorithmic methods, meta-heuristics,
branch-and-bound algorithms, heuristic techniques, approximation algorithms and simulation.
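The basic formulation above can be made concrete with a small brute-force sketch. It enumerates every non-empty subset F of the candidate sites P, which is only practical for a handful of sites. The distance matrix and opening costs are hypothetical, and the function names are introduced here for illustration only.

```python
from itertools import combinations


def total_cost(open_sites, dist, opening_cost):
    """Opening costs of the chosen sites plus each demand point's distance to its closest open site."""
    service = sum(min(dist[s][d] for s in open_sites) for d in range(len(dist[0])))
    return sum(opening_cost[s] for s in open_sites) + service


def best_subset(dist, opening_cost):
    """Try every non-empty subset F of the candidate sites and keep the cheapest one."""
    sites = range(len(dist))
    candidates = (c for r in range(1, len(dist) + 1) for c in combinations(sites, r))
    return min(candidates, key=lambda f: total_cost(f, dist, opening_cost))


# Hypothetical data: dist[s][d] is the distance from candidate site s to demand point d.
dist = [
    [1.0, 2.0, 9.0, 8.0],
    [8.0, 9.0, 1.0, 2.0],
    [4.0, 4.0, 4.0, 4.0],
]
opening_cost = [5.0, 5.0, 3.0]
chosen = best_subset(dist, opening_cost)
print(chosen, total_cost(chosen, dist, opening_cost))
```

Because the number of subsets doubles with every additional candidate site, enumeration of this kind quickly becomes infeasible, which is one motivation for the heuristic and approximation methods mentioned above.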
The classical location-allocation problem is the basis of many of the location models
developed in the supply chain design literature. The location-allocation
problem is defined in the literature as follows: a group of customer
locations with historical demand and a group of potential DC locations are given.
When a DC is located at one of the potential sites, a known fixed cost is incurred.
There is also a known unit delivery cost between each potential DC and each customer
location. The task is to determine the locations of the DCs and the shipment pattern
between the DCs and the customers that achieve the desired objective.
Location-allocation problems with capacity constraints have many variants across
different application contexts, including the regionally constrained p-median problem
(RCPMP) (Murray and Gerrard 1997), the capacitated p-median problem (CPMP), the
capacitated centered clustering problem (CCCP) (Negreiros and Palhano 2006), the
capacitated single allocation hub location problem (Ernst and Krishnamoorthy 1999),
the single source capacitated plant location problem (Díaz and Fernández 2002),
among many others. They differ from each other in their
constraints on demand assignments and/or facility locations. For example, CPMP
requires that facility locations be a subset of the demand point locations (Díaz and
Fernández 2005). RCPMP requires some facilities to be placed in given regions
(Murray and Gerrard 1997).
Capacitated location-allocation problems can be structured as a Linear Programming
(LP) problem with a linear combination of solution variables (Vinod 1969, ReVelle
and Swain 1970). However, capacitated location-allocation problems are NP-complete
(Garey and Johnson 1979), so the running time of an exact approach can grow
exponentially with problem size, which makes it impractical for large location-allocation problems.
Therefore, substantial research work has been carried out to develop heuristics to
obtain good approximations of the optimal solution (França et al. 1999, Wu et al.
2006). Heuristics can be integrated with LP models or applied as standalone methods,
such as branch-and-bound (Marín and Pelegrín 1997), simulated annealing (SA) (Ernst
and Krishnamoorthy 1999), adaptive tabu search (França et al. 1999), set partitioning
(Baldacci et al. 2002), and scatter search (Díaz and Fernández 2005, Scheuerer and
Wendolsky 2006).
The research reported here uses a clustering strategy (which is a special heuristic
approach) to formulate solutions for location-allocation problems with capacity
constraints. Clustering analysis is one of the most commonly used approaches in data
analysis and has been applied in many application domains such as pattern discovery,
document retrieval, image segmentation, among many others. Clustering methods do
not depend on prior knowledge and can discover natural groupings of data items (Jain
and Dubes 1988, Jain et al. 1999, Han et al. 2001; Guo et al. 2003). Therefore, when
used in a location-allocation context, clustering methods have the capability to adapt to
the distribution of demands and thus facilitate the search for an approximate optimal
solution. Mulvey and Beck (1984) presented a clustering-based heuristic
(CAPCLUST), whose performance is very close to that of an LP-based approach. The
implicit connection between this clustering concept and location-allocation problems is
also suggested in an earlier work (Vinod 1969).
Both the CAPCLUST method and our proposed method are adapted from the K-means
clustering algorithm. K-means is a distance-based partitioning clustering approach that
partitions a set of data items into clusters while ensuring a low internal dissimilarity or
distance. It assumes that the number of clusters (k) is known. The K-means algorithm
consists of three steps (Jain et al. 1999): (1) randomly choosing k cluster centers within
the data space; (2) assigning each data item to the closest cluster center; and (3)
recalculating the cluster centers using the points assigned to each cluster. Steps 2 and 3
are then repeated until the result converges. To adapt it to a location-allocation
problem, Step 2 can be used to allocate demands and Step 3 can be used to optimize
facility locations.
CHAPTER 3 METHODOLOGY
3.1. Research process

The research process follows the steps below, each of which is described in detail afterwards:
1. Define problem
2. Mathematical model
3. Data collection
4. Validation (YES/NO check; the process continues only on YES)
5. Model computational process
6. Recommendation and conclusion
7. Implementation
Define problem: Study the company’s actual situation and operating process and define
the problem. Focus on the limitations that affect and interrupt the
process, and identify the goals and constraints of the problem.
Mathematical model: Develop and refine the mathematical model in detail
(parameters, constraints, scale, etc.) so that it fits the company’s situation
and the objective of the study (minimizing the total cost, i.e. transportation cost, facility cost
and setup cost).
Data collection: Based on the objective of the study, collect data on volume, the
coordinates of destination locations, distance, transportation cost, facility cost, delivery
capacity of each facility, demand, etc.
Validation: After applying the data to the mathematical model, validation is
carried out to ensure that the mathematical model is accurate and that the supporting data
are consistent.
Model computational process: Based on the demand and all of the collected data, run
experiments in software to solve the mathematical model.
Conclusion and recommendation: Report the results, conclude how effective the
method is in solving the problem and provide recommendations to improve the system.
Implementation: This step depends on the decision maker, who chooses the set of
target values that satisfies the company’s expectations and policy.
3.2. K-means algorithm
Data clustering, or cluster analysis, is the process of grouping data items so that similar
items belong to the same group (cluster). Clustering methods are used to identify groups
of similar objects in multivariate data sets collected from fields such as marketing,
biomedicine and geospatial analysis. There are different types of clustering methods, including:
- Partitioning methods
- Hierarchical clustering
- Fuzzy clustering
- Density-based clustering
- Model-based clustering
One of the simplest and most popular clustering algorithms is ‘k-means
clustering’, which splits the data into a set of clusters (groups) based on the
distances between each data point and the center of each cluster.
K-means has a wide range of applications, such as computational biology, business and
marketing, search engines and social science. K-means clustering can also be applied to
the DHL location planning problem, because the company's depot location problem is
similar to the data clustering problem that K-means solves.
The first step when using k-means clustering is to specify the number of clusters (k)
that will be generated in the final solution.
The algorithm starts by randomly selecting k objects from the data set to serve as the
initial centers for the clusters. The selected objects are also known as cluster means or
centroids.
Next, each of the remaining objects is assigned to its closest centroid, where closest is
defined using the Euclidean distance between the object and the cluster mean. This
step is called the “cluster assignment” step.
After the assignment step, the algorithm computes the new mean value of each cluster.
This step is called the “centroid update” step. Now that the centers
have been recalculated, every observation is checked again to see if it might be closer
to a different cluster. All the objects are reassigned using the updated cluster
means.
The cluster assignment and centroid update steps are iteratively repeated until the
cluster assignments stop changing (i.e. until convergence is achieved). That is, the
clusters formed in the current iteration are the same as those obtained in the previous
iteration.
The K-means algorithm can be summarized as follows:
1. Specify the number of clusters (k) to be created.
2. Randomly select k objects from the dataset as the initial cluster centers or
means.
3. Assign each observation to its closest centroid, based on the Euclidean
distance between the object and the centroid.
4. For each of the k clusters, update the cluster centroid by calculating the new
mean values of all the data points in the cluster. The centroid of the kth cluster is
a vector of length p containing the means of all variables for the observations in
the kth cluster; p is the number of variables.
5. Iteratively minimize the total within-cluster sum of squares. That is, iterate steps 3
and 4 until the cluster assignments stop changing or the maximum number of
iterations is reached.
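The five steps above can be sketched in plain Python as follows. This is a minimal illustration on coordinate tuples with Euclidean distance, not the Excel VBA implementation listed in Appendix A; the function name, the `seed` parameter and the sample points are assumptions made for the example.

```python
import random
from math import dist


def kmeans(points, k, max_iter=100, seed=0):
    """Plain K-means on a list of coordinate tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # step 2: k random objects as initial centers
    labels = None
    for _ in range(max_iter):
        # Step 3: assign every observation to its closest centroid (Euclidean distance).
        new_labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        if new_labels == labels:  # step 5: assignments stopped changing
            break
        labels = new_labels
        # Step 4: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centroids, labels


# Hypothetical data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

On this toy data the two groups are recovered regardless of which two points are drawn as initial centers; on real data the result generally depends on the initialization.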
K-means is usually run many times, starting with different random centroids each time.
The results can be compared by examining the clusters or by a numeric measure such
as the clusters’ distortion, which is the sum of the squared differences between each
data point and its corresponding centroid. In cluster distortion case, the clustering with
lowest distortion value can be chosen as the best clustering.
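Comparing runs by distortion can be sketched as follows. The two candidate clusterings are hypothetical outcomes of K-means on the same four points, each a stable result that a different random start could produce; the function name and data are illustrative only.

```python
def distortion(points, centroids, labels):
    """Sum of squared Euclidean distances between each point and its assigned centroid."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(p, centroids[lab]))
        for p, lab in zip(points, labels)
    )


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]

# Two hypothetical K-means outcomes for the same data.
runs = [
    {"centroids": [(0.0, 1.0), (10.0, 1.0)], "labels": [0, 0, 1, 1]},  # left/right split
    {"centroids": [(5.0, 0.0), (5.0, 2.0)], "labels": [0, 1, 0, 1]},   # bottom/top split
]
scores = [distortion(points, r["centroids"], r["labels"]) for r in runs]
best = runs[scores.index(min(scores))]
print(scores, best["labels"])
```

The left/right split has the lower distortion and would be kept as the better clustering.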
The K-means clustering method searches for the cluster positions that
minimize the distance from the data points to their cluster centers, while the goal of the
company’s depot planning is to find depot locations that minimize the distance from the
customers to their facility.
Formally, given a set of data points (x1, x2,…, xn), where each point is a d-dimensional
real vector, k-means clustering aims to partition the n points into k clusters
(k ≤ n) S= {S1, S2, …, Sk} so that the within-cluster sum of squares is minimized.
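Written out, the standard K-means objective is the following, where μ_i denotes the mean of the points in cluster S_i (the symbol μ_i is introduced here for the formula and does not appear in the text above):

```latex
\[
\underset{S}{\arg\min} \; \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert^{2},
\qquad
\mu_i = \frac{1}{|S_i|} \sum_{x \in S_i} x
\]
```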
Advantages and disadvantages of using K-means clustering
Advantages
- With a large number of variables, K-means may be computed faster than hierarchical
clustering (if K is small).
- K-means may generate tighter clusters than hierarchical clustering, particularly
if the clusters are globular.
- An instance can change cluster (move to another cluster) when the centroids are
recomputed.
Disadvantages
- It is difficult to compare the quality of the clusters generated (e.g. different initial
partitions or values of K affect the outcome).
- The number of clusters is fixed in advance, and it can be difficult to predict what K should be.
- It does not work well with non-globular clusters (non-circular cluster shapes).
- Different initial partitions can result in different final clusters. It is useful to rerun the
program using the same as well as different K values, to compare the final results.
3.3. Solution development

This thesis develops a method to determine the facility locations that minimize the sum
of the facility costs and the transportation cost, where the transportation cost to a
destination is the volume of goods shipped there multiplied by the transportation rate and
by the Google Maps driving distance. The method rests on the following assumptions:
- The goods for each destination point can be transported in a single trip.
- Each destination point is served by exactly one warehouse.
- The cost depends only on the distance from the warehouse to the destination point;
transport conditions are not considered. The transportation cost equals the distance
traveled times a fixed price per unit
distance.
- The warehouses are located at populated places.
- All service facilities are identical.
- Each destination point wishes to minimize the cost of acquiring the product.
- The company treats each cluster independently.
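Under these assumptions, the cost being minimized can be sketched as a small function. The depot names, volumes, distances, facility costs and rate below are hypothetical, and the function name is introduced only for this illustration.

```python
def network_cost(facility_costs, assignments, volume, distance_km, rate_per_km):
    """Facility cost of the depots in use plus volume * rate * distance per destination."""
    transport = sum(
        volume[dest] * rate_per_km * distance_km[depot][dest]
        for dest, depot in assignments.items()  # each destination has exactly one depot
    )
    open_depots = set(assignments.values())
    return sum(facility_costs[d] for d in open_depots) + transport


# Hypothetical inputs.
facility_costs = {"A": 100.0, "B": 120.0}
assignments = {"p1": "A", "p2": "A", "p3": "B"}   # destination -> serving depot
volume = {"p1": 10, "p2": 5, "p3": 20}            # parcels per destination
distance_km = {"A": {"p1": 2.0, "p2": 4.0}, "B": {"p3": 1.5}}
total = network_cost(facility_costs, assignments, volume, distance_km, rate_per_km=3.0)
print(total)
```

The `assignments` mapping encodes the single-warehouse assumption directly: a destination cannot appear under two depots.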
The solution procedure is summarized by the following flow:
1. Generate the driving distance matrix from the set of destination locations' latitudes/longitudes.
2. Perform K-means clustering based on the destination locations' driving distance matrix to generate K clusters.
3. Calculate the starting point of the facility location using center of gravity for each cluster.
4. Use a heuristic method to search for the optimal facility locations, which minimize the sum of driving distance costs from the current location point to each destination location in each cluster.
5. Check: is the facility location optimal? If not, the search is repeated.
- Step 1: Generate driving distance matrix from the set of destination locations’
latitudes/longitudes by Google Maps.
Figure 3.1 Location (address) of customers based on latitudes/longitudes.
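Step 1 relies on Google Maps driving distances. The sketch below shows only the matrix-building step with a pluggable distance function. The haversine great-circle distance is a stand-in chosen so that the example runs offline; it is not the driving distance used in this thesis. The function names and the sample coordinates are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs given in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def distance_matrix(locations, distance_fn=haversine_km):
    """Square matrix of pairwise distances; distance_fn could wrap a driving-distance lookup."""
    return [[distance_fn(a, b) for b in locations] for a in locations]


# Hypothetical destination coordinates (latitude, longitude) in the HCMC area.
locations = [(10.7769, 106.7009), (10.8231, 106.6297), (10.7626, 106.6602)]
matrix = distance_matrix(locations)
for row in matrix:
    print([round(d, 2) for d in row])
```

Driving distances are generally longer than great-circle distances and need not be symmetric, so a real driving-distance function passed as `distance_fn` may produce an asymmetric matrix.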
- Step 2: Perform K-means clustering based on the destination locations driving
distance matrix to generate K clusters.
K-means flow: Start → Input number of clusters → Calculate centroid → Calculate distance → Group based on minimum distance
- Step 3: Calculate the starting point of the facility location for each cluster using the
Center of Gravity method and set it as the current facility location. The Center of Gravity
method assumes that cost is directly proportional to distance and volume shipped, that inbound
and outbound transportation costs are equal, and that there are no special shipping
costs for less than full loads. Latitude and longitude coordinates can be used
to calculate the initial facility location for each cluster. The following formula
converts latitude/longitude (spherical coordinates) to
Cartesian coordinates for each destination location.
Center of Gravity
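As an illustration of the conversion described in Step 3, the sketch below uses the standard unit-sphere conversion (x = cos(lat)·cos(lon), y = cos(lat)·sin(lon), z = sin(lat)), averages the Cartesian points, and converts the mean back to latitude/longitude. That this is the exact form used in the thesis is an assumption; the function name `center_of_gravity` and the optional `weights` argument (for shipped volume) are hypothetical.

```python
import math


def center_of_gravity(points, weights=None):
    """Weighted geographic center of (lat_deg, lon_deg) points.

    Assumption: standard unit-sphere conversion; weights default to 1.
    """
    if weights is None:
        weights = [1.0] * len(points)
    total = float(sum(weights))

    x = y = z = 0.0
    for (lat, lon), w in zip(points, weights):
        phi, lam = math.radians(lat), math.radians(lon)
        # Latitude/longitude to Cartesian coordinates on the unit sphere.
        x += w * math.cos(phi) * math.cos(lam)
        y += w * math.cos(phi) * math.sin(lam)
        z += w * math.sin(phi)
    x, y, z = x / total, y / total, z / total

    # Cartesian mean back to latitude/longitude in degrees.
    lon = math.atan2(y, x)
    lat = math.atan2(z, math.hypot(x, y))
    return math.degrees(lat), math.degrees(lon)


if __name__ == "__main__":
    print(center_of_gravity([(10.776720, 106.697331), (10.767929, 106.684908)]))
```

For points as close together as the districts of one city, the result is nearly identical to a plain average of the latitudes and longitudes; the conversion matters more over large areas.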
Figure 3.2 CoG of each district in HCMC.
- Step 4: Calculate the driving distances from the current point calculated in Step 3 to each destination location of each cluster using Google Maps, in order to obtain a data set as input for the mathematical model.
- Step 5: Search for the optimal facility locations. All distances are Google Maps driving distances. Let the starting point be the current point calculated in Step 3, and use the maximal Google Maps driving distance calculated in Step 4 as the search radius around the current point.
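The search in Step 5 is not specified in detail, so the following is only one possible sketch under stated assumptions: a simple neighborhood search that starts from the Step 3 point, tries moves north, south, east and west, keeps any move that lowers the total distance, and halves the step when no move helps. Great-circle (haversine) distance stands in for the Google Maps driving distance, and candidates outside the given radius of the starting point are rejected. All function names and parameters are hypothetical.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat_deg, lon_deg) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def total_distance(site, destinations, dist=haversine_km):
    """Sum of distances from a candidate site to every destination."""
    return sum(dist(site, d) for d in destinations)


def local_search(start, destinations, radius_km,
                 step_deg=0.01, min_step_deg=1e-5, dist=haversine_km):
    """Neighborhood search for a site with a lower total distance.

    Assumption: haversine distance replaces driving distance; the search
    never leaves the circle of radius_km around the starting point.
    """
    current = start
    best = total_distance(start, destinations, dist)
    step = step_deg
    while step >= min_step_deg:
        improved = False
        for dlat, dlon in ((step, 0), (-step, 0), (0, step), (0, -step)):
            cand = (current[0] + dlat, current[1] + dlon)
            if dist(start, cand) > radius_km:
                continue  # outside the search radius from Step 5
            cost = total_distance(cand, destinations, dist)
            if cost < best:
                current, best, improved = cand, cost, True
        if not improved:
            step /= 2  # refine the neighborhood
    return current, best


if __name__ == "__main__":
    dests = [(10.77, 106.69), (10.79, 106.70), (10.76, 106.68)]
    start = (10.80, 106.75)
    radius = max(haversine_km(start, d) for d in dests)
    print(local_search(start, dests, radius))
```

Passing a different `dist` function (for example a lookup into precomputed driving distances) would bring the sketch closer to the procedure in the text.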
3.4. Mathematical model

This part presents the mathematical model with its constraints in detail, together with explanations.
First, the sets, indices, input parameters and decision variables used throughout this
research are defined. Then, the objective function and the constraints of the model are
specified (minimize total cost, which is a function of facility set-up cost and
transportation cost).
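3.4.1. Notations
3.4.1.1. Sets
I set of depot facilities, 𝑖 ∈ 𝐼
T set of forecasting periods, 𝑡 ∈ 𝑇
J set of demand points, 𝑗 ∈ 𝐽
3.4.1.2. Input parameters
$f_{ijt}$ (fixed_outbound_cost in Appendix B): Fixed outbound last-mile transportation cost from depot 𝑖 to customer demand point 𝑗 in period 𝑡, unit: VND/parcel
$F_i$ (fixed_cost in Appendix B): Fixed cost of depot 𝑖, defined as the default size of each depot multiplied by the renting cost per m2 (VND/m2) in location 𝑖
$v$ (variables_outbound_cost in Appendix B): Variable outbound last-mile transportation cost, unit: VND/km
$d_{ij}$ (distance in Appendix B): Distance from depot 𝑖 to customer demand point 𝑗, unit: km
$C_i$ (capacity_delivery in Appendix B): Delivery capacity of depot 𝑖, unit: parcels
$D_{jt}$ (demand in Appendix B): Demand of customer 𝑗 in period 𝑡, unit: parcels
$M$ (Big_M in Appendix B): A sufficiently large constant used in the linking constraint
3.4.1.3. Decision variables
$x_{ijt}$: Quantity of parcels flowing through depot 𝑖 to customer 𝑗 in period 𝑡, unit: parcels
$y_{it}$: 1 if depot 𝑖 is recommended by the model to open in period 𝑡, 0 otherwise
3.4.2. Objective function

The objective function minimizes the total cost, which is the sum of the fixed facility
set-up cost and the fixed and variable transportation costs.
$$\min Z = \sum_{i \in I} \sum_{t \in T} F_i \, y_{it} + \sum_{i \in I} \sum_{j \in J} \sum_{t \in T} \left( f_{ijt} + v \, d_{ij} \right) x_{ijt} \qquad (1)$$
3.4.3. Constraints
3.4.3.1. Demand constraint

$$\sum_{i \in I} x_{ijt} = D_{jt} \quad \forall j \in J,\ t \in T \qquad (2)$$
This ensures that the total quantity of parcels flowing through the depots to customer 𝑗 is equal to the
demand of customer 𝑗 in period 𝑡.
3.4.3.2. Linking constraint

$$\sum_{j \in J} x_{ijt} \le M \, y_{it} \quad \forall i \in I,\ t \in T \qquad (3)$$
Here x are the associated integer flow variables, y is a binary variable
and M is a sufficiently large coefficient. Parcels can flow through depot 𝑖 in period 𝑡 only if the depot is open (y = 1). M must be large enough to let the model
choose appropriate values for the x variables when y is set to 1.
3.4.3.3. Delivery capacity constraint for each depot i

$$\sum_{j \in J} x_{ijt} \le C_i \quad \forall i \in I,\ t \in T \qquad (4)$$
This ensures that the total quantity of parcels flowing through depot 𝑖 in period 𝑡 does not exceed
the delivery capacity of depot 𝑖.
3.4.3.4. Binary constraint
$$y_{it} \in \{0, 1\} \quad \forall i \in I,\ t \in T \qquad (5)$$
3.4.3.5. Non-negativity constraint

$$x_{ijt} \ge 0 \quad \forall i \in I,\ j \in J,\ t \in T \qquad (6)$$
3.4.3.6. Signaling constraint

$$y_{it} \ge y_{i,t-1} \quad \forall i \in I,\ t \in T,\ t > 1 \qquad (7)$$
This ensures that once a depot is opened, it will not be closed.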
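To make the model concrete, the sketch below evaluates the objective and checks the constraints for a given candidate solution in plain Python. It mirrors the structure of the OPL model in Appendix B (fixed depot cost times the open indicator y, plus fixed and distance-based outbound cost times the flow x), but it is not a solver, and the tiny instance at the bottom uses made-up numbers purely for illustration. The function names are hypothetical.

```python
def total_cost(x, y, fixed_cost, fixed_outbound, var_cost, distance):
    """Objective value: x[i][j][t] are flows, y[i][t] are open indicators."""
    depots = range(len(y))
    periods = range(len(y[0]))
    customers = range(len(distance[0]))
    facility = sum(fixed_cost[i] * y[i][t] for i in depots for t in periods)
    transport = sum(
        (fixed_outbound[i][j][t] + var_cost * distance[i][j]) * x[i][j][t]
        for i in depots for j in customers for t in periods
    )
    return facility + transport


def is_feasible(x, y, demand, capacity, big_m):
    """Check the demand, linking, capacity, binary, sign and signaling constraints."""
    depots = range(len(y))
    periods = range(len(y[0]))
    customers = range(len(demand))
    for j in customers:
        for t in periods:
            # Demand: flows into customer j must match its demand.
            if sum(x[i][j][t] for i in depots) != demand[j][t]:
                return False
    for i in depots:
        for t in periods:
            if y[i][t] not in (0, 1):
                return False
            flow = sum(x[i][j][t] for j in customers)
            # Linking (big-M) and delivery capacity.
            if flow > big_m * y[i][t] or flow > capacity[i]:
                return False
            # Signaling: an opened depot stays open.
            if t > 0 and y[i][t] < y[i][t - 1]:
                return False
            if any(x[i][j][t] < 0 for j in customers):
                return False
    return True


if __name__ == "__main__":
    # Hypothetical instance: 2 depots, 2 customers, 1 period.
    x = [[[5], [0]], [[0], [4]]]
    y = [[1], [1]]
    cost = total_cost(x, y, [100, 80], [[[1], [1]], [[1], [1]]], 2,
                      [[1, 3], [2, 1]])
    print(cost, is_feasible(x, y, [[5], [4]], [10, 10], 10000))
```

A checker of this kind is useful for validating a solution exported from CPLEX against the constraints before interpreting its cost.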
3.5. Data collection
3.5.1. Parcels Data
The data required comprised all customer addresses and the products
delivered to them, along with their weights and important timestamps.
Table 3.1 An extract from data collected from more than 18,000 transactions/month.
RECIPIENT | ADDRESS | DISTRICT | PROVINCE | WEIGHT | DIMWEIGHT | CODAMOUNT | ENCODING_DAT | (unlabeled)
Ngô phương trang | 28/31 lương văn can | Quận 8 | Hồ Chí Minh | 9940 | 9418.4036 |  | 02/01/2018 | 1
kim hồng | 19/1/13 cô bắc,phường 1,phú nhuận,tp.hcm | Quận Phú Nhuận | Hồ Chí Minh | 13650 | 13530 | 240000 | 02/01/2018 | 1
Trần anh tần | 151A quốc lộ 13 | Quận Thủ Đức | Hồ Chí Minh | 730 | 919.338 | 73000 | 02/01/2018 | 1
Vy Nguyễn | 52/4A Huỳnh Văn Bánh | Quận Phú Nhuận | Hồ Chí Minh | 5740 | 11677.2096 |  | 02/01/2018 | 1
nguyễn thành trung | 143/12 Lê Thị Riêng | Quận 1 | Hồ Chí Minh | 6460 | 5640.96 | 210000 | 02/01/2018 | 1
Tiểu Long | 123 Liên khu 4-5 | Quận Bình Tân | Hồ Chí Minh | 310 | 1081.47 | 398650 | 02/01/2018 | 1
Huy nguyen | 220 nguyễn trãi quận 1 | Quận 1 | Hồ Chí Minh | 15870 | 15835.512 | 297600 | 02/01/2018 | 1
Hoangthilananh | 38nguyen gian thanh ,f15,q10 | Quận 10 | Hồ Chí Minh | 4300 | 12166.08 | 389000 | 02/01/2018 | 1
Trần Huynh | 9/10/7c đường Đặng Văn Bi | Quận Thủ Đức | Hồ Chí Minh | 740 | 590.9268 | 124900 | 02/01/2018 | 1
Trương Huỳnh như | 65/2/1/12a đường 20 | Quận Thủ Đức | Hồ Chí Minh | 250 | 482.9388 | 74000 | 02/01/2018 | 1
Lê văn ba | 6B phạm hùng binh hưng binh chanh | Huyện Bình Chánh | Hồ Chí Minh | 6130 | 5035.536 | 277000 | 02/01/2018 | 1
Triet | 1369 Phan Văn Trị | Quận Gò Vấp | Hồ Chí Minh | 3330 | 5159.336 | 730000 | 02/01/2018 | 1
Phương Nguyễn | 1.16 chung cư Ruby garden, Số 2A Nguyễn Sỹ Sách | Quận Tân Bình | Hồ Chí Minh | 6940 | 13102.452 | 279000 | 02/01/2018 | 1
Nguyễn Thị Mai Trang | 360A Bến Vân Đồn | Quận 4 | Hồ Chí Minh | 6180 | 4988.9448 | 277000 | 02/01/2018 | 1
Phạm trần phương vy | 740 phạm văn chiêu | Quận Gò Vấp | Hồ Chí Minh | 7820 | 16350.636 | 1359000 | 02/01/2018 | 1
Chienthang | 78/21a3 tan hoa dong | Quận 6 | Hồ Chí Minh | 3610 | 9047.2404 | 209000 | 02/01/2018 | 1
Đỗ Quang Thịnh | Trường đại học an ninh, km18 xa lộ hà nội, phường linh trung, quận thủ đức, thành phố hồ chí minh | Quận Thủ Đức | Hồ Chí Minh | 1760 | 4709.34 | 899000 | 02/01/2018 | 1
nguyễn hữu trọng | 202a Hoàng Văn Thụ | Quận Phú Nhuận | Hồ Chí Minh | 15660 | 15486.2124 |  | 02/01/2018 | 1
Đỗ Lê Duy | 65-67 Gò Cẩm Đệm | Quận Tân Bình | Hồ Chí Minh | 5650 | 7931.52 | 1919000 | 02/01/2018 | 1
Le Thanh Hien | 30/40 Nguyễn Đình Chi | Quận 6 | Hồ Chí Minh | 15930 | 15934.404 | 320000 | 02/01/2018 | 1
Vũ xuân anh tuấn | 649/87 điện biên phủ | Quận Bình Thạnh | Hồ Chí Minh | 9640 | 6135.2652 | 199000 | 02/01/2018 | 1
văng mỹ linh | Tòa nhà H2, 196 Hoàng Diệu, P.8, Q4 | Quận 4 | Hồ Chí Minh | 14680 | 15384.963 | 301320 | 02/01/2018 | 1
Nguyễn Ngọc Lệ Chi | số 17 đường Lê Duẩn (Central Plaza) | Quận 1 | Hồ Chí Minh | 13680 | 13021.9056 | 240000 | 02/01/2018 | 1
Each recipient's location was then geocoded into latitude and longitude
coordinates. This was carried out using a Google Sheets add-in called
ezGeocode.
Figure 3.3 Geocoded recipients’ address into coordinates (latitude, longitude).
3.5.2. Costs
The operations costs of a logistics provider comprise 5 main factors:
- Pick-up cost (first-mile) is the cost for couriers and vehicles such as trucks or
motorbikes to go to each merchant location or pick-up point (drop-off
point) and collect all the parcels back to a depot or a hub.
- Cost at hub/depot is the cost of handling, encoding and sorting
before the parcels move to the next hubs or depots.
At this stage, there are 2 scenarios:
+ If a parcel is transported cross-region (between the North, Central and South regions),
there is a cost for transporting the parcel from the first hub to the second
hub by truck (ground) or airfreight (air).
For example, if a parcel is sent to Ba Dinh District - Hanoi from HCMC, it will be
transported from Sai Gon Hub to Hanoi Hub for further processing.
+ If a parcel is transported intra-region (within the North, Central or South region),
there is a cost for transporting the parcel from the hub/depot to the
respective shuttle depots, which minimizes the delivery distance and therefore
the last-mile cost.
For example, if a parcel is sent to Vung Tau from HCMC, it will be transported from
Sai Gon Hub to Vung Tau Depot for further processing.
- Last-mile cost is the cost for couriers and vehicles such as trucks or motorbikes to go
to each customer (consumer) location or drop-off point to deliver the parcels.
Table 3.2 DHL Hub & Spoke Operations Model – High-level Operations Cost Factors.
Cost factor    Intra-region: Metro / Semi Urban / Remote    Cross-region: Metro / Semi Urban / Remote
Pick-up $ $ $ $ $ $
Hub/Depot $ $ $ $ $ $
Line haul
Air $ $ $
Ground $ $ $
Shuttle $ $$ $$ $ $$ $$$
Last-mile
Re-shuttle $ $$ $ $$
Courier $ $$ $$$ $ $$ $$$
*Note: Within the same cost factor, more $ marks mean a higher cost compared with
the other regions.
3.5.3. Volume
Table 3.3 Parcels volume acquired in 12 months.
07/2017 08/2017 09/2017 10/2017 11/2017 12/2017 01/2018 02/2018 03/2018 04/2018 05/2018 06/2018
Quận 1 1386 1272 1414 1376 1381 1372 1389 1399 1291 1255 1296 1408
Quận 2 775 798 804 805 709 772 773 795 785 735 804 764
Quận 3 775 798 804 805 709 772 773 795 785 735 804 764
Quận 4 501 437 367 412 377 463 455 363 437 398 518 512
Quận 5 542 593 559 596 520 610 586 521 550 513 540 590
Quận 6 541 503 524 510 541 508 497 532 480 538 511 550
Quận 7 1001 1138 1087 978 1156 1148 1104 984 1158 967 1031 1063
Quận 8 715 737 754 737 742 712 702 763 712 746 708 760
Quận 9 648 679 635 708 635 649 669 670 662 647 620 664
Quận 10 782 700 730 733 791 696 724 772 751 785 765 754
Quận 11 650 778 762 763 639 721 622 750 738 780 772 776
Quận 12 1088 746 952 731 931 817 773 1098 932 826 774 851
Phú Nhuận 790 727 715 714 728 755 782 742 790 728 785 717
Tân Bình 1772 1820 1842 1844 1457 1840 1490 1883 1578 1699 1549 1577
Tân Phú 1401 1326 1448 1350 1310 1428 1342 1426 1312 1399 1346 1449
Gò Vấp 1576 1321 1319 1427 1600 1330 1339 1579 1329 1589 1394 1542
Bình Thạnh 1544 1441 1519 1430 1473 1431 1518 1442 1522 1587 1441 1599
Bình Tân 1024 1385 1091 1032 1246 1386 1029 1379 1082 1142 1311 1334
Thủ Đức 1455 1514 1422 1472 1467 1562 1508 1456 1581 1457 1409 1550
*Unit: parcels
3.6. IBM ILOG CPLEX

IBM ILOG CPLEX Optimizer's mathematical programming technology enables
decision optimization for improving efficiency, reducing costs, and increasing
profitability. It helps businesses make accurate and logical decisions.
IBM ILOG CPLEX Optimizer provides high-performance mathematical
programming solvers for linear programming, mixed integer programming, quadratic
programming, and quadratically constrained programming problems. These include a
distributed parallel algorithm for mixed integer programming that leverages multiple
computers to solve difficult problems. It has solved problems with millions of
constraints and variables. CPLEX translates the mathematical
model of a problem into a standard mathematical formulation through its modeling
language. It reads the input data from Excel and returns the results through Excel,
which makes it easy for the user to follow and understand how the problem is solved. It
can solve large, realistic optimization problems quickly enough for
interactive use.
CHAPTER 4 RESULTS
4.1. Center of Gravity Results

Table 4.1 Results of calculating starting point for each cluster using Center of Gravity.
X (Lat.) Y (Long.) X (Lat.) Y (Long.)

Quận 1 10.776720 106.697331 Quận 11 10.765020 106.650189
Quận 2 10.793508 106.747026 Quận 12 10.857451 106.640107
Quận 3 10.767929 106.684908 Phú Nhuận 10.798941 106.679540
Quận 4 10.759512 106.703046 Tân Bình 10.806420 106.650581
Quận 5 10.755364 106.669738 Tân Phú 10.784238 106.648603
Quận 6 10.746871 106.635372 Gò Vấp 10.834546 106.669932
Quận 7 10.737153 106.719134 Bình Thạnh 10.805133 106.706301
Quận 8 10.737499 106.659784 Bình Tân 10.766005 106.605664
Quận 9 10.832679 106.793046 Thủ Đức 10.853974 106.748966
Quận 10 10.771274 106.668515
Using an Excel VBA tool based on the Center of Gravity formulas, 19 CoGs were
calculated, giving 19 coordinates. These coordinates were then used to
calculate the driving distances from the current location point to each destination location
in each cluster.
4.2. K-Means Results

The process was carried out using an Excel VBA tool whose source code was
shared on an open forum (see Appendix A). The clustering uses the destination locations' driving
distance matrix to generate K clusters. The results were then plotted and
the centroids identified. These centroids were chosen as potential locations for
further calculation.
4.2.1. Results for 4 Clusters
Table 4.2 Results of performing K-means algorithm – 4 Clusters.
Input Data Result of K-Means
X (Lat.) Y (Long.) X (Lat.) Y (Long.) Centroid
Quận 1 10.776720 106.697331 Quận 12 10.857451 106.640107 1
Quận 2 10.793508 106.747026 Phú Nhuận 10.798941 106.679540 1
Quận 3 10.767929 106.684908 Tân Bình 10.806420 106.650581 1
Quận 4 10.759512 106.703046 Gò Vấp 10.834546 106.669932 1
Quận 5 10.755364 106.669738 Bình Thạnh 10.805133 106.706301 1
Quận 6 10.746871 106.635372 Quận 2 10.793508 106.747026 2
Quận 7 10.737153 106.719134 Quận 9 10.832679 106.793046 2
Quận 8 10.737499 106.659784 Thủ Đức 10.853974 106.748966 2
Quận 9 10.832679 106.793046 Quận 5 10.755364 106.669738 3
Quận 10 10.771274 106.668515 Quận 6 10.746871 106.635372 3
Quận 11 10.765020 106.650189 Quận 8 10.737499 106.659784 3
Quận 12 10.857451 106.640107 Quận 10 10.771274 106.668515 3
Phú Nhuận 10.798941 106.679540 Quận 11 10.765020 106.650189 3
Tân Bình 10.806420 106.650581 Tân Phú 10.784238 106.648603 3
Tân Phú 10.784238 106.648603 Bình Tân 10.766005 106.605664 3
Gò Vấp 10.834546 106.669932 Quận 1 10.776720 106.697331 4
Bình Thạnh 10.805133 106.706301 Quận 3 10.767929 106.684908 4
Bình Tân 10.766005 106.605664 Quận 4 10.759512 106.703046 4
Thủ Đức 10.853974 106.748966 Quận 7 10.737153 106.719134 4
Figure 4.1 Results of potential locations – 4 Clusters.
Table 4.3 Coordinates of Centroids generated by K-means algorithm – 4 Clusters.
X (Lat.) Y (Long.) Location

Centroid 1 10.820498 106.669292 Tân Bình
Centroid 2 10.826721 106.763013 Quận 9
Centroid 3 10.760896 106.648266 Quận 11
Centroid 4 10.760329 106.701105 Quận 4
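As a quick consistency check, a K-means centroid should equal the arithmetic mean of the coordinates of its cluster members. The sketch below recomputes Centroid 1 from the five districts assigned to it in Table 4.2; the helper name `mean_point` is hypothetical.

```python
def mean_point(points):
    """Arithmetic mean of a list of (lat, lon) pairs."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


# Members of cluster 1 as listed in Table 4.2.
cluster_1 = [
    (10.857451, 106.640107),  # Quận 12
    (10.798941, 106.679540),  # Phú Nhuận
    (10.806420, 106.650581),  # Tân Bình
    (10.834546, 106.669932),  # Gò Vấp
    (10.805133, 106.706301),  # Bình Thạnh
]

centroid_1 = mean_point(cluster_1)
print(f"{centroid_1[0]:.6f}, {centroid_1[1]:.6f}")
```

The recomputed point agrees with Centroid 1 in Table 4.3 (10.820498, 106.669292) to within rounding of the sixth decimal place.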
4.2.2. Results for 6 Clusters
Table 4.4 Results of performing K-means algorithm – 6 Clusters.
Input Data Result of K-Means
X (Lat.) Y (Long.) X (Lat.) Y (Long.) Centroid
Quận 1 10.776720 106.697331 Quận 12 10.85745 106.6401 1
Quận 2 10.793508 106.747026 Tân Bình 10.80642 106.6506 1
Quận 3 10.767929 106.684908 Gò Vấp 10.83455 106.6699 1
Quận 4 10.759512 106.703046 Quận 2 10.79351 106.747 2
Quận 5 10.755364 106.669738 Quận 9 10.83268 106.793 2
Quận 6 10.746871 106.635372 Thủ Đức 10.85397 106.749 2
Quận 7 10.737153 106.719134 Quận 1 10.77672 106.6973 3
Quận 8 10.737499 106.659784 Quận 3 10.76793 106.6849 3
Quận 9 10.832679 106.793046 Quận 10 10.77127 106.6685 3
Quận 10 10.771274 106.668515 Phú Nhuận 10.79894 106.6795 3
Quận 11 10.765020 106.650189 Tân Phú 10.78424 106.6486 3
Quận 12 10.857451 106.640107 Bình Thạnh 10.80513 106.7063 3
Phú Nhuận 10.798941 106.679540 Quận 4 10.75951 106.703 4
Tân Bình 10.806420 106.650581 Quận 7 10.73715 106.7191 4
Tân Phú 10.784238 106.648603 Quận 5 10.75536 106.6697 5
Gò Vấp 10.834546 106.669932 Quận 8 10.7375 106.6598 5
Bình Thạnh 10.805133 106.706301 Quận 11 10.76502 106.6502 5
Bình Tân 10.766005 106.605664 Quận 6 10.74687 106.6354 6
Thủ Đức 10.853974 106.748966 Bình Tân 10.76601 106.6057 6
Figure 4.2 Results of potential locations – 6 Clusters.
Table 4.5 Coordinates of Centroids generated by K-means algorithm – 6 Clusters.
X (Lat.) Y (Long.) Location

Centroid 1 10.832806 106.653540 Gò Vấp
Centroid 2 10.826721 106.763013 Quận 9
Centroid 3 10.784039 106.680866 Quận 3
Centroid 4 10.748333 106.711090 Quận 7
Centroid 5 10.752628 106.659903 Quận 5
Centroid 6 10.756438 106.620518 Bình Tân
4.3. CPLEX Results
Table 4.6 Comparison between 3 proposals.
Number of Depots (Number of Clusters)    Total Cost (VND)    Saved Cost    % Saved Cost
2 193,489,183,245.464 - -
4 130,854,256,129.729 62,634,927,115.74 32.37
6 93,151,799,428.776 100,337,383,816.69 51.86
After running the mathematical model in CPLEX, the results are shown in the
table above. In the first option, the total cost for the distribution network with 2
clusters (k=2) and 2 corresponding depots (which is the current network of DHL) is
193,489,183,245.464 VND per year (about 193.49 billion VND). This total cost comprises the
transportation cost and the fixed cost of setting up the facilities. The next option, with 4 clusters
(k=4) and 4 corresponding depots, has a total cost of 130,854,256,129.729 VND per
year (about 130.85 billion VND), which saves 62,634,927,115.74 VND compared with the first option, about
32.37%. The last option, with 6 clusters (k=6) and 6 corresponding depots, has a total cost
of 93,151,799,428.776 VND per year (about 93.15 billion VND), which saves 100,337,383,816.69
VND compared with the first option, about 51.86%.
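The saved cost and percentage figures in Table 4.6 follow directly from the total costs: the saving is the baseline (2 depots) minus the option's total cost, and the percentage is that saving divided by the baseline. A short sketch of the arithmetic, with a hypothetical helper `savings`:

```python
# Total cost in VND per number of depots, as reported in Table 4.6.
TOTAL_COST_VND = {
    2: 193_489_183_245.464,
    4: 130_854_256_129.729,
    6: 93_151_799_428.776,
}


def savings(baseline, option):
    """Return (saved cost, saved cost as a percentage of the baseline)."""
    saved = baseline - option
    return saved, 100.0 * saved / baseline


for k in (4, 6):
    saved, pct = savings(TOTAL_COST_VND[2], TOTAL_COST_VND[k])
    print(f"k={k}: saved {saved:,.2f} VND ({pct:.2f}%)")
```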
CHAPTER 5 CONCLUSION
Facility location decisions play an important role in the strategic planning and design
of logistics/supply chain network. Well-planned location decisions enable the efficient
flow of materials through the distribution system, and lead to decreased costs and
improved customer service. This paper has focused on the implementation of facility
location decisions based on driving distances.
Given the coordinates of each destination, the requirement at
each destination and the shipping costs for the region of interest, the methodology
proposed in this paper is able to determine the optimal location of each facility and
helps companies assess their facility locations. On top of this, it can be used to locate
prospective optimal facilities. In regard to transportation cost, the driving distance in
the presence of geographic barriers should be taken into consideration in facility
location decisions.
The research presented here transforms a special location-allocation problem into a
clustering problem. The proposed method is essentially a constrained K-means
clustering method that indirectly optimizes the location-allocation quality under the
individual and overall capacity restrictions. Since the allocation strategy adopted from
the K-means algorithm can ensure a near-optimal allocation result (in terms of the
objective function) when facility locations are fixed, this research focused more on
designing methods to obtain a high-quality configuration of facility locations.
REFERENCES
[1] Baldacci R., Maniezzo V. and Mingozzi A., 2002. A new method for solving capacitated location problems based on a set partitioning approach, s.l.: s.n.
[2] Vinod H. D., 1969. Integer programming and the theory of grouping, s.l.: s.n.
[3] Diaz J. A. and Fernandez E., 2002. A branch-and-bound algorithm for the single source capacitated plant location problem, s.l.: s.n.
[4] Diaz J. A. and Fernandez E., 2005. Hybrid scatter search and path relinking for the capacitated p-median problem, s.l.: s.n.
[5] Ernst A. T. and Krishnamoorthy M., 1999. Solution algorithm for the capacitated single allocation - Hub location problems, s.l.: s.n.
[6] Franca P. M., Sosa N. M. and Pureza V., 1997. An adaptive tabu search algorithm for the capacitated clustering problem, s.l.: s.n.
[7] Garey M. R. and Johnson D. S., 1979. Computers and Intractability: A Guide to the Theory of NP-Completeness, s.l.: s.n.
[8] Ke Liao and Diansheng Guo, 2008. A clustering-based approach to the capacitated facility location problem, s.l.: s.n.
[9] Marin A. and Pelegrin B., 1997. A branch-and-bound algorithm for the transportation problem with location of transshipment points, s.l.: s.n.
[10] Murray A. T. and Gerrard R. A., 1997. Capacitated service and regional constraints in location-allocation modeling, s.l.: s.n.
[11] Negreiros M. and Palhano A., 2006. The capacitated centred clustering problem, s.l.: s.n.
[12] ReVelle C. S. and Swain R. W., 1970. Central facilities location, s.l.: s.n.
APPENDIX A
Source Code For K-Means Algorithm Using Excel VBA
Private Type Records

Dimension() As Double
Distance() As Double
Cluster As Integer
End Type
Dim Table As Range

Dim Record() As Records
Dim Centroid() As Records
Sub Run()
'Run k-Means
If Not kMeansSelection Then
Call MsgBox("Error: " & Err.Description, vbExclamation, "kMeans Error")
End If
End Sub
Function kMeansSelection() As Boolean

'Get user table selection
On Error Resume Next
Set Table = Application.InputBox(Prompt:= _
"Please select the range to analyse.", _
title:="Specify Range", Type:=8)
If Table Is Nothing Then Exit Function 'Cancelled
'Check table dimensions

If Table.Rows.Count < 4 Or Table.columns.Count < 2 Then
Err.Raise Number:=vbObjectError + 1000, Source:="k-Means Cluster Analysis", _
Description:="Table has insufficient rows or columns."
End If
'Get number of clusters

Dim numClusters As Integer
numClusters = Application.InputBox("Specify Number of Clusters", "k Means Cluster Analysis", _
Type:=1)
If Not numClusters > 0 Or numClusters = False Then

Exit Function 'Cancelled
End If
If Err.Number = 0 Then
If kMeans(Table, numClusters) Then
outputClusters
End If
End If
kMeansSelection_Error:
kMeansSelection = (Err.Number = 0)
End Function
Function kMeans(Table As Range, Clusters As Integer) As Boolean

'Table - Range of data to group. Records (Rows) are grouped according to
'attributes/dimensions (columns)
'Clusters - Number of clusters to reduce records into.
'Script Performance Variables

Dim PassCounter As Integer
'Initialize Data Arrays

ReDim Record(2 To Table.Rows.Count)
Dim r As Integer 'record
Dim d As Integer 'dimension index
Dim d2 As Integer 'dimension index
Dim c As Integer 'centroid index
Dim c2 As Integer 'centroid index
Dim di As Integer 'distance
Dim x As Double 'Variable Distance Placeholder

Dim y As Double 'Variable Distance Placeholder
For r = LBound(Record) To UBound(Record)

'Initialize Dimension Value Arrays
ReDim Record(r).Dimension(2 To Table.columns.Count)
'Initialize Distance Arrays
ReDim Record(r).Distance(1 To Clusters)
For d = LBound(Record(r).Dimension) To UBound(Record(r).Dimension)
Record(r).Dimension(d) = Table.Rows(r).Cells(d).Value
Next d
Next r
'Initialize Initial Centroid Arrays

ReDim Centroid(1 To Clusters)
Dim uniqueCentroid As Boolean
For c = LBound(Centroid) To UBound(Centroid)

'Initialize Centroid Dimension Depth
ReDim Centroid(c).Dimension(2 To Table.columns.Count)
'Initialize record index to next record

r = LBound(Record) + c - 2
Do ' Loop to ensure new centroid is unique

r=r+1 'Increment record index throughout loop to find unique record to use as a centroid
'Assign record dimensions to centroid

For d = LBound(Centroid(c).Dimension) To UBound(Centroid(c).Dimension)
Centroid(c).Dimension(d) = Record(r).Dimension(d)
Next d
uniqueCentroid = True
For c2 = LBound(Centroid) To c - 1
'Loop Through Record Dimensions and check if all are the same
x=0
y=0
For d2 = LBound(Centroid(c).Dimension) To _
UBound(Centroid(c).Dimension)
x = x + Centroid(c).Dimension(d2) ^ 2
y = y + Centroid(c2).Dimension(d2) ^ 2
Next d2
uniqueCentroid = Not Sqr(x) = Sqr(y)
If Not uniqueCentroid Then Exit For
Next c2
Loop Until uniqueCentroid
Next c
'Calculate Distances from Centroids
Dim lowestDistance As Double

Dim lastCluster As Integer
Dim ClustersStable As Boolean
Do 'While Clusters are not Stable
PassCounter = PassCounter + 1
ClustersStable = True 'Until Proved otherwise
'Loop Through Records
For r = LBound(Record) To UBound(Record)
lastCluster = Record(r).Cluster
lowestDistance = 0 'Reset lowest distance
'Loop through record distances to centroids
For c = LBound(Centroid) To UBound(Centroid)
'======================================================
' Calculate Euclidean Distance
'======================================================
' d(p,q) = Sqr((q1 - p1)^2 + (q2 - p2)^2 + (q3 - p3)^2)
'------------------------------------------------------
' X = (q1 - p1)^2 + (q2 - p2)^2 + (q3 - p3)^2
' d(p,q) = Sqr(X)
x=0
y=0
'Loop Through Record Dimensions
For d = LBound(Record(r).Dimension) To _
UBound(Record(r).Dimension)
y = Record(r).Dimension(d) - Centroid(c).Dimension(d)
y=y^2
x=x+y
Next d
x = Sqr(x) 'Get square root
'If distance to centroid is lowest (or first pass) assign record to centroid cluster.
If c = LBound(Centroid) Or x < lowestDistance Then
lowestDistance = x
'Assign distance to centroid to record
Record(r).Distance(c) = lowestDistance
'Assign record to centroid
Record(r).Cluster = c
End If
Next c
'Only change if true
If ClustersStable Then ClustersStable = Record(r).Cluster = lastCluster
Next r
'Move Centroids to calculated cluster average

For c = LBound(Centroid) To UBound(Centroid) 'For every cluster
'Loop through cluster dimensions

For d = LBound(Centroid(c).Dimension) To _
UBound(Centroid(c).Dimension)
Centroid(c).Cluster = 0 'Reset number of records in cluster
Centroid(c).Dimension(d) = 0 'Reset centroid dimensions
'Loop through records
For r = LBound(Record) To UBound(Record)
'If Record is in Cluster then
If Record(r).Cluster = c Then
'Use to calculate avg dimension for records in cluster
'Add to number of records in cluster

Centroid(c).Cluster = Centroid(c).Cluster + 1
'Add record dimension to cluster dimension for later division
Centroid(c).Dimension(d) = Centroid(c).Dimension(d) + _
Record(r).Dimension(d)
End If
Next r
'Assign Average Dimension Distance

Centroid(c).Dimension(d) = Centroid(c).Dimension(d) / _
Centroid(c).Cluster
Next d
Next c
Loop Until ClustersStable
kMeans = (Err.Number = 0)
End Function
Function outputClusters() As Boolean
Dim c As Integer 'Centroid Index

Dim r As Integer 'Row Index
Dim d As Integer 'Dimension Index
Dim oSheet As Worksheet

Set oSheet = addWorksheet("Cluster Analysis", ActiveWorkbook)

Dim rowNumber As Integer
rowNumber = 1
'Output Headings
With oSheet.Rows(rowNumber)
With .Cells(1)
.Value = "Row Title"
.Font.Bold = True
.HorizontalAlignment = xlCenter
End With
With .Cells(2)
.Value = "Centroid"
.Font.Bold = True
End With
End With
'Print by Row
rowNumber = rowNumber + 1 'Blank Row
For r = LBound(Record) To UBound(Record)
oSheet.Rows(rowNumber).Cells(1).Value = Table.Rows(r).Cells(1).Value
oSheet.Rows(rowNumber).Cells(2).Value = Record(r).Cluster
rowNumber = rowNumber + 1
Next r
'Print Centroids - Headings

For d = LBound(Centroid(LBound(Centroid)).Dimension) To _
UBound(Centroid(LBound(Centroid)).Dimension)
With oSheet.Rows(rowNumber).Cells(d)
.Value = Table.Rows(1).Cells(d).Value
.Font.Bold = True
End With
Next d
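'Print Centroids
rowNumber = rowNumber + 1 'Move below the heading row
For c = LBound(Centroid) To UBound(Centroid)
With oSheet.Rows(rowNumber).Cells(1)
.Value = "Centroid " & c
.Font.Bold = True
End With
'Loop through cluster dimensions
For d = LBound(Centroid(c).Dimension) To UBound(Centroid(c).Dimension)
oSheet.Rows(rowNumber).Cells(d).Value = Centroid(c).Dimension(d)
Next d
rowNumber = rowNumber + 1 'One output row per centroid
Next c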
oSheet.columns.AutoFit '//AutoFit columns to contents
outputClusters_Error:
outputClusters = (Err.Number = 0)
End Function
Function addWorksheet(Name As String, Optional Workbook As Workbook) As Worksheet

'// If a Workbook wasn't specified, use the active workbook
If Workbook Is Nothing Then Set Workbook = ActiveWorkbook
Dim Num As Integer

'// If a worksheet(s) exist with the same name, add/increment a number after the name
While WorksheetExists(Name, Workbook)
Num = Num + 1
If InStr(Name, " (") > 0 Then Name = Left(Name, InStr(Name, " ("))
Name = Name & " (" & Num & ")"
Wend
'//Add a sheet to the workbook

Set addWorksheet = Workbook.Worksheets.Add
'//Name the sheet

addWorksheet.Name = Name
End Function
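Public Function WorksheetExists(WorkSheetName As String, Workbook As Workbook) As Boolean
'Suppress the error raised when the sheet does not exist; the result then stays False
On Error Resume Next
WorksheetExists = (Workbook.Sheets(WorkSheetName).Name <> "")
On Error GoTo 0
End Function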
APPENDIX B
Source Code for Model Using CPLEX
Appendix B-1: The model

int Big_M = 10000;
int numdepot=...; //number of depots

int numperiod=...;//number of periods
int numcustomer=...;//number of customers destination locations
range depot = 1..numdepot;

range period = 1..numperiod;
range customer = 1..numcustomer;
range temp = 1..numdepot*numcustomer;
float fixed_outbound_cost_excel[temp][period]=...;
float fixed_outbound_cost[i in depot][j in customer][t in period] = fixed_outbound_cost_excel[(i-1)*numcustomer + j][t];
float fixed_cost[depot] = ...;
float variables_outbound_cost = ...;
float distance[depot][customer] = ...;
float capacity_delivery[depot] = ...;
float demand[customer][period] = ...;
dvar int+ x[depot][customer][period];

dvar boolean y[depot][period];
minimize
sum(i in depot, t in period)(fixed_cost[i]*y[i][t])
+ sum(i in depot, j in customer, t in period)(fixed_outbound_cost[i][j][t] +
variables_outbound_cost*distance[i][j])*x[i][j][t];
subject to
{
// Constraint 1:
forall (j in customer, t in period)
sum(i in depot) x[i][j][t] == demand[j][t];
// Constraint 2:
forall (i in depot, t in period)
sum(j in customer)x[i][j][t] - Big_M*y[i][t] <= 0;
// Constraint 3:
forall (i in depot, t in period)
sum(j in customer)x[i][j][t] <= capacity_delivery[i];
// Constraint 4:
forall (i in depot, t in period:t>1)
y[i][t] >= y[i][t-1];
}
Appendix B-2: Reading the Data Sheet
SheetConnection nguyen("DHL_data.xlsx");
fixed_outbound_cost_excel from SheetRead(nguyen,"fixedoutbound");
fixed_cost from SheetRead(nguyen,"fixed_cost");
distance from SheetRead(nguyen,"distance");
demand from SheetRead(nguyen,"demand");
capacity_delivery from SheetRead(nguyen,"capacity_delivery");

VIETNAM NATIONAL UNIVERSITY – HOCHIMINH CITY

FACILITY LOCATION PROBLEM

Submitted in partial fulfillment of the requirements

Student: HUYNH NHAT VINH NGUYEN

Ho Chi Minh City, Vietnam

Submitted in partial fulfillment of the requirements for the Degree of

Signature of Student: __________________________________________

as providing equitable service to customers, minimizing transportation and facility cost,

algorithms, branch-and-bound algorithm, approximation algorithms and simulation.

as it can serve as a magnifier of business impact.

Keywords: Facility Location, Location-allocation, Clustering Algorithm, K-means,

on this thesis. I am thankful for his recommendation, feedbacks and encouragement. I

at IU and my lovely colleagues at DHL eCommerce. Special thanks goes to my brother

On top of this, I am indebted to my family, whose support and encouragement are

LIST OF FIGURES ......................................................................................................... vii

LIST OF TABLES .......................................................................................................... viii

LIST OF ABBREVIATIONS AND SYMBOLS ............................................................. ix

CHAPTER 1 INTRODUCTION ..................................................................................... 1

1.1. Background ................................................................................................................. 1

1.2. Problem ....................................................................................................................... 4

1.3. Objective

1.4. Scope and Limitation

CHAPTER 2 LITERATURE REVIEW

CHAPTER 3 METHODOLOGY

3.1. Research process

3.2. K-means algorithm

3.3. Solution development

3.4. Mathematical model

3.4.1. Notations

3.4.2. Objective function

3.4.3. Constraints

3.5. Data collection

3.5.1. Parcels Data

3.6. IBM ILOG CPLEX

CHAPTER 4 RESULTS

4.1. Center of Gravity Results

4.2. K-Means Results

4.2.1. Results for 4 Clusters

4.2.2. Results for 6 Clusters

4.3. CPLEX Results

CHAPTER 5 CONCLUSION

Source Code for K-Means Algorithm Using Excel VBA

Source Code for Model Using CPLEX

Figure 1.1 Typical Stages Of Shipping In A Hub-And-Spoke Network.

Hub-To-Hub Trunking (Right) In A Hub-And-Spoke Network.

Figure 3.1 Location (Address) Of Customers Based On Latitudes/Longitudes.

Figure 3.2 CoG Of Each District In HCMC.

Figure 3.3 Geocoded Recipients’ Address Into Coordinates.

Figure 4.1 Results Of Potential Locations – 4 Clusters.

Figure 4.2 Results Of Potential Locations – 6 Clusters.

Table 3.3 Parcels volume acquired in 12 months.

Table 4.2 Results of performing K-means algorithm – 4 Clusters.

Table 4.3 Coordinates of Centroids generated by K-means algorithm – 4 Clusters.

Table 4.4 Results of performing K-means algorithm – 6 Clusters.

Table 4.5 Coordinates of Centroids generated by K-means algorithm – 6 Clusters.

Table 4.6 Comparison between 3 proposals.

HCMC Ho Chi Minh City

RCPMP Regionally Constrained P-median Problem

CPMP Capacitated P-median Problem

CCCP Capacitated Centered Clustering Problem

CoG Center of Gravity

Fixed outbound last-mile transportation cost from depot 𝑖 to customer demand point 𝑗 in period 𝑡, unit: VND/parcel