
VIETNAM NATIONAL UNIVERSITY – HOCHIMINH CITY

INTERNATIONAL UNIVERSITY
DEPARTMENT OF INDUSTRIAL & SYSTEMS ENGINEERING

FACILITY LOCATION PROBLEM


USING CLUSTERING ALGORITHM

Submitted in partial fulfillment of the requirements


for the Degree of Bachelor of Engineering in
Industrial and Systems Engineering

Student: HUYNH NHAT VINH NGUYEN


ID: IELSIU14050
Thesis advisor: MSc. DUONG VO NHI ANH

Ho Chi Minh City, Vietnam


August/2018

FACILITY LOCATION PROBLEM
USING CLUSTERING ALGORITHM

By
HUYNH NHAT VINH NGUYEN

Submitted in partial fulfillment of the requirements for the Degree of


Bachelor of Engineering in Industrial and Systems Engineering
International University, Ho Chi Minh City
August/2018

Signature of Student: __________________________________________


Huynh Nhat Vinh Nguyen

Certified by : __________________________________________
MSc. Duong Vo Nhi Anh
Thesis Advisor

Approved by : __________________________________________
Dr. Pham Huynh Tram
Head of ISE Department

ABSTRACT

Identifying the optimal location for a facility is one of the major challenges in logistics network design. A facility location is optimal when it optimizes a given objective, such as providing equitable service to customers, minimizing transportation and facility costs, or capturing the largest market share. Many facility location decisions involving distance objective functions on a spherical surface have been approached using heuristic algorithms, branch-and-bound algorithms, approximation algorithms and simulation.

This study focuses on the design of the distribution network of DHL eCommerce in Ho Chi Minh City. It aims to propose a solution for the allocation of distribution centers (depots) at the logistics service provider DHL eCommerce Vietnam.

The proposed locations reduce the transportation cost of the network design compared with the current model. The analysis also provides insight into considering volume in location analysis, as volume can serve as a magnifier of business impact.

Keywords: Facility Location, Location-allocation, Clustering Algorithm, K-means, Optimization Modeling.

ACKNOWLEDGMENTS

First of all, I would like to express my gratitude to Mr. Duong Vo Nhi Anh, my advisor on this thesis. I am thankful for his recommendations, feedback and encouragement. I am truly sorry for repeatedly falling out of touch with him during this period; my thesis would not have been completed on time without his contribution and valuable guidance.

My appreciation also extends to all my friends: my ‘soul mates’ from high school, those at IU, and my lovely colleagues at DHL eCommerce. Special thanks go to my brother T.A., who helped me survive many courses during my four years of study at IU and spent his precious little time helping me with all the ideas and ‘coding things’ to end this nightmare.

Above all, I am indebted to my family, whose support and encouragement are priceless to me.

TABLE OF CONTENTS

ABSTRACT
ACKNOWLEDGMENTS
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS AND SYMBOLS
CHAPTER 1 INTRODUCTION
1.1. Background
1.2. Problem
1.3. Objective
1.4. Scope and Limitation
CHAPTER 2 LITERATURE REVIEW
CHAPTER 3 METHODOLOGY
3.1. Research process
3.2. K-means algorithm
3.3. Solution development
3.4. Mathematical model
3.4.1. Notations
3.4.2. Objective function
3.4.3. Constraints
3.5. Data collection
3.5.1. Parcels Data
3.5.2. Costs
3.5.3. Volume
3.6. IBM ILOG CPLEX
CHAPTER 4 RESULTS
4.1. Center of Gravity Results
4.2. K-Means Results
4.2.1. Results for 4 Clusters
4.2.2. Results for 6 Clusters
4.3. CPLEX Results
CHAPTER 5 CONCLUSION
REFERENCES
APPENDIX A Source Code For K-Means Algorithm Using Excel VBA
APPENDIX B Source Code for Model Using CPLEX
LIST OF FIGURES

Figure 1.1 Typical Stages Of Shipping In A Hub-And-Spoke Network.
Figure 1.2 Single Allocation (Left) And Extensions To A Pure Hub-And-Spoke Layout: Hub-To-Hub Trunking (Right) In A Hub-And-Spoke Network.
Figure 3.1 Location (Address) Of Customers Based On Latitudes/Longitudes.
Figure 3.2 CoG Of Each District In HCMC.
Figure 3.3 Geocoded Recipients’ Address Into Coordinates.
Figure 4.1 Results Of Potential Locations – 4 Clusters.
Figure 4.2 Results Of Potential Locations – 6 Clusters.

LIST OF TABLES

Table 1.1 Coverage areas of each depot in Ho Chi Minh City.
Table 3.1 An extract from data collected from more than 18,000 transactions/month.
Table 3.2 DHL Hub & Spoke Operations Model – High-level Operations Cost Factors.
Table 3.3 Parcels volume acquired in 12 months.
Table 4.1 Results of calculating starting point for each cluster using Center of Gravity.
Table 4.2 Results of performing K-means algorithm – 4 Clusters.
Table 4.3 Coordinates of Centroids generated by K-means algorithm – 4 Clusters.
Table 4.4 Results of performing K-means algorithm – 6 Clusters.
Table 4.5 Coordinates of Centroids generated by K-means algorithm – 6 Clusters.
Table 4.6 Comparison between 3 proposals.

LIST OF ABBREVIATIONS AND SYMBOLS

HCMC     Ho Chi Minh City
DC       Distribution Center
RCPMP    Regionally Constrained P-median Problem
CPMP     Capacitated P-median Problem
CCCP     Capacitated Centered Clustering Problem
LP       Linear Programming
CoG      Center of Gravity
Lat.     Latitude
Long.    Longitude

Fixed outbound last-mile transportation cost from depot 𝑖 to customer demand point 𝑗 in period 𝑡, unit: VND/parcel

Fixed cost, defined as the default size of each depot multiplied by the renting cost per m2 in location 𝑖, unit: VND/m2

Variable outbound last-mile transportation cost, unit: VND/km

Distance from depot 𝑖 to customer demand point 𝑗, unit: km

Delivery capacity of depot 𝑖, unit: parcels

Demand of customer 𝑗 in period 𝑡, unit: parcels

Quantity of parcel x flow through depot 𝑖 to customer 𝑗 in period 𝑡, unit: parcels

1 if depot 𝑖 is recommended by the model to open in period 𝑡, 0 otherwise

CHAPTER 1 INTRODUCTION

1.1. Background
To meet the demand of a fast-growing industry, the number and location of a company's facilities are important factors in its long-term strategy. Placing distribution centers in optimal locations leads to efficient utilization of the resources at those locations. By forecasting demand precisely and optimizing the design of its distribution network, a company can make the best use of its resources. By evaluating these choices with a simple framework, companies can make strategic decisions based on sound reasoning, which allows them to defend their capital expenses to stakeholders.

One significant drawback in network design decisions is that traditional clustering algorithms cannot reflect real-world conditions. When a network is designed within a relatively small radius, straight-line and curved distances do not describe the true travel distance and therefore misrepresent the efficiency of the network. When physical boundaries and urban features such as traffic are taken into account, a network that appears optimal may turn out to be inefficient. For these reasons, it is critical to consider the driving distance between facilities such as distribution centers (hubs) and demand clusters.

An important consideration in selecting the location of these facilities is the coverage of the demand areas. Covering models have proven very useful in solving facility location problems. A demand point is treated as covered only if a facility is available to provide the required service to that demand point within a required distance or time.

A hub-and-spoke logistics network consists of hubs performing transshipment

operations (i.e., reassembling and redirecting compound shipments of smaller

consignment units), and spokes or depots linking end customers with the hubs.

Typically, shipping processes in a hub-and-spoke network follow this sequence:

- Stage 1: Depots pick up shipments at their customers. This is usually carried out in

the form of pickup tours, serving several customers within the same round-trip. While

the number and routing of pick-up tours is left to the depot (and may be subject to

operational decisions as well as route optimization), by the end of this stage, all

shipments of a given time period must be present at the depot where they are readied

for transport to the hub.

- Stage 2: This stage is, in fact, a complex procedure containing the following sub-

stages:

+ Shipping the parcels from the depots of origin to the hub;

+ Re-arrangement of goods from several depots to assemble new shipments

that contain items with the same depot of destination;

+ Shipping the parcels from the hub to the depots of destination. Note that

usually, the same vehicles perform both depot→hub and hub→depot transport

in subsequent steps. This implies that a balance of inbound and outbound traffic

is needed to minimize deadheading traffic. Delivery time constraints, however,

impose limits on simple load balancing by withholding a certain number of

consignments. Also, imbalance of longer duration or bursts of critical volume

are certain to surpass the dimensions where temporary withholding would be a

feasible option. Another operational issue resulting from this arrangement is the

need for sufficient shipping capacity to perform this stage: a sufficient number

of vehicles must be present at the hub by a certain time limit to carry out the

hub→depot step. If needed, additional vehicles must be called in (typically

from the destination depot) to handle the volume scheduled for the given

period.

- Stage 3: Once the shipped parcels are at the disposal of the destination depot,

delivery to the destinations is performed. Again, it may be left to the given depot how

this is carried out—a delivery tour may also be combined with a pickup tour to

improve vehicle utilization or reduce time lags.

[Figure: Collection → Depot 1 → Hub 1 → (Hub 2) → Depot 2 → Delivery, spanning the first-mile, line-haul and last-mile stages]

Figure 1.1 Typical stages of shipping in a hub-and-spoke network.

The typical stages of shipping in a hub-and-spoke network are: collection by Depot 1, shipping from Depot 1 to Hub 1, sorting at Hub 1, shipping from Hub 1 to Hub 2 if needed, shipping from Hub 2 to Depot 2, and delivery to destinations in the delivery area of Depot 2.

Hub-and-spoke networks can be classified by the arrangement and number of hubs and spokes, as well as their connectivity. In a “pure” hub-and-spoke case, the sending and receiving depots may each be assigned to a single hub (referred to as the single allocation case) or to several hubs (multiple allocation). In practice, the distribution network of DHL is an extension of a pure hub-and-spoke network: one hub is maintained in each of the North and South of Vietnam, in Hanoi and Ho Chi Minh City, and the depots in each area are assigned to the respective hub. For example, Depot Ba Dinh in Hanoi can only be connected to the Hanoi Hub and cannot ship parcels directly to the Sai Gon Hub.

[Figure: panel a) single allocation; panel b) hub-to-hub trunking]

Figure 1.2 Single allocation (left) and extensions to a pure hub-and-spoke layout: hub-to-hub trunking (right) in a hub-and-spoke network.

1.2. Problem
The distribution network in HCMC currently consists of the central hub and only 2 depots, namely Binh Tan and Binh Thanh. The 2 depots have to cover the demand of a total of 19 districts in HCMC, so DHL takes more time to collect parcels from, and deliver parcels to, a given destination, which extends total lead time and increases transportation cost. DHL needs to know whether a more efficient and effective hub-and-spoke network is feasible in Ho Chi Minh City. Efficiency is defined as the ability of a network to meet requirements in a timely manner, that is, how long it takes the network to meet demand. Effectiveness concerns the ability of the network to deliver requirements to the necessary locations. The current network may not be the most efficient for meeting current and planned future demand. Cost and time values are used to compare the efficiency and effectiveness of the current and alternative networks. This research provides DHL with an analysis of the current and potential depot locations and of how efficiently and effectively these locations meet the demand placed on the system, in an effort to find an optimal network.

Table 1.1 Coverage areas of each depot in Ho Chi Minh City.

Depot               Districts
Depot Bình Thạnh    District 1, District 2, District 3, District 4, District 9, Phú Nhuận, Bình Thạnh, Thủ Đức
Depot Bình Tân      District 5, District 6, District 7, District 8, District 10, District 11, District 12, Tân Bình, Tân Phú, Gò Vấp, Bình Tân

1.3. Objective
Following from the problem, the aim of this research is to determine the optimal number of depots and the location of each depot. The optimal depot locations are those that minimize the total delivery time of parcels to their destinations, the costs related to driving distance, and the facility costs.

1.4. Scope and Limitation


This study focuses only on the HCMC distribution network.

The thesis concentrates mainly on numerical investigations and simulations. An experimental study is required in the next stage of this work.

CHAPTER 2 LITERATURE REVIEW

There is an extensive literature on the facility location problem. In the simplest version, a single facility is to be located, and the only optimization criterion is to minimize the weighted sum of distances from a given set of point locations. More complicated problems include the placement of multiple facilities, constraints on the locations of facilities, and more complex optimization criteria. In a basic formulation, the facility location problem consists of a set of potential facility sites P where a facility can be opened and a set of demand points D that must be serviced. The goal is to pick a subset F of facilities to open so as to minimize the sum of distances from each demand point to its closest facility, plus the sum of the opening costs of the facilities. A number of approximation algorithms have been developed for the facility location problem and many of its variants. In the past, many facility location decisions involving distance objective functions on a spherical surface have been approached using meta-heuristic algorithms, branch-and-bound algorithms, heuristic techniques, approximation algorithms and simulation.
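As an illustration, the basic formulation described above can be written compactly. The symbols f_i (opening cost of site i) and d(i, j) (distance from site i to demand point j) are notation assumed here for the sketch and are not taken from the cited literature:

```latex
\min_{F \subseteq P,\; F \neq \emptyset} \;\; \sum_{i \in F} f_i \;+\; \sum_{j \in D} \min_{i \in F} d(i, j)
```

The first term charges for every opened facility and the second charges each demand point the distance to its nearest open facility, so opening more sites trades higher fixed cost for shorter distances.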

The classical location-allocation problem is the basis of many of the location models developed in the supply chain design literature. It has been defined in the literature as follows: a group of customer locations with historical demand and a group of potential DC locations are given. When a DC is opened at one of the potential locations, a known fixed cost is incurred. There is also a known unit delivery cost between each potential DC and each customer location. The task is to determine the locations of the DCs and the shipment pattern between the DCs and the customers that achieve the desired objective.

Location-allocation problems with capacity constraints have many variants across

different application contexts, including the regionally constrained p-median problem

(RCPMP) (Murray and Gerrard 1997), the capacitated p-median problem (CPMP), the

capacitated centered clustering problem (CCCP) (Negreiros and Palhano 2006), the

capacitated single allocation hub location problem (Ernst and Krishnamoorthy 1999),

the single source capacitated plant location problem (Díaz and Fernández 2002),

among many others. They differ from one another in the constraints they place

on demand assignments and/or facility locations. For example, CPMP

requires that facility locations be a subset of the demand point locations (Díaz and

Fernández 2005). RCPMP requires some facilities to be placed in given regions

(Murray and Gerrard 1997).

Capacitated location-allocation problems can be structured as a Linear Programming

(LP) problem with a linear combination of solution variables (Vinod 1969, ReVelle

and Swain 1970). However, capacitated location-allocation problems are NP-complete

(Garey and Johnson 1979). The running time of an exact approach can grow

exponentially with problem size, which makes it impractical for large location-allocation problems.

Therefore, substantial research work has been carried out to develop heuristics to

obtain good approximations of the optimal solution (França et al. 1999, Wu et al.

2006). Heuristics can be integrated with LP models or applied as standalone methods,

such as branch-and-bound (Marín and Pelegrín 1997), simulated annealing (SA) (Ernst

and Krishnamoorthy 1999), adaptive tabu search (França et al. 1999), set partitioning

(Baldacci et al. 2002), and scatter search (Díaz and Fernández 2005, Scheuerer and

Wendolsky 2006).

The research reported here uses a clustering strategy (which is a special heuristic

approach) to formulate solutions for location-allocation problems with capacity

constraints. Clustering analysis is one of the most commonly used approaches in data

analysis and has been applied in many application domains such as pattern discovery,

document retrieval, image segmentation, among many others. Clustering methods do

not depend on prior knowledge and can discover natural groupings of data items (Jain

and Dubes 1988, Jain et al. 1999, Han et al. 2001; Guo et al. 2003). Therefore, when

used in a location-allocation context, clustering methods have the capability to adapt to

the distribution of demands and thus facilitate the search for an approximate optimal

solution. Mulvey and Beck (1984) presented a clustering-based heuristic

(CAPCLUST), whose performance is very close to that of an LP-based approach. The

implicit connection between this clustering concept and location-allocation problems is

also suggested in an earlier work (Vinod 1969).

Both the CAPCLUST method and our proposed method are adapted from the K-means

clustering algorithm. K-means is a distance-based partitioning clustering approach that

partitions a set of data items into clusters while ensuring a low internal dissimilarity or

distance. It assumes that the number of clusters (k) is known. The K-means algorithm

consists of three steps (Jain et al. 1999): (1) randomly choosing k cluster centers within

the data space; (2) assigning each data item to the closest cluster center; and (3)

recalculating the cluster centers using the points assigned to each cluster. Steps 2 and 3

are then repeated until the result converges. To adapt it to a location-allocation

problem, Step 2 can be used to allocate demands and Step 3 can be used to optimize

facility locations.

CHAPTER 3 METHODOLOGY

3.1. Research process


The research process consists of the following steps, each of which is described in detail below:

1. Define problems
2. Mathematical model
3. Data collection
4. Validation (decision: YES proceeds to the next step, NO loops back)
5. Model computational process
6. Recommendation and conclusion
7. Implementation
Define problem: Study the company's actual situation and operating process and define the problem, focusing on the limitations that affect or interrupt the process. Identify the goals and constraints of the problem.

Mathematical model: Develop and refine the mathematical model in detail (parameters, constraints, scale, etc.) so that it fits the company's situation and the objective of the study (minimizing total cost, comprising transportation cost, facility cost and setup cost).

Data collection: Based on the objective of the study, collect data on volume, the coordinates of destination locations, distance, transportation cost, facility cost, delivery capacity of each facility, demand, etc.

Validation: After the data are applied to the mathematical model, validation is carried out to make sure that the mathematical model is accurate and the supporting data are consistent.

Model computational process: Based on the demand and all of the collected data, run experiments in software to solve the mathematical model.

Conclusion and recommendation: Report the results, conclude how effective the method is in solving the problem, and provide recommendations to improve the system.

Implementation: This step depends on the decision maker, who chooses the set of target values that satisfies the company's expectations and policy.
3.2. K-means algorithm
Data clustering, or cluster analysis, is the process of grouping data items so that similar items belong to the same group/cluster. Clustering methods are used to identify groups of similar objects in multivariate data sets collected from fields such as marketing, biomedicine and geospatial analysis. There are different types of clustering methods, including:

- Partitioning methods

- Hierarchical clustering

- Fuzzy clustering

- Density-based clustering

- Model-based clustering

One of the simplest and most popular clustering algorithms is called ‘k-means

clustering’, which would split the data into a set of clusters (groups) based on the

distances between each data point and the center location of each cluster.

K-means has a wide range of applications, such as computational biology, business and marketing, search engines and social science. It can also be applied to the DHL location planning problem, because the company's depot location problem closely resembles the data clustering problem solved by K-means.

The first step when using k-means clustering is to indicate the number of clusters (k)

that will be generated in the final solution.

The algorithm starts by randomly selecting k objects from the data set to serve as the

initial centers for the clusters. The selected objects are also known as cluster means or

centroids.

Next, each of the remaining objects is assigned to its closest centroid, where closest is

defined using the Euclidean distance between the object and the cluster mean. This

step is called “cluster assignment step”.

After the assignment step, the algorithm computes the new mean value of each cluster. This step is called the “centroid update” step. Now that the centers have been recalculated, every observation is checked again to see whether it is closer to a different cluster. All the objects are then reassigned using the updated cluster means.

The cluster assignment and centroid update steps are iteratively repeated until the

cluster assignments stop changing (i.e. until convergence is achieved). That is, the

clusters formed in the current iteration are the same as those obtained in the previous

iteration.

The K-means algorithm can be summarized as follows:

1. Specify the number of clusters (k) to be created.

2. Randomly select k objects from the data set as the initial cluster centers or means.

3. Assign each observation to its closest centroid, based on the Euclidean distance between the object and the centroid.

4. For each of the k clusters, update the cluster centroid by calculating the new mean values of all the data points in the cluster. The centroid of the kth cluster is a vector of length p containing the means of all variables for the observations in the kth cluster; p is the number of variables.

5. Iteratively minimize the total within-cluster sum of squares. That is, repeat steps 3 and 4 until the cluster assignments stop changing or the maximum number of iterations is reached.

K-means is usually run many times, starting with different random centroids each time.

The results can be compared by examining the clusters or by a numeric measure such

as the clusters’ distortion, which is the sum of the squared differences between each

data point and its corresponding centroid. When distortion is used, the clustering with

the lowest distortion value can be chosen as the best clustering.
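The five steps summarized above, together with the repeated-restart comparison, can be sketched in Python. This is a minimal illustration under stated assumptions, not the Excel VBA implementation used in this thesis: the function names kmeans, distortion and best_of, the seed handling, and the rule that an empty cluster keeps its previous centroid are all choices made for this sketch.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Plain k-means on a list of equal-length numeric tuples (illustrative)."""
    rng = random.Random(seed)
    # Step 2: pick k distinct observations as the initial centroids.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Step 3: assign each observation to its closest centroid (Euclidean).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Step 5: stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 4: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels


def distortion(points, centroids, labels):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


def best_of(points, k, restarts=10):
    """Run k-means from several seeds and keep the lowest-distortion result."""
    runs = [kmeans(points, k, seed=s) for s in range(restarts)]
    return min(runs, key=lambda run: distortion(points, *run))
```

Calling best_of(points, k) returns the centroids and labels of the restart with the smallest distortion, which mirrors the selection rule described in the text.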

The K-means clustering method seeks the cluster positions that minimize the distance from the data points to their cluster centers, while the goal of the company's depot planning is to find depot locations that minimize the distance from customers to their facility.

Formally, given a set of data points (x1, x2, …, xn), where each point is a d-dimensional real vector, k-means clustering aims to partition the n points into k clusters (k ≤ n), S = {S1, S2, …, Sk}, so that the within-cluster sum of squares is minimized.
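In symbols, the objective stated above can be sketched as follows, where μ_i, the mean of the points in S_i, is notation assumed here:

```latex
\underset{S}{\arg\min} \; \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert^{2},
\qquad
\mu_i = \frac{1}{\lvert S_i \rvert} \sum_{x \in S_i} x
```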

Advantages and Disadvantages of using K-means clustering

Advantages

- With a large number of variables, K-means may be computationally faster than hierarchical clustering (if K is small).

- K-means may produce tighter clusters than hierarchical clustering, particularly if the clusters are globular.

- An instance can change cluster (move to another cluster) when the centroids are recomputed.

Disadvantages

- It is difficult to compare the quality of the clusters generated (e.g. different initial partitions or values of K affect the outcome).

- The number of clusters is fixed in advance, and it can be difficult to determine what K should be.

- It does not work well with non-globular clusters (non-circular cluster shapes).

- Different initial partitions can result in different final clusters. It is useful to rerun the program using the same as well as different K values and compare the final results.

3.3. Solution development


This thesis develops a method to determine the optimal facility locations, minimizing the sum of facility costs plus, for each destination, the volume of goods multiplied by the transportation rate and by the Google Maps driving distance. The method is based on the following assumptions:

- The goods for every destination point can be transported in a single trip.

- Each destination point is served by only one warehouse.

- The cost depends only on the distance from the warehouse to the destination point; transport conditions are not considered. The transportation cost equals the distance traveled times a fixed price per unit distance.

- The warehouses are located at populated places.

- All service facilities are identical.

- Each destination point wishes to minimize the cost of acquiring the product.

- The company treats each cluster independently.
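Under these assumptions, the quantity being minimized can be sketched as a short function. The name total_cost, its argument layout and the numbers in the usage line are hypothetical and serve only to illustrate the structure of the cost:

```python
def total_cost(facility_costs, assignments, rate_per_km):
    """Facility cost plus volume x rate x distance over all destinations.

    facility_costs: cost of each opened facility.
    assignments: one (volume, distance_km) pair per destination, where
        distance_km is the driving distance from its single serving facility.
    rate_per_km: fixed transportation price per parcel per kilometre.
    """
    transport = sum(volume * rate_per_km * dist for volume, dist in assignments)
    return sum(facility_costs) + transport


# Hypothetical numbers: two facilities, two destinations.
print(total_cost([100.0, 200.0], [(10, 2.0), (5, 4.0)], 3.0))  # 420.0
```

Because each destination is served by exactly one warehouse, a single (volume, distance) pair per destination is enough to evaluate a candidate layout.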

[Flowchart of the solution procedure:
1. Generate the driving distance matrix from the set of destination locations' latitudes/longitudes.
2. Perform K-means clustering based on the destination locations' driving distance matrix to generate K clusters.
3. Calculate the starting point of the facility location using the center of gravity for each cluster.
4. Use a heuristic method to search for the optimal facility locations, which minimize the sum of driving-distance costs from the current location point to each destination location in each cluster.
5. Decision: is the facility location optimal? (branch shown: No)]
- Step 1: Generate driving distance matrix from the set of destination locations’

latitudes/longitudes by Google Maps.
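Driving distances have to come from a routing service, and no Google Maps call is reproduced here. As a stand-in for checking the shape of the matrix, the sketch below fills it with great-circle (haversine) distances. This is an assumption of the example and not the driving distance used in this thesis; the names haversine_km and distance_matrix are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def distance_matrix(coords):
    """Square matrix of pairwise distances; entry [i][j] is from i to j."""
    return [[haversine_km(a, b) for b in coords] for a in coords]
```

A real driving-distance matrix need not be symmetric (one-way streets, for example), whereas this stand-in always is, which is one reason it should only be used for testing the downstream steps.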

Figure 3.1 Location (address) of customers based on latitudes/longitudes.

- Step 2: Perform K-means clustering based on the destination locations driving

distance matrix to generate K clusters.

[Flowchart of K-means: Start → Input number of clusters → Calculate centroid → Calculate distance → Group based on minimum distance]

- Step 3: Calculate the starting point of the facility location for each cluster using the Center of Gravity method and set it as the current facility location. The Center of Gravity method assumes that cost is directly proportional to distance and volume shipped, that inbound and outbound transportation costs are equal, and that there are no special shipping costs for less than full loads. Latitude and longitude coordinates are used to calculate the initial facility location center for each cluster. The following formula is used to perform spherical coordinate conversion from latitude/longitude to Cartesian coordinates for each destination location.

Center of Gravity

Figure 3.2 CoG of each district in HCMC.
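
As an illustration of Step 3, the sketch below computes a volume-weighted center of gravity by converting latitude/longitude to Cartesian coordinates on a unit sphere, averaging, and converting back. It assumes the standard unit-sphere conversion and uses shipped volume as the weight; the function name `center_of_gravity` and the sample weights are hypothetical, and the thesis's own Excel VBA tool may differ in detail.

```python
import math


def center_of_gravity(points):
    """points: iterable of (lat_deg, lon_deg, weight); returns (lat_deg, lon_deg).

    Assumption: unit-sphere conversion x = cos(lat)cos(lon),
    y = cos(lat)sin(lon), z = sin(lat), weighted by shipped volume.
    """
    x = y = z = total = 0.0
    for lat_deg, lon_deg, weight in points:
        lat, lon = math.radians(lat_deg), math.radians(lon_deg)
        x += weight * math.cos(lat) * math.cos(lon)
        y += weight * math.cos(lat) * math.sin(lon)
        z += weight * math.sin(lat)
        total += weight
    if total <= 0:
        raise ValueError("total weight must be positive")
    x, y, z = x / total, y / total, z / total
    # Convert the averaged Cartesian point back to latitude/longitude.
    lon = math.atan2(y, x)
    lat = math.atan2(z, math.hypot(x, y))
    return math.degrees(lat), math.degrees(lon)


# Two illustrative destinations with volumes as weights.
cog_lat, cog_lon = center_of_gravity(
    [(10.776720, 106.697331, 1386), (10.759512, 106.703046, 501)]
)
```

For points as close together as those within one district, this gives almost the same result as a plain weighted average of latitudes and longitudes.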

- Step 4: Calculate the driving distances from the current point calculated in Step 3 to

each destination location of each cluster using Google Maps, in order to obtain a data

set as input for the mathematical model.

- Step 5: Search for the optimal facility locations. All distances are calculated as

Google Maps driving distances. Let the starting point be the current point calculated in

Step 3, and use the maximal Google Maps driving distance calculated in Step 4 as the

search radius around the current point.

3.4. Mathematical model


This part presents the mathematical model and its constraints in detail, with explanations.

First, the sets, indices, input parameters and decision variables used throughout this

research are defined. Then, the objective function and the constraints of the model are

specified (minimize the total cost, which is a function of the facility set-up cost and the

transportation cost).

3.4.1. Notations

3.4.1.1. Sets

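$I$: set of depot facilities, $i \in I$

$T$: set of forecasting periods, $t \in T$

$J$: set of demand points, $j \in J$

3.4.1.2. Input parameters

$f_{ijt}$: Fixed outbound last-mile transportation cost from depot $i$ to customer demand

point $j$ in period $t$ (fixed_outbound_cost in Appendix B), unit: VND/parcel

$F_i$: Fixed cost of depot $i$, defined as the default size of each depot multiplied by the

renting cost per m2 in location $i$ (fixed_cost in Appendix B), unit: VND

$v$: Variable outbound last-mile transportation cost (variables_outbound_cost in Appendix B), unit: VND/km

$d_{ij}$: Distance from depot $i$ to customer demand point $j$ (distance in Appendix B), unit: km

$C_i$: Delivery capacity of depot $i$ (capacity_delivery in Appendix B), unit: parcels

$D_{jt}$: Demand of customer $j$ in period $t$ (demand in Appendix B), unit: parcels

3.4.1.3. Decision variables

$x_{ijt}$: Quantity of parcels flowing through depot $i$ to customer $j$ in period $t$, unit: parcels

$y_{it}$: 1 if depot $i$ is recommended by the model to open in period $t$, 0 otherwise

3.4.2. Objective function

The objective function minimizes the total cost, which is the sum of the fixed facility

set-up cost and the fixed and variable transportation costs.

$$\min Z = \sum_{i \in I} \sum_{t \in T} F_i \, y_{it} + \sum_{i \in I} \sum_{j \in J} \sum_{t \in T} \left( f_{ijt} + v \, d_{ij} \right) x_{ijt} \qquad (1)$$

3.4.3. Constraints

3.4.3.1. Demand constraint

$$\sum_{i \in I} x_{ijt} = D_{jt} \quad \forall j \in J,\ t \in T \qquad (2)$$

This ensures that the total quantity of parcels flowing through the depots to customer $j$

is equal to the demand of customer $j$ in period $t$.

3.4.3.2. Linking constraint

$$\sum_{j \in J} x_{ijt} \le M \, y_{it} \quad \forall i \in I,\ t \in T \qquad (3)$$

Here $x$ are the associated continuous or integer flow variables, $y$ is a binary variable

and $M$ is a sufficiently large coefficient. The constraint allows parcels to flow through

depot $i$ in period $t$ only if the depot is open. $M$ must be large enough to let the model

choose appropriate values for the $x$ variables if $y$ is set to 1.

3.4.3.3. Delivery capacity constraint for each depot i

$$\sum_{j \in J} x_{ijt} \le C_i \quad \forall i \in I,\ t \in T \qquad (4)$$

This ensures that the delivery capacity of depot $i$ is at least the quantity of parcels

flowing through depot $i$ to all customers in each period.

3.4.3.4. Binary constraint

$$y_{it} \in \{0, 1\} \quad \forall i \in I,\ t \in T \qquad (5)$$

3.4.3.5. Non-negativity constraint

$$x_{ijt} \ge 0 \quad \forall i \in I,\ j \in J,\ t \in T \qquad (6)$$

3.4.3.6. Signaling constraint

$$y_{it} \ge y_{i,t-1} \quad \forall i \in I,\ t \in T,\ t > 1 \qquad (7)$$

This ensures that once a depot is opened it will not be closed.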
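
To make the structure of objective (1) and constraints (2) to (7) concrete, the sketch below evaluates the total cost of a given solution and checks its feasibility, following the OPL model in Appendix B. It is not a solver and not the thesis's implementation; the function names, argument names and the toy instance are hypothetical, with arrays indexed as `x[i][j][t]` and `y[i][t]` from zero.

```python
def total_cost(x, y, fixed_cost, fixed_outbound, var_cost, distance):
    """Objective (1): facility cost plus per-parcel transportation cost."""
    depots, customers, periods = len(x), len(x[0]), len(x[0][0])
    facility = sum(
        fixed_cost[i] * y[i][t] for i in range(depots) for t in range(periods)
    )
    transport = sum(
        (fixed_outbound[i][j][t] + var_cost * distance[i][j]) * x[i][j][t]
        for i in range(depots)
        for j in range(customers)
        for t in range(periods)
    )
    return facility + transport


def is_feasible(x, y, demand, capacity, big_m):
    """Check constraints (2)-(7) for a candidate solution."""
    depots, customers, periods = len(x), len(x[0]), len(x[0][0])
    for t in range(periods):
        for j in range(customers):
            # (2) demand of each customer is met exactly
            if sum(x[i][j][t] for i in range(depots)) != demand[j][t]:
                return False
        for i in range(depots):
            if y[i][t] not in (0, 1):  # (5) binary
                return False
            if any(x[i][j][t] < 0 for j in range(customers)):  # (6)
                return False
            flow = sum(x[i][j][t] for j in range(customers))
            if flow > big_m * y[i][t]:  # (3) linking
                return False
            if flow > capacity[i]:  # (4) capacity
                return False
            if t > 0 and y[i][t] < y[i][t - 1]:  # (7) signaling
                return False
    return True


# Toy instance: 2 depots, 2 customers, 2 periods (all numbers are made up).
fixed_cost = [100, 150]
fixed_outbound = [[[5, 5], [5, 5]], [[5, 5], [5, 5]]]
distance = [[1, 4], [3, 1]]
demand = [[10, 10], [20, 20]]
capacity = [50, 50]
x = [[[10, 10], [0, 0]], [[0, 0], [20, 20]]]
y = [[1, 1], [1, 1]]

toy_cost = total_cost(x, y, fixed_cost, fixed_outbound, 2, distance)
toy_ok = is_feasible(x, y, demand, capacity, big_m=1000)
```

A check of this kind can be useful for validating solver output exported to a spreadsheet, since every term of the objective is recomputed independently.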

3.5. Data collection
3.5.1. Parcels Data
The data required comprised all customer addresses and the products delivered to them,

along with their weights and important timestamps.

Table 3.1 An extract from data collected from more than 18,000 transactions/month.

RECIPIENT | ADDRESS | DISTRICT | PROVINCE | WEIGHT | DIMWEIGHT | CODAMOUNT | ENCODING_DAT
Ngô phương trang | 28/31 lương văn can | Quận 8 | Hồ Chí Minh | 9940 | 9418.4036 | | 02/01/2018 1
kim hồng | 19/1/13 cô bắc,phường 1,phú nhuận,tp.hcm | Quận Phú Nhuận | Hồ Chí Minh | 13650 | 13530 | 240000 | 02/01/2018 1
Trần anh tần | 151A quốc lộ 13 | Quận Thủ Đức | Hồ Chí Minh | 730 | 919.338 | 73000 | 02/01/2018 1
Vy Nguyễn | 52/4A Huỳnh Văn Bánh | Quận Phú Nhuận | Hồ Chí Minh | 5740 | 11677.2096 | | 02/01/2018 1
nguyễn thành trung | 143/12 Lê Thị Riêng | Quận 1 | Hồ Chí Minh | 6460 | 5640.96 | 210000 | 02/01/2018 1
Tiểu Long | 123 Liên khu 4-5 | Quận Bình Tân | Hồ Chí Minh | 310 | 1081.47 | 398650 | 02/01/2018 1
Huy nguyen | 220 nguyễn trãi quận 1 | Quận 1 | Hồ Chí Minh | 15870 | 15835.512 | 297600 | 02/01/2018 1
Hoangthilananh | 38nguyen gian thanh ,f15,q10 | Quận 10 | Hồ Chí Minh | 4300 | 12166.08 | 389000 | 02/01/2018 1
Trần Huynh | 9/10/7c đường Đặng Văn Bi | Quận Thủ Đức | Hồ Chí Minh | 740 | 590.9268 | 124900 | 02/01/2018 1
Trương Huỳnh như | 65/2/1/12a đường 20 | Quận Thủ Đức | Hồ Chí Minh | 250 | 482.9388 | 74000 | 02/01/2018 1
Lê văn ba | 6B phạm hùng binh hưng binh chanh | Huyện Bình Chánh | Hồ Chí Minh | 6130 | 5035.536 | 277000 | 02/01/2018 1
Triet | 1369 Phan Văn Trị | Quận Gò Vấp | Hồ Chí Minh | 3330 | 5159.336 | 730000 | 02/01/2018 1
Phương Nguyễn | 1.16 chung cư Ruby garden, Số 2A Nguyễn Sỹ Sách | Quận Tân Bình | Hồ Chí Minh | 6940 | 13102.452 | 279000 | 02/01/2018 1
Nguyễn Thị Mai Trang | 360A Bến Vân Đồn | Quận 4 | Hồ Chí Minh | 6180 | 4988.9448 | 277000 | 02/01/2018 1
Phạm trần phương vy | 740 phạm văn chiêu | Quận Gò Vấp | Hồ Chí Minh | 7820 | 16350.636 | 1359000 | 02/01/2018 1
Chienthang | 78/21a3 tan hoa dong | Quận 6 | Hồ Chí Minh | 3610 | 9047.2404 | 209000 | 02/01/2018 1
Đỗ Quang Thịnh | Trường đại học an ninh, km18 xa lộ hà nội, phường linh trung, quận thủ đức, thành phố hồ chí minh | Quận Thủ Đức | Hồ Chí Minh | 1760 | 4709.34 | 899000 | 02/01/2018 1
nguyễn hữu trọng | 202a Hoàng Văn Thụ | Quận Phú Nhuận | Hồ Chí Minh | 15660 | 15486.2124 | | 02/01/2018 1
Đỗ Lê Duy | 65-67 Gò Cẩm Đệm | Quận Tân Bình | Hồ Chí Minh | 5650 | 7931.52 | 1919000 | 02/01/2018 1
Le Thanh Hien | 30/40 Nguyễn Đình Chi | Quận 6 | Hồ Chí Minh | 15930 | 15934.404 | 320000 | 02/01/2018 1
Vũ xuân anh tuấn | 649/87 điện biên phủ | Quận Bình Thạnh | Hồ Chí Minh | 9640 | 6135.2652 | 199000 | 02/01/2018 1
văng mỹ linh | Tòa nhà H2, 196 Hoàng Diệu, P.8, Q4 | Quận 4 | Hồ Chí Minh | 14680 | 15384.963 | 301320 | 02/01/2018 1
Nguyễn Ngọc Lệ Chi | số 17 đường Lê Duẩn (Central Plaza) | Quận 1 | Hồ Chí Minh | 13680 | 13021.9056 | 240000 | 02/01/2018 1

Each recipient's location was then geocoded into coordinates (latitude and longitude).

This was done using a Google Sheets add-on called ezGeocode.

Figure 3.3 Geocoded recipients’ address into coordinates (latitude, longitude).

3.5.2. Costs
The operations costs of a logistics provider comprise 5 main factors:

- Pick-up cost (first-mile) is the cost of couriers and vehicles such as trucks or

motorbikes traveling to each merchant location or pick-up point (drop-off point) to

collect the parcels and bring them back to a depot or a hub.

- Cost at hub/depot is the cost of handling, encoding and sorting the parcels before

moving them to the next hubs or depots.

At this stage, there are 2 scenarios:

+ If a parcel is transported cross-region (between the North, Central and South regions),

there is a cost for transporting the parcel from the first hub to the second hub by truck

(ground) or airfreight (air).

For example, if a parcel is sent from HCMC to Ba Dinh District, Hanoi, it is transported

from Sai Gon Hub to Hanoi Hub for further processing.

+ If a parcel is transported intra-region (within the North, Central or South region),

there is a cost for transporting the parcel from the hub/depot to the respective shuttle

depot, which minimizes the delivery distance and therefore the last-mile cost.

For example, if a parcel is sent from HCMC to Vung Tau, it is transported from Sai Gon

Hub to Vung Tau Depot for further processing.

- Last-mile cost is the cost of couriers and vehicles such as trucks or motorbikes

traveling to each customer (consumer) location or drop-off point to deliver the parcels.

Table 3.2 DHL Hub & Spoke Operations Model – High-level Operations Cost Factors.

Cost factor | Intra-region: Metro, Semi Urban, Remote | Cross-region: Metro, Semi Urban, Remote
Pick-up $ $ $ $ $ $
Hub/Depot $ $ $ $ $ $
Line haul
Air $ $ $
Ground $ $ $
Shuttle $ $$ $$ $ $$ $$$
Last-mile
Re-shuttle $ $$ $ $$
Courier $ $$ $$$ $ $$ $$$
*Note: Within the same cost factor, more $ marks mean a higher cost compared with
other regions.

3.5.3. Volume
Table 3.3 Parcels volume acquired in 12 months.

07/2017 08/2017 09/2017 10/2017 11/2017 12/2017 01/2018 02/2018 03/2018 04/2018 05/2018 06/2018
Quận 1 1386 1272 1414 1376 1381 1372 1389 1399 1291 1255 1296 1408
Quận 2 775 798 804 805 709 772 773 795 785 735 804 764
Quận 3 775 798 804 805 709 772 773 795 785 735 804 764
Quận 4 501 437 367 412 377 463 455 363 437 398 518 512
Quận 5 542 593 559 596 520 610 586 521 550 513 540 590
Quận 6 541 503 524 510 541 508 497 532 480 538 511 550
Quận 7 1001 1138 1087 978 1156 1148 1104 984 1158 967 1031 1063
Quận 8 715 737 754 737 742 712 702 763 712 746 708 760
Quận 9 648 679 635 708 635 649 669 670 662 647 620 664
Quận 10 782 700 730 733 791 696 724 772 751 785 765 754
Quận 11 650 778 762 763 639 721 622 750 738 780 772 776
Quận 12 1088 746 952 731 931 817 773 1098 932 826 774 851
Phú Nhuận 790 727 715 714 728 755 782 742 790 728 785 717
Tân Bình 1772 1820 1842 1844 1457 1840 1490 1883 1578 1699 1549 1577
Tân Phú 1401 1326 1448 1350 1310 1428 1342 1426 1312 1399 1346 1449
Gò Vấp 1576 1321 1319 1427 1600 1330 1339 1579 1329 1589 1394 1542
Bình Thạnh 1544 1441 1519 1430 1473 1431 1518 1442 1522 1587 1441 1599
Bình Tân 1024 1385 1091 1032 1246 1386 1029 1379 1082 1142 1311 1334
Thủ Đức 1455 1514 1422 1472 1467 1562 1508 1456 1581 1457 1409 1550
*Unit: parcels

3.6. IBM ILOG CPLEX


IBM ILOG CPLEX Optimizer's mathematical programming technology enables

decision optimization for improving efficiency, reducing costs, and increasing

profitability. It helps businesses to make accurate and logical decisions.

IBM ILOG CPLEX Optimizer provides complex, high-performance mathematical

programming solvers for linear programming, mixed integer programming, quadratic

programming, and quadratically constrained programming problems. These include a

distributed parallel algorithm for mixed integer programming to leverage multiple

computers to solve difficult problems. It has solved problems with millions of

constraints and variables. CPLEX can translate the mathematical model of a problem

into the standard mathematical formulation through its dedicated modeling language

(OPL, used in Appendix B). CPLEX can read input data from Excel and write the results

back to Excel, which makes it easy for the user to follow and understand how the

problem is solved. It can solve large, realistic optimization problems quickly enough to

support interactive use.

CHAPTER 4 RESULTS

4.1. Center of Gravity Results


Table 4.1 Results of calculating starting point for each cluster using Center of Gravity.

X (Lat.) Y (Long.) X (Lat.) Y (Long.)


Quận 1 10.776720 106.697331 Quận 11 10.765020 106.650189
Quận 2 10.793508 106.747026 Quận 12 10.857451 106.640107
Quận 3 10.767929 106.684908 Phú Nhuận 10.798941 106.679540
Quận 4 10.759512 106.703046 Tân Bình 10.806420 106.650581
Quận 5 10.755364 106.669738 Tân Phú 10.784238 106.648603
Quận 6 10.746871 106.635372 Gò Vấp 10.834546 106.669932
Quận 7 10.737153 106.719134 Bình Thạnh 10.805133 106.706301
Quận 8 10.737499 106.659784 Bình Tân 10.766005 106.605664
Quận 9 10.832679 106.793046 Thủ Đức 10.853974 106.748966
Quận 10 10.771274 106.668515

Using an Excel VBA tool that implements the Center of Gravity formulas, 19 CoGs were

calculated, one per district, giving 19 coordinate pairs. These coordinates were then

used to calculate the driving distances from the current location point to each

destination location in each cluster.

4.2. K-Means Results


The clustering was carried out with an Excel VBA tool whose source code was shared on

an open forum (see Appendix A). The clustering is based on the destination locations'

driving-distance matrix and generates K clusters. The results were then plotted and the

centroids identified. These centroids were chosen as potential locations for further

calculation.

4.2.1. Results for 4 Clusters
Table 4.2 Results of performing K-means algorithm – 4 Clusters.
Input Data Result of K-Means
X (Lat.) Y (Long.) X (Lat.) Y (Long.) Centroid
Quận 1 10.776720 106.697331 Quận 12 10.857451 106.640107 1
Quận 2 10.793508 106.747026 Phú Nhuận 10.798941 106.679540 1
Quận 3 10.767929 106.684908 Tân Bình 10.806420 106.650581 1
Quận 4 10.759512 106.703046 Gò Vấp 10.834546 106.669932 1
Quận 5 10.755364 106.669738 Bình Thạnh 10.805133 106.706301 1
Quận 6 10.746871 106.635372 Quận 2 10.793508 106.747026 2
Quận 7 10.737153 106.719134 Quận 9 10.832679 106.793046 2
Quận 8 10.737499 106.659784 Thủ Đức 10.853974 106.748966 2
Quận 9 10.832679 106.793046 Quận 5 10.755364 106.669738 3
Quận 10 10.771274 106.668515 Quận 6 10.746871 106.635372 3
Quận 11 10.765020 106.650189 Quận 8 10.737499 106.659784 3
Quận 12 10.857451 106.640107 Quận 10 10.771274 106.668515 3
Phú Nhuận 10.798941 106.679540 Quận 11 10.765020 106.650189 3
Tân Bình 10.806420 106.650581 Tân Phú 10.784238 106.648603 3
Tân Phú 10.784238 106.648603 Bình Tân 10.766005 106.605664 3
Gò Vấp 10.834546 106.669932 Quận 1 10.776720 106.697331 4
Bình Thạnh 10.805133 106.706301 Quận 3 10.767929 106.684908 4
Bình Tân 10.766005 106.605664 Quận 4 10.759512 106.703046 4
Thủ Đức 10.853974 106.748966 Quận 7 10.737153 106.719134 4


Figure 4.1 Results of potential locations – 4 Clusters.

Table 4.3 Coordinates of Centroids generated by K-means algorithm – 4 Clusters.

X (Lat.) Y (Long.) Location


Centroid 1 10.820498 106.669292 Tân Bình
Centroid 2 10.826721 106.763013 Quận 9
Centroid 3 10.760896 106.648266 Quận 11
Centroid 4 10.760329 106.701105 Quận 4
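
As a quick consistency check, each centroid in Table 4.3 can be compared with the arithmetic mean of the member coordinates listed for that cluster in Table 4.2. The short Python sketch below does this for Centroids 2 and 4, using values copied from the two tables; the variable and function names are illustrative only.

```python
# Cluster members from Table 4.2 (latitude, longitude).
clusters = {
    "Centroid 2": [  # Quận 2, Quận 9, Thủ Đức
        (10.793508, 106.747026),
        (10.832679, 106.793046),
        (10.853974, 106.748966),
    ],
    "Centroid 4": [  # Quận 1, Quận 3, Quận 4, Quận 7
        (10.776720, 106.697331),
        (10.767929, 106.684908),
        (10.759512, 106.703046),
        (10.737153, 106.719134),
    ],
}

# Centroid coordinates reported in Table 4.3.
reported = {
    "Centroid 2": (10.826721, 106.763013),
    "Centroid 4": (10.760329, 106.701105),
}


def mean_point(points):
    """Arithmetic mean of a list of (lat, lon) pairs."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


# Largest absolute difference between recomputed and reported coordinates.
max_gap = max(
    abs(computed - given)
    for name in clusters
    for computed, given in zip(mean_point(clusters[name]), reported[name])
)
```

The recomputed means agree with the reported centroids to within the rounding of the six-decimal inputs.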

4.2.2. Results for 6 Clusters
Table 4.4 Results of performing K-means algorithm – 6 Clusters.
Input Data Result of K-Means
X (Lat.) Y (Long.) X (Lat.) Y (Long.) Centroid
Quận 1 10.776720 106.697331 Quận 12 10.85745 106.6401 1
Quận 2 10.793508 106.747026 Tân Bình 10.80642 106.6506 1
Quận 3 10.767929 106.684908 Gò Vấp 10.83455 106.6699 1
Quận 4 10.759512 106.703046 Quận 2 10.79351 106.747 2
Quận 5 10.755364 106.669738 Quận 9 10.83268 106.793 2
Quận 6 10.746871 106.635372 Thủ Đức 10.85397 106.749 2
Quận 7 10.737153 106.719134 Quận 1 10.77672 106.6973 3
Quận 8 10.737499 106.659784 Quận 3 10.76793 106.6849 3
Quận 9 10.832679 106.793046 Quận 10 10.77127 106.6685 3
Quận 10 10.771274 106.668515 Phú Nhuận 10.79894 106.6795 3
Quận 11 10.765020 106.650189 Tân Phú 10.78424 106.6486 3
Quận 12 10.857451 106.640107 Bình Thạnh 10.80513 106.7063 3
Phú Nhuận 10.798941 106.679540 Quận 4 10.75951 106.703 4
Tân Bình 10.806420 106.650581 Quận 7 10.73715 106.7191 4
Tân Phú 10.784238 106.648603 Quận 5 10.75536 106.6697 5
Gò Vấp 10.834546 106.669932 Quận 8 10.7375 106.6598 5
Bình Thạnh 10.805133 106.706301 Quận 11 10.76502 106.6502 5
Bình Tân 10.766005 106.605664 Quận 6 10.74687 106.6354 6
Thủ Đức 10.853974 106.748966 Bình Tân 10.76601 106.6057 6


Figure 4.2 Results of potential locations – 6 Clusters.

Table 4.5 Coordinates of Centroids generated by K-means algorithm – 6 Clusters.

X (Lat.) Y (Long.) Location


Centroid 1 10.832806 106.653540 Gò Vấp
Centroid 2 10.826721 106.763013 Quận 9
Centroid 3 10.784039 106.680866 Quận 3
Centroid 4 10.748333 106.711090 Quận 7
Centroid 5 10.752628 106.659903 Quận 5
Centroid 6 10.756438 106.620518 Bình Tân

4.3. CPLEX Results
Table 4.6 Comparison between 3 proposals.

Number of Depots (Number of Clusters)   Total Cost (VND)       Saved Cost            % Saved Cost
2                                       193,489,183,245.464    -                     -
4                                       130,854,256,129.729    62,634,927,115.74     32.37
6                                       93,151,799,428.776     100,337,383,816.69    51.86

After running the mathematical model in CPLEX, the results are shown in the table

above. In the first option, the total cost of the distribution network with 2 clusters

(k=2) and 2 corresponding depots (the current DHL network) is 193,489,183,245.464

VND per year (about 193.5 billion VND). This total cost comprises the transportation

cost and the fixed cost of setting up the facilities. The next option, with 4 clusters (k=4)

and 4 corresponding depots, has a total cost of 130,854,256,129.729 VND per year

(about 130.9 billion VND), which saves 62,634,927,115.74 VND (about 62.6 billion

VND), or about 32.37%, compared with the first option. The last option, with 6 clusters

(k=6) and 6 corresponding depots, has a total cost of 93,151,799,428.776 VND per year

(about 93.2 billion VND), which saves 100,337,383,816.69 VND (about 100.3 billion

VND), or about 51.86%, compared with the first option.
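
The saved cost and percentage figures follow directly from the total costs in Table 4.6. The Python sketch below reproduces that arithmetic with exact decimal numbers; the names `total_cost_vnd` and `saving` are illustrative, and the 2-depot network is taken as the baseline as in the text.

```python
from decimal import Decimal, ROUND_HALF_UP

# Total cost per option from Table 4.6, keyed by number of depots (VND).
total_cost_vnd = {
    2: Decimal("193489183245.464"),
    4: Decimal("130854256129.729"),
    6: Decimal("93151799428.776"),
}
baseline = total_cost_vnd[2]  # current network with 2 depots


def saving(depots):
    """Return (saved cost in VND, saved cost as a percentage of the baseline)."""
    saved = baseline - total_cost_vnd[depots]
    percent = (saved / baseline * 100).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP
    )
    return saved, percent


saved_4, percent_4 = saving(4)
saved_6, percent_6 = saving(6)
```

`Decimal` is used instead of floating point so that the differences of these twelve-digit amounts are exact to the last reported decimal place.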

CHAPTER 5 CONCLUSION

Facility location decisions play an important role in the strategic planning and design

of logistics/supply chain network. Well-planned location decisions enable the efficient

flow of materials through the distribution system, and lead to decreased costs and

improved customer service. This paper has focused on the implementation of facility

location decisions based on driving distances.

Given the location of each destination in terms of its coordinates, the demand at each

destination and the shipping costs for the region of interest, the methodology proposed

in this paper determines a location for each facility and helps companies assess their

facility locations. On top of this, we could locate the optimal facility in perspective.

With regard to transportation cost, the driving distance in the presence of geographic

barriers should be taken into consideration in facility location decisions.

The research presented here transforms a special location-allocation problem into a

clustering problem. The proposed method is essentially a constrained K-means

clustering method that indirectly optimizes the location-allocation quality under the

individual and overall capacity restrictions. Since the allocation strategy adopted from

the K-means algorithm can ensure a near-optimal allocation result (in terms of the

objective function) when facility locations are fixed, this research focused more on

designing methods to obtain a high-quality configuration of facility locations.

REFERENCES

[1] Baldacci R., Maniezzo V. and Mingozzi A., 2002. A new method for solving
capacitated location problems based on a set partitioning approach, s.l.: s.n.

[2] Vinod H. D., 1969. Integer programming and the theory of grouping, s.l.: s.n.

[3] Diaz J. A. and Fernandez E., 2002. A branch-and-bound algorithm for the single
source capacitated plant location problem, s.l.: s.n.

[4] Diaz J. A. and Fernandez E., 2005. Hybrid scatter search and path relinking for
the capacitated p-median problem, s.l.: s.n.

[5] Ernst A. T. and Krishnamoorthy M., 1999. Solution algorithm for the capacitated
single allocation - Hub location problems, s.l.: s.n.

[6] Franca P. M., Sosa N. M. and Pureza V., 1997. An adaptive tabu search
algorithm for the capacitated clustering problem, s.l.: s.n.

[7] Garey M. R. and Johnson D. S., 1979. Computers and Intractability: A Guide to
the Theory of NP-Completeness, s.l.: s.n.

[8] Ke Liao and Diansheng Guo, 2008. A clustering-based approach to the
capacitated facility location problem, s.l.: s.n.

[9] Marin A. and Pelegrin B., 1997. A branch-and-bound algorithm for the
transportation problem with location of transshipment points, s.l.: s.n.

[10] Murray A. T. and Gerrard R. A., 1997. Capacitated service and regional
constraints in location-allocation modeling, s.l.: s.n.

[11] Negreiros M. and Palhano A., 2006. The capacitated centred clustering problem,
s.l.: s.n.

[12] ReVelle C. S. and Swain R. W., 1970. Central facilities location, s.l.: s.n.

APPENDIX A

Source Code For K-Means Algorithm Using Excel VBA

Private Type Records


Dimension() As Double
Distance() As Double
Cluster As Integer
End Type

Dim Table As Range


Dim Record() As Records
Dim Centroid() As Records

Sub Run()
'Run k-Means
If Not kMeansSelection Then
Call MsgBox("Error: " & Err.Description, vbExclamation, "kMeans Error")
End If
End Sub

Function kMeansSelection() As Boolean


'Get user table selection
On Error Resume Next
Set Table = Application.InputBox(Prompt:= _
"Please select the range to analyse.", _
title:="Specify Range", Type:=8)

If Table Is Nothing Then Exit Function 'Cancelled

'Check table dimensions


If Table.Rows.Count < 4 Or Table.columns.Count < 2 Then
Err.Raise Number:=vbObjectError + 1000, Source:="k-Means Cluster Analysis", _
Description:="Table has insufficient rows or columns."
End If

'Get number of clusters


Dim numClusters As Integer
numClusters = Application.InputBox("Specify Number of Clusters", "k Means Cluster Analysis", _
Type:=1)

If numClusters <= 0 Then 'Cancelled (InputBox returns False, i.e. 0) or invalid input
Exit Function
End If
If Err.Number = 0 Then
If kMeans(Table, numClusters) Then
outputClusters
End If
End If

kMeansSelection_Error:
kMeansSelection = (Err.Number = 0)
End Function

Function kMeans(Table As Range, Clusters As Integer) As Boolean


'Table - Range of data to group. Records (rows) are grouped according to
'attributes/dimensions (columns)
'Clusters - Number of clusters to reduce records into.

On Error Resume Next

'Script Performance Variables


Dim PassCounter As Integer

'Initialize Data Arrays


ReDim Record(2 To Table.Rows.Count)
Dim r As Integer 'record
Dim d As Integer 'dimension index
Dim d2 As Integer 'dimension index
Dim c As Integer 'centroid index
Dim c2 As Integer 'centroid index
Dim di As Integer 'distance

Dim x As Double 'Variable Distance Placeholder


Dim y As Double 'Variable Distance Placeholder

For r = LBound(Record) To UBound(Record)


'Initialize Dimension Value Arrays
ReDim Record(r).Dimension(2 To Table.columns.Count)
'Initialize Distance Arrays
ReDim Record(r).Distance(1 To Clusters)
For d = LBound(Record(r).Dimension) To UBound(Record(r).Dimension)
Record(r).Dimension(d) = Table.Rows(r).Cells(d).Value
Next d
Next r

'Initialize Initial Centroid Arrays


ReDim Centroid(1 To Clusters)
Dim uniqueCentroid As Boolean

For c = LBound(Centroid) To UBound(Centroid)


'Initialize Centroid Dimension Depth
ReDim Centroid(c).Dimension(2 To Table.columns.Count)

'Initialize record index to next record


r = LBound(Record) + c - 2

Do ' Loop to ensure new centroid is unique


r=r+1 'Increment record index throughout loop to find unique record to use as a centroid

'Assign record dimensions to centroid


For d = LBound(Centroid(c).Dimension) To UBound(Centroid(c).Dimension)
Centroid(c).Dimension(d) = Record(r).Dimension(d)
Next d

uniqueCentroid = True

For c2 = LBound(Centroid) To c - 1

'Loop through dimensions and accumulate the squared distance between
'the new centroid and an earlier one; identical points give zero
x = 0
For d2 = LBound(Centroid(c).Dimension) To _
UBound(Centroid(c).Dimension)
x = x + (Centroid(c).Dimension(d2) - Centroid(c2).Dimension(d2)) ^ 2
Next d2

uniqueCentroid = (x > 0)
If Not uniqueCentroid Then Exit For
Next c2

Loop Until uniqueCentroid

Next c

'Calculate Distances from Centroids

Dim lowestDistance As Double


Dim lastCluster As Integer
Dim ClustersStable As Boolean

Do 'While Clusters are not Stable

PassCounter = PassCounter + 1
ClustersStable = True 'Until Proved otherwise

'Loop Through Records


For r = LBound(Record) To UBound(Record)

lastCluster = Record(r).Cluster
lowestDistance = 0 'Reset lowest distance

'Loop through record distances to centroids


For c = LBound(Centroid) To UBound(Centroid)

'======================================================
' Calculate Euclidean Distance
'======================================================
' d(p,q) = Sqr((q1 - p1)^2 + (q2 - p2)^2 + (q3 - p3)^2)
'------------------------------------------------------
' X = (q1 - p1)^2 + (q2 - p2)^2 + (q3 - p3)^2
' d(p,q) = Sqr(X)

x=0
y=0
'Loop Through Record Dimensions
For d = LBound(Record(r).Dimension) To _
UBound(Record(r).Dimension)
y = Record(r).Dimension(d) - Centroid(c).Dimension(d)
y=y^2
x=x+y
Next d

x = Sqr(x) 'Get square root

'If distance to centroid is lowest (or first pass) assign record to centroid cluster.
If c = LBound(Centroid) Or x < lowestDistance Then
lowestDistance = x
'Assign distance to centroid to record
Record(r).Distance(c) = lowestDistance
'Assign record to centroid
Record(r).Cluster = c
End If
Next c

'Only change if true

If ClustersStable Then ClustersStable = Record(r).Cluster = lastCluster

Next r

'Move Centroids to calculated cluster average


For c = LBound(Centroid) To UBound(Centroid) 'For every cluster

'Loop through cluster dimensions


For d = LBound(Centroid(c).Dimension) To _
UBound(Centroid(c).Dimension)

Centroid(c).Cluster = 0 'Reset number of records in cluster


Centroid(c).Dimension(d) = 0 'Reset centroid dimensions

'Loop Through Records


For r = LBound(Record) To UBound(Record)

'If Record is in Cluster then


If Record(r).Cluster = c Then
'Use to calculate avg dimension for records in cluster

'Add to number of records in cluster


Centroid(c).Cluster = Centroid(c).Cluster + 1
'Add record dimension to cluster dimension for later division
Centroid(c).Dimension(d) = Centroid(c).Dimension(d) + _
Record(r).Dimension(d)

End If

Next r

'Assign Average Dimension Distance


If Centroid(c).Cluster > 0 Then 'Avoid division by zero for an empty cluster
Centroid(c).Dimension(d) = Centroid(c).Dimension(d) / _
Centroid(c).Cluster
End If
Next d
Next c

Loop Until ClustersStable

kMeans = (Err.Number = 0)
End Function

Function outputClusters() As Boolean

Dim c As Integer 'Centroid Index


Dim r As Integer 'Row Index
Dim d As Integer 'Dimension Index

Dim oSheet As Worksheet


On Error Resume Next

Set oSheet = addWorksheet("Cluster Analysis", ActiveWorkbook)

'Loop Through Records


Dim rowNumber As Integer
rowNumber = 1

'Output Headings
With oSheet.Rows(rowNumber)
With .Cells(1)

.Value = "Row Title"
.Font.Bold = True
.HorizontalAlignment = xlCenter
End With
With .Cells(2)
.Value = "Centroid"
.Font.Bold = True
.HorizontalAlignment = xlCenter
End With
End With

'Print by Row
rowNumber = rowNumber + 1 'Blank Row
For r = LBound(Record) To UBound(Record)
oSheet.Rows(rowNumber).Cells(1).Value = Table.Rows(r).Cells(1).Value
oSheet.Rows(rowNumber).Cells(2).Value = Record(r).Cluster
rowNumber = rowNumber + 1
Next r

'Print Centroids - Headings


rowNumber = rowNumber + 1
For d = LBound(Centroid(LBound(Centroid)).Dimension) To _
UBound(Centroid(LBound(Centroid)).Dimension)
With oSheet.Rows(rowNumber).Cells(d)
.Value = Table.Rows(1).Cells(d).Value
.Font.Bold = True
.HorizontalAlignment = xlCenter
End With
Next d

'Print Centroids
rowNumber = rowNumber + 1
For c = LBound(Centroid) To UBound(Centroid)
With oSheet.Rows(rowNumber).Cells(1)
.Value = "Centroid " & c
.Font.Bold = True
End With
'Loop through cluster dimensions
For d = LBound(Centroid(c).Dimension) To UBound(Centroid(c).Dimension)
oSheet.Rows(rowNumber).Cells(d).Value = Centroid(c).Dimension(d)
Next d
rowNumber = rowNumber + 1
Next c

oSheet.columns.AutoFit '//AutoFit columns to contents

outputClusters_Error:
outputClusters = (Err.Number = 0)
End Function

Function addWorksheet(Name As String, Optional Workbook As Workbook) As Worksheet


On Error Resume Next
'// If a Workbook wasn't specified, use the active workbook
If Workbook Is Nothing Then Set Workbook = ActiveWorkbook

Dim Num As Integer


'// If a worksheet(s) exist with the same name, add/increment a number after the name
While WorksheetExists(Name, Workbook)
Num = Num + 1
'Strip any existing " (n)" suffix (without its leading space) before appending a new one
If InStr(Name, " (") > 0 Then Name = Left(Name, InStr(Name, " (") - 1)
Name = Name & " (" & Num & ")"
Wend

'//Add a sheet to the workbook


Set addWorksheet = Workbook.Worksheets.Add

'//Name the sheet


addWorksheet.Name = Name
End Function

Public Function WorksheetExists(WorkSheetName As String, Workbook As Workbook) As Boolean


On Error Resume Next
WorksheetExists = (Workbook.Sheets(WorkSheetName).Name <> "")
On Error GoTo 0
End Function

APPENDIX B

Source Code for Model Using CPLEX

Appendix B-1: The model


int Big_M = 10000;

int numdepot=...; //number of depots


int numperiod=...;//number of periods
int numcustomer=...;//number of customers destination locations

range depot = 1..numdepot;


range period = 1..numperiod;
range customer = 1..numcustomer;
range temp = 1..numdepot*numcustomer;

float fixed_outbound_cost_excel[temp][period]=...;
float fixed_outbound_cost[i in depot][j in customer][t in period] =
  fixed_outbound_cost_excel[(i-1)*numcustomer + j][t];
float fixed_cost[depot] = ...;
float variables_outbound_cost = ...;
float distance[depot][customer] = ...;
float capacity_delivery[depot] = ...;
float demand[customer][period] = ...;

dvar int+ x[depot][customer][period];


dvar boolean y[depot][period];

minimize
sum(i in depot, t in period)(fixed_cost[i]*y[i][t])
+ sum(i in depot, j in customer, t in period)(fixed_outbound_cost[i][j][t] +
variables_outbound_cost*distance[i][j])*x[i][j][t];

subject to
{

// Constraint 1:
forall (j in customer, t in period)
sum(i in depot) x[i][j][t] == demand[j][t];

// Constraint 2:
forall (i in depot, t in period)
sum(j in customer)x[i][j][t] - Big_M*y[i][t] <= 0;

// Constraint 3:
forall (i in depot, t in period)
sum(j in customer)x[i][j][t] <= capacity_delivery[i];

// Constraint 4:
forall (i in depot, t in period:t>1)
y[i][t] >= y[i][t-1];

}

Appendix B-2: Reading the Data Sheet
SheetConnection nguyen("DHL_data.xlsx");

fixed_outbound_cost_excel from SheetRead(nguyen,"fixedoutbound");

fixed_cost from SheetRead(nguyen,"fixed_cost");

distance from SheetRead(nguyen,"distance");

demand from SheetRead(nguyen,"demand");

capacity_delivery from SheetRead(nguyen,"capacity_delivery");
