Data Driven Congestion Trends Prediction of Urban Transportation

You might also like

Download as pdf or txt
Download as pdf or txt
You are on page 1of 11

IEEE INTERNET OF THINGS JOURNAL, VOL. 5, NO.

2, APRIL 2018 581

Data Driven Congestion Trends Prediction


of Urban Transportation
Rui Jia, Pengcheng Jiang, Lei Liu, Lizhen Cui, and Yuliang Shi

Abstract—Smart traffic prediction system provides significant solving traffic congestion [3], [5], [6], but the smart prediction
benefits in solving the city traffic congestion. However, existing system has its unique contribution in alleviating traffic con-
smart transportation system needs a lot of real-time traffic data gestion. Assuming that people have arrived at the congestion
and accurate location information to display the traffic condi-
tion. We hope that we can use the data which is easy to be section before updating the real-time congestion information,
obtained, and then predict a reliable congestion time. To address this congestion information has little significance for peo-
this problem, this paper studied a smart traffic forecasting system ple who are in the congestion. If it is possible to predict a
based on SWARIMA model. The system includes three steps: reliable congestion time or to provide a congestion elastic
1) use the sliding windows to calculate and process real-time data interval in advance, it will be of great benefit to ease con-
stream; 2) establish the SWARIMA model and make regression
analysis; and 3) from a statistical point of view, calculate the gestion and reduce travel time. A simple and efficient smart
elastic interval and predict the congestion trend. Our system is real-time traffic prediction system will play an important role
capable of accepting the real-time traffic data stream for the to the development of cities. The smart traffic system will
congestion prediction, in addition, we reduce the actual running be not only beneficial to the city’s energy-saving and emis-
parameters to three attributes: 1) speed; 2) time; and 3) location sion reduction, but also be able to make a more reasonable
information. When faced with the challenges of real-time traffic
congestion, the system can timely and effectively calculate the and efficient plan for people’s peak travel, meanwhile it can
congestion trends and provide three reliable elastic intervals: provide constructive suggestions for the road networks of the
1) warning; 2) congestion; and 3) mitigation, which has sig- city. This paper is working on studying a simple and effective
nificance to improve traffic condition and alleviate urban road traffic prediction system, making it more effectively to predict
congestion. the whole process of congestion in advance and provide the
Index Terms—Autoregressive integrated moving aver- elastic range, so that people can know the approximate for-
age (ARIMA) model, congestion trends prediction, data mining, mation time of congestion and then reduce the possibility of
sliding window, smart traffic. congestion.
There are many ways to deal with short term traffic data
stream at present, for example, Zhang et al. [7] focused on
I. I NTRODUCTION the aggregating and sampling methods for processing GPS
ITH the acceleration of urbanization process and the data stream for traffic state estimation; Tan et al. [8] uti-
W limitations of existing road networks in city, people’s
demand for an unobstructed road has been surpassed the
lized the autoregressive integrated moving average (ARIMA)
to predict traffic stream; while Lv et al. [9] predicted the traf-
expansion speed of urban road traffic networks. So the traf- fic stream with deep learning. The traffic congestion is not
fic congestion during the peak period has become a great a simple ON–OFF problem, because its formation has a cer-
challenge for every city [1], which causes a large amount of tain trend and a number of points of mutation. But most of the
economic losses to the local city and will directly affect the predictions cannot give an elastic interval for adjustment effec-
city’s image and development of the city [2]. A 2011 U.S. tively. In addition, the traffic flow of urban roads has certain
urban transport report shows that the travel time delays and periodicity and weak stationarity [10]. For example, assuming
fuel consumption are estimated to be $121 billion in economic a section in city often has traffic jams. If it congests during
losses and it will increase to $190 billion in the future [3]. the morning peak on Monday, while at the same time of the
In China, the most influential of the top ten cities every day next Monday, it will also congest, what is more its severity
cause economic losses estimated at $1 billion because of traf- also has a certain similarity. In this paper, we found that most
fic congestion. In addition to the United States and China, of the vehicles in the traffic congestion are low-speed, and the
traffic congestion in the rest of the world is also a severe low-speed vehicles are also the main body of congestion. So
challenge [4]. There are many researches on alleviating and most of the traffic congestions are formed by rapid increase-
ment of low-speed vehicles at the beginning and while the real
Manuscript received December 31, 2016; revised May 15, 2017; accepted congestion will be formed, the number of low speed vehicles
June 3, 2017. Date of publication June 15, 2017; date of current version
April 10, 2018. This work was supported by the National Science Foundation will tend to be stable. However, the congestion relief is just the
of China under Grant 61402262 and Grant 61572295. (Corresponding author: opposite of this phenomenon, it is started by a relatively rapid
Lei Liu.) reduction in the number of low-speed vehicles, and when the
The authors are with the School of Computer Science and Technology,
Shandong University, Jinan 250101, China (e-mail: l.liu@sdu.edu.cn). real congestion will be over, the low-speed vehicles will tend
Digital Object Identifier 10.1109/JIOT.2017.2716114 to be stable again in a period of time.
2327-4662 c 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
582 IEEE INTERNET OF THINGS JOURNAL, VOL. 5, NO. 2, APRIL 2018

Therefore, in this paper, we make the following contribu-


tions based on this practical phenomenon.
1) Processing the original data stream with the preproces-
sor: Using sliding window method to analyze real-time
traffic data stream and extract data features.
2) Using ARIMA model to deal with the above results,
then performing mathematical modeling and solving
regression equations.
3) Using a probabilistic model to find the feature points,
resulting two elastic intervals which can be used as
a prediction: congestion is about to occur in the first
interval and ease in the second interval.
4) With a large number of actual traffic data and experi-
ments, the results show that our method can well predict Fig. 1. Example of sliding windows.
the traffic congestion.
What is more, we compared our experimental results with B. ARIMA
the online analysis results of Baidu map, GaoDe map, and ARIMA is one of the most widely used linear regres-
other map softwares and the results show that the smart traffic sion models [16], especially in the application of time series
prediction system does have a great advantage. prediction in social economy [17] and networks [18]. For
The rest of this paper is organized as follows. We intro- example, Yunus et al. [19] studied the modeling of wind speed
duce the preliminary knowledge of this paper in Section II. time series based on ARIMA; Cardoso and Cruz [20] used
Section III presents the details of SWARIMA model and algo- ARIMA models and artificial neural network to forecast natu-
rithm. We introduce our experimental process, show a large ral gas consumption; Zhang et al. [21] established a model of
number of processing results and evaluations in Section IV. multiple seasonal ARIMA on time serial data and predicted for
Section V describes the application of the smart prediction the seasonal time series. In addition, many researchers have
system with comparing the results of this paper and other map used this model to detect abnormal value or mutation value
softwares in detail. We conclude this paper in Section VI. and to solve practical problems.
The AR model is also used to conduct traffic
modeling, prediction, and congestion control for high-
II. P RELIMINARY
speed networks [22]. It mainly displays the relationship
A. Sliding Windows Model between the time t arrived data Xt and the data Xt−1 , Xt−2
Sliding windows model based on the data stream is one and so on before arriving. The definition is as follows (all the
of the most common models of mining data. For example, parameters and descriptions can be found in Table I):
Noh et al. [11] studied the vehicle detection in highway envi- 
p
ronment with sliding window; Pralon et al. [12] used the Xt = c + φi Xt−i + ε (1)
sliding window approach to study convergence rate, Gaussian i=1
sources, and spatial correlation; Sibley et al. [13] concerned where c is a constant, p is the order of the model, φi (i =
with improving the range resolution of stereo vision for entry; 1, 2, . . . , p) are the parameters and ε is the error term that
Jin et al. [14] used a time-sensitive sliding window to mining represents the difference between the calculating result and
recent frequent itemsets over data stream. Gradually, with data the true value.
stream constantly coming, the old data point which had been The above AR model represents the relationship between
read will be thrown out or replaced by the new data point the Xt and the correct terms. In order to show the relationship
because of the less contribution [15]. In this model, both two between it and the error terms, we need to use the MA model,
ends of the window are identified as two sliding endpoints. which is defined as follows:
When the new data stream arrives to the preprocessor, data q
which meet the requirement of the current window will be Xt = μ + εt + θi εt−i (2)
read. When the last data is filtered out, the window moves i=1
forward with time and the old data will be thrown out. This where μ is the average value of series, εi (i = t−1, t−2, . . . , t)
process is called as sliding. are the error terms, q is the order of the model and θi (i =
Fig. 1 gives an example of the sliding window, the horizon- 1, 2, . . . , q) are the parameters. The above two models can be
tal axis indicates the time that the data stream arrives, and the unified in a formula to describe and then the ARIMA model
vertical axis represents the speed, where S1 and S2 represent can be deduced. The definition is as follows:
the speed range filtered by the preprocessor. For example, in p 
q
this figure, P3 is an existed old data point, time window will Xt = θ0 + φi Xt−i + εt + θi εt−i (3)
slide to the solid box position because of the arrival of new i=1 i=1
data P6 and P7, meanwhile P3 is thrown. Besides, P4, P7, and where θi (i = 1, 2, . . . , q) and φi (i = 1, 2, . . . , p) are the
other data without being read because they do not belong to parameters. p and q are the orders, usually as an integer. Xt is
the screening speed range. the sample value of the data stream. εi (i = t − 1, t − 2, . . . , t)
JIA et al.: DATA DRIVEN CONGESTION TRENDS PREDICTION OF URBAN TRANSPORTATION 583

TABLE I
L IST OF N OTATIONS sliding window. While the sliding window detects the data
stream, it filter out the points that not match the window and
perform the operation of adding data. When the detection is
completed, the sliding window with data stream will speed for-
ward automatically. Repeat the above process to continuously
detect the data stream until the end.
2) Second Stage (Modeling Stage): When the detection of
one window is completed, SWARIMA uses the information
of this window to update the regression model, and tries to
use the new model to predict the attribute information of the
current point and the next point.
3) Third Stage (Calculating the Score of This Change
Process): First, we obtain the prediction scores of the current
data point, which is calculated by the regression model of the
previous window. Second, we calculate the score of current
point in the current window. And finally compare the differ-
ence between the two scores. In this paper, the probability
model is used to evaluate the difference. If the score exceeds
a certain threshold, the corresponding time of the current point
is considered to be a drastic change.

B. Preprocessing Stage
In this paper, the raw data which is used to deal with is
source from the measurements at the intersections in a desig-
nated city in China. The raw data contains five basic attributes:
1) arrived time; 2) the license plate number; 3) vehicle speed;
4) vehicle arrival direction; and 5) crossing number. In order
to convert the raw data into preprocessing file which is avail-
able for the use of the second stage expediently, we design a
preprocessor to complete it. In Fig. 3, the operation of the pre-
processor is divided into four steps: 1) grouping; 2) removal;
3) selection; and 4) statistics, while the grouping and removal
can be implemented at a same time. In order to save time
cost, each data point is read only once, and the processor use
Fig. 2. Flow diagram of SWARIMA model. online processing to deal with a large number of real-time data
streams. Specific steps are as follows.
1) First Step (Grouping): In order to simplify the subse-
are the noise error terms. Obviously, if p is equal to 0, ARIMA quent statistical work, we first group the raw data. When the
model will be simplified as MA model; if q is equal to 0, data stream arrives at a certain time, the preprocessor first
ARIMA model will be simplified as AR model [23]. groups the data according to the attributes of the data, such as
The ARIMA model shows well in prediction, so many vehicle arrival direction, crossing number, and so on.
researchers have used it for time series prediction. Because the 2) Second Step (Removal): After grouping, there are still
ARIMA model can express the difference between the data, a lot of attributes redundant in the data set. The removal step
we use it to study the change of sensitive data in real-time not only saves the time cost of the follow-up steps but also
data stream and then make accurate judgment and prediction reduces the storage cost of the system. In fact, we only need
in this paper. to read the useful data properties into the fitting array, while
ignoring other irrelevant attributes or the attributes of small
impact.
III. SWARIMA M ODEL AND A LGORITHM D ETAILS
3) Third Step (Selection): This step, actually, can be car-
A. SWARIMA Model Introduction ried out with the removal. In this paper, we mainly study the
In this paper, we propose a new method named SWARIMA, number of low speed vehicles that can respond to the change
which includes three stages that will be detailed in of congestion. Therefore, it is necessary for us to filter the
Sections II–IV. And Fig. 2 shows the flow diagram model high speed vehicles data and some strange data points. Thus,
of SWARIMA in detail. only the data points that meet the requirements of the win-
1) First Stage (Pretreatment Stage): SWARIMA process dow speed are obtained, so as to generate the standard format
the raw traffic data stream through a preprocessor and extract files. While this step is also the most important step in the
features. First, we obtain a current point as the beginning of preprocessor.
584 IEEE INTERNET OF THINGS JOURNAL, VOL. 5, NO. 2, APRIL 2018

Fig. 3. Flow diagram of preprocessing phase.

Algorithm 1 Standardization of Raw Data Algorithm 2 Congestion Trends Forecast


Require: X,C,sr,tr,tstep Require: value no of current window Wo ; values of windows
Ensure: win before Wo
1: //when an object X comes, get and process it. Ensure: the predicted value of Wo and Wo+1
2: GET X 1: no ← (no−2 + no−1 + no ) ÷ 3
3: LET t = GetSeconds(X) 2: uo = (1 − r)no−1 + rno
4: LET sp = GetSpeed(X) 3: zo = no − uo
5: if C − sr < sp&&sp <= C + sr then 4: Z.add(zo )
6: //compute the numbers of leftmost and rightmost fitted 5: /* using ARIMA(p,d,q) regress zi (i = 1, 2, . . . , o)*/
windows 6: mod ← ARIMA(Z, p, d, q)
7: LET LW = ( t - 2 * tr ) / tstep + 2 7: ẑo,o ← prdict(mod, Z1:o )
8: LET RW = t / tstep + 1 8: n̂o,o = ẑo,o + uo
9: for i = LW to RW do
10: win[i] = win[i] + 1
11: end for value and the actual value, so as to reflect the road conditions.
12: end if
As mentioned in Section II of this paper, the ARIMA model
has great advantages and convenience in solving this kind of
problem and the SWARIMA model also inherits the advan-
4) Fourth Step (Statistics): The statistics plays an impor- tage of the ARIMA. Algorithm 2 is the prediction process.
tant role in the sliding window model, aiming at processing The variables in the specific operation process are defined in
data points after above the three steps according to the sliding Algorithm 2.
window method into the preprocessing file for data regression. The current point is represented by Wo or (o, n), where o
First define the time center and the time radius of the sliding represents the current window of time, and no represents the
window tr and then begin to statistics on each standard format current value. The new sequence N to be used for analysis
files from the first time window. If a data point is in a time consists of the current n and the values before it, while o
range of a sliding window, the sliding window corresponding is the index of the current window and the result model is
to the array of values plus one. If a data not only belong to the defined as Mo , which is calculated by ni (i = 1, 2 . . . o). In the
previous window but also belongs to the next window, then formula, p, d, and q are predefined parameters of the ARIMA
this one will be recorded by the two windows at the same time. model.
With the sliding window sliding, we can traverse the standard Next, we define a new symbol n̂i,j , which means that the
format file to obtain the standardized files which contain the value of the nj is predicted by model Mi . The SWARIMA
number of each sliding window. The achievement of pseudo model, for example, uses the current results to predict the
code is in Algorithm 1. value of the current point n̂o,o , and the value of the next point
n̂o,o+1 . And such a no will reflect the congestion trend in the
C. Modeling Stage subsequent steps. Finally, n̂o,o and n̂o,o+1 will be processed
The output of the previous stage will be the input for this by the next step in the system.
stage. In order to facilitate the description of the subsequent
steps, we define a current window, which indicates that the D. Calculating Score of the Change Process
previous window of statistics has been completed and output First, we assume that n is affected by the Gauss distribution,
to the window in this stage and the internal data is processed and the expectation and variance are necessary in the Gauss
by the 2-D data. SWARIMA is a method that can be used to distribution model. By the introduction of step C, we can
predict multidimensional data, while in this paper we only use assume that expectations are n̂o,o and n̂o−1,o , and the variance
2-D. Before the data regression, we observed that the overall can be calculated by
sample is large and the fluctuation of sliding window data. In
 2
order to reduce the impact of these data on the experimental Varo = r · Varo−1 + (1 − r) · no − n̂o,o . (4)
results, we will make an average of each value and the first
two values as the current value of the subsequent calculations. Equation (5) is the Gauss distribution formula with the mean
Subsequently, SWARIMA predicted the current and the next μ and variance σ 2 . Then we can use n̂o,o and Po−1,o , respec-
value and used a probabilistic model to compare the predictive tively, calculate the possible value of no , in order to facilitate
JIA et al.: DATA DRIVEN CONGESTION TRENDS PREDICTION OF URBAN TRANSPORTATION 585

Algorithm 3 Computing the Change Scores


Require: predicted value n̂o,o and n̂o−1,o , real value no ,
previous variance Varo−1 , threshold value To
Ensure: the predicted road state, which may be smooth,
warning, congestion, or mitigation
1: /*Compute the variance and mean of */
2: Varo ← r · Varo−1 + (1 − r) · (no − n̂o,o )2
3: /*compute the change score and make sure the state*/
4: Po,o ← Gaussian(μ = no,o , σ 2 = Varo )
5: Po−1,o ← Gaussian(μ
 = no−1,o−1
 , σ 2 = Varo )

6: CSo ← − ln( Po,o − Po−1,o ) 
7: FCSo ← (CSo−2 + CSo−1 + CSo ) ÷ 3
8: if CSo ≥ To then
9: stateo ← check(Wo )
Fig. 4. Real data in the whole day.
10: else
11: stateo ← normal
12: end if

the description, we use Po,o and Po−1,o to represent these two


values

  1 (x − μ)2
P x| μ,σ = √ exp −
2
. (5) (a) (b)
σ 2π 2σ 2
Then we use Po−1,o to express the difference between Fig. 5. Results and the original data for morning peak. (a) Process results
of SWARIMA. (b) Original real data.
Po,o and Po−1,o . At this point we can use the score calculation
formula to calculate the value of CSo
 
CSo = − ln Po,o − Po−1,o  . (6) understand the characteristics of the data, we directly use the
original data to draw a diagram, as shown in Fig. 4, the hor-
Taking into account the volatility of the data, this paper uses
izontal axis represents the window time of one day, and the
the average method to calculate the final score of FCSo , and
vertical axis represents the number of vehicles. As can be seen
its value is equal to the average value of CSo−2 , CSo−1 , and
from Fig. 4, the number of vehicles in the designated city in the
CSo . When the FCSo of the window Wo exceeds the thresh-
morning and evening peak will produce significant mutations
old, then it is called the time point of the window has changed
in the vicinity of some points in time, and these mutations
dramatically. In this paper, we mainly study the traffic con-
are located just in the time range of the morning and evening
gestion of urban traffic peak, so SWARIMA model will use
peak. What is more, we can notice that in many time windows,
the average change points of continuous several peak win-
there are some obvious data fluctuations which will influence
dows as threshold. And in the process of a large number of
our observation and the processing of data stream, so it is
experiments, we have used data from consecutive days and
necessary for this paper to use the mean value to preprocess
the same day of consecutive weeks to perform the experi-
more smoothly. In addition, we can observe that the formation
ment and verification. The pseudo code calculation process is
and remission of the morning peak is much more intense than
shown by Algorithm 3, and it consists of three steps: calcu-
the evening peak, which is also in line with our understand-
lating the variance using the Gauss distribution to calculate
ing of the actual situation. Therefore, the actual experiment in
the probability and finally using the SWARIMA model to
this paper is to calculate the morning and evening peak data,
calculate the FCS. If the score is greater than the threshold,
respectively, to ensure the scientific results.
then check whether the change point is the beginning or the
Fig. 4 is directly drawn from the original data. From this
end of the congestion, otherwise the state of the road has not
figure, we can roughly see the morning and evening peak
changed.
congestion time and congestion trends.
Fig. 5(a) and (b), respectively, are the morning results of the
IV. E XPERIMENTAL R ESULTS AND E VALUATION score calculation and original data map. The red line which is
The experimental data used in this paper is the intersection parallel to the horizontal axis is the threshold calculated, and
traffic data which is obtained from the designated city in China the calculation method is to take the average of the change of
for a month, and its contents include time, license plate num- 7:00 A . M . to 9:30 A . M . in the morning before several days.
ber, vehicle speed, vehicle direction, and intersection number. Based on the intersections of the threshold line and the original
And there are more than five million records every day. After image, we can draw some vertical lines that are perpendicular
the grouping step, we mainly use two important attributes: to the horizontal axis, and these intersections are the critical
1) arrival time and 2) the vehicle speed. In order to intuitively time we require.
586 IEEE INTERNET OF THINGS JOURNAL, VOL. 5, NO. 2, APRIL 2018

(a) (b) (a) (b)

Fig. 6. Results and the original data for evening peak. (a) Process results of Fig. 8. Results and the simulation for morning peak. (a) Process results of
SWARIMA. (b) Original real data. SWARIMA. (b) Original simulation data.

(a) (b)

Fig. 9. Results and the simulation for evening peak. (a) Process results of
SWARIMA. (b) Original simulation data.

17:23 P. M . are early warning period; between 17:23 P. M .


and 18:19 P. M . are congestion period; between 18:19 P. M .
and 19:36 P. M . are mitigation period. Although the mitigation
Fig. 7. Simulation data of a whole day. period is later than the experience conclusion, the prediction
results still provide a relatively reliable reference value in a
certain range of allowable error.
As shown in Fig. 5(a), we can observe that these sev- In addition, we also try to carry out simulation experiments
eral time points are 7:00 A . M ., 7:35 A . M ., 8:30 A . M ., and to verify the reliability of the SWARIMA method. On the
9:15 A . M ., while the four time points show that the number basis of experimental data, the data of simulation experiment
of low-speed vehicles vary most dramatically at these time is created by the Poisson distribution, which rely on the aver-
points. Before the full formation of congestion, the number age number of vehicles to distinguish between peak period
of low-speed vehicles has gone through a change from less to and other periods, and the mean of the peak is significantly
more, and then from more to less, which is like to the quadratic greater than the mean of other periods. Fig. 7 is drawn directly
curve. While before the complete remission, the number of from the simulated data. Fig. 8 is the original figure and the
low-speed vehicles change from less to more and then from processed figure for morning, and Fig. 9 is the original figure
more to less again. Obviously, this result is consistent with and the processed figure for evening.
the view in Section I. The former change process is called as From Fig. 8, we observe four time nodes are: 1) 7:00 A . M .;
the early warning period, the latter is known as the mitigation 2) 7:35 A . M .; 3) 8:52 A . M .; and 4) 9:27 A . M . The four time
period, and the real congestion is just located between these nodes in Fig. 9 are: 1) 17:00 P. M .; 2) 17:30 P. M .; 3) 19:42 P. M .;
two elastic intervals. In short, every formation of morning peak and 4) 21:27 P. M . The mitigation period is significantly later
will go through three periods: 1) early warning period; 2) con- than the actual data. The reason is that we deliberately delayed
gestion period; and 3) remission period. While a large number the evening peak in order to test the sensitivity of the SWAR
of experimental results also verify the existence of such a phe- model in the production of simulation data. Obviously, the
nomenon. From the point of life experience, the first interval improved model also has excellent performance in the pro-
can be considered as the beginning of congestion and the start- cessing of simulation data and can clearly determine in which
ing point is considered to be the starting point of all stages; time window the original data exists dramatic changes and
the second interval can be considered to be the end of the find out the corresponding time points. This also confirms the
congestion and the end point is considered to be the end point reliability of our experiment from another angle.
of all stages. So far, the prediction process of morning peak What is more, we conducted a large number of data exper-
can be fully displayed. iments. Figs. 10–14 are the processing figure and original
Fig. 6(a) and (b) is the evening time results of the score cal- data figure of the morning and evening peaks in five differ-
culation and original data map, respectively. The same to the ent sections. Next, we will explain in detail how to clearly
description of the last paragraph, we can observe the evening understand and distinguish the range and time interval of con-
peak congestion also contains three periods from Fig. 6(a), gestion. The first step, by smoothing process we process the
while the four time nodes are: 1) 16:41 P. M .; 2) 17:23 P. M .; original graph into a graph which can be analyzed. And this
3) 18:19 P. M .; and 4) 19:36 P. M . Between 16:41 P. M . and process is clearly described in the preprocessing and modeling
JIA et al.: DATA DRIVEN CONGESTION TRENDS PREDICTION OF URBAN TRANSPORTATION 587

(a) (b) (c) (d)

Fig. 10. Results and the original graphs of the first section. (a) Morning-processed. (b) Morning-original. (c) Evening-processed. (d) Evening-original.

(a) (b) (c) (d)

Fig. 11. Results and the original graphs of the second section. (a) Morning-processed. (b) Morning-original. (c) Evening-processed. (d) Evening-original.

(a) (b) (c) (d)

Fig. 12. Results and the original graphs of the third section. (a) Morning-processed. (b) Morning-original. (c) Evening-processed. (d) Evening-original.

(a) (b) (c) (d)

Fig. 13. Results and the original graphs of the fourth section. (a) Morning-processed. (b) Morning-original. (c) Evening-processed. (d) Evening-original.

stages; The second step, we need a threshold to divide the time the beginning of congestion mitigation and the end of conges-
point whether there is congestion. In this paper, we will use tion mitigation. Two images as a set, from the left image we
the change score FCS as our threshold, and this paper will can see that the red horizontal line is the threshold which we
calculate threshold of each half day to ensure the accuracy calculated and it has four intersections with the image. While
of the threshold in different images. Once the curve in the we identify these four time points on the right original data
image is below this threshold, we consider this region not image, it can be seen that the entire peak time is completely
congested. on the contrary, we believe that this region exists included by the beginning time point and the end time point.
congestion and conduct the third step analysis; The third step, In addition, each region of the predicted results not only has
a large number of data results show that there is a formation the elastic range of same characteristics, and the elastic range
period in the occurrence of each congestion and there is a mit- is also in line with the actual situation, which shows that our
igation period in the disappearance of each congestion. Both prediction scheme is accurate and effective.
these two periods have the start time point and end time point.
Therefore, after the complete processing, we can clearly find
that each threshold has four intersections with the image. And V. A PPLICATION
the meaning of these four intersection coordinates is the begin- In this paper, based on the theory of intelligent transporta-
ning of congestion formation, the end of congestion formation, tion, we process, analyze, and regress the real-time data stream
588 IEEE INTERNET OF THINGS JOURNAL, VOL. 5, NO. 2, APRIL 2018

(a) (b) (c) (d)

Fig. 14. Results and the original graphs of the fifth section. (a) Morning-processed. (b) Morning-original. (c) Evening-processed. (d) Evening-original.

obtained from the electronic sensing at the intersection, and


obtain the SWARIMA model curve of the peak time, and
predict the whole congestion process simply and effectively. In
a short time, we can smartly transform traffic information and
instant messaging data into predictive services so as to provide
it to users through a variety of ways. This smart service is of
great significance for the majority of city traffic, especially
for the roads in large cities and special sections. By effec-
tively forecasting peak time of the congestion, administrators
Fig. 15. Query page of smart traffic prediction.
can timely relieve traffic pressure, improve the traffic condi-
tions of the congested road sections, and reduce the occurrence
of traffic accident [24].
that is, the sliding window traverses the original data once to
obtain the preprocessed file. Subsequent processing using the
A. System Characteristics R language standard library function to ensure the operating
1) Universal: First, since the system gets data from the efficiency and low time complexity. The whole system does
city’s basis traffic facilities, such as the intersection of detec- not have a complex processing mechanism, and its processing
tion equipment, so we can easily get a lot of experimental raw results can be presented in a short time.
data. Second, due to the standardization of urban traffic mon- 5) Visualization: We believe that the visualization of the
itoring equipment, we can obtain unified data format. Third, final processing results is necessary. Although we can clearly
the large number of verification tests in this paper show that show the elasticity interval of congestion occurrence and mit-
SWARIMA model will not fluctuate obviously due to different igation, it is easier for people to understand and use our
intersection status. Finally, the processing results of the system results [25] in the form of Web pages. As shown in Fig. 15,
is convenient for secondary analysis and post-processing. we combine the results of the system with clocks and watches,
and use different colors to distinguish different road condi-
2) Accuracy: The data used in this paper is source from the
tions. When the users input the date, time, the road number,
real data sets. From a certain angle, the precision of the equip-
and the direction, the system will automatically generate a cir-
ment also determines our acquisition accuracy. But the results
cular chart on the right side of the page. The four mark points
show that our original data set has fully met the requirements
in the graph are the change points of the road condition. The
of the SWARIMA model. In addition, the system’s data anal-
four colors of green, yellow, red, and blue in the ring indicate
ysis algorithm and each step of the data processing methods
four traffic states: 1) smooth; 2) warning; 3) congestion; and
are effective, and the final results also reflect the accuracy of
4) mitigation.
the process.
3) Prediction and Elastic Intervals: Compared with other
systems, this is the most important difference of the system in B. Application
this paper. Real-time traffic system uses the vehicles position- There are many differences between the smart forecasting
ing equipment to carry out the statistics, and visually display system in this paper and the real-time traffic system which is
the real-time congestion curve. In this paper, mathematical widely used in the market. Details are as follows.
algorithm is used to analyze the change of the number of 1) Essential Difference: The real-time traffic system is
vehicles, so as to predict the approximate time of the conges- based on the wide range of GPS positioning and large data
tion, and give the warning time and mitigation time of two statistics to obtain the results. This approach requires a large
elastic intervals for people to refer to. Early prediction allows amount of terminal equipment support [26]. The current main-
the users to adjust the travel route as early as possible, avoid stream map applications, such as GaoDe map, Baidu map [27],
traffic congestion, reduce traffic congestion, and improve road are through GPS or other mobile devices to collect real-time
conditions. location and speed of the vehicle information and distinguish
4) Low-Complexity: This system is divided into two parts: the number of vehicles on a road with different colors. The
1) preprocessing and 2) data analysis. The preprocessing sec- color is red when there is a large number of vehicles or even
tion is optimized so that the complexity is reduced to O(n), congestion, while it is green when there is a small number
JIA et al.: DATA DRIVEN CONGESTION TRENDS PREDICTION OF URBAN TRANSPORTATION 589

TABLE II
C OMPARISON B ETWEEN P REDICTIVE R ESULTS AND BAIDU M AP

of vehicles. The smart prediction system uses the intersection Fig. 16. 5:08 P. M . level 1 in GaoDe maps.
monitoring equipment to collect and process the real-time data
stream, and calculates the results by SWARIMA. Whenever
our system receives the data stream, we calculate the muta-
tion value of the number of low-speed vehicles according to
the steps of preprocessing, regression analysis, calculating the
change scores and then use the peak prediction method to
predict the elastic interval of congestion occurrence and miti-
gation. And then we update the website with the latest results
for users to query.
2) Data Used for Processing Is Different: Compared with
real-time traffic system, smart prediction system only uses the Fig. 17. 5:24 P. M . level 2 in GaoDe maps.
low-speed vehicles obtained by the preprocessing to analyze.
Because the decrease of the mean value of vehicle speed and
the increase of the number of low-speed vehicles are the main more low-speed vehicles than before, and the congestion
characteristic of traffic congestion. This not only makes the may occur.
sample data in the SWARIMA system more lightweight, saves 2) Level 2: The time point is 8:03 A . M . The road in this
a lot of time cost of system operation, but also can accurately figure was red and was the first time from yellow to red,
grasp the information and alleviation of road congestion. which indicated that it had entered the congestion state.
3) System Results: Real-time traffic system determines 3) Level 3: The time point is 8:46 A . M . and the road in
whether there is traffic congestion from the results. If the this figure turned yellow-green again while no red color
number of vehicles stranded on the road for more than a cer- appeared after that time, indicating that there was no
tain threshold, it will be judged to be congested. Therefore, congestion situation from this moment.
a sudden change in color may occur at a certain time, and We compared the predicted results (as shown in Fig. 5) with
this also means it is difficult for people to know which stage the actual congestion in the section, as shown in Table II.
of the elastic interval does the current road belongs to. The The three elastic intervals given by our system are: 1) warn-
biggest advantage of smart prediction system is to predict ing time (from 7:00 A . M . to 7:35 A . M .); 2) congestion (from
the formation of congestion in advance and to provide a 7:35 A . M . to 8:30 A . M .); and 3) mitigation (from 8:30 A . M . to
reference elastic interval for the three stages of congestion 9:15 A . M .). What is more, the above three points of time also
process. The four points provided by the system are all abrupt fit in the elastic interval shown in Fig. 5, respectively. This
points in the change process of the low-speed vehicles’number. shows that, within allowable error range, our smart prediction
According to experience, we have the following division: the system successfully predicts the congestion situation of morn-
two endpoints of the warning phase indicate that the road ing peak before it happened, which means we can predict a
has a congestion trend and the congestion will form imme- warning interval before the dramatic number incresement of
diately. The two endpoints of the mitigation phase indicate low-speed vehicles, a congestion interval before the real con-
that the road has a mitigation trend and the road will be unob- gestion happens, and a mitigation interval before the complete
structed immediately. And the time between these two phases remission.
are the real congestion phase. This results is also better in Figs. 16–18 are the whole process of congestion formation
line with people’s understanding of congestion formation and and mitigation in a section. We also divide it into three levels.
mitigation. 1) Level 1: The time point in Fig. 16 is 5:08 P. M . and at this
4) Contradistinction: We took a screenshots of Baidu map time the intersection appeared yellow first time, which
and Gaode map displayed results of the same time, and com- said the number of low-speed vehicles were more and
pared it with the results of this paper. The specific contrast is the congestion may occur.
as follows. 2) Level 2: Fig. 17 corresponds to the time point of 5:24
Three real time points in Table II are the whole process of P. M . this is the first time that the intersection appeared
congestion formation and mitigation in a section. We divide it red and then continued to be red, said the intersection
into three levels. had been in a state of congestion already.
1) Level 1: The time point is 7:25 A . M ., the road was 3) Level 3: The time point in Fig. 18 is 6:29 P. M ., then no
yellow in this figure and the yellow color appears con- longer appeared red in this intersection, indicating that
tinuously from this moment, indicating that there were there was no congestion situation from this moment.
590 IEEE INTERNET OF THINGS JOURNAL, VOL. 5, NO. 2, APRIL 2018

[2] S. Liu, K. P. Triantis, and S. Sarangi, “A framework for evaluating


the dynamic impacts of a congestion pricing policy for a transportation
socioeconomic system,” Transp. Res. A Policy Pract., vol. 44, no. 8,
pp. 596–608, 2010.
[3] S. Wang, S. Djahel, Z. Zhang, and J. McManis, “Next road rerouting:
A multiagent system for mitigating unexpected urban traffic conges-
tion,” IEEE Trans. Intell. Transp. Syst., vol. 17, no. 10, pp. 2888–2899,
Oct. 2016.
[4] J. I. Levy, J. J. Buonocore, and K. von Stackelberg, “Evaluation of the
public health impacts of traffic congestion: A health risk assessment,”
Environ. Health, vol. 9, no. 1, pp. 1–12, 2010.
Fig. 18. 6:29 P. M . level 3 in GaoDe maps. [5] G. Kim, Y. S. Ong, T. Cheong, and P. S. Tan, “Solving the dynamic
vehicle routing problem under traffic congestion,” IEEE Trans. Intell.
TABLE III Transp. Syst., vol. 17, no. 8, pp. 2367–2380, Aug. 2016.
C OMPARISON B ETWEEN P REDICTIVE R ESULTS AND G AO D E M AP [6] A. Ahmad, R. Arshad, S. A. Mahmud, G. M. Khan, and
H. S. Al-Raweshidy, “Earliest-deadline-based scheduling to reduce
urban traffic congestion,” IEEE Trans. Intell. Transp. Syst., vol. 15, no. 4,
pp. 1510–1526, Aug. 2014.
[7] J.-D. Zhang, J. Xu, and S. S. Liao, “Aggregating and sampling meth-
ods for processing GPS data streams for traffic state estimation,”
IEEE Trans. Intell. Transp. Syst., vol. 14, no. 4, pp. 1629–1641,
Dec. 2013.
[8] M.-C. Tan, L.-B. Feng, and J.-M. Xu, “Traffic flow prediction based on
hybrid ARIMA and ANN model,” China J. Highway Transp., vol. 20,
We compared the predicted results (as shown in Fig. 6) with no. 4, pp. 118–121, 2007.
the actual congestion in the section, as shown in Table III. The [9] Y. Lv, Y. Duan, W. Kang, Z. Li, and F.-Y. Wang, “Traffic flow prediction
with big data: A deep learning approach,” IEEE Trans. Intell. Transp.
three elastic intervals given by our system are: 1) warning time Syst., vol. 16, no. 2, pp. 865–873, Apr. 2015.
(from 16:41 P. M . to 17:23 P. M .); 2) congestion (from 17:23 [10] C. Dong, C. Shao, D. Zhao, and Y. Liu, Short-Term Traffic Flow
P. M . to 18:19 P. M .); and 3) mitigation (from 18:19 P. M . to Forecasting Based on Periodicity Similarity Characteristics. Heidelberg,
Germany: Springer, 2013.
19:36 P. M .). Meanwhile, these time points also, respectively, fit [11] S. Noh, D. Shim, and M. Jeon, “Adaptive sliding-window strategy for
in the elastic interval shown in Fig. 6. This shows that, within vehicle detection in highway environments,” IEEE Trans. Intell. Transp.
allowable error range, our smart prediction system success- Syst., vol. 17, no. 2, pp. 323–335, Feb. 2016.
fully predicts the congestion situation of evening peak before [12] L. Pralon, G. Vasile, M. D. Mura, J. Chanussot, and N. Besic,
“Evaluation of ICA-based ICTD for PolSAR data analysis using a
it happens. By far, the smart prediction system is finished. sliding window approach: Convergence rate, Gaussian sources, and
spatial correlation,” IEEE Geosci. Remote Sens. Lett., vol. 54, no. 7,
pp. 4262–4271, Jul. 2016.
VI. C ONCLUSION [13] G. Sibley, L. Matthies, and G. Sukhatme, “Sliding window filter with
application to planetary landing,” J. Field Robot., vol. 27, no. 5,
In this paper, we studied the smart traffic prediction system pp. 587–608, 2010.
based on SWARIMA method. Inspired by an observation on [14] L. Jin, D. J. Chai, Y. K. Lee, and K. H. Ryu, “Mining frequent
itemsets over data streams with multiple time-sensitive sliding win-
the causes of traffic congestion, we found that most sections of dows,” in Proc. Int. Conf. Adv. Lang. Process. Web Inf. Technol., 2007,
the urban road networks had similar congestion characteristics. pp. 486–491.
Therefore, we propose a three-step model to solve the con- [15] F. Angiulli and F. Fassetti, “Detecting distance-based outliers in streams
gestion prediction problem: 1) we first use the preprocessor to of data,” in Proc. 16th ACM Conf. Inf. Knowl. Manag. (CIKM),
Lisbon, Portugal, Nov. 2007, pp. 811–820.
preprocess the original data directly; 2) the SWARIMA model [16] M. Khashei and M. Bijari, “A novel hybridization of artificial neural
is used to analyze the mathematical model and the regression networks and ARIMA models for time series forecasting,” Appl. Soft
analysis; and 3) the Gauss distribution model is used to calcu- Comput., vol. 11, no. 2, pp. 2664–2675, 2011.
[17] R. M. K. T. Rathnayaka, D. M. K. N. Seneviratna, W. Jianguo, and
late the change scores so as to further calculate three elastic H. I. Arumawadu, “A hybrid statistical approach for stock market fore-
intervals: 1) warning; 2) congestion; and 3) the mitigation. In casting based on artificial neural network and ARIMA time series
addition, this paper also carries out the simulation data test. models,” in Proc. Int. Conf. Behav. Econ. Socio Cult. Comput. (BESC),
Oct./Nov. 2015, pp. 54–60.
Experimental results show that our method are superior to real-
[18] S. M. T. Nezhad, M. Nazari, and E. A. Gharavol, “A novel DoS and
time traffic system in terms of the congestion trends prediction DDoS attacks detection algorithm using ARIMA time series model and
and providing the elastic range of congestion. Through the chaotic system in computer networks,” IEEE Commun. Lett., vol. 20,
long-term accumulation of data, the public can be more rea- no. 4, pp. 700–703, Apr. 2016.
[19] K. Yunus, T. Thiringer, and P. Chen, “ARIMA-based frequency-
sonable, according to the congestion prediction, to plan their decomposed modeling of wind speed time series,” IEEE Trans. Power
own itinerary and reduce energy consumption; managers are Syst., vol. 31, no. 4, pp. 2546–2556, Jul. 2016.
able to manage frequently congested sections and adjust the [20] C. A. V. Cardoso and G. L. Cruz, “Forecasting natural gas consump-
tion using ARIMA models and artificial neural networks,” IEEE Latin
traffic conditions comprehensively, thus helping to alleviate America Trans., vol. 14, no. 5, pp. 2233–2238, May 2016.
the traffic congestion in the city. [21] W. Zhang, Y. Q. Zhang, and X. Yang, “Model of multiple sea-
sonal ARIMA and its application to data in time series,” Acta
Academiae Medicine Militaris Tertiae, vol. 24, no. 8, pp. 955–957,
R EFERENCES 2002.
[22] B.-S. Chen, S.-C. Peng, and K.-C. Wang, “Traffic modeling, prediction,
[1] S. Wongcharoen and T. Senivongse, “Twitter analysis of road traffic and congestion control for high-speed networks: A fuzzy AR
congestion severity estimation,” in Proc. 13th Int. Joint Conf. Comput. approach,” IEEE Trans. Fuzzy Syst., vol. 8, no. 5, pp. 491–508,
Sci. Softw. Eng. (JCSSE), Jul. 2016, pp. 1–6. Oct. 2000.
JIA et al.: DATA DRIVEN CONGESTION TRENDS PREDICTION OF URBAN TRANSPORTATION 591

[23] G. P. Zhang, “Time series forecasting using a hybrid ARIMA and Lei Liu received the master’s and doctoral degrees from the University of
neural network model,” Neurocomputing, vol. 50, no. 1, pp. 159–175, Bradford, Bradford, U.K., in 2005 and 2010, respectively.
2003. He is currently an Associate Professor with Shandong University, Jinan,
[24] C. Xiaodong, Y. Zhaosheng, and F. Jinqiao, “A study on the prediction China. He has authored or co-authored over 40 papers in conference pro-
technology of urban road intersection congestion,” in Proc. Int. Conf. ceedings and journals. His current research interests include software-defined
Intell. Comput. Technol. Autom., 2009, pp. 456–459. networking, network performance engineering, quality of experience, and big
[25] S. Koshman, “Information visualization: Human-centered issues and per- data enabled networking.
spectives,” J. Assoc. Inf. Sci. Technol., vol. 60, no. 12, pp. 2588–2590,
2009.
[26] W.-H. Lee, S.-S. Tseng, and W.-Y. Shieh, “Collaborative real-time traf-
fic information generation and sharing framework for the intelligent
transportation system,” Inf. Sci. Int. J., vol. 180, no. 1, pp. 62–70, 2010. Lizhen Cui received the master’s and doctoral degrees from Shandong
[27] W. Pan, Z. Chen, and X. Li, “Decryption method of Baidu map’s University, Jinan, China, in 2002 and 2005, respectively.
coordinates based on artificial neural network,” Comput. Eng. Appl., He is currently a Professor with Shandong University. He has authored
pp. 110–113, 2014. or co-authored over 50 papers in conference proceedings and journals. His
current research interests include big data management and big data analytics,
big data artificial intelligence, service computing and collaborative computing,
and software architecture and technology in the cloud.
Rui Jia is currently pursuing the undergraduate degree with Shandong
University, Jinan, China.
His current research interests include data mining and big data analytics.

Yuliang Shi received the master’s degree from Shandong University, Jinan,
China, in 2003, and the doctoral degree from Fudan University, Shanghai,
Pengcheng Jiang received the B.E. degree from Shandong University, Jinan, China, in 2006.
China, in 2015, where he is currently pursuing the M.E. degree with the He is currently a Professor with Shandong University. He has authored or
School of Computer Science. co-authored over 40 papers in conference proceedings and journals. His cur-
His current research interests include machine learning, data analysis, and rent research interests include cloud computing, large-scale data management,
artificial intelligence. and privacy protection.

You might also like