Professional Documents
Culture Documents
Data Driven Congestion Trends Prediction of Urban Transportation
Data Driven Congestion Trends Prediction of Urban Transportation
Data Driven Congestion Trends Prediction of Urban Transportation
Abstract—Smart traffic prediction system provides significant solving traffic congestion [3], [5], [6], but the smart prediction
benefits in solving the city traffic congestion. However, existing system has its unique contribution in alleviating traffic con-
smart transportation system needs a lot of real-time traffic data gestion. Assuming that people have arrived at the congestion
and accurate location information to display the traffic condi-
tion. We hope that we can use the data which is easy to be section before updating the real-time congestion information,
obtained, and then predict a reliable congestion time. To address this congestion information has little significance for peo-
this problem, this paper studied a smart traffic forecasting system ple who are in the congestion. If it is possible to predict a
based on SWARIMA model. The system includes three steps: reliable congestion time or to provide a congestion elastic
1) use the sliding windows to calculate and process real-time data interval in advance, it will be of great benefit to ease con-
stream; 2) establish the SWARIMA model and make regression
analysis; and 3) from a statistical point of view, calculate the gestion and reduce travel time. A simple and efficient smart
elastic interval and predict the congestion trend. Our system is real-time traffic prediction system will play an important role
capable of accepting the real-time traffic data stream for the to the development of cities. The smart traffic system will
congestion prediction, in addition, we reduce the actual running be not only beneficial to the city’s energy-saving and emis-
parameters to three attributes: 1) speed; 2) time; and 3) location sion reduction, but also be able to make a more reasonable
information. When faced with the challenges of real-time traffic
congestion, the system can timely and effectively calculate the and efficient plan for people’s peak travel, meanwhile it can
congestion trends and provide three reliable elastic intervals: provide constructive suggestions for the road networks of the
1) warning; 2) congestion; and 3) mitigation, which has sig- city. This paper is working on studying a simple and effective
nificance to improve traffic condition and alleviate urban road traffic prediction system, making it more effectively to predict
congestion. the whole process of congestion in advance and provide the
Index Terms—Autoregressive integrated moving aver- elastic range, so that people can know the approximate for-
age (ARIMA) model, congestion trends prediction, data mining, mation time of congestion and then reduce the possibility of
sliding window, smart traffic. congestion.
There are many ways to deal with short term traffic data
stream at present, for example, Zhang et al. [7] focused on
I. I NTRODUCTION the aggregating and sampling methods for processing GPS
ITH the acceleration of urbanization process and the data stream for traffic state estimation; Tan et al. [8] uti-
W limitations of existing road networks in city, people’s
demand for an unobstructed road has been surpassed the
lized the autoregressive integrated moving average (ARIMA)
to predict traffic stream; while Lv et al. [9] predicted the traf-
expansion speed of urban road traffic networks. So the traf- fic stream with deep learning. The traffic congestion is not
fic congestion during the peak period has become a great a simple ON–OFF problem, because its formation has a cer-
challenge for every city [1], which causes a large amount of tain trend and a number of points of mutation. But most of the
economic losses to the local city and will directly affect the predictions cannot give an elastic interval for adjustment effec-
city’s image and development of the city [2]. A 2011 U.S. tively. In addition, the traffic flow of urban roads has certain
urban transport report shows that the travel time delays and periodicity and weak stationarity [10]. For example, assuming
fuel consumption are estimated to be $121 billion in economic a section in city often has traffic jams. If it congests during
losses and it will increase to $190 billion in the future [3]. the morning peak on Monday, while at the same time of the
In China, the most influential of the top ten cities every day next Monday, it will also congest, what is more its severity
cause economic losses estimated at $1 billion because of traf- also has a certain similarity. In this paper, we found that most
fic congestion. In addition to the United States and China, of the vehicles in the traffic congestion are low-speed, and the
traffic congestion in the rest of the world is also a severe low-speed vehicles are also the main body of congestion. So
challenge [4]. There are many researches on alleviating and most of the traffic congestions are formed by rapid increase-
ment of low-speed vehicles at the beginning and while the real
Manuscript received December 31, 2016; revised May 15, 2017; accepted congestion will be formed, the number of low speed vehicles
June 3, 2017. Date of publication June 15, 2017; date of current version
April 10, 2018. This work was supported by the National Science Foundation will tend to be stable. However, the congestion relief is just the
of China under Grant 61402262 and Grant 61572295. (Corresponding author: opposite of this phenomenon, it is started by a relatively rapid
Lei Liu.) reduction in the number of low-speed vehicles, and when the
The authors are with the School of Computer Science and Technology,
Shandong University, Jinan 250101, China (e-mail: l.liu@sdu.edu.cn). real congestion will be over, the low-speed vehicles will tend
Digital Object Identifier 10.1109/JIOT.2017.2716114 to be stable again in a period of time.
2327-4662 c 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
582 IEEE INTERNET OF THINGS JOURNAL, VOL. 5, NO. 2, APRIL 2018
TABLE I
L IST OF N OTATIONS sliding window. While the sliding window detects the data
stream, it filter out the points that not match the window and
perform the operation of adding data. When the detection is
completed, the sliding window with data stream will speed for-
ward automatically. Repeat the above process to continuously
detect the data stream until the end.
2) Second Stage (Modeling Stage): When the detection of
one window is completed, SWARIMA uses the information
of this window to update the regression model, and tries to
use the new model to predict the attribute information of the
current point and the next point.
3) Third Stage (Calculating the Score of This Change
Process): First, we obtain the prediction scores of the current
data point, which is calculated by the regression model of the
previous window. Second, we calculate the score of current
point in the current window. And finally compare the differ-
ence between the two scores. In this paper, the probability
model is used to evaluate the difference. If the score exceeds
a certain threshold, the corresponding time of the current point
is considered to be a drastic change.
B. Preprocessing Stage
In this paper, the raw data which is used to deal with is
source from the measurements at the intersections in a desig-
nated city in China. The raw data contains five basic attributes:
1) arrived time; 2) the license plate number; 3) vehicle speed;
4) vehicle arrival direction; and 5) crossing number. In order
to convert the raw data into preprocessing file which is avail-
able for the use of the second stage expediently, we design a
preprocessor to complete it. In Fig. 3, the operation of the pre-
processor is divided into four steps: 1) grouping; 2) removal;
3) selection; and 4) statistics, while the grouping and removal
can be implemented at a same time. In order to save time
cost, each data point is read only once, and the processor use
Fig. 2. Flow diagram of SWARIMA model. online processing to deal with a large number of real-time data
streams. Specific steps are as follows.
1) First Step (Grouping): In order to simplify the subse-
are the noise error terms. Obviously, if p is equal to 0, ARIMA quent statistical work, we first group the raw data. When the
model will be simplified as MA model; if q is equal to 0, data stream arrives at a certain time, the preprocessor first
ARIMA model will be simplified as AR model [23]. groups the data according to the attributes of the data, such as
The ARIMA model shows well in prediction, so many vehicle arrival direction, crossing number, and so on.
researchers have used it for time series prediction. Because the 2) Second Step (Removal): After grouping, there are still
ARIMA model can express the difference between the data, a lot of attributes redundant in the data set. The removal step
we use it to study the change of sensitive data in real-time not only saves the time cost of the follow-up steps but also
data stream and then make accurate judgment and prediction reduces the storage cost of the system. In fact, we only need
in this paper. to read the useful data properties into the fitting array, while
ignoring other irrelevant attributes or the attributes of small
impact.
III. SWARIMA M ODEL AND A LGORITHM D ETAILS
3) Third Step (Selection): This step, actually, can be car-
A. SWARIMA Model Introduction ried out with the removal. In this paper, we mainly study the
In this paper, we propose a new method named SWARIMA, number of low speed vehicles that can respond to the change
which includes three stages that will be detailed in of congestion. Therefore, it is necessary for us to filter the
Sections II–IV. And Fig. 2 shows the flow diagram model high speed vehicles data and some strange data points. Thus,
of SWARIMA in detail. only the data points that meet the requirements of the win-
1) First Stage (Pretreatment Stage): SWARIMA process dow speed are obtained, so as to generate the standard format
the raw traffic data stream through a preprocessor and extract files. While this step is also the most important step in the
features. First, we obtain a current point as the beginning of preprocessor.
584 IEEE INTERNET OF THINGS JOURNAL, VOL. 5, NO. 2, APRIL 2018
Fig. 6. Results and the original data for evening peak. (a) Process results of Fig. 8. Results and the simulation for morning peak. (a) Process results of
SWARIMA. (b) Original real data. SWARIMA. (b) Original simulation data.
(a) (b)
Fig. 9. Results and the simulation for evening peak. (a) Process results of
SWARIMA. (b) Original simulation data.
Fig. 10. Results and the original graphs of the first section. (a) Morning-processed. (b) Morning-original. (c) Evening-processed. (d) Evening-original.
Fig. 11. Results and the original graphs of the second section. (a) Morning-processed. (b) Morning-original. (c) Evening-processed. (d) Evening-original.
Fig. 12. Results and the original graphs of the third section. (a) Morning-processed. (b) Morning-original. (c) Evening-processed. (d) Evening-original.
Fig. 13. Results and the original graphs of the fourth section. (a) Morning-processed. (b) Morning-original. (c) Evening-processed. (d) Evening-original.
stages; The second step, we need a threshold to divide the time the beginning of congestion mitigation and the end of conges-
point whether there is congestion. In this paper, we will use tion mitigation. Two images as a set, from the left image we
the change score FCS as our threshold, and this paper will can see that the red horizontal line is the threshold which we
calculate threshold of each half day to ensure the accuracy calculated and it has four intersections with the image. While
of the threshold in different images. Once the curve in the we identify these four time points on the right original data
image is below this threshold, we consider this region not image, it can be seen that the entire peak time is completely
congested. on the contrary, we believe that this region exists included by the beginning time point and the end time point.
congestion and conduct the third step analysis; The third step, In addition, each region of the predicted results not only has
a large number of data results show that there is a formation the elastic range of same characteristics, and the elastic range
period in the occurrence of each congestion and there is a mit- is also in line with the actual situation, which shows that our
igation period in the disappearance of each congestion. Both prediction scheme is accurate and effective.
these two periods have the start time point and end time point.
Therefore, after the complete processing, we can clearly find
that each threshold has four intersections with the image. And V. A PPLICATION
the meaning of these four intersection coordinates is the begin- In this paper, based on the theory of intelligent transporta-
ning of congestion formation, the end of congestion formation, tion, we process, analyze, and regress the real-time data stream
588 IEEE INTERNET OF THINGS JOURNAL, VOL. 5, NO. 2, APRIL 2018
Fig. 14. Results and the original graphs of the fifth section. (a) Morning-processed. (b) Morning-original. (c) Evening-processed. (d) Evening-original.
TABLE II
C OMPARISON B ETWEEN P REDICTIVE R ESULTS AND BAIDU M AP
of vehicles. The smart prediction system uses the intersection Fig. 16. 5:08 P. M . level 1 in GaoDe maps.
monitoring equipment to collect and process the real-time data
stream, and calculates the results by SWARIMA. Whenever
our system receives the data stream, we calculate the muta-
tion value of the number of low-speed vehicles according to
the steps of preprocessing, regression analysis, calculating the
change scores and then use the peak prediction method to
predict the elastic interval of congestion occurrence and miti-
gation. And then we update the website with the latest results
for users to query.
2) Data Used for Processing Is Different: Compared with
real-time traffic system, smart prediction system only uses the Fig. 17. 5:24 P. M . level 2 in GaoDe maps.
low-speed vehicles obtained by the preprocessing to analyze.
Because the decrease of the mean value of vehicle speed and
the increase of the number of low-speed vehicles are the main more low-speed vehicles than before, and the congestion
characteristic of traffic congestion. This not only makes the may occur.
sample data in the SWARIMA system more lightweight, saves 2) Level 2: The time point is 8:03 A . M . The road in this
a lot of time cost of system operation, but also can accurately figure was red and was the first time from yellow to red,
grasp the information and alleviation of road congestion. which indicated that it had entered the congestion state.
3) System Results: Real-time traffic system determines 3) Level 3: The time point is 8:46 A . M . and the road in
whether there is traffic congestion from the results. If the this figure turned yellow-green again while no red color
number of vehicles stranded on the road for more than a cer- appeared after that time, indicating that there was no
tain threshold, it will be judged to be congested. Therefore, congestion situation from this moment.
a sudden change in color may occur at a certain time, and We compared the predicted results (as shown in Fig. 5) with
this also means it is difficult for people to know which stage the actual congestion in the section, as shown in Table II.
of the elastic interval does the current road belongs to. The The three elastic intervals given by our system are: 1) warn-
biggest advantage of smart prediction system is to predict ing time (from 7:00 A . M . to 7:35 A . M .); 2) congestion (from
the formation of congestion in advance and to provide a 7:35 A . M . to 8:30 A . M .); and 3) mitigation (from 8:30 A . M . to
reference elastic interval for the three stages of congestion 9:15 A . M .). What is more, the above three points of time also
process. The four points provided by the system are all abrupt fit in the elastic interval shown in Fig. 5, respectively. This
points in the change process of the low-speed vehicles’number. shows that, within allowable error range, our smart prediction
According to experience, we have the following division: the system successfully predicts the congestion situation of morn-
two endpoints of the warning phase indicate that the road ing peak before it happened, which means we can predict a
has a congestion trend and the congestion will form imme- warning interval before the dramatic number incresement of
diately. The two endpoints of the mitigation phase indicate low-speed vehicles, a congestion interval before the real con-
that the road has a mitigation trend and the road will be unob- gestion happens, and a mitigation interval before the complete
structed immediately. And the time between these two phases remission.
are the real congestion phase. This results is also better in Figs. 16–18 are the whole process of congestion formation
line with people’s understanding of congestion formation and and mitigation in a section. We also divide it into three levels.
mitigation. 1) Level 1: The time point in Fig. 16 is 5:08 P. M . and at this
4) Contradistinction: We took a screenshots of Baidu map time the intersection appeared yellow first time, which
and Gaode map displayed results of the same time, and com- said the number of low-speed vehicles were more and
pared it with the results of this paper. The specific contrast is the congestion may occur.
as follows. 2) Level 2: Fig. 17 corresponds to the time point of 5:24
Three real time points in Table II are the whole process of P. M . this is the first time that the intersection appeared
congestion formation and mitigation in a section. We divide it red and then continued to be red, said the intersection
into three levels. had been in a state of congestion already.
1) Level 1: The time point is 7:25 A . M ., the road was 3) Level 3: The time point in Fig. 18 is 6:29 P. M ., then no
yellow in this figure and the yellow color appears con- longer appeared red in this intersection, indicating that
tinuously from this moment, indicating that there were there was no congestion situation from this moment.
590 IEEE INTERNET OF THINGS JOURNAL, VOL. 5, NO. 2, APRIL 2018
[23] G. P. Zhang, “Time series forecasting using a hybrid ARIMA and Lei Liu received the master’s and doctoral degrees from the University of
neural network model,” Neurocomputing, vol. 50, no. 1, pp. 159–175, Bradford, Bradford, U.K., in 2005 and 2010, respectively.
2003. He is currently an Associate Professor with Shandong University, Jinan,
[24] C. Xiaodong, Y. Zhaosheng, and F. Jinqiao, “A study on the prediction China. He has authored or co-authored over 40 papers in conference pro-
technology of urban road intersection congestion,” in Proc. Int. Conf. ceedings and journals. His current research interests include software-defined
Intell. Comput. Technol. Autom., 2009, pp. 456–459. networking, network performance engineering, quality of experience, and big
[25] S. Koshman, “Information visualization: Human-centered issues and per- data enabled networking.
spectives,” J. Assoc. Inf. Sci. Technol., vol. 60, no. 12, pp. 2588–2590,
2009.
[26] W.-H. Lee, S.-S. Tseng, and W.-Y. Shieh, “Collaborative real-time traf-
fic information generation and sharing framework for the intelligent
transportation system,” Inf. Sci. Int. J., vol. 180, no. 1, pp. 62–70, 2010. Lizhen Cui received the master’s and doctoral degrees from Shandong
[27] W. Pan, Z. Chen, and X. Li, “Decryption method of Baidu map’s University, Jinan, China, in 2002 and 2005, respectively.
coordinates based on artificial neural network,” Comput. Eng. Appl., He is currently a Professor with Shandong University. He has authored
pp. 110–113, 2014. or co-authored over 50 papers in conference proceedings and journals. His
current research interests include big data management and big data analytics,
big data artificial intelligence, service computing and collaborative computing,
and software architecture and technology in the cloud.
Rui Jia is currently pursuing the undergraduate degree with Shandong
University, Jinan, China.
His current research interests include data mining and big data analytics.
Yuliang Shi received the master’s degree from Shandong University, Jinan,
China, in 2003, and the doctoral degree from Fudan University, Shanghai,
Pengcheng Jiang received the B.E. degree from Shandong University, Jinan, China, in 2006.
China, in 2015, where he is currently pursuing the M.E. degree with the He is currently a Professor with Shandong University. He has authored or
School of Computer Science. co-authored over 40 papers in conference proceedings and journals. His cur-
His current research interests include machine learning, data analysis, and rent research interests include cloud computing, large-scale data management,
artificial intelligence. and privacy protection.