Professional Documents
Culture Documents
1679 1779 1 SM
1679 1779 1 SM
1679 1779 1 SM
*
Corresponding Author: Huiyu Sun; E-mail: hs2879@nyu.edu
DOI: 10.3966/160792642018031902027
592 Journal of Internet Technology Volume 19 (2018) No.2
Figure 4. Uber pickup numbers by zones inside Manhattan (top), Brooklyn (middle) and Queens (bottom) in 2015
594 Journal of Internet Technology Volume 19 (2018) No.2
6 Applications on the Mobile Sensor technologies including Spark, Hive and MapReduce in
Network order to understand traffic and travel patterns of taxi
and Uber pickups. Each taxi and Uber acts as a sensor
node on the interconnected mobile sensor network over
6.1 Trip Suggestion NYC. We divided the city into fine-sized regions and
determined the optimal region size to cover around
For riders, given any location in NYC, we can
10,000 square meters. We then used logistic regression
suggest to them whether yellow, green or Uber is the
to train classifiers on the 2014-2015 taxi and Uber
optimal choice in terms of wait time, distance, and
dataset with a training/testing data split of 80%/20%.
chance of pickup. For taxi drivers, we can tell them to
Each trip is associated with a region and is classified
spatially avoid areas dominated by Uber and
into one of three categories based on the most active
temporally focus on regions with higher historical
service inside the region: Yellow taxi, Green taxi or
pickups. For Uber drivers, they can spatially focus on
Uber. For example, a trip classified as Yellow taxi
zones with higher number of historical pickups. Given
means that in the region surrounding the trip, Yellow
any location point in New York City, we use our
taxi is the most active service. Our classifier achieved
classifier to output a category (yellow, green or Uber)
an average accuracy of 85% tested on the 2014-2015
reflecting which service is dominating the given
monthly taxis and Uber dataset with the highest
point’s surrounding region (approximately 10,000
accuracy of 90% achieved on the April 2014 dataset.
square meters in size), allowing suggestions of service
We next used our classifier to verify Uber’s rise
based on locations which is further exploited in the
directly resulted in the decline of taxi pickups. Finally,
next sections.
we proposed applications describing how our analysis
Figure 7 shows our proposed user interface for the
results can provide actionable insights to users of our
web application. The input of the model is an address
services. Our trip suggestion application integrates the
of any location given by user in NYC, and the output is
backend system producing analytics and classification
a suggestion to user on which transportation service
results with a web front-end to provide users with
would be the optimal choice at that specific time and
meaningful visualizations and make recommendations
location in terms of historical pickup frequency. More
on future rides.
specifically, the front-end enable users to type in an
address in the search bar, and we used Google
Geocoding API to translate the address to a longitude- References
latitude coordinate that would be sent to the backend
classifier. After the backend receives the given [1] J. A. Deri, F. Franchetti, J. M. F. Moura, Big Data
coordinate, it will take an adequate amount of points Computation of Taxi Movement in New York City, 2016
around the given point and the model will take those IEEE International Conference on Big Data, Washington, DC,
random points as input to generate prediction for each 2016, pp. 2616-2625.
point. After running the classification model on [2] J. Freire, C. Silva, H. Vo, H. Doraiswamy, N. Ferreira, J.
random points, each point gets a predicted label and by Poco, Riding from Urban Data to Insight Using New York
summarizing which label has the highest count, the City Taxis, IEEE Technical Committee on Data Engineering,
model can then claim which company historically had Vol. 37, No. 4, pp. 43-55, December, 2014.
the highest pickups around that location. [3] S. Osswald, N. Brueckel, C. Brickwedde, M. Lienkamp, M.
Schoell, Taxi Checker: A Mobile Application for Real-Time
Taxi Fare Analysis, Adjunct Proceedings of the 6th
International Conference on Automotive User Interfaces and
Interactive Vehicular Applications, Seattle, WA, 2014, pp. 1-
6.
[4] V. Salnikov, R. Lambiotte, A. Noulas, C. Mascolo,
OpenStreetCab: Exploiting Taxi Mobility Patterns in New
York City to Reduce Commuter Costs, https://arxiv.org/abs/
1503.03021.
[5] J. W. Y. Chan, V. L. N. Chang, W. K. Lau, K. T. Law, C. J.
Lei, Taxi App Market Analysis in Hong Kong, Journal of
Figure 7. User interface of the map page of our
Economics, Business and Management, Vol. 4, No. 3, pp.
proposed web service application.
239-242, March, 2016.
[6] S. Ma, Y. Zheng, O. Wolfson, Real-Time City-Scale Taxi
7 Conclusion Ridesharing, IEEE Transactions on Knowledge And Data
Engineering, Vol. 27, No. 7, pp. 1782-1795, July, 2015.
We first conducted data analytics and visualizations [7] L. Poulsen, D. Dekkers, N. Wagennar, W. Snijders, B.
on the NYC taxi and Uber trip dataset using big data Lewinsky, R. Mukkamala, R. Vatrapu, Green Cabs vs. Uber
Big Data Trip Classification on the New York City Taxi and Uber Sensor Network 597
in New York City, Proceedings of the 2016 IEEE City Taxi Riders and Drivers, Proceedings of the 5th IEEE
International Congress on Big Data (BigData Congress), San International Conference on Mobile Services, San Francisco,
Francisco, CA, 2016, pp. 222-229. CA, 2016, pp. 57-64.
[8] L. Chen, A. Mislove, C. Wilson, Peeking Beneath the Hood [23] G. Pan, G. Qi, Z. Wu, D. Zhang, S. Li, Land-use
of Uber, Proceedings of the 2015 ACM Conference on Classification Using Taxi GPS Traces, IEEE Transactions on
Internet Measurement Conference, Tokyo, Japan, 2015, pp. Intelligent Transportation Systems, Vol. 14, No. 1, pp. 113-
495-508. 123, March, 2013.
[9] J. Cramer, A. B. Krueger, Disruptive Change in the Taxi [24] J. Yuan, Y. Zheng, C. Zhang, W. Xie, X. Xie, G. Sun, Y.
Business: The Case of Uber, The American Economic Review, Huang, T-drive: Driving Directions Based on Taxi
Vol. 106, No. 5, pp. 177-182, May, 2016. Trajectories, Proceedings of the 18th SIGSPATIAL
[10] J. A. Deri, J. M. F. Moura, Taxi Data in New York City: A International Conference on Advances in Geographic
Network Perspective, 49th Asilomar Conference on Signals, Information Systems, San Jose, CA, 2010, pp. 99-108.
Systems and Computers, Pacific Grove, CA, 2015, pp. 1829- [25] Apache Hadoop, http://hadoop.apache.org.
1833. [26] V. K. Vavilapalli, A. C. Murthy, C. Douglas, S. Agarwal, M.
[11] R. Ganti, I. Mohomed, R. Raghavendra, A. Ranganathan, Konar, R. Evans, T. Graves, J. Lowe, H. Shah, S. Seth, B.
Analysis of Data from a Taxi Cab Participatory Sensor Saha, C. Curino, O. O’Malley, S. Radia, B. Reed, E.
Network, MobiQuitous 2011, Copenhagen, Denmark, 2012, Baldeschwieler, Apache Hadoop Yarn: Yet Another Resource
pp. 197-208. Negotiator, Proceedings of the 4th Annual Symposium on
[12] J. Aslam, S. Lim, X. Pan, D. Rus, City-Scale Traffic Cloud Computing, Santa Clara, CA, 2013, Article No. 5.
Estimation from a Roving Sensor Network, SenSys'12, [27] Apache Spark, http://spark.apache.org.
Toronto, Canada, 2012, pp. 141-154. [28] J. Dean, S. Ghemawat, MapReduce: Simplified Data
[13] R. Mizouni, M. E. Barachi, Mobile Phone Sensing as a Processing on Large Clusters, Communications of the ACM,
Service: Business Model and Use Cases, Seventh Vol. 51, No. 1, pp. 107-113, January, 2008.
International Conference on Next Generation Mobile Apps, [29] Apache Hive, http://hive.apache.org.
Services and Technologies, Prague, Czech Republic, 2013, pp. [30] Q. Liu, W. Cai, J. Shen, Z. Fu, X. Liu, N. Linge, A
116-121. Speculative Approach to Spatial-temporal Efficiency with
[14] L. Zhang, J. Liu, H. Jiang, Y. Guan, SensTrack: Energy- Multi-objective Optimization in a Heterogeneous Cloud
Efficient Location Tracking With Smartphone Sensors, IEEE Environment, Security and Communication Networks, Vol. 9,
Sensors Journal, Vol. 13, No. 10, pp. 3775-3784, October, No. 17, pp. 4002-4012, November, 2016.
2013. [31] Z. Fu, K. Ren, J. Shu, X. Sun, F. Huang, Enabling
[15] H. Sun, S. McIntosh, Phone Call Detection Based on Personalized Search over Encrypted Outsourced Data with
Smartphone Sensor Data, 2nd International Conference on Efficiency Improvement, IEEE Transactions on Parallel and
Cloud Computing and Security, Nanjing, China, 2016, pp. Distributed Systems, Vol. 27, No. 9, pp. 2546-2559,
284-295. September, 2016.
[16] H. Sun, S. McIntosh, B. Li, Detection of In-Progress Phone [32] Z. Fu, X. Sun, Q. Liu, L. Zhou, J. Shu, Achieving Efficient
Calls Using Smartphone Proximity and Orientation Sensors, Cloud Search Services: Multi-keyword Ranked Search over
International Journal of Sensor Networks, Vol. 25, No. 2, pp. Encrypted Cloud Data Supporting Parallel Computing, IEICE
104-114, 2017. Transactions on Communications, Vol. E98-B, No. 1, pp.
[17] B. Gu, V. S. Sheng, A Robust Regularization Path Algorithm 190-200, January, 2015.
for v-Support Vector Classification, IEEE Transactions on [33] Z. Fu, X. Wu, C. Guan, X. Sun, K. Ren, Towards Efficient
Neural Networks and Learning Systems, Vol. 28, No. 5, pp. Multi-keyword Fuzzy Search over Encrypted Outsourced
1241-1248, May, 2017. Data with Accuracy Improvement, IEEE Transactions on
[18] B. Gu, X. Sun, V. S. Sheng, Structural Minimax Probability Information Forensics and Security, Vol. 11, No. 12, pp.
Machine, IEEE Transactions on Neural Networks and 2706-2716, December, 2016.
Learning Systems, Vol. 28, No. 7, pp. 1646-1656, July, 2017. [34] Z. Xia, X. Wang, X. Sun, B. Wang, Steganalysis of Least
[19] J. Li, X. Li, B. Yang, X. Sun, Segmentation-based Image Significant Bit Matching Using Multi-order Differences,
Copy-move Forgery Detection Scheme, IEEE Transactions Security and Communication Networks, Vol. 7, No. 8, pp.
on Information Forensics and Security, Vol. 10, No. 3, pp. 1283-1291, August, 2014.
507-518, March, 2015. [35] Z. Xia, X. Wang, L. Zhang, Z. Qin, X. Sun, K. Ren, A
[20] C. Yuan, X. Sun, R. Lv, Fingerprint Liveness Detection Privacy-preserving and Copy-deterrence Content-based
Based on Multi-scale LPQ and PCA, China Communications, Image Retrieval Scheme in Cloud Computing, IEEE
Vol. 13, No. 7, pp. 60-65, July, 2016. Transactions on Information Forensics and Security, Vol. 11,
[21] Y. Zhang, X. Sun, B. Wang, Efficient Algorithm for K-barrier No. 11, pp. 2594-2608, November, 2016.
Coverage Based on Integer Linear Programming, China [36] X. Meng, J. Bradley, B. Yavuz, E. Sparks, S. Venkatarama, D.
Communications, Vol. 13, No. 7, pp. 16-23, July, 2016. Liu, J. Freeman, D. B. Tsai, M. Amde, S. Owen, D. Xin, R.
[22] H. Sun, S. McIntosh, Big Data Mobile Services for New York Xin, M. J. Franklin, R. Zadeh, M. Zaharia, A. Talwalkar,
598 Journal of Internet Technology Volume 19 (2018) No.2
Mllib: Machine Learning in Apache Spark, The Journal of in virtualization, data center energy optimization, and
Machine Learning Research, Vol. 17, No.1, pp. 1235-1241, security research. She is Guest Editor of Transactions
April, 2016. on Services Computing.
[37] R. Sen, A. Maurya, B. Raman, R. Mahta, R. Kalyanaraman, N.
Vankadhara, S. Roy, P. Sharma, Kyun Queue: A Sensor Yi Cao received the BE degree in
Network System to Monitor Road Traffic Queues, Nanjing University of Information
Proceedings of the 10th ACM Conference on Embedded Science & Technology in 2016, China.
Network Sensor Systems, Toronto, Canada, 2012, pp. 127-140. He is currently working towards the
[38] C. Wenjie, C. Lifeng, C. Zhanglong, T. Shiliang, A Realtime MS degree in Nanjing University of
Dynamic Traffic Control System Based on Wireless Sensor Information Science & Technology,
Network, ICPP 2005 Workshops, Oslo, Norway, 2005, pp. China. His research interest includes network and
258-264. information security.
[39] J. Eriksson, L. Girod, B. Hull, R. Newton, S. Madden, H.
Balakrishnan, The Pothole Patrol: Using a Mobile Sensor
Network for Road Surface Monitoring, Proceedings of the
6th International Conference on Mobile Systems,
Applications, and Services, Breckenridge, CO, 2008, pp. 29-
39.
[40] H. Sun, R. Grishman, Y. Wang, Active Learning Based
Named Entity Recognition and Its Application in Natural
Language Coverless Information Hiding, Journal of Internet
Technology, Vol. 18, No. 2, pp. 443-451, March, 2017.
[41] H. Sun, R. Grishman, Y. Wang, Domain Adaptation with
Active Learning for Named Entity Recognition, 2nd
International Conference on Cloud Computing and Security,
Nanjing, China, 2016, pp. 611-622.
[42] B. Chen, H. Shu, G. Coatrieux, G. Chen, X. Sun, J. L.
Coatrieux, Color Image Analysis by Quaternion-type
Moments, Journal of Mathematical Imaging and Vision, Vol.
51, No. 1, pp. 124-144, January, 2015.
[43] Z. Zhou, Y. Wang, Q. J. Wu, C. N. Yang, X. Sun, Effective
and Efficient Global Context Verification for Image Copy
Detection, IEEE Transactions on Information Forensics and
Security, Vol. 12, No. 1, pp. 48-63, January, 2017.
Biographies
Siyuan Hu is an undergraduate
student majoring in Computer Science
at New York University. His research
interests include distributed systems
and wireless networks.