Download as docx, pdf, or txt
Download as docx, pdf, or txt
You are on page 1of 3

Finding Top-k Dominance on Incomplete Big Data Using MapReduce Framework

Abstract:

Incomplete data is one major kind of multi-dimensional dataset that has random-distributed missing nodes
in its dimensions. It is very difficult to retrieve information from this type of dataset when it becomes
large. Finding top-k dominant values in this type of dataset is a challenging procedure. Some algorithms
are present to enhance this process, but most are efficient only when dealing with small incomplete data.
One of the algorithms that make the application of TKD query possible is the Bitmap Index Guided (BIG)
algorithm. This algorithm greatly improves the performance for incomplete data, but it is not designed to
find top-k dominant values in incomplete big data. Several other algorithms have been proposed to find
the TKD query, such as Skyband Based and Upper Bound Based algorithms, but their performance is also
questionable. Algorithms developed previously were among the first attempts to apply TKD query on
incomplete data; however, these algorithms suffered from weak performance. This paper proposes
MapReduced Enhanced Bitmap Index Guided Algorithm (MRBIG) for dealing with the aforementioned
issues. MRBIG uses the MapReduce framework to enhance the performance of applying top-k dominance
queries on large incomplete datasets. The proposed approach uses the MapReduce parallel computing
approach involving multiple computing nodes. The framework separates the tasks between several
computing nodes to independently and simultaneously work to find the result. This method has achieved
up to two times faster processing time in finding the TKD query result when compared to previously
proposed algorithms.

Existing System:

the first and easiest approach that might come to mind is to compare each two items in the dataset
individually. This naive approach implies the pair-wise comparison between the values of different
objects in the same dimension. In this approach, a comparison is required for every two values existing in
the dataset. The observed dominant value in each comparison can be used to find the final top-k dominant
values later. Indicating the top-k dominant values can be challenging when facing big data; processing
time can become astronomical even if the computational cost can be ignored. Therefore, this approach
may not be the best way of solving this problem.

Proposed System:

to apply the TKD queries with performance more acceptable and efficient than the naive approach. Based
on [2], [3], applying the top-k dominance query is possible in the incomplete dataset using Skyline query
processing. The Skyline algorithm separates the uniform values into different buckets. The buckets group
the dataset by dividing it into chunks that have same missing-value dimension. The buckets are easier and
faster to process. By having separate top-k dominant values for each bucket and combining them together,
determining the top-k values of the whole dataset becomes possible.

Conclusion :

we proposed an algorithm to apply top-k dominating queries using MapReduce framework on


incomplete big data. MapReduced Enhanced Bitmap Indexed Guided algorithm (MRBIG) is the basis of
the work that develops a new way to handle large incomplete data and uses the MapReduce framework to
enable parallel computing to manage the problem faster. the single machine algorithm has been detailed,
compared, and contrasted with the MRBIG algorithm. Based on the experiments, the single machine
algorithm cannot be an optimal way for applying TKD queries on big files. Not being resource-efficient,
process failure due to resource insufficiency, and having exponential processing time are among the
major defects when it comes to finding top-k dominant values in massive incomplete data.

REFERENCES

[1] Taxicab Fact Book, NYC Taxi Limousine Commission, New York, NY, USA, 2014.

[2] Greenhouse Gases Equivalences Calculator, USA Environ. Protection Agency, Washington, DC,
USA, 2017.

[3] After Hong Kong Failure, China’s BYD Joins Singapore Launch, South China Morning Post, Hong
Kong, 2017.

[4] E. Wilhelm et al., “Cloudthink: A scalable secure platform for mirroring transportation systems in the
cloud,” Transport, vol. 30, no. 3, pp. 320–329, 2015.

[5] D. Zhang et al., “Understanding taxi service strategies from taxi GPS traces,” IEEE Trans. Intell.
Transp. Syst., vol. 16, no. 1, pp. 123–135, Feb. 2014.

[6] S. Liu, Y. Yue, and R. Krishnan, “Non-myopic adaptive route planning in uncertain congestion
environments,” IEEE Trans. Knowl. Data Eng., vol. 27, no. 9, pp. 2438–2451, Sep. 2015.

[7] S. Liu and S. Wang, “Trajectory community discovery and recommendation by multi-source diffusion
modeling,” IEEE Trans. Knowl. Data Eng., vol. 29, no. 4, pp. 898–911, Apr. 2017.

[8] J. Zhao, Q. Qu, F. Zhang, C. Xu, and S. Liu, “Spatio-temporal analysis of passenger travel patterns in
massive smart card data,” IEEE Trans. Intell. Transp. Syst., vol. 18, no. 11, pp. 3135–3146, Nov. 2017.

[9] J. Yuan, Y. Zheng, L. Zhang, X. Xie, and G. Sun, “Where to find my next passenger,” in Proc. ACM
Int. Conf. Ubiquitous Comput. (UbiComp), 2011, pp. 109–118.

[10] M. Qu, H. Zhu, J. Liu, G. Liu, and H. Xiong, “A cost-effective recommender system for taxi
drivers,” in Proc. ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining (KDD), 2014, pp. 45–54.

[11] H. Rong, X. Zhou, C. Yang, Z. Shafiq, and A. Liu, “The rich and the poor: A Markov decision
process approach to optimizing taxi driver revenue efficiency,” in Proc. ACM Int. Conf. Inf. Knowl.
Manage. (CIKM), 2016, pp. 2329–2334.

[12] C.-M. Tseng and C.-K. Chau, “Viability analysis of electric taxis using New York city dataset,” in
Proc. ACM e-Energy 8th Int. Conf. Future Energy Syst., 2017, pp. 328–333.

[13] C.-M. Tseng and C.-K. Chau, “Personalized prediction of vehicle energy consumption based on
participatory sensing,” IEEE Trans. Intell. Transp. Syst., vol. 18, no. 11, pp. 3103–3113, Nov. 2017.
[14] C.-K. Chau, K. Elbassioni, and C.-M. Tseng, “Drive mode optimization and path planning for plug-
in hybrid electric vehicles,” IEEE Trans. Intell. Transp. Syst., vol. 18, no. 12, pp. 3421–3432, Dec. 2017.

[15] T. Carpenter, A. R. Curtis, and S. Keshav, “The return on investment for taxi companies
transitioning to electric vehicles—A case study in San Francisco,” Transportation, vol. 41, pp. 785–818,
Jul. 2014.

[16] Electric Vehicle Charging Stations in New York, New York Government, New York, NY, USA,
2017.

[17] A. Furieri, “The Gaia-SINS federated projects,” Tech. Rep., 2017.

[18] Real-World Range Ramifications: Heating and Air Condition, FleetCarma, Waterloo, ON, Canada,
2014.

[19] C. Bingham, C. Walsh, and S. Carroll, “Impact of driving characteristics on electric vehicle energy
consumption and range,” IET Intell. Transp. Syst., vol. 6, no. 1, pp. 29–35, 2012.

[20] L. Feng and B. Chen, “Study the impact of driver’s behavior on the energy efficiency of hybrid
electric vehicles,” in Proc. ASME Int. Design Eng. Tech. Conf., 2013.

You might also like