
Available online at www.mdl.mazedan.com
©2022 Mazedan International Research Academy | www.mazedan.com/mcet

MAZEDAN COMPUTER ENGINEERING TRANSACTIONS
e-ISSN: 2583-0414 | Vol-3, Issue-1 | Article id: MCET0301003

A BIG DATA ANALYTICS STUDY: CHALLENGES, UNRESOLVED RESEARCH ISSUES, AND TECHNIQUES

YOGESH V. PATIL*, SAGAR O. MANJARE

Received: 15 Dec 2021 | Revised: 25 Jan 2022 | Accepted: 27 Jan 2022

Citation: Patil, Y. V., & Manjare, S. O. (2022). A Big Data Analytics Study: Challenges, Unresolved Research Issues, and Techniques. Mazedan Computer Engineering Transactions, 3(1), 13-20.

Abstract
Every day, modern information systems and digital technologies such as the Internet of Things and cloud computing generate massive volumes of data in the terabyte range. Extracting useful information from such vast amounts of data requires considerable time and effort at several levels of analysis, and big data analysis is currently attracting a great deal of interest. Our goal is to examine big data challenges, the research questions that remain unsolved, and the tools and techniques associated with them. This article therefore serves as a starting point for further exploration of big data, giving researchers a basis on which to build solutions to the identified problems and open research questions.
Keywords: Big data analytics; Hadoop; Massive data; Structured data; Unstructured data

1. INTRODUCTION

In a digital world there are numerous sources of data, and the rapid shift to digital technologies has led to the production of big data. Such massive datasets enable evolutionary progress in a wide range of disciplines. "Data warehouses" refer to large and complex datasets that cannot be processed with standard database management tools or data processing programmes. Accordingly, "big data" denotes data that is growing quickly and whose volume is so large that it cannot be handled with conventional methods.

Figure 1 Characteristics of Big Data


Figure 2 The 8Vs of Big Data

Siddhant College of Management Studies, Sudumbre, Maharashtra, India
*Corresponding author email: manyogesh@gmail.com

The goal of big data analysis is to handle large amounts of data quickly and accurately using both traditional and computationally sophisticated methods, M. Kakhani et al. (2015). Gandomi and Haider (2015) discussed several of these information extraction strategies. Figure 1 illustrates the characteristics that define big data. There is no single accepted definition of big data; its meaning is believed to be situation specific, but it is generally associated with cost-effective, innovative forms of information processing for improved insight, decision making, and optimisation.

The worldwide big data and business analytics market was valued at $198.08 billion in 2020 and is expected to reach $684.12 billion by 2030 (alliedmarketresearch.com, 2022). Big data is a potent catalyst for the so-called third platform, which combines big data, cloud computing, the internet of things, and social business, and it is routinely used to handle very large datasets. Obtaining precise insight from big data is the critical step, yet most data mining methods struggle with datasets of this scale.

The main issue in big data analysis is the lack of coordination between database systems and analytical tools such as data mining and statistical analysis. These issues arise when we seek to discover and express information for practical use. A crucial open problem is quantifying the core properties of big data, and the data revolution also carries epistemological implications, R. Kitchin (2014). Studying the complexity of big data helps us understand its fundamental features, trace the development of complex patterns, simplify representation, improve knowledge abstraction, and guide the design of big data computing models and algorithms. Many scholars have studied big data and its tendencies, including V. Lopez et al. (2014). However, not all large data is valuable for analysis or decision making. Big data results are of interest to both industry and academia, and this article focuses on big data difficulties and the corresponding solutions.

2. BIG DATA ANALYTICS ISSUES

Massive amounts of data have been generated recently in fields such as health care and multidisciplinary scientific study, and social computing, online text and document storage, and search indexing all rely on massive data. The literature indexed in ISI, IEEE Xplore, Scopus, and Thomson Reuters opens new study avenues for young scholars, and the open issues raised there frequently lead to new solutions.

Dealing with huge data presents several issues. Big data analysis requires knowledge of computational complexity, information security, and computational methods. Many statistical procedures do not scale well to massive data sets, and the same is true for various computational techniques. Several scholars, including Sethi et al. (2013), have studied the difficulties facing the health industry. Most firms deploy data initiatives without appropriate planning, so projects frequently stagnate or produce no results. According to a 2017 NewVantage Partners Big Data Executive study, 95% of Fortune 1000 corporate executives had started a data project, but less than half of these efforts produced a demonstrable effect. Big data must be correctly understood and processed to be successful: before attempting to harness the power of data, organisations must have a clear aim in mind, establish frameworks and procedures, and hire people with the appropriate understanding.

Other big data challenges include dealing with data growth. According to IDC's Digital Universe research, the data held in IT systems doubles every two years, and a growing share of it is unstructured. Organising unstructured data takes time, and it is also hard to store and process. The latest generation of ETL and analytics technologies helps speed up report development, and real-time reporting is on the horizon but not yet routine. Unstructured data, or data in many formats, can also make validation difficult. Data governance, the process of bringing data together while assuring its accuracy and usability, is an increasing problem according to AtScale's 2016 Big Data Maturity Survey. Data security is a growing issue as well: data is increasingly targeted by APTs and hackers, and protecting sensitive data such as customer financial information is a major concern for businesses. The remaining discussion is grouped into four themes: data storage and analysis, knowledge discovery and computational complexity, scalability, and data visualisation. These points are briefly addressed below.

3. DATA STORAGE AND ANALYSIS

Mobile devices, aerial sensing, remote sensing, RFID readers, and similar sources have all increased data volumes in recent years, and due to a shortage of storage capacity some data are simply ignored or destroyed. Large data analysis requires faster storage and input/output, since data accessibility is vital for knowledge discovery and representation and for enabling future research. Hard disks, however, perform far worse on random input/output than on sequential access, which is why SSD and PCM storage were developed; even so, existing storage technologies cannot fully handle enormous data.

Big data analysis is also hampered by data diversity. As datasets grew, so did the data mining tasks applied to them. Working with large datasets necessitates data reduction, filtering, and feature selection so that high-dimensional data can be processed quickly, which is a new challenge for researchers. Automating these steps and developing new machine learning methods that ensure consistency have proved difficult, as has clustering at this scale. Recent technologies such as Hadoop and MapReduce allow for quick data collection, but one of the key engineering difficulties remains gaining insight from the data: semi-structured or unstructured data must be transformed into structured data and then mined. Das and Kumar (2013) discussed a data analysis framework for this, and Das et al. (2014) also explored the analysis of public tweets. The key challenge is to design storage systems and data analysis tools that work with data from several sources while ensuring the scalability and efficiency of machine-learning-based analysis techniques.
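To make the data-reduction and feature-selection step just described concrete, the following minimal Python sketch (illustrative only, not taken from the cited works) filters out near-constant columns with a simple variance threshold; the function name, threshold, and synthetic data are assumptions for the example.

```python
import numpy as np

def variance_filter(X, threshold=0.01):
    """Drop near-constant columns; a simple data-reduction step."""
    variances = X.var(axis=0)
    keep = variances > threshold
    return X[:, keep], keep

# Toy high-dimensional sample: 1,000 rows, 500 features,
# half of which are almost constant and carry little information.
rng = np.random.default_rng(0)
informative = rng.normal(size=(1000, 250))
constant = np.full((1000, 250), 3.0) + rng.normal(scale=1e-4, size=(1000, 250))
X = np.hstack([informative, constant])

X_reduced, kept = variance_filter(X, threshold=0.01)
print(X.shape, "->", X_reduced.shape)   # (1000, 500) -> (1000, 250)
```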

Theoretical Complexity and Knowledge Discovery

The discovery and representation of knowledge is a major big data challenge, and information retrieval is one of the most important subfields involved. Jolliffe (2002) mentions a few strategies for knowledge acquisition and representation, such as fuzzy sets and rough sets, and several hybridisation strategies have been created to solve real-world challenges. These methods are problem specific; a sequential computer may not be able to handle some of them, although certain strategies do scale to parallel computers. Because big data is growing rapidly in size, the technologies available to process it may not remain efficient. Data warehouses and data marts are the most common tools for managing huge datasets: a data warehouse stores data from operational systems, whereas a data mart stores data drawn from a data warehouse and supports analysis on it.
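As a small illustration of the rough-set idea mentioned above, the hypothetical Python sketch below computes the lower and upper approximations of a target concept from the indiscernibility classes induced by chosen attributes; the toy decision table and function names are invented for the example and do not come from Jolliffe (2002) or the paper.

```python
from collections import defaultdict

# Toy decision table: each record has condition attributes and a decision.
records = [
    {"id": 1, "fever": "yes", "cough": "yes", "flu": "yes"},
    {"id": 2, "fever": "yes", "cough": "yes", "flu": "no"},
    {"id": 3, "fever": "no",  "cough": "yes", "flu": "no"},
    {"id": 4, "fever": "no",  "cough": "no",  "flu": "no"},
]

def indiscernibility_classes(rows, attrs):
    """Group records that are indistinguishable on the chosen attributes."""
    classes = defaultdict(set)
    for r in rows:
        key = tuple(r[a] for a in attrs)
        classes[key].add(r["id"])
    return list(classes.values())

def approximations(rows, attrs, target_ids):
    """Rough-set lower/upper approximation of the target concept."""
    lower, upper = set(), set()
    for cls in indiscernibility_classes(rows, attrs):
        if cls <= target_ids:       # class lies wholly inside the concept
            lower |= cls
        if cls & target_ids:        # class overlaps the concept
            upper |= cls
    return lower, upper

flu_ids = {r["id"] for r in records if r["flu"] == "yes"}
low, up = approximations(records, ["fever", "cough"], flu_ids)
print("lower:", low, "upper:", up)   # lower: set()  upper: {1, 2}
```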
Large datasets also require more computation. The key difficulty is uncertainty and contradiction in the data: computational complexity is usually modelled systematically, but big data can be difficult to model mathematically. Nevertheless, recognising a domain's complexity enables more straightforward data analytics, and a breakthrough in large-scale analytics would affect many sectors. Many studies and surveys have therefore used machine learning approaches with low memory requirements, aiming to reduce computing costs and complexity, Singh and Suri (2014). Even so, existing big data analysis methods struggle with computational complexity, ambiguity, and inconsistency, and it remains difficult to build strategies and systems that cope effectively with all three.

Scalability and Data Visualization

Scalability is a further concern in big data analysis. Following Moore's law, scholars long relied on faster processors to speed up data analysis and processing; online and multiresolution sampling methods, together with incremental techniques, are what make analysis scale in big data settings. Because data volume is expanding faster than CPU performance, additional cores are being added to processors instead, Jacobs (2009). Parallel computing stems from this processor shift, and real-time applications such as navigation demand parallel processing.

The goal of data visualisation is to present data more intuitively, often using graph-theoretic approaches, so that the graphical view connects the data to a correct interpretation. Online marketplaces such as Flipkart and Amazon, for example, have millions of users and sell billions of products every month, creating enormous volumes of data. Such corporations use tools like Tableau for massive data visualisation, allowing staff to see search relevance, track customer feedback, and analyse sentiment. However, existing big data visualisation technologies still lack functionality, scalability, and responsiveness.

Big data has thus posed a number of challenges for hardware and software developers, giving rise to work on parallel computing, cloud computing, distributed computing, visualisation, and scalability, and additional mathematical models from computer science are required to overcome these challenges.

Cybersecurity

Massive amounts of data are correlated, examined, and mined for patterns, and every company has its own rules for protecting sensitive data. Data security is therefore a fundamental concern in big data analysis, and big data poses a tremendous security challenge, according to H. Zhu et al. (2015). AES-256 encryption and authentication can protect huge data sets, but the network's scale, the diversity of devices, and the frequent lack of an intrusion detection mechanism make protection harder, H. Perez-Sanchez et al. (2014). Big data's security problem has accordingly drawn the attention of the IT security community and calls for a multi-level security policy and preventative methods. Although significant study has been devoted to big data security, it still needs improvement, and determining a multi-level data model for huge data remains difficult.
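As a hedged illustration of the AES-256 protection mentioned above (not the authors' implementation), the sketch below uses the Python cryptography package's AES-GCM mode, which provides both encryption and an authentication tag; the record contents and associated data are placeholders.

```python
# pip install cryptography
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_record(key: bytes, plaintext: bytes, associated_data: bytes):
    """Encrypt and authenticate one record with AES-256-GCM."""
    nonce = os.urandom(12)                      # unique per message
    ciphertext = AESGCM(key).encrypt(nonce, plaintext, associated_data)
    return nonce, ciphertext

def decrypt_record(key: bytes, nonce: bytes, ciphertext: bytes, associated_data: bytes) -> bytes:
    return AESGCM(key).decrypt(nonce, ciphertext, associated_data)

key = AESGCM.generate_key(bit_length=256)       # 256-bit key, i.e. AES-256
nonce, ct = encrypt_record(key, b"customer financial record", b"record-id:42")
print(decrypt_record(key, nonce, ct, b"record-id:42"))   # b'customer financial record'
```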
4. KEY RESEARCH QUESTIONS IN BIG DATA ANALYTICS

Businesses and universities are increasingly adopting big data analytics, and data science has emerged as the discipline that studies big data. Its applications draw on machine learning, statistical learning, pattern recognition, data warehousing, and signal processing, for example to predict future events from historical data. This section addresses the main research questions in big data: IoT, cloud computing, bio-inspired computing, and quantum computing are key research areas, although they are not the only ones. Kuo et al. (2014), for instance, discuss further study topics linked to health care big data.

IoT for Big Data

The internet has, among other things, reshaped global links and corporate practices, and machines are now helping to build the Internet of Things (IoT), in which appliances with web interfaces become internet users much like people. Recent research has focused on the Internet of Things' most promising and challenging prospects; realising them will require information, network, and communication technologies to evolve so that, in the future, everything can be connected and managed. Mobile devices, embedded and ubiquitous communication technologies, cloud computing, and data analytics have made the IoT concept feasible, but IoT also brings volume, velocity, and diversity of data. It allows devices to be located anywhere and used for applications ranging from the trivial to the critical, although its definition, content, and differences from similar notions remain difficult to pin down. Big data and computational intelligence can help automate IoT data management and knowledge discovery, a topic that Mishra, Lin, and Chang have studied extensively (N. Mishra et al., 2015).

The main difficulty for big data professionals is gaining knowledge from IoT data, so building an IoT data analysis infrastructure is critical. Researchers can use machine learning to extract relevant information from IoT data streams, but understanding these streams and processing them into relevant information is itself a difficult task that leads directly to big data analytics. From an IoT perspective, machine learning algorithms and artificial intelligence approaches are essentially the only option, and many research articles highlight the key IoT technologies (Z. G. Jin et al., 2012). Figure 3 shows the IoT big data and knowledge discovery workflow.

Figure 3 IoT Big Data Knowledge Discovery
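One plausible way to realise the machine-learning-on-streams idea above, sketched here as an assumption rather than a method from the paper, is incremental (online) learning: scikit-learn's SGDClassifier can be updated batch by batch with partial_fit, so the model never holds the full IoT stream in memory. The sensor features, labelling rule, and batch sizes below are synthetic.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(1)
model = SGDClassifier(loss="log_loss")
classes = np.array([0, 1])                 # 0 = normal reading, 1 = anomalous

def sensor_batches(n_batches=50, batch_size=32):
    """Stand-in for an IoT stream: yields small batches of sensor readings."""
    for _ in range(n_batches):
        X = rng.normal(size=(batch_size, 3))          # e.g. temperature, vibration, current
        y = (X[:, 0] + X[:, 1] > 1.0).astype(int)     # toy labelling rule
        yield X, y

for X_batch, y_batch in sensor_batches():
    model.partial_fit(X_batch, y_batch, classes=classes)   # incremental update

X_new = rng.normal(size=(5, 3))
print(model.predict(X_new))                # predictions from the stream-trained model
```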

Several theories of human information processing have inspired the development of knowledge exploration systems, including frames, rules, tags, and semantic networks. The process falls into four main stages: knowledge acquisition, the knowledge base, knowledge distribution, and knowledge application. Various classical and computational intelligence approaches are employed in the knowledge acquisition phase to uncover new information, and expert systems are often built on the newly acquired information, which is kept in knowledge bases. Information in a knowledge base is only useful if it is shared, and this distribution step draws knowledge from documents as well as from knowledge databases. The final step is to put the new information into practice in a variety of ways; this application is the culmination of the entire pursuit of new knowledge. Knowledge discovery, like the application of knowledge, is an iterative process, and many concerns, debates, and investigations are under way in this field, more than this survey can address. Figure 4 depicts the knowledge exploration mechanism for easier comprehension.

Figure 4 IoT Knowledge Exploration System

Big Data Analytics on the Cloud

Virtualisation made supercomputing more affordable: software-based computing infrastructures behave like physical computers but with a wider choice of processor, memory, and operating system options. Cloud computing, a robust big data technique, employs such virtual machines, and big data and cloud computing technologies are designed together to provide scalability. In the cloud, users access virtualised computing resources on demand and pay only for the resources they use, which improves availability while saving money. Many scholars have examined cloud data management, data storage, data processing, and resource management (M. D. Assunção et al., 2015), and cloud computing thus helps establish a business plan for applications of all kinds.

Cloud-based big data analytics should support data-driven development: data scientists and business analysts can collaborate in the cloud on further processing and extraction, which can benefit several industries, and large data processing platforms such as Spark and R also stand to gain from cloud deployment.

Much of the practical use of big data now happens in the cloud. The marketplace allows users to buy infrastructure services from Google, Amazon, or IBM and software-as-a-service offerings from providers such as NetSuite, Cloud9, and Jobscience. Another benefit of cloud computing is cloud storage, which can accommodate very large data sets. The most obvious drawbacks are the time and cost of uploading and downloading big data to and from the cloud; beyond that, controlling computation and hardware becomes difficult, and privacy concerns around public data servers and the storage of human-subject data remain the main challenges. Addressing these concerns will propel big data and cloud computing forward.

Bio-inspired Big Data Analytics

Bio-inspired computing uses natural processes to solve complicated real-world challenges. Biological systems self-organise, and one example is finding a near-optimal data service solution with a bio-inspired cost-minimisation technique. Biological molecules such as DNA and proteins store, retrieve, and process data with these strategies, and biologically derived materials can be used to perform computation and obtain intelligent behaviour, which makes such systems attractive for huge data.

Since digitalisation, huge amounts of data have been created from various web resources, and data scientists and big data specialists need to analyse these data and categorise them into text, pictures, and video. Big data, IoT, cloud computing, bio-inspired computing, and other technologies are all proliferating, but only the correct platform can analyse enormous data efficiently and affordably.

Bio-inspired computing approaches are therefore used in data analysis and big data: thanks to their optimisation capabilities, these algorithms aid large-scale data mining, offering a simple formulation and fast convergence towards good solutions for service-provisioning problems. Cheng et al. (2013) detailed several bio-inspired computing applications for this purpose. These discussions suggest that bio-inspired computing models help handle ambiguity and deliver smarter interactions, so bio-inspired computing may assist in handling massive data in the future.
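To illustrate the kind of bio-inspired optimisation referred to above, the following toy genetic algorithm (an assumption for illustration, not a method from Cheng et al.) searches for a cheap subset of candidate service nodes that still meets a capacity demand; all costs, capacities, and parameters are invented.

```python
import random
random.seed(0)

# Toy "data service" problem: pick a subset of 12 candidate nodes so that
# total capacity >= demand while total cost stays low. Values are made up.
COST = [4, 7, 3, 9, 5, 6, 8, 2, 5, 7, 4, 6]
CAP  = [3, 6, 2, 8, 4, 5, 7, 1, 4, 6, 3, 5]
DEMAND = 20

def fitness(bits):
    cap = sum(c for b, c in zip(bits, CAP) if b)
    cost = sum(c for b, c in zip(bits, COST) if b)
    penalty = 100 * max(0, DEMAND - cap)      # infeasible plans are penalised
    return -(cost + penalty)                  # higher fitness = cheaper feasible plan

def mutate(bits, rate=0.1):
    return [b ^ (random.random() < rate) for b in bits]

def crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

pop = [[random.randint(0, 1) for _ in COST] for _ in range(30)]
for _ in range(100):                          # generations
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]
    pop = parents + [mutate(crossover(random.choice(parents), random.choice(parents)))
                     for _ in range(20)]

best = max(pop, key=fitness)
print(best, -fitness(best))   # chosen nodes and their cost (plus penalty if infeasible)
```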
Quantum Data Analysis

The memory of a quantum computer is exponentially larger than its physical size (M. A. Nielsen and I. L. Chuang, 2000), and such upgrades to computing systems may yet become feasible: if a genuine quantum computer existed today, many of today's big data problems might already be solvable, and building one may be achievable soon. Quantum computing blends physics with data processing. In a classical computer, information is represented as long strings of bits encoding 0s and 1s, whereas a quantum computer operates on quantum bits (qubits): a qubit is a two-state quantum system that encodes 0 and 1 simultaneously and therefore benefits from superposition and entanglement. A classical machine needs on the order of 2^100 complex amplitudes just to describe the state of 100 qubits, so sufficiently large quantum computers could solve massive data problems much faster than conventional computers. Building such a machine, and using quantum computing to solve big data problems, is this generation's task.
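The 2^100 figure above can be checked with a short back-of-envelope calculation; the sketch below (illustrative only) prints the classical memory needed just to store the state vector of an n-qubit register, assuming 16 bytes per complex amplitude.

```python
# Classical memory for an n-qubit state vector: 2**n complex amplitudes,
# 16 bytes each (complex128).
def state_vector_bytes(n_qubits: int) -> int:
    return (2 ** n_qubits) * 16

for n in (10, 30, 50, 100):
    print(n, "qubits ->", state_vector_bytes(n), "bytes")
# 10 qubits fit in ~16 KB, 30 need ~17 GB, 50 need ~18 PB,
# and 100 qubits already exceed any conceivable classical memory.
```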

5. BIG DATA PROCESSING TOOLS

Several big data processing tools are available. This section examines modern approaches to large-scale data analysis, with an emphasis on MapReduce, Apache Spark, and Storm. Most available tools concentrate on batch, stream, or interactive analysis: most batch processing tools, including Mahout and Dryad, build on Apache Hadoop, whereas large-scale streaming platforms such as Storm and Splunk are used for real-time analytics, letting users run their own analyses as the data arrives. Big data platforms such as Dremel and Apache Drill offer interactive analysis. These technologies help with a wide range of big data tasks and have been surveyed by many researchers, e.g., Huang et al. (2015). Figure 5 depicts the usual workflow of a big data project.

Figure 5 Workflow of Big Data Project

Apache Hadoop and MapReduce

The most widely used big data analysis software is Apache Hadoop with MapReduce. The stack includes the Hadoop kernel, MapReduce, HDFS, and related projects such as Apache Hive. MapReduce is a programming paradigm for processing huge datasets using a divide-and-conquer strategy with two steps, map and reduce. Hadoop has two types of nodes, master and worker: in the map step the master node splits the input into smaller sub-problems that are distributed to worker nodes, and in the reduce step the partial outputs of all sub-problems are combined into the final result. Together, Hadoop and MapReduce form a sophisticated software framework for big data challenges, providing fault-tolerant storage and high-speed data processing.
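The map and reduce steps described above can be imitated on a single machine; the following Python sketch (a conceptual illustration, not the Hadoop API) runs a word count through explicit map, shuffle, and reduce phases.

```python
from collections import defaultdict
from itertools import chain

documents = ["big data needs big storage", "storage and analysis of big data"]

def map_phase(doc):
    # emit (key, value) pairs, here (word, 1)
    return [(word, 1) for word in doc.split()]

def shuffle(pairs):
    # group values by key, as the framework does between map and reduce
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    return key, sum(values)

mapped = chain.from_iterable(map_phase(d) for d in documents)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
# {'big': 3, 'data': 2, 'needs': 1, 'storage': 2, 'and': 1, 'analysis': 1, 'of': 1}
```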
Apache Mahout

Apache Mahout supplies scalable, commercially friendly machine learning libraries for building intelligent applications on top of Hadoop (Ingersoll, 2009). Like most Hadoop-based batch tools, including Dryad, it targets batch analysis, while streaming platforms such as Storm and Splunk let users start their own analysis in real time.

Apache Spark

Apache Spark is a big data processing platform designed for speed and analytics. It was developed in 2009 at UC Berkeley's AMPLab and joined Apache in 2010. Spark supports Java, Scala, and Python, and it can also handle SQL queries and graph data. Spark builds on Hadoop's distributed file system (HDFS). A Spark deployment consists of worker nodes and a driver program: the driver starts an application on the Spark cluster and assigns tasks and resources to the worker nodes, and each application runs its own set of executor processes. A key benefit is that Spark programmes can run in existing Hadoop clusters. Figure 6 shows the Spark architecture, and its main features are as follows.

The primary abstraction in Spark is the resilient distributed dataset (RDD), which stores data in memory and provides fault tolerance without replication, making processing markedly faster and more efficient.

Figure 6 Architecture of Apache Spark

Beyond MapReduce-style processing, Spark supports streaming data, machine learning, and graph algorithms, and applications can be written in several languages such as Java, R, or Python; higher-level libraries for sophisticated analytics make this possible, letting programmers design complicated workflows with ease. Spark can speed up an application on a Hadoop cluster by up to 100 times in memory and 10 times on disk, thanks to the reduced number of disk reads and writes.

Spark itself is written in Scala for the Java virtual machine (JVM) environment, and Spark-based applications can be created in Java, Scala, Python, or R.
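A minimal PySpark sketch of the RDD workflow described above follows; it assumes a local Spark installation and a placeholder input file, and it is an illustration rather than code from the paper. The cache() call keeps the counted RDD in memory, which is the in-memory reuse that gives Spark its speed advantage.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("wordcount-sketch").getOrCreate()
sc = spark.sparkContext

lines = sc.textFile("logs.txt")              # RDD backed by HDFS or local storage (placeholder file)
words = lines.flatMap(lambda line: line.split())
counts = (words.map(lambda w: (w, 1))
               .reduceByKey(lambda a, b: a + b)
               .cache())                     # keep the RDD in memory for reuse

print(counts.take(10))                       # first ten (word, count) pairs
print(counts.count(), "distinct words")      # reuses the cached RDD
spark.stop()
```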

Dryad

Dryad is another widely used programming model for creating parallel and distributed applications that process large dataflow graphs. A Dryad cluster consists of computing nodes, and a user can draw on the resources of the whole cluster to run a distributed application across thousands of machines, each with multiple processors or cores, without needing any expertise in concurrent programming. A Dryad application is a directed computational graph made up of computational vertices and communication channels. With this range of capabilities Dryad can generate a job graph, schedule machines for the available processes, handle transient failures in the cluster, collect performance metrics, and visualise the job. It can also dynamically update the job graph in response to user-defined policies, G. Fox et al. (2012).

Storm

Storm is a distributed computing system designed to process massive data streams in real time, unlike Hadoop, which is designed for batch processing; convenience, scalability, and fault tolerance contribute to its competitive performance. A Storm cluster and a Hadoop cluster are superficially similar, but whereas the Hadoop architecture runs MapReduce jobs, Storm runs topologies, and the two differ fundamentally: a MapReduce job eventually completes, while a topology keeps processing messages until the user stops it. A Storm cluster has two major kinds of nodes, master and worker, which host the Nimbus and Supervisor services respectively, roles broadly analogous to the job-management components of the MapReduce framework. Nimbus schedules tasks onto worker nodes and monitors the cluster, while each Supervisor carries out the work assigned by Nimbus, which also tells it when to start and stop processes. Because the computation is partitioned among them, each worker process implements a distinct component of the topology.
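The spout-and-bolt structure of a topology can be mirrored, for explanation only, in plain Python generators; the sketch below is a conceptual analogy and deliberately does not use Storm's actual JVM-based API. All names and data are invented.

```python
import itertools
import random
from collections import Counter

def sentence_spout():
    """Endless source of tuples, like a Storm spout reading a feed."""
    samples = ["storm processes streams", "hadoop processes batches"]
    while True:
        yield random.choice(samples)

def split_bolt(sentences):
    """First bolt: split each sentence tuple into word tuples."""
    for sentence in sentences:
        yield from sentence.split()

def count_bolt(words, report_every=10):
    """Second bolt: rolling word counts, emitted periodically."""
    counts = Counter()
    for i, word in enumerate(words, start=1):
        counts[word] += 1
        if i % report_every == 0:
            yield dict(counts)

# Run the "topology" on a finite slice of the stream for demonstration.
stream = itertools.islice(sentence_spout(), 20)
for snapshot in count_bolt(split_bolt(stream)):
    print(snapshot)
```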
Apache Drill

Apache Drill is a distributed system for the interactive analysis of massive data. It is more versatile than many alternatives in the query languages, formats, and sources it supports, and it is built to exploit nested data. Its stated goal is to scale to 10,000 servers or more and to process terabytes of data and trillions of records in a matter of seconds. For batch analysis, Drill can use HDFS for storage together with MapReduce.

Jaspersoft

The open-source Jaspersoft package generates reports from database columns. It is a scalable big data analytics platform capable of fast data visualisation on popular storage platforms such as MongoDB, Cassandra, and Redis. An essential feature is Jaspersoft's ability to explore large datasets quickly without extraction, transformation, and loading (ETL). It can also produce rich HTML reports and dashboards straight from a big data store without ETL processes, and these reports can be shared with anyone inside or outside the user's organisation.

Splunk

In recent years, industry has been producing ever larger amounts of machine-generated data. Splunk is a real-time, intelligent platform for exploiting such machine-generated big data. It combines cloud computing with big data and facilitates web-based search, monitoring, and analysis of the collected data, presenting the results as graphs, reports, and alerts. Splunk differs from other stream processing technologies in that it indexes machine-produced data in both structured and unstructured form and supports real-time searching and the delivery of analytical results. For Splunk, providing metrics for a wide range of applications, diagnosing issues with systems and IT infrastructures, and giving business operations intelligent support are of paramount importance.

6. AVENUES FOR FUTURE WORK

The amount of data collected by applications across sectors doubles roughly every two years, and we know nothing about that data until we analyse it, so new approaches to big data analysis are required. Modern computers make automated analysis systems possible, but exploiting the parallelism of present and future computer architectures for data mining is not an easy task, and the data themselves may be subject to several kinds of uncertainty. Fuzzy sets, rough sets, soft sets, neural networks, their extensions, and hybrid models built by combining two or more of them can all be used to represent such data, and these models capture a great deal of information. Condensed big data often contains only the attributes relevant to a particular study or application field, and such reduction techniques are already available. Missing values must either be imputed or the tuples containing them must be eliminated; the latter can cause data loss and is generally not advised. All of these issues can compromise the performance, efficiency, and scalability of data-intensive computing systems and create a variety of research problems for the corporate and academic communities. A further problem is handling massive volumes of data quickly while maintaining high speed and throughput. Big data analysis also demands advanced programming: parallelism abstractions and data-access abstractions for applications are urgently needed (S. Dehuri et al., 2015).
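The two options for missing values named above, eliminating tuples or producing (imputing) the values, look as follows in a small, purely illustrative pandas example; the column names, values, and mean-imputation choice are assumptions for the sketch.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "sensor_id": [1, 2, 3, 4],
    "temperature": [21.5, np.nan, 19.8, 22.1],
    "humidity": [40.0, 38.5, np.nan, 41.2],
})

dropped = df.dropna()                               # eliminates the tuples for sensors 2 and 3
imputed = df.fillna(df.mean(numeric_only=True))     # keeps all rows, fills column means

print(len(df), "rows ->", len(dropped), "after dropna")
print(imputed)
```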
Academics are also increasingly applying machine learning principles and technologies to draw meaningful outcomes from these ideas. Data processing, algorithm implementation, and optimisation have been the primary emphases of machine learning research in the big data field, and major changes are required before many of the newer machine learning techniques can be used on big data. We believe that, although each of these technologies has strengths and weaknesses, more effective solutions can be created to address the challenges of big data, including principled ways of dealing with data that is noisy and imbalanced, as well as with ambiguity and inconsistency.

7. CONCLUSION

In the last several years the amount of data being created has increased dramatically, and for the average person sifting through it is a challenge. To this end, we have presented an overview of the numerous research concerns, obstacles, and techniques used to study these large data sets. According to this study, each big data platform has a distinct focus: some offer batch processing, others real-time analytic capability, and each has its own set of features. Statistical analysis, machine learning, intelligent analysis, cloud computing, quantum computing, and data stream processing are all used for the analysis.

Big data analytics is a growing concern for many businesses and organisations. Used correctly, big data analytics can give a company an advantage over its competitors, and businesses can also use it to improve their goods and operations, decrease expenses, and keep their customers happy. This field is expected to keep growing in importance as more money and effort are invested in the technology, and given the effectiveness and efficiency of these strategies we believe researchers will focus on them even more in the future.

REFERENCES
[1] A. Gandomi and M. Haider, Beyond the hype: Big data concepts, methods, and analytics, International Journal of Information Management, 35(2) (2015), pp. 137-144.
[2] A. Jacobs, The pathologies of big data, Communications of the ACM, 52(8) (2009), pp. 36-44.
[3] C. L. Philip Chen and C. Y. Zhang, Data-intensive applications, challenges, techniques and technologies: A survey on big data, Information Sciences, 275 (2014), pp. 314-347.
[4] S. Cheng, Y. Shi, Q. Qin and R. Bai, Swarm intelligence in big data analytics, in H. Yin, K. Tang, Y. Gao, F. Klawonn, M. Lee, T. Weise, B. Li and X. Yao (eds.), Intelligent Data Engineering and Automated Learning, 2013, pp. 417-426.
[5] D. P. Acharjya, S. Dehuri and S. Sanyal, Computational Intelligence for Big Data Analysis, Springer International Publishing AG, Switzerland, ISBN 978-3-319-16597-4, 2015.
[6] H. Li, G. Fox and J. Qiu, Performance model for parallel matrix multiplication with Dryad: Dataflow graph runtime, Second International Conference on Cloud and Green Computing, 2012, pp. 675-683.
[7] H. Zhu, Z. Xu and Y. Huang, Research on the security technology of big data information, International Conference on Information Technology and Management Innovation, 2015, pp. 1041-1044.
[8] Allied Market Research, Big data and business analytics market, https://www.alliedmarketresearch.com/big-data-and-business-analytics-market, 2022.
[9] I. Merelli, H. Perez-Sanchez, S. Gesing and D. D'Agostino, Managing, analysing, and integrating big data in medical bioinformatics: open problems and future perspectives, BioMed Research International, 2014 (2014), pp. 1-13.
[10] I. T. Jolliffe, Principal Component Analysis, Springer, New York, 2002.
[11] G. Ingersoll, Introducing Apache Mahout: Scalable, commercial-friendly machine learning for building intelligent applications, White Paper, IBM Developer Works, 2009, pp. 1-18.
[12] K. Kambatla, G. Kollias, V. Kumar and A. Grama, Trends in big data analytics, Journal of Parallel and Distributed Computing, 74(7) (2014), pp. 2561-2573.
[13] M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information, Cambridge University Press, New York, USA, 2000.
[14] M. D. Assunção, R. N. Calheiros, S. Bianchi, M. A. S. Netto and R. Buyya, Big data computing and clouds: Trends and future directions, Journal of Parallel and Distributed Computing, 79 (2015), pp. 3-15.
[15] M. Herland, T. M. Khoshgoftaar and R. Wald, A review of data mining using big data in health informatics, Journal of Big Data, 1(2) (2014), pp. 1-35.
[16] M. K. Kakhani, S. Kakhani and S. R. Biradar, Research issues in big data analytics, International Journal of Application or Innovation in Engineering & Management, 2(8) (2015), pp. 228-232.
[17] M. H. Kuo, T. Sahama, A. W. Kushniruk, E. M. Borycki and D. K. Grunwell, Health big data analytics: current perspectives, challenges and potential solutions, International Journal of Big Data Intelligence, 1 (2014), pp. 114-126.
[18] N. Mishra, C. Lin and H. Chang, A cognitive adopted framework for IoT big data management and knowledge discovery prospective, International Journal of Distributed Sensor Networks, 2015 (2015), pp. 1-13.
[19] P. Singh and B. Suri, Quality assessment of data using statistical and machine learning methods, in L. C. Jain, H. S. Behera, J. K. Mandal and D. P. Mohapatra (eds.), Computational Intelligence in Data Mining, 2 (2014), pp. 89-97.
[20] R. Kitchin, Big Data, new epistemologies and paradigm shifts, Big Data & Society, 1(1) (2014), pp. 1-12.
[21] R. Nambiar, A. Sethi, R. Bhardwaj and R. Vargheese, A look at challenges and opportunities of big data analytics in healthcare, IEEE International Conference on Big Data, 2013, pp. 17-22.
[22] S. Del Rio, V. Lopez, J. M. Benitez and F. Herrera, On the use of MapReduce for imbalanced big data using Random Forest, Information Sciences, 285 (2014), pp. 112-137.
[23] T. K. Das and P. M. Kumar, Big data analytics: A framework for unstructured data analysis, International Journal of Engineering and Technology, 5(1) (2013), pp. 153-156.
[24] T. K. Das, D. P. Acharjya and M. R. Patra, Opinion mining about a product by analyzing public tweets in Twitter, International Conference on Computer Communication and Informatics, 2014.
[25] X. Jin, B. W. Wah, X. Cheng and Y. Wang, Significance and challenges of big data research, Big Data Research, 2(2) (2015), pp. 59-64.

[26] X. Y. Chen and Z. G. Jin, Research on key technology and applications for internet of things, Physics Procedia, 33 (2012), pp. 561-566.
[27] Z. Huang, A fast clustering algorithm to cluster very large categorical data sets in data mining, SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery, 1997.
