
BIG DATA VS DATA MINING

VINAMRA MITTAL

Lovely Professional University, Punjab, India

Abstract - In the world of information technology, the ability to effectively process massive datasets has become integral to a broad range of scientific and other academic disciplines. We are living in an era of data deluge and, as a result, the term “Big Data” is appearing in many contexts, ranging from meteorology, genomics, complex physics simulations, biological and environmental research, finance and business to healthcare. Big Data refers to data streams of higher velocity and higher variety. The infrastructure required to support the acquisition of Big Data must deliver low, predictable latency in both capturing data and executing short, simple queries; be able to handle very high transaction volumes, often in a distributed environment; and support flexible, dynamic data structures. Data processing is considerably more challenging than simply locating, identifying, understanding, and citing data. For effective large-scale analysis, all of this has to happen in a completely automated manner. This requires differences in data structure and semantics to be expressed in forms that are computer understandable, and then “robotically” resolvable. There is a strong body of work in data integration, mapping and transformations. However, considerable additional work is required to achieve automated error-free difference resolution. This paper proposes a framework on recent research for data mining using Big Data.

Keywords - Big Data, data mining, heterogeneity, autonomous sources, complex and evolving associations

I. INTRODUCTION

With the exponential growth of data comes an ever-growing need to process and analyze the so-called Big Data. High performance computing architectures have been devised to address the needs of handling Big Data, not only from a transaction processing point of view but also from an analytics view. The main goal of this paper is to offer the reader a historical and comprehensive view of recent trends in high performance computing architectures, especially as they relate to data mining and analytics. There are a number of separate studies on Big Data (and its characteristics), high performance computing for analytics, Massively Parallel Processing (MPP) databases, algorithms for Big Data, in-memory databases, implementations of machine learning algorithms proposed for Big Data, the analytics environments of the future, etc., though none gives a historical and broad view of all these separate topics in a single document. This is the author's first attempt to bring as many of these topics together as possible and to describe an ideal analytic environment that can meet the challenges of today's analytics requirements. Recent industry trends suggest that big data analysis is becoming necessary for automation.
II. The Most Advanced Data Mining of the Big Data Period

A. Issue of Heterogeneous Mixture Data Analysis

One of the key points in the accurate analysis of heterogeneous mixture data is to break up the inherent heterogeneous mixture properties by arranging the data into groups that share the same patterns or rules. However, since there is a huge (sometimes infinite) number of possible data groupings, it is in practice impossible to verify each and every candidate. The following three issues are important in arranging the data into several groups:

1) Number of groups (how much the data is mixed)
2) Method of grouping (how the data is grouped)
3) Appropriate choice of prediction model according to the properties of each group

These issues cannot be solved independently or in order from 1) to 3); they should be solved simultaneously by considering their mutual dependencies. For example, when the hypothesis is that the data contains a mixture of linear and nonlinear relationships (Fig. 1, Left), a highly accurate prediction model can be obtained by grouping the data into two groups (ellipse A and ellipse B). However, when the hypothesis is that the data contains a mixture of multiple linear relationships (Fig. 1, Right), the optimum number of groups becomes 3. In both the left and right parts of Fig. 1, the grouping methods (ellipses) are determined by the sets of data to which the linear (or nonlinear) relationships (prediction models) are applicable, which means that it is not possible to determine 2) while ignoring 1) and 3).

B. Data Mining Based on Heterogeneous Mixture Learning

NEC has developed a new heterogeneous mixture learning technology for use in mining heterogeneous mixture data. This technology is capable of high-speed optimization of the three issues 1) to 3) described in subsection A above, avoiding the problems of data grouping and of a sudden increase in prediction model combinations. Below, we explain the differences between learning with previous techniques (such as cross-validation or the Bayesian information criterion) and heterogeneous mixture learning, as shown in Fig. 2. Previous techniques calculated scores (information criteria) for the model candidates and selected the model with the best score. However, as described in subsection A, an unrealistic calculation time would be required if these techniques were applied to the learning of heterogeneous mixture data, because of the enormous number of model candidates. Heterogeneous mixture learning, on the other hand, is capable of adaptively searching over issues 1) to 3), namely the number of groups, the method of grouping and the prediction model for each group. This makes it possible to find the optimum data grouping and prediction model by investigating models with high prediction accuracy without searching unpromising candidates. The advanced search and optimization of heterogeneous mixture learning is backed by the latest machine learning theory, called “factorized asymptotic Bayesian inference” 2)3)4).
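The score-and-select approach of the previous techniques can be sketched as follows. This is a minimal illustration under stated assumptions, not NEC's heterogeneous mixture learning and not factorized asymptotic Bayesian inference: the data is one-dimensional, groups are restricted to contiguous ranges of x, each group gets a straight-line model, and candidates are ranked with a BIC-style score. All function names, the toy data and `MIN_GROUP_SIZE` are hypothetical.

```python
import math
from itertools import combinations

MIN_GROUP_SIZE = 3  # assumption: a line fit needs a few points to be meaningful


def line_rss(points):
    """Residual sum of squares of the least-squares line through (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = mean_y - slope * mean_x
    return sum((y - (slope * x + intercept)) ** 2 for x, y in points)


def bic_score(points, cuts):
    """BIC-style score (lower is better) for one candidate grouping.

    `cuts` are the indices where one contiguous group ends and the next begins.
    Returns None when a group is too small to fit.
    """
    bounds = [0, *cuts, len(points)]
    groups = [points[a:b] for a, b in zip(bounds, bounds[1:])]
    if any(len(g) < MIN_GROUP_SIZE for g in groups):
        return None
    n = len(points)
    rss = max(sum(line_rss(g) for g in groups), 1e-12)  # guard against log(0)
    n_params = 2 * len(groups) + len(cuts)  # slope and intercept per group, plus cuts
    return n * math.log(rss / n) + n_params * math.log(n)


def exhaustive_search(points, max_groups=3):
    """Score every candidate and return ((score, n_groups, cuts), n_evaluated)."""
    best, evaluated = None, 0
    for n_groups in range(1, max_groups + 1):
        for cuts in combinations(range(1, len(points)), n_groups - 1):
            score = bic_score(points, cuts)
            if score is None:
                continue
            evaluated += 1
            if best is None or score < best[0]:
                best = (score, n_groups, cuts)
    return best, evaluated


# Toy data: two linear regimes that switch at x = 15, with a small
# deterministic wiggle standing in for noise so the run is reproducible.
points = []
for x in range(30):
    trend = 2.0 * x if x < 15 else 50.0 - x
    wiggle = 0.5 if x % 2 == 0 else -0.5
    points.append((float(x), trend + wiggle))

best, evaluated = exhaustive_search(points)
print("groups:", best[1], "cuts:", best[2], "candidates scored:", evaluated)
```

Even in this restricted setting the number of candidates grows combinatorially with the number of points and groups, which is the calculation-time problem described above; an adaptive search aims to avoid scoring the unpromising candidates at all.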
III. Big Data Means

Big data is typically described by the first three properties below, sometimes referred to as the three Vs, but organizations require a fourth, value, to make big data work.

A. Volume: massive data sets that are orders of magnitude larger than the data managed in traditional storage and analytical solutions. Think petabytes rather than terabytes.

B. Variety: complex, variable and heterogeneous data, generated in formats as different as social media, e-mail, images, video, blogs, and sensor data, as well as “shadow data” such as access logs and Web search histories.

C. Velocity: data is generated as a constant stream, with real-time queries for meaningful information to be served up on demand rather than batched.

D. Value: meaningful insights that deliver predictive analytics for future trends and patterns from deep, complex analysis based on graph algorithms, machine learning and statistical modeling. These analytics go beyond the results of traditional querying, reporting and business intelligence.

IV. Data Mining for Big Data

Data mining involves extracting and analyzing large amounts of data to discover patterns in big data. The techniques came out of the fields of artificial intelligence (AI) and statistics, with a bit of database management. Extracting information from data takes two major forms: prediction and description. It is often hard to know what the data shows. Data mining is used to summarize and simplify the data in a way that we can understand, and then allows us to infer things about specific cases based on the patterns. Normally, the objective of data mining is either prediction or classification. In classification, the idea is to sort data into groups. For example, a seller might be interested in the characteristics of those who responded to a promotion versus those who did not. There are two classes. In prediction, the idea is to predict the value of a continuous variable. For example, a seller might be interested in predicting the response rate to a promotion. Typical algorithms used in data mining are as follows:

A. Classification trees: A popular data mining technique used to classify a dependent categorical variable based on measurements of one or more predictor variables. The result is a tree with nodes and links between the nodes that can be read to form if-then rules.

B. Logistic regression: A statistical technique that is a variant of standard regression but extends the concept to deal with classification. It produces a formula that predicts the probability of the occurrence as a function of the independent variables.

C. Neural networks: A software algorithm modeled after the parallel architecture of animal brains. The network consists of input nodes, hidden layers and output nodes. Each connection between units is assigned a weight. Data is given to the input nodes, and by a process of trial and error, the algorithm adjusts the weights until it meets a certain stopping criterion. Some have likened this to a black-box approach.

D. Clustering-style techniques like K-nearest neighbors: A procedure that identifies groups of similar records. Strictly speaking, K-nearest neighbors is a supervised classification method: it calculates the distances between a new record and the points in the historical data, and then assigns the record to the class of its nearest neighbors in the data set.
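The nearest-neighbor technique described above can be sketched in a few lines. This is a minimal illustration that assumes Euclidean distance and a simple majority vote among the k nearest records; the function name `knn_classify`, the feature choice and the historical records are all hypothetical.

```python
import math
from collections import Counter


def knn_classify(history, record, k=3):
    """Assign `record` to the majority class among its k nearest historical records.

    history: list of (features, label) pairs, where features is a tuple of numbers.
    record:  tuple of numbers with the same length as the historical features.
    """
    nearest = sorted(history, key=lambda item: math.dist(item[0], record))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical history: (visits per month, past purchases) -> responded to a promotion?
history = [
    ((1.0, 1.0), "no"),
    ((1.5, 2.0), "no"),
    ((2.0, 1.0), "no"),
    ((6.0, 6.0), "yes"),
    ((7.0, 7.5), "yes"),
    ((6.5, 8.0), "yes"),
]

print(knn_classify(history, (6.2, 6.9)))       # the three nearest records are all "yes"
print(knn_classify(history, (1.2, 1.4), k=1))  # the single nearest record is "no"
```

Sorting the whole history for every query keeps the sketch short, but the cost grows with the size of the historical data, so at Big Data scale an index or an approximate nearest-neighbor search would normally take its place.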
V. CONCLUSION

Big data is going to continue growing in the coming years, and every data scientist will have to handle a larger amount of data every year. This data will be more diverse, larger and faster. In this paper we discussed several insights about the subject and what we think are the main concerns and the core challenges for the future. Big Data is becoming the new final frontier for scientific data research and for business applications. Data mining with big data will help us to discover knowledge that no one has discovered before. Heterogeneous mixture learning is an advanced technology used in big data analysis. Above, we introduced the difficulties inherent in heterogeneous mixture data analysis, the basic concept of heterogeneous mixture learning and the results of a demonstration experiment that deals with electricity demand predictions. As big data analysis grows in importance, heterogeneous mixture data mining technology is also expected to play a significant role in the market. The range of applications of heterogeneous mixture learning will expand more broadly than ever in the future. To explore Big Data, we have examined a number of challenges at the data, model and system levels. To support Big Data mining, high-performance computing platforms are necessary, which impose systematic designs to unleash the full power of Big Data. At the data level, the autonomous information sources and the variety of data collection environments often result in data with complicated conditions, such as missing or uncertain values. The essential challenge is that a Big Data mining framework needs to consider complex relationships between data sources, samples and models, along with their evolving changes with time and other possible factors. A system needs to be carefully designed so that unstructured data can be linked through their complex relationships to form useful patterns, and the growth of data volumes and relationships should help form patterns to predict trends and the future.

VI. REFERENCES

[1] X. Wu, X. Zhu, G.-Q. Wu, and W. Ding, “Data Mining with Big Data,” IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 1, Jan. 2014.
[2] M.H. Alam, J.W. Ha, and S.K. Lee, “Novel Approaches to Crawling Important Pages Early,” Knowledge and Information Systems, vol. 33, no. 3, pp. 707-734, Dec. 2012.
[3] S. Aral and D. Walker, “Identifying Influential and Susceptible Members of Social Networks,” Science, vol. 337, pp. 337-341, 2012.
[5] R. Fujimaki and S. Morinaga, “The Most Advanced Data Mining of the Big Data Era.”
[6] E. Birney, “The Making of ENCODE: Lessons for Big-Data Projects,” Nature, vol. 489, pp. 49-51, 2012.
[7] J. Bollen, H. Mao, and X. Zeng, “Twitter Mood Predicts the Stock Market,” J. Computational Science, vol. 2, no. 1, pp. 1-8, 2011.
[8] S. Borgatti, A. Mehra, D. Brass, and G. Labianca, “Network Analysis in the Social Sciences,” Science, vol. 323, pp. 892-895, 2009.
[9] J. Bughin, M. Chui, and J. Manyika, Clouds, Big Data, and Smart Assets: Ten Tech-Enabled Business Trends to Watch. McKinsey Quarterly, 2010.
[10] D. Centola, “The Spread of Behavior in an Online Social Network Experiment,” Science, vol. 329, pp. 1194-1197, 2010.
[11] E.Y. Chang, H. Bai, and K. Zhu, “Parallel Algorithms for Mining Large-Scale Rich-Media Data,” Proc. 17th ACM Int'l Conf. Multimedia (MM '09), pp. 917-918, 2009.
