Increasing means of measurement and other factors have led to massive growth in the volume of data. The volumes can be truly staggering.

Illustration 3-1: Volume examples in big data
For instance, by 2016 Facebook was reported to be holding 250 billion images and 2.5 trillion posts (Gewirtz, 2018). Google reportedly processes 1.2 trillion searches per year, each of which has artificially intelligent search algorithms behind it (Direct Energy Business, 2017). An IoT example would be a single factory with 1,000 temperature sensors measuring at one-minute intervals, which would produce roughly half a billion measurements per year (1,000 sensors × 525,600 minutes per year ≈ 526 million readings) (Gewirtz, 2018).

Several factors have led to an explosion in the volumes of available data. Perhaps most important has been the exponential increase in data-producing user activities, such as mobile data use and Web 2.0 user-provided data like social media, messaging, picture sharing, cloud storage, and the like. In addition, as discussed below, the number of connected digital sensors that collect data on things (IoT) has proliferated. Other factors exist, such as the convergence of telephony, internet, broadcasting, and other data-producing networks within dominant organizational brands, which are able to connect the massive datasets produced by these technologies and activities. All these factors have created never-before-seen amounts of data on hundreds of aspects of people's lives. Volume can challenge traditional techniques in several ways: notably, truly large datasets cannot be stored on one node and therefore cannot be analyzed on one node. However, by itself volume is not really a large problem anymore, as explained later.

D. Big Data Characteristic #2: Velocity
Big data is not just about volumes of data. Data are also being generated at ever faster rates. Computer processing and networking, as well as increasingly fast instrumentation (such as IoT sensors in cars or appliances), are allowing the speed of data streaming to increase exponentially. It is not only about the speed of data production, however; it is also about the speed with which modern processes require us to process data. In velocity-type situations, like those seen in Illustration 3-2, volume may not be a problem at all, as you are called on to rapidly analyze relatively small amounts of data; rather, it is the speed that matters.

Illustration 3-2: Examples of velocity data
Two examples of velocity data are fraud analysis of card swipes by banks, and algorithmic trading. Banks wish to detect possible fraud in card transactions as it is occurring, not afterwards, so predictive models that sift through huge volumes of transactions per second must quickly identify possible cases of fraud and report them to the fraud department for immediate follow-up. Especially impressive here is the worldwide network that allows a card transaction by a customer in a foreign city to travel quickly to the customer's own bank (through several interconnected banking and card systems) to be analyzed there immediately for possible fraud, based on factors such as unexpected location, amount, time, and so on. Many banks can deliver a warning or verification text message to their customers within seconds of the swipe, attesting to the speeds of analysis involved. Another example is algorithmic trading. Most of the world's trades in many markets are now executed by computer algorithms. There are two success factors involved. The first is the quality of decision making programmed into the algorithm. The second, however, is the speed with which the algorithm can obtain and digest market information and then execute trades: both are velocity problems.
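To make the card-swipe example in Illustration 3-2 concrete, the minimal sketch below scores each transaction as it streams in, keeping only a small amount of per-card state in memory. The event fields, the flat amount limit, and the "foreign swipe shortly after a home swipe" rule are invented assumptions for illustration; real bank systems use far more sophisticated predictive models.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Swipe:                      # one card transaction event (illustrative fields)
    card_id: str
    amount: float
    country: str
    timestamp: datetime

def looks_suspicious(swipe, last_seen, amount_limit=5000.0):
    """Illustrative rules only: flag unusually large amounts, or a foreign
    swipe occurring shortly after a swipe in a different country."""
    if swipe.amount > amount_limit:
        return True
    prev = last_seen.get(swipe.card_id)
    if prev and swipe.country != prev.country and \
            swipe.timestamp - prev.timestamp < timedelta(hours=2):
        return True   # 'impossible travel' style rule
    return False

def score_stream(swipes):
    """Score each event as it arrives (velocity), without storing the full history first."""
    last_seen = {}
    for swipe in swipes:
        if looks_suspicious(swipe, last_seen):
            print(f"ALERT: possible fraud on card {swipe.card_id}")
        last_seen[swipe.card_id] = swipe

# Example: a swipe at home followed half an hour later by a foreign swipe.
score_stream([
    Swipe("c1", 45.0, "ZA", datetime(2024, 1, 1, 9, 0)),
    Swipe("c1", 60.0, "FR", datetime(2024, 1, 1, 9, 30)),
])
```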
Traditional statistical methods allowed us to analyze historical data at our leisure; however, in the digital age we are increasingly called on to analyze real-time data as it is being produced. This means that we cannot afford to store it to disk, retrieve it later, and only then analyze it. In addition, traditional processing may be too slow for such requirements. Velocity solutions, as discussed further below, seek to resolve these challenges.

E. Big Data Characteristic #3: Variety
The diversity of data has increased substantially. Whereas businesses used to have predominantly small, stable, and controllable relational datasets, many factors have broadened the variety. Unstructured data is a major example, consisting of textual data (e.g., social media streams, documents, web logs, call-center transcripts), audio data (e.g., call recordings, recorded job interviews), and pictorial data (e.g., photographs, such as those supplied by users on social media, as well as video, like surveillance footage). Unstructured data has always formed the majority of information available to humankind, but we have previously never been able to harness it in a systematic, data-like way. Our increased ability to gather and analyze unstructured data has led to its explosion and to a commensurate increase in the variety of data, but also to difficulty in integrating and analyzing it. Unstructured data, by definition, does not fit into conventional relational database formats; is often substantial in volume (see Illustration 3-1 above); can present velocity problems (for instance, where you need real-time analysis of video); and cannot be analyzed through traditional statistical means. For these reasons, the real challenge and value in big data probably lies in unstructured data analysis. As will be discussed below, the analysis of unstructured data is the province of artificial intelligence.

F. At Least Three Other Big Data Vs
There are many other features of big data, often expressed as further Vs. One is variability. Another is value: we believe big data has massive value to organizations that successfully tap it, for instance, in improving our understanding of customers. However, big data also throws up problems with veracity (data accuracy, integrity, abnormalities, and the like). A variety of other features exist. The important thing about big data is that major advancements are constantly being made on how to deal with it, especially from a useful data analysis point of view. Important to note here is that manual, person-driven analytical techniques for big data are increasingly being replaced by machine-learning approaches, whereby algorithms learn how to interpret the data better and better (e.g., in the prior section we mentioned AI algorithms for assessing text, speech, visual data, and so on).

3.3.2 Solutions for Big Data
As discussed below, the major technologies that have developed to help with big data problems are varied and include both hardware and software solutions.

A. Methods to Break up the Data & Analysis
One of the predominant hardware approaches for big data analysis is to break up the problem. There are various ways to achieve this, depending on the nature of the problem, including the following three.

i. Distributed & Cluster Computing
Distributed computing involves networking normal 'commodity hardware' computers together and using software layers, such as Hadoop or Flink, to integrate the computers into one system. This has made processing and storing high-volume and varied data possible. As discussed in the previous chapter, this is the model behind cloud computing centers. Distributed computing systems like cloud centers can be shared between many users; however, one can also create giant 'cluster' computers using the same idea, which act like massive supercomputers.
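The idea of splitting work across many machines can be illustrated in miniature. The sketch below is not Hadoop's or Flink's actual API; it is a simplified, single-machine word count that mimics the map-and-reduce pattern such frameworks use, with worker processes standing in for cluster nodes.

```python
from collections import Counter
from multiprocessing import Pool

def map_chunk(text_chunk):
    """'Map' step: each worker counts words in its own chunk of the data."""
    return Counter(text_chunk.lower().split())

def reduce_counts(partial_counts):
    """'Reduce' step: merge the partial results into one overall count."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total

if __name__ == "__main__":
    # In a real cluster the chunks would live on different nodes;
    # here they are simply three strings handled by separate processes.
    chunks = ["big data needs big clusters",
              "clusters split big problems",
              "data data everywhere"]
    with Pool(processes=3) as pool:
        partials = pool.map(map_chunk, chunks)
    print(reduce_counts(partials).most_common(3))
```

In a genuine cluster, the same two functions would run on whichever nodes hold each chunk of the data, so the raw data never has to be moved onto a single machine.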
ii. Edge and FOG Processing
Edge and FOG processing involves processing data at or near its point of origin, which allows big data problems to be split up into small processing tasks. In IoT, edge can mean processing in or near the IoT device, for example, putting on-board computers into cars and trucks (a decades-old example!), using customer cellphones to process IoT sensor information, and so on. The IoT section below discusses this solution in more detail.

iii. Analyzing Streaming Data
Analyzing streaming data involves methods to assess small packets of data as they are being transported. Edge and FOG computing, as discussed above, usually implicitly involve streaming data analysis close to the point of production. However, you can also wait for data packets to reach your enterprise systems and analyze them as they arrive, before they are stored. Many banks have traditionally done fraud analysis in this way. This methodology often involves in-memory processing, as discussed next.

B. In-Memory Processing
In-memory processing allows data scientists to process data in computer memory (RAM or DRAM), which can make processing far faster than disk-based approaches and so helps with both the scale and the speed of a problem. Especially promising here is the ability of in-memory analytics to help with the streaming data problems discussed above, which facilitates high-velocity data processing.
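Combining the streaming and in-memory ideas above, the sketch below keeps only a short sliding window of recent readings per sensor in memory and raises an alert when the windowed average drifts too high. The sensor name, window size, and threshold are invented for illustration, and production systems would typically use a stream-processing engine such as Flink rather than a hand-written loop.

```python
from collections import defaultdict, deque

WINDOW = 10          # keep only the last 10 readings per sensor in memory
THRESHOLD = 75.0     # illustrative alert threshold (degrees C)

windows = defaultdict(lambda: deque(maxlen=WINDOW))

def on_reading(sensor_id, value):
    """Called for each reading as it streams in; the raw stream is never written to disk."""
    w = windows[sensor_id]
    w.append(value)
    avg = sum(w) / len(w)
    if len(w) == WINDOW and avg > THRESHOLD:
        print(f"Sensor {sensor_id}: windowed average {avg:.1f} exceeds {THRESHOLD}")

# Example: simulate a short stream of slowly rising temperature readings.
for i in range(30):
    on_reading("furnace-3", 70.0 + i * 0.5)
```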
C. Data Lakes
Traditional storage solutions for data often involved systems, such as data warehouses, that were not designed for varieties of data. A data lake storage design allows you to store raw data of diverse types together, including totally unstructured data, which allows integrated big data mining to be done and avoids the need for different storage for different types of data.

D. Artificial Intelligence
Artificial intelligence has become a key methodology for analyzing big data, especially unstructured data. As will be discussed later in the AI section, many artificial intelligence algorithms learn from big data in order to understand pictures (computer vision), human text and speech (natural language understanding), emotional data (affective computing), and many more applications. The resulting AI algorithms are then deployed for ongoing use in these applications.

E. Other Analytical Advances
There are also other analytical tools and techniques that have helped deal with big data, such as software for analyzing large datasets spread over large distributed computer networks (e.g., MapReduce), advanced classification techniques to assist with unsupervised machine learning, and many others. These techniques have substantially improved our ability to deal with big data.

3.3.3 Organizational Applications of Big Data
In general, organizations have come to understand that data can be used and analyzed on different levels. Figure 3-1 illustrates these different levels of 'usefulness'.

Figure 3-1: The analytics progression of usefulness

As displayed in Figure 3-1, organizations can achieve progressive levels of usefulness with big data:

1. Descriptive analytics allows us, as a first measure, to describe simple things about the data, such as averages of variables or distributions (e.g., the number of customers in different geographies or income groups). Typically, business intelligence analysts might construct dashboards and charts from such data. As suggested in Figure 3-1, such analyses are based on historical data and, therefore, are backward-looking.

2. Inquisitive analytics (albeit not a commonly used name) involves seeking patterns in past data. Statistical techniques, such as correlation and regression, achieve this aim. Here, we look for evidence that one variable (measurement) may be associated in some way with another variable, for example, that a certain set of demographic indicators may be associated with sales of a certain product.

3. Predictive analytics might be seen as the beginning of modern data science. Here, we seek to predict future patterns, behaviors, events, and so forth. These predictions are based on past patterns; therefore, they build on inquisitive analytics. Whether we are attempting to predict customer brand choices, equipment failure, employee potential, or many such organizational examples, predictive analytics has become a major area of interest and usefulness. As discussed later in the chapter, artificial intelligence holds particular possibilities for prediction.

4. In many discussions, predictive analytics is held up as the pinnacle of possibility for analytics. However, we may wish to go beyond merely predicting to prescribing possible responses to a prediction. This is the area of prescriptive analytics. Drawing on the above example of equipment or machine failure (an example to which we will frequently return in this chapter), it is all very well to predict machine failure times, but far better if the analyses involved could correctly suggest the best course of action. Perhaps in one instance the best response to an impending breakdown is immediate shut-down; in another instance, however, the prediction could allow enough time for the current operation to be completed (e.g., a truck to deliver its load) before suggesting that maintenance be scheduled at a repair shop.

5. Finally, pre-emptive analytics goes even further into the future, and into the possibilities of data usefulness, since it uses big data pattern analysis to suggest courses of action by which future organizational operations could be systematically improved. Returning to the equipment failure discussion, it is useful to predict that a machine will break down and to make a machine-specific prescription regarding what to do about it. However, far better would be analyses informing manufacturers how to design the machines under scrutiny better in the future so as to avoid, or minimize, such breakdowns altogether.

Big data analysis, notably with the addition of AI, has begun to achieve many of these aims across multiple domains of organizational operations. Many of the discussions in the AI section apply also to big data, as seen later in the chapter.
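As a toy illustration of the first three levels (using invented numbers and the standard statistics module of Python 3.10+), descriptive analytics summarizes past sales, inquisitive analytics checks whether an indicator such as advertising spend is associated with sales, and predictive analytics fits a simple regression line to forecast sales at a new spend level.

```python
import statistics as st

# Invented example data: monthly advertising spend and unit sales.
ad_spend = [10, 12, 14, 15, 18, 21, 22, 25]
sales    = [100, 110, 125, 130, 150, 170, 172, 190]

# 1. Descriptive analytics: summarize what happened.
print("mean sales:", st.mean(sales), "stdev:", round(st.stdev(sales), 1))

# 2. Inquisitive analytics: is advertising spend associated with sales?
print("correlation:", round(st.correlation(ad_spend, sales), 3))

# 3. Predictive analytics: fit a simple line and forecast sales at a new spend level.
slope, intercept = st.linear_regression(ad_spend, sales)
print("forecast at spend=30:", round(slope * 30 + intercept, 1))
```

Prescriptive and pre-emptive analytics would then layer decision rules or design recommendations on top of such predictions, as described in points 4 and 5 above.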
3.3.4 Conclusion on Big Data
Big data has become a major 4IR driving force, allowing massive advancements in the things that organizations can achieve with data and fueling the rise of AI (as discussed later in this chapter). With big data and judicious analytical and AI techniques, organizations can learn critical patterns that affect their success, harness the power of prediction, enjoy automated guidance, and enact long-term improvements.