Increasing means of measurement and other factors have led to massive growth in the volume of data. The volumes can be truly staggering.

Illustration 3-1: Volume examples in big data
For instance, by 2016 Facebook was reported to be holding 250 billion images and 2.5 trillion posts (Gewirtz, 2018). Google reportedly processes 1.2 trillion searches per year, each of which has artificially intelligent search algorithms behind it (Direct Energy Business, 2017). An IoT example would be a single factory with 1,000 temperature sensors measuring at one-minute intervals, which would produce roughly half a billion measurements per year (1,000 sensors × 525,600 minutes per year ≈ 526 million readings) (Gewirtz, 2018).

Several factors have led to an explosion in the volumes of available data. Perhaps most important has been the exponential increase in data-producing user activities, such as mobile data use and Web 2.0 user-provided data like social media, messaging, picture sharing, cloud storage, and the like. In addition, as discussed below, the number of connected digital sensors that collect data on things (IoT) has proliferated. Other factors exist, such as the convergence of telephony, internet, broadcasting, and other data-producing networks within dominant organizational brands, which are able to connect the massive datasets produced by these technologies and activities. All these factors have created never-before-seen amounts of data on hundreds of aspects of people's lives. Volume can challenge traditional techniques in several ways: notably, truly large datasets cannot be stored on one node and therefore cannot be analyzed on one node. However, by itself volume is not really a large problem anymore, as explained later.

D. Big Data Characteristic #2: Velocity
Big data is not just about volumes of data. Data are also being generated at ever faster rates. Computer processing and networking, as well as increasingly fast instrumentation (such as IoT sensors in cars or appliances), are allowing the speed of data streaming to increase exponentially. It is not only about the speed of data production, however; it is also about the speed with which modern processes require us to process data. In velocity-type situations, like those seen in Illustration 3-2, volume may not be a problem at all, as you are called on to rapidly analyze relatively small amounts of data; rather, it is the speed that matters.

Illustration 3-2: Examples of velocity data
Two examples of velocity data are fraud analysis of card swipes by banks, and algorithmic trading. Banks wish to detect possible fraud in card transactions as it is occurring, not afterwards, so predictive models that sift through huge volumes of transactions per second must quickly identify possible cases of fraud and report them to the fraud department for immediate follow-up. Especially impressive here is the worldwide network that allows a card transaction by a customer in a foreign city to travel quickly to the customer's own bank (through several interconnected banking and card systems) to be analyzed there immediately for possible fraud, based on factors such as unexpected location, amount, time, and so on. Many banks can deliver a warning or verification text message to their customers within seconds of the swipe, attesting to the speeds of analysis involved. Another example is algorithmic trading. Most of the world's trades in many markets are now executed by computer algorithms. There are two success factors involved. The first is the quality of decision making programmed into the algorithm. The second, however, is the speed with which the algorithm can obtain and digest market information and then execute trades: both are velocity problems.
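To make the card-swipe example in Illustration 3-2 concrete, the minimal sketch below scores each transaction as it streams in, keeping only a small amount of per-card state in memory. The event fields, the flat amount limit, and the "foreign swipe shortly after a home swipe" rule are invented assumptions for illustration; real bank systems use far more sophisticated predictive models.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Swipe:                      # one card transaction event (illustrative fields)
    card_id: str
    amount: float
    country: str
    timestamp: datetime

def looks_suspicious(swipe, last_seen, amount_limit=5000.0):
    """Illustrative rules only: flag unusually large amounts, or a foreign
    swipe occurring shortly after a swipe in a different country."""
    if swipe.amount > amount_limit:
        return True
    prev = last_seen.get(swipe.card_id)
    if prev and swipe.country != prev.country and \
            swipe.timestamp - prev.timestamp < timedelta(hours=2):
        return True   # 'impossible travel' style rule
    return False

def score_stream(swipes):
    """Score each event as it arrives (velocity), without storing the full history first."""
    last_seen = {}
    for swipe in swipes:
        if looks_suspicious(swipe, last_seen):
            print(f"ALERT: possible fraud on card {swipe.card_id}")
        last_seen[swipe.card_id] = swipe

# Example: a swipe at home followed half an hour later by a foreign swipe.
score_stream([
    Swipe("c1", 45.0, "ZA", datetime(2024, 1, 1, 9, 0)),
    Swipe("c1", 60.0, "FR", datetime(2024, 1, 1, 9, 30)),
])
```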
Traditional statistical methods allowed us to analyze historical data at our leisure; however, in the digital age we are increasingly called on to analyze real-time data as it is being produced. This means that we cannot afford to store it to disk, retrieve it later, and only then analyze it. In addition, traditional processing may be too slow for such requirements. Velocity solutions, as discussed further below, seek to resolve these challenges.

E. Big Data Characteristic #3: Variety
The diversity of data has increased substantially. Whereas businesses used to have predominantly small, stable, and controllable relational datasets, many factors have broadened the variety. Unstructured data is a major example, consisting of textual data (e.g., social media streams, documents, web logs, call-center transcripts), audio data (e.g., call recordings, recorded job interviews), and pictorial data (e.g., photographs, such as those supplied by users on social media, as well as video, like surveillance footage). Unstructured data has always formed the majority of information available to humankind, but we have previously never been able to harness it in a systematic, data-like way. Our increased ability to gather and analyze unstructured data has led to its explosion and to a commensurate increase in the variety of data, but also to difficulty in integrating and analyzing it. Unstructured data, by definition, does not fit into conventional relational database formats; is often substantial in volume (see Illustration 3-1 above); can present velocity problems (for instance, where you need real-time analysis of video); and cannot be analyzed through traditional statistical means. For these reasons, the real challenge and value in big data probably lies in unstructured data analysis. As will be discussed below, the analysis of unstructured data is the province of artificial intelligence.

F. At Least Three Other Big Data Vs
There are many other features of big data, often expressed as further Vs. One is variability. Another is value: we believe big data has massive value to organizations that successfully tap it, for instance, in improving our understanding of customers. However, big data also throws up problems with veracity (data accuracy, integrity, abnormalities, and the like). A variety of other features exist. The important thing about big data is that major advancements are constantly being made on how to deal with it, especially from a useful data analysis point of view. Important to note here is that manual, person-driven analytical techniques for big data are increasingly being replaced by machine-learning approaches, whereby algorithms learn how to interpret the data better and better (e.g., in the prior section we mentioned AI algorithms for assessing text, speech, visual data, and so on).

3.3.2 Solutions for Big Data
As discussed below, the major technologies that have developed to help with big data problems are varied and include both hardware and software solutions.

A. Methods to Break up the Data & Analysis
One of the predominant hardware approaches for big data analysis is to break up the problem. There are various ways to achieve this, depending on the nature of the problem, including the following three.

i. Distributed & Cluster Computing
Distributed computing involves networking normal 'commodity hardware' computers together and using software layers, such as Hadoop or Flink, to integrate the computers into one system. This has made processing and storing high-volume and varied data possible. As discussed in the previous chapter, this is the model behind cloud computing centers. Distributed computing systems like cloud centers can be shared between many users; however, one can also create giant 'cluster' computers using the same idea, which act like massive supercomputers.
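The idea of splitting work across many machines can be illustrated in miniature. The sketch below is not Hadoop's or Flink's actual API; it is a simplified, single-machine word count that mimics the map-and-reduce pattern such frameworks use, with worker processes standing in for cluster nodes.

```python
from collections import Counter
from multiprocessing import Pool

def map_chunk(text_chunk):
    """'Map' step: each worker counts words in its own chunk of the data."""
    return Counter(text_chunk.lower().split())

def reduce_counts(partial_counts):
    """'Reduce' step: merge the partial results into one overall count."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total

if __name__ == "__main__":
    # In a real cluster the chunks would live on different nodes;
    # here they are simply three strings handled by separate processes.
    chunks = ["big data needs big clusters",
              "clusters split big problems",
              "data data everywhere"]
    with Pool(processes=3) as pool:
        partials = pool.map(map_chunk, chunks)
    print(reduce_counts(partials).most_common(3))
```

In a genuine cluster, the same two functions would run on whichever nodes hold each chunk of the data, so the raw data never has to be moved onto a single machine.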
ii. Edge and FOG Processing
Edge and FOG processing involves processing data at or near its point of origin, which allows big data problems to be split up into small processing tasks. In IoT, edge can mean processing in or near the IoT device, for example, putting on-board computers into cars and trucks (a decades-old example!), using customer cellphones to process IoT sensor information, and so on. The IoT section below discusses this solution in more detail.

iii. Analyzing Streaming Data
Analyzing streaming data involves methods to assess small packets of data as they are being transported. Edge and FOG computing, as discussed above, usually implicitly involve streaming data analysis close to the point of production. However, you can also wait for data packets to reach your enterprise systems and analyze them as they arrive, before they are stored. Many banks have traditionally done fraud analysis in this way. This methodology often involves in-memory processing, as discussed next.

B. In-Memory Processing
In-memory processing allows data scientists to process data in computer memory (RAM or DRAM), which can make processing far faster than disk-based approaches and so helps with both the scale and the speed of a problem. Especially promising here is the ability of in-memory analytics to help with the streaming data problems discussed above, which facilitates high-velocity data processing.
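Combining the streaming and in-memory ideas above, the sketch below keeps only a short sliding window of recent readings per sensor in memory and raises an alert when the windowed average drifts too high. The sensor name, window size, and threshold are invented for illustration, and production systems would typically use a stream-processing engine such as Flink rather than a hand-written loop.

```python
from collections import defaultdict, deque

WINDOW = 10          # keep only the last 10 readings per sensor in memory
THRESHOLD = 75.0     # illustrative alert threshold (degrees C)

windows = defaultdict(lambda: deque(maxlen=WINDOW))

def on_reading(sensor_id, value):
    """Called for each reading as it streams in; the raw stream is never written to disk."""
    w = windows[sensor_id]
    w.append(value)
    avg = sum(w) / len(w)
    if len(w) == WINDOW and avg > THRESHOLD:
        print(f"Sensor {sensor_id}: windowed average {avg:.1f} exceeds {THRESHOLD}")

# Example: simulate a short stream of slowly rising temperature readings.
for i in range(30):
    on_reading("furnace-3", 70.0 + i * 0.5)
```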
C. Data Lakes
Traditional storage solutions for data often involved systems, such as data warehouses, that were not designed for varieties of data. A data lake storage design allows you to store raw data of diverse types together, including totally unstructured data, which allows integrated big data mining to be done and avoids the need for different storage for different types of data.

D. Artificial Intelligence
Artificial intelligence has become a key methodology for analyzing big data, especially unstructured data. As will be discussed later in the AI section, many artificial intelligence algorithms learn from big data in order to understand pictures (computer vision), human text and speech (natural language understanding), emotional data (affective computing), and many more applications. The resulting AI algorithms are then deployed for ongoing use in these applications.

E. Other Analytical Advances
There are also other analytical tools and techniques that have helped deal with big data, such as software for analyzing large datasets spread over large distributed computer networks (e.g., MapReduce), advanced classification techniques to assist with unsupervised machine learning, and many others. These techniques have substantially improved our ability to deal with big data.

3.3.3 Organizational Applications of Big Data
In general, organizations have come to understand that data can be used and analyzed on different levels. Figure 3-1 illustrates these different levels of 'usefulness'.

Figure 3-1: The analytics progression of usefulness

As displayed in Figure 3-1, organizations can achieve progressive levels of usefulness with big data:

1. Descriptive analytics allows us, as a first measure, to describe simple things about the data, such as averages of variables or distributions (e.g., the number of customers in different geographies or income groups). Typically, business intelligence analysts might construct dashboards and charts from such data. As suggested in Figure 3-1, such analyses are based on historical data and, therefore, are backward-looking.

2. Inquisitive analytics (albeit not a commonly used name) involves seeking patterns in past data. Statistical techniques, such as correlation and regression, achieve this aim. Here, we look for evidence that one variable (measurement) may be associated in some way with another variable, for example, that a certain set of demographic indicators may be associated with sales of a certain product.

3. Predictive analytics might be seen as the beginning of modern data science. Here, we seek to predict future patterns, behaviors, events, and so forth. These predictions are based on past patterns; therefore, they build on inquisitive analytics. Whether we are attempting to predict customer brand choices, equipment failure, employee potential, or many such organizational examples, predictive analytics has become a major area of interest and usefulness. As discussed later in the chapter, artificial intelligence holds particular possibilities for prediction.

4. In many discussions, predictive analytics is held up as the pinnacle of possibility for analytics. However, we may wish to go beyond merely predicting to prescribing possible responses to a prediction. This is the area of prescriptive analytics. Drawing on the above example of equipment or machine failure (an example to which we will frequently return in this chapter), it is all very well to predict machine failure times, but far better if the analyses involved could correctly suggest the best course of action. Perhaps in one instance the best response to an impending breakdown is immediate shut-down; in another instance, however, the prediction could allow enough time for the current operation to be completed (e.g., a truck to deliver its load) before suggesting that maintenance be scheduled at a repair shop.

5. Finally, pre-emptive analytics goes even further into the future, and into the possibilities of data usefulness, since it uses big data pattern analysis to suggest courses of action by which future organizational operations could be systematically improved. Returning to the equipment failure discussion, it is useful to predict that a machine will break down and to make a machine-specific prescription regarding what to do about it. However, far better would be analyses informing manufacturers how to design the machines under scrutiny better in the future so as to avoid, or minimize, such breakdowns altogether.

Big data analysis, notably with the addition of AI, has begun to achieve many of these aims across multiple domains of organizational operations. Many of the discussions in the AI section apply also to big data, as seen later in the chapter.
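As a toy illustration of the first three levels (using invented numbers and the standard statistics module of Python 3.10+), descriptive analytics summarizes past sales, inquisitive analytics checks whether an indicator such as advertising spend is associated with sales, and predictive analytics fits a simple regression line to forecast sales at a new spend level.

```python
import statistics as st

# Invented example data: monthly advertising spend and unit sales.
ad_spend = [10, 12, 14, 15, 18, 21, 22, 25]
sales    = [100, 110, 125, 130, 150, 170, 172, 190]

# 1. Descriptive analytics: summarize what happened.
print("mean sales:", st.mean(sales), "stdev:", round(st.stdev(sales), 1))

# 2. Inquisitive analytics: is advertising spend associated with sales?
print("correlation:", round(st.correlation(ad_spend, sales), 3))

# 3. Predictive analytics: fit a simple line and forecast sales at a new spend level.
slope, intercept = st.linear_regression(ad_spend, sales)
print("forecast at spend=30:", round(slope * 30 + intercept, 1))
```

Prescriptive and pre-emptive analytics would then layer decision rules or design recommendations on top of such predictions, as described in points 4 and 5 above.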
3.3.4 Conclusion on Big Data
Big data has become a major 4IR driving force, allowing massive advancements in the things that organizations can achieve with data and fueling the rise of AI (as discussed later in this chapter). With big data and judicious analytical and AI techniques, organizations can learn critical patterns that affect their success, harness the power of prediction, enjoy automated guidance, and enact long-term improvements.