Database and Analytics





During the 1980s, data warehousing technology, categorized as online analytical processing
(OLAP), was introduced by relational database management system (RDBMS) vendors to support
business decisions and business intelligence (Kwon, Lee & Shin, 2014). It was initially developed
to archive the large amounts of data accumulating in production databases, keeping those
databases lean for effective performance. In data warehousing, copies of the data are placed on
several database servers known as data marts. A data mart may be independent or, in other
cases, an enterprise data mart. The data are then extracted and loaded into analytical data marts,
where data analysts typically develop their own algorithms to run their jobs.

Data analytics

Data analytics is the science of analyzing raw data to draw conclusions about the information it
contains (Kambatla, et al., 2014). Many of the techniques and processes involved have been
automated into mechanical processes and algorithms that work over raw data and prepare it for
human consumption. Data analytics techniques reveal trends and metrics that would otherwise be
lost in the mass of information (Raghupathi & Raghupathi, 2014). This information can then be
used to optimize processes and enhance the overall efficiency of a business or system.


According to Kambatla et al. (2014), data analytics is a broad term that covers an assortment of
different forms of data analysis. Any form of information can be subjected to data analytics
techniques to generate insights that can be used to improve things. For instance, companies often
record the downtime, runtime, and work queue of their machines and then analyze the data to plan
workloads so that the machines operate close to peak capacity (Kwon, Lee & Shin, 2014).
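
As a hedged illustration of the workload example above, the following Python sketch computes
per-machine utilization from recorded runtime and downtime and ranks the machines from least to
most utilized. The machine names, the hours, and the `utilization` helper are hypothetical and do
not come from the cited sources.

```python
# Hypothetical weekly log per machine: (runtime hours, downtime hours).
machine_hours = {
    "press_a": (120.0, 48.0),
    "press_b": (150.0, 18.0),
    "lathe_c": (90.0, 78.0),
}


def utilization(runtime: float, downtime: float) -> float:
    """Fraction of the recorded time during which the machine was running."""
    total = runtime + downtime
    return runtime / total if total else 0.0


# Rank machines from least to most utilized; the first entry is the best
# candidate for receiving additional work from the queue.
ranked = sorted(machine_hours, key=lambda name: utilization(*machine_hours[name]))
least_loaded = ranked[0]

for name in ranked:
    print(f"{name}: {utilization(*machine_hours[name]):.0%}")
```

A planner could use such a ranking to shift queued jobs toward the least-loaded machine, which is
one simple way of moving every machine closer to its peak capacity.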

Gandomi and Haider (2015) assert that data analytics can do more than point out bottlenecks in
production. Gaming companies rely on data analytics to design reward schemes that keep most
players active in the game. Content companies, on the other hand, use many of the same
techniques to keep users watching, clicking, and reorganizing content to get an additional view
or click (Belle, et al., 2015).

The data analytics process is characterized by core elements that are required for any
initiative. By combining these elements, a successful data analytics initiative offers a clear
picture of where an organization is, where it has been, and where it ought to go. In general, the
process starts with descriptive analytics (Najafabadi, et al., 2015), which describes the
historical trends evident in the data being examined. Descriptive analytics seeks to answer the
question of what happened. It typically involves examining traditional indicators such as return
on investment, although the indicators used differ by industry (Najafabadi, et al., 2015).
Descriptive analytics does not make predictions and is not used directly to inform decisions,
since its main focus is on summarizing data in a meaningful, descriptive manner.
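
To make the idea of descriptive analytics concrete, the sketch below summarizes return on
investment across a few campaigns. It only describes what happened and makes no prediction. The
campaign figures and the `roi` helper are invented for illustration, and ROI is assumed here to
mean (revenue - cost) / cost.

```python
from statistics import mean, median

# Hypothetical campaigns: (name, cost, revenue).
campaigns = [
    ("email", 1000.0, 1500.0),
    ("search", 2000.0, 2600.0),
    ("social", 1500.0, 1350.0),
]


def roi(cost: float, revenue: float) -> float:
    """Return on investment as a fraction of cost."""
    return (revenue - cost) / cost


rois = {name: roi(cost, revenue) for name, cost, revenue in campaigns}

# A purely descriptive summary of past performance.
summary = {
    "mean_roi": mean(rois.values()),
    "median_roi": median(rois.values()),
    "best": max(rois, key=rois.get),
    "worst": min(rois, key=rois.get),
}
print(summary)
```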

The next integral component of data analytics is advanced analytics. This is the part of data
science that uses advanced tools to extract data, make predictions, and establish trends. The
tools include classical statistics as well as machine learning (Tsai, et al., 2015). Machine
learning technologies, including neural networks, sentiment analysis, and natural language
processing, make it possible for advanced analytics to succeed. The resulting information offers
new insight from data and addresses "what if" queries (Tsai, et al., 2015).

Access to machine learning techniques, huge data sets, and cheap computing power has facilitated
the use of these methods in most industries (Hu, et al., 2014). The collection of extensive data
sets has been integral to making these techniques usable. Data analytics therefore makes it
possible for businesses to draw important conclusions from complex and varied sources of data, a
capability promoted by advances in parallel processing and cheap computational power (Hu, et al.,
2014).

Data analytics encompasses the processes of examining data sets to draw conclusions about the
information they contain, with the support of specialized systems and software. Data analytics
technologies and techniques are used extensively in commercial industries to help organizations
make more informed business decisions, and by scientists and researchers to verify or disprove
scientific theories, models, and hypotheses (Belle, et al., 2015).

The term data analytics is mainly used to refer to a range of applications, from basic business
intelligence, reporting, and online analytical processing to the diverse forms of advanced
analytics (Wang, Kung & Byrd, 2018). In this sense it is similar to business analytics, another
umbrella term for approaches to data analysis, with the difference that business analytics is
directed at business uses whereas data analytics has a broader scope. This expansive view of the
term is not universal, however: in some instances people use data analytics to mean advanced
analytics specifically, treating business intelligence as a separate category (Wang, Kung & Byrd,
2018).

Businesses employ data analytics initiatives to increase revenue, improve operational efficiency,
optimize marketing campaigns and customer service efforts, react quickly to emerging market
trends, and gain a competitive edge over rivals, all with the objective of boosting overall
business performance (Ousterhout, et al., 2015). Depending on the application, the data analyzed
may consist of historical records or new information that has been processed for real-time use.
The information may also come from a mix of internal systems and external data sources
(Ousterhout, et al., 2015).

Forms of data analytics applications

At a high level, data analytics methodologies include exploratory data analysis, which seeks to
find patterns and relationships in data, and confirmatory data analysis, which applies
statistical techniques to determine whether hypotheses about a data set are true or false (Wamba,
et al., 2017). Data analytics can also be separated into quantitative and qualitative data
analysis. Quantitative data analysis involves the analysis of numerical data with quantifiable
variables that can be measured or compared statistically. The qualitative approach, on the other
hand, is more interpretive: it seeks to understand the content of non-numerical data such as
text, images, audio, and video, including common phrases, themes, and points of view (Wamba, et
al., 2017).
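
The contrast between exploratory and confirmatory analysis can be sketched with a small
quantitative example. The exploratory step simply compares group means; the confirmatory step
then tests the hypothesis that the two groups differ, here with an exact permutation test. The
choice of test, the sample values, and the function names are assumptions made for illustration
and are not taken from the cited sources.

```python
from itertools import combinations
from statistics import mean

# Hypothetical order values for customers who saw two different page layouts.
layout_a = [12.1, 11.8, 12.5, 12.9, 12.3, 12.7]
layout_b = [10.2, 10.9, 10.4, 10.1, 10.8, 10.6]

# Exploratory step: look at the data and notice a possible pattern.
observed_gap = mean(layout_a) - mean(layout_b)


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference in group means."""
    pooled = group_a + group_b
    n_a, n_b = len(group_a), len(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    total = sum(pooled)
    extreme = 0
    splits = 0
    # Enumerate every way of relabelling the pooled values into two groups.
    for indices in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in indices)
        gap = abs(sum_a / n_a - (total - sum_a) / n_b)
        if gap >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
        splits += 1
    return extreme / splits


# Confirmatory step: how often would a gap this large arise by chance alone?
p_value = permutation_p_value(layout_a, layout_b)
print(f"observed gap = {observed_gap:.2f}, p-value = {p_value:.4f}")
```

A small p-value suggests that the pattern spotted during exploration is unlikely to be a
coincidence of how the values happened to fall into the two groups.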

At the application level, business intelligence provides business executives and other corporate
workers with actionable information about key performance indicators, customers, business
operations, and more. In the past, data queries and reports were commonly created for end users
by business intelligence developers working in IT or in a centralized business intelligence team
(Ghazal, et al., 2013). Presently, organizations increasingly use self-service business
intelligence tools that allow executives, business analysts, and operational workers to run their
own ad hoc queries and create reports themselves (Elgendy & Elragal, 2014).

The more advanced forms of data analytics include data mining, which involves sorting through
large data sets to identify patterns, trends, and relationships, and predictive analytics, which
seeks to anticipate customer behavior, equipment failures, and other future events (Sun, et al.,
2016). Another form is machine learning, an artificial intelligence technique that uses automated
algorithms to churn through data sets faster than data scientists can through conventional
analytical modeling. Big data analytics applies predictive analytics, data mining, and machine
learning tools to big data sets that commonly contain unstructured and semi-structured data. Text
mining offers a means of analyzing emails, documents, and other text-based content (Ghazal, et
al., 2013).

Data analytics initiatives support a wide assortment of business uses. For instance, banks and
credit card companies analyze customer withdrawal and spending patterns to prevent fraud and
identity theft (Elgendy & Elragal, 2014). E-commerce companies and marketing service platforms
perform clickstream analysis to identify website visitors who are more likely to buy a specific
product or service, based on their navigation and page-viewing patterns (Sun, et al., 2016).
Mobile network operators examine customer data to forecast churn so that they can take steps to
prevent defections to rivals. To boost customer relationship management efforts, other companies
engage in CRM analytics to segment customers for marketing campaigns and to equip call center
employees with up-to-date information about callers. Healthcare organizations, in turn, mine
patient data to evaluate the effectiveness of treatments for conditions such as cancer (Sun, et
al., 2016).

The data analytics process

Data analytics applications involve more than just analyzing data. Particularly in advanced
analytics projects, much of the required work takes place up front, in collecting, integrating,
and preparing data and then developing, testing, and revising analytical models to ensure that
they produce accurate results. In addition to data scientists and other data analysts, analytics
teams commonly include data engineers, whose job is to help get data sets ready for analysis
(Gupta & George, 2016).

Forms of data analytics

Data analytics is a broad field with four fundamental forms: descriptive, diagnostic, predictive,
and prescriptive analytics. Each form has a different objective and a different place in the
process of data analysis (Gupta & George, 2016).

Descriptive analytics helps answer the question of what happened. These techniques summarize
large datasets to describe outcomes to stakeholders. By developing key performance indicators,
they help track successes and failures. Metrics such as return on investment are used in most
industries (Marjani, et al., 2017), while specialized metrics are used to evaluate performance in
particular industries. The process requires the collection of relevant data, processing of the
data, analysis, and visualization, and it provides essential insight into past performance
(Marjani, et al., 2017).

Diagnostic analytics, on the other hand, answers the question of why things happened. These
techniques supplement the more basic descriptive analytics: they take the findings from
descriptive analytics and dig deeper to establish the cause. Performance indicators are
investigated further to discover why they became better or worse (Qin, 2014). This generally
occurs in three phases: first, identifying anomalies in the data, such as unexpected changes in a
metric or a specific market; second, collecting the data related to those anomalies; and third,
applying statistical techniques and machine learning methods such as decision trees, neural
networks, and regression to find relationships that explain them (Qin, 2014).
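
The three diagnostic phases can be sketched in a few lines of Python. This is a minimal
illustration under stated assumptions: the daily figures are invented, anomalies are flagged with
a simple z-score rule, and a Pearson correlation stands in for the heavier statistical and
machine learning methods named above. The function names are hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical daily metric and two candidate explanatory series.
orders = [100, 102, 98, 101, 99, 60, 100, 103]
candidate_drivers = {
    "page_load_seconds": [1.0, 1.1, 0.9, 1.0, 1.0, 4.5, 1.1, 1.0],
    "ad_spend": [50, 55, 48, 52, 50, 51, 49, 53],
}


def anomalies(values, threshold=2.0):
    """Phase 1: indices whose z-score exceeds the threshold."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Phase 1: find the unexpected change in the metric.
flagged_days = anomalies(orders)

# Phase 2: the related data are the candidate drivers collected above.
# Phase 3: measure which driver moves most closely with the metric.
strength = {name: abs(pearson(orders, series))
            for name, series in candidate_drivers.items()}
likely_cause = max(strength, key=strength.get)

print(flagged_days, likely_cause)
```

A strong correlation does not prove causation, so in practice the flagged driver would be a lead
for further investigation rather than a final answer.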

Prescriptive analytics helps answer the question of what should be done. By using insights from
predictive analytics, data-driven decisions can be made (Loebbecke & Picot, 2015). This allows a
business to make informed decisions in the face of uncertainty. Prescriptive analytics techniques
rely on machine learning strategies that can find patterns in large datasets. By analyzing past
events and decisions, the likelihood of different outcomes can be estimated (Loebbecke & Picot,
2015).


Some applications of data analytics

Data analytics has been adopted extensively. Analyzing data can optimize efficiency in numerous
industries, and improved performance helps businesses succeed in an increasingly competitive
world. Data analytics can therefore be employed with considerable success in an assortment of
fields (Ahmed, et al., 2017).

One of the earliest adopters of data analytics was the financial sector (Qin, 2014). Data
analytics plays a significant role in the banking and finance industries, where it is used to
predict market trends and assess risk. Credit scores are one example of data analytics that
affects everyone: the scores rely on numerous data points to establish lending risk. Data
analytics is also used to detect and prevent fraud, improving efficiency and reducing risk for
financial institutions (Ahmed, et al., 2017).

The use of data analytics goes beyond maximizing profits and return on investment. Data analytics
can provide vital information for healthcare (in the form of health informatics), crime
prevention, and environmental protection. These applications use data analytics techniques to
improve the world (Kelleher, Mac Namee, & D'arcy, 2015).

While statistics and data analysis have long been used in research, advanced analytics techniques
and big data make many new insights possible. These techniques can find trends in complex
systems; scientists, for instance, rely on machine learning to help protect wildlife (Kelleher,
Mac Namee, & D'arcy, 2015). The use of data analytics in healthcare is already extensive.
Predicting patient outcomes, allocating funds efficiently, and improving diagnostics are just a
few examples of how data analytics is revolutionizing healthcare (Duan & Xiong, 2015). The
pharmaceutical industry is also being transformed by machine learning, which improves drug
discovery, while pharmaceutical companies rely on data analytics to understand the market for
drugs and predict sales (Duan & Xiong, 2015).

The Internet of Things (IoT) is another field that has been growing rapidly alongside machine
learning. IoT devices offer an excellent opportunity for data analytics because they commonly
contain numerous sensors that gather meaningful data points for their operation (Akter, et al.,
2016). Devices such as the Nest thermostat track movement and temperature to regulate heating and
cooling. Smart devices also use data to learn from and predict an individual's behavior. The
result is advanced home automation that can adapt to the way one lives (Akter, et al., 2016).

According to Zhou et al. (2014), whether the focus is on fine-tuning supply chains, monitoring
shop-floor operations, gauging consumer sentiment, or any other large-scale analytic challenge,
big data has been exerting a significant impact on the enterprise. The amount of business data
being generated has increased steadily over the years, and more and more forms of information
are stored in digital formats (Zhou, et al., 2014). One of the main challenges is determining how
to deal with new types of data sources, such as transactions, selected events, or blog posts.
Collecting different types of data very quickly does not add value by itself. It is imperative to
integrate analytics, which helps uncover the underlying insights that add value to the business
(Zhou, et al., 2014).

Among the many data models, the relational model has dominated since the 1980s, with
implementations such as MySQL, Oracle Database, and Microsoft SQL Server known as relational
database management systems (Das & Kumar, 2013). In the recent past, however, relational
databases have run into difficulties in a growing number of cases, both because of shortcomings
in data modeling and because of constraints on horizontal scalability across many servers and
very large volumes of data. Two core trends have been generating challenges for the database and
analytics fields (Slavakis, Giannakis & Mateos, 2014). The first is the exponential growth in the
volume of data generated by users, sensors, and systems, further accelerated by the concentration
of a large portion of that volume in big distributed systems such as Google, Amazon, and other
cloud services. The second is the ever-increasing interdependency and complexity of data,
accelerated by the internet, Web 2.0, social networks, and open, standardized access to data
sources from a huge number of diverse systems (Slavakis, Giannakis & Mateos, 2014).

Entities that collect large amounts of unstructured data are increasingly turning to
non-relational databases, now referred to as NoSQL databases. NoSQL databases focus on the
analytical processing of large-scale datasets and provide increased scalability over commodity
hardware (Puiu, et al., 2016). The computational and storage needs of applications such as big
data analytics, social networking, and business intelligence over petabyte-scale datasets have
pushed centralized SQL databases to their limits. This has led to the development of horizontally
scalable, distributed non-relational data stores, called NoSQL databases, such as Google Bigtable
and its open-source implementation HBase (Verhoef, Kooge, & Walk, 2016). The emergence of
distributed key-value stores such as Voldemort and Cassandra highlights the efficiency and
cost-effectiveness of these approaches. The major limitation associated with RDBMSs is the
difficulty of scaling for data warehousing, Web 2.0, grid, and cloud applications (Verhoef, Kooge
& Walk, 2016).

Big data analytics

Big data refers to data volumes so large that they cannot be processed effectively with
traditional applications. The processing of big data starts with raw data that is not aggregated
and that usually cannot be stored in the memory of a single computer (Xiang, et al., 2015). A
buzzword used to describe huge volumes of both structured and unstructured data, big data floods
businesses on a daily basis. Big data can be analyzed for insights that lead to better decisions
and strategic business moves.
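
One consequence of raw data not fitting in the memory of a single computer is that it has to be
processed incrementally rather than loaded all at once. The sketch below is a hedged illustration
of that idea: a generator stands in for a large data source, and the mean is updated one record
at a time so that only a counter and the current mean are ever held in memory. The
`read_records` source and its values are hypothetical.

```python
from typing import Iterable, Iterator


def read_records(n: int = 100_000) -> Iterator[float]:
    """Hypothetical stand-in for a source too large to load in one piece."""
    for i in range(1, n + 1):
        yield float(i % 100)


def running_mean(values: Iterable[float]) -> float:
    """Mean computed in a single pass with constant memory."""
    count = 0
    current = 0.0
    for value in values:
        count += 1
        # Incremental update: shift the mean toward the new value.
        current += (value - current) / count
    return current


print(running_mean(read_records()))
```

The same single-pass pattern extends to counts, sums, minima, and maxima, which is why streaming
aggregates are a common first step before heavier analysis.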

Big data analysis mainly involves analytical methods for big data, the systematic architecture of
big data, and big data mining and analysis software. Data analysis is the most important step in
big data, used to explore meaningful values and to offer suggestions and decisions (Raghupathi &
Raghupathi, 2014). Data analysis is, however, an extensive area that is dynamic and extremely
complex.

Traditional data analysis

Traditional data analysis means using statistical methods to analyze large volumes of data,
exploring and elaborating the information concealed in a complex data set so that the value of
the data can be maximized. Data analysis guides the development plans of a country, helps
anticipate customer demands, and forecasts market trends for organizations (Gandomi & Haider,
2015). Big data analytics can be regarded as the analysis of a special kind of data. Thus, most
traditional methods are still employed for big data analytics.


Big data analytics methods

In the big data era, everyone wants to extract the core value and key information from extensive
datasets to attain their organizational objectives. Presently, the main methods employed in big
data analytics include the Bloom filter, hashing, indexing, the trie, and parallel computing.

The core idea of the Bloom filter is that a bit array is used to store hash values rather than
the data itself. The bit array acts as a bitmap index that holds a lossy compression of the
hashed elements. The main advantages of the method are its high space efficiency and high query
speed. The disadvantage is that it can misidentify values, reporting an element as present when
it was never added (a false positive) (Tsai, et al., 2015).
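
A minimal Bloom filter can be sketched as follows. This is an illustrative implementation under
stated assumptions rather than a production design: the bit-array size, the number of hash
functions, and the use of seeded SHA-256 digests to derive bit positions are all choices made
here for simplicity.

```python
import hashlib


class BloomFilter:
    """Set-membership sketch: no false negatives, occasional false positives."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)  # packed bit array

    def _positions(self, item: str):
        # Derive several bit positions by hashing the item with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


seen = BloomFilter()
for user in ["alice", "bob", "carol"]:
    seen.add(user)

print(seen.might_contain("alice"))    # always True once added
print(seen.might_contain("mallory"))  # usually False; True would be a false positive
```

Only the bit array is stored, never the items themselves, which is the source of the space
efficiency. The price is that a lookup for an item that was never added can occasionally return
True when its bit positions happen to have been set by other items.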

Hashing is a method that transforms data into small numeric values. Its advantages are fast
reading, writing, and querying, although it is difficult to find a sound hash function (Tsai, et
al., 2015).

Indexing is an effective method for reducing the cost of disk reads and writes and for improving
the speed of queries, insertion, deletion, and modification. Its drawback, however, is the
additional cost of storing index files (Tsai, et al., 2015).
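
The trade-off between lookup speed and storage can be illustrated with a small in-memory index
built on a hash table. The rows, the column names, and the helper functions below are
hypothetical; a Python dictionary stands in for both the hash function and the index file.

```python
from collections import defaultdict

# Hypothetical table of transactions.
rows = [
    {"id": 1, "city": "Oslo", "amount": 40},
    {"id": 2, "city": "Lima", "amount": 15},
    {"id": 3, "city": "Oslo", "amount": 25},
]


def scan(table, city):
    """Without an index: every row must be examined."""
    return [row["id"] for row in table if row["city"] == city]


def build_index(table, column):
    """Map each column value to the positions of the rows holding it."""
    index = defaultdict(list)
    for position, row in enumerate(table):
        index[row[column]].append(position)
    return index


def lookup(table, index, value):
    """With an index: hash the value and jump straight to matching rows."""
    return [table[pos]["id"] for pos in index.get(value, [])]


# The index is extra data that must be stored and kept in step with the table.
city_index = build_index(rows, "city")

print(scan(rows, "Oslo"))
print(lookup(rows, city_index, "Oslo"))
```

Both calls return the same rows, but the indexed lookup avoids examining every row. The cost is
the additional structure, which must be updated whenever rows are inserted, deleted, or changed.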

The trie is a method mostly used for fast retrieval. Its main idea is to use the common prefixes
of character strings to minimize string comparisons and so improve query efficiency (Elgendy &
Elragal, 2014).

Parallel computing, in contrast to serial computing, refers to the simultaneous use of several
computing resources to complete a task. The basic idea is to decompose a problem and assign the
parts to separate processes that work on them cooperatively (Elgendy & Elragal, 2014).
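
Returning to the trie described above, the following sketch shows how shared prefixes are stored
once and reused during retrieval. It is a minimal illustration; the class names, the sample
words, and the `starts_with` method are assumptions made for the example.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # next character -> child node
        self.is_word = False  # True if a stored word ends at this node


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            # Words with a common prefix share the same path of nodes.
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def starts_with(self, prefix: str) -> list:
        """Return all stored words that begin with the given prefix."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        # Walk the subtree below the prefix and collect complete words.
        results = []
        stack = [(node, prefix)]
        while stack:
            current, text = stack.pop()
            if current.is_word:
                results.append(text)
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        return sorted(results)


trie = Trie()
for term in ["data", "database", "datum", "analytics"]:
    trie.insert(term)

print(trie.starts_with("dat"))
```

Following the prefix costs one step per character, regardless of how many words are stored, which
is why the structure suits autocomplete and other prefix lookups.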


The term big data is used for datasets that cannot be categorized, accessed, managed, analyzed,
or processed by present tools. Different definitions of big data have been offered by its
various users and analysts, including research scholars, technical practitioners, and data
analysts (Gandomi & Haider, 2015). The most widely accepted definition describes big data as a
dataset that cannot be captured, managed, and processed by general-purpose computers within an
acceptable scope.

Big data technologies describe a new generation of technologies and architectures designed to
economically extract value from very large volumes of a wide variety of data by enabling
high-velocity capture, discovery, and analysis (Belle, et al., 2015). In this context, the
attributes of big data can be summarized as high volume, a variety of forms and structures, rapid
generation, and great value but low value density.

The 4Vs definition (volume, variety, velocity, and value) sheds light on the meaning of big data
in terms of uncovering hidden value. The definition highlights the most critical aspect of big
data, namely the new value that can be created from datasets (Ousterhout, et al., 2015).

The big data processing model

The META group research offers a three-tier structure for the big data mining model. Tier I
focuses on low-level data access and computing. Tier II focuses on information sharing and
privacy, along with the application domains and domain knowledge of big data applications. Tier
III covers the mining algorithms (Wamba, et al., 2017).


Significance of Big Data and Data analytics

Big data analytics is an often complex process of examining large and varied data sets to uncover
information such as hidden patterns, market trends, unknown correlations, and customer
preferences that can help organizations make informed business decisions (Ahmed, et al., 2017).
On a broad scale, data analytics technologies and techniques provide a means of analyzing data
sets and drawing conclusions from them, helping organizations make informed business decisions.
Business intelligence queries answer basic questions about business operations and performance
(Ahmed, et al., 2017).

The significance

According to Zhou et al. (2014), big data analytics, driven by specialized analytics systems and
software as well as high-powered computing systems, offers an assortment of business benefits
that include:

• Assessment of new revenue opportunities
• Determining more effective marketing
• Offering superior customer service
• Improving operational efficiency
• Enhancing competitive advantages over competitors

Big data analytics and data analytics applications allow big data analysts, predictive modelers,
data scientists, statisticians, and other analytics professionals to analyze growing volumes of
structured transactional data, along with other forms of data that are often left untapped by
conventional BI and analytics programs (Zhou, et al., 2014). Semi-structured and unstructured
data do not fit well in traditional data warehouses built on relational databases oriented toward
structured data sets. Additionally, data warehouses may not be able to handle the processing
demands posed by big data sets that need to be updated frequently or even continually, as is the
case with real-time data on stock trading (Das & Kumar, 2013).


Overall, big data and data analytics applications draw on data from both internal and external
sources, such as demographic data on consumers and weather data compiled by third-party
information service providers. Streaming analytics applications have also become prevalent in big
data environments as users seek to perform real-time analytics on data fed into their systems
through stream processing engines. As the analysis shows, big data and data analytics help
organizations harness their data and use it to identify new opportunities. This in turn leads to
smarter business moves, more efficient operations, higher profits, and happier customers. Big
data and data analytics add value to organizations by reducing the overall cost of storage,
supporting better decision making, and promoting the development of new products and services
through the ability to gauge what customers need and want.



