Data Mining and Business Analytics


Data mining and business analytics

Many sources use the terms data mining and business analytics interchangeably, but the two are not synonyms; they are related only through the way they are applied to data interpretation in organizations. A definition of each term clarifies both the difference and the relationship between them. Data mining is a general term for analysis techniques, drawn from artificial intelligence, machine learning, and statistics, that are used to scan large quantities of data held in an organization's online databases or simpler databases. Its main aim is to identify the patterns present in data sets, which may originate from a single database or from a large integrated data warehouse. By identifying these patterns, an organization can associate and predict outcomes for the products it deals in and make the improvements its customers indicate.

Business analytics, on the other hand, describes the application of standard practices, algorithms, skills, and technologies related to data mining and data collection to produce useful information, which is interpreted for managers so that they can make decisions on the control and operations management of the organization (Fan, Lau, & Zhao, 2015). Business analytics has a back end and a front end: the back end is typically the data mining itself, while the front end is the reporting prepared for management and executives. Executing the functions of business analytics effectively and efficiently is therefore core to an organization's competence in business intelligence, which supports the actions the organization takes.

The clear difference between the two is that data mining is a subtask of business analytics, and an important one. The main driver of business analytics is knowledge discovery in databases, which is simply data mining. Applying data mining within business analytics enables an organization to detect and analyze data, understand its past and current situation, and predict what action to take in the future. For example, an organization can obtain data on customers' past and current purchasing behavior and on the selections they choose. Data mining can also support product portfolio analysis and the sequencing of events.

Data mining also establishes patterns that help an organization compete in the market, because it supports the analysis of competitors' information. A simple analogy illustrates its power. Modern software allows data analysts to run queries that extract information from large databases (Phillips-Wren, Iyer, Kulkarni, & Ariyachandra, 2015). To run a query, the analyst must define it and state it in terms of the specific answers sought from the database. A query therefore retrieves information that the analyst already knows, or suspects, to exist. Data mining, in contrast, searches an organization's large databases for useful patterns that nobody knew were there, and these patterns then inform decision making.

The application of data mining is evident among service providers, in e-commerce, in supermarkets, and in crime investigation. E-commerce is one of the best-known examples: e-commerce organizations apply data mining to offer cross-sells and upsells through their websites. Amazon, for instance, is an e-commerce website that applies sophisticated data mining techniques to learn about the people who view its products.
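
The contrast between a query and data mining can be shown with a small Python sketch. The order data and both function names are invented for illustration, and the pair counting used here is only one very simple way to surface cross-sell candidates; it is not the technique of any particular retailer.

```python
from collections import Counter
from itertools import combinations

# Hypothetical order data: each order is the set of products bought together.
orders = [
    {"laptop", "mouse", "laptop bag"},
    {"laptop", "mouse"},
    {"phone", "phone case"},
    {"laptop", "laptop bag"},
    {"phone", "phone case", "charger"},
    {"mouse", "keyboard"},
]


def query_orders_containing(product, orders):
    """Query: answer a question we already know how to ask."""
    return [order for order in orders if product in order]


def mine_frequent_pairs(orders, min_support=2):
    """Mining: count every product pair and keep those bought together at
    least `min_support` times, without naming any product in advance."""
    pair_counts = Counter()
    for order in orders:
        for pair in combinations(sorted(order), 2):
            pair_counts[pair] += 1
    return {pair: n for pair, n in pair_counts.items() if n >= min_support}


laptop_orders = query_orders_containing("laptop", orders)
frequent_pairs = mine_frequent_pairs(orders)
```

The query only returns what was asked for (orders containing a laptop), whereas the mining step reports every pair of products that is frequently bought together, which is the kind of pattern a cross-sell recommendation could be built on.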

Technologies for big data analytics

Data mining is made possible by tools, techniques, and technologies that give access to the databases of an organization. To understand these tools, it is first necessary to understand the types of big data analytics. There are four broad categories: descriptive, diagnostic, predictive, and prescriptive analytics. Descriptive analytics is the analysis and summary of raw data into a meaningful explanation (Gandomi & Haider, 2015); it answers the question of what happened. Diagnostic analytics allows users to analyze the data in depth by applying data discovery, data mining, correlations, and drill-down; it answers the question of why something happened. Predictive analytics, as the name suggests, uses the data being analyzed to predict what might happen in the future. Prescriptive analytics recommends a course of action to solve the problems identified; the question it answers is what the best course of action is. Descriptive and diagnostic analytics use tools that manipulate data and help visualize large data sets, whereas predictive and prescriptive analytics rely on mathematical, simulation, and scenario-planning tools to produce the best outcomes.

The tools applied in data analytics include Hadoop, Spark, MATLAB, and NoSQL databases. Hadoop provides scalable solutions for storing, loading, and processing huge amounts of data. Before adopting Hadoop, it is necessary to consider whether the data in question is compatible with the Hadoop architecture. Hadoop is not well suited to small data files; there are approaches for processing small files, but they come with restrictions. Spark is another tool for data mining and analytics; it is an open-source big data processing framework that performs data analytics on a distributed computer cluster. Spark speeds up data processing by performing it in memory. The third tool is MATLAB, a numerical computing environment and programming language developed by MathWorks. Its strength lies in scientific and technical computation for developing algorithms, and it supports activities such as matrix manipulation, plotting of functions and data, implementation of algorithms, and creation of user interfaces. The last tool listed is the NoSQL database, a class of databases designed to handle huge amounts of data more efficiently than traditional databases (Shmueli, Bruce, Yahav, Patel, & Lichtendahl Jr, 2017).
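
To make the four categories of analytics introduced earlier in this section concrete, the following Python sketch applies each one to a small, made-up series of monthly figures. The data, the helper names `pearson` and `fit_line`, the straight-line forecast, and the sales target are all illustrative assumptions; they are not methods taken from the cited sources.

```python
from statistics import mean

# Hypothetical monthly figures (assumed for illustration only).
ad_spend = [10, 12, 15, 18, 20, 25]        # thousands of dollars
sales = [100, 115, 150, 170, 205, 250]     # units sold


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


# Descriptive: what happened?
average_sales = mean(sales)

# Diagnostic: why did it happen? Check how closely sales track ad spend.
spend_sales_corr = pearson(ad_spend, sales)

# Predictive: what might happen? Fit a line and extrapolate from it.
slope, intercept = fit_line(ad_spend, sales)


def predict_sales(spend):
    return slope * spend + intercept


# Prescriptive: what should be done? Recommend the smallest candidate
# budget whose predicted sales reach an assumed target.
sales_target = 280
candidate_budgets = [20, 30, 40]
recommended_budget = min(
    b for b in candidate_budgets if predict_sales(b) >= sales_target
)
```

Each step builds on the previous one: the summary describes the data, the correlation suggests a driver, the fitted line turns that driver into a forecast, and the final step converts the forecast into a recommended action.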

NoSQL databases are built mainly to return fast, efficient responses to the queries submitted to them. They fall into four main groups: key-value databases, in which data is represented as pairs of keys and values; column-oriented databases, which resemble tables whose columns are dynamic and can differ from one record to another; document-oriented databases, which also store data as key-value pairs; and graph databases, which represent data as a graph of relationships found in the real world.

Although data analytics is widely discussed, the number of tools available for it is comparatively small. Most big data analytics tools have been developed from, or have evolved out of, the freely available Apache Hadoop ecosystem. This ecosystem is useful because it provides effective solutions for processing subsets of big data with given characteristics, and it is better suited to analyzing large files than small ones. MATLAB offers different approaches to handling big data that prevent the memory overflow that can occur when a huge amount of data is loaded at once. The development of NoSQL databases has helped to address problems that may arise in databases (Prakash & Aarthi, 2018) by making it possible to store big data quickly and efficiently. Nevertheless, it is necessary to choose the tools and technologies best suited to the problems presented by the specific data set.
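
The four NoSQL data models described in this section can be sketched with plain Python data structures. This is only an analogy under simplifying assumptions: the customer and order records are invented, the helper functions are hypothetical, and real NoSQL systems add persistence, distribution, and their own query interfaces on top of these shapes.

```python
# Key-value: an opaque value looked up by a single key.
key_value_store = {"session:42": "customer=alice;cart=3"}

# Column-oriented: each row holds only the columns it needs, so
# different rows may carry different columns.
column_store = {
    "alice": {"email": "alice@example.com", "city": "Nairobi"},
    "bob": {"email": "bob@example.com", "loyalty_tier": "gold"},
}

# Document: a key maps to a nested, self-describing document.
document_store = {
    "order:1001": {
        "customer": "alice",
        "items": [{"sku": "laptop", "qty": 1}, {"sku": "mouse", "qty": 2}],
    }
}

# Graph: nodes joined by labeled edges that mirror real-world relations.
graph_edges = [
    ("alice", "FRIEND_OF", "bob"),
    ("alice", "BOUGHT", "laptop"),
    ("bob", "BOUGHT", "laptop"),
]


def neighbors(node, relation, edges):
    """Return the nodes reached from `node` by following `relation` edges."""
    return [dst for src, rel, dst in edges if src == node and rel == relation]


def buyers_of(product, edges):
    """Follow BOUGHT edges backwards to find who bought `product`."""
    return sorted(
        src for src, rel, dst in edges if rel == "BOUGHT" and dst == product
    )


total_items = sum(
    item["qty"] for item in document_store["order:1001"]["items"]
)
```

The same customer data takes a different shape in each model, which is why the choice among them depends on the questions the organization expects to ask of its data.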

Cloud computing

Although the term cloud computing has been in use for some time, no single agreed definition has emerged. Specialists across the computing industry, including programmers, system planners, analysts, and technologists, each define the term in their own way. Through cloud computing, companies consume computing resources such as storage and virtual machines as utilities. The basic idea resembles electricity: organizations use services supplied by a provider instead of building and maintaining those services themselves. In general terms, therefore, cloud computing can be defined as the delivery of internet-hosted services (Hashem et al., 2015).

Cloud computing continues to be adopted by organizations and is characterized by five features. The first is on-demand self-service, which lets users pay only for the services they request and so reduces the heavy investment they would otherwise make in traditional hardware and software. Clients therefore do not need to own a physical data center to support their information technology needs. The second characteristic is broad network access, which means that cloud resources are accessible from devices such as laptops, desktop computers, and smartphones. The third is rapid elasticity: memory and processing capacity can be added or removed as needed, without a fixed limit. The fourth is resource pooling, in which the provider pools resources to serve multiple consumers through a multi-tenant model.
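
Rapid elasticity and resource pooling can be illustrated with a toy model in Python. The sketch assumes a hypothetical provider with a fixed pool of virtual CPUs shared by several tenants; the class name `ResourcePool`, its methods, and the numbers are invented for illustration and do not correspond to any real cloud API. To the consumer, capacity often appears unlimited, but any real pool is finite, which the sketch makes explicit.

```python
class ResourcePool:
    """A shared pool of virtual CPUs serving several tenants (multi-tenancy)."""

    def __init__(self, total_vcpus):
        self.total_vcpus = total_vcpus
        self.allocations = {}  # tenant name -> vCPUs currently held

    @property
    def free_vcpus(self):
        return self.total_vcpus - sum(self.allocations.values())

    def scale(self, tenant, target_vcpus):
        """Grow or shrink a tenant's allocation on demand (elasticity).

        Returns the allocation actually granted, which is capped by what
        the shared pool can still supply.
        """
        current = self.allocations.get(tenant, 0)
        granted = max(0, min(target_vcpus, current + self.free_vcpus))
        if granted == 0:
            self.allocations.pop(tenant, None)
        else:
            self.allocations[tenant] = granted
        return granted


pool = ResourcePool(total_vcpus=16)
pool.scale("shop", 4)                 # normal load
pool.scale("analytics", 6)            # a second tenant shares the same pool
peak = pool.scale("shop", 12)         # traffic spike: asks for 12 vCPUs
after_peak = pool.scale("shop", 2)    # demand drops, capacity is released
```

During the spike the shop tenant receives whatever the pool can still supply, and when demand falls the released capacity becomes available to other tenants again.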

Resource pooling also lets clients change their level of service without being constrained by particular virtual or physical resources. The last characteristic is measured service, which gives cloud providers the ability to measure, monitor, and report on resource usage (Botta, De Donato, Persico, & Pescapé, 2016).

The future development of cloud computing depends on its three types of service: Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). The first anticipated development is growth in cloud service solutions. The problems that cloud computing faces are expected to be addressed through efficient application of IaaS, PaaS, and SaaS, and predictions suggest that almost 60% of future solutions will be SaaS. Businesses that want to stay competitive by making their products accessible will turn to cloud computing services.

The second development is an increase in storage capacity. Predictions indicate that more service providers will offer data centers with larger capacities for holding ever more data (Nunes, Mendonca, Nguyen, Obraczka, & Turletti, 2014). This will encourage the use of cloud services, because clients will be able to store large data sets in the cloud. The future of cloud computing will also be shaped by the Internet of Everything (IoE). Cloud computing is expected to provide the foundation of the IoE, which depends on data, machine-to-machine communication, and human interaction, all of which the cloud supports by simplifying interactions. A further development will come from improvements in the quality of the internet. Faster, higher-quality connections will let businesses upgrade their services and serve customers' cloud computing needs through apps, and industries will benefit from sending and receiving data quickly and reliably in real time.

Ethics in data collection and analytics

In a fast-growing technological world, securing the data collected for analysis is vital, because there is a high risk of fraud, loss of control over the data, loss of privacy, and exposure to unethical activities. Ethical questions about collecting and using data can be confusing, and it is sometimes difficult to tell the right course of action from the wrong one. Where guidelines exist, they should be followed, together with respect for the views of customers: collect only the data that is needed, secure the data once it has been collected, and communicate to customers how the data will be used. The types of data being analyzed and the volume of personal information involved are among the ethical considerations an organization may take into account (Drachsler & Greller, 2016).

Approaches to these ethical considerations include ethical policies and values for data initiatives, which serve as reference points for workers and guide their decisions about data; ethical checklists and frameworks for data initiatives, organized around the five categories of privacy, ownership, consent, data sharing, and governance; and an ethics board or committee responsible for setting the ethical standards that employees must follow.

The five categories serve as a guideline for achieving ethical and secure data handling and analysis. Consent gives participants the chance to decide whether to take part in providing data. Privacy is control over who can access the information that research participants provide. Ownership defines who controls the data and to what extent they control it before giving it up. Data sharing involves assessing the benefits of the research. Governance covers the implementation and oversight of data and information (Pardo & Siemens, 2014). It is vital to assess the privacy and ethical risks associated with rapid technological advances, chiefly to avoid unethical practices in data analytics, privacy, and data security.
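
Two of the practices named in this section, collecting only the data that is needed and honoring consent, can be sketched in a few lines of Python. The record layout, the field names, and the `consented` flag are assumptions made for illustration; a real system would also need secure storage and access control, which this sketch does not cover.

```python
# Hypothetical raw records; field names are invented for illustration.
raw_records = [
    {"name": "Alice", "email": "alice@example.com", "age": 34,
     "purchase_total": 120.0, "consented": True},
    {"name": "Bob", "email": "bob@example.com", "age": 41,
     "purchase_total": 80.0, "consented": False},
    {"name": "Carol", "email": "carol@example.com", "age": 29,
     "purchase_total": 200.0, "consented": True},
]

# Only the fields the analysis actually needs (data minimization).
NEEDED_FIELDS = ("age", "purchase_total")


def prepare_for_analysis(records, needed_fields=NEEDED_FIELDS):
    """Keep consenting participants only, and drop every field the
    analysis does not need, including direct identifiers."""
    return [
        {field: record[field] for field in needed_fields}
        for record in records
        if record.get("consented") is True
    ]


analysis_rows = prepare_for_analysis(raw_records)
```

Filtering before analysis means that names and email addresses never reach the analyst, and that participants who declined are excluded entirely.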



