
Machine Learning for Big Data

Syed Murtaza Haider Zaidi


Westcliff University
MSIT 690: Big Data Analytics
Professor Hemphill
May 29, 2021

Table of Contents

Title Page --------------------------------------------------------------------------------------

Table of Contents ------------------------------------------------------------------------------

Introduction ------------------------------------------------------------------------------------

Machine Learning for Big Data -----------------------------------------------------------------

Machine Learning Applications for Big Data ----------------------------------------------------

Neural Networks --------------------------------------------------------------------------------

How Does a Neural Network Work ----------------------------------------------------------------

Statistical Analysis Methods -------------------------------------------------------------------

Data Visualization -----------------------------------------------------------------------------

Types of Data Visualization --------------------------------------------------------------------

Data Visualization Tools -----------------------------------------------------------------------

Comparison of Data Visualization Tools ---------------------------------------------------------

Conclusion --------------------------------------------------------------------------------------

Machine Learning for Big Data



Introduction

Over the last several years, more data has been generated than in centuries of human history. In terms of commercial value, this data is a treasure trove, and it is also a fundamental resource for researchers and public authorities. In almost every case, though, the majority of this potential will go unused or, worse, misinterpreted, unless the technologies needed to analyze huge volumes of data are applied. Without substantial computational capacity, extracting meaningful insights from the trends, correlations, and patterns in big data is challenging. However, big data analytics methodologies and technologies make it possible to learn from enormous data sets, including by visualizing data regardless of its size or form.

Machine Learning for Big Data

Machine learning and big data are the IT sector's current blue chips. Big data systems store, gather, and analyze information drawn from huge volumes of data. Machine learning, on the other hand, is a system's ability to learn and improve from experience without being explicitly programmed.

The core of machine learning consists of self-learning algorithms that develop by constantly improving at their designated task. When they are structured properly and fed suitable data, these algorithms eventually produce results in the areas of pattern recognition and predictive modeling. Data is like exercise for machine learning algorithms: just as elite athletes sharpen their bodies and skills by practicing every day, algorithms adjust themselves based on the data they are trained on. Machine learning algorithms become more effective as training datasets grow larger. As a result, combining big data and machine learning benefits both sides: the algorithms help us keep up with the constant influx of data, while the volume and variety of that same data feeds the algorithms and helps them grow.
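As a minimal sketch of this idea (using scikit-learn and a synthetic dataset rather than any particular production system), the snippet below trains the same classifier on progressively larger slices of the data; test accuracy typically improves as the training set grows:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a large dataset (50,000 rows, 20 features)
X, y = make_classification(n_samples=50_000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Train the same model on progressively larger slices of the training data
for n in (100, 1_000, 10_000, len(X_train)):
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train[:n], y_train[:n])
    accuracy = accuracy_score(y_test, model.predict(X_test))
    print(f"trained on {n:>6} rows -> test accuracy {accuracy:.3f}")
```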

When we feed big data to a machine learning algorithm, we can expect to see clearly delineated, analyzed outcomes such as hidden patterns and insights that support predictive analytics.

Machine Learning Applications for Big Data

The examples below illustrate how big data and machine learning can work together:

 Web Scraping: Assume a household appliance maker learns about market trends and customer satisfaction patterns only through a retailer's financial reports. The manufacturer decides to web-scrape the huge amount of relevant data available in online customer feedback and reviews to discover what those reports may have missed. By combining all of this data and feeding it into a suitable model, the company learns how to improve and better position its product lines, which leads to increased sales. Even though web scraping produces a large quantity of data, it is worth noting that one of the most important steps is selecting the right datasets (see the sketch after this list).

 Cloud Networks: A research organization has a huge quantity of data it wants to study, but it would need servers, networking, storage, and security assets to complete the task, all of which add up to a prohibitive expense. The organization instead decides to invest in Amazon EMR, a cloud service that offers data analysis models within a managed framework. GPU-accelerated image recognition and classification algorithms are examples of machine learning models of this type. Since these algorithms do not continue to learn after they have been deployed, they can be distributed and served through a content delivery network (CDN).

 Collaborative Filtering Systems: Clustering algorithms are used in the Netflix prediction model, which suggests titles on your homepage: big data is used to track your viewing history, and machine learning algorithms determine what to recommend next. In the same way, smart-car automakers use big data and machine learning in the predictive-analytics systems that power their vehicles.
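As a hedged illustration of the web-scraping scenario above (the URL, the CSS selector, and the keyword list are hypothetical placeholders, not a real retailer's site or any specific vendor's method), a simple scrape-and-flag pass might look like this:

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical review page; the URL and the CSS selector are placeholders
URL = "https://example.com/products/blender-3000/reviews"

response = requests.get(URL, timeout=10)
soup = BeautifulSoup(response.text, "html.parser")

# Collect review text from elements assumed to use a "review-body" class
reviews = [div.get_text(strip=True) for div in soup.select("div.review-body")]

# Naive keyword tally as a stand-in for a trained sentiment model
negative_terms = ("broke", "leaks", "refund", "noisy")
flagged = [r for r in reviews if any(term in r.lower() for term in negative_terms)]

print(f"{len(reviews)} reviews scraped, {len(flagged)} flagged for follow-up")
```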

There are a few requirements for getting accurate results from machine learning: clean data, optimized tools, and a clear idea of what you want to achieve, in addition to a well-designed learning algorithm.

NEURAL NETWORKS

Neural networks are a class of algorithms that recognize patterns and are loosely modeled on the human brain. They use a kind of machine perception to interpret sensory data, labeling or clustering raw input. All real-world data, whether images, audio, text, or time series, must be translated into the numerical patterns they recognize, which are contained in vectors of numbers. Companies that provide neural network software include:

 BioComp Systems Inc.: An organization that focuses mainly on genetic algorithms and neural networks for consulting and software design.

 Attrasoft: Provides a variety of neural network based products used for recognition of sounds and images, data mining, and trend analysis.

 Applied Analytical Systems: A company that focuses mainly on neural networks, statistical analysis, and applied mathematics, as well as systems analysis and artificial intelligence consulting and development.

 Jurik Research: An Excel add-in that uses neural networks to strengthen predictions.

 NeuralWare: Offers neural network based analysis products and engineering services to companies, government agencies, industry, and academic institutions to help them solve data mining, classification, forecasting, and pattern-matching problems.

 Nonlinear Solutions Oy: Offers services based on nonlinear modeling, especially neural networks, including control systems and material behavior models, as well as custom applications, nonlinear model simulation software, know-how transfer, and industrial course materials.

How does a Neural Network work?

When an input is provided to a neural network, it returns an output. On the first attempt it cannot produce the correct output on its own, which is why, during the learning phase, each input comes with a label indicating which output the network should have guessed. If the guess is correct, the current settings are kept and the next input is supplied. If the resulting output does not match the label, the weights are modified; during learning, these are the only variables that can be changed. The procedure can be thought of as a set of knobs that are turned to different positions whenever an input is not guessed correctly. A method known as backpropagation is used to decide which weights should be modified: it involves walking back through the network, inspecting each connection, and seeing how the output would respond to a change in that weight. Furthermore, there is one more parameter that must be understood because it influences how the neural network learns: the learning rate. This variable controls how quickly the network learns, or, more precisely, how much it changes each weight on every update, whether in small increments or in larger steps.
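As a minimal sketch of this learning loop (a single artificial neuron trained with plain gradient descent on made-up labeled data; a real network has many layers and uses full backpropagation), the code below shows the comparison against the label, the weight update, and the role of the learning rate:

```python
import numpy as np

# Toy labeled inputs: the output should be 1 when the first feature is larger
X = np.array([[0.1, 0.9], [0.8, 0.2], [0.2, 0.7], [0.9, 0.1]])
y = np.array([0, 1, 0, 1])

rng = np.random.default_rng(0)
weights = rng.normal(size=2)
bias = 0.0
learning_rate = 0.5  # how far each weight moves on every update

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for epoch in range(1_000):
    for x_i, label in zip(X, y):
        prediction = sigmoid(weights @ x_i + bias)
        error = prediction - label                 # compare the guess with the label
        grad = error * prediction * (1 - prediction)
        weights -= learning_rate * grad * x_i      # turn the "knobs" a little
        bias -= learning_rate * grad

print("learned weights:", weights.round(2), "bias:", round(bias, 2))
```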

STATISTICAL ANALYSIS METHODS

Over the past ten years, the technology we work with has evolved dramatically. Few things look the way they did before, from the machines on our desks to the software that lets people interact. Another thing that is radically different is the amount of data we have at our disposal. What was previously scarce has become a seemingly insurmountable amount of information. That can be daunting if you do not understand how to examine your company's data to uncover real, meaningful insights. So how do you get from point A, where you have a lot of data, to point B, where you can effectively analyze it? It comes down to using the proper statistical analysis techniques, which are the means of gathering and processing data samples in order to find trends and patterns.

Five such techniques are covered here: the mean, standard deviation, regression, hypothesis testing, and sample size determination.

FIVE TECHNIQUES FOR IMPLEMENTING STATISTICAL ANALYSIS

Whether or not you are a data analyst, there is no denying that big data has captured the world's attention, so you need to figure out where to start. All five techniques below are simple but effective for making data-driven decisions.

1. MEAN: The mean, often known as the average, is the first technique used in statistical analysis. To calculate the mean, add up a list of numbers and divide the total by the number of items in the list. This technique makes it possible to identify a data set's overall trend and to get a quick, succinct view of the data, and it has the added benefit of being simple and fast to compute. The statistical mean locates the center point of the data being analyzed; the result is also known as the average of the data collected. In everyday situations, people regularly use the mean when discussing research, economics, and sports; consider how frequently a baseball player's strikeout rate is mentioned: that figure is a mean.

How to find it: Sum all of the numbers in the data set, then divide the total by the number of values in the set.
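A minimal sketch in Python (the values are invented purely for illustration):

```python
data = [23, 40, 31, 27, 34]

# Mean: sum of the values divided by how many values there are
mean = sum(data) / len(data)
print(mean)  # 31.0
```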

2. STANDARD DEVIATION: The standard deviation is a statistical tool for measuring how far data is spread around its mean. A high standard deviation means you are working with data that falls far from the mean. A low standard deviation, on the other hand, indicates that most of the data lies close to the mean, which can also be thought of as the set's expected value. Standard deviation is commonly used when determining the dispersion of data points. Suppose you are a salesperson who has just finished a market research survey. When you receive the results, you want to know how reliable the answers are so you can forecast whether a larger group of people would respond the same way. A low standard deviation indicates that the responses can be projected to a broader set of customers.

HOW TO FIND IT: σ² = Σ(x − μ)² / n, and the standard deviation σ is the square root of this variance.

σ = the standard deviation
Σ = the sum over the data set
x = each value in the data set
μ = the mean of the data
σ² = the variance
n = the total number of data points
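A minimal sketch using Python's standard library (population variance and standard deviation, matching the n in the denominator above; the data values are made up):

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

mu = statistics.mean(data)             # μ = 5
variance = statistics.pvariance(data)  # σ² = Σ(x − μ)² / n = 4
sigma = statistics.pstdev(data)        # σ = √4 = 2.0

print(mu, variance, sigma)
```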

3. REGRESSION: In statistics, regression models the relationship between a dependent variable and one or more predictor variables. It can also be expressed in terms of how one variable affects another, or how changes in one variable drive changes in another, as in cause and effect. It indicates that one or more variables have an impact on the outcome.

HOW TO FIND IT: y = a + b(x)

a = the y-intercept, or the value of y when x = 0
x = the independent (predictor) variable
y = the dependent variable
b = the slope of the line, or rise over run
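A minimal sketch of fitting a simple linear regression with NumPy (the x and y values are made-up illustration data):

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2.1, 4.0, 6.2, 8.1, 9.9])  # roughly y = 2x

# np.polyfit returns the slope (b) and intercept (a) of the least-squares line
b, a = np.polyfit(x, y, deg=1)
print(f"y = {a:.2f} + {b:.2f}x")

# Use the fitted line to predict y for a new x
print(a + b * 6)
```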

4. HYPOTHESIS TESTING: Hypothesis testing is used in statistical analysis to compare two sets of variables within a data collection. The technique is used to see whether a given thesis or result holds true for the data at hand, and it allows the data to be compared against alternative hypotheses and assumptions. It can also help predict how business actions will affect the company. A hypothesis test in statistics evaluates a quantity under a certain assumption. The test's outcome indicates whether the assumption holds or whether it has been violated; this assumption is called the null hypothesis, often written as hypothesis 0. Hypothesis 1, the alternative hypothesis, is any hypothesis that contradicts hypothesis 0. When you perform hypothesis testing, the results are statistically significant if they show that the outcome could not have happened by chance or random occurrence.

HOW TO FIND IT: The result of a statistical hypothesis test is interpreted through the p-value, the probability of obtaining a result at least as extreme as the one observed if the null hypothesis were true. The smaller the p-value, the stronger the evidence against the null hypothesis.
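A minimal sketch with SciPy (the two response samples and the 0.05 significance level are illustrative assumptions, not values from the text):

```python
from scipy import stats

# Hypothetical survey scores from two groups of customers
group_a = [72, 75, 78, 71, 74, 77, 73]
group_b = [68, 70, 65, 72, 66, 69, 71]

# Null hypothesis (H0): the two groups have the same mean response
t_stat, p_value = stats.ttest_ind(group_a, group_b)

alpha = 0.05
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```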



5. DETERMINING THE SAMPLE SIZE: In statistical analysis, the data set is often so large that collecting accurate data on every element of it is impractical. For this reason, analysts opt for sample size determination, which involves studying a smaller sample of the data. To do this successfully, we have to figure out how large the sample should be. If the sample size is too small, we will not get accurate results at the end of our analysis. We can use any of several data sampling strategies to arrive at the sample, for example by sending a survey to our customers and then randomly selecting a subset of the responses to evaluate. A sample that is excessively large, on the other hand, can waste time and resources. Cost, effort, and how easily the data can be gathered all factor into deciding the sample size.

HOW TO FIND IT: Unlike the other four statistical analysis methods, there is no one-size-fits-all formula for calculating sample size. However, there are several common guidelines to follow when determining it:

 Conduct a census when working with a small population.
 Use a sample size from a similar study; you may want to search academic databases for research comparable to your own.
 If you are performing a general study, you may be able to use an existing sample size table to your advantage.
 Use a sample size calculator to determine a representative sample.
 Just because there is no single formula that always works does not mean you cannot find one that fits your situation. Depending on what you know or do not know about the population in question, there are a variety of options; Slovin's and Cochran's formulas are two you might want to use (see the sketch after this list).
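A minimal sketch of Cochran's formula (the 95% confidence z-score, 5% margin of error, and p = 0.5 are common illustrative defaults, not values prescribed by the text):

```python
import math

def cochran_sample_size(z=1.96, p=0.5, e=0.05):
    """Cochran's formula: n0 = z^2 * p * (1 - p) / e^2."""
    return math.ceil((z ** 2) * p * (1 - p) / (e ** 2))

def finite_population_correction(n0, population):
    """Shrink n0 when the total population is small and known."""
    return math.ceil(n0 / (1 + (n0 - 1) / population))

n0 = cochran_sample_size()
print(n0)                                      # 385 for a very large population
print(finite_population_correction(n0, 2000))  # smaller sample for a population of 2,000
```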

DATA VISUALIZATION

As the "era of big data" accelerates, visualization becomes an ever more important tool for making sense of the billions of rows of data generated every day. Data visualization helps tell stories by transforming data into a more understandable form and highlighting trends and outliers. A good visualization tells a story by removing the noise from the data and emphasizing the most important facts. However, it is not as simple as dressing a graph up to make it look nicer. Effective data visualization requires a careful balance between form and function. The plainest graph may be too boring to attract any notice, or it may still make a powerful point; the most striking visualization may completely fail to convey the right message, or it may speak volumes. The data and the visuals must complement each other, and combining great analysis with great storytelling is an art.

TYPES OF DATA VISUALIZATION

Simple bar graphs or pie charts are typically the first things that come to mind when you think of data visualization. Although these are an important part of data visualization and a common starting point, the right visualization must be paired with the right set of data.

1. TEMPORAL: Data visualizations fall into the temporal category if they satisfy two conditions: they are linear and they are one-dimensional. Temporal visualizations commonly feature lines that stand alone or overlap one another, each with a start and an end time. Their advantage is that they are familiar from school and the workplace, which makes them easy to understand at a glance.

Examples:

 Scatter plots
 Polar area diagrams
 Time series sequences
 Timelines
 Line graphs
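A minimal sketch of a temporal visualization with Matplotlib (the monthly figures are invented for illustration):

```python
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
sales = [120, 135, 128, 150, 162, 171]  # made-up monthly figures

# A line graph is a classic temporal visualization: linear and one-dimensional
plt.plot(months, sales, marker="o")
plt.title("Monthly Sales")
plt.xlabel("Month")
plt.ylabel("Units sold")
plt.tight_layout()
plt.show()
```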



2. HIERARCHICAL: The hierarchical category includes data visualizations that order groups within larger groups. Hierarchical visualizations are the way to go if you want to display clusters of data, especially if they flow from a single origin point.

Examples:

 Tree diagrams
 Ring charts
 Sunburst diagrams

3. NETWORK: Datasets connect deeply with other datasets. Network visualizations show how nodes in a network relate to one another. To put it another way, they demonstrate relationships between datasets without relying on lengthy explanations.

Examples:

 Matrix charts
 Node-link diagrams
 Word clouds
 Alluvial diagrams

4. MULTIDIMENSIONAL: Multidimensional data visualizations, as the name implies, have multiple dimensions: there are always two or more variables in play. Because of their many concurrent layers and datasets, these tend to be the most vibrant and eye-catching visualizations, and they can help you distill a lot of information into a few key takeaways.

Examples:

 Scatter plots
 Pie charts
 Venn diagrams
 Stacked bar graphs
 Histograms

5. GEOSPATIAL: Geospatial or spatial data visualizations relate to real-world physical locations, overlaying familiar maps with different data points. These visualizations are often used to show sales or acquisitions over time, and they are best known for their use in political campaigns or for displaying market penetration in multinational corporations.

Examples:

 Flow maps
 Density maps
 Heat maps

DATA VISUALIZATION TOOLS

Data visualization tools make it simpler for designers to create visual representations of massive data sets. When working with data sets that contain thousands or millions of data points, automating at least part of the process makes a designer's job much easier.

These data visualizations can then be used in dashboards, annual reports, sales and marketing materials, investor slide decks, and almost anywhere else information needs to be digested quickly.

The best data visualization tools share a few features. The first is ease of use. Some data visualization programs are quite complicated to work with; the better ones pair robust features with good documentation and tutorials and are designed in a user-friendly way. Others, regardless of their other capabilities, are lacking in those areas, which excludes them from any list of the best tools.

COMPARISON OF DATA VISUALIZATION TOOLS



There are dozens, if not hundreds, of apps, tools, and programs available for visualizing large amounts of data. Most are fairly simple, and many of their features overlap. However, a few stand out, either because they are more capable in the kinds of visualizations they can produce or because they are substantially easier to use than the remaining choices.

 Tableau: Tableau offers a desktop application, server and hosted online versions, and a free public option, among other things. Data import options include CSV files, Google Ads and Analytics data, and Salesforce data.

 Infogram: Infogram is a drag-and-drop data visualization tool that even non-designers can use to create effective visualizations for marketing reports, infographics, social media posts, maps, dashboards, and much more. Finished visualizations can be exported in a variety of formats, including .PNG, .JPG, .GIF, .PDF, and .HTML. Interactive visualizations are also possible, making them well suited for embedding into websites and apps. Infogram additionally offers a WordPress plugin that simplifies embedding visualizations in WordPress sites.

 ChartBlocks: According to the company, data can be loaded from "anywhere" via ChartBlocks' API, including from live feeds. Although they claim data can be imported from any source "in just a few clicks," this is likely to be more involved than in applications that have automation modules or plugins for particular data sources.

 Google Charts: Google Charts is a free data visualization tool designed for creating interactive charts that can be embedded online. It works with dynamic data, and the outputs are pure HTML5 and SVG, so charts display on any device without additional plugins. Data sources include Google Spreadsheets, Google Fusion Tables, Salesforce, and other SQL databases.

 Polymaps: Polymaps is a JavaScript library dedicated to mapping. The outputs are dynamic, responsive maps in a variety of styles, from image overlays to symbol maps to density maps. Because the images are created with SVG, designers can style the map graphics using CSS. With so many options available, it can be difficult for designers to decide which visualization tool to use; they should weigh factors such as ease of use and whether a tool offers the functionality they require.

Conclusion

Big data expands our knowledge base, while machine learning improves our problem-solving abilities. Combined, the two offer the potential to transform entire businesses. To take advantage of this, organizations must also scale the rest of their tooling. People can make better decisions by programming machines to analyze data that is too large for humans to process alone.

It is no longer a secret that big data is a driving force behind many of the world's most successful technology companies. Yet as more businesses adopt it to collect, analyze, and extract value from massive amounts of data, it becomes increasingly difficult for them to make the best use of the information they gather.

That is where machine learning comes in handy. Machine learning systems thrive on data: the more data a system collects, the better it learns to serve the business. As a result, adopting artificial intelligence for advanced analytics is a logical next step for businesses looking to maximize the benefits of big data and cloud computing adoption.

