
UNIT I: Introduction to Machine Learning

Arthur Samuel, an early American leader in the field of computer gaming and artificial
intelligence,coined the term “Machine Learning” in 1959 while at IBM. He defined machine
learning as “the field of study that gives computers the ability to learn without being explicitly
programmed”.

Machine Learning Definition


Machine learning is a subfield of artificial intelligence that involves the development of
algorithms and statistical models that enable computers to improve their performance in tasks
through experience. These algorithms and models are designed to learn from data and make
predictions or decisions without explicit instructions. Machine learning is used in a wide
variety of applications, including image and speech recognition, natural language processing,
and recommender systems.
Definition of learning:
A computer program is said to learn from experience E with respect to some class of tasks T
and performance measure P, if its performance at tasks T, as measured by P, improves with
experience E.
Examples
• Handwriting recognition learning problem
   o Task T: Recognizing and classifying handwritten words within images
   o Performance P: Percent of words correctly classified
   o Training experience E: A dataset of handwritten words with given classifications
• A robot driving learning problem
   o Task T: Driving on highways using vision sensors
   o Performance P: Average distance traveled before an error
   o Training experience E: A sequence of images and steering commands recorded while observing a human driver
Definition: A computer program that learns from experience is called a machine learning program or simply a learning program.
A machine can learn if it improves its performance at a task as it gains more data and experience.

How does Machine Learning work?

A machine learning system builds prediction models: it learns from previous data and predicts the output for new data whenever it receives it. The more data it is given, the better the model it can build, and the more accurate its predictions become.

History of Machine Learning

Some 40-50 years ago, machine learning was science fiction, but today it is part of our daily life. Machine learning makes everyday life easier, from self-driving cars to Amazon's virtual assistant "Alexa". The idea behind machine learning, however, is quite old and has a long history. Some milestones in the history of machine learning are given below:

The early history of Machine Learning (Pre-1940):


o 1834: Charles Babbage, the father of the computer, conceived a device that could be programmed with punch cards. Although the machine was never built, all modern computers rely on its logical structure.
o 1936: Alan Turing published a theory of how a machine could determine and execute a set of instructions.
The era of stored program computers:
o 1943: A human neural network was first modeled with an electrical circuit, by Warren McCulloch and Walter Pitts. Around 1950, scientists began putting the idea to work, analyzing how human neurons might function.
o 1945: "ENIAC", the first electronic general-purpose computer, was completed; it had to be set up manually for each new program. After that, stored-program computers such as EDSAC in 1949 and EDVAC in 1951 were invented.

Computing machinery and intelligence:


o 1950: Alan Turing published a seminal paper, "Computing Machinery and Intelligence", on the topic of artificial intelligence. In the paper, he asked, "Can machines think?"

Machine intelligence in Games:


o 1952: Arthur Samuel, a pioneer of machine learning, created a program that helped an IBM computer play checkers. The program performed better the more it played.
o 1959: The term "Machine Learning" was first coined by Arthur Samuel.

The first "AI" winter:


o The period from 1974 to 1980 was a tough time for AI and ML researchers, and it became known as the first "AI winter".
o During this period, machine translation failed to deliver on its promises, people lost interest in AI, and government funding for research was cut.

Machine Learning from theory to reality


o 1959: The first neural network was applied to a real-world problem: removing echoes over phone lines using an adaptive filter.
o 1985: Terry Sejnowski and Charles Rosenberg invented NETtalk, a neural network that taught itself to pronounce 20,000 words correctly in one week.
o 1997: IBM's Deep Blue computer won a chess match against the world champion Garry Kasparov, becoming the first computer to beat a human world chess champion.

Machine Learning in the 21st century

2006:

o Geoffrey Hinton and his group presented the idea of deep learning using deep belief networks.
o The Elastic Compute Cloud (EC2) was launched by Amazon to provide scalable
computing resources that made it easier to create and implement machine learning
models.
2007:

o The Netflix Prize competition began, tasking participants with improving the accuracy of Netflix's recommendation algorithm.
o Reinforcement learning made notable progress when a group of researchers used it to train a computer to play backgammon at a high level.

2008:

o Google released the Google Prediction API, a cloud-based service that allowed developers to integrate machine learning into their applications.
o Restricted Boltzmann Machines (RBMs), a kind of generative neural network, gained attention for their ability to model complex data distributions.

2009:

o Deep learning gained ground as researchers demonstrated its effectiveness in various tasks, including speech recognition and image classification.
o The term "Big Data" gained popularity, highlighting the challenges and opportunities of handling huge datasets.

2010:

o The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) was introduced, driving advances in computer vision and prompting the development of deep convolutional neural networks (CNNs).

2011:

o IBM's Watson defeated human champions on Jeopardy!, demonstrating the potential of question-answering systems and natural language processing.

2012:

o AlexNet, a deep CNN created by Alex Krizhevsky, won the ILSVRC, significantly improving image classification accuracy and establishing deep learning as the dominant approach in computer vision.
o Google's Brain project, led by Andrew Ng and Jeff Dean, used deep learning to train a neural network to recognize cats in unlabeled YouTube videos.

2013:

o Google acquired the startup DeepMind Technologies, which focused on deep learning and artificial intelligence (the deal closed in early 2014).

2014:

o Ian Goodfellow introduced generative adversarial networks (GANs), which made it possible to create realistic synthetic data.
o Facebook presented the DeepFace system, which achieved near-human accuracy in facial recognition.
o DeepMind began developing AlphaGo, the Go program that went on to defeat a world champion and demonstrated the potential of reinforcement learning in challenging games.

2015:

o Microsoft released the Cognitive Toolkit (formerly known as CNTK), an open-source deep learning library.
o The introduction of attention mechanisms enhanced the performance of sequence-to-sequence models in tasks such as machine translation.

2016:

o Explainable AI, which focuses on making machine learning models easier to understand, began to receive attention.
o Google's DeepMind created AlphaGo Zero, which achieved superhuman Go play without human game data, using only reinforcement learning.

2017:

o Transfer learning gained prominence, allowing pretrained models to be reused for different tasks with limited data.
o Generative models such as variational autoencoders (VAEs) and Wasserstein GANs enabled better synthesis and generation of complex data.
o These are only some of the notable advances and milestones in machine learning during this period. The field continued to evolve rapidly beyond 2017, with new breakthroughs, techniques, and applications emerging.

Machine Learning at present:

The field of machine learning has made significant strides in recent years, and its applications are numerous, including self-driving cars, Amazon Alexa, chatbots, and recommender systems. It incorporates clustering, classification, decision tree, SVM, and reinforcement learning algorithms, as well as unsupervised and supervised learning.

Present-day machine learning models can be used to make many kinds of predictions, including weather forecasting, disease prediction, stock market analysis, and so on.
What is Machine Learning?

Machine Learning is defined as the study of computer algorithms that allow computer software to improve automatically through past experience and training data.

It is a branch of Artificial Intelligence and computer science that helps build a model based on training data and make predictions and decisions without being explicitly programmed. Machine Learning is used in various applications such as email filtering, speech recognition, computer vision, self-driving cars, Amazon product recommendations, etc.

Classification of Machine Learning

At a broad level, machine learning can be classified into three types:

1. Supervised learning
2. Unsupervised learning
3. Reinforcement learning

1) Supervised Learning

In supervised learning, sample labeled data are provided to the machine learning system for
training, and the system then predicts the output based on the training data.

The system uses labeled data to build a model that understands the datasets and learns about each
one. After the training and processing are done, we test the model with sample data to see if it
can accurately predict the output.

The mapping of input data to output data is the objective of supervised learning. Supervised learning is based on supervision: it is like a student learning under the guidance of a teacher. Spam filtering is an example of supervised learning.

Supervised learning can be further grouped into two categories of algorithms (see the sketch after this list):

o Classification
o Regression
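
Since the unit gives no code, here is a minimal sketch of supervised classification, assuming the scikit-learn library and its built-in iris dataset (both assumptions, not part of this unit):

# A minimal supervised-learning sketch: fit a classifier on labeled data,
# then predict labels for unseen samples.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)                # features and known labels
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)                      # learn from labeled examples
print(model.score(X_test, y_test))               # accuracy on unseen data

A regression task would look the same, except that the target is a continuous value and a regressor such as LinearRegression replaces the classifier.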

2) Unsupervised Learning

Unsupervised learning is a learning method in which a machine learns without any supervision.

The training is provided to the machine with a set of data that has not been labeled, classified, or categorized, and the algorithm needs to act on that data without any supervision. The goal of unsupervised learning is to restructure the input data into new features or groups of objects with similar patterns.
In unsupervised learning, we don't have a predetermined result; the machine tries to find useful insights from the huge amount of data. Unsupervised learning can be further classified into two categories of algorithms (see the sketch after this list):

o Clustering
o Association
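
As a minimal sketch of unsupervised learning, assuming scikit-learn and a small made-up set of points, k-means groups unlabeled data into clusters of similar patterns:

# K-means receives no labels; it groups the points by similarity alone.
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0],            # unlabeled data, no targets
              [10, 2], [10, 4], [10, 0]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)                            # cluster assigned to each point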

3) Reinforcement Learning

Reinforcement learning is a feedback-based learning method in which a learning agent gets a reward for each right action and a penalty for each wrong action. The agent learns from this feedback automatically and improves its performance. In reinforcement learning, the agent interacts with the environment and explores it. The goal of the agent is to collect the maximum reward points, and so it keeps improving its performance.

A robotic dog that automatically learns the movement of its limbs is an example of reinforcement learning.
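
As a minimal sketch of reinforcement learning, here is tabular Q-learning on a made-up one-dimensional corridor (the states, actions, and rewards are all illustrative assumptions): the agent earns a reward only at the last cell and learns to move right.

# The agent updates Q(state, action) from reward feedback until the
# greedy policy reaches the rewarding terminal cell.
import random

n_states, actions = 6, [-1, +1]                  # actions: move left or right
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
alpha, gamma, epsilon = 0.5, 0.9, 0.1            # learning rate, discount, exploration

for _ in range(500):                             # training episodes
    s = 0
    while s != n_states - 1:
        a = random.choice(actions) if random.random() < epsilon \
            else max(actions, key=lambda act: Q[(s, act)])
        s2 = min(max(s + a, 0), n_states - 1)    # environment transition
        r = 1.0 if s2 == n_states - 1 else 0.0   # reward only at the goal
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in actions) - Q[(s, a)])
        s = s2

print([max(actions, key=lambda act: Q[(s, act)]) for s in range(n_states - 1)])
# expected output: [1, 1, 1, 1, 1], i.e. "move right" from every cell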

Well Posed Learning Problem

A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
Any problem can be classified as a well-posed learning problem if it has three traits:
• Task
• Performance Measure
• Experience
Some examples that clearly define well-posed learning problems are:
1. Filtering emails as spam or not
• Task: Classifying emails as spam or not spam
• Performance Measure: The fraction of emails accurately classified as spam or not spam
• Experience: Observing the user label emails as spam or not spam
2. A checkers learning problem
• Task: Playing checkers
• Performance Measure: Percent of games won against an opponent
• Experience: Playing practice games against itself
3. Handwriting recognition problem
• Task: Recognizing handwritten words within images
• Performance Measure: Percent of words accurately classified
• Experience: A dataset of handwritten words with given classifications
4. A robot driving problem
• Task: Driving on public four-lane highways using vision sensors
• Performance Measure: Average distance traveled before an error
• Experience: A sequence of images and steering commands recorded while observing a human driver
5. Fruit prediction problem
• Task: Recognizing different fruits
• Performance Measure: The variety of fruits correctly predicted
• Experience: Training the machine with a large dataset of fruit images
6. Face recognition problem
• Task: Recognizing different types of faces
• Performance Measure: The variety of faces correctly predicted
• Experience: Training the machine with a large dataset of different face images
7. Automatic translation of documents
• Task: Translating documents from one language to another
• Performance Measure: The fraction of text converted accurately from one language to the other
• Experience: Training the machine with a large dataset of documents in different languages

Applications of Machine learning

1. Image Recognition:
Image recognition is one of the most common applications of machine learning. It is used to identify objects, persons, places, digital images, etc. A popular use case of image recognition and face detection is automatic friend tagging suggestions:

Facebook provides a feature of automatic friend tagging suggestions. Whenever we upload a photo with our Facebook friends, we automatically get tagging suggestions with names; the technology behind this is machine learning's face detection and recognition algorithms.

It is based on the Facebook project named "Deep Face," which is responsible for face
recognition and person identification in the picture.

2. Speech Recognition

When using Google, we get the option to "Search by voice"; this comes under speech recognition, a popular application of machine learning.

Speech recognition is the process of converting voice instructions into text; it is also known as "Speech to text" or "Computer speech recognition". At present, machine learning algorithms are widely used in various speech recognition applications. Google Assistant, Siri, Cortana, and Alexa use speech recognition technology to follow voice instructions.

3. Traffic prediction:

If we want to visit a new place, we take the help of Google Maps, which shows us the correct path with the shortest route and predicts the traffic conditions.

It predicts traffic conditions, such as whether traffic is clear, slow-moving, or heavily congested, with the help of two things:

o The real-time location of vehicles from the Google Maps app and sensors
o The average time taken on past days at the same time of day

Everyone who uses Google Maps is helping to make the app better. It takes information from users and sends it back to its database to improve its performance.

4. Product recommendations:

Machine learning is widely used by various e-commerce and entertainment companies such
as Amazon, Netflix, etc., for recommending products to users. Whenever we search for a product on Amazon, we start getting advertisements for the same product while surfing the internet in the same browser, and this is because of machine learning.

Google understands user interests using various machine learning algorithms and suggests products according to those interests.
Similarly, when we use Netflix, we get recommendations for entertainment series, movies, etc., and this is also done with the help of machine learning.

5. Self-driving cars:

One of the most exciting applications of machine learning is self-driving cars. Machine learning plays a significant role in self-driving cars. Tesla, the most popular electric car manufacturer, is working on self-driving cars, using machine learning methods to train car models to detect people and objects while driving.

6. Email Spam and Malware Filtering:

Whenever we receive a new email, it is filtered automatically as important, normal, or spam. We always receive important mail in our inbox, marked with the important symbol, and spam emails in our spam box; the technology behind this is machine learning. Below are some spam filters used by Gmail:

o Content Filter
o Header filter
o General blacklists filter
o Rules-based filters
o Permission filters

Some machine learning algorithms such as Multi-Layer Perceptron, Decision tree, and Naïve
Bayes classifier are used for email spam filtering and malware detection.
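
As a minimal sketch, assuming scikit-learn and a made-up set of four labeled messages, a Naive Bayes spam filter looks like this:

# Word counts are the features; Multinomial Naive Bayes learns which
# words are typical of spam versus normal mail.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

messages = ["win a free prize now", "meeting at 10 am tomorrow",
            "free money click here", "project report attached"]
labels = [1, 0, 1, 0]                            # 1 = spam, 0 = not spam

vec = CountVectorizer()
X = vec.fit_transform(messages)                  # bag-of-words features

clf = MultinomialNB().fit(X, labels)
print(clf.predict(vec.transform(["claim your free prize"])))   # -> [1] (spam)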

7. Virtual Personal Assistant:

We have various virtual personal assistants such as Google Assistant, Alexa, Cortana, and Siri. As the name suggests, they help us find information using our voice instructions. These assistants can help us in various ways just through our voice instructions, such as playing music, calling someone, opening an email, scheduling an appointment, etc.

These virtual assistants use machine learning algorithms as an important part. They record our voice instructions, send them over the server on the cloud, decode them using ML algorithms, and act accordingly.

8. Online Fraud Detection:

Machine learning is making our online transactions safe and secure by detecting fraudulent transactions. Whenever we perform an online transaction, there are various ways a fraudulent transaction can take place, such as fake accounts, fake IDs, and money being stolen in the middle of a transaction. To detect this, a feed-forward neural network helps us by checking whether a transaction is genuine or fraudulent.

For each genuine transaction, the output is converted into hash values, and these values become the input for the next round. Genuine transactions follow a specific pattern, and this pattern changes for a fraudulent transaction; the network detects the change and makes our online transactions more secure.

9. Stock Market trading:

Machine learning is widely used in stock market trading. In the stock market, there is always a risk of ups and downs in share prices, so machine learning's long short-term memory (LSTM) neural networks are used to predict stock market trends.
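
As a minimal sketch, assuming TensorFlow/Keras and a synthetic series standing in for real stock prices, an LSTM can be trained to predict the next value from the previous ten:

# Slice the series into sliding windows; the LSTM learns to map each
# 10-step window to the value that follows it.
import numpy as np
from tensorflow import keras

prices = np.sin(np.linspace(0, 20, 500)) + np.linspace(0, 2, 500)  # fake prices

window = 10
X = np.array([prices[i:i + window] for i in range(len(prices) - window)])
y = prices[window:]
X = X[..., np.newaxis]                           # (samples, timesteps, features)

model = keras.Sequential([
    keras.layers.Input(shape=(window, 1)),
    keras.layers.LSTM(32),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, verbose=0)

print(model.predict(X[-1:], verbose=0))          # forecast the next point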

10. Medical Diagnosis:

In medical science, machine learning is used to diagnose diseases. With it, medical technology is growing very fast and is able to build 3D models that can predict the exact position of lesions in the brain.

This helps in finding brain tumors and other brain-related diseases easily.

11. Automatic Language Translation:

Nowadays, if we visit a new place and are not aware of the local language, it is not a problem at all, because machine learning helps us here too by converting text into languages we know. Google's GNMT (Google Neural Machine Translation) provides this feature: it is a neural machine translation system that translates text into our familiar language, and this is called automatic translation.

Common issues in Machine Learning

Although machine learning is used in every industry and helps organizations make more informed, data-driven choices that are more effective than classical methodologies, it still has many problems that cannot be ignored. Here are some common issues in Machine Learning that professionals face when building ML skills and creating applications from scratch.

1. Inadequate Training Data

The major issue that arises when using machine learning algorithms is a lack of quality as well as quantity of data. Although data plays a vital role in machine learning, many data scientists report that inadequate, noisy, and unclean data makes life extremely hard for machine learning algorithms. For example, a simple task may require thousands of samples, while an advanced task such as speech or image recognition may need millions of sample examples. Data quality is equally important for the algorithms to work well, yet poor data quality is common in Machine Learning applications. Data quality can be affected by factors such as the following:
o Noisy data - responsible for inaccurate predictions, affecting decisions as well as accuracy in classification tasks.
o Incorrect data - responsible for faulty programming and faulty results in machine learning models; incorrect data may also reduce the accuracy of the results.
o Generalizing of output data - sometimes generalizing output data becomes complex, which results in comparatively poor future actions.

2. Poor quality of data

As we have discussed above, data plays a significant role in machine learning, and it must be of
good quality as well. Noisy data, incomplete data, inaccurate data, and unclean data lead to less
accuracy in classification and low-quality results. Hence, data quality can also be considered as a
major common problem while processing machine learning algorithms.

3. Non-representative training data

To make sure our trained model generalizes well, we have to ensure that the sample training data is representative of the new cases we need to generalize to. The training data must cover all the cases that have already occurred as well as those that are occurring.

Further, if we use non-representative training data in the model, it results in less accurate predictions. A machine learning model is said to be ideal if it predicts well for generalized cases and provides accurate decisions. If there is too little training data, there will be sampling noise in the model; such a non-representative training set will not be accurate in its predictions and will be biased toward one class or group.

Hence, we should use representative data in training to protect against bias and to make accurate predictions without any drift.

4. Overfitting and Underfitting

Overfitting:

Overfitting is one of the most common issues faced by Machine Learning engineers and data scientists. Whenever a machine learning model is trained on a huge amount of data, it can start capturing the noise and inaccuracies present in the training data set, which negatively affects the performance of the model. Consider a simple example where the training data contains 1000 mangoes, 1000 apples, 1000 bananas, and 5000 papayas. With this biased training set, there is a considerable probability of identifying an apple as a papaya, so the predictions are negatively affected. A main driver of overfitting is the use of highly flexible non-linear methods, as they can build unrealistic data models. We can reduce overfitting by using simpler linear and parametric algorithms in the machine learning models.

Methods to reduce overfitting (see the sketch after this list):

o Increase the training data in the dataset.
o Reduce model complexity by simplifying the model, i.e., selecting one with fewer parameters.
o Use Ridge regularization and Lasso regularization.
o Apply early stopping during the training phase.
o Reduce the noise.
o Reduce the number of attributes in the training data.
o Constrain the model.
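
As a minimal sketch of the regularization items above, assuming scikit-learn and synthetic data in which only the first of twenty features matters:

# The alpha penalty constrains the model: Ridge (L2) shrinks all weights,
# while Lasso (L1) drives irrelevant weights toward exactly zero.
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 20))                    # few samples, many features
y = X[:, 0] + 0.1 * rng.normal(size=50)          # only feature 0 is relevant

ridge = Ridge(alpha=10.0).fit(X, y)              # larger alpha = stronger penalty
lasso = Lasso(alpha=0.1).fit(X, y)
print(ridge.coef_[:3])                           # small but non-zero weights
print(lasso.coef_[:3])                           # irrelevant weights near zero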

Underfitting:

Underfitting is just the opposite of overfitting. Whenever a machine learning model is trained on too little data, it produces incomplete and inaccurate predictions and destroys the accuracy of the machine learning model.

Underfitting occurs when our model is too simple to capture the underlying structure of the data, just like an undersized pair of pants. It generally happens when we have limited data in the data set and try to build a linear model with non-linear data. In such scenarios, the model is too simple, its rules are too easy to apply to the data set, and it starts making wrong predictions.

Methods to reduce underfitting (see the sketch after this list):

o Increase model complexity.
o Remove noise from the data.
o Train on more and better features.
o Reduce the constraints.
o Increase the number of epochs to get better results.
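
As a minimal sketch of the first item, assuming scikit-learn: a plain linear model underfits quadratic data, but adding polynomial features increases the model's complexity enough to capture it.

# The same linear learner fits far better once the inputs are expanded
# with polynomial features, curing the underfit.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

X = np.linspace(-3, 3, 100).reshape(-1, 1)
y = X.ravel() ** 2                               # non-linear target

linear = LinearRegression().fit(X, y)            # underfits: R^2 near 0
X_poly = PolynomialFeatures(degree=2).fit_transform(X)
poly = LinearRegression().fit(X_poly, y)         # fits well: R^2 near 1

print(linear.score(X, y), poly.score(X_poly, y))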

5. Monitoring and maintenance

As we know, generalized output data is mandatory for any machine learning model; hence, regular monitoring and maintenance are compulsory. Different results for different actions require changes to the data; hence, editing the code and allocating resources to monitor the model also become necessary.

6. Getting bad recommendations

A machine learning model operates within a specific context, and when that context changes it can produce bad recommendations, a problem known as concept drift. Consider an example: at a specific time a customer is looking for some gadgets; the customer's requirements then change over time, but the machine learning model keeps showing the same recommendations even though the customer's expectations have changed. This phenomenon is called data drift. It generally occurs when new data is introduced or the interpretation of the data changes. However, we can overcome it by regularly monitoring and updating the data according to expectations.

7. Lack of skilled resources

Although Machine Learning and Artificial Intelligence are continuously growing in the market, these industries are still younger than most others. The absence of skilled resources in the form of manpower is also an issue. We need people with in-depth knowledge of mathematics, science, and technology to develop and manage the scientific substance of machine learning.

8. Customer Segmentation

Customer segmentation is also an important issue when developing a machine learning algorithm: we must identify the customers who acted on the recommendations shown by the model and those who did not even check them. Hence, an algorithm is necessary to recognize customer behavior and trigger relevant recommendations for the user based on past behavior.

9. Process Complexity of Machine Learning

The machine learning process is very complex, which is another major issue faced by machine learning engineers and data scientists. Machine Learning and Artificial Intelligence are very new technologies, still in an experimental phase and continuously changing over time. Much of the work proceeds by trial and error, so the probability of error is higher than expected. Further, the process includes analyzing the data, removing data bias, training the data, applying complex mathematical calculations, and so on, making the procedure more complicated and quite tedious.

10. Data Bias

Data bias is also a big challenge in Machine Learning. Bias errors exist when certain elements of the dataset are heavily weighted or given more importance than others. Biased data leads to inaccurate results, skewed outcomes, and other analytical errors. However, we can resolve this error by determining where the data is actually biased in the dataset and then taking the necessary steps to reduce it.

Methods to remove Data Bias:

o Research more for customer segmentation.
o Be aware of your general use cases and potential outliers.
o Combine inputs from multiple sources to ensure data diversity.
o Include bias testing in the development process.
o Analyze data regularly and keep tracking errors to resolve them easily.
o Review the collected and annotated data.
o Use multi-pass annotation such as sentiment analysis, content moderation, and intent
recognition.

11. Lack of Explainability

This basically means that the outputs cannot be easily understood, because the model is programmed in specific ways to deliver output for certain conditions. Hence, a lack of explainability is also found in machine learning algorithms, which reduces the credibility of the algorithms.

12. Slow implementations and results

This issue is also very commonly seen in machine learning models. Although machine learning models are highly efficient at producing accurate results, they can be time-consuming. Slow programs, excessive requirements, and overloaded data take more time than expected to provide accurate results. This calls for continuous maintenance and monitoring of the model to deliver accurate results.

13. Irrelevant features

Although machine learning models are intended to give the best possible outcome, if we feed garbage data as input, the result will also be garbage. Hence, we should use relevant features in our training sample. A machine learning model is said to be good if the training data has a good set of features and few to no irrelevant ones.

Why is data important for machine learning?

We’ve covered the question “Why is machine learning important?”; now we need to understand the role data plays. Machine learning data analysis uses algorithms that continuously improve over time, but quality data is necessary for these models to operate efficiently.

To truly understand how machine learning works, you must also understand the data by which it
operates. Today, we will be discussing what machine learning datasets are, the types of data
needed for machine learning to be effective, and where engineers can find datasets to use in their
own machine learning models.

What is a dataset in machine learning?

To understand what a dataset is, we must first discuss the components of a dataset. A single row
of data is called an instance. Datasets are a collection of instances that all share a common
attribute. Machine learning models will generally contain a few different datasets, each used to
fulfill various roles in the system.

For machine learning models to understand how to perform various actions, training datasets
must first be fed into the machine learning algorithm, followed by validation datasets (or testing
datasets) to ensure that the model is interpreting this data accurately.
Once you feed these training and validation sets into the system, subsequent datasets can then be
used to sculpt your machine learning model going forward. The more data you provide to the
ML system, the faster that model can learn and improve.
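
As a minimal sketch, assuming scikit-learn and its built-in wine dataset, here is how a training set and a validation set are fed into a model:

# Each row of X is one instance; the split reserves 20% of the instances
# to check that the model interprets unseen data accurately.
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2,
                                                  random_state=0)

model = DecisionTreeClassifier().fit(X_train, y_train)   # learn on training set
print(model.score(X_val, y_val))                         # validate on held-out set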

What type of data does machine learning need?

Data can come in many forms, but machine learning models rely on four primary data types.
These include numerical data, categorical data, time series data, and text data.

Numerical data

Numerical data, or quantitative data, is any form of measurable data, such as your height, weight, or the cost of your phone bill. You can determine if a set of data is numerical by attempting to average out the numbers or sort them in ascending or descending order. Exact or whole numbers (e.g., 26 students in a class) are considered discrete numbers, while those which fall into a given range (e.g., a 3.6 percent interest rate) are considered continuous numbers. While learning this type of data, keep in mind that numerical data is not tied to any specific point in time; they are simply raw numbers.

Categorical data

Categorical data is sorted by defining characteristics. This can include gender, social class, ethnicity, hometown, the industry you work in, or a variety of other labels. While learning this data type, keep in mind that it is non-numerical, meaning you are unable to add the values together, average them out, or sort them in any meaningful order. Categorical data is great for grouping individuals or ideas that share similar attributes, helping your machine learning model streamline its data analysis.

Time series data

Time series data consists of data points that are indexed at specific points in time. More often
than not, this data is collected at consistent intervals. Learning and utilizing time series data
makes it easy to compare data from week to week, month to month, year to year, or according to
any other time-based metric you desire. The distinct difference between time series data and
numerical data is that time series data has established starting and ending points, while numerical
data is simply a collection of numbers that aren’t rooted in particular time periods.

Text data

Text data is simply words, sentences, or paragraphs that can provide some level of insight to
your machine learning models. Since these words can be difficult for models to interpret on their
own, they are most often grouped together or analyzed using various methods such as word
frequency, text classification, or sentiment analysis.
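
As a minimal sketch, assuming the pandas library and invented column values, the four data types can sit side by side in a single dataset:

# One column per data type: numerical, categorical, time series, and text.
import pandas as pd

df = pd.DataFrame({
    "bill_amount": [42.5, 39.9, 57.1],                    # numerical
    "industry": ["retail", "finance", "retail"],          # categorical
    "billing_date": pd.to_datetime(
        ["2023-01-31", "2023-02-28", "2023-03-31"]),      # time series
    "note": ["paid on time", "late payment", "disputed"], # text
})

print(df["bill_amount"].mean())          # numerical data can be averaged
print(df["industry"].value_counts())     # categorical data can only be grouped
print(df.sort_values("billing_date"))    # time series data is ordered in time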
What is data remediation?
Data remediation is the process of cleansing, organizing and migrating data so that it’s properly
protected and best serves its intended purpose. There is a misconception that data remediation
simply means deleting business data that is no longer needed. It’s important to remember that the key word “remediation” derives from the word “remedy,” which means to correct a mistake. Since the core initiative is to correct data, the data remediation process typically involves replacing, modifying, cleansing, or deleting any “dirty” data.
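
As a minimal data remediation sketch, assuming pandas and an invented customer table, dirty values can be replaced, modified, cleansed, or deleted like this:

# Delete duplicates and unusable rows, replace invalid values,
# and modify (impute) what can be corrected.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["Ann", "Ann", "Bob", None],
    "age": [34, 34, -1, 29],                     # -1 is an invalid sentinel
})

df = df.drop_duplicates()                        # delete duplicate records
df = df.dropna(subset=["name"])                  # delete rows missing a key field
df["age"] = df["age"].replace(-1, np.nan)        # replace invalid values
df["age"] = df["age"].fillna(df["age"].mean())   # modify: impute the mean
print(df)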
