Introduction to Machine Learning

Chapter One: Introduction

1.1 What is machine learning

Artificial Intelligence (AI) is a field of computer science that emphasizes the creation of intelligent machines that can work and react like humans. Machine learning is an application or subset of AI that allows machines to learn from data without being explicitly programmed.

"Machine Learning is the science (and art) of programming computers so they can learn from data."

"[Machine Learning is the] field of study that gives computers the ability to learn without being explicitly programmed." (Arthur Samuel, 1959)

"A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E." (Tom Mitchell, 1997)

For example, your spam filter is a Machine Learning program that can learn to flag spam given examples of spam emails (e.g., flagged by users) and examples of regular (nonspam, also called "ham") emails. The examples that the system uses to learn are called the training set. Each training example is called a training instance (or sample). In this case, the task T is to flag spam for new emails, the experience E is the training data, and the performance measure P needs to be defined; for example, you can use the ratio of correctly classified emails. This particular performance measure is called accuracy, and it is often used in classification tasks.

If you just download a copy of Wikipedia, your computer has a lot more data, but it is not suddenly better at any task. Thus, it is not Machine Learning.

Machine learning focuses on the development of computer programs that can access data and use it to learn for themselves. The learning process starts with data, and ML provides statistical tools to explore and analyse the data. The primary aim is to allow computers to learn automatically without human intervention.

1.2 Foundation of machine learning

Traditional Programs vs.
Machine Learning

Department of Information Systems

In traditional programs, a developer designs logic or algorithms to solve a problem. The program applies this logic to input and computes the output. But in Machine Learning, a model is built from the data, and that model is the logic.

Fig: Traditional programming vs. machine learning

Let's take the problem of detecting email spam and compare both methods.

Traditional programs detect spam by checking an email against a fixed set of heuristic rules. For example:
* Does the email contain FREE, weight loss, or lottery several times?
* Did it come from known spammer domains or IP addresses?

As spammers change tactics, developers need to continuously update these rules.

In machine learning solutions, an engineer will:
* Prepare a data set: a large number of emails labelled manually as "spam" or "not spam".
* Train, test, and tune models, and select the best.
* During inference, apply the model to decide whether to keep an email in the inbox or move it to the spam folder.
* If the user moves an email from inbox to spam or vice versa, add this feedback to the training data.
* Retrain the model to keep it up to date with spam trends.

As you can see, traditional programs are deterministic, but ML programs are probabilistic. Both make mistakes. However, the traditional program requires constant manual effort to update the rules, while the ML program learns from new data when retrained.
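The contrast in the spam example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the keyword list, the four training emails, and the function names are all made up, and the "learned" filter is a deliberately crude word-count vote rather than a real probabilistic classifier.

```python
from collections import Counter

# Traditional approach: a fixed, hand-written rule list (hypothetical keywords).
SPAM_KEYWORDS = ("free", "lottery", "weight loss")


def rule_based_is_spam(email):
    """Flag the email if it contains any keyword from the fixed rule list."""
    text = email.lower()
    return any(keyword in text for keyword in SPAM_KEYWORDS)


def train(labeled_emails):
    """Build the 'model': per-label counts of every word seen in training."""
    counts = {"spam": Counter(), "ham": Counter()}
    for text, label in labeled_emails:
        counts[label].update(text.lower().split())
    return counts


def learned_is_spam(email, counts):
    """Vote word by word: +1 if seen more often in spam, -1 if more in ham."""
    score = 0
    for word in email.lower().split():
        if counts["spam"][word] > counts["ham"][word]:
            score += 1
        elif counts["ham"][word] > counts["spam"][word]:
            score -= 1
    return score > 0


# Tiny made-up training set of (text, label) pairs.
training_set = [
    ("win a free lottery prize now", "spam"),
    ("free weight loss pills", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on friday with the team", "ham"),
]
model = train(training_set)

new_email = "claim your prize now"
print(rule_based_is_spam(new_email))      # the fixed rules have no matching keyword
print(learned_is_spam(new_email, model))  # 'prize' and 'now' were seen only in spam

# User feedback: an email moved from inbox to spam is added, then we retrain.
training_set.append(("cheap watches sale", "spam"))
model = train(training_set)
print(learned_is_spam("cheap watches today", model))
```

Note that adapting the rule-based filter to the new "cheap watches" spam would mean editing SPAM_KEYWORDS by hand, whereas the learned filter only needs the new labelled example and a retraining call.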
Example 2:

    if (speed < 4) {
        status = WALKING;
    }

    if (speed < 4) {
        status = WALKING;
    } else {
        status = RUNNING;
    }

    if (speed < 4) {
        status = WALKING;
    } else if (speed < 12) {
        status = RUNNING;
    } else {
        status = BIKING;
    }

Fig: A traditional approach to recognize a person's activity

Fig: A Machine Learning approach to recognize a person's activity (binary data samples labelled WALKING, RUNNING, and BIKING)

Example 3:

The following scenario can be considered the "hello world" of machine learning and demonstrates how a machine learning approach works. Take a look at the following numbers. There is a pattern that connects them, that is, a relationship between each X and Y pair. Can you see what the pattern is?

X = -1, 0, 1, 2, 3, 4
Y = -3, -1, 1, 3, 5, 7

The relationship is that every Y is two times X minus one, i.e. Y = 2X - 1. Just as humans deduce the relationship between X and Y by looking at the pairs, machines can also find the pattern and define the relationship between the input X and the output Y.

1.3 History and relationships to other fields

The term machine learning was coined in 1959 by Arthur Samuel, an IBM employee and pioneer in the field of computer gaming and artificial intelligence. The synonym "self-teaching computers" was also used in this period. Some 40 to 50 years ago, machine learning was science fiction, but today it is part of our daily life. Machine learning makes day-to-day life easier, from self-driving cars to Amazon's virtual assistant "Alexa". However, the idea behind machine learning is old and has a long history.
Below are some milestones in the history of machine learning:
* 1834: Charles Babbage, the father of the computer, conceived a device that could be programmed with punch cards. The machine was never built, but all modern computers rely on its logical structure.
* 1940s: ENIAC, the first electronic general-purpose computer, was built; it was completed in 1945.
* 1950: Alan Turing proposed the Turing test, asking "Can machines think?"
* 1952: Arthur Samuel, a pioneer of machine learning, created a program that enabled an IBM computer to play checkers. It performed better the more it played.
* 1957: The perceptron was developed by Frank Rosenblatt. Perceptrons can be combined to form a neural network.
* 1959: The term "Machine Learning" was first coined by Arthur Samuel.
* 1974 to 1980: This was a tough period for AI and ML researchers, known as the AI winter. Machine translation failed to deliver, interest in AI declined, and governments reduced their research funding.
* Early 1990s: Statistical learning theory emphasized learning from data instead of rule-based inference.

Machine learning research has now advanced greatly, and its results are present everywhere around us: self-driving cars, Amazon Alexa, chatbots, recommender systems, and many more. Modern machine learning models can be used for making various predictions, including weather prediction, disease prediction, stock market analysis, etc.

As an interdisciplinary field, machine learning shares common threads with the mathematical fields of statistics, information theory, game theory, and optimization.

* Artificial Intelligence

Machine learning is naturally a subfield of computer science, as our goal is to program machines so that they will learn.
In a sense, machine learning can be viewed as a branch of AI (Artificial Intelligence), since, after all, the ability to turn experience into expertise or to detect meaningful patterns in complex sensory data is a cornerstone of human (and animal) intelligence. However, one should note that, in contrast with traditional AI, machine learning is not trying to build an automated imitation of intelligent behaviour, but rather to use the strengths and special abilities of computers to complement human intelligence, often performing tasks that fall far beyond human capabilities. For example, the ability to scan and process huge databases allows machine learning programs to detect patterns that are outside the scope of human perception.

* Statistics

Machine learning and statistics are closely related fields in terms of methods, but distinct in their principal goal: statistics draws population inferences from a sample, while machine learning finds generalizable predictive patterns.

* Data mining

Data mining is a process of analysing data and deriving insights from a (large) dataset by applying business rules to it. Machine learning and data mining often employ the same methods and overlap significantly. However, machine learning focuses on prediction based on known properties learned from the training data, while data mining focuses on the discovery of (previously) unknown properties in the data (this is the analysis step of knowledge discovery in databases). Data mining uses many machine learning methods, but with different goals.

* Data science

Data science is all about turning data into products. It is analytics and machine learning put into action to draw inferences and insights out of data. Data science is a superset of machine learning, data mining, and related subjects. It covers the complete process, from data loading through to production.
1.4 Applications of machine learning

The following are some applications of machine learning:
* Image recognition
* Online fraud detection
* Email spam and malware filtering
* Traffic prediction
* Speech recognition
* Self-driving cars
* Product recommendations
* Customer sentiment analysis
* And so on

1.5 Types of machine learning techniques

Machine Learning systems can be classified according to the amount and type of supervision they get during training. There are four major categories: supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning.

1. Supervised learning

Supervised learning is the type of machine learning in which machines are trained using well "labelled" training data, and on the basis of that data, machines predict the output. Labelled data means that some input data is already tagged with the correct output.

In supervised learning, the training data provided to the machines works as the supervisor that teaches the machines to predict the output correctly. It applies the same concept as a student learning under the supervision of a teacher.

Supervised learning is a process of providing input data as well as correct output data to the machine learning model. The aim of a supervised learning algorithm is to find a mapping function that maps the input variable(s) (x) to the output variable (y).

The model is trained on a labelled dataset. A labelled dataset has both input and output parameters.

Fig A, Fig B: Examples of labelled datasets

Both of the above figures show a labelled dataset.

Figure A: It is a dataset of a shopping store which is useful in predicting whether a customer will purchase a particular product based on gender, age, and salary.
Input: Gender, Age, Salary
Output: Purchased, i.e. 0 (meaning "No", the customer won't purchase the product) or 1 (meaning "Yes", the customer will purchase the product)

Figure B: It is a meteorological dataset used to predict wind speed based on different parameters.
Input: Temperature, Pressure, Relative humidity, and Wind direction
Output: Wind speed

Supervised learning can be further divided into two types of problems:

A. Classification

It is a supervised learning task where the output (dependent variable) has defined labels (discrete values). Classification algorithms are used when the output variable is categorical, which means there are two or more classes, such as Yes-No, Male-Female, True-False, etc. For example, in Figure A above, the output (Purchased) has defined labels (0 and 1): 1 means "Yes", the customer will purchase the product, and 0 means "No", the customer won't purchase it. A classification task can be binary or multiclass.

Fig: Classification (training set and a new instance)

B. Regression

It is a supervised learning task where the output (dependent variable) has continuous values. A regression problem is one where the output variable is a real value, such as dollars, weight, or price.

Regression algorithms are used if there is a relationship between the input variable and the output variable. The goal is to predict a value as close as possible to the actual output value. For example, in Figure B above, the output (wind speed) has continuous values in a particular range.

Fig: Regression (value plotted against Feature 1, with a new instance)

Here are some of the most important supervised learning algorithms:
* k-Nearest Neighbors
* Linear Regression
* Logistic Regression
* Support Vector Machines (SVMs)
* Decision Trees and Random Forests
* Neural networks

2.
Unsupervised learning

Unsupervised learning is a machine learning technique in which models are not supervised using a labelled training dataset. Instead, the models themselves find the hidden patterns and insights in the given data. The training data is unlabelled, and the system tries to learn without a teacher.

Unsupervised learning cannot be directly applied to a regression or classification problem because, unlike supervised learning, we have the input data but no corresponding output data. The goal of unsupervised learning is to find the underlying structure of a dataset, group the data according to similarities, and represent the dataset in a compressed format.

Fig: An unlabelled training set for unsupervised learning

Example 1: Suppose the unsupervised learning algorithm is given an input dataset containing images of different types of cats and dogs. The algorithm is never trained on labelled examples from the dataset, so it has no prior idea about the features of the dataset. The task of the unsupervised learning algorithm is to identify the image features on its own. It performs this task by clustering the image dataset into groups according to the similarities between images.

Example 2: Say you have a lot of data about your blog's visitors. You may want to run a clustering algorithm to try to detect groups of similar visitors (see the figure below). At no point do you tell the algorithm which group a visitor belongs to: it finds those connections without your help. For example, it might notice that 40% of your visitors are males who love comic books and generally read your blog in the evening, while 20% are young sci-fi lovers who visit during the weekends, and so on. If you use a hierarchical clustering algorithm, it may also subdivide each group into smaller groups. This may help you target your posts for each group.
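To make the blog-visitor example more concrete, here is a minimal k-means sketch in plain Python. Everything specific in it is an assumption for illustration: the visitors are described by two made-up features (age and usual visiting hour), the number of groups k is fixed at 2, and the centroids are simply initialized with the first k points to keep the run deterministic.

```python
import math


def kmeans(points, k, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then re-average."""
    # Naive initialization with the first k points; an assumption made to keep the
    # sketch deterministic (real implementations usually pick random starts).
    centroids = [tuple(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, assignments


# Hypothetical visitors described by (age, usual visiting hour); no labels given.
visitors = [(18, 21), (45, 9), (20, 22), (47, 8), (19, 23), (50, 10)]
centroids, groups = kmeans(visitors, k=2)
print(groups)     # group index found for each visitor
print(centroids)  # the "typical" visitor of each group
```

The algorithm is never told which visitor belongs to which group; the grouping emerges only from the distances between the points.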
Fig: Clustering (axes: Feature 1 and Feature 2)

Here are some of the most important unsupervised learning algorithms:
* K-means clustering
* KNN (k-nearest neighbors), more commonly used as a supervised method
* Hierarchical clustering
* Anomaly detection
* Neural Networks
* Principal Component Analysis
* Independent Component Analysis
* Apriori algorithm
* Singular value decomposition

3. Semi-supervised learning

Some algorithms can deal with partially labelled training data, usually a lot of unlabelled data and a little bit of labelled data. Some photo-hosting services, such as Google Photos, are good examples of this. Once you upload all your family photos to the service, it automatically recognizes that the same person A shows up in photos 1, 5, and 11, while another person B shows up in photos 2, 5, and 7. This is the unsupervised part of the algorithm (clustering). Once you supply a label for each person, the service can name that person in every photo, which is the supervised part.

Fig: Semi-supervised learning

4. Reinforcement Learning

The learning system, called an agent in this context, can observe the environment, select and perform actions, and get rewards in return (or penalties in the form of negative rewards, as in the figure below). It must then learn by itself what is the best strategy, called a policy, to get the most reward over time.

Fig: Reinforcement Learning (select an action, get a reward or penalty, update the policy in the learning step)

1.6 Overview of Data mining and KDD process

Data mining is one of the most useful techniques that help entrepreneurs, researchers, and individuals extract valuable information from huge sets of data. Data mining is also called Knowledge Discovery in Databases (KDD).

The process of extracting information from huge sets of data to identify patterns, trends, and useful data that allow a business to make data-driven decisions is called data mining.
What is KDD?

KDD stands for Knowledge Discovery in Databases and is defined as a method of finding, transforming, and refining meaningful data and patterns from a raw database in order to be utilised in different domains or applications. KDD in data mining is an iterative process that analyses patterns based on three factors:
* Importance
* Usability
* Understandability

Steps Involved in KDD

The KDD process involves 9 steps, and their sequence is important for obtaining the expected results. In some cases, it may be necessary to return to an earlier step after identifying an opportunity for improvement in the processing of the data. The essential steps of KDD (Knowledge Discovery in Databases) are:

A. Understanding the Data Set

Not everything is mathematics and statistics; it is just as important to understand the problems we are going to face and to have the context needed to propose viable and real solutions. It is important to know the properties, limitations, and rules of the data or information under study, and to define the goals to be achieved.

This is the first step in the process and requires prior understanding and knowledge of the field of application. This is where we decide how the transformed data and the patterns arrived at by data mining will be used to extract knowledge. This premise is extremely important: if set wrong, it can lead to false interpretations and negative impacts on the end use.

B. Data Selection

After setting the goals and objectives, the collected data needs to be selected and integrated into a single data set that can help to reach the objectives of the analysis. Often this information can be found in a single source, but it can also be distributed across several. These choices are critical for data mining because they form its base and affect what kinds of data models are formed.

C.
Cleaning and Pre-processing

At this stage, the reliability of the information is determined, that is, tasks are carried out that guarantee the usefulness of the data. This step involves searching for missing data and removing noisy, redundant, and low-quality data from the data set in order to improve the reliability of the data and its effectiveness. Certain algorithms are used for searching for and eliminating unwanted data based on attributes specific to the application.

D. Data Transformation

This step prepares the data to be fed to the data mining algorithms. Hence, the data needs to be in consolidated and aggregated forms. The data is consolidated based on functions, attributes, features, etc.

At this stage, the quality of the data is improved with transformations that involve either dimensionality reduction (reducing the number of variables in the data set) or conversions such as turning numeric values into categorical ones (discretization).

E. Select the Appropriate Data Mining Task

In this phase, the right data mining task is chosen, be it classification, regression, or grouping, according to the objectives that have been set for the process.

F. Choice of Data Mining Algorithms

Subsequently, we select the technique or algorithm, or both, to search for the patterns and obtain knowledge. Meta-learning focuses on explaining why an algorithm works better for certain problems, and for each technique there are different ways to select it. Each algorithm has its own essence, its own way of working and obtaining results, so it is advisable to know the properties of the candidate algorithms and see which one best fits the data.

G. Application of Data Mining Algorithms

Finally, once the techniques have been selected, the next step is to apply them to the data already selected, cleaned, and processed.
The algorithms may need to be executed several times while adjusting the parameters to optimize the results. These parameters vary according to the selected method.

H. Evaluation

Once the algorithms have been applied to the data set, we evaluate the patterns that were generated and the performance that was obtained, to verify that it meets the goals set in the first phases. To carry out this evaluation there is a technique called cross-validation, which partitions the data into a training part (used to create the model) and a test part (used to check that the algorithm really works and does its job well).

I. Interpretation

If all the steps are followed correctly and the results of the evaluation are satisfactory, the last stage is simply to apply the knowledge found to the context and begin to solve its problems. If, on the other hand, the results are not satisfactory, it is necessary to return to the previous stages and make adjustments, reviewing everything from the selection of the data to the evaluation stage.

Results must be presented in an understandable format. For this reason, data visualization techniques are important for the results to be useful, since mathematical models or descriptions in text format can be difficult for end users to interpret.

Note: KDD is an iterative process in which evaluation measures can be enhanced, mining can be refined, and new data can be integrated and transformed in order to get different and more appropriate results.

1.7 Prediction vs. Description modelling

Predictive modelling

A predictive model identifies patterns found in past and transactional data to find risks and future outcomes. It helps an organization to know what might happen next; it predicts the future based on the present data available. It analyses the data and provides statements about events that have not happened yet.
It can make all kinds of predictions that you want to know, and all predictions are probabilistic in nature.

Descriptive modelling

A descriptive model exploits the past data stored in databases and provides you with an accurate report. It helps an organization to know what has happened in the past by giving analytics over the stored data. For a company, it is necessary to know the past events that help it make decisions based on statistics from historical data. For example, you might want to know how much money you lost due to fraud.

| Basis for Comparison | Descriptive Analytics | Predictive Analytics |
|---|---|---|
| Describes | What happened in the past? By using the stored data. | What might happen in the future? By using the past data and analyzing it. |
| Process Involved | Involves data aggregation and data mining. | Involves statistics and forecasting techniques. |
| Definition | The process of finding useful and important information by analyzing huge data. | The process of forecasting the future of the company, which is very useful. |
| Data Volume | Involves processing huge data stored in data warehouses. Limited to past data. | Involves analyzing large past data and then predicting the future using advanced techniques. |
| Examples | Sales reports, revenue of a company, performance analysis, etc. | Sentiment analysis, credit score analysis, forecast reports for a company, etc. |
| Accuracy | Provides accurate data in the reports using past data. | Results are not exact: it will not tell you exactly what will happen, but it will tell you what might happen in the future. |
| Approach | Allows a reactive approach. | Allows a proactive approach. |
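As a minimal sketch of the difference between the two kinds of modelling, the following Python snippet first summarizes past data (descriptive) and then fits a straight line to the same data to estimate a future value (predictive). The monthly sales figures, the function names, and the choice of a simple least-squares line are all assumptions made for illustration; real forecasting would normally use richer models and report uncertainty.

```python
from statistics import mean

# Hypothetical past data: month index and units sold in that month.
months = [1, 2, 3, 4, 5, 6]
sales = [100, 110, 120, 130, 140, 150]


def describe(values):
    """Descriptive: summarize what has already happened."""
    return {"total": sum(values), "average": mean(values), "best": max(values)}


def fit_line(xs, ys):
    """Predictive: least-squares slope and intercept of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(x, slope, intercept):
    """Estimate the value at x; an estimate of what might happen, not a fact."""
    return slope * x + intercept


summary = describe(sales)                      # reactive view of the past
slope, intercept = fit_line(months, sales)
forecast = predict(7, slope, intercept)        # proactive view of month 7
print(summary)
print(forecast)
```

The summary is exact because it only reports stored data, while the forecast depends entirely on the assumption that the past linear trend continues.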

