(Good) Ian H. Witten, Eibe Frank, Mark A. Hall Data Mining - Practical Machine Learning Tools and Techniques, Third Edition (The Morgan Kaufmann Series in Data Management Systems) 2011
(Computing 14) A. Aguilera, D. Ayala (Auth.), Professor Dr. Guido Brunnett, Dr. Hanspeter Bieri, Professor Dr. Gerald Farin (Eds.) - Geometric Modelling-Springer-Verlag Wien (2001)
Analytics

CONTENT
• Supervised Learning
• Unsupervised Learning
• Reinforcement Learning
• Steps In Building Machine Learning Models
• Differences Between Machine Learning And Marketing Research Models

Section I: Introduction to Machine Learning

Supervised Learning
Learning from paired input-output data, where each input comes with a known output, is called supervised learning. Supervised learning methods are the most convenient of the machine learning methods.
A few popular supervised learning algorithms are:
• Logistic regression is a fast, reliable algorithm for predicting a binary response such as Yes or No.
• The Gradient Boosted Decision Trees (GBDT) algorithm is used for structured data, since it provides high accuracy with relatively little training time. For example, Facebook uses the GBDT algorithm.
• Deep learning algorithms excel at predicting labels for unstructured data such as images and text.

Unsupervised Learning
Unsupervised learning comprises methods for settings where input data is abundant but no desired output is available. Research in unsupervised learning is only beginning to gather steam. Two popular methods of unsupervised learning are:
• Principal Components Analysis (PCA) is a technique for constructing a handful of custom features from a large number of inputs. PCA is often used as an intermediate step in supervised learning: it reduces the number of input features, and the reduced features are then fed as inputs to another algorithm.
• Clustering methods are often used to group and segment customer data.

Reinforcement Learning
Reinforcement learning methods rely on input data that has no clear outputs but comes with a signal that grades the model output. The algorithm has to sift through several possible outputs, so these methods are computationally intensive.
In this setting it is difficult to build algorithms that can learn the signal in noisy environments. A common application of reinforcement learning is ad content optimization.

Section II: Building Machine Learning Models

STEPS IN BUILDING MACHINE LEARNING MODELS
Our objective in developing a machine learning model is to use a set of inputs to predict an observed response. We start with input data that contains previously observed values, and we wish to figure out a rule that governs the relationship between the inputs and the output. We want this rule to hold on data that the model has never seen before. Once we are sure that the model works on unseen data, we then need to deploy it. The development of a machine learning model follows a sequence of steps.

STEPS INVOLVED IN MODEL DEVELOPMENT ARE:
Step 1: Metric Choice and Data Splitting
We choose a metric by which we will judge the performance of the model. For instance:
o For numeric outcomes, it is standard practice to choose the Root Mean Squared Error (RMSE) as the metric: the square root of the mean of the squared differences between the model's predicted values and the actually observed values.

We then split our data into three parts: training, validation and test.
• The training data is used to learn the relationship between the input and the output.
• The validation data is used to get a sense of how well the model performs, so that the training process can be improved.
• The test set is used as the final checkpoint to select the desired model.

For instance:
o To split the data, we borrow a ratio commonly used in practice for the train-validation-test split, 70-10-20. We randomly sample 70% of our data and assign it to be the training data. From the remainder we then randomly sample 10% of the full data set (one third of what is left) and assign it to be the validation data. The remaining 20% is our test data.

In short, in the development of a machine learning model we divide the data into training, validation and test sets. Training and validation are used to choose the best settings for a specific model architecture, while the test set is used to decide among different model architectures.

Step 2: Pre-Processing
Once the data is split into training, validation and test, we verify the sanity of the input data by checking for missing values, the numerical range of the predictors, and possible relationships between inputs. If missing values are detected, we either discard the affected samples or impute the missing values using a suitable method. Next we check the numerical range of the predictors by looking at their means and quantiles. Keeping the predictors on comparable ranges not only helps the optimization converge faster during model training but also aids interpretation. Finally, we check for possible correlation between the inputs.
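The metric, the 70-10-20 split and the sanity checks described in Steps 1 and 2 can be sketched as follows. This is a minimal illustration, not a prescribed implementation: the helper names (`rmse`, `split_70_10_20`, `sanity_report`) and the synthetic data are assumptions made for the example, and only NumPy is used.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def split_70_10_20(n_rows, seed=0):
    """Return shuffled row indices for a 70-10-20 train/validation/test split."""
    order = np.random.default_rng(seed).permutation(n_rows)
    n_train = n_rows * 7 // 10   # 70% of all rows
    n_val = n_rows // 10         # 10% of all rows (not 10% of the remainder)
    return order[:n_train], order[n_train:n_train + n_val], order[n_train + n_val:]

def sanity_report(X):
    """Summarise missing values, numerical range and pairwise correlation."""
    X = np.asarray(X, dtype=float)
    complete = X[~np.isnan(X).any(axis=1)]  # rows without missing values
    return {
        "missing_per_column": np.isnan(X).sum(axis=0),
        "mean": np.nanmean(X, axis=0),
        "quantiles": np.nanquantile(X, [0.25, 0.5, 0.75], axis=0),
        "correlation": np.corrcoef(complete, rowvar=False),
    }

# Hypothetical toy data: 1000 rows, 3 predictors.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 3))
X[:, 2] = X[:, 0] + 0.01 * rng.normal(size=1000)  # third column nearly duplicates the first
X[5, 1] = np.nan                                   # one missing value

train_idx, val_idx, test_idx = split_70_10_20(len(X))
report = sanity_report(X[train_idx])

print("split sizes:", len(train_idx), len(val_idx), len(test_idx))
print("missing per column (training data):", report["missing_per_column"])
print("correlation matrix (training data):")
print(np.round(report["correlation"], 3))
```

In this toy data the first and third predictors are almost perfectly correlated, which is the kind of relationship the correlation check is meant to surface before training.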
A set of highly correlated features affects the performance of the model.

Step 3: Model Training and Hyperparameter Tuning
A key step during the model training process is tuning the hyperparameters of the chosen model. Once the model choice goes beyond simple linear models, we have to make a set of choices about model settings that are specific to the data set. These choices are fixed before learning from the training data begins, and they are called hyperparameters. An important thing to note is that hyperparameters cannot be inferred directly from the training data alone. What is learned from the training data are the input-specific parameters, such as regression coefficients. An easy analogy makes the distinction clear:

While driving a car, the choice of which gear to drive in is a hyperparameter choice, while the decisions made by the car, such as how much fuel to inject given the chosen gear, are input-specific parameters. The car cannot decide on its own which gear to drive in, but once the gear is set it can adapt its internal machinery to this gear.

During the model training process, we wiggle the input-specific parameters until we reach the lowest level of RMSE. The leap we make here is that, rather than making a conscious decision on a gear based on our judgement, we allow the data to dictate the choice of gear as well. So we choose a gear, allow the model to fit to the training data, and measure the RMSE. We iterate over the four available gears and choose as the best gear the one that has the least RMSE.

During the model training process, we thus have two broad sets of knobs. One set of knobs (hyperparameters) controls how complex we wish the employed algorithm to be (a sort of top-level control).
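The gear-selection loop from the car analogy can be sketched in a few lines. The example rests on several assumptions that are not in the slides: the "gear" is taken to be the degree of a polynomial (four candidate values), the data are synthetic, and the RMSE of each gear is measured on the validation split from Step 1 while the coefficients are fitted on the training split.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

# Hypothetical toy data: a quadratic relationship plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=500)
y = 1.0 + 2.0 * x - 0.5 * x**2 + rng.normal(scale=0.3, size=500)

# 70-10-20 split of the row indices; the test rows stay untouched here.
order = rng.permutation(len(x))
train, val, test = order[:350], order[350:400], order[400:]

gears = [1, 2, 3, 4]  # candidate hyperparameter values (polynomial degree, an assumption)
val_rmse = {}
for degree in gears:
    # Input-specific parameters (the coefficients) are learned from the training data.
    coeffs = np.polyfit(x[train], y[train], deg=degree)
    # The hyperparameter choice is judged on data not used for fitting.
    val_rmse[degree] = rmse(y[val], np.polyval(coeffs, x[val]))

best_gear = min(val_rmse, key=val_rmse.get)
print("validation RMSE per gear:", val_rmse)
print("chosen gear:", best_gear)
```

The coefficients returned by `np.polyfit` play the role of the car's internal adjustments, while the loop over `gears` is the outer choice that the training fit cannot make on its own.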
The second set of knobs (parameters) controls the input-specific parameters of the model at the chosen level of complexity.

To choose between several candidate models that have been fit to the data, we use the test set to select a final model among the competing models. Returning to the car example, the choice of gear is made using the validation set, and the car then tunes its parameters for that gear based on the training data. We also have to decide which car to drive, and that decision is made based on the test set.

While hyperparameter tuning forms the bulk of the model training process, we first have to choose which specific models we wish to fit to the data. We then iterate over the several model choices, fit each to the training data, and choose the model that has the lowest RMSE on the test data.

Step 4: Model Deployment
While much attention and research is focused on developing new model architectures, deploying trained models in decision-making scenarios is also a difficult process. By the time the model processes new input and produces output, there may be structural differences between the input data it was trained on and the live data, which can derail the model's predictions. The deployment process therefore goes through a beta testing phase before a model is put in production. In the beta testing phase, live data is fed to the model, but the predictions are logged and analyzed.

Section III: Machine Learning vs Marketing Research

DIFFERENCES BETWEEN MACHINE LEARNING AND MARKETING RESEARCH MODELS
As already discussed, machine learning models are tuned and trained to minimize predictive error, with far less focus on the explanatory power of the model. This is markedly different from the typical marketing perspective on modeling. In marketing research, the analyst starts with a set of hypotheses, which are then tested by collecting relevant data and measuring the variables required to test them. In machine learning, by contrast, the analyst starts by collecting as much data as possible and chooses the model, however complex, that best fits this data.

The extent to which the model fits the data is important in both machine learning and marketing research. Assessing the impact of specific changes in a feature is a core concern in marketing research, while prediction is a core concern in machine learning. That is the reason linear models are dominant in marketing research. The focus on prediction is the main reason why machine learning models tend to become ‘black boxes’.

In the traditional marketing research process, the statistical models usually employed are designed with small samples in mind. Marketing research methods hinge on a large body of research in statistics, whereas machine learning models tend to be data hungry. Collecting and optimally using large data sets is a feature of machine learning. In machine learning, the focus is much less on ensuring the generalizability of findings. The two approaches thus differ widely in how they approach modeling, but in practice it is often worthwhile to explore both options for a given data set.

Even when data sets become too large and pose difficulties for small-sample methods, we can fit linear models on appropriately drawn random samples from the data. This can lead to insights that can then be incorporated into the training of machine learning models.

Over the last decade, there has been a veritable explosion of data available to the marketer and an associated increase in the methods available to analyze this data.
Most of these methods have their roots in the machine learning approach and are optimized for prediction on large amounts of data. While machine learning methods deviate in scale and intent from the methods traditionally used by the marketer, they provide a rich array of potent tools for analyzing large data sets.

THANK YOU