Introduction to Data Mining
Data mining is the process of extracting useful information from large sets of
data. It involves using various techniques from statistics, machine learning, and
database systems to identify patterns, relationships, and trends in the data. This
information can then be used to make data-driven decisions, solve business
problems, and uncover hidden insights.
Applications of data mining include customer profiling and segmentation,
market basket analysis, anomaly detection, and predictive modeling. Data
mining tools and technologies are widely used in various industries, including
finance, healthcare, retail, and telecommunications.
Data mining is the process that helps in extracting information from a given data
set to identify trends, patterns, and useful data. The objective of using data
mining is to make data-supported decisions from enormous data sets.
Data mining serves a unique purpose, which is to recognize patterns in
datasets for a set of problems that belong to a specific domain.
Data mining is the process of searching large sets of data to look for
patterns and trends that can't be found using simple analysis techniques.
It makes use of complex mathematical algorithms to study data and then
evaluate the possibility of events happening in the future based on the findings.
It is also referred to as knowledge discovery of data or KDD.
In general terms, “Mining” is the process of extraction of some valuable
material from the earth e.g. coal mining, diamond mining, etc. In the context of
computer science, “Data Mining” can be referred to as knowledge mining
from data, knowledge extraction, data/pattern analysis, data archaeology,
and data dredging.
It is basically the process carried out for the extraction of useful information
from a bulk of data or data warehouses.
The primary goal of data mining is to uncover hidden patterns and relationships
within the data that can be used to make informed business decisions, predict
future trends, and improve organizational strategies.
Benefits of Data Mining
1. Improved decision-making: Data mining can provide valuable insights that
can help organizations make better decisions by identifying patterns and
trends in large data sets.
2. Increased efficiency: Data mining can automate repetitive and time-
consuming tasks, such as data cleaning and preparation, which can help
organizations save time and resources.
3. Enhanced competitiveness: Data mining can help organizations gain a
competitive edge by uncovering new business opportunities and identifying
areas for improvement.
4. Improved customer service: Data mining can help organizations better
understand their customers and tailor their products and services to meet
their needs.
5. Fraud detection: Data mining can be used to identify fraudulent activities by
detecting unusual patterns and anomalies in data
6. Predictive modeling: Data mining can be used to build models that can
predict future events and trends, which can be used to make proactive
decisions.
7. New product development: Data mining can be used to identify new
product opportunities by analyzing customer purchase patterns and
preferences.
8. Risk management: Data mining can be used to identify potential risks by
analyzing data on customer behavior, market conditions, and other factors.
Data Mining Techniques
1. Classification: It involves categorizing data into predefined classes or
categories based on attributes or features. Classification algorithms are used
for tasks like spam email detection or credit risk assessment.
2. Clustering: Clustering aims to group similar data points together based on
certain characteristics or features without predefined classes. It helps in
understanding the inherent structure within the data.
3. Regression Analysis: This technique is used to predict numerical values
based on relationships between variables. It helps in forecasting and
understanding the correlation between different factors.
4. Anomaly Detection: Anomaly detection identifies data points that deviate
significantly from the normal behavior or patterns in a dataset. It's crucial for
fraud detection, network security, and fault detection in systems.
5. Predictive Analytics: It involves using historical data to predict future
outcomes. Machine learning models are often used in predictive analytics for
forecasting and decision-making.
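As a small illustration of the classification technique above, here is a minimal 1-nearest-neighbour classifier in plain Python. The features (e.g. word count and link count of an email) and the labels are invented for illustration, not taken from any real dataset.

```python
# Minimal sketch of classification with a 1-nearest-neighbour rule.
# The (features, label) pairs are invented for illustration.

def classify_1nn(train, point):
    """Assign `point` the label of its closest training example."""
    def dist2(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    nearest = min(train, key=lambda ex: dist2(ex[0], point))
    return nearest[1]

# (features, label): e.g. (word_count, link_count) -> "spam"/"ham"
train = [((1, 0), "ham"), ((2, 1), "ham"), ((8, 5), "spam"), ((9, 6), "spam")]

print(classify_1nn(train, (8, 4)))  # -> spam
print(classify_1nn(train, (2, 0)))  # -> ham
```

Real systems would use a tuned algorithm from a library rather than this sketch, but the principle is the same: new items are assigned to predefined classes based on their similarity to labelled training data.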
or
1. Association
It is one of the most widely used data mining techniques. In this technique, a
transaction and the relationships between its items are used to identify a
pattern, which is why it is also referred to as a relation technique. It is used to
conduct market basket analysis, which finds the products that customers
regularly buy together.
This technique is very helpful for retailers, who can use it to study the buying
habits of different customers. Retailers can study past sales data and look for
products that customers buy together, then place those products in close
proximity in their retail stores to save customers time and to increase sales.
This data mining technique adopts a two-step process:
+ Find all frequently occurring itemsets.
+ Generate strong association rules from the frequent itemsets.
Three types of association rules are:
+ Multilevel Association Rule
+ Quantitative Association Rule
+ Multidimensional Association Rule
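The two-step process above can be sketched in plain Python: first count frequently co-occurring item pairs, then derive rules whose confidence passes a threshold. The transactions and thresholds below are invented for illustration.

```python
# Minimal sketch of association mining: (1) find frequent item pairs,
# (2) derive association rules with high confidence. Data invented.
from itertools import combinations
from collections import Counter

transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "butter"},
]

min_support, min_conf = 0.5, 0.6
n = len(transactions)

# Step 1: count single items and co-occurring pairs.
item_count = Counter(i for t in transactions for i in t)
pair_count = Counter(frozenset(p) for t in transactions
                     for p in combinations(sorted(t), 2))
frequent_pairs = {p: c for p, c in pair_count.items() if c / n >= min_support}

# Step 2: rules A -> B with confidence = support(A, B) / support(A).
for pair, c in frequent_pairs.items():
    for a in pair:
        (b,) = pair - {a}
        conf = c / item_count[a]
        if conf >= min_conf:
            print(f"{a} -> {b} (support={c / n:.2f}, confidence={conf:.2f})")
```

A production Apriori implementation would also prune candidate itemsets of size three and larger; this sketch stops at pairs to keep the two-step idea visible.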
2. Clustering
This creates meaningful clusters of objects that share the same characteristics.
Clustering analysis identifies data that are similar to each other and clarifies
the similarities and differences between the data. It is also known as
segmentation and provides an understanding of the events taking place in the
database.
Different types of clustering methods are:
+ Density-Based Methods
+ Model-Based Methods
+ Partitioning Methods
+ Hierarchical Agglomerative methods
+ Grid-Based Methods
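Of the methods listed above, partitioning methods are the simplest to sketch. Below is a minimal k-means loop on one-dimensional points; the data and the starting centres are invented for illustration.

```python
# Minimal sketch of a partitioning clustering method (k-means) on 1-D
# points. The data and the initial centres are invented.

def kmeans_1d(points, centers, iters=10):
    clusters = [[] for _ in centers]
    for _ in range(iters):
        # Assignment step: each point joins its nearest centre.
        clusters = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[idx].append(p)
        # Update step: each centre moves to the mean of its cluster.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

centers, clusters = kmeans_1d([1.0, 1.2, 0.8, 9.0, 9.5, 8.5], [0.0, 10.0])
print(centers)  # two centres, converging near 1.0 and 9.0
```

The same assign-then-update structure underlies most partitioning methods; density-based and hierarchical methods replace the "nearest centre" rule with neighbourhood density or pairwise merging.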
3. Classification
This technique finds its origins in machine learning. It classifies items or
variables in a data set into predefined groups or classes. It uses linear
programming, statistics, decision trees, and artificial neural networks,
amongst other techniques. Classification is used to develop software that can
be modelled to classify items in a data set into different classes.
4. Prediction
This technique predicts the relationship that exists between independent and
dependent variables as well as independent variables alone. It can be used to
predict future profit depending on the sale. Let us assume that profit and sale
are dependent and independent variables, respectively. Now, based on what
the past sales data says, we can make a profit prediction of the future using a
regression curve.
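The profit-from-sales example above can be sketched with ordinary least squares, fitting a line to past (sales, profit) pairs and reading a prediction off it. The sales and profit figures are invented for illustration.

```python
# Minimal sketch of prediction via a regression line fitted with
# ordinary least squares. The sales/profit figures are invented.

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return slope, intercept

sales  = [100, 200, 300, 400]   # independent variable
profit = [12,  22,  32,  42]    # dependent variable

slope, intercept = fit_line(sales, profit)
predicted = slope * 500 + intercept   # forecast profit at sales = 500
print(round(predicted, 1))  # -> 52.0
```

Real forecasting would validate the fit on held-out data and consider non-linear curves; this sketch only shows how past sales data yields a profit prediction.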
5. Sequential patterns
This technique aims to use transaction data to identify similar trends,
patterns, and events over a period of time. Historical sales data can be used
to discover items that buyers bought together at different times of the year.
Businesses can make use of this information by recommending those products
to customers at times when the historical data doesn't suggest they would
buy them, using lucrative deals and discounts to push the recommendation.
6. Statistical Techniques
Statistics is a branch of mathematics concerned with the collection and
description of data. Many analysts don't consider it a data mining
technique. However, it helps to identify patterns and develop predictive
models. Therefore, data analysts must have some knowledge of various
statistical techniques.
7. Induction Decision Tree Technique
Decision tree induction is a popular data mining technique used for
classification and prediction tasks. It involves creating a tree-like model of
decisions by splitting data into smaller subsets based on the values of input
attributes.
These trees are valuable in various fields, including finance (for credit scoring),
healthcare (for diagnosis), and marketing (for customer segmentation), among
others, due to their ability to handle both categorical and numerical data while
providing understandable decision-making pathways.
8. Visualization
In data mining, visualization techniques play a crucial role in understanding and
presenting complex patterns, trends, and relationships within datasets. Some
common visualization techniques used in data mining include:
* Scatter Plots: These display individual data points as dots on a graph.
* Tree Maps: These hierarchical visualizations represent data in a nested,
tree-like structure.
* Network Diagrams: Also known as graphs, these visualizations illustrate
relationships between entities.
* 3D Visualization: 3D visualization helps in visualizing multidimensional
data.
* Line Charts: Displaying data points connected by lines, useful for
representing trends or changes over time in a dataset.
* Box Plots: Illustrating the distribution of numerical data through quartiles,
highlighting outliers and providing insights into the spread and skewness
of the dataset.
* Parallel Coordinates: Displaying multidimensional data by using multiple
axes aligned in parallel, aiding in visualizing relationships between
multiple variables simultaneously.
These visualization techniques help analysts and data scientists to explore,
interpret, and communicate insights derived from large and complex datasets in
a more accessible and understandable manner. They facilitate pattern
recognition, anomaly detection, and decision-making processes by providing
visual cues and intuitive representations of data relationships.
Data Mining Process
Here's an overview of the process:
1. Understanding the Problem
Define the problem and objectives.
Determine the data mining goals and criteria for success.
2. Data Collection
Gather relevant data from various sources (databases, text files, websites, etc.)
Ensure data quality by cleaning and preprocessing (handling missing values,
normalization, etc.).
3. Data Exploration
Explore the dataset to understand its characteristics.
Use statistical methods and visualizations to identify patterns, correlations, and
outliers.
4. Preprocessing
Transform the data into a suitable format for analysis.
Select features that are relevant to the analysis.
Reduce dimensionality if needed (e.g., using techniques like PCA).
5. Model Building
Choose appropriate data mining techniques (classification, clustering, regression,
etc.).
Apply algorithms to the prepared data to build models.
Evaluate and validate the models using metrics like accuracy, precision, and recall.
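The evaluation metrics named in step 5 can be computed directly from predicted and actual labels; the labels below are invented for illustration.

```python
# Minimal sketch of model evaluation: accuracy, precision, and recall
# computed from invented actual/predicted labels.

def evaluate(actual, predicted, positive="spam"):
    tp = sum(1 for a, p in zip(actual, predicted) if a == p == positive)
    fp = sum(1 for a, p in zip(actual, predicted) if a != positive and p == positive)
    fn = sum(1 for a, p in zip(actual, predicted) if a == positive and p != positive)
    accuracy  = sum(1 for a, p in zip(actual, predicted) if a == p) / len(actual)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall    = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

actual    = ["spam", "spam", "ham", "ham", "spam"]
predicted = ["spam", "ham",  "ham", "spam", "spam"]
print(tuple(round(m, 3) for m in evaluate(actual, predicted)))  # -> (0.6, 0.667, 0.667)
```

Precision measures how many predicted positives were correct, recall how many actual positives were found; the two are often in tension, which is why both are reported alongside accuracy.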
6. Interpretation and Evaluation
Interpret the results obtained from models.
Evaluate the effectiveness of the models against the initial goals.
Iteratively refine the models if needed.
7. Deployment
Implement the insights gained from data mining into practical applications.
Monitor the performance of deployed models and systems.
8. Maintenance and Updates
Continuously update models to adapt to changing data patterns and improve
performance.
Maintain data quality and ensure relevance to the problem.
Data Mining Tools
Here are a few data mining methodologies and tools that are currently being
used in the industry:
1. RapidMiner:
RapidMiner is an open-source data science platform that is available at no
cost and includes several algorithms for tasks such as data preprocessing,
ML/DL, text mining, and predictive analytics. For use cases like fraud
detection and customer attrition, RapidMiner's easy GUI (graphical user
interface) and pre-built models make it easy for non-programmers to
construct predictive processes. Meanwhile, RapidMiner's R and Python
add-ons allow developers to fine-tune data mining to their specific needs.
2. Oracle Data Mining:
Predictive models may be developed and implemented with the help of
Oracle Data Mining, which is a part of Oracle Advanced Analytics. Models
built using Oracle Data Mining may be used to anticipate customer
behaviour, divide customer profiles into subsets, spot fraud, and zero in
on the best leads. These models are available as a Java API for
integration into business intelligence tools, where they might aid in the
identification of previously unnoticed patterns and trends.
3. Apache Mahout:
It is a free and open-source machine-learning framework. Its purpose is to
facilitate the use of custom algorithms by data scientists and researchers.
This framework is built on top of Apache Hadoop and is written primarily in
Java and Scala. Its primary functions are in the fields of clustering and
classification. Large-scale, sophisticated data mining projects that deal with
plenty of information work well with Apache Mahout.
4. KNIME:
KNIME (Konstanz Information Miner) is an open-source data analysis
platform that allows you to quickly develop, deploy, and scale. This tool
makes predictive intelligence accessible to beginners. It simplifies the
process through its GUI tool, which includes a step-by-step guide. The
product is endorsed as an ‘End to End Data Science’ product.
5. ORANGE:
ORANGE is a machine learning and data science tool. It uses visual
programming and Python scripting, and features interactive data analysis and
component-based assembly of data mining workflows. ORANGE is one of the
more versatile data mining tools because it provides a wider range of features
than many other Python-focused machine learning and data mining tools.
Moreover, it presents a visual programming platform with a GUI tool for
engaging data visualization.
Or
Data mining tools are software applications or platforms used to extract
valuable insights and patterns from large datasets. These tools employ
various techniques and algorithms to analyze data and uncover hidden
patterns, correlations, trends, and relationships. Some popular data mining
tools include:
RapidMiner: An open-source data science platform offering a wide range
of tools for data preparation, machine learning, predictive analysis, and text
mining.
Weka: Another open-source software offering a collection of machine
learning algorithms for data mining tasks. It provides a graphical user
interface for easy experimentation.
KNIME: An open-source data analytics, reporting, and integration platform.
KNIME allows users to visually create data flows, execute diverse
analyses, and deploy workflows.
IBM SPSS Modeler: A powerful data mining software with a visual
interface that enables users to build predictive models and conduct
advanced analytics without needing extensive programming skills.
SAS Enterprise Miner: A tool used for data mining, statistical analysis,
and predictive modeling. It provides a GUI-based interface for building
analytical models.
Microsoft SQL Server Analysis Services (SSAS): Part of the Microsoft
SQL Server suite, SSAS provides tools for creating data mining models
and deploying them in business intelligence solutions.
Oracle Data Mining (ODM): An option within Oracle's database that
provides powerful data mining algorithms to enable users to discover
insights and make predictions.
TensorFlow and scikit-learn: These are popular libraries in Python for
machine learning and data mining tasks. While not standalone tools, they
offer a vast array of algorithms and tools for data mining when used within
Python environments.
Tableau: While primarily known for its visualization capabilities, Tableau
also offers features for data exploration and basic predictive analysis.
R Language: While it’s a programming language, R has numerous
packages and libraries dedicated to data mining and machine learning
tasks. Packages like caret, e1071, and randomForest are widely used.
Applications of Data Mining
Data mining, a process of discovering patterns and extracting valuable
insights from large datasets, finds applications across various industries
and domains. Here are some common applications of data mining:
Marketing and Sales: Analyzing customer behavior, purchase history,
preferences, and trends helps in targeted marketing, customer
segmentation, and personalized recommendations.
Healthcare: Data mining aids in disease prediction, diagnosis, treatment
optimization, and identifying trends in public health. It also helps in
managing and analyzing electronic health records (EHRs).
Finance and Banking: Detecting fraudulent activities, credit scoring, risk
assessment, market trend analysis, and customer relationship
management are some crucial areas where data mining is employed.
Telecommunications: Analyzing call records, network traffic, and
customer data helps in improving network performance, optimizing
services, and predicting customer churn.
E-commerce and Retail: Recommendation systems, market basket
analysis, inventory management, and pricing strategies benefit from data
mining by understanding customer preferences and behaviors.
Education: Educational data mining assists in improving learning
experiences by analyzing student performance, identifying at-risk students,
and personalizing educational content.
Manufacturing and Logistics: Predictive maintenance, supply chain
optimization, quality control, and demand forecasting are areas where data
mining enhances operational efficiency.
Social Media Analysis: Understanding user behavior, sentiment analysis,
trend prediction, and targeted advertising are common applications of data
mining in social media platforms.
Fraud Detection and Security: Identifying anomalies in transactions,
network traffic, or user behavior helps in detecting fraud and enhancing
security measures.
Science and Research: Data mining supports scientific research by
analyzing large datasets, identifying patterns, and making discoveries in
various fields like genetics, astronomy, climate modeling, etc.
Retail and E-commerce: Retailers use data mining to analyze customer
buying patterns, preferences, and behaviors. This helps in targeted
marketing, inventory management, and recommendation systems (like
product recommendations on e-commerce websites).
Marketing and Customer Relationship Management (CRM): Data
mining techniques are employed to analyze customer data, behavior
patterns, and preferences. Marketers use this information to create
targeted marketing campaigns and improve customer engagement.
Manufacturing and Supply Chain Management: Data mining aids in
optimizing production processes, predictive maintenance of equipment,
supply chain optimization, and inventory management.
Government and Security: Governments use data mining for various
purposes like crime pattern analysis, national security, optimizing public
services, and fraud detection.
Energy and Utilities: Data mining techniques assist in predictive
maintenance of equipment, optimizing energy consumption, and improving
overall operational efficiency in the energy sector.
These applications highlight the broad spectrum of industries and domains
where data mining plays a crucial role in extracting valuable insights and
driving decision-making processes.
Challenges of Data Mining
Data mining, the process of extracting knowledge from data, has become
increasingly important as the amount of data generated by individuals,
organizations, and machines has grown exponentially. However, data
mining is not without its challenges.
In this article, we will explore some of the main challenges of data mining.
1] Data Quality
The quality of data used in data mining is one of the most significant
challenges. The accuracy, completeness, and consistency of the data
affect the accuracy of the results obtained. The data may contain errors,
omissions, duplications, or inconsistencies, which may lead to inaccurate
results. Moreover, the data may be incomplete, meaning that some
attributes or values are missing, making it challenging to obtain a
complete understanding of the data.
Data quality issues can arise due to a variety of reasons, including data
entry errors, data storage issues, data integration problems, and data
transmission errors. To address these challenges, data mining
practitioners must apply data cleaning and data preprocessing techniques
to improve the quality of the data. Data cleaning involves detecting and
correcting errors, while data preprocessing involves transforming the data
to make it suitable for data mining.
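One common preprocessing step mentioned above is handling missing values. The sketch below fills gaps (represented as None) with the mean of the observed values; the ages are invented for illustration.

```python
# Minimal sketch of data cleaning: mean imputation of missing values.
# Missing entries are represented as None; the data is invented.

def impute_mean(values):
    """Replace None entries with the mean of the present values."""
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]

ages = [25, None, 35, None, 40]
print(impute_mean(ages))  # missing ages replaced by the mean of 25, 35, 40
```

Mean imputation is only one strategy; depending on the data, practitioners may instead use the median, a model-based estimate, or simply drop incomplete records.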
2] Data Complexity
Data complexity refers to the vast amounts of data generated by various
sources, such as sensors, social media, and the Internet of Things (IoT).
The complexity of the data may make it challenging to process, analyze,
and understand. In addition, the data may be in different formats, making
it challenging to integrate into a single dataset.
To address this challenge, data mining practitioners use advanced
techniques such as clustering, classification, and association rule mining.
These techniques help to identify patterns and relationships in the data,
which can then be used to gain insights and make predictions.
3] Data Privacy and Security
Data privacy and security is another significant challenge in data mining.
As more data is collected, stored, and analyzed, the risk of data breaches
and cyber-attacks increases. The data may contain personal, sensitive, or
confidential information that must be protected. Moreover, data privacy
regulations such as GDPR, CCPA, and HIPAA impose strict rules on how
data can be collected, used, and shared.
To address this challenge, data mining practitioners must apply data
anonymization and data encryption techniques to protect the privacy and
security of the data. Data anonymization involves removing personally
identifiable information (PII) from the data, while data encryption involves
using algorithms to encode the data to make it unreadable to
unauthorized users.
4] Scalability
Data mining algorithms must be scalable to handle large datasets
efficiently. As the size of the dataset increases, the time and
computational resources required to perform data mining operations also
increase. Moreover, the algorithms must be able to handle streaming
data, which is generated continuously and must be processed in real-time.
To address this challenge, data mining practitioners use distributed
computing frameworks such as Hadoop and Spark. These frameworks
distribute the data and processing across multiple nodes, making it
possible to process large datasets quickly and efficiently.
5] Interpretability
Data mining algorithms can produce complex models that are difficult to
interpret. This is because the algorithms use a combination of statistical
and mathematical techniques to identify patterns and relationships in the
data. Moreover, the models may not be intuitive, making it challenging to
understand how the model arrived at a particular conclusion.
To address this challenge, data mining practitioners use visualization
techniques to represent the data and the models visually. Visualization
makes it easier to understand the patterns and relationships in the data
and to identify the most important variables.
6] Ethics
Data mining raises ethical concerns related to the collection, use, and
dissemination of data. The data may be used to discriminate against
certain groups, violate privacy rights, or perpetuate existing biases.
Moreover, data mining algorithms may not be transparent, making it
challenging to detect biases or discrimination.
Or
Data mining, while a powerful tool for extracting valuable insights from
large datasets, is not without its challenges and issues. Some major issues
in data mining include:
Data Quality: Poor data quality can significantly impact the outcomes of
data mining. Inaccurate, incomplete, or inconsistent data can lead to
misleading results and flawed insights.
Data Preprocessing: Cleaning and preprocessing raw data for analysis
can be time-consuming and complex. Handling missing values, outliers,
and noise requires careful attention to ensure the accuracy of the analysis.
Overfitting: Overfitting occurs when a model learns the training data too
well, capturing noise and irrelevant patterns. This results in poor
performance when applied to new, unseen data.
Selection Bias: Biases in the data collection process can lead to skewed
results. For instance, if certain demographics are overrepresented or
underrepresented, it can affect the accuracy of predictions or conclusions.
Algorithmic Fairness: Biases can be inherent in the algorithms
themselves or introduced through the data. Ensuring fairness in predictions
across different demographic groups is a significant challenge in data
mining.
Ethical Concerns: Using data mining for purposes like targeted marketing,
surveillance, or decision-making raises ethical questions regarding
consent, manipulation, and the potential for discrimination or harm.
Dynamic Nature of Data: Real-world data is dynamic and ever-changing.
Models built on historical data might become outdated or less effective over
time as trends, patterns, or behaviors evolve.
Complexity and Scalability: Handling massive volumes of data requires
robust computational resources. The complexity of algorithms and
scalability to process large datasets efficiently can pose challenges.
Algorithm Selection: Choosing the right algorithm for a specific task is
critical. Different algorithms have varying strengths and weaknesses, and
selecting an inappropriate one can lead to suboptimal results.
Interpretability and Explainability: Complex models might provide
accurate predictions, but understanding how they arrive at those
conclusions can be challenging. Ensuring models are interpretable is
crucial, especially in sensitive domains like healthcare or finance.
Overfitting and Generalization: Models can become too tailored to the
training data, resulting in overfitting and poor performance on new, unseen
data. Achieving a balance between fitting the training data well and
generalizing to new data is essential.
Bias and Fairness: Biases present in data can lead to biased models,
perpetuating and even amplifying societal biases. Ensuring fairness and
mitigating biases in the data and algorithms is a critical concern.
Security Risks: Handling sensitive information poses security risks.
Protecting data from unauthorized access, cyber threats, and ensuring data
integrity are significant challenges.
Data Integration: Combining data from diverse sources with varying
formats and structures can be challenging. Ensuring compatibility and
meaningful integration is crucial for accurate analysis.
Human Involvement: While data mining involves automated processes,
human expertise is necessary to interpret results, validate findings, and
ensure the relevance of discovered patterns.
Data Mining Functionalities
Data mining functionalities are used to represent the type of patterns that
have to be discovered in data mining tasks. In general, data mining tasks
can be classified into two types including descriptive and predictive.
Descriptive mining tasks characterize the general properties of the data in the
database, while predictive mining tasks perform inference on the current data
in order to make predictions.
There are various data mining functionalities which are as follows -
Data characterization - It is a summarization of the general
characteristics of an object class of data. The data corresponding to the
user-specified class is generally collected by a database query. The output
of data characterization can be presented in multiple forms.
Data discrimination - It is a comparison of the general characteristics of
target class data objects with the general characteristics of objects from
one or a set of contrasting classes. The target and contrasting classes can
be represented by the user, and the equivalent data objects fetched
through database queries.
Association Analysis - It analyses the set of items that generally occur
together in a transactional dataset.
There are two parameters that are used for determining the association
rules - support and confidence. Support identifies the frequent itemsets in
the database, while confidence is the conditional probability that an item
occurs in a transaction given that another item occurs.
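These two parameters can be computed directly over a set of transactions; the basket data below is invented for illustration.

```python
# Minimal sketch of the support and confidence parameters used in
# association analysis. The transactions are invented.

transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent):
    """P(consequent in basket | antecedent in basket)."""
    return support(antecedent | consequent) / support(antecedent)

print(support({"bread", "milk"}))       # -> 0.5
print(confidence({"bread"}, {"milk"}))  # 2 of the 3 bread baskets also have milk
```

A rule like bread -> milk is kept only when both its support and its confidence clear user-chosen minimum thresholds.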
Classification - Classification is the procedure of discovering a model that
represents and distinguishes data classes or concepts, with the objective of
using the model to predict the class of objects whose class label is
unknown. The derived model is based on the analysis of a set of training
data (i.e., data objects whose class label is known).
Prediction - It predicts unavailable data values or pending
trends. An object can be anticipated based on the attribute values of the
object and attribute values of the classes. It can be a prediction of missing
numerical values or increase/decrease trends in time-related information.
Clustering - It is similar to classification, but the classes are not
predefined; they are derived from the data attributes. It is a form of
unsupervised learning. The objects are clustered or grouped based on
the principle of maximizing the intraclass similarity and minimizing the
interclass similarity.
Outlier analysis - Outliers are data elements that cannot be grouped into a
given class or cluster. These are the data objects whose behaviour deviates
from the general behaviour of other data objects. The analysis of this type of
data can be essential for mining knowledge.
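A simple way to flag such deviating objects is the z-score rule: values more than a chosen number of standard deviations from the mean are treated as outliers. The sensor readings below are invented for illustration.

```python
# Minimal sketch of outlier analysis using z-scores: values more than
# `threshold` standard deviations from the mean are flagged. Data invented.
import math

def outliers(values, threshold=2.0):
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [v for v in values if abs(v - mean) / std > threshold]

readings = [10, 11, 9, 10, 12, 10, 11, 95]
print(outliers(readings))  # -> [95]
```

The z-score rule assumes roughly symmetric data; for skewed distributions, quartile-based rules (as in box plots) or density-based methods are more robust.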
Evolution analysis - It describes the trends for objects whose behaviour
changes over time.
or
Data mining encompasses a range of functionalities and techniques used
to extract meaningful patterns, information, or knowledge from large
datasets. Some of the key functionalities in data mining include:
Association Rule Mining: Identifying interesting relationships or
associations among data items. It involves finding patterns where one
event leads to another at a significant frequency, like market basket
analysis in retail.
Classification: Sorting data into predefined classes or categories based
on certain features. It involves building models that can predict the class of
new data instances based on the training dataset. Examples include spam
email detection or predicting customer churn.
Clustering: Grouping similar data points together based on their
characteristics or features. It helps in identifying inherent structures in the
data without predefined classes. Applications include customer
segmentation or grouping documents for information retrieval.
Regression Analysis: Predicting a continuous outcome variable based on
one or more predictor variables. It's used to understand relationships
between variables and make predictions, such as predicting sales based
on advertising expenditure.
Anomaly Detection: Identifying outliers or unusual patterns in the data
that do not conform to expected behavior. It's useful for fraud detection,
network intrusion detection, or identifying defects in manufacturing.
Sequential Pattern Mining: Discovering sequential patterns or trends in
data, such as analyzing sequences of events or transactions. It helps in
understanding patterns of behavior over time, like web clickstream
analysis.
Text Mining: Extracting meaningful information from unstructured text
data. It involves techniques like natural language processing (NLP),
sentiment analysis, and topic modeling to derive insights from text
documents, social media, or web content.
Prediction and Forecasting: Making predictions or forecasts about future
trends or values based on historical data. It's used in financial markets,
weather forecasting, and demand forecasting in supply chain management.
Dimensionality Reduction: Reducing the number of variables or features
in a dataset while preserving important information. Techniques like
Principal Component Analysis (PCA) or t-Distributed Stochastic Neighbor
Embedding (t-SNE) help in visualization and analysis of high-dimensional
data.
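The PCA mentioned above can be sketched in closed form for two-dimensional data, where the 2x2 covariance matrix's larger eigenvalue gives the principal direction. The points are invented, and the sketch assumes the x-y covariance is non-zero.

```python
# Minimal sketch of PCA on 2-D points: solve the 2x2 covariance matrix
# in closed form and project onto the first principal component.
# The data is invented; assumes the x-y covariance sxy is non-zero.
import math

def pca_2d(points):
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Larger eigenvalue of [[sxx, sxy], [sxy, syy]] gives the main direction.
    lam = (sxx + syy + math.sqrt((sxx - syy) ** 2 + 4 * sxy ** 2)) / 2
    vx, vy = sxy, lam - sxx              # unnormalised principal eigenvector
    norm = math.hypot(vx, vy)
    vx, vy = vx / norm, vy / norm
    # Project each centred point onto the first principal component.
    return [(x - mx) * vx + (y - my) * vy for x, y in points]

scores = pca_2d([(1, 2), (2, 4), (3, 6), (4, 8)])  # y = 2x: one real dimension
print([round(s, 3) for s in scores])
```

Because the sample points lie exactly on a line, the single principal component captures all of their variance, which is the sense in which dimensionality reduction "preserves important information" while dropping a coordinate.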
Recommendation Systems: Suggesting items or content to users based
on their preferences or behavior. Collaborative filtering, content-based
filtering, and hybrid methods are used in recommendation systems in e-
commerce, streaming platforms, and more.
Data analysis
Data analysis is the process of inspecting, cleansing, transforming,
and modeling data with the goal of discovering useful information, informing
conclusions, and supporting decision-making.
Data analysis is the process of collecting, modeling, and analyzing data
using various statistical and logical methods and techniques. Businesses
rely on analytics processes and tools to extract insights that support
strategic and operational decision-making
Types of Data Analysis
Analysis of data is a vital part of running a successful business. When data
is used effectively, it leads to better understanding of a business's previous
performance and better decision-making for its future activities. There are
many ways that data can be utilized, at all levels of a company's
operations.
There are four types of data analysis that are in use across all industries.
While we separate these into categories, they are all linked together and
build upon each other. As you begin moving from the simplest type of
analytics to more complex, the degree of difficulty and resources required
increases. At the same time, the level of added insight and value also
increases.
Four Types of Data Analysis
The four types of data analysis are descriptive, diagnostic, predictive, and
prescriptive analysis.
Descriptive Analysis
The first type of data analysis is descriptive analysis. It is at the foundation
of all data insight. It is the simplest and most common use of data in
business today. Descriptive analysis answers the “what happened” by
summarizing past data, usually in the form of dashboards.
The biggest use of descriptive analysis in business is to track Key
Performance Indicators (KPIs). KPIs describe how a business is performing
based on chosen benchmarks.
Business applications of descriptive analysis include:
+ KPI dashboards
+ Monthly revenue reports
+ Sales leads overview
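In code, descriptive analysis is just summary statistics over past data. The monthly revenue figures below are hypothetical; the output is the kind of number a KPI dashboard would display.

```python
# Summarize past monthly revenue into dashboard-style KPIs.
import statistics

revenue = {"Jan": 120_000, "Feb": 135_000, "Mar": 128_000, "Apr": 150_000}

total = sum(revenue.values())
average = statistics.mean(revenue.values())
best_month = max(revenue, key=revenue.get)
print(total, average, best_month)   # 533000 133250 Apr
```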
Diagnostic Analysis
Diagnostic analysis takes the insights found from descriptive analytics and
drills down to find the causes of those outcomes. Organizations make use
of this type of analytics as it creates more connections between data and
identifies patterns of behavior.
A critical aspect of diagnostic analysis is creating detailed information.
When new problems arise, it is possible you have already collected certain
data pertaining to the issue. By already having the data at your disposal, you
avoid having to repeat work, and all problems become interconnected.
Business applications of diagnostic analysis include:
+ A freight company investigating the cause of slow shipments in a
certain region
+ A SaaS company drilling down to determine which marketing
activities increased trials
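The freight example can be sketched as a group-and-compare: aggregate shipment records by region and surface the region with the worst average transit time. All figures here are invented.

```python
# Group shipments by region and find the region with the slowest
# average transit time (diagnostic drill-down).
from collections import defaultdict

shipments = [
    ("north", 3), ("north", 4), ("south", 9), ("south", 8), ("west", 5),
]   # (region, transit_days)

def slowest_region(records):
    by_region = defaultdict(list)
    for region, days in records:
        by_region[region].append(days)
    averages = {r: sum(d) / len(d) for r, d in by_region.items()}
    return max(averages, key=averages.get), averages

region, averages = slowest_region(shipments)
print(region)   # south
```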
Predictive Analysis
Predictive analysis attempts to answer the question “what is likely to
happen?”. This type of analytics utilizes previous data to make predictions
about future outcomes.
This type of analysis is another step up from the descriptive and diagnostic
analyses. Predictive analysis uses the data we have summarized to make
logical predictions of the outcomes of events. This analysis relies on
statistical modeling, which requires added technology and manpower to
forecast. It is also important to understand that forecasting is only an
estimate; the accuracy of predictions relies on quality and detailed data.
While descriptive and diagnostic analysis are common practices in
business, predictive analysis is where many organizations begin to show
signs of difficulty. Some companies do not have the manpower to
implement predictive analysis in every place they desire. Others are not yet
willing to invest in analysis teams across every department or not prepared
to educate current teams.
Business applications of predictive analysis include:
+ Risk Assessment
+ Sales Forecasting
+ Using customer segmentation to determine which leads have the best
chance of converting
+ Predictive analytics in customer success teams
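The lead-conversion example above can be sketched as the simplest possible predictive model: estimate each segment's conversion rate from historical outcomes, then rank new leads by their segment's rate. The history data is invented; a real system would use a trained classifier.

```python
# Estimate per-segment conversion rates from past (segment, converted)
# records, then pick the segment whose leads convert best.
from collections import defaultdict

history = [
    ("enterprise", True), ("enterprise", True), ("enterprise", False),
    ("smb", True), ("smb", False), ("smb", False), ("smb", False),
]

def conversion_rates(records):
    totals, wins = defaultdict(int), defaultdict(int)
    for segment, converted in records:
        totals[segment] += 1
        wins[segment] += converted
    return {s: wins[s] / totals[s] for s in totals}

rates = conversion_rates(history)
best = max(rates, key=rates.get)
print(best, round(rates[best], 2))   # enterprise 0.67
```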
Prescriptive Analysis
The final type of data analysis is the most sought after, but few
organizations are truly equipped to perform it. Prescriptive analysis is the
frontier of data analysis, combining the insight from all previous analyses to
determine the course of action to take in a current problem or decision.
Prescriptive analysis utilizes state-of-the-art technology and data practices.
It is a huge organizational commitment and companies must be sure that
they are ready and willing to put forth the effort and resources.
Artificial Intelligence (AI) is a perfect example of prescriptive analytics. AI
systems consume a large amount of data to continuously learn and use this
information to make informed decisions. Well-designed AI systems are
capable of communicating these decisions and even putting those
decisions into action. With artificial intelligence, business processes can be
performed and optimized daily without human intervention.
Currently, most of the big data-driven companies (Apple, Facebook, Netflix,
etc.) are utilizing prescriptive analytics and AI to improve decision-making.
For other organizations, the jump to predictive and prescriptive analytics
can seem insurmountable. As technology continues to improve and more
professionals are educated in data, we will see more companies entering
the data-driven realm.