CONTENTS
• INTRODUCTION TO DATA SCIENCE
• AI PROJECT CYCLE FRAMEWORK AND DATA SCIENCE
• BASIC STATISTICS, DATA EXPLORATION AND VISUALISATION
• CLASSIFICATION MODEL

INTRODUCTION TO DATA SCIENCE

The term "data science" was coined in 2001 in an attempt to describe a new field. Some argue that it's nothing more than the natural evolution of statistics and shouldn't be called a new field at all. Others argue that it's more interdisciplinary. For example, in The Data Science Design Manual (2017), Steven Skiena says the following:

"I think of data science as lying at the intersection of computer science, statistics, and substantive application domains. From computer science comes machine learning and high-performance computing technologies for dealing with scale. From statistics comes a long tradition of exploratory data analysis, significance testing, and visualization. From application domains in business and the sciences comes challenges worthy of battle, and evaluation standards to assess when they have been adequately conquered."

This echoes a famous blog post by Drew Conway in 2013, in which he drew the following diagram to indicate the various fields that come together to form what we call "data science."

[Figure: Drew Conway's diagram of the fields that form data science, including Substantive Expertise]

Regardless of whether data science is just a part of statistics, and regardless of the domain to which we're applying it, the goal is the same: to turn data into actionable value. The professional society INFORMS defines the related field of analytics as "the scientific process of transforming data into insight for making better decisions."

1.2. What do data scientists do?

Turning data into actionable value usually involves answering questions using data. Here's a typical workflow for how that plays out in practice:
1. Obtain data that you hope will help answer the question.
2. Explore the data to understand it.
3. Clean and prepare the data for analysis.
4. Perform analysis, model building, testing, etc.
(The analysis is the step most people think of as data science, but it's just one step! Notice how much more surrounds it.)
5. Draw conclusions from your work.
6. Report those conclusions to the relevant stakeholders.

Our course focuses on all the steps except the analysis. You've learned some introductory statistical analysis in one of the course prerequisites (GB213), and we will leverage that. (Later in our course we will review simple linear regression and hypothesis testing.) If you have taken other relevant courses in statistics, mathematical modeling, econometrics, etc., and want to bring that knowledge into this course, great, but it's not a requirement. Other advanced statistics and modeling courses you take later will essentially plug into step 4 of this data science workflow.

1.3. What's in our course?

Our course covers the following four foundational aspects of data science.
• Mathematics: We will cover foundational mathematical concepts, such as functions, relations, assumptions, conclusions, and abstraction, so that we can use these concepts to define and understand many aspects of data manipulation. We will also use statistics from GB213 (and optionally other statistics courses you may have taken) in course projects, and we will briefly review that statistical material as well. We will also see small previews of other mathematics and statistics courses and their connections to data science, including graphs for social network analysis, matrices for finding themes in relations, and supervised machine learning.
• Technology: We will extend your Python knowledge from the CS230 prerequisite with more advanced table manipulation functions, extended practice with data cleaning and manipulation tasks, computational notebooks (such as Jupyter), and GitHub for version control and project publishing.
• Visualization: We will learn new types of plots suited to a wide variety of data types and to what you intend to communicate about them.
We will also study the general principles that govern when and how to use visualizations, and we will learn how to build and publish interactive online visualizations (dashboards).
• Communication: We will study how to write comments in code, documentation for code, motivations in computational notebooks, interpretation of results in computational notebooks, and technical reports about the results of analyses. We will prioritize clarity, brevity, and knowing the target audience. Many of these same principles will arise when creating presentations or videos as well. Each of these modes of communication is required at some point in our course.

1.4. Will this course make me a data scientist?

This course is an introduction to data science. Learning more math, stats, and technology will make you more qualified than this one course alone can. (Bentley University has both a, if you're curious which courses are relevant.) But there are two focuses of our course that will make a big difference:

AI PROJECT CYCLE FRAMEWORK AND DATA SCIENCE

In the rapidly evolving world of artificial intelligence (AI), project management can be as complex as the technology itself. A staggering number of AI projects fail, not because of a lack of technical prowess, but because of ineffective project management. Implementing a well-defined AI project life cycle can significantly improve the success rate of these endeavors, transforming raw data and innovative ideas into practical, efficient solutions.

As shown below, the six key phases of the AI life cycle are (1) Problem Definition; (2) Data Acquisition and Preparation; (3) Model Development; (4) Model Evaluation and Refinement; (5) Deployment; and (6) MLOps.
[Figure: The AI life cycle: Problem Definition, Data Acquisition & Preparation, Model Development, Model Evaluation & Refinement, Deployment, ML Ops]

Understanding the AI Life Cycle

Conceptually, one can think of an AI project life cycle as the sequential progression of tasks and decisions that drive the development and deployment of AI solutions.

Problem Definition
This is where the journey begins. It involves defining the problem to be solved or the opportunity to be explored using AI. It's a crucial stage that sets the direction for the entire project. Having a clear, well-defined problem helps guide data collection, model development, and ultimately, the successful implementation of the solution. This is where the role of an AI product manager can be useful.

Data Acquisition and Preparation
After identifying the problem, the next step is to collect and prepare data. AI and machine learning algorithms need data to learn from, so this stage involves gathering relevant data and preparing it for use. This preparation may involve cleaning the data, dealing with missing values, or transforming the data into a format suitable for the chosen AI models. While it is the least glamorous phase, it can be the most time-consuming phase of the AI life cycle.

Model Development and Training
This phase involves developing the AI model that will solve the defined problem and training it with the prepared data. This stage is iterative, often involving multiple rounds of model development and refinement based on the model's performance during training.

Model Evaluation and Refinement
Once the model has been trained, it must be evaluated to see how well it performs. This involves testing the model on unseen data and analyzing its predictions. If the model's performance is not satisfactory, it is refined and tweaked. This could mean adjusting the model's parameters, changing the model's architecture, or even returning to the data acquisition phase to gather additional data.
Deployment
Once the model is performing satisfactorily, it is deployed to a production environment where it can start solving real-world problems. Deployment might involve integrating the model with existing systems, creating an application or service that uses the model, or using its insights in an offline context such as a report to management.

Machine Learning Operations
Most of the time, the model will need to be maintained and updated after deployment. In this machine learning operations (MLOps) phase, the team monitors the model's performance to ensure it is still working as expected, updates the model with new data, and refines it based on feedback from its users.

Furthermore, teams often need to go back to a previous phase (e.g., going from model evaluation back to model development). This is to be expected and should be considered a normal part of the AI life cycle, not a sign of a problem with the AI development team.

Importance of Each Stage in the AI Project Life Cycle
Each stage in the AI project life cycle serves a vital role. The problem definition phase establishes the project's direction. The data acquisition and preparation phase creates the foundation for the AI solution. The model development and training phase turns this foundation into a functional tool. The model evaluation and refinement phase then ensures that the model meets the expected standards. Finally, deployment brings the AI solution to its intended users, and MLOps keeps it running smoothly over time.

In addition, AI projects often need to adapt quickly to change. This change can come from the project's requirements, from unexpected issues with the data, or from new developments in AI technology. Building this adaptability into the project life cycle can be difficult but is crucial for long-term project success. This is where an agile framework can help.
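The monitoring described under Machine Learning Operations, and the loop back to an earlier phase, can be illustrated with a small Python sketch. It assumes that the true outcome of each prediction eventually becomes known. It uses a hypothetical `ModelMonitor` class with an arbitrary window size and tolerance. Production MLOps setups typically rely on dedicated monitoring tools and richer drift checks than this rolling-average rule.

```python
import statistics
from collections import deque


class ModelMonitor:
    """Hypothetical MLOps helper: tracks a deployed model's recent absolute errors
    and flags when the rolling error drifts well above the evaluation baseline."""

    def __init__(self, baseline_mae, window=5, tolerance=1.5):
        self.baseline_mae = baseline_mae    # error measured during model evaluation
        self.tolerance = tolerance          # allowed ratio of live error to baseline
        self.errors = deque(maxlen=window)  # keeps only the most recent errors

    def record(self, predicted, actual):
        """Log one prediction once its true outcome becomes known."""
        self.errors.append(abs(predicted - actual))

    def needs_retraining(self):
        """True when a full window of live errors averages above the tolerated level."""
        if len(self.errors) < self.errors.maxlen:
            return False  # not enough live data to judge yet
        return statistics.mean(self.errors) > self.tolerance * self.baseline_mae


monitor = ModelMonitor(baseline_mae=0.5)
for predicted, actual in [(10, 10.3), (12, 11.8), (14, 14.4), (16, 15.9), (18, 18.2)]:
    monitor.record(predicted, actual)
before = monitor.needs_retraining()
print(before)  # False: live errors are in line with the baseline

# Hypothetical drift: user behaviour changes and errors grow.
for predicted, actual in [(20, 23.0), (22, 25.5), (24, 27.1), (26, 29.4), (28, 31.8)]:
    monitor.record(predicted, actual)
after = monitor.needs_retraining()
print(after)  # True: time to loop back and retrain on fresh data
```

Comparing live error against the baseline measured during evaluation ties the MLOps phase back to the earlier phases. When the flag fires, the team returns to data acquisition or model development rather than patching the deployed model ad hoc.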
Benefits of Implementing a Robust AI Project Life Cycle
Employing a structured AI project life cycle has numerous benefits:
• Increased Success Rate: A robust project life cycle helps ensure that each necessary step in the development of an AI solution is followed, greatly increasing the likelihood of project success.
• Risk Reduction: By flagging potential issues early in the process, a well-structured project life cycle helps mitigate risks. For example, if the problem is not clearly defined during the problem definition phase, the project may lose direction. Identifying this risk early allows teams to refocus and avoid costly, time-consuming revisions later.
• Improved Efficiency and Productivity: A structured project life cycle streamlines the workflow, ensuring that everyone on the team understands their roles and responsibilities at each stage. This clarity can significantly improve efficiency and productivity, reducing the time to deployment.
• Enhanced Quality of AI Solution: By enforcing thoroughness and rigor at each stage, a well-defined project life cycle enhances the quality of the final AI solution. Rigorous evaluation and refinement ensure the AI solution performs as expected, while regular maintenance and updates keep it running smoothly over time.
• Enhanced Resource Allocation: AI projects require significant resources, including time, human expertise, and computational power. Identifying and balancing these resources across the phases of the life cycle can be challenging. Being explicit about how resources are allocated to each phase helps the team resource the project appropriately.
In short, a well-defined AI life cycle can help teams plan their AI projects more effectively, maximizing their chances of success while minimizing potential hurdles.
An Example AI Project Life Cycle
Let's explore a simple example of the AI project life cycle in the development of an AI-based recommendation system (specifically, Amazon's system for recommending what to purchase):
1. Problem Definition: The primary problem is clearly defined as improving the accuracy of product recommendations, thereby enhancing the shopping experience for users while driving increased sales.
2. Data Acquisition and Preparation: Amazon collects vast amounts of user data, including browsing history, purchase history, and ratings. These data points can be identified as the critical information needed, then collected and prepared for the model development phase.
3. Model Development and Training: Amazon uses many machine learning models, such as collaborative filtering, to create its recommendation system. The models are trained with the prepared data to predict a customer's interests based on similarities with other customers.
4. Model Evaluation and Refinement: The model is tested extensively, and its predictions are compared to actual customer behavior to evaluate its accuracy. Based on these tests, the model is continually refined to increase the precision of its recommendations.
5. Deployment: Once the recommendation model meets the performance benchmark, it is deployed on Amazon's platform. The model now operates in real time, suggesting products to users based on their browsing and purchasing behavior.
6. Machine Learning Operations: After deployment, the model is continually monitored and updated. As user behavior and preferences evolve, the model is retrained to keep its recommendations relevant and accurate.
This example shows the AI project life cycle in action, with each stage playing an important role in delivering a successful AI solution. The same life cycle also applies to building and refining generative AI models.
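Step 3 mentions collaborative filtering. The Python sketch below shows a textbook user-based variant of it. It is not Amazon's actual system. The users, products, and ratings are hypothetical. Similarity is cosine similarity, with unrated products treated as 0. Each product the target user has not rated is scored by summing other users' ratings, each weighted by that user's similarity to the target user.

```python
import math

# Hypothetical ratings: user -> {product: rating on a 1-5 scale}
ratings = {
    "ana":   {"kettle": 5, "mug": 4, "teapot": 5},
    "ben":   {"kettle": 4, "mug": 5, "teapot": 4, "tea_tin": 5},
    "chloe": {"kettle": 1, "laptop": 5, "mouse": 4},
    "dev":   {"mug": 4, "teapot": 5, "tea_tin": 4, "mouse": 2},
}


def cosine_similarity(a, b):
    """Cosine similarity of two users' rating vectors (unrated products count as 0)."""
    dot = sum(a[p] * b[p] for p in set(a) & set(b))
    norm_a = math.sqrt(sum(r * r for r in a.values()))
    norm_b = math.sqrt(sum(r * r for r in b.values()))
    return dot / (norm_a * norm_b)


def recommend(user, ratings, k=2):
    """Rank products the user has not rated by similarity-weighted ratings of other users."""
    scores = {}
    for other, their_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], their_ratings)
        for product, rating in their_ratings.items():
            if product not in ratings[user]:
                scores[product] = scores.get(product, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:k]


print(recommend("ana", ratings))  # ['tea_tin', 'mouse']
```

The sketch sums the weighted ratings instead of averaging them. This is a deliberate simplification. It stops a product rated by a single, barely similar user (here, chloe's laptop) from outranking products that several similar users liked.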
Iterating through the AI Life Cycle
It is important to note that the AI life cycle should be thought of as an iterative process that incrementally delivers a better solution. In other words, each of the life cycle phases is typically revisited many times throughout an AI project.

In the context of an AI project life cycle, an MVP (minimum viable product) is a simplified version of the AI solution that is developed as quickly as possible to validate the underlying concept. It includes just enough features to be usable by early customers, who can provide feedback for future development. For example, in the model development and training phase, an MVP might be trained on a subset of the data rather than on the entire data set, to speed up development. This allows the team to quickly validate whether their approach is viable before investing more resources. By building an MVP and gathering user feedback, teams can identify issues and areas for improvement early in the development process. This makes it easier to make changes and enhancements before the full solution is rolled out.

Key Takeaways
Using a well-defined AI project life cycle should not be optional; it should be an integral part of successful AI development. Embracing a life cycle approach can significantly improve the efficiency, productivity, and overall success of AI projects, making it an essential consideration for any team venturing into the world of AI.
