ML 01 Course Intro


CSc 59929

Introduction
to
Machine Learning

Spring 2020

Erik K. Grimmelmann, Ph.D.


The City College of New York

My contact information

egrimmelmann@ccny.cuny.edu

Prerequisites

• Algorithms
• Programming (Python)
• Data structures
• Linear algebra
• Probability and statistics
• A desire to learn one of the most in-demand
areas of computer science (and, IMHO, the most
interesting)

Textbook

• Raschka, Sebastian, and Vahid Mirjalili. Python machine
learning: machine learning and deep learning with Python,
scikit-learn, and TensorFlow. Packt, 2017.
• Note that this is the second edition of this book; the first
edition had only Raschka as the author and a slightly
different title.
• Available in print and various eBook editions from
Packtpub.com, Amazon, and others. The electronic versions
are in color; the print version is in black and white.
• GitHub link: https://github.com/rasbt/python-machine-learning-book-2nd-edition

Machine learning environments

• Python
• scikit-learn
• TensorFlow
• R
• Proprietary

Leading ML platforms

KDnuggets, 2019-06

Leading deep learning platforms

KDnuggets, 2019-06

Where the jobs are

KDnuggets, 2019-06

Where the jobs aren’t

KDnuggets, 2019-06

How the job openings are changing

KDnuggets, 2019-06

Course programming environment

• Python 3.x
• Anaconda (strongly recommended)
• Choose your favorite environment (Mac, Linux,
Windows, …)
• Provided at no charge by Continuum.io
• Jupyter Notebook
• JupyterLab is close to 1.0
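
One way to confirm that a local setup matches the environment listed above is a short check script. This is only a sketch: the helper names `is_python3` and `missing_packages` are hypothetical, and the list of modules to look for (`sklearn`, `tensorflow`, `notebook`) is an assumption based on the tools named in these slides, not a course requirement.

```python
import importlib.util
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the given version tuple is a Python 3.x release."""
    return version_info[0] == 3


def missing_packages(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names: scikit-learn -> sklearn, Jupyter Notebook -> notebook.
    wanted = ["sklearn", "tensorflow", "notebook"]
    print("Running Python 3.x:", is_python3())
    print("Not installed:", missing_packages(wanted))
```

Run inside the Anaconda environment you plan to use, an empty "Not installed" list suggests the packages are importable; it does not check their versions.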

Analytics & business intelligence platforms

Data science and machine learning platforms

Gartner Magic Quadrant – Anaconda
Anaconda is based in Austin, Texas, U.S. It offers Anaconda Enterprise 5.2, a
data science development environment based on the interactive notebook
concept (this analysis excludes the Conda Distribution Packages) that sees
users exploiting open-source Python and R-based packages. Anaconda
continues to provide a loosely coupled distribution environment, which offers
access to a wide range of open-source development environments and open-
source libraries, primarily Python-based. Anaconda benefits from the
growing popularity of Python, the newly preeminent language for data
scientists.
Anaconda remains a Niche Player. It still suffers from a disparity between its
power to federate a very large number of Python developers, who are
continuously building additional capabilities, and its lack of control over
these developers’ efforts in terms of quality, dependability and predictability.
Anaconda is well-suited to seasoned data scientists who are fluent in Python
or R and eager to explore a continuous stream of capabilities in Anaconda
Cloud, while still benefiting from an environment more structured than a pure
notebook environment.

Gartner Magic Quadrant – Anaconda Strengths
Python and open-source support: The dominance of Python among data
scientists gives Anaconda great visibility to developers. Anaconda is the only
data science vendor not just supporting but also indemnifying and securing
the Python open-source community. In the past year, the company has
revamped its user interface by providing enhanced collaboration and model
reproducibility features, giving data scientists better productivity and model
management capabilities.
Active ecosystem: Reference customers praised Anaconda’s extensive and
active community engagement. The community fosters cutting-edge Python
code libraries and integration with other open-source data science projects.
Anaconda Cloud also provides wide means of collaboration and code library
exchanges, for data scientists and developers to explore and accelerate model
development production, whether in the cloud or on-premises.
Scalable development for open-source libraries: Anaconda’s scalability
takes two main forms: capabilities relating to automatic GPU code production
and the ability to embed its platform seamlessly within any of the large cloud
providers.

Gartner Magic Quadrant – Anaconda Cautions
Designed for experts: Anaconda targets experienced data scientists familiar
with Python and notebooks. Many data scientists’ favorites, including the
widely popular Jupyter notebooks, are readily available for use through
Anaconda’s environment. But, however flexible they are, those environments
are not conducive to fruitful discussions with business users — a capability to
support such exchanges is increasingly valued by large organizations lacking
data science talent.

Open-source shortcomings: Like many open-source promoters, Anaconda
suffers from the usual drawbacks associated with large and flexible developer
communities: backward compatibility issues between versions; lack of
visibility into important upcoming capabilities (model operationalization, for
example); lack of code optimization for models’ integration with existing
applications; and, despite marked progress in terms of workbench
homogeneity, a lack of overall coherence.

Gartner Magic Quadrant – Anaconda Cautions (continued)
Automation and augmentation: Novice Anaconda users will have difficulty
finding their way through the Python “jungle.” Citizen data scientists will
find themselves in uncharted territory within Anaconda’s environment. Also,
the do-it-yourself skills and attitude exhibited by typical Anaconda users are
not suited to ML automation practices (such as AutoML’s automation of only
part of the model development process), which are increasingly popular with
data scientists.

Course environment

• CUNY Blackboard
• Course materials
• Schedule
• Lecture slides
• Code used in class
• Your submissions
• Errors, corrections, and improvements

Course components

• Classroom
• Lectures
• Code demonstrations & reviews
• Discussions
• Guest appearances
• Assignments
• Problems
• Programming
• Project

Bug Bounties

• Please report errors (including typos) in the posted
classroom presentations.
• Post your error report on the “Bug Reports” discussion board.
• You may receive an extra point toward your final grade.
• If I catch the error before it’s reported, no point will be awarded.
• Only one point per posted presentation; if more than one error is
posted, only the person who posted the most significant error will
receive the point. In the event of a tie, the first person to report the
error gets the point.
• This bounty offer may be withdrawn before the end of the
semester.

Course style

• Lecture
• Discussions of my and your experiences

Course schedule

• Tuesdays & Thursdays
• Section M 11:00 am to 12:15 pm, NAC 7/312
• Section P 2:00 pm to 3:15 pm, NAC 4/220-C
• Office Hours 12:45 pm to 1:45 pm, NAC 8/202-L
on days that we have class except as announced

Key dates

Tue Jan 28               Class      First class
Tue Mar 31               No Class   Data Science Day at Columbia University
Apr 7-16                 No Class   CUNY Spring Break
Tue May 5, Thu May 7,
Tue May 12, Thu May 14   Class      Project Presentations
Tue May 19               No Class   Project Presentation Charts Due
Thu May 21               No Class   Written Projects Due

Please arrive on time

Please alert me when we’re out of time

Please turn off your phone

Grading

Component                                      Contribution
Attendance, Punctuality, Class Participation,
No Cell Phones, Bug Bounties (extra credit)    30%
Assignments                                    40%
Final Project                                  30%
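
As a rough illustration of how the three weights combine, the sketch below computes a weighted final grade. It rests on several assumptions that the slide does not state: each component is scored from 0 to 100, the attendance, punctuality, participation, and phone items are folded into a single "participation" score, and bug bounty points are simply added to the weighted total. The names `WEIGHTS_PERCENT` and `final_grade` are hypothetical.

```python
# Percent weights taken from the grading slide; they sum to 100.
WEIGHTS_PERCENT = {"participation": 30, "assignments": 40, "project": 30}


def final_grade(scores, bounty_points=0):
    """Combine 0-100 component scores using the slide's percent weights.

    Assumption: bounty points are added on top of the weighted total.
    """
    total = sum(WEIGHTS_PERCENT[name] * scores[name] for name in WEIGHTS_PERCENT)
    return total / 100 + bounty_points


if __name__ == "__main__":
    example = {"participation": 90, "assignments": 80, "project": 70}
    print(final_grade(example, bounty_points=2))  # 82.0
```

In this example the weighted total is 27 + 32 + 21 = 80, and two bounty points bring it to 82.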

My career so far

• Phase I Science
• Phase II Big companies
• Phase III Startup tech companies
• Phase IV Interruption
• Phase V Tech-related non-profits
• Phase VI Academics

Phase I – Science (12 years)
• Started programming in earnest over 50 years ago at age 16
• B.A., M.S., & Ph.D. in chemistry (10 years)
• Chose chemistry over computer science for grad school for no good
reason
• Post-Doc at Bell Labs (2 years)
• Helped invent the field of computational chemistry
• Most cited research paper is on numerical methods for simulating infrequent events

Phase II – Big company technology
• Engineer & engineering manager at Bell Labs (10 years)
• Divestiture planning
• Supervisor of Nuclear Weapons Effects
• Supervisor of Robust Network Planning
• Head of Government Communications Systems Department
• Defense Nuclear Agency (DNA)
• Federal Aviation Administration (FAA)
• National Security Agency (NSA)
• Another three-letter agency
• Business side of AT&T (8 years)
• Software product management
• Internet strategist
• Managed Dun & Bradstreet’s global infrastructure (2 years)

Phase III – Startup tech companies
• Chief Technology Officer (11 years)
• The Redtop Company
• Electronic medical records for psychiatry
• Declared bankruptcy
• Cometa Networks
• Wholesale Wi-Fi
• Founded by IBM, AT&T, and Intel
• Failed, but shut down gracefully
• Send Word Now
• Critical communications and alerting
• Sold for over $200,000,000 in 2017

Phase IV – Interruption (1 year)

Phase V – Tech-related non-profits
• CEO & President of tech-related non-profits (7 years)
• New York Technology Council (NYTECH)
• NY Tech Alliance (NYTA)
• Produces NY Tech Meetup (NYTM)

Phase VI – Academics
• CUNY
• School of Professional Studies (SPS)
• Adjunct (5 years)
• M.S. in Data Science & B.S. in Information Systems
• Faculty advisor
• The City College of New York (CCNY)
• Adjunct (2+ years)
• Machine Learning & Scientific Computing
• Full-time regular faculty (1+ years)
• Machine Learning & Scientific Computing
• NYU Tandon (1 year)
• Developed online graduate course in Machine Learning

• CUNY 2X Tech is a new initiative to double by 2022 the
number of CUNY students graduating annually with a tech-
related bachelor’s degree prepared to launch careers in the
NYC tech ecosystem.
• Designed in partnership with NYC Tech Talent Pipeline,
industry, and academic leaders, this five-year, multi-million-
dollar initiative brings together CUNY senior colleges and
NYC tech employers to better align tech education with
industry needs and expand access to quality tech careers.
