
Bunker Hill Community College

Computer Technology Department

CIT-137-M1
Introduction to Big Data
With R and R Studio

Professor Michael Harris


Email Address: mdharris@bhcc.mass.edu
Office: (617) 228 2486
Cell (text): (617) 480-3003
COMMONWEALTH OF MASSACHUSETTS
BUNKER HILL COMMUNITY COLLEGE
CHARLESTOWN, MASSACHUSETTS

COMPUTER INFORMATION TECHNOLOGY DEPARTMENT

Introduction to Big Data with R and R Studio


COURSE OUTLINE & REQUIREMENTS

COURSE DESCRIPTION: This course provides foundation-level training for students who want to learn the
programming language R. The course provides grounding in basic and intermediate analytic methods along with
an introduction to the field of data science and some of the tools used. Students will learn the language
through labs that show how it is applied to real-world business challenges. Students will learn some of the
more popular libraries in R (dplyr, ggplot2, lubridate, and tidyr) and will be able to prepare data for future
analysis. The course takes an "Open", or technology-neutral, approach and includes a final lab that addresses
a data science challenge by applying the concepts taught in the course with an open source database.
Prerequisite: Information Technology Problem Solving (CIT113) or equivalent (CIT110, CIT120, CIT182 or
department chair approval).

PREREQUISITES FOR THIS COURSE:

CIT-113 or equivalent (CIT-110, CIT-120, CIT-182) or department chair signature

COURSE LEARNING OBJECTIVES: By the end of this course students will be able to:

• Use the R programming language to write functions and loops, examine and explore data, and use
libraries such as dplyr, ggplot2, lubridate, and tidyr for added data analysis functionality.

• Use basic statistical parameters and show how data can be used, analyzed, and modeled from
various distributions.

• Demonstrate how to turn unstructured data (messy data) into structured data (tidy data).

• Demonstrate how to link R with databases to extract data.

• Demonstrate how to search for online databases and find open data sources on the internet.

• Use resiliency skills, improve communication, and learn to overcome obstacles in a rapidly changing
environment while working on a complex, multistage project.

• Present research findings to a group of people in an online environment.

• Show how to web scrape data, clean it, and present the data to a user in a readable, often visual,
format using the tools and techniques learned throughout the course.

INSTRUCTOR: Professor Michael Harris
Office: D123E
Email address: mdharris@bhcc.mass.edu
Telephone: Office: (617) 228-2486, Cell: (617) 480-3003
Office Hours: M 1:00-2:15, W 2:30-3:45, Th 1:00-2:15

REQUIRED COURSE MATERIAL:


1. R and R Studio software
2. Data Camp: online course content (MOOC)
3. Course website: https://sites.google.com/site/cit137fa19

SUPPLEMENTAL COURSE MATERIAL:


1. R for Data Science: https://r4ds.had.co.nz/
2. An Introduction to Data Science: textbook (Creative Commons book, no publisher)
3. OpenIntro Statistics, 3rd edition: textbook (Creative Commons book, no publisher)
4. OpenIntro Labs for R: labs for OpenIntro Statistics

BLOGS AND OTHER DATA SCIENCE RESOURCES


1. http://flowingdata.com/
2. http://fivethirtyeight.com/
3. http://www.kdnuggets.com/
4. https://www.kaggle.com/

STUDENT REQUIREMENTS: To complete this course and receive a final grade and full credit, each student must:

1. Be active in all discussions
2. Complete all homework assignments
3. Complete all required Lab Projects
4. Complete a final project and report

STUDENT EVALUATION: A letter grade will be awarded at the completion of the course according to the following
weighted average:

The point to Letter Grade equivalency is as follows:

940 - 1,000 Points    A
900 - 939 Points      A-
870 - 899 Points      B+
830 - 869 Points      B
800 - 829 Points      B-
770 - 799 Points      C+
700 - 769 Points      C
600 - 699 Points      D
Less than 600 Points  F

COURSE ASSIGNMENT GRID

Wk | Topic                       | Datacamp Work                      | Discussion                        | Programming Assignment
1  | Course Introduction         |                                    | Introductions                     |
2  | Introduction to RStudio     | Introduction to R Datacamp         | Why are you taking the class?     |
3  | Subsetting                  | Importing Data Datacamp            | Facebook Data Policies            |
4  | Loops                       | Intermediate R:                    | Google data policies              |
5  | dplyr                       | Dplyr Datacamp                     | Apple data policies               |
6  | Functions                   | Function Ch1-3 Datacamp            | What can FAANG do with your data? |
7  | Ggplot2                     | Ggplot2 pt1 Datacamp               | How data sways your opinion       |
8  | Ggplot2 cont                |                                    | Cambridge Analytica               | Tourism Project
9  | Intermediate Data Wrangling | Data Cleaning Datacamp             | Ethical Concerns part 1           |
10 | Tidy Data                   | Datacamp Tidy Data                 | Ethical Concerns part 2           |
11 | Web Scraping                |                                    | Ethical Concerns part 3           | Web Scraping
12 | Machine Learning            | Intro to Machine Learning Datacamp | FERPA                             |
13 | Tableau Introduction        | Tableau Hello World Project        | HIPAA                             |
14 | Tableau Project work        |                                    | What is a good data policy?       |
15 | Final Tableau Project       |                                    | What worked and what didn’t       | Final Project

GRADING INFORMATION AND CRITERIA:

Assignment/Project      | Frequency of Assignment | Points for Each Assignment | Percentage of Total Grade
Data Camp Assignments   | 10                      | 20                         | 40
Programming Assignments | 2                       | 50                         | 25
Discussions             | 15                      | 7                          | 20
Final Project           | 1                       | 100                        | 20

DATACAMP ASSIGNMENTS: Lab assignments are to be completed on the Datacamp.com website. Students
will receive an e-mail from Datacamp at the beginning of the semester and will use it to sign up for the
class on Datacamp. Students can also use the BHCC computer lab.

• The College’s Computer Lab is open five (5) days per week during the summer. Its fall/spring
schedule is as follows:
o Charlestown Campus, Room D111
Fall and Spring Semesters Hours:
Monday - Thursday, 7 am to 9:45 pm
Friday, 8 am to 9:45 pm
Saturday - Sunday, 9:00 - 3:45

• The library has computers, and its schedule is as follows:
o Monday - Friday: 8 a.m. - 8 p.m.

PROGRAMMING ASSIGNMENTS: Students will receive 2 different programming assignments during the
semester. The programming assignments test depth of knowledge of the software and are comprehensive
in design. Students will submit their code to Moodle as a .R or a .Rmd file by the required due date. Plan
on spending at least 10 hours per programming assignment; they take 10-15 hours on average to
complete.

DISCUSSIONS: There will be weekly discussions on various topics throughout the semester. Students are
expected to post an initial answer to the discussion topic by Tuesday at 11:59 pm in the discussion forum, and
they are expected to respond to at least 2 of their fellow classmates as well. Each follow-up post should do
one of the following:
Extend or add to the classmate's point(s).
Ask a clarifying question.
Disagree (with reasoning and evidence, if possible).
Otherwise add to understanding of the topic.

FINAL PROJECT: This course does not have a final examination; instead, there will be a final project of the
student’s choosing. The final project will be a data project in which the student examines, cleans, and
explores a data set and runs a prediction algorithm on it. The final project will be graded according to a rubric
that will be handed out when the discussion of the project starts, in the middle of the
semester (week 8 or 9).

DISCUSSION POLICY: Students must be active in all class discussion sessions. The Student Services Office
(617.228.2000) should be notified if a student will be absent for an extended period of time. See the Student
Handbook for more details.

TEACHING METHODOLOGY: This is an online class taught through a problem-based
learning methodology. This means your grade will be determined not by exams, but by how well you do on
your homework, discussions, and class projects.

ATTENDEES: Only registered students are allowed in the classroom, and the door must be kept closed during
class time. A student who wishes to leave the room during class time must close the door behind them and
will be let back in upon their return.

STUDENT CODE OF BEHAVIOR: Students found guilty of violating the code of ethics will be subject to
the rules set out in BHCC policy. Below is a statement from the BHCC catalog:

“If it is proven that a student in any course in which he or she is enrolled has knowingly cheated or
plagiarized, this may result in a failing grade for an exam or assignment, withdrawal from the course or a
failing grade for the course. The student would also be subject to disciplinary proceedings as outlined in
the Student Handbook for violation of the Student Code of Conduct.”

POLICY FOR INDIVIDUALS WITH DISABILITIES: BHCC is committed to providing equal access to the
educational experience of all students in compliance with Section 504 of the Rehabilitation Act of 1973 and the
Americans with Disabilities Act of 1990. A student with a documented disability, who has not already done so,
should schedule an appointment at the Office for Students with Disabilities (Room D106) in order to obtain
appropriate services.

Please Note: The above schedule is subject to change.

This work is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License. To
view a copy of this license, visit http://creativecommons.org/licenses/by-nc/4.0/ or send a letter to Creative
Commons, PO Box 1866, Mountain View, CA 94042, USA.
