CS378 Cloud Computing Syllabus
Fall 2023
Course Description
This course provides a comprehensive exploration of cloud computing models, tools, and techniques.
Students will actively develop multiple cloud-native applications, gaining hands-on experience
with popular cloud computing platforms. The course serves as an introduction to cloud computing
and concentrates primarily on the cluster computing software tools and programming techniques
essential for data engineers. On the tools side, participants will learn about fundamental systems
and techniques for storing large volumes of data. The course also covers modern cluster computing
systems based on MapReduce patterns, such as Apache Hadoop MapReduce and Apache Spark. Students
will gain a solid understanding of both the practical tools and the underlying concepts of cloud
computing.
Prerequisite
Prerequisite: Upper-division standing; additional prerequisites vary with the topic.
Students should be familiar with Java and Python programming.
Recommended Textbook
There is no single textbook that covers all topics of this course. We will provide comprehensive
reading materials including lecture notes, slides and programming code examples.
The following books are useful as main references:
• Cloud Computing for Science and Engineering. By Ian Foster and Dennis B. Gannon. The
MIT Press. 2017. https://cloud4scieng.org/
• Cloud Computing for Machine Learning and Cognitive Applications. By Kai Hwang. The
MIT Press. 2017.
329E ELEMENTS OF DATA ANALYTICS
• Distributed and Cloud Computing: From Parallel Processing to the Internet of Things. By
Kai Hwang, Geoffrey C. Fox, and Jack J. Dongarra. Elsevier. 2012.
• Spark: The Definitive Guide: Big Data Processing Made Simple. By Bill Chambers and Matei
Zaharia. O’Reilly Media, 1st edition (March 20, 2018).
• Advanced Analytics with Spark. By Sandy Ryza, Uri Laserson, Sean Owen, and Josh Wills.
O’Reilly Media. 2015. ISBN 978-1-491-91276-8.
• High Performance Spark: Best Practices for Scaling and Optimizing Apache Spark. By Holden
Karau and Rachel Warren. O’Reilly Media, 1st edition. 2017. ISBN 9781491943205.
• MapReduce Design Patterns. By Donald Miner and Adam Shook. O’Reilly Media. 2013.
• UNIX and Linux System Administration Handbook. By Evi Nemeth, Garth Snyder, Trent
Hein, Ben Whaley, and Dan Mackin. Addison-Wesley Professional, 5th edition (August 8, 2017).
Learning Objectives
By successfully completing this course you will be able to:
• Run a big data processing pipeline on Amazon AWS and Google Cloud.
Class Participation
Our class will be held in person and streamed live on Zoom for anyone who is uncomfortable
with an in-person setting. For all Zoom links, see the Zoom tab in Canvas.
Attendance, either in person or over Zoom, is required. You may miss up to 3 classes without
an excuse. If you miss more sessions than that, we may deduct up to 10% of your grade.
• Please join us over Zoom if you have any health concerns or may be sick.
• We will record the Zoom sessions and publish them on Canvas 10 days later for your review.
• Office hours will be held exclusively over Zoom. Please make an appointment by email
before you attend.
Course Topics
You can find the tentative course schedule on Canvas.
• Basic key-value concepts, MapReduce, Apache Hadoop programming (Java), and Apache Hadoop
with AWS Elastic MapReduce (EMR).
• File storage in the cloud, Hadoop HDFS, Hadoop commands, Amazon S3, Amazon Instance
Store, and Amazon EBS.
• Spark, Spark programming in Python (PySpark), RDDs, Spark operators, and Java and Scala
programming in Spark.
• Databases in the cloud, SQL for data scientists. Declarative and imperative SQL.
All lectures will be recorded on Zoom and published on Canvas 10 days later.
Code Examples
You can find the code examples of this course on the following GitHub repository.
https://github.com/kiat/Cloud-Computing
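The repository above holds the course's own examples. As a rough, standalone illustration of the key-value MapReduce pattern listed under Course Topics, the sketch below simulates the map, shuffle, and reduce phases of a word count in plain Python on a single machine. It is not Hadoop or Spark code, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) key-value pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key (done by the framework in a real cluster)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run the three phases in sequence on one machine."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["the cloud stores data", "the cluster processes data"]))
```

In Hadoop MapReduce or Spark, the map and reduce functions run in parallel on many machines and the shuffle moves data between them over the network; here the shuffle is just a dictionary grouping.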
Assignments
There will be 10 programming assignments, each equally weighted, together totaling 40% of your
final grade.
• The assignments will require a substantial time commitment over several days (expect an
average of 10 to 12 hours per week). Budget sufficient time to complete each assignment
before the deadline, and turn it in on time so that grading can start promptly after the
submission deadline and assignments can be returned promptly. If you do not finish an
assignment by the deadline, you have a maximum of two days to turn it in, with a penalty
of 10 points (out of 100) per day. Your assignment is one day late until midnight of the day
after it is due, and two days late from then until midnight of the second day. We will accept
an assignment more than two days late only if there is a compelling reason.
You may submit up to 3 assignments late (up to 2 days after the original due date) without
any penalty.
• You can work in groups of up to 4 students.
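The late-submission arithmetic above can be sketched as follows. This is a minimal illustration, not an official grading tool, and it rests on assumptions: the penalty is subtracted from a score out of 100, a penalty-free late submission is applied automatically until all 3 are used, and anything more than two days late is treated as not accepted (the compelling-reason exception is not modeled). The function name `late_adjusted_score` is hypothetical.

```python
from typing import Optional

PENALTY_PER_DAY = 10        # points (out of 100) deducted per late day
MAX_LATE_DAYS = 2           # later submissions need a compelling reason (not modeled here)
FREE_LATE_SUBMISSIONS = 3   # penalty-free late submissions per student


def late_adjusted_score(raw_score: float, days_late: int, free_lates_used: int) -> Optional[float]:
    """Apply the late policy to one assignment score (out of 100).

    days_late: 0 if on time, 1 until midnight of the day after the due date,
               2 until midnight of the second day.
    free_lates_used: penalty-free late submissions already used before this one.
    Returns None when the submission falls outside the two-day window.
    """
    if days_late <= 0:
        return raw_score
    if days_late > MAX_LATE_DAYS:
        return None
    if free_lates_used < FREE_LATE_SUBMISSIONS:
        # Assumption: a free late submission is used automatically when available.
        return raw_score
    return max(0.0, raw_score - PENALTY_PER_DAY * days_late)


if __name__ == "__main__":
    print(late_adjusted_score(95, days_late=2, free_lates_used=1))  # free late: 95
    print(late_adjusted_score(95, days_late=2, free_lates_used=3))  # penalized: 75
```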
Tests
Students will take two tests: Test 1 and Test 2.
Grading Structure
• The 10 programming assignments focus on applying the theory learned in class. Weekly
course assignments include both theoretical analysis and practical algorithmic implementation
in Python. The assignments account for 40% of your course grade.
• Class activities are 5% of your grade.
Table 1: Percentage Coverage
10 Programming Assignments      45%
Class Programming Activities     5%
Test 1                          20%
Paper Presentation              10%
Test 2                          20%

Grade breaks
A     > 94%
A-      90%
B+      87%
B       84%
B-      80%
C+      77%
C       74%
C-      70%
D+      67%
D       64%
D-      60%
F     < 60%
• Test 1 (20%) covers topics taught up to the midpoint of the course. It will be a proctored,
in-person examination in the classroom consisting of implementation questions and may
include additional multiple-choice questions.
• Research Paper Presentation Project. Your term project is 15% of your grade.
• Test 2 (20%) will be comprehensive and will cover material from the entire course. It will
be an open-book, proctored, in-person examination in the classroom consisting of
implementation questions and may include additional multiple-choice questions.
We will use the +/- grades according to the grade breaks table.
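As a sketch of how the component weights and grade breaks combine, the following computes a weighted final percentage using the weights listed in Table 1 and maps it to a letter grade. It assumes each component score is a percentage from 0 to 100 and that every cutoff is inclusive (so exactly 94% is treated as an A here, although the table lists A as > 94%). The assignment and term-project percentages stated in the text differ from Table 1, so adjust `WEIGHTS` if those apply. All names are hypothetical.

```python
# Component weights from Table 1, as integer percentages (they sum to 100).
WEIGHTS = {
    "assignments": 45,
    "class_activities": 5,
    "test1": 20,
    "paper_presentation": 10,
    "test2": 20,
}

# (minimum percentage, letter) pairs from the grade breaks table, highest first.
# Assumption: every cutoff is treated as inclusive.
GRADE_BREAKS = [
    (94, "A"), (90, "A-"), (87, "B+"), (84, "B"), (80, "B-"),
    (77, "C+"), (74, "C"), (70, "C-"), (67, "D+"), (64, "D"), (60, "D-"),
]


def final_percentage(scores: dict[str, float]) -> float:
    """Weighted average of component scores, each given on a 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def letter_grade(percentage: float) -> str:
    """Return the first letter whose cutoff the percentage reaches, else F."""
    for cutoff, letter in GRADE_BREAKS:
        if percentage >= cutoff:
            return letter
    return "F"


if __name__ == "__main__":
    example = {
        "assignments": 92,
        "class_activities": 100,
        "test1": 85,
        "paper_presentation": 90,
        "test2": 90,
    }
    pct = final_percentage(example)
    print(f"{pct:.1f}% -> {letter_grade(pct)}")
```

Keeping the weights as integer percentages and dividing once at the end avoids floating-point drift right at a cutoff.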
Course Policies
Usage of Cloud Machines
In this class, we use real-world cloud systems on Amazon AWS and Google Cloud. You will
receive educational credit coupons or credited access to these cloud systems. Never use your
private account or your own credit card for class assignments. You will receive enough
educational credits to complete the assignments successfully on Google Cloud or Amazon AWS.
The credit amount is 100 USD for Google Cloud, and you should complete your assignments
within this amount. It should be more than enough to finish the assignments, learn how AWS
or Google Cloud works, and have an enjoyable first experience with the cloud. You can choose
different numbers of machines and different machine configurations, and each choice has a
different cost.
Since this is real money, develop your code and run your jobs locally on your laptop using the
small data set (we will provide two versions of the same data set, small and big). Once things
are working, move to Amazon AWS or Google Cloud, where we will ask you to run your Spark jobs
over the “real” data using a cluster of machines.
As you can see on the Google Cloud Price list, a medium size machine costs around
50 cents per hour. That is not much, but IT WILL ADD UP QUICKLY IF YOU FOR-
GET TO SHUT OFF YOUR MACHINES.
Be very careful and stop your machine as soon as you are done working!
You can always come back and start your machine or create a new one easily when
you begin your work again. Another thing to be aware of is that Amazon charges you
when you move data around. To avoid such charges, do everything in the “N. Virginia”
region on AWS or US Central on Google Cloud. That’s where data is, and that’s where
you should put your data and machines.
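To make the cost warning concrete, here is a back-of-the-envelope sketch using the figures in this section: roughly 0.50 USD per medium machine per hour and a 100 USD credit. It assumes a flat hourly rate per machine and ignores storage and data-transfer charges, which are billed separately; the 4-machine, 48-hour scenario and the function names are hypothetical.

```python
HOURLY_RATE_USD = 0.50  # approximate medium-machine price quoted above
CREDIT_USD = 100.0      # Google Cloud credit amount


def cluster_cost(machines: int, hours: float, hourly_rate: float = HOURLY_RATE_USD) -> float:
    """Cost of running `machines` identical machines for `hours` hours at a flat rate."""
    return machines * hours * hourly_rate


def machine_hours_available(credit: float = CREDIT_USD, hourly_rate: float = HOURLY_RATE_USD) -> float:
    """How many single-machine hours the credit covers."""
    return credit / hourly_rate


if __name__ == "__main__":
    print(f"Credit covers {machine_hours_available():.0f} machine-hours")
    # Hypothetical: a 4-machine cluster accidentally left running over a weekend.
    forgotten = cluster_cost(machines=4, hours=48)
    print(f"Forgotten cluster: ${forgotten:.2f} of the ${CREDIT_USD:.2f} credit")
```

Under these assumptions, a single forgotten weekend consumes almost the entire credit.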
Assignment Identification
All assignments must be submitted with the proper header, containing your name (as
registered), your unique section number, and the assignment number at the top of the
assignment. The format for the header will be specified in the assignment. That specification
will override any other header specification.
• You are allowed to use AI for review, study, or research purposes throughout the
course.
• You are allowed to interact with AI while answering homework questions, although
the final answer must be developed by you and documented in your own words.
• AI is not allowed during in-class programming activities; please direct any questions you
have to me or the TAs.
• For projects, you may interact with AI as much as you like, but each piece of code written by
AI must include a comment noting its AI authorship.
• In follow-up questions, you will be asked whether and how you used AI for your programming
assignments and projects.
Grade Dispute
Scores for assignments will be posted on Canvas. You have one week from the date the
assignment grade is posted to dispute your grade. The student assistants will be grading
the assignments. Visit the TAs and see if you can resolve your differences. If you cannot
resolve your differences, you may visit me to explain the situation. We will not entertain
any grade disputes three days after the grades are returned. You may resubmit your
assignment for regrading after grades are returned, but the maximum score for a resubmission
is 80 out of 100 points.
Study Groups
Please organize yourselves into study groups of 4 students who will meet once a week
or more to discuss the course. Typically, you will review the lectures, do the reading,
and attempt the homework independently before your weekly meeting with your study
group. Studying for tests together is permitted and encouraged. If you are unsure about
how to work together with your friend in a legal, helpful manner, do come and talk with
us. Remember, it is always ok to "work together" with your professor or TA!
Communication
We will use Piazza or Ed Discussion for general discussion of class-related questions
rather than the discussion board on Canvas. Please do not post solutions or code for
any homework assignment problems on Piazza. All communication with the Teaching
Assistants will be through Piazza; if you want to reach the Teaching Assistants,
post a private note to them on Piazza. Do NOT send them private e-mails. If you
want to reach me, email me at kiat@cs.utexas.edu. If you have assignment-related
questions, it is best to visit the TAs during their office hours. If you have content-related
questions, visit me during my office hours.
During Class
I understand that the electronic recording of notes will be important for class and so
computers will be allowed in class. Please refrain from using computers for anything
but activities related to the class. Phones are prohibited as they are rarely useful for
anything in the course. Eating and drinking are allowed in class, but please do not let them
disrupt the course. Try not to eat your lunch in class, as the classes are typically active.
General Policies
If you are absent from class or examination for the observance of a religious holy day
you may turn in your assignment or take the examination on an alternate date provided
you have given me written notice fourteen days prior to the class absence. For religious
holy days that fall within the first two weeks of class, notice must be given on the first
class day.
Students with disabilities who need special accommodations should contact the Ser-
vices for Students with Disabilities (SSD) Office (471-6259 or 471-4641 TTY).
• Copying solutions, code, or programs from someone else or giving someone else
your solutions, code, or programs
• Participation in a discussion group that develops a solution that everyone copies
We urge everyone in the class to take appropriate measures for protecting one’s work.
You should protect your files, homework solution sheets, etc. as deemed reasonable.
The only exception that we will make to these guidelines is if you are involved in pair
programming with a friend.
• Disposable masks are available for students at the William C. Powers, Jr. Student
Activity Center and Texas Union hospitality desks.
• The exposure action chart (https://www.healthyhorns.utexas.edu/coronavirus_exposure_action_chart.html)
offers guidance on what to do if you have been exposed to someone who has COVID-19 or if
you test positive. If you experience symptoms, stay home, isolate, and follow the instructions
for symptomatic individuals in the chart.
• Stay up to date on COVID-19 vaccinations (https://www.healthyhorns.utexas.edu/coronavirus_vaccination.html)
by getting all available boosters when eligible. Vaccines are available through University
Health Services.
• Additionally, UHS maintains up-to-date resources on COVID-19, which can be found here:
COVID-19 Information and Resources, https://www.healthyhorns.utexas.edu/coronavirus.html
• Students in this class who hold a license to carry are asked to review the university
policy regarding campus carry at https://www.utexas.edu/campus-carry#ac.
• Individuals who hold a license to carry are eligible to carry a concealed handgun on
campus, including in most outdoor areas, buildings and spaces that are accessible
to the public, and in classrooms.
• It is the responsibility of concealed-carry license holders to carry their handguns on
or about their person at all times while on campus. Open carry is NOT permitted,
meaning that a license holder may not carry a partially or wholly visible handgun
on campus premises or on any university driveway, street, sidewalk or walkway,
parking lot, parking garage, or other parking area.
• As is my right, I prohibit carrying handguns in my personal office. Note that this
information will also be conveyed to all students verbally during the first week of
class. This written notice is intended to reinforce the verbal notification, and is not
a “legally effective” means of notification in its own right.
TITLE IX DISCLOSURE
Beginning January 1, 2020, Texas Education Code, Section 51.252 (formerly known as
Senate Bill 212) requires all employees of Texas universities, including faculty, to report any
information regarding sexual harassment, sexual assault, dating violence, and stalking that is
disclosed to them to the Title IX Office (https://titleix.utexas.edu/). Texas
law requires that all employees who witness or receive any information of this type
(including, but not limited to, written forms, applications, one-on-one conversations,
class assignments, class discussions, or third-party reports) must report it to the Title IX
Coordinator. Before talking with me, or with any faculty or staff member about a Title
IX related incident, please remember that I will be required to report this information.
Although graduate teaching and research assistants are not subject to Texas Educa-
tion Code, Section 51.252, they are mandatory reporters under federal Title IX regula-
tions and are required to report a wide range of behaviors we refer to as sexual miscon-
duct, including the types of misconduct covered under Texas Education Code, Section
51.252. Title IX of the Education Amendments of 1972 is a federal civil rights law that
prohibits discrimination on the basis of sex – including pregnancy and parental status
– in educational programs and activities. The Title IX Office has developed supportive
ways and compiled campus resources to support all impacted by a Title IX matter.
If you would like to speak with a Case Manager for Support and Resources, who can
provide support, resources or academic accommodations, in the Title IX Office, please
email supportandresources@austin.utexas.edu. A Case Manager can also provide sup-
port, resources and accommodations for pregnant, nursing, and parenting students.
For more information about reporting options and resources, visit http://www.titleix.utexas.edu/,
contact the Title IX Office via email at titleix@austin.utexas.edu, or call 512-471-0419.
CAMPUS SAFETY
The following are recommendations regarding emergency evacuation from the Office of
Campus Safety, 512-471-5767:
• Students should sign up for Campus Emergency Text Alerts at the page linked
above.
• Occupants of buildings on The University of Texas at Austin campus must evac-
uate buildings when a fire alarm is activated. Alarm activation or announcement
requires exiting and assembling outside.
• Familiarize yourself with all exit doors of each classroom and building you may
occupy. Remember that the nearest exit door may not be the one you used when
entering the building.
• Students requiring assistance in evacuation shall inform their instructor in writing
during the first week of class.
• In the event of an evacuation, follow the instructions of faculty or class instructors.
Do not re-enter a building unless given instructions by the following: Austin Fire
Department, The University of Texas at Austin Police Department, or Fire Preven-
tion Services office.
• For more information, please visit the Office of Emergency Management.