
CS 378: Cloud Computing

Instructor: Dr. Kia Teymourian

Fall 2023

E-mail: kiat@cs.utexas.edu Web: http://www.teymourian.de


Class Hours: TTH 2:00 - 3:15 PM
Office Hours: See Canvas main page - by appointment via email

Course Description
This course provides a comprehensive exploration of cloud computing models, tools, and
techniques. Students will actively engage in developing multiple cloud-native applications, gaining
hands-on experience with popular cloud computing platforms. The course serves as an
introductory guide to cloud computing. The class will primarily concentrate on cluster computing
software tools and programming techniques essential for data engineers. On the tools side,
participants will learn about fundamental systems and techniques for storing large volumes of data.
Additionally, the course will cover modern cluster computing systems based on the MapReduce
pattern, such as Apache Hadoop MapReduce and Apache Spark. Students will gain a solid
understanding of both the practical tools and the underlying concepts of cloud computing.

Prerequisite
Prerequisite: Upper-division standing; additional prerequisites vary with the topic.
Students should be familiar with Java and Python programming.

Recommended Textbook
There is no single textbook that covers all topics of this course. We will provide comprehensive
reading materials including lecture notes, slides and programming code examples.
Here you can find a list of important books that you can use as main references.

• Cloud Computing for Science and Engineering. By Ian Foster and Dennis B. Gannon. The
MIT Press. 2017. https://cloud4scieng.org/

• Cloud Computing for Machine Learning and Cognitive Applications. By Kai Hwang. The
MIT Press. 2017.

329E ELEMENTS OF DATA ANALYTICS

• Distributed and Cloud Computing From Parallel Processing to the Internet of Things. By
Kai Hwang, Geoffrey C. Fox and Jack J. Dongarra. 2012 Elsevier.

• Hadoop: The Definitive Guide. By Tom White. O’Reilly Media. 2009.

• Spark: The Definitive Guide: Big Data Processing Made Simple. Bill Chambers, Matei
Zaharia, O’Reilly Media; 1st edition (March 20, 2018)

• Advanced Analytics with Spark. By Sandy Ryza, Uri Laserson, Sean Owen, and Josh Wills.
O’Reilly Media. 2015, 978-1-491-91276-8.

• High Performance Spark: Best Practices for Scaling and Optimizing Apache Spark. Holden
Karau, Rachel Warren, O’Reilly Media; 1st edition, 2017, 9781491943205.

• MapReduce Design Patterns. By Donald Miner and Adam Shook. O’Reilly Media. 2013.

• UNIX and Linux System Administration Handbook. Evi Nemeth, Garth Snyder, Trent
Hein, Ben Whaley, Dan Mackin. Addison-Wesley Professional; 5th edition (August 8, 2017)

Learning Objectives
By successfully completing this course you will be able to:

• Explain the main challenges of Big Data Processing

• Run a Big Data Processing pipeline on Amazon AWS and Google Cloud

• Implement Big Data code in Apache Spark (in PySpark)

• Store and query large-scale data on cloud storage systems

Class Participation
Our class will be held in person and streamed live on Zoom. The live Zoom link is available
for anybody who is uncomfortable with an in-person setting. For all Zoom links, see the Zoom
tab in Canvas.
Attendance, either in person or on Zoom, is required. You may miss up to 3 classes without any
excuse. If you miss multiple sessions of the class, we will reduce your grade by up to 10%.

• We use Instapoll (https://polls.la.utexas.edu/) for attendance (please bring a phone or
laptop).

• Please join us over Zoom if you have any health concerns or may be sick.

• We will record the Zoom sessions and publish them 10 days later on Canvas for your review.

• Office hours will be exclusively through Zoom. Please make an appointment over email
before you show up.

Course Topics
You can find the tentative course schedule on Canvas.

• How cloud computing started, distributed computing, limitations of a single machine.

• Functional Programming, Lambda expressions, Map and Reduce in Java 8, Streaming in
Java 8

• Key-Value basic concepts, MapReduce, Apache Hadoop programming (Java), Apache Hadoop
with AWS Elastic Map Reduce (EMR)

• File storage in the cloud, Hadoop HDFS, Hadoop commands, Amazon S3, Amazon
Instance Store and Amazon EBS.

• Spark, Spark programming (Python), pyspark, RDD, Spark operators, Java and Scala
programming in Spark.

• Big Data Programming in pyspark, numpy, scipy

• Databases in the cloud, SQL for data scientists. Declarative and imperative SQL.

• Dataframes, Datasets in Spark, Spark operations on Datasets and Dataframes

• NoSQL databases, Key-value databases (Cassandra), Document-based Databases,
MongoDB, Graph Databases.

• Virtualization and Containerization, Docker, Docker commands

• Apache Spark Machine Learning Libraries, IBM SystemML.

• Data Streaming, Spark Streaming, Apache Storm, Kafka, RabbitMQ.

All lectures will be recorded on Zoom and published on Canvas 10 days later.

Code Examples
You can find the code examples of this course on the following GitHub repository.
https://github.com/kiat/Cloud-Computing

Assignments
There will be 10 programming assignments, each equally weighted, to total 40% of your
final grade.

• Programming assignments must be completed using Java or Python. We will use
a set of tools and frameworks like Apache Hadoop and Apache Spark. Students
should be familiar with Java and Python programming.

• The assignments will require a substantial time commitment over several days (an
average of 10 to 12 hours per week should be expected). Be sure to budget sufficient
time to complete assignments before the deadline. Turn in your assignments on
time. This permits grading to start promptly after the submission deadline so that
assignments may be returned promptly. If you do not finish an assignment by the
deadline you have a maximum of two days to turn your assignment in. However,
there is a penalty of 10 points (out of 100 points) per day. Your assignment is one
day late until the midnight of the day after it is due, two days late from then until
midnight of the second day. We will accept your assignment after two days if there
is a compelling reason.
You can submit your assignments late up to 3 times (up to 2 days after the original due
date) without any penalties.
• You can work in groups of up to 4 students.

In-Class Activities and Quizzes


Throughout the semester, there will be in-class activities. They will vary in nature and
will be random in occurrence. Some will be graded for correctness, and some will only
be graded for completion. This will make up 5% of your final grade.

Tests
Students must take two tests (Test 1 and Test 2).

Student Research Paper Presentations


• Students will read and analyze a research paper related to cloud computing and
large-scale data processing. Students will present the results of the analysis in a
presentation in class.
• Students can work in groups of up to 4 students.
• Presentation guidelines will be published on Canvas.

Grading Structure
• The 10 Programming Assignments are focused on applying theory learned in the
class. Weekly course assignments include both theoretical analysis and practical
algorithmic implementation in Python. The assignments account for 40% of your course
grade.
• Class Activities are 5% of your grade.

Component                       Weight
10 Programming Assignments      45%
Class Programming Activities    5%
Test 1                          20%
Paper Presentation              10%
Test 2                          20%

Table 1: Percentage Coverage

Grade    Break
A        > 94%
A-       90%
B+       87%
B        84%
B-       80%
C+       77%
C        74%
C-       70%
D+       67%
D        64%
D-       60%
F        < 60%

Table 2: GRADE BREAKS

• Test 1 (20%) covers topics learned up to the midpoint of this class. It will be a
proctored, in-person examination in the classroom consisting of implementation
questions and may include additional multiple choice questions.
• Research Paper Presentation Project. Your term project is 15% of your grade.
• Test 2 (20%) will be comprehensive and will cover material from the entire
course. It will be an open-book, proctored, in-person exam in the classroom consisting
of implementation questions and may include additional multiple choice questions.

We will use the +/- grades according to the grade breaks table.
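
As a rough illustration of how the percentage coverage and the grade breaks combine, the sketch below computes a weighted course score and maps it to a letter grade. It uses the weights listed in Table 1. The names weighted_score and letter_grade are hypothetical, and treating each grade break as an inclusive lower bound (with A requiring strictly more than 94%) is an assumption, not a stated policy.

```python
# Weights taken from Table 1 (Percentage Coverage); they sum to 100.
WEIGHTS = {
    "assignments": 45,
    "activities": 5,
    "test1": 20,
    "presentation": 10,
    "test2": 20,
}

# Grade breaks from Table 2, assumed here to be inclusive lower bounds.
GRADE_BREAKS = [
    ("A-", 90), ("B+", 87), ("B", 84), ("B-", 80),
    ("C+", 77), ("C", 74), ("C-", 70),
    ("D+", 67), ("D", 64), ("D-", 60),
]


def weighted_score(scores):
    """Combine per-component scores (each 0-100) into a course percentage."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def letter_grade(percent):
    """Map a course percentage to a letter grade using the grade breaks."""
    if percent > 94:
        return "A"
    for letter, lower_bound in GRADE_BREAKS:
        if percent >= lower_bound:
            return letter
    return "F"


example = {
    "assignments": 90,
    "activities": 100,
    "test1": 80,
    "presentation": 100,
    "test2": 70,
}
print(weighted_score(example), letter_grade(weighted_score(example)))
```

With these hypothetical component scores, the weighted total is 85.5, which falls in the B range.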

Course Policies
Usage of Cloud Machines
In this class, we use real-world cloud systems existing on Amazon AWS and Google
Cloud. You will receive educational credit coupons or credited access to such cloud
systems. You should never use your private account or use your credit card for this
class assignment. You will receive enough education credits so you can run successful
assignments on Google Cloud or Amazon AWS.
The credit amount is 100 USD for Google Cloud. You should use only this amount
to finish your assignments. This would be more than enough to finish the assignments,
learn how AWS or Google Cloud work, and have your first enjoyable experience with
it. You can choose different numbers of machines and different configurations of those
machines, and each choice will cost a different amount!
Since this is real money, it makes sense to develop your code and run your jobs locally,
on your laptop, using the small data set (we will provide two types of the same data set,
small and big). Once things are working, you’ll then move to Amazon AWS or Google
Cloud. We will ask you to run your Spark jobs over the “real” data using a set of cluster
machines.
As you can see on the Google Cloud price list, a medium-size machine costs around
50 cents per hour. That is not much, but IT WILL ADD UP QUICKLY IF YOU
FORGET TO SHUT OFF YOUR MACHINES.
Be very careful and stop your machine as soon as you are done working!
You can always come back and start your machine or create a new one easily when
you begin your work again. Another thing to be aware of is that Amazon charges you
when you move data around. To avoid such charges, do everything in the “N. Virginia”
region on AWS or US Central on Google Cloud. That’s where data is, and that’s where
you should put your data and machines.
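
To get a feel for how quickly credits are consumed, here is a small back-of-the-envelope sketch. It assumes the figures quoted above (a 100 USD credit and roughly 0.50 USD per machine-hour). The function names and the four-machine cluster are hypothetical, and actual prices depend on the machine type and region you choose.

```python
CREDIT_USD = 100.0         # Google Cloud credit quoted in this syllabus
PRICE_PER_HOUR_USD = 0.50  # approximate price of a medium-size machine


def cluster_cost(num_machines, hours, price_per_hour=PRICE_PER_HOUR_USD):
    """Estimated cost in USD of running num_machines for the given hours."""
    return num_machines * hours * price_per_hour


def hours_until_credit_exhausted(num_machines, credit=CREDIT_USD,
                                 price_per_hour=PRICE_PER_HOUR_USD):
    """Wall-clock hours a cluster can run before the credit is used up."""
    return credit / (num_machines * price_per_hour)


# A focused 3-hour work session on a hypothetical 4-machine cluster.
print(cluster_cost(4, 3))               # 6.0
# The same cluster forgotten and left running over a weekend (48 hours).
print(cluster_cost(4, 48))              # 96.0
# How long that cluster could run in total on the full credit.
print(hours_until_credit_exhausted(4))  # 50.0
```

Under these assumptions, one forgotten weekend consumes nearly the entire credit, which is why stopping machines after each session matters.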

Assignment Identification
All assignments must be submitted with the proper header, containing your name (as
registered), your unique section number, and the assignment number at the top of the
assignment. The format for the header will be specified in the assignment. That
specification will override any other header specification.

Assignment Completion & Late Work


• All assignments should be submitted on time.
• Late submissions without reasons will result in grade deduction. You can turn in
an assignment up to 24 hours late, in which case you receive a 10% penalty (that is,
10 points are subtracted from an assignment that is worth 100 points), or up to 48
hours late, in which case you receive a 20% penalty.
• Assignments turned in after that are not accepted and will be given 0.
• We keep saying no exceptions, but there are exceptions in very extreme
circumstances, with proper documentation. For example, if you obtain a doctor/dentist
note stating that you were so ill at the due date/time that you could not reasonably
be expected to meet the deadline, it is possible to get an extension.
• You can submit your assignments late 3 times during the semester (up to 2 days
after the original due date) without any penalties.
• Answers to Quizzes/Assignments are published 48 hours after the due date.
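
The late-work rules above can be sketched as a small function. This is only an illustration under stated assumptions: lateness is measured in hours after the deadline, the free_pass flag stands for one of the three penalty-free late submissions, and the names late_penalty and final_score are hypothetical. Extensions for documented extreme circumstances are not modeled.

```python
def late_penalty(hours_late, free_pass=False):
    """Points deducted from a 100-point assignment, or None if not accepted."""
    if hours_late <= 0:
        return 0      # submitted on time
    if hours_late > 48:
        return None   # past the two-day window: not accepted
    if free_pass:
        return 0      # one of the three penalty-free late submissions
    return 10 if hours_late <= 24 else 20


def final_score(raw_score, hours_late, free_pass=False):
    """Apply the late penalty to a raw score out of 100."""
    penalty = late_penalty(hours_late, free_pass)
    if penalty is None:
        return 0      # assignments turned in after 48 hours are given 0
    return max(raw_score - penalty, 0)


print(final_score(90, 0))                   # 90
print(final_score(90, 5))                   # 80
print(final_score(90, 30))                  # 70
print(final_score(90, 30, free_pass=True))  # 90
print(final_score(90, 50))                  # 0
```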

Using Generative AI Tools for Your Assignments


As generative artificial intelligence tools, like ChatGPT, become more prevalent we must
be careful about how we integrate them into our class.

• You are allowed to use AI for review, study, or research purposes throughout the
course.

• You are allowed to interact with AI while answering homework questions, although
the final answer must be developed by you and documented in your own words.
• AI is not allowed during In-Class Programming Activities. Any questions you may
have can be asked of me or the TAs.
• For projects you are allowed to interact with AI as much as you like but each piece
of code written by the AI must have a comment about the AI authorship.
• In the follow-up questions, you will be asked whether and how you interacted with AI
for your programming assignments and projects.

For more information about using Generative AI Tools at UT Austin, please visit the
following link:
5 Things To Know About ChatGPT https://ctl.utexas.edu/5-things-know-about-chatgpt

Grade Dispute
Scores for assignments will be posted on Canvas. You have one week from the date the
assignment grade is posted to dispute your grade. The student assistants will be grading
the assignments. Visit the TAs and see if you can resolve your differences. If you cannot
resolve your differences, you may visit me to explain the situation. We will not entertain
any grade disputes three days after the grades are returned. You may resubmit your
assignment for regrading after grades are returned. But the maximum that you can get
is 80 points out of 100 points.

Schedule and weekly learning goals


The schedule is tentative and subject to change. The learning goals below should be
viewed as the key concepts you should grasp after each week, and also as a study guide
before each exam, and at the end of the semester. Each exam will test on the material
that was taught up until 1 week prior to the exam. The applications in the second half
of the semester tend to build on the concepts in the first half of the semester though, so
it is still important to at least review those concepts throughout the semester.

Study Groups
Please organize yourselves into study groups of 4 students who will meet once a week
or more to discuss the course. Typically, you will review the lectures, do the reading,
and attempt the homework independently before your weekly meeting with your study
group. Studying for tests together is permitted and encouraged. If you are unsure about
how to work together with your friend in a legal, helpful manner, do come and talk with
us. Remember, it is always ok to "work together" with your professor or TA!

Communication
We will be using Piazza or Ed Discussion for general discussion of class-related questions
rather than the discussion board on Canvas. Please do not post solutions or code for
any homework assignment problems on Piazza. All communications to the Teaching
Assistants will be through Piazza. If you want to reach out to the Teaching Assistants,
post a private note to them on Piazza. Do NOT send them private e-mails. If you
want to reach me, send me e-mail at kiat@cs.utexas.edu. If you have assignment-related
questions, it is best to visit the TAs during their office hours. If you have content-related
questions, visit me during my office hours.

During Class
I understand that the electronic recording of notes will be important for class and so
computers will be allowed in class. Please refrain from using computers for anything
but activities related to the class. Phones are prohibited as they are rarely useful for
anything in the course. Eating and drinking are allowed in class, but please do not let
it disrupt the course. Try not to eat your lunch in class, as the classes are typically active.

Your Responsibilities in This Class


• Your performance in this class will be determined by you! It will require a strong
dedication to learning the material and will require a substantial time commitment
to complete the programming assignments.
• You are expected to show up on time for class and stay for the whole lecture.
• You are responsible for all material posted to the Canvas website and sent as
email. Ignorance of such material is no excuse.
• You are responsible for all material presented in the lectures. Note that lectures
will include some material that is not available elsewhere. If you miss a lecture you
are expected to watch the recording.
• You are responsible for turning in your own work on all assignments. Unauthorized
collusion is not allowed and constitutes a violation of the university’s policies on
academic integrity. See above guidelines for more information on what is or is not
allowed.
• You are responsible for protecting your work from being copied by others.
• Your conduct in class should be conducive to a positive learning environment
for your classmates as well as yourself.

SHARING OF COURSE MATERIALS IS PROHIBITED


No materials used in this class, including, but not limited to, lecture hand-outs,
videos, assessments (quizzes, exams, papers, projects, homework assignments), in-
class materials, review sheets, and additional problem sets, may be shared online or
with anyone outside of the class without explicit, written permission of the instruc-
tor. Unauthorized sharing of materials promotes cheating. The University is aware of the
sites used for sharing materials, and any materials found online that are associated with
you, or any suspected unauthorized sharing of materials, will be reported to Student
Conduct and Academic Integrity in the Office of the Dean of Students. These reports
can result in initiation of the student conduct process and include charge(s) for academic
misconduct, potentially resulting in sanctions, including a grade impact.

General Policies
If you are absent from class or examination for the observance of a religious holy day
you may turn in your assignment or take the examination on an alternate date provided
you have given me written notice fourteen days prior to the class absence. For religious
holy days that fall within the first two weeks of class notice must be given on the first
class day.
Students with disabilities who need special accommodations should contact the Ser-
vices for Students with Disabilities (SSD) Office (471-6259 or 471-4641 TTY).

Academic Misconduct Policy


While you are free to discuss the course material with your classmates and are encour-
aged to form study groups for the exams, collaboration on homework or programming
assignments is not permitted unless you are working with a partner on a pair programming
assignment. Helping a friend understand the intent of a homework or programming
assignment specification is permitted. Students who work together too closely (e.g.
design their solution together) outside of pair programming should be aware that this
is a form of cheating called COLLUSION and is subject to academic penalties. Penalties
for academic misconduct include a failing grade in this course.
The homework, programs, and exams must be the work of the students turning them in.
University policy (see Dean of Students’ policies on academic integrity) will be followed
strictly. We will be running a sophisticated program on all submitted assignments to
detect plagiarism. If we do detect any cases of academic dishonesty, we will assign a
grade of F to all students involved and refer the cases to the Dean of Students.
Acts that exceed the bounds defined by the approved collaboration practices will be
considered cheating. Such acts include:

• Copying solutions, code, or programs from someone else or giving someone else
your solutions, code, or programs
• Participation in a discussion group that develops a solution that everyone copies

• Posting your code to homework problems on Piazza or Facebook or other internet
sites.
• Copying code from the internet (e.g. from Piazza or Facebook or other internet
sites)
• Employing someone to write the solutions for you on homework assignment prob-
lems.

We urge everyone in the class to take appropriate measures for protecting one’s work.
You should protect your files, homework solution sheets, etc. as deemed reasonable.
The only exception that we will make to these guidelines is if you are involved in pair
programming with a friend.

ACADEMIC INTEGRITY EXPECTATIONS


Students who violate University rules on academic misconduct are subject to the stu-
dent conduct process and potential disciplinary action. A student found responsible for
academic misconduct may be assigned both a status sanction and a grade impact for the
course. The grade impact could range from a zero on the assignment in question up to a
failing grade in the course. A status sanction can range from probation, deferred suspen-
sion and/or dismissal from the University. To learn more about the academic integrity
standards, tips for avoiding a potential academic misconduct violation and the overall
conduct process, please visit the Student Conduct and Academic Integrity website at:
http://deanofstudents.utexas.edu/conduct.

Important Safety Information


If you have concerns about the safety or behavior of fellow students, TAs or professors,
contact BCCAL (the Behavior Concerns and COVID-19 Advice Line) at
https://safety.utexas.edu/behavior-concerns-advice-line
or by calling 512-232-5050.
Confidentiality will be maintained as much as possible, however the university may
be required to release some information to appropriate parties.

CLASSROOM SAFETY AND COVID-19


• For any illness, students should stay home if they are sick or contagious, not only
to stop the spread, but also to prioritize their personal well-being.
• UHS provides symptomatic COVID-19 testing
(https://www.healthyhorns.utexas.edu/coronavirus_testing.html) for students.
Schedule your appointment by calling 512-471-4955 or online within the MyUHS
patient portal. Learn more about symptomatic COVID-19 testing here:
https://www.healthyhorns.utexas.edu/coronavirus_testing.html.

• Disposable masks are available for students at the William C. Powers, Jr. Student
Activity Center and Texas Union hospitality desks.
• The exposure action chart
(https://www.healthyhorns.utexas.edu/coronavirus_exposure_action_chart.html) offers
guidance on what to do if you have been exposed to someone who has COVID-19 or if
you test positive. If you experience symptoms, stay home, isolate, and follow the
instructions for symptomatic individuals in the chart.
• Stay up to date on COVID-19 vaccinations
(https://www.healthyhorns.utexas.edu/coronavirus_vaccination.html) by getting all
available boosters when eligible. Vaccines are available through University Health Services.
• Additionally, UHS maintains up-to-date resources on COVID-19, which can be
found here: COVID-19 Information and Resources
https://www.healthyhorns.utexas.edu/coronavirus.html

CARRYING OF HANDGUNS ON CAMPUS


Texas’ Open Carry law expressly prohibits a licensed to carry (LTC) holder from carrying
a handgun openly on the campus of an institution of higher education such as UT Austin.
Students in this class should be aware of the following university policies:

• Students in this class who hold a license to carry are asked to review the university
policy regarding campus carry here https://www.utexas.edu/campus-carry#ac.
• Individuals who hold a license to carry are eligible to carry a concealed handgun on
campus, including in most outdoor areas, buildings and spaces that are accessible
to the public, and in classrooms.
• It is the responsibility of concealed-carry license holders to carry their handguns on
or about their person at all times while on campus. Open carry is NOT permitted,
meaning that a license holder may not carry a partially or wholly visible handgun
on campus premises or on any university driveway, street, sidewalk or walkway,
parking lot, parking garage, or other parking area.
• Per my right, I prohibit carrying of handguns in my personal office. Note that this
information will also be conveyed to all students verbally during the first week of
class. This written notice is intended to reinforce the verbal notification, and is not
a “legally effective” means of notification in its own right.

TITLE IX DISCLOSURE
Beginning January 1, 2020, Texas Education Code, Section 51.252 (formerly known as
Senate Bill 212) requires all employees of Texas universities, including faculty, report any
information to the Title IX Office here https://titleix.utexas.edu/ regarding sexual
harassment, sexual assault, dating violence and stalking that is disclosed to them. Texas
law requires that all employees who witness or receive any information of this type
(including, but not limited to, written forms, applications, one-on-one conversations,
class assignments, class discussions, or third-party reports) must report it to the Title IX
Coordinator. Before talking with me, or with any faculty or staff member about a Title
IX related incident, please remember that I will be required to report this information.
Although graduate teaching and research assistants are not subject to Texas Educa-
tion Code, Section 51.252, they are mandatory reporters under federal Title IX regula-
tions and are required to report a wide range of behaviors we refer to as sexual miscon-
duct, including the types of misconduct covered under Texas Education Code, Section
51.252. Title IX of the Education Amendments of 1972 is a federal civil rights law that
prohibits discrimination on the basis of sex – including pregnancy and parental status
– in educational programs and activities. The Title IX Office has developed supportive
ways and compiled campus resources to support all impacted by a Title IX matter.
If you would like to speak with a Case Manager for Support and Resources, who can
provide support, resources or academic accommodations, in the Title IX Office, please
email supportandresources@austin.utexas.edu. A Case Manager can also provide sup-
port, resources and accommodations for pregnant, nursing, and parenting students.
For more information about reporting options and resources, visit
http://www.titleix.utexas.edu/, contact the Title IX Office via email at
titleix@austin.utexas.edu, or call 512-471-0419.

CAMPUS SAFETY
The following are recommendations regarding emergency evacuation from the Office of
Campus Safety, 512-471-5767,
• Students should sign up for Campus Emergency Text Alerts at the page linked
above.
• Occupants of buildings on The University of Texas at Austin campus must evac-
uate buildings when a fire alarm is activated. Alarm activation or announcement
requires exiting and assembling outside.
• Familiarize yourself with all exit doors of each classroom and building you may
occupy. Remember that the nearest exit door may not be the one you used when
entering the building.
• Students requiring assistance in evacuation shall inform their instructor in writing
during the first week of class.
• In the event of an evacuation, follow the instruction of faculty or class instructors.
Do not re-enter a building unless given instructions by the following: Austin Fire
Department, The University of Texas at Austin Police Department, or Fire Preven-
tion Services office.
• For more information, please visit the Office of Emergency Management.
