BDP 2023 01
Jiaul Paik
Lecture 1
Today’s topics
• Email IDs:
• jiaul@cet.iitkgp.ac.in
• jia.paik@gmail.com
Prerequisites (mandatory)
• Programming
• Python is highly recommended
Assessments (type: number of times)
• Written Test: 2
• Programming Assignment: 6
Major Topics of the Course
• Fundamentals of Hadoop
• Dealing with distributed data storage
• MapReduce programming with Hadoop
• Functional Programming: Python & Scala
• Spark
• Basics
• Streaming data
• Relational data
• Graph data
• High-level language: Pig Latin
• Apache HBase
Programming Assignments
• Objectives
• Familiarize you with the basics of big data processing technologies
• Submission
• Through Moodle (link will be provided)
• Typical deadline
• 10-15 days (depending on the complexity of the assignment)
Important Notes
Course Content
• This is a general-purpose, practical course
• For anyone who wants to work with ‘big data’
• The techniques you learn can be applied to any form of data that is ‘big’
• Thus, hands-on programming experience with modern big data systems is absolutely essential
Attendance
• There were 800 applications for this course
Attendance is MANDATORY
Main Flavour of the Course
• This is a programming heavy course
• Evaluation:
• If your program does not run correctly, you will get ZERO credit (no excuses, please!!!)
What can you expect from this course?
1. Limitations of classical data processing systems
• MapReduce Design Patterns: Building Effective Algorithms and Analytics for Hadoop and Other Systems, book by Adam Shook and Donald Miner