
Project Expectations

Over the duration of the course, you have worked with multiple technology platforms
(Python/Hadoop/Spark/BigQuery/Gephi…) and have seen a wide variety of business
applications.

Your project must demonstrate a business application using a platform chosen from the
above. You may choose to implement it either on a laptop or on a public cloud such as
AWS/Azure/…

Real-world big data efforts require you to grapple with large datasets. For the BDA project,
the "suggested" size is 10+ million rows. Circumstances may necessitate a smaller dataset,
but certainly no fewer than a million rows. There are several sources of large datasets on
the web. Here's an excellent site: https://webscope.sandbox.yahoo.com/
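
To check a candidate dataset against the size guidance above before committing to it, a quick row count is enough. The following is a minimal sketch, assuming the data is a delimited text file with one record per line; the function names and the verdict strings are illustrative and not part of the course material.

```python
# Thresholds taken from the project brief.
SUGGESTED_ROWS = 10_000_000
MINIMUM_ROWS = 1_000_000


def count_data_rows(path, has_header=True):
    """Count rows by streaming the file, so it is never loaded into memory.

    Assumption: one record per physical line. Quoted fields that contain
    embedded newlines would be over-counted.
    """
    with open(path, "rb") as handle:
        total = sum(1 for _ in handle)
    return max(total - 1, 0) if has_header else total


def size_verdict(rows):
    """Classify a row count against the suggested and minimum sizes."""
    if rows >= SUGGESTED_ROWS:
        return "meets suggested size"
    if rows >= MINIMUM_ROWS:
        return "above minimum, below suggested"
    return "too small"
```

Calling `size_verdict(count_data_rows("my_dataset.csv"))` (a hypothetical file name) gives a first indication of whether the dataset is worth pursuing.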

The technique that you use (e.g. from machine learning/AI) may be something that we have
not covered in class, but you must still be able to explain it in detail.

Given the large data size, you must pick a platform that exploits parallelism, such as Spark.
Simplistic spreadsheet-based or Rattle-based project efforts that read in data from a file
source are unacceptable.

Here's one possible approach for your project:

1. Select a sizeable dataset, reproduce results that have been demonstrated elsewhere, and
extend them – OR –
2. Pose a different, interesting question of the same dataset, and provide a solution.

You may choose not to take the above route. Innovation will be rewarded!

When to start thinking of the Project…


By Session 14, you will have developed a rich understanding of big data analysis, and seen
datasets hosted and used with commercial platforms such as Amazon.

Deliverable Schedule

• EOD Sunday 30th August: A one-page write-up of fewer than 150 words on what your
  project will accomplish. Your submission must clearly answer these questions:
    o Group composition? Individual member roles? Which dataset? What
      platform? What technique? (3 marks)
• EOD Sunday 13th September: A 3-page write-up summarising your set-up, initial
  investigations and results, plus any exhibits. Stick to the page length. (5 marks)
• EOD Sunday 20th September: Provide a shared link to working code with all the data.
  The code should run on any laptop, so no hard-coded directory paths, etc. (5 marks)
    o Prepare a video recording of 7-10 minutes showing all your code in action.
      Supply a shared drive link to the video. (5 marks)
• 23rd – 26th September: A face-to-face viva with the entire project group. Everyone
  must come prepared to explain every single aspect of the project. (12 marks)

Do not submit publicly available code with only minor modifications, e.g. Kaggle
tryouts by other developers. This will be treated as an act of plagiarism. You know the
consequences.
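
On the requirement that the submitted code run on any laptop without hard-coded directories: one common way to meet it is to take the data location from a command-line option, with a fallback that does not depend on any particular machine. The following is a minimal sketch of that idea, not a required structure. The option names, the `BDA_DATA_DIR` environment variable, and the `events.csv` file name are all hypothetical.

```python
import argparse
import os
from pathlib import Path


def default_data_dir():
    """Use $BDA_DATA_DIR if set (hypothetical name), else ./data under the
    directory the script is launched from."""
    env = os.environ.get("BDA_DATA_DIR")
    return Path(env).expanduser() if env else Path.cwd() / "data"


def parse_args(argv=None):
    """Read the data location from the command line instead of the source."""
    parser = argparse.ArgumentParser(description="Run the project pipeline.")
    parser.add_argument("--data-dir", type=Path, default=default_data_dir(),
                        help="folder that holds the input files")
    parser.add_argument("--input", default="events.csv",
                        help="input file name inside the data folder")
    return parser.parse_args(argv)


def input_path(args):
    """Build the full input path with pathlib, so separators are handled
    correctly on Windows, macOS and Linux."""
    return (args.data_dir / args.input).resolve()
```

With this in place, a reviewer can run the script as `python main.py --data-dir /wherever/they/unpacked/the/data` (script name assumed) without editing the source.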
