
Course Project

Course Title: Statistics and Probability


Course Code: STA 2101

Name of the Faculty: Khan Raqib Mahmud


E-mail: raqib.mahmud@ulab.edu.bd

Course Objectives

 To provide a thorough understanding of the meaning of Statistics and Probability, with real-life
applications and the resource requirements.
 To introduce the various methods of statistical data analysis in an investigation.
 To expose the students to a variety of techniques that have practical applications, while
conducting detailed analysis of the requirements.

Project Descriptions
Project Topics and Related Works Analysis
1) Your first task is to pick a project topic. If you're looking for project ideas, please come to
office hours, and I'd be happy to brainstorm and suggest some project ideas.

1. What is the problem you are trying to solve?


2. How can you use data to answer the question?
The most important part of the project is to understand the stakeholder's problem and to approach
it with statistical learning techniques.

For example, suppose a business owner asks: “How can we reduce the costs of performing an
activity?” We need to understand whether the goal is to improve the efficiency of the
activity or to increase the business's profitability. Solving these two problems may require
two different approaches, so it is essential to understand the problem at a very granular
level.

The next step is the Analytic Approach; if the issue is to determine the probabilities of
something, then a predictive model might be used; if the question is to show relationships, a
descriptive approach may be required.
2) Once you have identified a topic of interest, it can be useful to look up existing research on
relevant topics by searching related keywords on an academic search engine such
as: http://scholar.google.com.

Data Analysis
3) An important aspect of designing your project is to identify one or several datasets suitable for
your topic of interest. You can choose to use pre-prepared datasets (e.g. from Kaggle, the UCI
machine learning repository, etc.). If that data needs considerable pre-processing to suit your
task, or if you intend to collect the needed data yourself, keep in mind that this is only one part
of the expected project work, but can often take considerable time. We still expect a solid
methodology and discussion of results, so pace your project accordingly.

4) Preprocessed datasets: While we don't want you to have to spend much time collecting raw
data, the process of inspecting and visualizing the data, trying out different types of
preprocessing, and doing error analysis is an important part of Statistical Analysis and inference.
Hence if you choose to use pre-prepared datasets (e.g. from Kaggle, the UCI machine learning
repository, etc.) we encourage you to do some data exploration and analysis to get familiar with
the problem (Data acquisition, data cleaning, exploratory data analysis).
Data preprocessing: As noted above, inspecting and visualizing the data, trying out different
types of preprocessing, and doing error analysis are an important part of Statistical Analysis
and Inference. Data preprocessing involves addressing:

1. Missing Data
2. Invalid Values
3. Duplicate Records
4. Formatting
5. Feature Engineering
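
The five preprocessing concerns listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the field names (id, age, joined, income), the valid age range, the reference year, and the choice of median imputation are all hypothetical and are not requirements of the project.

```python
from datetime import date
from statistics import median

# Hypothetical raw records; every field name and value here is made up.
raw_records = [
    {"id": 1, "age": "23", "joined": "2021-03-15", "income": "42000"},
    {"id": 2, "age": "", "joined": "2020-11-02", "income": "38500"},    # missing age
    {"id": 3, "age": "-5", "joined": "2019-07-30", "income": "51000"},  # invalid age
    {"id": 1, "age": "23", "joined": "2021-03-15", "income": "42000"},  # duplicate record
    {"id": 4, "age": "31", "joined": "2022-01-10", "income": "60500"},
]


def parse_age(text):
    """Return the age as an int, or None if it is missing or implausible."""
    if not text.strip():
        return None
    value = int(text)
    return value if 0 <= value <= 120 else None


def preprocess(records, reference_year=2024):
    # Duplicates: keep only the first record seen for each id.
    seen, unique = set(), []
    for rec in records:
        if rec["id"] not in seen:
            seen.add(rec["id"])
            unique.append(rec)

    # Missing data and invalid values: both are mapped to None, then
    # imputed with the median of the valid ages (one common choice among many).
    ages = [parse_age(rec["age"]) for rec in unique]
    fill = median(a for a in ages if a is not None)

    cleaned = []
    for rec, age in zip(unique, ages):
        joined = date.fromisoformat(rec["joined"])  # formatting: text to date
        cleaned.append({
            "id": rec["id"],
            "age": age if age is not None else fill,
            "joined": joined,
            "income": float(rec["income"]),  # formatting: text to number
            # Feature engineering: derive membership length from the join date.
            "years_member": reference_year - joined.year,
        })
    return cleaned


cleaned = preprocess(raw_records)
for row in cleaned:
    print(row)
```

In a real project, record in the report which rows were dropped or imputed and why, since these choices affect every later result.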

Descriptive and Predictive Modeling


5) Once the data are prepared for statistical inference, we are ready for the modelling and
evaluation phases. Modelling focuses on developing models that are either descriptive or
predictive, and these models are based on the analytic approach chosen in the very first stage.
For example, a descriptive model can tell what new service a customer may prefer based on
the customer’s existing preferences. Predictive modelling, by contrast, is a process that
uses data mining and probability to forecast outcomes; for example, a predictive model might
be used to predict next month's sales.
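
As a rough illustration of the two kinds of modelling, the sketch below first summarizes a sales series (descriptive) and then fits a least-squares line to forecast the following month (predictive). The monthly figures are invented and deliberately exactly linear so the arithmetic is easy to check by hand; the helper name fit_line is hypothetical, and simple linear regression is only one of many possible predictive models.

```python
from statistics import mean, stdev

# Hypothetical monthly sales (in thousands); the figures are invented.
months = [1, 2, 3, 4, 5, 6]
sales = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]

# Descriptive: summarize what the observed data already show.
summary = {
    "mean": mean(sales),
    "stdev": stdev(sales),
    "min": min(sales),
    "max": max(sales),
}


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Predictive: use the fitted line to forecast month 7.
intercept, slope = fit_line(months, sales)
next_month_forecast = intercept + slope * 7

print(summary)
print(f"forecast for month 7: {next_month_forecast:.1f}")
```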
6) Accuracy of predictive model: Model evaluation is performed during model development. It
assesses the model’s quality and checks whether the model addresses the business problem
fully and appropriately.
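
One common way to evaluate a predictive model is k-fold cross-validation: the model is fitted on part of the data and scored on the held-out part, and this is repeated for each fold. The following is a minimal sketch, assuming a simple least-squares line as the model and mean absolute error (MAE) as the metric; the data and the helper names (k_fold_splits, fit_line, cross_validated_mae) are hypothetical.

```python
from statistics import mean


def k_fold_splits(n, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds over range(n).

    Folds are contiguous for simplicity; real data are usually shuffled first.
    """
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def cross_validated_mae(x, y, k):
    """Average, over the k folds, of the MAE on each held-out fold."""
    fold_errors = []
    for train_idx, test_idx in k_fold_splits(len(x), k):
        intercept, slope = fit_line([x[i] for i in train_idx],
                                    [y[i] for i in train_idx])
        errors = [abs(y[i] - (intercept + slope * x[i])) for i in test_idx]
        fold_errors.append(mean(errors))
    return mean(fold_errors)


# Hypothetical data: roughly y = 2x with small noise.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]

cv_mae = cross_validated_mae(x, y, k=5)
print(f"5-fold cross-validated MAE: {cv_mae:.3f}")
```

Because every observation is used for testing exactly once, the cross-validated error is usually a more honest estimate of performance than the error on the training data.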
Project Report
7) Communication. (Are the authors able to clearly and effectively explain the work that they
did, including context, methods, and results? Do the paper and poster balance clarity with rigor?)
8) The technical quality of the work. (I.e., Does the technical material make sense? Are the
things tried reasonable? Are the proposed algorithms or applications clever and interesting? Do
the authors convey novel insight about the problem and/or algorithms? Does the project have
sufficient scope for the given team size?)
9) Originality. (Did the authors add their own data processing, methods, or analysis? Does the
final project avoid being a mirror image of existing papers/projects with no net new work?)
10) The authors need to clearly and effectively explain the work that they did, including context,
methods, and results. To highlight these components, it is important that you present a solid
discussion of what you learned from developing your method and summarize how your work
compares to existing approaches. Please see the appendix for the report writing
guidelines.
-------****-----***-----****-------
* Please include a section that describes what each team member worked on and contributed to
the project. If you have any concerns about working with one of your project teammates, please
contact me. We may reach out and factor in contributions and evaluations when assigning project
grades.
** Please include a link to a GitHub repository or zip file with the code for your final
project.

*** Write the project report in your own words. The plagiarism / similarity index must be
below 15%. There must be unique features in your work.
Appendix

Report writing guideline

1. Abstract [1 paragraph]
The abstract should be one paragraph covering the motivation for your paper and a
high-level explanation of the methodology you used and the results obtained.
2. Introduction [0.5-1 pages]
Explain the problem and why it is important. Discuss your motivation for pursuing this problem.
Give some background if necessary. Clearly state what the input and output are. Be very explicit:
The input to our algorithm is an image, amplitude, patient age, rainfall measurements, grayscale
video, etc. We then use linear regression or Bayes algorithm to output a predicted age, stock
price, cancer type, music genre, etc. This is very important since different teams have different
inputs/outputs spanning different application domains. Being explicit about this makes it easier
for readers. If you are using your project for multiple classes, add a paragraph explaining which
components of the project were used for each class.
3. Related work [0.5-1 pages]
You should find existing papers, group them into categories based on their approaches, and
discuss their strengths and weaknesses, as well as how they are similar to and differ from your
work. In your opinion, which approaches were clever/good? What is the state-of-the-art? Do
most people perform the task by hand? You should aim to have at least 5 references in the related
work. Include previous attempts by others at your problem, previous technical methods, or
previous learning algorithms. Google Scholar is very useful for this: https://scholar.google.com/
(you can click “cite” and it generates MLA, APA, BibTeX, etc.)
4. Dataset and Features [0.5 to 1 pages]
Describe your dataset: how many training/validation/test examples do you have? Is there any
preprocessing you did? What about normalization or data augmentation? What is the resolution
of your images? How is your time-series data discretized? Include a citation for the source of
your dataset. Depending on available space, show some examples from your
dataset. You should also talk about the features you used. If you extracted features using Fourier
transforms, word2vec, histogram of oriented gradients (HOG), PCA, ICA, etc. make sure to talk
about it. Try to include examples of your data in the report (e.g. include an image, show a
waveform, etc.).
5. Methods [1 to 1.5 pages]
Describe your learning algorithms, proposed algorithm(s), or theoretical proof(s). Make sure to
include relevant mathematical notation. For example, you can briefly include the SVM
optimization objective/formula or say what the softmax function is. It is okay to use formulas
from the lecture notes. For each algorithm, give a short description (1 paragraph) of how it
works. Again, we are looking for your understanding of how these statistical learning techniques
work. Additionally, if you are using a niche or cutting-edge algorithm, you may want to explain
your algorithm in 1-2 paragraphs.
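
For reference, the two formulas mentioned above can be written as follows. This is a sketch in standard textbook notation, which may differ from the notation in the lecture notes: z is a vector of K class scores, (x_i, y_i) are the n training examples with labels y_i in {-1, +1}, w and b are the weights and bias, the xi_i are slack variables, and C > 0 is the regularization constant.

```latex
% Softmax: maps a score vector z in R^K to K class probabilities.
\[
\mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}},
\qquad i = 1, \dots, K
\]

% Soft-margin SVM: primal optimization objective and its constraints.
\[
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
y_i\,(w^{\top} x_i + b) \ge 1 - \xi_i,\quad \xi_i \ge 0,\quad i = 1, \dots, n
\]
```

Whatever notation you use, define every symbol once and keep it consistent across the Methods and Results sections.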
6. Experiments/Results/Discussion [1 to 3 pages]
You should also give details about what (hyper)parameters you chose (e.g. why did you use X
learning rate for gradient descent, what was your mini-batch size and why) and how you chose
them. Did you do cross-validation? If so, how many folds? Before you list your results, make sure
to list and explain what your primary metrics are: accuracy, precision, AUC, etc. Provide
equations for the metrics if necessary. For results, you want to have a mixture of tables and plots.
If you are solving a classification problem, you should include a confusion matrix or
AUC/AUPRC curves. Include performance metrics such as precision, recall, and accuracy. For
regression problems, state the average error. You should have both quantitative and qualitative
results. To reiterate, you must have both quantitative and qualitative results! This includes
unsupervised learning (talk with your project TA on how to quantify unsupervised methods).
Include visualizations of results, heatmaps, examples of where your algorithm failed and a
discussion of why certain algorithms failed or succeeded. In addition, explain whether you think
you have overfit to your training set and what, if anything, you did to mitigate that. Make sure to
discuss the figures/tables in your main text throughout this section. Your plots should include
legends, axis labels, and have font sizes that are legible when printed.
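
The classification metrics named above can all be computed from the four cells of a binary confusion matrix. The sketch below assumes binary labels with 1 as the positive class; the labels, predictions, and function names are hypothetical and serve only to show the definitions.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, and recall computed from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        # Guard against division by zero when a class is never predicted/present.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical ground-truth labels and model predictions.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
metrics = classification_metrics(y_true, y_pred)
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(metrics)
```

Reporting precision and recall alongside accuracy matters most when the classes are imbalanced, because accuracy alone can look high for a model that rarely predicts the minority class.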
7. Conclusion/Future Work [1 to 2 paragraphs]
Summarize your report and reiterate key points. Which algorithms were the highest performing?
Why do you think that some algorithms worked better than others? For future work, if you had
more time, more team members, or more computational resources, what would you explore?
8. References/Bibliography (No page limit)
This section should include citations for: (1) any papers mentioned in the related work section.
(2) Papers describing algorithms that you used which were not covered in class. (3) Code or
libraries you downloaded and used. This includes libraries such as scikit-learn, Matlab toolboxes,
TensorFlow, etc. Acceptable formats include: IEEE.
9. Formatting
We highly recommend using a conference/journal template (e.g. NeurIPS, IEEE, ICML).
