
Foundation of Business Analytics

Instructions for Final Project


You've learned lots about doing statistical analyses. It's time to work without a net....

Due Dates

Submission due date: Follow the submission deadline announced on Canvas. Submit online to
the Canvas Submission Folder.

General Description

In this final project, you will address questions that interest you using the statistical
methodology we learned in the FBA course. You choose the question; you decide how to
collect data or pick data from the folder uploaded on MS Teams; you do the analyses. The
questions can address almost any topic (although I have veto power), including topics in
business and economics (remember that our course is named “Foundation of Business Analytics”).

The group project requires you to synthesize the material from the course. Hence, it's one of the
best ways to solidify your understanding of statistical methods. Plus, you get answers to issues
that pique your intellectual curiosity.

Your project will be presented in a paper of about 10 pages (at least 1800-2200 words, with no
upper limit on length) plus graphs, diagrams, and tables of analysis. In the paper, each group
prepares visual materials that explain what it discovered from the data. Reports of this kind
are extremely common in professional data analytics across many disciplines.

There is a formal write-up of the assignment (as mentioned). Each group must submit the paper
to the GG Drive folder referred to at the beginning of these instructions. The format is fairly
open, but we prefer Times New Roman 12-point font, 1.2 line spacing, and 1-inch margins on
all four sides. Moreover, submit the paper as a PDF or Word file, accompanied by the Excel file
as evidence of the data analysis process.

You should get started on the assignment as early as possible, particularly in thinking about
procuring data and collecting background information. Keep in mind that by the end of lectures,
you will have learned many statistical techniques; these techniques will help you address your
question of interest.

In this very last group assignment of the course, we hope that you have gained useful knowledge
for conducting basic data analytics operations, which are commonly used in real-world
and industry tasks. The regression model in particular is a very important tool for data
analytics in economics, accounting, finance, and public policy. Therefore, we really
want to see how you combine your skills, ranging from data collection and cleaning to
descriptive statistics and regression modeling (with the associated hypothesis testing, of course),
in this assignment.

Basic Requirements

Technically, the assignment requires you to collect data from a reliable source, such as the
World Bank, the IMF, FRED of the US Federal Reserve, or the Vietnamese Statistics Office of
the Government. Other sources, such as the ADB, are listed in my short paper “NHỮNG
NGUỒN SỐ LIỆU QUAN TRỌNG CHO FINAL PROJECT” (“Important Data Sources for the Final
Project”), posted in the FINAL PROJECT channel on MS Teams.
For instructions on collecting data from a secondary source, such as the World Bank
database, please watch the following video starting at 1:40:00:
https://www.youtube.com/watch?v=G0ChktJ8-TQ
A Vietnamese version of the instructions is available here:
https://www.youtube.com/watch?v=kPQAGh7Dis8
Lastly, the recording of the last multiple regression class shows how to run a
multiple regression and interpret the model:
https://www.youtube.com/watch?v=G5LypCn6T8g

Basically, I want you to download these secondary data and join them into one single data file
with at least the following variables: macroeconomic data, namely GDP (in $), GDP growth (in %),
inflation rate (in %), unemployment rate (in %), and FDI (foreign direct investment, net
inflows, % of GDP); environmental data, namely CO2 emissions (metric tons per capita); a social
indicator, namely population growth (annual %); and a dummy variable for global recession years
(coded 1 if that year was a global recession and 0 if it was not), among others.
While the other quantitative data can be found in the WB database (https://data.worldbank.org/),
you can find the history of global recessions in the following paper:
https://openknowledge.worldbank.org/handle/10986/33415#:~:text=Abstract,1982%2C%201991%2C%20and%202009.
You can then determine which years were recession years.
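
As a minimal sketch of the joining step described above, the following Python snippet merges several yearly series on their common years and adds the recession dummy. All numbers are hypothetical placeholders, and the function name `build_rows` is invented for illustration. The recession years are the four listed in the abstract of the linked World Bank paper; please confirm them against the paper before relying on them.

```python
# Hypothetical yearly values for one country; the real figures would come
# from the World Bank downloads described above.
gdp_growth = {1974: 2.0, 1975: -1.0, 1976: 4.0}
inflation = {1974: 9.0, 1975: 8.0, 1976: 6.0}
unemployment = {1974: 5.0, 1975: 8.0, 1976: 7.0}

# Global recession years as listed in the abstract of the linked paper
# (assumption: verify against the paper before use).
RECESSION_YEARS = {1975, 1982, 1991, 2009}


def build_rows(indicators, recession_years):
    """Join several {year: value} series on the years they all share."""
    common_years = sorted(set.intersection(*(set(s) for s in indicators.values())))
    rows = []
    for year in common_years:
        row = {"year": year}
        for name, series in indicators.items():
            row[name] = series[year]
        # Dummy variable: 1 for a global recession year, 0 otherwise.
        row["recession"] = 1 if year in recession_years else 0
        rows.append(row)
    return rows


rows = build_rows(
    {"gdp_growth": gdp_growth, "inflation": inflation, "unemployment": unemployment},
    RECESSION_YEARS,
)
for row in rows:
    print(row)
```

Joining on the years shared by every series keeps only complete observations, which is one simple way to avoid gaps in the regression data; the same join can of course be done in Excel.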

The basic idea of the data file is to explain which determinants affect unemployment
in 2 different countries (you will choose 2 specific countries to analyze, but I
recommend picking developed countries such as the US, Australia, Japan, or Germany for
better data quality).
You can use the following macroeconomic equation for unemployment
\[
\text{Unemployment} = \beta_0 + \beta_1 \cdot \text{GDP Growth} + \beta_2 \cdot \text{Inflation} + \beta_3 \cdot \text{FDI} + \beta_4 \cdot \text{CO2 Emission} + \beta_5 \cdot \text{Population Growth} + \beta_6 \cdot \text{Global Recession Year}
\]

as the basis of the regression model (add other variables to the equation for extra effects).
Moreover, you can add any other variables (attributes) you want. The data should cover as
many years as possible (at least 1961 through 2021, or through the most recent year available).
Lastly, with the GDP (in $) time series of the two countries, you can conduct a forecasting
analysis with the Moving Average and Exponential Smoothing methods and compare the
forecasting accuracy of each method (based on MAE, MSE, and MAPE) to answer
the question: Is the Moving Average forecast better than the Exponential Smoothing forecast, or
the other way around? You can watch the following video for detailed instructions on time series
forecasting:

https://www.youtube.com/watch?v=pEATS5lr1Uc

Now, if you are ready for the next steps, let’s move on to the required sections of the
assignment.

Final Project Report

A typical assignment report should contain the following components (but is not limited to them):
1. Executive Summary

What is the topic of your project?

What are the main issues or problems the project addresses?

What are your plans for obtaining background information (if needed) about your project?

Describe the data that you used or collected, including the variables measured. You don't
have to give a detailed version of your data collection design; the detailed design plans
should be represented in the later parts of the paper.

What questions and/or concerns do you have about your project?

What are your key findings from the empirical analysis?

What are your recommendations or suggestions for managerial decision making based on the
results of the analysis?

2. Data collection and background writing

In this report, you collect your data on your own, so please feel free to choose how you
want to conduct the data collection. Technically, you can collect secondary data from
the websites and databases of reliable organizations.

In any case, a big-enough dataset (with a few hundred observations or more) is ideal
for the analysis in the next stage.

You can refer to the following sources of secondary data:

WB: https://data.worldbank.org/country/VN

IMF: https://www.imf.org/en/Data

FRED: https://fred.stlouisfed.org/

ADB: https://data.adb.org/

Or some other data sources from this page:
https://www.freecodecamp.org/news/https-medium-freecodecamp-org-best-free-open-data-sources-anyone-can-use-a65b514b0f2d/

The most important aspects of any statistical analysis are stating questions and understanding
data. Hence, to get the full experience of running your own study, the assignment requires
you to write a data background section that shows you understand your data.

Moreover, although data cleaning is a tedious task in data analytics, please spend
enough time on this stage: check the validity and format of your data and remove (or replace)
missing values. In short, do everything you can to prepare the data for the
further steps.
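
As one possible way to replace missing values in a yearly series, the sketch below fills gaps by linear interpolation between the nearest known neighbours. This is only an assumption about a reasonable cleaning rule, not a method prescribed by the assignment; the function name `fill_missing` and the numbers are hypothetical. Whatever rule you choose, describe it in your report.

```python
def fill_missing(values):
    """Fill None gaps by linear interpolation between the nearest known
    neighbours; gaps at either end of the series are left as None."""
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled


# Hypothetical unemployment rates (in %) with missing years marked as None.
series = [4.0, None, None, 7.0, 6.5, None]
print(fill_missing(series))
```

Leaving the trailing gap unfilled is deliberate: extrapolating beyond the last known value would invent data rather than bridge a gap.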

3. Literature review

As this is the final exercise of the course, a short literature review paragraph is required.
You should provide background knowledge about the topic that the data relate to;
3 to 4 journal research papers from Google Scholar will be a very good resource for this
purpose. Remember, all good analyses build on an excellent literature review.

4. Data analysis

Analysis tools such as charts, diagrams, tables, and numerical summaries are the
starting point for the data analysis. By now, we have gone through the approaches in all the
chapters, so, based on the properties of your data, you can pick suitable approaches for the
analysis components of your report. In short, for this part of the assignment, I want to see 5-6
graphs or tables first. More importantly, you should do your best in analyzing these
results, since it will be much easier to write more in this step than in the modeling one.

Moreover, the data distribution, interval estimation, and hypothesis testing are all
preparation for the modeling part. Although this course covers only
regression analysis as the major approach to data modeling, you can choose any other
modeling method that you know, provided you can show that it is appropriate
and effective for the purpose of the analysis; hence, EFA (exploratory factor analysis), SEM, or
machine learning models are all welcome in this report. Furthermore, the
interpretation of the model and the hypothesis tests is the part that I most want to
read. Please present it in a business context, building on the
literature review from the previous section of the report. All I want is a
compact and succinct analysis of what the model is telling us and how that information
can be used in the decision-making process of your organization. In sum, the following
sections are required for this final project:
a. Descriptive Statistics: Analyze the categorical and quantitative data using
tables, charts, and numerical measures: mean, median, standard deviation, max, and min.
Here is an example of a descriptive statistics table that you can refer to for
presentation:

Moreover, include a preliminary descriptive analysis of the 2 GDP time series of the 2 countries.


For example, you can present the 2 time series in the same graph like this:
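
As a rough sketch of the numerical summary requested in part a (this is not the instructor's own example table or graph), the measures can be computed with Python's standard `statistics` module. The function name `describe` and the data are hypothetical.

```python
import statistics


def describe(values):
    """Return the numerical measures requested for a quantitative variable."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std_dev": statistics.stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),
        "max": max(values),
    }


# Hypothetical unemployment rates (in %) for a handful of years.
unemployment = [4.0, 5.5, 6.0, 7.5, 5.0]
summary = describe(unemployment)
for name, value in summary.items():
    print(f"{name:>8}: {value:.3f}")
```

Running the same function on each variable for each country gives the rows of a summary table that can be checked against the Excel or SPSS output.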
b. Interval Estimation: Compute interval estimates of the mean of the Unemployment variable for
the 2 countries at confidence levels of 90%, 95%, and 99%
(hint: because the population standard deviation is unknown, the t-distribution and the
corresponding interval estimate are more appropriate for this question).
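
The interval in part b has the form \bar{x} \pm t_{\alpha/2,\,n-1} \cdot s/\sqrt{n}. The sketch below applies it to a hypothetical series of 61 yearly rates (the length of a 1961-2021 sample), so the degrees of freedom are 60. The critical values are the usual two-sided table values for 60 degrees of freedom; for a different sample size they must be looked up again. The function name `t_interval` is invented for illustration.

```python
import math
import statistics

# Two-sided t critical values for 60 degrees of freedom, from a standard
# t table (assumption: n = 61 yearly observations).
T_CRIT_DF60 = {0.90: 1.671, 0.95: 2.000, 0.99: 2.660}


def t_interval(values, t_crit):
    """Confidence interval for the mean when sigma is unknown."""
    n = len(values)
    mean = statistics.mean(values)
    margin = t_crit * statistics.stdev(values) / math.sqrt(n)
    return mean - margin, mean + margin


# Hypothetical series of 61 yearly unemployment rates (in %).
rates = [5.0 + 1.5 * math.sin(i / 3) for i in range(61)]

intervals = {level: t_interval(rates, t) for level, t in T_CRIT_DF60.items()}
for level, (low, high) in intervals.items():
    print(f"{level:.0%} interval: ({low:.3f}, {high:.3f})")
```

The intervals widen as the confidence level rises, which is the pattern your own results should show as well.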
c. Hypothesis Testing: Regarding the average unemployment rate of each country over
the research period, I would like to know whether the mean unemployment rate differs from
5%, given that the population standard deviation σ is unknown (t-test). Please
develop your null and alternative hypotheses, then conduct the quantitative analysis to
support your conclusion. Please run this analysis in SPSS and include a screenshot of your
SPSS output as evidence.
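
To cross-check the SPSS output by hand, the test statistic for H0: μ = 5 against H1: μ ≠ 5 is t = (\bar{x} - 5) / (s/\sqrt{n}) with n - 1 degrees of freedom. The sketch below is only a hand check under stated assumptions and does not replace the SPSS screenshot the assignment asks for; the data, the function name `one_sample_t`, and the 5% significance level are assumptions.

```python
import math
import statistics


def one_sample_t(values, mu0):
    """Return the one-sample t statistic and its degrees of freedom."""
    n = len(values)
    std_error = statistics.stdev(values) / math.sqrt(n)
    return (statistics.mean(values) - mu0) / std_error, n - 1


# Hypothetical series of 61 yearly unemployment rates (in %).
rates = [6.0 + math.cos(i / 4) for i in range(61)]

t_stat, df = one_sample_t(rates, mu0=5.0)
T_CRIT = 2.000  # two-sided 5% critical value for df = 60, from a t table
decision = "reject H0" if abs(t_stat) > T_CRIT else "fail to reject H0"
print(f"t = {t_stat:.3f}, df = {df}: {decision}")
```

SPSS reports the same t statistic together with a p-value; rejecting when |t| exceeds the critical value is equivalent to rejecting when the two-sided p-value is below 0.05.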
d. Regression model analysis: Analyze the multiple regression suggested at the
beginning of the project and try to answer questions such as: (i) What does
the model mean (regression equation and parameter interpretation)? (ii) How well does the
model fit the data collected (coefficient of determination)? (iii) Are the whole model
and the individual variables significant or insignificant? Please run this analysis in SPSS and
include a screenshot of your SPSS output as evidence.
In detail, the stages that I require in this assignment include:

(i) Literature review for the foundation of the model


(ii) The model presentation.
(iii) The coefficient interpretation
(iv) The goodness of fit (adjusted R-square)
(v) The model overall validity (F-test)
(vi) The individual variable significance (t-test for each variable).
(vii) Compare the results of 2 countries. Do you see any difference? Explain.
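
As an illustration of where the quantities in stages (iii) to (vi) come from, the sketch below fits an ordinary least squares model with NumPy on simulated data. It uses only two hypothetical predictors instead of the full variable list, the function name `ols` is invented, and it is meant as a cross-check of the SPSS output the assignment requires, not a substitute for it.

```python
import numpy as np


def ols(X, y):
    """Ordinary least squares with an intercept column added to X."""
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])          # design matrix with intercept
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)  # estimated coefficients
    resid = y - A @ beta
    sse = float(resid @ resid)                    # residual sum of squares
    sst = float(np.sum((y - y.mean()) ** 2))      # total sum of squares
    df_resid = n - k - 1
    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / df_resid
    f_stat = ((sst - sse) / k) / (sse / df_resid)  # overall F-test statistic
    std_err = np.sqrt(np.diag(np.linalg.inv(A.T @ A)) * sse / df_resid)
    return {"beta": beta, "t": beta / std_err, "r2": r2,
            "adj_r2": adj_r2, "f": f_stat}


# Simulated data: 61 "years", two hypothetical predictors.
rng = np.random.default_rng(0)
n = 61
X = rng.normal(size=(n, 2))
y = 6.0 - 0.5 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(scale=0.2, size=n)

fit = ols(X, y)
print("coefficients:", np.round(fit["beta"], 3))
print("t statistics:", np.round(fit["t"], 2))
print(f"R2 = {fit['r2']:.3f}, adjusted R2 = {fit['adj_r2']:.3f}, F = {fit['f']:.1f}")
```

Running the same function once per country gives two sets of coefficients, t statistics, adjusted R-square values, and F statistics that can be placed side by side for the comparison in stage (vii).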

The regression results for the 2 countries should be presented side by side, like this:
e. Forecasting: Use the moving average (with order m = 3) and exponential smoothing
(with alpha = 0.8) to forecast the GDP time series of the two countries. For each
country, use the accuracy criteria (MAE, MSE, and MAPE) to decide which forecasting
method is more accurate.
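
The forecasting comparison in part e can be sketched as follows, with m = 3 and alpha = 0.8 as specified. Two choices here are assumptions rather than requirements of the assignment: exponential smoothing is started by using the first observation as the first forecast, and both methods are scored on the same periods (from the fourth observation on) so that the comparison is like for like. The GDP figures and function names are hypothetical.

```python
def moving_average_forecasts(y, m=3):
    """Forecast for period t is the mean of the m preceding observations."""
    return {t: sum(y[t - m:t]) / m for t in range(m, len(y))}


def exp_smoothing_forecasts(y, alpha=0.8):
    """F[t+1] = alpha * y[t] + (1 - alpha) * F[t], started with F[1] = y[0]."""
    forecasts = {1: y[0]}
    for t in range(1, len(y) - 1):
        forecasts[t + 1] = alpha * y[t] + (1 - alpha) * forecasts[t]
    return forecasts


def accuracy(y, forecasts, start):
    """MAE, MSE, and MAPE over the forecast periods t >= start."""
    pairs = [(y[t], y[t] - f) for t, f in forecasts.items() if t >= start]
    n = len(pairs)
    return {
        "MAE": sum(abs(e) for _, e in pairs) / n,
        "MSE": sum(e * e for _, e in pairs) / n,
        "MAPE": 100 * sum(abs(e / actual) for actual, e in pairs) / n,
    }


# Hypothetical GDP series (in billions of $).
gdp = [100.0, 104.0, 103.0, 108.0, 112.0, 111.0, 117.0]

ma = accuracy(gdp, moving_average_forecasts(gdp, m=3), start=3)
es = accuracy(gdp, exp_smoothing_forecasts(gdp, alpha=0.8), start=3)
for metric in ("MAE", "MSE", "MAPE"):
    better = "Moving Average" if ma[metric] < es[metric] else "Exponential Smoothing"
    print(f"{metric}: MA = {ma[metric]:.3f}, ES = {es[metric]:.3f} -> {better}")
```

The three criteria do not always agree, so report all of them and explain which one you rely on when they point to different methods.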

5. Story telling/ Data interpretation.

As the basic takeaway of this course, I want you to tell me the story behind the data: what
the graphs and tables imply, and what underlying relationships and linkages exist among the
variables. You could read some of the literature (which should be available online;
good academic sources can be found through Google Scholar) and, based on your
knowledge and a quick brainstorming process, write an interesting story
about the data.
6. Reference.

For an academic paper, proper citation makes the work more comprehensive and relevant.
Therefore, citations with a reference list will be a plus if
students provide them.

Final Project Poster Presentation:

Besides the written report, your project will also be presented in a poster session during the
last week of lab sections. In a poster session, each group makes visual materials that explain
the project. Then, people wander around looking at the posters and talking to the presenters,
thereby learning about the various projects. Poster sessions are extremely common at
professional conferences in many disciplines, including statistics. In our poster session,
some members of each group are stationed at the poster to answer questions, while the others
wander around to examine the projects. The poster-sitters and wanderers switch off after the
wanderers have examined all the posters.

You can find instructions on how to present your idea and the statistical analysis results on
a research poster at these links:

How to make an academic poster in PowerPoint:


https://www.youtube.com/watch?v=_WnhoIbfcoM

Some general guidelines for making a better research poster:

https://www.youtube.com/watch?v=AwMFhyH7_5g
An example of poster presentation:
https://www.youtube.com/watch?v=vMSaFUrk-FA

Moreover, here are some resources with effective poster templates that you can refer to
and follow:

https://www.makesigns.com/SciPosters_Templates.aspx

https://www.posterpresentations.com/free-poster-templates.html

https://www.postermywall.com/index.php/posters/search?s=data%20science#

Please note that you will have to submit the electronic version of the poster together with the
report and the data files in the same archive (.zip or .rar) at least 2 days before the final
presentation date.

Assignment grading guidelines

You will be graded by your instructor, who will be looking for the following characteristics:

1. Consistency: Did you answer your question of interest?


2. Clarity: Is it easy for your reader to understand what you did and the arguments
you made?
3. Relevancy: Did you use statistical techniques wisely to address your question?
4. Interest: Did you tackle a challenging, interesting question (good), or did you just
collect very simple statistical analysis results (bad)?

Some suggestions for scoring high on these criteria, and suggestions you should keep in mind
whenever you write anything, are the following:

1. State your question up front and use statistics to help answer it. The statistics should not
drive the question; the question should drive the statistics.
2. Don't just collect data and publish it; have a specific question in mind. Otherwise,
you will be hard-pressed to come up with something challenging and interesting.
3. Most importantly, talk to your instructor for advice. You can ask them, for example,
about your planned methods of analysis and see what they think.
4. Be selective with computer output to help clarity.

If you are using techniques we learned in class, you do not have to re-explain the techniques.
That hurts clarity. If you are using techniques that we did not cover in class, you should
definitely explain the techniques. That is clarity!

Procedures for when group members are not contributing their fair share

Each group should spread the work among members so that everyone shares in the project. If
some group members do not contribute their assigned workload, or are unwilling to take on
work, your group may petition to have such group members dropped from the group. The
process of this petition proceeds as follows:

1) Send an e-mail to the instructor explaining how the group members have not contributed
adequately. ALL MEMBERS OF THE GROUP MUST BE SENT THIS E-MAIL. This is to
ensure that everything is done openly.

Group Peer Evaluation:

As an outcome of a group assignment, your work must be accompanied by a peer evaluation (you
can include it separately or at the end of the assignment paper). Please state clearly how
each member contributed to the group work and what percentage of the total 100% that member
should receive. You can fill in this form:

Team member         What did he/she do?                               Contribution (%, 100% in total)
Example Member A    Write part 1, collect the data, edit the paper    25%
Example Member B    Write part 2, collect the data                    20%

