BAN401 Final Group Based 2023


BAN401

“Applied Programming and Data Analysis for Business”

FINAL GROUP-BASED PROJECT

SUBMISSION DEADLINE:

DATE: 10 November 2023
TIME: 14.00

The final group-based project must be submitted by a group of three people (or four in
exceptional cases).
This means that a group size of three individuals is the minimum, while a group size of four
individuals is the maximum. Submissions from groups of any other size are not allowed.

For the final group-based project, it is compulsory to use the latest versions of Python, R, and
DB Browser for SQLite.

I. REPORT PREPARATION:

You should submit TWO separate files:

File 1: One Report file (only PDF format) that includes the following:

1.1 Problems 1-2. For each problem:

a. Python code
- Copy your code and paste it into your report
- Maintain the original code formatting when pasting it into your report for improved
readability.
b. Explanation of your solution
- Describe the concept behind your designed solution.
- Explain how your code addresses the problem.
c. Screenshots of the results from PyCharm (or any other Python editor)
d. Required:
- Include comments in your code to explain crucial aspects. For example:
s = [5, 35, 26, 44, 4, 1] # Initialize list 's'
- Ensure that your code is implemented in Python 3 (not Python 2).
- Adhere to the problem descriptions and requirements.

1.2 Problem 3:

• Conducted Analysis: Please follow the problem description closely.

1.3 Problems 4-5. For each problem:

a. R code
- Copy your code and paste it into your report
- Maintain the original code formatting when pasting it into your report for improved
readability.
b. Explanation of your solution
- Problem 4:
- Please follow the problem description closely
- Problem 5:
- Describe the concept behind your designed solution.
- Explain how your code addresses the problem.
c. Screenshots of the results from RStudio
d. Required:
- Include comments in your code to explain crucial aspects. For example:
y <- c(1:10) # create a data vector with elements 1-10
- Adhere to the problem descriptions and requirements.

1.4 Problem 6:
• Follow the Problem 6 guidelines for reporting your results.
• Ensure that you provide the complete ER-model (relational schema) of your
database in your report.

GENERAL REQUIREMENTS:

- The report must be typed; handwritten submissions are not accepted.
- The report must be written in English.
- Text within the report should be selectable and copyable. This requirement does not
apply to figures (if any) and screenshots.

File 2: One compressed file (ZIP or RAR format) containing the following:

2.1 Problems 1-2 (Python):

a. 2 code files (.py format) - for problems #1 and #2
• The names of the .py files should include the problem numbers (in the following format):
problem_1.py; problem_2.py

2.2 Problems 4-5 (R):

a. 2 code files (.r format) - for problems #4 and #5
• The names of the .r files should include problem numbers (in the following format):
problem_4.r; problem_5.r

2.3 Problem 6 (SQL):

a. Database file (.db format)
b. The .txt file with the SQL queries’ code
c. The overall ER-model (relational schema) of your database (in the PDF format)

II. SUBMISSION PROCEDURE:

You must submit your final group-based project via WISEFlow:

File 1: One Report file (only PDF format)
File 2: One compressed file (ZIP or RAR format)

Please ensure that these two files are prepared in accordance with the guidelines provided in "I. REPORT
PREPARATION".

NOTES

1. Code snippets (in MS Word): format-preserving

To display code snippets in MS Word while preserving their format and syntax highlighting,
you can utilize the following online tools:

For Python 3 and SQL: http://hilite.me/
For R: https://emn178.github.io/online-tools/syntax_highlight.html

In the final step, please adjust the "Font Size" (in MS Word) of the pasted code snippets to
ensure the best readability.

2. Ethics & Data Protection:

- Discussing your solutions with anyone outside your group is strictly prohibited.

- Distribution or posting of any parts of the given group-based project on the Internet or
any other platform, in any format, during and after the exam period, is not allowed.

- It is not allowed to use generative artificial intelligence (AI) or AI-assisted
technologies in your work on the problems.

- YOUR SUBMISSION WILL BE CHECKED FOR PLAGIARISM

PROBLEM 1

Problem Statement:

Speech recognition has become widely prevalent in various applications. Many tools incorporate
trigger word detection systems, with examples including Amazon Echo, Baidu DuerOS, and Apple Siri.
These systems are activated by a specific trigger word, such as "Hey Siri." For instance, you can
simply say, "Hey Siri count down 15 minutes," and initiate a timer on your phone, watch, or home
assistant using voice commands.

In this problem, you will tackle a simplified version of a trigger word detection task. The audio has
already been transcribed and pre-processed to eliminate noise and correct typos. Please see the
resulting text in the sub-section “Input text is below”.

Your Task:

Your task is to create a Python program that can detect a trigger word in a given conversation and
identify its position. Subsequently, your code should extract sentences following the trigger word.

The trigger word in this “Problem 1” is 'hei' (case-insensitive; see footnote 1). The expected
output should resemble the following; note that the positions may vary slightly depending on
your string processing:

Trigger word 'hei' found at position 40:
can you turn on the lights?

Trigger word 'hei' found at position 79:
can you close the curtains?

Trigger word 'hei' found at position 137:
can you play some music!

Trigger word 'hei' found at position 198:
assistant turn off the tv!

Clarifications:

- When we refer to the position of the trigger word 'hei', we mean the location of the trigger
word 'hei' within the conversation, starting from 0 for the first word in the conversation.

- A 'word' is defined as any sequence of characters and symbols separated by empty spaces.
For example:

• "Hey," counts as one word.
• "how's" counts as one word.
• "doing?" counts as one word.
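
The word and position definitions above can be illustrated on a short, hypothetical sentence (not the assignment input). This is only a sketch under stated assumptions: the trigger word stands alone with no punctuation attached, a sentence ends at the first word ending in '.', '?' or '!', and the extracted sentence is lowercased as in the expected output. The function name find_trigger is made up for illustration. It uses built-in string methods only.

```python
# Hypothetical sample text, chosen for illustration only.
sample = "Hey, how's it going? Hei open the door! Fine."


def find_trigger(text, trigger="hei"):
    """Return (position, following sentence) pairs for each trigger word."""
    words = text.split()  # a 'word' is anything separated by whitespace
    results = []
    for position, word in enumerate(words):  # positions start at 0
        if word.lower() == trigger:  # case-insensitive match
            sentence = []
            for following in words[position + 1:]:
                sentence.append(following.lower())
                # Assumption: a sentence ends at '.', '?' or '!'.
                if following[-1] in ".?!":
                    break
            results.append((position, " ".join(sentence)))
    return results


for position, sentence in find_trigger(sample):
    print("Trigger word 'hei' found at position " + str(position) + ":")
    print(sentence)
```

In the sample, "Hey," is word 0 and "Hei" is word 4, so the sketch reports position 4 followed by "open the door!".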

1 "case-insensitive" means that you should consider the trigger word to match regardless of whether it's written in
uppercase (capital letters) or lowercase (small letters).

Requirements:

- You are not allowed to use/import any Python modules/packages.
- You are not allowed to use Python regular expressions (see footnote 2). Basic string
manipulation techniques should suffice.
- Provide the screenshot of your results from your Python editor.

Input text is below:


Hey, how's everyone doing? Pretty good! Just got back from a weekend hiking
trip with breathtaking views. I wish I could've joined. Any plans for
summer vacation? Well, I was thinking of a beach getaway with sun, sea, and
relaxation. Hei can you turn on the lights? Sounds heavenly! Count me in. I
might join too, but first, I need to finish this coding project I've been
working on. Speaking of work, have you seen the latest company newsletter?
Hei can you close the curtains? Yeah, I skimmed through it. They're
introducing a new benefits package, which is cool. We could use some extra
perks. Agreed, but I hope they improve the coffee machine too! Haha,
priorities! Speaking of coffee, do you guys prefer iced or hot coffee? Iced
coffee, especially in the summer, is my go-to. Hei can you play some music!
I'm all about hot coffee, no matter the season. I like to mix it up - iced
in the morning, hot in the afternoon. Variety keeps it interesting! Well,
whatever keeps us caffeinated, right? Absolutely! Speaking of caffeine,
anyone up for a coffee run now? Sounds like a plan! Count me in. Let's do
it! hei assistant turn off the TV!

2 Regular expressions, often referred to as 'regex' or 'regexp,' are a tool for working with text patterns and
searching, matching, or manipulating strings based on specific patterns. Python regular expressions were not
covered in the BAN401 course. Once again, you are not allowed to use Regular Expressions for solving this
problem. Basic string manipulation techniques should suffice.

PROBLEM 2

Problem Statement:

It’s no secret that Bergen and Vestland County Municipality is one of the most beautiful and
attractive places for hiking in the world. Let’s imagine that you, as an NHH student, decided to go
with your BAN401 classmates for a hiking trip on the weekend, right after completing your BAN401
group-based project.

As you prepare for your journey into the mountains to enjoy the beauty of Norwegian nature, you
need to pack well for the trip. You have a good backpack for carrying things, but you also know that
you can carry a maximum of only 3 kg in it.

Thus, you have decided to create a list of what you want to bring for the trip. However, the total
weight of all the items on your list exceeds your backpack's capacity. To solve this problem, you've
added columns to your initial list detailing their weights and a numerical value representing how
important the item is for the trip.

Here is the list of items, where each row contains the following information:

• The first value is the name of the item.
• The second value is the weight (in dag, i.e., decagrams) for each item.
Note: 1 dag is equal to 10 grams.
• The third value is a numerical value representing how important the item is for the trip:

item                     weight (dag)   value

hiking boots             36             90
BAN401 printed slides    60             110
water bottle             20             50
energy bars              30             40
first aid kit            40             60
flashlight               10             30
pocketknife              15             45
sunscreen                25             70
sunglasses               7              20
rain jacket              45             80
hat                      12             25
camera                   50             100

Conditions:

- You can choose to take any combination of items from the list, but only one of each item is
available.
- Also, you cannot cut or diminish the items, so you can only take whole units of any item.

Your Task:

Write Python code that shows which items you can carry in your backpack so that their total weight
does not exceed 300 dag (i.e., 3 kg), and their total value is maximized.

Your code should print results in the following format:

The optimal combination of items:
. . .
The total value is:
. . .
and the total weight is:
. . .

Note: the “. . .” fields should contain actual results

Example Output (Please note, the provided output is for illustrative purposes and is not accurate):

The optimal combination of items:
BAN401 printed slides
camera
rain jacket
The total value is:
290
and the total weight is:
155 dag

Remember:

You should take into account two things at the same time. First, the total weight of all items you take
should not exceed 3 kg. At the same time, you should maximize the total value (based on the
"Value" column) of all items you take.

Tip: You may find the combinations() function from Python's itertools module useful,
though you are not required to use it to solve this problem:
https://docs.python.org/3/library/itertools.html
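
As a rough illustration of the tip above, the sketch below enumerates every combination of a small, hypothetical item list (not the assignment data) with itertools.combinations() and keeps the most valuable one that fits a hypothetical capacity. The item names, weights, values, capacity, and the function name best_combination are all assumptions made for this example. Exhaustive enumeration is only one possible approach.

```python
from itertools import combinations

# Hypothetical items as (name, weight, value); not the assignment data.
items = [("map", 5, 10), ("tent", 30, 60), ("stove", 20, 40), ("book", 15, 5)]
capacity = 40  # hypothetical weight limit


def best_combination(items, capacity):
    """Try every subset of items and keep the most valuable one that fits."""
    best, best_value, best_weight = (), 0, 0
    for size in range(1, len(items) + 1):  # subsets of 1, 2, ... items
        for combo in combinations(items, size):
            weight = sum(w for _, w, _ in combo)
            value = sum(v for _, _, v in combo)
            if weight <= capacity and value > best_value:
                best, best_value, best_weight = combo, value, weight
    return [name for name, _, _ in best], best_value, best_weight


names, total_value, total_weight = best_combination(items, capacity)
print("The optimal combination of items:")
for name in names:
    print(name)
print("The total value is:")
print(total_value)
print("and the total weight is:")
print(total_weight)
```

With these made-up numbers, "map" and "tent" together weigh 35 and have a value of 70, which no other feasible subset beats. With n items there are 2^n - 1 non-empty subsets to check, which is manageable for a dozen items.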

PROBLEM 3

Perform an in-depth analysis of how Python for data analysis can be effectively utilized in one of the
following industries: healthcare, finance, agriculture, or education. Provide real-world examples,
and/or relevant case studies, and/or hypothetical situations to illustrate Python's applications in that
chosen industry. Include a minimum of 10 references to research literature related to the selected
industry.

Text Requirements:

- Format: Essay
- The analysis must be original, and plagiarism will be checked.
- Word Limit:
o Minimum: 3000 words (excluding code, figures, tables, and references).
o Maximum: 4000 words (excluding code, figures, tables, and references). Text
exceeding this limit will be disregarded.

Instructions:

- Choose an Industry: Select one of the following industries: healthcare, finance, agriculture, or
education.

- Perform the Analysis: In your essay, discuss comprehensively how Python for data analysis
can be effectively employed within the chosen industry. This should encompass various
aspects such as data collection, data preprocessing, data analysis, visualization, and/or any
other relevant applications.

- Provide Evidence: Support your analysis with real-world examples, and/or relevant case
studies, and/or, where applicable, create hypothetical but realistic situations to illustrate
Python's practical applications within the chosen industry.

- Research References: Include a minimum of 10 references to research literature related to
the selected industry. These references should support and enhance your analysis.

- Originality: Ensure that your analysis is original work. Any instances of plagiarism will be
strictly assessed.

- Use of AI Technologies: It is not allowed to use generative artificial intelligence (AI) or AI-
assisted technologies in the completion of this problem.

- Word Limit: Keep your essay within the specified word limits. The minimum word count
ensures that the analysis is sufficiently detailed, while the maximum limit is in place to
maintain clarity and relevance.

PROBLEM 4

Problem Statement: Design a Recursive Problem and Implement R Code

Your task is to create a programming problem that demonstrates the advantages of using a recursive
function over a traditional iterative approach based on FOR and/or WHILE control flow statements.
This problem should be set within a business-related scenario of your choice.

(a) Problem Creation:

Your objective is to craft a problem that effectively illustrates the benefits of recursion in solving a
specific business-related scenario. Follow these steps:

1. Select Your Business Scenario:

Choose a business scenario from any industry or profession that interests you. This scenario can
encompass areas like finance, healthcare, retail, manufacturing, education, or any other field. The
business scenario you select should represent a real-life situation within the business world.

For example:

You could choose a scenario where a retail store owner needs to optimize inventory during the
holiday season to meet customer demand while minimizing costs.

2. Define the Problem:

Clearly outline the problem statement, including the inputs and expected outputs. Ensure that the
problem can be solved using both recursive and iterative approaches.

For example:

In the scenario where a retail store owner is trying to optimize inventory during the holiday season
(mentioned above), the problem can be defined as developing a solution that assists a retail store
owner in optimizing their inventory during the holiday season. The objective is to determine the ideal
quantity of each product to stock in the store, taking into account historical sales data, predicted
customer demand, current inventory levels, and any constraints, such as budget or storage space
limitations. The primary goal is to effectively meet customer demand while minimizing costs
associated with overstocking or understocking.

3. Explain the Recursive Aspect:

- Describe why a recursive approach is advantageous for your developed problem within your
chosen business scenario.

- Identify the specific aspect of the problem where recursion offers clear benefits over an
iterative approach based on FOR and/or WHILE control flow statements.

(b) R Code Implementation:

Develop R code to solve your created problem using both recursive and iterative approaches:

1. Recursive Solution:

Remember that this entails employing a function that calls itself to deconstruct complex tasks into
smaller, manageable subtasks.

2. Iterative Solution:

Remember that, in contrast to recursion, an iterative solution employs loops such as FOR or WHILE
loops to iteratively execute a set of instructions until specific conditions are met.
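
To make the contrast between the two approaches concrete, here is a minimal sketch of one task solved both ways. It is written in Python purely for illustration (the problem itself requires R), and the scenario, counting everyone in a small reporting hierarchy, along with the data and function names, is hypothetical.

```python
# Hypothetical organisation chart: each node lists its direct reports.
org = {
    "name": "CEO",
    "reports": [
        {"name": "CFO", "reports": [{"name": "Accountant", "reports": []}]},
        {"name": "CTO", "reports": []},
    ],
}


def headcount_recursive(node):
    """Count a person plus everyone below them; the function calls itself."""
    return 1 + sum(headcount_recursive(r) for r in node["reports"])


def headcount_iterative(root):
    """Same count with a WHILE loop and an explicit list of pending nodes."""
    count = 0
    pending = [root]
    while pending:
        node = pending.pop()
        count += 1
        pending.extend(node["reports"])
    return count


print(headcount_recursive(org))
print(headcount_iterative(org))
```

Both functions return the same count. The recursive version mirrors the nested structure of the data directly, while the iterative version has to manage its own list of pending nodes.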

* * *

Please remember to provide screenshots from RStudio for outputs for both solutions (i.e., recursive
solution and iterative solution).

PROBLEM 5

Imagine your university has a long corridor with 930 doors, all initially closed and numbered from 1
to 930.
You have 930 individuals, each tasked with changing the state of certain doors: closing a
door if it is open, or opening a door if it is closed.

The first person sent to the corridor has to change the state of all doors
(1, 2, 3, 4, …)
The second person sent to the corridor has to change the state of every other door
(2, 4, 6, 8, …)
The third person sent to the corridor has to change the state of every third door
(3, 6, 9, 12, …)
The fourth person sent to the corridor has to change the state of every fourth door
(4, 8, 12, 16, …)
This pattern continues, with each subsequent person changing the state of doors based on their
number: the fifth person – of every fifth door, the sixth person – of every sixth door, and so on. Thus,
the given procedure continues until all persons (i.e., all 930 persons) have gone through the corridor.

Your task is to write R code that outputs the numbers of the first thirty closed doors after all 930
individuals have completed their task.

Note: your R code should not output the total number of closed doors after all 930 persons have
gone through the given university corridor. It should output the particular numbers of the first
thirty doors that are closed after all 930 persons have gone through the corridor. This means that
your code should output thirty numbers.
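
The toggling procedure can be traced on a much smaller corridor. The sketch below is in Python for illustration only (the problem requires R) and uses a hypothetical corridor of 10 doors; the function name closed_doors and its parameters are made up for this example.

```python
def closed_doors(n_doors, how_many):
    """Simulate the toggling and return the first `how_many` closed doors."""
    is_open = [False] * (n_doors + 1)  # index 0 is unused; all doors start closed
    for person in range(1, n_doors + 1):
        # Person k changes the state of doors k, 2k, 3k, ...
        for door in range(person, n_doors + 1, person):
            is_open[door] = not is_open[door]
    closed = [door for door in range(1, n_doors + 1) if not is_open[door]]
    return closed[:how_many]


print(closed_doors(10, 5))
```

In the 10-door case, doors 1, 4 and 9 end up open because each has an odd number of divisors, so the first five closed doors are 2, 3, 5, 6 and 7.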

Requirements:

- When solving this problem, you are not allowed to load any packages into your R script;
do not use library()
- Provide a screenshot of your results from the RStudio console.

PROBLEM 6

Design and create your own database (using SQLite3).

The database should be logically consistent and reflect realistic entities, attributes and relations. It
should pertain to one of the following industries:
(1) healthcare, (2) finance, (3) agriculture, or (4) education.

PLAGIARISM CONTROL:

1. It is prohibited to use any RDBMS database templates from the Internet or other sources
2. Remember: It is prohibited to use generative artificial intelligence (AI) and AI-assisted technologies
for solving this problem.
3. Ensure that you create your own original database.

1. DATABASE IDEA:

Describe your database idea. Explain how your database reflects the chosen industry: What does your
database reflect in terms of the selected industry?

2. ER-MODEL:

Create the overall ER-model (relational schema) for your database.

When constructing the ER-model, ensure that it includes the following components:
• Entities
• Attributes
• Attributes’ data types
• Relationships
• Primary keys, foreign keys

Entities requirements:
Your database should comprise 11-15 entities (i.e., tables).

Relationships requirements:
Your database should include:

- at least two one-to-one (1:1) relationships
- at least three one-to-many (1:N) / many-to-one (N:1) relationships that are not a part of any
N:N relationships
- at least two many-to-many (N:N) relationships
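
The three relationship types listed above are sketched below on a tiny, hypothetical schema. All table and column names are invented for illustration, and Python's built-in sqlite3 module is used only so the SQL can be run as a self-contained example; the assignment itself requires the “Execute SQL” tab of DB Browser for SQLite. One assumption to note: a UNIQUE foreign key is a common way to model 1:1, but it only guarantees "at most one" on the dependent side.

```python
import sqlite3

# Hypothetical schema; table and column names are illustrative only.
schema = """
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
-- 1:1  each card belongs to exactly one student (UNIQUE foreign key)
CREATE TABLE student_card (
    card_id    INTEGER PRIMARY KEY,
    student_id INTEGER NOT NULL UNIQUE REFERENCES student(student_id)
);
CREATE TABLE department (
    department_id INTEGER PRIMARY KEY,
    name          TEXT NOT NULL
);
-- 1:N  one department offers many courses
CREATE TABLE course (
    course_id     INTEGER PRIMARY KEY,
    title         TEXT NOT NULL,
    department_id INTEGER NOT NULL REFERENCES department(department_id)
);
-- N:N  students and courses, resolved through a junction table
CREATE TABLE enrollment (
    student_id INTEGER NOT NULL REFERENCES student(student_id),
    course_id  INTEGER NOT NULL REFERENCES course(course_id),
    PRIMARY KEY (student_id, course_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript(schema)

conn.execute("INSERT INTO student VALUES (1, 'Ada'), (2, 'Bo')")
conn.execute("INSERT INTO student_card VALUES (10, 1)")
conn.execute("INSERT INTO department VALUES (1, 'Finance')")
conn.execute("INSERT INTO course VALUES (100, 'Accounting', 1), (101, 'Auditing', 1)")
conn.execute("INSERT INTO enrollment VALUES (1, 100), (1, 101), (2, 100)")


def second_card_rejected():
    """Return True if a second card for student 1 violates the 1:1 rule."""
    try:
        conn.execute("INSERT INTO student_card VALUES (11, 1)")
    except sqlite3.IntegrityError:
        return True
    return False


print(second_card_rejected())
```

The junction table's composite primary key prevents the same student from enrolling in the same course twice, and the department-to-course link is a 1:N relationship that is not part of any N:N relationship.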

Please refer to slide 31 (SQL - Lecture 1 “Databases” in Canvas) to recall what the overall database ER-model is.
Your ER-model should contain all necessary details concerning entities, attributes, data types, and relationships.

NOTE:
You can submit a scanned, hand-drawn version of the overall database ER-model (relational schema).
In this case, ensure that your hand-drawn ER-model is readable (i.e., high-resolution).

3. DATABASE CREATION:

- The database schema and data records should be created using SQL queries only.

- Each table should be populated with at least five records using the INSERT INTO statement.

- You must use only DB Browser for SQLite (used in this course) to create your database.

You must use the “Execute SQL” tab in DB Browser for SQLite for all operations when creating
the database and populating it with data records. No other tools are permitted.

3.1 SQL code:

- Document all SQL queries in the PDF-report.
All SQL queries should be reported in the right order (sequentially).

- Attach the .txt file with the overall SQL code to the submission.
If you encounter difficulties creating and saving a .txt file, you can use online tools, such as
https://www.editpad.org/

- Attach your .db database file to the submission.

3.2 For each relationship (between tables)

Briefly describe the purpose of the relationship within the database:

- Explain why the relationship is created and which entities it connects
- Specify the type of relationship: 1:1, 1:N (or N:1), or N:N

3.3 For each table:

- Provide a brief description of its role in the database and what it represents.
- Explain briefly why specific columns were designated as primary and/or foreign keys.
