
FINAL YEAR PROJECT PROPOSAL

Submitted By:
SELBY BRIGHT
9413019
BOAMPONG GABRIEL ADJEI
9401519

Submitted To:
DR. FRIMPONG TWUM
DEPARTMENT OF COMPUTER SCIENCE
KNUST

PROJECT TOPIC: Developing a Desktop Application for Cleaning, Analyzing and
Visualizing Data including Machine Learning for Predictive Analysis using Python

1. Introduction and Background
Data is everywhere: in spreadsheets, sales pipelines, social media
platforms, customer satisfaction surveys, customer support tickets, and more.
In our modern information age, it is created at blinding speed and, when data
is analysed correctly, it can be a company's most valuable asset. Businesses
need to know what their customers need so that they can increase customer
retention and attract new customers. But to know exactly what customers
need and what their pain points are, businesses need to dig deep into their
customer data. Product teams, for example, often analyse customer feedback
to understand how customers interact with their product, what frustrates
them, and which new features they would like to see. They then
translate these insights into UX improvements, new features, and enhanced
functionality. In short, through data analysis and machine learning,
businesses can uncover insights that show where to focus their efforts in
order to grow. Through the same process, they can also detect the strengths
and weaknesses of their competitors, uncovering opportunities
for improvement.
The amount of data available today is enormous, and it is growing every day.
The use of data in decision-making has become increasingly important in both
business and research. However, the process of cleaning, analyzing, and
visualizing data can be time-consuming and complex. There is therefore a
need for a tool that automates and simplifies data cleaning, analysis, and
visualization, and that uses machine learning for predictive analysis to help
businesses and researchers make better decisions based on data.

2. Problem Statement
What are the problems associated with the current process of data analysis
and visualization and utilizing it in making decisions?
Time Consuming: Analyzing scientific data sets can consume weeks or even
months of every year. Each project, whether it involves lab experiments, field
studies, or simulation studies, can yield hundreds if not thousands of data
files. Each of these files must be opened, studied to ensure that the
test/monitoring/simulation proceeded correctly, and analyzed to find the
result it contains. That result must then be added to another file and
saved for later analysis. Doing this manually takes a lot of time. It's
expensive. It's repetitive and boring. Automation solves all of these problems:
the same process can be performed in minutes instead of months.
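To make the batch workflow described above concrete, here is a minimal sketch of how the open-check-analyze-collect loop could be automated with pandas. It assumes that each data file is a CSV containing one numeric column of interest; the function name `summarise_folder` and the folder and column names used below are hypothetical.

```python
from pathlib import Path

import pandas as pd


def summarise_folder(folder: str, column: str) -> pd.DataFrame:
    """Open every CSV file in `folder`, compute one summary row per file,
    and collect all rows into a single results table."""
    rows = []
    for path in sorted(Path(folder).glob("*.csv")):
        data = pd.read_csv(path)
        # Sanity check: stop early if a file lacks the expected column.
        if column not in data.columns:
            raise ValueError(f"{path.name} has no column {column!r}")
        rows.append({
            "file": path.name,
            "n_rows": len(data),
            "mean": data[column].mean(),
            "max": data[column].max(),
        })
    return pd.DataFrame(rows, columns=["file", "n_rows", "mean", "max"])
```

For example, `summarise_folder("experiment_data", "temperature").to_csv("summary.csv", index=False)` would write one summary row per input file into a single results file (the folder and column names are placeholders).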

High error potential: Humans make mistakes. That’s simply part of being
human. Analyzing hundreds of test files requires thousands of calculations. It
involves creating hundreds of plots. It requires saving hundreds of data points
in the right location. Each of these actions has the potential for typos, for
incorrectly remembered constants, for files to be saved in the wrong location,
for inconsistent plot axis labels, and so on. This has always been part of the
process, and requires both significant amounts of care and time to avoid.
Again, automation has the potential to avoid this issue completely.
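One way automation removes inconsistencies such as mismatched axis labels is to generate every plot through a single function. The sketch below assumes Matplotlib (one of the development tools listed in this proposal); the label constants and the name `plot_series` are hypothetical, and this is an illustration rather than the application's final plotting code.

```python
import matplotlib

matplotlib.use("Agg")  # render to image files without opening a window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical axis labels, defined once so that every generated plot agrees.
X_LABEL = "Time (s)"
Y_LABEL = "Temperature (°C)"


def plot_series(data: pd.DataFrame, x: str, y: str, out_path: str):
    """Save a line plot of column `y` against column `x` using the shared labels."""
    fig, ax = plt.subplots()
    ax.plot(data[x], data[y])
    ax.set_xlabel(X_LABEL)
    ax.set_ylabel(Y_LABEL)
    fig.savefig(out_path)
    plt.close(fig)  # release memory when plotting hundreds of files in a loop
    return ax
```

Because the labels live in one place, correcting a unit means editing a single line instead of hundreds of plots.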

Slowing down the whole decision-making process: Since data analysis is
tedious and time-consuming, it can slow down a business's decision-making
process, which can be critical when a decision needs to be made quickly.
Automation can help solve this problem and free up time for other important
activities.

High expertise required: Using machine learning for predictive analysis is a
highly technical task that takes years of training and practice to master.
Companies therefore spend a lot of money on the services of experts in this
field. Automating this process can help even a beginner perform these tasks
with just a little training and reduce the need to spend so much on experts.

3. Aim of the Project
The aim of this project is to develop a desktop application that simplifies the
process of cleaning, analyzing, and visualizing data while also incorporating
machine learning for predictive analysis using Python, thereby addressing the
problems stated above.

4. Specific Objectives
• To develop a user-friendly graphical interface with buttons and menus that
makes user interaction with the software easier.
• To allow users to import data from different sources, such as Excel
spreadsheets, the web, and databases, and transform it into a compatible
format for further processing.
• To clean and process the data with a single click, using Python libraries and
functions to remove outliers, fill empty fields, and extract dates and times.
• To offer statistical analysis capabilities such as regression analysis,
hypothesis testing, and correlation analysis, so that users can identify trends
and patterns in the data with a single click.
• To visualize the data as graphs, charts, and maps with a single click, using
Python libraries such as Matplotlib and Seaborn.
• To train and test machine learning models (regression, classification, and
clustering) using large amounts of training and test data.
• To make predictions on any imported data with a single click, using these
trained machine learning models.
• To allow users to fine-tune their machine learning models using
hyperparameter tuning and cross-validation in order to improve their
predictive accuracy.
• To evaluate these models using techniques such as the confusion matrix,
precision, recall, and F1 score.
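The cleaning objective (removing outliers, filling empty fields, extracting dates and times) could be implemented along the lines of the minimal pandas sketch below. The specific rules are assumptions rather than decisions already made in this proposal: empty numeric fields are filled with the column median, and a row counts as an outlier if any numeric value lies more than 1.5 times the interquartile range (IQR) outside the quartiles. The names `clean` and `date_column` are hypothetical.

```python
import pandas as pd


def clean(data: pd.DataFrame, date_column: str) -> pd.DataFrame:
    """Fill empty numeric fields, drop IQR outliers, and split out date parts."""
    df = data.copy()
    numeric = df.select_dtypes(include="number").columns

    # Fill empty numeric fields with each column's median.
    df[numeric] = df[numeric].fillna(df[numeric].median())

    # Keep only rows whose numeric values all lie within 1.5 * IQR of the quartiles.
    q1 = df[numeric].quantile(0.25)
    q3 = df[numeric].quantile(0.75)
    iqr = q3 - q1
    inside = ((df[numeric] >= q1 - 1.5 * iqr) & (df[numeric] <= q3 + 1.5 * iqr)).all(axis=1)
    df = df.loc[inside].copy()

    # Parse the date column and extract its components into new columns.
    dates = pd.to_datetime(df[date_column])
    df["year"] = dates.dt.year
    df["month"] = dates.dt.month
    df["hour"] = dates.dt.hour
    return df.reset_index(drop=True)
```

The median is used for filling because, unlike the mean, it is not pulled towards the very outliers that the next step removes; a finished application would let the user choose both rules.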

6. Justification for Project
The project is justified by the need for a tool that simplifies the process of
cleaning, analyzing, and visualizing data while also incorporating machine
learning for predictive analysis. The desktop application will be useful for
businesses and researchers who need to make data-driven decisions but do
not have a background in data science. Additionally, the application will be
useful for those who have a background in data science but need a tool that
simplifies the process of data cleaning, analysis, and visualization.

7. Motivation for Project
The motivation for this project is to provide a tool that simplifies the process
of data cleaning, analysis, and visualization while also incorporating machine
learning for predictive analysis. The application will make it easier for
businesses and researchers to make data-driven decisions, which can lead to
better outcomes and improved performance.

8. Project Scope
The scope of the project includes developing a desktop application for
cleaning, analyzing, and visualizing data with machine learning for predictive
analysis using Python. The application will be developed using a user-centered
design approach, which will ensure that the user interface is intuitive and easy
to use. The application will be able to load data from a variety of sources,
including CSV, Excel, and SQL databases. Additionally, the application will
provide features for data cleaning, analysis, and visualization, as well as
machine learning for predictive analysis.
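Loading from the three source types named above could be handled by a single dispatcher such as the sketch below, which picks a pandas reader from the file extension. It assumes that SQL data comes from an SQLite file (the database listed under Development Tools) and that the user supplies a table name; `load_data` is a hypothetical name, and reading Excel files additionally requires an engine such as openpyxl to be installed.

```python
import sqlite3
from contextlib import closing
from pathlib import Path
from typing import Optional

import pandas as pd


def load_data(source: str, table: Optional[str] = None) -> pd.DataFrame:
    """Load a CSV file, an Excel workbook, or one table of an SQLite database."""
    suffix = Path(source).suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(source)
    if suffix in (".xlsx", ".xls"):
        # pandas delegates Excel parsing to an engine such as openpyxl.
        return pd.read_excel(source)
    if suffix in (".db", ".sqlite", ".sqlite3"):
        if table is None:
            raise ValueError("a table name is required for database sources")
        # closing() guarantees the connection is closed after the query.
        with closing(sqlite3.connect(source)) as conn:
            # Table names cannot be bound as SQL parameters, so the name is quoted
            # here; a real application should validate it against the schema first.
            return pd.read_sql_query(f'SELECT * FROM "{table}"', conn)
    raise ValueError(f"unsupported source type: {source}")
```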

9. Project Methodology
Project Approach:
The project will be developed using an Agile software development
methodology. The approach will involve breaking down the project into
smaller, manageable tasks and iterating on each task until it is completed. This
approach will allow the team to prioritize tasks and make adjustments to the
project plan as needed. It will also ensure that the development process
remains flexible and responsive to feedback from stakeholders.

• Define the requirements: Define the requirements for the application by
understanding the target audience, the intended use cases, and the business
objectives. Identify the specific features and functionalities required for the
application.
• Choose the technology stack: Choose the appropriate technology stack for
developing the application based on the requirements. Some popular
technology stacks for desktop applications include Electron, PyQt, and Tkinter.
• Develop the user interface: Develop the user interface of the application based
on the requirements. The user interface should be intuitive, easy to use, and
visually appealing.
• Implement data cleaning, analysis, and visualization features: Implement the
features for cleaning, analyzing, and visualizing data based on the
requirements. The features should allow users to load, clean, and visualize
data in a variety of formats, such as CSV, Excel, and SQL.
• Implement machine learning features: Implement the machine learning
features based on the requirements. The features should allow users to train
and evaluate machine learning models for predictive modeling. Users should
be able to select and configure various algorithms, hyperparameters, and
validation methods.
• Test and debug the application: Test and debug the application to ensure that
it works correctly and is free of bugs and errors. Conduct user testing to get
feedback and identify areas for improvement.
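The machine learning steps above (selecting an algorithm, tuning its hyperparameters with a validation method, then evaluating it) map naturally onto scikit-learn. The sketch below is one possible shape under stated assumptions: a classification task, a random forest, a small hypothetical search grid, 5-fold cross-validation through `GridSearchCV`, and the evaluation metrics named in the objectives. The bundled Iris dataset stands in for user data, and `tune_and_evaluate` is a hypothetical name.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score
from sklearn.model_selection import GridSearchCV, train_test_split


def tune_and_evaluate(X, y, seed: int = 0) -> dict:
    """Tune a random forest with 5-fold cross-validation, then score it on held-out data."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )
    # Hypothetical default search space; the application would let users edit it.
    param_grid = {"n_estimators": [25, 50], "max_depth": [None, 3]}
    search = GridSearchCV(RandomForestClassifier(random_state=seed), param_grid, cv=5)
    search.fit(X_train, y_train)  # refits the best combination on all training data

    predictions = search.predict(X_test)
    return {
        "best_params": search.best_params_,
        "confusion_matrix": confusion_matrix(y_test, predictions),
        # Macro averaging weights every class equally in multi-class problems.
        "precision": precision_score(y_test, predictions, average="macro"),
        "recall": recall_score(y_test, predictions, average="macro"),
        "f1": f1_score(y_test, predictions, average="macro"),
    }


X, y = load_iris(return_X_y=True)
report = tune_and_evaluate(X, y)
```

In the application, the estimator and `param_grid` would come from the user's menu choices rather than being fixed in code.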
Development Tools:

• Python: Python will be used as the primary programming language for
the development of the application. Python is a versatile and powerful
language that is widely used in data science and machine learning.

• NumPy: NumPy is a library for Python that provides support for large,
multi-dimensional arrays and matrices. It will be used for data
manipulation and numerical operations.

• Pandas: Pandas is a library for Python that provides support for data
manipulation and analysis. It will be used for data cleaning and
preparation.

• Matplotlib: Matplotlib is a library for Python that provides support for
data visualization. It will be used to create visualizations for data
exploration and analysis.

• scikit-learn: scikit-learn is a library for Python that provides support for
machine learning algorithms. It will be used to develop machine learning
models for predictive analysis.

• PyQt: PyQt is a set of Python bindings for the Qt application framework
that provides support for creating desktop applications with a modern and
intuitive user interface. It will be used to create the graphical user
interface for the application.

• SQLite: SQLite is a lightweight database management system that will be
used to store and retrieve data for the application. It is a popular choice
for small to medium-sized applications and will be used to store data
used for analysis and modeling.
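As an illustration of the SQLite role described above, cleaned datasets could be stored and reloaded with pandas' built-in SQL support, as in this minimal sketch. The function names and the choice to overwrite an existing table (`if_exists="replace"`) are assumptions for illustration.

```python
import sqlite3
from contextlib import closing

import pandas as pd


def save_dataset(df: pd.DataFrame, db_path: str, name: str) -> None:
    """Store a dataset as a table, replacing any earlier version of it."""
    with closing(sqlite3.connect(db_path)) as conn:
        df.to_sql(name, conn, if_exists="replace", index=False)


def load_dataset(db_path: str, name: str) -> pd.DataFrame:
    """Read a previously stored dataset back into a DataFrame."""
    with closing(sqlite3.connect(db_path)) as conn:
        # The table name is quoted rather than bound, since SQL parameters
        # cannot stand in for identifiers.
        return pd.read_sql_query(f'SELECT * FROM "{name}"', conn)
```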

10. Project Deliverables and Output
• User-friendly graphical interface for loading, cleaning, and visualizing data.
• Statistical analysis and data visualization features.
• Machine learning algorithms for regression, classification, and clustering.
• Hyperparameter tuning and cross-validation methods.
• Features for model training and evaluation.
• Features for prediction and forecasting.
• Documentation and user guide for the application.
• Technical report documenting the development process, methodology, and
project outcomes.
• Source code and executable file for the application.
The deliverables and output of the project will provide a desktop application that
simplifies the process of cleaning, analyzing, and visualizing data while also
incorporating machine learning for predictive analysis. The application will be
intuitive, user-friendly, and will provide features that allow businesses and
researchers to make data-driven decisions quickly and efficiently. Additionally, the
project outcomes will be documented in a technical report that can be used for
future reference and to inform future development of the application.
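The statistical analysis deliverable (regression, hypothesis testing, and correlation, as listed in the objectives) could be backed by routines like those in the sketch below. Note that it assumes SciPy, which is not among the development tools listed in this proposal; `quick_statistics` is a hypothetical name.

```python
from scipy import stats


def quick_statistics(x, y, group_a, group_b) -> dict:
    """Correlation and simple linear regression of y on x, plus a two-sample t-test."""
    r, r_p_value = stats.pearsonr(x, y)
    fit = stats.linregress(x, y)  # least-squares line y = slope * x + intercept
    # Student's t-test (equal variances assumed) comparing the two group means.
    t, t_p_value = stats.ttest_ind(group_a, group_b)
    return {
        "pearson_r": float(r),
        "pearson_p_value": float(r_p_value),
        "slope": float(fit.slope),
        "intercept": float(fit.intercept),
        "t_statistic": float(t),
        "t_p_value": float(t_p_value),
    }
```

A "single click" in the interface would then amount to calling such a function on the columns the user has selected and displaying the returned values.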
