Data Project Proposa
Submitted By:
SELBY BRIGHT
9413019
BOAMPONG GABRIEL ADJEI
9401519
Submitted To:
DR. FRIMPONG TWUM
DEPARTMENT OF COMPUTER SCIENCE
KNUST
2. Problem Statement
What are the problems associated with the current process of data analysis
and visualization and utilizing it in making decisions?
Time-consuming: Analyzing scientific data sets can consume weeks or months of
every year. Each project, whether it involves lab experiments, field studies,
or simulation studies, can yield hundreds if not thousands of data files. Each
of these files must be opened, checked to ensure that the test, monitoring, or
simulation run proceeded correctly, and analyzed to find the result contained
in that file. The result must then be added to another file and saved for
later analysis. Doing this manually is slow, expensive, repetitive, and boring.
Automation addresses all of these problems: the same process can be performed
in minutes instead of months.
High error potential: Humans make mistakes. That’s simply part of being
human. Analyzing hundreds of test files requires thousands of calculations. It
involves creating hundreds of plots. It requires saving hundreds of data points
in the right location. Each of these actions has the potential for typos, for
incorrectly remembered constants, for files to be saved in the wrong location,
for inconsistent plot axis labels, and so on. These risks have always been part
of the process, and avoiding them requires significant care and time. Again,
automation has the potential to avoid this issue completely.
Slowing down the whole decision-making process: Because the data analysis
process is tedious and time-consuming, it can slow down a business's
decision-making, which becomes critical when a decision needs to be made
quickly. Automation can help solve this problem and free up time for other
important activities.
High expertise required: Using machine learning for predictive analysis is a
highly technical field that takes years of training and practice to master.
Companies therefore spend a lot of money on the services of experts in this
field. Automating the process can help even a beginner perform these tasks
with just a little training, and reduce the need to spend so much on an
expert.
4. Specific Objectives
To develop a user-friendly graphical interface with buttons and menus for the
software to make user interaction easier.
To allow users to import data from different sources, such as Excel sheets,
the web, and databases, and transform it into a compatible format for further
processing.
To clean and process the data with a single click by removing outliers,
filling empty fields, and extracting date and time values using Python
libraries and functions.
To offer statistical analysis capabilities such as regression analysis, hypothesis
testing, and correlation analysis to help users identify trends and patterns in
the data with a single click.
To visualize the data in the form of graphs, charts, and maps using Python
libraries like matplotlib and seaborn with a single click.
To train and test machine learning models, such as regression and clustering
models, using large amounts of training and test data.
To make predictions on any imported data using these trained machine
learning models with a single click.
To allow users to fine-tune their machine learning models using hyperparameter
tuning and cross-validation methods, in order to improve their predictive
accuracy.
To evaluate these models using techniques such as the confusion matrix,
precision, recall, and F1 score.
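To illustrate the evaluation objective above, the following is a minimal sketch of how a confusion matrix, precision, recall, and F1 score could be computed for a binary classification model using NumPy. The function name `binary_classification_report` and the sample labels are hypothetical and serve only as an illustration; the objectives do not state which model types or libraries the final application will use for evaluation.

```python
import numpy as np


def binary_classification_report(y_true, y_pred):
    """Return the confusion matrix, precision, recall, and F1 for 0/1 labels.

    Hypothetical helper for illustration; not part of the proposed application.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)

    # Count the four possible outcomes for a binary classifier.
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))

    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0

    # Rows are actual classes (0, 1); columns are predicted classes (0, 1).
    confusion = np.array([[tn, fp], [fn, tp]])
    return {
        "confusion_matrix": confusion,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Illustrative labels only; these are not results from the project.
report = binary_classification_report(
    y_true=[1, 0, 1, 1, 0, 0, 1, 0],
    y_pred=[1, 0, 0, 1, 0, 1, 1, 0],
)
print(report["confusion_matrix"])
print(report["precision"], report["recall"], report["f1"])
```

In practice a machine learning library would likely supply these metrics directly; writing them out here only makes explicit what each score measures.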
8. Project Scope
The scope of the project includes developing a desktop application for
cleaning, analyzing, and visualizing data with machine learning for predictive
analysis using Python. The application will be developed using a user-centered
design approach, which will ensure that the user interface is intuitive and easy
to use. The application will be able to load data from a variety of sources,
including CSV, Excel, and SQL databases. Additionally, the application will
provide features for data cleaning, analysis, and visualization, as well as
machine learning for predictive analysis.
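As a rough sketch of the data loading described in the scope, the example below shows one way CSV, Excel, and SQL sources could be read into a common Pandas DataFrame. The function `load_table`, its `kind` and `query` parameters, and the `readings` table are assumptions made for illustration. The demo uses an in-memory CSV string and an in-memory SQLite database so that it runs without external files; the Excel branch is shown but not exercised, since it needs an Excel engine such as openpyxl to be installed.

```python
import io
import sqlite3

import pandas as pd


def load_table(source, kind, query=None):
    """Load tabular data into a DataFrame.

    Hypothetical helper: `kind` is 'csv', 'excel', or 'sql'. For 'sql',
    `source` is an open database connection and `query` is required.
    """
    if kind == "csv":
        return pd.read_csv(source)
    if kind == "excel":
        # Requires an Excel engine (for example openpyxl) to be installed.
        return pd.read_excel(source)
    if kind == "sql":
        if query is None:
            raise ValueError("a SQL query is required when kind='sql'")
        return pd.read_sql_query(query, source)
    raise ValueError(f"unsupported source kind: {kind!r}")


# In-memory stand-ins so the sketch runs without external files.
csv_df = load_table(io.StringIO("id,value\n1,10.5\n2,20.0\n"), "csv")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER, value REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?)", [(1, 10.5), (2, 20.0)])
sql_df = load_table(
    conn, "sql", query="SELECT id, value FROM readings ORDER BY id"
)
conn.close()

print(csv_df)
print(sql_df)
```

Routing every source through a single function that returns a DataFrame means the cleaning, analysis, and visualization features can work on one data structure regardless of where the data came from.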
9. Project Methodology
Project Approach:
The project will be developed using an Agile software development
methodology. The approach will involve breaking down the project into
smaller, manageable tasks and iterating on each task until it is completed. This
approach will allow the team to prioritize tasks and make adjustments to the
project plan as needed. It will also ensure that the development process
remains flexible and responsive to feedback from stakeholders.
NumPy: NumPy is a library for Python that provides support for large,
multi-dimensional arrays and matrices. It will be used for data
manipulation and numerical operations.
Pandas: Pandas is a library for Python that provides support for data
manipulation and analysis. It will be used for data cleaning and
preparation.
PyQt: PyQt is a framework for Python that provides support for creating
desktop applications with a modern and intuitive user interface. It will
be used to create the graphical user interface for the application.
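As an illustration of the data cleaning role described for Pandas and NumPy above, the following minimal sketch fills empty fields, removes outliers, and extracts date and time components. The function `clean_readings`, the column names, the median fill strategy, and the 1.5 x IQR outlier rule are all assumptions chosen for the example; the proposal does not specify which strategies the application will use.

```python
import numpy as np
import pandas as pd


def clean_readings(df, value_col, timestamp_col):
    """Fill gaps, drop IQR outliers, and split a timestamp into parts.

    Hypothetical helper for illustration; strategies are assumptions.
    """
    out = df.copy()

    # Fill empty numeric fields with the column median (one possible strategy).
    out[value_col] = out[value_col].fillna(out[value_col].median())

    # Keep rows within 1.5 * IQR of the quartiles (a common rule of thumb).
    q1, q3 = out[value_col].quantile([0.25, 0.75])
    iqr = q3 - q1
    in_range = out[value_col].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    out = out[in_range].reset_index(drop=True)

    # Parse the timestamps and extract date and hour components.
    parsed = pd.to_datetime(out[timestamp_col])
    out["date"] = parsed.dt.date
    out["hour"] = parsed.dt.hour
    return out


# Illustrative data only: one missing value and one obvious outlier (500.0).
raw = pd.DataFrame({
    "timestamp": [
        "2023-01-01 08:00", "2023-01-01 09:00", "2023-01-01 10:00",
        "2023-01-01 11:00", "2023-01-01 12:00", "2023-01-01 13:00",
    ],
    "value": [10.0, 11.0, np.nan, 12.0, 500.0, 9.0],
})
cleaned = clean_readings(raw, "value", "timestamp")
print(cleaned)
```

In the proposed application, a function of this kind could be bound to a single button in the PyQt interface, so the user triggers the whole cleaning sequence with one click.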