Advanced Database - Akbar Rosyidi - 20220130008

ADVANCED DATABASE

FINAL ASSESSMENT (70%)

Instructions to students:
• Complete this cover sheet and attach it to your assignment – this should be your first page.

Student declaration:

I declare that:
§ I understand what is meant by plagiarism
§ The implications of plagiarism have been explained to me by my lecturer.
§ This project is all my own work and I have acknowledged any use of the published or unpublished works of other people.
Student Name Student ID

1. Akbar Rosyidi 20220130008


Read the description and requirements below carefully. If you have any questions, please
see your lecturer(s) for further clarification.

Final submission and milestones

You are required to submit a written report (softcopy) via EDLINK by 5th February
2023 at 5.00 pm.

Motivation of Learning Assignment

• In this final assessment, the student is free to choose the dataset, and
generate the model based on the proposed dataset. Students can discuss
with the lecturer for the proposed dataset, to ensure the selected dataset is
sufficient for the final assessment. Students cannot choose a dataset that is
already chosen by another student. The list of datasets can be seen here:
https://docs.google.com/document/d/1QmkfpaiCoGkHK5A5yKVDCemQtBepwR0tW2iF0qQIwFo/edit?usp=sharing

• Students will come up with a research question and explore it using the
techniques of machine learning that we have described in class and explored
in the practical activity.

• Students are advised to work on the tasks in three parts (Preprocessing, Machine Learning, and Data Analytics), replicating what has been done in class on the proposed dataset that you will be using for the final assessment.

Your Big Idea and Deliverables
For this final assessment, students have a lot of freedom to generate innovative ideas.

The student will also need to present their idea in the following two submission forms:

• Executive Summary (written submission): explain the problem, process the data, and display and analyse the data. In any case, include a brief description of your data and describe its analysis.

• Machine learning model.

Contents of the report

Analysing and Finding the Best Nvidia Graphics Processing Unit using NumPy and Pandas

I. Abstract

In modern computing, the Graphics Processing Unit (GPU) plays a crucial role
in enabling high-quality graphics and video, as well as accelerating scientific
simulations, machine learning, and other demanding tasks. It can significantly
improve the performance of a system by offloading computational tasks from the
CPU, freeing up resources for other tasks.

A Graphics Processing Unit (GPU) is a specialized electronic circuit designed to handle the complex calculations required to render images and video. It is designed to perform parallel computations, allowing it to perform many tasks much faster than a general-purpose Central Processing Unit (CPU). The GPU can be thought of as a highly parallel, many-core processor optimized for graphics and video processing. It plays a critical role in modern computing, enabling the creation of complex, visually-rich applications and games, as well as accelerating scientific simulations and other demanding tasks.

II. Introduction

A Graphics Processing Unit (GPU) is a dedicated processor that performs the complex mathematical calculations required to generate images and video. It is composed of hundreds or thousands of small processing cores, each of which can perform simple mathematical operations in parallel. This design allows the GPU to tackle large amounts of data much more efficiently than a traditional CPU, which is optimized for sequential processing.

The Graphics Processing Unit (GPU) operates on a separate memory space, known as VRAM, which is optimized for high-speed access and is dedicated to handling graphics data. This separate memory space helps to reduce the load on the system's main memory and enables the GPU to work more efficiently.

In modern computing, the Graphics Processing Unit (GPU) plays a crucial role in enabling high-quality graphics and video, as well as accelerating scientific simulations, machine learning, and other demanding tasks. It can significantly improve the performance of a system by offloading computational tasks from the CPU, freeing up resources for other tasks.

Additionally, the GPU is responsible for rendering images and video to the display device, handling tasks such as shading, texturing, and compositing. The GPU can perform these tasks much faster than a CPU, making it an essential component for applications such as video games, professional video editing, and scientific visualization.

Graphics Processing Units (GPUs) can be used in data mining to accelerate certain types of computation-intensive tasks. Data mining involves analyzing large datasets to uncover patterns, correlations, and other insights that can be used to inform decision-making. Some of the common data mining techniques that can benefit from GPU acceleration include:

• Deep learning: GPU-accelerated deep learning algorithms can be used to train neural networks on large datasets, which can be used for tasks such as image classification, speech recognition, and natural language processing.
• Matrix operations: GPU-accelerated matrix operations can be used to speed up tasks such as linear regression, singular value decomposition, and eigenvalue calculations, which are common in data mining.
• Parallel processing: GPUs can be used to perform large-scale parallel processing tasks, such as distributed matrix multiplication, which is commonly used in recommendation systems and other data mining applications.

In summary, Graphics Processing Units (GPUs) can be used to speed up data mining tasks by leveraging their parallel processing capabilities and specialized hardware for matrix operations. However, not all data mining tasks can be accelerated using GPUs, and it is important to evaluate the suitability of a GPU-based solution on a case-by-case basis.

III. Methodology

1. Data Mining

Data mining is a series of processes for extracting added value, in the form of previously unknown information, from a database. Data mining has existed since the 1990s as a systematic way to retrieve patterns and information and to find relationships between data, grouping them into one or more clusters so that objects in one cluster have high similarity to one another. Data mining is part of the knowledge discovery process, also known as knowledge discovery in databases (KDD).

2. Python

Python is a high-level, interpreted programming language. It was first released in 1991 and has since become one of the most popular programming languages in the world. Python is known for its simplicity, readability, and versatility, making it a great choice for beginners as well as experienced programmers.

3. NumPy

NumPy (Numerical Python) is a library in Python used for scientific computing and data analysis. It provides support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.

NumPy is built on the C programming language and provides an interface for interacting with low-level memory structures in Python. This allows NumPy to perform fast computations, even for large arrays, making it a popular choice for many scientific computing tasks.

Some of the key features of NumPy include:

• N-dimensional arrays (ndarrays): NumPy's primary data structure is the ndarray, which is used to store large arrays of homogeneous data (data of the same type, such as integers or floating-point numbers).
• Broadcasting: Broadcasting is a feature of NumPy that allows arrays of different shapes to be combined and manipulated together, without the need for explicit loops (see the sketch after this list).
• Mathematical and statistical functions: NumPy provides a large collection of mathematical and statistical functions, including linear algebra, random number generation, and Fourier transforms.
• Interoperability: NumPy can be used in conjunction with other popular libraries in the scientific Python ecosystem, such as Matplotlib (for plotting and visualization) and SciPy (for more advanced scientific computing).
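
A quick illustration of ndarrays and broadcasting (a minimal sketch; the values are made up and are not from the assignment dataset):

import numpy as np

# A 3x2 ndarray of floating-point clock speeds (MHz)
clocks = np.array([[1925.0, 2250.0],
                   [ 300.0, 1500.0],
                   [ 900.0, 2133.0]])

# Broadcasting a scalar: applied element-wise, no explicit loop
ghz = clocks / 1000.0

# Broadcasting a 1D array across each row of the 2D array
print(clocks + np.array([10.0, 20.0]))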

4. Pandas

Pandas is a Python library used for data analysis and data manipulation. It provides data
structures for efficiently storing large datasets and tools for working with them. The two
most important classes in Pandas are the Series and DataFrame classes, which provide 1-
dimensional and 2-dimensional data structures, respectively.

Some of the key features of Pandas include:

• Handling of missing data: Pandas provides methods for handling missing data in a way that is both flexible and efficient.
• Data manipulation: Pandas provides a wide range of functions and methods for manipulating data, such as grouping, merging, and reshaping (see the sketch after this list).
• Data alignment: Pandas automatically aligns data in operations, making it easy to work with data from multiple sources.
• Time series functionality: Pandas provides robust support for working with time series data, including date and time parsing and conversion, and handling of irregular time series.
• Input/Output: Pandas provides functions for reading from and writing to a variety of data formats, including CSV, Excel, and SQL.
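
A small illustration of the missing-data and manipulation features (a minimal sketch with made-up rows, not the assignment dataset):

import numpy as np
import pandas as pd

# A DataFrame with one missing value
df = pd.DataFrame({
    "manufacturer": ["NVIDIA", "AMD", "NVIDIA"],
    "gpuClock": [1925.0, np.nan, 900.0],
})

# Handling missing data: fill the gap with the column mean
df["gpuClock"] = df["gpuClock"].fillna(df["gpuClock"].mean())

# Data manipulation: group by manufacturer and average the clock
print(df.groupby("manufacturer")["gpuClock"].mean())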

IV. Results and Discussion

The sample data that I am using for this study is "Full specifications of NVIDIA and AMD graphics processing units" from https://www.kaggle.com/datasets/alanjo/graphics-card-full-specs. The dataset is scraped from TechPowerUp (https://www.techpowerup.com/gpu-specs/), sourced from the NVIDIA, AMD, and Intel official websites.

First, we import NumPy and Pandas. After that, we load the dataset /kaggle/input/graphics-card-full-specs/gpu_specs_v6.csv into a DataFrame.
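
A minimal sketch of these steps (only the file path above is taken from the report; the variable name data_frame matches the calls shown later):

import numpy as np
import pandas as pd

# Load the scraped GPU specifications into a DataFrame
data_frame = pd.read_csv("/kaggle/input/graphics-card-full-specs/gpu_specs_v6.csv")

# In a notebook cell, this displays the first and last rows (Figure 1)
data_frame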

With the following result:
manufacturer productName releaseYear memSize memBusWidth gpuClock memClock unifiedShader tmu rop pixelShader vertexShader igp bus memType gpuChip
0 NVIDIA GeForce RTX 4050 2023.0 8.000 128.0 1925 2250.0 3840.0 120 48 NaN NaN No PCIe 4.0 x16 GDDR6 AD106
1 Intel Arc A350M 2022.0 4.000 64.0 300 1500.0 768.0 48 24 NaN NaN No PCIe 4.0 x8 GDDR6 DG2-128
2 Intel Arc A370M 2022.0 4.000 64.0 300 1500.0 1024.0 64 32 NaN NaN No PCIe 4.0 x8 GDDR6 DG2-128
3 Intel Arc A380 2022.0 4.000 64.0 300 1500.0 1024.0 64 32 NaN NaN No PCIe 4.0 x8 GDDR6 DG2-128
4 Intel Arc A550M 2022.0 8.000 128.0 300 1500.0 2048.0 128 64 NaN NaN No PCIe 4.0 x16 GDDR6 DG2-512
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2884 3dfx Voodoo5 5000 AGP NaN 0.016 128.0 166 166.0 NaN 2 2 2.0 0.0 No AGP 4x SDR VSA-100
2885 3dfx Voodoo5 5000 PCI NaN 0.016 128.0 166 166.0 NaN 2 2 2.0 0.0 No PCI SDR VSA-100
2886 3dfx Voodoo5 6000 NaN 0.032 128.0 166 166.0 NaN 2 2 2.0 0.0 No AGP 4x SDR VSA-100
2887 Intel Xe DG1 NaN 4.000 128.0 900 2133.0 640.0 40 20 NaN NaN No PCIe 4.0 x8 LPDDR4X DG1
2888 Intel Xe DG1-SDV NaN 8.000 128.0 900 2133.0 768.0 48 24 NaN NaN No PCIe 4.0 x8 LPDDR4X DG1

Figure 1. GPU dataset (2,889 rows × 16 columns)

Now we classify the dataset by manufacturer, listing the unique values of the manufacturer column in the DataFrame:

data_frame["manufacturer"].unique()

with the following result:

array(['NVIDIA', 'Intel', 'AMD', 'ATI', 'Sony', 'Matrox', 'XGI', '3dfx'], dtype=object)

and the numeric columns are described as a table:
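
A sketch of the call behind Figure 2, using pandas' built-in summary method:

# Count, mean, std, min, quartiles, and max for each numeric column
data_frame.describe()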

Figure 2. Output of data_frame.describe()

Using .hist(), we visualize the distributions of the memClock and gpuClock columns.
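
A minimal sketch of this step (matplotlib is assumed for display, since pandas' .hist() draws with it; the bin count is a guess):

import matplotlib.pyplot as plt

# Histograms of memory clock and GPU clock (MHz)
data_frame[["memClock", "gpuClock"]].hist(bins=30)
plt.show()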

From that, we classify the dataset again by the manufacturer NVIDIA, like this.
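
A plausible sketch of this filter, using boolean indexing on the manufacturer column (the variable name nvidia is an assumption):

# Keep only the rows whose manufacturer is NVIDIA
nvidia = data_frame[data_frame["manufacturer"] == "NVIDIA"]
nvidia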

Figure 3. GPUs classified by the NVIDIA manufacturer

After that, we can search for the best GPU.
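
The report shows the exact function only as Figure 4; one plausible sketch, assuming "best" means the highest core clock among the NVIDIA cards (the ranking columns are assumptions):

# Hypothetical ranking: highest GPU clock first, memory size as tie-breaker
best = nvidia.sort_values(["gpuClock", "memSize"], ascending=False)
best[["productName", "releaseYear", "gpuClock", "memSize"]].head(10)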

Figure 4. Best GPU

As a conclusion from the data mining above, the best GPU by manufacturer is from NVIDIA.

Marking Scheme: Marks will be allocated considering the following aspects and marking rubric.

Group:

Criterion: Propose solutions to existing and emerging problems
• Outstanding (21-25): Develop a comprehensive and consistent plan to solve problems and recognize the consequences of solutions that articulates a reason for choosing a solution.
• Mastering (16-20): Develop a feasible and consistent plan to solve problems and recognize the consequences of solutions that articulates a reason for choosing a solution.
• Developing (10-15): Develop a feasible plan to solve problems and recognize some consequences of solutions that articulates a reason for choosing a solution.
• Beginning (0-9): Develop a plan to solve problems and recognize a few consequences of solutions that articulates a reason for choosing a solution.

Criterion: Implement a solution
• Outstanding (21-25): Implement a solution and monitor the process in a manner that addresses, thoroughly and in depth, multiple contextual factors.
• Mastering (16-20): Implement a solution and monitor the process in a manner that addresses multiple contextual factors.
• Developing (10-15): Implement a solution and monitor the process in a manner that addresses limited contextual factors.
• Beginning (0-9): Implement a solution and monitor the process in a superficial manner that does not directly address contextual factors.

Criterion: Demonstrate problem-solving ability
• Outstanding (21-25): Able to understand the case study and related problem(s) and able to propose a suitable solution with rigorous scientific justification, considering data ethics, data privacy, and social impact.
• Mastering (16-20): Able to understand the case study and related problem(s) and able to propose a suitable solution.
• Developing (10-15): Able to understand the case study and related problem(s) but unable to propose a suitable solution.
• Beginning (0-9): Unable to understand the given case study and the problem to be solved.

Total Marks (Max 75 points): total marks will be scaled down to 70.
