Advanced Database - Akbar Rosyidi - 20220130008
Instructions to students:
• Complete this cover sheet and attach it to your assignment – this should be your first page.
Student declaration:
I declare that:
§ I understand what is meant by plagiarism
§ The implications of plagiarism have been explained to us by our lecturer
§ This project is all our work and I have acknowledged any use of the published or unpublished works of other people.
Student Name Student ID
You are required to submit a written report (softcopy) via EDLINK by 5th February
2023 at 5.00 pm.
• In this final assessment, the student is free to choose the dataset, and
generate the model based on the proposed dataset. Students can discuss
with the lecturer for the proposed dataset, to ensure the selected dataset is
sufficient for the final assessment. Students cannot choose a dataset that has
already been chosen by another student. The list of datasets can be seen here:
https://docs.google.com/document/d/1QmkfpaiCoGkHK5A5yKVDCemQtBepwR0tW2iF0qQIwFo/edit?usp=sharing
• Students will come up with a research question and explore it using the
techniques of machine learning that we have described in class and explored
in the practical activity.
The student will also need to present their idea in the following 2 submission forms:
• Executive Summary (written submission): To explain the problem, process the data,
display and analyse the data. In any case, include a brief description and describe the
analysis of your data.
Contents of the report
Analysing and Finding the Best Nvidia Graphics Processing Unit using NumPy and Pandas
I. Abstract
In modern computing, the Graphics Processing Unit (GPU) plays a crucial role
in enabling high-quality graphics and video, as well as accelerating scientific
simulations, machine learning, and other demanding tasks. It can significantly
improve the performance of a system by offloading computational tasks from the
CPU, freeing up resources for other tasks.
II. Introduction
In modern computing, the Graphics Processing Unit (GPU) plays a crucial role
in enabling high-quality graphics and video, as well as accelerating scientific
simulations, machine learning, and other demanding tasks. It can significantly
improve the performance of a system by offloading computational tasks from the
CPU, freeing up resources for other tasks.
III. Methodology
1. Data Mining
Data mining is a series of processes for extracting added value from a database, in
the form of information that was previously unknown and could not be found manually.
It has been in use since the 1990s as a way to retrieve patterns and information that
reveal relationships between data. One application is grouping data into one or more
clusters so that the objects in a cluster are highly similar to one another. Data
mining is one part of the knowledge discovery in databases (KDD) process.
2. Python
3. NumPy
Mathematical and statistical functions: NumPy provides a large collection of
mathematical and statistical functions, including linear algebra, random
number generation, and Fourier transforms.
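The three areas named above can be sketched in a few lines. This is only a minimal illustration: the matrix, the seed, and the 4 Hz test signal are made-up values chosen for the example and are not taken from the GPU data set.

```python
import numpy as np

# Linear algebra: solve the 2x2 linear system A @ x = b.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(A, b)

# Random number generation: a seeded generator gives reproducible draws.
rng = np.random.default_rng(seed=0)
samples = rng.normal(loc=0.0, scale=1.0, size=1000)

# Statistical functions: mean and standard deviation of the draws.
sample_mean = samples.mean()
sample_std = samples.std()

# Fourier transform: a 4 Hz sine wave sampled at 64 Hz for one second.
t = np.arange(64) / 64.0
signal = np.sin(2 * np.pi * 4 * t)
spectrum = np.abs(np.fft.rfft(signal))
dominant_bin = int(np.argmax(spectrum))  # index of the strongest frequency

print("solution:", x)
print("sample mean and std:", sample_mean, sample_std)
print("dominant frequency bin:", dominant_bin)
```

Because the signal lasts exactly one second, each bin of the spectrum corresponds to 1 Hz, so the strongest bin identifies the frequency of the sine wave.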
4. Pandas
Pandas is a Python library used for data analysis and data manipulation. It provides data
structures for efficiently storing large datasets and tools for working with them. The two
most important classes in Pandas are the Series and DataFrame classes, which provide
one-dimensional and two-dimensional data structures, respectively.
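A minimal sketch of the two classes, using three rows taken from the data set shown later in this report. The variable names mem_size and gpus are chosen only for this illustration.

```python
import pandas as pd

# Series: a labelled one-dimensional array (memory size per product).
mem_size = pd.Series(
    [8.0, 4.0, 4.0],
    index=["GeForce RTX 4050", "Arc A350M", "Arc A370M"],
    name="memSize",
)

# DataFrame: a two-dimensional table whose columns are Series.
gpus = pd.DataFrame(
    {
        "manufacturer": ["NVIDIA", "Intel", "Intel"],
        "productName": ["GeForce RTX 4050", "Arc A350M", "Arc A370M"],
        "memSize": [8.0, 4.0, 4.0],
    }
)

# Selecting a single column of a DataFrame gives back a Series.
column = gpus["memSize"]

print(mem_size.ndim, gpus.ndim)
print(type(column).__name__)
```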
The sample data that I am using for this study is “Full specifications of NVIDIA and
AMD graphics processing units” from
https://www.kaggle.com/datasets/alanjo/graphics-card-full-specs. The data set was
scraped from TechPowerUp (https://www.techpowerup.com/gpu-specs/), with data sourced
from the official NVIDIA, AMD and Intel websites.
After that, we load the dataset from /kaggle/input/graphics-card-full-specs/gpu_specs_v6.csv,
with the following result:
      manufacturer  productName       releaseYear  memSize  memBusWidth  gpuClock  memClock  unifiedShader  tmu  rop  pixelShader  vertexShader  igp  bus           memType  gpuChip
0     NVIDIA        GeForce RTX 4050  2023.0       8.000    128.0        1925      2250.0    3840.0         120  48   NaN          NaN           No   PCIe 4.0 x16  GDDR6    AD106
1     Intel         Arc A350M         2022.0       4.000    64.0         300       1500.0    768.0          48   24   NaN          NaN           No   PCIe 4.0 x8   GDDR6    DG2-128
2     Intel         Arc A370M         2022.0       4.000    64.0         300       1500.0    1024.0         64   32   NaN          NaN           No   PCIe 4.0 x8   GDDR6    DG2-128
3     Intel         Arc A380          2022.0       4.000    64.0         300       1500.0    1024.0         64   32   NaN          NaN           No   PCIe 4.0 x8   GDDR6    DG2-128
4     Intel         Arc A550M         2022.0       8.000    128.0        300       1500.0    2048.0         128  64   NaN          NaN           No   PCIe 4.0 x16  GDDR6    DG2-512
...   ...           ...               ...          ...      ...          ...       ...       ...            ...  ...  ...          ...           ...  ...           ...      ...
2884  3dfx          Voodoo5 5000 AGP  NaN          0.016    128.0        166       166.0     NaN            2    2    2.0          0.0           No   AGP 4x        SDR      VSA-100
2885  3dfx          Voodoo5 5000 PCI  NaN          0.016    128.0        166       166.0     NaN            2    2    2.0          0.0           No   PCI           SDR      VSA-100
2886  3dfx          Voodoo5 6000      NaN          0.032    128.0        166       166.0     NaN            2    2    2.0          0.0           No   AGP 4x        SDR      VSA-100
2887  Intel         Xe DG1            NaN          4.000    128.0        900       2133.0    640.0          40   20   NaN          NaN           No   PCIe 4.0 x8   LPDDR4X  DG1
2888  Intel         Xe DG1-SDV        NaN          8.000    128.0        900       2133.0    768.0          48   24   NaN          NaN           No   PCIe 4.0 x8   LPDDR4X  DG1
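A table like the one above is what pandas prints after reading a CSV file into a DataFrame. The sketch below is an assumption about how that step could look, not the exact code used in this report. To stay runnable without the Kaggle file, it reads three rows copied from the table out of an in-memory string. On Kaggle, the argument to pd.read_csv would instead be the file path given in the text.

```python
import io

import pandas as pd

# Three rows copied from the table above, written as CSV text.
# Empty fields become NaN, as in the original output.
SAMPLE_CSV = """manufacturer,productName,releaseYear,memSize,memBusWidth,gpuClock,memClock,unifiedShader,tmu,rop,pixelShader,vertexShader,igp,bus,memType,gpuChip
NVIDIA,GeForce RTX 4050,2023,8,128,1925,2250,3840,120,48,,,No,PCIe 4.0 x16,GDDR6,AD106
Intel,Arc A350M,2022,4,64,300,1500,768,48,24,,,No,PCIe 4.0 x8,GDDR6,DG2-128
3dfx,Voodoo5 6000,,0.032,128,166,166,,2,2,2,0,No,AGP 4x,SDR,VSA-100
"""

# read_csv accepts a file path or any file-like object.
data_frame = pd.read_csv(io.StringIO(SAMPLE_CSV))

print(data_frame)
print(data_frame.shape)
```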
With the data set loaded into a DataFrame, we classify it by ‘manufacturer’,
listing the distinct manufacturers with:
data_frame["manufacturer"].unique()
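To show what this call returns, here is a small sketch on a five-row stand-in for the manufacturer column. The rows are taken from the table in this report. The value_counts() call is an addition for illustration and is not part of the original analysis.

```python
import pandas as pd

# Small stand-in for the full data set: only the manufacturer column.
data_frame = pd.DataFrame(
    {"manufacturer": ["NVIDIA", "Intel", "Intel", "3dfx", "Intel"]}
)

# unique() returns the distinct values in order of first appearance.
manufacturers = data_frame["manufacturer"].unique()

# value_counts() additionally reports how many rows each value has.
counts = data_frame["manufacturer"].value_counts()

print(manufacturers)
print(counts)
```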
Figure 2. Output of data_frame.describe()
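As a hedged illustration of what describe() reports, the sketch below runs it on four rows and three numeric columns copied from the table in this report. The real figure covers the full data set, so its numbers will differ.

```python
import numpy as np
import pandas as pd

# Four rows from the table (RTX 4050, Arc A350M, Arc A370M, Voodoo5 6000).
data_frame = pd.DataFrame(
    {
        "gpuClock": [1925, 300, 300, 166],
        "memClock": [2250.0, 1500.0, 1500.0, 166.0],
        "unifiedShader": [3840.0, 768.0, 1024.0, np.nan],
    }
)

# describe() summarises each numeric column: count, mean, std,
# min, quartiles and max. Missing values are left out of the count.
summary = data_frame.describe()

print(summary)
```

The count row is a quick way to spot columns with missing values, such as unifiedShader for older cards.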
Using .hist(), we visualize the distribution of the data by memClock and gpuClock:
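A minimal sketch of this step, assuming matplotlib is installed (pandas uses it for plotting). The ten clock values are the rows visible in the table in this report. The bin count, figure size, and the in-memory PNG buffer are choices made for this example only.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no window is needed

import numpy as np
import pandas as pd

# Clock speeds of the ten rows visible in the table.
data_frame = pd.DataFrame(
    {
        "gpuClock": [1925, 300, 300, 300, 300, 166, 166, 166, 900, 900],
        "memClock": [2250.0, 1500.0, 1500.0, 1500.0, 1500.0,
                     166.0, 166.0, 166.0, 2133.0, 2133.0],
    }
)

# DataFrame.hist draws one histogram per selected column.
axes = data_frame[["memClock", "gpuClock"]].hist(bins=5, figsize=(8, 3))
flat_axes = np.asarray(axes).ravel()

# Render the figure to PNG bytes held in memory.
buffer = io.BytesIO()
flat_axes[0].figure.savefig(buffer, format="png")
```

In a notebook the histograms are displayed directly, so the buffer is only needed when running as a script.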
From that, we filter the data again to keep only the manufacturer ‘NVIDIA’:
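One common way to do this in pandas is a boolean mask on the manufacturer column. The sketch below is an assumption about how the step could be written, applied to four rows from the table in this report.

```python
import pandas as pd

# Four rows from the table, with a few of the columns.
data_frame = pd.DataFrame(
    {
        "manufacturer": ["NVIDIA", "Intel", "Intel", "3dfx"],
        "productName": ["GeForce RTX 4050", "Arc A350M", "Arc A380", "Voodoo5 6000"],
        "gpuClock": [1925, 300, 300, 166],
    }
)

# The comparison gives a True/False value per row.
# Indexing with that mask keeps only the rows marked True.
is_nvidia = data_frame["manufacturer"] == "NVIDIA"
nvidia = data_frame[is_nvidia].reset_index(drop=True)

print(nvidia)
```

The comparison is case-sensitive. The data set spells the manufacturer "NVIDIA", so comparing against "Nvidia" would return an empty result.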
The conclusion from the data mining above is that the best GPUs, by manufacturer,
are those from ‘NVIDIA’.
Marking Scheme: Marks will be allocated considering the following aspects and marking
rubric.
Group:
… able to propose a suitable solution with rigorous scientific justification and by considering data ethics, data privacy and social impact
… problem(s) and able to propose a suitable solution
… problem(s) but unable to propose a suitable solution
… problem to be solved
Total Marks (Max 75 points)
Total marks will be scaled down to 70.