
COMP2304 Computer Architecture and Organization

Comparison between the INTEL Core I7-4790K Processor
and the AMD Vishera FX-9590 Processor

By:

Dwi Putri Wahyuningsih

2019390004

Submission Date: March 27, 2020

Faculty of Engineering & Technology

Sampoerna University
I. Introduction
The processor is a chip in the form of an Integrated Circuit (IC) that controls the
entire computer system and is used as the center or brain of computer activities in
performing calculations and carrying out input and output tasks. The processor is located
on the socket on the motherboard. The processor can be replaced with another type as long
as the socket on the motherboard matches and the system on the motherboard supports the
processor architecture. Processor speed significantly affects overall computer speed because
this single component is the center of data processing. Today's processors are often called
microprocessors because of their very small physical size despite their high processing speed.
The processor is divided into three important parts, namely:
• Arithmetic Logical Unit (ALU)
The Arithmetic Logical Unit (ALU) is the center for calculating arithmetic and
logic operations to carry out all the commands that must be carried out by a computer.
• Control Unit (CU)
Control Unit (CU) is the part that regulates all data traffic and calculations
performed by the processor. With the control of this unit, all calculations and executions
that must be carried out can be carried out sequentially without any overlap between one
command and another.
• Memory Unit (MU)
The Memory Unit (MU) is a supporting unit in which the commands that are used most often
by the processor are stored temporarily. With the memory unit, the processor no longer has
to fetch the same command again from another part of the system, so the time needed to
execute commands is shortened. In modern processors, the memory unit is built into the
processor core and is known as the cache memory. This is what affects execution
performance.
Each part of the processor has its own duties, as its name suggests. Processors are
currently sold under many brands, but only two are well known in Indonesia, namely INTEL
and AMD.
1) INTEL Corporation
INTEL Corporation is a multinational company based in the US and is well known
for the design and production of microprocessors and specializes in integrated circuits.
INTEL also makes network cards, motherboard chipsets, components, and other devices.
INTEL has advanced research projects in all aspects of semiconductor production,
including MEMS. INTEL changed its logo and slogan in January 2006, when the old slogan
"INTEL inside" was replaced with "INTEL Leap Ahead". Some INTEL processor products
available to date include:
1. INTEL® Pentium® 4
2. INTEL® Pentium® Dual-Core
3. INTEL® Core ™ 2 Duo
4. INTEL® Core ™ 2 Quad
5. INTEL® Core ™ 2 Extreme
6. INTEL® Core ™ i3, i5, i7

2) AMD (Advanced Micro Devices)


AMD (Advanced Micro Devices) is a manufacturer of integrated circuits (ICs) and processors
based in Sunnyvale, California, USA. Its first plant is in Austin, Texas, USA, and its
second plant, in Dresden, Germany, was set up to produce only Athlon processors. If all
goes well, the dream of cheaper PC systems can be realized because the market is no longer
monopolized by INTEL. In 2006, AMD also acquired the well-known Canadian graphics company
ATI Technologies. AMD is the second-largest provider of x86-compatible processors.
Some of its well-known products are:
1. AMD Sempron ™
2. AMD Athlon ™ FX
3. AMD Athlon ™ 64
4. AMD Athlon ™ X2
5. AMD PHENOM ™ X3
6. AMD PHENOM ™ X4
7. AMD Bulldozer

II. Comparison

A. INTEL Core I7-4790K

o The components
The Intel Core i7-4790K is a desktop processor with 4 cores, launched in May 2014.
It is part of the Core i7 lineup and uses the Haswell architecture with Socket 1150. Thanks
to Intel Hyper-Threading, the number of threads is effectively doubled, to 8 threads. The
Core i7-4790K has 8 MB of L3 cache and operates at 4 GHz by default, but can boost up to
4.4 GHz depending on the workload. Intel builds the Core i7-4790K on a 22 nm production
process using 1,400 million transistors. The multiplier is unlocked on the Core i7-4790K,
which simplifies overclocking.
With a TDP of 88 W, the Core i7-4790K consumes a lot of power, so proper cooling is
needed. The processor supports DDR3 memory with a dual-channel interface. For
communication with other components in the computer, the Core i7-4790K uses a
PCI-Express Gen 3 connection. This processor features the integrated Intel HD 4600
graphics solution.
Hardware virtualization is available on Core i7-4790K, which greatly improves
virtual machine performance. In addition, IOMMU virtualization (PCI passthrough) is
supported, so that virtual guest machines can directly use host hardware. Programs that use
Advanced Vector Extensions (AVX) will run on this processor, increasing performance for
many calculation applications. Besides AVX, Intel also includes the newer AVX2
standard, but not the AVX-512.

o Registers: types, number, size


The Intel Core i7-4790K processor has 16 general-purpose registers in 64-bit mode, and the
default operand size is 32 bits. However, the general-purpose registers can work with
32-bit or 64-bit operands. If a 32-bit operand size is specified, EAX, EBX, ECX, EDX, EDI,
ESI, EBP, ESP, and R8D-R15D are available. If a 64-bit operand size is specified, RAX, RBX,
RCX, RDX, RDI, RSI, RBP, RSP, and R8-R15 are available. R8D-R15D / R8-R15 represent the
eight new general-purpose registers. All of these registers can be accessed at the byte,
word, dword, and qword levels. The REX prefix is used to specify a 64-bit operand size or
to refer to registers R8-R15.
Registers only available in 64-bit mode (R8-R15 and XMM8-XMM15) are
maintained throughout the transition from 64-bit mode to compatibility mode then
back to 64-bit mode. However, the R8-R15 and XMM8-XMM15 values are
undefined after the transition from 64-bit mode through compatibility mode to
legacy or real mode and then back through compatibility mode to 64-bit mode.
• EAX, AX, AH, AL: the Accumulator register. It is used for I/O port access, arithmetic,
interrupt calls, etc.
• EBX, BX, BH, BL: the Base register. It is used as a base pointer for memory access and
receives some interrupt return values.
• ECX, CX, CH, CL: the Counter register. It is used as a loop counter and for shifts, and
receives some interrupt values.
• EDX, DX, DH, DL: the Data register. It is used for I/O port access, arithmetic, and some
interrupt calls.
In 64-bit mode, there are restrictions on accessing byte registers. An instruction cannot
reference a legacy high-byte register (for example AH, BH, CH, or DH) and one of the new byte
registers (for example the low byte of the RAX register) at the same time. However, an
instruction can reference legacy low-byte registers (for example AL, BL, CL, or DL) and new
byte registers (for example the low byte of R8 or RBP) at the same time. The architecture
enforces this limitation by changing high-byte references (AH, BH, CH, DH) to low-byte
references (BPL, SPL, DIL, SIL: the low 8 bits of RBP, RSP, RDI, and RSI) for instructions
that use a REX prefix.
When in 64-bit mode, the size of the operand determines the number of valid bits in the
destination general-purpose register:
• 64-bit operands produce a 64-bit result in the destination general-purpose register.
• 32-bit operands produce a 32-bit result, zero-extended to a 64-bit result in the
destination general-purpose register.
• 8-bit and 16-bit operands produce an 8-bit or 16-bit result. The upper 56 bits or 48 bits
(respectively) of the destination general-purpose register are not modified by the
operation. If the result of an 8-bit or 16-bit operation is intended for 64-bit address
calculation, explicitly sign-extend the register to the full 64 bits.
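The operand-size rules above can be illustrated with a small simulation. This is only a sketch in Python, not processor code: the helper names `write_gpr` and `sign_extend` are hypothetical, and the 8-bit case models a low-byte register such as AL, not a high-byte register such as AH.

```python
MASK64 = (1 << 64) - 1


def write_gpr(old_value: int, result: int, operand_bits: int) -> int:
    """Return the 64-bit register content after writing `result`
    with the given operand size (8, 16, 32 or 64 bits)."""
    if operand_bits not in (8, 16, 32, 64):
        raise ValueError("operand size must be 8, 16, 32 or 64 bits")
    mask = (1 << operand_bits) - 1
    low = result & mask
    if operand_bits >= 32:
        # 64-bit results fill the register; 32-bit results are zero-extended.
        return low
    # 8-bit and 16-bit results leave the upper 56 or 48 bits untouched.
    return (old_value & ~mask & MASK64) | low


def sign_extend(value: int, from_bits: int) -> int:
    """Sign-extend the low `from_bits` bits of `value` to 64 bits."""
    sign_bit = 1 << (from_bits - 1)
    low = value & ((1 << from_bits) - 1)
    return ((low ^ sign_bit) - sign_bit) & MASK64


rax = 0x1122334455667788
print(hex(write_gpr(rax, 0xAABBCCDD, 32)))  # upper 32 bits become zero
print(hex(write_gpr(rax, 0xABCD, 16)))      # upper 48 bits are kept
print(hex(write_gpr(rax, 0xEF, 8)))         # upper 56 bits are kept
print(hex(sign_extend(0xFFFE, 16)))         # 16-bit -2 widened to 64 bits
```

The last call shows why an explicit sign extension is needed before a 16-bit result is used in a 64-bit address calculation: without it, the stale upper bits of the register would become part of the address.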
Because the upper 32 bits of the 64-bit general-purpose registers are undefined in 32-bit
modes, the upper 32 bits of any general-purpose register are not preserved when switching from
64-bit mode to a 32-bit mode (protected mode or compatibility mode). Software must not depend
on these bits to retain a value after switching from 64-bit mode to a 32-bit mode.

o Buses
The bus is a subsystem that transfers data between computer components or between
computers. Existing types include the front-side bus (FSB), which carries data between the
CPU and the memory controller hub; the Direct Media Interface (DMI), which is a point-to-point
interconnection between Intel's integrated memory controller and Intel's I/O controller hub
on the computer's motherboard; and the QuickPath Interconnect (QPI), which is a point-to-point
interconnection between the CPU and the integrated memory controller.
The Intel Core i7-4790K uses the DMI bus, and the bus speed is 5 GT/s. This figure refers to
the raw data rate, that is, the number of bits per second that the bus can move. The encoding
process reduces the rate of useful data transferred over the bus to 80% of the bus's raw speed.
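The effect of the 80% encoding efficiency can be worked out directly. The sketch below is illustrative only: the function name is hypothetical, and the lane count of 4 is an assumption about the link width that the text above does not state.

```python
def useful_rate_gbit_s(raw_gt_per_s: float, efficiency: float = 0.8) -> float:
    """Useful data rate of one lane in Gbit/s, given the raw transfer rate
    in GT/s and the fraction of transferred bits that carry payload."""
    return raw_gt_per_s * efficiency


RAW_RATE_GT_S = 5.0   # bus speed quoted in the text
ASSUMED_LANES = 4     # assumption: link width, not given in the text

per_lane_gbit = useful_rate_gbit_s(RAW_RATE_GT_S)
per_lane_gbyte = per_lane_gbit / 8
link_gbyte = per_lane_gbyte * ASSUMED_LANES

print(f"Useful rate per lane: {per_lane_gbit:.1f} Gbit/s ({per_lane_gbyte:.2f} GB/s)")
print(f"Useful rate for {ASSUMED_LANES} lanes: {link_gbyte:.1f} GB/s per direction")
```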

o Cache (level, size, mapping)


A processor cache is an area of fast memory located on the processor. Intel® Smart Cache
refers to an architecture that allows all cores to dynamically share access to the last-level cache.
The Intel Core i7-4790K processor has 8 MB of Intel Smart Cache (Level 3 cache) and an
Integrated Memory Controller (IMC) that officially supports dual-channel DDR3 memory at
1333/1600 MHz, although it can run much faster: memory partners and motherboard vendors
already advertise DDR3 speeds of nearly 3 GHz. For performance enthusiasts, the K model is
the most interesting because it is easy to overclock.
Looking a little deeper into the core, the first question is: what about the Level 1 and
Level 2 caches? Each core of the Intel Core i7-4790K has a 32 KB Level 1 data cache and a
32 KB Level 1 instruction cache (64 KB of Level 1 cache in total), plus a 256 KB Level 2 cache.
Load bandwidth, however, has doubled from 32 to 64 bytes per cycle and store bandwidth from
16 to 32 bytes per cycle. So the Level 1 data cache in the Intel Core i7-4790K has the same size
and latency as in the previous generation, but more bandwidth. Then there is the Level 3 cache,
shared among the CPU cores and totaling 8 MB for the Intel Core i7-4790K. The Level 3 cache is
where the magic happens: it is divided into segments on the die that are connected by a ring
bus. Thus, the Level 3 cache can be used by both the processor cores and the graphics core.

Intel Core i7-4790K Level 1 Cache
• 256 KB in total
• 8-way set associative
• Write-back
• TLB access and cache tag lookup can occur in parallel. A Translation Lookaside Buffer
(TLB) is required only if the processor uses virtual memory. In short, the TLB speeds up
the translation of virtual addresses to physical addresses by keeping page-table entries
in faster memory.
• Does not suffer from bank conflicts
• Minimum latency: 4 cycles
• Minimum lock latency: 12 cycles
Intel Core i7-4790K Level 2 Cache
• 1 MB in total
• Bandwidth doubled
• Can deliver a 64-byte line to the data or instruction cache every cycle
• 11-cycle latency
• 256 KB for each core
Intel Core i7-4790K Level 3 Cache
• Shared between all cores
• Size varies between 6 MB and 15 MB depending on model and generation
• Most Haswell models have an 8 MB cache
• Size reduced for power efficiency
• L2 cache per core: 0.25 MB/core
• L3 cache per core: 2 MB/core
Direct-mapped caches are never used in modern high-performance CPUs. Their power savings
are outweighed by the large gain in hit rate that a set-associative cache of the same size
provides with only a little more complexity in the control logic. Transistor budgets are very
large today.
It is very common for software to have at least a few arrays whose addresses are multiples of
4 KB apart from each other, which would cause many conflict misses in a direct-mapped cache.
(Tuning code that uses more than a few arrays can involve skewing them to reduce conflict
misses if a loop needs to iterate over all of them at once.)
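The difference between the two mappings can be seen in a toy simulation. The sketch below assumes a 32 KB cache with 64-byte lines and LRU replacement, and four arrays whose start addresses are 32 KB apart (a multiple of 4 KB chosen so that they collide in the direct-mapped case). The function names and the access pattern are illustrative and are not measurements of either processor.

```python
from collections import OrderedDict

LINE_SIZE = 64  # bytes per cache line


def set_index(address: int, cache_bytes: int, ways: int) -> int:
    """Set selected by an address: the line number modulo the number of sets."""
    num_sets = cache_bytes // (LINE_SIZE * ways)
    return (address // LINE_SIZE) % num_sets


def count_misses(addresses, cache_bytes: int, ways: int) -> int:
    """Count misses for an access trace, using LRU replacement in each set."""
    sets = {}
    misses = 0
    for address in addresses:
        line = address // LINE_SIZE
        lines = sets.setdefault(set_index(address, cache_bytes, ways), OrderedDict())
        if line in lines:
            lines.move_to_end(line)        # hit: mark as most recently used
        else:
            misses += 1
            lines[line] = True
            if len(lines) > ways:
                lines.popitem(last=False)  # evict the least recently used line
    return misses


CACHE_BYTES = 32 * 1024
array_bases = [i * CACHE_BYTES for i in range(4)]  # four arrays, 32 KB apart
trace = array_bases * 10                           # a loop touching all four, 10 times

print("direct-mapped misses:", count_misses(trace, CACHE_BYTES, ways=1))
print("8-way misses:        ", count_misses(trace, CACHE_BYTES, ways=8))
```

In the direct-mapped case, all four addresses compete for the same single line, so every access evicts the data the next iteration needs. With 8 ways, the four lines coexist in one set and only the first touch of each misses.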
Modern CPUs are so fast that DRAM latency is more than 200 core clock cycles, which is
too long for even powerful out-of-order execution to hide well on a cache miss.
Multi-level caches are essential (and are used in all high-performance CPUs) to provide low
latency (~4 cycles) and high throughput for the hottest data, while still being large enough
overall to hold a typical working set. It is physically impossible to build one very large,
very fast, highly associative cache that performs as well as current multi-level caches for
typical workloads; speed-of-light delays when data must physically travel far are a problem.
The power cost would be prohibitive too.
All cache levels (except the micro-op cache) are physically indexed and physically tagged on
virtually all x86 CPUs. The L1D cache in most designs takes its index bits from below the
page offset, so it is also virtually indexed, physically tagged (VIPT): the TLB lookup can
occur in parallel with the tag fetch, but without aliasing problems. Thus, the caches do not
need to be flushed on a context switch.
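Whether the index bits fit below the page offset can be checked with simple arithmetic. The sketch below assumes 4 KB pages and 64-byte lines; the two example addresses are made up and only share their low 12 bits, as a virtual address and its physical translation always do.

```python
PAGE_SIZE = 4096  # assumption: 4 KB pages
LINE_SIZE = 64    # bytes per cache line


def index_fits_in_page_offset(cache_bytes: int, ways: int) -> bool:
    """True when the set index uses only address bits inside the page offset,
    which holds when one way of the cache is no larger than a page."""
    return cache_bytes // ways <= PAGE_SIZE


def set_index(address: int, cache_bytes: int, ways: int) -> int:
    """Set selected by an address: the line number modulo the number of sets."""
    num_sets = cache_bytes // (LINE_SIZE * ways)
    return (address // LINE_SIZE) % num_sets


# 32 KB, 8-way L1 data cache as described in the text: 4 KB per way.
print(index_fits_in_page_offset(32 * 1024, 8))

# Hypothetical virtual/physical pair with the same page offset (0xABC).
virtual_address = 0x7FFF12345ABC
physical_address = 0x00039876CABC
print(set_index(virtual_address, 32 * 1024, 8),
      set_index(physical_address, 32 * 1024, 8))
```

Because both addresses select the same set, the cache can start indexing with the virtual address while the TLB is still translating, and then compare physical tags.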
The private L1D / L1I and L2 (per-core) caches are traditional set-associative caches, often
8-way or 4-way for the small, fast caches. The cache line size is 64 bytes on all modern x86
CPUs. The data caches are write-back.

Advantages of the Intel Core i7-4790K
• Better processing speed: computer-related work is completed faster.
• Supports all applications.
• Better durability.
Disadvantages of the Intel Core i7-4790K
• Cost: compared with other Intel processors, the Core i7 has a higher selling price and
requires high hardware specifications to support its performance.
• Devices: to match the advantages of the Intel Core i7-4790K, supporting components such
as the graphics card (VGA) and RAM must also be high-spec so that they can keep up with
and maximize the work of this processor.

B. AMD Vishera FX-9590

o The Components
The FX-9590 is currently the highest-end FX series processor. It uses the AM3+ socket, just
like the other processors in the FX family. It is the world's first AMD retail desktop
processor to reach a 5.0 GHz clock.
Let us briefly review this processor's specifications. The FX-9590 has eight cores with
eight threads. The eight cores run at a base speed of 4.7 GHz and can increase to 5.0 GHz
when Turbo Core is active. The FX-9590 is not equipped with an integrated graphics
processor and has a TDP of 220 W.
Integrated graphics: None
Integrated graphics refers to a GPU (graphics processing unit) that is built into the
processor or motherboard chipset at the silicon level, whereas discrete graphics, which
are more powerful, use a slot-in board that connects to the motherboard, usually via
PCI-Express.
Memory controller:
❖ The number of controllers: 1
The memory controller is responsible for the control of up to eight memory banks,
interfacing to SRAM, EPROM, Flash EPROM, various DRAM devices, and other
peripherals. The maximum number of controllers that can be connected will vary
depending on the type of controllers and features that are used.
❖ Memory channels: 2
The channels on the motherboard/processor are different from the channels in
RAM. The channel on the motherboard is the path between the RAM to the memory
controller on the processor.
❖ Supported memory: DDR3-1866
DDR3-1866 can run at a high enough speed without the need for special
configuration in the BIOS of the user. Users simply install the memory module into
the system and do the "Load Optimized Default" in the BIOS to get a configuration
that is fast enough for various daily needs.
❖ Maximum memory bandwidth (GB/s): 29.9
Other peripherals: HyperTransport technology
HyperTransport™ technology is a high-speed protocol for use in connecting
peripherals to computers, mobile computers, servers, communication systems,
network equipment, and embedded equipment. It provides up to 128 Gbps
aggregate bandwidth, and can be configured with 2, 4, 8, 16, or 32-bit buses.
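The 29.9 GB/s maximum memory bandwidth listed above follows from the memory speed and the number of channels. The sketch below assumes a 64-bit (8-byte) data path per DDR3 channel and treats DDR3-1866 as 1866 million transfers per second; the function name is hypothetical.

```python
def peak_bandwidth_gb_s(mega_transfers_per_s: float, channels: int,
                        bus_width_bits: int = 64) -> float:
    """Theoretical peak memory bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits // 8
    return mega_transfers_per_s * 1e6 * bytes_per_transfer * channels / 1e9


# Dual-channel DDR3-1866, as supported by the FX-9590.
print(f"{peak_bandwidth_gb_s(1866, channels=2):.1f} GB/s")

# Dual-channel DDR3-1600, the highest speed officially supported by the Core i7-4790K.
print(f"{peak_bandwidth_gb_s(1600, channels=2):.1f} GB/s")
```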

o Registers: types, number, size


AMD Vishera FX-9590 processors have 16 general-purpose registers in 64-bit mode.
In 64-bit mode, eight new GPRs (general-purpose registers) are added to the eight
legacy GPRs, all 16 GPRs are 64 bits wide, and the low byte of every register can be
accessed. The GPRs, the flags register, and the instruction-pointer register are all
available in 64-bit mode.
The GPRs include sixteen 64-bit registers (RAX, RBX, RCX, RDX, RDI, RSI, RBP, RSP, R8, R9,
R10, R11, R12, R13, R14, R15).
The size of the register used by an instruction depends on the effective operand size
or, for certain instructions, on the opcode, address size, or stack size. For most
instructions, access to the extended GPRs requires a REX prefix. The four high-byte
registers (AH, BH, CH, DH) available in legacy mode cannot be addressed when a REX prefix
is used.
In general, byte and word operands are stored in the low 8 or 16 bits of a GPR without
modifying the high 56 or 48 bits. Doubleword operands, however, are usually stored in the
low 32 bits of a GPR and zero-extended to 64 bits.
The 64-bit RFLAGS register contains the legacy EFLAGS in its low 32 bits. The high 32
bits are reserved: they can be written with anything, but they always read as zero
(RAZ). The 64-bit RIP instruction-pointer register contains the address of the next
instruction to be executed.

o Buses
The bus speed of the AMD Vishera FX-9590 is 5.2 GT/s, provided by one 2600 MHz 16-bit
HyperTransport link.
The configuration of the HyperTransport bus depends on the hardware developer. Also, some
developers announce an exaggerated transfer rate for the HyperTransport bus they are using.
AMD Vishera FX-9590 processors use 16-bit links, even though HyperTransport allows the use
of 32-bit links.
HyperTransport 1.x (“HT1”) is used on all socket 754 processors and socket AM2 Sempron
processors. (Other AM2-based processors use HyperTransport 2.0.)
HyperTransport transfers two data items per clock cycle, a concept also known as DDR
(double data rate). The formula for the maximum theoretical transfer rate is:
Transfer rate (MB/s) = width (number of bits) × clock (MHz) × number of data items per clock cycle / 8
Thus, with socket 754 processors, the 16-bit HyperTransport bus can work at up to 800 MHz,
or 3,200 MB/s.
Some people advertise this clock and transfer rate using other numbers, generating a lot of
confusion in the market.
Some say that the clock rate used by HyperTransport 1.x is 1,600 MHz. This is because two
data items are transferred on each clock cycle, so the performance obtained is equivalent to
a 1,600 MHz clock rate transferring only one data item per clock cycle. In the end, the
transfer rate is the same: the formula above simply uses “1” instead of “2” for “number of
data per clock cycle.” The same thing happens with DDR memories, where the announced clock
rate is double the actual clock rate (e.g., DDR3-1600 memories in fact work at 800 MHz,
transferring two data items per clock cycle).
Likewise, some people refer to the 1,000 MHz (4,000 MB/s) HyperTransport 2.0 link as 2,000
MHz, for the same reason: with two data items transferred per clock cycle, the performance
obtained is equivalent to a 2,000 MHz clock rate transferring only one data item per clock
cycle, and the transfer rate given by the formula above is unchanged.
HyperTransport 3.0 adds the following new clock rates, keeping compatibility with HT1
and HT2 rates (transfer rates assuming 16-bit links, which is the configuration used by AMD
processors):
1,800 MHz = 3,600 MT/s = 7,200 MB/s
2,000 MHz = 4,000 MT/s = 8,000 MB/s
2,400 MHz = 4,800 MT/s = 9,600 MB/s
2,600 MHz = 5,200 MT/s = 10,400 MB/s
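The transfer-rate formula and the figures above can be reproduced with a few lines of code. This is a minimal sketch; the function name is hypothetical, and a 16-bit link is assumed throughout, as stated in the text.

```python
def ht_transfer_rate_mb_s(width_bits: int, clock_mhz: float,
                          transfers_per_clock: int = 2) -> float:
    """Maximum theoretical transfer rate in MB/s:
    width (bits) x clock (MHz) x data items per clock cycle / 8."""
    return width_bits * clock_mhz * transfers_per_clock / 8


for clock_mhz in (800, 1000, 1800, 2000, 2400, 2600):
    mega_transfers = clock_mhz * 2  # two data items per clock cycle
    rate = ht_transfer_rate_mb_s(16, clock_mhz)
    print(f"{clock_mhz} MHz = {mega_transfers} MT/s = {rate:.0f} MB/s")

# Quoting the doubled clock with one data item per cycle gives the same rate.
print(ht_transfer_rate_mb_s(16, 1600, transfers_per_clock=1)
      == ht_transfer_rate_mb_s(16, 800, transfers_per_clock=2))
```

The final comparison illustrates the source of the marketing confusion described above: an 800 MHz double-data-rate link and a notional 1,600 MHz single-data-rate link move exactly the same amount of data.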

o Cache (level, size, mapping)


L1 Cache (Code): 256 KB
L1 Cache (Data): 128 KB
L1 Cache (Total): 384 KB
L2 Cache: 8 MB
L3 Cache: 8 MB
L2 Cache per core: 1 MB/core
L3 Cache per core: 1 MB/core
Level 1 cache size:
4 x 64 KB 2-way set associative shared instruction caches
8 x 16 KB 4-way set associative data caches
Level 1 cache, often called primary cache, is static memory integrated with the processor
core that is used to store information recently accessed by the processor. Level 1 cache is
often abbreviated as L1 cache. The purpose of the level 1 cache is to increase data access
speed when the CPU accesses the same data multiple times. For this reason, level 1 cache
access time is always faster than system memory access time. The processor may have
additional level 2 and level 3 caches, although these are always slower than the L1 cache.
In modern microprocessors, the primary cache is divided into two caches, often of the same
size: one is used to store program data, and the other is used to store microprocessor
instructions. Some old microprocessors use a "unified" primary cache, which stores data and
instructions in the same cache.
Level 2 cache size: 4 x 2 MB 16-way set associative exclusive shared caches
Level 2 cache, also called secondary cache, is a memory that is used to store recently accessed
information. The goal of having the level 2 cache is to reduce data access time in cases when the
same data was already accessed before. In modern microprocessors that incorporate data
prefetching feature the level 2 cache may also be used to buffer program instructions and data that
the processor is about to request from memory. This also reduces data access time. Please note that
the level 2 cache is secondary to the CPU - it is not as fast as the level 1 cache, although it is
usually much larger. All data that is requested from level 2 cache is copied to level 1 cache.
Requested data stays in the secondary cache if it's an inclusive cache and is removed from the
secondary cache if it's an exclusive cache. The secondary cache is usually unified, i.e. it is used to
store both program instructions and program data.
Level 2 cache is often abbreviated as "L2 cache". The L2 cache may be placed:
• on the processor core: integrated or on-die cache.
• in the same package/cartridge as the processor but separate from the processor core:
backside cache. This type of L2 cache was used in Pentium Pro, Pentium II, early Pentium
III, and Slot A Athlon processors.
• separate from the core and processor package. In this case, the L2 cache memory is
usually located on the motherboard.
Level 3 cache size: 8 MB 64-way set associative shared cache
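The per-level totals and per-core figures for the FX-9590 can be derived from the cache counts and sizes listed above (4 x 64 KB and 8 x 16 KB for Level 1, 4 x 2 MB for Level 2, 8 MB for Level 3) and the eight cores. The sketch below is just that arithmetic; the constant and function names are illustrative.

```python
KB_PER_MB = 1024
CORES = 8

# (number of caches, size of each in KB), taken from the listing above
L1_INSTRUCTION = (4, 64)
L1_DATA = (8, 16)
L2 = (4, 2 * KB_PER_MB)
L3 = (1, 8 * KB_PER_MB)


def total_kb(cache) -> int:
    """Total capacity in KB of all caches of one kind."""
    count, size_kb = cache
    return count * size_kb


l1_code_kb = total_kb(L1_INSTRUCTION)
l1_data_kb = total_kb(L1_DATA)
l1_total_kb = l1_code_kb + l1_data_kb
l2_total_mb = total_kb(L2) / KB_PER_MB
l3_total_mb = total_kb(L3) / KB_PER_MB

print(f"L1: {l1_code_kb} KB code + {l1_data_kb} KB data = {l1_total_kb} KB")
print(f"L2: {l2_total_mb:.0f} MB total, {l2_total_mb / CORES:.0f} MB per core")
print(f"L3: {l3_total_mb:.0f} MB total, {l3_total_mb / CORES:.0f} MB per core")
```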

Advantages:
• Same socket support
• Extra speed
• Performance improvement
• New toys for overclockers
Disadvantages:
• No new features
• High power consumption
• Relatively high price

Table 1 Comparison of the Intel Core I7-4790K with the AMD Vishera FX-9590

No  Distinguishing feature                               INTEL Core I7-4790K   AMD Vishera FX-9590
1   Thermal Design Power (power consumption)             88 W                  220 W
2   Semiconductor process size                           22 nm                 32 nm
3   PassMark result (CPU performance)                    11455                 10535
4   Memory bandwidth                                     25.6 GB/s             29.9 GB/s
5   Hyper-Threading Technology                           Yes                   No
6   Integrated graphics                                  Yes                   No
7   Number of transistors                                1,400 million         1,200 million
8   Tri-Gate transistors (low power, high speed)         Yes                   No
9   Turbo clock speed                                    4.4 GHz               5 GHz
10  PassMark result (single thread)                      2546                  1756
11  PCI Express (PCIe) version (high-speed               3                     2
    expansion cards)
12  INTEL Quick Sync Video                               Yes                   No
13  RdRand (for cryptography)                            Yes                   No
14  CPU clock speed                                      4 x 4 GHz             8 x 4.7 GHz
15  RAM speed                                            1600 MHz              1866 MHz
16  L1 cache                                             256 KB                384 KB
17  L2 cache                                             1 MB                  8 MB
18  L2 cache per core                                    0.25 MB/core          1 MB/core
19  L3 cache per core                                    2 MB/core             1 MB/core
20  FMA4 (speeds up tasks such as adjusting image        No                    Yes
    contrast or audio volume)
21  Bus transfer rate                                    5 GT/s                5.2 GT/s

III. Conclusion
The processor is a chip in the form of an Integrated Circuit (IC) that controls the entire
computer system and is used as the center or brain of computer activities in performing
calculations and carrying out input and output tasks. In this comparison, Intel processors
are stronger than AMD processors in multimedia applications, while AMD processors win over
Intel processors in gaming and 3D programs.

