
2 Architecture of System on Chip (SoC)

1. Introduction to System on Chip


Single-board computing provides researchers, hobbyists, and educators a programming platform that is responsive for rapid development and prototyping. While SBCs are not geared for large data processing, they certainly can be utilized as versatile testbed environments. Such systems have the ability to introduce users to new development environments while being cost effective at the same time. Their utility to cost ratio is very high. Single-board computers can be used as a standalone hardware solution or integrated into larger systems.
A System on Chip or an SoC is an Integrated Circuit that incorporates a majority of components present on a computer. An SoC integrates the replaceable components onto a single chip, thereby reducing the size and increasing efficiency. An SoC includes software and an interconnection structure for integration. The hardware-software integration approach makes the SoC compact in size, allows for less power consumption, and makes it more reliable than a standard multi-chip system.
An SoC integrates a microcontroller or microprocessor with advanced peripherals like a Graphics Processing Unit (GPU), Wi-Fi module, or one or more coprocessors. An SoC can be seen as integrating a microcontroller with even more advanced peripherals. Combining multiple components into a single chip saves on space, cost, and power consumption. SoCs connect to other components too, such as cameras, a display, RAM, flash storage, and much more.

Embedded System Design

Because an SoC includes both the hardware and software, it uses less power, has better performance, requires less space and is more reliable than multi-chip systems. Most system-on-chips today come inside mobile devices like smartphones and tablets.

Figure 2.1: System on Chip

Types of SoCs

In general, there are four distinguishable types of SoCs:

i. SoCs built around a microcontroller,
ii. SoCs built around a microprocessor, often found in mobile phones;
iii. Specialized application-specific integrated circuit SoCs designed for specific applications that do not fit into the above two categories, and
iv. Programmable SoCs (PSoC), where most functionality is fixed but some functionality is reprogrammable in a manner analogous to a field-programmable gate array.

An SoC usually contains various components such as
i. Operating system and utility software applications.
ii. Voltage regulators and power management circuits.
iii. Timing sources such as phase lock loop control systems or oscillators.
iv. A microprocessor, microcontroller or digital signal processor.
v. Peripherals such as real-time clocks, counter timers and power-on-reset generators.
vi. External interfaces such as USB, FireWire, Ethernet, universal asynchronous receiver-transmitter or serial peripheral interface bus.
vii. Analog interfaces such as digital-to-analog converters and analog-to-digital converters.
viii. RAM and ROM memory.

Comparison between System on Chip and Processors on Chip

                System on Chip                     Processors on Chip
Processor       Multiple, simple, heterogeneous.   Few, complex, homogeneous.
Cache           One level, small.                  2-3 levels, extensive.
Memory          Embedded, on chip.                 Very large, off chip.
Functionality   Special purpose.                   General purpose.
Interconnect    Wide, high bandwidth.              Often through cache.
Power, cost     Both low.                          Both high.
Operation       Largely stand-alone.               Need other chips.

Advantages
1. Lower cost per gate, lower power consumption, faster circuit operation, more reliable implementation, more compact (small size), greater design security.
2. Flexibility: SoCs are easily reprogrammable, which makes them flexible.

Disadvantages
These systems are extremely complex in design and so require more verification.

2. Architecture of SoC
An SoC consists of hardware functional units, including microprocessors that run software code, as well as a communications subsystem to connect, control, direct and interface between these functional modules.

2.1 Basic Architectural Block Diagram

[Figure: block diagram of an SoC chip showing multimedia encoders/decoders, memory, direct memory access, CPU, digital signal processor (DSP), storage, NIC, audio, USB and video blocks]

Main Functional Blocks of SoC


Processor cores of SoC structure

An SoC must have at least one processor core, but typically an SoC has more than one core. Processor cores can be a microcontroller, microprocessor, Digital Signal Processor (DSP) or Application-Specific Instruction Set Processor (ASIP) core. ASIPs have instruction sets that are customized for an application domain and designed to be more efficient than general-purpose instructions for a specific type of workload. Multiprocessor SoCs have more than one processor core by definition.
Whether single-core, multi-core or many core, SoC processor cores typically use RISC instruction set architectures. RISC architectures are advantageous over CISC processors for SoCs because they require less digital logic, and therefore less power and area on board, and in the embedded and mobile computing markets, area and power are often highly constrained. In particular, SoC processor cores often use the ARM architecture because it is a soft processor specified as an IP core and is more power efficient than x86.
Digital Signal Processors

Digital Signal Processor (DSP) cores are often included on SoCs. They perform signal processing operations in SoCs for sensors, actuators, data collection, data analysis and multimedia processing. DSP cores typically feature Very Long Instruction Word (VLIW) and Single Instruction Multiple Data (SIMD) instruction set architectures, and are therefore highly amenable to exploiting instruction-level parallelism through parallel processing and superscalar execution. DSP cores most often feature application-specific instructions, and as such are typically Application-Specific Instruction-set Processors (ASIP). Such application-specific instructions correspond to dedicated hardware functional units that compute those instructions.
Typical DSP instructions include multiply-accumulate, fast Fourier transform, fused multiply-add, and convolutions.
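
To make the multiply-accumulate and convolution operations concrete, the following is a minimal software sketch, not a model of any particular DSP core. The function names `multiply_accumulate` and `fir_filter` are hypothetical and chosen only for illustration. On a DSP, each call to the inner step would typically map to one hardware MAC instruction.

```python
from typing import List, Sequence


def multiply_accumulate(acc: float, a: float, b: float) -> float:
    """One MAC step: acc + a * b (a single instruction on many DSP cores)."""
    return acc + a * b


def fir_filter(samples: Sequence[float], taps: Sequence[float]) -> List[float]:
    """Convolve samples with filter taps, using one MAC per tap per output."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, tap in enumerate(taps):
            if n - k >= 0:  # samples before the start are treated as zero
                acc = multiply_accumulate(acc, tap, samples[n - k])
        out.append(acc)
    return out


# Two-tap moving average over a short ramp signal.
smoothed = fir_filter([1.0, 2.0, 3.0, 4.0], [0.5, 0.5])
print(smoothed)  # [0.5, 1.5, 2.5, 3.5]
```

The inner loop is exactly the pattern that dedicated MAC hardware accelerates: every output sample costs one multiply and one add per filter tap.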

Other Components
As with other computer systems, SoCs require timing sources to generate clock signals, control execution of SoC functions and provide time context to signal processing applications of the SoC, if needed. Popular time sources are crystal oscillators and phase-locked loops. SoC peripherals include counter-timers, real-time timers and power-on reset generators. SoCs also include voltage regulators and power management circuits.

Memory
SoCs include semiconductor memory blocks to perform their computation, as do microcontrollers and other embedded systems. Depending on the application, SoC memory may form a memory hierarchy and cache hierarchy. In the mobile computing market, this is common, but in many low-power embedded microcontrollers, this is not necessary. Memory technologies for SoCs include read-only memory (ROM), random-access memory (RAM), Electrically Erasable Programmable ROM (EEPROM) and flash memory. As in other computer systems, RAM can be subdivided into relatively faster but more expensive static RAM (SRAM) and the slower but cheaper dynamic RAM (DRAM). When an SoC has a cache hierarchy, SRAM will usually be used to implement processor registers and cores' L1 caches whereas DRAM will be used for lower levels of the cache hierarchy including main memory. "Main memory" may be specific to a single processor (which can be multi-core) when the SoC has multiple processors, in which case it is distributed memory and must be sent via intermodule communication on-chip to be accessed by a different processor.

Interfaces
SoCs include external interfaces, typically for communication protocols. These are often based upon industry standards such as USB, FireWire, Ethernet, USART, SPI, HDMI, I2C, etc. These interfaces will differ according to the intended application. Wireless networking protocols such as Wi-Fi, Bluetooth, 6LoWPAN and near-field communication may also be supported.
When needed, SoCs include analog interfaces including analog-to-digital and digital-to-analog converters, often for signal processing. These may be able to interface with different types of sensors or actuators, including smart transducers. They may interface with application-specific modules or shields. Or they may be internal to the SoC, such as if an analog sensor is built into the SoC and its readings must be converted to digital signals for mathematical processing.
Inter-module communication
SoCs consist of many execution units. These units must often send data and instructions back and forth. Because of this, all but the most trivial SoCs require communication subsystems. Originally, as with other microcomputer technologies, data bus architectures were used, but recently designs based on sparse intercommunication networks known as Networks-on-Chip (NoC) have risen to prominence and are forecast to overtake bus architectures for SoC design in the near future.
Bus-based communication
Historically, a shared global computer bus typically connected the different components, also called "blocks" of the SoC. A very common bus for SoC communications is ARM's royalty-free Advanced Microcontroller Bus Architecture (AMBA) standard.
Direct memory access controllers route data directly between external interfaces and memory, bypassing the CPU or control unit, thereby increasing the data throughput of the SoC. This is similar to some device drivers of peripherals on component-based multi-chip module PC architectures.

Computer buses are limited in scalability, supporting only up to tens of cores (multicore) on a single chip. Wire delay is not scalable due to continued miniaturization, system performance does not scale with the number of cores attached, the SoC's operating frequency must decrease with each additional core attached for power to be sustainable, and long wires consume large amounts of electrical power. These challenges are prohibitive to supporting many core systems on chip.
Network on a chip
In the late 2010s, a trend of SoCs implementing communication subsystems in terms of a network-like topology instead of bus-based protocols has emerged. A trend towards more processor cores on SoCs has caused on-chip communication efficiency to become one of the key factors in determining the overall system performance and cost. This has led to the emergence of interconnection networks with router-based packet switching known as "Networks on Chip" (NoCs) to overcome the bottlenecks of bus-based networks.
Networks-on-chip have advantages including destination and application-specific routing, greater power efficiency and reduced possibility of bus contention. Network-on-chip architectures take inspiration from networking protocols like TCP and the Internet protocol suite for on-chip communication, although they typically have fewer network layers. Optimal network-on-chip network architectures are an ongoing area of much research interest. NoC architectures range from traditional distributed computing network topologies such as torus, hypercube, meshes and tree networks to genetic algorithm scheduling to randomized algorithms such as random walks with branching and randomized time to live (TTL). Many SoC researchers consider NoC architectures to be the future of SoC design because they have been shown to efficiently meet power and throughput needs of SoC designs.
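
As a small illustration of router-based communication on a mesh topology, the sketch below computes the path a packet would take between two routers using dimension-order (XY) routing. XY routing is a common textbook scheme and is an assumption here; the text above does not say which routing algorithm any particular NoC uses. The names `xy_route` and `hop_count` are hypothetical.

```python
from typing import List, Tuple

Node = Tuple[int, int]  # (x, y) position of a router in a 2D mesh


def xy_route(src: Node, dst: Node) -> List[Node]:
    """Routers visited from src to dst, travelling along X first, then Y."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def hop_count(src: Node, dst: Node) -> int:
    """Number of router-to-router links the packet crosses."""
    return len(xy_route(src, dst)) - 1


print(xy_route((0, 0), (2, 1)))  # [(0, 0), (1, 0), (2, 0), (2, 1)]
print(hop_count((0, 0), (2, 1)))  # 3
```

Unlike a shared bus, packets between other pairs of routers can use different links at the same time, which is where the reduced contention comes from.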

Applications
The most common application of SoCs today is in mobile applications, including smart phones, smart watches and tablets. Other applications include signal and speech processing, PC interfaces and data communication. SoCs are being applied to personal computers as well due to the integration of communication modules like LTE and wireless networks onto the chip.

3. Basic Version of Single Board Computer, Pin Description of Raspberry Pi

Single board computers consist of everything on a single board itself. On the board, we have a processor (CPU) and all other necessary peripherals and circuitry as well. The basic version of an SBC has onboard RAM, ROM, flash storage, AV ports, an Ethernet port, etc. This means that one board is sufficient to act as a full-fledged computer and to build a complete embedded system based on it.

Figure 2.2: Basic version of SBC: Raspberry Pi

Raspberry Pi is a credit-card-sized single board computer developed by the UK based Raspberry Pi Foundation with the intention of teaching programming and basic computer science to school students. It runs Linux on a 700 MHz ARM processor, has two USB ports to connect the keyboard and mouse, supports video via HDMI and/or RCA, connects to the internet via the Ethernet port, and has its storage handled by an SD card. The RPi started with the Model B in 2012 with a 700 MHz single-core processor. The suite of RPis has grown to include the Raspberry Pi 3 Model B and B+ with 1.2 GHz and 1.4 GHz 64-bit quad-core processors respectively, both of which include wireless network interfaces, as well as the entry level RPi Zero.
Various models of Raspberry Pi have been released since the original Model B, each bringing either improved specifications or features specific to a particular use-case.

The Raspberry Pi Zero family, for example, is a tiny version of the full-size Raspberry Pi which drops a few features, in particular the multiple USB ports and wired network port, in favour of a significantly smaller layout and lowered power needs. All Raspberry Pi models have one thing in common, though: they're compatible, meaning that software written for one model will run on any other model. It's even possible to take the very latest version of the Raspberry Pi's operating system and run it on an original pre-launch Model B prototype.

Figure 2.3: GPIO pinout diagram (Raspberry Pi Model B and Model B+ headers)

[Board components labeled in the figure: four squarely placed mounting holes; 40-pin GPIO header; SMSC LAN9514 USB Ethernet controller; Run header used to reset the Pi; 2x2 USB-A ports to PC; Broadcom BCM2835; MicroSD card slot (underneath); DSI display connector; switching regulator for less power consumption; Ethernet out port; 3.5 mm audio and composite output jack; 5V Micro USB power; HDMI out port; CSI camera connector]

Figure 2.4: Pinout Diagram of Raspberry Pi 3 Model B



[Blocks labeled in the figure: general-purpose input/output pins for connecting electronic components; Micro SD card (underneath); USB ports; Ethernet port; audio jack; Micro USB; HDMI port; camera module port]

Figure 2.5: General Layout of various blocks on the Raspberry Pi Board

Raspberry Pi will have the following ports

i. USB: USB ports are used to connect a wide variety of components, most commonly a mouse and keyboard.
ii. HDMI: The HDMI port outputs video and audio to your monitor.
iii. Audio: The audio jack allows you to connect standard headphones and speakers.
iv. Micro USB: The Micro USB port is only for power; do not connect anything else to this port. Always connect the power after you have already connected everything else.
v. GPIO: The GPIO ports allow the Raspberry Pi to control and take input from any electronic component.
vi. SD card slot: The Raspberry Pi uses the SD card the same way as a full-size computer uses a hard drive. The SD card provides the Raspberry Pi with internal memory and acts as its hard drive.
vii. The SoC (System on Chip) combines both CPU and GPU on a single package and turns out to be faster than Pi 2 and Pi 3 models.

Improvements

i. The model B+ stays ahead in terms of processing speed and comes with an improved wireless capability.
ii. The dual-band Wi-Fi 802.11ac runs at 2.4 GHz and 5 GHz and provides a better range in wireless challenging environments, and Bluetooth 4.2 is available with BLE support.
iii. The top side is covered with metal shielding, instead of plastic in the earlier models, that acts as a heat sink and drains the excessive amount of heat if the board is subjected to high temperature or pressure.
iv. This B+ model is three times faster than Pi 2 and 3, which is a major development in terms of speed, capable of executing different functions at a decent pace.
v. The Ethernet port comes with 300 Mbit/s, which is much faster than the earlier version with 100 Mbit/s speed. It is known as gigabit Ethernet based on a USB 2.0 interface.
vi. A four pin header is added on the board that resides near the 40 pin header. This allows Power over Ethernet (PoE), i.e., it provides the necessary electrical current to the device using data cables instead of power cords. It is very useful and reduces the number of cables required for the installation of a device in the relevant project.
The following figure shows the pinout of Raspberry Pi 3 B+
Function            Pin   Pin   Function
3.3 V power           1     2   5 V power
SDA, GPIO2            3     4   5 V power
SCL, GPIO3            5     6   GND
GPIO4                 7     8   GPIO14, UART0_TXD
GND                   9    10   GPIO15, UART0_RXD
GPIO17               11    12   GPIO18, CLK
GPIO27               13    14   GND
GPIO22               15    16   GPIO23
3.3 V power          17    18   GPIO24
MOSI, GPIO10         19    20   GND
MISO, GPIO9          21    22   GPIO25
CLK, GPIO11          23    24   GPIO8, CE0_N
GND                  25    26   GPIO7, CE1_N
ID EEPROM (I2C)      27    28   ID EEPROM (I2C)
GPIO5                29    30   GND
GPIO6                31    32   GPIO12
GPIO13               33    34   GND
GPIO19               35    36   GPIO16
GPIO26               37    38   GPIO20
GND                  39    40   GPIO21

Figure 2.6: Raspberry Pi 3 B+ pinout



i. The 40 pin header is used to develop an external connection with the electronic device. This is the same as the previous versions, making it compatible with all the devices where older versions can be used.
ii. Out of 40 pins, 26 are used as digital I/O pins and 9 of the remaining 14 pins are termed dedicated I/O pins, which indicates they don't come with an alternative function.
iii. Pins 3 and 5 come with an onboard pull-up resistor of 1.8 kΩ and pins 27 and 28 are dedicated to the ID EEPROM. In the B+ model, the GPIO header is slightly repositioned to allow more space for the additional mounting hole. The devices that are compatible with the B model may work with the B+ version; however, they may not sit identically to the previous version.
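
The header layout can be captured as a small lookup table, which is handy when wiring a project from a schematic that uses GPIO names while the board is counted in physical pin numbers. This is a minimal sketch based on the standard 40-pin Raspberry Pi header layout; the names `HEADER`, `gpio_pins` and `physical_pin_for` are hypothetical and are not part of any Raspberry Pi library.

```python
from typing import Dict

# Physical pin number -> signal on the 40-pin header.
HEADER: Dict[int, str] = {
    1: "3V3",     2: "5V",
    3: "GPIO2",   4: "5V",
    5: "GPIO3",   6: "GND",
    7: "GPIO4",   8: "GPIO14",
    9: "GND",     10: "GPIO15",
    11: "GPIO17", 12: "GPIO18",
    13: "GPIO27", 14: "GND",
    15: "GPIO22", 16: "GPIO23",
    17: "3V3",    18: "GPIO24",
    19: "GPIO10", 20: "GND",
    21: "GPIO9",  22: "GPIO25",
    23: "GPIO11", 24: "GPIO8",
    25: "GND",    26: "GPIO7",
    27: "ID_SD",  28: "ID_SC",   # reserved for the ID EEPROM
    29: "GPIO5",  30: "GND",
    31: "GPIO6",  32: "GPIO12",
    33: "GPIO13", 34: "GND",
    35: "GPIO19", 36: "GPIO16",
    37: "GPIO26", 38: "GPIO20",
    39: "GND",    40: "GPIO21",
}


def gpio_pins(header: Dict[int, str] = HEADER) -> Dict[int, str]:
    """Only the pins that carry a general-purpose I/O signal."""
    return {pin: name for pin, name in header.items() if name.startswith("GPIO")}


def physical_pin_for(signal: str, header: Dict[int, str] = HEADER) -> int:
    """Physical pin number of a uniquely named signal such as 'GPIO17'."""
    matches = [pin for pin, name in header.items() if name == signal]
    if len(matches) != 1:
        raise ValueError(f"{signal!r} is not a unique signal on the header")
    return matches[0]


print(len(gpio_pins()))            # 26
print(physical_pin_for("GPIO17"))  # 11
```

Counting the entries confirms the figure quoted above: 26 of the 40 pins are GPIO, and the remaining 14 are power, ground and the two ID EEPROM pins.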

Raspberry Pi 3 B+ Technical Specifications

i. CPU is 64 bit with 1 GB RAM (random access memory).
ii. Contains Broadcom BCM2837B0 chipset.
iii. Comes with 1.4 GHz Quad-Core ARM Cortex-A53, 4 cores.
iv. Consists of 40 pin header (26 GPIOs).
v. Stereo audio and composite video are supported by a 3.5 mm jack connector.
vi. 4 USB 2.0 ports.
vii. Gigabit Ethernet.
viii. PoE (Power over Ethernet) is a major feature incorporated in this device that is lacking in the B model.
ix. 2-pin reset header.
x. Micro SD socket, used to enhance the memory capacity of the board.
xi. Micro USB power connector, used for transferring power to the device.
xii. HDMI.
xiii. CSI camera interface.
xiv. Comes with WiFi and Bluetooth facility that were not present in the previous Raspberry Pi 1 and 2 versions.
xv. DSI connector for official screen.

4. Architectural Features: CPU Overview, CPU Pipeline Stages, CPU Cache Organization, Branch Prediction and Folding (Concept)

CPU Overview: The CPU is the brain of this tiny computer that helps in carrying out a number of instructions based on mathematical and logical formulas. It comes with a capacity of 64 bit.
Clock speed and RAM: It comes with a clock speed of 1.4 GHz on the Broadcom BCM2837B0, which contains a quad-core ARM Cortex-A53, and the RAM is around 1 GB (identical to the previous version).
GPU: It stands for Graphics Processing Unit, used for carrying out image calculation. A Broadcom VideoCore is added in the device that is mainly used for playing video games.
CPU Parameters
i. ARM1176JZF-S
ii. ARMv6 architecture
iii. 700 MHz clock
iv. Single core
v. 32-bit RISC
vi. Branch prediction with return stack
vii. 8 pipeline stages
viii. 33 general purpose 32 bit registers
ix. 7 dedicated 32 bit registers

Raspberry Pi             Model A          Model B          Model B+         Model A+
Release date             February 2013    April 2012       July 2014        November 2014
Chip                     Broadcom BCM2835 (all models)
Processor                ARMv6 single core (all models)
Processor speed          700 MHz (all models)
Voltage and power draw   600 mA @ 5 V
GPU                      Dual Core VideoCore IV Multimedia Co-Processor (all models)
Size                     85 x 56 mm       85 x 56 mm       85 x 56 mm       65 x 56 mm
Memory                   256 MB SDRAM     512 MB SDRAM     512 MB SDRAM     256 MB SDRAM
Storage                  SD Card          SD Card          Micro SD Card    Micro SD Card
GPIO                     26               26               40               40
USB 2.0                  1                2                4                1
Ethernet                 None             10/100 Mb RJ45   10/100 Mb RJ45   None
Audio                    Multi-Channel HD Audio over HDMI, Analog Stereo from 3.5 mm Headphone Jack (all models)

RISC Architecture

[Figure: ARM1176 processor block diagram showing the debug interface, optional VFP coprocessor, instruction and data caches, TrustZone, instruction and data TCRAM0/TCRAM1, the ARM11 core, memory management, and the AMBA AXI interface with instruction, data, DMA and peripheral ports]

Figure 2.7: ARM 1176 Processor

ARM1176JZF-S

The ARM1176JZF-S CPU is a member of the ARM11 Thumb family. The ARM1176JZF-S macrocell is a 32-bit cached processor with ARM architecture v6 that supports the ARM and Thumb instruction sets and includes features for direct execution of Java byte codes. Execution of Java byte codes requires the Java Technology Enabling Kit (JTEK). The development chip contains:

i. Cache
Cache memory for instruction (32 KB) and data (32 KB).
Level 2 Cache Controller (L2CC) with 128 KB unified cache.
ii. DSP
A range of Single Instruction Multiple Data (SIMD) DSP instructions that operate on 16-bit or 8-bit data values in 32-bit registers.
iii. MMU
The ARM1176JZF-S contains a Memory Management Unit (MMU).
iv. TCM
8 KB of data and instruction Tightly Coupled Memory (TCM). The TCM operates with a single wait-state and provides higher data rates than external memory.
v. VFP
Vector Floating Point coprocessor (VFP), supporting the ARM VFPv2 floating point coprocessor instruction set.
vi. TrustZone: TrustZone security extensions
TrustZone Interrupt Controller (TZIC)
TrustZone Protection Controller (TZPC)
vii. IEM: Provision for Intelligent Energy Management (IEM).
ARM Intelligent Energy Controller (IEC).
National Semiconductor Advanced Power Controller (APC1).
National Semiconductor Hardware Performance Monitor (HPM).
viii. AXI RAM
AXI RAM (512 kB) and boot ROM emulation (16 kB).
ix. AXI buses: The ARM1176JZF-S processor uses the Configurable AXI Interconnect to connect the processor core to the on-chip AXI controllers and peripherals. An AXI to APB bridge provides the interface to the APB-based peripherals in the development chip. One external AXI master bus and one external AXI slave bus provide the interface to the FPGA peripherals and the optional Logic Tile.
x. CAI
Configurable AXI Interconnect (CAI).
xi. CoreSight: CoreSight components include
CoreSight Embedded Trace Module (ETM11)
The Embedded Trace Macrocell (ETM) provides signals for off-chip trace. The ETM transmits a 16-bit packet to an external trace port analyzer where the signals can be stored and later analyzed to reconstruct the code flow.
CoreSight Embedded Trace Buffer (ETB11), 8 KB memory.
xii. VFP11: This high-performance, low-power Vector Floating-Point (VFP) coprocessor implements the VFPv2 vector floating-point architecture.
xiii. Clock control: The ARM1176JZF-S development chip contains deskew PLLs that use an external reference clock to generate the internal clocks for the CPU, AHB bus, memory, and off-chip peripherals. Dividers in the chip are programmable and give flexibility in setting clock rates for the CPU, bridges, and memory.
xiv. Memory controllers: The ARM1176JZF-S development chip includes an AXI memory controller (for dynamic memory) and a single port static memory controller AHB device. Both controllers have 32-bit interfaces to external memory.
xv. Interrupt controller: The PrimeCell GIC provides an interface to the interrupt system and provides vectored interrupt support for high-priority interrupt sources from:
peripherals in the ARM1176JZF-S development chip, and
peripherals in the FPGA (a secondary interrupt controller is present in the FPGA).
These are routed through the TZIC in the development chip.
xvi. CLCD controller: The CLCDC provides a flexible display interface that supports a monitor and digital LCD displays (external interface circuitry connects the CLCD to a DVI Digital/Analog connector on the board).
xvii. UART: The four UARTs perform serial-to-parallel conversion on data received from a peripheral device and parallel-to-serial conversion on data transmitted to the peripheral device. (An additional UART is provided by the FPGA.)
xviii. Timers: There are six 32-bit down counters that can be used to generate interrupts at programmable intervals. A Real-Time-Clock is fed with an external 1 Hz signal.
xix. Synchronous serial port: The SSP provides a master or slave interface for synchronous serial communication using Motorola SPI, TI or National Semiconductor Microwire devices.
xx. Smart card interface: The Smart Card Interface signals are programmable to enable support for a Smart Card, Security Identity Module (SIM) card, or similar module.
xxi. GPIO: Eight bits of GPIO are provided by the on-chip interface. (An additional GPIO is provided by the FPGA.)
xxii. Watchdog: A Watchdog module can be used to trigger an interrupt or system reset in the event of software failure.
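
The DSP item in the list above mentions SIMD instructions that operate on 16-bit or 8-bit values packed into 32-bit registers. The sketch below emulates the idea in software for two 16-bit lanes: each lane is added independently, with no carry crossing from the low lane into the high lane. It is only an illustration of the packed-lane concept (similar in spirit to ARMv6 packed-add instructions such as UADD16, without their status flags), and the name `packed_add16` is hypothetical.

```python
MASK16 = 0xFFFF  # one 16-bit lane


def packed_add16(a: int, b: int) -> int:
    """Add the two 16-bit lanes of a and b independently, wrapping per lane."""
    lo = ((a & MASK16) + (b & MASK16)) & MASK16
    hi = (((a >> 16) & MASK16) + ((b >> 16) & MASK16)) & MASK16
    return (hi << 16) | lo


a = 0x0001FFFF  # lanes: hi = 0x0001, lo = 0xFFFF
b = 0x00010001  # lanes: hi = 0x0001, lo = 0x0001

print(hex(packed_add16(a, b)))       # 0x20000: lo wraps to 0, hi = 2
print(hex((a + b) & 0xFFFFFFFF))     # 0x30000: a plain 32-bit add carries into hi
```

One packed instruction therefore does the work of two separate 16-bit additions, which is why such instructions are useful for audio and video samples.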

4.1 CPU Overview


BCM2835/37
This is the Broadcom chip used in the Raspberry Pi 3, and in later models of the Raspberry Pi 2. The underlying architecture of the BCM2837 is identical to the BCM2836. The only significant difference is the replacement of the ARMv7 quad core cluster with a quad-core ARM Cortex A53 (ARMv8) cluster.
The ARM cores run at 1.2 GHz, making the device about 50% faster than the Raspberry Pi 2. The VideoCore IV runs at 400 MHz.

The Broadcom BCM2835 SoC used in the first generation Raspberry Pi includes a 700 MHz ARM1176JZF-S processor, VideoCore IV Graphics Processing Unit (GPU), and RAM. It has a level 1 (L1) cache of 16 KiB and a level 2 (L2) cache of 128 KiB. The level 2 cache is used primarily by the GPU. The SoC is stacked underneath the RAM chip, so only its edge is visible. The ARM1176JZ(F)-S is the same CPU used in the original iPhone, although at a higher clock rate, and mated with a much faster GPU.
The earlier V1.1 model of the Raspberry Pi 2 used a Broadcom BCM2836 SoC with a 900 MHz 32-bit, quad-core ARM Cortex-A7 processor, with 256 KiB shared L2 cache. The Raspberry Pi 2 V1.2 was upgraded to a Broadcom BCM2837 SoC with a 1.2 GHz 64-bit quad-core ARM Cortex-A53 processor, the same SoC which is used on the Raspberry Pi 3, but underclocked (by default) to the same 900 MHz CPU clock speed as the V1.1. The BCM2836 SoC is no longer in production as of late 2016.
The Raspberry Pi 3 Model B uses a Broadcom BCM2837 SoC with a 1.2 GHz 64-bit quad-core ARM Cortex-A53 processor, with 512 KiB shared L2 cache. The Model A+ and B+ are 1.4 GHz.

The Raspberry Pi 4 uses a Broadcom BCM2711 SoC with a 1.5 GHz 64-bit quad-core ARM Cortex-A72 processor, with 1 MiB shared L2 cache. Unlike previous models, which all used a custom interrupt controller poorly suited for virtualisation, the interrupt controller on this SoC is compatible with the ARM Generic Interrupt Controller (GIC) architecture 2.0, providing hardware support for interrupt distribution when using ARM virtualisation capabilities.
The Raspberry Pi Zero and Zero W use the same Broadcom BCM2835 SoC as the first generation Raspberry Pi, although now running at 1 GHz CPU clock speed.
The CPU of the first and second generation Raspberry Pi board did not require cooling with a heat sink or fan, even when overclocked, but the Raspberry Pi 3 may generate more heat when overclocked.

4.2 CPU Pipeline Stages

There are 8 pipeline stages in this processor design. The ARM11 core has increased the length of its pipeline from three stages to eight stages. Each stage has a set function, and to a certain level having more stages allows you to process more data in the same clock cycle. In the case of the pipeline in the ARM, one stage's output is the next stage's input. The ALU stage is responsible for all arithmetic functions, such as addition or multiplication.
The following are the 8 stages
i. Two Fetch stages (Fe1, Fe2)
ii. One Decode stage (De)
iii. One Issue stage (Iss)
iv. Four integer execution pipeline stages (Sh, ALU, Sat, WBex)
The datapath consists of three pipelines as shown in the figure.
i. ALU, shift, or Sat pipeline.
ii. MAC pipeline.
iii. Load or store pipeline.

Common stages:        Fe1 (1st fetch stage), Fe2 (2nd fetch stage), De (instruction decode), Iss (register read and issue)
Integer pipeline:     Sh (shifter stage), ALU (ALU operation), Sat (saturation stage), WBex (writeback from Mul/ALU)
MAC pipeline:         MAC1 (1st multiply-accumulate stage), MAC2 (2nd multiply-accumulate stage), MAC3 (3rd multiply-accumulate stage)
Load/store pipeline:  ADD (address generation), DC1 (data cache 1), DC2 (data cache 2), WBls (writeback from LSU)

Figure 2.8

i. Fetch stages can hold up to four instructions. Branch prediction is performed on instructions ahead of execution of earlier instructions.
ii. Issue and Decode stages can contain any instruction in parallel with a predicted branch.
iii. Execute, Memory, and Write stages can contain a predicted branch, an ALU or multiply instruction, a load/store multiple instruction, and a coprocessor instruction in parallel execution.

The order of the execution is as shown in the figure

Order of execution

Stage 1   1st fetch stage
Stage 2   2nd fetch stage
Stage 3   Instruction decode
Stage 4   Register read and instruction issue
Stage 5   Shifter stage
Stage 6   Main ALU stage
Stage 7   Saturation stage
Stage 8   Write back stage

Figure 2.9

The stages are executed in order from one to eight, each stage taking one clock cycle. Once the pipeline is full, a new instruction finishes in every clock cycle, so one clock cycle can provide one ALU operation. Now think about if the pipeline was four stages long and not eight: each stage would then have to do twice as much work, so each stage would need a longer clock period and the clock frequency would have to be lower. However, the ARM11 is a superscalar architecture so it can do more than one operation per clock cycle, as can most modern processors. Superscalar means that functions inside the CPU core can operate in a parallel fashion. You can think of a superscalar architecture like a retail mall with multiple checkout lines. You have many operators serving many customers. The opposite to this is scalar: scalar would be a small green grocer with one checkout that can serve only one person at a time.
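
A common textbook model helps to quantify what pipelining buys. In an ideal pipeline with no stalls, the first instruction needs one cycle per stage to pass through, and every following instruction completes one cycle later. The sketch below uses that idealized model; it is an assumption for illustration and not a description of real ARM11 timing, which also depends on stalls, branches and memory. The function names are hypothetical.

```python
def pipelined_cycles(instructions: int, stages: int) -> int:
    """Cycles for an ideal pipeline: fill it once, then one result per cycle."""
    if instructions == 0:
        return 0
    return stages + instructions - 1


def unpipelined_cycles(instructions: int, stages: int) -> int:
    """Cycles if each instruction must finish all stages before the next starts."""
    return instructions * stages


n, depth = 100, 8
print(pipelined_cycles(n, depth))    # 107
print(unpipelined_cycles(n, depth))  # 800
```

For long instruction streams the pipelined count approaches one cycle per instruction, which is the sense in which a full pipeline delivers one operation per clock cycle.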
The more stages you add, the higher the clock frequency you need to drive the stage. This has the very unfortunate side effect of increased heat and power usage. Given that the ARM11 is targeted to low power and low heat embedded devices, more stages would be very bad.
The ARMv6 is special in another way too; it is the first ARM core to contain a vector floating point coprocessor. This coprocessor meets the IEEE standards for floating point arithmetic by giving the ARM11 a low-cost, high-performance, single-precision and double-precision computation ability in hardware. A lot of the performance improvements will come from this coprocessor that is potentially more than 10 times faster for certain operations.

4.3 CPUCache Organization


The ARM11 provides separate data and instruction caches (a cache is an area of very fast
memory that can be directly accessed by the main CPU). On the Raspberry Pi, these are
sometimes known as "D-cache" and "I-cache." In the case of the ARM1176JZF-S, each cache is
16 kB in size, is four-way set associative, and can be locked independently.
The I-cache and D-cache make up what is commonly called the L1 cache.
Broadcom has assigned the L2 cache to be used by the GPU; in terms of clock cycles the L2
cache is closer to the GPU than the ARM11 core. In addition, by dedicating the L2 cache to the
GPU you can get more performance from the GPU. The CPU will map requests around the L2
cache. The ARM1176JZF-S has a good way of controlling the caches too. It makes use of a
dedicated coprocessor to handle all cache functions. This means that the main integer core can
focus more on the task at hand, which is processing your request and not controlling the cache.
This coprocessor also takes care of the main system memory access. It will control what is in
cache and how the main system memory has been accessed and how each cache has been
accessed as well. Each data path between the caches and main system memory is 64 bits wide,
giving the ARM11 a bit more throughput. This allows more data to flow through each path
concurrently.
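To see what "16 kB, four-way set associative" means for an address, the following sketch splits an address into tag, set index, and byte offset. The cache size and number of ways come from the text above; the 32-byte line length is an assumption made for this example, and the function name is hypothetical.

```python
CACHE_SIZE = 16 * 1024   # bytes per cache (from the text)
WAYS = 4                 # four-way set associative (from the text)
LINE_SIZE = 32           # bytes per cache line (assumption for this sketch)

NUM_SETS = CACHE_SIZE // (WAYS * LINE_SIZE)
OFFSET_BITS = LINE_SIZE.bit_length() - 1   # log2 of the line size
INDEX_BITS = NUM_SETS.bit_length() - 1     # log2 of the number of sets

def split_address(addr):
    """Split an address into (tag, set index, byte offset within the line)."""
    offset = addr & (LINE_SIZE - 1)
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

tag, index, offset = split_address(0x12345678)
```

Under these assumptions there are 128 sets. The index selects one set, and the tag is then compared against the four ways of that set to decide whether the access is a hit.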

Figure 2.10 (cache block diagram; labels shown: CP15 interface, virtual address, write data, RAMSet base address and size, write buffer data (1-2 words), write buffer addresses, Micro TLB, Tag RAM, Data RAM, TCM, way select, comparator, cache data out, Micro TLB hit/miss and data abort)

i. Four-way set associative with size configurable from 4 to 64 kB.
ii. Cache is Harvard implementation.
iii. Cache replacement policies are Pseudo-Random or Round Robin, which is controlled by
the RR bit in CP15 register c1.
iv. MicroTLB determines if cache lines are write-back or write-through.
v. Contains both secure and non-secure data in cache lines.
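The two replacement policies named above can be compared with a small model of a single four-way set. This is a simplified sketch and not the hardware algorithm: the class name, the policy strings, and the use of Python's random module as the pseudo-random source are all assumptions made for illustration.

```python
import random

class CacheSet:
    """One set of a set-associative cache, tracking only the tag held in each way."""

    def __init__(self, ways=4, policy="round_robin", seed=0):
        if policy not in ("round_robin", "pseudo_random"):
            raise ValueError("unknown policy: " + policy)
        self.tags = [None] * ways
        self.policy = policy
        self.next_victim = 0              # round-robin pointer
        self.rng = random.Random(seed)    # stand-in for a hardware pseudo-random source

    def access(self, tag):
        """Return True on a hit; on a miss, allocate the line and return False."""
        if tag in self.tags:
            return True
        if self.policy == "round_robin":
            way = self.next_victim
            self.next_victim = (self.next_victim + 1) % len(self.tags)
        else:
            way = self.rng.randrange(len(self.tags))
        self.tags[way] = tag
        return False

demo = CacheSet()
results = [demo.access(t) for t in ["A", "B", "C", "D", "A", "E", "A"]]
```

In the round-robin run, line A hits while the set still holds it, is evicted when E arrives because way 0 is next in turn, and therefore misses on its last access.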
Branch Prediction and Folding (Concept)
The processor handles branches on their first execution, when no history is available for dynamic
prediction by the prefetch unit.
i. Integer Core (IC): Uses static branch prediction and return stack.
ii. Prefetch Unit (PU): Uses dynamic branch prediction.
iii. When a branch is resolved, the PU receives information from the IC and either allocates
space in the Branch Target Address Cache (BTAC) or updates an entry.
iv. Branches are resolved at or before the third execution stage.
Dynamic Branch Prediction
i. Uses a Branch Target Address Cache (BTAC) as the first line of branch prediction; it
holds virtual target addresses.
ii. Prediction history of a branch is stored as a two-bit value in the BTAC.
iii. BTAC is a 128-entry direct-mapped cache structure.
iv. Two-bit values represent the following four states:
a. Strong predict branch taken.
b. Weak predict branch taken.
c. Strong predict branch not taken.
d. Weak predict branch not taken.
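The behaviour described in the list above can be sketched as a 128-entry direct-mapped table with a two-bit saturating counter per entry. The numeric state encoding, the index function, the full-address tag, and the rule of allocating an entry only for taken branches are assumptions made for this sketch; the text does not specify them.

```python
from dataclasses import dataclass

BTAC_ENTRIES = 128  # from the text

# Hypothetical encoding of the four two-bit states.
STRONG_NOT_TAKEN, WEAK_NOT_TAKEN, WEAK_TAKEN, STRONG_TAKEN = range(4)

@dataclass
class BtacEntry:
    tag: int      # branch address (full address used as the tag for simplicity)
    target: int   # predicted target address
    state: int    # two-bit prediction history

class Btac:
    def __init__(self):
        self.entries = [None] * BTAC_ENTRIES

    @staticmethod
    def _index(pc):
        # Assumes word-aligned 32-bit instructions, so the low two bits are dropped.
        return (pc >> 2) % BTAC_ENTRIES

    def predict(self, pc):
        """Return the predicted target, or None if there is no entry
        or the entry predicts not taken."""
        entry = self.entries[self._index(pc)]
        if entry is not None and entry.tag == pc and entry.state >= WEAK_TAKEN:
            return entry.target
        return None

    def update(self, pc, taken, target):
        """Record the resolved outcome of the branch at `pc`."""
        i = self._index(pc)
        entry = self.entries[i]
        if entry is None or entry.tag != pc:
            if taken:  # allocate (or replace) an entry for a taken branch
                self.entries[i] = BtacEntry(pc, target, WEAK_TAKEN)
            return
        if taken:
            entry.state = min(entry.state + 1, STRONG_TAKEN)
            entry.target = target
        else:
            entry.state = max(entry.state - 1, STRONG_NOT_TAKEN)
```

Because of the two-bit history, a branch in the strong taken state needs two consecutive not-taken outcomes before the prediction flips, which is what makes loop branches predict well.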
Static Branch Prediction
i. Second level of branch prediction in the processor is static branch prediction, which is based
on the characteristics of the branch instruction.
ii. Uses no history information.
iii. ARM1176JZF-S predicts all forward conditional branches not taken and all backward
branches taken.
iv. Added to mitigate the penalty of the miss experienced when the predictor first encounters
the branch.
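The static rule in item iii depends only on the direction of the branch, so it can be sketched in a few lines. The function names are hypothetical, and treating a branch to its own address as a forward branch is an arbitrary choice made here, since the text does not cover that case.

```python
def static_predict_taken(branch_pc, target_pc):
    """Backward branches are predicted taken, forward branches not taken."""
    return target_pc < branch_pc

def loop_mispredictions(iterations):
    """Count mispredictions for a loop-closing backward branch that is
    taken on every iteration except the last one."""
    outcomes = [True] * (iterations - 1) + [False]
    prediction = static_predict_taken(branch_pc=0x8010, target_pc=0x8000)
    return sum(1 for taken in outcomes if taken != prediction)
```

For a loop that runs 100 times, `loop_mispredictions(100)` counts a single misprediction, at loop exit, which illustrates why predicting backward branches as taken is a reasonable default when no history exists.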
Branch Folding
i. Technique where the branch instruction is removed from the pipeline and is stored in a
buffer; it is performed on all dynamically predicted branches.
ii. Can improve the Branch CPI to under 1.
iii. Technique not done on the following:
a. BL and BLX instructions, to avoid losing the link. (Branch with link/exchange
instruction set).
b. Predicted branches that lead directly to another branch.
c. Branches that have been cancelled when fetched.

5. GPU Overview
A Graphics Processing Unit (GPU) is a specialized electronic circuit designed to rapidly
manipulate and alter memory to accelerate the creation of images in a frame buffer intended for
output to a display device. GPUs are used in embedded systems, mobile phones, personal
computers, workstations, and game consoles. Modern GPUs are very efficient at
manipulating computer graphics and image processing. Their highly parallel structure makes
them more efficient than general-purpose Central Processing Units (CPUs) for algorithms that
process large blocks of data in parallel. In a personal computer, a GPU can be present on a video
card or embedded on the motherboard. In certain CPUs, they are embedded on the CPU die.
Modern GPUs use most of their transistors to do calculations related to 3D computer graphics.
The GPU is the Broadcom VideoCore IV, clocked at 250 MHz, which supports OpenGL ES 2.0
(24 GFLOPS), MPEG-2, and VC-1, and also includes a 1080p30 H.264/MPEG-4 AVC decoder/encoder.
VideoCore is a low-power mobile multimedia processor originally developed by Alphamosaic
Ltd and now owned by Broadcom. Its two-dimensional DSP architecture makes it flexible and
efficient enough to decode (as well as encode) a number of multimedia codecs in software while
maintaining low power usage.

Figure 2.11: Components of GPU (block diagram; blocks shown: VGA BIOS, Bus Interface (BIF), Graphics Memory Controller (GMC), Power Management Unit (PMU), Compression Unit, Video Processing Unit (VPU), Graphics and Compute Array (GCA), Display Interface (DIF))



i. BIF (Bus Interface): Data transfer through ISA, VLB, PCI, AGP etc. is
controlled by the Bus Interface Unit. This transfer can be from outside GPU components
or from the memory controller.
ii. PMU (Power Management Unit): Is the unit responsible for supplying appropriate power
to various components of the GPU and optimizing the power usage.
iii. VPU (Video Processing Unit): Various processes related to processing of the video,
including compression and decompression, are carried out by this unit.
iv. DIF (Display Interface): This consists of display controllers such as RAMDACs, HDMI,
S-video.
v. GCA (Graphics and Compute Array): It is also referred to as a 3D Engine.
vi. GMC (Graphics Memory Controller): Controls all memory transfers with higher speed.
GPU Overview
i. Broadcom VideoCore IV GPU.
a. Tile-based renderer (TBR) that uses up to four cores.
b. 40 nm technology.
c. Integrated graphics, so memory is shared.
ii. Capable of Blu-ray quality of 1080p with H.264 at 40 Mb/s.
iii. Graphics performance is similar to the original Xbox (Xbox 1).
iv. 24 GFLOPS of general purpose computational power.
v. Has texture filtering and DMA infrastructure.
vi. OpenGL ES 1.1, OpenGL ES 2.0, hardware accelerated OpenVG 1.1, Open EGL,
OpenMAX.
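A tile-based renderer divides the frame into small rectangles and renders them one at a time, so the amount of work can be estimated by counting tiles. The sketch below does this for a 1080p frame. The 64 x 64 pixel tile size is an assumption made for this example, and the function name is hypothetical.

```python
import math

def tile_grid(width, height, tile=64):
    """Return (columns, rows, total tiles) needed to cover a frame.
    Partial tiles at the right and bottom edges count as whole tiles."""
    cols = math.ceil(width / tile)
    rows = math.ceil(height / tile)
    return cols, rows, cols * rows

cols, rows, total = tile_grid(1920, 1080)
```

With the assumed tile size, a 1920 x 1080 frame needs 30 columns and 17 rows of tiles, since the last row is only partly covered by the image.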

Exercises
A. Choose correct option from the following:
1. The hardware-software ________ approach makes the SoC compact in size, allows for
less power consumption, and more reliable than a standard multi-chip system.
a. Integration b. Design
c. Isolation d. None

2. SoCs can be built around a ________.
a. Transmitter b. Receiver
c. Microcontroller d. None

3. SoC processor cores typically use ________ instruction set architectures.
a. CISC b. RISC
c. Assembly d. None

4. ________ are often included on SoCs for signal processing operations.
a. DSP b. SSP
c. RSP d. QSP

5. SoCs include ________ interfaces for communication protocols.
a. Internal b. TTL
c. External d. None

6. The execution units of SoCs send data and ________ to and fro the interfaced devices.
a. Instructions b. Addresses
c. Control signals d. None

7. ________ controllers route data directly between external interfaces and SoC memory.
a. DSP b. Cache
c. DMA d. None

8. The ________ port outputs video and audio to the Monitor/Display Unit.
a. HDMI b. GPIO
c. USB d. None
