
Error Detection and Correction with Advanced Encryption Standard Cryptography for Secured Communication

Abstract - As technology scales down, on-chip memories in a die suffer bit errors
caused by single-event or multiple-cell upsets due to environmental factors such as cosmic
radiation, alpha and neutron particles, or extreme temperatures in space, leading to data
corruption. Error detection and correction (ECC) techniques recognize and rectify
corrupted data over a communication channel. In this paper, an advanced two-dimensional
error correction code based on divide-symbol is proposed to mitigate radiation-induced MCUs
in memory for space applications. For encoding the data bits, diagonal bits, parity bits and
check bits are computed using XOR operations. To recover the data, an XOR operation is
again performed between the stored encoded bits and the recalculated encoded bits; the
verification, selection and correction processes then take place.

Temporary errors, classified as soft errors, are caused by fluctuations in the supply
voltage or by external radiation. These errors are common in memories. In this paper, a
Diagonal Hamming based multi-bit error detection and correction technique is proposed
that can identify and correct one bit error per row.

The proposed scheme is combined with data security using Advanced Encryption
Standard (AES) cryptography. It was modeled in Verilog HDL and simulated and
synthesized using Xilinx tools.

Index Terms—Space applications, Diagonal Hamming, multi-bit error correction, random bit errors, correction techniques.
Chapter 1

INTRODUCTION

BINARY information is stored in a storage space called memory. This binary data is
stored within metal-oxide-semiconductor memory cells on a silicon integrated-circuit
memory chip. A memory cell is a combination of transistors and a capacitor, where a
charged capacitor represents 1 and a discharged capacitor represents 0; each cell stores
only one bit. Errors, whether temporary or permanent, arise in memory cells and need to be
eliminated. Single-bit error correction is the most commonly used technique and is capable
of correcting up to one bit. As technology scales rapidly, the probability of multiple
errors increases [1]. The Diagonal Hamming method leads to efficient correction of errors
in memories. Memory is classified as SRAM, DRAM, ROM, PROM, EPROM, EEPROM and flash
memory [2]. The main advantages of semiconductor memory are ease of use, low cost, and
high density (bits per square micrometer). Temporary errors, called transient errors, are
caused by fluctuations in potential levels. Permanent errors are caused by defects during
the manufacturing process or by large amounts of radiation [3].
For detection and correction of these soft errors, various methods have been
proposed [4]. In this paper, memory bits affected by errors are recognized and rectified
using a Diagonal Hamming based multi-bit error detection and correction technique. The
aim of this method is to detect up to 8 bit errors effectively and correct them.
Section II of this paper gives an overview of other coding techniques and discusses their
advantages and disadvantages. Section III describes the proposed error detection and
correction methods. Analysis of the coding technique and the related discussion are
presented in Section IV. Finally, the conclusion is provided in Section V.

G. Manoj Sai, K. Mohan Avinash, L. Sri Ganesh Naidu, M. Shiva Rohith and M. Vinodhini
are with the Department of Electronics and Communication Engineering, Amrita School of
Engineering, Bengaluru, Amrita Vishwa Vidyapeetham, India (email:
manojsai891999@gmail.com).

II. RELATED WORKS

Information in the form of binary digits is stored in a space called memory. Errors
occur in memory due to voltage fluctuations, manufacturing defects or very high radiation.
There are many coding techniques for error correction and detection, capable of correcting
from single to multiple bit errors. Coding methods such as HVD, HVDD, HVPDH, DMC, MDMC,
and 3D parity check with Hamming are used to correct bits in memory. The 3-dimensional
parity check is used for checking and correcting errors in the message bits, while Hamming
is used for correction of the parity bits [3]. In the HVPDH method, the number of parity
bits is reduced and hence reliability is increased [1]. The HVD method [5] is used for
correction of soft error bits with low power consumption; its parity code uses four
different directions in a block of data.

HVDD methods can be used for correction of bit patterns in the form of an
equilateral triangle or any irregular triangle [6], but cannot be used for parallelogram
structures. In the DMC method [7], a decimal algorithm is used for identification and
correction of errors; its main drawback is that it uses more redundant bits to ensure
memory reliability. MDMC uses fewer redundant bits and employs reconfigurable Array
Exclusive-OR logic for decimal addition [8]. Pipelined MDMC saves time, since multiple
stages of work are done at the same time. Similar methods can be used in on-chip and
off-chip networks; there too, increasing the parity bits improves the reliability of
communication [9-11]. Many schemes used for network communication can also be used for
semiconductor memories.
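To make the parity-in-several-directions idea concrete, here is a minimal sketch (not the exact HVD/HVDD/3D-parity circuits cited above, which also use diagonal directions) of a horizontal-vertical parity scheme over a 4x4 data block: a single flipped bit is located by the intersection of the failing row parity and the failing column parity.

```python
def parities(block):
    rows = [sum(r) % 2 for r in block]           # horizontal parity bits
    cols = [sum(c) % 2 for c in zip(*block)]     # vertical parity bits
    return rows, cols

def correct_single_error(block, saved_rows, saved_cols):
    rows, cols = parities(block)
    bad_r = [i for i, (a, b) in enumerate(zip(rows, saved_rows)) if a != b]
    bad_c = [j for j, (a, b) in enumerate(zip(cols, saved_cols)) if a != b]
    if len(bad_r) == 1 and len(bad_c) == 1:      # single-bit upset located
        i, j = bad_r[0], bad_c[0]
        block[i][j] ^= 1                         # flip it back
    return block

data = [[1, 0, 1, 1], [0, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1]]
r, c = parities(data)
corrupted = [row[:] for row in data]
corrupted[2][3] ^= 1                             # inject one soft error
assert correct_single_error(corrupted, r, c) == data
```

Adding further directions (diagonals), as HVD and the 3D parity methods do, is what extends this basic scheme from single-bit to multi-bit correction.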

The Internet of Things (IoT) interlinks digital machines, computing
devices, physical objects and human beings, providing a unique identifier for each of
them through the internet via the cloud [1], [2]. Nowadays, eco-friendly telecommunication
technologies have become a mandate in the wireless network field [3]. One of the major
problems to face is the power usage of these wireless communication networks, which has a
serious environmental impact. Networks must be able to transfer data with high accuracy
and reliability [4], so to retain the reliability of these networks, errors must be
detected and corrected with great efficiency. A number of error detecting and correcting
codes are available, but most of them confront challenges such as unreliable wireless
links, the broadcast nature of wireless transmissions, interference, frequent topology
changes, and other effects of wireless channels. These are encountered when high-data-rate
service, high throughput, etc. are required [5], [6]. Several error detection and
correction codes, such as the Hamming code [7], Hsiao code [8], Reviriego et al. code [9]
and CA-based error detecting and correcting codes [10], [11], [12], [13], [14], have been
introduced in the literature. Each of these codes has its own advantage as the channel
coding scheme in a specific communication system, and there has been a wide spread of
research based on them. Hamming, Hsiao and Reviriego codes are well-known binary linear
codes which offer single error correction-double error detection (SEC-DED) capability
[7], [8], [9]: they can correct a single error and detect double errors.

The delay, power consumption and area requirements of these codes are lower than
those of other existing codes. Cha and Yoon proposed a technique to design ECC
processing circuits based on a SEC-DED code for memories; the area complexity of these
circuits has been minimized in [15]. On the other hand, the error correction ability of
the low-density parity check (LDPC) code depends on the codeword length: the decoder
performs better with a larger codeword and larger parity-check matrices [16], [17], but a
higher codeword length requires larger memory and a more complex decoding architecture.
LDPC codes are unable to correct errors if the number of errors exceeds the error
correction capability of the decoder, regardless of the number of iterations [18], [19].
Adalid et al. presented a SEC-DED code able to correct a single error and detect double
errors for short data words [21], [22]. Liu et al. described a SEC-DED scheme that
recovers some critical bits when a double error is detected [23].

Alabady et al. proposed a new coding technique to detect and correct single and
multiple bit errors, named the low complexity parity check (LCPC) code [26], [24], [25].
These codes have lower complexity and lower memory requirements than existing LDPC codes
[16], [17] and Reed-Solomon (RS) codes [20]. The LCPC (9, 4), LCPC (8, 3) and LCPC (7, 3)
codes do not need any iteration in the decoding process, which reduces the overall
circuit complexity. The algorithms, flowcharts, error patterns and syndrome values are
presented in [26], [24], [25]. However, the Alabady et al. codes do not fully satisfy
single error correction and double error detection functionality in some cases. Besides
this limitation, there are some mistakes in the flowcharts, tables and figures, which are
rectified in [28]. Ming et al. proposed a SEC-DED-DAEC code to diminish noise sources in
memories [27]. These existing codes require more area, power and delay. To overcome these
challenges, this paper aims to develop a new fault-tolerant channel coding technique.

The main contributions of this paper are as follows: i) a new technique is proposed
to construct the parity check matrices (H) for SEC-DED and SEC-DED-DAEC codes; ii) the
same H-matrix is employed to build the SEC-DED and SEC-DED-DAEC codes; iii) the SEC-DED
and SEC-DED-DAEC codes with message lengths of 4, 8 and 16 bits have been designed and
implemented for WSN applications; iv) the proposed designs are fast and power-efficient
compared to existing related designs; v) these codes have simpler encoding and decoding
processes and lower memory size. The remainder of this paper is organized as follows.
Section II presents the design of the proposed SEC-DED and SEC-DED-DAEC codes. Section III
provides the encoding and decoding techniques of the proposed codes. Section IV presents
the performance analysis. Section V describes the synthesis results. Section VI contains
the conclusion.

1. Motivation
1.1. Problem Statement

Coding theory is concerned with providing reliability and trustworthiness in
communication systems over a noisy channel. In most communication systems, error
correction codes are used to find and correct possible bit changes [2], for example in
wireless phones. Environmental interference, physical faults, electromagnetic radiation
and other kinds of noise and disturbance affect the communication channel and corrupt
messages: errors in the received codeword (message), i.e. bit changes, may happen during
the transmission of data [1] [2]. Thus the data can be corrupted during transmission from
the transmitter (sender) to the receiver; if it is affected by noise, the received
codeword or output data is not the same as the input data or generated codeword [1]. An
error, or bit change, in the data bits alters the real value of the bit(s) from 0 to 1 or
vice versa. The following simple figure shows the basic structure of a communication
system and indicates how a binary signal can be affected by noise or other effects during
transmission over the noisy communication channel:

Generally, there are three types of errors that can corrupt data transmitted from
the sender to the recipient:

 Single bit error: the error is called a single bit error when a bit change occurs in
one bit of the whole data sequence [3].

 Multiple bit errors: the errors are called multiple bit errors if bit changes occur in
two or more bits of the transmitted data sequence [3].

 Burst errors: if errors or bit changes occur on a set of bits in the transmitted
codeword, a burst error has occurred. The burst length is measured from the first changed
bit to the last changed bit [3].

These types of errors will be discussed in detail in chapter three of this report.
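The three error classes above can be illustrated with a small sketch that models the noisy channel as an XOR between the sent word and an error pattern (the bit values and patterns below are arbitrary examples, not taken from the text):

```python
def transmit(word, error_pattern):
    # channel model: each 1 in the error pattern flips the matching bit
    return [b ^ e for b, e in zip(word, error_pattern)]

sent = [1, 0, 1, 1, 0, 0, 1, 0]

single   = [0, 0, 1, 0, 0, 0, 0, 0]   # single bit error: one position flipped
multiple = [0, 1, 0, 0, 0, 1, 0, 0]   # multiple bit errors: scattered flips
burst    = [0, 0, 1, 1, 0, 1, 0, 0]   # burst: first changed bit to last changed bit

def burst_length(pattern):
    flips = [i for i, e in enumerate(pattern) if e]
    return flips[-1] - flips[0] + 1 if flips else 0

assert sum(single) == 1               # exactly one bit changed
assert sum(multiple) > 1              # more than one bit changed
assert burst_length(burst) == 4       # positions 2..5 inclusive
```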

1.2. The General Idea

The main method is to use redundancy (parity) to recover messages corrupted by an
error during transmission over a noisy channel. As a simple example, an error can occur in
human language, in both verbal and written communication. For instance, when reading the
sentence "this sentence is miscake", there is a wrong word, and the reader must find in
which word the mistake occurred and then correct it. Two things must therefore be
achieved: error detection and error correction. The principles used are, first, that the
string "miscake" is not an accepted English word, so it is obvious that an error has
occurred; secondly, the word "miscake" is closest to the real, correct English word
"mistake", so "mistake" is the best word to use instead of the wrong one [4]. This shows
how redundancy can be useful in human language; the goal of this project is to show how
computers can use some of the same principles to achieve error detection and error
correction in digital communication.

To convey the idea of correction using redundancy in digital communication, it is
first necessary to model the main scheme, which contains two main parts called encoding
and decoding, as in the following figure. In the next sections, all of these parts will be
explained in detail and implemented in the Verilog language.

According to this figure [2] [3], the data is first generated by a source; then, in
the encoding part, the parity (redundancy) bits are added by the encoder to the data bits
sent from the source. After that, the generated codeword, which is a combination of data
bits and parity bits, is transmitted to the receiver side; during transmission, errors or
bit changes may occur in the communication channel over the produced codeword. At the end,
the corrupted bits must be detected and corrected by the decoder on the receiver side.
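The source -> encoder -> channel -> decoder chain of the figure can be sketched minimally as follows, using a 3x repetition code purely as a stand-in encoder (the actual design in this report uses Hamming codes, not repetition):

```python
def encode(bits):
    # encoder: append redundancy (here, repeat every data bit three times)
    return [b for b in bits for _ in range(3)]

def channel(codeword, flip_at):
    # noisy channel: flip the bits at the chosen positions
    return [b ^ (i in flip_at) for i, b in enumerate(codeword)]

def decode(received):
    # decoder: majority vote over each group of three copies
    return [int(sum(received[3*i:3*i+3]) >= 2) for i in range(len(received)//3)]

data = [1, 0, 1, 1]
assert decode(channel(encode(data), flip_at={4})) == data   # one error corrected
```

Any real code follows the same shape; only the encode/decode rules change.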

Chapter 2. Literature Review

2.1. Historical Background related to Hamming Code

Coding theory is a proper subset of information theory, but the concepts of the two
theories are quite different [2]. The subject began with a seminal paper presented by
Claude Shannon in the mid-20th century, in which he demonstrated that good codes exist,
though his assertions were probabilistic. After Shannon's theorem, Dr. Hamming [1] [3] and
Marcel Golay [5] presented the first error correction codes, called the Hamming and Golay
codes [2].

In 1947, Dr. Hamming introduced the first method and generation of error correction
codes, called the Hamming code [1] [6]. The Hamming code is capable of correcting one
error in a block of received binary symbols [7]. Dr. Hamming later published a paper [1]
in the Bell System Technical Journal on error detection and error correction codes. In
1960, other error detection and correction codes were introduced, for example the BCH
code, invented by Bose, Chaudhuri and Hocquenghem, whose surnames form the abbreviation
BCH [7].

Another error detection and correction code presented in 1960 was the Reed-Solomon
(RS) code, invented by Irving Reed and Gustave Solomon; this method was later developed
further with more powerful computers and more efficient decoding algorithms [8].

2.2. Types of Hamming Code

There are three significant types of Hamming codes: 1. standard Hamming code, 2. extended
Hamming code [9], 3. extended Hamming product code. In this research, the standard
Hamming code is used and implemented.

2.2.1. Standard Hamming Code

The standard Hamming code depends on a minimum Hamming distance: to design it, a
minimum Hamming distance of three between any two codewords is needed. For instance, a
standard Hamming code with dimensions H (7, 4) encodes four data bits into 7 bits by
appending 3 redundancy bits [10]. In the next chapter, this kind of Hamming code will be
explained in detail.

2.2.2. Extended Hamming Code (EH)

In this type of Hamming code (EH), the minimum distance is increased by one over the
standard form, so the minimum distance is equal to four between every two codewords.

The main frame and structure of the Hamming code is the same for both the binary and
the extended Hamming code. In fact, in the extended Hamming code an extra bit is added to
the redundant (parity) bits that allows the decoder to distinguish between single and
double bit errors. On the receiver side, the decoder can then detect and correct a single
bit error and simultaneously detect (but not correct) double bit errors; if the decoder
makes no attempt to correct the single bit error, it can detect up to three errors [10].
Extending the example above, the extended Hamming code is defined as H (8, 4); here there
are 4 parity bits, obtained as the difference between the 8 total bits and the 4 data
bits.
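The H (8, 4) behavior described above can be sketched as a Hamming (7, 4) codeword plus one overall parity bit. The bit layout below (parity bits at positions 1, 2, 4) is one common convention, assumed here for illustration; the report's own implementation may order bits differently.

```python
def hamming74_encode(d):                      # d = [d1, d2, d3, d4]
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4                          # covers positions 1,3,5,7
    p2 = d1 ^ d3 ^ d4                          # covers positions 2,3,6,7
    p3 = d2 ^ d3 ^ d4                          # covers positions 4,5,6,7
    return [p1, p2, d1, p3, d2, d3, d4]        # positions 1..7

def extended_encode(d):
    c = hamming74_encode(d)
    return c + [sum(c) % 2]                    # extra overall parity bit

def extended_decode(r):
    s1 = r[0] ^ r[2] ^ r[4] ^ r[6]
    s2 = r[1] ^ r[2] ^ r[5] ^ r[6]
    s3 = r[3] ^ r[4] ^ r[5] ^ r[6]
    syndrome = s1 + 2*s2 + 4*s3                # points at the flipped position
    overall = sum(r) % 2                       # 1 iff an odd number of flips
    if syndrome and overall:                   # single error: correct it
        r = r[:]; r[syndrome - 1] ^= 1
        return r, "corrected"
    if not syndrome and overall:               # error in the overall parity bit
        r = r[:]; r[7] ^= 1
        return r, "corrected"
    if syndrome and not overall:               # double error: detect only
        return r, "double error detected"
    return r, "ok"

c = extended_encode([1, 0, 1, 1])
bad = c[:]; bad[5] ^= 1                        # flip one bit
fixed, status = extended_decode(bad)
assert status == "corrected" and fixed == c
```

This is exactly the SEC-DED behavior: the syndrome locates a single error, while the overall parity bit distinguishes single from double errors.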

2.2.3. Extended Hamming Product Codes


Extended Hamming product codes are built on the extended Hamming code (EH). This
kind of code brings new algorithms and results. The issue most significant for both
researchers and designers is the low error rate used to evaluate the efficiency of a
channel coding scheme. There are two notable figures of merit, BER (bit error rate) and
FER (frame error rate), for which some new digital radio links are designed, because
wireless multimedia applications are spreading and low error rates are very important for
them [11]. Two important coding schemes whose performance is very close to the Shannon
limit are turbo codes [11] and low-density parity check codes [12]; product codes can be
competitive with them, and can moreover be used to implement fast parallel decoders. One
of the most substantial and successful works approaching the Shannon limit was published
in 1993 by three researchers, Berrou, Glavieux and Thitimajshima [14]. They introduced
turbo codes, also known as PCCC (parallel concatenated convolutional codes). Turbo codes
are a coding technique used to provide reliable communication over a noisy communication
channel. They are a special class of FEC (forward error correction) codes and are used in
3G and 4G mobile communications, for instance in LTE. Turbo codes achieve their
considerable efficiency and performance with relatively low-complexity encoding and
decoding algorithms: they can attain BER levels around 10^-5 at code rates very close to
the channel capacity, with acceptable decoding complexity. The most important key to this
successful algorithm is the use of a decoding algorithm called soft-in soft-out [14] [15].

In recent years, other similar codes such as SCCCs, low-density parity check codes
and block product codes have been introduced and studied, but product codes allow a high
degree of parallelization, as does PCCC [15]. The main goal here is to prepare a complete
set of techniques and analytical methods to characterize the low-error-rate performance
of extended Hamming product codes. For this analytical approximation, three parameters of
the extended Hamming (EH) product codes must be evaluated: the code performance at low
error rates, the exact knowledge of the code minimum distance, and its multiplicities
[16].

2.2.3.1. Binary linear code

Linear codes are determined over special alphabets Σ that are finite fields.
Throughout, F_q denotes the finite field with q elements (q is a prime power and F_q =
{0, 1, ..., q − 1}) [16]. If the required field is Σ and C ⊂ Σ^n is a subspace of Σ^n,
then C is called a linear code. Because C is a subspace, it has a basis c_1, c_2, ...,
c_k, where k is the dimension of the subspace; each codeword can be expressed as a linear
combination of these basis vectors, and the basis vectors can be written as the columns
of an n × k matrix (equivalently the rows of a k × n matrix) called the generator matrix
G [16].

For a binary linear code written in the form C (n, k), the following values are
defined: there are k data (information) bits and r parity bits, and n = k + r is the
total number of bits of a codeword. w_H(.) denotes the Hamming weight of a vector [16].
With u = (u_1, u_2, ..., u_k) and c = (c_1, c_2, ..., c_n), a systematic codeword has the
form c = (u | P), where the first k bits are the data bits given by u and P is a vector
containing the r parity bits, placed after the first k data bits [16].
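The construction c = (u | P) can be sketched by multiplying the message u by a systematic generator matrix G = [I_k | A] over GF(2). The particular G below is a systematic generator of the (7, 4) Hamming code, chosen here only as an illustration (it is not given in the text).

```python
G = [  # rows: basis codewords c1..c4; n = k + r = 4 + 3 columns
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

def encode(u):
    # each codeword is a GF(2) linear combination of the basis rows of G
    return [sum(u[i] * G[i][j] for i in range(4)) % 2 for j in range(7)]

c = encode([1, 0, 1, 1])
assert c[:4] == [1, 0, 1, 1]          # first k bits: the data u (systematic)
assert c == [1, 0, 1, 1, 0, 1, 0]     # last r bits: the parity vector P
```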

A. Weight Enumerating Functions (WEF)

The weight enumerating function (WEF) gives the most complete description of the weight
structure of a code. Turbo code analysis builds on the weight enumerating functions of
the component codes, obtained for instance via the traditional trellis search [13]. There
are three distinguished WEFs for a code [16]. The weight enumerating function is:

W_C(y) = \sum_{c \in C} y^{w_H(c)} = \sum_{i=0}^{n} A_i y^i

where A_i is the number of codewords of weight w_H(c) = i. Another significant function
is IW_C(y), the information weight enumerating function, written with slightly changed
parameters compared to the formula above [16]:

IW_C(y) = \sum_{c \in C} w_H(u) \, y^{w_H(c)} = \sum_{i=0}^{n} W_i y^i

In this formula W_i is the information (data) multiplicity, used instead of A_i: it is
the sum of the Hamming weights of the data frames u that generate the codewords of weight
w_H(c) = i [16].

The third significant function is the IOWEF (input-output weight enumerating
function):

IOW_C(x, y, X, Y) = \sum_{c \in C} x^{k - w_H(u)} y^{w_H(u)} X^{r - w_H(P)} Y^{w_H(P)}

Counting the codewords c = (u | P) with w_H(u) = w and w_H(P) = p, and substituting w and
p for w_H(u) and w_H(P) in the formula above [16], gives:

IOW_C(x, y, X, Y) = \sum_{w=0}^{k} \sum_{p=0}^{r} A_{w,p} \, x^{k-w} y^{w} X^{r-p} Y^{p}

so that A_i = \sum_{w+p=i} A_{w,p} and W_i = \sum_{w+p=i} w \, A_{w,p}.
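For small codes the coefficients A_i and W_i can be computed by brute-force enumeration, matching the definitions above. The sketch below does this for a systematic (7, 4) Hamming code (the generator matrix is an illustrative assumption):

```python
from itertools import product

G = [[1,0,0,0,1,1,0], [0,1,0,0,1,0,1], [0,0,1,0,0,1,1], [0,0,0,1,1,1,1]]

A = [0] * 8   # A[i]: number of codewords of Hamming weight i
W = [0] * 8   # W[i]: sum of message weights w_H(u) of those codewords

for u in product([0, 1], repeat=4):
    c = [sum(u[i] * G[i][j] for i in range(4)) % 2 for j in range(7)]
    i = sum(c)                     # w_H(c)
    A[i] += 1
    W[i] += sum(u)                 # accumulate w_H(u)

assert A == [1, 0, 0, 7, 7, 0, 0, 1]   # known distribution of Hamming (7,4)
```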

BER and FER Performance for Maximum Likelihood Decoding: This part lays the
foundation for the analytical evaluation of code performance at low error rates (high
SNR). Consider a binary linear code C (n, k) transmitted with a binary antipodal
constellation, for example 2-PAM or Gray-labeled 4-PSK, over the additive white Gaussian
noise channel. The BER and FER performance under maximum likelihood decoding then depends
on the ratio between the energy per data bit and the noise spectral density.

The bit error rate (BER) is the ratio of the number of bits received in error to
the total number of bits transferred over a communication channel during a time interval
(the result can be expressed as a percentage). The frame error rate (FER) is the ratio of
frames received in error, and can be used to evaluate the quality of a signal connection;
if the FER is very high, there are many errors among the received frames and the
connection may be rejected and dropped [16]. Union bounds give:

FER <= \sum_{i=d_{min}}^{n} \frac{1}{2} A_i \, \mathrm{erfc}\!\left(\sqrt{i \frac{k}{n} \frac{E_b}{N_0}}\right)

BER <= \sum_{i=d_{min}}^{n} \frac{1}{2} \frac{w_i}{k} \, \mathrm{erfc}\!\left(\sqrt{i \frac{k}{n} \frac{E_b}{N_0}}\right)

where E_b is the energy per data (information) bit, N_0 is the noise spectral density,
and their ratio E_b/N_0 governs the BER and FER performance under maximum likelihood
decoding. When the SNR is very high, the error rate is very low and the code performance
coincides with the union bound truncated to the contribution of d_min; under this
condition the following expressions, called the code error floor, hold [16]:

FER ≈ FER_{EF} ≜ \frac{1}{2} A_{min} \, \mathrm{erfc}\!\left(\sqrt{d_{min} \frac{k}{n} \frac{E_b}{N_0}}\right)

BER ≈ BER_{EF} ≜ \frac{1}{2} \frac{w_{min}}{k} \, \mathrm{erfc}\!\left(\sqrt{d_{min} \frac{k}{n} \frac{E_b}{N_0}}\right)
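The error-floor expressions above can be evaluated directly with math.erfc. The sketch below does so for the extended Hamming (8, 4) code, using d_min = 4, A_min = 14 and w_min = 28 (the last value following from the BER property), at an assumed operating point of Eb/N0 = 6 dB:

```python
import math

n, k, d_min, A_min, w_min = 8, 4, 4, 14, 28
EbN0 = 10 ** (6 / 10)                          # 6 dB converted to a linear ratio

arg = math.sqrt(d_min * (k / n) * EbN0)
FER_EF = 0.5 * A_min * math.erfc(arg)          # frame error floor
BER_EF = 0.5 * (w_min / k) * math.erfc(arg)    # bit error floor

# BER_EF = FER_EF * d_min / n follows from w_min = A_min * d_min * k / n
assert abs(BER_EF - FER_EF * d_min / n) < 1e-12
```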
BER properties: There are some problems related to the computation of
multiplicities; new theoretical results for solving them are indicated below. The minimum
distance d_min and its multiplicity A_min are known explicitly for many codes, but the
value of w_min is very difficult to compute. From the formula for BER_EF above we have

w_min / k ≈ A_min \, d_min / n

and the error floors are then connected by BER_EF ≈ FER_EF \cdot d_min / n. Some codes
satisfy this relation with equality; such codes are said to possess the BER property,
which holds when all the multiplicities A_i and the data multiplicities w_i are connected
by

w_i = A_i \, i \, k / n.

Both the multiplicities and the BER property are in general still open problems; in the
next parts, the extended Hamming code and the extended Hamming product code are examined
with respect to these properties [16].
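The property w_i = A_i i k / n can be checked numerically for a small code with a transitive automorphism group. The sketch below builds the extended Hamming (8, 4) code as Hamming (7, 4) plus an overall parity bit (the systematic generator is an illustrative assumption) and verifies the identity:

```python
from itertools import product

G = [[1,0,0,0,1,1,0], [0,1,0,0,1,0,1], [0,0,1,0,0,1,1], [0,0,0,1,1,1,1]]
n, k = 8, 4

A = [0] * (n + 1)          # codeword multiplicities
W = [0] * (n + 1)          # information multiplicities
for u in product([0, 1], repeat=k):
    c = [sum(u[a] * G[a][j] for a in range(k)) % 2 for j in range(7)]
    c.append(sum(c) % 2)   # overall parity bit -> extended (8,4) code
    A[sum(c)] += 1
    W[sum(c)] += sum(u)

for i in range(n + 1):
    assert W[i] == A[i] * i * k // n    # the BER property w_i = i * A_i * k / n
```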

2.2.4. Extended Hamming code vs. Extended Hamming product code

In the extended Hamming code EH_r(n, k), the minimum distance is increased by one
over the standard form of the Hamming code, so the minimum distance equals four between
every two codewords. The main frame and structure is the same for both the binary and the
extended Hamming code; in the extended Hamming code an extra bit is added to the
redundant (parity) bits, which allows the decoder to distinguish between single and
double bit errors. As mentioned above, by adding the additional parity bit, EH_r(n, k) is
characterized by n = 2^r and k = 2^r − r − 1, where the number of parity bits r must be
an integer greater than two. The multiplicities of the extended Hamming code are given by
the following lemma [16]:

A_{2i} = \frac{\binom{n}{2i} + (-1)^i (n-1) \binom{n/2}{i}}{n}, \quad i = 2, 3, \ldots

To introduce extended Hamming product codes: if C1 is a block code (n_1, k_1) and C2 is a
block code with the form (n_2, k_2), then the product code CP (n_P, k_P) = (C1 × C2) is
the (n_1 n_2, k_1 k_2) code [15].
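The multiplicity lemma above can be sanity-checked for the smallest case r = 3, i.e. EH(8, 4), whose weight distribution is known to be A_4 = 14, A_6 = 0, A_8 = 1:

```python
from math import comb

n = 8                                          # n = 2**r with r = 3
def A(two_i):
    # A_{2i} = [ C(n, 2i) + (-1)^i (n-1) C(n/2, i) ] / n
    i = two_i // 2
    return (comb(n, two_i) + (-1)**i * (n - 1) * comb(n // 2, i)) // n

assert A(4) == 14      # fourteen codewords of weight 4
assert A(6) == 0       # no codewords of weight 6
assert A(8) == 1       # the all-ones codeword
```

Together with the zero codeword this accounts for all 2^4 = 16 codewords of EH(8, 4).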
The systematic encoder for the product code CP (n_P, k_P) = (C1 × C2) = EH_{r1}(n_1,
k_1) × EH_{r2}(n_2, k_2) is obtained in three steps: 1) first, the matrix containing the
data bits is written: the k_1 k_2 data symbols are arranged into a k_1 × k_2 array; 2)
second, the k_1 rows are encoded using code C2, appending (n_2 − k_2) parity bits to each
row of the matrix; 3) finally, all n_2 columns are encoded using code C1, which appends
(n_1 − k_1) parity bits to the end of each column. It is then clear that the minimum
distance of the product code is d_{Pmin} = d_{1min} × d_{2min}, and its multiplicities
A_min^P and W_min^P are the products of those of C1 and C2; for EH × EH the minimum
distance of the product code equals 16 (A_min^P = A_16^P = A_4^1 × A_4^2). Also, each
column of the matrix is a codeword of C1, and if both C1 and C2 are linear codes, then
all rows of the matrix are codewords of C2. Systematic encoders for C1 and C2 are assumed
throughout.
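The three encoding steps can be sketched with simple single-even-parity extension standing in for the component codes (a real EH × EH construction would use extended Hamming encoders for the rows and columns):

```python
def product_encode(data, k1, k2):
    # step 1: arrange the k1*k2 data symbols into a k1 x k2 array
    array = [data[i*k2:(i+1)*k2] for i in range(k1)]
    # step 2: encode each of the k1 rows with C2 (here: one row parity bit)
    array = [row + [sum(row) % 2] for row in array]
    # step 3: encode each of the n2 columns with C1 (append column parity)
    array.append([sum(col) % 2 for col in zip(*array)])
    return array

code = product_encode([1, 0, 1, 1, 0, 0, 1, 0, 1], 3, 3)
# every row and every column of the resulting array has even parity
assert all(sum(row) % 2 == 0 for row in code)
assert all(sum(col) % 2 == 0 for col in zip(*code))
```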

In conclusion, a product code is constructed by placing the data bits into a matrix
whose rows and columns are encoded separately using linear block codes. This kind of
product-code encoder is drawn in the figure below, which shows a typical encoding
procedure for a product code when a block code is used to encode the rows and columns of
a matrix [15]:
Figure 4. Typical encoding procedure for a product code [15]

2.2.5. Codes with the BER property and a transitive automorphism group

This part evaluates and proves that when a code has a transitive automorphism group,
the specific multiplicities and the BER property hold for this code. It must then also be
proved that extended Hamming product codes satisfy a derived form of this statement for
their data multiplicities [16]. A transitive automorphism group is a group with the
special property that for any two specified non-identity elements, there is an
automorphism of the group sending the first to the second. For the binary code C (n, k)
described in the previous part, a permutation of the coordinates is a symmetry of C if it
maps every codeword to another codeword [16].
Theorem A: A binary code C (n, k) satisfies the multiplicity property w_i = i A_i k
/ n if it has a transitive automorphism group. There are A_i codewords of weight i. They
are first placed in a special matrix M with A_i rows and n columns; the first k columns
of this matrix contain the ones counted by the data weights, because C is systematic [16]
[17]. If i and j are two different coordinates (i ≠ j and 1 <= i, j <= n), then, since
Aut (C) is transitive, there is a permutation p in Aut (C) that maps i to j; the role of
the permutation p is to map any row and column of the matrix into another row and column.

The extended Hamming code also has a transitive automorphism group, so it satisfies the multiplicity and BER properties. The information multiplicity for an extended Hamming code EHr(n, k) is:

w2i = 2ik [ C(n, 2i) + (−1)^i (n−1) C(n/2, i) ] / n^2, where i = 2, 3, …

and C(a, b) denotes the binomial coefficient. Theorem B: if there are two binary codes C1 and C2, then it must be proved that the product code has a transitive automorphism group Aut(C1 × C2) [38] [39]. For CP(nP, kP) = C1 × C2 = EHr1(n1, k1) × EHr2(n2, k2) = (n1 n2, k1 k2), every codeword coordinate corresponds to a pair (xi, yi) giving its position in the codeword matrix [16][17]. For simplicity, if 1 <= i <= nP, write i ≜ (xi, yi) and j ≜ (xj, yj). For each such pair there are a permutation symmetry p1 of C1 that maps xi into xj and a permutation symmetry p2 of C2 that maps yi into yj; the first permutation symmetry is applied to the rows of the specified matrix and the second to its columns. So for each couple of coordinates, a proper permutation symmetry can be built that maps coordinate i into coordinate j, and therefore a transitive automorphism group exists [16] [17].
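As a quick illustrative check of the multiplicity formulas above, the smallest extended Hamming code EH(8, 4) can be enumerated by brute force. This is a Python sketch written for this report (it is not part of the proposed Verilog design, and the generator matrix shown is one standard systematic choice):

```python
from itertools import product
from math import comb

# Systematic generator matrix [I4 | P] of the (8, 4) extended Hamming code
G = [
    [1, 0, 0, 0, 1, 1, 0, 1],
    [0, 1, 0, 0, 1, 0, 1, 1],
    [0, 0, 1, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1, 1, 0],
]

def encode(msg):
    # codeword bit j is the XOR of msg[i] * G[i][j] over i
    return [sum(m & g for m, g in zip(msg, col)) % 2 for col in zip(*G)]

n, k = 8, 4
A4 = 0   # number of codewords of weight 4
w4 = 0   # information multiplicity: ones in data positions of weight-4 codewords
for msg in product([0, 1], repeat=k):
    cw = encode(list(msg))
    if sum(cw) == 4:
        A4 += 1
        w4 += sum(cw[:k])

# A_{2i} = [C(n, 2i) + (-1)^i (n-1) C(n/2, i)] / n, here with i = 2
assert A4 == (comb(n, 4) + (n - 1) * comb(n // 2, 2)) // n
# w_{2i} = 2ik [ ... ] / n^2, equivalently 2i * k * A_{2i} / n, here 2i = 4
assert w4 == 4 * k * A4 // n
print(A4, w4)   # → 14 28
```

The enumeration confirms A4 = 14 weight-4 codewords and an information multiplicity of w4 = 28 for EH(8, 4), matching the closed-form expressions.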

So, the dominant multiplicity for CP(nP, kP) = EHr1(n1, k1) × EHr2(n2, k2) is:

Wmin^P = k1 (n1 − 1)(n1 − 2) · k2 (n2 − 1)(n2 − 2) / 36

In the following table, all dominant and information multiplicities of square extended Hamming product codes are shown. From this table it is obvious that the product codes have a large minimum distance for r in the range 3 to 9, and also very large multiplicities [16].
Chapter 3. Theoretical Description of Hamming Code

3.1. Description of Hamming Code

A Hamming code is a simple and frequently used method of error detection and error correction [22]. A Hamming code always checks all of the bits that exist in the codeword, from the first bit to the last available bit [22]. For exactly this reason the Hamming code is called a linear code for error detection and error correction: it can detect at most two simultaneous bit errors and is capable of correcting only a single bit error [3]. The method works by appending special bits called parity or redundancy bits. These extra bits are inserted among the data bits at particular positions, and the number of extra bits depends on the number of data bits [22].

Hamming initially introduced such a code in [1]. For example, the Hamming (7, 4) code extends four data bits into seven total bits by adding three parity bits. The bit positions are numbered starting from one: 1, 2, 3, 4, 5, 6, 7, or in binary: 001, 010, 011, 100, 101, 110, 111. The parity bits are placed at the positions that are powers of two: 1, 2, 4, and so on, and the data bits occupy the remaining positions, which in this example are 3, 5, 6, 7. Each data bit is included in the calculation of two or more parity bits. In particular, parity bit one (r1) is calculated from the bits whose positions have the least significant bit set: 1, 3, 5 and 7 (or in binary 001, 011, 101 and 111). Parity bit two (r2, at position two, or 10 in binary) is calculated from the bits whose positions have the second least significant bit set: 2, 3, 6, 7 (or 010, 011, 110, 111 in binary). Parity bit three (r3, at position four, or 100 in binary) is calculated from the bits whose positions have the third least significant bit set: 4, 5, 6 and 7 (or 100, 101, 110, 111). The code sends message bits padded with these parity or redundancy bits in the form (n, k), where n is the block size, or the whole number of bits, and k is the number of data bits in the generated codeword [1]. The full procedure for adding parity bits to the data bits to encode and create a proper codeword is explained in the next parts of this report. In the following table, all the possible Hamming codes are indicated:

According to table 1, r parity bits can cover bits from 1 up to 2^r − 1, which is called the total number of bits (n); the number of parity bits must be more than one (r > 1). For each size of Hamming code there is a rate, equal to the number of data bits divided by the number of total bits (rate = k / n), and its result is always less than 1. Another important factor is the overhead factor, calculated by dividing the total bits n by the data bits k (overhead factor = n / k); its result is always more than one [2] [3].
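The placement rule described above (parity bits at the power-of-two positions, each covering the positions whose binary index has the corresponding bit set) can be sketched in a few lines of Python. This is only an illustrative model written for this report, not the Verilog implementation:

```python
# (7, 4) Hamming encoder: parity bits sit at positions 1, 2, 4 and each
# parity bit r covers the positions whose index has bit r set.
def hamming_encode(data_bits):
    """data_bits: 4 bits placed at positions 3, 5, 6, 7 (in that order)."""
    n = 7
    code = [0] * (n + 1)                     # 1-indexed; code[0] unused
    data_pos = [p for p in range(1, n + 1) if p & (p - 1)]  # not powers of two
    for p, bit in zip(data_pos, data_bits):
        code[p] = bit
    for r in (1, 2, 4):                      # parity positions 2^0, 2^1, 2^2
        code[r] = 0
        for p in range(1, n + 1):
            if p != r and (p & r):           # position index has bit r set
                code[r] ^= code[p]
    return code[1:]

# Example: data bits 1, 0, 1, 1 at positions 3, 5, 6, 7
print(hamming_encode([1, 0, 1, 1]))          # → [0, 1, 1, 0, 0, 1, 1]
```

Here r1 = 1⊕0⊕1 = 0 over positions 3, 5, 7; r2 = 1⊕1⊕1 = 1 over positions 3, 6, 7; and r3 = 0⊕1⊕1 = 0 over positions 5, 6, 7, exactly as in the coverage lists above.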

3.2. An Alternative Description of the Hamming Code

The following figure indicates another description of the Hamming code. There are three nested circles in this figure, related to the three parity equations defining the Hamming code, and seven areas among these circles, related to the seven bits of a final codeword [2]. The number of circles can be extended for each size of Hamming code. When a single bit error happens during transmission, the error is fixed by flipping that single bit in the proper area, corresponding to the position where the error occurred.

For example, H (7, 4) encodes four data bits by appending three parity or redundancy bits to generate a proper codeword, which is a combination of both parity and data bits.

In the figure above, which describes the [7, 4] Hamming code graphically, each parity bit covers only its adjacent bit positions, which can be shared with other parity bits: parity bit 1 (r1) covers data bits k1, k2, k4; parity bit 2 (r2) covers data bits k1, k3, k4; and parity bit 3 (r3) covers data bits k2, k3, k4.

3.3. Theoretical Description for Encoding Part of Hamming Code

The Hamming code is a specific linear code that can correct just one error by adding r parity bits to the k data bits generated by the source, giving a codeword of length n = k + r. The number of data bits is k = 2^r − r − 1, generating a codeword of length 2^r − 1 [1] [6] [7]. The general algorithm for the encoding part of the Hamming code is described in the following: 1. The r parity or redundancy bits are combined with the k data bits to create a proper codeword containing r + k bits, so the bit positions run from position 1 to r + k. In fact, for the decoding part on the receiver side, which contains the detection and correction steps for an occurred error, extra bits must be sent with the data bits; these are the parity (redundant) bits, which are added by the sender (by encoding) and always removed by the receiver on the receiver side (by decoding) [23].

But how are the data bits and parity bits merged into a codeword? The parity bit positions are always the powers of two, reserved and prepared for the parity bits, and the other bit positions are related to the data bits [23]. The first parity bit is located at bit position 2^0 = 1, the second parity bit at bit position 2^1 = 2, the third parity bit at bit position 2^2 = 4, and so on. In the following simple table, the position of each parity bit (r) is presented in the grey colored cells, which are reserved for the parity purpose.

After that, the data bits (k) are copied to the remaining free positions that were not reserved before. The data bits are appended in the same order as they appear in the original data generated by the source.

So, the transmitter adds parity bits through a process that creates a relationship between the parity bits and specific data bits. The receiver then checks the two sets of bits to detect and correct errors, as explained in the decoding part [23]; the following figure presents the main idea of the coding used in the Hamming code.

2. The value of each parity bit is calculated by an XOR operation over a combination of data bits [24]. These combinations are indicated in the table below, which lists all parity bits and data bits and the positions that must be used according to the rules of the encoding part of the Hamming code. For example, to calculate the value of parity bit number one (r1), the XOR calculation must begin from r1, then check one bit and skip one bit, up to the last bit in the sequence of bits in the codeword.

The result of an XOR operation is zero if the two bits are the same; otherwise, if the two bits are different, the result is one (according to the following table). The XOR operation in table 4 implements the odd function: if the number of 1's among the variables X and Y is odd, the result of XORing these two single bits is one; otherwise, if the number of 1's is even, the result is zero.
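The detection-and-correction rule that follows from these parity checks can be sketched as a small Python model (an illustration written for this report, not the Verilog design): the receiver recomputes each parity check, and the failed checks, summed as a binary number, give the position of a single flipped bit.

```python
# Syndrome check for a (7, 4) Hamming codeword: a syndrome of 0 means no
# detected error; otherwise the syndrome is the 1-based error position.
def syndrome(code):
    """code: 7-bit Hamming codeword, list indexed 0..6 for positions 1..7."""
    s = 0
    for r in (1, 2, 4):                  # the three parity checks
        parity = 0
        for p in range(1, 8):
            if p & r:                    # positions covered by check r
                parity ^= code[p - 1]
        if parity:                       # check r failed
            s += r
    return s

cw = [0, 1, 1, 0, 0, 1, 1]               # a valid (7, 4) codeword
assert syndrome(cw) == 0
cw[4] ^= 1                               # flip the bit at position 5
pos = syndrome(cw)
assert pos == 5                          # syndrome points at the error
cw[pos - 1] ^= 1                         # correct it by negating that bit
assert syndrome(cw) == 0
```

Flipping position 5 makes checks r1 and r4 fail, and 1 + 4 = 5 recovers the error position, which is the mechanism the decoding part relies on.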
Chapter 4

Existing Model

Diagonal Hamming Code

A. Design of Diagonal Hamming Method for Memory

The proposed design of the Diagonal Hamming based multi-bit error detection and correction technique for memory is shown in Fig. 1. Using this approach of diagonal Hamming bits, errors in the message saved in the memory can be recognized and rectified. In the encoding technique, the message bits are given as input to the Diagonal Hamming encoder and the Hamming bits are calculated. The message and Hamming bits (32 + 24 bits) are saved in the memory after the encoding technique.

Errors occurring in the message bits saved in the memory can be recognized and rectified in the decoding technique.

B. Design of Diagonal Hamming Encoder

As an example, a message of 32 bits is considered in the proposed method. The message is represented in the form of an m x n matrix; the grouping of the message bits is depicted in Fig. 2. The encoder generates the Hamming bits, which are obtained by grouping the message bits and calculating over each group with the Hamming code. The message bits are patterned as shown in Fig. 3. Eight diagonals are considered in this Diagonal Hamming method and each diagonal consists of 4 message bits.

Message bits are grouped as shown in Fig. 3 in the specified directions. The first diagonal consists of m3[7], m3[5], m2[6], m1[7]; the second diagonal has m3[6], m2[7], m0[1], m1[0]; the third diagonal has m0[0], m0[2], m1[1], m2[0]; the fourth diagonal has m3[4], m2[5], m1[6], m0[7]; the fifth diagonal has m3[3], m2[4], m1[5], m0[6]; the sixth diagonal has m3[2], m2[3], m1[4], m0[5]; the seventh diagonal has m3[1], m2[2], m1[3], m0[4]; and the eighth diagonal has m3[0], m2[1], m1[2], m0[3]. Each diagonal has four message bits. For the respective groups, the Hamming bits are calculated as shown in Fig. 4. The Hamming bits are denoted R1, R2, R3, R4, R5, R6, R7, R8; each of these arrays consists of 3 bits. The Hamming bits are calculated as given in equations (1)-(24). For the first row:

R1[1] = m1[7] ⊕ m2[6] ⊕ m3[7]; (1)
R1[2] = m1[7] ⊕ m3[5] ⊕ m3[7]; (2)
R1[3] = m2[6] ⊕ m3[5] ⊕ m3[7]; (3)

For the second row:
R2[1] = m1[0] ⊕ m0[1] ⊕ m3[6]; (4)
R2[2] = m1[0] ⊕ m2[7] ⊕ m3[6]; (5)
R2[3] = m0[1] ⊕ m2[7] ⊕ m3[6]; (6)

For the third row:
R3[1] = m2[0] ⊕ m1[1] ⊕ m0[0]; (7)
R3[2] = m2[0] ⊕ m0[2] ⊕ m0[0]; (8)
R3[3] = m1[1] ⊕ m0[2] ⊕ m0[0]; (9)

For the fourth row:
R4[1] = m0[7] ⊕ m1[6] ⊕ m3[4]; (10)
R4[2] = m0[7] ⊕ m2[5] ⊕ m3[4]; (11)
R4[3] = m1[6] ⊕ m2[5] ⊕ m3[4]; (12)

For the fifth row:
R5[1] = m0[6] ⊕ m1[5] ⊕ m3[3]; (13)
R5[2] = m0[6] ⊕ m2[4] ⊕ m3[3]; (14)
R5[3] = m1[5] ⊕ m2[4] ⊕ m3[3]; (15)

For the sixth row:
R6[1] = m0[5] ⊕ m1[4] ⊕ m3[2]; (16)
R6[2] = m0[5] ⊕ m2[3] ⊕ m3[2]; (17)
R6[3] = m1[4] ⊕ m2[3] ⊕ m3[2]; (18)

For the seventh row:
R7[1] = m0[4] ⊕ m1[3] ⊕ m3[1]; (19)
R7[2] = m0[4] ⊕ m2[2] ⊕ m3[1]; (20)
R7[3] = m1[3] ⊕ m2[2] ⊕ m3[1]; (21)

For the eighth row:
R8[1] = m0[3] ⊕ m1[2] ⊕ m3[0]; (22)
R8[2] = m0[3] ⊕ m2[1] ⊕ m3[0]; (23)
R8[3] = m1[2] ⊕ m2[1] ⊕ m3[0]; (24)

In the encoder, the Hamming bits are calculated for the message; we get 24 Hamming bits in total for a 32-bit message.

C. Proposed Diagonal Hamming Decoder

The encoded message bits, kept in memory as a matrix as shown in Fig. 4, are given as input to the decoder. The decoder segregates the message and Hamming bits, recalculates the Hamming bits and evaluates the syndrome bits. The syndrome bits are evaluated using the equations given in (25)-(48). For the first row:

S1[1] = R1[1] ⊕ m1[7] ⊕ m2[6] ⊕ m3[7]; (25)
S1[2] = R1[2] ⊕ m1[7] ⊕ m3[5] ⊕ m3[7]; (26)
S1[3] = R1[3] ⊕ m2[6] ⊕ m3[5] ⊕ m3[7]; (27)

For the second row:
S2[1] = R2[1] ⊕ m1[0] ⊕ m0[1] ⊕ m3[6]; (28)
S2[2] = R2[2] ⊕ m1[0] ⊕ m2[7] ⊕ m3[6]; (29)
S2[3] = R2[3] ⊕ m0[1] ⊕ m2[7] ⊕ m3[6]; (30)

For the third row:
S3[1] = R3[1] ⊕ m2[0] ⊕ m1[1] ⊕ m0[0]; (31)
S3[2] = R3[2] ⊕ m2[0] ⊕ m0[2] ⊕ m0[0]; (32)
S3[3] = R3[3] ⊕ m1[1] ⊕ m0[2] ⊕ m0[0]; (33)

For the fourth row:
S4[1] = R4[1] ⊕ m0[7] ⊕ m1[6] ⊕ m3[4]; (34)
S4[2] = R4[2] ⊕ m0[7] ⊕ m2[5] ⊕ m3[4]; (35)
S4[3] = R4[3] ⊕ m1[6] ⊕ m2[5] ⊕ m3[4]; (36)

For the fifth row:
S5[1] = R5[1] ⊕ m0[6] ⊕ m1[5] ⊕ m3[3]; (37)
S5[2] = R5[2] ⊕ m0[6] ⊕ m2[4] ⊕ m3[3]; (38)
S5[3] = R5[3] ⊕ m1[5] ⊕ m2[4] ⊕ m3[3]; (39)

For the sixth row:
S6[1] = R6[1] ⊕ m0[5] ⊕ m1[4] ⊕ m3[2]; (40)
S6[2] = R6[2] ⊕ m0[5] ⊕ m2[3] ⊕ m3[2]; (41)
S6[3] = R6[3] ⊕ m1[4] ⊕ m2[3] ⊕ m3[2]; (42)

For the seventh row:
S7[1] = R7[1] ⊕ m0[4] ⊕ m1[3] ⊕ m3[1]; (43)
S7[2] = R7[2] ⊕ m0[4] ⊕ m2[2] ⊕ m3[1]; (44)
S7[3] = R7[3] ⊕ m1[3] ⊕ m2[2] ⊕ m3[1]; (45)

For the eighth row:
S8[1] = R8[1] ⊕ m0[3] ⊕ m1[2] ⊕ m3[0]; (46)
S8[2] = R8[2] ⊕ m0[3] ⊕ m2[1] ⊕ m3[0]; (47)
S8[3] = R8[3] ⊕ m1[2] ⊕ m2[1] ⊕ m3[0]; (48)

If all the syndrome bits are equal to zero, the message bits are not corrupted; if any of the syndrome bits is non-zero, one or more message bits are corrupted. These corrupted bits need correction, so the message bits are sent to the error correction part. In the error correction part, the location of the error is identified by the following calculation. Suppose the error is in the third row of the message organization; then the error position is calculated as:

(S3[3] × 2^2) + (S3[2] × 2^1) + (S3[1] × 2^0); (49)

After the position is calculated, the error corrector negates the bit at that position to correct the data bit. This process is repeated until all the corrupted bits are corrected. The output of the decoder is then in the form of Fig. 2.
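The per-diagonal computation above is the classic (7, 4) Hamming rule applied to the four bits of each diagonal. The following Python fragment is a behavioral sketch of one diagonal group written for this report (the proposed design itself is in Verilog HDL, and the function names here are illustrative, not taken from the design):

```python
# One diagonal group: 4 message bits give 3 Hamming bits; the recomputed
# syndrome locates a single flipped message bit.
def hamming_bits(d):
    """d: the 4 message bits of one diagonal, e.g. [m1[7], m2[6], m3[5], m3[7]]."""
    return [d[0] ^ d[1] ^ d[3],     # R[1], cf. equations (1), (4), ...
            d[0] ^ d[2] ^ d[3],     # R[2]
            d[1] ^ d[2] ^ d[3]]     # R[3]

def correct(d, r):
    """Correct at most one flipped message bit in diagonal d, given stored bits r."""
    s = [ri ^ hi for ri, hi in zip(r, hamming_bits(d))]   # syndrome, eq. (25)-(48)
    pos = s[2] * 4 + s[1] * 2 + s[0] * 1                  # position, eq. (49)
    # positions 3, 5, 6, 7 of the 7-bit layout hold the data bits d1..d4
    data_pos = {3: 0, 5: 1, 6: 2, 7: 3}
    if pos in data_pos:
        d[data_pos[pos]] ^= 1                             # negate the faulty bit
    return d

diag = [1, 0, 1, 1]
r = hamming_bits(diag)
stored = diag.copy()
stored[2] ^= 1                      # a single-event upset flips one stored bit
assert correct(stored, r) == [1, 0, 1, 1]
```

Because each row of the memory contributes at most one bit to a diagonal, eight such groups running in parallel let the scheme correct one error per row, as stated in the abstract.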
Chapter 5

Advanced Encryption Standard (AES) Cryptography

2.1 Advanced Encryption Standard (AES)

The Advanced Encryption Standard is a 128 bit block cipher that has been widely used since its acceptance in 2001 [23]. AES was designed to be a more secure replacement for DES (Data Encryption Standard). Many efficient hardware and software designs have been documented, taking into consideration various tradeoffs of speed and area resources. The following sections provide a general functional description of AES with an increased focus on the hardware design of AES components. High speed hardware datapaths that are relevant in understanding the GCM datapath are presented toward the end of this section.

2.1.1 AES Round Block

Each round of AES is modular and consists of four main computations: Byte Substitution, Shift Rows, Mix Columns, and Round Key addition. All rounds in AES are identical with the exception of the last round, which has no Mix Columns operation. Byte Substitution consists of 16 eight-bit word substitutions, while the Mix Columns operation is constructed from a matrix multiplication. Both of these operations are defined by Galois field operations in GF(2^8), but there are different means to implement them. The Shift Rows operation is simply a permutation on the inputs, and the Round Key operation consists of XORing key values generated from a Key Schedule component. The following diagram illustrates the general round structure of AES, which is repeated based on the key input. For a 128 bit key, a single round repeats 10 times, while the 192 and 256 bit keys have 12 and 14 rounds of computation respectively for increased security.
Figure 2.1: AES Round Structure

Different hardware datapaths can be created from this modular round structure. An iterative design can use the same round structure given above but simply adds a 128 bit data register at the end of the round. After a maximum of 14 cycles the AES encryption result can be obtained. This iterative design can be unrolled to create a pipelined implementation that has registers placed between round blocks. This is an outer pipelined AES design, and a 128 bit output can be generated at each clock cycle with a full pipeline. There is flexibility, however, in choosing the locations of the pipeline registers. Within each of the round components, additional pipeline stages can be added within the Sub-bytes operation, which will be described in Section 2.2.2. This is labeled an inner pipelined AES design, and although a higher latency and area is present, higher throughputs are possible.

The 128 bit plain text input is mapped into a state array, which is a 4x4 block of 8 bit words that is manipulated in each round. In the following sections the state array block will be used to describe the different round operations, so it is important to understand how the input is transformed into the state array. Figure 2.2 shows this transformation: bytes of data fill the state array by columns. After the AES encryption rounds, the last state array output is transformed back into a 128 bit stream.
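The column-major mapping can be sketched in a few lines of Python (an illustration written for this report; the hardware design realizes the same mapping with wiring):

```python
# 16 input bytes fill the 4x4 AES state array column by column.
def to_state(block):
    """block: 16 bytes; returns state[row][col] filled by columns."""
    return [[block[4 * c + r] for c in range(4)] for r in range(4)]

def from_state(state):
    """Inverse mapping: read the state back out column by column."""
    return [state[r][c] for c in range(4) for r in range(4)]

block = list(range(16))
state = to_state(block)
assert state[1][0] == 1 and state[0][1] == 4   # byte 1 goes down the first column
assert from_state(state) == block              # round trip is lossless
```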

Figure 2.2: AES Round State Array Transformation

2.1.2 Byte Substitution (Subbytes)

The subbytes operation uses multiple substitution box components (Sbox), each of which performs an 8 bit substitution. Each 8-bit word of data in the state array is substituted using the Sbox. This results in 16 Sbox components used for each round block, which is the most area consuming part of an AES round in hardware. The Sbox computation is essentially a multiplicative inverse in GF(2^8) followed by an affine transformation, which is a linear mapping from one vector space to another [30]. A lookup table of 2^8 values can be used to implement the Sbox component, but it can also be computed mathematically using logic gates.
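The mathematical definition (multiplicative inverse followed by the affine transformation) can be modeled directly, as in this Python sketch written for this report; the worked example S(0x53) = 0xED comes from the AES specification:

```python
# GF(2^8) multiplication with reduction modulo x^8 + x^4 + x^3 + x + 1
def gf_mul(a, b):
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B            # reduce on overflow
        b >>= 1
    return p

def gf_inv(a):
    # a^254 = a^(-1) in GF(2^8), since the multiplicative group has order 255
    r = 1
    for _ in range(254):
        r = gf_mul(r, a)
    return r

def sbox(a):
    x = gf_inv(a) if a else 0    # 0 has no inverse and maps to 0
    res = 0
    for i in range(8):
        # affine transform: b_i = x_i ^ x_(i+4) ^ x_(i+5) ^ x_(i+6) ^ x_(i+7) ^ c_i
        bit = ((x >> i) ^ (x >> ((i + 4) % 8)) ^ (x >> ((i + 5) % 8))
               ^ (x >> ((i + 6) % 8)) ^ (x >> ((i + 7) % 8)) ^ (0x63 >> i)) & 1
        res |= bit << i
    return res

assert sbox(0x00) == 0x63
assert sbox(0x53) == 0xED        # worked example from the AES specification
```

This direct form is slow in software but shows exactly what a gate-level Sbox must compute; the composite field designs discussed next restructure the same inverse for small area.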
Sbox Designs

Rijmen, one of the creators of AES, showed in [26] a method of computing the Sbox by breaking operations in GF(2^8) down to a composite field GF((2^4)^2), resulting in significant hardware area savings which would otherwise not be possible using look-up table implementations. The inverse formula for the Sbox is given in its reduced version in Eq. (2.5), where λ is (1100)2. The addition, multiplication, and inverse operations are computed in GF((2^4)^2), and can be further broken down to the smaller composite fields GF((2^2)^2) and GF(2^2) using the divide and conquer method.

a′x + b′ = (ax + b)^−1 = a(a^2 λ + b(a + b))^−1 x + (b + a)(a^2 λ + b(a + b))^−1 (2.5)

Figure 2.3 shows a visual diagram of the composite Sbox. The isomorphic mapping to the composite field, GF(2^8) → GF((2^4)^2), can be implemented using a matrix vector multiplication. The affine transformation consists of a linear transformation followed by a translation, which can be achieved by a matrix vector multiplication and a vector addition respectively. The isomorphic mapping and affine transformation both use fixed matrices that are sparse, so the computation costs of these operations are minimal [30].
Figure 2.3: AES Composite Sbox design

The area consumed by the composite Sbox circuit is very low in comparison with the lookup table (LUT) approach. In the above design by Satoh, the Sbox component consumes 250 gates, while a LUT implementation is more than 4 times larger in area [30]. The computational cost does increase the circuit delay, so for high speed designs, LUT and Binary Decision Diagram (BDD) Sbox implementations are preferred. The BDD implementation provides a slightly faster alternative to the LUT Sbox while consuming less area resources. Each bit of the 8-bit output is associated with a binary tree, and based on the input bits, each tree decides what the output bit should be. The 8 bit input is used as selector values for several layers of multiplexers in order to realize the binary trees in hardware. This type of construction faces large fan-out issues in the initial multiplexer layers. An improved alternative that mitigates these issues is the Twisted-BDD, the fastest reported Sbox in the literature; the area requirement of this design, however, is almost double that of the LUT Sbox design [22].
2.1.3 Shift Rows

The Shift Rows operation consists of cyclically moving elements around in all but the first row of the 128 bit input block. The rows are left shifted by 1, 2, and 3 positions respectively for rows 2, 3 and 4. The following mapping illustrates this process. In hardware no logic is required for this step; simple wire connections route the input to the output.
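Although the hardware needs only wiring, the permutation itself is easy to state in software; this short Python sketch (written for this report) shows the rotation per row:

```python
# Row r of the AES state is rotated left by r positions (row 0 unchanged).
def shift_rows(state):
    return [row[r:] + row[:r] for r, row in enumerate(state)]

state = [[0, 1, 2, 3],
         [4, 5, 6, 7],
         [8, 9, 10, 11],
         [12, 13, 14, 15]]
assert shift_rows(state) == [[0, 1, 2, 3],
                             [5, 6, 7, 4],
                             [10, 11, 8, 9],
                             [15, 12, 13, 14]]
```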

Figure 2.4: AES Shift Rows

2.1.4 Mix Columns

The Mix Columns operation consists of a multiplication and reduction operation over GF(2^8). Each column of the state array is multiplied by the polynomial 3x^3 + x^2 + x + 2 and reduced modulo the generating polynomial x^4 + 1. This operation is generally optimized into a single matrix vector product. The four column blocks are used as the vectors, while a constant 4x4 matrix is used that incorporates the modulo operation. The result vector is stored in the next state array at the same location as the original column vector. All elements are 8 bits in width and the multiplication and addition operations are performed over GF(2^8).

   
[ 2 3 1 1 ]   [ a0 ]
[ 1 2 3 1 ] • [ a1 ]   (2.6)
[ 1 1 2 3 ]   [ a2 ]
[ 3 1 1 2 ]   [ a3 ]
Since the elements of the matrix are of low degree, the multiplications are simplified. A multiplication by 2 in GF(2^8) consists of a shift operation along with a modulo reduction if an overflow occurs. This operation can be reused when multiplying by 3, but an extra addition is required since 3·ai = (2·ai) ⊕ ai.
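One column product of Eq. (2.6), using the shift-based multiplication by 2 (often called xtime) and the identity 3·a = (2·a) ⊕ a, can be sketched in Python (an illustration written for this report; the test vector is a commonly cited MixColumns example):

```python
# Multiply by 2 in GF(2^8): shift left, reduce by 0x1B on overflow.
def xtime(a):
    a <<= 1
    return (a ^ 0x1B) & 0xFF if a & 0x100 else a

# One column of the MixColumns matrix product from Eq. (2.6).
def mix_column(col):
    a0, a1, a2, a3 = col
    return [xtime(a0) ^ (xtime(a1) ^ a1) ^ a2 ^ a3,          # 2 3 1 1
            a0 ^ xtime(a1) ^ (xtime(a2) ^ a2) ^ a3,          # 1 2 3 1
            a0 ^ a1 ^ xtime(a2) ^ (xtime(a3) ^ a3),          # 1 1 2 3
            (xtime(a0) ^ a0) ^ a1 ^ a2 ^ xtime(a3)]          # 3 1 1 2

# Widely used MixColumns test vector
assert mix_column([0xDB, 0x13, 0x53, 0x45]) == [0x8E, 0x4D, 0xA1, 0xBC]
```

In hardware, each xtime is wiring plus a conditional XOR with 0x1B, which is why Mix Columns is far cheaper than the Sbox.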

2.1.5 Key Schedule

Round keys are XORed at the end of every round and are generated using a Key Schedule. These keys can be precomputed or generated at each round. The Sbox components used in the subbytes section are also used here for the round key generation. For each input key length, the method of generating keys is slightly different, but they contain similar logic components.

For a 128 bit key, an Sbox operation is performed on the last column of the cipher key state array after the column bytes are rotated. This is followed by an XOR addition with an rcon value. The rcon value is generated by the exponentiation formula rcon(i) = x^(254+i) mod (x^8 + x^4 + x^3 + x + 1), performed over GF(2^8). These values are usually precomputed, and once the rcon value is added, an XOR chain on the columns of the state array creates the next 128 bit round key. Figure 2.5 shows a single round key computation. This process is repeated by using the round key as a cipher key for generating the next 128 bits of key material. The rcon index i starts at 1 and increments for each round key.
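Since x^(254+i) mod (x^8 + x^4 + x^3 + x + 1) equals x^(i−1) in GF(2^8), the rcon values can be generated by repeated doubling; this small Python sketch (written for this report) produces the standard sequence used by the 10 rounds of AES-128:

```python
# rcon(i) = x^(i-1) in GF(2^8): start at 1 and double, reducing by the
# field polynomial 0x11B on overflow.
def rcon_values(count):
    vals, r = [], 1
    for _ in range(count):
        vals.append(r)
        r <<= 1
        if r & 0x100:
            r ^= 0x11B           # reduce modulo x^8 + x^4 + x^3 + x + 1
    return vals

assert rcon_values(10) == [0x01, 0x02, 0x04, 0x08, 0x10,
                           0x20, 0x40, 0x80, 0x1B, 0x36]
```

The jump from 0x80 to 0x1B at the ninth value is exactly the modular reduction the text refers to, which is why these constants are usually precomputed in hardware.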
Figure 2.5: AES 128 bit Key Schedule Round

The 192 bit key schedule is similarly defined but the XOR chain is extended for
another 2 columns to achieve a full 192 bits of key material. Each round unit of AES,
however, only uses 128 bits of key material at a time, so the remaining bits are carried
over for the next round. Figure 2.6 shows this key generation process. The six column
vectors of the key state array are condensed here and shown as
{A0, A1, · · · , A5}.

Figure 2.6: AES 192 bit Key Schedule Round

The 256 bit key schedule has an additional Sbox computation involved in generating key material. The first 128 bits of key material are generated as shown in Figure 2.5. For the next 128 bits of key material, an Sbox computation is performed on the fourth column, followed by another chain of XOR statements. Note that the rcon operation is not performed here with the Sbox.

Figure 2.7: AES 256 bit Key Schedule Round

In order to compute the key schedule in hardware, most designs precompute round keys before starting data encryption or decryption. Computing the key schedule on the fly, while rounds are being computed, is possible for encryption and has been implemented for iterative AES [16, 1]. There is added complexity when supporting all key lengths, primarily because of the overlap occurring in operations. Figure 2.8 shows that although 128 bits of key material are generated at each round, there is still key material computed from previous round keys for the 192 and 256 bit key schedules. Srcon represents an Sbox computation with an rcon computation, while S represents a simple Sbox computation. The arrows represent the XOR chaining of column vectors.

Figure 2.8: AES Key Schedule Pattern


For an iterative key schedule, the Sbox components are needed only once in each iteration for all round keys, as can be seen in Figure 2.8. Although the 192 bit key has an Sbox in the middle of the round, it can still be used with the other key types. The delay of the design is also limited to only an Sbox, rcon, and an XOR chain of computations regardless of the key used, so compared to an AES round block it would not be part of the critical path. The rcon values may or may not be included, depending on whether the key is 256 bits. The control unit for this key schedule drives multiplexers to guide input into the Sboxes and direct outputs to the correct round key registers based on the key type.

Iterative key schedules have been used in pipelined designs for precomputing keys, but there is a key latency cost associated with such an integration. If key changes occur more frequently in a flow of data, the throughput in pipelined designs is affected, since clock cycles are wasted in updating key material. Lower key change latencies are therefore very relevant for increasing the average throughput of AES.

Chapter 6

Xilinx Software

Xilinx Tools is a suite of software tools used for the design of digital circuits implemented using a Xilinx Field Programmable Gate Array (FPGA) or Complex Programmable Logic Device (CPLD). The design procedure consists of (a) design entry, (b) synthesis and implementation of the design, (c) functional simulation and (d) testing and verification. Digital designs can be entered in various ways using the above CAD tools: using a schematic entry tool, using a hardware description language (HDL) such as Verilog or VHDL, or a combination of both. In this lab we will only use the design flow that involves the use of Verilog HDL.

The CAD tools enable you to design combinational and sequential circuits starting with Verilog
HDL design specifications. The steps of this design procedure are listed below:

1. Create Verilog design input file(s) using a template driven editor.
2. Compile and implement the Verilog design file(s).
3. Create the test-vectors and simulate the design (functional simulation) without using a PLD (FPGA or CPLD).
4. Assign input/output pins to implement the design on a target device.
5. Download bitstream to an FPGA or CPLD device.
6. Test design on FPGA/CPLD device.

A Verilog input file in the Xilinx software environment consists of the following segments:

Header: module name, list of input and output ports.


Declarations: input and output ports, registers and wires.

Logic Descriptions: equations, state machines and logic functions.

End: endmodule

All your designs for this lab must be specified in the above Verilog input format. Note that the
state diagram segment does not exist for combinational logic designs.

2. Programmable Logic Device: FPGA

In this lab digital designs will be implemented on the Basys2 board, which has a Xilinx Spartan3E XC3S250E FPGA with a CP132 package. This FPGA part belongs to the Spartan family of FPGAs. These devices come in a variety of packages; we will be using devices packaged in a 132 pin package with the part number XC3S250E-CP132. This FPGA is a device with about 50K gates. Detailed information on this device is available on the Xilinx website.
3. Creating a New Project
Xilinx Tools can be started by clicking on the Project Navigator Icon on the Windows desktop.
This should open up the Project Navigator window on your screen. This window shows (see
Figure 1) the last accessed project.

Figure 1: Xilinx Project Navigator window (snapshot from Xilinx ISE software)

3.1 Opening a project

Select File->New Project to create a new project. This will bring up a new project window
(Figure 2) on the desktop. Fill up the necessary entries as follows:
Figure 2: New Project Initiation window (snapshot from Xilinx ISE software)

Project Name: Write the name of your new project

Project Location: The directory where you want to store the new project. (Note: DO NOT specify the project location as a folder on the Desktop or a folder in the Xilinx\bin directory. Your H: drive is the best place to put it. The project location path must NOT have any spaces in it, e.g. C:\Nivash\TA\new lab\sample exercises\o_gate is NOT to be used.)

Leave the top level module type as HDL.

Example: If the project name were “o_gate”, enter “o_gate” as the project name and then click
“Next”.
Clicking on NEXT should bring up the following window:

Figure 3: Device and Design Flow of Project (snapshot from Xilinx ISE software)

For each of the properties given below, click on the ‘value’ area and select from the list of
values that appear.
o Device Family: Family of the FPGA/CPLD used. In this laboratory we will be using the Spartan3E FPGAs.
o Device: The number of the actual device. For this lab you may enter XC3S250E (this can be found on the attached prototyping board).
o Package: The type of package with the number of pins. The Spartan FPGA used in this lab is packaged in the CP132 package.
o Speed Grade: The speed grade is “-4”.
o Synthesis Tool: XST [VHDL/Verilog]
o Simulator: The tool used to simulate and verify the functionality of the design. The Modelsim simulator is integrated in the Xilinx ISE, so choose “Modelsim-XE Verilog” as the simulator; the Xilinx ISE Simulator can also be used.
o Then click on NEXT to save the entries.

All project files such as schematics, netlists, Verilog files, VHDL files, etc., will be stored in a
subdirectory with the project name. A project can only have one top level HDL source file (or
schematic). Modules can be added to the project to create a modular, hierarchical design (see
Section 9).

In order to open an existing project in Xilinx Tools, select File->Open Project to show the list
of projects on the machine. Choose the project you want and click OK.

Clicking on NEXT on the above window brings up the following window:

Figure 4: Create New source window (snapshot from Xilinx ISE software)

If creating a new source file, Click on the NEW SOURCE.

3.2 Creating a Verilog HDL input file for a combinational logic design


In this lab we will enter a design using a structural or RTL description using the Verilog HDL.
You can create a Verilog HDL input file (.v file) using the HDL Editor available in the Xilinx
ISE Tools (or any text editor).

In the previous window, click on the NEW SOURCE

A window pops up as shown in Figure 4. (Note: “Add to project” option is selected by default. If
you do not select it then you will have to add the new source file to the project manually.)

Figure 5: Creating Verilog-HDL source file (snapshot from Xilinx ISE software)

Select Verilog Module and in the “File Name:” area, enter the name of the Verilog source file
you are going to create. Also make sure that the option Add to project is selected so that the
source need not be added to the project again. Then click on Next to accept the entries. This pops
up the following window (Figure 5).

Figure 6: Define Verilog Source window (snapshot from Xilinx ISE software)

In the Port Name column, enter the names of all input and output pins and specify the Direction
accordingly. A Vector/Bus can be defined by entering appropriate bit numbers in the MSB/LSB
columns. Then click on Next> to get a window showing all the new source information (Figure 6). If
any changes are to be made, just click on <Back to go back and make changes. If everything is
acceptable, click on Finish > Next > Next > Finish to continue.
Figure 7: New Project Information window(snapshot from Xilinx ISE software)

Once you click on Finish, the source file will be displayed in the sources window in the
Project Navigator (Figure 1).

If a source has to be removed, just right click on the source file in the Sources in Project
window in the Project Navigator and select Remove. Then select Project -> Delete
Implementation Data from the Project Navigator menu bar to remove any related files.

3.3 Editing the Verilog source file

The source file will now be displayed in the Project Navigator window (Figure 8). The source
file window can be used as a text editor to make any necessary changes to the source file. All
the input/output pins will be displayed. Save your Verilog program periodically by selecting
File->Save from the menu. You can also edit Verilog programs in any text editor and add them to the
project directory using “Add Copy Source”.

Figure 8: Verilog Source code editor window in the Project Navigator (from Xilinx ISE
software)

Adding Logic in the generated Verilog Source code template:

A brief Verilog tutorial is available in Appendix-A; refer to it for the language syntax and
the construction of logic equations.

The generated Verilog source code template shows the module name, the list of ports and
also the declarations (input/output) for each port. Combinational logic code can be added
to the Verilog code after the declarations and before the endmodule line.

For example, the output z of an OR gate with inputs a and b can be described as
assign z = a | b;
Remember that the names are case sensitive.
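For reference, the assign statement above sits inside a complete module as follows (a minimal sketch; the module name is illustrative):

```verilog
// OR gate described with a continuous assignment
// (module name chosen for illustration)
module or_gate(a, b, z);
  input  a;
  input  b;
  output z;

  // combinational logic: z is 1 whenever a or b is 1
  assign z = a | b;

endmodule
```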
Other constructs for modeling the logic function:

A given logic function can be modeled in many ways in Verilog. Here is another example
in which the logic function is implemented as a truth table using a case statement:

module or_gate(a, b, z);
  input a;
  input b;
  output z;

  reg z;

  always @(a or b) begin
    case ({a, b})
      2'b00: z = 1'b0;
      2'b01: z = 1'b1;
      2'b10: z = 1'b1;
      2'b11: z = 1'b1;
    endcase
  end

endmodule

Suppose we want to describe an OR gate. It can be done using the logic equation as shown in
Figure 9a or using the case statement (describing the truth table) as shown in Figure 9b. These
are just two example constructs to design a logic function. Verilog offers numerous such
constructs to efficiently model designs. A brief tutorial of Verilog is available in Appendix-A.
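As one more illustration of these alternatives, the same OR gate can also be described structurally using Verilog's built-in gate primitive (a sketch not taken from the figures above; the module and instance names are illustrative):

```verilog
// OR gate using Verilog's built-in "or" primitive:
// the output port comes first, followed by the inputs
module or_gate_prim(input a, input b, output z);
  or g1(z, a, b);
endmodule
```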

Figure 9: OR gate description using assign statement (snapshot from Xilinx ISE
software)
Figure 10: OR gate description using case statement (from Xilinx ISE software)

4. Synthesis and Implementation of the Design

The design has to be synthesized and implemented before it can be checked for correctness
by running a functional simulation or downloading it onto the prototyping board. With the
top-level Verilog file opened in the HDL editor window in the right half of the Project
Navigator (double-click the file to open it), and the project shown in Module view, the
Implement Design option can be seen in the Processes view. The Design Entry Utilities and
Generate Programming File options can also be seen there; the former can be used to include
user constraints, if any, and the latter will be discussed later. To synthesize the design,
double click on the Synthesize Design option in the Processes window.

To implement the design, double click the Implement Design option in the Processes
window. It will go through steps like Translate, Map and Place & Route. If any of these
steps fails or completes with errors, an X mark is placed in front of it; otherwise a tick
mark is placed to indicate successful completion. If everything is done successfully, a tick
mark will be placed before the Implement Design option. If there are warnings, a warning
mark appears in front of the option. The warnings or errors can be inspected in the Console
window at the bottom of the Navigator window. Every time the design file is saved, all these
marks disappear, and a fresh compilation is required.
Figure 11: Implementing the Design (snapshot from Xilinx ISE software)

The schematic diagram of the synthesized Verilog code can be viewed by double clicking
View RTL Schematic under the Synthesize-XST menu in the Processes window. This is a
handy way to debug the code if the output does not meet the specifications on the prototype
board.

Double clicking the schematic opens the top-level module, showing only the input(s) and
output(s), as shown below.
Figure 12: Top Level Hierarchy of the design

Double clicking the rectangle opens the realized internal logic, as shown below.
Figure 13: Realized logic by the Xilinx ISE for the Verilog code

5. Functional Simulation of Combinational Designs


5.1 Adding the test vectors

To check the functionality of a design, we have to apply test vectors and simulate the
circuit. In order to apply test vectors, a test bench file is written. Essentially it supplies
all the inputs to the module under test and checks its outputs. For example, for the 2-input
OR gate, the steps to generate the test bench are as follows:
In the Sources window (top left corner), right click on the file that you want to generate
the test bench for and select ‘New Source’.

Provide a name for the test bench in the file name text box and select ‘Verilog Test
Fixture’ among the file types in the list on the right side, as shown in Figure 14.

Figure 14: Adding test vectors to the design (snapshot from Xilinx ISE software)
Click on ‘Next’ to proceed. In the next window select the source file with which you
want to associate the test bench.

Figure 15: Associating a module to a testbench (snapshot from Xilinx ISE software)

Click on Next to proceed. In the next window click on Finish. You will now be provided
with a template for your test bench. If it does not open automatically, click the radio
button next to Simulation.
You should now be able to view your test bench template. The code generated would be
something like this:
module o_gate_tb_v;

  // Inputs
  reg a;
  reg b;

  // Outputs
  wire z;

  // Instantiate the Unit Under Test (UUT)
  o_gate uut (
    .a(a),
    .b(b),
    .z(z)
  );

  initial begin
    // Initialize inputs
    a = 0;
    b = 0;

    // Wait 100 ns for global reset to finish
    #100;

    // Add stimulus here

  end

endmodule
The Xilinx tool detects the inputs and outputs of the module that you are going to test and
assigns them initial values. In order to test the gate completely, we shall provide all the
different input combinations. ‘#100’ is the time delay for which the inputs maintain their
current values. After 100 units of time have elapsed, the next set of values can be assigned
to the inputs.
Complete the test bench as shown below:

module o_gate_tb_v;

  // Inputs
  reg a;
  reg b;

  // Outputs
  wire z;

  // Instantiate the Unit Under Test (UUT)
  o_gate uut (
    .a(a),
    .b(b),
    .z(z)
  );

  initial begin
    // Initialize inputs
    a = 0;
    b = 0;

    // Wait 100 ns for global reset to finish
    #100;
    a = 0;
    b = 1;

    // Wait 100 ns
    #100;
    a = 1;
    b = 0;

    // Wait 100 ns
    #100;
    a = 1;
    b = 1;

    #100;

  end

endmodule

Save your test bench file using the File menu.

5.2 Simulating and Viewing the OutputWaveforms

Now under the Processes window (making sure that the test bench file in the Sources
window is selected), expand the ModelSim Simulator tab by clicking on the + sign next
to it. Double click on Simulate Behavioral Model. You will probably receive a compiler
error; this is nothing to worry about: answer “No” when asked if you wish to abort
simulation. This should cause ModelSim to open; wait for it to complete execution. If you
wish not to receive the compiler error, right click on Simulate Behavioral Model, select
Process Properties, and mark the checkbox next to “Ignore Pre-Compiled Library Warning
Check”.


Figure 16: Simulating the design (snapshot from Xilinx ISE software)

5.3 Saving the simulation results

To save the simulation results, go to the waveform window of the ModelSim simulator and
click on File -> Print to Postscript, then give the desired filename and location.

Note that by default, the waveform is “zoomed in” to the nanosecond level. Use the
zoom controls to display the entire waveform.

Alternatively, a normal print-screen capture of the waveform window can be taken and
then saved in Paint.
Figure 17: Behavioral Simulation output Waveform (Snapshot from
ModelSim)

To take printouts for the lab reports, convert the black background to white in Tools ->
Edit Preferences: click Wave Windows, then the Wave Background attribute.
Figure 18: Changing Waveform Background in ModelSim

Conclusion

References
