Chuxiong 2019

The Journal of Engineering
IET International Radar Conference (IRC 2018)

Design and implementation of parallel CRC algorithm for fibre channel on FPGA

eISSN 2051-3305
Received on 25th February 2019
Accepted on 2nd May 2019
E-First on 10th October 2019
doi: 10.1049/joe.2019.0727
www.ietdl.org

Wu Chuxiong1, Shi Haifeng1
1 Nanjing Research Institute of Electronics Technology, Nanjing, People's Republic of China
E-mail: wuchuxiong@outlook.com
Abstract: Fibre channel (FC) provides high-speed, low-latency communication between end systems and is widely used in data storage, aerospace applications and large electronic equipment, including radar systems. The cyclic redundancy check (CRC) detects errors well and is easy to implement in hardware, which makes it an important error detection method in network data transmission. This study presents the design and development of a parallel CRC algorithm for hardware implementation on an FPGA that meets the FC specifications. The algorithm processes 128 bits of parallel data per block by breaking the block into four 32-bit words and calculating the CRC of each word separately, based on the linear feedback shift register, which simplifies the calculation and reduces resource consumption.
1 Introduction
With the rapid development of computer science and data communication technology, data volume is growing explosively, creating urgent demand for higher transfer rates. Storing and transmitting data securely and rapidly over long distances has become a major requirement of modern communication technology. The fibre channel (FC) protocol came into being to meet this demand.
Developed by the American National Standards Institute (ANSI), FC is mainly used for high-speed data transmission between workstations, mainframes and storage devices. Its high bandwidth, low latency, long transmission distance, flexible topology and support for multiple upper-layer protocols make it well suited to high-speed communication. To meet the special requirements of the avionics environment, the FC committee defined a subset protocol, FC-AE, for the avionics environment. After several years of research and development, FC has become the preferred choice for the new generation of unified avionics networks in fourth-generation fighters, and it has been applied to aircraft such as the F-35 and B-1B. Beyond the airborne field, large shipborne radar systems also adopt FC to guarantee high-speed, reliable data transmission.
FC generally transmits data as light rather than electricity, which helps it avoid electromagnetic interference. However, because of various complex influences on the channel, the transmitted signal still suffers some interference, which in severe cases causes bit errors or even blocks communication. The receiver must therefore check whether the received data are correct. CRC is one of the most widely used and powerful verification methods in communication and storage.
The mathematics of CRC was proposed in 1961 by Peterson and Brown [1] as an error detection code. Since then, CRC has attracted wide attention in academia. In [2, 3], it is shown how a pipelined CRC algorithm can be implemented based on lookup tables. In [4, 5], a generalised parallel CRC algorithm was proposed that incorporates negative-degree terms. Bajarangbali and Anand [6] designed a high-speed CRC algorithm for Ethernet using a reduced lookup table algorithm, achieving a throughput of 40 Gbps.
Although the CRC algorithm is fairly general, the specifications of the FC protocols call for a simplified design and implementation tailored to FC. On the one hand, the growing number of targets to be detected, their greater mobility and the rapidly increasing complexity of radar technology keep raising the data volume that radars must handle. On the other hand, the FPGA resources available to process these data are limited. Since every FC frame carries a CRC field whose calculation consumes FPGA resources, designing a resource-efficient CRC module that meets the transfer rate requirement of FC is of practical significance.
This paper proposes the design and implementation, in Verilog, of a parallel CRC algorithm for FC that is parallel in both the coarse and the fine dimension. The algorithm consumes fewer FPGA logic resources and is easy to realise. We first discuss the principle of CRC and the method we use to generate it. We then introduce the FC specifications for calculating the CRC. Finally, following these rules, we present the implementation of the method and analyse the results.

2 Principle of CRC algorithm
A CRC is a short sequence of bits that immediately follows the data to be checked and is used to verify the integrity of those data. As the FC frame format in Fig. 1 shows, the CRC field appended after the frame content checks the integrity of that content: the Extended_Headers, if any, the Frame_Header and the Data_Field.
Fig. 1 FC frame format
For ease of calculation and explanation, the original data to be checked in the frame content are treated as the coefficients of a polynomial U(x), and the CRC is generated in the following steps:
1. Append m zeros to the original data to obtain $x^{m}U(x)$, where m is the degree of a chosen polynomial called the generator polynomial, denoted g(x).
2. Divide the zero-extended data modulo 2 by the generator polynomial g(x) to obtain the remainder r(x).
3. Pad the bits of r(x) with leading zeros, if necessary, so that the result is exactly m bits long.
Appending the m bits from step 3 to the original data U(x) gives the data to be transmitted, expressed as $x^{m}U(x) + r(x)$.
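The three generation steps can be traced on a small case. The following is a minimal Python sketch, not the hardware design of this paper: the function names `mod2_divide` and `crc_generate`, the bit-string representation and the short degree-5 generator $x^{5} + x^{2} + 1$ (bits `100101`) are assumptions chosen only to keep the example readable.

```python
def mod2_divide(bits: str, generator: str) -> str:
    """Return the remainder of modulo-2 (XOR) long division as a bit string."""
    m = len(generator) - 1                 # degree of g(x)
    work = [int(b) for b in bits]
    gen = [int(b) for b in generator]
    # Cancel every leading 1 by XORing in the generator aligned to it.
    for i in range(len(work) - m):
        if work[i]:
            for j, g in enumerate(gen):
                work[i + j] ^= g
    # The last m positions hold r(x), leading zeros included (step 3).
    return "".join(str(b) for b in work[-m:])


def crc_generate(data: str, generator: str) -> str:
    """Steps 1 and 2: append m zeros to the data, then divide by g(x)."""
    m = len(generator) - 1
    return mod2_divide(data + "0" * m, generator)


data = "1011"            # U(x) = x^3 + x + 1
generator = "100101"     # g(x) = x^5 + x^2 + 1, so m = 5
crc = crc_generate(data, generator)
codeword = data + crc    # x^m U(x) + r(x)
print(crc, codeword)     # prints: 00010 101100010
```

Bit strings are used here purely for legibility; a hardware implementation works on registers instead.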
J. Eng., 2019, Vol. 2019 Iss. 21, pp. 7827-7830 1

This is an open access article published by the IET under the Creative Commons Attribution License
(http://creativecommons.org/licenses/by/3.0/)
To check whether the data were transmitted correctly, the receiver divides the received data modulo 2 by the same generator polynomial. If the received data are correct, that is, equal to $x^{m}U(x) + r(x)$, the remainder is zero. A non-zero remainder indicates that errors occurred in transmission.
CRC is a linear block code with strong error detection capability and some special algebraic properties, which are discussed further in the following sections. Different generator polynomials give CRCs of different degrees, such as CRC-8, CRC-12, CRC-16, CRC-CCITT and CRC-32. The FC specifications adopt CRC-32 to improve the accuracy of data transmission, so the following sections are based on CRC-32.

3 Coarse-grained algorithm for CRC-32
Let U(x) be the first 128 bits of the message and r(x) be the CRC of U(x). Let g(x) be the generator polynomial; then, for some polynomial w(x),
$$x^{32}U(x) = r(x) + g(x)w(x) \tag{1}$$
$$g(x) = x^{32} + x^{26} + x^{23} + x^{22} + x^{16} + x^{12} + x^{11} + x^{10} + x^{8} + x^{7} + x^{5} + x^{4} + x^{2} + x + 1 \tag{2}$$
U(x) is followed by further 128-bit blocks to be processed. Let U′(x) be the next 128 bits of the message, immediately after the first block. The two 128-bit blocks form a 256-bit message, denoted N(x), which can be written as
$$N(x) = x^{128}U(x) + U'(x) \tag{3}$$
Let r′(x) be the CRC of N(x); then, for some polynomial w′(x),
$$x^{32}N(x) = r'(x) + g(x)w'(x) \tag{4}$$
$$x^{32}\left(x^{128}U(x) + U'(x)\right) = r'(x) + g(x)w'(x) \tag{5}$$
$$x^{128}x^{32}U(x) + x^{32}U'(x) = r'(x) + g(x)w'(x) \tag{6}$$
Substituting (1) into (6) gives
$$x^{128}\left(r(x) + g(x)w(x)\right) + x^{32}U'(x) = r'(x) + g(x)w'(x) \tag{7}$$
$$x^{32}\left(x^{96}r(x) + U'(x)\right) = r'(x) + g(x)w'(x) \tag{8}$$
Equation (8) holds because $x^{128}g(x)w(x)$ is divisible by g(x) with no remainder, so that term can be absorbed into the quotient: strictly, the quotient in (8) is $w'(x) + x^{128}w(x)$, which leaves the remainder r′(x) unchanged. Equation (8) shows that the CRC of the 256-bit data equals the CRC of the sum of the second 128-bit block and the first CRC multiplied by $x^{96}$:
$$\mathrm{CRC}[N(x)] = \mathrm{CRC}\left[x^{96}r(x) + U'(x)\right] \tag{9}$$
In other words, to obtain the CRC of the 256 bits of data, we add the first CRC multiplied by $x^{96}$ to the second 128 bits of data and then calculate the CRC of the sum. The CRC has 32 bits, so its product with $x^{96}$ has 128 bits, and the sum $x^{96}r(x) + U'(x)$ also has 128 bits, the same length as the first block. Since every step operates on data of the same length, the CRC can be calculated iteratively.
Fig. 2 Pipelined architecture of parallel CRC algorithm
As shown in Fig. 2, the CRC block can be designed as a pipelined architecture in the following steps:
1. Split the 128-bit block into four 32-bit words.
2. XOR the CRC generated from the previous block with the first 32-bit word of the current block.
3. Calculate the CRC of each 32-bit word separately.
4. XOR the CRCs calculated in step 3 to obtain the CRC of the data.

4 Fine-grained algorithm for CRC-32
We now have the coarse-grained method for 128-bit data: split the 128-bit block into four sections and calculate their CRCs separately. How to calculate the CRC of 32 bits of data still needs to be discussed. We do so with a fine-grained method derived from the linear feedback shift register (LFSR).
Fig. 3 General LFSR architecture
The LFSR architecture shown in Fig. 3 is constructed from the coefficients of the generator polynomial
$$g(x) = x^{32} + x^{26} + x^{23} + x^{22} + x^{16} + x^{12} + x^{11} + x^{10} + x^{8} + x^{7} + x^{5} + x^{4} + x^{2} + x + 1 \tag{10}$$
The LFSR architecture is consistent with the principle of the CRC algorithm: the LFSR acts as a modulo-2 divider that calculates the remainder of U(x) divided by g(x). After 32 zeros are appended to the original k-bit message U(x), its CRC is available after k + 32 clock cycles. The serial CRC-LFSR architecture therefore takes a long time to generate the CRC code, which does not meet the data processing requirement of FC.
Based on the LFSR, we can directly establish the relationship between the data and its CRC. Without loss of generality, we take CRC-5 with 4-bit data U(x) as an example. The generator polynomial of CRC-5 is $x^{5} + x^{2} + 1$, so the architecture of CRC-5 is as shown in Fig. 4.
Fig. 4 LFSR architecture for CRC-5
From Fig. 4, the register states obey the following equations, where D[8 − i] is the input bit shifted in at clock i:
$$S_4[i+1] = S_3[i] \tag{11}$$
$$S_3[i+1] = S_2[i] \tag{12}$$
$$S_2[i+1] = S_4[i] \oplus S_1[i] \tag{13}$$
$$S_1[i+1] = S_0[i] \tag{14}$$
$$S_0[i+1] = S_4[i] \oplus D[8-i] \tag{15}$$
Here ⊕ denotes the XOR operation, that is, modulo-2 addition.
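Equations (11)–(15) can be stepped in software to watch the serial register at work. The sketch below is illustrative Python rather than the Verilog used for the implementation; the function name `crc5_lfsr` and the list-of-bits input are assumptions. It shifts in the four data bits followed by five zeros, nine clocks in total, starting from an all-zero register.

```python
def crc5_lfsr(data_bits):
    """Clock the CRC-5 register of equations (11)-(15).

    data_bits: four bits, most significant first (U[3], U[2], U[1], U[0]).
    Returns [S4, S3, S2, S1, S0] after the last clock, i.e. CRC[4]..CRC[0].
    """
    s4 = s3 = s2 = s1 = s0 = 0                 # assumed initial state: all zeros
    stream = list(data_bits) + [0] * 5         # D[8] ... D[0]
    for d in stream:
        # All five registers update simultaneously on each clock edge.
        s4, s3, s2, s1, s0 = s3, s2, s4 ^ s1, s0, s4 ^ d
    return [s4, s3, s2, s1, s0]


print(crc5_lfsr([1, 0, 1, 1]))   # prints: [0, 0, 0, 1, 0]
```

Nine clocks for four data bits is the k + m cost of the serial form, which is what motivates relating the CRC directly to the data bits.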
We can write these equations in matrix form:
$$\begin{bmatrix} S_4[i+1] \\ S_3[i+1] \\ S_2[i+1] \\ S_1[i+1] \\ S_0[i+1] \end{bmatrix} = \begin{bmatrix} S_3[i] \\ S_2[i] \\ S_4[i] \oplus S_1[i] \\ S_0[i] \\ S_4[i] \oplus D[8-i] \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} S_4[i] \\ S_3[i] \\ S_2[i] \\ S_1[i] \\ S_0[i] \end{bmatrix} \oplus \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ D[8-i] \end{bmatrix} \tag{16}$$
Let
$$\mathbf{S}[i] = \begin{bmatrix} S_4[i] \\ S_3[i] \\ S_2[i] \\ S_1[i] \\ S_0[i] \end{bmatrix}, \quad A = \begin{bmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 & 0 \end{bmatrix}, \quad \mathbf{D}[8-i] = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ D[8-i] \end{bmatrix} \tag{17}$$
then
$$\mathbf{S}[i+1] = A\mathbf{S}[i] \oplus \mathbf{D}[8-i] \tag{18}$$
and
$$\begin{aligned} \mathbf{S}[i+2] &= A\mathbf{S}[i+1] \oplus \mathbf{D}[7-i] \\ &= A\left(A\mathbf{S}[i] \oplus \mathbf{D}[8-i]\right) \oplus \mathbf{D}[7-i] \\ &= A^{2}\mathbf{S}[i] \oplus A\mathbf{D}[8-i] \oplus \mathbf{D}[7-i] \end{aligned} \tag{19}$$
Thus
$$\begin{aligned} \mathbf{S}[1] &= A\mathbf{S}[0] \oplus \mathbf{D}[8] \\ \mathbf{S}[2] &= A\mathbf{S}[1] \oplus \mathbf{D}[7] = A^{2}\mathbf{S}[0] \oplus A\mathbf{D}[8] \oplus \mathbf{D}[7] \\ &\;\;\vdots \\ \mathbf{S}[9] &= A^{9}\mathbf{S}[0] \oplus A^{8}\mathbf{D}[8] \oplus A^{7}\mathbf{D}[7] \oplus A^{6}\mathbf{D}[6] \oplus A^{5}\mathbf{D}[5] \oplus A^{4}\mathbf{D}[4] \\ &\quad \oplus A^{3}\mathbf{D}[3] \oplus A^{2}\mathbf{D}[2] \oplus A\mathbf{D}[1] \oplus \mathbf{D}[0] \end{aligned} \tag{20}$$
D[4], D[3], D[2], D[1] and D[0] are the zeros appended to the original data to generate the CRC, so the last five terms can be dropped to simplify the calculation. With the initial state $\mathbf{S}[0] = [0\;0\;0\;0\;0]^{T}$, we obtain
$$\mathbf{S}[9] = \begin{bmatrix} S_4[9] \\ S_3[9] \\ S_2[9] \\ S_1[9] \\ S_0[9] \end{bmatrix} = \begin{bmatrix} D[7] \\ D[8] \oplus D[6] \\ D[8] \oplus D[7] \oplus D[5] \\ D[6] \\ D[8] \oplus D[5] \end{bmatrix} \tag{21}$$
Since D[8], D[7], D[6] and D[5] carry the data bits U[3], U[2], U[1] and U[0], this means
$$\mathrm{CRC}[4] = U[2] \tag{22}$$
$$\mathrm{CRC}[3] = U[3] \oplus U[1] \tag{23}$$
$$\mathrm{CRC}[2] = U[3] \oplus U[2] \oplus U[0] \tag{24}$$
$$\mathrm{CRC}[1] = U[1] \tag{25}$$
$$\mathrm{CRC}[0] = U[3] \oplus U[0] \tag{26}$$
U[i] denotes the ith bit of the original data, counted from the least significant bit.
Following the same principle, we can derive the architecture of CRC-32 from its generator polynomial and calculate the relationship between the original data and its CRC.

5 FC specifications
The FC standard developed by ANSI INCITS [7] specifies the characteristics of the CRC for FC. The CRC in an FC-2 frame shall be aligned on a word boundary and be calculated before encoding and scrambling.
Unlike in other protocols, the CRC scope shall be mapped to a data polynomial before its CRC is generated. An informative diagram of this mapping is given in Fig. 5. The mapping follows these steps:
1. Reverse the order of the bits in the first byte of the CRC scope.
2. Use the most significant bit of the reversed first byte in the CRC scope as the most significant coefficient of the data polynomial.
3. Use successively less significant bits of the reversed first byte in the CRC scope as successively less significant coefficients of the data polynomial.
4. Follow steps 1–3 for each successive byte of the CRC scope to generate successively less significant groups of eight coefficients of the data polynomial.
Fig. 5 Mapping CRC scope to FCS input
As shown in Fig. 6, the CRC field value shall be mapped from the FCS polynomial derived from the FCS calculation in the same way as the CRC scope is mapped to the FCS input in Fig. 5. If necessary, the FCS polynomial is extended to 32 coefficients by adding zero-valued coefficients at the most significant end.

6 Results
The FC protocol requires the data and the CRC to be reversed according to a fixed rule, so we reverse the data polynomial coefficients before the calculation and the CRC polynomial coefficients after it.
The SOF and EOF delimiters shall not be included in the CRC scope, so we detect the SOF and EOF delimiters and exclude them from the CRC scope. Referring to [8], the algorithm flowchart is drawn in Fig. 7.
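The mapping rules of Section 5 and the block recursion of equation (9) can be combined into a small software reference model, which is handy for checking a hardware simulation. What follows is a sketch under stated assumptions, not the Verilog module of this paper: all function names are hypothetical, the preset `init` and final XOR `xor_out` are parameters because the text does not state them (both default to zero, the pure remainder of Section 2), and the input must be a whole number of 128-bit blocks.

```python
import zlib

GEN_POLY = 0x104C11DB7  # CRC-32 generator g(x), including the x^32 term


def reverse_bits_in_bytes(data: bytes) -> bytes:
    """Reverse the bit order inside every byte (mapping steps 1-4)."""
    return bytes(int(f"{b:08b}"[::-1], 2) for b in data)


def poly_mod(value: int, poly: int = GEN_POLY) -> int:
    """Remainder of value divided by poly using modulo-2 arithmetic."""
    degree = poly.bit_length() - 1
    while value.bit_length() > degree:
        value ^= poly << (value.bit_length() - 1 - degree)
    return value


def crc32_blocks(mapped: bytes, init: int = 0, xor_out: int = 0) -> int:
    """Iterate r' = CRC[x^96 r + U'] over 128-bit blocks, as in equation (9)."""
    if len(mapped) % 16:
        raise ValueError("this sketch expects a whole number of 128-bit blocks")
    r = init
    for offset in range(0, len(mapped), 16):
        block = int.from_bytes(mapped[offset:offset + 16], "big")
        # XOR the previous CRC into the top 32 bits, then take x^32 * (...) mod g.
        r = poly_mod((block ^ (r << 96)) << 32)
    return r ^ xor_out


def fc_crc_field(scope: bytes, init: int = 0, xor_out: int = 0) -> bytes:
    """Map the CRC scope, compute the FCS, and map it back to four CRC bytes."""
    mapped = reverse_bits_in_bytes(scope)
    fcs = crc32_blocks(mapped, init, xor_out)
    return reverse_bits_in_bytes(fcs.to_bytes(4, "big"))


scope = bytes(range(32))                      # two arbitrary 128-bit blocks
print(fc_crc_field(scope).hex())              # zero preset, no final XOR

# Cross-check under an assumption: all-ones preset and final complement,
# the common CRC-32 convention that zlib implements.
assert fc_crc_field(scope, 0xFFFFFFFF, 0xFFFFFFFF) == zlib.crc32(scope).to_bytes(4, "little")
```

Setting both `init` and `xor_out` to 0xFFFFFFFF gives the widely used CRC-32 convention, which is why the last line can be compared against `zlib.crc32`; whether a given FC implementation uses that preset should be checked against [7].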

Fig. 6 Mapping FCS coefficients to CRC field
Fig. 7 CRC calculation procedure
According to the flowchart and the theory of CRC, we programmed the CRC encoding module for FC in Verilog and designed and simulated it with the Vivado design suite.
Fig. 8 shows the simulation results. As the paper mainly discusses the design and implementation of the CRC for the frame content, the figure shows the CRC of the data and skips the steps that detect the SOF and EOF delimiters. From the figure, we can see that for the 128-bit data ‘0x313233343536373839383736353433’, reversing it yields the reversed CRC ‘0x9bd00176’. In addition, for the 512-bit data stream ‘0x313233343536373839383736353433_313233343536373839383736353434_313233343536373839383736353435_313233343536373839383736353436’, the reversed CRC is ‘0x4e6dd731’. The results generated by our algorithm are consistent with [9], showing that the algorithm is feasible. Appending the reversed CRC to the reversed data gives the frame content to transfer.
Fig. 8 Results of the simulation
Table 1 Comparison of utilisation
Implementation      Slice LUTs  Slice registers  Slices  LUT flip-flop pairs
lookup tables       393         288              148     500
parallel algorithm  390         128              111     393
The CRC is generated as soon as the input data are captured on the rising edge (posedge) of the clock. The implementation of the CRC algorithm reduces the latency from receiving data to generating its CRC, and so better meets the transfer rate requirement of FC. The algorithm simplifies the calculation by constructing a direct relationship between the input data and its CRC. As Table 1 shows, compared with a common CRC algorithm based on lookup tables, our algorithm uses about 20% fewer LUT flip-flop pairs (393 versus 500) on the same FPGA chip, which indicates that the algorithm is efficient.

7 Conclusion
CRC, a widely used algorithm in data communication, verifies data integrity effectively. Aimed at the design and implementation of CRC for the high-speed FC protocol, this paper gives a method to calculate the CRC of 128-bit parallel data. The method is parallel in both the coarse dimension and the fine dimension. The coarse dimension shows how to split the 128-bit data into four 32-bit parts and how the CRCs of successive 128-bit blocks are related. The fine dimension uses matrices to derive in advance the direct relationship between the original data and its CRC, based on the LFSR. With this method, and following the specifications of FC, the paper designs a resource-efficient way to generate the CRC for FC.

8 References
[1] Peterson, W.W., Brown, D.T.: ‘Cyclic codes for error detection’, Proc. IRE, 1961, 49, (1), pp. 228–235
[2] Sun, Y., Kim, M.S.: ‘A table-based algorithm for pipelined CRC calculation’. 2010 IEEE Int. Conf. on Communications (ICC), Cape Town, South Africa, 2010, pp. 1–5
[3] Sun, Y., Kim, M.S.: ‘A pipelined CRC calculation using lookup tables’. 7th IEEE Consumer Communications and Networking Conf., Las Vegas, NV, USA, 2010, pp. 1–2
[4] Kennedy, C.E., Mozaffari-Kermani, M.: ‘Generalized parallel CRC computation on FPGA’. 2015 IEEE 28th Canadian Conf. on Electrical and Computer Engineering (CCECE), Halifax, Canada, 2015, pp. 107–113
[5] Kennedy, C., Reyhani-Masoleh, A.: ‘High-speed parallel CRC circuits’. Proc. of the 42nd Annual Asilomar Conf. on Signals, Systems and Computers (ACSSC2008), Pacific Grove, CA, USA, 2008, pp. 1823–1829
[6] Bajarangbali, D., Anand, P.A.: ‘Design of high speed CRC algorithm for Ethernet on FPGA using reduced lookup table algorithm’. IEEE Annual Indian Conf. (INDICON), Bangalore, India, 2016, pp. 1–7
[7] ‘ANSI INCITS, fibre channel framing and signaling-4(FC-FS-4)’, 2013
[8] Narapureddy, P., Ananda, C.M., Pradeep Kumar, B., et al.: ‘Design and implementation of fiber channel based high speed serial transmitter for data protocol on FPGA’. IEEE Int. Conf. on Recent Trends in Electronics Information Communication Technology, Bangalore, India, 2015, pp. 926–931
[9] ‘Online CRC calculation’, Available at http://www.ip33.com/crc.html, accessed 2 April 2018

