
Elec 466

Diffie-Hellman key exchange with handshaking and hardware multiplication

David Hamdi
V00208144

August 1, 2017
Contents
1.0 Problem description and solution outline
2.0 Handshaking protocol
2.1 Timing diagram
2.2 Hardware state diagram
2.3 Software state diagram
3.0 Hardware multiplier
3.1 Hardware block diagram
4.0 SystemC code snippets
4.1 Hardware multiplier code snippets
4.2 Software code snippets
5.0 Recommendations

Figure 1: Flow chart for the start of multiplication
Figure 2: Handshaking timing diagram
Figure 3: Hardware handshaking finite state machine
Figure 4: Software state diagram
Figure 5: Hardware datapath
Figure 6: Hardware modules part 1
Figure 7: Hardware modules part 2
Figure 8: Hardware modules part 3
Figure 9: Software module
1.0 Problem description and solution outline
The project is based around the Diffie-Hellman key exchange protocol, which lets two parties agree on a shared
secret key over a public channel and thus enables encrypted communication. The process requires many
multiplications (the modular exponentiations at its core are built from repeated multiplications), and these
are computationally expensive. It is therefore in the system's best interest to implement the multiplication
in hardware and take that load off the software. In this project, the key exchange program is initially
implemented such that the software and hardware modules interact purely through timed waits. Our first task
was therefore to implement a handshaking protocol, using finite state machines, through which the software
and hardware interact. Secondly, we had to design the hardware multiplier by analysing the software code and
integrate it into the flow of the finite state machine.
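The following toy sketch shows where the multiplications come from. It is written in plain C++, not the project's SystemC code. It assumes the textbook parameters p = 23 and g = 5 and the private exponents 6 and 15, chosen only for illustration; real deployments use primes thousands of bits long. Each public value, and the shared secret, is a modular exponentiation computed by repeated squaring and multiplying, so a single exchange costs many multiplications. The function names `mul_mod` and `pow_mod` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Modular multiplication. Assumes m < 2^32 and a, b < m,
// so the 64-bit product cannot overflow.
std::uint64_t mul_mod(std::uint64_t a, std::uint64_t b, std::uint64_t m) {
    assert(m != 0 && m <= 0xFFFFFFFFull && a < m && b < m);
    return (a * b) % m;
}

// Square-and-multiply modular exponentiation: base^exp mod m.
// Every loop iteration costs one or two multiplications.
std::uint64_t pow_mod(std::uint64_t base, std::uint64_t exp, std::uint64_t m) {
    std::uint64_t result = 1 % m;
    base %= m;
    while (exp > 0) {
        if (exp & 1) result = mul_mod(result, base, m);  // multiply step
        base = mul_mod(base, base, m);                   // squaring step
        exp >>= 1;
    }
    return result;
}
```

With these values, one party publishes `pow_mod(5, 6, 23)` = 8 and the other publishes `pow_mod(5, 15, 23)` = 19. Both then arrive at the same shared secret, `pow_mod(19, 6, 23)` = `pow_mod(8, 15, 23)` = 2.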

The multiplication process to be analysed essentially took two 32-bit numbers and split each into two 16-bit
halves. The resulting four 16-bit numbers were then cross-multiplied and checked for overflow before being
combined into two 32-bit outputs (together representing one 64-bit number).
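Written out, the split relies on the identity below. Here b and c are the two 32-bit inputs, and the half-word and partial-product names (a0, t, u, a1) are the ones used in Section 3.0.

```latex
\begin{aligned}
b &= b_H\,2^{16} + b_L, \qquad c = c_H\,2^{16} + c_L, \qquad 0 \le b_L, b_H, c_L, c_H < 2^{16}\\
b\,c &= \underbrace{b_H c_H}_{a_1}\,2^{32}
      + \big(\underbrace{b_L c_H}_{t} + \underbrace{b_H c_L}_{u}\big)\,2^{16}
      + \underbrace{b_L c_L}_{a_0}
\end{aligned}
```

Each partial product fits in 32 bits, but the middle sum t + u can reach 2(2^16 - 1)^2, which needs 33 bits. Because that sum is weighted by 2^16, a carry out of its 32 bits is worth 2^48, that is 2^16 within the high word a1. A second carry can arise when the low half of the middle term is added into a0. These two carries are the overflows the algorithm must check.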

Figure 1: Flow chart for the start of multiplication

Figure 1 depicts the start of the multiplication process. While the initial stages shown above are important
and are implemented in the hardware design, the trickier parts (the overflow checks described in Section 3.0)
come later.

2.0 Handshaking protocol
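The code as given relies on timed waits to interface between the software and the “hardware” (at this stage
just the multiplication software moved into a different file). Timed waits are not a robust interfacing
method: they assume the hardware always finishes within a fixed delay, so any change in its latency can make
the software read results before they are ready. They therefore need to be replaced by a handshaking protocol.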
2.1 Timing diagram
The handshaking protocol is implemented using a simple two-signal enable/done scheme. Once the software has
written its two 32-bit inputs to the hardware, it asserts the enable signal and then waits while the hardware
carries out the multiplication. When the multiplication is complete, the hardware writes the results and
asserts the done signal. The software then reads the results from the hardware and de-asserts the enable
signal before waiting again. The hardware then de-asserts the done signal, ending the multiplication process.
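The protocol above moves through four (enable, done) combinations in a fixed order: (0,0) idle, (1,0) request, (1,1) result ready and (0,1) release, then back to idle. The following plain C++ checker is a hypothetical helper, not part of the project. It assumes each trace entry is one sample of the two wires taken on a hardware clock edge, and it accepts a trace only if the trace never skips or reverses a phase.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One sample of the two handshake wires (hypothetical representation).
struct Sample {
    bool enable;  // driven by the software
    bool done;    // driven by the hardware
};

// Phase of the four-phase enable/done cycle:
// 0 = idle (0,0), 1 = request (1,0), 2 = result ready (1,1), 3 = release (0,1).
int phase(Sample s) {
    if (!s.enable && !s.done) return 0;
    if (s.enable && !s.done) return 1;
    if (s.enable && s.done) return 2;
    return 3;
}

// True if every change between consecutive samples advances exactly one
// phase (0 -> 1 -> 2 -> 3 -> 0); holding the current phase is allowed.
bool follows_handshake(const std::vector<Sample>& trace) {
    for (std::size_t i = 1; i < trace.size(); ++i) {
        int prev = phase(trace[i - 1]);
        int next = phase(trace[i]);
        if (next != prev && next != (prev + 1) % 4) return false;
    }
    return true;
}
```

For example, the checker rejects a trace in which enable drops back to 0 before done has been asserted, because the software would have withdrawn its request early.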

Figure 2: Handshaking timing diagram

Figure 2 shows the timing diagram for the process. The traces are grouped by where they originate, software or
hardware. One of the main benefits of the handshaking protocol is that the software and hardware modules can
run at different clock rates. The timing diagram, however, is drawn from the perspective of the hardware
module, which is why the inputs appear on the same clock edge. The number of clock cycles in the diagram will
vary depending on the circuit used to implement the multiplication algorithm.

2.2 Hardware state diagram

Figure 3: Hardware handshaking finite state machine

Figure 3 shows the finite state machine for the hardware module with the handshaking protocol implemented.
The first state, “Waiting”, simply checks for the software to assert the enable signal. A wait() statement is
not needed here, because the machine stays in this state until enable is asserted. Once enable has been
asserted, the second state, “Executing”, is entered. This is where the majority of the multiplication takes
place; before the hardware multiplier was implemented, the original multiplication code from the software went
here. The “Outputting” state returns the result to the software and asserts the done signal. The last state,
“Finishing”, checks the enable signal from the software and keeps the hardware waiting until it is
de-asserted. This process is actually carried out in the “Waiting” state; “Finishing” passes directly to that
state.
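The following is a sketch of the Figure 3 machine in plain C++ rather than SystemC, where each call to `step()` stands for one hardware clock edge. The class and member names are hypothetical. The multiplication in “Executing” is a placeholder that uses the native 64-bit product. Clearing done on return to “Waiting” is an assumption about the diagram.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical plain-C++ model of the hardware handshake FSM.
class HardwareFsm {
public:
    enum class State { Waiting, Executing, Outputting, Finishing };

    // Outputs seen by the software side.
    bool done = false;
    std::uint32_t out_low = 0, out_high = 0;

    // Called once per hardware clock edge with the current software signals.
    void step(bool enable, std::uint32_t b, std::uint32_t c) {
        switch (state_) {
        case State::Waiting:
            done = false;                      // assumed: done is cleared here
            if (enable) {                      // no wait(): stay until enable
                b_ = b;
                c_ = c;
                state_ = State::Executing;
            }
            break;
        case State::Executing:
            // Placeholder for the multiplier datapath.
            product_ = static_cast<std::uint64_t>(b_) * c_;
            state_ = State::Outputting;
            break;
        case State::Outputting:
            out_low = static_cast<std::uint32_t>(product_);
            out_high = static_cast<std::uint32_t>(product_ >> 32);
            done = true;                       // result is ready
            state_ = State::Finishing;
            break;
        case State::Finishing:
            if (!enable) state_ = State::Waiting;  // software has released
            break;
        }
    }

    State state() const { return state_; }

private:
    State state_ = State::Waiting;
    std::uint32_t b_ = 0, c_ = 0;
    std::uint64_t product_ = 0;
};
```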

2.3 Software state diagram

Figure 4: Software state diagram

Figure 4 shows the state diagram of the software implementation. The software state machine is built
primarily around two wait statements, each of which checks the done signal from the hardware to decide what
to do next: the first waits for done to be asserted, the second for it to be de-asserted.

3.0 Hardware multiplier


To design the hardware multiplier I first analysed the code provided in the software. The process was
broken down into the following 12 steps.
1. Split the two inputs b and c into four 16-bit half words: bLow, bHigh, cLow and cHigh.
2. Multiply bLow by cLow to make the low 32-bit word a0.
3. Multiply bLow by cHigh to make t (also 32 bits).
4. Multiply bHigh by cLow to make u (also 32 bits).
5. Multiply bHigh by cHigh to make the high 32-bit word a1.
6. Sum t and u.
7. Compare the sum of t and u with the original value of u (the first if statement).
8. If the sum is lower than u, an overflow occurred, so add 1 to the high half of a1 (1 is shifted 16 bits to
the left before the addition).
9. Shift the sum of t and u 16 bits to the left, into a high half word, and assign it to u.
10. Add the new value of u to a0 and compare the result with u (the sum of t and u shifted 16 bits to the
left).
11. Once again, if the new sum of a0 and u is smaller than u, an overflow has happened and 1 must be added to
a1; this time 1 is added to the lower half, so no shifting is required.
12. Finally, shift the sum of the original t and u 16 bits to the right and add it to a1.
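The twelve steps can be sketched in C++ using only 32-bit unsigned arithmetic. This is a reconstruction from the steps as listed, not the original source, and `Product64` and `mul32x32` are hypothetical names. C++ unsigned arithmetic wraps modulo 2^32, which is what lets the "sum smaller than an operand" test detect a carry.

```cpp
#include <cassert>
#include <cstdint>

// 64-bit result held as two 32-bit words, as in the report.
struct Product64 {
    std::uint32_t a1;  // high word
    std::uint32_t a0;  // low word
};

Product64 mul32x32(std::uint32_t b, std::uint32_t c) {
    // Step 1: split both inputs into 16-bit half words.
    std::uint32_t bLow = b & 0xFFFFu, bHigh = b >> 16;
    std::uint32_t cLow = c & 0xFFFFu, cHigh = c >> 16;

    // Steps 2-5: four 16x16 -> 32-bit partial products.
    std::uint32_t a0 = bLow * cLow;
    std::uint32_t t = bLow * cHigh;
    std::uint32_t u = bHigh * cLow;
    std::uint32_t a1 = bHigh * cHigh;

    // Steps 6-8: the wrapped sum is smaller than u exactly when a carry
    // out of bit 31 occurred; that carry has weight 2^16 within a1.
    t += u;
    if (t < u) a1 += 1u << 16;

    // Steps 9-11: add the low half of the middle term into a0.
    u = t << 16;
    a0 += u;
    if (a0 < u) a1 += 1u;  // carry out of the low word

    // Step 12: add the high half of the middle term into a1.
    a1 += t >> 16;
    return {a1, a0};
}
```

A convenient check is to compare the two words against the native 64-bit product. For example, `mul32x32(0xFFFFFFFF, 0xFFFFFFFF)` gives a1 = 0xFFFFFFFE and a0 = 0x00000001, and in that case both overflow branches are taken.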

The sequential steps listed above were then integrated into a hardware circuit. The circuit was designed to
run for the most part in parallel, which reduces the necessary clock cycles but increases the size of the
circuit itself.

3.1 Hardware block diagram

Figure 5: Hardware datapath

Figure 5 shows the hardware datapath used to implement this multiplication process. The parts list includes:
• 2 register splitters
• 4 multipliers
• 5 adders
• 2 comparators
• 2 multiplexors
• 2 shifters
The circuit was designed to complete in one clock cycle; the only clocked elements are the initial split
registers. The end result is two 32-bit words that together form one 64-bit number. The multiplexors and
comparators implement the two if statements: each comparator drives the select line of one multiplexor, whose
inputs are the two possible outcomes of the corresponding if statement. In a future iteration it may be
possible to combine the two multiplexors into one with four inputs and two select signals coming from the
comparators.
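The suggested single multiplexor could look like the following sketch. It assumes the two comparator outputs form a 2-bit select, with the first comparison (carry out of t + u) as the high bit. It also assumes the four inputs are a1 plus each possible combination of the two increments. This works because the second comparison depends only on a0 and the shifted middle sum, not on the first multiplexor's output. The function name `carry_mux` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 4-input multiplexor replacing the two 2-input ones.
// carry_mid: first comparator (t + u overflowed).
// carry_low: second comparator (a0 + u overflowed).
std::uint32_t carry_mux(std::uint32_t a1, bool carry_mid, bool carry_low) {
    const std::uint32_t inputs[4] = {
        a1,                    // 00: no overflow
        a1 + 1u,               // 01: carry out of the low word only
        a1 + (1u << 16),       // 10: carry out of t + u only
        a1 + (1u << 16) + 1u,  // 11: both carries
    };
    unsigned select = (carry_mid ? 2u : 0u) | (carry_low ? 1u : 0u);
    return inputs[select];
}
```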

4.0 SystemC code snippets

4.1 Hardware multiplier code snippets

Figure 6: Hardware modules part 1

Figure 7: Hardware modules part 2

Figure 8: Hardware modules part 3

Figures 6, 7 and 8 show the constructors used to create each part of the circuit. Different variants of a
part (for example, the multipliers) are then instantiated in the hardware header file.

4.2 Software code snippets

Figure 9: Software module

Figure 9 shows the process code for the software interacting with the hardware. The process starts by writing
the data to the hardware module with the lines “out_data_1.write(b);” and “out_data_2.write(c);”. Immediately
after the data is written, the enable signal is asserted, telling the hardware module that it can now run. An
infinite while loop is then entered, which continuously checks the done signal controlled by the hardware
module. Once the hardware has finished the multiplication, it asserts done, which lets the software break out
of the loop and read the data from the hardware. Once the data is read, the enable signal is de-asserted and a
second infinite while loop is entered. This second loop makes the software wait for the hardware to de-assert
the done signal (essentially resetting the system) before the process ends.
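For simulation outside SystemC, the same process can be recast as a state machine that is stepped once per cycle. In this hypothetical plain-C++ sketch, the two infinite while loops become the `WaitDone` and `WaitRelease` states. `out_data_1` and `out_data_2` are ordinary members standing in for the SystemC ports.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical plain-C++ model of the software process in Figure 9.
class SoftwareProcess {
public:
    enum class State { Send, WaitDone, WaitRelease, Finished };

    // Signals driven by the software.
    bool enable = false;
    std::uint32_t out_data_1 = 0, out_data_2 = 0;

    // Result read back from the hardware.
    std::uint32_t result_low = 0, result_high = 0;

    SoftwareProcess(std::uint32_t b, std::uint32_t c) : b_(b), c_(c) {}

    // One step, given the hardware's done signal and result words.
    void step(bool done, std::uint32_t in_low, std::uint32_t in_high) {
        switch (state_) {
        case State::Send:
            out_data_1 = b_;  // out_data_1.write(b);
            out_data_2 = c_;  // out_data_2.write(c);
            enable = true;    // then assert enable
            state_ = State::WaitDone;
            break;
        case State::WaitDone:  // first loop: wait for done
            if (done) {
                result_low = in_low;
                result_high = in_high;
                enable = false;  // de-assert enable after reading
                state_ = State::WaitRelease;
            }
            break;
        case State::WaitRelease:  // second loop: wait for done to drop
            if (!done) state_ = State::Finished;
            break;
        case State::Finished:
            break;
        }
    }

    State state() const { return state_; }

private:
    State state_ = State::Send;
    std::uint32_t b_, c_;
};
```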

5.0 Recommendations
As previously mentioned, one recommendation for a future iteration of this design would be to cut the number
of multiplexors down to one. It would also be worth exploring serial implementations of the design, which
could reduce the size of the circuit. This project has been a theoretical exercise; in real-world designs,
constraints such as size, speed and available power would be major concerns, and they were not necessarily
accounted for in the design presented in this report. For instance, adding registers between stages could
help spread out the power consumption, but the multiplication would then take more clock cycles to complete.
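One possible serial alternative is sketched below; it is a sketch only, not a design from this report. A single 16x16 multiplier is reused over four clock cycles, producing one of the four partial products per cycle and adding it, suitably shifted, into an accumulator. For brevity the sketch uses a 64-bit accumulator rather than the 32-bit words and carry checks of the parallel design. `SerialMultiplier` and its methods are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical serial multiplier: one 16x16 multiplier, four cycles.
class SerialMultiplier {
public:
    void start(std::uint32_t b, std::uint32_t c) {
        b_ = b; c_ = c; acc_ = 0; cycle_ = 0; busy_ = true;
    }

    // One clock edge; returns true once the product is ready.
    bool tick() {
        if (!busy_) return true;
        // Cycle order: (bLow,cLow), (bLow,cHigh), (bHigh,cLow), (bHigh,cHigh).
        unsigned b_half = (cycle_ >> 1) & 1u;  // 0 = low half, 1 = high half
        unsigned c_half = cycle_ & 1u;
        std::uint32_t x = (b_ >> (16 * b_half)) & 0xFFFFu;
        std::uint32_t y = (c_ >> (16 * c_half)) & 0xFFFFu;
        std::uint64_t partial = static_cast<std::uint64_t>(x * y);  // the single multiplier
        acc_ += partial << (16 * (b_half + c_half));                // shift and accumulate
        if (++cycle_ == 4) busy_ = false;
        return !busy_;
    }

    std::uint64_t result() const { return acc_; }

private:
    std::uint32_t b_ = 0, c_ = 0;
    std::uint64_t acc_ = 0;
    unsigned cycle_ = 0;
    bool busy_ = false;
};
```

This replaces three of the four multipliers with a cycle counter and a select/shift stage, at the cost of four cycles per product instead of one.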
Another potential issue with the design is clock speed, which will need to be optimized. Cramming the whole
datapath into one clock cycle creates a long combinational path (multipliers, adders, comparators and
multiplexors in series), which might force a clock period that is too long for the requirements (once again,
a real-world consideration).
