SNN Accelerator: Central DRAM
[Figure: block diagram of one TILE ( Layer). Blocks shown: Central DRAM (Spike, Membrane Potential, SNN Parameters); DMA Controller; Spike Input RAM; two One-Seen Detectors, each driving an address into a Local DRAM (Synaptic Weight); LANE1 and LANE2 buffers; Wallace Tree ADD; Stochastic Controller; SNN Parameter RAM; CMP; O/P Spike.]
Components:
Local DRAM:
Stores the synaptic weights of each layer.
Multiple banks to increase parallelism.
Unused banks are put in sleep mode to save power.
Spike Input SRAM:
Loads all input spikes of a layer from Central DRAM on every tick.
Two read ports for parallel access.
Sleep-mode control logic puts unused banks in low-power mode or powers them down.
SNN Parameter SRAM:
Loads the SNN membrane potential, leak, and other stochastic parameters from Central DRAM on every tick.
Stores the updated membrane potential back to Central DRAM.
O/P Spike Buffer:
Stores the final spike value of each output.
Updates Central DRAM in a burst, along with the membrane potential.
One-Seen Detector:
Detects valid spikes among the input neurons.
Generates the Local DRAM address used to fetch the synaptic weights.
Detects one valid spike per cycle.
Uses a 64-bit spike-search buffer.
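The detector behaviour described above can be sketched in software as a find-first-set scan over a 64-bit word. This is only an illustrative model, not the hardware design: the scan order (lowest bit first), the function names, and the address mapping (`base_addr + index * row_stride`) are assumptions that do not come from the original text.

```python
def one_seen_scan(spike_word, width=64):
    """Yield the index of one set bit per 'cycle', lowest index first (assumed order)."""
    word = spike_word & ((1 << width) - 1)   # keep only the search-buffer bits
    while word:
        lowest = word & -word                # isolate the lowest set bit
        yield lowest.bit_length() - 1        # bit position of that spike
        word ^= lowest                       # clear it so the next cycle finds the next spike


def weight_addresses(spike_word, base_addr=0, row_stride=1):
    """Map each detected spike to a weight address (hypothetical linear mapping)."""
    return [base_addr + i * row_stride for i in one_seen_scan(spike_word)]
```

For example, a buffer holding `0b1010010` would produce spike indices 1, 4 and 6 over three cycles, and no cycles are spent on the zero bits.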
Lanes:
Each lane stores 4 x 16 valid synaptic weights for the partial summation of 4 neuron outputs.
The four buffers in each lane let each input spike be reused for four output neurons.
Two lanes work in ping-pong fashion to maximize Wallace tree throughput.
Can run on a 2x clock to fill 16 weights in 8 cycles.
Wallace Tree Adder:
Pipelined.
Needs about 8 cycles to complete the addition for one lane.
Lanes are swapped every 8 cycles.
The partial sum is fed back to the respective lane buffer for further addition.
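The lane and adder behaviour can be illustrated with a small functional model. It is a sketch under stated assumptions: the Wallace tree is replaced by a plain `sum()` (the carry-save structure and pipeline timing are not modelled), and the `weights[n][i]` layout and the function name are hypothetical.

```python
LANE_DEPTH = 16        # weights per buffer, from "4 x 16" in the text
NEURONS_PER_LANE = 4   # output neurons sharing each input spike


def accumulate(active_inputs, weights):
    """Accumulate partial sums for 4 output neurons over chunks of 16 active inputs.

    weights[n][i] is the (assumed) weight from input i to output neuron n.
    Returns the partial sums and the lane (0 or 1) used for each chunk.
    """
    partial = [0] * NEURONS_PER_LANE
    lane_log = []
    for chunk_no, start in enumerate(range(0, len(active_inputs), LANE_DEPTH)):
        lane_log.append(chunk_no % 2)              # ping-pong between the two lanes
        chunk = active_inputs[start:start + LANE_DEPTH]
        for n in range(NEURONS_PER_LANE):
            # stand-in for the Wallace tree reduction of one 16-entry buffer;
            # the result is added to the running partial sum
            partial[n] += sum(weights[n][i] for i in chunk)
    return partial, lane_log
```

With 20 active inputs, the model uses lane 0 for the first 16 weights and lane 1 for the remaining 4, which mirrors the idea that one lane can be filled while the other is being summed.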
Stochastic Controller:
Generates the threshold potential based on stochastic parameters.
Generates the leak voltage based on stochastic parameters.
Generates the final membrane potential based on stochastic parameters.
Comparator:
Generates spike based on threshold reference for each output neuron.
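The text does not specify the neuron model, so the following is only a minimal sketch of how a leak, a stochastic threshold, and a comparator could combine. The subtractive leak, the uniform threshold jitter, the reset-to-zero rule, and all parameter names are assumptions made for illustration.

```python
import random


def neuron_step(v_prev, partial_sum, leak, threshold, jitter=0.0, rng=None):
    """One tick of an assumed leaky integrate-and-fire update.

    Returns (spike, new_membrane_potential).
    """
    rng = rng or random.Random(0)                       # seeded for repeatability
    v = v_prev + partial_sum - leak                     # integrate, then apply leak (assumed form)
    theta = threshold + rng.uniform(-jitter, jitter)    # stochastic threshold (assumed form)
    spike = v >= theta                                  # comparator (CMP)
    if spike:
        v = 0                                           # reset on spike (assumption)
    return spike, v
```

With `jitter=0.0` the update is deterministic: a previous potential of 5, a partial sum of 10, and a leak of 1 reach 14, which crosses a threshold of 12 and produces a spike.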
Working Principle:
1.
2.
3.
4.
5.
Novelty:
Spike inputs have been observed to be highly sparse. The One-Seen Detector exploits this
input sparsity by fetching weights only for valid spikes and feeding the lane buffers only
with valid inputs for addition. This reduces the number of memory accesses and
operations.
The Wallace tree adder improves overall throughput by speeding up the addition.
Multiple lanes further improve throughput.
The local memories can be implemented with multiple banks to improve parallelism.
Unused banks of the input buffer and Local DRAM can be placed in low-power mode.
The device is scalable; performance can be improved by adding more
adders/comparators if required.
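The sparsity argument above can be made concrete with a simple count. This sketch assumes one weight fetch per (input, output neuron) pair and a 64-bit spike word; the function name and this cost model are illustrative and not taken from the original.

```python
def weight_fetches(spike_word, n_outputs, width=64):
    """Return (dense, sparse) weight-fetch counts for one spike word.

    dense:  every input position is processed.
    sparse: only positions holding a valid spike are processed.
    """
    active = bin(spike_word & ((1 << width) - 1)).count("1")  # number of valid spikes
    dense = width * n_outputs
    sparse = active * n_outputs
    return dense, sparse
```

Under these assumptions, a word with 3 valid spikes feeding 4 output neurons needs 12 fetches instead of 256.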