
FPGA Co-processor for the ALICE High Level Trigger

Gaute Grastveit
University of Bergen
Norway

H. Helstrup1, J. Lien1, V. Lindenstruth2, C. Loizides5, D. Roehrich3, B. Skaali4, T. Steinbeck2, K. Ullaland3, A. Vestbo3, T. Vik4, A. Wiebalck2
for the ALICE Collaboration

1 Bergen College, Norway
2 Kirchhoff Institute for Physics, University of Heidelberg, Germany
3 Department of Physics, University of Bergen, Norway
4 Department of Physics, University of Oslo, Norway
5 Institute of Nuclear Physics, University of Frankfurt, Germany
ALICE – A Large Ion Collider Experiment

TPC - Time Projection Chamber
Very High Data Rate

Pb-Pb central collisions
Event rate: 200 Hz
Event size: ~75 Mbyte
=> 15 Gbyte/s
Max data rate to tape is 1.25 Gbyte/s

Compression/selection is needed
Conventional, lossless methods: factor 2
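
The arithmetic behind these figures can be checked with a short sketch. It uses only the numbers on the slide and assumes the event size is in megabytes with decimal units (1 Gbyte = 1000 Mbyte); the variable names are illustrative.

```python
# Figures from the slide; decimal units (1 Gbyte = 1000 Mbyte) are an assumption.
event_rate_hz = 200       # central Pb-Pb event rate
event_size_mbyte = 75     # approximate event size
tape_rate_gbyte_s = 1.25  # maximum data rate to tape
lossless_factor = 2       # conventional lossless compression

input_rate_gbyte_s = event_rate_hz * event_size_mbyte / 1000
required_factor = input_rate_gbyte_s / tape_rate_gbyte_s
after_lossless_gbyte_s = input_rate_gbyte_s / lossless_factor

print(input_rate_gbyte_s)      # 15.0
print(required_factor)         # 12.0
print(after_lossless_gbyte_s)  # 7.5
```

Under these assumptions a reduction factor of 12 is needed, so a lossless factor of 2 alone still leaves 7.5 Gbyte/s, well above the tape limit.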
HLT functionality
• Compress
• Reduce the amount of data required to encode the event as far as possible without losing physics information
• Trigger
• Accept/reject events on the basis of physics application
• Select
• Select regions of interest within an event
• remove pile-up in p-p
• ...

Task: reconstruct the tracks of 20,000 charged particles (each producing 150 clusters) in the TPC
Time budget: 5 ms
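
A rough sense of the scale of this task follows from the slide's numbers alone. The sketch below is illustrative; the observation that 5 ms equals the mean event spacing at 200 Hz is an inference, not something the slides state.

```python
# Numbers from the slides; variable names are illustrative.
tracks_per_event = 20_000
clusters_per_track = 150
time_budget_ns = 5_000_000  # 5 ms expressed in nanoseconds
event_rate_hz = 200

clusters_per_event = tracks_per_event * clusters_per_track
ns_per_cluster = time_budget_ns / clusters_per_event
event_spacing_ms = 1000 / event_rate_hz

print(clusters_per_event)  # 3000000
print(event_spacing_ms)    # 5.0
```

Three million clusters in 5 ms leaves well under 2 ns per cluster on average, which is why the work has to be spread over many parallel units.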
The HLT setup
Data are received in parallel: 216x320 MB/s, 216x100 MB/s

[Figure: HLT readout chain, with the following labelled elements]
TPC FEE: ALTRO, RCU, buffer (8 events)
DDL
RORC: receiver buffer (> 1000 events), RcvBd, PCI
NIC
HLT farm

RCU – Readout Controller Unit
DDL – Detector Data Link
RORC – ReadOut Receiver Card

• PCI kernel in the FPGA
• FPGA will also be utilised for pattern recognition
• Reduces the number of CPUs needed
The HLT FPGA co-processor

• FPGA: APEX 20K400


• Next prototype: Altera Stratix FPGA
– Large internal memory
– DSP cores
Two Schemes for Finding Tracks

• Low occupancy (p-p, Pb-Pb outer padrows)
• Conventional approach with (2D) cluster finder and track follower

• High occupancy (overlapping clusters):
• Hough transform on raw data
• Cluster analysis for deconvolution
• (Kalman filter)

[Figure: high-multiplicity picture]
Cluster Finder
[Figure: grid of charge values; axes: time and pad]

The numbers represent charge (ADC values).
A vertical uninterrupted stack of numbers is called a sequence. The square shows the geometric centre of the sequence.
Neighbouring sequences belong to the same cluster.

Final mean value (weighted mean):

$$\text{mean} = \frac{\sum \text{charge} \times \text{scalevalue}}{\sum \text{charge}}$$
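
As a worked instance of the weighted mean, here is a minimal sketch. The charge values and time bins are invented for illustration, and `weighted_mean` is a hypothetical helper name, not taken from the slides.

```python
def weighted_mean(charges, scale_values):
    """Charge-weighted mean: sum(charge * scalevalue) / sum(charge)."""
    total = sum(charges)
    if total == 0:
        raise ValueError("total charge must be non-zero")
    return sum(c * s for c, s in zip(charges, scale_values)) / total

# Invented example: one sequence of five ADC samples in consecutive time bins.
charges = [2, 6, 10, 6, 2]
time_bins = [10, 11, 12, 13, 14]

print(weighted_mean(charges, time_bins))  # 12.0
```

The profile is symmetric here, so the weighted mean coincides with the geometric centre; for an asymmetric profile the two differ.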
FPGA implementation of a cluster finder - the algorithm
• Calculate the mean for every sequence
• Sequences on adjacent pads with similar means are merged
• Two lists of sequences are used: one for clusters on the previous pad, one for clusters on the current pad
• Clusters are removed from the searchrange when a match is found or we know it is finished
• Clusters are inserted in the inputrange after merging or when we start a new cluster

[Figure: memory of clusters, with begin, end and insert pointers delimiting the searchrange (previous pad) and the inputrange (current pad)]
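
The list handling described above can be sketched in software as follows. This is a simplified model under stated assumptions, not the VHDL design: the match distance, the `Cluster` fields and the input format `(pad, mean_time, charge)` are all hypothetical.

```python
from dataclasses import dataclass

MATCH_DISTANCE = 2.0  # assumed tolerance between sequence means (time bins)

@dataclass(eq=False)  # identity comparison, so list.remove drops this object
class Cluster:
    charge: float
    pad_sum: float    # running sum of charge * pad
    time_sum: float   # running sum of charge * mean_time
    last_mean: float  # mean time of the most recently merged sequence

    def centre(self):
        return (self.pad_sum / self.charge, self.time_sum / self.charge)

def find_clusters(sequences):
    """sequences: (pad, mean_time, charge) tuples, ordered by pad."""
    finished, search_range, input_range = [], [], []
    current_pad = None
    for pad, mean_time, charge in sequences:
        if pad != current_pad:
            # Clusters left unmatched in the search range are finished.
            finished.extend(search_range)
            if current_pad is not None and pad != current_pad + 1:
                # A pad was skipped: nothing can continue, flush everything.
                finished.extend(input_range)
                input_range = []
            search_range, input_range = input_range, []
            current_pad = pad
        match = next((c for c in search_range
                      if abs(c.last_mean - mean_time) <= MATCH_DISTANCE), None)
        if match is None:
            match = Cluster(0.0, 0.0, 0.0, mean_time)  # start a new cluster
        else:
            search_range.remove(match)                 # found: leave search range
        match.charge += charge
        match.pad_sum += charge * pad
        match.time_sum += charge * mean_time
        match.last_mean = mean_time
        input_range.append(match)                      # insert in input range
    return finished + search_range + input_range

# Invented input: two clusters spread over pads 3 to 5.
seqs = [(3, 10.0, 4), (3, 30.0, 5), (4, 10.5, 8), (4, 30.5, 5), (5, 11.0, 4)]
for cluster in sorted(find_clusters(seqs), key=lambda c: c.centre()[1]):
    print(cluster.centre(), cluster.charge)
```

Only two short lists are live at any time, which is what keeps the memory requirement small in the hardware version.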
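Block Diagram, Verification

[Figure: block diagram. Top structure: Decoder, FIFO (lpm), Merger and RAM (lpm); sequences (seq) pass between the blocks and the Merger outputs clusters. Testbench: reads the file of charges and writes the file of VHDL clusters. A C++ model writes a file of C++ clusters, and a C++ program compares the results.]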
Relative Scales

As before, the mean is calculated by:

$$\text{mean} = \frac{\sum \text{charge} \times \text{scalevalue}}{\sum \text{charge}}$$

+ Smaller numbers, only multiplies by <11
- Multiplication can't be done until merging takes place

Alternative (absolute):

[Figure: Decoder, Pre_Calc (2 mult, 1 add), FIFO (lpm), Merger]
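
The sketch below illustrates why a relative scale gives the same mean with smaller multipliers. The slide does not define what the scale is relative to; here it is assumed to be the sample index within a sequence, and all values are invented.

```python
def mean_absolute(charges, start):
    """Weighted mean using absolute positions start, start+1, ..."""
    weighted = sum(c * (start + i) for i, c in enumerate(charges))
    return weighted / sum(charges)

def mean_relative(charges, start):
    """Weighted mean using small relative indices 0, 1, 2, ...;
    the offset is added once at the end."""
    weighted = sum(c * i for i, c in enumerate(charges))
    return start + weighted / sum(charges)

charges = [2, 6, 10, 6, 2]  # invented ADC values
start = 410                 # invented absolute position of the first sample

print(mean_absolute(charges, start))  # 412.0
print(mean_relative(charges, start))  # 412.0
```

In the relative form the multipliers never exceed the sequence length, which is consistent with the "multiplies by <11" remark if sequences are short.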
Deconvolution
Simplified implementation, almost for free: splits at minima in both directions (time and pad)

[Figure: cluster finding with deconvolution off and on]
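
Splitting at a minimum can be illustrated in one dimension. This is a sketch only: it handles a single direction, assigns the minimum sample to the left part (an arbitrary choice), and `split_at_minima` is a hypothetical name.

```python
def split_at_minima(charges):
    """Split a 1-D charge profile at each strict local minimum.
    The minimum sample itself is kept with the left-hand part."""
    parts, start = [], 0
    for i in range(1, len(charges) - 1):
        if charges[i - 1] > charges[i] < charges[i + 1]:
            parts.append(charges[start:i + 1])
            start = i + 1
    parts.append(charges[start:])
    return parts

# Invented profile with two overlapping peaks.
print(split_at_minima([3, 9, 4, 2, 7, 12, 5]))  # [[3, 9, 4, 2], [7, 12, 5]]
```

The same test applied across pads, on the per-sequence charge totals, would give the split in the second direction.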
Merger Goals
• Spend few clock cycles per sequence
• Use few logic elements
• High clock speed

[Figure: merger state machine. States: idle, calc_dist, merge_mult, merge_add, merge_store, insert_seq, send_one, send_many, send_all. Transition labels include: new data, next pad, skip pad, new row, new searchrange, empty, old is above, old is below, within match distance.]

[Figure: clock cycles spent in the different states. Idle accounts for 30%; the other slices are 22%, 11%, 11%, 11%, 6%, 5%, 4% and 0%.]
Cluster Finder Performance

• Synthesized on Altera APEX
• Uses 1800 Logic Elements (11%)
• Memory usage 16*80 + 64*112 = 8448 bits (4%)
• Circuit runs at 33 MHz
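
The resource figures can be cross-checked with a few lines of arithmetic. The implied device totals are rough, because the percentages on the slide are rounded; nothing here is taken from a device data sheet.

```python
# Figures from the slide.
memory_bits = 16 * 80 + 64 * 112
logic_elements = 1800
clock_hz = 33_000_000

# Rough device totals implied by the rounded percentages (11% and 4%).
implied_total_les = round(logic_elements / 0.11)
implied_total_memory_bits = round(memory_bits / 0.04)

clock_period_ns = 1e9 / clock_hz

print(memory_bits)                # 8448
print(implied_total_les)          # 16364
print(implied_total_memory_bits)  # 211200
print(round(clock_period_ns, 1))  # 30.3
```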


Outlook
Implementation of Hough transformation

[Figure: Hough transform processing pipeline]
Data representations: Back Linked List (ALTRO sequences) -> TPC coordinates (Padrow, Pad, Time) -> Local coordinates (X, Y, Z) -> Parameter Space (A, B, E) (k, phi, eta-index)
Processing blocks: Detector Data Link -> Data Format Decoder -> XYZ Transformer -> ABE Transformer -> Histogram 1 ... Histogram N -> Find Maxima
ADC count: 10-to-8 Bit Converter
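
To make the histogram-and-maxima idea concrete, here is a minimal software sketch. The slides name only the parameter space (k, phi, eta-index); the circle-through-origin parameterisation kappa = 2 sin(phi - psi) / r, the bin ranges and all function names are assumptions, and the eta slicing and the 10-to-8 bit conversion are left out.

```python
import math

def hough_histogram(points, psi_min=-0.3, psi_max=0.3, n_psi=60,
                    kappa_max=0.01, n_kappa=200):
    """Fill a (psi, kappa) histogram from (x, y, charge) points.
    Each point votes once per psi bin, weighted by its charge."""
    hist = [[0] * n_kappa for _ in range(n_psi)]
    psi_step = (psi_max - psi_min) / n_psi
    for x, y, charge in points:
        r = math.hypot(x, y)
        if r == 0.0:
            continue
        phi = math.atan2(y, x)
        for i in range(n_psi):
            psi = psi_min + (i + 0.5) * psi_step
            kappa = 2.0 * math.sin(phi - psi) / r
            j = math.floor((kappa + kappa_max) / (2.0 * kappa_max) * n_kappa)
            if 0 <= j < n_kappa:
                hist[i][j] += charge
    return hist

def find_maximum(hist, psi_min=-0.3, psi_max=0.3, kappa_max=0.01):
    """Return (psi, kappa, votes) at the centre of the fullest bin."""
    n_psi, n_kappa = len(hist), len(hist[0])
    i, j = max(((i, j) for i in range(n_psi) for j in range(n_kappa)),
               key=lambda ij: hist[ij[0]][ij[1]])
    psi = psi_min + (i + 0.5) * (psi_max - psi_min) / n_psi
    kappa = -kappa_max + (j + 0.5) * 2.0 * kappa_max / n_kappa
    return psi, kappa, hist[i][j]

# Synthetic track: 15 points on one circle through the origin.
psi0, kappa0 = 0.105, 0.00425
points = []
for r in range(100, 250, 10):
    phi = psi0 + math.asin(kappa0 * r / 2.0)
    points.append((r * math.cos(phi), r * math.sin(phi), 1))

psi, kappa, votes = find_maximum(hough_histogram(points))
print(round(psi, 3), round(kappa, 5), votes)  # 0.105 0.00425 15
```

Every point only adds to counters, with no dependence between points, which is the property that makes the transform attractive for a parallel firmware implementation with several histograms.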
Conclusion

We have demonstrated the feasibility of a real-time cluster finder implemented in an FPGA
Firmware implementation of a Hough transform looks promising
Transparency replacements from now on
ALICE – A Large Ion Collider Experiment
TPC - Time Projection Chamber

18 sectors on each side; each sector is read out in 6 subsectors

Total is ca. 570,000 pads
