
FPGA Co-processor for the ALICE High Level Trigger

Gaute Grastveit, University of Bergen, Norway


H. Helstrup1, J. Lien1, V. Lindenstruth2, C. Loizides5, D. Roehrich3, B. Skaali4, T. Steinbeck2, K. Ullaland3, A. Vestbo3, T. Vik4, A. Wiebalck2 for the ALICE Collaboration

1 Bergen College, Norway
2 Kirchhoff Institute for Physics, University of Heidelberg, Germany
3 Department of Physics, University of Bergen, Norway
4 Department of Physics, University of Oslo, Norway
5 Institute of Nuclear Physics, University of Frankfurt, Germany

ALICE

A Large Ion Collider Experiment

TPC - Time Projection Chamber

Very High Data Rate


Pb-Pb central collisions
Event rate: 200 Hz
Event size: ~75 Mbyte

=> 15 Gbyte/s

Max data rate to tape is 1.25 Gbyte/s
Compression/selection is needed
Conventional, lossless methods: factor 2
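The figures above fix the reduction factor the HLT has to deliver. A small sketch of the arithmetic, assuming 1 Gbyte = 1000 Mbyte (the variable names are illustrative, not from the slides):

```python
# Numbers taken from the slide; 1 Gbyte = 1000 Mbyte is an assumption.
event_rate_hz = 200
event_size_mbyte = 75.0
tape_rate_gbyte_s = 1.25
lossless_factor = 2.0

# Raw data rate out of the detector for central Pb-Pb collisions.
input_rate_gbyte_s = event_rate_hz * event_size_mbyte / 1000.0

# Total reduction needed to fit the tape bandwidth.
required_reduction = input_rate_gbyte_s / tape_rate_gbyte_s

# What is still missing after conventional lossless compression.
remaining_factor = required_reduction / lossless_factor

print(input_rate_gbyte_s, required_reduction, remaining_factor)
```

Lossless compression alone therefore leaves a sizeable gap, which is why selection and physics-aware compression are listed as HLT tasks.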

HLT functionality
Compress
Reduce the amount of data required to encode the event as far as possible without losing physics information

Trigger
Accept/reject events on the basis of physics application

Select
Select regions of interest within an event
remove pile-up in p-p ...

Task: reconstruct the tracks of 20,000 charged particles (each producing 150 clusters) in the TPC
Time budget: 5 ms
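To get a feeling for the load, the task can be turned into a cluster throughput. This is only a back-of-the-envelope sketch; the 200 Hz event rate is taken from the data-rate slide and the variable names are illustrative:

```python
# Numbers from the slides.
tracks_per_event = 20_000
clusters_per_track = 150
time_budget_s = 5e-3
event_rate_hz = 200

# Clusters that have to be found and assigned per event.
clusters_per_event = tracks_per_event * clusters_per_track

# Sustained cluster throughput implied by the time budget.
clusters_per_second = clusters_per_event / time_budget_s

# The budget corresponds to one event period at the quoted event rate.
events_per_budget = event_rate_hz * time_budget_s

print(clusters_per_event, clusters_per_second, events_per_budget)
```

The result, on the order of several hundred million clusters per second, is the motivation for processing the data in parallel.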

The HLT setup


Data are received in parallel

[Figure: HLT readout chain. Blocks shown: TPC FEE with ALTRO and RCU, buffer (8 events); DDL; RORC receiver, buffer > 1000 events; PCI; RcvBd; NIC; HLT farm. Link labels: 216 x 320 MB/s, 216 x 100 MB/s]

RCU: Readout Controller Unit
DDL: Detector Data Link
RORC: ReadOut Receiver Card

PCI kernel in the FPGA
FPGA will also be utilised for pattern recognition
Reduces number of CPUs needed

The HLT FPGA co-processor

FPGA: APEX 20K400
Next prototype: Altera Stratix FPGA
Large internal memory
DSP cores

Two Schemes for Finding Tracks


Low occupancy (p-p, Pb-Pb outer padrows):
Conventional approach with (2d) cluster finder and track follower

High occupancy (overlapping clusters):
Hough transform on raw data
Cluster analysis for deconvolution (Kalman filter)

High multiplicity picture

Cluster Finder

[Figure: grid of ADC values; axis labels: time, pad]

The numbers represent charge (ADC values).
A vertical uninterrupted stack of numbers is called a sequence.
The square shows the geometric centre of the sequence.
Neighbouring sequences belong to the same cluster.

Final mean value (weighted mean):

$$\text{mean} = \frac{\sum \text{charge} \cdot \text{scalevalue}}{\sum \text{charge}}$$
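A small worked example of the weighted mean, applied once in the pad direction and once in the time direction. It assumes that the scale value is the pad number or the time bin of each sample; the charge grid is made up for illustration:

```python
def weighted_mean(charges, scalevalues):
    """Charge-weighted mean: sum(charge * scalevalue) / sum(charge)."""
    total = sum(charges)
    if total == 0:
        raise ValueError("cluster has no charge")
    return sum(q * s for q, s in zip(charges, scalevalues)) / total

# Hypothetical cluster: one row per pad, one column per time bin.
pads = [10, 11, 12]
times = [5, 6, 7]
grid = [
    [1, 2, 1],
    [2, 6, 2],
    [1, 2, 1],
]

# Flatten the grid and attach the pad and time coordinate to every sample.
charges = [q for row in grid for q in row]
pad_of = [p for p in pads for _ in times]
time_of = [t for _ in pads for t in times]

mean_pad = weighted_mean(charges, pad_of)
mean_time = weighted_mean(charges, time_of)
print(mean_pad, mean_time)
```

Because the example grid is symmetric, both means land on the central pad and the central time bin.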

FPGA implementation of a cluster finder - the algorithm


Calculate the mean for every sequence
Sequences on adjacent pads with similar means are merged
Two lists of sequences are used:
  one for clusters on the previous pad
  one for clusters on the current pad
Clusters are removed from the search range when a match is found or we know it is finished
Clusters are inserted in the input range after merging or when we start a new cluster

[Figure: memory of clusters, divided into search range (previous pad) and input range (current pad); pointers: begin, end, insert]
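The steps above can be sketched in software as follows. This is a simplified model, not the firmware: it scans the whole search range linearly instead of exploiting the ordering of sequences, and the data layout, the `match_distance` parameter and all names are assumptions made for illustration.

```python
def find_clusters(pads, match_distance=2.0):
    """pads: list of (pad_number, [(start_time, [charges...]), ...]) in pad order.
    Returns a list of (mean_pad, mean_time, total_charge), one per cluster."""
    finished = []
    search_range = []  # clusters whose last sequence is on the previous pad
    previous_pad = None
    for pad, sequences in pads:
        if previous_pad is not None and pad != previous_pad + 1:
            # A skipped pad: nothing in the search range can continue.
            finished.extend(search_range)
            search_range = []
        input_range = []  # clusters touched on the current pad
        for start_time, charges in sequences:
            q = sum(charges)
            qt = sum(c * (start_time + k) for k, c in enumerate(charges))
            mean = qt / q
            match = next((c for c in search_range
                          if abs(c["last_mean"] - mean) <= match_distance), None)
            if match is None:
                cluster = {"charge": 0, "time_sum": 0.0, "pad_sum": 0.0}
            else:
                search_range.remove(match)  # a match was found
                cluster = match
            cluster["charge"] += q
            cluster["time_sum"] += qt
            cluster["pad_sum"] += q * pad
            cluster["last_mean"] = mean
            input_range.append(cluster)
        # Clusters left in the search range had no continuation: finished.
        finished.extend(search_range)
        search_range = input_range
        previous_pad = pad
    finished.extend(search_range)
    return [(c["pad_sum"] / c["charge"], c["time_sum"] / c["charge"], c["charge"])
            for c in finished]

# Hypothetical input: one cluster spanning pads 10 and 11, one isolated sequence.
example = [
    (10, [(5, [1, 2, 1])]),
    (11, [(5, [2, 4, 2]), (20, [3])]),
]
clusters = find_clusters(example)
print(clusters)
```

The two lists play the roles of the search range and the input range; at the end of each pad the input range becomes the new search range.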

Block Diagram, Verification


Testbench, top structure

[Figure: block diagram. VHDL path: File: charges -> RAM (lpm) -> Decoder -> (seq) -> FIFO (lpm) -> (seq) -> Merger -> (cluster) -> File: VHDL clusters. Reference path: C++ model -> File: C++ clusters]

A C++ program compares the results

Relative Scales
As before the mean is calculated by:

$$\text{mean} = \frac{\sum \text{charge} \cdot \text{scalevalue}}{\sum \text{charge}}$$

Relative scale:
+ Smaller numbers, only multiplies by < 11
- Multiplication can't be done until merging takes place

Alternative (absolute):
Decoder -> FIFO (lpm) -> Pre_Calc (2 mult, 1 add) -> Merger
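One way to read this trade-off: with a relative scale every sequence is weighted by small offsets from its own start, and the sum is shifted onto a common reference only when sequences are merged. The sketch below assumes this interpretation and checks that it gives the same mean as the absolute scale; all function names are hypothetical.

```python
def absolute_moments(start, charges):
    """Total charge and charge-weighted sum using absolute time values."""
    total = sum(charges)
    weighted = sum(q * (start + k) for k, q in enumerate(charges))
    return total, weighted

def relative_moments(charges):
    """Same sums, but weighted by the offset inside the sequence (small numbers)."""
    total = sum(charges)
    weighted = sum(q * k for k, q in enumerate(charges))
    return total, weighted

def rebase(total, weighted, start, reference):
    """Shift a relative sum from its own start to a common reference.
    This is the multiplication that has to wait until merge time."""
    return weighted + total * (start - reference)

# Two hypothetical sequences on neighbouring pads: (start time, charges).
seq_a = (5, [1, 2, 1])
seq_b = (6, [2, 4, 2])

reference = seq_a[0]
rel_total, rel_sum = 0, 0
abs_total, abs_sum = 0, 0
for start, charges in (seq_a, seq_b):
    q, w = relative_moments(charges)
    rel_total += q
    rel_sum += rebase(q, w, start, reference)
    q, w = absolute_moments(start, charges)
    abs_total += q
    abs_sum += w

mean_relative = reference + rel_sum / rel_total
mean_absolute = abs_sum / abs_total
print(mean_relative, mean_absolute)
```

Both variants give the same mean; they differ only in where the multiplications happen and how wide the operands are.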

Deconvolution
Simplified implementation, almost for free
Splits at minima in both directions (time and pad)

[Figure: cluster finder output with deconvolution off and on]
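Splitting at a minimum can be illustrated in one direction as follows. This is a minimal sketch for the time direction only; assigning the minimum sample to the following piece is an arbitrary choice made here, not something the slides specify.

```python
def split_at_minima(charges):
    """Split a list of charges at every strict local minimum.
    The minimum sample itself starts the next piece (an assumption)."""
    pieces = []
    start = 0
    for k in range(1, len(charges) - 1):
        if charges[k] < charges[k - 1] and charges[k] < charges[k + 1]:
            pieces.append(charges[start:k])
            start = k
    pieces.append(charges[start:])
    return pieces

# Two overlapping peaks in one sequence are separated at the dip.
print(split_at_minima([2, 8, 3, 9, 4]))
# A single peak is left untouched.
print(split_at_minima([1, 5, 2]))
```

The same test could be applied in the pad direction to the total charges of the sequences that are being merged.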

Merger Goals
spend few clock cycles per sequence
use few logic elements
high clock speed

[Figure: merger state machine, annotated with the share of clock cycles spent in the different states]
States: idle (30 %), merge_mult, merge_add, merge_store, send_all, send_many, send_one, calc_dist, insert_seq
Other shares shown in the figure: 22 %, 11 %, 11 %, 11 %, 6 %, 5 %, 4 %, 0 %
Transition labels: new data; next pad; new row or skip pad; new search range; empty; old is above; old is below; within match distance
Operation annotations: calc dist (--), merge mult (**+), merge add (++), merge store (W), insert seq (W)

Cluster Finder Performance


Synthesized on Altera APEX
Uses 1800 logic elements (11%)
Circuit runs at 33 MHz
Memory usage: 16*80 + 64*112 = 8448 bits (4%)

Outlook
Implementation of Hough transformation
Data representation at each stage:
Back Linked List (ALTRO sequences)
TPC coordinates (Padrow, Pad, Time)
Local coordinates (X, Y, Z)
(A, B, E)
Parameter Space (k, phi, eta-index)

[Figure: processing pipeline. Detector Data Link -> Data Format Decoder -> XYZ Transformer -> ABE Transformer -> Histogram 1 ... Histogram N -> Find Maxima; the ADC count passes through a 10-to-8 Bit Converter]
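The histogram-filling and maximum-finding stages can be sketched in software. The slides do not give the transformation itself, so the sketch assumes a common choice for tracks that start at the origin: the conformal mapping A = x/r^2, B = y/r^2 and the circle relation k = 2 (B cos(phi) - A sin(phi)), with k the curvature and phi the emission angle. Only a single eta slice is treated, and the bin counts and ranges are made-up values.

```python
import math

# Hypothetical binning of the (phi, k) parameter space.
N_PHI, N_KAPPA = 50, 40
PHI_LO, PHI_HI = -0.5, 0.5
KAPPA_MAX = 0.02
PHI_STEP = (PHI_HI - PHI_LO) / N_PHI
KAPPA_STEP = 2.0 * KAPPA_MAX / N_KAPPA

def hough_histogram(hits):
    """hits: iterable of (x, y, adc). Returns hist[phi_bin][kappa_bin]."""
    hist = [[0] * N_KAPPA for _ in range(N_PHI)]
    for x, y, adc in hits:
        r2 = x * x + y * y
        if r2 == 0.0:
            continue
        a, b = x / r2, y / r2  # assumed (A, B) transformation
        for i in range(N_PHI):
            phi = PHI_LO + (i + 0.5) * PHI_STEP
            kappa = 2.0 * (b * math.cos(phi) - a * math.sin(phi))
            j = math.floor((kappa + KAPPA_MAX) / KAPPA_STEP)
            if 0 <= j < N_KAPPA:
                hist[i][j] += adc  # weight the vote with the ADC count
    return hist

def find_maximum(hist):
    """Return the bin centre (phi, kappa) of the highest histogram entry."""
    best = max(((hist[i][j], i, j) for i in range(N_PHI) for j in range(N_KAPPA)),
               key=lambda entry: entry[0])
    _, i, j = best
    return PHI_LO + (i + 0.5) * PHI_STEP, -KAPPA_MAX + (j + 0.5) * KAPPA_STEP

def circle_hits(kappa, phi0, n=20):
    """Test data: n points on a circle through the origin."""
    radius = 1.0 / kappa
    hits = []
    for k in range(n):
        s = 0.2 + k * 1.3 / (n - 1)
        x = radius * (math.sin(phi0 + s) - math.sin(phi0))
        y = radius * (math.cos(phi0) - math.cos(phi0 + s))
        hits.append((x, y, 1))
    return hits

phi_found, kappa_found = find_maximum(hough_histogram(circle_hits(0.0105, 0.31)))
print(phi_found, kappa_found)
```

In a design along the lines of the pipeline above, each histogram would presumably cover one slice of the parameter space and be filled in parallel, whereas this sketch fills a single histogram sequentially.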

Conclusion
We have demonstrated the feasibility of a real-time cluster finder implemented in an FPGA

Firmware implementation of a Hough transform looks promising

Transparency replacements from now on

ALICE

A Large Ion Collider Experiment

TPC - Time Projection Chamber

18 sectors on each side, each sector is read out in 6 subsectors
Total is ca. 570,000 pads
