Download as pdf or txt
Download as pdf or txt
You are on page 1of 22

Threadmill: 

A Post‐Silicon Exerciser for 
Multi‐Threaded Processors

Authors:  Allon Adir, Maxim Golubev, Shimon Landa, 


Amir Nahir, Gil Shurek, Vitali Sokhin, Avi Ziv
Presenters: Abraham Addisie and Dong‐hyeon Park
Outline
Motivation 
Overview of Threadmill
Key Techniques of Threadmill
Conclusion
Questions
Debate!
Motivation – Why Post Silicon?
Pre‐Silicon Verification 
is inadequate

Design Pre‐Silicon  Post‐Silicon 


Production
Verification Validation

• Post‐Silicon validation is becoming the next‐level 
vehicle for functional verification
Problems of Post‐Silicon Validation
Threadmill Loading a test image onto a 
silicon platform can be a 
bottleneck
Test 
Template
Test #1 Silicon
Test  Test #2 Test
Generator  Test #3 Silicon

• Loading tests externally have high communication 
and memory overhead Solution:
• Simple on‐platform 
Existing on‐platform test generators are too complex
• Developing full OS system takes time test generator
Outline
Motivation 
Overview of Threadmill
Key Techniques of Threadmill
Conclusion
Questions
Debate!
Threadmill Image 
Threadmill Architecture
Test Template
Generator 
& Kernel Topology
Test 
Template Architectural  Accelerator
Builder Model
System 
Topology &  Generation Silicon
Configuration
Architectural Model  Execution
Testing Knowledge Checking

OS services

6
Test‐Template Language
Test Program Template Test Program
Variable: addr = 0x100  Resource Initial Values:
Values: Variable:  R6=8, R3=‐25,..., R17=‐16
Bias: register‐dependency  100=7, 110=25,..., 1F0=16
Instructions: 
Instructions: Store R5 ‐> ?  500: Store R5 ‐> FF0 
Repeat (addr < 0x200)  504: Load R4 ‐> 100
Instruction: Load reg <‐ addr 508: Sub R5 ‐> R6‐R4
Select  50C: Load R4 ‐> 110
Instruction: Add ? <‐ reg + ?  510: Add R6 ‐> R4+R3
Bias: sum‐zero  …
Instruction: Sub ? <‐ ? ‐ ?  57C: Load R4 ‐> 1F0 
addr = addr + 0x10 580: Add R9 ‐> R4+R17
Execution Process

Generate  Execute Check 


Test Case Test Case Result
Outline
Motivation 
Overview of Threadmill
Key Techniques of Threadmill
Conclusion
Questions
Debate!
Threadmill Design
Good  Multi‐
Goals: Simple Fast
Coverage Threaded

Static Test  Floating‐Point  Concurrent Test 


Key  Generation Generation Generation
Techniques: Multi‐pass  Debugging
Consistency Checking Tests
Static Test 
Generation

Dynamic vs Static Generation
CPU is probably 
doing….

Dynamic Test  Static Test 
Generator Generator

?
But what do you do when you really need 
to know the state of the CPU?
Static Test 
Generation

Observing Machine State
Instruction Stream
I want to send:
beq $s0, $s1, exit # if (x==y) addi $t1, $t1, 1 # i = i+1
add $s0, $s0, $t1 # x = x+i
but will it branch or not? sub $s1, $s1, $t1 # y = y‐i
PC
What’s $s0? $s1?

Threadmill uArch State


$t1 7
$s0 3
$s1 3
Static Test 
Generation

Observing Machine State
Instructions Executed
addi $t1, $t1, 1 # i = i+1
add $s0, $s0, $t1 # x = x+i

!!
sub $s1, $s1, $t1 # y = y‐i
PC 0xdeadbeef # Bogus

$s0 = 3
$s1 = 3

Threadmill uArch State


$t1 7
Interrupt 
$s0 3
Handler
$s1 3
Static Test 
Generation

Observing Machine State
Instructions Executed
Now I know $s0=$s1, so addi $t1, $t1, 1 # i = i+1
beq $s0, $s1, exit # if (x==y) add $s0, $s0, $t1 # x = x+i
sub $s1, $s1, $t1 # y = y‐I
will branch to exit. PC beq $s0, $s1, exit # if (x==y)

Threadmill uArch State


$t1 7
$s0 3
$s1 3
Floating Point 
Generation

Floating‐Point Instructions
How to choose argument values? instr arg0 arg1 des
fp.instr f.val0 f.val1 result
random inefficient
FPgen:  Generate table of interesting 
test cases “off‐line” Floating Point 
Search Space
Threadmill Floating Point Tests
Random  FPgen Overflow,
Generation Table Underflow, etc
Concurrent 
Generation

Concurrent Test Generation
shared_addr = 0xBASE + random()

Possible Collision Types
Need to be random, but consistent 
Write‐Write True Collision
across all the test threads 
Write‐Read False Collision

Shared 
Synchronization
Random
Techniques
Seed
seed=42

Test  Test  Test  Test 


Thread 0 Thread 1 Thread 2 Thread 3
Multi‐Pass 

Multi‐Pass Consistency Checking Consistency

Main Focus: Detecting Bugs in Multi‐Threaded Consistency
Case 1 Case 2
Thread 1 Thread 2 Thread 3 Thread 4 Thread 1 Thread 2 Thread 3 Thread 4
TIME

TIME
Resulting State Resulting State
Multi‐Pass 

Multi‐Pass Consistency Checking Consistency

Main Focus: Detecting Bugs in Multi‐Threaded Consistency
Execution 0 Execution i+1
Each execution can have 
Threadmill Threadmill different timing or order of 
operation, but should end with 
the same result.

?
Registers,
Memory values, etc
Debugging
Tests
Debugging Tests
• Restart the failed exerciser image a few 
test‐cases before the failure

• Take the test‐template that causes the 
bug and run it on a pre‐silicon test 
generation
Conclusion
Strengths Weakness
Fast and Light‐weight.  Doesn’t check datapath or 
permanent bugs.
Automatic testing and 
checking of multi‐threaded   Little evaluation of 
execution. performance or coverage of 
the platform (paper).
Questions?
CON PRO

Debate
Threadmill’s failure detection mechanism only 
checks for bugs in multi‐threaded interactions, and 
do not check bugs in the datapath. Is this adequate?

Instead of generating tests on‐chip, isn’t it better to 
simply load pre‐generated tests?

You might also like