Threadmill
A Post‐Silicon Exerciser for
Multi‐Threaded Processors
• Post‐Silicon validation is becoming the next‐level
vehicle for functional verification
Problems of Post‐Silicon Validation
• Loading a test image onto a silicon platform can be a bottleneck
• Loading tests externally has high communication and memory overhead
• Existing on‐platform test generators are too complex
• Developing a full OS takes time
Solution: a simple on‐platform test generator
[Figure: Test Template -> Test Generator -> Test #1, Test #2, Test #3 -> Silicon]
Outline
Motivation
Overview of Threadmill
Key Techniques of Threadmill
Conclusion
Questions
Debate!
Threadmill Architecture
[Figure: The Builder takes the Test Template, System Topology & Configuration, Architectural Model, and Testing Knowledge and produces the Threadmill Image. The image contains the Test Template, Generator & Kernel, Topology, Architectural Model, and OS services. It runs on an Accelerator or on Silicon, where it performs Generation, Execution, and Checking.]
Test‐Template Language

Test Program Template:
  Variable: addr = 0x100
  Bias: register‐dependency
  Instructions:
    Store R5 -> ?
    Repeat (addr < 0x200)
      Instruction: Load reg <- addr
      Select
        Instruction: Add ? <- reg + ?
          Bias: sum‐zero
        Instruction: Sub ? <- ? - ?
      addr = addr + 0x10

Test Program:
  Resource Initial Values:
    R6=8, R3=-25, ..., R17=-16
    100=7, 110=25, ..., 1F0=16
  Instructions:
    500: Store R5 -> FF0
    504: Load R4 -> 100
    508: Sub R5 -> R6-R4
    50C: Load R4 -> 110
    510: Add R6 -> R4+R3
    …
    57C: Load R4 -> 1F0
    580: Add R9 -> R4+R17
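The expansion from template to concrete program can be sketched in Python. This is only an illustration of the idea on the slide, not Threadmill's generator: the function name `expand_template`, the register pool R1..R17, and the store-address range are assumptions, the register‐dependency bias is approximated by always reusing the just-loaded register, and the sum‐zero bias is not modeled.

```python
import random

def expand_template(seed, start=0x100, end=0x200, step=0x10):
    """Expand the slide's template into concrete instructions (hypothetical sketch)."""
    rng = random.Random(seed)                 # seeded, so the expansion is reproducible
    regs = [f"R{i}" for i in range(1, 18)]    # assumed register pool
    program = []
    pc = 0x500                                # first instruction address on the slide

    def emit(text):
        nonlocal pc
        program.append(f"{pc:X}: {text}")
        pc += 4

    # "Store R5 -> ?": the generator fills in the unspecified address (range assumed).
    emit(f"Store R5 -> {rng.randrange(0x200, 0x1000, 0x10):X}")

    addr = start
    while addr < end:                         # Repeat (addr < 0x200)
        reg = rng.choice(regs)
        emit(f"Load {reg} -> {addr:X}")       # Load reg <- addr
        dst, other = rng.choice(regs), rng.choice(regs)
        if rng.random() < 0.5:                # Select: pick one of the alternatives
            emit(f"Add {dst} -> {reg}+{other}")
        else:
            emit(f"Sub {dst} -> {other}-{reg}")
        addr += step                          # addr = addr + 0x10
    return program
```

With the default bounds the loop runs 16 times, so the program has 33 instructions and the last one sits at address 580, matching the layout of the example program on the slide.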
Execution Process
Dynamic vs Static Generation
Dynamic Test Generator vs. Static Test Generator?
"CPU is probably doing…"
But what do you do when you really need to know the state of the CPU?
Static Test Generation
Observing Machine State
I want to send:
  beq $s0, $s1, exit # if (x==y)
but will it branch or not? What’s $s0? $s1?
Instruction Stream:
  addi $t1, $t1, 1 # i = i+1
  add $s0, $s0, $t1 # x = x+i
  sub $s1, $s1, $t1 # y = y-i
  PC ->
Observing Machine State
Instructions Executed:
  addi $t1, $t1, 1 # i = i+1
  add $s0, $s0, $t1 # x = x+i
  sub $s1, $s1, $t1 # y = y-i
  PC -> 0xdeadbeef # Bogus !!
$s0 = 3
$s1 = 3
Observing Machine State
Now I know $s0=$s1, so
  beq $s0, $s1, exit # if (x==y)
will branch to exit.
Instructions Executed:
  addi $t1, $t1, 1 # i = i+1
  add $s0, $s0, $t1 # x = x+i
  sub $s1, $s1, $t1 # y = y-i
  PC -> beq $s0, $s1, exit # if (x==y)
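One reading of the three slides above is that a bogus word in the instruction stream forces a trap, and the trap lets the generator read the registers before it chooses the next instruction. The Python sketch below simulates that idea on a toy interpreter. Everything in it (`TrapWithState`, `run`, `observe_then_predict`, the tuple instruction encoding) is hypothetical and stands in for real hardware behavior.

```python
class TrapWithState(Exception):
    """Raised by the toy CPU on an illegal word; carries a register snapshot."""
    def __init__(self, regs):
        super().__init__("illegal instruction")
        self.regs = dict(regs)

def run(program, regs):
    """Toy interpreter for (op, dst, a, b) tuples; traps on the 'bogus' word."""
    for op, *args in program:
        if op == "bogus":
            raise TrapWithState(regs)
        dst, a, b = args
        rhs = regs[b] if isinstance(b, str) else b   # register name or immediate
        if op in ("add", "addi"):
            regs[dst] = regs[a] + rhs
        elif op == "sub":
            regs[dst] = regs[a] - rhs
    return regs

def observe_then_predict(initial):
    """Run the slide's stream, trap on the bogus word, then predict the beq outcome."""
    stream = [
        ("addi", "t1", "t1", 1),     # i = i+1
        ("add", "s0", "s0", "t1"),   # x = x+i
        ("sub", "s1", "s1", "t1"),   # y = y-i
        ("bogus",),                  # stands in for 0xdeadbeef
    ]
    try:
        run(stream, dict(initial))
    except TrapWithState as trap:
        # The generator now knows the state and can tell whether beq is taken.
        return trap.regs, trap.regs["s0"] == trap.regs["s1"]
    raise RuntimeError("the bogus word should always trap")
```

Starting from `{"t1": 0, "s0": 2, "s1": 4}`, the snapshot shows both `s0` and `s1` equal to 3, as on the slide, so the generator can place the `beq` knowing it will be taken.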
Floating‐Point Instructions
How to choose argument values?
  instr     arg0    arg1    des
  fp.instr  f.val0  f.val1  result
• Random: inefficient
• FPgen: Generate table of interesting test cases “off‐line”
[Figure: Floating Point Search Space]
Threadmill Floating Point Tests:
• Random Generation
• FPgen Table: Overflow, Underflow, etc.
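The contrast between random operands and a pre-built table can be illustrated with a small Python sketch. The table entries below are hand-picked stand-ins, not FPgen output, and the names `FP_TABLE`, `pick_fp_test`, `evaluate`, `classify`, and the `table_bias` parameter are assumptions made for this example. Python floats are IEEE 754 doubles, which is enough to show overflow, underflow, and an invalid operation.

```python
import math
import random
import sys

# Hypothetical "off-line" table: (instruction, arg0, arg1) triples chosen to hit corner cases.
FP_TABLE = [
    ("fadd", sys.float_info.max, sys.float_info.max),  # overflows to infinity
    ("fmul", sys.float_info.min, sys.float_info.min),  # underflows to zero
    ("fsub", math.inf, math.inf),                      # invalid operation, yields NaN
]

def pick_fp_test(rng, table_bias=0.8):
    """Pick a table entry with probability table_bias, else random operands."""
    if rng.random() < table_bias:
        return rng.choice(FP_TABLE)
    op = rng.choice(["fadd", "fsub", "fmul"])
    return (op, rng.uniform(-1e3, 1e3), rng.uniform(-1e3, 1e3))

def evaluate(op, a, b):
    """Compute the result of one toy floating-point instruction."""
    if op == "fadd":
        return a + b
    if op == "fsub":
        return a - b
    return a * b

def classify(x):
    """Label a result by the corner case it landed in."""
    if math.isnan(x):
        return "nan"
    if math.isinf(x):
        return "overflow"
    if x == 0.0:
        return "underflow"
    return "normal"
```

Purely random operands in a modest range will almost always classify as "normal", which is the inefficiency the slide points at; the table entries reach the corner cases directly.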
Concurrent Generation
Concurrent Test Generation
shared_addr = 0xBASE + random()
• Need to be random, but consistent across all the test threads
Possible Collision Types:
• Write‐Write, Write‐Read, …
• True Collision, False Collision, …
Shared Random Seed: seed=42
Synchronization Techniques
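The shared-seed idea can be sketched as follows: every thread builds its own generator from the same seed, so all threads independently compute the same sequence of shared addresses without communicating. This is a minimal Python illustration under assumptions: `BASE`, `plan_thread`, the address range, and the way the per-thread private seed is derived are all hypothetical.

```python
import random

BASE = 0x1000  # hypothetical stand-in for the slide's 0xBASE

def plan_thread(thread_id, shared_seed, n=4):
    """Plan n accesses for one thread; shared addresses agree across threads."""
    shared = random.Random(shared_seed)                       # identical stream on every thread
    private = random.Random(shared_seed * 1000 + thread_id)   # assumed per-thread stream
    plan = []
    for _ in range(n):
        shared_addr = BASE + shared.randrange(0, 0x100, 8)    # random, but consistent
        op = private.choice(["load", "store"])                # per-thread choice of access type
        plan.append((op, shared_addr))
    return plan
```

Calling `plan_thread(0, 42)` and `plan_thread(3, 42)` gives the same address sequence, so the threads collide on the same locations, while each thread still makes its own choice between loads and stores.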
Multi‐Pass Consistency Checking
Main Focus: Detecting Bugs in Multi‐Threaded Consistency
[Figure: Case 1 and Case 2 each show Thread 1 through Thread 4 running over TIME and ending in a Resulting State]
Multi‐Pass Consistency Checking
Main Focus: Detecting Bugs in Multi‐Threaded Consistency
[Figure: Threadmill Execution 0 and Threadmill Execution i+1 are compared on registers, memory values, etc.]
Each execution can have different timing or order of operation, but should end with the same result.
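The check described above can be sketched in Python by running the same test under several interleavings and comparing the final states. This is a toy model, not Threadmill's checker: `run_once`, `multi_pass_check`, the explicit `order` lists, and the injected bug (a store is dropped when it immediately follows a store from another thread) are all hypothetical. The sketch also assumes each thread writes only to its own addresses, so a correct platform must reach the same final state under any order.

```python
def run_once(threads, order, buggy=False):
    """Execute one pass; order lists which thread issues its next store at each step."""
    pending = [list(ops) for ops in threads]
    memory, prev = {}, None
    for tid in order:
        addr, value = pending[tid].pop(0)
        # Hypothetical timing bug: lose a store that follows another thread's store.
        dropped = buggy and prev is not None and prev != tid
        prev = tid
        if not dropped:
            memory[addr] = value
    return memory

def multi_pass_check(threads, orders, buggy=False):
    """Return True if every pass ends in the same final memory state as the first."""
    states = [run_once(threads, order, buggy) for order in orders]
    return all(state == states[0] for state in states)

# Two threads, each storing to its own pair of addresses.
THREADS = [[(0x100, 1), (0x104, 2)], [(0x200, 3), (0x204, 4)]]
ORDERS = [[0, 0, 1, 1], [0, 1, 0, 1]]
```

No reference model is needed here: the passes are only compared against each other, so a mismatch flags a bug without the checker knowing what the correct state is.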
Debugging Tests
• Restart the failed exerciser image a few test cases before the failure
• Take the test template that causes the bug and run it on a pre‐silicon test generator
Conclusion
Strengths:
• Fast and light‐weight.
• Automatic testing and checking of multi‐threaded execution.
Weaknesses:
• Doesn’t check datapath or permanent bugs.
• Little evaluation of performance or coverage of the platform (paper).
Questions?
Debate
CON vs. PRO
• Threadmill’s failure detection mechanism only checks for bugs in multi‐threaded interactions and does not check for bugs in the datapath. Is this adequate?
• Instead of generating tests on‐chip, isn’t it better to simply load pre‐generated tests?