Threadmill

Threadmill:
A Post‐Silicon Exerciser for
Multi‐Threaded Processors
Authors: Allon Adir, Maxim Golubev, Shimon Landa,

Amir Nahir, Gil Shurek, Vitali Sokhin, Avi Ziv
Presenters: Abraham Addisie and Dong‐hyeon Park
Outline
Motivation
Overview of Threadmill
Key Techniques of Threadmill
Conclusion
Questions
Debate!
Motivation – Why Post Silicon?
Pre‐Silicon Verification
is inadequate
Design Pre‐Silicon Post‐Silicon

Production
Verification Validation
• Post‐Silicon validation is becoming the next‐level
vehicle for functional verification
Problems of Post‐Silicon Validation
Threadmill Loading a test image onto a
silicon platform can be a
bottleneck
Test
Template
Test #1 Silicon
Test Test #2 Test
Generator Test #3 Silicon
• Loading tests externally have high communication
and memory overhead Solution:
• Simple on‐platform
Existing on‐platform test generators are too complex
• Developing full OS system takes time test generator
Outline
Motivation
Conclusion
Questions
Debate!
Threadmill Image
Threadmill Architecture
Test Template
Generator
& Kernel Topology
Test
Template Architectural Accelerator
Builder Model
System
Topology & Generation Silicon
Configuration
Architectural Model Execution
Testing Knowledge Checking
OS services
6
Test‐Template Language
Test Program Template Test Program
Variable: addr = 0x100 Resource Initial Values:
Values: Variable: R6=8, R3=‐25,..., R17=‐16
Bias: register‐dependency 100=7, 110=25,..., 1F0=16
Instructions:
Instructions: Store R5 ‐> ? 500: Store R5 ‐> FF0
Repeat (addr < 0x200) 504: Load R4 ‐> 100
Instruction: Load reg <‐ addr 508: Sub R5 ‐> R6‐R4
Select 50C: Load R4 ‐> 110
Instruction: Add ? <‐ reg + ? 510: Add R6 ‐> R4+R3
Bias: sum‐zero …
Instruction: Sub ? <‐ ? ‐ ? 57C: Load R4 ‐> 1F0
addr = addr + 0x10 580: Add R9 ‐> R4+R17
Execution Process
Generate Execute Check

Test Case Test Case Result
Outline
Motivation
Conclusion
Questions
Debate!
Threadmill Design
Good Multi‐
Goals: Simple Fast
Coverage Threaded
Static Test Floating‐Point Concurrent Test

Key Generation Generation Generation
Techniques: Multi‐pass Debugging
Consistency Checking Tests
Static Test
Generation
Dynamic vs Static Generation
CPU is probably
doing….
Dynamic Test Static Test
Generator Generator
?
But what do you do when you really need
to know the state of the CPU?
Static Test
Generation
Observing Machine State
Instruction Stream
I want to send:
beq $s0, $s1, exit # if (x==y) addi $t1, $t1, 1 # i = i+1
add $s0, $s0, $t1 # x = x+i
but will it branch or not? sub $s1, $s1, $t1 # y = y‐i
PC
What’s $s0? $s1?
Threadmill uArch State

$t1 7
$s0 3
$s1 3
Static Test
Generation
Instructions Executed
addi $t1, $t1, 1 # i = i+1
add $s0, $s0, $t1 # x = x+i
!!
sub $s1, $s1, $t1 # y = y‐i
PC 0xdeadbeef # Bogus
$s0 = 3
$s1 = 3

$t1 7
Interrupt
$s0 3
Handler
$s1 3
Static Test
Generation
Instructions Executed
Now I know $s0=$s1, so addi $t1, $t1, 1 # i = i+1
beq $s0, $s1, exit # if (x==y) add $s0, $s0, $t1 # x = x+i
sub $s1, $s1, $t1 # y = y‐I
will branch to exit. PC beq $s0, $s1, exit # if (x==y)

$t1 7
$s0 3
$s1 3
Floating Point
Generation
Floating‐Point Instructions
How to choose argument values? instr arg0 arg1 des
fp.instr f.val0 f.val1 result
random inefficient
FPgen: Generate table of interesting
test cases “off‐line” Floating Point
Search Space
Threadmill Floating Point Tests
Random FPgen Overflow,
Generation Table Underflow, etc
Concurrent
Generation
Concurrent Test Generation
shared_addr = 0xBASE + random()
Possible Collision Types
Need to be random, but consistent
Write‐Write True Collision
across all the test threads
Write‐Read False Collision
…
…
Shared
Synchronization
Random
Techniques
Seed
seed=42
Test Test Test Test

Thread 0 Thread 1 Thread 2 Thread 3
Multi‐Pass
Multi‐Pass Consistency Checking Consistency
Main Focus: Detecting Bugs in Multi‐Threaded Consistency
Case 1 Case 2
Thread 1 Thread 2 Thread 3 Thread 4 Thread 1 Thread 2 Thread 3 Thread 4
TIME
TIME
Resulting State Resulting State
Multi‐Pass
Multi‐Pass Consistency Checking Consistency
Main Focus: Detecting Bugs in Multi‐Threaded Consistency
Execution 0 Execution i+1
Each execution can have
Threadmill Threadmill different timing or order of
operation, but should end with
the same result.
?
Registers,
Memory values, etc
Debugging
Tests
Debugging Tests
• Restart the failed exerciser image a few
test‐cases before the failure
• Take the test‐template that causes the
bug and run it on a pre‐silicon test
generation
Conclusion
Strengths Weakness
Fast and Light‐weight.  Doesn’t check datapath or
permanent bugs.
Automatic testing and
checking of multi‐threaded  Little evaluation of
execution. performance or coverage of
the platform (paper).
Questions?
CON PRO
Debate
Threadmill’s failure detection mechanism only
checks for bugs in multi‐threaded interactions, and
do not check bugs in the datapath. Is this adequate?
Instead of generating tests on‐chip, isn’t it better to
simply load pre‐generated tests?

Threadmill

Uploaded by

Document Information

Original Description:

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Threadmill

Uploaded by

Copyright:

Available Formats

Threadmill:

Authors: Allon Adir, Maxim Golubev, Shimon Landa,

Design Pre‐Silicon Post‐Silicon

Generate Execute Check

Static Test Floating‐Point Concurrent Test

Threadmill uArch State

Threadmill uArch State

Threadmill uArch State

Test Test Test Test

You might also like