
CO: The Chameleon 64-bit microprocessor ASIC prototype

B. Ramanadin, F. Pogodalla
SGS-THOMSON Microelectronics 5 bis, chemin de la Dhuy 38240 Meylan FRANCE

Abstract
In the context of designing a complex chip, it is important to be able to prototype the design. Prototypes are the key to ensuring that the architecture is implementable, to allowing application development to start before silicon is available, and to validating the performance estimates (micro-architectural issues).
This paper presents the prototyping strategy chosen by the Chameleon 64-bit microprocessor development programme of SGS-THOMSON Microelectronics. We discuss the choice of making a silicon prototype, its aims (strongly related to enabling application development), its realisation, schedules and status.

1. Chameleon
Chameleon is a family of next-generation microprocessors developed by SGS-THOMSON Microelectronics. It is a modular, core-based 64-bit superscalar architecture. This programme involves 150 people over 3 sites (Meylan, France; Bristol, UK; and Cagliari, Italy). The team is composed of architecture, design, marketing, applications and software groups. The first Chameleon products are targeted at multimedia applications, with special emphasis placed upon software programmability. The intention is to provide not only silicon but also an OS, libraries and applications with the product. This requires a huge amount of software development in parallel with the design activity. In this context, the software must be written, developed and debugged as much as possible while the design goes on, hence the need for a prototype. Several choices are then possible, among them:
1) an instruction set simulator in a high-level language (C)
2) a virtual silicon (acceleration and/or emulation)
3) an ASIC
In fact all 3 solutions have been fully developed within the programme, each for a particular purpose:
- The instruction set simulator allows generation of a reference to check against the VHDL (field of application: verification and software tools).
- Virtual silicon (field of application: functional verification, in-circuit emulation and system verification).
- Real prototype (field of application: software tools, OS and application software development without the final product).

To allow software development to start as early as possible, we decided to go for a silicon prototype, for the reasons detailed below.

2. Real vs virtual prototype


The objective for Chameleon is to provide the customer with a complete hardware (the processor) and software (compilers, OS kernel + application software) package. Traditional approaches tend to sequence development tasks by designing the silicon product first, then designing and tuning software to run on it. The Chameleon approach is more concurrent-design oriented. The goal is to have the chip and the software ready in the same time frame, followed by an integration period that is as short as possible. A system allowing software development 1 year before real hardware availability must therefore be built. Prototyping is the favoured solution to achieve this objective. Some points are still open regarding what the prototype will look like, but it must meet the following requirements:
1) execute Chameleon code
2) represent the Chameleon architecture faithfully
3) provide an execution environment for OS and applications
4) be available in 15 units for the software development teams
5) deliver a performance > 1 Mips

2.1. Virtual prototype Software


This was the first solution considered, but as the software and the hardware architecture evolved, 2 main limitations appeared:
1) the number of instructions executable in a reasonable amount of time
2) the need for some external environment. Modelling such an environment accurately requires knowledge of the final application. This knowledge is owned by our customers and is not easily accessible.

0-8186-7603-5/96 $5.00 © 1996 IEEE


PRO
- Fast development time (< 6 months)
- Cheap (< 1 man-year)
- Easy to modify
- Easy to port (C language)
- Easy to distribute

type called CO helps software development, porting and verification.


CONS
- Slow execution time (≈ 80 kips)
- No feasibility nor implementation studies
- Far from future product
- Limitations on execution environment

3. Methodology
This section details the main steps from concept to working hardware. Considering the global Chameleon programme and the fact that software development must start 1 year before real silicon is available, a reduced-risk design methodology must be chosen and the architecture must be simplified.

Execution time and distance from reality are too critical to rely on this solution.

2.2. Virtual prototype Emulation


Possible enhancements of the software prototype solution are to study feasibility and implementation, and to work on a model closer to reality. Generating an HDL model of the hardware addresses these 2 points. Unfortunately, after running some representative benchmarks, we cannot expect more than 50 instructions/second from any HDL simulator. This performance is not acceptable. This particular problem is addressed by emulation. Emulation is a good technical compromise between execution time and the feasibility and implementation studies. This solution also requires writing a synthesisable, cycle- and bit-accurate HDL model.
PRO
- Gate-level model
- Cycle accurate
- Medium development time (6 months)
- Medium execution time (about 300 kips)

3.1. Silicon design


The first implementation of the Chameleon architecture is done in an ASIC. The main characteristics of the development are given below.

3.2. Architecture simplification


Implementing the complete Chameleon architecture in an ASIC is not possible. Some simplifications are required:
- Single CPU
- No on-chip cache
- No pipeline (measured average number of cycles per instruction ≈ 4)
- No co-processors
- Reduced instruction set supported (some arithmetic, floating point and cache instructions are treated as NOOP)
- No dedicated I/O modules
Any other mechanism or resource having an architectural existence is implemented and supported by the prototype.
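As an illustration of the reduced instruction set, the sketch below shows one way an execution step could treat unimplemented instruction classes as NOOP. It is not taken from the CO design: the class names, the `TREATED_AS_NOOP` set and the `execute` helper are all hypothetical, and the paper does not say which individual instructions are dropped.

```python
from enum import Enum, auto


class InsnClass(Enum):
    """Hypothetical instruction classes, for illustration only."""
    INTEGER = auto()
    LOAD_STORE = auto()
    BRANCH = auto()
    FLOATING_POINT = auto()
    CACHE_CONTROL = auto()


# Assumption: these classes stand in for the instructions the prototype skips.
TREATED_AS_NOOP = {InsnClass.FLOATING_POINT, InsnClass.CACHE_CONTROL}


def execute(insn_class, handler, state):
    """Apply handler(state) for supported classes, otherwise leave state untouched."""
    if insn_class in TREATED_AS_NOOP:
        return state  # no effect on the state; execution simply continues
    return handler(state)


# A supported instruction changes the state, an unsupported one does not.
print(execute(InsnClass.INTEGER, lambda s: s + 1, 0))
print(execute(InsnClass.FLOATING_POINT, lambda s: s + 1, 0))
```

The point of such a scheme is that code containing these instructions still runs to completion on the prototype, even though their results are not produced.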

CONS
- Expensive (1 M$)
- No distribution

The global technical and financial compromise is not satisfactory enough to choose this solution.
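To give a feel for why execution rate dominates this comparison, the following sketch converts an execution rate into wall-clock time. The 50 instructions/second figure is the HDL simulation bound quoted in section 2.2 and 1 Mips is the requirement from section 2; the workload size is a hypothetical value chosen only for illustration.

```python
def seconds_to_run(instructions, instructions_per_second):
    """Wall-clock time needed to execute a workload at a given rate."""
    return instructions / instructions_per_second


# Hypothetical workload size, not a figure from the paper.
WORKLOAD = 100_000_000

hdl_seconds = seconds_to_run(WORKLOAD, 50)               # HDL simulator bound
prototype_seconds = seconds_to_run(WORKLOAD, 1_000_000)  # 1 Mips requirement

print(f"HDL simulation: {hdl_seconds / 86_400:.1f} days")
print(f"1 Mips prototype: {prototype_seconds:.0f} s")
```

Under these assumptions the same workload takes about 23 days in HDL simulation and 100 seconds on a 1 Mips prototype.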

2.3. Real prototype ASIC


As virtual prototyping does not meet Chameleon needs, a real silicon prototype will be built. This solution requires nearly the same effort (generation of a netlist mapped on a library) as the emulation and acceleration solutions, but as we have cheap access to a silicon fab, it is a viable solution.
PRO
- Real implementation of Chameleon architecture
- Fast execution time (> 5 Mips)
- Wide distribution and production
- Acts as demonstrator

3.3. Technology
- 0.7 µm HCMOS process, triple metal level, supplied by SGS-THOMSON Microelectronics
- ISB28000 family, sea of gates (≈ 150 kgates)
- CPGA package, 256 pins
- The design is targeted to run at 25 MHz (it actually runs at 28 MHz)
- Full-scan DFT approach (97% fault coverage)
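A rough throughput estimate follows from the clock frequency and the average of about 4 cycles per instruction reported for the non-pipelined design. The sketch below assumes that this average holds for the workload of interest and ignores stalls on external accesses, so it is an upper-bound style estimate rather than a measurement.

```python
def estimated_mips(clock_mhz, cycles_per_instruction):
    """Millions of instructions per second for a given clock and average CPI."""
    return clock_mhz / cycles_per_instruction


AVERAGE_CPI = 4  # measured average quoted in the paper (approximate)

print(estimated_mips(25, AVERAGE_CPI))  # target frequency
print(estimated_mips(28, AVERAGE_CPI))  # frequency actually reached
```

This gives 6.25 Mips at 25 MHz and 7 Mips at 28 MHz, the same order of magnitude as the 5 Mips quoted for the prototype.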

CONS
- Expensive (4 man-years)
- Medium development time (1 year)

3.4. Design flow


The design flow is based on VHDL and synthesis. Development of an RTL, synthesisable model occupied 3 engineers for 8 months. Compilation and simulation were performed with CADENCE / LEAPFROG. Global functional verification was performed by checking VHDL simulation results against software simulation results.

As the chosen operating system for use with Chameleon processors has been developed using industry standard personal computers, a PC running with a Chameleon processor prototype will be used as the development system. This proto-


Synthesis was performed using SYNOPSYS tools. This task occupied 2 engineers for 3 months. The VHDL and the gate-level netlist were compared by sampling the external inputs/outputs of the chip at each cycle. The gate-level simulator used at this point was VERILOG-XL; we used the SYNOPSYS verilog-out option to enter this simulator. Scan was introduced once the complete chip had been synthesized. We used the edif-out option from SYNOPSYS to enter MENTOR / FASTSCAN. No post-scan timing optimization was performed.
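The cycle-by-cycle comparison between the VHDL model and the gate-level netlist can be pictured as a comparison of two pin traces. The sketch below is only an illustration of that idea: the trace format (one dictionary of pin values per cycle), the pin names and the `first_mismatch` helper are assumptions, not the tooling used in the project.

```python
def first_mismatch(rtl_trace, gate_trace):
    """Return the first cycle where the sampled pins differ, or None if none do.

    Each trace is a list with one {pin_name: value} dictionary per cycle.
    """
    for cycle, (rtl_pins, gate_pins) in enumerate(zip(rtl_trace, gate_trace)):
        if rtl_pins != gate_pins:
            return cycle
    if len(rtl_trace) != len(gate_trace):
        # One simulation stopped early: report the first missing cycle.
        return min(len(rtl_trace), len(gate_trace))
    return None


# Hypothetical pin names and values.
rtl = [{"req": 0, "data": 0}, {"req": 1, "data": 7}, {"req": 0, "data": 7}]
gate_ok = [dict(pins) for pins in rtl]
gate_bad = [{"req": 0, "data": 0}, {"req": 1, "data": 5}, {"req": 0, "data": 7}]

print(first_mismatch(rtl, gate_ok))
print(first_mismatch(rtl, gate_bad))
```

Reporting the first diverging cycle, rather than a simple pass/fail, points directly at the place to start debugging the netlist.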

3.5. Timing constraints approach


Designers usually spend a lot of time iterating on timing optimization. As correct functionality is more of an issue than running frequency, a particular approach to timing analysis was adopted. The goal is to reduce the number of iterations needed to reach a satisfactory running frequency. To meet this objective, we decided to design a daughterboard capable of running over a wide range of frequencies (1 MHz up to 40 MHz). The solution is then to implement an asynchronous handshake communication protocol between the chip and its environment.
This approach allows:
- significant time saving (reduced number of iterations for timing analysis after synthesis, no iteration after scan insertion)
- the processor clock to be completely separated from the rest of the world (processor clock ≠ daughterboard clock ≠ motherboard clock)
- a wide range of running frequencies to be acceptable
- test vectors to be generated and run at low speed, significantly reducing possible problems on the tester
- increased yield, as less sorting by frequency is performed
This approach implies:
- deep and meticulous timing analysis and implementation study at micro-architecture level (the study made at this point is a sizing factor for the running frequency eventually reached by the prototype)
- the design of an asynchronous protocol, and hence the loss of cycles for synchronisation between the different frequency domains.
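The trade-off described above (frequency independence paid for with synchronisation cycles) can be illustrated with a toy model of a four-phase request/acknowledge handshake. The paper does not describe the actual CO protocol, so the four-phase scheme, the signal names and the `transfer` function are assumptions made for this sketch; real hardware would also need synchroniser stages, which are omitted here.

```python
def transfer(words, sender_period, receiver_period, max_time=100_000):
    """Move words across two clock domains with a four-phase req/ack handshake.

    Each side only acts on ticks of its own clock (period in abstract time units).
    Returns the received words and the time at which the last handshake finished.
    """
    req = ack = False
    bus = None
    pending = list(words)
    received = []
    for t in range(max_time):
        if t % sender_period == 0:            # sender clock tick
            if not req and not ack and pending:
                bus = pending.pop(0)          # drive data, then raise req
                req = True
            elif req and ack:
                req = False                   # data was latched, release req
        if t % receiver_period == 0:          # receiver clock tick
            if req and not ack:
                received.append(bus)          # latch data, raise ack
                ack = True
            elif not req and ack:
                ack = False                   # return to idle
        if not pending and not req and not ack:
            return received, t
    raise RuntimeError("handshake did not complete")


data = [0x10, 0x20, 0x30]
print(transfer(data, 1, 1))
print(transfer(data, 3, 7))
print(transfer(data, 7, 3))
```

The data arrives intact whatever the ratio between the two clock periods, but each word costs several ticks of the slower side, which is the loss of cycles mentioned above.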

3.6. Silicon verification


As for any silicon development, and particularly as this prototype has a high level of complexity, a strong focus is placed on its verification. The issue is functionality more than timing: software development requires a good level of functionality rather than optimised timing performance. The functional verification methodology is based on VHDL simulation checked against a C instruction-level simulator. This simulator, chsim, is developed by the software group and is used as the reference for the VHDL. It has no purpose other than being a reference at instruction level. This means that it is not relevant in terms of timing compared to the VHDL model (no cycle accuracy). One key point is that there is no component-level verification: each component of the design is integrated into the full chip in order to be verified. This is a chip-level verification. The design is used in a realistic context: rather than having vectors applied to it by a VHDL bench, it executes real Chameleon binary code (a so-called test-case) via a simulated memory. The following figure shows the VHDL architecture that is actually simulated.

[Figure: simulated VHDL architecture. Legible labels: CO prototype, clock, address-space memory, halt, interrupts irritator, bus requests irritator; addresses 0x90000000, 0x903fffff, 0x00008ffff, 0x00800010.]

This chip-level methodology avoids the complicated development and maintenance of VHDL test-benches, which are highly sensitive to any modification of interfaces, timing, etc. By verifying the design as a whole, we not only guarantee that the functionality of each block is correct, but also make sure that the design acts as a complete chip within a realistic representation of its environment. In addition, debugging is facilitated by working at assembler level rather than at bit-vector level. The verification of features such as interrupts or external bus arbitration is done via external VHDL devices that can be programmed from the test-case to trigger interrupts, generate bus requests, etc. This makes it possible to control, from the test-case itself, the external events that CO receives, again without requiring any manipulation of signals at test-bench level. The VHDL test bench consists of:
1) asserting the reset line, causing the memory to be downloaded and the design to be reset
2) de-asserting the reset line to allow the program execution to start
3) waiting for a specific memory access which signals that the program is over, so that the memory can be dumped to perform the result checking
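The three test-bench steps, and the comparison of the memory dump against the reference simulator, can be sketched as follows. Everything here is a toy stand-in: `HALT_ADDR`, the `Memory` and `ToyModel` classes and the two-instruction "ISA" are hypothetical and bear no relation to the real Chameleon instruction set, chsim or the VHDL bench. The only idea carried over is that two models with different timing must leave identical memory contents.

```python
HALT_ADDR = 0xFFFF  # hypothetical address whose access ends a test-case


class Memory:
    """Simulated memory that notices an access to the halt address."""

    def __init__(self, image):
        self.cells = dict(image)
        self.halted = False

    def read(self, addr):
        return self.cells.get(addr, 0)

    def write(self, addr, value):
        if addr == HALT_ADDR:
            self.halted = True
        self.cells[addr] = value


class ToyModel:
    """Stand-in for either the reference simulator or the simulated design."""

    def __init__(self, cycles_per_instruction):
        self.cpi = cycles_per_instruction
        self.cycles = 0

    def reset(self):
        self.cycles = 0
        return 0  # program counter after reset

    def step(self, pc, mem):
        op, a, b = mem.read(pc)
        self.cycles += self.cpi
        if op == "addi":                 # mem[a] += b
            mem.write(a, mem.read(a) + b)
        elif op == "halt":               # signal the end of the program
            mem.write(HALT_ADDR, 1)
        return pc + 1


def run_test_case(model, image, max_steps=10_000):
    mem = Memory(image)                  # step 1: reset asserted, memory loaded
    pc = model.reset()
    for _ in range(max_steps):           # step 2: reset released, program runs
        if mem.halted:                   # step 3: halt access seen, dump memory
            return dict(mem.cells)
        pc = model.step(pc, mem)
    raise TimeoutError("test-case did not reach the halt access")


program = {0: ("addi", 100, 5), 1: ("addi", 100, 7), 2: ("halt", 0, 0)}
reference, design = ToyModel(1), ToyModel(4)
reference_dump = run_test_case(reference, program)
design_dump = run_test_case(design, program)
print(reference_dump == design_dump, reference.cycles, design.cycles)
```

The two models take a different number of cycles but produce the same dump, which mirrors the fact that the reference simulator is not cycle-accurate and is compared on results only.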


In summary, the following figure shows the verification flow of the model. Another advantage of this methodology is that the verification of the CO prototype requires the development of a large number of test programs. These programs constitute an important database that is used for the first part of the verification of the final design of which CO is a prototype. Some key figures of this verification process:
- number of instructions simulated: 2818281 in test-cases + 1928224 in C programs
- number of cycles simulated: 27935309
- simulation time: 1 Ms CPU (≈ 277 h)
As stated, the simulation time is quite large. The VHDL simulations are actually run concurrently over several CPUs, dividing the overall elapsed time. In addition, an IKOS NSIM hardware accelerator is used to improve the turn-around time. Its performance is around 40 times that of the VHDL simulator.
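A few derived quantities follow directly from these key figures. The short calculation below uses only the numbers quoted above; the variable names are ours, and reading "1 Ms CPU" as exactly 1,000,000 CPU seconds is an assumption.

```python
TEST_CASE_INSTRUCTIONS = 2_818_281
C_PROGRAM_INSTRUCTIONS = 1_928_224
CYCLES_SIMULATED = 27_935_309
CPU_SECONDS = 1_000_000  # "1 Ms CPU", taken as exactly one million seconds

total_instructions = TEST_CASE_INSTRUCTIONS + C_PROGRAM_INSTRUCTIONS
cycles_per_instruction = CYCLES_SIMULATED / total_instructions
cpu_hours = CPU_SECONDS / 3600
cycles_per_cpu_second = CYCLES_SIMULATED / CPU_SECONDS

print(f"instructions simulated: {total_instructions}")
print(f"average cycles per instruction: {cycles_per_instruction:.2f}")
print(f"CPU time: {cpu_hours:.0f} h")
print(f"simulation speed: {cycles_per_cpu_second:.1f} cycles per CPU second")
```

This works out to about 4.7 million instructions at roughly 5.9 cycles each, simulated at about 28 cycles per CPU second over some 278 CPU hours, which makes clear why the runs are spread over several CPUs and a hardware accelerator.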

[Figure: verification flow. Legible labels: tests organised in a database, SW toolchain, assembler file, generation tools, memory init ASCII file, simulation tools, acceleration, IKOS NSIM.]

3.7. Software tools


As stated before, the software tools are of great importance in the process of building this prototype. On one side they provide the capability of building tests to carry out the verification; on the other they are used later on for the actual developments done around the prototype. The set of tools is composed of:
1) an instruction-level C simulator. Its performance is around 100 kips; it is not cycle-accurate but is used as the functional reference
2) an assembler, which generates object code from a Chameleon assembler file
3) a linker, which builds binary programs from one or more object files
4) a C compiler, which is mainly used for high-level applications.
This means that the tests, which are targeted specifically at the verification of the prototype, are written in assembler, whereas C is used for more general code, for example standard benchmarks such as dhrystone, sieve and fibonacci computations.

3.8. Daughterboard
A prototype is not useful until it is plugged into a working environment that allows it to be used for development and experimentation. Thus the CO silicon is plugged into a host, resulting in a full system prototype. In order to reduce the cost of host development, and because of the first OS that is to be ported, a standard PC host has been chosen as the development platform. The aim is to use the CO prototype as the CPU of a system that has all the common capabilities such as disks, monitor, etc. But as CO does not have a pinout compatible with the Intel 80486, a specific board must be developed, whose role is to adapt the CO interfaces to the PC. In addition, this board provides some control capabilities as well as some memory. The next figure shows the daughterboard organisation.

[Figure: daughterboard organisation. Legible labels: CO proto, cache, local memory (RAM, ROM), control, T4, motherboard.]

The remote host, connected via a link and an on-board T4 (transputer), is used to control CO operations: downloading boot code into local memory, accessing on-board control registers, etc. It provides the capability of debugging the software running on CO. The PC is the target machine to which the OS is ported. The development of this board is far from being a negligible task and, as detailed later on, is done in parallel with the design of CO.

4. FAQ
4.1. Monitoring and analyzing target architecture
Analysing and monitoring the run-time behaviour of the target architecture is an issue. The big question is: do we meet our objectives? The Chameleon programme addresses this particular point by using an architecture simulator. This simulator relies on information from the design team (pipe structure, reordering, dependencies, number of cycles for an instruction to leave the pipe, ...). It was built at an early stage of the design to check that we can run our application, and it is updated periodically. The prototype does not replace the architecture simulator. For the expected application, one must consider that the architecture is frozen.

4.2. Emulation and acceleration


We use emulation for functional verification at chip level. We also plan to perform in-circuit checks. The objective is to make our emulator communicate with a test generator and run billions of cycles. We do not use it to check the architecture, for the reasons given above. Along the same lines, we use acceleration to build and run regression procedures.

4.3. Limitations
The only limitation is performance. For users, the prototype is a 5 MIPS box, while the final processor is the same box at 100 MIPS.

5. Schedules and status


The prototyping overview for Chameleon is illustrated by the chart below. To fully understand the benefits of this choice, we must keep in mind that Chameleon is a software and hardware product, starts from scratch, and that time to market is a key factor for the success of the programme as a whole. Thus any effort towards parallel software and hardware co-design must be considered as a positive contribution to the project.

[Chart: schedule overview. Legible labels: Chameleon architecture is ready; processor specification; processor Si development; CO specification; CO Si development; daughterboard development; basic micro-kernel software development; daughterboard integration; OS and application software.]

The results of this programme are:
1) The CO processor and daughterboard are working successfully (performance: 5 Mips)
2) The software tool chain (compiler, assembler, linker, simulator, debugger) is fully operational
3) OS and application software development started at the end of November 95
4) 15 systems are now being used in development.

6. Conclusion
The complexity of some of today's designs is such that prototypes are often required to confirm architectural choices and performance estimates, but also to allow some back-end tasks to start (e.g. software development). A prototype has to match specific requirements, not only in terms of functionality and performance but also in terms of ease of use and development costs. Several types of prototypes are possible, depending on the project constraints: software, virtual hardware, real hardware. Each of those has specific advantages (flexibility for software, power for hardware) but also important drawbacks (power for software, flexibility for hardware). To address the particular problem of software-hardware concurrent design within the Chameleon programme, the choice to go for a real ASIC prototype was mainly driven by the following reasons:
- technology is available and cheap
- performance (3 MIPS)
- wide distribution capability (for development)
- customer demonstration capabilities
- architecture implementability evaluation
Despite the rather high (4 man-years) development cost (full design flow, from VHDL to sea of gates via synthesis, verification, ...), this solution is a success today:
- it does provide more performance than any emulated prototype that could have been built (5 MIPS vs 200 KIPS)
- 15 systems are available for development
- technology and methodology allowed FTSS (First Time Silicon Success)
Finally, another important advantage of this methodology is that it exercises some parts of the design flow for the final design, which is itself a useful form of prototyping.
Acknowledgements

To all people involved in the CO project (architecture, software, board design, silicon design, verification, staff, management) and more specifically to: Henry Guyot, Christian Bicais, Christian Berthet, Ariel Lasry, Andrew Betts, Jon Frosdick, Carlo Gallino, Nathan Sidwell, Chris Dunford, Jeff Wilson, Mark Debagge, and Genevieve Bartlett.
