
I/O Modeling and Refinement for HW/SW Codesign of Embedded Systems

Youngmin Yi
School of Computer Science and Engineering, Seoul National University
ymyi@iris.snu.ac.kr

Dohyung Kim
Department of Computer Science and Engineering, University of California, San Diego
dhkim@ucsd.edu

Soonhoi Ha
School of Computer Science and Engineering, Seoul National University
sha@iris.snu.ac.kr

Abstract - Different levels of abstraction for I/O modeling are used depending on the codesign step. However, manually changing the abstraction level of I/O models at each design step is laborious. Moreover, the designer may want to mix levels of abstraction between the I/O models and the rest of the simulation. Thus, it is highly desirable to make I/O modeling retargetable and configurable. In this paper we propose an I/O modeling and refinement technique for a codesign methodology in which an I/O device and its interface code, at various levels of abstraction, are automatically integrated and simulated in a unified framework. We demonstrate the viability of the proposed methodology with a videophone example that consists of an H.263 codec accessing a camera and an LCD, a network control task accessing an Ethernet device, and a G.723 codec accessing a microphone and a speaker.

Keywords: codesign, system level modeling, I/O modeling, I/O refinement, simulation, PeaCE.

Introduction

I/O devices are important system components in an embedded system that interacts with its environment. Different levels of abstraction for I/O modeling are used depending on the design step. At the algorithm development step or at the functional simulation step, I/O accesses are usually replaced with file accesses and memory accesses at the source level, which avoids the need for I/O modeling altogether. For example, input device access is replaced with access to a file or a memory region that stores the previously captured result of input device access. One may load the input image into the simulator memory, if the simulator provides such a capability, or build the input data into the executable. Semihosting [1][2] is a well-known I/O modeling technique for functional simulation using a processor simulator. The processor simulator emulates an I/O operation in the target code on the simulation host. For example, an ARM processor simulator uses the SWI (software interrupt) method to hook an I/O request in the target code and convert it to a host I/O request.

When the designer simulates the entire system to estimate or verify the overall performance, it is essential to model I/O device behavior accurately enough to capture its impact on system performance. Since I/O behavior and its interface affect task scheduling and bus bandwidth, improper modeling of the I/O device and its interface yields inaccurate performance results. Therefore, the current practice is to port the operating system to the simulator and run the application on top of the OS. However, writing a device driver for a new I/O device is usually difficult, time consuming, and error-prone, since it depends heavily on the target architecture and requires a full understanding of complex I/O device behavior. If the final system architecture, including the OS, is not yet determined, writing a device driver is not a feasible approach. We therefore need another I/O modeling technique for the design space exploration step.

Changing the abstraction level of I/O models at each design step is a laborious task if done manually. In addition, the designer may want to mix levels of abstraction between the I/O models and the simulation. For example, before the device driver for an I/O device is written, the I/O device must be modeled at a higher level than the simulation level of the other components. In summary, it is highly desirable to make I/O modeling retargetable and configurable, which is the main concern of this paper. We propose an I/O modeling and refinement technique for a codesign methodology in which an I/O device and its interface code, at various levels of abstraction, are automatically integrated and simulated in a unified framework.

In the next section, we review related work. In section 3, we describe the general codesign procedure for embedded systems. The proposed specification, mapping, and simulation of I/O devices and their interfaces are described in sections 4 to 6. Section 7 describes a case study with a videophone application. Finally, future work and conclusions follow in section 8.

Related Work

Bouchhima et al. [3] propose a method to model I/O operations in SystemC for HW/SW cosimulation. Their approach is specific to their simulation environment, in which the operating system is also modeled in SystemC to build a compiled simulation environment. In this paper we assume that the system simulation environment is built by integrating simulation models of components, as in most commercial cosimulation environments. Wang et al. [4] proposed a framework for peripheral device modeling, but their work aims at synthesizing the interface code itself from a formal specification of peripheral devices. It does not deal with simulating a peripheral device and its interface for HW/SW cosimulation. UDI [5] is a set of APIs defined to make device drivers portable across different platforms and OSes. The main idea of an OS- and architecture-independent device interface API is shared with our methodology. However, the purpose of UDI is to make device drivers portable horizontally across different architectures, whereas ours is to provide both horizontal and vertical retargetability for the I/O models. In summary, previous I/O modeling work mainly focuses on horizontal retargetability at a specific level of abstraction. Our contribution in this paper is to enable vertically retargetable I/O modeling across different levels of abstraction depending on the design step.

Methodology and Framework

Figure 1 shows the overall codesign flow of our codesign environment, PeaCE [6], which implements the proposed technique. While the proposed technique is also applicable to other codesign approaches, we use ours to show specific implementation examples. In our codesign flow, architecture (platform) and algorithm are specified separately, which is known as the Y-chart codesign flow [7]. The algorithm specification, given as a block diagram, is executable: functional simulation is performed by generating host code from the specification and running it on the host machine. SW performance estimation is performed for a specified processor using the instruction set simulator of that processor. Based on the profiling information, HW/SW partitioning and mapping of algorithm blocks onto the target architecture are performed. After the partitioning decision, a virtual prototype is automatically built through SW synthesis, HW synthesis, and interface synthesis. Interface synthesis in this step means generating the driver and wrapper code between a SW-mapped block and a HW-mapped block, not between an I/O device and a block. After the final architecture is determined through cosimulation or virtual prototyping, a real prototype is built.

In the proposed technique, we use the same I/O interface code from the algorithm specification down to real prototyping, but the I/O model behind it is refined differently depending on the design step. We will illustrate four different levels of refinement for the following design steps: functional simulation, performance estimation, virtual prototyping, and real prototyping.
[Figure 1 diagram. Blocks: algorithm specification, architecture specification, functional simulation, performance estimation, HW/SW partition & mapping (mapped result), SW synthesis (C code), HW synthesis (VHDL code), interface synthesis (driver/wrapper), cosimulation.]

the semihosting technique is applied as the definition. The second definition is for time-accurate cosimulation and is written in terms of our I/O interface APIs, which are in turn defined by the interface code for the target I/O device model. We selectively redefine these functions with the same prototype using #define statements for each level of the design flow.
#ifndef FUNC_SIM
#define fopen IOmodel_fopen
#ifdef PERF_ESTIMATION
FILE *IOmodel_fopen(const char *path, const char *mode)
{
    strcpy((char *)IO_BUF, path);           // pass 1st argument
    strcpy((char *)(IO_BUF + 256), mode);   // pass 2nd argument
    *(volatile int *)IO_CMD = 8;            // inform which interface it is
    int fd = *(volatile int *)IO_RET;       // get result of semihosting
    return (FILE *)fd;                      // descriptor is handed back as the stream handle
}
#else // TIME_ACCURATE_COSIM
FILE *IOmodel_fopen(const char *path, const char *mode)
{
    int fd = IOdev_open(path, mode);
    return (FILE *)fd;                      // descriptor is handed back as the stream handle
}
#endif
#endif


Figure 3. Porting standard library using semihosting technique

I/O Device Refinement

Figure 1. Overall HW/SW codesign flow of PeaCE

I/O Interface Specification

To make the I/O interface code independent of the abstraction level of the I/O models, we define generic I/O interface APIs. We propose to specify an I/O device interface using only a predefined set of generic APIs; the detailed, device-specific interface definitions behind those APIs are refined according to the I/O device model at each abstraction level. The generic I/O interface API should not assume any specific implementation, so that it can also be implemented in hardware, and it must be independent of the target architecture and operating system. At the same time, it should be at a level where the essential behavior can be captured for correct validation and accurate performance estimation. Figure 2 shows a subset of the generic I/O interface APIs that we have defined in the current implementation.
int IOdev_open(const char *devname, int flags);
int IOdev_close(int fd);
int IOdev_read(int fd, char *buf, int count)
{
    return iodev[fd].read(buf, count);
}
int IOdev_write(int fd, char *buf, int count)
{
    return iodev[fd].write(buf, count);
}
int IOdev_set_config(int fd, int cmd, void *buf, int size)
{
    return iodev[fd].set_config(cmd, buf, size);
}
int IOdev_get_config(int fd, int cmd, void *buf, int size)
{
    return iodev[fd].get_config(cmd, buf, size);
}

Figure 2. Generic I/O interface APIs

The standard C library includes a number of I/O-related functions (fopen, fread, fscanf, printf, etc.). We provide two different levels of ports for those functions, as illustrated in figure 3. The first definition is for performance estimation with an ISS. In this case,

In this section we explain how an I/O device model is refined to different abstraction levels as the design steps proceed. We use the simple illustrative example in figure 4, where block A accesses I/O device dev1 and sends data to block B, which accesses dev2.

At the functional simulation step, the code runs on the host machine, so the I/O device need not be a simulation model. Instead, it can be an actual I/O device driven by the host OS, as shown in figure 4(a). The I/O interface definition then becomes nothing but the associated device driver of the host OS.

At the SW performance estimation step, we generate target code for a specific processor and run it using the instruction set simulator (ISS) of that processor (figure 4(b)). The ISS provides execution profile information such as execution cycles and memory access counts. Since we are only interested in the performance of a function block on the specific processor, bus contention and bus bandwidth are not of main concern. Therefore, the I/O device can be modeled using the semihosting technique, and the estimated performance of the I/O interface is defined as a function of the amount of data exchanged.

As explained in section 3, we partition the system behavior before building a virtual prototype. Let us assume that block A is mapped to a HW component and block B to a SW component. The virtual prototype is then built with the HW simulator and the ISS connected to the cosimulation engine. The cosimulation engine schedules the component simulators and manages the interaction between them. Figure 4(c) displays the execution flow of the refined I/O interface code. This code can be an actual device driver or the primitive APIs of the cosimulation engine, depending on the type of cosimulation framework used. In either cosimulation framework, two factors affect the accuracy of simulation in this step. The first is the overhead of the I/O interface code itself in the processing element and the relevant OS scheduling. The second is the response time of the I/O device access, considering contention on the communication architecture.

These two factors are handled straightforwardly in common cosimulation frameworks such as Seamless CVE [8], because the target OS itself is cross-compiled and run on top of the ISS, with the device driver interacting directly with the OS scheduler. Access to the communication architecture is simulated accurately since the bus and memory architecture are modeled in RTL. The situation is no different in common TLM cosimulation frameworks such as ConvergenSC [9], except that the bus and memory architecture are modeled in TLM.

While our cosimulation framework [10] also employs an ISS, it differs from these tools in that it neither simulates the OS on top of the ISS nor simulates the bus and memory architecture; instead it models them inside the cosimulation engine for higher cosimulation performance. In our framework, the former factor is accounted for by the OS simulation model in the cosimulation engine, with the I/O device model accessed only through predefined I/O interfaces. The APIs for this level of cosimulation are defined using primitive communication APIs that interoperate correctly with the cosimulation engine. The latter factor is handled by the bus and memory model in the cosimulation engine, since every access to the I/O device model, as well as other communication between system components, is monitored by the cosimulation engine.

The I/O device model can be at any level of abstraction as long as the designer provides the corresponding interface definition. When I/O device interface code is retrieved from the interface library, the abstraction level of the I/O device model and the target OS are given as configuration parameters for a given type of I/O device. This enables mixed-level simulation of an I/O device and the other system components.

[Figure 4 diagrams. In each panel, block A runs while(1) { IOdev_read(); do_computation(); write_port(); } with use(dev1), and block B runs while(1) { read_port(); do_computation(); IOdev_write(); } with use(dev2). Each block carries an interface definition: devN_preinit(), devN_read(), devN_write().
(a) Host code execution in functional simulation: host binary, host OS, host I/O dev1 and dev2.
(b) Performance estimation using ISS: target binary on the target ISS, I/O model, host OS, host I/O dev1 and dev2.
(c) Cosimulation using ISS and HW simulator: block A as target netlist on the HW simulator, block B as target binary on the target ISS, cosimulation engine, I/O (dev1) and I/O (dev2) simulators, host OS, host I/O dev1 and dev2.
(d) Real prototyping board with I/O modeling server: block A on the FPGA, block B on the CPU with an I/O model stub (eth_read_request(id)), shared memory, and Ethernet device on the real prototype board; I/O server on the host OS with Ethernet device and I/O dev1 and dev2.]

Figure 4. Various levels of abstraction for the I/O device model and the corresponding interface definitions at different levels of simulation

Figure 5. (a) Algorithm specification (b) Architecture specification

H.263 encoder block, use(camera):

    fd = IOdev_open(camera);
    while(1) {
        IOdev_read(fd, buf, size);
        do_computation();
        write_port();
    }

H.263 decoder block, use(LCD):

    fd = IOdev_open(LCD);
    while(1) {
        read_port();
        do_computation();
        IOdev_write(fd, buf, size);
    }

Camera interface definition (camera_device: command_reg, data_buffer):

    void camera_preinit(IODEV *iodev_p)
    {
        iodev_p->read = camera_read;
        iodev_p->write = NULL;
        iodev_p->set_config = camera_set_config;
        iodev_p->get_config = camera_get_config;
        iodev_p->open = camera_open;
        iodev_p->close = camera_close;
    }
    int camera_read(char *buf, int size)
    {
        write_port(CAMERA_CMD_REG, cmd, cmd_size);
        read_port(CAMERA_BUF, buf, size);
    }

LCD interface definition (LCD_device: command_reg, frame_buffer):

    void LCD_preinit(IODEV *iodev_p)
    {
        iodev_p->read = NULL;
        iodev_p->write = LCD_write;
        iodev_p->set_config = LCD_set_config;
        iodev_p->get_config = LCD_get_config;
        iodev_p->open = LCD_open;
        iodev_p->close = LCD_close;
    }
    int LCD_write(char *buf, int size)
    {
        write_port(LCD_CMD_REG, cmd, cmd_size);
        write_port(LCD_BUF, buf, size);
    }

Figure 6. Proposed I/O device modeling and interface refinement for the cosimulation step in the videophone example

On the real prototype board, it may be that the I/O device driver for the target board has not been ported yet, but the designers want to execute the image and verify the correctness of the synthesis results. In that case, one solution is to run an I/O model server on the host machine and insert an I/O model stub into the I/O interface definition on the target board. This solution assumes that at least one ported device driver for a communication device (usually an Ethernet device) is accessible on the target board. The I/O model stub is invoked whenever target code tries to access the I/O device; through that channel it transmits an I/O request packet to the I/O model server residing on the host machine. The I/O model server processes the request by accessing the corresponding I/O device of the host machine. This can be regarded as an extended form of semihosting (figure 4(d)).

In summary, separating the I/O interface from computation in the specification, and selectively refining it in the later steps of the codesign flow, allows mixed levels of abstraction for I/O and computation. It enables concurrent design of the I/O models and the other system components. Since I/O interface design requires thorough knowledge of I/O device behavior and target OS APIs, it often becomes a bottleneck of the overall system design. By applying an interface definition at a high level of abstraction, this bottleneck can be avoided, and the approach is even applicable to the real prototype board.

Case Study

Figure 5(a) is the algorithm specification of the videophone example. It consists of several tasks: an H.263 encoder, an H.263 decoder, a G.723 encoder, a G.723 decoder, a network control task that connects to and accepts connections from the given IP addresses, and a task that demuxes AV packets into audio and video data. The architecture specification in figure 5(b) contains information about which types of processing element are used and how many of each. It also specifies which type of communication architecture is used for data and control transfer between the processing elements. We added I/O device components such as a camera, an LCD panel with its LCD controller, a microphone, a speaker, and an Ethernet controller.

A simplified version of the H.263 encoder and decoder tasks and their access to the camera and LCD controller is depicted in figure 6 as an example. The block representing the H.263 encoder explicitly indicates its access to a camera device (use(camera)). The code generation module in the framework retrieves the relevant interface code from the interface library and builds the corresponding API definition for the camera device. When interface code is retrieved, information about how many ports exist and what the semantics of each port are is also retrieved and passed to the modules responsible for the remaining steps in the codesign flow. This information is crucial for automatically building a virtual prototype, for example when assigning the memory map given in the architecture specification, and also for our cosimulation framework, which exploits task synchronization or module interaction for high simulation performance.

Initially, the interface code for functional simulation of the videophone example was written as Linux device drivers. We changed the camera and LCD display interface code to use the proposed generic I/O interface APIs in order to apply the proposed modeling and refinement across the different steps of the codesign flow. Note that the interface code in this example is defined using predefined communication APIs such as read_port() and write_port() that interoperate with our simulation engine.

The overall system performance of the videophone example on a processor configured as arm926ej-s was 161 msec when the AC97 device latency was set to 40 msec and the camera latency to 16 msec.

Conclusions

In this paper, we proposed an I/O modeling and refinement technique for a codesign methodology in which an I/O device and its interface code, at various levels of abstraction, are automatically integrated and simulated in a unified framework. With explicit specification of the I/O interface using predefined generic I/O interface APIs, I/O models at different levels are easily refined depending on the design step, which makes the I/O modeling retargetable and configurable. The viability of the methodology is confirmed with a videophone example.

Acknowledgements

This work was supported by the National Research Laboratory Program (number M1-0104-00-0015), the BK21 Project, the IT Leading R&D Support Project, and the ITSoC Project. The ICT at Seoul National University provided the research facilities for this study.

References
[1] ARM, http://www.arm.com
[2] SimpleScalar, http://www.simplescalar.com/docs/simple_tutorial_v4.pdf/
[3] A. Bouchhima et al., Fast and Time-Accurate Timed Execution of High Level Embedded Software using HW/SW Interface Simulation Model, In Proc. ASP-DAC, Jan. 2004
[4] S. Wang et al., Modeling and Integration of Peripheral Devices in Embedded Systems, In Proc. DATE, Mar. 2003
[5] UDI, http://www.projectudi.org
[6] PeaCE, http://peace.snu.ac.kr/research/peace
[7] B. Kienhuis et al., An approach for quantitative analysis of application specific dataflow architectures, In Proc. Application-Specific Systems, Architectures and Processors, Jul. 1997
[8] Seamless CVE, http://www.mentor.com/products/fv/hwsw_coverification/seamless/
[9] ConvergenSC, http://www.coware.com/products/convergensc.php
[10] D. Kim et al., Trace-Driven HW/SW Cosimulation Using Virtual Synchronization Technique, In Proc. DAC, June 2005
