Download as pdf or txt
Download as pdf or txt
You are on page 1of 5

CAR: Using Cache as RAM in LinuxBIOS

Yinghai Lu AMD. yinghai.lu@amd.com Li-Ta Lo, Gregory R. Watson, Ronald G. Minnich Advanced om!uting La"oratory Los Alamos #ational La"oratory Los Alamos, #M $%&'&
{ollie, g(atson, rminnich}@lanl.gov

Abstract

LinuxBIOS is fast becoming a widely accepted alternative to the traditional P BIOS for cluster computing applications and embedded solution! LinuxBIOS is divided into several processing part! "# auto stage$ used to initiali%e memory and other necessary initiali%ation &# hardwaremain$ used to enumerate devices and allocate resource to them! 'he auto stage in LinuxBIOS v" is in assembly language( and in LinuxBIOS v& it is in language( but for x)* +e depends on ,O- ( it will compile the code to assembly code( the assembly code will only use registers as local variables( and don.t support call and stac/! In this paper we will show how to use cache in P0 as ,1- before the ,1- is initiali%ed! So the auto stage code in will be compiled with 2 and it will use cache in P0 as stac/! 'he 1, 3 ache as ,1-# in LinuxBIOS include three parts$ cache4as4ram!inc( cache4as4ram4post!c( copy4and4run!c ! 'he cache4as4ram!inc will initiali%e the cache and use that as stac/ and it will call real4main in cache4as4ram4auto!c at last5 the cache4as4ram4post!c is included into cache4as4ram4auto!c( and it is used to stop the cache as ram after ,1- is initli%ed and use ,1- as stac/5 'he copy4and4run!c is included into cache4as4ram4auto!c too! It is used to copy hardwaremain stage code to ram and 6mp to it! 'here is some specific processing about 1-7 and Intel! 0sing 1, in LinuxBIOS( we can reduce the si%e of auto stage( also can use the more complicated code in auto stage that is limited by ,Oand register! 'he solution can be used all x)* based platform( and may easier to port to other platform such as -IPS!

1 Introduction

LinuxBIOS 8&9 is an open:source replacement for the traditional P BIOS! 'he P BIOS was developed in the ";)<=s for the original IB- P ( and much of the functionality needed to support this legacy hardware still remains in the P BIOS today! In addition( the vintage operating systems that ran on these machines were dependent on the BIOS for carrying out many of the configuration activities needed for the system to function properly! -odern operating systems are now able to initiali%e and configure hardware directly( so there is no longer any reason for the BIOS to be involved! 'he basic principle behind a modern BIOS( li/e LinuxBIOS( is to do the minimum necessary to enable system hardware( then leave as much device configuration to the operating system as it can! 'he result of eliminating this unnecessary initiali%ation is a very fast boot time compared to a traditional BIOS! 1nother legacy feature provided by the P BIOS is a "*:bit callbac/ interface using the x)* software interrupt mechanism! >owever( only a tiny subset of the interface is used by modern operating systems( and in the case of Linux( it is not used at all! LinuxBIOS does not provide this interface( and as a result( is able to substantially reduce its memory footprint! ompared to &?*@B reAuired by the P BIOS( the typical si%e of LinuxBIOS is 6ust *B@B! 0nli/e the P BIOS( only a very small portion of LinuxBIOS is written in assembly code! Cor the x)* architecture( this is 6ust enough code to initiali%e the P0 and switch to D&:bit mode! 'he rest of LinuxBIOS is written in the language! 'his ma/es LinuxBIOS very portable across different architectures( and it has already been ported to support the 1lpha and PowerP processors! 0sing a high:level language also allows LinuxBIOS to employ a much more sophisticated ob6ect oriented device model( similar to the one used in the Linux /ernel! In such a model( each physical device has a corresponding software ob6ect! 'he ob6ect encapsulates information about the physical device and has methods to probe( initiali%e and configure the device! 'he main function of LinuxBIOS is really 6ust to organi%e( Auery and manage these device ob6ects! 'his /ind of device ob6ect model is unheard of in a BIOS implemented in assembly code! LinuxBIOS has been successfully deployed in a number of real world applications! 1t Los 1lamos Eational Laboratory( we have Pin/ and Lightning( two very large production cluster systems that use LinuxBIOS! 'here are also a number of companies shipping commercial LinuxBIOS:based systems! LinuxBIOS can be divided into several processing parts! "# auto stage$ used to initiali%e memory and

other necessary initiali%ation &# hardwaremain$ used to enumerate devices and allocating resource to them! 'he auto stage in LinuxBIOS v" is in assembly language( and in LinuxBIOS v& it is in language( but for x)* +e depends on ,Othat will compile the code to assembly code( the assembly code will only use registers as local variables( and don.t support call and stac/! In following sections of this paper we will show how to use cache in P0 as ,1- before the ,1- is initiali%ed! So the auto stage code in will be compiled with 2 and it will use cache in P0 as stac/! 'he 1, 3 ache as ,1-# support in LinuxBIOS include three parts$ cache4as4ram!inc( cache4as4ram4post!c( copy4and4run!c ! 'he cache4as4ram!inc will initiali%e the cache and use that as stac/ and it finally call real4main in cache4as4ram4auto!c5 the cache4as4ram4post!c is included into cache4as4ram4auto!c( and it is used to stop the cache as ram after ,1- is initiali%ed and use ,1- as stac/5 'he copy4and4run!c is included into cache4as4ram4auto!c too! It is used to copy hardwaremain stage code to ram and 6mp to it! 'here is some specific processing about 1-7 and Intel! 0sing 1, in LinuxBIOS( we can reduce the si%e of auto stage( also can ma/e the more complicated code in auto stage that is limited by ,Oand register! 'he solution can be used all x)* based platform( and maybe more easier to porting to other platform such as -IPS!

2 LinuxBIOS v2 structure and romcc

'he LinuxBIOS execution seAuences is a! System booting( the P0 will boot from reset"*!inc( it will 6ump to entry"*!inc b! entry"*!inc( will switch to D&:bit protected mode! c! auto!c( it will be compiled into auto!inc by ,O- ( and auto!inc will only use register as local variables! auto!c main purpose is initiali%ation memory controller or called EorthBridge! Cor 1-7 @) later( it will also initiali%e the >ypertransport lin/s between P0S and non:coherent device( that include >' routing table in nodes( and scan >' chains! 1nd do the reset to ma/e width and freAuency effective! d! crt<!S( actually it includes entry"*!inc and auto!inc( and will become crt<!s! the crt<!S will copy or un%ip the hardwaremain stage code into ,1-( and 6ump to it! e! 'he hardwaremain start from c4start!s( and call hardwaremain function!

,Ois the critical util that auto!c depends( It will compile auto!c into auto!inc( and it can be used by use --F and SSG registers to support complicated code! 'he switch is @)! 'he ,Ois done by Gric Biederman! It is used by LinuxBIOS v& for F)* from Hune &<<D to Hune &<<?! ,Ocan compile code to assembly code and the result will only use register as local variables! So we can write the ,1- initiali%ation code in instead of assembly directly! 1lso for 1-7 >' code( it would be more useful( because there is a lot of code need to done for >' initiali%ation! On the other side( there are some problems( a! the romcc!c is Auite long about &?(<<< line( and it is not easy maintained! 1nd it all depend on Gric( that is painful for other to help! b! ,Odoesn.t support stac/( so there is no call( and the code will be some big! Gsp for the ) way opteron system! c! the auto!c must be coded carefully to spare every byte( otherwise it will use out of register! It is painful to ma/e the >' initiali%ation wor/s with limited registers! 'he >' routing table and >' chain scan need a lot of time to debug to verify if the auto!c coding problem or romcc.s bug! d! for the complicated system( ) way Opteron System! the auto!c to auto!inc compile time is very long( need & or more minutes! e! the crt<!S in assembly need to be rewrite if we want to port that to other platform f! If we want to reuse >' initiali%ation code to PowerP and -IPS( +e need have new ,Ofor these platform!

ROMCC bene!its or "rob#ems

$ cache as ram ca##ing se%uences

'he LinuxBIOS v& with 1, enabled execution seAuences is a! System booting( the P0 will boot from reset"*!inc( it will 6ump to entry"*!inc b! entry"*!inc( will switch to D&:bit protected mode! c! cache4as4ram!inc( will initiali%e the cache in cpu( and prepare that for stac/ using( and call the real4main in cache4as4ram4auto!c at last! d! cache4as4ram4auto!c( it will be compiled into auto!inc by gcc( its main purpose is initiali%ation

e! f! h!

memory controller or called EorthBridge! Cor 1-7 @) later( it will also initiali%e the >ypertransport lin/s between P0S and non:coherent device( that include >' routing table in nodes( and scan >' chains! 1nd do the reset to ma/e width and freAuency effective! post4cache4as4ram!c is included in cache4as4ram4auto!c and it is actually with inline assembly languages! It will copy the stac/ from cache to ,1-( and set the new stac/ ( stop the cache4as4ram and also clear the first "- for hardwaremain! 1t last it will call copy4and4run! the copy4and4run will copy or un%ip hardwaremain in to ,1- and 6ump to it! 'hat is some /ind of crt<!S( but is in ! 'he hardwaremain start from c4start!s( and call hardwaremain function!

1lso cache4as4ram4auto!c can be compiled into auto!o directly( together with print/4init!c and vtxprintf!c into init!o to the last image to get more power full print/4debug in auto stage! 'hat can be enabled by setting OECI240SG4IEI' in -B onfig!lb

& cache'as'ram(inc

cache4as4ram!inc is some D< lines assembly code to enable cache in P0( propare that for using stac/! a! enable Iariable and Cixed -',,s )* +et the de,ault memory ty!e and ena"le ,i-ed and varia"le MTRRs *) movl .MTRRde,Ty!e/M+R, 0ec-orl 0ed-, 0ed)* 1na"le 2aria"le and 3i-ed MTRRs *) movl .4-44444c44, 0ea(rmsr b! clear -',,s c! use CIFG7 -',, to enable wanted range in 8<xc<<<<( <xd<<<<#( the type should be +B 3+riteBac/# IO! )* ena"le caching ,or 567)$7)'7 using ,i-ed mtrr *) movl .4-869, 0ec- )* ,i-':/c,444*) movl .4-46444444, 0ed- )* W; <= ty!e *) -orl 0ea-, 0ea(rmsr d! use Iariable -',, to enable cache for ,O)* ena"le (rite "ase caching so (e can do e-ecute in !lace * on the ,lash rom. *) movl .4-848, 0ec-orl 0ed-, 0edmovl .>?<@/R=M/;A+1 A MTRR/TY@1/WR;A 7B, 0ea(rmsr movl .4-84C, 0ecmovl .4-4444444,, 0edmovl .>D>?<@/R=M/+<E1 - 5B A 4-$44B, 0ea(rmsr e! enable the cache through ,< )* ena"le cache *) movl 0cr4, 0eaandl .4-9,,,,,,,,0eamovl 0ea-, 0cr4 f! read the wanted range with LO7SL )* Read the range (ith lodsl*) movl .> ache;aseF ache+iGe-'B, 0esi std movl .> ache+iGeHH8B, 0ecre! lodsl g! clear the eanted range with S'OSL )* lear the range *)

movl .> ache;aseF ache+iGe-'B, 0edi movl .> ache+iGeHH8B, 0ec-orl 0ea-, 0eare! stosl h! set the stac/ pointer to GSP movl .> ache;aseF ache+iGeB, 0eamovl 0ea-, 0es! i! call the cache4as4ram4main )* Restore the ;<+T result *) movl 0e"!, 0ea)* We need to set e"! I #o need *) movl 0es!, 0e"! !ushl 0ea- )* "ist *) call cache/as/ram/main )* We (ill not go "ac: *)

) "ost'cache'as'ram(c

Post4cache4as4ram will stop the cache4as4ram ( also it will clear the ,1- if needed! a! set access to ,1- at first with var mtrr b! copy data from cache to ,1memco!y>>void *B>> =#3<G/L;/M1M/T=@7JJ54B-D A K1/RAM/+<E1B, >void *BD A K1/RAM/;A+1, D A K1/RAM/+<E1BL ))inline to set the new stac/ in ,1-! c! +e need to use //asm// volatile > )* set ne( es! *) )* "e,ore /RAM;A+1 *) Msu"l 04, 00e"!NnNtM Msu"l 04, 00es!NnNtM OOMaM> >D A K1/RAM/;A+1 F D A K1/RAM/+<E1B- > =#3<G/L;/M1M/T=@7JJ54B B BL to set the new stac/ in ,1-! d! call disable4cache4as4ram e! call copy4and4run!c will li/e following )*co!y and e-ecute linu-"ios/ram *) co!y/and/run>ne(/c!u/resetBL )* We (ill not return *) P !rint/de"ug>Mshould not "e here -NrNnMBL

*( disab#e'cache'as'ram(c

Post4cache4as4ram will stop the cache4as4ram! a! disable the cache )* We donQt need cache as ram ,or no( on *) )* disa"le cache *) Mmovl 0cr4, 0ea-NnNtM Morl .>4-5JJC4B,0ea-NnNtM Mmovl 0ea-, 0cr4NnNtM b! clear CIFG7 -',, )* clear sth *) Mmovl .4-869, 0ec-NnNtM )* ,i-':/c$444*) M-orl 0ed-, 0ed-NnNtM M-orl 0ea-, 0ea-NnNtM M(rmsrNnNtM c! enable the cache )* ena"le cache *) Mmovl 0cr4, 0ea-NnNtM Mandl .4-9,,,,,,,,0ea-NnNtM

Mmovl 0ea-, 0cr4NnNtM MinvdNnNtM

+ co",'and'run(c

a! find out src and dest for hardwaremain stage code or linuxbios4ram //asm// volatile > Mleal 'F/liseg, 04NnNtM Mleal /iseg, 05NnNtM O MRaM >srcB , MR"M >dstB BL b! un%ip the code is from nrv&v!c c! 6mp to hardaremain //asm// volatile > McliNnNtM Mleal /iseg, 0ediNnNtM MSm! * 0ediNnNtM BL

- cache'as'ram'auto(c

It will include pos4cache4as4ram!c to stop the cache4as4ram and set the new stac/ and call the final copy4and4run!

- AM. s"eci!ic/ s"ecia# MSR about mtrr

Cor 1-7 Opteron( +e need to set specific -S, to modify the -',, ena"le/,i-ed/mtrr/dram/modi,yO movl .+Y+ 3G/M+R, 0ecrdmsr andl .>D>+Y+ 3G/M+R/Mtrr3i-Dram1nA+Y+ 3G/M+R/Mtrr2arDram1nBB, 0eaorl .+Y+ 3G/M+R/Mtrr3i-DramMod1n, 0ea(rmsr

12 0oods !or CAR

'he ar avoid the problems that ,Ocause( and also ma/e crt<!S to be in too! 'he benefits include a! small code si%e in auto stage( because +e have stac/ and real call! b! can code auto stage more easier( don.t need to worry about out of registers! c! compile time get less than &< seconds! d! ma/e us more easier to use our code copy4and4run!c and >' related code to other platform such as PowerP and -IP*B!

In this paper( we have described 1, 3 ache:as:,1-#( +e can use 2 to auto stage code( can ma/e the code smaller( and will not face the out of registers error in ,O- ( 1lso can ma/e use to easy port >' support code to PowerP and -IPS! also can resue copy4and4run!c mostly! 'he 1, has been verified in 'yan S&))?JS&);"JS&);? with 1-7 Opteron dual core or single core P0! +e are loo/ing to verify 1, in other F)* platform!

Conc#usion

Re!erences

8"9 1-7! ;<=+ and 7ernel Develo!erQs Guide ,or AMD Athlon 6' and AMD =!teron @rocessors( -ay &<<?! 8&9 ,on -innich( Hames >endric/s( and 7ale +ebster! 'he Linux BIOS! In @roceedings o, the 3ourth Annual Linu+ho(case and on,erence( 1tlanta( 21( October &<<<! 8D9 Li:'a Lo( 2regory ,! +atson( ,onald 2! -innich! CreeI21$ 1rchitecture Independent Iideo 2raphics Initiali%ation for LinuxBIOS! In 844' T+1#<? Annual Techinical on,erence(

You might also like