Inline Assembly x86


GCC Inline Assembly HOWTO

Sandeep.S
v0.1, 01 March 2003.
This HOWTO explains the use and usage of the inline assembly feature provided by GCC. There are only two prerequisites for reading this article: a basic knowledge of x86 assembly language and of C.

1. Introduction.

    1.1 Copyright and License.
    1.2 Feedback and Corrections.
    1.3 Acknowledgments.

2. Overview of the whole thing.
3. GCC Assembler Syntax.
4. Basic Inline.
5. Extended Asm.

    5.1 Assembler Template.
    5.2 Operands.
    5.3 Clobber List.
    5.4 Volatile ...?

6. More about constraints.

    6.1 Commonly used constraints.
    6.2 Constraint Modifiers.

7. Some Useful Recipes.
8. Concluding Remarks.
9. References.

1. Introduction.
1.1 Copyright and License.
Copyright (C) 2003 Sandeep S.

http://www.ibiblio.org/gferg/ldp/GCCInlineAssemblyHOWTO.html
6/25/2010

This document is free; you can redistribute and/or modify this under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This document is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

1.2 Feedback and Corrections.
Kindly forward feedback and criticism to Sandeep.S. I will be indebted to anybody who points out errors and inaccuracies in this document; I shall rectify them as soon as I am informed.

1.3 Acknowledgments.
I express my sincere appreciation to GNU people for providing such a great feature. Thanks to Mr. Pramode C E for all his help. Thanks to friends at the Govt Engineering College, Trichur for their moral support and cooperation, especially to Nisha Kurur and Sakeeb S. Thanks to my dear teachers at Govt Engineering College, Trichur for their cooperation.
Additionally, thanks to Phillip, Brennan Underwood and colin@nyx.net; many things here are shamelessly stolen from their works.

2. Overview of the whole thing.
We are here to learn about GCC inline assembly. What does this inline stand for?
We can instruct the compiler to insert the code of a function into the code of its callers, at the point where the call is actually to be made. Such functions are inline functions. Sounds similar to a Macro? Indeed there are similarities.
What is the benefit of inline functions?
This method of inlining reduces the function-call overhead. And if any of the actual argument values are constant, their known values may permit simplifications at compile time so that not all of the inline function's code needs to be included. The effect on code size is less predictable; it depends on the particular case. To declare an inline function, we have to use the keyword inline in its declaration.
Now we are in a position to guess what inline assembly is. It's just some assembly routines written as inline functions. They are handy, speedy and very much useful in system programming. Our main focus is to study the basic format and usage of (GCC) inline assembly functions. To declare inline assembly functions, we use the keyword asm.
Inline assembly is important primarily because of its ability to operate on C variables and make its output visible in them. Because of this capability, "asm" works as an interface between the assembly instructions and the "C" program that contains it.

3. GCC Assembler Syntax.
GCC, the GNU C Compiler for Linux, uses AT&T/UNIX assembly syntax. Here we'll be using AT&T syntax for assembly coding. Don't worry if you are not familiar with AT&T syntax, I will teach you. This is quite different from Intel syntax. I shall give the major differences.
1. Source-Destination Ordering.

The direction of the operands in AT&T syntax is opposite to that of Intel. In Intel syntax the first operand is the destination, and the second operand is the source, whereas in AT&T syntax the first operand is the source and the second operand is the destination. ie,
"Op-code dst src" in Intel syntax changes to
"Op-code src dst" in AT&T syntax.
2. Register Naming.
Register names are prefixed by %, ie, if eax is to be used, write %eax.
3. Immediate Operand.
AT&T immediate operands are preceded by '$'. To use the address of a static "C" variable as an immediate, also prefix a '$'. In Intel syntax, for hexadecimal constants an 'h' is suffixed; instead of that, here we prefix '0x' to the constant. So, for hexadecimals, we first see a '$', then '0x' and finally the constant.
4. Operand Size.
In AT&T syntax the size of memory operands is determined from the last character of the op-code name. Op-code suffixes of 'b', 'w', and 'l' specify byte (8-bit), word (16-bit), and long (32-bit) memory references. Intel syntax accomplishes this by prefixing memory operands (not the op-codes) with 'byte ptr', 'word ptr', and 'dword ptr'.
Thus, Intel "mov al, byte ptr foo" is "movb foo, %al" in AT&T syntax.
5. Memory Operands.
In Intel syntax the base register is enclosed in '[' and ']' whereas in AT&T they change to '(' and ')'. Additionally, in Intel syntax an indirect memory reference is like
section:[base + index*scale + disp], which changes to
section:disp(base, index, scale) in AT&T.
One point to bear in mind is that, when a constant is used for disp/scale, '$' shouldn't be prefixed.
Now we have seen some of the major differences between Intel syntax and AT&T syntax. I've written only a few of them. For complete information, refer to the GNU Assembler documentation. Now we'll look at some examples for better understanding.
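+------------------------------+------------------------------------+
|       Intel Code             |      AT&T Code                     |
+------------------------------+------------------------------------+
| mov     eax,1                |  movl    $1,%eax                   |
| mov     ebx,0ffh             |  movl    $0xff,%ebx                |
| int     80h                  |  int     $0x80                     |
| mov     ebx, eax             |  movl    %eax, %ebx                |
| mov     eax,[ecx]            |  movl    (%ecx),%eax               |
| mov     eax,[ebx+3]          |  movl    3(%ebx),%eax              |
| mov     eax,[ebx+20h]        |  movl    0x20(%ebx),%eax           |
| add     eax,[ebx+ecx*2h]     |  addl    (%ebx,%ecx,0x2),%eax      |
| lea     eax,[ebx+ecx]        |  leal    (%ebx,%ecx),%eax          |
| sub     eax,[ebx+ecx*4h-20h] |  subl    -0x20(%ebx,%ecx,0x4),%eax |
+------------------------------+------------------------------------+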


4. Basic Inline.
The format of basic inline assembly is very much straightforward. Its basic form is

    asm("assembly code");

Example.

    asm("movl %ecx, %eax");      /* moves the contents of ecx to eax */
    __asm__("movb %bh, (%eax)"); /* moves the byte from bh to the memory pointed by eax */

You might have noticed that here I've used asm and __asm__. Both are valid. We can use __asm__ if the keyword asm conflicts with something in our program. If we have more than one instruction, we write one per line in double quotes, and also suffix a '\n' and '\t' to the instruction. This is because gcc sends each instruction as a string to as (GAS), and by using the newline/tab we send correctly formatted lines to the assembler.
Example.

    __asm__ ("movl %eax, %ebx\n\t"
             "movl $56, %esi\n\t"
             "movl %ecx, label(%edx,%ebx,4)\n\t"
             "movb %ah, (%ebx)");

If in our code we touch (ie, change the contents of) some registers and return from asm without fixing those changes, something bad is going to happen. This is because GCC has no idea about the changes in the register contents, and this leads us to trouble, especially when the compiler makes some optimizations. It will suppose that some register contains the value of some variable that we might have changed without informing GCC, and it continues like nothing happened. What we can do is either use those instructions having no side effects, or fix things when we quit, or wait for something to crash. This is where we want some extended functionality. Extended asm provides us with that functionality.

5. Extended Asm.
In basic inline assembly, we had only instructions. In extended assembly, we can also specify the operands. It allows us to specify the input registers, output registers and a list of clobbered registers. It is not mandatory to specify the registers to use; we can leave that headache to GCC, and that probably fits into GCC's optimization scheme better. Anyway the basic format is:

    asm ( assembler template
        : output operands                  /* optional */
        : input operands                   /* optional */
        : list of clobbered registers      /* optional */
        );


The assembler template consists of assembly instructions. Each operand is described by an operand-constraint string followed by the C expression in parentheses. A colon separates the assembler template from the first output operand and another separates the last output operand from the first input, if any. Commas separate the operands within each group. The total number of operands is limited to ten or to the maximum number of operands in any instruction pattern in the machine description, whichever is greater.
If there are no output operands but there are input operands, you must place two consecutive colons surrounding the place where the output operands would go.
Example:

    asm ("cld\n\t"
         "rep\n\t"
         "stosl"
         : /* no output registers */
         : "c" (count), "a" (fill_value), "D" (dest)
         : "%ecx", "%edi"
         );

Now, what does this code do? The above inline fills the fill_value count times to the location pointed to by the register edi. It also says to gcc that the contents of registers ecx and edi are no longer valid. (The GCC manual now states that a clobber may not overlap an input or output operand, so current compilers reject this clobber list; there, registers that are both read and changed are declared as outputs instead.) Let us see one more example to make things clearer.
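The fill example can be sketched in a form that current GCC accepts. This is only a minimal sketch under some assumptions: the function name fill_dwords is hypothetical, the "+" (read-write) modifier is used in place of the clobber list, and a plain C fallback is added for non-x86 targets.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of the HOWTO): store `value` into
   `count` consecutive 32-bit words starting at `dest`. */
static void fill_dwords(uint32_t *dest, uint32_t value, size_t count)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__ (
        "cld\n\t"
        "rep stosl"
        : "+D" (dest), "+c" (count)  /* both registers are changed by rep stosl */
        : "a" (value)                /* the fill value lives in eax */
        : "memory", "cc");           /* we write memory GCC cannot see */
#else
    /* Portable fallback so the sketch builds on other architectures. */
    while (count--)
        *dest++ = value;
#endif
}
```

Because edi and ecx are declared as read-write operands, GCC knows they change and no register has to appear in the clobber list.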

    int a = 10, b;
    asm ("movl %1, %%eax\n\t"
         "movl %%eax, %0"
         : "=r" (b)        /* output */
         : "r" (a)         /* input */
         : "%eax"          /* clobbered register */
         );

Here what we did is we made the value of 'b' equal to that of 'a' using assembly instructions. Some points of interest are:

"b" is the output operand, referred to by %0, and "a" is the input operand, referred to by %1.

"r" is a constraint on the operands. We'll see constraints in detail later. For the time being, "r" says to GCC to use any register for storing the operands. An output operand constraint should have a constraint modifier "=". This modifier says that it is the output operand and is write-only.

There are two %'s prefixed to the register name. This helps GCC to distinguish between the operands and registers. Operands have a single % as prefix.

The clobbered register %eax after the third colon tells GCC that the value of %eax is to be modified inside "asm", so GCC won't use this register to store any other value.
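The points above can be tried out by wrapping the example in a function. This is a sketch only: the name copy_via_eax is an assumption made for illustration, and the #else branch is a plain C fallback for targets that are not x86.

```c
/* Hypothetical wrapper around the HOWTO example: returns a copy of `a`
   that has travelled through %eax. */
static int copy_via_eax(int a)
{
    int b;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("movl %1, %%eax\n\t"   /* %1 is the input operand `a`      */
             "movl %%eax, %0"       /* %0 is the output operand `b`     */
             : "=r" (b)
             : "r" (a)
             : "eax");              /* eax is changed behind GCC's back */
#else
    b = a;                          /* fallback for non-x86 targets */
#endif
    return b;
}
```

Since eax is in the clobber list, GCC will not place either operand in it, so the two moves never overwrite their own source.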
When the execution of "asm" is complete, "b" will reflect the updated value, as it is specified as an output operand. In other words, the change made to "b" inside "asm" is supposed to be reflected outside the "asm".
Now we may look at each field in detail.

5.1 Assembler Template.

The assembler template contains the set of assembly instructions that gets inserted inside the C program. The format is like: either each instruction should be enclosed within double quotes, or the entire group of instructions should be within double quotes. Each instruction should also end with a delimiter. The valid delimiters are newline (\n) and semicolon (;). '\n' may be followed by a tab (\t). We know the reason for the newline/tab, right? Operands corresponding to the C expressions are represented by %0, %1 ... etc.

5.2 Operands.
C expressions serve as operands for the assembly instructions inside "asm". Each operand is written as first an operand constraint in double quotes. For output operands, there'll be a constraint modifier also within the quotes, and then follows the C expression which stands for the operand. ie,
"constraint" (C expression) is the general form. For output operands an additional modifier will be there. Constraints are primarily used to decide the addressing modes for operands. They are also used in specifying the registers to be used.
If we use more than one operand, they are separated by comma.
In the assembler template, each operand is referenced by numbers. Numbering is done as follows. If there are a total of n operands (both input and output inclusive), then the first output operand is numbered 0, continuing in increasing order, and the last input operand is numbered n-1. The maximum number of operands is as we saw in the previous section.
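To see the numbering with more than two operands, here is a small sketch built on the divl instruction, which divides edx:eax by its operand and leaves the quotient in eax and the remainder in edx. The function name divmod_u32 is hypothetical, and the #else branch is a plain C fallback for non-x86 targets.

```c
/* Hypothetical helper: unsigned division returning quotient and remainder.
   Operand numbers: %0 = q, %1 = r, %2 = n, %3 = 0, %4 = d. */
static void divmod_u32(unsigned n, unsigned d, unsigned *quot, unsigned *rem)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned q, r;
    __asm__ ("divl %4"
             : "=a" (q), "=d" (r)           /* outputs: %0 and %1 */
             : "0" (n), "1" (0u), "r" (d)   /* inputs:  %2, %3 and %4 */
             : "cc");
    *quot = q;
    *rem  = r;
#else
    *quot = n / d;                          /* fallback for non-x86 targets */
    *rem  = n % d;
#endif
}
```

There are two outputs and three inputs, so n is 5 and the last input, the divisor, is referenced as %4.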
Output operand expressions must be lvalues. The input operands are not restricted like this. They may be expressions. The extended asm feature is most often used for machine instructions the compiler itself does not know exist. If the output expression cannot be directly addressed (for example, it is a bit-field), our constraint must allow a register. In that case, GCC will use the register as the output of the asm, and then store that register's contents into the output.
As stated above, ordinary output operands must be write-only; GCC will assume that the values in these operands before the instruction are dead and need not be generated. Extended asm also supports input-output or read-write operands.
So now we concentrate on some examples. We want to multiply a number by 5. For that we use the instruction lea.

    asm ("leal (%1,%1,4), %0"
         : "=r" (five_times_x)
         : "r" (x)
         );

Here our input is in 'x'. We didn't specify the register to be used. GCC will choose some register for input, one for output, and do what we desired. If we want the input and output to reside in the same register, we can instruct GCC to do so. Here we use those types of read-write operands. By specifying proper constraints, here we do it.

    asm ("leal (%0,%0,4), %0"
         : "=r" (five_times_x)
         : "0" (x)
         );

Now the input and output operands are in the same register. But we don't know which register. Now if we want to specify that also, there is a way.

    asm ("leal (%%ecx,%%ecx,4), %%ecx"
         : "=c" (x)
         : "c" (x)
         );

In all the three examples above, we didn't put any register in the clobber list. Why? In the first two examples, GCC decides the registers and it knows what changes happen. In the last one, we don't have to put ecx on the clobber list; gcc knows it goes into x. Therefore, since it can know the value of ecx, it isn't considered clobbered.
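The first two lea variants can be wrapped in functions and compared. This is a sketch; the names five_times and five_times_in_place are assumptions for illustration, and the #else branches are plain C fallbacks for non-x86 targets.

```c
/* GCC picks one register for the input and one for the output. */
static int five_times(int x)
{
#if defined(__i386__) || defined(__x86_64__)
    int result;
    __asm__ ("leal (%1,%1,4), %0" : "=r" (result) : "r" (x));
    return result;
#else
    return 5 * x;
#endif
}

/* The matching constraint "0" forces input and output into one register. */
static int five_times_in_place(int x)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("leal (%0,%0,4), %0" : "=r" (x) : "0" (x));
    return x;
#else
    return 5 * x;
#endif
}
```

Neither function needs a clobber list, for the reason given above: every register that changes is an operand GCC already knows about.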

5.3 Clobber List.
Some instructions clobber some hardware registers. We have to list those registers in the clobber-list, ie the field after the third ':' in the asm function. This is to inform gcc that we will use and modify them ourselves. So gcc will not assume that the values it loads into these registers will be valid. We shouldn't list the input and output registers in this list, because gcc knows that "asm" uses them (because they are specified explicitly as constraints). If the instructions use any other registers, implicitly or explicitly (and the registers are not present either in the input or in the output constraint list), then those registers have to be specified in the clobbered list.
If our instruction can alter the condition code register, we have to add "cc" to the list of clobbered registers.
If our instruction modifies memory in an unpredictable fashion, add "memory" to the list of clobbered registers. This will cause GCC to not keep memory values cached in registers across the assembler instruction. We also have to add the volatile keyword if the memory affected is not listed in the inputs or outputs of the asm.
We can read and write the clobbered registers as many times as we like. Consider the example of multiple instructions in a template; it assumes the subroutine _foo accepts arguments in registers eax and ecx.

    asm ("movl %0,%%eax; movl %1,%%ecx; call _foo"
         : /* no outputs */
         : "g" (from), "g" (to)
         : "eax", "ecx"
         );
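The "memory" clobber can be seen in a smaller sketch that writes through a pointer. The function name store_int is hypothetical; the asm only receives the pointer value, so GCC cannot tell which object is written, and the clobber together with volatile covers that. The #else branch is a plain C fallback for non-x86 targets.

```c
/* Hypothetical helper: *p = v, done by an instruction GCC cannot analyse. */
static void store_int(int *p, int v)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__ ("movl %1, (%0)"
                          : /* no outputs */
                          : "r" (p), "r" (v)
                          : "memory");   /* memory changed in a way GCC cannot see */
#else
    *p = v;
#endif
}
```

Without "memory", GCC would be free to keep an older value of *p cached in a register and use it after the asm.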

5.4 Volatile ...?
If you are familiar with kernel sources or some beautiful code like that, you must have seen many functions declared as volatile or __volatile__ which follows an asm or __asm__. I mentioned earlier about the keywords asm and __asm__. So what is this volatile?
If our assembly statement must execute where we put it (i.e. must not be moved out of a loop as an optimization), put the keyword volatile after asm and before the ()'s. So to keep it from moving, deleting and all, we declare it as

    asm volatile ( ... : ... : ... : ...);

Use __volatile__ when the keyword volatile may conflict with something in our program, just as __asm__ stands in for asm.


If our assembly is just for doing some calculations and doesn't have any side effects, it's better not to use the keyword volatile. Avoiding it helps gcc in optimizing the code and making it more beautiful.
In the section Some Useful Recipes, I have provided many examples for inline asm functions. There we can see the clobber-list in detail.
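A common case where the statement must stay exactly where we put it is a polite busy-wait. The following is a sketch under assumptions: the names cpu_relax_hint and wait_for_flag are hypothetical, and on non-x86 targets the fallback is an empty asm that only acts as a compiler barrier.

```c
/* Hypothetical helper: spin-wait hint that must not be moved or deleted. */
static inline void cpu_relax_hint(void)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__ ("pause" ::: "memory");
#else
    __asm__ __volatile__ ("" ::: "memory");   /* compiler barrier only */
#endif
}

/* Spin until *flag becomes non-zero or max_spins iterations have passed.
   Returns the number of iterations spent waiting. */
static int wait_for_flag(const int *flag, int max_spins)
{
    int spins = 0;
    while (*flag == 0 && spins < max_spins) {
        cpu_relax_hint();   /* "memory" makes GCC re-read *flag each time */
        spins++;
    }
    return spins;
}
```

Here volatile keeps the hint inside the loop, and the "memory" clobber stops GCC from caching *flag in a register across iterations.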

6. More about constraints.
By this time, you might have understood that constraints have got a lot to do with inline assembly. But we've said little about constraints. Constraints can say whether an operand may be in a register, and which kinds of register; whether the operand can be a memory reference, and which kinds of address; whether the operand may be an immediate constant, and which possible values (ie range of values) it may have.... etc.

6.1 Commonly used constraints.
There are a number of constraints of which only a few are used frequently. We'll have a look at those constraints.
1. Register operand constraint (r)
When operands are specified using this constraint, they get stored in General Purpose Registers (GPR). Take the following example:

    asm ("movl %%eax, %0\n" : "=r" (myval));

Here the variable myval is kept in a register, the value in register eax is copied onto that register, and the value of myval is updated into the memory from this register. When the "r" constraint is specified, gcc may keep the variable in any of the available GPRs. To specify the register, you must directly specify the register names by using specific register constraints. They are:
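+---+--------------------+
| r |    Register(s)     |
+---+--------------------+
| a |   %eax, %ax, %al   |
| b |   %ebx, %bx, %bl   |
| c |   %ecx, %cx, %cl   |
| d |   %edx, %dx, %dl   |
| S |   %esi, %si        |
| D |   %edi, %di        |
+---+--------------------+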
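A specific register constraint is needed when an instruction insists on one register. A variable shift, for instance, takes its count only from %cl. The sketch below assumes a hypothetical function name shift_left and adds a plain C fallback for non-x86 targets.

```c
/* Hypothetical helper: x << n using shll, whose count must be in %cl. */
static unsigned shift_left(unsigned x, unsigned n)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("shll %%cl, %0"
             : "=r" (x)
             : "0" (x), "c" (n)   /* "c" pins n to ecx, so %cl holds the count */
             : "cc");
    return x;
#else
    return x << (n & 31);         /* shll also uses only the low 5 bits of n */
#endif
}
```

The value being shifted is left to GCC ("r" with a matching "0"), while only the count is tied to a named register.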

2. Memory operand constraint (m)
When the operands are in the memory, any operations performed on them will occur directly in the memory location, as opposed to register constraints, which first store the value in a register to be modified and then write it back to the memory location. But register constraints are usually used only when they are absolutely necessary for an instruction or they significantly speed up the process. Memory constraints can be used most efficiently in cases where a C variable needs to be updated inside "asm" and you really don't want to use a register to hold its value. For example, the value of idtr is stored in the memory location loc. Since sidt writes to loc, it is given as an output operand:

    asm ("sidt %0\n" : "=m" (loc));
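A memory operand that is both read and written can be sketched with incl. The function name increment_in_memory is hypothetical, and the "+m" form is the read-write modifier accepted by current GCC (the same effect as pairing "=m" with an "m" input). The #else branch is a plain C fallback for non-x86 targets.

```c
/* Hypothetical helper: increment *counter directly in memory. */
static void increment_in_memory(int *counter)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("incl %0"
             : "+m" (*counter)   /* read and written in place, no register copy */
             :
             : "cc");
#else
    ++*counter;
#endif
}
```

No register holds the value of the variable at any point; the instruction operates on the memory location itself.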

3. Matching (Digit) constraints
In some cases, a single variable may serve as both the input and the output operand. Such cases may be specified in "asm" by using matching constraints.



    asm ("incl %0" : "=a" (var) : "0" (var));

We saw similar examples in the operands subsection also. In this example for matching constraints, the register %eax is used as both the input and the output variable. The var input is read to %eax, and the updated %eax is stored in var again after the increment. "0" here specifies the same constraint as the 0th output variable. That is, it specifies that the output instance of var should be stored in %eax only. This constraint can be used:

In cases where input is read from a variable, or the variable is modified and the modification is written back to the same variable.

In cases where separate instances of input and output operands are not necessary.
The most important effect of using matching constraints is that they lead to the efficient use of available registers.
Some other constraints used are:
1. "m" : A memory operand is allowed, with any kind of address that the machine supports in general.
2. "o" : A memory operand is allowed, but only if the address is offsettable. ie, adding a small offset to the address gives a valid address.
3. "V" : A memory operand that is not offsettable. In other words, anything that would fit the 'm' constraint but not the 'o' constraint.
4. "i" : An immediate integer operand (one with constant value) is allowed. This includes symbolic constants whose values will be known only at assembly time.
5. "n" : An immediate integer operand with a known numeric value is allowed. Many systems cannot support assembly-time constants for operands less than a word wide. Constraints for these operands should use 'n' rather than 'i'.
6. "g" : Any register, memory or immediate integer operand is allowed, except for registers that are not general registers.
Following constraints are x86 specific.
1. "r" : Register operand constraint, look table given above.
2. "q" : Registers a, b, c or d.
3. "I" : Constant in range 0 to 31 (for 32-bit shifts).
4. "J" : Constant in range 0 to 63 (for 64-bit shifts).
5. "K" : 0xff.
6. "L" : 0xffff.
7. "M" : 0, 1, 2, or 3 (shifts for lea instruction).
8. "N" : Constant in range 0 to 255 (for out instruction).
9. "f" : Floating point register
10. "t" : First (top of stack) floating point register
11. "u" : Second floating point register
12. "A" : Specifies the 'a' or 'd' registers. This is primarily useful for 64-bit integer values intended to be returned with the 'd' register holding the most significant bits and the 'a' register holding the least significant bits.

6.2 Constraint Modifiers.
While using constraints, for more precise control over the effects of constraints, GCC provides us with constraint modifiers. Mostly used constraint modifiers are
1. "=" : Means that this operand is write-only for this instruction; the previous value is discarded and replaced by output data.
2. "&" : Means that this operand is an earlyclobber operand, which is modified before the instruction is finished using the input operands. Therefore, this operand may not lie in a register that is used as an input operand or as part of any memory address. An input operand can be tied to an earlyclobber operand if its only use as an input occurs before the early result is written.
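The earlyclobber modifier matters as soon as a template has more than one instruction. In the sketch below (the function name add_two is hypothetical) the output is written by the first instruction while the second input is still needed, so "&" keeps GCC from placing the output in the same register as an input. The #else branch is a plain C fallback for non-x86 targets.

```c
/* Hypothetical helper: a + b in two instructions. */
static int add_two(int a, int b)
{
#if defined(__i386__) || defined(__x86_64__)
    int sum;
    __asm__ ("movl %1, %0\n\t"   /* writes the output early ...       */
             "addl %2, %0"       /* ... while input %2 is still live  */
             : "=&r" (sum)
             : "r" (a), "r" (b)
             : "cc");
    return sum;
#else
    return a + b;
#endif
}
```

With a plain "=r", GCC would be allowed to pick the register of b for sum, and the first movl would destroy b before it is added.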


The list and explanation of constraints is by no means complete. Examples can give a better understanding of the use and usage of inline asm. In the next section we'll see some examples; there we'll find more about clobber-lists and constraints.

7. Some Useful Recipes.
Now that we have covered the basic theory about GCC inline assembly, we shall concentrate on some simple examples. It is always handy to write inline asm functions as MACROs. We can see many asm functions in the kernel code (/usr/src/linux/include/asm/*.h).
1. First we start with a simple example. We'll write a program to add two numbers.

    #include <stdio.h>

    int main(void)
    {
        int foo = 10, bar = 15;
        __asm__ __volatile__("addl  %%ebx,%%eax"
                             : "=a" (foo)
                             : "a" (foo), "b" (bar)
                             );
        printf("foo+bar=%d\n", foo);
        return 0;
    }

Here we insist GCC to store foo in %eax, bar in %ebx, and we also want the result in %eax. The '=' sign shows that it is an output register. Now we can add an integer to a variable in some other way.

    __asm__ __volatile__(
                         "   lock       ;\n"
                         "   addl %1,%0 ;\n"
                         : "=m"  (my_var)
                         : "ir"  (my_int), "m" (my_var)
                         :                                 /* no clobber-list */
                         );

This is an atomic addition. We can remove the instruction 'lock' to remove the atomicity. In the output field, "=m" says that my_var is an output and it is in memory. Similarly, "ir" says that my_int is an integer that may be given either as an immediate constant ("i") or in some register ("r", recall the table we saw above). No registers are in the clobber list.
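The atomic addition can be wrapped in a function as follows. This is a sketch: the name atomic_add is hypothetical, "+m" (the read-write modifier of current GCC) replaces the "=m"/"m" pair, and "cc" is added because addl changes the flags. The #else branch is a plain, non-atomic C fallback that only keeps the sketch buildable on other targets.

```c
/* Hypothetical helper: atomically add `value` to *target on x86. */
static void atomic_add(int *target, int value)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__ ("lock addl %1, %0"
                          : "+m" (*target)
                          : "ir" (value)   /* immediate or register */
                          : "cc");
#else
    *target += value;                      /* NOT atomic: fallback only */
#endif
}
```

When the function is inlined with a constant argument, the "i" alternative lets GCC emit the constant directly in the instruction.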
2. Now we'll perform some action on some registers/variables and compare the value.

    __asm__ __volatile__(  "decl %0; sete %1"
                         : "=m" (my_var), "=q" (cond)
                         : "m" (my_var)
                         : "memory"
                         );

Here, the value of my_var is decremented by one, and if the resulting value is 0 then the variable cond is set. We can add atomicity by adding an instruction "lock;\n\t" as the first instruction in the assembler template.
In a similar way we can use "incl %0" instead of "decl %0", so as to increment my_var.
Points to note here are that (i) my_var is a variable residing in memory. (ii) cond is in any of the registers eax, ebx, ecx and edx. The constraint "=q" guarantees it. (iii) And we can see that memory is there in the clobber list. ie, the code is changing the contents of memory.
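As a function, the decrement-and-test recipe might look like the sketch below. The name dec_and_test is an assumption for illustration, cond becomes an unsigned char because sete writes a single byte, and "+m" is used for the in-memory counter. The #else branch is a plain C fallback for non-x86 targets.

```c
/* Hypothetical helper: decrement *counter, return 1 if it reached zero. */
static int dec_and_test(int *counter)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned char is_zero;
    __asm__ __volatile__ ("decl %0\n\t"
                          "sete %1"          /* set byte if the zero flag is set */
                          : "+m" (*counter), "=q" (is_zero)
                          :
                          : "cc");
    return is_zero;
#else
    return --*counter == 0;
#endif
}
```

The "q" constraint is what guarantees a register with an addressable low byte for sete.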
3. How to set/clear a bit in a register? As the next recipe, we are going to see it.

    __asm__ __volatile__(   "btsl %1,%0"
                          : "=m" (ADDR)
                          : "Ir" (pos)
                          : "cc"
                          );

Here, the bit at the position 'pos' of the variable at ADDR (a memory variable) is set to 1. We can use 'btrl' for 'btsl' to clear the bit. The constraint "Ir" of pos says that pos is either a constant in the range 0-31 (the x86 dependent constraint "I") or is held in a register. ie, we can set/clear any bit from the 0th to the 31st of the variable at ADDR. As the condition codes will be changed, we are adding "cc" to the clobber list.
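Both operations can be sketched as a pair of functions. The names set_bit32 and clear_bit32 are hypothetical, pos is assumed to be in the range 0-31, and "+m" is used because the word is read as well as written. The #else branches are plain C fallbacks for non-x86 targets.

```c
/* Hypothetical helpers: set or clear bit `pos` (0..31) of *word. */
static void set_bit32(unsigned *word, int pos)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__ ("btsl %1, %0" : "+m" (*word) : "Ir" (pos) : "cc");
#else
    *word |= 1u << pos;
#endif
}

static void clear_bit32(unsigned *word, int pos)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__ ("btrl %1, %0" : "+m" (*word) : "Ir" (pos) : "cc");
#else
    *word &= ~(1u << pos);
#endif
}
```

Keeping pos within 0-31 matters: with a register operand, btsl and btrl on memory can reach bits outside the addressed word.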
4. Now we look at some more complicated but useful function. String copy.

    static inline char * strcpy(char * dest, const char *src)
    {
        int d0, d1, d2;
        __asm__ __volatile__(  "1:\tlodsb\n\t"
                               "stosb\n\t"
                               "testb %%al,%%al\n\t"
                               "jne 1b"
                             : "=&S" (d0), "=&D" (d1), "=&a" (d2)
                             : "0" (src), "1" (dest)
                             : "memory");
        return dest;
    }

The source address is stored in esi, destination in edi, and then the copy starts; when we reach 0, copying is complete. Constraints "&S", "&D", "&a" say that the registers esi, edi and eax are early clobber registers, ie, their contents will change before the completion of the function. Here also it's clear why memory is in the clobber list.
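The kernel version above uses int dummies, which fits 32-bit pointers only. The sketch below is a variant meant to build for both 32-bit and 64-bit x86: the name asm_strcpy is hypothetical, and the pointers themselves are passed as read-write ("+") operands instead of being tied to dummy outputs. The #else branch is a plain C fallback for non-x86 targets.

```c
/* Hypothetical variant of the string copy recipe. */
static char *asm_strcpy(char *dest, const char *src)
{
    char *d = dest;
#if defined(__i386__) || defined(__x86_64__)
    int scratch;
    __asm__ __volatile__ ("1:\tlodsb\n\t"          /* al = *src++        */
                          "stosb\n\t"              /* *d++ = al          */
                          "testb %%al, %%al\n\t"   /* was it the NUL?    */
                          "jne 1b"
                          : "+S" (src), "+D" (d), "=&a" (scratch)
                          :
                          : "memory", "cc");
#else
    while ((*d++ = *src++) != '\0')
        ;
#endif
    return dest;
}
```

Because src and d are declared as modified, GCC does not assume they still point to the start of the strings after the asm; dest is kept separately for the return value.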
We can see a similar function which moves a block of double words. Notice that the function is declared as a macro.

    #define mov_blk(src, dest, numwords) \
    __asm__ __volatile__ (                                          \
                           "cld\n\t"                                \
                           "rep\n\t"                                \
                           "movsl"                                  \
                           :                                        \
                           : "S" (src), "D" (dest), "c" (numwords)  \
                           : "%ecx", "%esi", "%edi"                 \
                           )

Here we have no outputs, so the changes that happen to the contents of the registers ecx, esi and edi are side effects of the block movement. So we have to add them to the clobber list.
5. In Linux, system calls are implemented using GCC inline assembly. Let us look at how a system call is implemented. All the system calls are written as macros (linux/unistd.h). For example, a system call with three arguments is defined as a macro as shown below.

    #define _syscall3(type,name,type1,arg1,type2,arg2,type3,arg3) \
    type name(type1 arg1,type2 arg2,type3 arg3) \
    { \
    long __res; \
    __asm__ volatile (  "int $0x80" \
                      : "=a" (__res) \
                      : "0" (__NR_##name),"b" ((long)(arg1)),"c" ((long)(arg2)), \
                        "d" ((long)(arg3))); \
    __syscall_return(type,__res); \
    }

Whenever a system call with three arguments is made, the macro shown above is used to make the call. The syscall number is placed in eax, then each parameter in ebx, ecx, edx. And finally "int 0x80" is the instruction which makes the system call work. The return value can be collected from eax.
Every system call is implemented in a similar way. Exit is a single parameter syscall, and let's see what its code will look like. It is as shown below. Note that this is basic asm (there are no operands), so register names take a single %.

    {
        asm("movl $1, %eax\n\t"       /* SYS_exit is 1 */
            "xorl %ebx, %ebx\n\t"     /* Argument is in ebx, it is 0 */
            "int  $0x80"              /* Enter kernel mode */
            );
    }

The number of exit is "1" and here, its parameter is 0. So we arrange eax to contain 1 and ebx to contain 0, and by int $0x80, the exit(0) is executed. This is how exit works.

8. Concluding Remarks.
This document has gone through the basics of GCC Inline Assembly. Once you have understood the basic concept it is not difficult to take steps on your own. We saw some examples which are helpful in understanding the frequently used features of GCC Inline Assembly.
GCC Inlining is a vast subject and this article is by no means complete. More details about the syntax we discussed are available in the official documentation for GNU Assembler. Similarly, for a complete list of the constraints refer to the official documentation of GCC.
And of course, the Linux kernel uses GCC Inline on a large scale. So we can find many examples of various kinds in the kernel sources. They can help us a lot.
If you have found any glaring typos, or outdated info in this document, please let us know.

9. References.
1. Brennan's Guide to Inline Assembly
2. Using Assembly Language in Linux
3. Using as, The GNU Assembler
4. Using and Porting the GNU Compiler Collection (GCC)
5. Linux Kernel Source


Using Inline Assembly With gcc
Clark L. Coleman (plagiarist/researcher)

1.0 Overview
This is a compilation in FrameMaker of three public domain documents written by others. There is no original content added by myself. The three documents are:
1. A portion of the gcc info page for gcc 2.8.1, dealing with the subject of inline assembly language.
2. A tutorial by Brennan Underwood.
3. A tutorial by colin@nyx.net.

2.0 Information from the gcc info pages
2.1 General and Copyright Information
This is Info file gcc.info, produced by Makeinfo-1.55 from the input file gcc.texi.
This file documents the use and the internals of the GNU compiler.
Published by the Free Software Foundation, 59 Temple Place - Suite 330, Boston, MA 02111-1307 USA
Copyright (C) 1988, 1989, 1992, 1993, 1994, 1995 Free Software Foundation, Inc.
Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies.
Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided also that the sections entitled "GNU General Public License," "Funding for Free Software," and "Protect Your Freedom: Fight 'Look And Feel'" are included exactly as in the original, and provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one.
File: gcc.info, Node: Extended Asm, Next: Asm Labels, Prev: Inline, Up: C Extensions

2.2 Assembler Instructions with C Expression Operands
In an assembler instruction using asm, you can now specify the operands of the instruction using C expressions. This means no more guessing which registers or memory locations will contain the data you want to use.
Using Inline Assembly With gcc, January 11, 2000

You must specify an assembler instruction template much like what appears in a machine description, plus an operand constraint string for each operand.
For example, here is how to use the 68881's fsinx instruction:

    asm ("fsinx %1,%0" : "=f" (result) : "f" (angle));

Here angle is the C expression for the input operand while result is that of the output operand. Each has "f" as its operand constraint, saying that a floating point register is required. The = in =f indicates that the operand is an output; all output operands' constraints must use =. The constraints use the same language used in the machine description (*note Constraints::.).
Each operand is described by an operand-constraint string followed by the C expression in parentheses. A colon separates the assembler template from the first output operand, and another separates the last output operand from the first input, if any. Commas separate output operands and separate inputs. The total number of operands is limited to ten or to the maximum number of operands in any instruction pattern in the machine description, whichever is greater.
If there are no output operands, and there are input operands, then there must be two consecutive colons surrounding the place where the output operands would go.
Output operand expressions must be lvalues; the compiler can check this. The input operands need not be lvalues. The compiler cannot check whether the operands have data types that are reasonable for the instruction being executed. It does not parse the assembler instruction template and does not know what it means, or whether it is valid assembler input. The extended asm feature is most often used for machine instructions that the compiler itself does not know exist. If the output expression cannot be directly addressed (for example, it is a bit field), your constraint must allow a register. In that case, GNU CC will use the register as the output of the asm, and then store that register into the output.
The output operands must be write-only; GNU CC will assume that the values in these operands before the instruction are dead and need not be generated. Extended asm does not support input-output or read-write operands. For this reason, the constraint character +, which indicates such an operand, may not be used.
When the assembler instruction has a read-write operand, or an operand in which only some of the bits are to be changed, you must logically split its function into two separate operands, one input operand and one write-only output operand. The connection between them is expressed by constraints which say they need to be in the same location when the instruction executes. You can use the same C expression for both operands, or different expressions. For example, here we write the (fictitious) combine instruction with bar as its read-only source operand and foo as its read-write destination:

    asm ("combine %2,%0"
         : "=r" (foo)
         : "0" (foo), "g" (bar));


The constraint "0" for operand 1 says that it must occupy the same location as operand 0. A digit in constraint is allowed only in an input operand, and it must refer to an output operand.
Only a digit in the constraint can guarantee that one operand will be in the same place as another. The mere fact that foo is the value of both operands is not enough to guarantee that they will be in the same place in the generated assembler code. The following would not work:

    asm ("combine %2,%0"
         : "=r" (foo)
         : "r" (foo), "g" (bar));

Various optimizations or reloading could cause operands 0 and 1 to be in different registers; GNU CC knows no reason not to do so. For example, the compiler might find a copy of the value of foo in one register and use it for operand 1, but generate the output operand 0 in a different register (copying it afterward to foo's own address). Of course, since the register for operand 1 is not even mentioned in the assembler code, the result will not work, but GNU CC can't tell that.
Some instructions clobber specific hard registers. To describe this, write a third colon after the input operands, followed by the names of the clobbered hard registers (given as strings). Here is a realistic example for the Vax:

    asm volatile ("movc3 %0,%1,%2"
                  : /* no outputs */
                  : "g" (from), "g" (to), "g" (count)
                  : "r0", "r1", "r2", "r3", "r4", "r5");

If you refer to a particular hardware register from the assembler code, then you will probably have to list the register after the third colon to tell the compiler that the register's value is modified. In many assemblers, the register names begin with %; to produce one % in the assembler code, you must write %% in the input.
If your assembler instruction can alter the condition code register, add "cc" to the list of clobbered registers. GNU CC on some machines represents the condition codes as a specific hardware register; "cc" serves to name this register. On other machines, the condition code is handled differently, and specifying "cc" has no effect. But it is valid no matter what the machine.
If your assembler instruction modifies memory in an unpredictable fashion, add "memory" to the list of clobbered registers. This will cause GNU CC to not keep memory values cached in registers across the assembler instruction.
You can put multiple assembler instructions together in a single asm template, separated either with newlines (written as \n) or with semicolons if the assembler allows such semicolons. The GNU assembler allows semicolons and all Unix assemblers seem to do so. The input operands are guaranteed not to use any of the clobbered registers, and neither will the output operands' addresses, so you can read and write the clobbered registers as many times as you like. Here is an example of multiple instructions in a template; it assumes that the subroutine _foo accepts arguments in registers 9 and 10:

    asm ("movl %0,r9;movl %1,r10;call _foo"
         : /* no outputs */
         : "g" (from), "g" (to)
         : "r9", "r10");

Unless an output operand has the & constraint modifier, GNU CC may allocate it in the same register as an unrelated input operand, on the assumption that the inputs are consumed before the outputs are produced. This assumption may be false if the assembler code actually consists of more than one instruction. In such a case, use & for each output operand that may not overlap an input. *Note Modifiers::.
If you want to test the condition code produced by an assembler instruction, you must include a branch and a label in the asm construct, as follows:

    asm ("clr %0;frob %1;beq 0f;mov #1,%0;0:"
         : "=g" (result)
         : "g" (input));

This assumes your assembler supports local labels, as the GNU assembler and most Unix assemblers do.
Speaking of labels, jumps from one asm to another are not supported. The compiler's optimizers do not know about these jumps, and therefore they cannot take account of them when deciding how to optimize.
Usually the most convenient way to use these asm instructions is to encapsulate them in
macros that look like functions. For example,

#define sin(x) \
({ double __value, __arg = (x); \
asm ("fsinx %1,%0" \
: "=f" (__value) \
: "f" (__arg)); \
__value; })

Here the variable __arg is used to make sure that the instruction operates on a proper double
value, and to accept only those arguments x which can convert automatically to a double.

Another way to make sure the instruction operates on the correct data type is to use a cast in
the asm. This is different from using a variable __arg in that it converts more different types.
For example, if the desired type were int, casting the argument to int would accept a pointer
with no complaint, while assigning the argument to an int variable named __arg would warn
about using a pointer unless the caller explicitly casts it.
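The same macro shape can be tried on the x86 with an integer instruction. The sketch below assumes a 486 or later (for bswap); the macro name bswap32 is hypothetical. Off x86 it falls back to the compiler's __builtin_bswap32.

```c
/* A function-like macro built from a GNU statement expression.  __arg
   forces the argument through an ordinary conversion to unsigned int,
   exactly as __arg does for double in the sin() macro above. */
#if defined(__i386__) || defined(__x86_64__)
#define bswap32(x)                                            \
    ({ unsigned int __value, __arg = (x);                     \
       __asm__("bswap %0" : "=r" (__value) : "0" (__arg));    \
       __value; })
#else
#define bswap32(x) __builtin_bswap32(x)   /* non-x86 fallback */
#endif
```

A call such as bswap32(v) can then appear anywhere an expression is allowed, and passing a pointer by mistake draws the usual conversion diagnostic.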
If an asm has output operands, GNU CC assumes for optimization purposes that the
instruction has no side effects except to change the output operands. This does not mean that
instructions with a side effect cannot be used, but you must be careful, because the compiler may
eliminate them if the output operands aren't used, or move them out of loops, or replace two with
one if they constitute a common subexpression. Also, if your
instruction does have a side effect on a variable that otherwise appears not to change, the old
value of the variable may be reused later if it happens to be found in a register.
You can prevent an asm instruction from being deleted, moved significantly, or combined,
by writing the keyword volatile after the asm. For example:

#define set_priority(x) \
asm volatile ("set_priority %0" \
: /* no outputs */ \
: "g" (x))

An instruction without output operands will not be deleted or moved significantly, regardless,
unless it is unreachable.

Note that even a volatile asm instruction can be moved in ways that appear insignificant to
the compiler, such as across jump instructions. You can't expect a sequence of volatile asm
instructions to remain perfectly consecutive. If you want consecutive output, use a single asm.
It is a natural idea to look for a way to give access to the condition code left by the assembler
instruction. However, when we attempted to implement this, we found no way to make it work
reliably. The problem is that output operands might need reloading, which would result in
additional following "store" instructions. On most machines, these instructions would alter the
condition code before there was time to test it. This problem doesn't arise for ordinary "test" and
"compare" instructions because they don't have any output operands.

If you are writing a header file that should be includable in ANSI C programs, write __asm__
instead of asm. See Alternate Keywords.

3.0 Brennan's Guide to Inline Assembly
by Brennan "Bas" Underwood. Document version 1.1.2.2

3.1 Introduction
Ok. This is meant to be an introduction to inline assembly under DJGPP. DJGPP is based on
GCC, so it uses the AT&T/UNIX syntax and has a somewhat unique method of inline assembly.
I spent many hours figuring some of this stuff out and told Info that I hate it, many times.
Hopefully if you already know Intel syntax, the examples will be helpful to you. I've put variable
names, register names and other literals in bold type.

3.2 The Syntax
So, DJGPP uses the AT&T assembly syntax. What does that mean to you?

* Register naming: Register names are prefixed with "%". To reference eax:
AT&T: %eax
Intel: eax

* Source/Destination Ordering: In AT&T syntax (which is the UNIX standard, BTW) the
source is always on the left, and the destination is always on the right. So let's load ebx with the
value in eax:
AT&T: movl %eax, %ebx
Intel: mov ebx, eax

* Constant value/immediate value format: You must prefix all constant/immediate values with
"$". Let's load eax with the address of the "C" variable booga, which is static.
AT&T: movl $_booga, %eax
Intel: mov eax, _booga
Now let's load ebx with 0xd00d:
AT&T: movl $0xd00d, %ebx
Intel: mov ebx, d00dh

* Operator size specification: You must suffix the instruction with one of b, w, or l to specify
the width of the destination register as a byte, word or longword. If you omit this, GAS (GNU
assembler) will attempt to guess. You don't want GAS to guess, and guess wrong! Don't forget
it.
AT&T: movw %ax, %bx
Intel: mov bx, ax
The equivalent forms for Intel are byte ptr, word ptr, and dword ptr, but that is for when you
are...

* Referencing memory: DJGPP uses 386 protected mode, so you can forget all that real-mode
addressing junk, including the restrictions on which register has what default segment, which
registers can be base or index pointers. Now, we just get 6 general purpose registers. (7 if you use
ebp, but be sure to restore it yourself or compile with -fomit-frame-pointer.) Here is the
canonical format for 32-bit addressing:
AT&T: immed32(basepointer,indexpointer,indexscale)
Intel: [basepointer + indexpointer*indexscale + immed32]
You could think of the formula to calculate the address as:
immed32 + basepointer + indexpointer * indexscale
You don't have to use all those fields, but you do have to have at least 1 of immed32,
basepointer and you MUST add the size suffix to the operator! Let's see some simple forms of
memory addressing:

o Addressing a particular C variable:
AT&T: _booga
Intel: [_booga]
Note: the underscore ("_") is how you get at static (global) C variables from assembler.
This only works with global variables. Otherwise, you can use extended asm to have variables
preloaded into registers for you. I address that farther down.

o Addressing what a register points to:
AT&T: (%eax)
Intel: [eax]

o Addressing a variable offset by a value in a register:
AT&T: _variable(%eax)
Intel: [eax + _variable]

o Addressing a value in an array of integers (scaling up by 4):
AT&T: _array(,%eax,4)
Intel: [eax*4 + array]

o You can also do offsets with the immediate value:
C code: *(p+1) where p is a char *
AT&T: 1(%eax) where eax has the value of p
Intel: [eax + 1]

o You can do some simple math on the immediate value:
AT&T: _struct_pointer+8
I assume you can do that with Intel format as well.

o Addressing a particular char in an array of 8-character records: eax holds the number of
the record desired. ebx has the wanted char's offset within the record.
AT&T: _array(%ebx,%eax,8)
Intel: [ebx + eax*8 + _array]

Whew. Hopefully that covers all the addressing you'll need to do. As a note, you can put esp
into the address, but only as the base register.
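To see the base/index/scale form do some work, here is a small sketch that lets lea evaluate an address of the shape (base,index,4). The function name element_address is hypothetical, and the operands are handed over with the extended asm syntax described in section 3.4.

```c
#include <stdint.h>

/* Compute base + index*4 with the AT&T form (base,index,scale).
   No size suffix is written on lea: the assembler takes the width from
   the register operands, which are pointer-sized here. */
static inline int *element_address(int *base, uintptr_t index)
{
#if defined(__i386__) || defined(__x86_64__)
    int *addr;
    __asm__("lea (%1,%2,4), %0"
            : "=r" (addr)
            : "r" (base), "r" (index));
    return addr;
#else
    return base + index;         /* fallback for non-x86 targets */
#endif
}
```

The scale of 4 matches sizeof(int) on the x86, so the result is the address the C expression &base[index] would give.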

3.3 Basic inline assembly
The format for basic inline assembly is very simple, and much like Borland's method.

asm ("statements");

Pretty simple, no? So

asm ("nop");

will do nothing of course, and

asm ("cli");

will stop interrupts, with

asm ("sti");

of course enabling them. You can use __asm__ instead of asm if the keyword asm conflicts
with something in your program.

When it comes to simple stuff like this, basic inline assembly is fine. You can even push your
registers onto the stack, use them, and put them back.

asm ("pushl %eax\n\t"
"movl $0, %eax\n\t"
"popl %eax");

(The \n's and \t's are there so the .s file that GCC generates and hands to GAS comes out right
when you've got multiple statements per asm.) It's really meant for issuing instructions for
which there is no equivalent in C and which don't touch the registers.

But if you do touch the registers, and don't fix things at the end of your asm statement, like so:

asm ("movl %eax, %ebx");
asm ("xorl %ebx, %edx");
asm ("movl $0, _booga");

then your program will probably blow things to hell. This is because GCC hasn't been told that
your asm statement clobbered ebx and edx and booga, which it might have been keeping in a
register, and might plan on using later. For that, you need:

3.4 Extended inline assembly
The basic format of the inline assembly stays much the same, but now gets Watcom-like
extensions to allow input arguments and output arguments.
Here is the basic format:

asm ("statements" : output_registers : input_registers : clobbered_registers);

Let's just jump straight to a nifty example, which I'll then explain:

asm ("cld\n\t"
"rep\n\t"
"stosl"
: /* no output registers */
: "c" (count), "a" (fill_value), "D" (dest)
: "%ecx", "%edi" );

The above stores the value in fill_value count times to the pointer dest.
Let's look at this bit by bit.

asm ("cld\n\t"

We are clearing the direction bit of the flags register. You never know what this is going to be
left at, and it costs you all of 1 or 2 cycles.

"rep\n\t"
"stosl"

Notice that GAS requires the rep prefix to occupy a line of its own. Notice also that stos has the l
suffix to make it move longwords.

: /* no output registers */

Well, there aren't any in this function.

: "c" (count), "a" (fill_value), "D" (dest)

Here we load ecx with count, eax with fill_value, and edi with dest. Why make GCC do it
instead of doing it ourselves? Because GCC, in its register allocating, might be able to arrange
for, say, fill_value to already be in eax. If this is in a loop, it might be able to preserve eax thru
the loop, and save a movl once per loop.

: "%ecx", "%edi" );

And here's where we specify to GCC, "you can no longer count on the values you loaded into
ecx or edi to be valid." This doesn't mean they will be reloaded for certain. This is the
clobberlist.
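Later GCC versions complain when a register appears both as an input and in the clobberlist (section 4.7 below describes this). A hedged sketch of the same fill loop that avoids the overlap ties each modified register to a dummy output instead. The name fill32 and the dummy variables are hypothetical, and writing "rep stosl" on one line is an assumption about the assembler in use.

```c
#include <stddef.h>
#include <stdint.h>

/* Store fill_value into dest[0..count-1] with rep stosl.  edi and ecx are
   changed by the instruction, so each is tied to a dummy output through
   the "0" and "1" matching constraints instead of being listed as a clobber. */
static inline void fill32(uint32_t *dest, uint32_t fill_value, size_t count)
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t *dest_after;    /* dummy output: value left in edi */
    size_t count_after;      /* dummy output: value left in ecx */
    __asm__ __volatile__("cld\n\t"
                         "rep stosl"
                         : "=D" (dest_after), "=c" (count_after)
                         : "0" (dest), "1" (count), "a" (fill_value)
                         : "memory", "cc");
    (void)dest_after;
    (void)count_after;
#else
    size_t i;                /* fallback for non-x86 targets */
    for (i = 0; i < count; i++)
        dest[i] = fill_value;
#endif
}
```

The "memory" clobber is there because the stores go to locations that are not named as operands.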
Seem funky? Well, it really helps when optimizing, when GCC can know exactly what you're
doing with the registers before and after. It folds your assembly code into the code it generates
(whose rules for generation look remarkably like the above) and then optimizes. It's even smart
enough to know that if you tell it to put (x+1) in a register, then if
you don't clobber it, and later C code refers to (x+1), and it was able to keep that register free, it
will reuse the computation. Whew.

Here's the list of register loading codes that you'll be likely to use:

a      eax
b      ebx
c      ecx
d      edx
S      esi
D      edi
I      constant value (0 to 31)
q,r    dynamically allocated register (see below)
g      eax, ebx, ecx, edx or variable in memory
A      eax and edx combined into a 64-bit integer (use long longs)

Note that you can't directly refer to the byte registers (ah, al, etc.) or the word registers (ax, bx,
etc.) when you're loading this way. Once you've got it in there, though, you can specify ax or
whatever all you like.

The codes have to be in quotes, and the expressions to load in have to be in parentheses.

When you do the clobber list, you specify the registers as above with the %. If you write to a
variable, you must include "memory" as one of The Clobbered. This is in case you wrote to a
variable that GCC thought it had in a register. This is the same as clobbering all registers.
While I've never run into a problem with it, you might also want to add "cc" as a clobber if you
change the condition codes (the bits in the flags register the jnz, je, etc. operators look at.)

Now, that's all fine and good for loading specific registers. But what if you specify, say, ebx, and
ecx, and GCC can't arrange for the values to be in those registers without having to stash the
previous values? It's possible to let GCC pick the register(s). You do this:

asm ("leal (%1,%1,4), %0"
: "=r" (x)
: "0" (x) );

The above example multiplies x by 5 really quickly (1 cycle on the Pentium). Now, we could
have specified, say eax. But unless we really need a specific register (like when using rep movsl
or rep stosl, which are hardcoded to use ecx, edi, and esi), why not let GCC pick an available
one? So when GCC generates the output code for GAS, %0 will be replaced by the register it
picked.

And where did "q" and "r" come from? Well, "q" causes GCC to allocate from eax, ebx, ecx,
and edx. "r" lets GCC also consider esi and edi. So make sure, if you use "r" that it would be
possible to use esi or edi in that instruction. If not, use "q".

Now, you might wonder, how to determine how the %n tokens get allocated to the arguments.
It's a straightforward first-come-first-served, left-to-right thing, mapping to the "q"'s and "r"'s.
But if you want to reuse a register allocated with a "q" or "r", you use "0", "1", "2"... etc.

You don't need to put a GCC-allocated register on the clobberlist as GCC knows that you're
messing with it.
Now for output registers.

asm ("leal (%1,%1,4), %0"
: "=r" (x_times_5)
: "r" (x) );

Note the use of = to specify an output register. You just have to do it that way. If you want 1
variable to stay in 1 register for both in and out, you have to respecify the register allocated to it
on the way in with the "0" type codes as mentioned above.

asm ("leal (%0,%0,4), %0"
: "=r" (x)
: "0" (x) );

This also works, by the way:

asm ("leal (%%ebx,%%ebx,4), %%ebx"
: "=b" (x)
: "b" (x) );

2 things here:
* Note that we don't have to put ebx on the clobberlist, GCC knows it goes into x. Therefore,
since it can know the value of ebx, it isn't considered clobbered.
* Notice that in extended asm, you must prefix registers with %% instead of just %. Why, you
ask? Because as GCC parses along for %0's and %1's and so on, it would interpret %edx as a
%e parameter, see that that's non-existent, and ignore it. Then it would bitch about finding a
symbol named dx, which isn't valid because it's not prefixed with % and it's not the one you
meant anyway.
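Wrapped into functions, the first two forms above can be compiled and tried directly. This is only a sketch: the names times5 and times5_in_place are hypothetical, and the plain-C branches are a fallback for non-x86 machines.

```c
/* Separate input and output: GCC picks both registers. */
static inline int times5(int x)
{
#if defined(__i386__) || defined(__x86_64__)
    int x_times_5;
    __asm__("leal (%1,%1,4), %0"
            : "=r" (x_times_5)
            : "r" (x));
    return x_times_5;
#else
    return x * 5;
#endif
}

/* Same variable in and out: the "0" code reuses the register of operand 0. */
static inline int times5_in_place(int x)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__("leal (%0,%0,4), %0"
            : "=r" (x)
            : "0" (x));
#else
    x *= 5;
#endif
    return x;
}
```

Neither version needs a clobberlist: lea changes no flags, and the only register written is the output GCC already knows about.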
Important note: If your assembly statement must execute where you put it (i.e. must not be
moved out of a loop as an optimization), put the keyword volatile after asm and before the ()'s.
To be ultra-careful, use

__asm__ __volatile__ (...whatever...);

However, I would like to point out that if your assembly's only purpose is to calculate the output
registers, with no other side effects, you should leave off the volatile keyword so your statement
will be processed into GCC's common subexpression elimination optimization.

3.5 Some useful examples

#define disable() __asm__ __volatile__ ("cli")
#define enable() __asm__ __volatile__ ("sti")

Of course, libc has these defined too.

#define times3(arg1, arg2) \
__asm__ ( \
"leal (%0,%0,2),%0" \
: "=r" (arg2) \
: "0" (arg1) )

#define times5(arg1, arg2) \
__asm__ ( \
"leal (%0,%0,4),%0" \
: "=r" (arg2) \
: "0" (arg1) )

#define times9(arg1, arg2) \
__asm__ ( \
"leal (%0,%0,8),%0" \
: "=r" (arg2) \
: "0" (arg1) )

These multiply arg1 by 3, 5, or 9 and put them in arg2. You should be ok to do:

times5(x,x);

as well.

#define rep_movsl(src, dest, numwords) \
__asm__ __volatile__ ( \
"cld\n\t" \
"rep\n\t" \
"movsl" \
: : "S" (src), "D" (dest), "c" (numwords) \
: "%ecx", "%esi", "%edi" )

Helpful Hint: If you say memcpy() with a constant length parameter, GCC will inline it to a rep
movsl like above. But if you need a variable length version that inlines and you're always
moving dwords, there ya go.

#define rep_stosl(value, dest, numwords) \
__asm__ __volatile__ ( \
"cld\n\t" \
"rep\n\t" \
"stosl" \
: : "a" (value), "D" (dest), "c" (numwords) \
: "%ecx", "%edi" )

Same as above but for memset(), which doesn't get inlined no matter what (for now.)

#define RDTSC(llptr) ({ \
__asm__ __volatile__ ( \
".byte 0x0f; .byte 0x31" \
: "=A" (llptr) \
: : "eax", "edx"); })

Reads the TimeStampCounter on the Pentium and puts the 64 bit result into llptr.
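A hedged variant of the counter read for current toolchains is sketched below. It assumes an assembler that knows the rdtsc mnemonic, and it takes the two halves through "=a" and "=d" instead of "=A", because on 64-bit x86 "A" does not name the edx:eax pair. The names read_tsc and combine_halves are hypothetical.

```c
#include <stdint.h>

/* Glue the two 32-bit halves delivered in edx:eax into one 64-bit value. */
static inline uint64_t combine_halves(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Read the time stamp counter.  volatile keeps repeated reads from being
   merged into one; eax and edx are outputs, so no clobberlist is needed. */
static inline uint64_t read_tsc(void)
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a" (lo), "=d" (hi));
    return combine_halves(lo, hi);
#else
    return 0;                /* no counter read on non-x86 targets */
#endif
}
```

Typical use is to take the difference of two reads around the code being timed.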

3.6 The End
"The End"?! Yah, I guess so.

If you're wondering, I personally am a big fan of AT&T/UNIX syntax now. (It might have
helped that I cut my teeth on SPARC assembly. Of course, that machine actually had a
decent number of general registers.) It might seem weird to you at first, but it's really more
logical than Intel format, and has no ambiguities.

If I still haven't answered a question of yours, look in the Info pages for more information,
particularly on the input/output registers. You can do some funky stuff like use "A" to allocate
two registers at once for 64-bit math or "m" for static memory locations, and a bunch more that
aren't really used as much as "q" and "r".

Alternately, mail me, and I'll see what I can do. (If you find any errors in the above, please,
e-mail me and tell me about it! It's frustrating enough to learn without buggy docs!) Or heck,
mail me to say "boogabooga."
It's the least you can do.

Related Usenet posts:
* local labels
* fixed point multiplies

Thanks to Eric J. Korpela
<korpela@ssl.Berkeley.EDU> for some corrections.
Page written and provided by Brennan Underwood. Copyright 1996 Brennan Underwood. Share and enjoy!

4.0 A Brief Tutorial on GCC inline asm (x86 biased)
colin@nyx.net, 20 April 1998

I am a great fan of GCC's inline asm feature, because there is no need to second-guess or
outsmart the compiler. You can tell the compiler what you are doing and what you expect of it,
and it can work with it and optimize your code.

However, on a convoluted processor like the x86, describing just what is going on can be quite a
complex job. In the interest of a faster kernel through appropriate usage of this powerful tool,
here is an introduction to its use.

4.1 Extended asm, an introduction.
In a nice clean register-register RISC architecture, accessing an occasional foo instruction is
quite simple. You just write:

asm ("foo %1,%2,%0"
: "=r" (output)
: "r" (input1), "r" (input2));

The part before the first colon is very much like the semi-standard asm() feature that has been in
many C compilers since the K&R days. The string is pasted into the compiler's assembly output
at the current location.

However, GCC is rather cleverer. What will actually appear in the output of "gcc -O -S foo.c" (a
file named foo.s) is:

#APP
foo r17,r5,r9
#NO_APP

The "#APP" and "#NO_APP" parts are instructions to the assembler that briefly put it into
normal operating mode, as opposed to the special high-speed "compiler output" mode that turns
off every feature that the compiler doesn't use as well as a lot of error-checking. For our
purposes, it's convenient because it highlights the part of the code we're interested in.

Between, you will see that the "%1" and so forth have turned into registers. This is because GCC
replaced "%0", "%1" and "%2" with registers holding the first three arguments after the colon.
That is, r17 holds input1, r5 holds input2, and r9 holds output.

It's perfectly legal to use more complex expressions like:

asm ("foo %1,%2,%0"
: "=r" (ptr->vtable[3](a,b,c)->foo.bar[baz])
: "r" (gcc(is) + really(damn->cool)), "r" (42));

GCC will treat this just like:

register int t0, t1, t2;
t1 = gcc(is) + really(damn->cool);
t2 = 42;
asm ("foo %1,%2,%0"
: "=r" (t0)
: "r" (t1), "r" (t2));
ptr->vtable[3](a,b,c)->foo.bar[baz] = t0;

The general form of an asm() is

asm ("code" : outputs : inputs : clobbers);

Within the "code", %0 refers to the first argument (usually an output, unless there are no
outputs), %1 to the second, and so forth. It only goes up to %9. Note that GCC prepends a tab
and appends a newline to the code, so if you want to include multi-line asm (which is legal) and
you want it to look nice in the asm output, you should separate lines with "\n\t". (You'll see lots
of examples of this in the Linux source.) It's also legal to use ";" as a separator to put more than
one asm statement on a line.

There are option letters that you can put between the % and the digit to print the operand
specially; more on this later.

Each output or input in the comma-separated list has two parts, "constraints" and (value). The
(value) part is pretty straightforward. It's an expression. For outputs, it must be an lvalue, i.e.
something that is legal to have on the left side of an assignment.

The constraints are more interesting. All outputs must be marked with "=", which says that this
operand is assigned to. I'm not sure why this is necessary, since you also have to divide up
outputs and inputs with the colon, but I'm not inclined to make a fuss about it, since it's easy to
do once you know.

The letters that come after that give permitted operands. There are more choices than you might
think. Some depend on the processor, but there are a few that are generic.

"r" means a register. As an example, "rm" means a register or memory. "ri" means a register or
an immediate value. "g" is "general"; it can be anything at all. It's usually equivalent to "rim",
but your processor may have even more options that are included. "o" is like "m", but
"offsettable", meaning that you can add a small offset to it. On the x86, all memory operands are
offsettable, but some machines don't support indexing and displacement at the same time, or
have something like the 680x0's autoincrement addressing mode that doesn't support a
displacement.

Capital letters starting with "I" are usually assigned to immediate values in a certain range. For
example, a lot of RISC machines allow either a register or a short immediate value. If our
machine is like the DEC Alpha, and allows a register or a 16-bit immediate, you could write

asm ("foo %1,%2,%0"
: "=r" (output)
: "r" (input1), "rI" (input2));

and if input2 were, say, 42, the compiler would use an immediate constant in the instruction.
The x86-specific constraints are defined later.

4.2 A few notes about inputs
An input may be a temporary copy, but it may not be. Unless you tell GCC that you are going to
modify that location (described later in "equivalence constraints"), you must not alter any inputs.

GCC may, however, elect to place an output in the same register as an input if it doesn't need the
input value any more. You must not make assumptions either way. If you need to have it one
way or the other, there are ways (described later) to tell GCC what you need.

The rule in GCC's inline asm is, say what you need and then get out of the optimizer's way.

4.3 x86 assembly code
The GNU tools used in Linux use an AT&T-developed assembly syntax that is different from the
Intel-developed one that you see in a lot of example code. It's a lot simpler, actually. It doesn't
have any of the DWORD PTR stuff that the Intel syntax requires.

The most significant difference, however, is a major one and easy to get confused by. While Intel
uses "op dest,src", AT&T syntax uses "op src,dest". DON'T FORGET THIS. If you're used to
Intel syntax, this can take quite a while to get used to.

The easy way to know which flavour of asm syntax you're reading is to look for all the "%"
symbols. AT&T names the registers %eax, %ebx, etc. This avoids the need for a kludge like
putting _ in front of all the function and variable names to avoid using perfectly good C names
like esp. It's easy enough to read, but don't forget it when writing.

The other major difference is that the operand size is clear from the instruction. You don't have
just "inc", you have "incb", "incw" and "incl" to increment 8, 16 or 32 bits. If the size is clear
from the operands, you can just write "inc" (e.g. "inc %eax"), but if it's a memory operand,
rather than writing "inc DWORD PTR foo" you just write "incl foo". "inc foo" is an error; the
assembler doesn't try to keep track of the type of anything. Writing "incl %al" is an error which
the assembler catches.

Immediate values are written with a leading "$". Thus, "movl foo,%eax" copies the contents of
memory location foo into %eax. "movl $foo,%eax" copies the address of foo. "movl 42,%eax" is
a fetch from an absolute address. "movl $42,%eax" is an immediate load.

Addressing modes are written offset(base,index,scale). You may leave out anything irrelevant.
So "(%ebx)" is legal, as is "-44(%ebx,%eax)", which is equivalent to "-44(%ebx,%eax,1)". Legal
scales are 1, 2, 4 and 8.

4.4 Equivalence constraints
Sometimes, especially on two-address machines like the x86, you need to use the same register
for output and for input. Although if you look into the GCC documentation, you'll see a
useful-looking "+" constraint character, this isn't available to inline asm. What you have to do
instead is to use a special constraint like "0":

asm ("foo %1,%0"
: "=r" (output)
: "r" (input1), "0" (input2));

This says that input2 has to go in the same place as the output, so %2 and %0 are the same thing.
(Which is why %2 isn't actually mentioned anywhere.) Note that it is perfectly legal to have
different variables for input and output even though they both use the same register. GCC will do
any necessary copying to temporary registers for you.

4.5 Constraints on the x86
The i386 has *lots* of register classes, designed for anything remotely useful. Common ones are
defined in the "constraints" section of the GCC manual. Here are the most useful:

g    general effective address
m    memory effective address
r    register
i    immediate value, 0..0xffffffff
n    immediate value known at compile time. ("i" would allow an address known only
     at link time)

But there are some i386-specific ones described in the processor-specific part of the manual
and in more detail in GCC's i386.h:

q    byte-addressable register (eax, ebx, ecx, edx)
A    eax or edx
a, b, c, d, S, D    eax, ebx, ..., esi, edi only

I    immediate 0..31
J    immediate 0..63
K    immediate 255
L    immediate 65535
M    immediate 0..3 (shifts that can be done with lea)
N    immediate 0..255 (one-byte immediate value)
O    immediate 0..32

There are some more for floating-point registers, but I won't go into those. The very special
cases like "K" are mostly used inside GCC in alternative code sequences, providing a
special-case way to do something like ANDing with 255.

But something like "I" is useful, for example the x86 rotate left:

asm ("roll %1,%0"
: "=g" (result)
: "cI" (rotate), "0" (input));

(See the section on x86 assembly syntax if you wonder why the extra "l" is on "rol".)
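As a compilable sketch of the rotate, with hypothetical names: the count is passed as an unsigned char and printed with the %b modifier so that a register count always comes out as %cl, which is the only register the instruction accepts for a variable count. A constant count in the 0..31 range can be emitted as an immediate through "I".

```c
/* Rotate a 32-bit value left.  "cI" lets GCC use either %cl or, when the
   count is a suitable compile-time constant, an immediate operand. */
static inline unsigned rotate_left(unsigned input, unsigned char rotate)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned result;
    __asm__("roll %b1, %0"
            : "=r" (result)
            : "cI" (rotate), "0" (input)
            : "cc");             /* the rotate alters CF and OF */
    return result;
#else
    rotate &= 31;                /* fallback for non-x86 targets */
    return rotate ? (input << rotate) | (input >> (32 - rotate)) : input;
#endif
}
```

The output here uses "r" instead of "g" only to keep the sketch simple; a memory destination would also be legal for the instruction.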

4.6 Advanced constraints
In the GCC manual, constraints and so on are described in most detail in the section on writing
machine descriptions for ports. GCC, not surprisingly, uses the same constraints mechanism
internally to compile C code. Here's a summary.

"=" has already been discussed, to mark an output. No, I don't know why it's needed in inline asm,
but it's not worth "fixing".

"+" is described in the gcc manual, but is not legal in inline asm. Sorry.

"%" says that this operand and the next one may be switched at the compiler's convenience; the
arguments are commutative. Many operations (+, *, &, |, ^) have this property, but the options
permitted in the instruction set may not be as general. For example, on a RISC machine which
lets the second operand be an immediate value (in the "I" range), you could specify an "add"
instruction like:

asm ("add %1,%2,%0"
: "=r" (output)
: "%r" (input1), "rI" (input2));

"," separates a list of alternative constraints. Each input and output must have the same length list
of alternatives, and one element of the list is chosen. For example, the x86 permits
register-memory and memory-register operations, but not memory-memory. So an add could be
written as:

asm ("add %1,%0"
: "=r,rm" (output)
: "%g,ri" (input1), "0,0" (input2));

This says that if the output is a register, input1 may be anything, but if the output is memory,
the input may only be a register or an immediate value. And input2 must be in the same place as
the output, although you can swap things and place input1 there instead.

If there are multiple options listed and the compiler has no preference, it will choose the first
one. Thus, if there's a minor difference in timing or some such, list the faster one first.
"?" in one alternative says that an alternative is discouraged. This is important for
compiler writers who want to encourage the fastest code, but is getting pretty esoteric for inline asm.

"&" says that an output operand is written to before the inputs are read, so this output must not be
the same register as any input. Without this, gcc may place an output and an input in the same
register even if not required by a "0" constraint. This is very useful, but is mentioned here
because it's specific to an alternative. Unlike = and %, but like ?, you have to include it with
each alternative to which it applies.

Note that there is no way to encode more complex information, like "this output may not be in
the same place as *that* input, but may share a register with that *other* input". Each output
either may share a register with any input, or with none.

In inline asm, you usually specify this with every alternative, since you can't change the order
of operations depending on the option selected. In GCC's internal code generation, there are
provisions for producing different code depending on the register alternative chosen, but you
can't do that with inline asm.

One place you might use it is when you have the possibility of the output overlapping with
input two, but not input one. E.g.

asm ("foo %1,%0; bar %2,%0"
: "=r,&r" (out)
: "r,r" (in1), "0,r" (in2));

This says that either in2 is in the same register as out, or nothing is. However, with more
operands, the number of possibilities quickly mushrooms and GCC doesn't cope gracefully
with large numbers of alternatives.

4.7 Clobbers
Sometimes an instruction knocks out certain specific registers. The most common example of
this is a function call, where the called function is allowed to do whatever it likes with some
registers.

If this is the case, you can list specific registers that get clobbered by an operation after the
inputs. The syntax is not like constraints, you just provide a comma-separated list of registers in
string form. On the 80x86, they're "ax", "bx", "si", "di", etc.

There are two special cases for clobbered values. One is "memory", meaning that this instruction
writes to some memory (other than a listed output) and GCC shouldn't cache memory values in
registers across this asm. An asm memcpy() implementation would need this. You do *not* need
to list "memory" just because outputs are in memory; gcc understands that.

The second is "cc". It's not necessary on all machines, and I haven't figured it out for the x86 (I
don't think it is), but it's always legal to specify, and means that the instructions mess up the
condition codes.

Note that GCC will not use a clobbered register for inputs or outputs. GCC 2.7 would let you do
it anyway, specifying an input in class "a" and saying that "ax" is clobbered. GCC 2.8 and egcs
are getting pickier, and complaining that there are no free registers in class "a" available. This is
not the way to do it. If you corrupt an input register, include a dummy output in the same
register, the value of which is never used. E.g.

int dummy;
asm ("munge %0"
: "=r" (dummy)
: "0" (input));

4.8 Temporary registers
People also sometimes erroneously use clobbers for temporary registers. The right way is to
make up a dummy output, and use "=r" or "=&r" depending on the permitted overlap with the
inputs. GCC allocates a register for the dummy value. The difference is that GCC can pick a
convenient register, so it has more flexibility.
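A sketch of the dummy-output technique on the x86 follows. The function name sum_of_squares and the variable scratch are hypothetical. Both outputs are written before the last input has been read, so both take "=&r".

```c
/* a*a + b*b, using a compiler-chosen temporary instead of a clobber. */
static inline unsigned sum_of_squares(unsigned a, unsigned b)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned out;
    unsigned scratch;            /* dummy output: its value is never used */
    __asm__("movl %2, %1\n\t"
            "imull %2, %1\n\t"   /* scratch = a * a   */
            "movl %3, %0\n\t"
            "imull %3, %0\n\t"   /* out = b * b       */
            "addl %1, %0"        /* out += scratch    */
            : "=&r" (out), "=&r" (scratch)
            : "r" (a), "r" (b)
            : "cc");
    return out;
#else
    return a * a + b * b;        /* fallback for non-x86 targets */
#endif
}
```

Had the temporary been named in the clobberlist instead, GCC would have been forced to free that one register whether or not it was convenient.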

4.9 const and volatile
There are two optimization hints that you can give to an asm statement.

asm volatile(...) statements may not be deleted or significantly reordered; the volatile keyword
says that they do something magic that the compiler shouldn't play with too much.
GCC will delete ordinary asm() blocks if the outputs are not used, and will reorder them slightly
to be convenient to where the outputs are. (asm blocks with no outputs are assumed to be volatile
by default.)

asm const() statements are assumed to produce outputs that depend only on the inputs, and thus
can be subject to common subexpression optimization and can be hoisted out of loops. The most
common example of an output that does *not* depend only on an input is a pointer that is
fetched. *p may change from time to time even if p does not change. Thus, an asm block that
fetches from a pointer should not include a const.

An example of something that is good is a coprocessor instruction to compute sin(x). If GCC
knows that two calls have the same value of x, it can compute sin(x) only once.

For example, compare:

int foo(int x)
{
int i, y, total;

total = 0;
for (i = 0; i < 100; i++) {
asm volatile ("foo %1,%0"
: "=r" (y)
: "g" (x));
total += y;
}
return total;
}

then try changing that to "const" after the asm. The code (on an x86) looks like:

func1:
xorl %ecx,%ecx
pushl %ebx
movl %ecx,%edx
movl 8(%esp),%ebx
.align 4
.L7:
#APP
foo %ebx,%eax
#NO_APP
addl %eax,%ecx
incl %edx
cmpl $99,%edx
jle .L7
movl %ecx,%eax
popl %ebx
ret

which then changes to (in the const case):

func2:
xorl %edx,%edx
#APP
foo 4(%esp),%ecx
#NO_APP
movl %edx,%eax
.align 4
.L13:
addl %ecx,%edx
incl %eax
cmpl $99,%eax
jle .L13
movl %edx,%eax
ret

I'm still not completely thrilled with the code (why put the loop counter in %eax instead of total,
which gets returned?), but you can see how it improves.
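I would not count on every compiler release accepting the const qualifier on asm. A hedged alternative with a similar effect is to leave the asm non-volatile and wrap it in a function carrying __attribute__((const)), which is a GCC function attribute. The names negate and sum_loop are hypothetical, and negl merely stands in for the foo instruction above.

```c
/* A pure computation: no volatile on the asm, and the wrapper is declared
   const, so calls with equal arguments may be merged or hoisted. */
__attribute__((const)) static inline int negate(int x)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__("negl %0" : "=r" (x) : "0" (x) : "cc");
    return x;
#else
    return -x;                   /* fallback for non-x86 targets */
#endif
}

/* Same loop shape as foo() above; the compiler is allowed, though not
   obliged, to evaluate negate(x) once outside the loop. */
static int sum_loop(int x)
{
    int i, total = 0;

    for (i = 0; i < 100; i++)
        total += negate(x);
    return total;
}
```

Whether the hoisting actually happens depends on the compiler version and optimization level; the result is the same either way.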

4.10 Alternate keywords
__asm__() is a legal alias for asm(), and it is legal (and produces no warnings) even when in
strict ANSI mode or when warning about non-portable constructs. Otherwise, it is equivalent.

4.11 Output substitutions
Sometimes you want to include a value in an asm statement in an unusual way. For example,
you could use the lea instruction to do something hairy like

asm ("lea %1(%2,%3,1<<%4),%0"
: "=r" (out)
: "%i" (in1), "r" (in2), "r" (in3), "M" (logscale));

This looks like a way to generate a legal lea instruction with all the possible bells and whistles.
There's only one problem. When GCC substitutes the immediates "in1" and "logscale", it's
going to produce something like:

lea $-44(%ebx,%eax,1<<$2),%ecx

which is a syntax error. The "$" on the constants are not useful in this context. So there are
modifier characters. The one applicable in this context is "c", which means to omit the usual
immediate value information. The correct asm is

asm ("lea %c1(%2,%3,1<<%c4),%0"
: "=r" (out)
: "%i" (in1), "r" (in2), "r" (in3), "M" (logscale));

which will produce

lea -44(%ebx,%eax,1<<2),%ecx

as desired. There are a few others mentioned in the GCC manual as generic:

%c0 substitutes the immediate value %0, but without the immediate syntax.
%n0 substitutes like %c0, but the negated value.
%l0 substitutes like %c0, but with the syntax expected of a jump target. (This is usually the
same as %c0.)

And then there are the x86-specific ones. These are, unfortunately, only listed in the i386.h
header file in the GCC source (config/i386/i386.h), so you have to dig a bit for them.

%k0 prints the 32-bit form of an operand. %eax, etc.
%w0 prints the 16-bit form of an operand. %ax, etc.
%b0 prints the 8-bit form of an operand. %al, etc.
%h0 prints the high 8-bit form of a register. %ah, etc.
%z0 prints the opcode suffix corresponding to the operand type, b, w or l.

By default, %0 prints a register in the form corresponding to the argument size. E.g.
asm ("inc %0" : "=r" (out) : "0" (in)) will print as "inc %al", "inc %ax" or "inc %eax" depending
on the type of "out".

For example, byte-swapping on a non-486:

asm ("xchg %b0,%h0; roll $16,%0; xchg %b0,%h0"
: "=q" (x)
: "0" (x));

This says that x must be in a byte-addressable register and proceeds to swap the bytes to
big-endian form.

It's legal to use the %w and %b forms on objects that aren't registers, it just makes no
difference. Using %b and %h on non-byte-addressable registers tends to make the compiler abort, so
don't do that.
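A compilable sketch of that byte swap follows. The function name swap_bytes is hypothetical. It uses the "Q" constraint (a, b, c or d) instead of "q", on the assumption that the code may also be built for 64-bit x86, where "q" admits registers that have no high-byte half.

```c
/* Swap the four bytes of x using the %b (low byte) and %h (high byte)
   operand modifiers. */
static inline unsigned swap_bytes(unsigned x)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__("xchgb %b0, %h0\n\t"   /* swap the two low bytes            */
            "rorl $16, %0\n\t"     /* exchange the 16-bit halves        */
            "xchgb %b0, %h0"       /* swap what are now the low bytes   */
            : "=Q" (x)
            : "0" (x)
            : "cc");               /* the rotate alters flags           */
    return x;
#else
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
#endif
}
```

For 0x11223344 the three steps give 0x11224433, then 0x44331122, then 0x44332211.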
%z is rather cool. For example, consider the following code:

#define xchg(m, in, out) \
asm ("xchg%z0 %2,%0" \
: "=g" (*(m)), "=r" (out) \
: "1" (in))

int bar(void *m, int x)
{
xchg((char *)m, (char)x, x);
xchg((short *)m, (short)x, x);
xchg((int *)m, (int)x, x);
return x;
}

This produces, as assembly output,

.globl bar
.type bar,@function
bar:
movl 4(%esp),%eax
movb 8(%esp),%dl
#APP
xchgb %dl,(%eax)
xchgw %dx,(%eax)
xchgl %edx,(%eax)
#NO_APP
movl %edx,%eax
ret

(Reusing x is a way to make sure that nothing got optimized away.)
It's not really needed here because the size of the %2 register lets you get away with just "xchg",
but there are situations where it's nice to have an operand size.

4.12 Extra % patterns
Some % substitutions don't specify an argument. The most common one is "%%", which comes
out as a single "%".

The second is "%=", which generates a unique number for each asm() block. (Each time it is used
if inlined or used in a macro.) This can be used for temporary labels and so on.
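A small sketch of "%=" used for a temporary label follows. The function name max_int and the label name are hypothetical, and the ".L" prefix is an assumption about an ELF-style assembler, where such labels stay local to the file.

```c
/* Return the larger of a and b.  "%=" expands to a number that is unique
   to each instance of the asm, so the label does not clash when the
   function is inlined several times into one caller. */
static inline int max_int(int a, int b)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__("cmpl %2, %0\n\t"          /* compare a with b        */
            "jge .Lmax_done%=\n\t"     /* a >= b: keep a          */
            "movl %2, %0\n"            /* otherwise a = b         */
            ".Lmax_done%=:"
            : "=r" (a)
            : "0" (a), "r" (b)
            : "cc");
    return a;
#else
    return a > b ? a : b;              /* fallback for non-x86 targets */
#endif
}
```

Numeric local labels such as "0:" would work here as well; "%=" is handy when a named label reads better.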

4.13 Examples
Some code that was in include/asm-i386/system.h:

#define _set_tssldt_desc(n,addr,limit,type) \
__asm__ __volatile__ ("movw %3,0(%2)\n\t" \
"movw %%ax,2(%2)\n\t" \
"rorl $16,%%eax\n\t" \
"movb %%al,4(%2)\n\t" \
"movb %4,5(%2)\n\t" \
"movb $0,6(%2)\n\t" \
"movb %%ah,7(%2)\n\t" \
"rorl $16,%%eax" \
: "=m"(*(n)) \
: "a" (addr), "r"(n), "ri"(limit), "i"(type))

It's obvious that the writer didn't know how to take optimal advantage of this (admittedly
complex, but x86 addressing *is* complex) facility. This could be rewritten to use any register
instead of %eax:

#define _set_tssldt_desc(n,addr,limit,type) \
__asm__ __volatile__ ("movw %w3,0(%2)\n\t" \
"movw %w1,2(%2)\n\t" \
"rorl $16,%1\n\t" \
"movb %b1,4(%2)\n\t" \
"movb %4,5(%2)\n\t" \
"movb $0,6(%2)\n\t" \
"movb %h1,7(%2)\n\t" \
"rorl $16,%1" \
: "=m"(*(n)) \
: "q" (addr), "r"(n), "ri"(limit), "ri"(type))

You notice here that *n is listed as an output, so GCC knows that it's modified, but actually
addressing it is done relative to n as an input register everywhere because of the need to compute
an offset.
Theproblemisthatthereisnosyntacticwaytoencodeanoffsetfromagivenaddress.Ifthe
addressis40(%eax)thenanoffsetof2canbemadebyprepending2+toit.Butifthe
addressis(%eax)then2+(%eax)isnotvalid.Trickslike2+0fallflatbecause040is
takenasoctalandgetstranslatedinto32.
BUTTHERESNEWS(19April1998):gaswillactuallyDoTheRightThingwith2+(%eax),
justemitawarning.Havingseenthis,agasmaintainer(AlanModra)decidedtomakethe
warninggoawayinthiscase,soinsomenearfutureversionyouwillbeabletodoit.
Withthisfix(orputtingupwiththewarning),youcouldwritetheaboveas:
#define_set_tssldt_desc(n,addr,limit,type)\__asm____volatile__(movw%w2,%0\n\t\
movw%w1,2+%0\n\t\rorl$16,%1\n\t\movb%b1,4+%0\n\t\movb%3,5+%0\n\t\movb
$0,6+%0\n\t\movb%h1,7+%0\n\t\rorl$16,%1\:=o(*(n)):q(addr),ri(limit),i(type))

The "o" constraint is just like "m", except that it's "offsettable": adding a small value to it
leaves a valid address. On the x86, there is no distinction, so it's not really necessary, but on
the 68000, for example, you can't add an offset to a postincrement addressing mode.

If neither the warning nor waiting is acceptable, a fix is to list each possible offset as a different
output (here we're using the fact that n is a char *):

__asm__ __volatile__ ("movw %w7,%0\n\t" \
        "movw %w6,%1\n\t" \
        "rorl $16,%6\n\t" \
        "movb %b6,%2\n\t" \
        "movb %b8,%3\n\t" \
        "movb $0,%4\n\t" \
        "movb %h6,%5\n\t" \
        "rorl $16,%6" \
        : "=m"(*(n)), \
          "=m"((n)[2]), \
          "=m"((n)[4]), \
          "=m"((n)[5]), \
          "=m"((n)[6]), \
          "=m"((n)[7]) \
        : "q" (addr), "g"(limit), "iqm"(type))

As you can see, this gets a bit ugly when you have lots of offsets, but it works just the
same.

4.14 Conclusion
I hope this has been of use to some folks. GCC's inline asm features are really cool because
you can just do the little bit that you want and let the compiler optimize the rest.
This has the unfortunate side effect that you have to learn how to explain to the compiler what's
going on. But it's worth it, really!

Using Assembly Language in Linux.
by Phillip
phillip@ussrback.com
Last updated: Monday 8th January 2001

Contents:
Introduction
Intel and AT&T Syntax
    Prefixes
    Direction of Operands
    Memory Operands
    Suffixes
Syscalls
    Syscalls with < 6 args
    Syscalls with > 5 args
    Socket syscalls
Command Line Arguments
GCC Inline ASM
Compiling
Further reference/Links
Example Code

Introduction.
This article will describe assembly language programming under Linux. Contained within the
bounds of the article is a comparison between Intel and AT&T syntax asm, a guide to using
syscalls and an introductory guide to using inline asm in gcc.
This article was written due to the lack of (good) info on this field of programming (the inline asm
section in particular). I should remind you that this is not a shellcode-writing
tutorial, because there is no lack of info in that field.
Various parts of this text I have learnt about through experimentation and hence may be prone to
error. Should you find any of these errors on my part, do not hesitate to notify me via email and
enlighten me on the given issue.
There is only one prerequisite for reading this article, and that's obviously a basic knowledge of
x86 assembly language and C.

Intel and AT&T Syntax.
Intel and AT&T syntax assembly language are very different from each other in appearance, and
this will lead to confusion when one first comes across AT&T syntax after having learnt Intel
syntax first, or vice versa. So let's start with the basics.

Prefixes.
In Intel syntax there are no register prefixes or immed prefixes. In AT&T however registers are
prefixed with a '%' and immeds are prefixed with a '$'. Intel syntax hexadecimal or binary
immed data are suffixed with 'h' and 'b' respectively. Also if the first hexadecimal digit is a
letter then the value is prefixed by a '0'.
Example:
Intel Syntax
    mov     eax,1
    mov     ebx,0ffh
    int     80h

AT&T Syntax
    movl    $1,%eax
    movl    $0xff,%ebx
    int     $0x80

Direction of Operands.
The direction of the operands in Intel syntax is opposite from that of AT&T syntax. In Intel
syntax the first operand is the destination, and the second operand is the source, whereas in
AT&T syntax the first operand is the source and the second operand is the destination. The
advantage of AT&T syntax in this situation is obvious. We read from left to right, we write from
left to right, so this way is only natural.
Example:
Intel Syntax
    instr   dest,source
    mov     eax,[ecx]

AT&T Syntax
    instr   source,dest
    movl    (%ecx),%eax

Memory Operands.
Memory operands, as seen above, are different also. In Intel syntax the base register is enclosed in
'[' and ']' whereas in AT&T syntax it is enclosed in '(' and ')'.
Example:
Intel Syntax
    mov     eax,[ebx]
    mov     eax,[ebx+3]

AT&T Syntax
    movl    (%ebx),%eax
    movl    3(%ebx),%eax

The AT&T form for instructions involving complex operations is very obscure compared to Intel
syntax. The Intel syntax form of these is segreg:[base+index*scale+disp]. The AT&T syntax
form is %segreg:disp(base,index,scale).
Index/scale/disp/segreg are all optional and can simply be left out. Scale, if not specified and
index is specified, defaults to 1. Segreg depends on the instruction and whether the app is being
run in real mode or pmode. In real mode it depends on the instruction whereas in pmode it's
unnecessary. Immediate data should not be '$' prefixed in AT&T when used for scale/disp.
Example:
Intel Syntax
    instr   foo,segreg:[base+index*scale+disp]
    mov     eax,[ebx+20h]
    add     eax,[ebx+ecx*2h]
    lea     eax,[ebx+ecx]
    sub     eax,[ebx+ecx*4h-20h]

AT&T Syntax
    instr   %segreg:disp(base,index,scale),foo
    movl    0x20(%ebx),%eax
    addl    (%ebx,%ecx,0x2),%eax
    leal    (%ebx,%ecx),%eax
    subl    -0x20(%ebx,%ecx,0x4),%eax

As you can see, AT&T is very obscure. [base+index*scale+disp] makes more sense at a glance
than disp(base,index,scale).
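
To make the disp(base,index,scale) form concrete, here is a small sketch in GCC inline asm. The function name load_scaled is hypothetical, and the sketch assumes a target where int is 4 bytes and where size_t and pointers have the same width (true of Linux on i386 and x86-64); other targets take the plain C fallback.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: load table[index] through a scaled-index operand. */
static inline int load_scaled(const int *table, size_t index)
{
    int value;
#if defined(__i386__) || defined(__x86_64__)
    /* AT&T: (base,index,scale) with disp omitted and scale = 4.
       Intel would write the same operand as [table + index*4].       */
    __asm__ ("movl (%1,%2,4), %0"
             : "=r" (value)
             : "r" (table), "r" (index)
             : "memory");      /* the asm reads memory behind 'table' */
#else
    value = table[index];      /* portable fallback off x86           */
#endif
    return value;
}
```

Calling load_scaled(t, 2) on an int array t reads the same element as t[2]; the scale factor does the sizeof(int) multiplication that C pointer arithmetic performs implicitly.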

Suffixes.
As you may have noticed, the AT&T syntax mnemonics have a suffix. The significance of this
suffix is that of operand size. 'l' is for long, 'w' is for word, and 'b' is for byte. Intel syntax has
similar directives for use with memory operands, i.e. byte ptr, word ptr, dword ptr, "dword" of
course corresponding to "long". This is similar to type casting in C, but it doesn't seem to be
necessary when a register operand is present, since the size of the register used is the assumed data type.
Example:
Intel Syntax
    mov     al,bl
    mov     ax,bx
    mov     eax,ebx
    mov     eax, dword ptr [ebx]

AT&T Syntax
    movb    %bl,%al
    movw    %bx,%ax
    movl    %ebx,%eax
    movl    (%ebx),%eax

**NOTE: ALL EXAMPLES FROM HERE WILL BE IN AT&T SYNTAX**

Syscalls.
This section will outline the use of linux syscalls in assembly language. Syscalls consist of all the
functions in the second section of the manual pages located in /usr/man/man2. They are also
listed in: /usr/include/sys/syscall.h. A great list is at http://www.linuxassembly.org/syscall.html.
These functions can be executed via the linux interrupt service: int $0x80.

Syscalls with < 6 args.
For all syscalls, the syscall number goes in %eax. For syscalls that have fewer than six args, the
args go in %ebx, %ecx, %edx, %esi, %edi in order. The return value of the syscall is stored in
%eax.
The syscall number can be found in /usr/include/sys/syscall.h. The macros are defined as
SYS_<syscall name>, i.e. SYS_exit, SYS_close, etc.
Example: (Hello world program - it had to be done)
According to the write(2) man page, write is declared as:
    ssize_t write(int fd, const void *buf, size_t count);
Hence fd goes in %ebx, buf goes in %ecx, count goes in %edx and SYS_write goes in %eax.
This is followed by an int $0x80 which executes the syscall. The return value of the syscall is
stored in %eax.

$ cat write.s
.include "defines.h"
.data
hello:
        .string "hello world\n"

.globl  main
main:
        movl    $SYS_write,%eax
        movl    $STDOUT,%ebx
        movl    $hello,%ecx
        movl    $12,%edx
        int     $0x80

        ret
$

The same process applies to syscalls which have fewer than five args. Just leave the unused
registers unchanged. Syscalls such as open or fcntl which have an optional extra arg will know
what to use.
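
The same "number first, then the args in order" convention can be tried from C without writing any assembly. The sketch below uses the C library's generic syscall() function (see syscall(2)) instead of int $0x80, so it is not the article's method, only an analogue that also runs on 64-bit systems. The wrapper name raw_write is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>   /* SYS_write                      */
#include <unistd.h>        /* syscall(), pipe(), read()      */

/* Hypothetical wrapper: call write(2) by syscall number.
   SYS_write plays the role of the value loaded into %eax;
   fd, buf and count are the three args that the article
   places in %ebx, %ecx and %edx.                            */
static long raw_write(int fd, const void *buf, unsigned long count)
{
    return syscall(SYS_write, fd, buf, count);
}
```

For example, raw_write(1, "hello world\n", 12) prints the greeting on stdout and returns the number of bytes written, which is the value the assembly version would find in %eax after the interrupt.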

Syscalls with > 5 args.
Syscalls whose number of args is greater than five still expect the syscall number to be in %eax,
but the args are arranged in memory and the pointer to the first arg is stored in %ebx.
If you are using the stack, args must be pushed onto it backwards, i.e. from the last arg to the first
arg. Then the stack pointer should be copied to %ebx. Otherwise copy args to an allocated area
of memory and store the address of the first arg in %ebx.
Example: (mmap being the example syscall). Using mmap() in C:

#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>

#define STDOUT 1

int main(void) {
        char file[]="mmap.s";
        char *mappedptr;
        int fd,filelen;

        fd=open(file, O_RDONLY);        /* open(2), not fopen(3): we need a file descriptor */
        filelen=lseek(fd,0,SEEK_END);
        mappedptr=mmap(NULL,filelen,PROT_READ,MAP_SHARED,fd,0);
        write(STDOUT, mappedptr, filelen);
        munmap(mappedptr, filelen);
        close(fd);
        return 0;
}

Arrangement of mmap() args in memory:

%esp      %esp+4    %esp+8    %esp+12   %esp+16   %esp+20
00000000  filelen   00000001  00000001  fd        00000000

ASM Equivalent:
$ cat mmap.s
.include "defines.h"

.data
file:
        .string "mmap.s"
fd:
        .long   0
filelen:
        .long   0
mappedptr:
        .long   0

.globl main
main:
        push    %ebp
        movl    %esp,%ebp
        subl    $24,%esp

//      open($file, $O_RDONLY);

        movl    $fd,%ebx                // save fd
        movl    %eax,(%ebx)

//      lseek($fd,0,$SEEK_END);

        movl    $filelen,%ebx           // save file length
        movl    %eax,(%ebx)

        xorl    %edx,%edx

//      mmap(NULL,$filelen,PROT_READ,MAP_SHARED,$fd,0);
        movl    %edx,(%esp)
        movl    %eax,4(%esp)            // file length still in %eax
        movl    $PROT_READ,8(%esp)
        movl    $MAP_SHARED,12(%esp)
        movl    $fd,%ebx                // load file descriptor
        movl    (%ebx),%eax
        movl    %eax,16(%esp)
        movl    %edx,20(%esp)
        movl    $SYS_mmap,%eax
        movl    %esp,%ebx
        int     $0x80

        movl    $mappedptr,%ebx         // save ptr
        movl    %eax,(%ebx)

//      write($stdout, $mappedptr, $filelen);
//      munmap($mappedptr, $filelen);
//      close($fd);

        movl    %ebp,%esp
        popl    %ebp

        ret
$

**NOTE: The above source listing differs from the example source code found at the end of the
article. The code listed above does not show the other syscalls, as they are not the focus of this
section. The source above also only opens mmap.s, whereas the example source reads the
command line arguments. The mmap example also uses lseek to get the file size.**

Socket Syscalls.
Socket syscalls make use of only one syscall number: SYS_socketcall, which goes in %eax. The
socket functions are identified via subfunction numbers located in /usr/include/linux/net.h and
are stored in %ebx. A pointer to the syscall args is stored in %ecx. Socket syscalls are also
executed with int $0x80.

$ cat socket.s
.include "defines.h"

.globl  _start
_start:
        pushl   %ebp
        movl    %esp,%ebp
        sub     $12,%esp

//      socket(AF_INET,SOCK_STREAM,IPPROTO_TCP);
        movl    $AF_INET,(%esp)
        movl    $SOCK_STREAM,4(%esp)
        movl    $IPPROTO_TCP,8(%esp)

        movl    $SYS_socketcall,%eax
        movl    $SYS_socketcall_socket,%ebx
        movl    %esp,%ecx
        int     $0x80

        movl    $SYS_exit,%eax
        xorl    %ebx,%ebx
        int     $0x80

        movl    %ebp,%esp
        popl    %ebp
        ret

Command Line Arguments.
Command line arguments in linux executables are arranged on the stack. argc comes first,
followed by an array of pointers (**argv) to the strings on the command line followed by a
NULL pointer. Next comes an array of pointers to the environment (**envp). These are very
simply obtained in asm, and this is demonstrated in the example code (args.s).
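
The layout just described (the argv pointers, a NULL, then the envp pointers and another NULL) can be walked in C as a rough analogue of what args.s does with the stack pointer. Both helper names below are hypothetical and appear nowhere in the article; the sketch assumes only that the two arrays sit back to back in memory as stated above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count entries up to the NULL terminator. */
static size_t count_until_null(char *const *vec)
{
    size_t n = 0;
    while (vec[n] != NULL)
        n++;
    return n;
}

/* Hypothetical helper: the environment pointers start one slot
   past the NULL that terminates the argv pointers.              */
static char *const *envp_from_argv(int argc, char *const *argv)
{
    return argv + argc + 1;
}
```

On a startup stack image holding "prog", "-v", NULL, "HOME=/root", NULL, count_until_null() on the argv part gives 2, and envp_from_argv(2, argv) points at the "HOME=/root" entry.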

GCC Inline ASM.
This section on GCC inline asm will only cover x86 applications. Operand constraints will
differ on other processors. The location of the listing will be at the end of this article.
Basic inline assembly in gcc is very straightforward. In its basic form it looks like this:

        __asm__("movl %esp,%eax");      // look familiar ?

or

        __asm__("movl $1,%eax\n\t"      // SYS_exit
                "xor  %ebx,%ebx\n\t"
                "int  $0x80");

It is possible to use it more effectively by specifying the data that will be used as input and output
for the asm, as well as which registers will be modified. No particular input/output/modify field is
compulsory. It is of the format:

        __asm__("<asm routine>" : output : input : modify);

The output and input fields must consist of an operand constraint string followed by a C
expression enclosed in parentheses. The output operand constraints must be preceded by an '='
which indicates that it is an output. There may be multiple outputs, inputs, and modified
registers. Each "entry" should be separated by commas (',') and there should be no more than 10
entries total. The operand constraint string may either contain the full register name, or an
abbreviation.

Abbrev Table
Abbrev  Register
a       %eax/%ax/%al
b       %ebx/%bx/%bl
c       %ecx/%cx/%cl
d       %edx/%dx/%dl
S       %esi/%si
D       %edi/%di
m       memory

Example:

        __asm__("test %%eax,%%eax" : /* no output */ : "a"(foo));

OR

        __asm__("test %%eax,%%eax" : /* no output */ : "eax"(foo));

You can also use the keyword __volatile__ after __asm__: "You can prevent an `asm'
instruction from being deleted, moved significantly, or combined, by writing the keyword
`volatile' after the `asm'."
(Quoted from the "Assembler Instructions with C Expression Operands" section in the gcc info
files.)

$ cat inline1.c
#include <stdio.h>

int main(void) {
        int foo=10,bar=15;

        __asm__ __volatile__ ("addl %%ebx,%%eax"
                : "=eax"(foo)                   // output
                : "eax"(foo), "ebx"(bar)        // input
                : "eax"                         // modify
        );
        printf("foo+bar=%d\n", foo);
        return 0;
}
$

You may have noticed that registers are now prefixed with "%%" rather than '%'. This is
necessary when using the output/input/modify fields because register aliases based on the extra
fields can also be used. I will discuss these shortly.
Instead of writing "eax" and forcing the use of a particular register such as "eax" or "ax" or "al",
you can simply specify "a". The same goes for the other general purpose registers (as shown in
the Abbrev table). This seems useless when within the actual code you are using specific
registers, and hence gcc provides you with register aliases. There is a max of 10 (%0-%9), which
is also the reason why only 10 inputs/outputs are allowed.

$ cat inline2.c
int main(void) {
        long eax;
        short bx;
        char cl;

        __asm__("nop;nop;nop"); // to separate inline asm from the rest of
                                // the code
        __asm__ __volatile__ (
                "test %0,%0\n\t"
                "test %1,%1\n\t"
                "test %2,%2"
                : /* no outputs */
                : "a"((long)eax), "b"((short)bx), "c"((char)cl)
        );
        __asm__("nop;nop;nop");
        return 0;
}
$ gcc -o inline2 inline2.c
$ gdb ./inline2
GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you
are welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnulibc1"...
(no debugging symbols found)...
(gdb) disassemble main
Dump of assembler code for function main:
... start: inline asm ...
0x8048427: nop
0x8048428: nop
0x8048429: nop
0x804842a: mov 0xfffffffc(%ebp),%eax
0x804842d: mov 0xfffffffa(%ebp),%bx
0x8048431: mov 0xfffffff9(%ebp),%cl
0x8048434: test %eax,%eax
0x8048436: test %bx,%bx
0x8048439: test %cl,%cl
0x804843b: nop
0x804843c: nop
0x804843d: nop
... end: inline asm ...
End of assembler dump.
$

As you can see, the code that was generated from the inline asm loads the values of the variables
into the registers they were assigned to in the input field and then proceeds to carry out the actual
code. The compiler auto-detects operand size from the size of the variables, and so the
correspondingly sized registers are represented by the aliases %0, %1 and %2. (Specifying the operand
size in the mnemonic when using the register aliases may cause errors while compiling.)
The aliases may also be used in the operand constraints. This does not allow you to specify more
than 10 entries in the input/output fields. The only use for this I can think of is when you specify
the operand constraint as "q", which allows the compiler to choose between the a, b, c, d registers.
When this register is modified we will not know which register has been chosen and
consequently cannot specify it in the modify field. In that case you can simply specify
"<number>".
Example:

$ cat inline3.c
#include <stdio.h>

int main(void) {
        long eax=1,ebx=2;

        __asm__ __volatile__ ("add %0,%2"
                : "=b"((long)ebx)
                : "a"((long)eax), "q"(ebx)
                : "2"
        );
        printf("ebx=%x\n", ebx);
        return 0;
}
$
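
As a compact recap of the output/input/modify fields and the %0/%1 register aliases, here is a small sketch that lets the compiler pick the registers itself. The function name add_ints is hypothetical. The "r" (any general register) constraint, the "+" (read-write) modifier and the "cc" clobber are taken from the GCC manual rather than from the article, so treat the sketch as an illustration under those assumptions.

```c
#include <assert.h>

/* Hypothetical helper: foo + bar computed by an inline addl. */
static inline int add_ints(int foo, int bar)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("addl %1, %0"      /* %0 is foo, %1 is bar            */
             : "+r" (foo)       /* output, also read as an input   */
             : "r" (bar)        /* input, any general register     */
             : "cc");           /* modify: addl changes the flags  */
#else
    foo += bar;                 /* portable fallback off x86       */
#endif
    return foo;
}
```

add_ints(10, 15) gives 25, the same sum inline1.c prints, but without pinning the operands to %eax and %ebx, which leaves the register allocator free to choose.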

Compiling
Compiling assembly language programs is much like compiling normal C programs. If your
program looks like Listing 1, then you would compile it like you would a C app. If you use _start
instead of main, like in Listing 2, you would compile the app slightly differently:

Listing 1
$ cat write.s
.data
hw:
        .string "hello world\n"
.text
.globl main
main:
        movl    $SYS_write,%eax
        movl    $1,%ebx
        movl    $hw,%ecx
        movl    $12,%edx
        int     $0x80
        movl    $SYS_exit,%eax
        xorl    %ebx,%ebx
        int     $0x80
        ret
$ gcc -o write write.s
$ wc -c ./write
   4790 ./write
$ strip ./write
$ wc -c ./write
   2556 ./write

Listing 2
$ cat write.s
.data
hw:
        .string "hello world\n"
.text
.globl _start
_start:
        movl    $SYS_write,%eax
        movl    $1,%ebx
        movl    $hw,%ecx
        movl    $12,%edx
        int     $0x80
        movl    $SYS_exit,%eax
        xorl    %ebx,%ebx
        int     $0x80

$ gcc -c write.s
$ ld -s -o write write.o
$ wc -c ./write
    408 ./write

The -s switch is optional; it just creates a stripped ELF executable which is smaller than a
non-stripped one. This method (Listing 2) also creates smaller executables, since the compiler
isn't adding extra entry and exit routines as would normally be the case.

Links.
Further reference.
http://www.linuxassembly.org
GNU Assembler Manual
GNU C Compiler Manual
GNU Debugger Manual
Operand Constraint Reference
AT&T Syntax Reference

Example Code
args.s          Reads command line arguments passed to the prog
daemon.s        Binds a shell to a port (backdoor style)
mmap.s          Maps a file to memory, and dumps its contents
socket.s        Creates a socket
write.s         Hello world!
linasm-src.tgz  Makefile defines.h args.s daemon.s socket.s write.s

========================================================================
LINUX ASSEMBLER TUTORIAL
by
Robin Miyagi
@
http://www.geocities.com/SiliconValley/Ridge/2544/
========================================================================
start @: Thu Feb 03 02:14:37 UTC 2000
update:  Fri Jul 30 23:52:23 UTC 2000
update:  Fri Sep 15 22:39:17 UTC 2000:
    This tutorial now explains Linux assembler in terms of the GNU
    assembler as.
    Information about Binutils programs such as Objdump, and ld. Discussion on Debugging and gdb is
    added.
update:  Thu Jan 11 20:13:06 UTC 2001:
========================================================================
*Introduction
When programming in assembler for Linux (or any other Unix variant for that matter), it is important to
remember that Linux is a protected mode operating system (on i386 machines, Linux operates the CPU in
protected mode). This means that ordinary user mode processes are not allowed to do certain things, such
as access DMA, or access IO ports. Linux kernel modules, on the other hand (which operate in
kernel mode), are allowed to access hardware directly (read the Assembler HOWTO on my assembler
page for more information on this issue). User mode processes may access hardware using device files.
Device files actually access kernel modules which access hardware directly. This file will be restricted to
user mode operation. See my pages on kernel module programming.
Please email me comments and suggestions regarding this tutorial at penguin@dccnet.com.
* System Calls
In programming in assembler for DOS you probably made use of software interrupts, especially the int 0x21
functions which were the DOS system calls. In Linux, system calls are made via int 0x80. The system call
number is passed via register EAX, and the parameters to the system call are passed via the remaining
registers. This discussion only applies if there are no more than five parameters passed to the system call. If
there are more than 5 parameters, the parameters must be located in memory (e.g. on the stack), and EBX
must contain the address of the beginning of the parameters.
If you would like a list of the system call numbers, look at the
contents of /usr/include/asm/unistd.h. If you would like information about a specific system call (e.g. write ()),
type man 2 write at the prompt. Section 2 of the linux man pages covers system calls.
If you look at the contents of /usr/include/asm/unistd.h, you will see the following line near the top of the file
    #define __NR_write 4
This indicates that register EAX must be set to 4 in order to call the write () system call. Now, if you execute
the following command
    $ man 2 write
you get the following function description (under the SYNOPSIS heading).
    ssize_t write(int fd, const void *buf, size_t count);
This indicates that ebx is equal to the file descriptor of the file you want to write to, ecx is a pointer to the
string you want to write, and edx contains the length of the string. If there were 2 more parameters to this
system call, they would be placed in esi and edi respectively.
How do I know the file descriptor for stdout is 1? If you look at your /dev directory, you will notice that
/dev/stdout is a symbolic link that points to /proc/self/fd/1. Therefore stdout is file descriptor 1.
I leave looking up the _exit system call as an exercise.
In linux, system calls are processed by the kernel.
* GNU Assembler
On most Linux systems, you will usually find the GNU C compiler (gcc). This compiler uses an assembler
called `as' as a backend. This means that the C compiler translates the C code into assembler, which in
turn is assembled by `as' to an object file (*.o).
As uses the AT&T syntax. Experienced intel syntax assembler programmers find AT&T really weird. It is
really no more or no less difficult than intel syntax. I switched over to as because there is less ambiguity, and
it works better with the standard GNU/Linux programs such as gdb (supports the gstabs format) and objdump
(objdump disassembles code in as syntax). In short, it is a standard component of a GNU Linux system
with programming tools installed. I will explain debugging and objdump later in this tutorial.
If you would like more information about as, look in the info documentation under as (e.g. type info as at
the shell prompt). Also look in the info documentation on the Binutils package (this package contains such
programming tools as objdump, ld, etc.).
** GNU assembler v.s. Intel Syntax

Since most assembler documentation for the i386 platform is written using intel syntax, some comparison
between the 2 formats is in order. Here is a summarized list of the differences
- In as the source comes before the destination, opposite to
  the intel syntax.
- The opcodes are suffixed with a letter indicating the size of
  the operands (e.g. l for dword, w for word, b for byte).
- Immediate values must be prefixed with a $, and registers must
  be prefixed with a %.
- Effective addresses use the general syntax
  DISP(BASE,INDEX,SCALE). A concrete example would be
      movl mem_location(%ebx,%ecx,4), %eax
  which is equivalent to the following in intel syntax
      mov eax, [ebx + ecx*4 + mem_location]
Now for an example illustrating the difference (intel version in comments)
    movl %eax, %ebx         # mov ebx, eax
    movw $0x3c4a, %ax       # mov ax, 0x3c4a
Now for our little program
## helloworld.s
## by Robin Miyagi
## http://www.geocities.com/SiliconValley/Ridge/2544/
## Compile Instructions:
## ---------------------
## as -o helloworld.o helloworld.s
## ld -O0 -o helloworld helloworld.o
## This file is a basic demonstration of the GNU assembler,
## `as'.
## This program displays a friendly string on the screen using
## the write () system call
########################################################################
        .section .data
hello:
        .ascii  "Hello, world!\n"
hello_len:
        .long   . - hello
########################################################################
        .section .text
        .globl _start
_start:
        ## display string using write () system call
        movl    $4, %eax                # write () system call
        xorl    %ebx, %ebx              # %ebx = 0
        incl    %ebx                    # %ebx = 1, fd = stdout
        leal    hello, %ecx             # %ecx ---> hello
        movl    hello_len, %edx         # %edx = count
        int     $0x80                   # execute write () system call

        ## terminate program via _exit () system call
        xorl    %eax, %eax              # %eax = 0
        incl    %eax                    # %eax = 1 system call _exit ()
        xorl    %ebx, %ebx              # %ebx = 0 normal program return code
        int     $0x80                   # execute system call _exit ()

In the above program, notice the use of # to start comments. As also supports the /* C comment */ syntax.
If you use the C comment syntax, it works exactly the same as for C (multiple lines, as well as inline
commenting). I always use the # comment syntax, as this works better with emacs asm-mode. The double
## is allowed but not necessary (this is only because of a quirk of emacs asm-mode).
Notice the names of the sections .text and .data. These are used in ELF files to tell the linker where the code
and data segments are. There is also the .bss section to store uninitialized data. It is only these sections that
occupy memory during program execution.
* Accessing Command Line Arguments and Environment Variables
When an ELF executable starts running, the command line arguments and environment variables are
available on the stack. In assembler this means that you may access these via the pointer stored in ESP
when the program starts execution. See the documentation on my assembler programming page relating to
the ELF binary format.
So how is this data arranged on the stack? Quite simple really. The number of command line arguments
(including the name of the program) is stored as an integer at [esp]. Then, at [esp+4], a pointer to the first
command line argument (which is the name of the program) is stored. If there were any additional command
line parameters, their pointers would be stored in [esp+8], [esp+12], etc. After all the command line
argument pointers comes a NULL pointer. After the NULL pointer are all the pointers to the environment
variables, and then finally a NULL pointer to indicate that the end of the environment variables has been
reached.
A summary of the initial ELF stack is shown below

    (%esp)      argc, count of arguments (integer)
    4(%esp)     char *argv (pointer to first command line argument)
    ...         pointers to the rest of the command line arguments
    ?(%esp)     NULL pointer
    ...         pointers to environment variables
    ??(%esp)    NULL pointer
Now for our little program
## stackparam.s ########################################################
## Robin Miyagi ########################################################
## http://www.geocities.com/SiliconValley/Ridge/2544/ ##################
## This file shows how one can access command line parameters
## via the stack at process start up. This behavior is defined
## in the ELF specification.
## Compile Instructions:
## ---------------------
## as -o stackparam.o stackparam.s
## ld -O0 -o stackparam stackparam.o
########################################################################
        .section .data
new_line_char:
        .byte 0x0a
########################################################################
        .section .text
        .globl _start
        .align 4
_start:
        movl %esp, %ebp         # store %esp in %ebp
again:
        addl $4, %esp           # %esp ---> next parameter on stack
        movl (%esp), %eax       # move next parameter into %eax
        testl %eax, %eax        # %eax (parameter) == NULL pointer?
        jz end_again            # get out of loop if yes
        call putstring          # output parameter to stdout.
        jmp again               # repeat loop
end_again:
        xorl %eax, %eax         # %eax = 0
        incl %eax               # %eax = 1, system call _exit ()
        xorl %ebx, %ebx         # %ebx = 0, normal program exit.
        int $0x80               # execute _exit () system call

        ## prints string to stdout
putstring:      .type @function
        pushl %ebp
        movl %esp, %ebp
        movl 8(%ebp), %ecx
        xorl %edx, %edx
count_chars:
        movb (%ecx,%edx,$1), %al
        testb %al, %al
        jz done_count_chars
        incl %edx
        jmp count_chars
done_count_chars:
        movl $4, %eax
        xorl %ebx, %ebx
        incl %ebx
        int $0x80
        movl $4, %eax
        leal new_line_char, %ecx
        xorl %edx, %edx
        incl %edx
        int $0x80
        movl %ebp, %esp
        popl %ebp
        ret

* The Binutils Package
Binutils stands for binary utilities, and includes a lot of tools useful to programmers, especially during
debugging.
I will now address some of these utilities.
** Objdump
Objdump displays information about 1 or more object files. For example, to see information about
stackparam, type the following command at the shell prompt (be sure the working directory contains stackparam)
    objdump -x stackparam | less
Since the information is likely to span more than one screen, the output of objdump is piped to the standard
input of the paging command less. The option -x tells objdump to display all available header
information, including the section headers and the symbol table. Here is the output of the above command
stackparam:     file format elf32-i386
stackparam
architecture: i386, flags 0x00000112:
EXEC_P, HAS_SYMS, D_PAGED
start address 0x08048074

Program Header:
    LOAD off    0x00000000 vaddr 0x08048000 paddr 0x08048000 align 2**12
         filesz 0x000000be memsz 0x000000be flags r-x
    LOAD off    0x000000c0 vaddr 0x080490c0 paddr 0x080490c0 align 2**12
         filesz 0x00000001 memsz 0x00000004 flags rw-

Sections:
Idx Name    Size      VMA       LMA       File off  Algn
  0 .text   0000004a  08048074  08048074  00000074  2**2
            CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .data   00000001  080490c0  080490c0  000000c0  2**2
            CONTENTS, ALLOC, LOAD, DATA
  2 .bss    00000000  080490c4  080490c4  000000c4  2**2
            ALLOC
SYMBOL TABLE:
08048074 l    d  .text  00000000
080490c0 l    d  .data  00000000
080490c4 l    d  .bss   00000000
00000000 l    d  *ABS*  00000000
00000000 l    d  *ABS*  00000000
00000000 l    d  *ABS*  00000000
080490c0 l       .data  00000000 new_line_char
08048076 l       .text  00000000 again
08048087 l       .text  00000000 end_again
0804808e l       .text  00000000 putstring
08048096 l       .text  00000000 count_chars
080480a0 l       .text  00000000 done_count_chars
00000000       F *UND*  00000000
080480be g     O *ABS*  00000000 _etext
08048074 g       .text  00000000 _start
080490c1 g     O *ABS*  00000000 __bss_start
080490c1 g     O *ABS*  00000000 _edata
080490c4 g     O *ABS*  00000000 _end

Notice the information provided from the program header (ELF files have header information at the
beginning of the file giving information to the kernel on how to load the file into memory etc.).
ELF files also contain information about the sections (contained in section tables). Notice that the .text
section contains 0x4a bytes of information, is located 0x74 bytes into the file, is aligned at a 4 byte
boundary (4 == 2 ** 2), has memory allocated to it (ALLOC), is read-only, and contains code (the segment
selector cs for this process points to this section (handled by the operating system)).
Information about the symbols is also provided. All this information is used by debuggers and other
programming tools to examine binary files.
Objdump can also be used to disassemble binary executables. Typing the following command will
disassemble the file to standard output (this does nothing to the actual file, as objdump only reads from the
file)
    objdump -d stackparam | less
Here is the output of the above command
stackparam:     file format elf32-i386

Disassembly of section .text:

08048074 <_start>:
 8048074:  89 e5                movl   %esp,%ebp

08048076 <again>:
 8048076:  83 c4 04             addl   $0x4,%esp
 8048079:  8b 04 24             movl   (%esp,1),%eax
 804807c:  85 c0                testl  %eax,%eax
 804807e:  74 07                je     8048087 <end_again>
 8048080:  e8 09 00 00 00       call   804808e <putstring>
 8048085:  eb ef                jmp    8048076 <again>

08048087 <end_again>:
 8048087:  31 c0                xorl   %eax,%eax
 8048089:  40                   incl   %eax
 804808a:  31 db                xorl   %ebx,%ebx
 804808c:  cd 80                int    $0x80

0804808e <putstring>:
 804808e:  55                   pushl  %ebp
 804808f:  89 e5                movl   %esp,%ebp
 8048091:  8b 4d 08             movl   0x8(%ebp),%ecx
 8048094:  31 d2                xorl   %edx,%edx

08048096 <count_chars>:
 8048096:  8a 04 11             movb   (%ecx,%edx,1),%al
 8048099:  84 c0                testb  %al,%al
 804809b:  74 03                je     80480a0 <done_count_chars>
 804809d:  42                   incl   %edx
 804809e:  eb f6                jmp    8048096 <count_chars>

080480a0 <done_count_chars>:
 80480a0:  b8 04 00 00 00       movl   $0x4,%eax
 80480a5:  31 db                xorl   %ebx,%ebx
 80480a7:  43                   incl   %ebx
 80480a8:  cd 80                int    $0x80
 80480aa:  b8 04 00 00 00       movl   $0x4,%eax
 80480af:  8d 0d c0 90 04 08    leal   0x80490c0,%ecx
 80480b5:  31 d2                xorl   %edx,%edx
 80480b7:  42                   incl   %edx
 80480b8:  cd 80                int    $0x80
 80480ba:  89 ec                movl   %ebp,%esp
 80480bc:  5d                   popl   %ebp
 80480bd:  c3                   ret

The -d tells objdump to disassemble sections that are expected to contain code (usually the .text section).
Using the -D option will disassemble all sections. Objdump was able to give the names of labels in the code
because of the information contained in the symbol table.
The first column displays the virtual memory address for each line of code. The second column displays the
machine code corresponding to its respective assembler line of code, and finally the code in assembler is
contained in the 3rd column.
For more information look in the info documentation system.
** Getting the amount of memory used with size
If you do an ls -l stackparam you get the following
    -rwxrwxr-x 1 robin robin 932 Sep 15 18:21 stackparam
This tells you that the file is 932 bytes long. However this file also contains header tables, section tables,
symbol tables etc. The amount of memory that this program will use during run time will be less than this.
To find out actual memory use, type the following
    size stackparam
The above will result in the following output
    text    data    bss     dec     hex     filename
    74      1       0       75      4b      stackparam
This tells you that .text occupies 74 bytes, and .data occupies one byte, for a total of 75 bytes memory use.
** Getting rid of symbol information with strip
The strip command can be used to get rid of the symbol information. With no options, this command only
strips symbols that are not used for debugging. With the --strip-all option provided, it will strip
all symbol information, including those used for debugging. I recommend not doing this, as this makes the
files harder to analyse with the standard programming tools. This command is used only if file size is of
paramount importance.
* debugging and gdb
Perhaps the most difficult aspect of programming is debugging. Quite often the error that caused the
program to terminate abnormally is not at the line where the program terminated (the example later on will
show this).
Program that exits with SIG_SEGV
## stackparamerror.s ###################################################
## Robin Miyagi ########################################################
## http://www.geocities.com/SiliconValley/Ridge/2544/ ##################
## This file shows how one can access command line parameters
## via the stack at process start up. This behavior is defined
## in the ELF specification.
## Compile Instructions:
## ---------------------
## as --gstabs -o stackparamerror.o stackparamerror.s
## ld -O0 -o stackparamerror stackparamerror.o
########################################################################
        .section .data
new_line_char:
        .byte 0x0a
########################################################################
        .section .text
        .globl _start
        .align 4
_start:
        movl %esp, %ebp         # store %esp in %ebp
again:
        addl $4, %esp           # %esp ---> next parameter on stack
        leal (%esp), %eax       # move next parameter into %eax
        testl %eax, %eax        # %eax (parameter) == NULL pointer?
        jz end_again            # get out of loop if yes
        call putstring          # output parameter to stdout.
        jmp again               # repeat loop
end_again:
        xorl %eax, %eax         # %eax = 0
        incl %eax               # %eax = 1, system call _exit ()
        xorl %ebx, %ebx         # %ebx = 0, normal program exit.
        int $0x80               # execute _exit () system call

        ## prints string to stdout
putstring:      .type @function
        pushl %ebp
        movl %esp, %ebp
        movl 8(%ebp), %ecx
        xorl %edx, %edx
count_chars:
        movb (%ecx,%edx,$1), %al
        testb %al, %al
        jz done_count_chars
        incl %edx
        jmp count_chars
done_count_chars:
        movl $4, %eax
        xorl %ebx, %ebx
        incl %ebx
        int $0x80
        movl $4, %eax
        leal new_line_char, %ecx
        xorl %edx, %edx
        incl %edx
        int $0x80
        movl %ebp, %esp
        popl %ebp
        ret

Notice that the above program is assembled with the --gstabs option of as. This makes as put debugging
information in the output file, such as the original source file, debugging symbols etc. Using objdump -x
stackparamerror | less will show you the inclusion of debugging symbols.
Now to find out where our error occurred type the following command
    gdb stackparamerror
this will get you to the gdb prompt (gdb)
    (gdb) run eat my shorts
    /home/robin/programming/asmtut/stackparamerror
    eat
    my
    shorts
    Program received SIGSEGV, segmentation fault
    count_chars () at stackparamerror.s:47
    47      movb (%ecx,%edx,$1),%al
    Current language: auto; currently asm
    (gdb) q
    [~]$ _
(gdb will output more than this, I just wanted to highlight what is important).
This tells us that the segmentation fault occurred at line 47 of stackparamerror.s. However the problem was
caused in line 29. If you look at line 29 of stackparam.s, you will see that this line reads movl (%esp),
%eax, whereas stackparamerror.s uses leal (%esp), %eax. The lea opcode computes the address of its
operand instead of loading from it, so EAX receives the address of the stack slot (the value of ESP) rather
than the pointer stored there. EAX was therefore never loaded with 0 on a null pointer (just some non-zero
address), and the loop in _start never stopped normally, as the condition for breaking out of the loop is
eax being 0. putstring was eventually called on the NULL pointer, which caused line 47 to access an area
of memory not available to this process (hence the segmentation fault).
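
The difference between the two instructions can be mimicked in C, where *slot corresponds to what movl (%esp),%eax loads and slot itself corresponds to what leal (%esp),%eax computes. Both function names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Like `movl (%esp),%eax`: fetch the pointer stored in the slot. */
static const char *like_mov(const char *const *slot)
{
    return *slot;
}

/* Like `leal (%esp),%eax`: yield the address of the slot itself,
   which is never NULL for a slot that exists on the stack.       */
static const void *like_lea(const char *const *slot)
{
    return (const void *)slot;
}
```

For the slot holding the terminating NULL of the argument list, like_mov() returns NULL while like_lea() returns a non-NULL address, which is why a NULL test on the lea result can never end the loop.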
Debuggingisanartthatcomeswithpractice.Formoreinformationaboutgdb,lookintheinfopages(e.g.
infogdb).Youcanalso

typehelpatthe(gdb)prompt.
The only reasongdbwasable to tellyouwhat line number inthesourcecodetheerroroccuredisthatthe
debugging symbols and source code was included in the output file (recall that we used the gstabs
option).
Comments and suggestions <penguin@dccnet.com>
========================================================================
You are free to make verbatim copies of this file, providing that this notice is preserved.

Introduction to GCC Inline Asm
By Robin Miyagi
http://www.geocities.com/SiliconValley/Ridge/2544/
Wed Sep 13 19:18:50 UTC
* as and AT&T Syntax

The GNU C Compiler uses the assembler as as a backend. This assembler uses AT&T syntax. Here is a brief overview of the syntax. For more information about as, look in the system info documentation.

as uses the form

        mnemonic source, destination    (opposite to Intel syntax)

as prefixes registers with %, and prefixes numeric constants with $.

Effective addresses use the following general syntax:

        SECTION:DISP(BASE, INDEX, SCALE)

As in other assemblers, any one or more of these components may be omitted, within the constraints of valid Intel instruction syntax. Inside an effective address, neither the displacement nor the scale takes a $ prefix. The above syntax was shamelessly copied from the info pages under the i386-dependent features of as.

as suffixes the assembler mnemonics with a letter indicating the operand size (b for byte, w for word, l for long word). Read the info pages for more information, such as suffixes for floating point registers etc.

Example code (raw asm, not gcc inline):

        movl %eax, %ebx                 /* intel: mov ebx, eax */
        movl $56, %esi                  /* intel: mov esi, 56 */
        movl %ecx, label(%edx,%ebx,4)   /* intel: mov [edx+ebx*4+label], ecx */
        movb %ah, (%ebx)                /* intel: mov [ebx], ah */

Notice that as uses C comment syntax. as can also use #, which works the same way as ; does in most Intel assemblers.
Above code in inline asm:

        __asm__ ("movl %eax, %ebx\n\t"
                 "movl $56, %esi\n\t"
                 "movl %ecx, label(%edx,%ebx,4)\n\t"
                 "movb %ah, (%ebx)");

Notice that in the above example, the __ prefixing and suffixing asm are not necessary, but may prevent name conflicts in your program. You can read more about this in [C extensions | extended asm] under the info documentation for gcc.

Also notice the \n\t at the end of each line except the last, and that each line is enclosed in quotes. This is because gcc sends each as instruction to as as a string. The newline/tab combination is required so that the lines are fed to as according to the correct format (recall that each line in assembler is indented one tab stop, generally 8 characters).
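You can also use labels from your C code (variable names and such). This works for variables that have a symbol of their own, that is, global or static ones; a local automatic variable has no label the assembler can see. In Linux, underscores prefixing C variables are not necessary in your code, e.g.

        int Cvariable;          /* global, so the assembler sees the symbol */

        int main (void) {
                __asm__ ("movl Cvariable, %eax");       /* Cvariable contents -> eax */
                __asm__ ("movl $Cvariable, %ebx");      /* ebx -> Cvariable (its address) */
                return 0;
        }

Notice that the documentation for DJGPP will say that the underscore is necessary. The difference is due to the differences between DJGPP's COFF format and Linux's ELF format. I am not certain, but I think that the old Linux a.out object files also use underscores (please contact me if you have comments on this).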
* Extended Asm

The code in the above example will most probably cause conflicts with the rest of your C code, especially with compiler optimizations (recall that gcc is an optimizing compiler). Any registers used in your code may be used to hold C variable data from the rest of your program. You would not want to inadvertently modify a register without telling gcc to take this into account when compiling. This is where extended asm comes into play.

Extended asm allows you to specify input registers, output registers, and clobbered registers as interface information to your block of asm code. You can even allow gcc to choose actual physical CPU registers automatically, which will probably fit into gcc's optimization scheme better. An example will demonstrate extended asm better.
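Example code:

        #include <stdlib.h>

        int main (void) {
                int operand1, operand2, sum, accumulator;

                operand1 = rand ();
                operand2 = rand ();

                /* sum = operand1 + operand2; "=&r" keeps the output register
                   away from the inputs, because it is written before %2 is read */
                __asm__ ("movl %1, %0\n\t"
                         "addl %2, %0"
                         : "=&r" (sum)                          /* output operands */
                         : "r" (operand1), "r" (operand2)       /* input operands */
                         : "cc");                               /* clobbered operands */

                accumulator = sum;

                /* accumulator += operand1 + operand2; %1 is the input copy of
                   accumulator, tied to %0 by the "0" constraint */
                __asm__ ("addl %2, %0\n\t"
                         "addl %3, %0"
                         : "=r" (accumulator)
                         : "0" (accumulator), "g" (operand1), "r" (operand2)
                         : "cc");

                return accumulator;
        }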
The first line that begins with : specifies the output operands, the second indicates the input operands, and the last indicates the clobbered operands. The "r", "g", and "0" are examples of constraints. Output constraints must be prefixed with an =, as in "=r" (= is a constraint modifier, indicating write only). Input and output constraints must have their corresponding C argument included with them, enclosed in parentheses (this must not be done with the clobbered line; I figured this out after an hour of frustration). "r" means assign a general register for the argument; "g" means assign any register, memory or immediate integer for this.

Notice the use of "0", "1", "2" etc. These are used to ensure that when the same variable is indicated in more than one place in the extended asm, that variable is only mapped to one register. If you had merely used another "r", for example, the compiler may or may not assign this variable to the same register as before. You can surmise from this that "0" refers to the first operand listed, "1" to the second, etc. When these operands are used in the asm code, they are referred to as "%0", "%1" etc.
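
As a small, hedged illustration of the matching constraint just described, the sketch below adds a value into an accumulator. The function name add_into and the plain C fallback for non-x86 targets are assumptions made for this example and are not part of the original tutorial.

```c
/* Sketch only: add_into() is a hypothetical helper, not from the tutorial. */
int add_into(int acc, int value)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("addl %2, %0"
             : "=r" (acc)       /* %0: output, any general register          */
             : "0" (acc),       /* %1: input, forced into the same place as %0 */
               "g" (value)      /* %2: register, memory or immediate          */
             : "cc");           /* addl changes the condition codes           */
#else
    acc += value;               /* fallback so the sketch builds elsewhere    */
#endif
    return acc;
}
```

Calling add_into(40, 2) should yield 42. Without the "0" constraint the compiler would be free to place the incoming value of acc in a different register from the one it reads the result from.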
Summary of constraints. (copied from the system info documentation for gcc)

m
A memory operand is allowed, with any kind of address that the machine supports in general.

o
A memory operand is allowed, but only if the address is "offsettable". This means that adding a small integer (actually, the width in bytes of the operand, as determined by its machine mode) may be added to the address and the result is also a valid memory address.
For example, an address which is constant is offsettable; so is an address that is the sum of a register and a constant (as long as a slightly larger constant is also within the range of address-offsets supported by the machine); but an autoincrement or autodecrement address is not offsettable. More complicated indirect/indexed addresses may or may not be offsettable depending on the other addressing modes that the machine supports.
Note that in an output operand which can be matched by another operand, the constraint letter 'o' is valid only when accompanied by both '<' (if the target machine has predecrement addressing) and '>' (if the target machine has preincrement addressing).

V
A memory operand that is not offsettable. In other words, anything that would fit the 'm' constraint but not the 'o' constraint.

<
A memory operand with autodecrement addressing (either predecrement or postdecrement) is allowed.

>
A memory operand with autoincrement addressing (either preincrement or postincrement) is allowed.

r
A register operand is allowed provided that it is in a general register.

d, a, f, ...
Other letters can be defined in machine-dependent fashion to stand for particular classes of registers. 'd', 'a' and 'f' are defined on the 68000/68020 to stand for data, address and floating point registers.

i
An immediate integer operand (one with constant value) is allowed. This includes symbolic constants whose values will be known only at assembly time.

n
An immediate integer operand with a known numeric value is allowed. Many systems cannot support assembly-time constants for operands less than a word wide. Constraints for these operands should use 'n' rather than 'i'.
I, J, K, ... P
Other letters in the range 'I' through 'P' may be defined in a machine-dependent fashion to permit immediate integer operands with explicit integer values in specified ranges. For example, on the 68000, 'I' is defined to stand for the range of values 1 to 8. This is the range permitted as a shift count in the shift instructions.

E
An immediate floating operand (expression code const_double) is allowed, but only if the target floating point format is the same as that of the host machine (on which the compiler is running).

F
An immediate floating operand (expression code const_double) is allowed.

G, H
'G' and 'H' may be defined in a machine-dependent fashion to permit immediate floating operands in particular ranges of values.

s
An immediate integer operand whose value is not an explicit integer is allowed.
This might appear strange; if an insn allows a constant operand with a value not known at compile time, it certainly must allow any known value. So why use 's' instead of 'i'? Sometimes it allows better code to be generated.
For example, on the 68000 in a fullword instruction it is possible to use an immediate operand; but if the immediate value is between -128 and 127, better code results from loading the value into a register and using the register. This is because the load into the register can be done with a moveq instruction. We arrange for this to happen by defining the letter 'K' to mean "any integer outside the range -128 to 127", and then specifying 'Ks' in the operand constraints.

g
Any register, memory or immediate integer operand is allowed, except for registers that are not general registers.
X
Any operand whatsoever is allowed, even if it does not satisfy general_operand. This is normally used in the constraint of a match_scratch when certain alternatives will not actually require a scratch register.

0, 1, 2, ... 9
An operand that matches the specified operand number is allowed. If a digit is used together with letters within the same alternative, the digit should come last.
This is called a "matching constraint" and what it really means is that the assembler has only a single operand that fills two roles considered separate in the RTL insn. For example, an add insn has two input operands and one output operand in the RTL, but on most CISC machines an add instruction really has only two operands, one of them an input-output operand:

        addl #35,r12

Matching constraints are used in these circumstances. More precisely, the two operands that match must include one input-only operand and one output-only operand. Moreover, the digit must be a smaller number than the number of the operand that uses it in the constraint.
For operands to match in a particular case usually means that they are identical-looking RTL expressions. But in a few special cases specific kinds of dissimilarity are allowed. For example, *x as an input operand will match *x++ as an output operand. For proper results in such cases, the output template should always use the output-operand's number when printing the operand.

p
An operand that is a valid memory address is allowed. This is for "load address" and "push address" instructions.
'p' in the constraint must be accompanied by address_operand as the predicate in the match_operand. This predicate interprets the mode specified in the match_operand as the mode of the memory reference for which the address would be valid.

Q, R, S, ... U
Letters in the range 'Q' through 'U' may be defined in a machine-dependent fashion to stand for arbitrary operand types. The machine description macro EXTRA_CONSTRAINT is passed the operand as its first argument and the constraint letter as its second operand.
A typical use for this would be to distinguish certain types of memory references that affect other insn operands.
Do not define these constraint letters to accept register references (reg); the reload pass does not expect this and would not handle it properly.
In order to have valid assembler code, each operand must satisfy its constraint. But a failure to do so does not prevent the pattern from applying to an insn. Instead, it directs the compiler to modify the code so that the constraint will be satisfied. Usually this is done by copying an operand into a register.

Contrast, therefore, the two instruction patterns that follow:

        (define_insn ""
          [(set (match_operand:SI 0 "general_operand" "=r")
                (plus:SI (match_dup 0)
                         (match_operand:SI 1 "general_operand" "r")))]
          ""
          "...")

which has two operands, one of which must appear in two places, and

        (define_insn ""
          [(set (match_operand:SI 0 "general_operand" "=r")
                (plus:SI (match_operand:SI 1 "general_operand" "0")
                         (match_operand:SI 2 "general_operand" "r")))]
          ""
          "...")

which has three operands, two of which are required by a constraint to be identical. If we are considering an insn of the form

        (insn N PREV NEXT
          (set (reg:SI 3)
               (plus:SI (reg:SI 6) (reg:SI 109)))
          ...)

the first pattern would not apply at all, because this insn does not contain two identical subexpressions in the right place. The pattern would say, "That does not look like an add instruction; try other patterns." The second pattern would say, "Yes, that's an add instruction, but there is something wrong with it." It would direct the reload pass of the compiler to generate additional insns to make the constraint true. The results might look like this:

        (insn N2 PREV N
          (set (reg:SI 3) (reg:SI 6))
          ...)

        (insn N N2 NEXT
          (set (reg:SI 3)
               (plus:SI (reg:SI 3) (reg:SI 109)))
          ...)

It is up to you to make sure that each operand, in each pattern, has constraints that can handle any RTL expression that could be present for that operand. (When multiple alternatives are in use, each pattern must, for each possible combination of operand expressions, have at least one alternative which can handle that combination of operands.) The constraints don't need to *allow* any possible operand (when this is the case, they do not constrain), but they must at least point the way to reloading any possible operand so that it will fit.
* If the constraint accepts whatever operands the predicate permits, there is no problem: reloading is never necessary for this operand.
For example, an operand whose constraints permit everything except registers is safe provided its predicate rejects registers.
An operand whose predicate accepts only constant values is safe provided its constraints include the letter 'i'. If any possible constant value is accepted, then nothing less than 'i' will do; if the predicate is more selective, then the constraints may also be more selective.

* Any operand expression can be reloaded by copying it into a register. So if an operand's constraints allow some kind of register, it is certain to be safe. It need not permit all classes of registers; the compiler knows how to copy a register into another register of the proper class in order to make an instruction valid.

* A nonoffsettable memory reference can be reloaded by copying the address into a register. So if the constraint uses the letter 'o', all memory references are taken care of.

* A constant operand can be reloaded by allocating space in memory to hold it as preinitialized data. Then the memory reference can be used in place of the constant. So if the constraint uses the letters 'o' or 'm', constant operands are not a problem.

* If the constraint permits a constant and a pseudo register used in an insn was not allocated to a hard register and is equivalent to a constant, the register will be replaced with the constant. If the predicate does not permit a constant and the insn is re-recognized for some reason, the compiler will crash. Thus the predicate must always recognize any objects allowed by the constraint.

If the operand's predicate can recognize registers, but the constraint does not permit them, it can make the compiler crash. When this operand happens to be a register, the reload pass will be stymied, because it does not know how to copy a register temporarily into memory.

If the predicate accepts a unary operator, the constraint applies to the operand. For example, the MIPS processor at ISA level 3 supports an instruction which adds two registers in SImode to produce a DImode result, but only if the registers are correctly sign extended. This predicate for the input operands accepts a sign_extend of an SImode register. Write the constraint to indicate the type of register that is required for the operand of the sign_extend.
The = in the "=r" is a constraint modifier; you can find more information about constraint modifiers in the gcc info under Machine Descriptions : Constraints : Modifiers.

I strongly recommend reading more in the system info documentation. If you haven't had much experience with the info reader (also accessible through emacs), learn it; it is an excellent source of information.

The gcc info documentation also explains how to use a specific CPU register for a constraint for various hardware including the i386. You can find this information under [gcc : Machine Desc : Constraints : Machine Constraints] in the info documentation.

You can name specific registers in the clobber list, e.g. "%eax". To tie an operand to a specific register, use the corresponding machine constraint instead (on the i386, "a" selects eax).

* __asm__ __volatile__

Because of the compiler's optimization mechanism, your code may not appear at exactly the location specified by the programmer. It may even be interspersed with the rest of the code. To prevent this, you can use __asm__ __volatile__ instead. Like the __ for asm, these are also not needed for volatile, but can prevent name conflicts.
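
A minimal sketch of the volatile form follows, assuming an x86 target; the function name bump and the C fallback are made up for this illustration. It uses the "+r" (read-write) constraint modifier from the gcc documentation, which this tutorial does not otherwise cover. Marking the statement volatile asks gcc to keep it even when it decides the result is unused, and not to merge it with identical statements.

```c
/* Sketch only: bump() is a hypothetical example function. */
unsigned int bump(unsigned int x)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__ ("incl %0"     /* x = x + 1                     */
                          : "+r" (x)    /* %0 is both read and written   */
                          :             /* no separate input operands    */
                          : "cc");      /* incl changes condition codes  */
#else
    x += 1;                             /* fallback for non-x86 targets  */
#endif
    return x;
}
```

The value wraps around like any unsigned increment, so bump(0xffffffffu) gives 0.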
========================================================================
Comments and suggestions <deltak@telus.net>

Linux Assembly Tutorial, CS 200
by Bjorn Chambless

Introduction

The following is designed to be a Linux equivalent to "Developing Assembly Language Programs on a PC" by Douglas V. Hall.

This tutorial requires the following:

an i386 family PC running Linux
as, the GNU assembler (included with any gcc installation)
ld, the GNU linker (also included with gcc)
gdb, the GNU debugger

The tutorial was developed on a 5.1 Redhat Linux installation running a 2.0.34 version kernel and the version 5 and 6 C language libraries with ELF file format. But I have tried to make the tutorial as general as possible with respect to Linux systems.

I highly recommend working through this tutorial with "as" and "gdb" documentation close at hand.

Overview

The process of developing an assembly program under Linux is somewhat different from development under NT. In order to accommodate object oriented languages, which require the compiler to create constructor and destructor methods which execute before and after the execution of "main", the GNU development model embeds user code within a wrapper of system code. In other words, the user's "main" is treated as a function call. An advantage of this is that the user is not required to initialize segment registers, though user code must obey some function requirements.
The Code

The following is the Linux version of the average temperature program. It will be referred to as "average.s". Note: Assembly language programs should use the ".s" suffix.

/* linux version of AVTEMP.ASM CS 200, fall 1998 */
.data                           /* beginning of data segment */

/* hi_temp data item */
        .type hi_temp,@object   /* declare as data object */
        .size hi_temp,1         /* declare size in bytes */
hi_temp:
        .byte 0x92              /* set value */

/* lo_temp data item */
        .type lo_temp,@object
        .size lo_temp,1
lo_temp:
        .byte 0x52

/* av_temp data item */
        .type av_temp,@object
        .size av_temp,1
av_temp:
        .byte 0

/* segment registers set up by linked code */
/* beginning of text(code) segment */
.text
        .align 4                /* set 4 doubleword alignment */
.globl main                     /* make main global for linker */
        .type main,@function    /* declare main as a function */
main:
        pushl %ebp              /* function requirement */
        movl %esp,%ebp          /* function requirement */
        movb hi_temp,%al
        addb lo_temp,%al
        movb $0,%ah
        adcb $0,%ah
        movb $2,%bl
        idivb %bl
        movb %al,av_temp
        leave                   /* function requirement */
        ret                     /* function requirement */

assembly instructions

This code may be assembled with the following command:

as -a --gstabs -o average.o average.s

The "-a" option prints a memory listing during assembly. This output gives the location of variables and code with respect to the beginnings of the data and code segments. "--gstabs" places debugging information in the executable (used by gdb). "-o" specifies average.o as the output file name (the default is a.out, which is confusing since the file is not executable.)

The object file (average.o) can then be linked to the Linux wrapper code in order to create an executable. These files are crt1.o, crti.o and crtn.o. crt1.o and crti.o provide initialization code and crtn.o does cleanup. These should all be located in "/usr/lib" but may be elsewhere on some systems. They, and their source, might be located by executing the following find command:

find / -name "crt*" -print

The link command is the following:

ld -m elf_i386 -static /usr/lib/crt1.o /usr/lib/crti.o -lc average.o /usr/lib/crtn.o

"-m elf_i386" instructs the linker to use the ELF file format. "-static" causes static rather than dynamic linking to occur. And "-lc" links in the standard C libraries (libc.a). It might be necessary to include "-L/libdirectory" in the invocation for ld to find the C library.

It will be necessary to change the mode of the resulting object file with "chmod +x ./a.out".

It should now be possible to execute the file. But, of course, there will be no output.

I recommend placing the above commands in a makefile.
debugging

The "--gstabs" option given to the assembler allows the assembly program to be debugged under gdb.

The first step is to invoke gdb:

gdb ./a.out

gdb should start with the following message:

[bjorn@pomade src]$ gdb ./a.out
GNU gdb 4.17
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux"...
(gdb)

The "l" command will list the program source code.

(gdb) l
1       /* linux version of AVTEMP.ASM CS 200, fall 1998 */
2       .data   /* beginning of data segment */
3
4       /* hi_temp data item */
5       .type hi_temp,@object   /* declare as data object */
6       .size hi_temp,1         /* declare size in bytes */
7       hi_temp:
8       .byte 0x92      /* set value */
9
10      /* lo_temp data item */
(gdb)

The first thing to do is set a breakpoint so it will be possible to step through the code.

(gdb) break main
Breakpoint 1 at 0x80480f7
(gdb)

This sets a breakpoint at the beginning of main. Now run the program.

(gdb) run
Starting program: /home/bjorn/src/./a.out

Breakpoint 1, main () at average.s:31
31      movb hi_temp,%al
Current language: auto; currently asm
(gdb)

Values in registers can be checked with either "info registers"

(gdb) info registers
eax     0x8059200       134582784
ecx     0xbffffd94      -1073742444
edx     0x0             0
ebx     0x8097bf0       134839280
esp     0xbffffdd8      0xbffffdd8
ebp     0xbffffdd8      0xbffffdd8
esi     0x1             1
edi     0x8097088       134836360
eip     0x80480f7       0x80480f7
eflags  0x246           582
cs      0x23            35
ss      0x2b            43
ds      0x2b            43
es      0x2b            43
fs      0x2b            43
gs      0x2b            43
(gdb)

...or "p/x $eax", which prints the value in the EAX register in hex. The "e" in front of the register name indicates a 32-bit register. The Intel x86 family has included "extended" 32-bit registers since the 80386. The E registers contain the X registers in the same way that the X registers contain the L and H registers. Linux also uses a "flat" and protected memory model rather than segmentation, thus the EIP stores the entire current address.

(gdb) p/x $eax
$4 = 0x8059200
(gdb)

The "p" command prints; "/x" indicates the output should be in hexadecimal.

Type "s" or "step" to step to the next instruction.

(gdb) step
32      addb lo_temp,%al
(gdb)

Notice that 92H has been loaded into the least significant byte of the EAX register (i.e. the AL register) by the movb instruction.

(gdb) p/x $eax
$6 = 0x8059292
(gdb)

And we continue stepping through the program...

(gdb) s
33      movb $0,%ah
(gdb) s
34      adcb $0,%ah
(gdb) s
35      movb $2,%bl
(gdb) s
36      idivb %bl
(gdb) s
37      movb %al,av_temp
(gdb) s
38      leave

and if we examine the EAX register and the variable av_temp after the final movb instruction, we see that they are set to the correct value, 72H.

(gdb) p/x $eax
$9 = 0x8050072
(gdb) p/x av_temp
$10 = 0x72
(gdb)

Note that during stepping the listed instruction is the one about to be executed.
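
The arithmetic that the stepping session walks through can be checked with a few lines of C. This is only a sketch: the function name average_temp is invented here, and it does not model the divide-error trap that idivb raises when the quotient does not fit in AL.

```c
#include <stdint.h>

/* Sketch mirroring main in average.s; average_temp() is a hypothetical name. */
uint8_t average_temp(uint8_t hi_temp, uint8_t lo_temp)
{
    unsigned int sum = (unsigned int)hi_temp + lo_temp;

    uint8_t al = (uint8_t)(sum & 0xff);   /* movb hi_temp,%al ; addb lo_temp,%al */
    uint8_t ah = (uint8_t)(sum >> 8);     /* movb $0,%ah ; adcb $0,%ah (carry)   */

    uint16_t ax = (uint16_t)((ah << 8) | al);
    return (uint8_t)(ax / 2);             /* movb $2,%bl ; idivb %bl -> %al      */
}
```

With the data values from the listing, 0x92 + 0x52 = 0xE4, and 0xE4 / 2 = 0x72, which matches the value gdb prints for av_temp.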


DJGPP Quick Asm Programming Guide

Okay, so this tutorial has long been overdue. I've been putting it off for a couple of months, but right now I'm in the mood to write it, so here it is. This is just a short tutorial on doing assembly code using DJGPP. I am not teaching how to code x86 asm (get another tutorial or book), but I'll try to show how to do both inline and external asm in DJGPP. I assume you are already familiar with "standard" Intel asm, as used in TASM, MASM, etc.

I highly suggest reading the FAQ lists first, faq102.zip and faq211b.zip, and the online documentation inside the txi*.zip package. There is also a newsgroup, comp.os.msdos.djgpp. The main site is at Delorie Software, where the most up-to-date information is available on DJGPP, and where the mail archives are kept. I find many helpful articles in the mail archives.

AT&T x86 Asm Syntax

DJGPP uses AT&T asm syntax. This is a little different from the regular Intel format. The main differences are:

AT&T syntax uses the opposite order for source and destination operands: source followed by destination.
Register operands are preceded by the % character, including sections.
Immediate operands are preceded by the $ character.
The size of memory operands is specified using the last character of the opcode. These are b (8-bit), w (16-bit), and l (32-bit).

Here are some examples. Intel equivalents, if any, are provided in C++ style comments.

        movw %bx, %ax           // mov ax, bx
        xorl %eax, %eax         // xor eax, eax
        movw $1, %ax            // mov ax, 1
        movb X, %ah             // mov ah, byte ptr X
        movw X, %ax             // mov ax, word ptr X
        movl X, %eax            // mov eax, X

Most opcodes are identical between AT&T and Intel format, except for these:

        movsSD                  // movsx
        movzSD                  // movzx

where S and D are the source and destination operand size suffixes, respectively.

        movswl %ax, %ecx        // movsx ecx, ax
        cbtw                    // cbw
        cwtl                    // cwde
        cwtd                    // cwd
        cltd                    // cdq
        lcall $S, $O            // call far S:O
        ljmp $S, $O             // jump far S:O
        lret $V                 // ret far V

Opcode prefixes should not be written on the same line as the instruction they act upon. For example, rep and stosd should be two separate instructions, with the latter immediately following the former.

Memory references are a little different too. The usual Intel memory reference of the form:

        section:[base + index*scale + disp]

is written as:

        section:disp(base, index, scale)

Here are some examples:

        movl 4(%ebp), %eax                      // mov eax, [ebp+4]
        addl (%eax,%eax,4), %ecx                // add ecx, [eax + eax*4]
        movb $4, %fs:(%eax)                     // mov fs:[eax], 4
        movl _array(,%eax,4), %eax              // mov eax, [4*eax + array]
        movw _array(%ebx,%eax,4), %cx           // mov cx, [ebx + 4*eax + array]
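
To see the disp(base, index, scale) form at work, here is a hedged sketch that lets lea evaluate 8 + base + index*4 without touching memory. The function name ea_demo, the constants 8 and 4, and the C fallback are assumptions chosen for this illustration. The operands are unsigned long so that the same template uses 32-bit registers on the i386 and 64-bit registers on x86-64.

```c
/* Sketch only: ea_demo() is a hypothetical helper. */
unsigned long ea_demo(unsigned long base, unsigned long index)
{
    unsigned long result;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("lea 8(%1,%2,4), %0"       /* disp=8, base=%1, index=%2, scale=4 */
             : "=r" (result)
             : "r" (base), "r" (index));
#else
    result = 8 + base + index * 4;      /* same address arithmetic in C */
#endif
    return result;
}
```

For example, ea_demo(10, 3) evaluates 8 + 10 + 3*4 = 30.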

Jump instructions always use the smallest displacements. However, the following instructions always work in byte displacements only: jcxz, jecxz, loop, loopz, loope, loopnz and loopne. As suggested in the online documentation, a jcxz foo could be expanded to work:

        jcxz cx_zero
        jmp cx_nonzero
cx_zero:
        jmp foo
cx_nonzero:

The documentation also cautions on mul and imul instructions. The expanding multiply instructions are done using one operand. For example, imul %ebx, %ebx will not put the result in edx:eax. Use the single operand form imul %ebx to get the expanded result.
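
A short sketch of the single-operand form, written as GCC extended asm and using the unsigned mull rather than imull. The name mul_wide and the C fallback are assumptions for this example; the "a" and "d" constraints pin the operands to eax and edx, where the instruction leaves the two halves of the product.

```c
#include <stdint.h>

/* Sketch only: mul_wide() is a hypothetical helper. */
uint64_t mul_wide(uint32_t x, uint32_t y)
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t lo, hi;
    __asm__ ("mull %3"                  /* edx:eax = eax * operand      */
             : "=a" (lo), "=d" (hi)     /* low half in eax, high in edx */
             : "a" (x), "r" (y)         /* x must start out in eax      */
             : "cc");
    return ((uint64_t)hi << 32) | lo;
#else
    return (uint64_t)x * y;             /* fallback for non-x86 targets */
#endif
}
```

The two-operand form would keep only the low 32 bits; here the full 64-bit product is available.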

Inline Asm

I'll start with inline asm first, because it seems to be the more frequently asked question. This is the basic syntax, as described in the online help:

        __asm__(asm statements : outputs : inputs : registers-modified);

The four fields are:

asm statements - AT&T form, separated by newline
outputs - constraint followed by name in parentheses, separated by comma
inputs - constraint followed by name in parentheses, separated by comma
registers-modified - names separated by comma

A simple example:

        __asm__("pushl %eax\n\t"
                "movl $1, %eax\n\t"
                "popl %eax");

You do not always have to use the other three fields, as long as you do not want to specify any input or output variables and you're not accidentally clobbering any registers.

Let's spice it up with input variables.

        int i = 0;

        __asm__("pushl %%eax\n\t"
                "movl %0, %%eax\n\t"
                "addl $1, %%eax\n\t"
                "movl %%eax, %0\n\t"
                "popl %%eax"
                :
                : "g" (i)
        );      // increment i

Don't panic yet! I'll try to explain first. Our input variable is i and we want to increment it by 1. We don't have any output variables, nor clobbered registers (we save eax ourselves). Therefore the second and last fields are empty. Be aware that gcc assumes an input-only operand is not modified by the asm, so real code that changes i should declare it as an output (outputs are introduced further down).

Since the input field is specified, we need to leave a blank colon for the output field, but none for the last field, since it isn't used. Leave a newline or at least one space between two blank colons.

Let's move on to the input field. Constraints are just instructions you give to the compiler to handle the variables they act upon. They are enclosed in double quotes. So what does the "g" mean? "g" lets the compiler decide where to load the value of i into, as long as the asm instructions accept it. In general, most of your input variables can be constrained as "g", letting the compiler decide how to load them (gcc might even optimize it!). Other commonly used constraints are "r" (load into any available register), "a" (ax/eax), "b" (bx/ebx), "c" (cx/ecx), "d" (dx/edx), "D" (di/edi), "S" (si/esi), etc.

The one input we have will be referred to as %0 inside the asm statements. If we have two inputs, they will be %0 and %1, in the order listed in the input fields (see next example). For N inputs and no outputs, %0 through %N-1 will correspond to the inputs, in the order they are listed.

If any of the input, output, or registers-modified fields are used, register names inside the asm statements must be preceded with two percent (%) characters, instead of one. Contrast this example with the first one, which didn't use any of the last three fields.
Let's do two inputs and introduce "volatile":

        int i = 0, j = 1;

        __asm__ __volatile__("pushl %%eax\n\t"
                             "movl %0, %%eax\n\t"
                             "addl %1, %%eax\n\t"
                             "movl %%eax, %0\n\t"
                             "popl %%eax"
                             :
                             : "g" (i), "g" (j)
        );      // increment i by j

Okay, this time around we have two inputs. No problem, the only thing we have to remember is %0 corresponds to the first input (i in this case), and %1 to j, which is listed after i.

Oh yeah, what exactly is this volatile thing? It just prevents the compiler from modifying your asm statements (reordering, deleting, combining, etc.), and assembles them as they are (yes, gcc will optimize if it feels like it!). I suggest using volatile most of the time, and from here on we'll be using it.

Let's do one which uses the output field this time.

        int i = 0;

        __asm__ __volatile__("pushl %%eax\n\t"
                             "movl $1, %%eax\n\t"
                             "movl %%eax, %0\n\t"
                             "popl %%eax"
                             : "=g" (i)
        );      // assign 1 to i

This looks almost exactly like one of our previous input field examples, and it is really not very different. All output constraints are preceded by an equal (=) character. They are also referred to as %0 to %N-1 inside the asm statements, in the order they are listed in the output field. You might ask what happens if one uses both the input and output fields? Well, the next example will show you how to use them together.

        int i = 0, j = 1, k = 0;

        __asm__ __volatile__("pushl %%eax\n\t"
                             "movl %1, %%eax\n\t"
                             "addl %2, %%eax\n\t"
                             "movl %%eax, %0\n\t"
                             "popl %%eax"
                             : "=g" (k)
                             : "g" (i), "g" (j)
        );      // k = i + j

Okay, the only unclear part is just the numbering of the variables inside the asm statements. I'll explain.

When using both input and output fields:

%0 ... %K are the outputs
%K+1 ... %N are the inputs

In our example, %0 refers to k, %1 to i, and %2 to j. Simple, no?
So far we haven't used the last field, registers-modified, at all. If we need to use any register inside our asm statements, we either have to push and pop it explicitly, or list it in this field and let gcc take care of that.

Here's the previous example, without the explicit eax save and restore.

        int i = 0, j = 1, k = 0;

        __asm__ __volatile__("movl %1, %%eax\n\t"
                             "addl %2, %%eax\n\t"
                             "movl %%eax, %0"
                             : "=g" (k)
                             : "g" (i), "g" (j)
                             : "ax", "memory"
        );      // k = i + j

We tell gcc that we're using register eax in the registers-modified field and it will take care of saving or restoring, if necessary. A 16-bit register name covers the 32-, 16- or 8-bit registers.

If we are also touching memory (writing to variables, etc.), it's recommended to specify "memory" in the registers-modified field. This means all our examples here should have had this specified (well, except the very first one), but I chose not to bring this up until now, just for simplicity.
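
The clobber field can be tried out with a self-contained function along the following lines. This is a sketch under stated assumptions: the function name add_via_eax and the non-x86 fallback are invented for the example, and "cc" is listed because addl changes the flags. "memory" is left out here because the asm only touches its declared operands.

```c
/* Sketch only: add_via_eax() is a hypothetical helper. */
int add_via_eax(int i, int j)
{
    int k;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__("movl %1, %%eax\n\t"   /* eax = i       */
                         "addl %2, %%eax\n\t"   /* eax = i + j   */
                         "movl %%eax, %0"       /* k = eax       */
                         : "=r" (k)
                         : "g" (i), "g" (j)
                         : "eax", "cc");        /* scratch register and flags */
#else
    k = i + j;                                  /* fallback for non-x86 targets */
#endif
    return k;
}
```

Because eax appears in the clobber list, gcc will not place k, i or j in eax, so the scratch use cannot collide with an operand.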
Local labels inside your inline asm should be numeric, and references to them should be terminated with either b or f, for backward and forward references, respectively.

For example,

        __asm__ __volatile__(
                "0:\n\t"
                /* ... */
                "jmp 0b\n\t"
                /* ... */
                "jmp 1f\n\t"
                /* ... */
                "1:\n\t"
                /* ... */
        );

Here's an example on mixing C code with inline asm jumps (thanks to Srikanth B.R for this tip).

        void MyFunction(int x, int y)
        {
                __asm__("Start:");
                __asm__(...do some comparison...);
                __asm__("jl Label_1");

                CallFunction(&x, &y);
                __asm__("jmp Start");

        Label_1:
                return;
        }

External Asm

Blah... Okay fine. Here's a clue: Get some of your C/C++ files, and compile them as gcc -S file.c. Then inspect file.s. The basic layout is:

        .file "myasm.S"

        .data
        somedata: .word 0
        ...

        .text
        .globl __myasmfunc
        __myasmfunc:
        ...
        ret

Macros, macros! There's a header file libc/asmdefs.h that is convenient for writing external asm. Just include it on top of your asm source and use the macros accordingly. For example, here's myasm.S:

        #include <libc/asmdefs.h>

        .file "myasm.S"

        .data
        .align 2
        somedata: .word 0
        ...

        .text
        .align 4
        FUNC(__MyExternalAsmFunc)
        ENTER
                movl ARG1, %eax
                ...
                jmp mylabel
                ...
        mylabel:
                ...
        LEAVE

That should be a good skeleton for your external asm code.

Other Resources

The best way to learn all these is to look at others' code. There's some inline asm code in the sys/farptr.h. Also, if you run Linux, FreeBSD, etc., somewhere in the kernel source tree (i386/ or something), there are plenty of asm sources. Check the djgpp2/ directory at x2ftp.oulu.fi, for graphics and gaming libraries that have sources.

If you have asm code that needs to be converted from Intel to AT&T syntax, or just want to stick with regular Intel syntax, you can:

Get NASM, a free assembler which takes Intel asm format and produces COFF binaries compatible with DJGPP
Get MASM and compile your sources to COFF format (object file format used by DJGPP)
Get ta2asv08.zip, a TASM to AT&T asm converter
Get o2cv10.arj to convert .OBJ/.LIB between TASM and DJGPP
Search the mail archives for a sed script that converts Intel to AT&T syntax
July 28, 1999. icbmX2 on EFNET IRC. Send email for corrections or suggestions to avly@castle.net.
Copyright 1995-1999 icbmX2. All rights reserved.
Standard Disclaimer: All trademarks mentioned are owned by their respective companies. There are absolutely no guarantees, expressed or implied, on anything that you find in this document. I cannot be held responsible for anything that results from the use or misuse of this document.


2.4. Inline Assembly

Another form of coding allowed with the gcc compiler is the ability to do inline assembly code. As its name implies, inline assembly does not require a call to a separately compiled assembler program. By using certain constructs, we can tell the compiler that code blocks are to be assembled rather than compiled. Although this makes for an architecture-specific file, the readability and efficiency of a C function can be greatly increased.

Here is the inline assembler construct:

1 asm (assembler instruction(s)
2      : output operands (optional)
3      : input operands (optional)
4      : clobbered registers (optional)
5 );

For example, in its most basic form,

asm ("movl %eax, %ebx");

could also be written as

asm ("movl %eax, %ebx" :::);

We would be lying to the compiler because we are indeed clobbering ebx. Read on.

What makes this form of inline assembly so versatile is the ability to take in C expressions, modify them, and return them to the program, all the while making sure that the compiler is aware of our changes. Let's further explore the passing of parameters.

2.4.1. Output Operands

On line 2, following the colon, the output operands are a list of C expressions in parentheses preceded by a "constraint." For output operands, the constraint usually has the = modifier, which indicates that this is write-only. The & modifier shows that this is an "early clobber" operand, which means that this operand is clobbered before the instruction is finished using it. Each operand is separated by a comma.

2.4.2. Input Operands

The input operands on line 3 follow the same syntax as the output operands except for the write-only modifier.

http://book.opensourceproject.org.cn/kernel/kernelpri/opensource/0131181637/ch02lev1se...
6/25/2010

2.4.3. Clobbered Registers (or Clobber List)
In our assembly statements, we can modify various registers and memory. For gcc to know that these items have been modified, we list them here.

2.4.4. Parameter Numbering
Each parameter is given a positional number starting with 0. For example, if we have one output parameter and two input parameters, %0 references the output parameter, and %1 and %2 reference the input parameters.
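
As a minimal sketch of this numbering, the hypothetical helper add_two below (not from the original text) uses one output and two inputs. It assumes GCC or Clang; the assembly path is taken only on x86 or x86-64, and a plain C fallback is used elsewhere.

```c
/* Hypothetical helper: returns a + b using one output (%0)
 * and two inputs (%1, %2). */
static int add_two(int a, int b)
{
    int out;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("movl %1, %0\n\t"    /* out = a   (%1 copied into %0) */
             "addl %2, %0"        /* out += b  (%2 added to %0)    */
             : "=&r" (out)        /* %0: write-only, early clobber */
             : "r" (a), "r" (b)   /* %1 and %2: inputs             */
             : "cc");             /* addl changes the flags        */
#else
    out = a + b;                  /* portable fallback */
#endif
    return out;
}
```

The & modifier matters here because %0 is written by the first instruction while %2 is still needed by the second, so the two must not share a register.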

2.4.5. Constraints
Constraints indicate how an operand can be used. The GNU documentation has the complete listing of simple constraints and machine constraints. Table 2.4 lists the most common constraints for the x86.

Table 2.4. Simple and Machine Constraints for x86

Constraint  Function
a           eax register.
b           ebx register.
c           ecx register.
d           edx register.
S           esi register.
D           edi register.
I           Constant value (0...31).
q           Dynamically allocates a register from eax, ebx, ecx, edx.
r           Same as q + esi, edi.
m           Memory location.
A           Same as a + d. eax and edx are allocated together to form a 64-bit register.

2.4.6. asm
In practice (especially in the Linux kernel), the keyword asm might cause errors at compile time because of other constructs of the same name. You often see this expression written as __asm__, which has the same meaning.

2.4.7. __volatile__
Another commonly used modifier is __volatile__. This modifier is important to assembly code. It tells the compiler not to optimize the inline assembly routine. Often, with hardware-level software, the compiler thinks we are being redundant and wasteful and attempts to rewrite our code to be as efficient as possible. This is useful for application-level programming, but at the hardware level, it can be counterproductive.
For example, say we are writing to a memory-mapped register represented by the reg variable. Next, we initiate some action that requires us to poll reg. The compiler simply sees this as consecutive reads to the same memory location and eliminates the apparent redundancy. Using __volatile__, the compiler now knows not to optimize accesses using this variable. Likewise, when you see asm volatile (...) in a block of inline assembler code, the compiler should not optimize this block.
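
The polling case described above might look like the following sketch. The helper name poll_reg, the mask, and the spin limit are all illustrative assumptions; a real driver would obtain the register address from the hardware mapping.

```c
#include <stdint.h>

/* Hypothetical helper: spins until (*reg & mask) is nonzero or until
 * max_spins reads have been made. Returns 1 on success, 0 on timeout. */
static int poll_reg(volatile uint32_t *reg, uint32_t mask, unsigned max_spins)
{
    while (max_spins--) {
        /* Because reg points to a volatile object, the compiler must
         * perform a real load on every iteration instead of reusing
         * an earlier value. */
        if (*reg & mask)
            return 1;
    }
    return 0;
}
```

Without the volatile qualifier, an optimizing compiler would be free to read *reg once and test the same value on every pass.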
Now that we have the basics of assembly and gcc inline assembly, we can turn our attention to some actual inline assembly code. Using what we just learned, we first explore a simple example and then a slightly more complex code block.
Here's the first code example in which we pass variables to an inline block of code:
6  int foo(void)
7  {
8      int ee = 0x4000, ce = 0x8000, reg;
9      __asm__ __volatile__("movl %1, %%eax\n\t"
10                          "movl %2, %%ebx\n\t"
11                          "call setbits\n\t"
12                          "movl %%eax, %0"
13         : "=r" (reg)            // reg [param %0] is output
14         : "r" (ce), "r" (ee)    // ce [param %1], ee [param %2] are inputs
15         : "%eax", "%ebx"        // %eax and %ebx got clobbered
16     );
17     printf("reg=%x", reg);
18 }

Line 6
This line is the beginning of the C routine.

Line 8
ee, ce, and reg are local variables that will be passed as parameters to the inline assembler.

Line 9
This line is the beginning of the inline assembler routine. Move ce into eax.

Line 10
Move ee into ebx.

Line 11
Call some function from assembler.

Line 12
Return value in eax, and copy it to reg.

Line 13
This line holds the output parameter list. The parm reg is write-only.

Line 14
This line is the input parameter list. The parms ce and ee are register variables.

Line 15
This line is the clobber list. The regs eax and ebx are changed by this routine. The compiler knows not to use the values after this routine.

Line 16
This line marks the end of the inline assembler routine.

This second example uses the switch_to() function from include/asm-i386/system.h. This function is the heart of the Linux context switch. We explore only the mechanics of its inline assembly in this chapter. Chapter 9, "Building the Linux Kernel," covers how switch_to() is used:
include/asm-i386/system.h
012 extern struct task_struct * FASTCALL(__switch_to(struct task_struct *prev, struct task_struct *next));
...
015 #define switch_to(prev,next,last) do {                          \
016     unsigned long esi,edi;                                      \
017     asm volatile("pushfl\n\t"                                   \
018                  "pushl %%ebp\n\t"                              \
019                  "movl %%esp,%0\n\t"     /* save ESP */         \
020                  "movl %5,%%esp\n\t"     /* restore ESP */      \
021                  "movl $1f,%1\n\t"       /* save EIP */         \
022                  "pushl %6\n\t"          /* restore EIP */      \
023                  "jmp __switch_to\n"                            \
023                  "1:\t"                                         \
024                  "popl %%ebp\n\t"                               \
025                  "popfl"                                        \
026                  :"=m" (prev->thread.esp),"=m" (prev->thread.eip), \
027                   "=a" (last),"=S" (esi),"=D" (edi)             \
028                  :"m" (next->thread.esp),"m" (next->thread.eip), \
029                   "2" (prev), "d" (next));                      \
030 } while (0)

Line 12
FASTCALL tells the compiler to pass parameters in registers.
The asmlinkage tag tells the compiler to pass parameters on the stack.

Line 15
do { statements... } while (0) is a coding method to allow a macro to appear more like a function to the compiler. In this case, it allows the use of local variables.

Line 16
Don't be confused; these are just local variable names.

Line 17
This is the inline assembler; do not optimize.

Line 23
Parameter 1 is used as a return address.

Lines 17-24
\n\t has to do with the compiler/assembler interface. Each assembler instruction should be on its own line.

Line 26
prev->thread.esp and prev->thread.eip are the output parameters:
[%0] = (prev->thread.esp), is write-only memory
[%1] = (prev->thread.eip), is write-only memory

Line 27
[%2] = (last), is write-only to register eax
[%3] = (esi), is write-only to register esi
[%4] = (edi), is write-only to register edi

Line 28
Here are the input parameters:
[%5] = (next->thread.esp), is memory
[%6] = (next->thread.eip), is memory

Line 29
[%7] = (prev), reuse parameter "2" (register eax) as an input
[%8] = (next), is an input assigned to register edx

Note that there is no clobber list.
The inline assembler for PowerPC is nearly identical in construct to x86. The simple constraints, such as "m" and "r," are used along with a PowerPC set of machine constraints. Here is a routine to exchange a 32-bit pointer. Note how similar the inline assembler syntax is to x86:
include/asm-ppc/system.h
103 static __inline__ unsigned long
104 xchg_u32(volatile void *p, unsigned long val)
105 {
106     unsigned long prev;
107
108     __asm__ __volatile__ ("\n\
109 1:  lwarx   %0,0,%2 \n"
110
111 "   stwcx.  %3,0,%2 \n\
112     bne-    1b"
113     : "=&r" (prev), "=m" (*(volatile unsigned long *)p)
114     : "r" (p), "r" (val), "m" (*(volatile unsigned long *)p)
115     : "cc", "memory");
116
117     return prev;
118 }

Line 103
This subroutine is expanded in place; it will not be called.

Line 104
Routine name with parameters p and val.

Line 106
This is the local variable prev.

Line 108
This is the inline assembler. Do not optimize.

Lines 109-111
lwarx, along with stwcx., forms an "atomic swap." lwarx loads a word from memory and "reserves" the address for a subsequent store from stwcx.

Line 112
Branch if not equal to label 1 (b = backward).

Line 113
Here are the output operands:
[%0] = (prev), write-only, early clobber
[%1] = (*(volatile unsigned long *)p), write-only memory operand

Line 114
Here are the input operands:
[%2] = (p), register operand
[%3] = (val), register operand
[%4] = (*(volatile unsigned long *)p), memory operand

Line 115
Here are the clobbers. Unlike operands, clobbers are not numbered and cannot be referenced from the template:
"cc" = condition code register is altered
"memory" = memory is clobbered


This closes our discussion on assembly language and how the Linux 2.6 kernel uses it. We have seen how the PPC and x86 architectures differ and how general ASM programming techniques are used regardless of platform. We now turn our attention to the programming language C, in which the majority of the Linux kernel is written, and examine some common problems programmers encounter when using C.
Copyright@2007OpenSourceProject.org.cn.,,,
chinaperl@gmail.com,


Inlineassemblyforx86inLinuxhttp://www128.ibm.com/developerworks/linux/library/lia.html?dwzone...
1of708/28/200601:23PM

Inline assembly for x86 in Linux
Putting the pieces together
Level: Advanced
Bharata Rao (rbharata@in.ibm.com), IBM Linux Technology Center, IBM Software Labs, India
01 Mar 2001
Bharata B. Rao offers a guide to the overall use and structure of inline assembly for x86 on the Linux platform. He covers the basics of inline assembly and its various usages, gives some basic inline assembly coding guidelines, and explains the instances of inline assembly code in the Linux kernel.
If you're a Linux kernel developer, you probably find yourself coding highly architecture-dependent functions or optimizing a code path pretty often. And you probably do this by inserting assembly language instructions into the middle of C statements (a method otherwise known as inline assembly). Let's take a look at the specific usage of inline assembly in Linux. (We'll limit our discussion to the IA32 assembly.)

GNU assembler syntax in brief
Let's first look at the basic assembler syntax used in Linux. GCC, the GNU C Compiler for Linux, uses AT&T assembly syntax. Some of the basic rules of this syntax are listed below. (The list is by no means complete; I've included only those rules pertinent to inline assembly.)

Register naming: Register names are prefixed by %. That is, if eax has to be used, it should be used as %eax.

Source and destination ordering: In any instruction, source comes first and destination follows. This differs from Intel syntax, where source comes after destination.
    mov %eax, %ebx      transfers the contents of eax to ebx.

Size of operand: The instructions are suffixed by b, w, or l, depending on whether the operand is a byte, word, or long. This is not mandatory; GCC tries to provide the appropriate suffix by reading the operands. But specifying the suffixes manually improves the code readability and eliminates the possibility of the compiler's guessing incorrectly.
    movb %al, %bl       Byte move
    movw %ax, %bx       Word move
    movl %eax, %ebx     Longword move

Immediate operand: An immediate operand is specified by using $.
    movl $0xffff, %eax  will move the value of 0xffff into the eax register.

Indirect memory reference: Any indirect references to memory are done by using ( ).
    movb (%esi), %al    will transfer the byte in the memory pointed to by esi into the al register.

Inline assembly
GCC provides the special construct "asm" for inline assembly, which has the following format:
    asm ( assembler template
        : output operands               (optional)
        : input operands                (optional)
        : list of clobbered registers   (optional)
        );

In this example, the assembler template consists of assembly instructions. The input operands are the C expressions that serve as input operands to the instructions. The output operands are the C expressions on which the output of the assembly instructions will be performed.
    asm ("movl %%cr3, %0\n" : "=r"(cr3val));

Constraint  Register
a           %eax
b           %ebx
c           %ecx
d           %edx
S           %esi
D           %edi

Memory operand constraint (m): When the operands are in memory, any operations performed on them occur directly in the memory location, as opposed to register constraints, which first store the value in a register to be modified and then write it back to the memory location. But register constraints are usually used only when they are absolutely necessary for an instruction or they significantly speed up the process. Memory constraints can be used most efficiently in cases where a C variable needs to be updated inside "asm" and you really don't want to use a register to hold its value. For example, the value of idtr is stored in the memory location loc (sidt writes to its operand, so loc is listed as an output):
    asm ("sidt %0\n" : "=m"(loc));

Matching (digit) constraints: In some cases, a single variable may serve as both the input and the output operand. Such cases may be specified in "asm" by using matching constraints.

    asm ("incl %0" : "=a"(var) : "0"(var));

In our example for matching constraints, the register %eax is used as both the input and the output variable. The input var is read into %eax, and the updated %eax is stored in var again after the increment. "0" here specifies the same constraint as the 0th output variable. That is, it specifies that the output instance of var should be stored in %eax only. This constraint can be used:
In cases where input is read from a variable, or the variable is modified and the modification is written back to the same variable
In cases where separate instances of input and output operands are not necessary
The most important effect of using matching constraints is that they lead to the efficient use of available registers.
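
A minimal sketch of a matching constraint in a complete function follows. The name increment is hypothetical, the "r" constraint is used instead of "a" so the compiler may pick any general register, and the code assumes GCC or Clang (with a plain C fallback on non-x86 targets).

```c
/* Hypothetical helper: returns var + 1 using a matching constraint. */
static int increment(int var)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("incl %0"
             : "=r" (var)   /* %0: output, any general register      */
             : "0" (var)    /* input placed in the same register as %0 */
             : "cc");       /* incl changes the flags                 */
#else
    var += 1;               /* portable fallback */
#endif
    return var;
}
```

GCC also accepts the read-write modifier, written "+r"(var) in the output list, as a shorter way to express the same in-place update.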

Examples of common inline assembly usage
The following examples illustrate usage through different operand constraints. There are too many constraints to give examples for each one, but these are the most frequently used constraint types.

"asm" and the register constraint "r": Let's first take a look at "asm" with the register constraint r. Our example shows how GCC allocates registers, and how it updates the value of output variables.
    int main(void)
    {
        int x = 10, y;

        asm ("movl %1, %%eax\n\t"
             "movl %%eax, %0"
            : "=r"(y)     /* y is output operand */
            : "r"(x)      /* x is input operand */
            : "%eax");    /* %eax is clobbered register */
    }

In this example, the value of x is copied to y inside "asm". x and y are passed to "asm" by being stored in registers. The assembly code generated for this example looks like this:
    main:
        pushl %ebp
        movl %esp,%ebp
        subl $8,%esp
        movl $10,-4(%ebp)
        movl -4(%ebp),%edx    /* x=10 is stored in %edx */
    #APP                      /* asm starts here */
        movl %edx, %eax       /* x is moved to %eax */
        movl %eax, %edx       /* y is allocated in edx and updated */
    #NO_APP                   /* asm ends here */
        movl %edx,-8(%ebp)    /* value of y in stack is updated with the value in %edx */

GCC is free here to allocate any register when the "r" constraint is used. In our example it chose %edx for storing x. After reading the value of x in %edx, it allocated the same register for y.

Since y is specified in the output operand section, the updated value in %edx is stored in -8(%ebp), the location of y on the stack. If y were specified in the input section, the value of y on the stack would not be updated, even though it does get updated in the temporary register storage of y (%edx).
And since %eax is specified in the clobbered list, GCC doesn't use it anywhere else to store data.
Both input x and output y were allocated in the same %edx register, assuming that inputs are consumed before outputs are produced. Note that if you have a lot of instructions, this may not be the case. To make sure that input and output are allocated in different registers, we can specify the & constraint modifier. Here is our example with the constraint modifier added.
    int main(void)
    {
        int x = 10, y;

        asm ("movl %1, %%eax\n\t"
             "movl %%eax, %0"
            : "=&r"(y)    /* y is output operand, note the & constraint modifier. */
            : "r"(x)      /* x is input operand */
            : "%eax");    /* %eax is clobbered register */
    }

And here is the assembly code generated for this example, from which it is evident that x and y have been stored in different registers across "asm".
    main:
        pushl %ebp
        movl %esp,%ebp
        subl $8,%esp
        movl $10,-4(%ebp)
        movl -4(%ebp),%ecx    /* x, the input is in %ecx */
    #APP
        movl %ecx, %eax
        movl %eax, %edx       /* y, the output is in %edx */
    #NO_APP
        movl %edx,-8(%ebp)


Use of specific register constraints
Now let's take a look at how to specify individual registers as constraints for the operands. In the following example, the cpuid instruction takes the input in the %eax register and gives output in four registers: %eax, %ebx, %ecx, %edx. The input to cpuid (the variable "op") is passed to "asm" in the eax register, as cpuid expects it to. The a, b, c, and d constraints are used in the output to collect the values in the four registers, respectively.
    asm ("cpuid"
        : "=a" (_eax), "=b" (_ebx), "=c" (_ecx), "=d" (_edx)
        : "a" (op));

And below you can see the generated assembly code for this (assuming the _eax, _ebx, etc. variables are stored on the stack):
        movl -20(%ebp),%eax   /* store op in %eax: input */
    #APP
        cpuid
    #NO_APP
        movl %eax,-4(%ebp)    /* store %eax in _eax: output */
        movl %ebx,-8(%ebp)    /* store other registers in */
        movl %ecx,-12(%ebp)   /* respective output variables */
        movl %edx,-16(%ebp)
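
As a sketch of the same constraints in a complete function, the hypothetical helper cpu_vendor below queries CPUID leaf 0 and assembles the vendor string from %ebx, %edx, and %ecx (the register order defined for that leaf). It assumes GCC or Clang; on non-x86 targets the asm is skipped and the buffer is left as an empty string.

```c
#include <string.h>

/* Hypothetical helper: writes the 12-character vendor string plus a
 * terminating NUL into buf. Returns the highest basic CPUID leaf,
 * or 0 when built for a non-x86 target. */
static unsigned int cpu_vendor(char buf[13])
{
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__ ("cpuid"
        : "=a" (eax), "=b" (ebx), "=c" (ecx), "=d" (edx)  /* four outputs */
        : "a" (0u));                                      /* leaf 0 in %eax */
#endif
    memcpy(buf,     &ebx, 4);   /* vendor bytes 0..3  */
    memcpy(buf + 4, &edx, 4);   /* vendor bytes 4..7  */
    memcpy(buf + 8, &ecx, 4);   /* vendor bytes 8..11 */
    buf[12] = '\0';
    return eax;
}
```

Because every register that cpuid touches is named as an operand, no clobber list is needed here.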

The strcpy function can be implemented using the "S" and "D" constraints in the following manner:
    asm ("cld\n\t"
         "rep\n\t"
         "movsb"
        : /* no output */
        : "S"(src), "D"(dst), "c"(count));

The source pointer src is put into %esi by using the "S" constraint, and the destination pointer dst is put into %edi using the "D" constraint. The count value is put into %ecx, as it is needed by the rep prefix.
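
Because rep movsb changes %esi, %edi, and %ecx and writes to memory, a more defensive variant declares the three registers as read-write operands and adds a "memory" clobber. The sketch below is an illustration under those assumptions, not the article's code; copy_bytes is a hypothetical name, and a plain C loop is used on non-x86 targets.

```c
#include <stddef.h>

/* Hypothetical helper: copies count bytes from src to dst. */
static void copy_bytes(void *dst, const void *src, size_t count)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__ ("cld\n\t"        /* copy forward */
                          "rep movsb"      /* copy count bytes */
                          : "+S" (src),    /* source pointer, advanced by the copy */
                            "+D" (dst),    /* destination pointer, advanced too    */
                            "+c" (count)   /* counter, decremented to zero         */
                          :
                          : "memory", "cc");
#else
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (count--)
        *d++ = *s++;                       /* portable fallback */
#endif
}
```

The "+" modifier tells the compiler that the register values after the asm differ from the ones it loaded, so it will not reuse them as if they still held src, dst, and count.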
And here you can see another constraint that uses the two registers %eax and %edx to combine two 32-bit values and generate a 64-bit value:
    #define rdtscll(val) \
        __asm__ __volatile__ ("rdtsc" : "=A" (val))

The generated assembly looks like this (if val has a 64-bit memory space).
    #APP
        rdtsc
    #NO_APP
        movl %eax,-8(%ebp)    /* As a result of A constraint */
        movl %edx,-4(%ebp)    /* %eax and %edx serve as outputs */

Note here that the values in %edx:%eax serve as 64-bit output.
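
The "A" constraint pairs %edx:%eax only for 64-bit values on 32-bit x86. On x86-64 a 64-bit value fits in one register, so a variant that works on both is to collect the two halves separately and combine them in C. The sketch below assumes GCC or Clang; read_tsc is a hypothetical name, and on non-x86 targets it simply returns 0.

```c
#include <stdint.h>

/* Hypothetical helper: returns the 64-bit time-stamp counter. */
static uint64_t read_tsc(void)
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t lo, hi;
    __asm__ __volatile__ ("rdtsc"
                          : "=a" (lo),   /* low 32 bits from %eax  */
                            "=d" (hi));  /* high 32 bits from %edx */
    return ((uint64_t)hi << 32) | lo;
#else
    return 0;                            /* no TSC on this target */
#endif
}
```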


Using matching constraints
Here you can see the code for the system call, with four parameters:
    #define _syscall4(type,name,type1,arg1,type2,arg2,type3,arg3,type4,arg4) \
    type name (type1 arg1, type2 arg2, type3 arg3, type4 arg4) \
    { \
    long __res; \
    __asm__ volatile ("int $0x80" \
        : "=a" (__res) \
        : "0" (__NR_##name),"b" ((long)(arg1)),"c" ((long)(arg2)), \
          "d" ((long)(arg3)),"S" ((long)(arg4))); \
    __syscall_return(type,__res); \
    }

In the above example, four arguments to the system call are put into %ebx, %ecx, %edx, and %esi by using the constraints b, c, d, and S. Note that the "=a" constraint is used in the output so that the return value of the system call, which is in %eax, is put into the variable __res. By using the matching constraint "0" as the first operand constraint in the input section, the syscall number __NR_##name is put into %eax and serves as the input to the system call. Thus %eax serves here as both input and output register. No separate registers are used for this purpose. Note also that the input (syscall number) is consumed (used) before the output (the return value of syscall) is produced.

Use of memory operand constraint
Consider the following atomic decrement operation:
    __asm__ __volatile__(
        "lock; decl %0"
        : "=m" (counter)
        : "m" (counter));

The generated assembly for this would look something like this:
    #APP
        lock
        decl -24(%ebp)    /* counter is modified on its memory location */
    #NO_APP

You might think of using the register constraint here for the counter. If you do, the value of the counter must first be copied into a register, decremented, and then updated to its memory. But then you lose the whole purpose of locking and atomicity, which clearly shows the necessity of using the memory constraint.
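
A minimal sketch of the same operation as a function is shown below. It uses a single read-write memory operand ("+m") in place of the separate "=m" and "m" operands; atomic_dec is a hypothetical name, GCC or Clang is assumed, and the non-x86 branch falls back to the compiler's __atomic_fetch_sub builtin.

```c
/* Hypothetical helper: atomically decrements *counter by one. */
static void atomic_dec(volatile int *counter)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__ ("lock; decl %0"
                          : "+m" (*counter)   /* read and written in memory */
                          :
                          : "cc", "memory");  /* flags change; acts as a barrier */
#else
    __atomic_fetch_sub(counter, 1, __ATOMIC_SEQ_CST);
#endif
}
```

The operand is the object *counter itself, not the pointer, so the decrement is applied directly to the memory location.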

Using clobbered registers
Consider an elementary implementation of memory copy.
    asm ("movl $count, %%ecx\n"
         "up: lodsl\n\t"
         "stosl\n\t"
         "loop up"
        :                       /* no output */
        : "S"(src), "D"(dst)    /* input */
        : "%ecx", "%eax");      /* clobbered list */

While lodsl modifies %eax, the lodsl and stosl instructions use it implicitly. And the %ecx register explicitly loads the count. But GCC won't know this unless we inform it, which is exactly what we do by including %eax and %ecx in the clobbered register set. Unless this is done, GCC assumes that %eax and %ecx are free, and it may decide to use them for storing other data. Note here that %esi and %edi are used by "asm", and are not in the clobbered list. This is because it has been declared that "asm" will use them in the input operand list. The bottom line here is that if a register is used inside "asm" (implicitly or explicitly), and it is not present in either the input or output operand list, you must list it as a clobbered register.
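
The rule above can be sketched as a complete function. In this illustration (copy_words is a hypothetical name, and the count is passed as an operand rather than as the assembler symbol used in the article), %esi, %edi, and %ecx are operands and therefore not clobbers, while %eax is used only implicitly and must be listed. GCC or Clang is assumed, with a plain C loop on non-x86 targets.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copies count 32-bit words from src to dst. */
static void copy_words(uint32_t *dst, const uint32_t *src, size_t count)
{
    if (count == 0)
        return;                 /* "loop" with a zero counter would wrap around */
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__ ("cld\n"
                          "1:\n\t"
                          "lodsl\n\t"     /* load word at source into %eax  */
                          "stosl\n\t"     /* store %eax at the destination  */
                          "loop 1b"       /* decrement counter, repeat      */
                          : "+S" (src), "+D" (dst), "+c" (count)
                          :
                          : "eax", "memory", "cc");
#else
    while (count--)
        *dst++ = *src++;        /* portable fallback */
#endif
}
```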

Conclusion
On the whole, inline assembly is huge and provides a lot of features that we did not even touch on here. But with a basic grasp of the material in this article, you should be able to start coding inline assembly on your own.



About the author
Bharata B. Rao has a bachelor of Engineering in Electronics and Communication from Mysore University, India. He has been working for IBM Global Services, India since 1999. He is a member of the IBM Linux Technology Center, where he concentrates primarily on Linux RAS (Reliability, Availability, and Serviceability). Other areas of interest are operating system internals and processor architecture. He can be reached at rbharata@in.ibm.com.

phiral.net
|==[ Linux Assembly and Disassembly the Basics ]==|
|==|
[ Introduction to as, ld and writing your own asm from scratch.
First off you have to know what a system call is. A system call, or software interrupt, is the mechanism used by an application program to request a service from the operating system. System calls often use a special machine code instruction which causes the processor to change mode or context (e.g. from "user mode" to "supervisor mode" or "protected mode"). This switch is known as a context switch, for obvious reasons. A context is the protection and access mode that a piece of code is executing in; it's determined by a hardware-mediated flag. If you have ever heard people talking about ring zero or ring 0, they are referring to code that executes at protected or supervisor mode, such as all kernel code. A context switch allows the OS to perform restricted actions such as accessing hardware devices or the memory management unit. Generally, operating systems provide a library that sits between normal programs and the rest of the operating system, usually the C library (libc), such as Glibc, or the Windows API. This library handles the low-level details of passing information to the kernel and switching to supervisor mode. These APIs give you access to functions that make your job easier, for instance printf to print a formatted string or the *alloc family to get more memory.
In linux the system calls are defined in the file /usr/include/asm/unistd.h.
    entropy@phalaris entropy $ cat /usr/include/asm/unistd.h
    #ifndef _ASM_I386_UNISTD_H_
    #define _ASM_I386_UNISTD_H_

    /*
     * This file contains the system call numbers.
     */

    #define __NR_restart_syscall 0
    #define __NR_exit 1
    #define __NR_fork 2
    #define __NR_read 3
    #define __NR_write 4
    #define __NR_open 5
    #define __NR_close 6
    [...snip...]
Each system call is shown as the system call name preceded by __NR_ and then followed by the system call number. The system call number is very important for writing asm programs that don't use gcc, a compiler or libc. The system call and fault low-level handling routines are contained in the file /usr/src/linux/arch/i386/kernel/entry.S, although this is over our head for now.
The text that you type for the instructions of the program is known as the source code. In order to transform source code into an executable program you must assemble and link it. These steps are done for you by a compiler, but we will do them separately. Assembling is the process that transforms your source code into instructions for the machine. The machine itself only reads numbers, but humans work much better with words. An assembly language is a human-readable form of machine code. The linux assembler's name is `as`; you can type `as -h` to see its arguments. `as` generates an object file out of a source file. An object file is machine code that has not been fully put together yet. Object files contain compact, pre-parsed code, often called binaries, that can be linked with other object files to generate a final executable or code library. An object file is mostly machine code. The linker is the program responsible for putting all the object files together and adding information so the kernel knows how to load and run it. `ld` is the name of the linker on linux.
(1of7)[6/25/20102:03:56PM]http://www.phiral.net/linuxasmtwo.htm

So to summarize, source code must be assembled and linked in order to produce an executable program. On linux x86 this is accomplished with
    as source.s -o object.o
    ld object.o -o executable
Where "source.s" is your assembly code, "object.o" is the object file produced from `as` and output (-o), and "executable" is the final executable produced when the object file has been linked.
In the last tutorial we used gcc to generate the asm and then to compile the program. When we called write we pushed the length of the string, the address of the string, and the file descriptor onto the stack and then issued the instruction "call write". This needs some explanation because how we do it now is totally different. That way, pushing values onto the stack, is because we were using C (hello.c), and hence gcc generates code which uses the C calling convention. A calling convention is the way that variables are stored and the parameters and return values are transferred. C takes its parameters and passes its return variables in a stack frame (e.g. pushl $14, pushl $.LC0, pushl $1, call write). A stack frame is a piece of the stack that holds all the info needed to call a function.
So when we issued the "call write" instruction we were using the C library (libc), and the write there was really the system call write, same name, but wrapped in libc (e.g. getpid() is a wrapper for syscall(SYS_getpid)). Now when we write our own asm, for now we will not be using libc. Even though that way is easier, it's not always possible to use, and it's good to know what's happening on a lower level.
Here's our first program.
    entropy@phalaris asm $ cat hello.s
    .section .data
    hello:
        .ascii "Hello, World!\n\0"
    .section .text
    .globl _start
    _start:
        movl $4, %eax
        movl $14, %edx
        movl $hello, %ecx
        movl $1, %ebx
        int $0x80
        movl $1, %eax
        movl $0, %ebx
        int $0x80
    entropy@phalaris asm $ as hello.s -o hello.o
    entropy@phalaris asm $ ld hello.o -o hello
    entropy@phalaris asm $ ./hello
    Hello, World!
Same output as before and accomplished the same thing, but done very differently.

    .section .data
Starts the section .data where all our data goes. We could just as easily have done .section .rodata like what gcc generated in the intro, and then the string would have been read-only, but it's much more common to put initialized data into the .data section. The .rodata section is more like we wanted to do a #define hello "Hello, World\n" in C; in the .data section it's more similar to char hello[] = "Hello, World\n".

    hello:
The label hello, which remember is a symbol (a symbol being a string representation for an address) followed by a colon. A label says, when you assemble, take the next instruction or data following the colon and make that the label's value.

    .ascii "Hello, World!\n\0"
And here is what the value of the label hello: is going to be. The label hello is going to point to the first character of the string (.ascii defines a string) "Hello, World!\n\0".

    .section .text
Here we start our code section.

    .globl _start
`as` expects _start while `gcc` expects main to be the starting function of an executable. Again .globl tells the assembler that it shouldn't get rid of the symbol after assembly because the linker needs it.

    _start:
_start is a symbol that is going to be replaced by an address during either assembly or linking. _start here is where our program will start to execute when loaded by the kernel.
    movl $4, %eax
When calling a system call, the system call number you want to call is put into the register eax. As we saw above in the file /usr/include/asm/unistd.h, the write system call was defined as "#define __NR_write 4". So here we are moving the immediate value 4 into eax, so when we call the kernel to do its work it will know we want write.

    movl $14, %edx
    movl $hello, %ecx
    movl $1, %ebx
The write system call is expecting three arguments, namely the file descriptor to write to, the address of the string to write, and the length of the string to write. When calling system calls, function arguments are passed in registers, which differs from the C library or libc convention, which expects function arguments to be pushed onto the stack. So the system call number goes into eax, the first argument goes into ebx, the second into ecx, and the third into edx. There can be up to six arguments, in ebx, ecx, edx, esi, edi, ebp respectively. If there are more arguments, they are simply passed through a structure as the first argument. So we fill in the registers that write needs to do its job: we move 1, which is STDOUT, into ebx, we put the label hello's value (which is the address of the string "Hello, World!\n\0") into ecx, and we put the length of the string, 14, into edx.

    int $0x80
This instruction int(errupts) the kernel ($0x80) and asks it to do the system function whose index is in eax. An interrupt interrupts the program's flow and asks the kernel to do something for us. The kernel will then perform the system function and then return control to our program. Before the interrupt we were executing in a user mode context, during the system call we were executing in a protected mode context, and when the kernel is done and returns control to our program we are again executing in a user mode context. So the kernel reads eax, does a write of our string, and returns.

    movl $1, %eax
Now we're done and we need to exit, so what number do we use to execute exit? Look back at unistd.h and we see that exit is "#define __NR_exit 1".

    movl $0, %ebx
exit expects one argument, namely the return code (0 means no errors), so we put that into ebx.

    int $0x80
Call the kernel to execute exit with return code 0 and we're done.
On to the disassembly.
Compile with debugging symbols; `as` uses the same -g or -gstabs that `gcc` does.
    entropy@phalaris asm $ as -g hello.s -o hello.o
And link it.
    entropy@phalaris asm $ ld hello.o -o hello
Start gdb.
    entropy@phalaris asm $ gdb hello
    GNU gdb 6.3
    Copyright 2004 Free Software Foundation, Inc.
    GDB is free software, covered by the GNU General Public License, and you are
    welcome to change it and/or distribute copies of it under certain conditions.
    Type "show copying" to see the conditions.
    There is absolutely no warranty for GDB. Type "show warranty" for details.
    This GDB was configured as "i386-pc-linux-gnu"...Using host libthread_db library "/lib/libthread_db.so.1".
Set a breakpoint at the address of _start so we can step through it.
    (gdb) break *_start
    Breakpoint 1 at 0x8048094: file hello.s, line 7.
    (gdb) run
    Starting program: /home/entropy/asm/hello
    Hello, World!

    Program exited normally.
    Current language: auto; currently asm
    (gdb)
The breakpoint didn't work. I'm not sure why this happens, but we can do a quick fix. Here is the fixed asm.
    entropy@phalaris asm $ cat hello.s
    .section .data
    hello:
        .ascii "Hello, World!\n\0"
    .section .text
    .globl _start
    _start:
        nop
        movl $4, %eax
        movl $14, %edx
        movl $hello, %ecx
        movl $1, %ebx
        int $0x80
        movl $1, %eax
        movl $0, %ebx
        int $0x80
The only difference is the nop, or no operation, right after _start. Now we can set our breakpoint and it will work.
Reassemble and link.
    entropy@phalaris asm $ as -g hello.s -o hello.o
    entropy@phalaris asm $ ld hello.o -o hello
Start gdb.
    entropy@phalaris asm $ gdb hello
    GNU gdb 6.3
    Copyright 2004 Free Software Foundation, Inc.
    GDB is free software, covered by the GNU General Public License, and you are
    welcome to change it and/or distribute copies of it under certain conditions.
    Type "show copying" to see the conditions.
    There is absolutely no warranty for GDB. Type "show warranty" for details.
    This GDB was configured as "i386-pc-linux-gnu"...Using host libthread_db library "/lib/libthread_db.so.1".
List our assembly. I put in comments so it would be easier to follow.
    Breakpoint 1 at 0x8048095: file hello.s, line 8.
    (gdb) list _start
    2   hello:                         # label hello, address of the first char
    3       .ascii "Hello, World!\n\0" # .ascii defines a string
    4   .section .text                 # our code start
    5   .globl _start                  # the start symbol defined as .globl
    6   _start:                        # the start label
    7       nop                        # no operation for debugging with gdb
    8       movl $4, %eax              # mov 4 into %eax, 4 is write(fd, buf, len)
    9       movl $14, %edx             # 14 is the length of our string
    10      movl $hello, %ecx          # the address of our string
    11      movl $1, %ebx              # 1 is STDOUT, to the screen
    (gdb) <hit enter>
    12      int $0x80                  # call the kernel
    13      movl $1, %eax              # move 1 into %eax, 1 is syscall exit()
    14      movl $0, %ebx              # move 0 into %ebx, exit's return value
    15      int $0x80                  # call kernel
Set a breakpoint at our nop.
    (gdb) break *_start+1
    Breakpoint 1 at 0x8048095: file hello.s, line 8.
And run it.
    (gdb) run
    Starting program: /home/entropy/asm/hello

    Breakpoint 1, _start () at hello.s:8
    8       movl $4, %eax              # mov 4 into %eax, 4 is write(fd, buf, len)
    Current language: auto; currently asm
Now the breakpoint works.
    (gdb) step
    _start () at hello.s:9
    9       movl $14, %edx             # length for write 14 is the length of our string
    (gdb) step
    _start () at hello.s:10
    10      movl $hello, %ecx          # the address of our string
    (gdb) step
    _start () at hello.s:11
    11      movl $1, %ebx              # 1 is STDOUT, to the screen
    (gdb) step
    _start () at hello.s:12
    12      int $0x80                  # call the kernel
Check the registers to see if they have the correct information in them.
    (gdb) print $edx
    $1 = 14
    (gdb) x/s $ecx
    0x80490b8 <hello>: "Hello, World!\n"
    (gdb) print $ebx
    $2 = 1
    (gdb) print $eax
    $3 = 4
    (gdb)
Looks good, so let the kernel do its work.
    (gdb) step
    Hello, World!
    _start () at hello.s:13
    13      movl $1, %eax              # move 1 into %eax, 1 is syscall exit()
It has executed the write system call; you can see the printed string, and control has returned to gdb. Now we call exit.
    (gdb) step
    _start () at hello.s:14
    14      movl $0, %ebx              # move 0 into %ebx, exit's return value
    (gdb) step
    _start () at hello.s:15
    15      int $0x80                  # call kernel
    (gdb) step

    Program exited normally.
And it's done.
    (gdb) q
    entropy@phalaris asm $
Check out the objdump output.
    entropy@phalaris asm $ objdump -d hello

    hello:     file format elf32-i386

    Disassembly of section .text:

    08048094 <_start>:
     8048094:  90                nop
     8048095:  b8 04 00 00 00    mov $0x4,%eax
     804809a:  ba 0e 00 00 00    mov $0xe,%edx
     804809f:  b9 b8 90 04 08    mov $0x80490b8,%ecx
     80480a4:  bb 01 00 00 00    mov $0x1,%ebx
     80480a9:  cd 80             int $0x80
     80480ab:  b8 01 00 00 00    mov $0x1,%eax
     80480b0:  bb 00 00 00 00    mov $0x0,%ebx
     80480b5:  cd 80             int $0x80
    entropy@phalaris asm $
Notice the difference from the last tutorial's objdump output: no snipping of tons of lines of extra sections and such, it's only the code we coded in there, which is so much cleaner. Notice how easy it would be to take the opcodes and make some tiny shellcode out of them, again without the nulls. What? You want some shellcode to print "Hello, World!\n" for your next 0day? Next time my friend, next time.

Assembler Instructions with C Expression Operands

    asm ( CODE : OUTPUTS : INPUTS : CLOBBERED )

Example:
    asm ("add %1, %0" : "=r"(sum) : "r"(x), "0"(y))

Code: add %1 to %0 and store the result in %0
Outputs: generic register, stored into local variable sum right after the execution of assembly code.
Inputs: generic registers, initialized from local variables x and y right before the execution of assembly code.
Clobbers: nothing except input/output registers

Constraint  Meaning
m           memory operand
r           generic register operand
i           immediate operand
f           floating point register
t           top of FP stack
u           next to top of FP stack
a, b, c, d  A, B, C, D registers (EAX/AX/AL)
D           DI register
S           SI register
A           two 32-bit registers combined to form a 64-bit register

[Figure: SSE data flow between the 16-byte registers XMM0 (x1) and XMM1 (x2), labeled with MOVAPS, MULPS, ADDPS (+=), and MOVUPS]

