ARM Microcontroller Code Size (Full)

ARM Microcontroller Code Size Analysis | Overview
32-Bit Microcontroller Code Size Analysis
Draft 1.2.4. Joseph Yiu, Andrew Frame
Overview
Microcontroller application program code size can directly affect the cost and power consumption of
products; therefore it is almost always viewed as an important factor in the selection of a
microcontroller for embedded projects. Since the release and availability of 32-bit processors such
as the ARM Cortex-M3, more and more microcontroller users have discovered the benefits of
switching to 32-bit products: lower power, greater energy efficiency, smaller code size and much
better performance. Whilst most of the benefits of using 32-bit microcontrollers are widely known,
the code size advantage of 32-bit microcontrollers is less obvious.
In this article we will explain why 32-bit microcontrollers can reduce application code size whilst still
achieving high system performance and ease of use.
Typical myths of program size
Myth #1: 8-bit and 16-bit microcontrollers have smaller code size
There is a common misconception that switching from an 8-bit microcontroller to a 32-bit
microcontroller will result in much bigger code size. Why? Many people have the impression that
8-bit microcontrollers use 8-bit instructions and 32-bit microcontrollers use 32-bit instructions. This
impression is often reinforced by slightly misleading marketing from the 8-bit and 16-bit
microcontroller vendors.
In reality, many instructions in 8-bit microcontrollers are 16 bits, 24 bits or other sizes larger than
8 bits. For example, the PIC18 instruction sizes are 16 bits and, with the 8051 architecture, although
some instructions are 1 byte long, many others are 2 or 3 bytes long.
So would code size be better moving to a 16-bit microcontroller? Not necessarily. Taking the
MSP430 as an example, a single-operand instruction can take 4 bytes (32 bits) and a double-operand
instruction can take 6 bytes (48 bits). In the worst case, an extended immediate/index instruction in
MSP430X can take 8 bytes (64 bits).
So how about the code size for ARM Cortex microcontrollers? The ARM Cortex-M3 and Cortex-M0
processors are based on Thumb-2 technology, which provides excellent code density. Thumb-2
microcontrollers have 16-bit instructions as well as 32-bit instructions, with the 32-bit instruction
functionality a superset of the 16-bit version. In most cases a C compiler will use the 16-bit version
of the instruction. The 32-bit version would only be used when the operation cannot be performed
with a 16-bit instruction. As a result, most of the instructions in an ARM Cortex microcontroller
program are 16 bits. That's even smaller than some of the instructions in 8-bit microcontrollers.
Figure 1: Size of a single instruction in various processors
[Chart: minimum and maximum instruction size in number of bits (axis marks at 16, 32, 48 and 64) for
the 8051, PIC18, PIC24, MSP430/MSP430X and ARM]
Within a compiled program for Cortex-M processors, the number of 32-bit instructions can be only a
small portion of the total instruction count. For example, the proportion of 32-bit instructions in the
Dhrystone program image is only 15.8% of the total instruction count (average instruction size is
18.53 bits) when compiled for the Cortex-M3. For the Cortex-M0 the ratio of 32-bit instructions is
even lower at 5.4% (average instruction size 16.9 bits).
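The average instruction sizes quoted above follow directly from the ratio of 32-bit instructions. A minimal sketch of that arithmetic, assuming every instruction is either 16 or 32 bits wide (the function name is chosen here for illustration and does not come from any tool):

```c
/* Average instruction size in bits for a program that mixes 16-bit and
 * 32-bit instructions. fraction_32bit is the share of 32-bit instructions
 * in the total instruction count, in the range 0.0 to 1.0. */
double average_instruction_bits(double fraction_32bit)
{
    return 16.0 * (1.0 - fraction_32bit) + 32.0 * fraction_32bit;
}
```

Calling it with 0.158 (Cortex-M3) gives roughly 18.53 bits, and with 0.054 (Cortex-M0) roughly 16.9 bits, matching the figures in the text.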
Myth #2: My application only processes 8-bit data and 16-bit data
Many embedded developers think that if their application only processes 8-bit data then there is no
benefit in switching to a 32-bit microcontroller. However, looking into the output from the C
compiler carefully, in most cases the humble "integer" data type is actually 16 bits. So when you
have a for loop with an integer as loop index, comparing a value to an integer value, or using a C
library function that uses an integer (e.g. memcpy()), you are actually using 16-bit or larger data.
This can affect code size and performance in various ways:
- For each integer computation, an 8-bit processor will need multiple instructions to carry out
  the operations. This directly increases the code size and the clock cycle count.
- If the integer value has to be saved into memory, or if you need to load an immediate value
  from program ROM to this integer, it will take multiple instructions and multiple clock cycles.
- Since an integer can take up two 8-bit registers, more registers are required to hold the
  same number of integer variables. When there is an insufficient number of registers in the
  register bank to hold local variables, some have to be stored in memory. Thus an 8-bit
  microcontroller might result in more memory accesses, which increases code size and
  reduces performance and power efficiency. The same issue applies to the processing of
  32-bit data on 16-bit microcontrollers.
- Since more registers are required to hold an integer in an 8-bit microcontroller, when passing
  variables to a function via the stack, or saving register contents during context switching or
  interrupt servicing, the number of stack operations required is more than that of 32-bit
  microcontrollers. This increases the program size, and can also affect interrupt latency
  because an Interrupt Service Routine (ISR) must make sure that all registers used are saved
  at ISR entry and restored at ISR exit. The same issue applies to the processing of 32-bit data
  on 16-bit microcontrollers.
There is even more bad news for 8-bit microcontroller users: memory address pointers take multiple
bytes, so data processing involving the use of pointers can be extremely inefficient.
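To make the "multiple instructions per integer computation" point concrete, the sketch below performs a 16-bit addition using only 8-bit steps, roughly the way an 8-bit core has to (an add on the low bytes, then an add with carry on the high bytes). It is an illustration written for this article, not the output of any particular compiler, and the function name is hypothetical.

```c
#include <stdint.h>

/* Add two 16-bit values using only 8-bit arithmetic: low bytes first,
 * then high bytes plus the carry out of the low-byte addition. */
uint16_t add16_using_8bit_steps(uint16_t a, uint16_t b)
{
    uint8_t a_lo = (uint8_t)(a & 0xFFu);
    uint8_t a_hi = (uint8_t)(a >> 8);
    uint8_t b_lo = (uint8_t)(b & 0xFFu);
    uint8_t b_hi = (uint8_t)(b >> 8);

    uint8_t sum_lo = (uint8_t)(a_lo + b_lo);          /* step 1: add low bytes */
    uint8_t carry  = (uint8_t)(sum_lo < a_lo);        /* carry out of step 1   */
    uint8_t sum_hi = (uint8_t)(a_hi + b_hi + carry);  /* step 2: add with carry */

    return (uint16_t)(((uint16_t)sum_hi << 8) | sum_lo);
}
```

On a 32-bit core the same addition is a single register-to-register operation, and the operands each occupy one register rather than two.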
Myth #3: A 32-bit processor is not efficient at handling 8-bit and 16-bit data
Most 32-bit processors are actually very efficient at handling 8-bit and 16-bit data. Compact
memory access instructions for signed and unsigned 8-bit, 16-bit and 32-bit data are all available.
There are also a number of instructions specially included for data type conversions. Overall the
handling of 8-bit and 16-bit data in 32-bit processors such as the ARM Cortex microcontrollers is just
as easy and efficient as handling 32-bit data.
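As an illustration, the following sketch processes signed 8-bit and unsigned 16-bit data with 32-bit accumulators. The expectation that a compiler for a Cortex-M target would use byte and halfword load instructions (such as LDRSB and LDRH) for the array accesses is an assumption here, not something the code itself verifies; the function names are hypothetical.

```c
#include <stdint.h>

/* Sum signed 8-bit samples; each byte is sign-extended to 32 bits
 * when it is loaded and added to the accumulator. */
int32_t sum_signed_bytes(const int8_t *data, uint32_t count)
{
    int32_t total = 0;
    for (uint32_t i = 0; i < count; i++) {
        total += data[i];
    }
    return total;
}

/* Sum unsigned 16-bit readings; each halfword is zero-extended to 32 bits. */
uint32_t sum_unsigned_halfwords(const uint16_t *data, uint32_t count)
{
    uint32_t total = 0;
    for (uint32_t i = 0; i < count; i++) {
        total += data[i];
    }
    return total;
}
```

The data stays 8 or 16 bits wide in memory, so narrow data costs no extra storage, while the loop index and accumulator use the native register width.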
Myth #4: C libraries for ARM processors are too big
There are various C library options for ARM processors. For microcontroller applications, a number
of compiler vendors have developed C libraries with a much smaller footprint. For example, the
ARM development tools have a smaller version of the C library called MicroLib. These C libraries are
especially designed for microcontrollers and allow application code size to be small and efficient.
Myth #5: Interrupt handling on ARM microcontrollers is more complex
On the ARM Cortex microcontrollers the interrupt service routines are just normal C subroutines.
Vectored or nested interrupts are supported by the Nested Vectored Interrupt Controller (NVIC)
with no need for software intervention. In fact the setup process and processing of an interrupt
request is much simpler than on 8-bit and 16-bit microcontrollers, as generally you only need to
program the priority level of an interrupt and then enable it.
The interrupt vectors are stored in a vector table at the beginning of the memory, normally within
the flash, without the need for any software programming steps. When an interrupt request takes
place the processor automatically fetches the corresponding interrupt vector and starts to execute
the ISR. Some of the registers are pushed to the stack by a hardware sequence and restored
automatically when the interrupt handler exits. The other registers that are not covered by the
hardware stacking sequence are pushed onto the stack by C compiler generated code only if the
register is used and modified within the ISR.
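The idea of "an ISR is a normal C function reached through a vector table" can be modelled on a host machine as below. This is only a simulation sketch: the names Timer_Handler, Default_Handler, vector_table and simulate_interrupt are hypothetical, the table has just two entries, and a real Cortex-M vector table also holds the initial stack pointer and the system exception vectors, which are not modelled here.

```c
#include <stdint.h>

typedef void (*isr_t)(void);

static volatile uint32_t timer_ticks;   /* hypothetical counter updated by the ISR */

/* An ISR written as an ordinary C function: no special keywords or wrappers. */
void Timer_Handler(void)
{
    timer_ticks++;
}

/* Handler used for interrupts the application does not care about. */
void Default_Handler(void)
{
}

/* Simplified vector table: one function pointer per interrupt number. */
static const isr_t vector_table[] = { Default_Handler, Timer_Handler };

/* Host-side stand-in for what the hardware does on an interrupt request:
 * fetch the vector for the interrupt number and execute the handler. */
void simulate_interrupt(uint32_t irq_number)
{
    if (irq_number < sizeof vector_table / sizeof vector_table[0]) {
        vector_table[irq_number]();
    }
}

uint32_t get_timer_ticks(void)
{
    return timer_ticks;
}
```

On real hardware the lookup and the stacking of registers are done by the processor, so none of the dispatch code above would exist in the application.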
What about moving to 16-bit microcontrollers?
16-bit microcontrollers can be efficient in handling 16-bit integers and 8-bit data (e.g. strings);
however, the code size is still not as optimal as when using 32-bit processors:
- Handling of 32-bit data: if the application requires handling of any long integer (32-bit) or
  floating point types then the efficiency of 16-bit processors is greatly reduced because
  multiple instructions are required for each processing operation, as well as data transfers
  between the processor and the memory.
- Register usage: When processing 32-bit data, 16-bit processors require two registers to
  hold each 32-bit variable. This reduces the number of variables that can be held in the
  register bank, hence reducing processing speed as well as increasing stack operations and
  memory accesses.
- Memory addressing mode: Many 16-bit architectures provide only basic addressing modes
  similar to 8-bit architectures. As a result, the code density is poor when they are used in
  applications that require processing of complex data sets.
- 64K bytes limitation: Many 16-bit processors are limited to 64K bytes of addressable
  memory, reducing the functionality of the application. Some 16-bit architectures have
  extensions to allow more than 64K bytes of memory to be accessed; however, these
  extensions have an instruction code and clock cycle overhead. For example, a memory
  pointer would be larger than 16 bits and might require multiple instructions and multiple
  registers to process it.
Instruction Set efficiency
When customers port their applications from 8-bit architecture to ARM Cortex microcontrollers,
they very often find that the total code size has dramatically decreased. For example, when Melfas (a
leading company in capacitive sensing touch screen controllers) evaluated the Cortex-M0 processor,
they found that the Cortex-M0 program size was less than half of that of the 8051 and, at the same
time, delivered five times more performance at the same clock frequency. This, for example, could
enable them to run the application at 1/5 of the clock speed of the equivalent 8051 product, reducing
the power consumption, and lowering product cost at the same time due to a smaller program flash
size requirement.
So how does ARM architecture provide such big advantages? The key factor is Thumb-2 technology,
which provides a highly efficient unified instruction set.
Powerful Addressing mode
The ARM Cortex microcontrollers support a number of addressing modes for memory transfer
instructions. For example:
- Immediate offset (Address = Register value + offset)
- Register offset (Address = Register value 1 + shifted(Register value 2))
- PC-related (Address = Current PC value + offset)
- Stack pointer related (Address = SP + offset)
- Multiple register load and store, with optional automatic base address update
- PUSH/POP instructions with multiple registers
As a result of these various addressing modes, data transfer between registers and memory can be
handled with fewer instructions. Since the PUSH and POP instructions support multiple registers, in
most cases, saving and restoring of registers in a function call will only need one PUSH at the
beginning of the function and one POP at the end of the function. The POP can even be combined with
the return instruction at the end of the function to further reduce the instruction count.
Conditional branches
Almost all processors provide conditional branch instructions; however, ARM processors provide
improved conditional branching by having separate branch conditions for signed and unsigned data
operation results, and providing a good branch range.
For example, when comparing the conditional branches of the Cortex-M0 and MSP430, the
Cortex-M0 has more branch conditions available, making it possible to generate more compact code no
matter whether the data being processed is signed or unsigned. The MSP430 conditional branches
might require multiple instructions to get the same operations.
Generally the same situation applies to many 8-bit or 16-bit microcontrollers: when dealing with
signed data, additional steps might also be required in the conditional branch.
In addition to the branch instructions available in the Cortex-M0, the Cortex-M3 processor also
supports compare-and-branch instructions (CBZ and CBNZ). This further simplifies some of the
conditional branch instruction sequences.
Conditional Execution
Another area that allows the ARM Cortex-M3 microcontrollers to have more compact code is the
conditional execution feature. The Cortex-M3 supports an instruction called IT (IF-THEN). This
instruction allows up to 4 subsequent instructions to be conditionally executed, reducing the need
for additional branches. For example,
    if (xpos1 < xpos2) {
        x1 = xpos1;
        x2 = xpos2;
    } else {
        x1 = xpos2;
        x2 = xpos1;
    }
This can be converted to the following assembly code (needs 12 bytes in the Cortex-M3):
    CMP   R0, R1
    ITTEE CC          ; if unsigned <
    MOVCC R2, R0
    MOVCC R3, R1
    MOVCS R3, R0
    MOVCS R2, R1
Other architectures might need an additional branch (e.g. needs 14 bytes in MSP430):
    CMP.W R14, R13
    JGE   Label1      ; if unsigned <
    MOV.W R11, R14
    MOV.W R12, R13
    JMP   Label2
    Label1
    MOV.W R11, R13
    MOV.W R12, R14
    Label2
This results in an extra two bytes for the MSP430 when compared to Cortex-M3.
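For reference, the C fragment used in this comparison can be wrapped into a complete function as sketched below. The function name and the use of pointers for the two results are choices made here so that the fragment compiles on its own; they are not part of the original test code, and the instruction sequence a given compiler emits for it may differ from the listings above.

```c
/* Order two unsigned positions: *x1 receives the smaller value and
 * *x2 the larger one (equal inputs take the else path). */
void order_positions(unsigned int xpos1, unsigned int xpos2,
                     unsigned int *x1, unsigned int *x2)
{
    if (xpos1 < xpos2) {
        *x1 = xpos1;
        *x2 = xpos2;
    } else {
        *x1 = xpos2;
        *x2 = xpos1;
    }
}
```

Both branches of the if/else contain only simple moves, which is the pattern that suits conditional execution: every instruction can be issued in sequence with no jump around either half.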
Multiply and Divide
Both the Cortex-M0 and Cortex-M3 processors support single-cycle multiply operations. The
Cortex-M3 also has multiply and multiply-accumulate instructions for 32-bit or 64-bit results. These
instructions greatly reduce the code size required when handling multiplication of large variables.
Most other 8-bit and 16-bit microcontrollers also have multiply instructions; however, the limitation
of the register size often means that the multiplication requires multiple steps if the result needs to
be more than 8 or 16 bits.
The MSP430 does not have a multiply instruction (MSP430 document SLAA329, reference 2). To carry
out multiplication either a memory-mapped hardware multiplier is used, or the multiply operation
has to be handled by software using add and shift. Even if a hardware multiplier is present, the
memory-mapped nature of the multiplier results in the additional overhead of transferring data to
and from the external hardware. In addition, using the multiplier within an interrupt handler could
cause existing data in the multiplier to be lost. As a result, interrupts are usually disabled before a
multiply operation and re-enabled after the multiplication is completed. This adds
additional software overhead and affects interrupt latency and determinism.
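A software multiply using add and shift, as mentioned above, can be sketched as follows. This is a generic textbook routine written for illustration, not the routine from the TI application note or from any compiler run-time library, and the function name is hypothetical.

```c
#include <stdint.h>

/* Multiply two 16-bit values using only shifts and adds, producing a
 * 32-bit result. One loop iteration is needed per significant bit of b. */
uint32_t multiply_shift_add(uint16_t a, uint16_t b)
{
    uint32_t product = 0;
    uint32_t addend = a;      /* a shifted left by the current bit position */

    while (b != 0) {
        if (b & 1u) {
            product += addend;
        }
        addend <<= 1;
        b >>= 1;
    }
    return product;
}
```

Every call executes a loop of tests, shifts and adds, whereas a processor with a hardware multiply instruction replaces the whole routine with one instruction.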
The Cortex-M3 processor also has unsigned and signed integer divide instructions. This reduces the
code size required in applications that need to perform integer division because there is no need for
the C library to include a function for handling divide operations.
Powerful instruction set
In addition to the standard data processing, memory access and program control instructions, the
Cortex microcontrollers also support a number of other instructions to help data type conversion.
The Cortex-M3 processor also supports a number of bit field operations, reducing the software
overhead in, for example, peripheral control and communication data processing.
Breaking the 64K byte memory barrier
As already mentioned, many 8-bit and 16-bit microcontrollers are limited to 64K bytes of addressable
memory. Due to the nature of 8-bit and 16-bit microcontroller architecture, the coding efficiency of
these microcontrollers often decreases dramatically when the application exceeds the 64K byte
memory barrier. In 8-bit and 16-bit microcontrollers (e.g. 8051, PIC24, C166) this is often handled by
memory bank switching or memory segmentation, with the switching code generated automatically
by the C compilers. Every time a function or data in a different memory page is required, bank
switching code is needed, which further increases the program size.

Figure 2: Increased code size overhead of memory bank switching or segmentation in 8-bit and 16-bit
systems
The memory bank switching not only creates larger code but it also greatly reduces the performance
of a system. This is especially the case if the data being processed is in a different memory bank (e.g.
copying a block of data from one page to another page can be very costly in terms of performance).
This is particularly inefficient for 8-bit microcontrollers like the 8051 because the MCS-51
architecture does not have proper support for such a memory bank switching feature. Therefore
memory switching has to be carried out by saving and updating memory bank controls such as
I/O port registers. In addition, the memory page switching code usually has to be carried out
in a congested shared memory space with limited size. At the same time some of the
memory pages might not be fully utilized and memory space is wasted.
For the 8-bit and 16-bit microcontrollers that support memory of over 64K, this often comes at a
price. The MSP430X design overcomes the 64K bytes memory barrier by increasing the Program
Counter (PC) and register width to 20 bits. Despite no memory paging being involved, the sizes of
some MSP430X instructions are considerably larger than in the original MSP430. For example, when
the large memory model is used, a double-operand formatted instruction can take 8 bytes rather
than 6 (a 33% increase):
Figure 3: Support of larger memory system increases the size of some instructions in MSP430X
[Diagram: the MSP430 double-operand instruction is a 16-bit instruction word (fields Op-code, Rsrc,
Ad, B/W, As, Rdst) followed by the source and destination words (bits 15:0). The MSP430X
double-operand instruction adds an extension word carrying source bits 19:16 and destination bits
19:16.]
Apart from the size of the instruction itself, the use of the 20-bit addressing also increases the
number of stack operations required. Since the memory is only 16 bits wide, the saving of a 20-bit
address pointer will need two stack push operations, resulting in extra instructions and poor
utilization of the stack memory.
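The cost of saving a 20-bit pointer on a 16-bit wide stack can be illustrated with a small model. The stack type, its depth and the order of the two words are assumptions made for this sketch; it does not reproduce the actual stack layout used by the MSP430X.

```c
#include <stdint.h>

#define STACK_WORDS 8u   /* hypothetical stack depth for the demonstration */

typedef struct {
    uint16_t words[STACK_WORDS];
    uint32_t depth;       /* number of 16-bit words currently in use */
} stack16_t;

/* Save a 20-bit address on a 16-bit wide stack. Two pushes are needed,
 * and 12 bits of the word holding bits 19:16 stay unused.
 * Returns 0 on success, -1 if there is no room for both words. */
int push_address20(stack16_t *s, uint32_t address)
{
    if (s->depth + 2u > STACK_WORDS) {
        return -1;
    }
    s->words[s->depth++] = (uint16_t)((address >> 16) & 0x000Fu); /* first push  */
    s->words[s->depth++] = (uint16_t)(address & 0xFFFFu);         /* second push */
    return 0;
}

/* Restore a 20-bit address: two pops, then recombine the halves.
 * Returns 0 if fewer than two words are on the stack. */
uint32_t pop_address20(stack16_t *s)
{
    uint32_t low, high;

    if (s->depth < 2u) {
        return 0;
    }
    low  = s->words[--s->depth];
    high = s->words[--s->depth];
    return (high << 16) | low;
}
```

A 32-bit processor with a 32-bit wide stack saves the same pointer with a single push and wastes no stack space on padding.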
Figure 4: Use of large memory data model in MSP430X increases code size
As a result, an MSP430X application has a lower code density when the large memory model is used,
which is required when the address range exceeds the 64K range.
In ARM Cortex microcontrollers, 32-bit linear addressing is used to provide 4GB of memory space for
embedded applications. Therefore there is no paging overhead and the programming model is easy
to use.
Examples
To demonstrate the code size compared to 8-bit and 16-bit processors, a number of test cases are
compiled and illustrated here. The tests are based on the MSP430 Competitive Benchmarking document
from Texas Instruments (SLAA205C, reference 1). The results listed here show total program
memory size in bytes.
MSP430 results:
The tests listed are compiled using IAR Embedded Workbench 4.20.1 with hardware
multiplier enabled, optimization level set to High with Size optimization. Unless specified,
the Small data model is used and type double is 32-bit. The results are obtained from the linker
output report (CODE + CONST).
ARM Cortex processor results:
The tests listed are compiled using RealView Development Suite 4.0 SP2. Optimization level
is 3 for size, minimal vector table, and MicroLIB is used. The results are obtained from the linker
output report (VECTORS + CODE).
Test                   Generic MSP430   MSP430F5438   MSP430F5438 (large data model)   Cortex-M3
Math 8-bit             198              198           202                              144
Math 16-bit            144              144           144                              144
Math 32-bit            256              244           256                              120
Math Float             1122             1122          1162                             600
Matrix 2-dim 8-bit     180              178           196                              184
Matrix 2-dim 16-bit    268              246           290                              256
Matrix mult            276              228           (linker error)                   228
Switch 8-bit           200              218           218                              160
Switch 16-bit          198              218           218                              160
Fir filter (Note 1)    1202             1170          1222                             716 (820 without modification)
Dhry                   923              893           1079                             900
Whet (Note 2)          6434             6308          6614                             4384 (8496 without modification)
Note 1: The constant data array in the Fir filter test is modified to use a 16-bit data type on the
Cortex-M processor (const unsigned short int INPUT[]).
Note 2: When certain math functions are used (sin, cos, atan, sqrt, exp, log), the ARM C standard
libraries use the double precision versions by default. This can result in significantly larger program
size unless adjustments are made. In order to achieve an equivalent comparison, the program code is
edited so that single precision versions are used (sinf, cosf, atanf, sqrtf, expf, logf). Also, some of
the constant definitions have been adjusted to single precision (e.g. 1.0 becomes 1.0F).
Figure 5: Code size comparison for basic operations
The total sizes for the simple tests (integer math, matrix and switch tests) are:
Summary for simple tests   Generic MSP430   MSP430F5438   Cortex-M3
Total size (bytes)         1720             1674          1396
Advantage (% smaller)                       2.6%          18.8%
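The summary figures can be re-derived from the per-test results. The sketch below sums the integer math, matrix and switch rows of the results table and expresses each total relative to the Generic MSP430 total; the array and function names are chosen here for illustration.

```c
#include <stddef.h>

/* Program sizes in bytes for the simple tests, in table order:
 * Math 8-bit, Math 16-bit, Math 32-bit, Matrix 2-dim 8-bit,
 * Matrix 2-dim 16-bit, Matrix mult, Switch 8-bit, Switch 16-bit. */
static const unsigned generic_msp430[] = { 198, 144, 256, 180, 268, 276, 200, 198 };
static const unsigned msp430f5438[]    = { 198, 144, 244, 178, 246, 228, 218, 218 };
static const unsigned cortex_m3[]      = { 144, 144, 120, 184, 256, 228, 160, 160 };

#define SIMPLE_TEST_COUNT (sizeof generic_msp430 / sizeof generic_msp430[0])

/* Sum the sizes of all tests in one column of the table. */
unsigned total_size(const unsigned *sizes, size_t count)
{
    unsigned total = 0;
    for (size_t i = 0; i < count; i++) {
        total += sizes[i];
    }
    return total;
}

/* How much smaller 'size' is than 'baseline', as a percentage of baseline. */
double percent_smaller(unsigned baseline, unsigned size)
{
    return 100.0 * ((double)baseline - (double)size) / (double)baseline;
}
```

With these inputs the totals come out as 1720, 1674 and 1396 bytes, and percent_smaller(1720, 1396) is about 18.8.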
For applications using floating point, there is a significant advantage for Cortex microcontrollers,
whereas the Dhrystone program size is closer.
Figure 6: Code size comparison for floating point operations and benchmark suites
The total sizes for the benchmark and floating point tests (Dhrystone, Whetstone, Fir filter and
Math Float) are:
Summary for benchmark and floating point tests   Generic MSP430   MSP430F5438   Cortex-M3
Total size (bytes)                               9681             9493          6600
Advantage (% smaller)                                             1.9%          31.8%
Observations:
1. From the results, we can see that the Cortex microcontrollers have better code density
   compared to MSP430 in most cases. The remaining tests show similar code density when
   compared to MSP430.
2. One of the tests (Fir filter) uses an integer data type for a constant array. Since an integer is
   32-bit in the ARM processor and is 16-bit on MSP430, the program has been modified to
   allow a direct comparison.
3. When the large data memory model is used with MSP430, the code size increases by up to
   20% (Dhrystone).
4. We are unable to reproduce all of the claimed results in the Texas Instruments document.
   This may be because the storage of constant data in ROM might have been omitted from
   their code size calculations.
Additional investigation on floating point
When analysing the results of the Whetstone benchmark it became apparent that the MSP430 C
compiler only generated single precision floating point operations, while the ARM C compiler
generated double precision operations for some of the math functions used.
After changing the code to use only single precision floating point, the code size reduced
dramatically and resulted in a much smaller code size than the MSP430 code size.
The IAR MSP430 compiler has an option to define floating point, "Size of type double", which is by
default set to 32 bits (single precision). If it is set to 64 bits (as in the ARM C compiler), the code size
increases significantly.
Program size             Generic MSP430   MSP430F5438
Type double is 32-bit    6434             6308
Type double is 64-bit    11510            11798
These results match those seen for the ARM Cortex-M3 processor.
Program size                                                                   Cortex-M3
Whetstone modified to use single precision only                                4384
Out-of-box compile for Whetstone (use double precision for math functions)     8496
The option of setting type double to 32-bit is quite sensible for small microcontroller applications
where the C code might only need to process source data generated from a 12-bit/14-bit ADC.
Benchmarking using different default types can make a very big difference and not show accurate
comparative results.
Recommendations on how to get the smallest code size with Cortex-M microcontrollers
Use MicroLib
In the ARM development tools there is an option to use the area optimized MicroLIB rather than the
standard C libraries. The MicroLIB is suitable for most embedded applications and has a much
smaller code size when compared to the standard C library.
Ensure the use of area optimizations
The performance of Cortex-M microcontrollers is much higher than that of 16-bit and 8-bit
microcontrollers, so when porting applications from these microcontrollers you can generally select
the highest area optimization rather than selecting optimizations for speed. The resulting
performance will still be much higher than that of a 16-bit or 8-bit system running at the same clock
frequency.
Use the right data type
When porting applications from 8-bit or 16-bit microcontrollers, you might need to modify the data
type for constant arrays to achieve the most optimal program size. For example, an integer is
normally 16-bit in 8-bit and 16-bit microcontrollers, while in ARM microcontrollers integers are
32-bit.
Type                     Number of bits in 8051   Number of bits in MSP430   Number of bits in ARM
char, unsigned char      8                        8                          8
enum                     8/16                     16                         8/16/32 (smallest is chosen)
short, unsigned short    16                       16                         16
int, unsigned int        16                       16                         32
long, unsigned long      32                       32                         32
float                    32                       32                         32
double                   32                       32                         64
When porting a constant array of integers from an 8-bit or 16-bit architecture, you should modify
the data type from "int" to "short int" to make sure the constant array remains the same size. For
example,
    const int mydata[] = {1234, 5678, };
This should be changed to:
    const short int mydata[] = {1234, 5678, };
For an array of integer variables (non-constant data), changing from an integer to a short integer
might also prevent an increase in memory usage during software porting. Most other data (e.g.
variables) does not require modification.
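One way to make the intended width explicit during porting, offered here as a suggestion rather than something the benchmark code does, is to use the fixed-width types from <stdint.h>. In the sketch below the array name, its two values and the helper functions are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Constant table stored as 16-bit entries on every architecture. */
static const int16_t mydata16[] = { 1234, 5678 };

/* Size of the table in bytes: 2 bytes per entry regardless of sizeof(int). */
size_t table_bytes(void)
{
    return sizeof mydata16;
}

/* Number of entries in the table. */
size_t table_entries(void)
{
    return sizeof mydata16 / sizeof mydata16[0];
}
```

Because int16_t has the same width on an 8051, an MSP430 and an ARM target, the ROM footprint of the table does not change when the code is moved between them.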
Floating point functions
Some floating point functions are defined as single precision in 8-bit or 16-bit microcontrollers and
are by default defined as double precision in ARM microcontrollers, as we have found out with the
Whetstone test analysis. When porting application code from 8-bit or 16-bit microcontrollers to an
ARM microcontroller, you might have to adjust math functions to single precision versions and
modify constant definitions to ensure that the program behaves in the same way. For example, in
the Whetstone program code, a section of code uses some math functions that are double precision
in ARM compilers:
    X=T*atan(T2*sin(X)*cos(X)/(cos(X+Y)+cos(X-Y)-1.0));
    Y=T*atan(T2*sin(Y)*cos(Y)/(cos(X+Y)+cos(X-Y)-1.0));
If we want to use single precision only, the program code has to be changed to
    X=T*atanf(T2*sinf(X)*cosf(X)/(cosf(X+Y)+cosf(X-Y)-1.0F));
    Y=T*atanf(T2*sinf(Y)*cosf(Y)/(cosf(X+Y)+cosf(X-Y)-1.0F));
Other constant definitions such as:
    /* Module 7: Procedure calls */
    X=1.0;
    Y=1.0;
    Z=1.0;
should be changed to the following for single precision representation:
    /* Module 7: Procedure calls */
    X=1.0F;
    Y=1.0F;
    Z=1.0F;
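The reason the F suffix matters is that an unsuffixed constant such as 1.0 has type double in C, so an expression that mixes it with a float is evaluated in double precision. The sketch below shows this using sizeof on the expression types; the function names are hypothetical, and the sizes are compared against sizeof(float) and sizeof(double) rather than fixed numbers because they depend on the compiler settings.

```c
#include <stddef.h>

/* Size of the type of (float * 1.0): the constant is a double, so the
 * whole expression is promoted to double. */
size_t size_of_mixed_expression(void)
{
    float x = 2.0F;
    return sizeof(x * 1.0);
}

/* Size of the type of (float * 1.0F): everything stays single precision. */
size_t size_of_float_expression(void)
{
    float x = 2.0F;
    return sizeof(x * 1.0F);
}
```

When an expression is promoted to double, the compiler has to use double precision arithmetic for it, which is how a few unsuffixed constants can pull the larger double precision support code into the program image.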
Define peripherals as data structure
You can also reduce program size by defining registers in peripherals as a data structure. For
example, instead of representing the SysTick timer registers as
    #define SYSTICK_CTRL  (*((volatile unsigned long *)(0xE000E010)))
    #define SYSTICK_LOAD  (*((volatile unsigned long *)(0xE000E014)))
    #define SYSTICK_VAL   (*((volatile unsigned long *)(0xE000E018)))
    #define SYSTICK_CALIB (*((volatile unsigned long *)(0xE000E01C)))
you can define the SysTick registers as:
    typedef struct
    {
        volatile unsigned int CTRL;
        volatile unsigned int LOAD;
        volatile unsigned int VAL;
        unsigned int CALIB;
    } SysTick_Type;

    #define SysTick ((SysTick_Type *) 0xE000E010)
By doing this, you only need one address constant to be stored in the program ROM. The register
accesses will be using this address constant with different address offsets for different registers. If a
sequence of hardware register accesses is required for a peripheral, using a data structure can
reduce code size as well as improve performance. Most 8-bit microcontrollers do not have the same
addressing mode feature, which can result in a much larger code size for the same task.
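A sequence of register accesses through one base pointer might look like the sketch below. To keep it runnable on a host machine, the function takes the register block as a parameter instead of using the fixed address 0xE000E010; the type name systick_regs_t, the function name and the control value 0x5 (enable the counter, use the processor clock) are assumptions made for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Register block laid out as four consecutive 32-bit registers. */
typedef struct {
    volatile uint32_t CTRL;    /* offset 0x0 */
    volatile uint32_t LOAD;    /* offset 0x4 */
    volatile uint32_t VAL;     /* offset 0x8 */
    volatile uint32_t CALIB;   /* offset 0xC */
} systick_regs_t;

/* Program the timer through one base pointer: each access is the same
 * base address plus a small constant offset. */
void timer_start(systick_regs_t *timer, uint32_t reload)
{
    timer->LOAD = reload & 0x00FFFFFFu;  /* reload value is 24 bits wide */
    timer->VAL  = 0u;                    /* clear the current value      */
    timer->CTRL = 0x5u;                  /* assumed: enable + core clock */
}
```

On the target the call would be timer_start((systick_regs_t *)0xE000E010, reload), so the three stores share a single address constant and use the immediate offset addressing mode described earlier.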
Conclusions
32-bit processors provide equal or, more often, better code size than 8-bit and 16-bit architectures
whilst at the same time delivering much better performance.
For users of 8-bit microcontrollers, moving to a 16-bit architecture can solve some of the inherent
problems with 8-bit architectures; however, the overall benefits of migrating from 8-bit to 16-bit are
much less than those achieved by migrating to the 32-bit Cortex processors.
As the power consumption and cost of 32-bit microcontrollers have reduced dramatically over the
last few years, 32-bit processors have become the best choice for many embedded projects.
Reference
The following articles on MSP430 are referenced:
1 MSP430 Competitive Benchmarking
  http://focus.ti.com/lit/an/slaa205c/slaa205c.pdf
2 Efficient Multiplication and Division Using MSP430
  http://focus.ti.com/lit/an/slaa329/slaa329.pdf
