ASM Tutorial
By Sukasa
02-20-2007
(Credit goes to Glyph Pheonix for the original idea)
Well, you're here to learn ASM, right? You can't find a good tutorial on Google,
right? Or you just needed some clarification, right? Good. Time to learn about
ASM, how it works, and all that.
Now, you've heard about ASM and HDMA and DMA and all that stuff that hackers like
BMF toss around, right? Well, now you want to start to learn ASM, don'tcha?
First off
ASM stands for ASseMbly. Assembly is essentially a series of text commands that
are assembled (compiled, loosely speaking) into machine code to be inserted into a
ROM.
So!
First off, opcodes in 65816 ASM are ALWAYS one byte long. After that you have
anywhere from zero to three bytes of parameters that go with the 1-byte opcode.
For example (you'll learn these opcodes, and what they do, first):
LDA #$02 is two bytes: 1 byte for the opcode, and one byte for #$02.
STA $7E0019 is 4 bytes: 1 byte for the opcode, and three bytes for $7E0019.
RTS is only one byte, because there is no data to go with the opcode, just the
opcode itself.
II (PP PP PP)
"II" is the opcode byte, and the "PP"s are the parameter bytes, which vary from
opcode to opcode, and aren't always there, depending on the opcode being used.
So, LDA #$02 would be two bytes: II PP, or $A9 $02. ($A9 is the opcode {LDA
Immediate}, $02 is the parameter).
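To make the II/PP layout concrete, here is a small sketch of how the three example
instructions come out as bytes. The byte values in the comments are the standard
65816 encodings; note that a multi-byte parameter is stored low byte first.

```asm
LDA #$02       ; A9 02         opcode $A9 (LDA immediate) + 1 parameter byte
STA $7E0019    ; 8F 19 00 7E   opcode $8F (STA long) + 3 parameter bytes, low byte first
RTS            ; 60            opcode $60, no parameter bytes
```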
LDA loads the accumulator with either an immediate value (something like LDA #$40,
where the accumulator is then set to $40), or a value read from a
direct/indirect/long/whatever address, such as LDA $7E0019 (in which case the
accumulator is loaded with the value stored at RAM address $7E0019).
STA stores the contents of the accumulator to a RAM address (or a hardware
register, but we'll get to those MUCH later on).
STA is useful for storing the results of a math operation or copying small amounts
of data.
RTS (ReTurn from Subroutine) returns from the current routine. It is very useful in
returning from blocktool blocks when you are done (well, actually, it's the ONLY
way).
LDA
STA
RTS
Say we want to make a custom block that gives you a cape as soon as you touch
it (for this example, assume there is no such block already).
LDA #$02
STA $7E0019
RTS
Now... The first line loaded the accumulator with #$02, which is the byte for
having a cape.
The second line then stores that byte to $7E0019, which is where the game keeps
track of Mario's current status, instantly giving him a cape.
Lastly, the third line returns from the subroutine, finishing the code and
preventing a crash.
So, you made your very own custom block. Cool. Now, it's usually a good idea to not
change the Accumulator or anything else when you make a block, because doing so can
have undesired effects.
NOTE: Pay close attention to the following section when you use the opcodes listed
here, or your block will crash SMW!!!
First of all, when you want to save the contents of the Accumulator, say if you
didn't have ANY RAM where you could keep data that's in the accumulator, but you
desperately need to do some math, what do you do? You push it onto the stack.
...What's that? You don't know what the stack is? Silly me.
The stack is a section of RAM that holds bytes that you push to or pull from it
with the corresponding opcodes. Think of it as a stack of books, to which you can
add or remove books, but only at the top. This is the stack.
OK, so back to the situation involving the accumulator and math. To push the
Accumulator, you use the opcode "PHA". To get its original contents back into the
Accumulator, you pull it (PLA, thanks smallhacker). There are other Push/Pull
opcodes, but those are for later.
IMPORTANT WARNING: However many times you push to the stack, your block must pull
from it the EXACT same number of times before exiting its routine. Failure to do
so will CRASH SMW!
Smallhacker's advice
Push once, and you need to pull once.
Push nothing, and you may pull nothing.
Push 52 times, and you need to pull 52 times.
Why? Well, it's just how the SNES works. You see, the RTS command I showed you
before finds out where to return to by pulling two bytes off the stack. Therefore,
if you've pushed a different number of times than you've pulled, the RTS command
will get the wrong return address, and SMW will crash.
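Here is a minimal sketch of a balanced push and pull, reusing the cape block from
earlier. It only shows the pattern: one PHA, one PLA, then RTS.

```asm
PHA            ; push A: one byte goes onto the stack
LDA #$02       ; A is free to use now
STA $7E0019    ; give Mario the cape
PLA            ; pull once to match the one push: A has its old value back
RTS            ; the stack is balanced, so RTS pulls the right return address
```

Leave out the PLA and RTS would pull the saved copy of A as part of its return
address, which is exactly the crash described above.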
Now, what if you needed a block that made you super Mario, but only when you are
small Mario? What do you do there?
Well, that's actually pretty easy. Thanks to a nice little opcode called CMP
(CoMPare), you can compare the contents of the accumulator to a set value, or a RAM
address, and then get back a set of flags that certain other opcodes use to change
where in the code you execute. It's like looking to see if mom is watching before
you put your hand in the cookie jar. After all, you wouldn't do it if she's
looking, right?
Working in tandem with the CMP opcode are the Branch opcodes. There are close to a
dozen of them, and they all divert your code to different sections of code,
depending on certain processor flags (don't worry about those just yet).
So, the one Branch command we're worried about right now is the "BNE" opcode
(Branch if Not Equal).
So, we know that $7E0019 is equal to 00 when you are small Mario, so what you want
to do is get the value of $7E0019, and then compare it to $00. Now, how do you use
the branch command?
When you're working in a text editor, such as Notepad, the answer is labels.
Opcode 1
Opcode 2
Label:
Opcode 3
...
When you are coding, you can branch to labels by a method such as this:
BNE Labelname
Now... in order to only make you big Mario when you are small, you first need to
type in the RTS command, and a label on the line above it. Note that labels are
only used by the compiler and do not end up in the final .bin file!
And finally, above the label, you need to add in the load and the compare, then the
branch command (BNE to that label), and then the lines that make you big.
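Put together, the block might look like the sketch below. The label name "Return"
is made up for this example, and it assumes that $01 is the status byte for big
Mario (only $00 for small and $02 for cape were given above).

```asm
LDA $7E0019    ; load Mario's current status
CMP #$00       ; is it $00 (small Mario)?
BNE Return     ; not equal: he's not small, so skip ahead to the RTS
LDA #$01       ; assumed: $01 is the status byte for big Mario
STA $7E0019    ; small Mario becomes big
Return:
RTS
```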
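NOTE: BCS (Branch if Carry Set) is the normal name for BGE (Branch if Greater than
or Equal). BCC (Branch if Carry Clear) = BLT (Branch if Less Than), in case you
ever see those names instead of BGE and BLT.
Due to the nature of BVC and BVS, they are discussed at the end of the math
chapter.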
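As a rough sketch of BCS in action, here is a variation on the cape block that
leaves Mario alone if his status is already $02 or higher. It relies on CMP setting
the carry flag when the accumulator is greater than or equal to the value compared
against; the label name is made up for the example.

```asm
LDA $7E0019    ; load Mario's current status
CMP #$02       ; carry is set if A >= $02, clear if A < $02
BCS Return     ; BCS (a.k.a. BGE): already $02 or more, so do nothing
LDA #$02
STA $7E0019    ; status was below $02: give the cape
Return:
RTS
```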
Doing math!!
So, you want to do math, huh? Well, with the 65c816, you can either add or
subtract. The 65c816 has no multiply or divide instructions, and I will cover the
way to use the SNES's hardware multiplication and division registers in a later
chapter.
Adding
To add two numbers together, you use ADC (ADd with Carry). Oh yeah, there is no
straight add or subtract command in 65816 ASM. Only these semi-straight commands.
To use ADC, you first type in "CLC" (I will cover these kinds of opcodes next
chapter), then on the line below, you type in ADC xxx, where "xxx" is either a RAM
address or an immediate value (Remember, #$xx).
Then, your accumulator will contain the two numbers added together.
NOTE: If the two numbers add up to more than #$FF, your answer will end up being
#$100 less than it should be, so... #$80 + #$80 = #$00, and #$80 + #$90 = #$10. But
#$50 + #$40 = #$90, since that one still fits in a byte. Get it?
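A minimal sketch of an addition, assuming $7E0624 holds a number you want to add
$05 to (that address is only an example here):

```asm
PHA            ; save A
LDA $7E0624    ; load the first number
CLC            ; clear carry so ADC adds exactly what we ask for
ADC #$05       ; A = A + $05 (wraps around past $FF)
STA $7E0624    ; store the sum back
PLA            ; restore A
RTS
```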
Subtracting
So, to use SBC (SuBtract with Carry) you first need to use the command SEC. Then,
you add in (on the next line) SBC xxx, where "xxx" is an immediate value or a RAM
address, just like the ADC command.
SBC then stores in the accumulator the result of (Accumulator - SBC Parameter).
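The subtraction version looks almost the same. Again this is only a sketch, with
$7E0624 standing in for whatever RAM address you are working on:

```asm
PHA            ; save A
LDA $7E0624    ; load the number to subtract from
SEC            ; set carry first, the opposite of what ADC needs
SBC #$05       ; A = A - $05 (wraps around below $00)
STA $7E0624    ; store the result back
PLA            ; restore A
RTS
```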
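In any math operation, if your number either goes over $FF in addition, or under
$00 in subtraction, it wraps to the other end, and the carry flag tells you so:
after ADC the carry flag is set if the result went over $FF, and after SBC the
carry flag is clear if the result went under $00. You can branch on that with BCS
and BCC.
BVC (Branch when oVerflow Clear) and BVS (Branch when oVerflow Set) look at a
different flag, the overflow flag, which is for signed math. It is set when the
result crosses the boundary between $7F and $80 (that is, goes outside -128 to 127
when the bytes are treated as signed numbers), not when it wraps from $FF to $00 or
vice versa.
BVC branches if that did not happen. BVS is the opposite; it branches when it did.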
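For plain unsigned wrapping past $FF, the carry flag is the one to test after an
ADC. Here is a sketch that adds $10 to an example RAM value and caps it at $FF
instead of letting it wrap (the address and label name are just for illustration):

```asm
PHA
LDA $7E0624
CLC
ADC #$10       ; carry is set if the sum went past $FF
BCC NoWrap     ; carry clear: the sum fit in one byte
LDA #$FF       ; carry set: it wrapped, so cap the value at $FF
NoWrap:
STA $7E0624
PLA
RTS
```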
So... You saw how I used "CLC" and "SEC" in the last chapter, right? Well, now I
suppose you're wondering what those commands are, am I right? Of course I am.
CLC stands for CLear Carry flag. Basically, it messes with one of the 8 bits that
the processor uses to keep track of what it is doing, i.e. the result of the last
compare or a couple other things that I'll discuss later because they are more
advanced. Basically, the carry flag is used during math as a sort of 9th bit,
although it's not really usable in that you can't load its value directly and use
it in 8-bit math.
SEC does the same, except it messes with the carry flag in the opposite way of CLC:
CLC sets the carry flag to 0, and SEC sets it to 1.
Oh, and SEC stands for SEt Carry flag.
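To see the "9th bit" idea at work, here is a sketch that adds $80 to a two-byte
number one byte at a time. It assumes the low byte is at $7E0624 and the high byte
at $7E0625; both addresses are hypothetical. LDA and STA leave the carry flag
alone, so the second ADC picks up the carry from the first.

```asm
PHA
LDA $7E0624    ; low byte
CLC            ; start with carry clear
ADC #$80       ; carry is set if the low byte wrapped past $FF
STA $7E0624
LDA $7E0625    ; high byte (hypothetical address)
ADC #$00       ; no CLC here: this adds 0 plus the carry from the low byte
STA $7E0625
PLA
RTS
```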
Ah.. X and Y. These two little guys can't do math at all... Nor are they as usable
as the Accumulator. In fact, they can do only one thing that the Accumulator
can't do... Indexing. (Check a later chapter.)
Basically, the X and Y registers can hold values, and compare them to RAM
addresses, just like the accumulator. This is done by CPX and CPY (ComPare X,
ComPare Y). Also, the X and Y registers can use the stack (PHX/PLX and PHY/PLY),
and are handy for keeping a certain value close at hand whilst keeping the
Accumulator free for doing math and the like.
NOTE: Together, the Accumulator (henceforth to be known as "A"), X, and Y are known
as the Registers in this tutorial.
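A small sketch of X doing those jobs: holding a value, using the stack, and being
compared against RAM. CPX has no long (24-bit) addressing mode, so this uses the
16-bit address $0019 and assumes the current data bank maps it onto $7E0019. The
label name is made up.

```asm
PHX            ; save X on the stack
LDX #$02       ; X holds the cape status byte
CPX $0019      ; compare X with the byte at $0019 (assumed to map to $7E0019)
BNE NoCape     ; not equal: Mario's status is not $02
; ... code for "Mario has a cape" would go here ...
NoCape:
PLX            ; restore X
RTS
```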
Incrementing and decrementing a register by one can be done fairly easily, with a
single instruction. This is true for the A, X, and Y registers. There are two
instructions that do that for us.
First off, there is INC (INCrement). INC adds 1 to either A, X, Y, or a RAM value.
Second, there is DEC (DECrement), which subtracts 1 in the same way.
Counting by ones, you say? Well, gosh darn it, what good does that do us?
A HELL of a lot of good. Counting by ones is especially good for loops, such as
where you need to multiply two numbers together, and you can't use the
multiplication registers.
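For reference, here is a sketch of how the increment and decrement instructions are
actually written for each register and for RAM. The RAM address is just an example;
INC and DEC have no long addressing mode, so it is given as a 16-bit address.

```asm
INC A          ; A = A + 1 (some compilers spell this INA)
INX            ; X = X + 1
INY            ; Y = Y + 1
INC $0624      ; add 1 to a RAM value
DEC A          ; A = A - 1
DEX            ; X = X - 1
DEY            ; Y = Y - 1
DEC $0624      ; subtract 1 from the same RAM value
```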
So, we'll say that $7E0624 contains some number custom ASM put in that you need to
multiply by $3. Well, what's a good way to do that? SIMPLE! A Loop!
Loops
Loops are sections of code that repeat a given number of times, before having
execution move on somewhere else. Here is a simple loop that does absolutely
nothing at all, save waste processor cycles:
PHA
LDA #$30
loop:
DEC A
CMP #$00
BNE loop
PLA
RTS
Now... follow how that works. First, you save the contents of A (it's good coding
practice when hacking, in my opinion). Next, you load A with $30, and then thirdly,
you subtract one from that.
Next, you see if you've hit $00 yet. If not, you go back to step three. If so, you
pull the contents of A and then return.
And here is the loop that multiplies the number in $7E0624 by 3:
PHA
PHY            ;Push the Y register... We're using it in this code segment.
               ;Note that you could use X instead, it's all a matter of preference.
LDY $0624      ;LDY has no long (24-bit) addressing mode, so use the 16-bit address.
               ;This assumes the data bank maps $0624 onto $7E0624.
LDA #$00
loop:
CLC
ADC #$03
DEY            ;This is DEC Y... Note that the C was simply changed into Y.
               ;The same holds true for X.
CPY #$00       ;Compare Y with $00
BNE loop
;Do something with the multiplied number.
;NOTE: if $7E0624 holds 0, Y wraps to $FF and this loop runs 256 times.
PLY            ;Note order of instructions. However you push a set of registers,
PLA            ;you MUST pull them in the opposite order. Push A, Y; Pull Y, A.
RTS
There you go, that's how you use loops and the decrement instruction. Note that
increment could be used the same way, as follows:
PHA
PHY
LDA #$00
LDY #$00
loop:
CLC
ADC #$03
INY            ;Increment Y ... INC Y -> INY
CPY $0624      ;CPY has no long addressing mode either, so use the 16-bit address
               ;(assumes the data bank maps $0624 onto $7E0624)
BNE loop
;DoStuffHere
PLY
PLA
RTS
Now, there is something else to consider here. Notice the CMP #$00/CPY #$00/etc.?
You don't need them, and this is why: when you modify a register value, the zero
flag is updated just like when you compare against #$00. If you had X at one, for
example, and you decremented it, it would be zero and so the zero flag would be
set. The zero flag is a processor flag. The same thing occurs if you CPX #$00 when
X is zero: the zero flag is set because #$00 == #$00. BNE branches when the zero
flag is not set, that is, when the last operation gave you a number other than zero.
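Here is the multiply-by-3 loop again as a sketch with the compare dropped, since
DEY already sets the zero flag when Y hits $00. As before, LDY is given a 16-bit
address on the assumption that the data bank maps $0624 onto $7E0624.

```asm
PHA
PHY
LDY $0624      ; loop count (assumes the data bank maps $0624 to $7E0624)
LDA #$00
loop:
CLC
ADC #$03
DEY            ; DEY sets the zero flag when Y reaches $00
BNE loop       ; so BNE can test it directly, no CPY #$00 needed
; do something with the multiplied number here
PLY
PLA
RTS
```

Note that this only works for compares against #$00. The increment version of the
loop still needs its CPY, because it is comparing against a RAM value.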
Bit Widths
Bit widths control the maximum number one of the registers can hold. The registers
can either hold an 8-bit number, or a 16-bit number.
So far, you have learned SNES ASM with only 8-bit numbers. However, there are times
where 8-bit is not enough to hold the data you need. A good example of this is the
score. Say you need to make a block that becomes passable ONLY when your score is
over 5000. Well, the score is stored in RAM as 500, and the extra zero is placed in
by graphics.
But, here's your problem. An 8-bit register only goes up to 255, and 500 > 255. So,
what do you do? Easy. You change either the accumulator's or the X/Y registers' bit
width to 16-bit.
To make the accumulator 16-bit, you use REP #$20 (and SEP #$20 switches it back to
8-bit). This command sets the processor bitflag for the accumulator's bit width to
the 16-bit setting. Thus, you can load 16 bits of the score into the accumulator at
once and compare them.
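A rough sketch of that score check follows. The score address $7E0F34 is an
assumption (check a RAM map before relying on it), the label name is made up, and
only the low two bytes of the stored score are looked at. REP #$20 switches the
accumulator to 16-bit and SEP #$20 switches it back; SEP #$20 does not change the
carry flag, so the branch still sees the result of the CMP.

```asm
REP #$20       ; accumulator is now 16-bit
LDA $7E0F34    ; assumed address: low two bytes of the stored score
CMP #$01F5     ; 501 decimal, i.e. a displayed score over 5000
SEP #$20       ; back to an 8-bit accumulator (does not touch the carry flag)
BCC NotEnough  ; carry clear: stored score is below 501
; ... code that makes the block passable would go here ...
NotEnough:
RTS
```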
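NOTE: When you are in 16-bit mode, the immediate opcodes, like LDA #$66, take a
two-byte value as far as the processor is concerned. Therefore, an LDA #$55 that
was assembled with a one-byte value will make a 16-bit accumulator swallow the next
opcode byte as data, and that will crash the SNES! Instead, add the ".W" string to
the end of the number or opcode to tell your compiler that the immediate number is
16-bit, or simply use a value greater than $00FF. Many compilers will also treat a
value written with four digits, like #$0055, as 16-bit.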
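Here is a sketch of the difference, assuming a compiler that sizes an immediate by
how many digits you write (if yours doesn't, use the ".W" form instead). The bytes
in the comments are what should end up in the ROM.

```asm
REP #$20       ; accumulator is now 16-bit
LDA #$0055     ; must assemble to 3 bytes: A9 55 00
SEP #$20       ; accumulator is 8-bit again
LDA #$55       ; assembles to 2 bytes: A9 55
RTS
```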
Note that REP #$10 and SEP #$10 control the bit widths of BOTH the X and Y
registers as one. You CANNOT have the Y register one bit width and the X register
another, they MUST be the same.
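A short sketch of switching X and Y together. The values are arbitrary, and a real
block would want to save X and Y first; this just shows that both registers change
width at once.

```asm
REP #$10       ; X and Y are both 16-bit now
LDX #$01F4     ; 16-bit immediate in X
LDY #$0000     ; Y is 16-bit too, so its immediate needs two bytes as well
SEP #$10       ; X and Y are both 8-bit again (their high bytes are cleared)
RTS
```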
To write ASM, all you need is a text editor like Notepad, or its equivalent, and
an ASM compiler (assembler), such as SPASM, X816112f, or whatever compiler you
prefer.
http://www.zophar.net is a good place to find compilers, however you should pick
the one that is best for you, and it might not be at zophar's! And for a complete
listing of all opcodes:
65816ref.hlp
There is plenty more to learn about ASM, and over the next while I'll continue
updating this with more info. However, for now I'm going to leave it at this, and
edit in more info later on. Hope this helps a little in the meantime!