Tim's beginner's guide to 68k

These articles reproduced here courtesy of Tim Humphrey and originated from the Fantasm mailing list.
Contact information:
http://www.winthrop.edu/~humphret
zzhumphreyt@winthrop.edu
1.		Intro - Source fields
1a.	Installing and using Macsbug
2.		Registers and basic instructions
3.		Addressing modes and memory
4.		More on addressing modes
5.		Variables & Interpretation

There are two things to learn here: assembly and MacOS.  Since you need 
to know how to write in a language before you can use it to learn the OS, 
it's probably best to learn assembly first.  A few questions first, 
though.  Are you totally new to assembly?  Do you know about the 4 fields 
of an instruction?  Do you know about stacks?  I'll probably start from the 
very beginning--since this list is now archived, maybe somebody will 
read it later and learn assembly.

One thing I found helpful when first starting out was learning how to 
enter and exit my program--for a while, *all* my programs ended in system 
errors.  Basically, your program starts at its first line of code.  That 
line doesn't need to be labeled "Main" or anything like that.  If 
you're using Fantasm's Build mode, which you should, then the first file 
assembled contains the start of your program.

Fantasm, not assembly, has a requirement that you have at least one 
globally defined label.  So, even though I said that you don't have to 
call the first line "Main", you might desire to do so in order to please 
Fantasm.

Your program can end in various ways--a system error being the most 
undesirable.  Usually it ends with an 'rts' instruction.  When the 
processor sees this instruction it quits your program.  (It actually does 
more than this, but for right now this is what it does.)  If you just let 
your program run off its last line, the processor doesn't stop there; it 
keeps executing whatever bytes happen to come next, which usually ends in 
a system error.  So always end your program with an 'rts' instruction.
 
So the simplest program you could make would look something like this:
Main:           rts                             ;end program
                global  Main

This program lives only to die, i.e. that's all it does.

If you're totally new to assembly, I'll go into the structure of an 
instruction.

Each assembly instruction is always only one line long.  This line is 
always divided into 4 fields which are separated by white space (spaces, 
tabs).  I'll discuss the fields below.

1st Field) This is the label for the instruction.  This is commonly used as a 
reference for other instructions.  The colon that ends the label isn't 
required, but you'll find it useful to always suffix your labels 
with it.  The label is an optional field, although this doesn't mean you 
should ignore it.  Don't forget to put a tab or a space in this field if 
you don't intend to use it--notice in the second line of the program 
above I use tabs to fill in the label field instead of typing in a label.

2nd Field) The instruction field.  This field is, of course, required--if 
it weren't there you wouldn't have an instruction!  Instructions in this 
field can be instructions actually used by the processor, such as 'rts'; 
instructions used by the assembler, such as 'global'; or custom-made 
instructions, commonly known as macros (and no, there isn't a macro in the 
program above).

The instructions used by the processor are what you'll probably use the 
most, and they make up the bulk of your program.  Instructions for the 
assembler are known as directives.  The assembler knows an instruction 
is its own by the simple fact that it isn't a processor instruction; not 
too complex, really.  I'll discuss macros some other time: since they're 
custom instructions, you can do without them for now.  Note that you 
might also see this called the opcode field, short for OPeration CODE.

3rd Field) Operand Field.  Many instructions need data to work on, and 
this is where you put that data.  Think of it like this: when you add 
numbers you have the instruction, the plus sign +, and you have the data 
for the instruction, the numbers.  So this: 2+2, would look something like 
this in assembly (just an illustration; the real 68K 'add' instruction 
needs a register or memory location to put the result in):
                add     2,2
If you use the HP brand of graphing calculators then you're used to 
reverse-polish notation, where you enter the data first and then the 
operation.  Assembly does it the other way around: issue the instruction 
and then the data.  It might be a pain now, but you'll come to appreciate 
it--it actually makes more sense doing it this way the more you think 
about it.  Notice that each individual data element is separated by a 
comma and not a space; remember that spaces separate instruction fields 
and not data elements.  If an instruction doesn't need any data, like the 
'rts' instruction, then you can of course leave it blank.

4th Field) Comments.  This field is the comment field.  I'm not exactly 
sure if Fantasm automatically recognizes this as a comment field, but 
it's customary to begin this field with a comment character.  Comments in 
assembly begin with either a semicolon (;), or an asterisk (*).  Anything 
after this character is ignored.  Comments do not bleed over to the next 
line, so when you press return the comment is done.  (Actually, when you 
press return the entire instruction is done, regardless of what fields 
you have left to fill in.)  You can make the entire line a comment by 
simply making the first character a comment character.  (As a personal
preference, when I make the entire line a comment I use the asterisk, I 
use the semicolon when I want to add a comment after an instruction--I 
don't have to hold down the shift key so it's a little faster:)
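To make the four fields concrete, here is a rough Python model of how an assembler might split one line into label, instruction, operands, and comment.  This is a sketch under simplifying assumptions (a label starts in the very first column, fields are separated by spaces or tabs, a comment starts at the first ';' or a leading '*'), not Fantasm's actual parser:

```python
def split_fields(line):
    """Split one source line into (label, opcode, operands, comment).

    A simplified model, NOT Fantasm's real parser. Assumptions:
    - a line whose first character is '*' or ';' is all comment
    - a label, if present, starts in the very first column
    - fields are separated by spaces or tabs
    - the comment starts at the first ';' (a ';' inside the operands
      would confuse this sketch)
    """
    if line[:1] in ("*", ";"):
        return ("", "", "", line[1:].strip())
    code, _, comment = line.partition(";")
    words = code.split()                  # split on any run of spaces/tabs
    label = ""
    if code[:1] not in ("", " ", "\t"):  # first column used: it's a label
        label = words.pop(0)
    opcode = words[0] if words else ""
    operands = words[1] if len(words) > 1 else ""
    return (label, opcode, operands, comment.strip())


for src in ["Main:           rts                             ;end program",
            "                global  Main",
            "* a whole-line comment"]:
    print(split_fields(src))
```

Notice why the operands can't contain spaces in this model: a space would start a new field, which is the same reason the article separates data elements with commas.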

Because assembly instructions are rather cryptic at times, you might want 
to comment each line of your code--more so than with other languages, 
you'd be surprised at how soon you forget what a piece of code does.  This is just
my own personal preference however, and you'll probably develop your own 
commenting style.  If you do comment, I recommend you make it something 
informative: don't say "adding 2 and 2", say "adding 2 to player 2's 
score".



Moving right along...

Last time I discussed the format of a typical assembly instruction: 
it's composed of 4 fields: label, instruction, data, comment.  I also
talked about how to enter and exit the program: first line enters, and an 
'rts' instruction exits.  Now it's time to write a program that does 
something.

As I stated before, you have to learn assembly and MacOS, so you won't be 
able to see any output on the screen for a while.  So in order to follow 
your programs you should get a debugger.  Stu suggested getting MacsBug, 
and I do too.  You can get it at 
<ftp://ftp.apple.com/devworld/Tool_Chest/Testing_-_Debugging/Debuggers_-_dcmds/>.  
You can also get version 6.5.4 there if you want it.

Once you get MacsBug you'll probably wonder what to do with it since it 
isn't an extension, control panel, or application; in fact, if you 
double-click it the Finder refers to it as a document.  What you do is 
put it in the top level of your System Folder, i.e. don't put it in any 
folders that are inside the System Folder.  Once it's there, reboot for it 
to take effect--if you watch the startup sequence you'll see a message 
saying that a debugger was installed.

Now, how do you use it?  If you press the command-power combination--power 
is that triangle key in the upper-right of the keyboard--then you'll be 
dumped into MacsBug.  What you should see is a white screen divided into 
sections.  I'll briefly touch on them; after a while you'll kinda learn 
what each of them is for:
 - the left side of the screen lists registers, condition codes, the 
current application name, and previous stack levels
 - the big section of the screen is where all output goes
 - below the output section are three lines that show the next 
instructions waiting to be executed by the processor
 - finally, below the 3 instructions is a command line for you to type in 
MacsBug commands--betcha never thought the Mac had a command line, did ya:)

To begin learning MacsBug type in "help" and some text will appear 
detailing the major sections that the help is divided into.  You can type 
in "help" and one of the topics to get help on that topic.  For instance, 
to get help on editing in MacsBug you would type "help editing".  You can 
play around with the help to learn everything about MacsBug, but for now 
I'll tell you some basic commands you'll want to know about:

 G) "G" stands for go and it exits you out of MacsBug and returns you to 
whatever you were doing before you entered.  While you're in MacsBug 
processing is suspended and nothing gets done on your computer until you 
leave--CD players will still play because they don't need processing to 
play.

 ES) This force quits the current application, equivalent to 
command-option-esc.  It also exits MacsBug.

 RS) This unmounts all disks and restarts the computer.  There is another 
command that restarts without unmounting, but you don't want to use that 
unless you're feeling adventurous.  If it happens that for some reason 
you can't leave MacsBug by typing "g" or "es" then you'll have to resort 
to "rs".  If even that won't work then you'll have to use 
command-control-power.  (Command-control-power is equivalent to that 
other MacsBug restart command, so if you must force a restart try "rs" 
before doing command-control-power.)

The method of entering MacsBug described above, command-power, is only 
one way of entering MacsBug.  Another way is to put a 'debug' command 
in your program.  When the processor sees this it dumps you into a 
debugger, if present.  Note that if you don't have MacsBug installed, or 
some other debugger, then 'debug' will trigger a system error.  When you're
in MacsBug using this method the processor will be about to execute the 
command immediately following the 'debug' command.  So in this program:
                debug
Main:           rts
                global Main
The section in MacsBug that shows the three commands will have 'rts' at 
the top, meaning it's going to be executed next.  (Notice the label isn't 
at the very start of the program, you just need one in there, but it 
doesn't matter where.)

You could type in "g" to continue the program, which would promptly 
quit.  However, you can use MacsBug to step through the program, 
instruction by instruction.  To do this you can use these two commands:

 S) "S" steps into an instruction.  I use the word, "into", for a special 
reason: there's another command that steps "over" instructions.  
Essentially, when you step into an instruction you directly execute 
it, you don't try to do anything special with it.  If you were to step into a 
MacOS trap, for instance, then you would begin to see the actual commands 
executed by the trap, and not merely execute the trap.  Read on in step 
over to learn more...

SO) "SO" steps over an instruction.  It does the same thing as "s" except 
it will just execute the instruction.  If you were to step over a MacOS 
trap then the trap would be considered one instruction, and not a gateway 
to other instructions--the trap would execute and you wouldn't see what's 
going on.  When MacsBug does this, stepping over a trap, it will flip the 
screen to the main screen and flip back to MacsBug when it's done.  This 
logically brings me to another MacsBug command...

 ESC or ~) Pressing either "esc" or the tilde key will flip the screen 
from MacsBug to the true screen.  This is the screen that you saw before 
you went into MacsBug, e.g. the desktop.

You can also indicate a number of instructions to step. So you could step 
into, or over, 3 instructions by typing "s 3", or similarly "so 3".  Try 
this to see the difference between stepping into and stepping over.

If you did try this you might be surprised that when the 'rts' 
instruction was executed your program didn't quit.  When I said that 
'rts' does more than just quit your program this is some of what I 
meant.  Actually, 'rts' returns program execution to the routine that 
called the routine containing the 'rts' instruction--sound confusing?!

Your program is essentially just a routine: program execution was going on 
before your program began, and it'll continue after it's done.  'rts' means 
"ReTurn from Subroutine", so executing 'rts' is returning program execution 
to the calling routine; probably the Process Manager.  So if you do step 
past 'rts' the instructions you see won't belong to your program.
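Here is a toy Python model of the "return to whoever called you" idea.  It's only a sketch: on a real 68K the caller uses an instruction such as 'jsr' or 'bsr', which saves the return address on the stack, and 'rts' pops it back into the pc.  The class name, the addresses, and the 4-byte call length are hypothetical:

```python
class ToyCPU:
    """A toy model of calling and returning (hypothetical, not a 68K emulator).

    The real 68K keeps return addresses on the stack in memory;
    here the stack is just a Python list.
    """

    def __init__(self, pc=0):
        self.pc = pc
        self.stack = []

    def call(self, target, call_size=4):
        # remember the address of the instruction after the call
        # (call_size is an assumed instruction length), then jump
        self.stack.append(self.pc + call_size)
        self.pc = target

    def rts(self):
        # return to whoever called us: pop the saved address into the pc
        self.pc = self.stack.pop()


cpu = ToyCPU(pc=0x1000)   # pretend the caller is running at $1000
cpu.call(0x2000)          # it calls your program at $2000
cpu.rts()                 # your program's 'rts'
print(hex(cpu.pc))        # back in the caller, just past the call
```

After the 'rts', the pc is back in the caller's code.  That's why stepping past 'rts' in MacsBug shows instructions that aren't yours.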

Now that you know how to look at what your program is doing, I'll discuss 
the basic data storage unit in assembly: the register...



Last time I talked about viewing your programs via a debugger, MacsBug.  
(BTW, if you're tired of MacsBug's colors then you can download a program 
that will let you change them.  You can get it at 
<ftp://mirrors.aol.com/pub/info-mac/dev/color-macsbug.hqx>.  When you 
launch it, you'll get a dialog to search for MacsBug, find and select it 
and you're on your way.)  Now that you can use MacsBug, in color, you can 
observe the fundamental assembly tool in action: registers.

Registers really are your tools, there really isn't any other way I can 
say it.  Registers will be your variables, they will be your constants, 
they will hold your pointers and handles, they pretty much are your 
program.  Unless you've programmed in assembly before there isn't 
anything you can relate registers to in other languages.  The best 
analogy I can really think of for registers is that they are your 
tools.  As such, there are very few limitations on their usage--heck, 
you use them to do other things so there shouldn't be any restrictions 
on them.  They are totally unique to assembly, and totally essential.

Hopefully you've gotten the concept that you'll be using registers for 
everything that you do.  But you may be wondering, "What is a register?"  
A register is a holding place in the processor for data.  When you want 
to add two numbers, the numbers will be in registers, added, and then 
stored back in registers.  (As with all new things there is more than 
meets the eye, so what I just said isn't totally true; but for now just 
believe it.)  Essentially, the processor uses registers to carry out your 
instructions.

Exactly what a register is is unimportant.  What's important is knowing 
that you have to use them to program in assembly, i.e. to program 
the processor.  So, how do you use them?  The answer isn't as simple as 
the question.  Assembly language uses the instruction set of the 
processor for its commands, so the assembly language for one processor, 
68K, won't be the same as for another, PowerPC.  If the processors are 
different, then don't you think the registers too will be different?  
They are.  So if I were to tell you how to access registers for one 
processor it wouldn't be quite the same method for another processor.  
Since every Mac user has access to the 68K instruction set, I'll teach in 
that--plus the fact that's all I know.  (It actually isn't so different 
accessing registers for other processors, but since it is different and 
you're just starting out, it might hinder the learning process to learn too 
much seemingly conflicting stuff too soon.)

Having said that, here's how you would move the value 2 into the first 
68K data register:
                move.l  #2,d0
After all that buildup you'd have thought it would be tougher 
than that, wouldn't you:)  Even so, there are a couple of things to look 
at here, so let's dissect that instruction.

----------

First, look at the instruction, 'move'.  This instruction does just that, 
it moves data from one place to another.  When you explicitly want to put 
a value into a register you'll use this instruction.  As a side note in 
case you ever get confused, you don't technically move data, you copy 
it.  The place that you're moving from doesn't lose its value as you 
might think.  It just sounds better to say you're moving rather than 
you're copying:)

After the 'move' instruction, there's this ".l".  That suffix tells the 
processor how much data to use, in this case, how much to move.  Here's a 
program to use so you can see what's going on in MacsBug:
                debug
Main:           move.l  #2,d0
                rts
                global Main
When you enter MacsBug, you'll be ready to execute the 'move' 
instruction.  Before you do, notice something.  In the bottom-left column 
of the screen there's a bunch of numbers that have labels preceding 
them.  The numbers preceded by "D0, D1, D2, ... D7" are data registers, 
and the numbers are what is in the respective register.  Although your 
actual numbers will look different, here's an example:
D0 6743FF23
This means that the number 6743FF23 is the current value of the D0 
register.  (Yes, 6743FF23 really is a number:)
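MacsBug shows register contents in hexadecimal (base 16), so "6743FF23" is an ordinary number written in hex.  As a quick check, here it is converted with Python:

```python
value = int("6743FF23", 16)   # parse the MacsBug display as base 16
print(value)                  # -> 1732509475
print(f"{value:08X}")         # and back to MacsBug-style hex: 6743FF23
```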

Now, step into the next instruction, by typing "s", and notice what 
happens to the D0 register.  The entire thing becomes 2:
D0 00000002
Isn't the value 2 small enough so that only the last digit needed to be 
changed?  Yes it is, but you told the processor to use all of the 
register to hold the value 2, so it did.  You told it this via the ".l" 
suffix to move.  What does the ".l" suffix mean anyway?  In 68K it means 
long, which really means 32 bits.  That's right, the 68K's registers have 
been 32 bits wide ever since the first Mac--kinda makes you laugh when you 
hear all the hype about 32-bit Win95 programs.  Whenever you hear that 
some processor is 32-bit, this is what it means: its registers hold 32 
bits worth of data.  That's 2 to the 32nd power, or 4,294,967,296, 
different values: a little over 4 billion.

Other suffixes you can use are:
.w - means word, which is 16-bits--half a long
.b - means byte, which is 8-bits--half a word, and a fourth of a long
So you couldn't tell the processor to just change the last digit even if 
you wanted to; each hex digit is only 4 bits, so the best you could do 
would be a byte, the last two digits.  (Again, there's more than meets the 
eye here..., if you *really* wanted to just change the last digit you could.)
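This small Python table works out what each size suffix covers: how many bits and bytes, how many hex digits in the MacsBug display (one hex digit is 4 bits), and how many different values fit.  It's plain arithmetic, not anything specific to Fantasm:

```python
SIZE_BITS = {".b": 8, ".w": 16, ".l": 32}   # bits per 68K size suffix

def size_info(suffix):
    """How much a .b/.w/.l operation touches."""
    bits = SIZE_BITS[suffix]
    return {
        "bits": bits,
        "bytes": bits // 8,
        "hex_digits": bits // 4,   # one hex digit = 4 bits
        "values": 2 ** bits,       # distinct bit patterns
    }

for suffix in SIZE_BITS:
    print(suffix, size_info(suffix))
```

The smallest size, a byte, already covers two hex digits.  That's why a single 'move' can't change just the last digit.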

What happens when you only use half of a register: *which* half gets 
used, or what about a fourth of a register?  The answer is always the 
first half, or fourth, of the register counting from the right; 
essentially the rightmost (low-order) part.  The processor starts 
counting the bits of a register from the right, and goes to the left.  
So, if you did this:
                move.w  #2,d0
this,
D0 6743FF23
becomes this,
D0 67430002
Similarly, moving a byte would result in this,
D0 6743FF02
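The same rule can be written as a Python model using bit masks.  It assumes a *data* register; address registers follow different rules (for example, a word moved into an address register is sign-extended to fill all 32 bits), so this sketch doesn't apply to them:

```python
MASKS = {".b": 0xFF, ".w": 0xFFFF, ".l": 0xFFFFFFFF}

def move_imm_to_dreg(old, value, size):
    """Model 'move.<size> #value,Dn' on a data register.

    Only the low-order (rightmost) bits named by the size are replaced;
    the rest of the 32-bit register keeps its old contents.
    """
    mask = MASKS[size]
    return (old & ~mask & 0xFFFFFFFF) | (value & mask)

d0 = 0x6743FF23
for size in (".l", ".w", ".b"):
    print(size, f"{move_imm_to_dreg(d0, 2, size):08X}")
```

This prints 00000002, 67430002 and 6743FF02, matching the D0 values shown above.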

I'm sure by now you know what the operand part of the 'move' instruction 
I listed does.  It moves a source value, "#2", into a destination "D0".  
You could change the "D0" into "D1 or D2, ... D7", and you would move the 
value 2 into the other data registers.  Try it to get used to using 
more than one register, you won't hurt anything--at least nothing 
rebooting won't solve:)  There really isn't any difference between 
each of the 8 data registers, so use and abuse them however you see fit.



Last time, which really was a long time ago, I talked about the 
importance of registers in assembly programs.  I also talked about how to 
move data into the registers, data of varying size.

To start off this lesson I want to show you something, and to do that I'll 
need to introduce a new MacsBug command:

IL) Typing "il" just by itself produces half a page of disassembled 
instructions starting from the next instruction.  When you assemble a 
program, the assembler, Fantasm, converts your instructions to the actual 
numbers the processor needs in order to execute them.  Disassembly is 
exactly the opposite, taking the numbers and producing the assembly 
instructions you write--the disassembly won't look exactly like your 
original source of course, but it'll be much more readable than the hex 
equivalent.

To get more disassembled lines, just repeatedly press return.  You can 
also put an address after "il" to disassemble from that address.

If you haven't already done so, type "il" and look what happens.  You'll 
get half a page of disassembled instructions, like I said, but take a 
look at each line.  What you'll find is that every line uses a register in 
some form or another; and the few that don't appear to use them, do.  
What's the whole point?  The point is that every program on your 
computer uses the same finite number of registers that your program 
uses.  If you broke into MacsBug by typing command-power, then you'll 
especially notice this.  The registers that you see on the left-hand side 
of MacsBug are, essentially, all the registers there are.  So how in the 
world can any number of programs use the same handful of registers and 
still work?

The answer is that each program uses the registers for whatever it has to 
do, and if it needs to save some data, it saves it in memory.  You might 
think of an analogy of registers somewhat like this--yes, I've come up 
with a new analogy:

Registers are like the RAM on your computer system, you have a finite 
amount of RAM and any number of programs that can use it; but not all at 
the same time.  Suppose that you only have 8 megs of RAM, and your System 
is using up 3 megs; you have 5 left.  You decide to do some web page 
editing, and open up a new program; let's say it uses up 2 megs leaving 
only 3 free.  Now you want to preview it by opening up Netscape, but 
Netscape requires at least 4 megs of RAM and you only have 3, what'll you 
do?

You could buy some more RAM, but a more sane solution is to quit your web 
editing program to free up more memory, then you'll have enough to launch 
Netscape.  This analogy of quitting programs to free up RAM is similar to 
what is done with registers; although with a little caveat.  You have to 
quit a program to gain use of its memory, but you can just take over a 
register and start using it--rude, perhaps, but quite effective.  So if you 
wanted to use register D0 for something, just use it.  But here's a 
problem, if you just use D0 without considering who was using it before, 
don't you think some other program will do the same thing to you?  What 
happens if D0 has something important in it that you want to keep for later?

The solution: save the contents of the register to memory.  In 68K, you 
do this with address registers.  In MacsBug, these are the registers that 
start with, "A0, A1, ..., A7".  Address registers are registers, just 
like the data registers, so you can move data into them, so you could 
write an instruction like this:
                move.l  #0,a0
For various reasons though, you'll almost never issue the instruction I 
just wrote.  I'll explain why in the next lesson.

Address registers are your link to memory, RAM.  To store something in 
memory you would issue an instruction like this:
                move.l  #2,(a0)
The difference between this instruction and the one I listed above is 
the parentheses.  The parentheses indicate to the processor that the 
number 2 should be moved to what A0 points to.  If A0 contained zero 
after the first instruction, then the second instruction would store the 
number 2 in memory location 0.  Do you see what I'm talking 
about?  Here's something that might help you out even more.

Whenever you deal with memory, or registers, the processor computes 
something called the "effective address" (EA) to figure out exactly where 
you want to manipulate stuff.  The effective address doesn't technically 
have to be an "address", it just means where you want to manipulate stuff.  
So, take this instruction from a previous lesson:
                move.l  #2,d0
The source effective address is the number 2.  This isn't an address, of 
course, but it is an effective address: it tells the processor where the 
source data is, the number 2.  This kind of effective address is known 
as "immediate".  It's called this because the data is immediately 
available, it doesn't have to go out to memory or look in a register, the 
value is just given to it.  You indicate the immediate form by preceding 
a number with a pound sign #.

The destination effective address is the register D0.  Again, not an 
address, but an effective address; it merely tells the processor where to 
do stuff.  In this case, it does stuff in/to the register D0.  And in 
this particular instruction, it moves stuff to that register.  This kind 
of effective address is called "register direct".  You'll understand the 
name when I tell you about another effective address.

"Register indirect" is something that applies only to address registers.  
What this EA means is that the place to do stuff is contained in the 
register.  So, if A0 contained 0, the place to do stuff would be in 
address location 0; in register indirect, effective address truly does 
mean an actual address.  Using the instruction listed above:
                move.l  #2,(a0)
The place to get stuff is immediately told to the processor as the number 
2, and the place to put that stuff is the address pointed to by A0, 
i.e. the value that's in A0.  Get it?  In this instruction:
                move.l  #2,a0
The place to get stuff is immediately told to the processor as the number 
2, but the place to put that stuff is the register A0, not what A0 points 
to.  Get it?

So to sum up, you've learned 3 different addressing modes, effective 
addresses:

 - immediate, indicated by a # sign and then a number, e.g. #2, and means 
an actual number
 - register direct, indicated by a register, e.g. D0, and means the 
actual register
 - register indirect, indicated by a register surrounded by 
parentheses, e.g. (a0), and means the address pointed to by, contained in, 
the register; naturally, this mode only applies to address registers.

You'll learn more addressing modes later on, but these three should keep 
you happy for a while.
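Here's a toy Python model of the three addressing modes for a 'move.l'.  It's a sketch, not an emulator: the 16-byte memory, the register set, and the helper names are all hypothetical.  It does keep two real 68K details: register indirect only works with address registers, and values are stored big-endian (most significant byte first):

```python
memory = bytearray(16)          # a pretend 16-byte RAM starting at address 0
regs = {"d0": 0, "d1": 0, "a0": 0, "a1": 0}

def is_indirect(op):
    return op.startswith("(") and op.endswith(")")

def indirect_address(op):
    """'(a0)' -> the address held in a0 (address registers only)."""
    reg = op[1:-1]
    if not reg.startswith("a"):
        raise ValueError("register indirect needs an address register")
    return regs[reg]

def read_long(op):
    if op.startswith("#"):                       # immediate: the number itself
        return int(op[1:])
    if is_indirect(op):                          # register indirect: memory
        addr = indirect_address(op)
        return int.from_bytes(memory[addr:addr + 4], "big")
    return regs[op]                              # register direct

def move_l(src, dst):
    """A toy 'move.l src,dst' supporting only the three modes above."""
    value = read_long(src)
    if dst.startswith("#"):
        raise ValueError("an immediate can't be a destination")
    if is_indirect(dst):
        addr = indirect_address(dst)
        memory[addr:addr + 4] = value.to_bytes(4, "big")   # 68K is big-endian
    else:
        regs[dst] = value

move_l("#4", "a0")      # register direct: a0 = 4
move_l("#7", "(a0)")    # register indirect: store 7 at addresses 4..7
move_l("(a0)", "d0")    # read it back into d0
print(regs["a0"], regs["d0"], list(memory[4:8]))
```

The difference between "a0" and "(a0)" is the whole point.  The first changes the register; the second changes the memory the register points to.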

Before you start storing values to memory though, you should have a place 
to store them to.  If you don't then you could store a value in someone 
else's program, possibly corrupting it.  There are various ways to get 
memory for your program: use the Memory Manager to get it for you, use 
the stack, or just create space for it in your program.  Since you have 
to use the OS for the first method, and we haven't discussed that, the 
first option isn't viable right now.  Unless you know about the stack, 
and unless you only need temporary memory, the second option isn't 
advisable either.  So that leaves the last option: embedding the space in 
your program.

You do this by using a Fantasm directive: 'ds', define space.  You can 
specify how much memory you want, and how big each individual memory 
"module" should be.  To allocate 16 bytes of storage, you could do this:
                ds.b    16
you could also do this to get the same amount of memory,
                ds.w    8
or you could do this which does the same thing,
                ds.l    4
You'll know which method to use when you start writing your program, but 
since I explicitly said allocate 16 *bytes*, it would make more sense 
to use the first instruction,
                ds.b    16
even though the others are just as valid.
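The arithmetic behind the three 'ds' lines is just count times unit size.  A quick Python check, assuming the 68K sizes of 1, 2 and 4 bytes for .b, .w and .l:

```python
UNIT_BYTES = {".b": 1, ".w": 2, ".l": 4}

def ds_bytes(size, count):
    """Bytes reserved by 'ds<size> count': count units of size bytes each."""
    return UNIT_BYTES[size] * count

print(ds_bytes(".b", 16), ds_bytes(".w", 8), ds_bytes(".l", 4))   # -> 16 16 16
```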

Now that you have the memory how do you go about accessing it, where is 
it?  Well, the simple answer, is that it's where you put it!  In this 
little program, the 16 bytes are after the 'rts' instruction:
                debug
Main:           move.l  #0,a0                   ;A0 contains 0
                move.l  #3,d0                   ;D0 contains 3
                lea     space(pc),a0            ;A0 contains address of space
                move.b  space(pc),d1            ;D1 contains first byte of space
                move.b  d0,(a0)                 ;first byte of space has 3
                move.b  space(pc),d1            ;D1 contains 3
                move.b  (a0),d1                 ; "
                rts
space:          ds.b    16
                global  Main

Whoa!  Introduced a few new things there, didn't I:)  The first new 
thing, is the instruction 'lea'.  This means 'Load Effective Address', 
and does just that: it loads the effective address of something into a 
register, an address register.  It's too bad the effective address 
contains so many unknown things, or else you could figure out what the 
effective address is:)

The "pc" is a special register, much like an address register, which 
contains the address of the next instruction to be executed; it means 
"Program Counter".  You can't directly manipulate it, so an 
instruction like this would be illegal:
                move.l  #4,pc
However, you can use it as a base for reading memory, so this instruction 
would be legal:
                move.l  (pc),a0
There is no "pc" direct addressing mode, so this too would be illegal:
                move.l  pc,a0
If you think about it, you really don't need the last instruction: since 
you can't directly move data to it, why have an instruction where you 
could directly move data from it?  In short, the only access you have to 
the program counter is through memory references like the second 
instruction, which works much like register indirect.

So, this is what happens in the second instruction:
                move.l  (pc),a0
You move what is pointed to by the pc into A0.  What is pointed to, 
contained in, the pc is the address of the next instruction; so you would 
move the next instruction into A0 with this instruction!  Still don't get 
it?  Look over this little example:
00000000        move.l  (pc),a0         ;A0 contains the number for 'rts'
00000004        rts
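At the first instruction, the pc contains the value 00000004, the address 
of the next instruction.  According to what I said above about register 
indirect, the effective address is the address pointed to by the 
register.  Well, the pc is pointing to 00000004, because it contains 
00000004.  So, the 'move' instruction is going to move what's in 00000004 
into A0.  What's in 00000004 is literally the next instruction, 'rts'.  
So, A0 would have the number that represents 'rts'--whatever that happens 
to be.  (Strictly speaking, the 68K has no plain (pc) mode, so the 
assembler encodes this as 0(pc), with a zero displacement word right after 
the opcode, and the pc value the processor uses is the address of that 
word, 00000002, not the next instruction.  So the 4 bytes A0 really gets 
are that zero word followed by the code for 'rts', $4E75, giving 
00004E75.  Either way, you're reading instructions, not data.)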
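To see this at the byte level, the Python sketch below lays out the two instructions as the processor would see them and repeats the effective-address calculation.  It assumes the assembler turns "move.l (pc),a0" into 'movea.l 0(pc),a0', whose standard encoding is $207A followed by the displacement word.  $4E75 is the encoding of 'rts'.  The 68K uses the address of the displacement word as the pc value:

```python
# The two instructions from the example, as the bytes the 68K sees:
#   $0000: 207A 0000   move.l (pc),a0  (really movea.l 0(pc),a0)
#   $0004: 4E75        rts
memory = bytes([0x20, 0x7A, 0x00, 0x00, 0x4E, 0x75])

instr_addr = 0x0000
pc_value = instr_addr + 2          # address of the displacement word
displacement = int.from_bytes(memory[2:4], "big", signed=True)
ea = pc_value + displacement       # effective address = 2
a0 = int.from_bytes(memory[ea:ea + 4], "big")   # a long is 4 bytes
print(f"EA = {ea:08X}, A0 = {a0:08X}")
```

It prints "EA = 00000002, A0 = 00004E75": instruction bytes, exactly as the text warns.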

If you get what I just said, you might think that that's a little stupid: 
you're always going to move the next instruction into something--you want 
to execute instructions, not get their values!  So the question is, 
"Is it possible to reach beyond what is pointed to in a register?"  The 
answer is yes, and the official name for it is "address register indirect 
with displacement" (or, for the pc, "program counter indirect with 
displacement").  The syntax looks like this: 0(a0).  Simply add a number 
before the parentheses of register indirect and you get this new 
addressing mode.  The effective address in this case is what is contained 
in the register, plus the number.  So, the particular example I just 
showed is equivalent to register indirect, since adding zero changes 
nothing.  You could put a negative number there to go back, just as you 
can put a positive number to go forward.  Either way, the number is 
stored internally as a signed 16-bit value, which gives this range: 
-32768 to 32767; so if you need to reach something more than 32767 bytes 
away then you're in trouble:)
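Here's a Python sketch of how a signed 16-bit displacement behaves.  It shows how it's stored as a word, how a word with the top bit set reads back as a negative number, and how the effective address is formed.  The function names are made up for illustration:

```python
def encode_disp(disp):
    """Store a displacement the way the 68K does: as a 16-bit word."""
    if not -32768 <= disp <= 32767:
        raise ValueError(f"{disp} doesn't fit in a 16-bit displacement")
    return disp & 0xFFFF

def decode_disp(word):
    """Sign-extend a 16-bit word back to a (possibly negative) number."""
    return word - 0x10000 if word & 0x8000 else word

def ea_with_displacement(reg_value, disp):
    """Effective address of disp(aN): register contents plus displacement."""
    return (reg_value + decode_disp(encode_disp(disp))) & 0xFFFFFFFF

print(hex(encode_disp(-4)))                   # -> 0xfffc
print(hex(ea_with_displacement(0x1000, -4)))  # -> 0xffc
```

A displacement of -4 is stored as $FFFC, and adding it to $1000 lands 4 bytes back, at $0FFC.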

In the original instruction I listed:
                lea     space(pc),a0
I used a label, "space", instead of a number, what's going on?  The same 
thing that I just said.  In this particular case though, Fantasm is doing 
the work of figuring out what the number is and not us mere humans.  It's 
getting the distance from the label, "space", to what the pc would 
contain if your program were running; and when you assemble your program, 
it inserts this number instead of the label "space".  Be thankful Fantasm 
does this chore for you!  You can rest assured that the right 
number will be inserted, so it really doesn't matter how Fantasm does 
what it does.

So, now that you know what all that new stuff means, let's figure out 
exactly what's going on.

 - first, figure out the effective address.  It's what's in the pc 
(roughly, the address of the next instruction) plus the offset from there 
to "space", which comes out to be the address of "space".  For now, just 
trust that this is what's going on.
 - second, see what the instruction is going to do with that effective 
address.  This instruction, 'lea', is simply going to Load the Effective 
Address into a register, in this case A0.
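The two steps above can be modeled in C. One detail matters for getting the numbers right. For d16(pc), the 68k uses the address of the extension (displacement) word as the pc value, and that word sits 2 bytes past the start of the instruction. So the assembler stores `label - (instruction + 2)`, and at run time the CPU adds it back. The helper names and addresses below are hypothetical.

```c
#include <stdint.h>

/* Assembly time (hypothetical helper): the displacement stored in the
   instruction is the distance from the pc value the CPU will use to the
   label. For d16(pc), that pc value is the address of the extension word,
   2 bytes past the start of the instruction. */
int32_t assemble_pc_disp(uint32_t instr_addr, uint32_t label_addr)
{
    return (int32_t)((int64_t)label_addr - ((int64_t)instr_addr + 2));
}

/* Run time (hypothetical helper): effective address = pc + displacement. */
uint32_t run_pc_relative(uint32_t instr_addr, int32_t disp)
{
    return instr_addr + 2u + (uint32_t)disp;
}
```

Suppose the 'lea' is assembled at offset 0 of the program and "space" is at offset 20. Then the stored displacement is 18. If the program is loaded at 10000, the CPU computes 10002 + 18 = 10020. If it is loaded at 30000, the CPU computes 30020. The same displacement works wherever the program lands.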

So, A0 contains the address of "space".  Another instruction in the 
program:
                move.b  space(pc),d1
has the same effective address, space(pc), but does something different 
with it: it moves data from that address.  So the low byte of D1 will 
contain the first byte in "space" (a '.b' move leaves the upper three 
bytes of D1 alone); D1 won't contain the address of "space".  
Basically, once 'lea' gets the effective address, it just *stores* it; 
whereas 'move' will *use* the address for something.
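To see the two instructions side by side, here is a toy C model. The byte array standing in for memory and both function names are hypothetical. Only the behavior described above is modeled, including the fact that a '.b' move replaces just the low byte of a data register.

```c
#include <stdint.h>

/* lea space(pc),a0: the register receives the address itself. */
uint32_t model_lea(uint32_t ea)
{
    return ea;
}

/* move.b space(pc),d1: the byte stored at that address is read and
   replaces only the low 8 bits of D1; the upper 24 bits are kept. */
uint32_t model_move_b(uint32_t d1, const uint8_t *mem, uint32_t ea)
{
    return (d1 & 0xFFFFFF00u) | mem[ea];
}
```

Suppose the byte at address 8 is $2A and D1 starts as $12345678. Then the 'lea' model yields 8 (the address), while the 'move.b' model yields $1234562A (the data, merged into D1).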

Anyway, step through that program; you might want to run it a couple of 
times until you understand what's going on.  Don't forget to pay 
attention to the register contents as you go along.  BTW, when you type 
"il" in MacsBug, you won't see "pc" listed in the disassembly; you'll 
see an asterisk instead.  So this,
                move.b  space(pc),d1
would be this in MacsBug:
                move.b  *+8,d1
The eight would be whatever Fantasm puts in, i.e. it probably won't be 
eight as I listed.

Last time I talked about the volatility of registers and how to save their
values to memory.  I also talked about the pc, program counter, and the
'lea' instruction, which together you can use to save registers to memory.

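                debug
Main:           lea     space(pc),a0    ;A0 = address of "space"
                move.w  (a0),d0         ;D0.w = first word of "space"
                move.w  #3,d1           ;D1.w = 3
                move.w  d1,(a0)         ;store 3 in first word of "space"
                move.w  (a0),d0         ;read it back: D0.w = 3
                rts                     ;end program
space:          ds.w    4               ;4 words of variable space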

I talked about the 'lea' instruction I listed above in the last lesson,
but I feel it deserves a little more discussion.  From the last lesson
you know that the source addressing mode is displacement register
indirect (with the pc as the register, Motorola calls it program counter
indirect with displacement).  The normal way to use it is to put a
number before the parentheses, not a label like I did.  So if you had
this:
                lea     4(pc),a0
you would be taking what's in the pc and adding 4 to it.

Having the label there instead is a special syntax in Fantasm.  It means
different things depending on exactly what the label is, but for right
now, and in the program above, it means the address of the label.
Ultimately, the label in the addressing mode gets replaced with the
offset from the pc to the label.  This is why it is still a displacement
register indirect addressing mode: by the time the processor sees the
instruction, a number sits where the label was.

The method used above is the only way you can get the address of a part
of your program, and hence access to your embedded data.  Why?  Because
you don't know where in memory your program is going to be when it is run.
Virtually all modern OSs allow multiple programs to run at once, and to be
loaded in any order: it doesn't matter whether you launch Netscape or
SimpleText first; they'll both work regardless of which goes first.  So it's
entirely possible for SimpleText to load at memory address 10000 one time,
and at 30000 the next.

The one thing that you do know is the relative distance of the data you
want.  If you imagine your program starting at memory address 0, then your
data could be at 20.  It's important to note that the data isn't really at
20, but it's at 20+start-of-your-program.  So if your program started at
10000, your data would be at 20+10000 which is 10020.  If your program
started at 30000, then the data would be at 30020.

Since you never know absolutely where your data is, you have to specify
relative addresses.  The normal address registers won't contain anything
useful to you, because any program could have used them before your
program was run.  The only register that is predictable enough for you to
use is the pc: it *always* points into the code that's currently running.  So
you can use this to your advantage to find the addresses of parts of your
program; all you have to do is specify an offset, a distance, from the pc
to wherever you want.  As I stated above, specifying a label is a syntax
to Fantasm meaning the address of the data you want.  So instead of
figuring out the distance from the label to the current instruction, which
would be quite hard, you can just put in the label as the offset to tell
Fantasm to figure out the offset.

Once you do get the address figured out, you have to store it somewhere;
otherwise you'd have to figure it out again if you ever wanted to use that
particular address.  That's why you have to use the 'lea' instruction: all
it does is store addresses, it doesn't even attempt to use the address in
any way.

So, to sum up, if you want to get the address of a part of your
program, and hence your data:
* use the 'lea' instruction
* specify the label of the part of the program you want the address of
* use the pc
* specify an address register
So it would look something like this:
                lea     space(pc),a2

When I first introduced the 'lea' instruction, I did so kind of
haphazardly.  But now you see that it is really an important instruction.
Maybe you thought you could get away with not using the 'lea' instruction.
Maybe you thought you could get away with not using the pc.  Or maybe you
thought you could get away with not using the displacement register
indirect addressing mode.  Well, you can't; live with it.

Previously, I stated that you would almost never do something like this:
                move.l  #2,a0
moving an immediate value into an address register.  Maybe now you see
why I said that.  Since you use address registers to, well, address
memory, and you never know where in memory your program will be;
specifying an immediate value is in essence specifying an absolute
address, in this case memory location 2.  Do you know what's always going
to be there?  Of course not.  That's why you would almost never do
something like this: you might think you were writing to your program, but
you would in fact end up writing to whatever is occupying that space,
which more than likely won't be your program.  Later on, you'll see an
exception to this; but in general, never move an immediate value to an
address register.



Last time I emphasized the importance of the 'lea' instruction in
accessing your variables.

Here's the program I listed in lesson 4:

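                debug
Main:           lea     space(pc),a0    ;A0 = address of "space"
                move.w  (a0),d0         ;D0.w = first word of "space"
                move.w  #3,d1           ;D1.w = 3
                move.w  d1,(a0)         ;store 3 in first word of "space"
                move.w  (a0),d0         ;read it back: D0.w = 3
                rts                     ;end program
space:          ds.w    4               ;4 words of variable space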

There's one thing I want you to notice: *where* I put the space for the
variables, the 'ds' instruction.  What do you think is in "space" when
you're running your program?  Well, most likely it's going to be zeroes,
but it's possible it could be anything.  Even more intriguing, what do you
think would happen if "space" was above the 'rts' instruction?  What if it
looked like this:

                debug
Main:           lea     space(pc),a0
                move.w  (a0),d0
                move.w  #3,d1
                move.w  d1,(a0)
space:          ds.w    4
                move.w  (a0),d0
                rts

If you're feeling lucky, try this program; pay careful attention to
what's going on while in MacsBug.

If you just tried that out and you're reading this sentence *immediately*
after reading the previous sentence, then you're lucky.  Most likely, that
modified program crashed: your program definitely, and your computer
probably.  Why?  Because the processor was executing your data as if it
were instructions.  If you paid attention, you might even have seen the
program change itself: self-modifying code.  All of this from just putting
the variable "space" before the 'rts' instruction, amazing...

This serves to highlight special considerations when using embedded data
for your variables.  Always put the space for your variables where the
processor will never try to execute it.  Right now that means after the
'rts' at the end of your program.  If you don't, then you get serious
errors like the ones you just saw.

* --- Assembly Lesson 5.5 Variables & Interpretation --- *

Now that you pretty much know how to use variables, I thought I'd talk
about some ways to effectively use them in assembly.

Unlike in high-level languages, there are no data types at all in
assembly.  About the closest thing you can come to is the size of data:
byte, word, long.  Even then the sizes aren't terribly restrictive; what's
to keep you from moving only a word into a long?  (You could do the
opposite, move a long into a word, but since the space you're moving into,
a word, is smaller than the value you're moving, a long, the upper half
of the long would be lost; so you generally wouldn't do this.)  If all
high-level languages eventually trickle down into assembly, how do you
represent things like integers, floats, strings, classes, etc.?

The answer is you don't represent, you interpret.  What is it that makes
the number 65, 65; and not the letter "A"?  Internally, the number 65 and
the letter "A" are the exact same thing.  They seem different because it
is known to the program when to interpret the number 65 as a number and
when to interpret it as a letter.  If you put a prompt on the screen
asking for a *letter*, and you type in "A", then the program knows that
the number 65 should be a letter.  If you put a prompt asking for a
*number*, then 65 will be known as a number.  Thereafter, whenever the
program displays what you typed in, it knows to interpret the number it
has stored: the first way it will display 65 as a letter, the second, as a
number.
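The same idea in C: one stored value, 65, displayed two ways, with the interpretation chosen only at the point of display. The helper name `interpret` is made up for this sketch, and it assumes an ASCII character set.

```c
#include <stdio.h>

/* Hypothetical helper: formats one stored number either as a decimal
   number or as an ASCII letter. The bits in 'value' are identical either
   way; only the interpretation changes. Returns a static buffer that the
   next call overwrites. */
const char *interpret(int value, int as_letter)
{
    static char buf[16];
    if (as_letter)
        snprintf(buf, sizeof buf, "%c", value);   /* 65 -> "A" */
    else
        snprintf(buf, sizeof buf, "%d", value);   /* 65 -> "65" */
    return buf;
}
```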

Here's a question, "Exactly *how* does the program know how to interpret
numbers?"  I suppose the program could have a really sophisticated level
of artificial intelligence; but since we as humans make the programs, and
we don't even fully understand how we know stuff, I doubt this.  The best
way is to arbitrarily decide that certain variables are of a certain
type.  So whenever we see a particular variable, regardless of its value,
we interpret it a certain way.  So when you store a number, you would
store it in one variable; and when you store a letter, you would store it
in another variable.  This way, you can easily know how to interpret the
data; even if the number is 65 and the letter is "A".

In Fantasm, you can specify a few interpretations:
* decimal numbers - to specify these just type in the number, to move 65
into register D0, do this:
                move.l  #65,d0
* hexadecimal numbers - for these just prefix the number with a dollar
sign $, to move hexadecimal 41 into D0:
                move.l  #$41,d0
* binary numbers - for these just prefix the number with a percent sign %:
                move.l  #%1000001,d0
* letters - for these just put the letters in double quotation marks "":
                move.l  #"A",d0
In case you haven't figured it out, all four of the above sample
instructions move the exact same thing into D0.  But did I move a letter,
binary, hexadecimal, or a decimal into D0?  Well, it depends on what
you're doing as to what you think you moved.  If I were specifying a file
type to the OS, then I moved a letter; if I were specifying the score for
player 2, then I moved a decimal; if I were specifying an event mask to
the Event Manager, then I moved a binary; if I were specifying the
contents of a file to a programmer, then I moved a hexadecimal.
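To see why those four operands are identical, here is a hypothetical mini-parser in C that mimics the four spellings. It sketches the notation only. It is not Fantasm's actual parser, and it handles just a single quoted character.

```c
#include <stdlib.h>

/* Hypothetical mini-parser mimicking the spellings above:
   "65" decimal, "$41" hexadecimal, "%1000001" binary, "\"A\"" one
   ASCII character. Every form yields the same number, 65. */
unsigned long parse_literal(const char *s)
{
    if (s[0] == '$')                       /* $ prefix: base 16 */
        return strtoul(s + 1, NULL, 16);
    if (s[0] == '%')                       /* % prefix: base 2 */
        return strtoul(s + 1, NULL, 2);
    if (s[0] == '"')                       /* quoted: the character's code */
        return (unsigned char)s[1];
    return strtoul(s, NULL, 10);           /* no prefix: base 10 */
}
```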

It all depends on what you're doing as to how to interpret things.  I know
this might be confusing now, but later on as you progress in assembly
you'll appreciate the freedom to do whatever you want.  After all, if you
can interpret the letter "A" as a number, 65, then you can add one to 65
to get 66 which can then be interpreted as "B"; you can keep on doing this
to display the entire alphabet.  Most likely, you'll specify things as
decimal, since that's what we're used to, but don't be afraid to spread
your wings and do wild things that you never even imagined before; with
no restrictions placed upon you, you can certainly do them.
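A minimal C sketch of that trick follows. It assumes ASCII, where A through Z are the consecutive codes 65 through 90. The function name is hypothetical.

```c
/* Builds "ABC...Z" by treating 'A' as the number 65 and adding 1 for each
   following letter. Returns a static buffer that later calls overwrite. */
const char *alphabet(void)
{
    static char out[27];
    for (int i = 0; i < 26; i++)
        out[i] = (char)('A' + i);   /* 65, 66, ... read back as A, B, ... */
    out[26] = '\0';
    return out;
}
```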

One thing I would like to note is this.  A letter is stored using ASCII
representation, which is just a standard way of interpreting numbers as
letters, i.e. 65 means the capital letter "A".  ASCII characters are a
byte in length.  68K registers are 4 bytes in length.  So you can specify
up to four characters at a time in one register.  So you could do this and
it would be perfectly valid:
                move.l  #"Help",d0
but this wouldn't be valid, since "Help Me!" is eight characters:
                move.l  #"Help Me!",d0
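Here is a C sketch of what `move.l #"Help",d0` puts in the register. It assumes ASCII and packs the first character into the most significant byte. The 68k is big-endian, so that byte also comes first when the long is stored in memory. Both function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* True if the text fits in one 32-bit register (4 ASCII bytes or fewer). */
int fits_in_long(const char *text)
{
    return strlen(text) <= 4;
}

/* Packs up to the first four characters into a 32-bit value,
   first character in the high byte: "Help" -> 0x48656C70. */
uint32_t pack_chars(const char *text)
{
    uint32_t v = 0;
    for (size_t i = 0; i < 4 && text[i] != '\0'; i++)
        v = (v << 8) | (unsigned char)text[i];
    return v;
}
```

Note that a single character packs to its plain code, so "A" gives 65, matching the earlier `move.l #"A",d0` example.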

One other thing before I finally finish up the topic of memory and
addressing.  You can specify things like records, structs, etc. through
the use of Fantasm's 'rs' directives.  Imagine you had a record that
consisted of these items:
cats - a long
dogs - a long
apples - a byte
oranges - a word
How would you go about storing this in memory?

Well, you know that one record takes up 4+4+1+2=11 bytes, so you would
need to allocate at least 11 bytes of memory to store it.  You would get
the address for it through a 'lea' instruction.  Now what?  You have the
address of the start of 11 bytes, split up among four items.

The key thing to understand is that the items take up a specified amount
of space, and that they are located at a relative distance to each other.
Remember the example I stated about your data being at a relative distance
from the start of your program, say 20+start-of-program?  Apply that same
idea here: each item is at a relative distance from the start of the
record.

Since "cats" comes first, it is at offset 0 from the start of the
record; it is the start of the record.  Since "cats" is a long, which is 4
bytes, "dogs" begins at offset 4 from the start of the record; similarly,
"apples" is at offset 8, and "oranges" is at offset 9.  "Oranges" is the
last item in the record; it starts at offset 9 and takes up a word, which
is 2 bytes, so 9+2=11, and that's how you get the end of the record, its
size.  (One caution: on the original 68000, reading or writing a word or
long at an odd address causes an address error, so a word at offset 9 is
a problem there unless you reorder or pad the record.)  So here's how you
could access "apples":

                debug
Main:           lea     record(pc),a0   ;A0 = start of the record
                move.b  8(a0),d0        ;D0.b = "apples", 8 bytes in
                rts
record:         ds.b    11              ;room for one 11-byte record

D0 does hold the value in "apples", even though that wasn't too apparent.
You can make it more noticeable by defining a label to be a value and then
putting that in instead of 8.  You do this by using 'equ' directives.  It
would look like this:
apples:         equ     8
and like this in the program:

apples:         equ     8               ;offset of "apples" in the record
                debug
Main:           lea     record(pc),a0   ;A0 = start of the record
                move.b  apples(a0),d0   ;same as 8(a0): D0.b = "apples"
                rts
record:         ds.b    11

Remember when I said that when you use the displacement register indirect
addressing mode--try saying that 3 times fast--and you specify a label,
that label means different things depending on the label?  Well, this is
one of those different meanings.  Since "apples" was explicitly given a
value, of 8, that's what it is.  A label like "record" doesn't have an
explicit value, and so it evaluates to the address of "record".  In any
case, when you put in "apples" in the above program, it evaluates to 8;
and the whole addressing mode ends up giving you the value in "apples".
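For comparison, here is the same access modeled in C. The enum constants play the role of the 'equ' labels, and the 11-byte record is a plain byte array. All names are hypothetical, and the word read assumes the 68k's big-endian byte order.

```c
#include <stdint.h>

/* Offsets worked out above for the 11-byte record (like 'equ' labels). */
enum { CATS = 0, DOGS = 4, APPLES = 8, ORANGES = 9, RECORD_SIZE = 11 };

/* Like move.b apples(a0),d0: the byte at base + offset. */
uint8_t read_byte(const uint8_t *base, int offset)
{
    return base[offset];
}

/* A big-endian word at base + offset (high byte first). Reading it byte
   by byte in C avoids any alignment question on the host machine. */
uint16_t read_word(const uint8_t *base, int offset)
{
    return (uint16_t)((base[offset] << 8) | base[offset + 1]);
}
```

With the record bytes 0,0,0,1, 0,0,0,2, 7, $01,$2C, "apples" reads as 7 and "oranges" reads as $012C, which is 300.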

You could just as well define each of the record items, cats, dogs,
apples, and oranges, with 'equ' directives; but what happens if you later
decide to change the order of the record or the size of one of the items?
The answer: you're screwed, because you'd have to work out every offset
again by hand!  Instead you can use 'rs' directives.

'rs' directives are similar to 'equ' in that the label on the directive
is given a value; the difference is that you don't give it explicitly.
When your program is being assembled, Fantasm goes through two scans: the
first one gathers data about your program, and the second actually
assembles it.  At the start of the first scan there's a counter that is
set to zero.  Each time Fantasm comes to an 'rs' directive, it assigns
the current counter value to the label and then advances the counter by
the size of the 'rs' directive times its count.  The net effect is that your
labels get properly assigned values, even though you haven't explicitly
assigned them.  If you change the order of the 'rs' directives or the size
of any one of them, only a reassemble is necessary, since the scan will
correctly reassign the label values.  So here's what our record would look
like using 'rs' directives:

cats:           rs.l    1
dogs:           rs.l    1
apples:         rs.b    1
oranges:        rs.w    1
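Here is a small C model of the counter just described, showing how the labels get their values. It sketches the behavior as the text describes it, not Fantasm's actual implementation. Like the text, it assumes no padding is inserted between items.

```c
/* Models the assembler's rs counter. */
static int rs_counter = 0;

/* rsreset: start a new record definition at offset 0. */
void rsreset(void)
{
    rs_counter = 0;
}

/* rs.<size> count: the label gets the current counter value, then the
   counter moves past this item (size is 1, 2 or 4 bytes). */
int rs(int size, int count)
{
    int offset = rs_counter;
    rs_counter += size * count;
    return offset;
}
```

Calling `rsreset()`, then `rs(4,1)`, `rs(4,1)`, `rs(1,1)` and `rs(2,1)` yields 0, 4, 8 and 9 and leaves the counter at 11. These are the same values worked out by hand for cats, dogs, apples and oranges.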

The 'rs' directive is sized, meaning you can specify a long, word, or a
byte.  And you can also specify how many of each size you want; I only
specified 1 of each item.  'rs' directives are also similar to 'ds'
directives; the difference is that 'rs' directives don't actually allocate
any space like 'ds' directives do.  One thing to note is that every item
of the record has to be listed; I can't just list "apples" like I did when
I used an 'equ'.  (Well, you could list just "apples" if you really wanted
to.  Do you see how?  Like this, 'rs.b 8'.  Put that directive above "apples",
with no label if you like, and "apples" would get the right value;
although by doing this you kind of defeat the purpose of using 'rs' 
directives.)  If you want multiple record definitions, then you can use
the 'rsreset' directive to reset the counter before you start specifying
the record.  So here's a real good way to get the juice out of "apples":

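                rsreset                 ;start the counter at 0
cats:           rs.l    1               ;cats    = 0
dogs:           rs.l    1               ;dogs    = 4
apples:         rs.b    1               ;apples  = 8
oranges:        rs.w    1               ;oranges = 9 (record size 11)

                debug
Main:           lea     record(pc),a0   ;A0 = start of the record
                move.b  apples(a0),d0   ;D0.b = "apples"
                rts
record:         ds.b    11              ;room for one record
                global  Main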

About the only downside to 'rs' directives is that the label gives you an
offset but not a size, so you can't leave the size out of the instructions
that use it.  If you wanted to make "apples" a word, then you would 
naturally have to change the size in the 'rs' definition; but you would
also have to change the size anywhere you use "apples", like in the 'move'
instruction.

* --- MacsBug Stuff --- *

If you want to examine memory, then you can use this command:

DM) this displays the memory at the address you specify

If you were in MacsBug and you wanted to see what was in "record", then
*after* the 'lea' instruction, type "dm a0".  The address you specify
doesn't have to be a literal number; it can be a register, in which case
MacsBug uses the register's contents as the address.  You could even
display memory at an address held in a data register!

If you break into MacsBug inside of a program that has windows on the
screen, type this in "dm windowlist^ window".  What came up?  Neat, huh?
Look at the help sections on memory, expressions, and templates to figure
out how I did this.


..._Tim_...
--=[Until you believe something, there is nothing to be proven]=--
http://www.winthrop.edu/~humphret
zzhumphreyt@winthrop.edu