These articles are reproduced here courtesy of Tim Humphrey and originated on the Fantasm mailing list.
1. Intro - Source fields
1a. Installing and using MacsBug
2. Registers and basic instructions
3. Addressing modes and memory
4. More on addressing modes (NEW)
5. Variables & Interpretation (NEW)
There are two things to learn here: assembly and MacOS. Since you need to know how to write in a language to learn the OS, it's probably best to learn assembly first. A few questions first, though. Are you totally new to assembly, i.e. do you know about the 4 fields of an instruction? Do you know about stacks? I'll probably start from the very beginning--since this list is now archived, maybe somebody will read it later and learn assembly.

One thing I found helpful when first starting out was learning how to enter and exit my program--for a while, *all* my programs ended in system errors. Basically, the first line of code executed is the start of your program. It doesn't need to be labeled "Main" or anything like that. If you're using Fantasm's Build mode, which you should, then the first file assembled contains the start of your program. Fantasm, not assembly, requires that you have at least one globally defined label. So, even though I said that you don't have to call the first line "Main", you might want to do so in order to please Fantasm.

Your program can end in various ways--a system error being the most undesirable. Usually it ends with an 'rts' instruction. When the processor sees this instruction it quits your program. (It actually does more than this, but for right now this is what it does.) Your program may end in a system error if you just let it run off the end of the last line, so always end it with an 'rts' instruction. So the simplest program you could make would look something like this:

Main:   rts             ;end program
        global  Main

This program lives only to die, i.e. that's all it does.

If you're totally new to assembly I'll go into the structure of an instruction. Each assembly instruction is always only one line long. This line is always divided into 4 fields which are separated by white space (spaces, tabs). I'll discuss the fields below.

1st Field) This is the label for the instruction. It is commonly used as a reference for other instructions.
The colon that ends the label isn't required to be there, but you'll find it useful to always suffix your labels with it. The label is an optional field, although this doesn't mean you should ignore it. Don't forget to put a tab or a space in this field if you don't intend to use it--notice that in the second line of the program above I fill in the label field with white space instead of typing a label.

2nd Field) The instruction field. This field is, of course, required--if it weren't there you wouldn't have an instruction! Instructions in this field can be the instructions actually used by the processor, 'rts'; used by the assembler, 'global'; or custom-made instructions, commonly known as macros. And no, there wasn't a macro listed in the program. The instructions used by the processor are what you'll probably use the most and they comprise the bulk of your program. Instructions for the assembler are known as directives. The assembler knows the instruction is its own by the pure fact that it isn't a processor instruction, not too complex really. I'll discuss macros some other time; since they're custom instructions you can do without them. Note, you might also see this referred to as the opcode field: it's an OPeration CODE.

3rd Field) Operand Field. Many instructions need data to work on, and this is where you put that data. Think of it like this: when you add numbers you have to have the instruction, the plus sign +, and you have the data for the instruction, the numbers. So this:

        2+2

would be something like this in assembly:

        add     2,2

(This is only a sketch of the idea; a real 68K 'add' needs its operands written in the forms covered in later lessons.) If you use the HP brand of graphing calculators then you already know reverse-polish notation, where you enter the data and then the instruction. Assembly goes the other way around: issue the instruction and then the data. It might be a pain now, but you'll come to appreciate it--it actually makes more sense doing it this way the more you think about it. Notice that each individual data element is separated by a comma and not a space; remember that spaces separate instruction fields and not data elements.
If an instruction doesn't need any data, like the 'rts' instruction, then you can of course leave it blank.

4th Field) Comments. This field is the comment field. I'm not exactly sure if Fantasm automatically recognizes this as a comment field, but it's customary to begin this field with a comment character. Comments in assembly begin with either a semicolon (;), or an asterisk (*). Anything after this character is ignored. Comments do not bleed over to the next line, so when you press return the comment is done. (Actually, when you press return the entire instruction is done, regardless of what fields you have left to fill in.) You can make the entire line a comment by simply making the first character a comment character. (As a personal preference, when I make the entire line a comment I use the asterisk, and I use the semicolon when I want to add a comment after an instruction--I don't have to hold down the shift key so it's a little faster:)

Because assembly instructions are rather cryptic at times, you might want to comment each line of your code. Compared with other languages, you'd be surprised at how soon you forget what a piece of code does. This is just my own personal preference however, and you'll probably develop your own commenting style. If you do comment, I recommend you make it something informative: don't say "adding 2 and 2", say "adding 2 to player 2's score".
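To pull the four fields together, here is a small sketch of how they might line up in a source file. It only reuses things from these articles ('rts', 'global', and the 'move.l #2,d0' instruction that is explained in the registers lesson), and the column spacing is just my own choice, not something Fantasm demands.

```asm
* A whole-line comment: the first character is an asterisk.
Main:   move.l  #2,d0       ;label | instruction | operands | comment
        rts                 ;no label and no operands, just a comment
        global  Main        ;a directive for the assembler, not the processor
```

Reading across the second line: "Main:" is the label field, "move.l" is the instruction field, "#2,d0" is the operand field with its two data elements separated by a comma, and everything after the semicolon is the comment field.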
Moving right along... Last time I discussed the format of a typical assembly instruction. Each one is composed of 4 fields: label, instruction, data, comment. I also talked about how to enter and exit the program: the first line enters, and an 'rts' instruction exits.

Now it's time to write a program that does something. As I stated before, you have to learn assembly and MacOS, so you won't be able to see any output on the screen for a while. So in order to follow your programs you should get a debugger. Stu suggested getting MacsBug, and I do too. You can get it at <ftp://ftp.apple.com/devworld/Tool_Chest/Testing_-_Debugging/Debuggers_-_dcmds/>. You can get version 6.5.4 there too if you want it.

Once you get MacsBug you'll probably wonder what to do with it since it isn't an extension, control panel, or application; in fact, if you double-click it the Finder treats it as a document. What you do is put it in the top level of your System Folder, i.e. don't put it in any of the folders that are inside the System Folder. Once it's there, reboot for it to take effect--if you watch the startup sequence you'll see a message saying that a debugger was installed.

Now, how do you use it? If you press the command-power combination--power is that triangle key in the upper-right of the keyboard--then you'll be dumped into MacsBug. What you should see is a white screen divided into sections.
I'll briefly touch on them; after a while you'll kinda learn what each of them is for:

- the left side of the screen lists registers, condition codes, the current application name, and previous stack levels
- the big section of the screen is where all output goes
- below the output section are three lines that show the next instructions waiting to be executed by the processor
- finally, below the 3 instructions is a command line for you to type in MacsBug commands--betcha never thought the Mac had a command line, did ya':)

To begin learning MacsBug type in "help" and some text will appear detailing the major sections that the help is divided into. You can type in "help" and one of the topics to get help on that topic. For instance, to get help on editing in MacsBug you would type "help editing". You can play around with the help to learn everything about MacsBug, but for now I'll tell you some basic commands you'll want to know about:

G) "G" stands for go; it exits MacsBug and returns you to whatever you were doing before you entered. While you're in MacsBug processing is suspended and nothing gets done on your computer until you leave--CD players will still play because they don't need processing to play.

ES) This force quits the current application, equivalent to command-option-esc. It also exits MacsBug.

RS) This unmounts all disks and restarts the computer. There is another command that restarts without unmounting, but you don't want to use that unless you're feeling adventurous.

If it happens that for some reason you can't leave MacsBug by typing "g" or "es" then you'll have to resort to "rs". If even that won't work then you'll have to use command-control-power. (Command-control-power is equivalent to that other MacsBug restart command, so if you must force a restart try "rs" before doing command-control-power.)

The method of entering MacsBug described above, command-power, is only one way of entering MacsBug.
Another way is to put a 'debug' command in your program. When the processor sees this it dumps you into a debugger, if present. Note that if you don't have MacsBug installed, or some other debugger, then 'debug' will trigger a system error. When you enter MacsBug using this method the processor will be about to execute the command immediately following the 'debug' command. So in this program:

        debug
Main:   rts
        global  Main

the section in MacsBug that shows the three commands will have 'rts' at the top, meaning it's going to be executed next. (Notice the label isn't at the very start of the program; you just need one in there, but it doesn't matter where.) You could type in "g" to continue the program, which would promptly quit. However, you can use MacsBug to step through the program, instruction by instruction. To do this you can use these two commands:

S) "S" steps into an instruction. I use the word "into" for a special reason: there's another command that steps "over" instructions. Essentially, when you step into an instruction you directly execute it; you don't try to do anything special with it. If you were to step into a MacOS trap, for instance, then you would begin to see the actual commands executed by the trap, and not merely execute the trap. Read on in step over to learn more...

SO) "SO" steps over an instruction. It does the same thing as "s" except that it treats whatever it executes as a single instruction. If you were to step over a MacOS trap then the trap would be considered one instruction, and not a gateway to other instructions--the trap would execute and you wouldn't see what's going on. When MacsBug does this, stepping over a trap, it will flip the screen to the main screen and flip back to MacsBug when it's done. This logically brings me to another MacsBug command...

ESC or ~) Pressing either "esc" or the tilde key will flip the screen from MacsBug to the true screen. This is the screen that you saw before you went into MacsBug, e.g. the desktop.
You can also indicate a number of instructions to step. So you could step into, or over, 3 instructions by typing "s 3", or similarly "so 3". Try this to see the difference between stepping into and stepping over. If you did try this you might be surprised that when the 'rts' instruction was executed your program didn't quit. When I said that 'rts' does more than just quit your program this is some of what I meant. Actually, 'rts' returns program execution to the routine that called the routine containing the 'rts' instruction--sound confusing?! Your program is essentially just a routine: program execution was going on before your program began, and it'll continue after it's done. 'rts' means "ReTurn from Subroutine", so executing 'rts' is returning program execution to the calling routine; probably the Process Manager. So if you do step past 'rts' the instructions you see won't belong to your program. Now that you know how to look at what your program is doing, I'll discuss the basic data storage unit in assembly: the register...
Last time I talked about viewing your programs via a debugger, MacsBug. (BTW, if you're tired of MacsBug's colors then you can download a program that will let you change them. You can get it at <ftp://mirrors.aol.com/pub/info-mac/dev/color-macsbug.hqx>. When you launch it, you'll get a dialog to search for MacsBug; find and select it and you're on your way.) Now that you can use MacsBug, in color, you can observe the fundamental assembly tool in action: registers.

Registers really are your tools; there really isn't any other way I can say it. Registers will be your variables, they will be your constants, they will hold your pointers and handles, they pretty much are your program. Unless you've programmed in assembly before there isn't anything you can relate registers to in other languages. The best analogy I can really think of for registers is that they are your tools. As such, there are very few limitations on their usage--heck, you use them to do other things so there shouldn't be any restrictions on them. They are totally unique to assembly, and totally essential.

Hopefully you've gotten the concept that you'll be using registers for everything that you do. But you may be wondering, "What is a register?" A register is a holding place in the processor for data. When you want to add two numbers, the numbers will be in registers, added, and then stored back in registers. (As with all new things there is more than meets the eye, so what I just said isn't totally true; but for now just believe it.) Essentially, the processor uses registers to carry out your instructions. Exactly what a register is is unimportant. What's important is knowing that you have to use them in order to program in assembly, i.e. to program the processor.

So, how do you use them? The answer isn't as simple as the question. Assembly language uses the instruction set of the processor for its commands, so the assembly language for one processor, 68K, won't be the same as for another, PowerPC.
If the processors are different, then don't you think the registers too will be different? They are. So if I were to tell you how to access registers for one processor it wouldn't be quite the same method for another processor. Since every Mac user has access to the 68K instruction set, I'll teach in that--plus the fact that's all I know. (It actually isn't so different accessing registers for other processors, but since it is different and you're just starting out, it might hinder the learning process to learn too much seemingly conflicting stuff too soon.)

Having said that, here's how you would move the value 2 into the first 68K data register:

        move.l  #2,d0

After all the scare I used you'd have thought it would be tougher than that, wouldn't you:) Even so, there are a couple of things to look at here, so let's dissect that instruction.

First, look at the instruction, 'move'. This instruction does just that: it moves data from one place to another. When you explicitly want to put a value into a register you'll use this instruction. As a side note in case you ever get confused, you don't technically move data, you copy it. The place that you're moving from doesn't lose its value as you might think. It just sounds better to say you're moving rather than copying:)

After the 'move' instruction, there's this ".l". That suffix tells the processor how much data to use, in this case, how much to move. Here's a program to use so you can see what's going on in MacsBug:

        debug
Main:   move.l  #2,d0
        rts
        global  Main

When you enter MacsBug, you'll be ready to execute the 'move' instruction. Before you do, notice something. In the bottom-left column of the screen there's a bunch of numbers that have labels preceding them. The numbers preceded by "D0, D1, D2, ... D7" are data registers, and the numbers are what is in the respective register.
Although your actual numbers will look different, here's an example:

        D0 6743FF23

This means that the number 6743FF23 is the current value of the D0 register. (Yes, 6743FF23 really is a number; it's written in hexadecimal:) Now, step into the next instruction, by typing "s", and notice what happens to the D0 register. The entire thing becomes 2:

        D0 00000002

Isn't the value 2 small enough so that only the last digit needed to be changed? Yes it is, but you told the processor to use all of the register to hold the value 2, so it did. You told it this via the ".l" suffix to move.

What does the ".l" suffix mean anyway? In 68K it means long, which really means 32 bits. That's right, the 68K is 32-bit, and the MacOS has been since way back when--kinda makes you laugh when you hear all the hype about 32-bit Win95 programs. Whenever you hear that some processor is 32-bit, this is what it means: its registers hold 32 bits worth of data--this comes out to 4,294,967,296 different values, a bit over 4 billion. Other suffixes you can use are:

.w - means word, which is 16 bits--half a long
.b - means byte, which is 8 bits--half a word, and a fourth of a long

So you couldn't tell the processor to just change the last digit even if you wanted to; the best you could do would be a byte, which is the last two hex digits. (Again, there's more than meets the eye here... if you *really* wanted to just change the last digit you could.)

What happens when you only use half of a register, *which* half gets used, or what about a fourth of a register? The answer is always the low-order half, or fourth, of the register; essentially the rightmost part as MacsBug displays it. The processor starts counting the bits of a register from the right, and goes to the left. So, if you did this:

        move.w  #2,d0

this,

        D0 6743FF23

becomes this,

        D0 67430002

Similarly, moving a byte would result in this,

        D0 6743FF02

I'm sure by now you know what the operand part of the 'move' instruction I listed does. It moves a source value, "#2", into a destination, "D0". You could change the "D0" into "D1", "D2", ... "D7", and you would move the value 2 into the other data registers. Try it to get used to using more than one register; you won't hurt anything--at least nothing rebooting won't solve:) There really isn't any difference between each of the 8 data registers, so use and abuse them however you see fit.
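If you'd like to watch all three sizes act on the same register, a little test program along these lines might help. It is only a sketch: the "$" prefix for writing a number in hexadecimal is the usual Motorola convention and I'm assuming Fantasm accepts it, and the starting value is just the example number from above so that the comments match what was described.

```asm
        debug
Main:   move.l  #$6743FF23,d0   ;fill all 32 bits:    D0 = 6743FF23
        move.b  #2,d0           ;low byte only:       D0 = 6743FF02
        move.w  #2,d0           ;low word only:       D0 = 67430002
        move.l  #2,d0           ;the whole register:  D0 = 00000002
        rts
        global  Main
```

Step through it with "s" and compare D0 on the left of the MacsBug screen with the comment on each line; the upper part of the register only changes when the size suffix reaches that far.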
Last time, which really was a long time ago, I talked about the importance of registers in assembly programs. I also talked about how to move data into the registers, data of varying size. To start off this lesson I want to show you something, and to do that I'll need to introduce a new MacsBug command:

IL) Typing "il" just by itself produces half a page of disassembled instructions starting from the next instruction. When you assemble, an assembler such as Fantasm converts your instructions to the actual numbers the processor needs in order to execute. Disassembly is exactly the opposite, taking the numbers and producing the assembly instructions you write--the disassembly won't look exactly like your original source of course, but it'll be much more readable than the hex equivalent. To get more disassembled lines, just repeatedly press return. You can also put an address after "il" to disassemble from that address.

If you haven't already done so, type "il" and look at what happens. You'll get half a page of disassembled instructions, like I said, but take a look at each line. What you'll find is that every line uses a register in some form or another; and the few that don't appear to use them, do. What's the whole point? The point is that every program on your computer uses the same finite number of registers that your program uses. If you broke into MacsBug by typing command-power, then you'll especially notice this. The registers that you see on the left-hand side of MacsBug are, essentially, all the registers there are.

So how in the world can a practically unlimited number of programs use the same finite number of registers and still work? The answer is that each program uses the registers for whatever it has to do, and if it needs to save some data, it saves it in memory.
You might think of an analogy for registers somewhat like this--yes, I've come up with a new analogy: registers are like the RAM on your computer system. You have a finite amount of RAM and any number of programs that can use it, but not all at the same time. Suppose that you only have 8 megs of RAM, and your System is using up 3 megs; you have 5 left. You decide to do some web page editing, and open up a new program; let's say it uses up 2 megs, leaving only 3 free. Now you want to preview it by opening up Netscape, but Netscape requires at least 4 megs of RAM and you only have 3. What'll you do? You could buy some more RAM, but a more sane solution is to quit your web editing program to free up more memory; then you'll have enough to launch Netscape.

This analogy of quitting programs to free up RAM is similar to what is done with registers, although with a little caveat. You have to quit a program to gain use of its memory, but you can just take over a register and start using it--rude, perhaps, but quite effective. So if you wanted to use register D0 for something, just use it. But here's a problem: if you just use D0 without considering who was using it before, don't you think some other program will do the same thing to you? What happens if D0 has something important in it that you want to keep for later? The solution: save the contents of the register to memory.

In 68K, you do this with address registers. In MacsBug, these are the registers labeled "A0, A1, ..., A7". Address registers are registers, just like the data registers, so you can move data into them, and you could write an instruction like this:

        move.l  #0,a0

For various reasons though, you'll almost never issue the instruction I just wrote. I'll explain why in the next lesson. Address registers are your link to memory, RAM.
To store something in memory you would issue an instruction like this:

        move.l  #2,(a0)

The difference between this instruction and the one I listed above is the parentheses. The parentheses indicate to the processor that the number 2 should be moved to what A0 points to. If A0 contained zero, from the first instruction, then the number 2 would be stored in memory location 0, from the second instruction. Do you see what I'm talking about?

Here's something that might help you out even more. Whenever you deal with memory, or registers, the processor computes something called the "effective address" (EA) to figure out exactly where you want to manipulate stuff. The effective address doesn't technically have to be an "address"; it just means where you want to manipulate stuff. So, take the instruction I posted in a previous lesson:

        move.l  #2,d0

The source effective address is the number 2. This isn't an address, of course, but it is an effective address: it tells the processor where the source data is, the number 2. This kind of effective address is known as "immediate". It's called this because the data is immediately available; the processor doesn't have to go out to memory or look in a register, the value is just given to it. You indicate the immediate form by preceding a number with a pound sign #.

The destination effective address is the register D0. Again, not an address, but an effective address; it merely tells the processor where to do stuff. In this case, it does stuff in/to the register D0. And in this particular instruction, it moves stuff to that register. This kind of effective address is called "register direct". You'll understand the name when I tell you about another effective address.

"Register indirect" is something that applies only to address registers. What this EA means is that the place to do stuff is contained in the register.
So, if A0 contained 0, the place to do stuff would be address location 0; in register indirect, effective address truly does mean an actual address. Using the instruction listed above:

        move.l  #2,(a0)

The place to get stuff is immediately told to the processor as the number 2, and the place to put that stuff is the address pointed to by A0, i.e. the value that's in A0. Get it? In this instruction:

        move.l  #2,a0

The place to get stuff is immediately told to the processor as the number 2, but the place to put that stuff is the register A0, not what A0 points to. Get it?

So to sum up, you've learned 3 different addressing modes, effective addresses:

- immediate, indicated by a # sign and then a number, e.g. #2, and means an actual number
- register direct, indicated by a register, e.g. D0, and means the actual register
- register indirect, indicated by a register surrounded by parentheses, e.g. (a0), and means the address pointed to, contained in, the register; naturally, this mode only applies to address registers.

You'll learn more addressing modes later on, but these three should keep you happy for a while.

Before you start storing values to memory though, you should have a place to store them to. If you don't then you could store a value in someone else's program, possibly corrupting it. There are various ways to get memory for your program: use the Memory Manager to get it for you, use the stack, or just create space for it in your program. Since you have to use the OS for the first method, and we haven't discussed that, the first option isn't viable right now. Unless you know about the stack, and unless you only need temporary memory, the second option isn't advisable either. So that leaves the last option: embedding the space in your program. You do this by using a Fantasm directive: 'ds', define space. You can specify how much memory you want, and how big each individual memory "module" should be.
To allocate 16 bytes of storage, you could do this:

        ds.b    16

you could also do this to get the same amount of memory,

        ds.w    8

or you could do this, which does the same thing,

        ds.l    4

You'll know which method to use when you start writing your program, but since I explicitly said allocate 16 *bytes*, it would make more sense to use the first instruction,

        ds.b    16

even though the others are just as valid.

Now that you have the memory how do you go about accessing it, where is it? Well, the simple answer is that it's where you put it! In this little program, the 16 bytes are after the 'rts' instruction:

        debug
Main:   move.l  #0,a0           ;A0 contains 0
        move.l  #3,d0           ;D0 contains 3
        lea     space(pc),a0    ;A0 contains address of space
        move.b  space(pc),d1    ;low byte of D1 contains first byte of space
        move.b  d0,(a0)         ;first byte of space has 3
        move.b  space(pc),d1    ;low byte of D1 contains 3
        move.b  (a0),d1         ;same again, this time through A0
        rts
space:  ds.b    16
        global  Main

Whoa! Introduced a few new things there, didn't I:) The first new thing is the instruction 'lea'. This means 'Load Effective Address', and does just that: it loads the effective address of something into a register, an address register. It's too bad the effective address contains so many unknown things, or else you could figure out what the effective address is:)

The "pc" is a special register, much like an address register, which contains the address of the next instruction to be executed; it means "Program Counter". You can't really directly manipulate it, so an instruction like this would be illegal:

        move.l  #4,pc

However, you can directly read it, so this instruction would be legal:

        move.l  (pc),a0

There is no "pc" direct addressing mode, so this too would be illegal:

        move.l  pc,a0

If you think about it, you really don't need the last instruction: since you can't directly move data to the pc, why have an instruction where you could directly move data from it? In short, the only access you have to the program counter is the second instruction, register indirect.
So, this is what happens in the second instruction:

        move.l  (pc),a0

You move what is pointed to by the pc into A0. What is pointed to, contained in, the pc is the address of the next instruction; so you would move the next instruction into A0 with this instruction! Still don't get it? Look over this little example:

        00000000   move.l  (pc),a0     ;A0 contains the number for 'rts'
        00000004   rts

At the first instruction, the pc contains the value 00000004, the address of the next instruction. According to what I said above about register indirect, the effective address is the address pointed to by the register. Well, the pc is pointing to 00000004, because it contains 00000004. So, the 'move' instruction is going to move what's in 00000004 into A0. What's in 00000004 is literally the next instruction, 'rts'. So, A0 would have the number that represents 'rts'--whatever that happens to be. (A technical footnote: when the 68K works out a pc-based address it actually uses the address of the instruction's second word, i.e. the start of the instruction plus 2, so the long that gets fetched begins two bytes earlier than this simplified picture suggests. The idea is the same, and the assembler takes care of the difference for you.)

If you get what I just said, you might think that that's a little stupid: you're always going to move the next instruction into something--you want to execute instructions, not get their values! The question is asked, "Is it possible to reach beyond what is pointed to in a register?" The answer is yes, and it's called base displacement--well, it's called something, but I doubt it's what I just said:) (Motorola's manuals call it register indirect with displacement.) Anyway, the syntax looks like this: 0(a0). Simply add a number before the parentheses for register indirect and you get this new addressing mode. The effective address in this case is what is contained in the register, plus the number. So, the particular example I just showed is equivalent to register indirect, since zero plus anything is the same thing. You could put a negative number there to go back, just as you can put a positive number to go forward.
Either way, the number is stored internally as 16 bits, which is equivalent to this range, -32768 to 32767; so if you have something that's beyond 32767 bytes then you're in trouble:)

In the original instruction I listed:

        lea     space(pc),a0

I used a label, "space", instead of a number. What's going on? The same thing that I just said. In this particular case though, Fantasm is doing the work of figuring out what the number is and not us mere humans. It's getting the distance from the label, "space", to what the pc would contain if your program were running; and when you assemble your program, it inserts this number instead of the label "space". Be thankful Fantasm does this chore for you! You can confidently rest assured that the right number will be inserted, so it really doesn't matter how Fantasm does what it does.

So, now that you know what all that new stuff means, let's figure out exactly what's going on.

- first, figure out the effective address. It's what's in the pc, the address of the next instruction, plus the offset from the next instruction to "space", which comes out to be the address of "space". For now, just trust that this is what's going on.
- second, see what the instruction is going to do with that effective address. This instruction, 'lea', is simply going to Load the Effective Address into a register, in this case A0. So, A0 contains the address of "space".

Another instruction in the program:

        move.b  space(pc),d1

has the same effective address, space(pc), but does something different with it. It moves stuff from that address. So, D1 will contain the first byte in "space"; it won't contain the address of "space". Basically, once 'lea' gets the effective address, it just *stores* it; whereas 'move' will *use* the address for something.

Anyway, step through that program; you might want to run it a couple of times until you understand what's going on. Don't forget to pay attention to the register contents as you go along.
BTW, when you type in "il" in MacsBug, you won't see "pc" listed in the disassembly; you'll see an asterisk instead. So this,

        move.b  space(pc),d1

would be this in MacsBug

        move.b  *+8,d1

The eight would be whatever Fantasm puts in, i.e. it probably won't be eight as I listed.
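Since the whole point of this lesson was saving a register to memory, here is one way the pieces might fit together. It's a sketch built only from the instructions covered so far; the label "save" and the values 7 and 0 are made up for the illustration.

```asm
        debug
Main:   move.l  #7,d0           ;immediate to register direct: D0 = 00000007
        lea     save(pc),a0     ;A0 = address of save
        move.l  d0,(a0)         ;register direct to register indirect: save D0
        move.l  #0,d0           ;pretend someone else trashed D0
        move.l  (a0),d0         ;register indirect to register direct: D0 = 7 again
        rts
save:   ds.l    1               ;one long, room for a whole register
        global  Main
```

Stepping through it in MacsBug, D0 should go from 7 to 0 and back to 7, while A0 keeps pointing at the same spot the whole time.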
Last time I talked about the volatility of registers and how to save their values to memory. I also talked about the pc, program counter, and the 'lea' instruction, which together you can use to save registers to memory.

        debug
Main:   lea     space(pc),a0
        move.w  (a0),d0
        move.w  #3,d1
        move.w  d1,(a0)
        move.w  (a0),d0
        rts
space:  ds.w    4
        global  Main

I talked about the 'lea' instruction I listed above in the last lesson, but I feel it deserves a little more talking about. From the last lesson you know that the source addressing mode is base displacement register indirect, or something like that. The proper use for it is to put a number before the parentheses, and not a label like I did. So if you had this:

        lea     4(pc),a0

you would be taking what's in the pc and adding 4 to it. Having the label there instead is a special syntax to Fantasm. It means different things depending on exactly what the label is, but for right now, and in the program above, it means the address of the label. Ultimately, the label in the addressing mode will get replaced with the offset from the instruction to the label; this is why it is a displacement register indirect addressing mode, because to the processor a number is where the label is.

The method used above is the only way you can get the address of a part of your program, and hence access to your embedded data. Why? Because you don't know where in memory your program is going to be when it is run. Virtually all modern OSs allow multiple programs to run at once, and to be loaded in any order: so it doesn't matter if you launch Netscape first or SimpleText, they'll both work regardless of who goes first. So it's entirely possible for SimpleText to load at memory address 10000 one time, and then to load at 30000 the next. The one thing that you do know is the relative distance of the data you want. If you imagine your program starting at memory address 0, then your data could be at 20.
It's important to note that the data isn't really at 20; it's at 20+start-of-your-program. So if your program started at 10000, your data would be at 20+10000, which is 10020. If your program started at 30000, then the data would be at 30020. Since you never know absolutely where your data is, you have to specify relative addresses.

The normal address registers won't contain anything useful to you, because any program could have used them before your program was run. The only register that is predictable enough for you to use is the pc: it *always* has the address of the next instruction. So you can use this to your advantage to find the addresses of parts of your program; all you have to do is specify an offset, a distance, from the pc to wherever you want. As I stated above, specifying a label is a syntax in Fantasm meaning the address of the data you want. So instead of figuring out the distance from the label to the current instruction yourself, which would be quite hard, you can just put in the label as the offset and let Fantasm figure it out.

Once you do get the address figured out, you have to store it somewhere; otherwise you'd have to figure it out again every time you wanted to use that particular address. That's why you have to use the 'lea' instruction: all it does is store addresses, it doesn't even attempt to use the address in any way.

So, to sum up, if you want to get the address of a part of your program, and hence your data:

* use the 'lea' instruction
* specify the label of the part of the program you want the address of
* use the pc
* specify an address register

So it would look something like this:

        lea     space(pc),a2

When I first introduced the 'lea' instruction, I did so kind of haphazardly. But now you see that it is really an important instruction. Maybe you thought you could get away with not using the 'lea' instruction. Maybe you thought you could get away with not using the pc.
Or maybe you thought you could get away with not using the displacement register indirect addressing mode. Well, you can't; live with it.

Previously, I stated that you would almost never do something like this:

        move.l  #2,a0

i.e. move an immediate value into an address register. Maybe now you see why I said that. You use address registers to, well, address memory, and you never know where in memory your program will be; so specifying an immediate value is in essence specifying an absolute address, in this case memory location 2. Do you know what's always going to be there? Of course not. That's why you would almost never do something like this: you might think you're writing to your program, but you would in fact end up writing to whatever is occupying that space, which more than likely won't be your program. Later on, you'll see an exception to this; but in general, never move an immediate value to an address register.
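To tie the lesson together, here's a minimal sketch of saving a register to embedded memory and getting it back, using nothing but 'lea', the pc, and a label. The label "saved" and the value 1234 are made up for illustration; they aren't from the earlier lessons.

```asm
        debug
Main:   move.l  #1234,d0        ;some value we want to keep
        lea     saved(pc),a0    ;a0 = address of "saved", wherever the program got loaded
        move.l  d0,(a0)         ;save d0 to memory
        move.l  #0,d0           ;pretend something else trashed d0
        move.l  (a0),d0         ;restore it from memory, d0 is 1234 again
        rts                     ;end program

saved:  ds.l    1               ;room for one long, placed after the code
        global  Main
```

Notice that no absolute address appears anywhere: the only address the program ever uses is the one 'lea' worked out relative to the pc, so it shouldn't matter where the program ends up in memory.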
Last time I emphasized the importance of the 'lea' instruction in accessing your variables. Here's the program I listed in lesson 4:

        debug
Main:   lea     space(pc),a0
        move.w  (a0),d0
        move.w  #3,d1
        move.w  d1,(a0)
        move.w  (a0),d0
        rts
space:  ds.w    4

There's one thing I want you to notice: *where* I put the space for the variables, the 'ds' directive. What do you think is in "space" when you're running your program? Well, most likely it's going to be zeroes, but it could be anything. Even more intriguing, what do you think would happen if "space" was above the 'rts' instruction? What if it looked like this:

        debug
Main:   lea     space(pc),a0
        move.w  (a0),d0
        move.w  #3,d1
        move.w  d1,(a0)         ;writes 3 into the path of execution
space:  ds.w    4               ;the processor runs straight into this data
        move.w  (a0),d0
        rts

If you're feeling lucky, try this program; pay careful attention to what's going on while in MacsBug.

If you just tried that out and you're reading this sentence *immediately* after reading the previous sentence, then you're lucky. Most likely, that modified program would have crashed; your program definitely, and your computer probably. Why? Because you were executing data and not your instructions. If you paid attention, you would even have seen the program change itself: self-modifying code. All of this from just putting the variable "space" before the 'rts' instruction, amazing...

This serves to highlight a special consideration when using embedded data for your variables. You should always put the space for the variables after the end of your program. Really, you should put it after any code that's going to be executed, which right now *is* the end of your program. If you don't, then you get serious errors like the ones you just saw.

* --- Assembly Lesson 5.5 Variables & Interpretation --- *

Now that you pretty much know how to use variables, I thought I'd talk about some ways to effectively use them in assembly. Unlike in high-level languages, there are no data types at all in assembly.
About the closest thing you can come to is the size of data: byte, word, long. Even then the sizes aren't terribly restrictive; what's to keep you from moving only a word into a long? (You could do the opposite, move a long into a word, but since the space you're moving into, a word, is smaller than what you're moving, a long, you wouldn't generally do this as you'd get problems.) If all high-level languages eventually trickle down into assembly, how do you represent things like integers, floats, strings, classes, etc.? The answer is you don't represent, you interpret.

What is it that makes the number 65, 65; and not the letter "A"? Internally, the number 65 and the letter "A" are the exact same thing. They seem different because the program knows when to interpret the number 65 as a number and when to interpret it as a letter. If you put a prompt on the screen asking for a *letter*, and you type in "A", then the program knows that the number 65 should be a letter. If you put a prompt asking for a *number*, then 65 will be known as a number. Thereafter, whenever the program displays what you typed in, it knows how to interpret the number it has stored: in the first case it will display 65 as a letter, in the second, as a number.

Here's a question: "Exactly *how* does the program know how to interpret numbers?" I suppose the program could have a really sophisticated level of artificial intelligence; but since we as humans make the programs, and we don't even fully understand how we know stuff, I doubt this. The best way would be to arbitrarily say that certain variables are of a certain type. So whenever we see a particular variable, regardless of its value, we interpret it a certain way. So when you store a number, you would store it in one variable; and when you store a letter, you would store it in another variable. This way, you can easily know how to interpret the data, even if the number is 65 and the letter is "A".
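A minimal sketch of that idea, with two embedded variables whose names, "score" and "initial", are made up for illustration. Both end up holding exactly the same bits; only the variable you stored them in tells you how to read them.

```asm
        debug
Main:   lea     score(pc),a0    ;a0 = address of the number variable
        move.b  #65,(a0)        ;store the number 65
        lea     initial(pc),a1  ;a1 = address of the letter variable
        move.b  #65,(a1)        ;store the letter "A", which is also 65
        move.b  (a0),d0         ;low byte of d0 = 65, to be read as a number
        move.b  (a1),d1         ;low byte of d1 = 65, to be read as a letter
        rts                     ;end program

score:   ds.b   1               ;we decide this byte is always a number
initial: ds.b   1               ;we decide this byte is always a letter
        global  Main
```

The processor can't tell d0 and d1 apart here. The "type" lives entirely in the convention that whatever comes out of "score" gets treated as a number and whatever comes out of "initial" gets treated as a letter.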
In Fantasm, you can specify a few interpretations:

* decimal numbers - to specify these just type in the number; to move 65 into register D0, do this:

        move.l  #65,d0

* hexadecimal numbers - for these just prefix the number with a dollar sign, $; to move hexadecimal 41 into D0:

        move.l  #$41,d0

* binary numbers - for these just prefix the number with a percent sign, %:

        move.l  #%1000001,d0

* letters - for these just put the letters in double quotation marks, "":

        move.l  #"A",d0

In case you haven't figured it out, all four of the above sample instructions move the exact same thing into D0. But did I move a letter, binary, hexadecimal, or a decimal into D0? Well, it depends on what you're doing as to what you think you moved. If I were specifying a file type to the OS, then I moved a letter; if I were specifying the score for player 2, then I moved a decimal; if I were specifying an event mask to the Event Manager, then I moved a binary; if I were specifying the contents of a file to a programmer, then I moved a hexadecimal. It all depends on what you're doing as to how to interpret things.

I know this might be confusing now, but later on as you progress in assembly you'll appreciate the freedom to do whatever you want. After all, if you can interpret the letter "A" as a number, 65, then you can add one to 65 to get 66, which can then be interpreted as "B"; you can keep on doing this to display the entire alphabet. Most likely, you'll specify things as decimal, since that's what we're used to, but don't be afraid to spread your wings and do wild things that you never even imagined before; with no restrictions placed upon you, you can certainly do them.

One thing I would like to note is this. A letter is stored using ASCII representation, which is just a standard way of interpreting numbers as letters, i.e. 65 means the capital letter "A". ASCII characters are a byte in length. 68K registers are 4 bytes in length.
So you can specify up to four characters at a time in one register. So you could do this and it would be perfectly valid:

        move.l  #"Help",d0

but this wouldn't be valid, since eight characters won't fit in four bytes:

        move.l  #"Help Me!",d0

One other thing before I finally finish up the topic of memory and addressing. You can specify things like records, structs, etc. through the use of Fantasm's 'rs' directives. Imagine you had a record that consisted of these items:

        cats    - a long
        dogs    - a long
        apples  - a byte
        oranges - a word

How would you go about storing this in memory? Well, you know that one record takes up 4+4+1+2=11 bytes, so you would need to allocate at least 11 bytes of memory to store it. You would get the address for it through a 'lea' instruction. Now what? You have the address of the start of 11 bytes, split up between four items. The key thing to understand is that the items take up a specified amount of space, and that they are located at a relative distance to each other. Remember the example I gave about your data being at a relative distance from the start of your program, say 20+start-of-program? Apply that same idea here: each item is at a relative distance from the start of the record.

Since "cats" comes first, it is at offset 0 from the start of the record; it *is* the start of the record. Since "cats" is a long, which is 4 bytes, "dogs" begins at offset 4 from the start of the record; similarly, "apples" is at offset 8, and "oranges" is at offset 9. "Oranges" is the last item in the record and is at offset 9; it takes up a word, which is 2 bytes, so 9+2=11 and that's how you get the end of the record, the size of it. So here's how you could access "apples":

        debug
Main:   lea     record(pc),a0   ;a0 = address of the start of the record
        move.b  8(a0),d0        ;d0 = the byte at offset 8, i.e. "apples"
        rts
record: ds.b    11

D0 does have the value in "apples", even though it isn't too apparent from reading the code. You can make it more noticeable by defining a label to be a value and then putting that in, instead of 8. You do this by using 'equ' directives.
It would look like this:

apples: equ     8

and like this in the program:

apples: equ     8
        debug
Main:   lea     record(pc),a0
        move.b  apples(a0),d0
        rts
record: ds.b    11

Remember when I said that when you use the displacement register indirect addressing mode (try saying that 3 times fast) and you specify a label, that label means different things depending on the label? Well, this is one of those different meanings. Since "apples" was explicitly given a value, 8, that's what it is. A label like "record" doesn't have an explicit value, and so it evaluates to the address of "record". In any case, when you put "apples" in the above program, it evaluates to 8, and the whole addressing mode ends up giving you the value in "apples".

You could just as well define each of the record items, cats, dogs, apples, and oranges, with 'equ' directives; but what happens if you later decide to change the order of the record or the size of one of the items? The answer: you're screwed! Instead you can use 'rs' directives.

'rs' directives are similar to 'equ' in that the label on the directive is given a value; the difference is that you don't explicitly give it. When your program is getting assembled it goes through two scans: the first one gathers data about your program, and the second actually assembles it. At the start of the first scan there's a counter that is initially zero. Each time Fantasm comes to an 'rs' directive, it assigns the current counter value to the label and then increments the counter by the size of the 'rs' directive. The net effect is that your labels get properly assigned values, even though you haven't explicitly assigned them. If you change the order of the 'rs' directives or the size of any one of them, only a reassemble is necessary, since the scan will correctly reassign the label values.
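To see what the 'rs' counter saves you from, here's a sketch of the whole record done with 'equ' directives instead, with every offset worked out by hand. The second 'move' is only there to show another item being accessed; it isn't from the original example.

```asm
cats:    equ    0               ;long, bytes 0-3
dogs:    equ    4               ;long, bytes 4-7
apples:  equ    8               ;byte, byte 8
oranges: equ    9               ;word, bytes 9-10

        debug
Main:   lea     record(pc),a0   ;a0 = address of the start of the record
        move.l  dogs(a0),d0     ;d0 = value in "dogs", at offset 4
        move.b  apples(a0),d1   ;low byte of d1 = value in "apples", at offset 8
        rts                     ;end program

record: ds.b    11              ;4+4+1+2 bytes
        global  Main
```

If "dogs" shrank to a word, you'd have to change "apples" to 6 and "oranges" to 7 by hand, and the 11 to 9; miss one and the program quietly reads the wrong bytes. One thing the hand-worked offsets also make visible: "oranges" sits at an odd offset, and as far as I know a plain 68000 raises an address error on a word access at an odd address (the 68020 and later tolerate it), which is why this sketch only touches "dogs" and "apples".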
So here's what our record would look like using 'rs' directives:

cats:    rs.l   1
dogs:    rs.l   1
apples:  rs.b   1
oranges: rs.w   1

The 'rs' directive is sized, meaning you can specify a long, word, or a byte. And you can also specify how many of each size you want; I only specified 1 of each item. 'rs' directives are also similar to 'ds' directives; the difference is that 'rs' directives don't actually allocate any space like 'ds' directives do. One thing to note is that every item of the record has to be listed; I can't just list "apples" like I did when I used an 'equ'. (You could just list "apples" if you really wanted to; do you see how? Like this: 'rs.b 8'. Put that directive above "apples", with no label if you like, and "apples" would get the right value; although by doing this you kind of defeat the purpose of using 'rs' directives.) If you want multiple record definitions, then you can use the 'rsreset' directive to reset the counter before you start specifying each record. So here's a real good way to get the juice out of "apples":

        rsreset
cats:    rs.l   1
dogs:    rs.l   1
apples:  rs.b   1
oranges: rs.w   1

        debug
Main:   lea     record(pc),a0
        move.b  apples(a0),d0
        rts
record: ds.b    11
        global  Main

About the only downside to 'rs' directives is that the size doesn't travel with the label. If you wanted to make "apples" a word, then you would naturally have to change the size in the 'rs' definition; but you would also have to change the size anywhere you use "apples", like in the 'move' instruction.

* --- MacsBug Stuff --- *

If you want to examine memory then you can use this command:

DM) this displays the memory at the address you specify

If you were in MacsBug and you wanted to see what was in "record", then *after* the 'lea' instruction, type in "dm a0". The address you specify doesn't have to be an actual memory address; it can be a register. You could even display memory from a data register!
If you break into MacsBug inside of a program that has windows on the screen, type in "dm windowlist^ window". What came up? Neat, huh? Look at the help sections on memory, expressions, and templates to figure out how I did this.

..._Tim_...
--=[Until you believe something, there is nothing to be proven]=--
http://www.winthrop.edu/~humphret
email@example.com