Assembler 80x86 for beginners
Ombrae
bjdc@canl.nc
asm, Assembler 80x86, programming

I. Intro

So you want to learn Asm? Why? Maybe you want to write the deadliest virus, or
crack the latest software you downloaded, or maybe you only want to optimize
your code... Today, people find many, many reasons NOT to learn Asm. As I'm
honest, I will give you these reasons:

_Asm is hard to learn (e.g. "Why must I push es and then pop it into ax???")
_Asm is hard to understand, with lots of jmp that make your code jump anywhere.
_Asm uses completely occult terms like PSP, DTA, register, stack, accumulator...
_you work for 2 days to produce code that can be written in 2 hours in another
 language
_wow!! these pages and pages of code drive me crazy!!!!!!!!
_why must I use Asm, when modern computers have MBs of memory and 500MHz
 processors????
_Asm isn't portable.....
_and many more.....

Yes, but Asm can give you one thing that no other language can give you....
TOTAL POWER!!!! Asm gives you total (well, maybe not quite...) control of your
processor. Your OS speaks to the CPU using bits of code; you too can write the
code that takes control of a computer, doing what you want (and what you can
code :) with your hardware...

What Asm offers you:
_speed
_space
_capability
_control
_a better knowledge of your computer (So you are a C expert?? So what is the
 PSP? How do you set a new DTA? Can you make your prog go TSR without int 27h
 or function 31h of int 21h? What are int 27h and function 31h of int 21h?
 What is the fastest way to print 1 character on screen?)

II. Basic: Computer

Well... What is a computer??? You can answer this: it's a machine that solves
operations. So does a calculator, or an abacus! Are they computers? I don't
think so. (Look, this is the latest Abacus Power 500MHz with full SCSI2!!! :)
A computer is a very complex assembly of electronics, where the little output
on your screen is only the tip of a BIG iceberg...
It uses a stream of electrical values coded as 1 or 0, swarming into a maze of
transistors which act as doors that open and close. The state of these doors
determines what happens.

Imagine that you have two bags (b1 and b2), each with a white (W) and a black
(B) ball. If you pick one ball from each bag, you have 4 possibilities:

 b1 => W; b2 => W
 b1 => B; b2 => B
 b1 => W; b2 => B
 b1 => B; b2 => W

Now imagine that you can decide which balls you want... How can you be certain
that you will get the right balls?? You put only the right ball into each
bag... (Yeah, the approach isn't conventional for explaining logic..... :)
Forget that example....

Imagine that you are in a dungeon, with 2 doors and 1 guard (You love RPGs??
Me, yes!) One door leads to the exit, the other to death (yes, this is also a
form of exit but well, we're not in philosophy class.... :) (Oh, lucky you!!
You don't have philosophy in your exams when you want to be a scientist (or
anything else you want to be...))
The guard opens the exit door when you say AlkwfKnLllJld, and the Big Death
Door when you say A knKJBkb... What do you say to exit??? AlkwfKnLllJld of
course!!!!

Your computer is like these 2 doors with a guard.... only much more complex...
Say the right word (in binary please!!!) and you get the right effect... A line
of 0s and 1s (a 0 or a 1 is a bit) enters your CPU by the bus (there are 3
different buses: address, data and control), opens or closes some doors,
activates some circuits, and finally tells you that you have a fatal error...

Talking in binary isn't very cool (nor in hex or dec...): it is very hard, and
it often leads to errors (and I'm not even speaking of debugging....). So, some
big brains invented Asm... They replaced operations (ops) like 100100100101110
with codes of 2, 3 or 4 characters.
An operation uses an 'operator' and some 'operands'. An operator can have 0, 1
or 2 operands, and one result.
EX: 1+2=3   operator +, operands 1 and 2, result 3.
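To make this concrete, here is a minimal sketch of the 1+2=3 example as a tiny
DOS program in TASM/MASM syntax. It borrows things that are only explained
further down in this text (registers, the COM file skeleton, int 21h). The
label `start` is just a name picked for this illustration, and the bytes in
the comments are the encodings these assemblers normally produce for each
mnemonic.

```asm
code segment
assume cs:code, ds:code
org 100h                ; COM file skeleton, explained in part III
start:
        mov al, 1       ; B0 01  operand 1 goes into the register AL
        add al, 2       ; 04 02  operator ADD, operands AL and 2, result 3 in AL
        mov ah, 4ch     ; B4 4C  DOS function 4ch = exit the program
        int 21h         ; CD 21  call DOS; AL is handed back as the return code
code ends
end start
```

So the whole operation is 2 bytes of machine code, and the program ends with
the result (3) as its DOS return code.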
So, when you talk with your computer, you send it the sequence of operations to
perform. Asm allows you to send an operation directly to the CPU. Combine some
operations, and you have a program.

You may ask me how a high level language (ex: C) talks to your CPU?? Does it
use another language? No.. it uses the same language... but at a higher level.
When you use a C function, this function is translated into a set of asm
operations, but some of (a lot of!) these operations are useless!!!! When a
high level language is created, nobody can know how a function will be used!
So it adds more protections than necessary and handles some improbable cases,
which means more operations, more code, more CPU time... In asm you put in the
code which is necessary, and only that!
Ex: I tried to compare C and Asm, so I coded a simple prog which does nothing
(nothing!):
_C: 1723 bytes
_Asm: 2 bytes
Interesting, no?

Now back to our computer... Learning Asm forces you to learn registers,
segments and other exotic knowledge...

What is a register? It is the place where we put our operands. Registers (reg)
are electronic components placed inside the CPU, and accessing them is many
times faster than accessing memory. They are much more expensive too, so their
number is limited. As they are limited, and hardware, they have predefined
names. The good news is that the names are neither numerous nor complex. The
bad news is that you must know ALL these names... :)

The registers are divided into 3 groups:
_general purpose registers
_segment registers
_special registers

1. General Purpose
Can be divided in 2:
*On every 80x86 since the 8086, there are four (4) multipurpose registers of
 16 bits: AX, BX, CX, DX. Each one can be divided into two (2) 8-bit registers
 named AH-AL BH-BL CH-CL DH-DL
 They look like this:

 |                      AX                       |
 |15 14 13 12 11 10  9  8| 7  6  5  4  3  2  1  0|
 |          AH           |          AL           |
 (AHigh and ALow)

 We can put what we want into these regs (if it fits: AX = 0 to 65535)
 From the 386 on, there are 32-bit (thir.. OK, I stop here!!!
:) registers too: EAX, EBX, ECX, EDX
*SI, DI, SP and BP:
 SI and DI: Source and Destination Index regs, used as indirect pointers to
            memory
 SP: Stack Pointer.... When we get to the stack, we will see it....
 BP: Base Pointer, many uses (delta offset in viruses... :)

2. Segment
A segment is a piece of memory up to 64KB long where you put your code, your
data or your stack. Segment registers hold the addresses of these segments.
 CS: code segment, where your code is in memory
 DS: data segment, where your data is
 SS: stack segment, wait a moment
 ES: extra segment, for whatever you want

3. Special
_IP: Instruction Pointer; this register contains the address of the next
 instruction. You can't modify it directly (only jumps, calls and returns
 change it)
_flags register: flags are... flags! :) We use flags (1 bit long each) to find
 out the status of the computer (ex: many functions report an error by setting
 the Carry flag (CF), so after such a call we check the carry flag to see if
 all is good)

We can also use a value in memory, or a constant, as an operand.

Last: the STACK.
I know what the stack is... I can explain it... in French :(
The stack is, well, a stack: a place where we can push a value or pick (pop)
one. There is only one rule: LIFO!!!! Last In, First Out!! So if we push the
value 50 and then pop, we get the value 50 back! We cannot directly pop the
values below 50.

SS: Stack Segment, the segment where the stack lives.
SP: Stack Pointer, points to the last element pushed (the top of the stack).

  ______     (low addresses at the top of the drawing)
 |      | <= SS:0000, the start of the stack segment
 |      |
 |      |    free room, the stack grows this way (upward in the drawing)
 |      |
 |  50  | <= SS:SP, the last value pushed (top of the stack)
 | .... |    older values
 |______|

Each time we push, we add 1 element to the stack and SP goes down by 2 (an
element is 16 bits = 2 bytes, and the stack grows toward the low addresses).
Each time we pop, we remove the element on top of the stack and SP goes up by
2. If we remove more elements from the stack than it really holds, this is
called a stack underflow. If we add too many elements, this can lead to a
stack overflow. Stack underflows and overflows are dangerous...
Don't modify SP or SS if you don't know what you are doing....

III. Very(!) Basic Asm (At last!)
OK, first, get TASM or MASM, with TLINK or LINK+EXE2BIN (you can find EXE2BIN
in DOS). Open your Notepad (or any editor that doesn't add extra stuff to the
file, as Word does), and begin to write...
__________________________________________________________________________________
code segment                    ; you can put comments after a ;
assume cs:code, ds:code         ; like that
org 100h
starthere:
        mov ah, 09h
        lea dx, phrase
        int 21h
        mov ah, 4ch
        int 21h
phrase  db 'Look mum!!', 10, 13, 'I write on my computer!!!$'
code ends
end starthere
__________________________________________________________________________________
Save it as first.asm
Copy it to your masm/tasm directory
Open a DOS box
Go to the masm/tasm dir
Type:
 masm first (or tasm first)
 tlink /t first (or: link first, then exe2bin first.exe first.com)
 first
Then you'll see on your little screen:
Look mum!!
I write on my computer!!!

Explanation? OK, go!

code segment => defines a segment named code
assume cs:code, ds:code => tells the assembler that cs and ds point to the
                segment named code
                => as we generate a COM file, the code and data segments are
                the same
org 100h => when we create a COM file, we must put in this directive...
                (This is because of the structure of a COM file, which starts
                at offset 100h (h = hexadecimal; 100h = 256, use your
                calculator))
starthere: => a tag, a label...
mov ah, 09h => mov = move, so the register AH contains the value 09h (=9)
lea dx, phrase => lea = load effective address, so dx contains the address
                (offset) of phrase
int 21h => very important!!! int = interrupt! An interrupt is an action that
                the computer must perform
                => this one (21h) is the DOS interrupt; there are BIOS, DOS,
                EMS and many other interrupts
                When you issue an int 21h, the computer looks at a specific
                place in memory where all the interrupts are described. This
                one (the most used int) is the DOS interrupt. This interrupt
                is divided into many functions. Each function is recognized
                through the value of ah. Here, AH = 09h, which is the "print a
                string to screen" function.
                This function prints to the screen the string pointed to by DX
                and ended by a $ (the $ isn't printed).
mov ah, 4ch => ah = 4ch (PS: in hexa, you can have letters like a, b, c... The
                rule is that a number beginning with a letter must have a 0 in
                front, so write 0FFh, 0A52h.....)
int 21h => function 4ch of the DOS int! => exit the program!
phrase db 'Look mum!!', 10, 13, 'I write on my computer!!!$'
                This is data. 'phrase' is the name of the data, db is its
                format (define byte), and
                'Look mum!!', 10, 13, 'I write on my computer!!!$' is the
                string we print...
                Complex, no??? Well, there are some characters which are
                invisible, like Return and New Line (what you get when you hit
                Enter).... But they all have an ASCII number, so in a data
                string we can put either ASCII codes or keyboard characters.
                Everything between ' ' is characters, the numbers are ASCII
                codes, and the , are separators.
                Look at these ex:
                'COUCOU'                      one string: COUCOU
                'COU', 'COU'                  one string: COUCOU
                43h, 4fh, 55h, 43h, 4fh, 55h  one string: COUCOU
                (in ASCII codes; find an ASCII table, it's useful!)
code ends => end of the segment named code
end starthere => pretty strange.... this tells the assembler that the program
                starts at starthere
                (we can place as many tags as we want in a program, but they
                must all have different names)

Now, find some prototypes of ints and functions and try some tests. (Find
HelpPC, it's cool and free)
ex: int 21h Function 39h -Create Subdirectory-
    dx contains the address of a string with the path name (ex: C:\test)
    (don't end the string with a $!; $ is only for printing to screen!)
Ex of code:

code segment
assume cs:code, ds:code
org 100h
start:
        mov ah, 39h
        lea dx, testa
        int 21h
        mov ah, 4ch
        int 21h
testa   db 'C:\cool!', 0        ; must end with a 0...
code ends
end start

OK??? You see???
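A possible next test, tying this to the Carry flag mentioned in the part on
special registers: function 39h sets CF when DOS could not create the
directory, so we can check it with jc (jump if carry) and print one message or
the other with function 09h. This is only a sketch; the path C:\TEST and the
names `dirname`, `okmsg`, `errmsg`, `failed` and `show` are made up for the
example.

```asm
code segment
assume cs:code, ds:code
org 100h
start:
        mov ah, 39h             ; DOS function 39h: create subdirectory
        lea dx, dirname         ; dx = offset of the path (ended by a 0)
        int 21h
        jc  failed              ; CF = 1 => DOS reported an error
        lea dx, okmsg           ; CF = 0 => all is good
        jmp show
failed:
        lea dx, errmsg
show:
        mov ah, 09h             ; print the $-terminated string at dx
        int 21h
        mov ah, 4ch             ; exit the program
        int 21h
dirname db 'C:\TEST', 0         ; hypothetical path, ends with 0 (not $)
okmsg   db 'Directory created', 13, 10, '$'
errmsg  db 'Could not create the directory', 13, 10, '$'
code ends
end start
```

Run it twice: the second time the directory already exists, so DOS should set
the Carry flag and you should get the error message.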
Here are the basic operators:
 add reg, reg
 sub reg, reg
 mul reg        (only one operand: the other one is always AL or AX)
 div reg        (same thing: AX or DX:AX is divided by reg)
 mov reg, reg
 lea reg, mem   (note: 'lea dx, testa' is equivalent to 'mov dx, offset testa')
Note: when you use mov operand1, operand2, read from right to left: move
operand2 into operand1.

Ok, finished for me, over to you now......
Next time, more basics of Asm (This was only the very beginning of learning
asm..... :)

PS: you want more advanced explanations on computers and asm?? Look at "The
Art of Assembly Language" (in PDF format, 5.53MB unzipped, in 25 chapters).
This is a really good text.... (I haven't finished reading it... :)
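PPS: one last sketch to play with, using the basic operators listed above in
the same COM skeleton as the other examples. The numbers and the label `start`
are arbitrary choices for the illustration. Note that mul and div name only
one operand: with an 8-bit register, mul multiplies AL by it and puts the
result in AX, and div divides AX by it, leaving the quotient in AL and the
remainder in AH.

```asm
code segment
assume cs:code, ds:code
org 100h
start:
        mov ax, 7       ; ax = 7
        mov bx, 5       ; bx = 5
        add ax, bx      ; ax = 7 + 5 = 12
        sub ax, 2       ; ax = 12 - 2 = 10
        mov cl, 3
        mul cl          ; ax = al * cl = 10 * 3 = 30
        mov cl, 4
        div cl          ; ax / cl: al = 7 (quotient), ah = 2 (remainder)
        mov ah, 4ch     ; exit; al (7) becomes the DOS return code
        int 21h
code ends
end start
```

Change the numbers and follow the registers in a debugger (DEBUG comes with
DOS) to see each operation do its work.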