Journal: Dr. Dobb's Journal Jan 1993 v18 n1 p127(6)
-------------------------------------------------------------------------
Title: Yet another animation method. (Graphics Programming) (Column)
Author: Abrash, Michael

Abstract: A graphics technique called dirty-rectangle animation can overcome many of the performance problems encountered with VGA display adapters. Rather than drawing directly to the screen, the program draws images into off-screen (non-display) system memory. The bounding rectangles of the drawn-to areas are the dirty rectangles, "dirty" because their contents no longer match what appears on-screen. These rectangles are transferred to the screen once all drawing and redrawing for a frame is complete. Drawing and redrawing directly to the screen creates excessive flicker and reduces the visual presentation's quality. Dirty-rectangle animation improves image presentation because only the final pixel representation appears on-screen. The technique is also generally faster because it limits the amount of interaction with VGA hardware.
-------------------------------------------------------------------------
Full Text:

As documented last month, we brought our pets with us when we moved out here to Seattle. At about the same time, our Golden Retriever, Sam, observed his third birthday. Sam is relatively intelligent, in the sense that he is clearly smarter than a Banana Slug, although if he were in the same room with Jeff Duntemann's dogs Mr. Byte and Chewy, there's a reasonable chance that he would mistake them for something edible (a category that includes rocks, socks, and a surprising number of things too disgusting to mention), and Jeff would have to find a new source of openings for his column.

But that's not important now. What is important is that--and I am not making this up--this morning I managed to find the one pair of socks Sam hadn't chewed holes in.
And what's even more important is that after we moved and Sam turned three, he calmed down amazingly. We had been waiting for this magic transformation since Sam turned one, the age at which most puppies turn into normal dogs who lie around a lot, waking up to eat their Science Diet (motto, "The dog food that costs more than the average neurosurgeon makes in a year") before licking themselves and going back to sleep. When Sam turned one and remained hopelessly out of control we said, "Goldens take two years to calm down," as if we had a clue. When he turned two and remained undeniably Sam we said, "Any day now." By the time he turned three, we were reduced to figuring that it was only about seven more years until he expired, at which point we might be able to take all the fur he had shed in his lifetime and weave ourselves some clothes without holes in them, or quite possibly a house.

But miracle of miracles, we moved, and Sam instantly turned into the dog we thought we'd gotten when we forked over $500--calm, sweet, and obedient. Weeks went by, and Sam was, if anything, better than ever. Clearly, the change was permanent.

And then we took Sam to the vet for his annual check-up and found that he had an ear infection. Thanks to the wonders of modern animal medicine, a $5 bottle of liquid restored his health in just two days. And with his health, we got, as a bonus, the old Sam. You see, Sam hadn't changed. He was just tired from being sick. Now he once again joyously knocks down any stranger who makes the mistake of glancing in his direction, and will, quite possibly, be booked any day now on suspicion of homicide by licking.

Okay, you give up. What exactly does this have to do with graphics? I'm glad you asked. The lesson to be learned from Sam The Dog With A Brain The Size Of A Walnut is that while things may look like they've changed, in fact they often haven't. Take VGA performance.
If you buy a 486 with a Super-VGA, you'll get performance that knocks your socks off, especially if you run Windows. Things are liable to be so fast that you'll figure the Super-VGA has to deserve some of the credit. Well, maybe it does if it's a local-bus VGA. But maybe it doesn't, even if it is local bus--and it certainly doesn't if it's an ISA-bus VGA, because no ISA-bus VGA can run faster than about 300 nanoseconds per access, and VGAs capable of that speed have been common for at least a couple of years now. Your 486 VGA system is fast almost entirely because it has a 486 in it. (486 systems with accelerators such as the ATI Ultra or Diamond Stealth are another story altogether.) Underneath it all, the VGA is still painfully slow--and if you have an old VGA or IBM's original PS/2 motherboard VGA, it's incredibly slow. The fastest ISA-bus VGA around is two to twenty times slower than system memory, and the slowest VGA around is as much as 100 times slower.

In the old days, the rule was, "Display memory is slow, and should be avoided." Nowadays, the rule is, "Display memory is not quite so slow, but should still be avoided." So, as I say, sometimes things don't change.

Of course, sometimes they do change. For example, in just 49 dog years, I fully expect to own at least one pair of underwear without a single hole in it. Which brings us, deus ex machina and the creek don't rise, to yet another animation method: dirty-rectangle animation.

VGA Access Times

Actually, before we get to dirty rectangles, I'd like to take you through a quick refresher on VGA memory and I/O access times. I want to do this partly because the slow access times of the VGA make dirty-rectangle animation particularly attractive, and partly as a public service, because even I was shocked by the results of some I/O performance tests I recently ran.

Table 1 shows the results of the aforementioned I/O performance tests, as run on two 486/33 Super-VGA systems under the Phar Lap 386|DOS-Extender.
(The systems and VGAs are unnamed because this is a not-very-scientific spot test, and I don't want to unfairly malign, say, a VGA whose only sin is being plugged into a lousy motherboard, or vice versa.) Under Phar Lap, 32-bit protected-mode apps run with full I/O privileges, meaning that the OUTs I measured had the best official cycle times possible on the 486: 10 cycles. OUT takes 16 cycles in real mode on a 486, and a mind-boggling 30 cycles in protected mode if running without full I/O privileges (as is normally the case for protected-mode applications). Basically, I/O is just plain slow on a 486.

Slow as 30 or even 10 cycles for an OUT is, one could only wish that VGA I/O was actually that fast. The fastest OUT in Table 1 is 26 cycles, and the slowest is 126--this for an operation that's supposed to take 10 cycles. To put this in context, MUL takes only 13 to 42 cycles, and a normal MOV to or from system memory takes exactly one cycle on the 486. In short, OUTs to VGAs are as much as 100 times slower than normal memory accesses, and are generally two to four times slower than display memory accesses, although there are exceptions.

Of course, VGA display memory has its own performance problems. The fastest ISA-bus VGA can, at best, support sustained write times of about 10 cycles per word-sized write; 15 or 20 cycles is more common, even for relatively fast Super-VGAs; the worst case I've seen is 65 cycles per byte. However, intermittent writes, mixed with a lot of register- and cache-only code, can effectively execute in one cycle because the VGA and the 486 coprocess. Display memory reads tend to take longer, because coprocessing isn't possible--one microsecond is a reasonable rule of thumb for VGA reads, although there's considerable variation. So VGA memory tends not to be as bad as VGA I/O, but Lord knows it isn't good.

In conclusion, OUTs, in general, are lousy on the 486 (and to think they only took three cycles on the 286!).
OUTs to VGAs are particularly lousy. Display memory performance is pretty poor, especially for reads.

The conclusions are obvious, I would hope. Structure your graphics code, and, in general, all 486 code, to avoid OUTs. For graphics, this especially means using write mode 3 rather than the bit-mask register. When you must use the bit mask, arrange drawing so that you can set the bit mask once, then do a lot of drawing with that mask. For example, draw a whole edge at once, then the middle, then the other edge, rather than setting the bit mask several times on each scan line to draw the edge and middle bytes together. Don't read from display memory if you don't have to. Write each pixel once and only once.

[TABULAR DATA OMITTED]

It is indeed a strange concept: The key to fast graphics is staying away from the graphics adapter as much as possible.

Dirty-rectangle Animation

The relative slowness of VGA hardware is part of the appeal of the technique that I call "dirty-rectangle" animation, in which a complete copy of the contents of display memory is maintained in off-screen system (nondisplay) memory. All drawing is done to this system buffer. As offscreen drawing is done, a list is maintained of the bounding rectangles for the drawn-to areas; these are the "dirty" rectangles, dirty in the sense that they do not match the contents of the screen. After all drawing for a frame is completed, all the dirty rectangles for that frame are copied to the screen in a burst, and then the cycle of off-screen drawing begins again.

Why, exactly, would we want to go through all this complication, rather than simply drawing to the screen in the first place? The reason is visual quality. If we were to do all our drawing directly to the screen, there'd be a lot of flicker as objects were erased and then redrawn.
Similarly, overlapped drawing done with the painter's algorithm (in which farther objects are drawn first, so that nearer objects obscure them) would flicker as farther objects were visible for short periods. With dirty-rectangle animation, only the finished pixels for any given frame ever appear on the screen; intermediate results are never visible. Figure 1 illustrates the visual problems associated with drawing directly to the screen; Figure 2 shows how dirty-rectangle animation solves these problems.

Well, then, if we want good visual quality, why not use page flipping? For one thing, not all adapters and modes support page flipping. The CGA and MCGA don't, and neither do the VGA's 640x480 16-color or 320x200 256-color modes, or many Super-VGA modes. In contrast, all adapters support dirty-rectangle animation.

Another advantage of dirty-rectangle animation is that it's generally faster. While it may seem strange that it would be faster to draw off screen and then copy the result to the screen, that is often the case, because dirty-rectangle animation usually reduces the number of times the VGA's hardware needs to be touched, especially in 256-color modes. This reduction comes about because when dirty rectangles are erased, it's done in system memory, not in display memory, and since most objects move a good deal less than their full width (that is, the new and old positions overlap), display memory is written to fewer times than with page flipping. (In 16-color modes, this is not necessarily the case, because of the parallelism obtained from the VGA's planar hardware.)

Also, read/modify/write operations are performed in fast system memory rather than slow display memory, so display memory rarely needs to be read. This is particularly good because display memory is generally even slower for reads than for writes.

Also, page flipping wastes a good deal of time waiting for the page to flip at the end of the frame.
Dirty-rectangle animation never needs to wait for anything because partially drawn images are never present in display memory. Actually, in one sense, partially drawn images are sometimes present because it's possible for a rectangle to be partially drawn when the scanning raster beam reaches that part of the screen. This causes the rectangle to appear partially drawn for one frame, producing a phenomenon I call "shearing." Fortunately, shearing tends not to be particularly distracting, especially for fairly small images, but it can be a problem when copying large areas. This is one area in which dirty-rectangle animation falls short of page flipping, because page flipping has perfect display quality, never showing anything other than a completely finished frame.

Similarly, dirty-rectangle copying may take two or more frame times to finish, so even if shearing doesn't happen, it's still possible to have the images in the various dirty rectangles show up nonsimultaneously. In my experience, this latter phenomenon is not a serious problem, but do be aware of it.

Dirty Rectangles in Action

Listing One (page 140) demonstrates dirty-rectangle animation. This is a very simple implementation, in several respects. For one thing, it's written entirely in C, and animation fairly cries out for assembly language. For another thing, it uses far pointers, which C often handles with less than optimal efficiency, especially because I haven't used library functions to copy and fill memory. (I did this so the code would work in any memory model.) Also, Listing One doesn't attempt to coalesce rectangles so as to perform a minimum number of display-memory accesses; instead, it copies each dirty rectangle to the screen, even if it overlaps with another rectangle, so some pixels get copied multiple times. Listing One runs pretty well, considering all of its failings; on my 486/33, ten 11x11 images animate at a very respectable clip.
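
Listing One itself is not reproduced in this text. As a rough illustration only, here is a minimal C sketch of the general shape of the technique described above. It is not the column's listing, and several things in it are assumptions: the names `fill_rect`, `flush_dirty`, `move_object`, `sys_buf`, and `screen` are hypothetical; `screen` is an ordinary array standing in for display memory (on a real VGA in 320x200 256-color mode it would be the bitmap at segment A000h); rectangles are assumed to lie fully on screen, so there is no clipping; and the sketch uses `memset` and `memcpy`, which the listing described above deliberately avoids. Like that listing, it makes no attempt to coalesce overlapping rectangles.

```c
#include <assert.h>
#include <string.h>

#define SCREEN_W  320
#define SCREEN_H  200
#define MAX_DIRTY 64

typedef struct { int x, y, w, h; } Rect;

/* ASSUMPTION: plain array standing in for display memory. */
unsigned char screen[SCREEN_W * SCREEN_H];
/* System-memory copy of the screen; all drawing is done here. */
unsigned char sys_buf[SCREEN_W * SCREEN_H];

Rect dirty[MAX_DIRTY];            /* dirty rectangles for the current frame */
int  dirty_count = 0;
long screen_bytes_written = 0;    /* instrumentation only, for illustration */

/* Fill a rectangle in the system buffer and record it as dirty.
   The rectangle must lie entirely within the buffer (no clipping). */
void fill_rect(int x, int y, int w, int h, unsigned char color)
{
    int row;

    assert(x >= 0 && y >= 0 && w > 0 && h > 0);
    assert(x + w <= SCREEN_W && y + h <= SCREEN_H);
    assert(dirty_count < MAX_DIRTY);

    for (row = y; row < y + h; row++)
        memset(&sys_buf[(size_t)row * SCREEN_W + x], color, (size_t)w);

    dirty[dirty_count].x = x;
    dirty[dirty_count].y = y;
    dirty[dirty_count].w = w;
    dirty[dirty_count].h = h;
    dirty_count++;
}

/* Copy every dirty rectangle from the system buffer to the screen,
   one scan line at a time, then empty the list for the next frame.
   Overlapping rectangles are copied independently (no coalescing). */
void flush_dirty(void)
{
    int i, row;

    for (i = 0; i < dirty_count; i++) {
        const Rect *r = &dirty[i];
        for (row = r->y; row < r->y + r->h; row++) {
            size_t off = (size_t)row * SCREEN_W + r->x;
            memcpy(&screen[off], &sys_buf[off], (size_t)r->w);
            screen_bytes_written += r->w;
        }
    }
    dirty_count = 0;
}

/* One frame for a single 11x11 object: erase it at the old position,
   draw it at the new one (both off screen), then update the screen. */
void move_object(int old_x, int old_y, int new_x, int new_y,
                 unsigned char color)
{
    fill_rect(old_x, old_y, 11, 11, 0);      /* erase to background color 0 */
    fill_rect(new_x, new_y, 11, 11, color);  /* draw at the new position */
    flush_dirty();
}
```

With an object already drawn at (10,10), a call such as `move_object(10, 10, 12, 10, 15)` erases and redraws entirely in `sys_buf`; the screen only ever receives the two finished rectangles. Because those two rectangles overlap, some screen pixels are written twice, which is exactly the redundancy that coalescing would remove.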
One point I'd like to make is that although the system-memory buffer in Listing One has exactly the same dimensions as the screen bitmap, that's not a requirement, and there are some good reasons not to make the two the same size. For example, if the system buffer is bigger than the screen, it's possible to pan the visible area around the system buffer. Or, alternatively, the system buffer can be just the size of a desired window, representing a window into a larger, virtual buffer. We could then draw the desired portion of the virtual bitmap into the system-memory buffer, then copy the buffer to the screen, and the effect will be of having panned the window to the new location.

Another argument in favor of a small viewing window is that it restricts the amount of display memory actually drawn to. Restricting the display memory used for animation reduces the total number of display-memory accesses, which in turn boosts overall performance; it also improves the performance and appearance of panning, in which the whole window has to be redrawn or copied. If you keep a close watch, you'll notice that many high-performance animation games similarly restrict their full-featured animation area to a relatively small region. Often, it's hard to tell that this is the case, because the animation region is surrounded by flashy digitized graphics and by items such as scoreboards and status screens, but look closely and see if the animation region in your favorite game isn't smaller than you thought.

Next month, I'll put the important parts of dirty-rectangle animation into assembler, and I'll coalesce dirty rectangles to minimize display-memory accesses--and maybe, just maybe, I'll do some panning. Then we'll see what kind of stuff dirty-rectangle animation is really made of.

3-D Reading

As anyone who's been following this column for a while knows, I'm keenly interested in 3-D graphics.
Thus, it is with considerable pleasure that I'm able to report that Programming in 3 Dimensions: 3-D Graphics, Ray Tracing, and Animation by Christopher D. Watkins and Larry Sharp (M&T Books, 1992) is good stuff. There's a fair amount of theory, and lots of 3-D implementation, from modeling and scenes to ray tracing and finally, animation. The animation is the precomputed, playback kind, of the Autodesk Animator sort, and while it lacks the on-the-fly flexibility of the real-time animation we've developed in this column, my oh my, it does look good. If you get this book, I strongly suggest you get the disk as well; in which case, run ANIMATE.EXE, with BOUNCE as the input file, and marvel that you now have, in source form, all the software needed to implement that animation. Ten years ago, I'll bet you couldn't have produced this level of fully rendered, real-time playback animation for less than $50,000 in hardware and software; now, a couple of thousand will easily do the trick. What a great time this is to be a programmer! Recommended.