Journal: Dr. Dobb's Journal, August 1991, v16 n8, p165(7)

Title: More undocumented 256-color VGA magic. (Graphics Programming) (column)
Author: Abrash, Michael.
AttFile: Program: GP-AUG91.ASC (source code listing)

Summary: Programmers should remember that there are many subtle approaches to any problem, and should keep the big picture in mind when implementing programs. Mode X is an undocumented 320 x 240 256-color mode of the VGA that supports page flipping, makes off-screen memory available, has square pixels, and lets programmers increase performance by as much as four times by using the VGA's hardware. The VGA has four latches, one for each plane of display memory, and these latches can be used to copy data from one part of display memory to another. Latches are well suited to patterned fills and screen-to-screen copies, including scrolls. Four-pixel-wide patterns are extremely useful.

Caption: The latches are loaded by every display memory read. (chart)
Caption: Bytes written from the latches to corresponding planes. (chart)
Caption: One useful way to organize display memory in Mode X. (chart)

Full Text:

Every so often, a programming demon that I'd thought I'd forever laid to rest arises to haunt me once again. A minor example of this -- an imp, if you will -- is the use of "=" when I mean "==," which I've done all too often in the past, and am sure I'll do again. That's minor deviltry, though, compared to the considerably greater evils of one of my personal scourges, of which I was recently reminded anew: too-close attention to detail. Not seeing the forest for the trees. Looking low when I should have looked high.
Missing the big picture, if you catch my drift. Thoreau said it best: "Our life is frittered away by detail. . . . Simplify, simplify." That quote sprang to mind when I received a letter from Anton Treuenfels of Fridley, Minnesota, thanking me for clarifying the principles of filling adjacent convex polygons, as discussed in this column in February and March. Anton then went on to describe his own method for filling convex polygons. Anton's approach had its virtues and drawbacks, foremost among the virtues being a simplicity Thoreau would have admired. For instance, in writing my polygon-filling code, I had spent quite some time trying to figure out the best way to identify which edge was the left edge and which the right, finally settling on comparing the slopes of the edges if the top of the polygon wasn't flat, and comparing the starting points of the edges if the top was flat. Anton simplified this tremendously by not bothering to figure out ahead of time which was the right edge of the polygon and which the left, instead scanning out the two edges in whatever order he found them and letting the low-level drawing code test, and if necessary swap, the end-points of each horizontal line of the fill, so that filling started at the leftmost edge. This is a little slower than my approach (although the difference is almost surely negligible), but it also makes quite a bit of code go away. What that example, and others like it in Anton's letter, did was kick my mind into a mode that it hadn't -- but should have -- been in when I wrote the code, a mode in which I began to wonder, "How else can I simplify this code?"; what you might call Occam's Razor mode. You see, I created the convex polygon-drawing code by first writing pseudocode, then writing C code, and finally writing assembly code, and once the pseudocode was finished, I stopped thinking about the interactions of the various portions of the program.
In other words, I became so absorbed in individual details that I forgot to consider the code as a whole. That was a mistake, and an embarrassing one for someone who constantly preaches that programmers should look at their code from a variety of perspectives. May my embarrassment be your enlightenment. The point is not whether, in the final analysis, my code or Anton's code is better; both have their advantages. The point is that I was programming with half a deck because I was so fixated on the details of a single sort of implementation; I ended up with relatively hard-to-write, complex code, and missed out on many potentially useful optimizations by being so focused. It's a big world out there, and there are many subtle approaches to any problem, so relax and keep the big picture in mind as you implement your programs. Your code will likely be not only better, but also simpler. And whenever you see me walking across hot coals in this column when there's an easier way to go, please, let me know! Thanks, Anton.

Mode X Continued

Last month, I introduced you to what I call mode X, an undocumented 320 X 240 256-color mode of the VGA. Mode X is distinguished from mode 13h, the documented 320 X 200 256-color VGA mode, in that it supports page flipping, makes off-screen memory available, has square pixels, and, above all, lets you use the VGA's hardware to increase performance by as much as four times (at the cost of more complex and demanding programming, to be sure -- but end users care about results, not how hard the code was to write, and mode X delivers results in a big way). Last month we saw how the VGA's plane-oriented hardware can be used to speed solid fills. That's a nice technique, but this month we're going to move up to the big guns -- the latches. The VGA has four latches, one for each plane of display memory.
Each latch stores exactly one byte, and that byte is always the last byte read from the corresponding plane of display memory, as shown in Figure 1. Furthermore, whenever a given address in display memory is read, all four planes' bytes at that address are read and stored in the corresponding latches, regardless of which plane supplied the byte returned to the CPU (as determined by the Read Map register). As with so much else about the VGA, the above will make little sense to VGA neophytes, but the important point is this: By reading one display memory byte, 4 bytes -- one from each plane -- can be loaded into the latches at once. Any or all of those 4 bytes can then be written anywhere in display memory with a single byte-sized write, as shown in Figure 2. The upshot is that the latches make it possible to copy data from one part of display memory to another 32 bits (four pixels) at a time -- four times as fast as normal. (Recall from last month that in mode X, pixels are stored one per byte, with four pixels in a row stored in successive planes at the same address, one pixel per plane.) However, any one latch can only be loaded from and written to the corresponding plane, so an individual latch can only work with every fourth pixel on the screen; the latch for plane 0 can work with pixels 0, 4, 8. . ., the latch for plane 1 with pixels 1, 5, 9. . ., and so on. The latches aren't intended for use in 256-color mode -- they were designed to allow individual bits of display memory to be modified in 16-color mode -- but they are nonetheless very useful in mode X, particularly for patterned fills and screen-to-screen copies, including scrolls. Patterned filling is a good place to start, because patterns are widely used in windowing environments for desktops, window backgrounds, and scroll bars, and for textures and color dithering in drawing and game software.
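Because the latch rules are easy to misremember, a toy software model may help. The C below is a sketch under stated assumptions, not hardware code and not one of this column's listings: the `ToyVGA` type and all function names are hypothetical, `read_map` and `map_mask` merely stand in for the Read Map and Map Mask registers, and `vga_write_latches` models a write with the Bit Mask register set to 0. Set/reset, data rotation, and the logical functions are deliberately ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of 256K of VGA display memory: 4 planes x 64K addresses.
   Pure simulation; nothing here touches real hardware. */
#define PLANE_SIZE 65536u
#define SCREEN_WIDTH_BYTES 80u   /* 320 mode X pixels / 4 planes */

typedef struct {
    uint8_t plane[4][PLANE_SIZE];
    uint8_t latch[4];
    uint8_t read_map;   /* stands in for the Read Map register (0-3) */
    uint8_t map_mask;   /* stands in for the Map Mask register (bits 0-3) */
} ToyVGA;

static ToyVGA vga;      /* static storage: 256K is too big for the stack */

/* Every read loads all four latches, whichever plane Read Map selects
   for the byte actually returned to the CPU. */
static uint8_t vga_read(ToyVGA *v, uint16_t addr)
{
    for (int p = 0; p < 4; p++)
        v->latch[p] = v->plane[p][addr];
    return v->latch[v->read_map & 3];
}

/* Models a write with Bit Mask = 0: the CPU's byte is ignored, and each
   plane enabled in Map Mask receives its own latch. */
static void vga_write_latches(ToyVGA *v, uint16_t addr)
{
    for (int p = 0; p < 4; p++)
        if (v->map_mask & (1u << p))
            v->plane[p][addr] = v->latch[p];
}

/* Mode X pixel placement: pixel x lives in plane x & 3 at address
   y * 80 + x / 4 (written as if Map Mask selected just that plane). */
static void vga_put_pixel(ToyVGA *v, int x, int y, uint8_t color)
{
    assert(x >= 0 && x < 320 && y >= 0 && y < 240);
    v->plane[x & 3][(unsigned)y * SCREEN_WIDTH_BYTES + (unsigned)(x >> 2)] = color;
}
```

In the model, one `vga_read` of an address fills all four latches, and each later `vga_write_latches` deposits those four pixels at a new address with a single write, which is where the fourfold speedup comes from.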
Fast mode X fills with patterns that are four pixels in width can be performed by drawing the pattern once to the four pixels at any one address in display memory, reading that address to load the pattern into the latches, setting the Bit Mask register to 0 to specify that all bits drawn to display memory should come from the latches, and then performing the fill pretty much as we did last month, except that each line of the pattern must be loaded into the latches before the corresponding scan line on the screen is filled. Listings One and Two (page 181) together demonstrate a variety of fast mode X four-by-four pattern fills. (The mode set function called by Listing One is from last month's column.) Four-pixel-wide patterns are more useful than you might imagine. There are actually 2^128 possible patterns (16 pixels, each with 2^8 possible colors); that set is certainly large enough for most color-dithering purposes, and includes many often-used patterns, such as halftones, diagonal stripes, and crosshatches. Furthermore, eight-wide patterns, which are widely used, can be drawn with two passes, one for each half of the pattern; this principle can in fact be extended to patterns of arbitrary multiple-of-four widths. (Widths that aren't multiples of four are considerably more difficult to handle, because the latches are four pixels wide.)

Allocating Memory in Mode X

Listing Two raises some interesting questions about the allocation of display memory in mode X. In Listing Two, whenever a pattern is to be drawn, that pattern is first drawn in its entirety at the very end of display memory; the latches are then loaded from that copy of the pattern before each scan line of the actual fill is drawn. Why this double copying process, and why is the pattern stored in that particular area of display memory? The double copying process is used because it's the easiest way to load the latches.
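The fill sequence described above can be sketched in C against a simulated display memory. This is not Listing Two: the `plane` and `latch` arrays, the function names, and the half-open rectangle convention are assumptions for illustration. `write_latches` models a write with the Bit Mask register at 0, and its `map_mask` argument clips the partial addresses at the left and right edges of the fill. The pattern is drawn once to the last four addresses of display memory, and the matching pattern row is loaded into the latches before each scan line.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated mode X display memory (4 planes x 64K); no hardware access. */
#define PLANE_SIZE 65536u
#define SCREEN_WIDTH_BYTES 80u    /* 320 pixels / 4 planes */
#define PATTERN_BUFFER 0xFFFCu    /* last four addresses = last 16 pixels */

static uint8_t plane[4][PLANE_SIZE];
static uint8_t latch[4];

static void load_latches(unsigned addr)             /* models a memory read */
{
    for (int p = 0; p < 4; p++)
        latch[p] = plane[p][addr];
}

static void write_latches(unsigned addr, uint8_t map_mask) /* Bit Mask = 0 */
{
    for (int p = 0; p < 4; p++)
        if (map_mask & (1u << p))
            plane[p][addr] = latch[p];
}

static uint8_t get_pixel(int x, int y)
{
    return plane[x & 3][(unsigned)y * SCREEN_WIDTH_BYTES + (unsigned)(x >> 2)];
}

/* Fill [x0,x1) x [y0,y1) with a 4x4 pattern; pattern[r][c] is the color
   of every screen pixel with y % 4 == r and x % 4 == c. */
static void pattern_fill(int x0, int y0, int x1, int y1,
                         const uint8_t pattern[4][4])
{
    assert(0 <= x0 && x0 < x1 && x1 <= 320 && 0 <= y0 && y0 < y1 && y1 <= 240);

    /* Step 1: draw the pattern once off screen; row r goes to address
       PATTERN_BUFFER + r, column c to plane c. */
    for (int r = 0; r < 4; r++)
        for (int c = 0; c < 4; c++)
            plane[c][PATTERN_BUFFER + (unsigned)r] = pattern[r][c];

    int first = x0 >> 2, last = (x1 - 1) >> 2;
    for (int y = y0; y < y1; y++) {
        /* Step 2: load this scan line's pattern row into the latches. */
        load_latches(PATTERN_BUFFER + (unsigned)(y & 3));
        unsigned row = (unsigned)y * SCREEN_WIDTH_BYTES;
        /* Step 3: one write per address fills four pixels; clip edges. */
        for (int a = first; a <= last; a++) {
            uint8_t mask = 0x0F;
            if (a == first) mask &= (uint8_t)(0x0F << (x0 & 3));
            if (a == last)  mask &= (uint8_t)(0x0F >> (3 - ((x1 - 1) & 3)));
            write_latches(row + (unsigned)a, mask);
        }
    }
}
```

Each interior address costs a single write for four pixels; only the two edge addresses need a narrower Map Mask.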
Remember, there's no way to get information directly from the CPU to the latches; the information must first be written to some location in display memory, because the latches can be loaded only from display memory. By writing the pattern to off-screen memory, we don't have to worry about interfering with whatever is currently displayed on the screen. As for why the pattern is stored exactly where it is, that's part of a master memory allocation plan that will come to fruition next month when I implement a mode X animation program. Figure 3 shows this master plan; the first two pages of memory (each 76,800 pixels long, spanning 19,200 addresses -- that is, 19,200 pixel quadruplets -- in display memory) are reserved for page flipping, the next page of memory (also 76,800 pixels long) is reserved for storing the background (this is used to restore the holes left after images move), the last 16 pixels (four addresses) of display memory are reserved for the pattern buffer, and the remaining 31,728 pixels (7932 addresses) of display memory are free for storage of icons, images, temporary buffers, or whatever. This is an efficient organization for animation, but there are certainly many other possible setups. For example, you might choose to have a solidly colored background, in which case you could dispense with the background page (instead using the solid rectangle fill routine to replace the background after images move), freeing up another 76,800 pixels of off-screen storage for images and buffers. You could even eliminate page-flipping altogether if you needed to free up a great deal of display memory. For example, with enough free display memory it is possible in mode X to create a virtual bitmap three times larger than the screen, with the screen becoming a scrolling window onto that larger bitmap. This technique has been used to good effect in a number of games, although I don't know if any of those games use mode X.
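The arithmetic behind this layout is easy to check. Here's a small C sketch, assuming a 256K VGA (64K addresses, four pixels per address); the `Layout` struct and `make_layout` are hypothetical names, and passing 0 models the solid-background variant that drops the background page.

```c
#include <assert.h>
#include <limits.h>

/* Assumes 256K of display memory: 64K addresses, four pixels apiece. */
#define TOTAL_ADDRESSES   65536u
#define PAGE_ADDRESSES    (320u * 240u / 4u)   /* 76,800 pixels -> 19,200 */
#define PATTERN_ADDRESSES 4u                   /* 16 pixels */

typedef struct {
    unsigned page0, page1;   /* the two page-flipping pages */
    unsigned background;     /* background page start, UINT_MAX if none */
    unsigned free_start;     /* first free off-screen address */
    unsigned free_len;       /* free addresses (multiply by 4 for pixels) */
    unsigned pattern;        /* pattern buffer at the very end of memory */
} Layout;

static Layout make_layout(int with_background)
{
    Layout l;
    l.page0 = 0u;
    l.page1 = PAGE_ADDRESSES;
    l.background = with_background ? 2u * PAGE_ADDRESSES : UINT_MAX;
    l.free_start = (with_background ? 3u : 2u) * PAGE_ADDRESSES;
    l.pattern = TOTAL_ADDRESSES - PATTERN_ADDRESSES;
    l.free_len = l.pattern - l.free_start;
    /* Every address is accounted for exactly once. */
    assert(l.free_start + l.free_len + PATTERN_ADDRESSES == TOTAL_ADDRESSES);
    return l;
}
```

With the background page, 7,932 free addresses (31,728 pixels) remain; without it, the free area grows to 27,132 addresses (108,528 pixels).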
Copying Pixel Blocks Within Display Memory

Another fine use for the latches is copying pixels from one place in display memory to another. Whenever both the source and the destination share the same nibble alignment (that is, their starting pixel offsets modulo four are the same, so every pixel sits in the same plane at both ends of the copy), it is not only possible but quite easy to use the latches to perform the copy four pixels at a time. Listing Three (page 182) shows a routine that copies via the latches. (When the source and destination do not share the same nibble alignment, the latches cannot be used, because the source and destination planes for any given pixel differ; in that case, you can set the Read Map register to select a source plane and the Map Mask register to select the corresponding destination plane, then copy all pixels in that plane; repeat for all four planes.) Listing Three has an important limitation: It does not guarantee proper handling when the source and destination overlap, as in the case of a downward scroll, for example. Listing Three performs top-to-bottom, left-to-right copying. Downward scrolls require bottom-to-top copying; likewise, rightward horizontal scrolls require right-to-left copying. As it happens, my intended use for Listing Three is to copy images between off-screen memory and on-screen memory, and to save areas under pop-up menus and the like, so I don't really need overlap handling -- and I do really need to keep the size of this column down. However, you will surely want to add overlap handling if you plan to perform arbitrary scrolling and copying in display memory. Now that we have a fast way to copy images around in display memory, we can draw icons and other images between two and four times faster than in mode 13h, depending on the speed of the VGA's display memory.
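Here's a C sketch of a latch-based block copy against simulated display memory; it is not Listing Three, and all names are hypothetical. It adds one thing Listing Three leaves out: it borrows the C library's `memmove` rule for overlap, walking bottom-to-top and right-to-left whenever the destination starts later in display memory than the source, which covers the downward and rightward scrolls mentioned above. Copies with mismatched nibble alignment are simply rejected here rather than handled plane by plane.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simulated mode X display memory (4 planes x 64K); no hardware access. */
#define PLANE_SIZE 65536u
#define SCREEN_WIDTH_BYTES 80

static uint8_t plane[4][PLANE_SIZE];
static uint8_t latch[4];

static void load_latches(long addr)                  /* models a read */
{
    for (int p = 0; p < 4; p++)
        latch[p] = plane[p][addr];
}

static void write_latches(long addr, uint8_t map_mask)  /* Bit Mask = 0 */
{
    for (int p = 0; p < 4; p++)
        if (map_mask & (1u << p))
            plane[p][addr] = latch[p];
}

static void put_pixel(int x, int y, uint8_t c)
{
    plane[x & 3][y * SCREEN_WIDTH_BYTES + (x >> 2)] = c;
}

static uint8_t get_pixel(int x, int y)
{
    return plane[x & 3][y * SCREEN_WIDTH_BYTES + (x >> 2)];
}

/* Copy a w x h pixel block from (sx,sy) to (dx,dy), four pixels per write.
   Returns false if the nibble alignments differ. */
static bool latch_copy(int sx, int sy, int dx, int dy, int w, int h)
{
    if ((sx & 3) != (dx & 3))
        return false;                       /* pixels would change planes */
    int cols = ((sx + w - 1) >> 2) - (sx >> 2) + 1;  /* addresses per line */
    uint8_t left  = (uint8_t)(0x0F & (0x0F << (sx & 3)));
    uint8_t right = (uint8_t)(0x0F >> (3 - ((sx + w - 1) & 3)));
    long src = (long)sy * SCREEN_WIDTH_BYTES + (sx >> 2);
    long dst = (long)dy * SCREEN_WIDTH_BYTES + (dx >> 2);
    /* memmove-style: if the destination is later in memory, go backward
       so no source byte is overwritten before it has been read. */
    bool backward = dst > src;
    for (int i = 0; i < h; i++) {
        int row = backward ? h - 1 - i : i;
        for (int j = 0; j < cols; j++) {
            int col = backward ? cols - 1 - j : j;
            uint8_t mask = 0x0F;
            if (col == 0) mask &= left;            /* clip left edge */
            if (col == cols - 1) mask &= right;    /* clip right edge */
            long off = (long)row * SCREEN_WIDTH_BYTES + col;
            load_latches(src + off);
            write_latches(dst + off, mask);
        }
    }
    return true;
}
```

Because whole addresses are moved, the alignment check is what keeps each pixel in its own plane; the edge masks keep pixels just outside the block untouched.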
(In case you're worried about the nibble-alignment limitation on fast copies, don't be; I'll address that fully next time, but the secret is to store all four possible rotations in off-screen memory, then select the correct one for each copy.) However, before our fast display-memory-to-display-memory copy routine can do us any good, we must have a way to get pixel patterns from system memory into display memory, so that they can be copied with the fast copy routine.

Copying to Display Memory

The final piece of the puzzle is the system-memory-to-display-memory copy routine shown in Listing Four (page 182). This routine assumes that pixels are stored in system memory in exactly the order in which they will ultimately appear on the screen; that is, in the same linear order that mode 13h uses. It would be more efficient to store all the pixels for one plane first, then all the pixels for the next plane, and so on for all four planes, because many OUTs could be avoided, but that would make images rather hard to create. And, while it is true that the speed of drawing images is often a critical performance factor, the speed of copying images from system memory to display memory is not particularly critical in mode X. Important images can be stored in off-screen memory and copied to the screen via the latches much faster than even the speediest system-memory-to-display-memory copy routine could manage. I'm not going to present a routine to perform mode X copies from display memory to system memory, but such a routine would be a straightforward inverse of Listing Four.

Coming Up: Our Hero Risks Life, Limb, and Word Count in a Thrilling Conclusion

Next month, I'll take all the mode X tools we've developed, together with one more tool -- masked image copying -- and the remaining unexplored feature of mode X, page flipping, and build an animation application. I hope that when I'm done, you'll agree with me that mode X is the way to animate on the PC.
I also hope that I can fit everything into one column; there are always so many interesting things to say that I have trouble keeping the size of these columns down, and mode X animation covers even more fertile ground than usual. But, hey -- you've already heard about my programming demons; I'll spare you the writing demons. Besides, as I'm fond of saying, end users care about results, not how you produced them. For my writing, you folks are the end users -- and notice how remarkably little you care about how this magazine gets written and produced. You care that it shows up in your mailbox every month, and you don't care about how it got there. When you're a creator, the process matters. When you're a buyer, results are everything. All-important. Sine qua non. The whole enchilada. If you catch my drift.

Late Flash!

The Mode X mode set code in my July '91 column (Listing One, page 154) has a small -- but critical -- bug. On line 46, the value loaded into AL should be 0E3h, not 0E7h. Without this correction, the screen will roll on fixed-frequency (IBM 851X-style) monitors.
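For the curious, here's why a single bit matters so much. The C sketch below decodes the two values as Miscellaneous Output register settings, assuming the standard VGA bit assignments; the struct and function names are made up for illustration. 0E3h and 0E7h differ only in bit 2, the low bit of the clock-select field, so 0E7h picks the 28.322 MHz dot clock instead of the 25.175 MHz clock that the 640-pixel-wide timings expect, which presumably pushes the scan rate outside what a fixed-frequency monitor can lock onto.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of the VGA Miscellaneous Output register (written at 3C2h),
   assuming the standard bit layout; bit 4 is unused. */
typedef struct {
    unsigned io_address_color; /* bit 0: 1 = CRT controller at 3Dxh */
    unsigned ram_enable;       /* bit 1: 1 = display memory enabled */
    unsigned clock_select;     /* bits 2-3: 0 = 25.175 MHz, 1 = 28.322 MHz */
    unsigned odd_even_page;    /* bit 5: odd/even page select */
    unsigned hsync_negative;   /* bit 6: 1 = negative horizontal sync */
    unsigned vsync_negative;   /* bit 7: 1 = negative vertical sync */
} MiscOutput;

static MiscOutput decode_misc_output(uint8_t value)
{
    MiscOutput m;
    m.io_address_color = value & 1u;
    m.ram_enable       = (value >> 1) & 1u;
    m.clock_select     = (value >> 2) & 3u;
    m.odd_even_page    = (value >> 5) & 1u;
    m.hsync_negative   = (value >> 6) & 1u;
    m.vsync_negative   = (value >> 7) & 1u;
    return m;
}
```

Decoding both values shows identical sync polarities, I/O addressing, and memory enable; only `clock_select` changes, from 0 to 1.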