Journal: Dr. Dobb's Journal July 1991 v16 n7 p133(7)
-----------------------------------------------------------------------------
Title: Mode X: 256-color VGA magic. (Graphics Programming)
Author: Abrash, Michael.
AttFile: Program: GP-JUL91.ASC Source code listing.

Summary: VGA's 320 x 240 256-color mode is most likely the single best mode of VGA, especially for animation. Features that make this mode so special include its 1:1 aspect ratio, which results in equal pixel spacing vertically and horizontally. Square pixels create the most attractive displays. In addition, mode X allows page flipping, which helps create smooth animation. Mode X pixels are processed in parallel, improving performance up to four times. However, the use of mode X is not widespread, since it is entirely undocumented. Only a very experienced VGA programmer would know that such a mode exists. The author provides mode set code, delineates the bitmap organization, and demonstrates how the basic write pixel and read pixel operations work.
-----------------------------------------------------------------------------
Descriptors..
Topic: Programming Instruction VGA Standard Pixels Color Animation Performance Improvement.
Feature: illustration chart.
Caption: The organization of display memory in mode X. (chart) The Map Mask register selects which planes are written to in planar modes. (chart)
-----------------------------------------------------------------------------
Full Text:

There's a well-known Latin saying, in complexitate est opportunitas ("in complexity there is opportunity"), that must have been invented with the VGA in mind. Well, actually, it's not exactly well-known (I just thought of it this afternoon), but it should be. As evidence, witness the strange case of the VGA's 320 x 240 256-color mode, which is undeniably complex to program and isn't even documented by IBM -- but which is, nonetheless, perhaps the single best mode the VGA has to offer, especially for animation.
What Makes 320 x 240 Special?

Five features set the 320 x 240 256-color mode (which I'll call "mode X," befitting its mystery status in IBM's documentation) apart from other VGA modes.

First, it has a 1:1 aspect ratio, resulting in equal pixel spacing horizontally and vertically (square pixels). Square pixels make for the most attractive displays, and avoid considerable programming effort that would otherwise be necessary to adjust graphics primitives and images to match the screen's pixel spacing. (For example, with square pixels, a circle can be drawn as a circle; otherwise, it must be drawn as an ellipse that corrects for the aspect ratio -- a slower, more complicated process.) In contrast, mode 13h, the only documented 256-color mode, provides a nonsquare 320 x 200 resolution.

Second, mode X allows page flipping, a prerequisite for the smoothest possible animation. Mode 13h does not allow page flipping, nor does mode 12h, the VGA's high-resolution 640 x 480 16-color mode.

Third, mode X allows the VGA's plane-oriented hardware to be used to process pixels in parallel, improving performance by up to four times over mode 13h.

Fourth, like mode 13h but unlike all other VGA modes, mode X is a byte-per-pixel mode (each pixel is controlled by one byte in display memory), eliminating the slow read-before-write and bit-masking operations often required in 16-color modes. In addition to cutting the number of memory accesses in half, this is important because the memory caching schemes used by many VGA clones speed up writes more than reads.

Fifth, unlike mode 13h, mode X has plenty of offscreen memory free for image storage. This is particularly effective in conjunction with the use of the VGA's latches; together, the latches and the offscreen memory allow images to be copied to the screen four pixels at a time.

There's a sixth feature of mode X that's not so terrific: It's hard to program efficiently.
If you've ever programmed a VGA 16-color mode directly, you know that VGA programming can be demanding; mode X is often as demanding as 16-color programming, and operates by a set of rules that turns everything you've learned in 16-color mode sideways. Programming mode X is nothing like programming the nice, flat bitmap of mode 13h, or, for that matter, the flat, linear (albeit banked) bitmap used by 256-color SuperVGA modes. (I'd like to emphasize that mode X works on all VGAs, not just SuperVGAs.) Many programmers I talk to love the flat bitmap model, and think that it's the ideal organization for display memory because it's so straightforward to program. Remember the saying I started this column with, though; the complexity of mode X truly is opportunity -- opportunity for the best combination of performance and appearance the VGA has to offer. If you do 256-color programming, especially if you use animation, you're missing the boat if you're not using mode X.

Although some developers have taken advantage of mode X, its use is certainly not widespread, being entirely undocumented; only an experienced VGA programmer would have the slightest inkling that it exists, and figuring out how to make it perform beyond the write pixel/read pixel level is no mean feat. I've never seen anything in print about it, and, in fact, the only articles I've seen about any of the undocumented 256-color modes were my own articles about the 320 x 200, 320 x 400, and 360 x 480 256-color modes in Programmer's Journal (January and September, 1989). (However, John Bridges has put code for a number of undocumented 256-color resolutions into the public domain, and I'd like to acknowledge the influence of his code on the mode set routine presented in this article.)
Given the tremendous advantages of 320 x 240 over the documented mode 13h, I'd very much like to get it into the hands of as many developers as possible, so I'm going to spend the next few columns exploring this odd but worthy mode. I'll provide mode set code, delineate the bitmap organization, and show how the basic write pixel and read pixel operations work. Then I'll move on to the magic stuff: rectangle fills, screen clears, scrolls, image copies, pixel inversion, and, yes, polygon fills (just a different driver), all blurry fast; hardware raster ops; and page flipping. In the end, I'll build a working animation program that shows many of the features of mode X in action. The mode set code is the logical place to begin.

Selecting 320 x 240 256-Color Mode

We could, if we wished, write our own mode set code for mode X from scratch -- but why bother? Instead, we'll let the BIOS do most of the work by having it set up mode 13h, which we'll then turn into mode X by changing a few registers. Listing One (page 154) does exactly that.

After setting up mode 13h, Listing One alters the vertical counts and timings to select 480 visible scan lines. (There's no need to alter any horizontal values, because mode 13h and mode X both have 320-pixel horizontal resolutions.) The Maximum Scan Line register is programmed to double scan each line (that is, repeat each scan line twice), however, so we get an effective vertical resolution of 240 scan lines. It is, in fact, possible to get 400 or 480 independent scan lines in 256-color mode (see the aforementioned articles for details); however, 400-scan-line modes lack square pixels and can't support simultaneous offscreen memory and page flipping, and 480-scan-line modes lack page flipping altogether, due to memory constraints.

At the same time, Listing One programs the VGA's bitmap to a planar organization that is similar to that used by the 16-color modes, and utterly different from the linear bitmap of mode 13h.
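The memory constraints mentioned above for the 400- and 480-scan-line modes can be checked with a little arithmetic. What follows is a minimal C sketch, not one of the column's listings; it assumes the standard 256 KB of VGA display memory and one byte per pixel, and the function names are invented for illustration.

```c
#include <assert.h>

/* Assumption: a standard VGA with 256 KB of display memory in total. */
#define VGA_MEMORY_BYTES (256L * 1024L)

/* Bytes needed for one page at one byte per pixel. */
long page_bytes(long width, long height)
{
    assert(width > 0 && height > 0);
    return width * height;
}

/* Number of complete pages that fit in display memory. */
long pages_that_fit(long width, long height)
{
    return VGA_MEMORY_BYTES / page_bytes(width, height);
}

/* Bytes left over for offscreen storage after reserving `pages` pages. */
long spare_bytes(long width, long height, long pages)
{
    assert(pages >= 0 && pages <= pages_that_fit(width, height));
    return VGA_MEMORY_BYTES - pages * page_bytes(width, height);
}
```

Under those assumptions, 320 x 240 needs 76,800 bytes per page, so two pages for flipping still leave 108,544 bytes offscreen. At 320 x 400, two pages consume 256,000 bytes and leave only 6,144, and at 320 x 480 a single 153,600-byte page is all that fits.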
The bizarre bitmap organization of mode X is shown in Figure 1. The first pixel (the pixel at the upper left corner of the screen) is controlled by the byte at offset 0 in plane 0. (The one thing that mode X blessedly has in common with mode 13h is that each pixel is controlled by a single byte, eliminating the need to mask out individual bits of display memory.) The second pixel, immediately to the right of the first pixel, is controlled by the byte at offset 0 in plane 1. The third pixel comes from offset 0 in plane 2, and the fourth pixel from offset 0 in plane 3. Then the fifth pixel is controlled by the byte at offset 1 in plane 0, and that cycle continues, with each group of four pixels spread across the four planes at the same address. The offset M of pixel N in display memory is M = N/4 (integer division, discarding the remainder), and the plane P of pixel N is P = N mod 4. For display memory writes, the plane is selected by setting bit P of the Map Mask register (Sequence Controller register 2) to 1 and all other bits to 0; for display memory reads, the plane is selected by setting the Read Map register (Graphics Controller register 4) to P.

It goes without saying that this is one ugly bitmap organization, requiring a lot of overhead to manipulate a single pixel. The write pixel code shown in Listing Two (page 154) must determine the appropriate plane and perform a 16-bit OUT to select that plane for each pixel written, and likewise for the read pixel code shown in Listing Three (page 154). Calculating and mapping in a plane once for each pixel written is scarcely a recipe for performance.

That's all right, though, because most graphics software spends little time drawing individual pixels. I've provided the write and read pixel routines as basic primitives, and so you'll understand how the bitmap is organized, but the building blocks of high-performance graphics software are fills, copies, and bitblts, and it's here that mode X shines.
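The addressing rule above (M = N/4, P = N mod 4) can be sketched in C. This is not Listing Two or Three, which are assembly routines that program the real registers; it is a simulation that assumes a 320-pixel-wide page, stands in an ordinary array for display memory, and uses invented names throughout. For pixel (x, y), N = y * 320 + x, so the offset works out to y * 80 + x/4 and the plane to x mod 4.

```c
#include <assert.h>

#define SCREEN_WIDTH   320
#define SCREEN_HEIGHT  240
#define BYTES_PER_ROW  (SCREEN_WIDTH / 4)   /* 80 addresses per scan line */
#define PLANE_SIZE     (BYTES_PER_ROW * SCREEN_HEIGHT)

/* Hypothetical stand-in for display memory: four planes, one page each. */
static unsigned char planes[4][PLANE_SIZE];

/* Offset within a plane of the byte controlling pixel (x, y). */
unsigned modex_offset(unsigned x, unsigned y)
{
    return y * BYTES_PER_ROW + x / 4;
}

/* Plane holding pixel x; this is also the Read Map value for reads. */
unsigned modex_plane(unsigned x)
{
    return x & 3;
}

/* Map Mask value selecting only that plane for writes (bit P set). */
unsigned char modex_map_mask(unsigned x)
{
    return (unsigned char)(1u << (x & 3));
}

/* Simulated write pixel: one plane selection per pixel. */
void write_pixel(unsigned x, unsigned y, unsigned char color)
{
    assert(x < SCREEN_WIDTH && y < SCREEN_HEIGHT);
    planes[modex_plane(x)][modex_offset(x, y)] = color;
}

/* Simulated read pixel. */
unsigned char read_pixel(unsigned x, unsigned y)
{
    assert(x < SCREEN_WIDTH && y < SCREEN_HEIGHT);
    return planes[modex_plane(x)][modex_offset(x, y)];
}
```

On real hardware, the array index by plane would be replaced by an OUT to the Map Mask (for writes) or the Read Map (for reads) before each memory access, which is exactly the per-pixel overhead described above.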
Designing From a Mode X Perspective

Listing Four (page 154) shows mode X rectangle fill code. The plane is selected for each pixel in turn, with drawing cycling from plane 0 to plane 3 then wrapping back to plane 0. This is the sort of code that stems from a write-pixel line of thinking; it reflects not a whit of the unique perspective that mode X demands, and although it looks reasonably efficient, it is in fact some of the slowest graphics code you will ever see. I've provided Listing Four partly for illustrative purposes, but mostly so we'll have a point of reference for the substantial speed-up that's possible with code that's designed from a mode X perspective.

The two major weaknesses of Listing Four both result from selecting the plane on a pixel-by-pixel basis. First, endless OUTs (which are particularly slow on 386s and 486s, often much slower than accesses to display memory) must be performed, and, second, REP STOS can't be used. Listing Five (page 156) overcomes both these problems by tailoring the fill technique to the organization of display memory. Each plane is filled in its entirety in one burst before the next plane is processed, so only five OUTs are required in all, and REP STOS can indeed be used. (I've used REP STOSB in Listings Five and Six (page 156). REP STOSW could be used and would improve performance on some 16-bit VGAs; however, REP STOSW requires extra overhead to set up, so it can be slower for small rectangles, especially on 8-bit VGAs.) Doing an entire plane at a time can produce a "fading-in" effect for large images, because all columns for one plane are drawn before any columns for the next; if this is a problem, the four planes can be cycled through once for each scan line, rather than once for the entire rectangle. Listing Five is 2.5 times faster than Listing Four at clearing the screen on a 20-MHz cached 386 with a Paradise VGA.
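The difference between the two fill strategies can be sketched in C. This is a simulation rather than a transcription of Listings Four and Five: display memory is an ordinary array, memset stands in for REP STOSB, a counter stands in for the plane-selecting OUTs, and all names are invented for illustration. Rectangles are given as x1, y1 (inclusive) to x2, y2 (exclusive).

```c
#include <assert.h>
#include <string.h>

#define SCREEN_WIDTH   320
#define SCREEN_HEIGHT  240
#define BYTES_PER_ROW  (SCREEN_WIDTH / 4)
#define PLANE_SIZE     (BYTES_PER_ROW * SCREEN_HEIGHT)

static unsigned char planes[4][PLANE_SIZE];  /* simulated display memory */
static unsigned long plane_selects;          /* stand-in for Map Mask OUTs */

static void check_rect(int x1, int y1, int x2, int y2)
{
    assert(0 <= x1 && x1 <= x2 && x2 <= SCREEN_WIDTH);
    assert(0 <= y1 && y1 <= y2 && y2 <= SCREEN_HEIGHT);
}

/* Write-pixel thinking: select a plane for every single pixel. */
void fill_by_pixel(int x1, int y1, int x2, int y2, unsigned char color)
{
    check_rect(x1, y1, x2, y2);
    for (int y = y1; y < y2; y++)
        for (int x = x1; x < x2; x++) {
            plane_selects++;
            planes[x & 3][y * BYTES_PER_ROW + x / 4] = color;
        }
}

/* Mode X thinking: select each plane once, then fill its bytes in runs. */
void fill_by_plane(int x1, int y1, int x2, int y2, unsigned char color)
{
    check_rect(x1, y1, x2, y2);
    for (int p = 0; p < 4; p++) {
        /* First column at or after x1 that lives in plane p. */
        int first = x1 + ((p + 4 - (x1 & 3)) & 3);
        if (first >= x2)
            continue;                        /* plane not touched at all */
        int count = (x2 - 1 - first) / 4 + 1; /* bytes per scan line */
        plane_selects++;
        for (int y = y1; y < y2; y++)
            memset(&planes[p][y * BYTES_PER_ROW + first / 4], color,
                   (size_t)count);
    }
}

unsigned char get_pixel(int x, int y)
{
    return planes[x & 3][y * BYTES_PER_ROW + x / 4];
}

/* Number of bytes across all planes currently holding `color`. */
long count_color(unsigned char color)
{
    long n = 0;
    for (int p = 0; p < 4; p++)
        for (int i = 0; i < PLANE_SIZE; i++)
            n += (planes[p][i] == color);
    return n;
}
```

For a full-screen clear, the pixel-at-a-time version makes 76,800 plane selections in this model, while the plane-at-a-time version makes four. To avoid the fading-in effect, the plane loop and the scan-line loop could be swapped, at the cost of four selections per scan line.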
Although Listing Five is slightly slower than an equivalent mode 13h fill routine would be, it's not grievously so. In general, performing plane-at-a-time operations can make almost any mode X operation, at the worst, nearly as fast as the same operation in mode 13h (although this sort of mode X programming is admittedly fairly complex). In this pursuit, it can help to organize data structures with mode X in mind. For example, icons could be prearranged in system memory with the pixels organized into four plane-oriented sets (or, again, in four sets per scan line to avoid a fading-in effect) to facilitate copying to the screen a plane at a time with REP MOVS.

Hardware Assist from an Unexpected Quarter

Listing Five illustrates the benefits of designing code from a mode X perspective; this is the software aspect of mode X optimization, which suffices to make mode X about as fast as mode 13h. That alone makes mode X an attractive mode, given its square pixels, page flipping, and offscreen memory, but superior performance would nonetheless be a pleasant addition to that list. Superior performance is indeed possible in mode X, although, oddly enough, it comes courtesy of the VGA's hardware, which was never designed to be used in 256-color modes.

All of the VGA's hardware assist features are available in mode X, although some are not particularly useful. The VGA hardware feature that's truly the key to mode X performance is the ability to process four planes' worth of data in parallel; this includes both the latches and the capability to fan data out to any or all planes. For rectangular fills, we'll just need to fan the data out to various planes, so I'll defer a discussion of other hardware features until another column. (By the way, the ALUs, bit mask, and most other VGA hardware features are also available in mode 13h -- but parallel data processing is not.)
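Returning briefly to the suggestion that icons be prearranged in plane-oriented sets: a minimal C sketch of one such arrangement follows. The layout chosen here (all of plane 0's bytes for every scan line, then plane 1's, and so on) is only one reading of that suggestion, the function name is invented, and the image is assumed to start on a multiple-of-four column with a width that is a multiple of four.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Rearrange a linear, byte-per-pixel image into four plane-oriented sets.
 * dst must hold width * height bytes and receives plane 0's pixels for
 * all scan lines, then plane 1's, plane 2's, and plane 3's.
 * Assumption: width is a multiple of 4 and the image will be drawn
 * starting at a column that is a multiple of 4.
 */
void linear_to_planar(const unsigned char *src, int width, int height,
                      unsigned char *dst)
{
    assert(src != NULL && dst != NULL);
    assert(width > 0 && height > 0 && width % 4 == 0);

    int bytes_per_row = width / 4;   /* bytes each plane gets per line */
    size_t out = 0;

    for (int plane = 0; plane < 4; plane++)
        for (int y = 0; y < height; y++)
            for (int i = 0; i < bytes_per_row; i++)
                dst[out++] = src[(size_t)y * width + (size_t)i * 4 + plane];
}
```

With the data in this order, each plane's set is made of contiguous runs of bytes_per_row bytes per scan line, which is the shape a REP MOVS copy wants once that plane has been selected. Swapping the two outer loops would give the four-sets-per-scan-line variant.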
In planar modes, such as mode X, a byte written by the CPU to display memory may actually go to anywhere between zero and four planes, as shown in Figure 2. Each plane for which the setting of the corresponding bit in the Map Mask register is 1 receives the CPU data, and each plane for which the corresponding bit is 0 is not modified.

In 16-color modes, each plane contains one-quarter of each of eight pixels, with the 4 bits of each pixel spanning all four planes. Not so in mode X. Look at Figure 1 again; each plane contains one pixel in its entirety, with four pixels at any given address, one per plane. Still, the Map Mask register does the same job in mode X as in 16-color modes; set it to 0Fh (all 1-bits), and all four planes will be written to by each CPU access. Thus, it would seem that up to four pixels could be set by a single mode X byte-sized write to display memory, potentially speeding up operations like rectangle fills by four times.

And, as it turns out, four-plane parallelism works quite nicely indeed. Listing Six is yet another rectangle-fill routine, this time using the Map Mask to set up to four pixels per STOS. The only trick to Listing Six is that any left or right edge that isn't aligned to a multiple-of-four pixel column (that is, a column at which one four-pixel set ends and the next begins) must be clipped via the Map Mask register, because not all pixels at the address containing the edge are modified. Performance is as expected; Listing Six is nearly ten times faster at clearing the screen than Listing Four and just about four times faster than Listing Five -- and also about four times faster than the same rectangle fill in mode 13h. Understanding the bitmap organization and display hardware of mode X does indeed pay.

Just so you can see mode X in action, Listing Seven (page 158) is a sample program that selects mode X and draws a number of rectangles. Listing Seven links to any of the rectangle fill routines I've presented.
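The edge clipping described above can be sketched in C. This is a simulation, not Listing Six: a helper function imitates a CPU write fanned out by the Map Mask, a counter tallies those writes, and the two clip tables are derived here from the bitmap layout rather than copied from the listing. All names are invented, and rectangles run from x1, y1 (inclusive) to x2, y2 (exclusive).

```c
#include <assert.h>

#define SCREEN_WIDTH   320
#define SCREEN_HEIGHT  240
#define BYTES_PER_ROW  (SCREEN_WIDTH / 4)
#define PLANE_SIZE     (BYTES_PER_ROW * SCREEN_HEIGHT)

static unsigned char planes[4][PLANE_SIZE];  /* simulated display memory */
static unsigned long cpu_writes;             /* simulated byte writes */

/* Planes to write at the left edge, indexed by x1 & 3. */
static const unsigned char left_clip[4]  = { 0x0F, 0x0E, 0x0C, 0x08 };
/* Planes to write at the right edge, indexed by x2 & 3 (x2 exclusive). */
static const unsigned char right_clip[4] = { 0x0F, 0x01, 0x03, 0x07 };

/* One CPU write: the byte lands in every plane whose mask bit is 1. */
static void masked_write(int offset, unsigned char map_mask,
                         unsigned char color)
{
    cpu_writes++;
    for (int p = 0; p < 4; p++)
        if (map_mask & (1u << p))
            planes[p][offset] = color;
}

/* Fill a rectangle up to four pixels per write, clipping ragged edges. */
void fill_four_wide(int x1, int y1, int x2, int y2, unsigned char color)
{
    assert(0 <= x1 && x1 <= x2 && x2 <= SCREEN_WIDTH);
    assert(0 <= y1 && y1 <= y2 && y2 <= SCREEN_HEIGHT);
    if (x1 == x2 || y1 == y2)
        return;

    int first = x1 / 4;                  /* address holding the left edge */
    int last  = (x2 - 1) / 4;            /* address holding the right edge */
    unsigned char left  = left_clip[x1 & 3];
    unsigned char right = right_clip[x2 & 3];
    if (first == last)
        left &= right;                   /* both edges share one address */

    for (int y = y1; y < y2; y++) {
        int row = y * BYTES_PER_ROW;
        masked_write(row + first, left, color);
        for (int a = first + 1; a < last; a++)
            masked_write(row + a, 0x0F, color);   /* all four planes */
        if (last > first)
            masked_write(row + last, right, color);
    }
}

unsigned char get_pixel(int x, int y)
{
    return planes[x & 3][y * BYTES_PER_ROW + x / 4];
}
```

In this model a full-screen clear takes 80 x 240 = 19,200 writes for 76,800 pixels, which is where the fourfold gain comes from. A real routine would presumably set the Map Mask once per edge or middle region rather than per write; the sketch ignores that cost.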
And now, I hope, you begin to see why I'm so fond of mode X. Next month, we'll continue with mode X by exploring the wonders that the latches and parallel plane hardware can work on scrolls, copies, blits, and pattern fills.

Notes From the Edsun Front

Comments coming my way indicate a great deal of programmer interest in the Edsun CEG/DAC, of which I wrote in April and May. However, everyone who has actually programmed the CEG/DAC complains about how hard it is; the results are nice, but the process of getting there is anything but. Nonetheless, programming the CEG/DAC is certainly a solvable problem, and whoever solves it best will come out looking mighty good. A fair analogy is writing active TSRs. Six years ago, TSR-writing was black magic, and Sidekick, primitive by today's standards, made a fortune. Today, any dope can choose from dozens of books and toolkits and make a rock-solid TSR in a few hours. As programmers develop better tools and a better understanding of the CEG/DAC, the grumbling will subside, and the software will take off. Another case of complexity providing opportunity.

Book of the Month

This month's book is Advanced Programmer's Guide to SuperVGAs, by Sutty and Blair (Brady, 1990, ISBN 0-13010455-8; $44.95). Pricey for softcover, but included in that price is a diskette of SuperVGA assembly code (which I have not tried out). This book is the single best guide I've seen to the Byzantine world of SuperVGA programming, where every one of dozens of VGA models has different mode numbers and banking schemes. Take it from someone who's waded through a slew of chip databooks and applications notes -- this book will save you a lot of time and aggravation if you have to program SuperVGAs directly. Still, not everything I'd like to see is in there. For example, they cover only the Tseng Labs ET3000 chip, not the now widely used ET4000 that supports 15-bpp graphics.
That's not the authors' fault, of course; it's a reflection of the incredible diversity and rate of change in the SuperVGA arena. Mode X. The Edsun CEG/DAC. SuperVGA programming. In complexitate est opportunitas. Q.E.D.