Journal:   Dr. Dobb's Journal  August 1991 v16 n8 p165(7)
-----------------------------------------------------------------------------
Title:     More undocumented 256-color VGA magic. (Graphics Programming)
           (column)
Author:    Abrash, Michael.
AttFile:    Program:  GP-AUG91.ASC  Source code listing.

Summary:   Programmers should remember that there are many subtle approaches
           to any problem and should keep the big picture in mind when
           implementing programs.  Mode X is an undocumented 320 x 240
           256-color mode of the VGA standard that supports page flipping,
           makes available off-screen memory, has square pixels, and permits
           users to increase performance by as much as four times by using
           the VGA's hardware.  There are four latches in VGA, one for each
           plane of display memory, and these latches are used to copy data
           from one part of display memory to another.  Latches are suitable
           for patterned fills and screen-to-screen copies, including
           scrolls.  Four-pixel-wide patterns are extremely useful.
-----------------------------------------------------------------------------
Descriptors..
Topic:     Tutorial
           Programming
           Computer Graphics
           Pixels
           Screens
           Color
           VGA Standard.
Feature:   illustration
           chart.
Caption:   The latches are loaded by every display memory read. (chart)
           Bytes written from the latches to corresponding planes. (chart)
           One useful way to organize display memory in Mode X. (chart)

-----------------------------------------------------------------------------
Full Text:

Every so often, a programming demon that I'd thought I'd forever laid to rest
arises to haunt me once again.  A minor example of this -- an imp, if you
will -- is the use of "=" when I mean "==," which I've done all too often in
the past, and am sure I'll do again.  That's minor deviltry, though, compared
to the considerably greater evils of one of my personal scourges, of which I
was recently reminded anew: too-close attention to detail.  Not seeing the
forest for the trees.  Looking low when I should have looked high.  Missing
the big picture, if you catch my drift.

Thoreau said it best: "Our life is frittered away by detail. . . .
Simplify, simplify."  That quote sprang to mind when I received a letter from
Anton Treuenfels of Fridley, Minnesota, thanking me for clarifying the
principles of filling adjacent convex polygons, as discussed in this column
in February and March.  Anton then went on to describe his own method for
filling convex polygons.

Anton's approach had its virtues and drawbacks, foremost among the virtues
being a simplicity Thoreau would have admired.  For instance, in writing my
polygon-filling code, I had spent quite some time trying to figure out the
best way to identify which edge was the left edge and which the right,
finally settling on comparing the slopes of the edges if the top of the
polygon wasn't flat, and comparing the starting points of the edges if the
top was flat.  Anton simplified this tremendously by not bothering to figure
out ahead of time which was the right edge of the polygon and which the left,
instead scanning out the two edges in whatever order he found them and
letting the low-level drawing code test, and if necessary swap, the
end-points of each horizontal line of the fill, so that filling started at
the leftmost edge.  This is a little slower than my approach (although the
difference is almost surely negligible), but it also makes quite a bit of
code go away.

What that example, and others like it in Anton's letter, did was kick my mind
into a mode that it hadn't -- but should have -- been in when I wrote the
code, a mode in which I began to wonder, "How else can I simplify this
code?"; what you might call Occam's Razor mode.  You see, I created the
convex polygon-drawing code by first writing pseudocode, then writing C code,
and finally writing assembly code, and once the pseudocode was finished, I
stopped thinking about the interactions of the various portions of the
program.  In other words, I became so absorbed in individual details that I
forgot to consider the code as a whole.  That was a mistake, and an
embarrassing one for someone who constantly preaches that programmers should
look at their code from a variety of perspectives.  May my embarrassment be
your enlightenment.

The point is not whether, in the final analysis, my code or Anton's code is
better; both have their advantages.  The point is that I was programming with
half a deck because I was so fixated on the details of a single sort of
implementation; I ended up with relatively hard-to-write, complex code, and
missed out on many potentially useful optimizations by being so focused.
It's a big world out there, and there are many subtle approaches to any
problem, so relax and keep the big picture in mind as you implement your
programs.  Your code will likely be not only better, but also simpler.  And
whenever you see me walking across hot coals in this column when there's an
easier way to go, please, let me know!

Thanks, Anton.

Mode X Continued

Last month, I introduced you to what I call mode X, an undocumented 320 X 240
256-color mode of the VGA.  Mode X is distinguished from mode 13h, the
documented 320 X 200 256-color VGA mode, in that it supports page flipping,
makes off-screen memory available, has square pixels, and, above all, lets
you use the VGA's hardware to increase performance by as much as four times
(at the cost of more complex and demanding programming, to be sure -- but end
users care about results, not how hard the code was to write, and mode X
delivers results in a big way).  Last month we saw how the VGA's
plane-oriented hardware can be used to speed solid fills.  That's a nice
technique, but this month we're going to move up to the big guns -- the
latches.

The VGA has four latches, one for each plane of display memory.  Each latch
stores exactly one byte, and that byte is always the last byte read from the
corresponding plane of display memory, as shown in Figure 1.  Furthermore,
whenever a given address in display memory is read, all four planes' bytes at
that address are read and stored in the corresponding latches, regardless of
which plane supplied the byte returned to the CPU (as determined by the Read
Map register).  As with so much else about the VGA, the above will make
little sense to VGA neophytes, but the important point is this: By reading
one display memory byte, 4 bytes -- one from each plane -- can be loaded into
the latches at once.  Any or all of those 4 bytes can then be written
anywhere in display memory with a single byte-sized write, as shown in Figure
2.  The upshot is that the latches make it possible to copy data around from
one part of display memory to another, 32 bits (four pixels) at a time --
four times as fast as normal.  (Recall from last month that in mode X, pixels
are stored one per byte, with four pixels in a row stored in successive
planes at the same address, one pixel per plane.)  However, any one latch can
only be loaded from and written to the corresponding plane, so an individual
latch can only work with every fourth pixel on the screen; the latch for
plane 0 can work with pixels 0, 4, 8, ..., the latch for plane 1 with
pixels 1, 5, 9, ..., and so on.
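
The latch behavior described above can be sketched as a toy software model.
This is not real hardware access: ToyVga, vga_read, and vga_write are
hypothetical names, and only the Bit Mask path of the default write mode is
modeled (no set/reset, rotate, or logical operations).

```c
#include <assert.h>
#include <stdint.h>

#define PLANES 4
#define PLANE_SIZE 65536u   /* 64K addresses per plane on a 256K VGA */

/* Toy model of the VGA pieces discussed above (hypothetical, simplified). */
typedef struct {
    uint8_t plane[PLANES][PLANE_SIZE];
    uint8_t latch[PLANES];  /* one byte-wide latch per plane */
    uint8_t read_map;       /* plane returned to the CPU on reads (0-3) */
    uint8_t map_mask;       /* bit n set = plane n accepts writes */
    uint8_t bit_mask;       /* 0x00 = every written bit comes from the latches */
} ToyVga;

/* Any read loads all four latches, whichever plane the CPU actually sees. */
uint8_t vga_read(ToyVga *v, uint16_t addr)
{
    assert(v->read_map < PLANES);
    for (int p = 0; p < PLANES; p++)
        v->latch[p] = v->plane[p][addr];
    return v->latch[v->read_map];
}

/* Each bit comes from the CPU byte where bit_mask is 1, from the latch where 0. */
void vga_write(ToyVga *v, uint16_t addr, uint8_t cpu_byte)
{
    for (int p = 0; p < PLANES; p++) {
        if (v->map_mask & (1u << p))
            v->plane[p][addr] = (uint8_t)((cpu_byte & v->bit_mask) |
                                          (v->latch[p] & ~v->bit_mask));
    }
}
```

In this model, one vga_read followed by one vga_write with bit_mask set to 0
and map_mask set to 0Fh moves four pixels, which is the whole trick. The
structure is 256K in size, so declare it static rather than on the stack.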

The latches aren't intended for use in 256-color mode -- they were designed
to allow individual bits of display memory to be modified in 16-color mode --
but they are nonetheless very useful in mode X, particularly for patterned
fills and screen-to-screen copies, including scrolls.  Patterned filling is a
good place to start, because patterns are widely used in windowing
environments for desktops, window backgrounds, and scroll bars, and for
textures and color dithering in drawing and game software.

Fast mode X fills with patterns that are four pixels in width can be
performed by drawing the pattern once to the four pixels at any one address
in display memory, reading that address to load the pattern into the latches,
setting the Bit Mask register to 0 to specify that all bits drawn to display
memory should come from the latches, and then performing the fill pretty much
as we did last month, except that each line of the pattern must be loaded
into the latches before the corresponding scan line on the screen is filled.
Listings One and Two (page 181) together demonstrate a variety of fast mode X
four-by-four pattern fills.  (The mode set function called by Listing One is
from last month's column.)
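
As a rough illustration of the steps just described (and not a reproduction
of Listings One and Two), here is a minimal C sketch that runs against an
in-memory stand-in for the four planes.  The names pattern_fill, get_pixel,
and PATTERN_BUFFER are assumptions made for this example, and for brevity the
sketch assumes that the left and right edges of the rectangle fall on
four-pixel boundaries; real code would clip the edges with the Map Mask.

```c
#include <assert.h>
#include <stdint.h>

#define SCREEN_WIDTH_BYTES 80      /* 320 pixels / 4 pixels per address */
#define PATTERN_BUFFER 0xFFFCu     /* assumed: last four addresses of display memory */

static uint8_t plane[4][65536];    /* toy stand-in for the four VGA planes */
static uint8_t latch[4];

static void load_latches(uint16_t addr)    /* models one display memory read */
{
    for (int p = 0; p < 4; p++) latch[p] = plane[p][addr];
}

static void write_latches(uint16_t addr)   /* models a write with Bit Mask = 0, Map Mask = 0Fh */
{
    for (int p = 0; p < 4; p++) plane[p][addr] = latch[p];
}

/* Fill [x0,x1) x [y0,y1) with a 4x4 pattern; x0 and x1 must be multiples of four. */
void pattern_fill(int x0, int y0, int x1, int y1, const uint8_t pattern[4][4])
{
    assert(x0 % 4 == 0 && x1 % 4 == 0);

    /* Draw each pattern row to one off-screen address, one pixel per plane. */
    for (int row = 0; row < 4; row++)
        for (int p = 0; p < 4; p++)
            plane[p][PATTERN_BUFFER + row] = pattern[row][p];

    for (int y = y0; y < y1; y++) {
        /* Load the pattern row for this scan line, then stamp it across the line. */
        load_latches((uint16_t)(PATTERN_BUFFER + (y & 3)));
        for (int x = x0; x < x1; x += 4)
            write_latches((uint16_t)(y * SCREEN_WIDTH_BYTES + x / 4));
    }
}

/* Read back one pixel from the toy planes. */
uint8_t get_pixel(int x, int y)
{
    return plane[x & 3][y * SCREEN_WIDTH_BYTES + x / 4];
}
```

Selecting the pattern row with y & 3 keeps the pattern anchored to the
screen rather than to the rectangle, so adjacent filled areas line up.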

Four-pixel-wide patterns are more useful than you might imagine.  There are
actually 2^128 possible patterns (16 pixels, each with 2^8
possible colors); that set is certainly large enough for most color-dithering
purposes, and includes many often-used patterns, such as halftones, diagonal
stripes, and crosshatches.

Furthermore, eight-wide patterns, which are widely used, can be drawn with
two passes, one for each half of the pattern; this principle can in fact be
extended to patterns of arbitrary multiple-of-four widths.  (Widths that
aren't multiples of four are considerably more difficult to handle, because
the latches are four pixels wide.)
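
One way the multiple-pass idea might be organized is sketched below for a
single scan line.  This is an assumption about the pass structure rather than
code from the listings: each four-pixel slice of the pattern row is staged at
a scratch address, read back into the latches, and then written to every Nth
address, where N is the pattern width divided by four.  The names
fill_row_wide, pixel_at, and SCRATCH are hypothetical, and the planes are
again modeled as a plain array.

```c
#include <assert.h>
#include <stdint.h>

#define SCRATCH 0xFFFFu            /* hypothetical off-screen scratch address */

static uint8_t plane[4][65536];    /* toy stand-in for the four VGA planes */
static uint8_t latch[4];

/* Fill `count` addresses starting at `start` with one row of a pattern
   whose width in pixels is a multiple of four. */
void fill_row_wide(uint16_t start, int count, const uint8_t *row, int width)
{
    assert(width > 0 && width % 4 == 0);
    assert(start + count <= (int)SCRATCH);
    int groups = width / 4;        /* one pass per four-pixel slice */

    for (int g = 0; g < groups; g++) {
        /* Stage this slice off-screen, then read it back into the latches. */
        for (int p = 0; p < 4; p++) plane[p][SCRATCH] = row[g * 4 + p];
        for (int p = 0; p < 4; p++) latch[p] = plane[p][SCRATCH];

        /* Write the slice to every `groups`-th address of the line. */
        for (int a = g; a < count; a += groups)
            for (int p = 0; p < 4; p++) plane[p][start + a] = latch[p];
    }
}

/* Read back the byte at one address of one plane. */
uint8_t pixel_at(uint16_t addr, int p)
{
    return plane[p][addr];
}
```

With width set to 8 this makes exactly two passes, the first touching
addresses 0, 2, 4, and so on, and the second touching 1, 3, 5, and so on.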

Allocating Memory in Mode X

Listing Two raises some interesting questions about the allocation of display
memory in mode X.  In Listing Two, whenever a pattern is to be drawn, that
pattern is first drawn in its entirety at the very end of display memory; the
latches are then loaded from that copy of the pattern before each scan line
of the actual fill is drawn.  Why this double copying process, and why is the
pattern stored in that particular area of display memory?

The double copying process is used because it's the easiest way to load the
latches.  Remember, there's no way to get information directly from the CPU
to the latches; the information must first be written to some location in
display memory, because the latches can be loaded only from display memory.
By writing the pattern to off-screen memory, we don't have to worry about
interfering with whatever is currently displayed on the screen.

As for why the pattern is stored exactly where it is, that's part of a master
memory allocation plan that will come to fruition next month when I implement
a mode X animation program.  Figure 3 shows this master plan; the first two
pages of memory (each 76,800 pixels long, spanning 19,200 addresses -- that
is, 19,200 pixel quadruplets -- in display memory) are reserved for page
flipping, the next page of memory (also 76,800 pixels long) is reserved for
storing the background (this is used to restore the holes left after images
move), the last 16 pixels (four addresses) of display memory are reserved for
the pattern buffer, and the remaining 31,728 pixels (7932 addresses) of
display memory are free for storage of icons, images, temporary buffers, or
whatever.  This is an efficient organization for animation, but there are
certainly many other possible setups.  For example, you might choose to have
a solidly-colored background, in which case you could dispense with the
background page (instead using the solid rectangle fill routine to replace
the background after images move), freeing up another 76,800 pixels of
off-screen storage for images and buffers.  You could even eliminate
page-flipping altogether if you needed to free up a great deal of display
memory.  For example, with enough free display memory it is possible in mode
X to create a virtual bitmap three times larger than the screen, with the
screen becoming a scrolling window onto that larger bitmap.  This technique
has been used to good effect in a number of games, although I don't know if
any of those games use mode X.
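
The arithmetic behind that plan can be written down as a handful of
constants.  The names below are made up for this sketch; only the sizes and
their order come from the description above.  It assumes a compiler with
C11 static_assert and an int wider than 16 bits, so it is meant for checking
the numbers on a modern compiler, not for a real-mode DOS build.

```c
#include <assert.h>

enum {
    DISPLAY_MEM_ADDRESSES = 65536,                      /* 256K bytes / 4 planes */
    SCREEN_WIDTH          = 320,
    SCREEN_HEIGHT         = 240,
    PAGE_ADDRESSES        = SCREEN_WIDTH * SCREEN_HEIGHT / 4,   /* 19,200 */

    PAGE0_OFFSET      = 0,                                   /* first flip page */
    PAGE1_OFFSET      = PAGE0_OFFSET + PAGE_ADDRESSES,       /* second flip page */
    BACKGROUND_OFFSET = PAGE1_OFFSET + PAGE_ADDRESSES,       /* background page */
    FREE_OFFSET       = BACKGROUND_OFFSET + PAGE_ADDRESSES,  /* icons, images, buffers */

    PATTERN_ADDRESSES = 4,                                   /* one 4x4 pattern */
    PATTERN_OFFSET    = DISPLAY_MEM_ADDRESSES - PATTERN_ADDRESSES,
    FREE_ADDRESSES    = PATTERN_OFFSET - FREE_OFFSET
};

/* Compile-time checks against the figures quoted in the text. */
static_assert(PAGE_ADDRESSES * 4 == 76800, "each page is 76,800 pixels");
static_assert(FREE_ADDRESSES == 7932, "7932 free addresses");
static_assert(FREE_ADDRESSES * 4 == 31728, "31,728 free pixels");
```

Dropping the background page, as suggested above, would amount to moving
FREE_OFFSET back to BACKGROUND_OFFSET.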

Copying Pixel Blocks Within Display Memory

Another fine use for the latches is copying pixels from one place in display
memory to another.  Whenever both the source and the destination share the
same nibble alignment (that is, their start addresses modulo four are the
same), it is not only possible but quite easy to use the latches to perform
the copy four pixels at a time.  Listing Three (page 182) shows a routine
that copies via the latches.  (When the source and destination do not share
the same nibble alignment, the latches cannot be used, because the source and
destination planes for any given pixel differ; in that case, you can set the
Read Map register to select a source plane and the Map Mask register to
select the corresponding destination plane, then copy all pixels in that
plane; repeat for all four planes.)
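
Both cases can be sketched against a toy model of the planes.  This is not
Listing Three: the function names are hypothetical, and copy_aligned assumes
for brevity that the source X, destination X, and width are all multiples of
four, which is stricter than the shared-alignment condition in the text.
Neither routine handles overlapping source and destination.

```c
#include <assert.h>
#include <stdint.h>

#define WIDTH_BYTES 80             /* addresses per 320-pixel scan line */

static uint8_t plane[4][65536];    /* toy stand-in for the four VGA planes */
static uint8_t latch[4];

void put_pixel(int x, int y, uint8_t c) { plane[x & 3][y * WIDTH_BYTES + x / 4] = c; }
uint8_t get_pixel(int x, int y) { return plane[x & 3][y * WIDTH_BYTES + x / 4]; }

/* Latch copy: four pixels per read/write pair. */
void copy_aligned(int sx, int sy, int dx, int dy, int width, int height)
{
    assert(sx % 4 == 0 && dx % 4 == 0 && width % 4 == 0);
    for (int row = 0; row < height; row++) {
        int src = (sy + row) * WIDTH_BYTES + sx / 4;
        int dst = (dy + row) * WIDTH_BYTES + dx / 4;
        for (int a = 0; a < width / 4; a++) {
            for (int p = 0; p < 4; p++) latch[p] = plane[p][src + a];  /* one read  */
            for (int p = 0; p < 4; p++) plane[p][dst + a] = latch[p];  /* one write */
        }
    }
}

/* Fallback when alignments differ: one pass per source plane, one pixel per access. */
void copy_by_planes(int sx, int sy, int dx, int dy, int width, int height)
{
    for (int sp = 0; sp < 4; sp++) {             /* Read Map would select sp here */
        int first = ((sp - sx) % 4 + 4) % 4;     /* first column whose source pixel is in sp */
        int dp = (dx + first) & 3;               /* Map Mask would select dp here */
        for (int row = 0; row < height; row++)
            for (int i = first; i < width; i += 4)
                plane[dp][(dy + row) * WIDTH_BYTES + (dx + i) / 4] =
                    plane[sp][(sy + row) * WIDTH_BYTES + (sx + i) / 4];
    }
}
```

The fallback works because stepping four pixels at a time keeps both the
source plane and the destination plane fixed for an entire pass.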

Listing Three has an important limitation: It does not guarantee proper
handling when the source and destination overlap, as in the case of a
downward scroll, for example.  Listing Three performs top-to-bottom,
left-to-right copying.  Downward scrolls require bottom-to-top copying;
likewise, rightward horizontal scrolls require right-to-left copying.  As it
happens, my intended use for Listing Three is to copy images between
off-screen memory and on-screen memory, and to save areas under pop-up menus
and the like, so I don't really need overlap handling -- and I do really need
to keep the size of this column down.  However, you will surely want to add
overlap handling if you plan to perform arbitrary scrolling and copying in
display memory.
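
For readers who do want overlap handling, one possible approach (a sketch,
not an extension of Listing Three itself) is the rule memmove uses: when the
destination lies later in display memory than the source, copy bottom-to-top
and right-to-left; otherwise copy top-to-bottom and left-to-right.  The code
below assumes source and destination share the same scan-line width, works in
whole addresses (four pixels each), and uses hypothetical names throughout.

```c
#include <assert.h>
#include <stdint.h>

#define WIDTH_BYTES 80             /* addresses per 320-pixel scan line */

static uint8_t plane[4][65536];    /* toy stand-in for the four VGA planes */

/* src and dst are start addresses; width is in addresses, height in scan lines. */
void copy_overlap_safe(int src, int dst, int width, int height)
{
    assert(width > 0 && height > 0);
    /* Destination later in memory: walk from the end so that every source
       address is read before a write can land on it. */
    int backward = dst > src;

    for (int r = 0; r < height; r++) {
        int row = backward ? height - 1 - r : r;
        for (int c = 0; c < width; c++) {
            int a = backward ? width - 1 - c : c;
            int offset = row * WIDTH_BYTES + a;
            for (int p = 0; p < 4; p++)     /* models one latch read plus one latch write */
                plane[p][dst + offset] = plane[p][src + offset];
        }
    }
}

void poke(int addr, int p, uint8_t v) { plane[p][addr] = v; }
uint8_t peek(int addr, int p) { return plane[p][addr]; }
```

A downward scroll by one line is then copy_overlap_safe(0, WIDTH_BYTES,
width, height), which takes the bottom-to-top path automatically.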

Now that we have a fast way to copy images around in display memory, we can
draw icons and other images between two and four times faster than in mode
13h, depending on the speed of the VGA's display memory.  (In case you're
worried about the nibble-alignment limitation on fast copies, don't be; I'll
address that fully next time, but the secret is to store all four possible
rotations in off-screen memory, then select the correct one for each copy.)
However, before our fast display memory-to-display memory copy routine can do
us any good, we must have a way to get pixel patterns from system memory into
display memory, so that they can be copied with the fast copy routine.

Copying to Display Memory

The final piece of the puzzle is the system memory-to-display memory copy
routine shown in Listing Four (page 182).  This routine
assumes that pixels are stored in system memory in exactly the order in which
they will ultimately appear on the screen; that is, in the same linear order
that mode 13h uses.  It would be more efficient to store all the pixels for
one plane first, then all the pixels for the next plane, and so on for all
four planes, because many OUTs could be avoided, but that would make images
rather hard to create.  And, while it is true that the speed of drawing
images is, in general, often a critical performance factor, the speed of
copying images from system memory to display memory is not particularly
critical in mode X.  Important images can be stored in off-screen memory and
copied to the screen via the latches much faster than even the speediest
system memory-to-display memory copy routine could manage.
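
The mapping from a linear system-memory image to mode X's planar layout can
be sketched as follows.  This is one possible arrangement, not necessarily
how Listing Four is organized: it loops over planes outermost, so that real
code would need to reprogram the Map Mask only once per plane.  The planes
are modeled as a plain array, and copy_system_to_screen and get_pixel are
names invented for this example.

```c
#include <assert.h>
#include <stdint.h>

#define WIDTH_BYTES 80             /* addresses per 320-pixel scan line */

static uint8_t plane[4][65536];    /* toy stand-in for the four VGA planes */

/* Copy a linear (mode 13h-style) image to the toy planes at (dx, dy). */
void copy_system_to_screen(const uint8_t *image, int img_w, int img_h,
                           int dx, int dy)
{
    assert(image != 0 && img_w > 0 && img_h > 0);

    for (int first = 0; first < 4 && first < img_w; first++) {
        int p = (dx + first) & 3;  /* real code would set the Map Mask to 1 << p here */
        for (int y = 0; y < img_h; y++)
            for (int x = first; x < img_w; x += 4)   /* every fourth pixel shares plane p */
                plane[p][(dy + y) * WIDTH_BYTES + (dx + x) / 4] =
                    image[y * img_w + x];
    }
}

/* Read back one pixel: the plane is x mod 4, the address is y * 80 + x / 4. */
uint8_t get_pixel(int x, int y)
{
    return plane[x & 3][y * WIDTH_BYTES + x / 4];
}
```

The inverse copy, from display memory to system memory, would swap the two
sides of the assignment and select each plane with the Read Map register
instead of the Map Mask.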

I'm not going to present a routine to perform mode X copies from display
memory to system memory, but such a routine would be a straightforward
inverse of Listing Four.

Coming Up: Our Hero Risks Life, Limb, and Word Count in a Thrilling Conclusion

Next month, I'll take all the mode X tools we've developed, together with
one more tool -- masked image copying -- and the remaining unexplored feature
of mode X, page flipping, and build an animation application.  I hope that
when I'm done, you'll agree with me that mode X is the way to animate on the
PC.  I also hope that I can fit everything into one column; there are always
so many interesting things to say that I have trouble keeping the size of
these columns down, and mode X animation covers even more fertile ground than
usual.

But, hey -- you've already heard about my programming demons; I'll spare you
the writing demons.  Besides, as I'm fond of saying, end users care about
results, not how you produced them.  For my writing, you folks are the end
users -- and notice how remarkably little you care about how this magazine
gets written and produced.  You care that it shows up in your mailbox every
month, and you don't care about how it got there.  When you're a creator, the
process matters.  When you're a buyer, results are everything.  All
important.  Sine qua non.  The whole enchilada.

If you catch my drift.

Late Flash!

The Mode X mode set code in my July '91 column (Listing One, page 154) has a
small -- but critical -- bug.  On line 46, the value loaded into AL should be
0E3h, not 0E7h.  Without this correction, the screen will roll on
fixed-frequency (IBM 851X-style) monitors.