Journal:   Dr. Dobb's Journal  July 1991 v16 n7 p133(7)
-----------------------------------------------------------------------------
Title:     Mode X: 256-color VGA magic. (Graphics Programming)
Author:    Abrash, Michael.
AttFile:    Program:  GP-JUL91.ASC  Source code listing.

Summary:   VGA's 320 x 240 256-color mode is most likely the single best mode
           of VGA, especially for animation.  Features that make this mode so
           special include its 1:1 aspect ratio, which results in equal pixel
           spacing vertically and horizontally.  Square pixels create the
           most attractive displays.  In addition, mode X allows page
           flipping, which helps create smooth animation.  Mode X pixels are
           processed in parallel, improving performance up to four times.
           However, the use of mode X is not widespread, since it is entirely
           undocumented.  Only a very experienced VGA programmer would know
           that such a mode exists.  The author provides mode set code,
           delineates the bitmap organization, and demonstrates how the basic
           write pixel and read pixel operations work.
-----------------------------------------------------------------------------
Descriptors..
Topic:     Programming Instruction
           VGA Standard
           Pixels
           Color
           Animation
           Performance Improvement.
Feature:   illustration
           chart.
Caption:   The organization of display memory in mode X. (chart)
           The Map Mask register selects which planes are written to in
           planar modes. (chart)

-----------------------------------------------------------------------------
Full Text:

There's a well-known Latin saying, in complexitate est opportunitas ("in
complexity there is opportunity"), that must have been invented with the VGA
in mind.  Well, actually, it's not exactly well-known (I just thought of it
this afternoon), but it should be.  As evidence, witness the strange case of
the VGA's 320 x 240 256-color mode, which is undeniably complex to program
and isn't even documented by IBM -- but which is, nonetheless, perhaps the
single best mode the VGA has to offer, especially for animation.

What Makes 320 x 240 Special?

Five features set the 320 x 240 256-color mode (which I'll call "mode X,"
befitting its mystery status in IBM's documentation) apart from other VGA
modes.  First, it has a 1:1 aspect ratio, resulting in equal pixel spacing
horizontally and vertically (square pixels).  Square pixels make for the most
attractive displays, and avoid considerable programming effort that would
otherwise be necessary to adjust graphics primitives and images to match the
screen's pixel spacing.  (For example, with square pixels, a circle can be
drawn as a circle; otherwise, it must be drawn as an ellipse that corrects
for the aspect ratio -- a slower, more complicated process.)  In contrast,
mode 13h, the only documented 256-color mode, provides a nonsquare 320 x 200
resolution.
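
The square-pixel claim is easy to check with a little arithmetic.  The sketch
below is only an illustration, not code from the listings: the pixel_shape
helper and the PixelShape type are made-up names, and the calculation assumes
the visible image exactly fills a 4:3 display.

```c
#include <assert.h>

/* Pixel width:height ratio, reduced to lowest terms. */
typedef struct { int width; int height; } PixelShape;

static int gcd(int a, int b)
{
    while (b != 0) { int t = a % b; a = b; b = t; }
    return a;
}

/* Assumption: the visible image exactly fills a 4:3 display. */
PixelShape pixel_shape(int columns, int rows)
{
    assert(columns > 0 && rows > 0);
    /* Each pixel is 4/columns wide and 3/rows tall, so
       width:height = (4 * rows):(3 * columns). */
    int w = 4 * rows;
    int h = 3 * columns;
    int g = gcd(w, h);
    PixelShape shape = { w / g, h / g };
    return shape;
}
```

Under that assumption, pixel_shape(320, 240) comes out as 1:1, while
pixel_shape(320, 200) comes out as 5:6, that is, a mode 13h pixel is 1.2
times as tall as it is wide.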

Second, mode X allows page flipping, a prerequisite for the smoothest
possible animation.  Mode 13h does not allow page flipping, nor does mode
12h, the VGA's high-resolution 640 x 480 16-color mode.

Third, mode X allows the VGA's plane-oriented hardware to be used to process
pixels in parallel, improving performance by up to four times over mode 13h.

Fourth, like mode 13h but unlike all other VGA modes, mode X is a
byte-per-pixel mode (each pixel is controlled by one byte in display memory),
eliminating the slow read-before-write and bit-masking operations often
required in 16-color modes.  In addition to cutting the number of memory
accesses in half, this is important because the memory caching schemes used
by many VGA clones speed up writes more than reads.

Fifth, unlike mode 13h, mode X has plenty of offscreen memory free for image
storage.  This is particularly effective in conjunction with the use of the
VGA's latches; together, the latches and the off-screen memory allow images
to be copied to the screen four pixels at a time.

There's a sixth feature of mode X that's not so terrific: It's hard to
program efficiently.  If you've ever programmed a VGA 16-color mode directly,
you know that VGA programming can be demanding; mode X is often as demanding
as 16-color programming, and operates by a set of rules that turns everything
you've learned in 16-color mode sideways.  Programming mode X is nothing like
programming the nice, flat bitmap of mode 13h, or, for that matter, the flat,
linear (albeit banked) bitmap used by 256-color SuperVGA modes.  (I'd like to
emphasize that mode X works on all VGAs, not just SuperVGAs.)  Many
programmers I talk to love the flat bitmap model, and think that it's the
ideal organization for display memory because it's so straightforward to
program.  Remember the saying I started this column with, though; the
complexity of mode X truly is opportunity -- opportunity for the best
combination of performance and appearance the VGA has to offer.  If you do
256-color programming, especially if you use animation, you're missing the
boat if you're not using mode X.

Although some developers have taken advantage of mode X, its use is certainly
not widespread, being entirely undocumented; only an experienced VGA
programmer would have the slightest inkling that it exists, and figuring out
how to make it perform beyond the write pixel/read pixel level is no mean
feat.  I've never seen anything in print about it, and, in fact, the only
articles I've seen about any of the undocumented 256-color modes were my own
articles about the 320 x 200, 320 x 400, and 360 x 480 256-color modes in
Programmer's Journal (January and September, 1989).  (However, John Bridges
has put code for a number of undocumented 256-color resolutions into the
public domain, and I'd like to acknowledge the influence of his code on the
mode set routine presented in this article.)

Given the tremendous advantages of 320 x 240 over the documented mode 13h,
I'd very much like to get it into the hands of as many developers as
possible, so I'm going to spend the next few columns exploring this odd but
worthy mode.  I'll provide mode set code, delineate the bitmap organization,
and show how the basic write pixel and read pixel operations work.  Then I'll
move on to the magic stuff: rectangle fills, screen clears, scrolls, image
copies, pixel inversion, and, yes, polygon fills (just a different driver),
all blurry fast; hardware raster ops; and page flipping.  In the end, I'll
build a working animation program that shows many of the features of mode X
in action.

The mode set code is the logical place to begin.

Selecting 320 x 240 256-Color Mode

We could, if we wished, write our own mode set code for mode X from scratch
-- but why bother?  Instead, we'll let the BIOS do most of the work by having
it set up mode 13h, which we'll then turn into mode X by changing a few
registers.  Listing One (page 154) does exactly that.

After setting up mode 13h, Listing One alters the vertical counts and timings
to select 480 visible scan lines.  (There's no need to alter any horizontal
values, because mode 13h and mode X both have 320-pixel horizontal
resolutions.)  The Maximum Scan Line register is programmed to double scan
each line (that is, repeat each scan line twice), however, so we get an
effective vertical resolution of 240 scan lines.  It is, in fact, possible to
get 400 or 480 independent scan lines in 256-color mode (see the
aforementioned articles for details); however, 400-scan-line modes lack
square pixels and can't support simultaneous offscreen memory and page
flipping, and 480-scan-line modes lack page flipping altogether, due to
memory constraints.
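
The memory constraints can be worked out directly.  The following sketch
assumes the standard VGA arrangement of 256K of display memory organized as
four 64K planes, with each page of a planar 256-color mode taking
(columns / 4) * rows bytes in every plane.  The page_budget function and the
PageBudget type are hypothetical names used only for this illustration.

```c
#include <assert.h>

/* Assumption: standard VGA, 256K of display memory as four 64K planes. */
enum { PLANES = 4, PLANE_BYTES = 65536 };

typedef struct {
    long page_bytes_per_plane;   /* bytes one page occupies in each plane */
    int  pages;                  /* whole pages that fit in display memory */
    long spare_bytes_per_plane;  /* what is left over in each plane */
} PageBudget;

PageBudget page_budget(int columns, int rows)
{
    assert(columns > 0 && rows > 0 && columns % PLANES == 0);
    PageBudget b;
    b.page_bytes_per_plane = (long)(columns / PLANES) * rows;
    b.pages = (int)(PLANE_BYTES / b.page_bytes_per_plane);
    b.spare_bytes_per_plane = PLANE_BYTES - b.pages * b.page_bytes_per_plane;
    return b;
}
```

With these assumptions, 320 x 240 needs 19,200 bytes per plane per page, so
three pages fit with 7,936 bytes per plane to spare (or two pages with 27,136
bytes per plane of offscreen memory).  320 x 400 fits two pages with only
1,536 bytes per plane left, and 320 x 480 fits just one page.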

At the same time, Listing One programs the VGA's bitmap to a planar
organization that is similar to that used by the 16-color modes, and utterly
different from the linear bitmap of mode 13h.  The bizarre bitmap
organization of mode X is shown in Figure 1.  The first pixel (the pixel at
the upper left corner of the screen) is controlled by the byte at offset 0 in
plane 0.  (The one thing that mode X blessedly has in common with mode 13h is
that each pixel is controlled by a single byte, eliminating the need to mask
out individual bits of display memory.)  The second pixel, immediately to the
right of the first pixel, is controlled by the byte at offset 0 in plane 1.
The third pixel comes from offset 0 in plane 2, and the fourth pixel from
offset 0 in plane 3.  Then the fifth pixel is controlled by the byte at
offset 1 in plane 0, and that cycle continues, with each group of four pixels
spread across the four planes at the same address.  The offset M of pixel N
in display memory is M = N/4 (integer division, discarding the remainder),
and the plane P of pixel N is P = N mod 4.  For
display memory writes, the plane is selected by setting bit P of the Map Mask
register (Sequence Controller register 2) to 1 and all other bits to 0; for
display memory reads, the plane is selected by setting the Read Map register
(Graphics Controller register 4) to P.
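
As a minimal sketch of that addressing rule (this is not Listing Two or
Three), the function below turns screen coordinates into the offset, plane,
and Map Mask value.  The names modex_address and PixelAddress are made up for
the example, and it assumes a single 320 x 240 page starting at offset 0, so
that N = y * 320 + x.

```c
#include <assert.h>

enum { SCREEN_WIDTH = 320, SCREEN_HEIGHT = 240 };

typedef struct {
    unsigned offset;         /* M: byte offset within the plane */
    unsigned plane;          /* P: plane number, 0 through 3 (Read Map value) */
    unsigned char map_mask;  /* Map Mask value with only bit P set */
} PixelAddress;

/* Assumption: one page starting at offset 0 of display memory. */
PixelAddress modex_address(int x, int y)
{
    assert(x >= 0 && x < SCREEN_WIDTH && y >= 0 && y < SCREEN_HEIGHT);
    unsigned n = (unsigned)y * SCREEN_WIDTH + (unsigned)x;  /* pixel number N */
    PixelAddress a;
    a.offset = n / 4;                             /* M = N/4 */
    a.plane = n % 4;                              /* P = N mod 4 */
    a.map_mask = (unsigned char)(1u << a.plane);  /* bit P set, others clear */
    return a;
}
```

For instance, the first pixel of the second scan line, (0, 1), lands at
offset 80 in plane 0, because each 320-pixel scan line occupies 80 bytes in
every plane.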

It goes without saying that this is one ugly bitmap organization, requiring a
lot of overhead to manipulate a single pixel.  The write pixel code shown in
Listing Two (page 154) must determine the appropriate plane and perform a
16-bit OUT to select that plane for each pixel written, and likewise for the
read pixel code shown in Listing Three (page 154).  Calculating and mapping
in a plane once for each pixel written is scarcely a recipe for performance.
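
To make the per-pixel overhead concrete, here is a software model of the four
planes, written in C rather than the assembly of the listings.  Everything in
it (SimVga, sim_write_pixel, sim_read_pixel, the outs counter) is a
hypothetical stand-in for the real hardware; it touches no VGA ports and
simply counts the plane selections that would each be an OUT on a real VGA.

```c
#include <assert.h>
#include <string.h>

enum { PLANES = 4, PLANE_BYTES = 65536,
       SCREEN_WIDTH = 320, SCREEN_HEIGHT = 240 };

/* Software stand-in for the VGA's planar display memory. */
typedef struct {
    unsigned char plane[PLANES][PLANE_BYTES];
    unsigned char map_mask;  /* models Sequence Controller register 2 */
    unsigned char read_map;  /* models Graphics Controller register 4 */
    unsigned long outs;      /* simulated plane-select OUTs so far */
} SimVga;

void sim_reset(SimVga *v)
{
    memset(v, 0, sizeof *v);
}

void sim_write_pixel(SimVga *v, int x, int y, unsigned char color)
{
    assert(x >= 0 && x < SCREEN_WIDTH && y >= 0 && y < SCREEN_HEIGHT);
    unsigned n = (unsigned)y * SCREEN_WIDTH + (unsigned)x;
    v->map_mask = (unsigned char)(1u << (n % 4));  /* select one plane */
    v->outs++;                                     /* one OUT per pixel */
    for (int p = 0; p < PLANES; p++)               /* write enabled planes */
        if (v->map_mask & (1u << p))
            v->plane[p][n / 4] = color;
}

unsigned char sim_read_pixel(SimVga *v, int x, int y)
{
    assert(x >= 0 && x < SCREEN_WIDTH && y >= 0 && y < SCREEN_HEIGHT);
    unsigned n = (unsigned)y * SCREEN_WIDTH + (unsigned)x;
    v->read_map = (unsigned char)(n % 4);          /* select plane to read */
    v->outs++;                                     /* one OUT per pixel */
    return v->plane[v->read_map][n / 4];
}
```

In this model every pixel written or read costs one plane selection, which is
exactly the overhead the paragraph above complains about.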

That's all right, though, because most graphics software spends little time
drawing individual pixels.  I've provided the write and read pixel routines
as basic primitives, and so you'll understand how the bitmap is organized,
but the building blocks of high-performance graphics software are fills,
copies, and bitblts, and it's here that mode X shines.

Designing From a Mode X Perspective

Listing Four (page 154) shows mode X rectangle fill code.  The plane is
selected for each pixel in turn, with drawing cycling from plane 0 to plane 3
then wrapping back to plane 0.  This is the sort of code that stems from a
write-pixel line of thinking; it reflects not a whit of the unique
perspective that mode X demands, and although it looks reasonably efficient,
it is in fact some of the slowest graphics code you will ever see.  I've
provided Listing Four partly for illustrative purposes, but mostly so we'll
have a point of reference for the substantial speed-up that's possible with
code that's designed from a mode X perspective.

The two major weaknesses of Listing Four both result from selecting the plane
on a pixel by pixel basis.  First, endless OUTs (which are particularly slow
on 386s and 486s, often much slower than accesses to display memory) must be
performed, and, second, REP STOS can't be used.  Listing Five (page 156)
overcomes both these problems by tailoring the fill technique to the
organization of display memory.  Each plane is filled in its entirety in one
burst before the next plane is processed, so only five OUTs are required in
all, and REP STOS can indeed be used.  (I've used REP STOSB in Listings Five
and Six (page 156).  REP STOSW could be used and would improve performance on
some 16-bit VGAs; however, REP STOSW requires extra overhead to set up, so it
can be slower for small rectangles, especially on 8-bit VGAs.)  Doing an
entire plane at a time can produce a "fading-in" effect for large images,
because all columns for one plane are drawn before any columns for the next;
if this is a problem, the four planes can be cycled through once for each
scan line, rather than once for the entire rectangle.
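
The difference between the two approaches can be sketched against a software
model of the planes.  The C code below is an illustration only, not a
translation of Listings Four and Five: SimVga and both fill functions are
hypothetical, the rectangle's right and bottom edges are assumed to be
exclusive, and the model counts only Map Mask changes (it does not model any
other OUTs the real code performs).

```c
#include <assert.h>
#include <string.h>

enum { PLANES = 4, SCREEN_WIDTH = 320, SCREEN_HEIGHT = 240,
       PAGE_BYTES = SCREEN_WIDTH / PLANES * SCREEN_HEIGHT };

typedef struct {
    unsigned char plane[PLANES][PAGE_BYTES];
    unsigned char map_mask;
    unsigned long selects;   /* Map Mask changes; each is an OUT on a VGA */
} SimVga;

void sim_reset(SimVga *v) { memset(v, 0, sizeof *v); }

static void select_planes(SimVga *v, unsigned char mask)
{
    v->map_mask = mask;
    v->selects++;
}

static void cpu_write(SimVga *v, int x, int y, unsigned char value)
{
    unsigned offset = ((unsigned)y * SCREEN_WIDTH + (unsigned)x) / PLANES;
    for (int p = 0; p < PLANES; p++)
        if (v->map_mask & (1u << p))
            v->plane[p][offset] = value;
}

/* Write-pixel thinking: pick the plane again for every single pixel. */
void fill_pixel_at_a_time(SimVga *v, int x1, int y1, int x2, int y2,
                          unsigned char color)
{
    assert(0 <= x1 && x1 <= x2 && x2 <= SCREEN_WIDTH);
    assert(0 <= y1 && y1 <= y2 && y2 <= SCREEN_HEIGHT);
    for (int y = y1; y < y2; y++)
        for (int x = x1; x < x2; x++) {
            select_planes(v, (unsigned char)(1u << (x % PLANES)));
            cpu_write(v, x, y, color);
        }
}

/* Mode X thinking: pick a plane once, then fill everything in that plane. */
void fill_plane_at_a_time(SimVga *v, int x1, int y1, int x2, int y2,
                          unsigned char color)
{
    assert(0 <= x1 && x1 <= x2 && x2 <= SCREEN_WIDTH);
    assert(0 <= y1 && y1 <= y2 && y2 <= SCREEN_HEIGHT);
    for (int first = x1; first < x2 && first < x1 + PLANES; first++) {
        select_planes(v, (unsigned char)(1u << (first % PLANES)));
        for (int y = y1; y < y2; y++)
            /* Every fourth pixel sits at consecutive offsets in this plane,
               which is what lets a string instruction fill the whole run. */
            for (int x = first; x < x2; x += PLANES)
                cpu_write(v, x, y, color);
    }
}
```

Filling the 7 x 3 rectangle from (3,2) to (10,5) costs 21 plane selections
the first way and 4 the second, and both leave identical contents in the
modeled display memory.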

Listing Five is 2.5 times faster than Listing Four at clearing the screen on
a 20-MHz cached 386 with a Paradise VGA.  Although Listing Five is slightly
slower than an equivalent mode 13h fill routine would be, it's not grievously
so.  In general, performing plane-at-a-time operations can make almost any
mode X operation, at the worst, nearly as fast as the same operation in mode
13h (although this sort of mode X programming is admittedly fairly complex).
In this pursuit, it can help to organize data structures with mode X in mind.
For example, icons could be prearranged in system memory with the pixels
organized into four plane-oriented sets (or, again, in four sets per scan
line to avoid a fading-in effect) to facilitate copying to the screen a plane
at a time with REP MOVS.
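
One possible way to prearrange an image is sketched below.  The planarize
function is a hypothetical helper, not something from the listings; it
assumes a linear, row-major source image whose width is a multiple of four
and which will be drawn at an x coordinate that is also a multiple of four,
so that source column c always belongs to plane c mod 4.

```c
#include <assert.h>

/* Reorders a linear image into four plane-oriented sets: all of plane 0's
   pixels first (row by row), then plane 1's, plane 2's, and plane 3's.
   src and dst each hold width * height bytes and must not overlap. */
void planarize(const unsigned char *src, unsigned char *dst,
               int width, int height)
{
    assert(src != 0 && dst != 0);
    assert(width > 0 && width % 4 == 0 && height > 0);
    unsigned char *out = dst;
    for (int plane = 0; plane < 4; plane++)
        for (int y = 0; y < height; y++)
            for (int x = plane; x < width; x += 4)  /* every fourth pixel */
                *out++ = src[y * width + x];
}
```

Each quarter of the output can then be copied to the screen after a single
plane selection.  Swapping the two outer loops would give the
four-sets-per-scan-line arrangement instead.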

Hardware Assist from an Unexpected Quarter

Listing Five illustrates the benefits of designing code from a mode X
perspective; this is the software aspect of mode X optimization, which
suffices to make mode X about as fast as mode 13h.  That alone makes mode X
an attractive mode, given its square pixels, page flipping, and offscreen
memory, but superior performance would nonetheless be a pleasant addition to
that list.  Superior performance is indeed possible in mode X, although,
oddly enough, it comes courtesy of the VGA's hardware, which was never
designed to be used in 256-color modes.

All of the VGA's hardware assist features are available in mode X, although
some are not particularly useful.  The VGA hardware feature that's truly the
key to mode X performance is the ability to process four planes' worth of
data in parallel; this includes both the latches and the capability to fan
data out to any or all planes.  For rectangular fills, we'll just need to fan
the data out to various planes, so I'll defer a discussion of other hardware
features until another column.  (By the way, the ALUs, bit mask, and most
other VGA hardware features are also available in mode 13h -- but parallel
data processing is not.)

In planar modes, such as mode X, a byte written by the CPU to display memory
may actually go to anywhere between zero and four planes, as shown in Figure
2.  Each plane for which the setting of the corresponding bit in the Map Mask
register is 1 receives the CPU data, and each plane for which the
corresponding bit is 0 is not modified.
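
The fan-out rule amounts to a few lines of logic.  The sketch below models a
single display memory address as four bytes, one per plane; the fan_out
function is a made-up name for illustration and models only the Map Mask,
none of the VGA's other write-path hardware.

```c
#include <assert.h>

enum { PLANES = 4 };

/* Models one CPU byte write to one display memory address.  Planes whose
   Map Mask bit is 1 receive the byte; the others are left untouched.
   Returns the number of planes written (anywhere from zero to four). */
int fan_out(unsigned char planes_at_address[PLANES],
            unsigned char map_mask, unsigned char cpu_byte)
{
    assert((map_mask & 0xF0) == 0);  /* only the low 4 bits select planes */
    int written = 0;
    for (int p = 0; p < PLANES; p++) {
        if (map_mask & (1u << p)) {
            planes_at_address[p] = cpu_byte;
            written++;
        }
    }
    return written;
}
```

A Map Mask of 05h, for example, sends the byte to planes 0 and 2 only, while
0Fh sends it to all four.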

In 16-color modes, each plane contains one-quarter of each of eight pixels,
with the 4 bits of each pixel spanning all four planes.  Not so in mode X.
Look at Figure 1 again; each plane contains one pixel in its entirety, with
four pixels at any given address, one per plane.  Still, the Map Mask
register does the same job in mode X as in 16-color modes; set it to 0Fh (all
1-bits), and all four planes will be written to by each CPU access.  Thus, it
would seem that up to four pixels could be set by a single mode X byte-sized
write to display memory, potentially speeding up operations like rectangle
fills by four times.

And, as it turns out, four-plane parallelism works quite nicely indeed.
Listing Six is yet another rectangle-fill routine, this time using the Map
Mask to set up to four pixels per STOS.  The only trick to Listing Six is
that any left or right edge that isn't aligned to a multiple-of-four pixel
column (that is, a column at which one four-pixel set ends and the next
begins) must be clipped via the Map Mask register, because not all pixels at
the address containing the edge are modified.  Performance is as expected;
Listing Six is nearly ten times faster at clearing the screen than Listing
Four and just about four times faster than Listing Five -- and also about
four times faster than the same rectangle fill in mode 13h.  Understanding
the bitmap organization and display hardware of mode X does indeed pay.
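
The edge clipping can be sketched in C against a software model of the
planes.  This is an illustration of the technique, not a translation of
Listing Six: SimVga, fill_four_at_a_time, and the two lookup tables are
hypothetical, the rectangle's right and bottom edges are assumed to be
exclusive, and the model counts CPU writes rather than touching any hardware.

```c
#include <assert.h>
#include <string.h>

enum { PLANES = 4, SCREEN_WIDTH = 320, SCREEN_HEIGHT = 240,
       ROW_BYTES = SCREEN_WIDTH / PLANES,
       PAGE_BYTES = ROW_BYTES * SCREEN_HEIGHT };

typedef struct {
    unsigned char plane[PLANES][PAGE_BYTES];
    unsigned char map_mask;
    unsigned long cpu_writes;  /* byte writes issued by the "CPU" */
} SimVga;

void sim_reset(SimVga *v) { memset(v, 0, sizeof *v); }

static void cpu_write(SimVga *v, unsigned offset, unsigned char value)
{
    for (int p = 0; p < PLANES; p++)
        if (v->map_mask & (1u << p))
            v->plane[p][offset] = value;
    v->cpu_writes++;
}

unsigned char sim_pixel(const SimVga *v, int x, int y)
{
    return v->plane[x % PLANES][y * ROW_BYTES + x / PLANES];
}

/* Planes to keep at the left edge, indexed by x1 mod 4, and at the right
   edge, indexed by x2 mod 4 (x2 is one past the last pixel filled). */
static const unsigned char left_clip[4]  = { 0x0F, 0x0E, 0x0C, 0x08 };
static const unsigned char right_clip[4] = { 0x0F, 0x01, 0x03, 0x07 };

void fill_four_at_a_time(SimVga *v, int x1, int y1, int x2, int y2,
                         unsigned char color)
{
    assert(0 <= x1 && x1 <= x2 && x2 <= SCREEN_WIDTH);
    assert(0 <= y1 && y1 <= y2 && y2 <= SCREEN_HEIGHT);
    if (x1 == x2)
        return;
    int first = x1 / PLANES;        /* address holding the left edge */
    int last = (x2 - 1) / PLANES;   /* address holding the right edge */
    unsigned char left = left_clip[x1 % PLANES];
    unsigned char right = right_clip[x2 % PLANES];
    if (first == last)
        left &= right;              /* both edges share one address */
    for (int y = y1; y < y2; y++) {
        unsigned row = (unsigned)y * ROW_BYTES;
        v->map_mask = left;         /* clipped left edge */
        cpu_write(v, row + (unsigned)first, color);
        v->map_mask = 0x0F;         /* interior: four pixels per write */
        for (int a = first + 1; a < last; a++)
            cpu_write(v, row + (unsigned)a, color);
        if (last > first) {
            v->map_mask = right;    /* clipped right edge */
            cpu_write(v, row + (unsigned)last, color);
        }
    }
}
```

In this model, filling the 7 x 3 rectangle from (3,2) to (10,5) takes 9 byte
writes for 21 pixels: one clipped write at each edge and one full four-pixel
write in between on each of the three scan lines.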

Just so you can see mode X in action, Listing Seven (page 158) is a sample
program that selects mode X and draws a number of rectangles.  Listing Seven
links to any of the rectangle fill routines I've presented.

And now, I hope, you begin to see why I'm so fond of mode X.  Next month,
we'll continue with mode X by exploring the wonders that the latches and
parallel plane hardware can work on scrolls, copies, blits, and pattern
fills.

Notes From the Edsun Front

Comments coming my way indicate a great deal of programmer interest in the
Edsun CEG/DAC, of which I wrote in April and May.  However, everyone who has
actually programmed the CEG/DAC complains about how hard it is; the results
are nice, but the process of getting there is anything but.  Nonetheless,
programming the CEG/DAC is certainly a solvable problem, and whoever solves
it best will come out looking mighty good.  A fair analogy is writing active
TSRs.  Six years ago, TSR-writing was black magic, and Sidekick, primitive by
today's standards, made a fortune.  Today, any dope can choose from dozens of
books and toolkits and make a rock-solid TSR in a few hours.  As programmers
develop better tools and a better understanding of the CEG/DAC, the grumbling
will subside, and the software will take off.  Another case of complexity
providing opportunity.

Book of the Month

This month's book is Advanced Programmer's Guide to SuperVGAs, by Sutty and
Blair (Brady, 1990, ISBN 0-13-010455-8; $44.95).  Pricey for softcover, but
included in that price is a diskette of SuperVGA assembly code (which I have
not tried out).  This book is the single best guide I've seen to the
Byzantine world of SuperVGA programming, where every one of dozens of VGA
models has different mode numbers and banking schemes.  Take it from someone
who's waded through a slew of chip databooks and applications notes -- this
book will save you a lot of time and aggravation if you have to program
SuperVGAs directly.

Still, not everything I'd like to see is in there.  For example, they cover
only the Tseng Labs ET3000 chip, not the now widely used ET4000 that supports
15-bpp graphics.  That's not the authors' fault, of course; it's a reflection
of the incredible diversity and rate of change in the SuperVGA arena.

Mode X.  The Edsun CEG/DAC.  SuperVGA programming.  In complexitate est
opportunitas.  Q.E.D.