Journal:   Dr. Dobb's Journal  June 1992 v17 n6 p139(7)
-----------------------------------------------------------------------------
Title:     Fast antialiasing. (Column)
Author:    Abrash, Michael.
AttFile:    Program:  GP-JUN92.ASC  Source code listing.

Abstract:  Advanced computer graphics techniques require powerful computers
           and dedicated systems to handle the output, but utility
           microcomputers based on Intel's 80386 and 80486 microprocessors
           become burdened when trying to compute such graphics features as
           anti aliasing.  Anti aliasing involves smoothing lines and edges
           to remove the jagged appearance.  Anti aliasing makes computer
           images more attractive and aids in making technical drawings more
           accurate.  An algorithm developed by Xiaolin Wu describing line
           anti aliasing is described.  The algorithm brackets either side of
           a line's pixel with pixels of differing values of color.  The sum
           intensity of the bordering pixels equals the intensity of the
           line's pixel.
-----------------------------------------------------------------------------
Descriptors..
Topic:     Tutorial
           Computer Graphics
           Programming
           Program Development Techniques
           Rendering
           Dithering
           Algorithms.
Feature:   illustration
           chart
           program.

-----------------------------------------------------------------------------
Full Text:

The thought first popped into my head as I unenthusiastically picked through
the salad bar at a local "family" restaurant, trying to decide whether the
meatballs, the fried clams, or the lasagna was likely to shorten my life the
least.  I decided on the chicken in mystery sauce.

The thought recurred when my daughter asked, "Dad, is that fried chicken?"

"I don't think so," I said.  "I think it's stewed chicken."

"It looks like fried chicken."

"Maybe it's fried, stewed chicken," my wife volunteered hopefully.  I took a
bite.  It was, indeed, fried, stewed chicken.  I can now, unhesitatingly and
without reservation, recommend that you avoid fried, stewed chicken at all
costs.

The thought I had was as follows: This is not good food.  Not a profound
thought, but it raises an interesting question: Why was I eating in this
restaurant?  The answer, to borrow a phrase from E.F.  Schumacher, is
appropriate technology.  For a family on a budget, with a small child, tired
of starting at each other over the kitchen table, this was a perfect place to
eat.  It was cheap, it had greasy food and ice cream, no one cared if
children dropped things or talked loudly or walked around, and most important
of all, it wasn't home.  So what if the food was lousy?  Good food was a
luxury, a bonus; everything on the above list was necessary.  A family
restaurant was the appropriate dining-out technology, given the parameters
within which we had to work.

When I read through SIGGRAPH proceedings and other state-of-the-art
computer-graphics material, all too often I feel like I'm dining at a
four-star restaurant with two-year-old triplets and an empty wallet.  We're
talking incredibly inappropriate technology for PC graphics here.  Sure, I
say to myself as I read about an antialiasing technique, that sounds
wonderful--if I had 24-bpp color, and dedicated hardware to do the
processing, and all day to wait to generate one image.  Yes, I think, that is
a good way to do hidden surface removal--in a system with hardware
z-buffering.  Most of the stuff in Computer Graphics is riveting, but, alas,
pretty much useless on PCs.  When an 80x86 has to do all the work, speed
becomes the overriding parameter, especially for real-time graphics.

Literature that's applicable to fast PC graphics is hard enough to find, but
what we'd really like is above-average image quality combined with terrific
speed, and there's almost no literature of that sort around.  There is some,
though, and you folks are right on top of it.  For example, alert reader
Michael Chaplin, of San Diego, wrote to suggest that I might enjoy the
line-antialiasing algorithm presented in Xiaolin Wu's article, "An Efficient
Antialiasing Technique," in the July 1991 issue of Computer Graphics.
Michael was dead-on right.  This is a great algorithm, combining excellent
antialiased line quality with speed that's close to that of non-antialiased
Bresenham's line drawing.  This is the sort of algorithm that makes you want
to go out and write a wireframe animation program, just so you can see how
good those smooth lines look in motion.  Wu antialiasing is a wonderful
example of what can be accomplished on inexpensive, mass-market hardware with
the proper programming perspective.  In sort, it's a splendid example of
appropriate technology for PCs.

Wu Antialiasing

To recap briefly, antialiasing is the process of smoothing lines and edges so
that they appear less jagged.  Antialiasing is partly an aesthetic issue,
because it makes images more attractive.  It's partly an accuracy issue,
because it makes it possible to position and draw images with effectively
more precision than the resolution of the display.  Finally, it's partly a
flat-out necessity, to avoid the horrible, crawling, jagged edges of temporal
aliasing when performing animation.

The basic premise of Wu antialiasing is almost ridiculously simple: As the
algorithm steps one pixed unit at a time along the major (longer) axis of a
line, it draws the two pixels bracketing the line along the minor acis at
each point.  Each of the two bracketing pixels is drawn with a weighted
fraction of the full intensity of the drawing color, with the weighting for
each pixel equal to one minus the pixel's distance along the minor axis from
the ideal line.  Figure 1 illustrates this concept.

The intensities of the two pixels that bracket the line are selected so that
they always sum to exactly 1; that is, to the intensity of one fully
illuminated pixel of the drawing color.  The presence of aggregate full-pixel
intensity means that at each step, the line has the same brightness it would
have if a single pixel were drawn at precisely the correct location.
Moreover, thanks to the distribution of the intensity weighting, that
brightness is centered at the ideal line.  Not coincidentally, a line drawn
with pixel pairs of aggregate single-pixel intensity, centered on the ideal
line, is perceived by the eye not as a jagged collection of pixel pairs, but
as a smooth line centered on the ideal line.  Thus, by weighting the
bracketing pixels properly at each step, we can readily produce what looks
like a smooth line at precisely the right location, rather than the jagged
pattern of line segments that non-antialiased line-drawing algorithms such as
Bresenham's trace out.

You might expect that the implementation of Wu antialiasing would fall into
two distinct areas: tracing out the line (that is, finding the appropriate
pixel pairs to draw) and calculating the appropriate weightings for each
pixel pair.  Not so, however.  The weighting calculations involved only a few
shifts, XORs, and adds; for all practical purposes, tracing and weighting are
rolled into one step--and a very fast step it is.  How fast is it?  On a
33-MHz 486 with a fast VGA, a good but not maxed-out assembler implementation
of Wu antialiasing draws a more than respectable 5000 150-pixel-long vectors
per second.  That's especially impressive considering that about 1,500,000
actual pixels are drawn per second, meaning that Wu antialiasing is drawing
at over 50 percent of the maximum memory bandwidth--and hence more than half
the faster theoretically possible drawing speed--of an AT-bus VGA.  In short,
Wu antialiasing is about as fast an antialiased line approach as you could
ever hope to find for the VGA.

Tracing and Intensity in One

Horizontal, vertical, and diagonal lines do not require Wu antialiasing
because they pass through the center of every pixel they meet; such lines can
be drawn with fast, special-case code.  For all other cases, Wu lines are
traced out one step at a time along the major axis by means of a simple,
fixed-point algorithm.  The move along the minor axis with respect to a
one-pixel move along the major axis (the line slope for lines with slopes
less than 1, 1/slope for lines with slopes greater than 1) is calculated with
a single integer divide.  This value, called the "error adjust," is stored as
a fixed-point fraction, in 0.16 format (that is, all bits are fractional, and
the decimal point is just to the left of bit 15).  An error accumulator, also
in 0.16 format, is initialized to 0.  Then the first pixel is drawn; no
weighting is needed, because the line intersects its endpoints exactly.

Now the error adjust is added to the error accumulator.  The error
accumulator indicates how far between pixels the line has progressed along
the minor axis at any given step; when the error accumulator turns over, it's
time to advance one pixel along the minor axis.  At each step along the line,
the major-axis coordinate advances by one pixel.  The two bracketing pixels
to draw are simply the two pixels nearest the line along the minor axis.  For
instance, if X is the current major-axis coordinate and Y is the current
minor-axis coordinate, then the two pixels to be drawn are (X,Y) and (X,Y +
1).  In short, the derivation of the pixels at which to draw involves nothing
more complicated than advancing one pixel along the major axis, adding the
error adjust to the error accumulator, and advancing one pixel along the
minor axis when the error accumulator turns over.

So far, nothing special; but now we come to the true wonder of Wu
antialiasing.  We know which pair of pixels to draw at each step along the
line, but we also need to generate the two proper intensities, which must be
inversely proportional to distance from the ideal line and sum to 1, and
that's a potentially time-consuming operation.  Let's assume, however, that
the number of possible intensity levels to be used for weighting is the value
NumLevels = [2.sup.n] for some integer n, with the minimum weighting (0
percent intensity) being the value [2.sup.n-1], and the maximum weighting
(100 percent intensity) being the value 0.  Given that, lo and behold, the
most significant n bits of the error accumulator select the proper intensity
value for one element of the pixel pair, as shown in Figure 2.  Better yet,
[2.sup.n-1] minus the intensity of the first pixel selects the intensity of
the other pixel in the pair, because the intensities of the two pixels must
sum to 1; as it happens, this result can be obtained simply by flipping the n
least-significant bits of the first pixel's value.  All this works because
what the error accumulator accumulates is precisely the ideal line's current
distance between the two bracketing pixels.

The intensity calculations take longer to describe than they do to perform.
All that's involved is a shift of the error accumulator to right-justify the
desired intensity weighting bits, and then an XOR to flip the
least-significant n bits of the first pixel's value in order to generate the
second pixel's value.  Listing One (page 154) illustrates just how efficient
Wu antialiasing is; the intensity calculations take only three statements;
and the entire Wu line-drawing loop is only nine statements long.  Of course,
a single C statement can hide a great deal of complexity, but Listing Six
(page 156), the inner loop of an assembler implementation, shows that only 15
instructions are required per step along the major axis--and the number of
instructions could be reduced to ten by special-casing and loop unrolling.
Make no mistake about it, Wu antialiasing is fast.

Sample Wu Antialiasing

The true test of any antialiasing technique is how good it looks, so lt's
have a look at Wu antialiasing in action.  Listing One is a C implementation
of Wu antialiasing.  Listing Two (page 155) is a sample program that draws a
variety of Wu-antialiasing lines, followed by non-antialiased lines, for
comparison.  Listing Three (page 156) contains DrawPixel() and SetMode()
functions for mode 13h, the VGA's 320x200 256-color mode.  Finally, Listing
Four (page 156) is a simple, non-antialiased line-drawing routine.  Link
these four listings together and run the resulting program to see both
Wu-antialiased and non-antialiased lines.

Listing One isn't particularly fast, because it calls DrawPixel() for each
pixel.  On the other hand, DrawPixel() makes it easy to try out Wu
antialiasing in a variety of modes; just adapt the code in Listing Three for
the 256-color mode you want to support.  For example, Listing Five (page 156)
shows code to draw Wu-antialiased lines in 640x480 256-color mode on a
Super-VGA built around the Tseng Labs ET4000 chip with at least 512K of
display memory installed.  It's well worth checking out Wu antialiasing at
640x480.  Although antialiased lines look much smoother than normal lines at
320 x 200 resolution, they're far from perfect, because the pixels are so big
that the eye can't blend them properly.  At 640x480, however, Wu-antialiased
lines look fabulous; from a couple of feet away, they look as straight and
smooth as if they were drawn with a ruler.

Listing One requires that the DAC palette be set up so that a NumLevel-long
block of palette entries contains linearly decreasing intensities of the
drawing color.  The size of the block is programmable, but must be a power of
two.  The more intensity levels, the better.  Wu says that 32 intensities is
enough; on my system, eight and even four levels looked pretty good.  I found
that gamma correction, which gives linearly spaced intensity steps, improved
antialiasing quality significantly.  Fortunately, we can program the palette
with gamma-corrected values, so our drawing code doesnt's have to do eny
extra work.

Listing One isn't very fast, so I implemented Wu antialiasing in assembler,
hard-coded for mode 13h.  The inner loop of the assmbler code is shown in
Listing Six; the full assembler routine will be available as part of the code
archive from this issue of DDJ.  (See "Availability" on page 3.)  High-speed
graphics code and fast VGAs go together like peanut butter and jelly, which
is to say very well indeed; the assembler implementation ran more than twice
as fast on my 486 after I ran the SETBUS utility from last month to put the
VGA into 16-bit mode.  Enough said!

Notes on Wu Antialiasing

Wu antialiasing can be applied to any curve for which it's possible to
calculate at each step the positions and intensities of two bracketing
pixels, although the implementation will generally be nowhere near as
efficient as it is for lines.  However, Wu's anticle in Computer Graphics
does describe an efficient algorithm for drawing antialiased circles.  Wu
also describes a technique for antialiasing solids, such as filled circles
anf polygons.  Wu's approach biases the edges of filled objects outward.
Although this is no good for adjacent polygons of the sort used in rendering,
it's certainly possible to design a more accurate polygon-antialiasing
approach around Wu's basic weighting technique.  The results would not be
quite so good as more sophisticated antialiasing techniques, but they woudl
be much faster.

In general, in fact, the results obtained by Wu antialiasing are only so-so,
by theoretical measures.  Wu antialiasing amounts to a simple box filter
placed over a step approximation of a line, and that process introduces a
good deal of deviation from the ideal.  On the other hand, Wu notes that even
a 10 percent error in intensity doesn't lead to noticeable loss of image
quality, and for Wu-antialiased lines up to 1K pixels in length, the error is
under 10 percent.  If it looks good, it is good--and it looks good.  With a
16-bit error accumulator, fixed-point inaccuracy becomes a problem for
Wu-antialiased lines longer than 1K.  For such lines, you should switch to
using 32-bit error values, which would let you handle lines of any practical
length.

In the listings, I have chosen to truncate, rather than round, the
error-adjust value.  This increases the intensity error of the line but
guarantees that fixed-point inaccuracy won't cause the minor axis to advance
past the endpoint.  Over-running the endpoint would result in the drawing of
pixels outside the line's bounding box, and potentially even in an attempt to
access pixels off the edge of the bitmap.

Finally, I should mention tha, as published, Wu's algorithm draws lines
symmetrically, from both ends at once.  I haven't done this for a number of
reasons, not least of which is that symmetric drawing is an inefficient way
to draw lines that span banks on banked Super-VGAs.  Banking aside, however,
symmetric drawing is potentially faster, because it eliminates half of all
calculations; in so doing, it cuts intensity error in half, as well.

Matrix Orthogonalization Made Easy

Peter Brooks of MicroMind passed along the following nifty idea.  Fixed-point
rotation matrices tend to drift from the proper values over the course of
repeated concatenations, due to fixed-point error, with the colums gradually
ceasing to be mutually perpendicular  (orthogonal).  It is essential that
rotation matrices remain orthogonal, however, in order to perform
nondistorting rotations.  One way to deal with this is to periodically
recalculate one of the column as the cross-product of the other two, which
works because the cross-product of two vectors is an orthogonal vector.

Peter's insight was: Why not recalculate one colum as the cross-product of
the other two every time you recalculate a rotation matrix?  In other words,
do the matrix multiplication to generate two columns of the result matrix,
then just calculate the third column as the cross-product of the other two.
The row or column calculated as the cross-product should be rotated
regularly.  Not only does this guarantee an orthogonal matrix at all times,
but it also turns out to be faster, because it takes fewer operations to
calculate a cross-product than to recalculate a matrix column.  (All of the
above applies equally well if you substitute "row" for "column.")

This sounds like a great idea to me.  My only question is whether one should
alternate between rows and columns, in order to distribute error more evenly.
Any thoughts, readers?

Next Time

We didn't quite make it back to X-Sharp this month, although the topics were
certainly related.  Next month, for sure.  In the meantime, keep those
excellent suggestions coming.  And stay the heck away from fried, stewed
chicken.