Journal:   Dr. Dobb's Journal  Feb 1992 v17 n2 p125(8)
-----------------------------------------------------------------------------
Title:     More 3-D animation. (Graphics Programming)(column) (Tutorial)
Author:    Abrash, Michael.
AttFile:   Program: 3D.EXE  Self extracting archive.
           Program: GP-FEP92.ASC  Source code listing.

Abstract:  Computer animation, especially three-dimensional animation, is as
much the art of illusion as a mathematical procedure. The programmer must
compensate for imperfections such as 'jaggies' and flicker, which the viewer
would notice, while other imperfections are 'deleted' by the human brain,
which makes certain assumptions when viewing a scene. One-sided polygons,
drawn only when their outside face is toward the viewer, are discussed; the
orientation test is based on the cross-product of two of a polygon's edge
vectors. Removing polygons that are not facing the viewer solves all 'hidden
surface' problems for a single convex polyhedron, but overlapping polyhedrons
require other methods. Sample programs that perform backface removal,
incremental transformation, and other animation algorithms are presented.
-----------------------------------------------------------------------------
Descriptors..
Topic:     Animation
           Tutorial
           Programming Instruction
           Three-Dimensional Graphics
           Programming
           Graphics Systems.
Feature:   illustration
           program.
-----------------------------------------------------------------------------
Full Text:

As I'm fond of pointing out, computer animation isn't a matter of
mathematically exact modeling or raw technical prowess, but rather of fooling
the eye and the mind. That's especially true for 3-D animation, where we're
not only trying to convince viewers that they're seeing objects on a
screen -- when in truth that screen contains no objects at all, only gaggles
of pixels -- but we're also trying to create the illusion that the objects
exist in three-space, possessing four dimensions (counting movement over
time) of their own. To make this magic happen, we must provide cues for the
eyes not only to pick out boundaries, but also to detect depth, orientation,
and motion. This involves perspective, shading, proper handling of hidden
surfaces, and rapid and smooth screen updates; the whole deal is considerably
more difficult to pull off on a PC than 2-D animation.

And yet, in some senses, 3-D animation is easier than 2-D. Because there's
more going on in 3-D animation, the eye and brain tend to make more
assumptions, and so are more apt to see what they expect to see, rather than
what's actually there. If you're piloting a (virtual) ship through a field of
thousands of asteroids at high speed, you're unlikely to notice if the more
distant asteroids occasionally seem to go right through each other, or if the
topographic detail on the asteroids' surfaces sometimes shifts about a bit.
You'll be busy viewing the asteroids in their primary role, as objects to be
navigated around, and the mere presence of topographic detail will suffice;
without being aware of it, you'll fill in the blanks. Your mind will see the
topography peripherally, recognize it for what it is supposed to be, and,
unless that landscape does something really obtrusive such as vanishing
altogether or suddenly shooting a spike miles into space, you will see what
you expect to see: a bunch of nicely detailed asteroids tumbling around you.

To what extent can you rely on the eye and mind to make up for imperfections
in the 3-D animation process?
In some areas, hardly at all; for example, jaggies crawling along edges stick
out like red flags, and likewise for flicker. In other areas, though, the
human perceptual system is more forgiving than you'd think. Consider this: At
the end of Return of the Jedi, in the battle to end all battles around the
Death Star, there is a sequence of about five seconds in which several
spaceships are visible in the background. One of those spaceships (and it's
not very far in the background, either) looks a bit unusual. What it looks
like is a sneaker. In fact, it is a sneaker -- but unless you know to look
for it, you'll never notice it, because your mind is busy making simplifying
assumptions about the complex scene it's seeing -- and one of those
assumptions is that medium-sized objects floating in space are spaceships,
unless proven otherwise. (Thanks to Chris Hecker for pointing this out. I'd
never have noticed the sneaker myself without being tipped off -- which is,
of course, the whole point.)

If it's good enough for George Lucas, it's good enough for us. And with that,
let's resume our quest for real-time 3-D animation on the PC.

One-sided Polygons: Backface Removal

Last month, we implemented the basic polygon drawing pipeline, transforming a
polygon all the way from its basic definition in object space, through the
shared 3-D world space, and into the 3-D space as seen from the viewpoint,
called "view space." From view space, we performed a perspective projection
to convert the polygon into screen space, then mapped the transformed and
projected vertices to the nearest screen coordinates and filled the polygon.
Armed with code that implemented this pipeline, we were able to watch as a
polygon rotated about its Y axis, and were able to move the polygon around in
space freely.
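In outline, the per-vertex work of that pipeline looks something like the
following sketch. This is not the code from last month's listings; the matrix
convention, the PROJECTION_RATIO constant, and the names XformVec and
ProjectVec are assumptions made purely for illustration.

/* Sketch of the per-vertex pipeline: a 4x4 transformation carries a
   homogeneous point from one space to the next (object -> world -> view,
   or object -> view directly with a concatenated matrix), and a
   perspective projection then produces screen space coordinates.
   PROJECTION_RATIO is a hypothetical viewing-distance factor. */
#define PROJECTION_RATIO 2.0

typedef struct { double X, Y, Z, W; } Point4;

/* Multiply a 4x4 transformation matrix by a column-vector point. */
void XformVec(const double Xform[4][4], const Point4 *in, Point4 *out)
{
   out->X = Xform[0][0]*in->X + Xform[0][1]*in->Y +
            Xform[0][2]*in->Z + Xform[0][3]*in->W;
   out->Y = Xform[1][0]*in->X + Xform[1][1]*in->Y +
            Xform[1][2]*in->Z + Xform[1][3]*in->W;
   out->Z = Xform[2][0]*in->X + Xform[2][1]*in->Y +
            Xform[2][2]*in->Z + Xform[2][3]*in->W;
   out->W = 1.0;
}

/* Perspective-project a view space point into screen space; X and Y shrink
   in proportion to the point's distance from the viewer. */
void ProjectVec(const Point4 *in, Point4 *out)
{
   out->X = in->X * PROJECTION_RATIO / in->Z;
   out->Y = in->Y * PROJECTION_RATIO / in->Z;
   out->Z = in->Z;
   out->W = 1.0;
}

The screen space X and Y that come out of the projection are then scaled,
offset to the center of the screen, and rounded to the nearest integer pixel
coordinates before the polygon is filled.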
One of the drawbacks of last month's approach was that the polygon had two
visible sides. Why is that a drawback? Well, it isn't, necessarily, but in
our case we want to use polygons to build solid objects with continuous
surfaces, and in that context, only one side of a polygon is ever visible;
the other side always faces the inside of the object, and can never be seen.
It would save time and simplify the process of hidden surface removal if we
could quickly and easily determine whether the inside or outside face of each
polygon was facing us, so that we could draw each polygon only if it were
visible (that is, had the outside face pointing toward the viewer). On
average, half the polygons in an object could be instantly rejected by a test
of this sort. Such testing of polygon visibility goes by a number of names in
the literature, including backplane culling, backface removal, and assorted
variations thereon; I'll refer to it as backface removal.

For a single convex polyhedron, removal of polygons that aren't facing the
viewer would solve all hidden surface problems. In a convex polyhedron, any
polygon facing the viewer can never be obscured by any other polygon in that
polyhedron; this falls out of the definition of a convex polyhedron.
Likewise, any polygon facing away from the viewer can never be visible.
Therefore, in order to draw a convex polyhedron, if you draw all polygons
facing toward the viewer but none facing away from the viewer, everything
will work out properly, with no additional checking for overlap and hidden
surfaces needed.

Unfortunately, backface removal completely solves the hidden surface problem
only for convex polyhedrons, and only if there's a single convex polyhedron
involved; when convex polyhedrons overlap, other methods must be used.
Nonetheless, backface removal does instantly halve the number of polygons to
be handled in rendering any particular scene. Backface removal can also speed
hidden-surface handling if objects are built out of convex polyhedrons, as
we'll see in a future column. This month, though, we have only one convex
polyhedron to deal with, so backface removal alone will do the trick.

Given that I've convinced you that backface removal would be a handy thing to
have, how do we actually do it? A logical approach, often implemented in the
PC literature, would be to calculate the plane equation for the plane in
which the polygon lies, and see which way the normal (perpendicular) vector
to the plane points. That works, but there's a more efficient way to
calculate the normal to the polygon: as the cross-product of two of the
polygon's edges.

The cross-product of two vectors is defined as the vector shown in Figure 1;
for vectors U and V, it works out to (UyVz - UzVy, UzVx - UxVz, UxVy - UyVx).
One interesting property of the cross-product vector is that it is
perpendicular to the plane in which the two original vectors lie. If we take
the cross-product of the vectors that form two edges of a polygon, the result
will be a vector perpendicular to the polygon; then, we'll know that the
polygon is visible if and only if the cross-product vector points toward the
viewer.

We need one more thing to make the cross-product approach work, though. The
cross-product can actually point either way, depending on which edges of the
polygon we choose to work with and the order in which we evaluate them, so we
must establish some conventions for defining polygons and evaluating the
cross-product. We'll define only convex polygons, with the vertices defined
in clockwise order, as viewed from the outside; that is, if you're looking at
the visible side of the polygon, the vertices will appear in the polygon
definition in clockwise order. With those assumptions, the cross-product
becomes a quick and easy indicator of polygon orientation with respect to the
viewer; we'll calculate it as the cross-product of the first and last vectors
in a polygon, as shown in Figure 2, and if it's pointing toward the viewer,
we'll know that the polygon is visible. Actually, we don't even have to
calculate the entire cross-product vector, because the Z component alone
suffices to tell us which way the polygon is facing: positive Z means
visible, negative Z means not. The Z component can be calculated very
efficiently, with only two multiplies and a subtraction.

The question remains of the proper space in which to perform backface
removal. There's a temptation to perform it in view space, which is, after
all, the space defined with respect to the viewer, but view space is not a
good choice. Screen space -- the space in which perspective projection has
been performed -- is the best choice. The purpose of backface removal is to
determine whether each polygon is visible to the viewer, and, despite its
name, view space does not provide that information; unlike screen space, it
does not reflect perspective effects. Backface removal may also be performed
using the polygon vertices in screen coordinates, which are integers. This is
less accurate than using the screen space coordinates, which are floating
point, but is, by the same token, faster.
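As a concrete illustration, here is a minimal sketch of that test, applied to
a polygon's vertices in integer screen coordinates. It assumes the clockwise,
viewed-from-outside vertex ordering described above; the Point type and the
PolygonIsVisible name are illustrative assumptions, not the code of Listing
Three, and the sense of the final comparison depends on the coordinate
conventions in use.

/* Sketch: decide polygon visibility from the Z component of the
   cross-product of the polygon's first and last edges. The full
   cross-product never needs to be formed; the Z component alone costs
   two multiplies and a subtraction. */
typedef struct { int X, Y; } Point;         /* integer screen coordinates */

int PolygonIsVisible(const Point *verts, int numVerts)
{
   /* First edge: from vertex 0 to vertex 1 (cast to long so the
      multiplies below can't overflow a 16-bit int). */
   long v1x = (long)verts[1].X - verts[0].X;
   long v1y = (long)verts[1].Y - verts[0].Y;
   /* Last edge: from vertex 0 to the final vertex. */
   long v2x = (long)verts[numVerts - 1].X - verts[0].X;
   long v2y = (long)verts[numVerts - 1].Y - verts[0].Y;
   /* Z component of the cross-product; its sign gives the facing. */
   return (v1x * v2y - v1y * v2x) > 0;
}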
In Listing Three, backface removal is performed in screen coordinates in the
interest of speed. Backface removal, as implemented in Listing Three, will
not work reliably if the polygon is not convex, if the vertices don't appear
in clockwise order as viewed from the outside, if either the first or last
edge in a polygon has zero length, or if the first and last edges are
congruent. These latter two points are the reason it's preferable to work in
screen space rather than screen coordinates (which suffer from rounding
problems), speed considerations aside.

Backface Removal in Action

Listings One through Five together form a program that rotates a solid cube
in real time under user control. Listing One (page 147) is the main program;
Listing Two (page 147) performs transformation and projection; Listing Three
(page 147) performs backface removal and draws visible faces; Listing Four
(page 148) concatenates incremental rotations to the object-to-world
transformation matrix; Listing Five (page 150) is the general header file.
Also required from previous columns are: Listings One and Two from last month
(draw clipped line list, matrix math functions); Listings One and Six from
July 1991 (mode X mode set, rectangle fill); Listing Six from September 1991;
Listing Four from March 1991 (polygon edge scan); and the FillConvexPolygon()
function from Listing One from February 1991. All necessary modules, along
with a make file, will be available as part of the code from this issue.

The sample program places a cube, floating in three-space, under the complete
control of the user. The arrow keys may be used to move the cube left, right,
up, and down, and the A and T keys may be used to move the cube away from or
toward the viewer. The F1 and F2 keys perform rotation around the Z axis, the
axis running from the viewer straight into the screen. The 4 and 6 keys
perform rotation around the Y (vertical) axis, and the 2 and 8 keys perform
rotation around the X axis, which runs horizontally across the screen; the
latter four keys are most conveniently used by flipping the keypad to the
numeric state.

The demo involves six polygons, one for each side of the cube. Each of the
polygons must be transformed and projected, so it would seem that 24 vertices
(four for each polygon) must be handled, but some steps have been taken to
improve performance. All vertices for the object have been stored in a single
list; the definition of each face contains not the vertices for that face
themselves, but rather indexes into the object's vertex list, as shown in
Figure 3. This reduces the number of vertices to be manipulated from 24 to 8,
for there are, after all, only eight vertices in a cube, with three faces
sharing each vertex. In this way, the transformation burden is lightened by
two-thirds.
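The arrangement Figure 3 describes might be declared along these lines; this
is only a sketch of the idea, and the type and field names (Point3, Face,
Object, and so on) are assumptions for illustration rather than the actual
declarations in Listing Five.

/* Sketch: an object built from a shared vertex pool. Each face stores
   indexes into that pool instead of its own copies of the vertices, so
   each of the cube's eight vertices is transformed and projected exactly
   once, no matter how many faces share it. */
typedef struct { double X, Y, Z; } Point3;

typedef struct {
   int *VertNums;      /* indexes into the object's vertex list, in       */
   int  NumVerts;      /*  clockwise order as seen from the outside       */
   int  Color;         /* color in which to draw the face                 */
} Face;

typedef struct {
   int     RecalcXForm;          /* nonzero when the object has moved and  */
                                 /*  its vertices must be re-transformed   */
   int     NumVerts;
   Point3 *VertexList;           /* vertices in object space               */
   Point3 *XformedVertexList;    /* the same vertices, cached in view space*/
   Point3 *ProjectedVertexList;  /* cached after perspective projection    */
                                 /* (a fourth, integer screen-coordinate   */
                                 /*  form is also kept, as described under */
                                 /*  "Object Representation" below)        */
   int     NumFaces;
   Face   *FaceList;             /* six four-vertex faces for the cube     */
} Object;

With a layout like this, transforming the cube touches eight vertices rather
than 24, and drawing a face is a matter of looking its vertices up through
VertNums in whichever cached list is appropriate.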
Also, as mentioned earlier, backface removal is performed with integers, in
screen coordinates, rather than with floating-point values in screen space.
Finally, the RecalcXForm flag is set whenever the user changes the
object-to-world transformation. Only when this flag is set is the full
object-to-view transformation recalculated and the object's vertices
transformed and projected again; otherwise, the values already stored within
the object are reused. In the sample application, this brings no visible
improvement, because there's only the one object, but the underlying
mechanism is sound: In a full-blown 3-D animation application, with multiple
objects moving about the screen, it would help a great deal to flag which of
the objects had moved with respect to the viewer, performing a new
transformation and projection only for those that had.

With the above optimizations, the sample program is certainly adequately
responsive on a 20-MHz 386 (sans 387; I'm sure it's wonderfully responsive
with a 387). Still, it couldn't quite keep up with the keyboard when I
modified it to read only one key each time through the loop -- and we're
talking about only eight vertices here. This indicates that we're already
near the limit of animation complexity possible with our current approach.
It's time to start rethinking that approach; over two-thirds of the overall
time is spent in floating-point calculations, and it's there that we'll begin
to attack the performance bottleneck we find ourselves up against.

Incremental Transformation

Listing Four contains three functions; each concatenates an additional
rotation around one of the three axes to an existing rotation. In order to
improve performance, only the matrix entries that are affected by a rotation
around each particular axis are recalculated (all but four of the entries in
a single-axis rotation matrix are either 0 or 1, as shown last month). This
cuts the number of floating-point multiplies from the 64 required for the
multiplication of two 4x4 matrices to just 12, and the number of
floating-point adds from 48 to 6.

Be aware that Listing Four performs an incremental rotation on top of
whatever rotation is already in the matrix. The cube may already have been
turned left, right, up, down, and sideways; regardless, Listing Four just
tacks the specified rotation onto whatever already exists. In this way, the
object-to-world transformation matrix contains a history of all the rotations
ever specified by the user, concatenated one after another onto the original
matrix. Potential loss of precision is a problem associated with using such
an approach to represent a very long concatenation of transformations,
especially with fixed-point arithmetic; that's not a problem for us yet, but
we'll run into it eventually.
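To make the savings concrete, here is a sketch of how such a function might
concatenate a rotation about the Y axis. The name AppendRotationY and the
column-vector matrix convention are assumptions for illustration; the actual
code is in Listing Four, and which entries change (and the signs involved)
depend on the convention chosen.

#include <math.h>

/* Sketch: concatenate a rotation of Angle radians about the Y axis onto an
   existing 4x4 transformation, in place. Under this convention only the six
   entries in columns 0 and 2 of the top three rows can change, so only
   those are recalculated: 12 multiplies and 6 adds, instead of the 64 and
   48 of a general 4x4 matrix multiplication. */
void AppendRotationY(double XformToChange[4][4], double Angle)
{
   double CosTemp = cos(Angle), SinTemp = sin(Angle);
   int i;

   for (i = 0; i < 3; i++) {
      double Temp0 = CosTemp * XformToChange[i][0] -
                     SinTemp * XformToChange[i][2];
      double Temp2 = SinTemp * XformToChange[i][0] +
                     CosTemp * XformToChange[i][2];
      XformToChange[i][0] = Temp0;
      XformToChange[i][2] = Temp2;
   }
}

Rotations about the X and Z axes follow the same pattern, each touching a
different set of six entries.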
A Note on Rounding Negative Numbers

Last month, I added 0.5 and truncated in order to round values from
floating-point to integer format. In Listing Two this month, I've switched to
adding 0.5 and using the floor() function. For positive values, the two
approaches are equivalent; for negative values, only the floor() approach
works properly. (For example, -0.6 should round to -1: adding 0.5 gives -0.1,
which floor() correctly turns into -1, but which truncation turns into 0.)

Object Representation

Each object consists of a list of vertices and a list of faces, with the
vertices of each face defined by pointers into the vertex list; this allows
each vertex to be transformed exactly once, even though several faces may
share a single vertex. Each object contains the vertices not only in their
original, untransformed state, but in three other forms as well: transformed
to view space, transformed and projected to screen space, and converted to
screen coordinates. Earlier, we saw that it can be convenient to store the
screen coordinates within the object, so that if the object hasn't moved with
respect to the viewer, it can be redrawn without the need for recalculation,
but why bother storing the view and screen space forms of the vertices as
well?

The screen space vertices are useful for some sorts of hidden surface
removal. For example, in order to determine whether two polygons overlap as
seen by the viewer, you must first know how they look to the viewer,
accounting for perspective; screen space provides that information. (So do
the final screen coordinates, but with less accuracy, and without any Z
information.) The view space vertices are useful for collision and proximity
detection; screen space can't be used here, because objects are distorted by
the perspective projection into screen space. World space would serve as well
as view space for collision detection, but because it's possible to transform
directly from object space to view space with a single matrix, it's often
preferable to skip over world space altogether. It's not mandatory that
vertices be stored for all these different spaces, but the coordinates in all
those spaces have to be calculated as intermediate steps anyway, so we might
as well keep them around for those occasions when they're needed.

Coming up: shading, hidden surfaces, and performance.