Journal:   Dr. Dobb's Journal  Feb 1992 v17 n2 p125(8)
-----------------------------------------------------------------------------
Title:     More 3-D animation. (Graphics Programming)(column) (Tutorial)
Author:    Abrash, Michael.
AttFile:   Program: 3D.EXE  Self extracting archive.
           Program: GP-FEP92.ASC  Source code listing.

Abstract:  Computer animation, especially three-dimensional animation, is as
much the art of illusion as a mathematical procedure. The programmer must
compensate for imperfections such as 'jaggies' and flicker, which the viewer
would notice, while other imperfections are 'deleted' by the human brain,
which makes certain assumptions when viewing a scene. One-sided polygons,
drawn only when their outside face is toward the viewer, are discussed; the
orientation test is based on the cross-product of two of a polygon's edge
vectors. Removing polygons that are not facing the viewer solves all 'hidden
surface' problems for a single convex polyhedron, but overlapping polyhedrons
require other methods. Sample programs that perform backface removal,
incremental transformation, and other animation algorithms are presented.
-----------------------------------------------------------------------------
Descriptors..
Topic:     Animation
           Tutorial
           Programming Instruction
           Three-Dimensional Graphics
           Programming
           Graphics Systems.
Feature:   illustration
           program.
-----------------------------------------------------------------------------
Full Text:

As I'm fond of pointing out, computer animation isn't a matter of
mathematically exact modeling or raw technical prowess, but rather of fooling
the eye and the mind. That's especially true for 3-D animation, where we're
not only trying to convince viewers that they're seeing objects on a
screen -- when in truth that screen contains no objects at all, only gaggles
of pixels -- but we're also trying to create the illusion that the objects
exist in three-space, possessing four dimensions (counting movement over
time) of their own. To make this magic happen, we must provide cues for the
eyes not only to pick out boundaries, but also to detect depth, orientation,
and motion. This involves perspective, shading, proper handling of hidden
surfaces, and rapid and smooth screen updates; the whole deal is considerably
more difficult to pull off on a PC than 2-D animation.

And yet, in some senses, 3-D animation is easier than 2-D. Because there's
more going on in 3-D animation, the eye and brain tend to make more
assumptions, and so are more apt to see what they expect to see, rather than
what's actually there. If you're piloting a (virtual) ship through a field of
thousands of asteroids at high speed, you're unlikely to notice if the more
distant asteroids occasionally seem to go right through each other, or if the
topographic detail on the asteroids' surfaces sometimes shifts about a bit.
You'll be busy viewing the asteroids in their primary role, as objects to be
navigated around, and the mere presence of topographic detail will suffice;
without being aware of it, you'll fill in the blanks. Your mind will see the
topography peripherally, recognize it for what it is supposed to be, and,
unless that landscape does something really obtrusive such as vanishing
altogether or suddenly shooting a spike miles into space, you will see what
you expect to see: a bunch of nicely detailed asteroids tumbling around you.

To what extent can you rely on the eye and mind to make up for imperfections
in the 3-D animation process?
In some areas, hardly at all; for example, jaggies crawling along edges stick
out like red flags, and likewise for flicker. In other areas, though, the
human perceptual system is more forgiving than you'd think. Consider this: At
the end of Return of the Jedi, in the battle to end all battles around the
Death Star, there is a sequence of about five seconds in which several
spaceships are visible in the background. One of those spaceships (and it's
not very far in the background, either) looks a bit unusual. What it looks
like is a sneaker. In fact, it is a sneaker -- but unless you know to look
for it, you'll never notice it, because your mind is busy making simplifying
assumptions about the complex scene it's seeing -- and one of those
assumptions is that medium-sized objects floating in space are spaceships,
unless proven otherwise. (Thanks to Chris Hecker for pointing this out. I'd
never have noticed the sneaker myself without being tipped off -- which is,
of course, the whole point.)

If it's good enough for George Lucas, it's good enough for us. And with that,
let's resume our quest for real-time 3-D animation on the PC.

One-sided Polygons: Backface Removal

Last month, we implemented the basic polygon drawing pipeline, transforming a
polygon all the way from its basic definition in object space, through the
shared 3-D world space, and into the 3-D space as seen from the viewpoint,
called "view space." From view space, we performed a perspective projection
to convert the polygon into screen space, then mapped the transformed and
projected vertices to the nearest screen coordinates and filled the polygon.
Armed with code that implemented this pipeline, we were able to watch as a
polygon rotated about its Y axis, and were able to move the polygon around in
space freely.
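In outline, the per-vertex work of that pipeline looks something like the
following sketch. This is not the code from last month's listings; the matrix
convention, the PROJECTION_RATIO constant, and the names XformVec and
ProjectVec are assumptions made purely for illustration.

/* Sketch of the per-vertex pipeline: a 4x4 transformation carries a
   homogeneous point from one space to the next (object -> world -> view,
   or object -> view directly with a concatenated matrix), and a
   perspective projection then produces screen space coordinates.
   PROJECTION_RATIO is a hypothetical viewing-distance factor. */
#define PROJECTION_RATIO 2.0

typedef struct { double X, Y, Z, W; } Point4;

/* Multiply a 4x4 transformation matrix by a column-vector point. */
void XformVec(const double Xform[4][4], const Point4 *in, Point4 *out)
{
   out->X = Xform[0][0]*in->X + Xform[0][1]*in->Y +
            Xform[0][2]*in->Z + Xform[0][3]*in->W;
   out->Y = Xform[1][0]*in->X + Xform[1][1]*in->Y +
            Xform[1][2]*in->Z + Xform[1][3]*in->W;
   out->Z = Xform[2][0]*in->X + Xform[2][1]*in->Y +
            Xform[2][2]*in->Z + Xform[2][3]*in->W;
   out->W = 1.0;
}

/* Perspective-project a view space point into screen space; X and Y shrink
   in proportion to the point's distance from the viewer. */
void ProjectVec(const Point4 *in, Point4 *out)
{
   out->X = in->X * PROJECTION_RATIO / in->Z;
   out->Y = in->Y * PROJECTION_RATIO / in->Z;
   out->Z = in->Z;
   out->W = 1.0;
}

The screen space X and Y that come out of the projection are then scaled,
offset to the center of the screen, and rounded to the nearest integer pixel
coordinates before the polygon is filled.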
One of the drawbacks of last month's approach was that the polygon had two
visible sides. Why is that a drawback? Well, it isn't, necessarily, but in
our case we want to use polygons to build solid objects with continuous
surfaces, and in that context, only one side of a polygon is ever visible;
the other side always faces the inside of the object, and can never be seen.
It would save time and simplify the process of hidden surface removal if we
could quickly and easily determine whether the inside or outside face of each
polygon was facing us, so that we could draw each polygon only if it were
visible (that is, had the outside face pointing toward the viewer). On
average, half the polygons in an object could be instantly rejected by a test
of this sort. Such testing of polygon visibility goes by a number of names in
the literature, including backplane culling, backface removal, and assorted
variations thereon; I'll refer to it as backface removal.

For a single convex polyhedron, removal of polygons that aren't facing the
viewer would solve all hidden surface problems. In a convex polyhedron, any
polygon facing the viewer can never be obscured by any other polygon in that
polyhedron; this falls out of the definition of a convex polyhedron.
Likewise, any polygon facing away from the viewer can never be visible.
Therefore, in order to draw a convex polyhedron, if you draw all polygons
facing toward the viewer but none facing away from the viewer, everything
will work out properly, with no additional checking for overlap and hidden
surfaces needed.

Unfortunately, backface removal completely solves the hidden surface problem
only for convex polyhedrons, and only if there's a single convex polyhedron
involved; when convex polyhedrons overlap, other methods must be used.
Nonetheless, backface removal does instantly halve the number of polygons to
be handled in rendering any particular scene. Backface removal can also speed
hidden-surface handling if objects are built out of convex polyhedrons, as
we'll see in a future column. This month, though, we have only one convex
polyhedron to deal with, so backface removal alone will do the trick.

Given that I've convinced you that backface removal would be a handy thing to
have, how do we actually do it? A logical approach, often implemented in the
PC literature, would be to calculate the plane equation for the plane in
which the polygon lies, and see which way the normal (perpendicular) vector
to the plane points. That works, but there's a more efficient way to
calculate the normal to the polygon: as the cross-product of two of the
polygon's edges.

The cross-product of two vectors is defined as the vector shown in Figure 1;
for vectors U and V, it works out to (UyVz - UzVy, UzVx - UxVz, UxVy - UyVx).
One interesting property of the cross-product vector is that it is
perpendicular to the plane in which the two original vectors lie. If we take
the cross-product of the vectors that form two edges of a polygon, the result
will be a vector perpendicular to the polygon; then, we'll know that the
polygon is visible if and only if the cross-product vector points toward the
viewer.

We need one more thing to make the cross-product approach work, though. The
cross-product can actually point either way, depending on which edges of the
polygon we choose to work with and the order in which we evaluate them, so we
must establish some conventions for defining polygons and evaluating the
cross-product. We'll define only convex polygons, with the vertices defined
in clockwise order, as viewed from the outside; that is, if you're looking at
the visible side of the polygon, the vertices will appear in the polygon
definition in clockwise order. With those assumptions, the cross-product
becomes a quick and easy indicator of polygon orientation with respect to the
viewer; we'll calculate it as the cross-product of the first and last vectors
in a polygon, as shown in Figure 2, and if it's pointing toward the viewer,
we'll know that the polygon is visible. Actually, we don't even have to
calculate the entire cross-product vector, because the Z component alone
suffices to tell us which way the polygon is facing: positive Z means
visible, negative Z means not. The Z component can be calculated very
efficiently, with only two multiplies and a subtraction.

The question remains of the proper space in which to perform backface
removal. There's a temptation to perform it in view space, which is, after
all, the space defined with respect to the viewer, but view space is not a
good choice. Screen space -- the space in which perspective projection has
been performed -- is the best choice. The purpose of backface removal is to
determine whether each polygon is visible to the viewer, and, despite its
name, view space does not provide that information; unlike screen space, it
does not reflect perspective effects. Backface removal may also be performed
using the polygon vertices in screen coordinates, which are integers. This is
less accurate than using the screen space coordinates, which are floating
point, but is, by the same token, faster.
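As a concrete illustration, here is a minimal sketch of that test, applied to
a polygon's vertices in integer screen coordinates. It assumes the clockwise,
viewed-from-outside vertex ordering described above; the Point type and the
PolygonIsVisible name are illustrative assumptions, not the code of Listing
Three, and the sense of the final comparison depends on the coordinate
conventions in use.

/* Sketch: decide polygon visibility from the Z component of the
   cross-product of the polygon's first and last edges. The full
   cross-product never needs to be formed; the Z component alone costs
   two multiplies and a subtraction. */
typedef struct { int X, Y; } Point;         /* integer screen coordinates */

int PolygonIsVisible(const Point *verts, int numVerts)
{
   /* First edge: from vertex 0 to vertex 1 (cast to long so the
      multiplies below can't overflow a 16-bit int). */
   long v1x = (long)verts[1].X - verts[0].X;
   long v1y = (long)verts[1].Y - verts[0].Y;
   /* Last edge: from vertex 0 to the final vertex. */
   long v2x = (long)verts[numVerts - 1].X - verts[0].X;
   long v2y = (long)verts[numVerts - 1].Y - verts[0].Y;
   /* Z component of the cross-product; its sign gives the facing. */
   return (v1x * v2y - v1y * v2x) > 0;
}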
In Listing Three, backface removal is performed in screen coordinates in the
interest of speed. Backface removal, as implemented in Listing Three, will
not work reliably if the polygon is not convex, if the vertices don't appear
in clockwise order as viewed from the outside, if either the first or last
edge in a polygon has zero length, or if the first and last edges are
congruent. These latter two points are the reason it's preferable to work in
screen space rather than screen coordinates (which suffer from rounding
problems), speed considerations aside.

Backface Removal in Action

Listings One through Five together form a program that rotates a solid cube
in real time under user control. Listing One (page 147) is the main program;
Listing Two (page 147) performs transformation and projection; Listing Three
(page 147) performs backface removal and draws visible faces; Listing Four
(page 148) concatenates incremental rotations to the object-to-world
transformation matrix; Listing Five (page 150) is the general header file.
Also required from previous columns are: Listings One and Two from last month
(draw clipped line list, matrix math functions); Listings One and Six from
July 1991 (mode X mode set, rectangle fill); Listing Six from September 1991;
Listing Four from March 1991 (polygon edge scan); and the FillConvexPolygon()
function from Listing One from February 1991. All necessary modules, along
with a make file, will be available as part of the code from this issue.

The sample program places a cube, floating in three-space, under the complete
control of the user. The arrow keys may be used to move the cube left, right,
up, and down, and the A and T keys may be used to move the cube away from or
toward the viewer. The F1 and F2 keys perform rotation around the Z axis, the
axis running from the viewer straight into the screen. The 4 and 6 keys
perform rotation around the Y (vertical) axis, and the 2 and 8 keys perform
rotation around the X axis, which runs horizontally across the screen; the
latter four keys are most conveniently used by flipping the keypad to the
numeric state.

The demo involves six polygons, one for each side of the cube. Each of the
polygons must be transformed and projected, so it would seem that 24 vertices
(four for each polygon) must be handled, but some steps have been taken to
improve performance. All vertices for the object have been stored in a single
list; the definition of each face contains not the vertices for that face
themselves, but rather indexes into the object's vertex list, as shown in
Figure 3. This reduces the number of vertices to be manipulated from 24 to 8,
for there are, after all, only eight vertices in a cube, with three faces
sharing each vertex. In this way, the transformation burden is lightened by
two-thirds.
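The arrangement Figure 3 describes might be declared along these lines; this
is only a sketch of the idea, and the type and field names (Point3, Face,
Object, and so on) are assumptions for illustration rather than the actual
declarations in Listing Five.

/* Sketch: an object built from a shared vertex pool. Each face stores
   indexes into that pool instead of its own copies of the vertices, so
   each of the cube's eight vertices is transformed and projected exactly
   once, no matter how many faces share it. */
typedef struct { double X, Y, Z; } Point3;

typedef struct {
   int *VertNums;      /* indexes into the object's vertex list, in       */
   int  NumVerts;      /*  clockwise order as seen from the outside       */
   int  Color;         /* color in which to draw the face                 */
} Face;

typedef struct {
   int     RecalcXForm;          /* nonzero when the object has moved and  */
                                 /*  its vertices must be re-transformed   */
   int     NumVerts;
   Point3 *VertexList;           /* vertices in object space               */
   Point3 *XformedVertexList;    /* the same vertices, cached in view space*/
   Point3 *ProjectedVertexList;  /* cached after perspective projection    */
                                 /* (a fourth, integer screen-coordinate   */
                                 /*  form is also kept, as described under */
                                 /*  "Object Representation" below)        */
   int     NumFaces;
   Face   *FaceList;             /* six four-vertex faces for the cube     */
} Object;

With a layout like this, transforming the cube touches eight vertices rather
than 24, and drawing a face is a matter of looking its vertices up through
VertNums in whichever cached list is appropriate.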
Also, as mentioned earlier, backface removal is performed with integers, in
screen coordinates, rather than with floating-point values in screen space.
Finally, the RecalcXForm flag is set whenever the user changes the
object-to-world transformation. Only when this flag is set is the full
object-to-view transformation recalculated and the object's vertices
transformed and projected again; otherwise, the values already stored within
the object are reused. In the sample application, this brings no visible
improvement, because there's only the one object, but the underlying
mechanism is sound: In a full-blown 3-D animation application, with multiple
objects moving about the screen, it would help a great deal to flag which of
the objects had moved with respect to the viewer, performing a new
transformation and projection only for those that had.

With the above optimizations, the sample program is certainly adequately
responsive on a 20-MHz 386 (sans 387; I'm sure it's wonderfully responsive
with a 387). Still, it couldn't quite keep up with the keyboard when I
modified it to read only one key each time through the loop -- and we're
talking about only eight vertices here. This indicates that we're already
near the limit of animation complexity possible with our current approach.
It's time to start rethinking that approach; over two-thirds of the overall
time is spent in floating-point calculations, and it's there that we'll begin
to attack the performance bottleneck we find ourselves up against.

Incremental Transformation

Listing Four contains three functions; each concatenates an additional
rotation around one of the three axes to an existing rotation. In order to
improve performance, only the matrix entries that are affected by a rotation
around each particular axis are recalculated (all but four of the entries in
a single-axis rotation matrix are either 0 or 1, as shown last month). This
cuts the number of floating-point multiplies from the 64 required for the
multiplication of two 4x4 matrices to just 12, and the number of
floating-point adds from 48 to 6.

Be aware that Listing Four performs an incremental rotation on top of
whatever rotation is already in the matrix. The cube may already have been
turned left, right, up, down, and sideways; regardless, Listing Four just
tacks the specified rotation onto whatever already exists. In this way, the
object-to-world transformation matrix contains a history of all the rotations
ever specified by the user, concatenated one after another onto the original
matrix. Potential loss of precision is a problem associated with using such
an approach to represent a very long concatenation of transformations,
especially with fixed-point arithmetic; that's not a problem for us yet, but
we'll run into it eventually.
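To make the savings concrete, here is a sketch of how such a function might
concatenate a rotation about the Y axis. The name AppendRotationY and the
column-vector matrix convention are assumptions for illustration; the actual
code is in Listing Four, and which entries change (and the signs involved)
depend on the convention chosen.

#include <math.h>

/* Sketch: concatenate a rotation of Angle radians about the Y axis onto an
   existing 4x4 transformation, in place. Under this convention only the six
   entries in columns 0 and 2 of the top three rows can change, so only
   those are recalculated: 12 multiplies and 6 adds, instead of the 64 and
   48 of a general 4x4 matrix multiplication. */
void AppendRotationY(double XformToChange[4][4], double Angle)
{
   double CosTemp = cos(Angle), SinTemp = sin(Angle);
   int i;

   for (i = 0; i < 3; i++) {
      double Temp0 = CosTemp * XformToChange[i][0] -
                     SinTemp * XformToChange[i][2];
      double Temp2 = SinTemp * XformToChange[i][0] +
                     CosTemp * XformToChange[i][2];
      XformToChange[i][0] = Temp0;
      XformToChange[i][2] = Temp2;
   }
}

Rotations about the X and Z axes follow the same pattern, each touching a
different set of six entries.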
A Note on Rounding Negative Numbers

Last month, I added 0.5 and truncated in order to round values from
floating-point to integer format. In Listing Two this month, I've switched to
adding 0.5 and using the floor() function. For positive values, the two
approaches are equivalent; for negative values, only the floor() approach
works properly. (For example, -0.6 should round to -1: adding 0.5 gives -0.1,
which floor() correctly turns into -1, but which truncation turns into 0.)

Object Representation

Each object consists of a list of vertices and a list of faces, with the
vertices of each face defined by pointers into the vertex list; this allows
each vertex to be transformed exactly once, even though several faces may
share a single vertex. Each object contains the vertices not only in their
original, untransformed state, but in three other forms as well: transformed
to view space, transformed and projected to screen space, and converted to
screen coordinates. Earlier, we saw that it can be convenient to store the
screen coordinates within the object, so that if the object hasn't moved with
respect to the viewer, it can be redrawn without the need for recalculation,
but why bother storing the view and screen space forms of the vertices as
well?

The screen space vertices are useful for some sorts of hidden surface
removal. For example, in order to determine whether two polygons overlap as
seen by the viewer, you must first know how they look to the viewer,
accounting for perspective; screen space provides that information. (So do
the final screen coordinates, but with less accuracy, and without any Z
information.) The view space vertices are useful for collision and proximity
detection; screen space can't be used here, because objects are distorted by
the perspective projection into screen space. World space would serve as well
as view space for collision detection, but because it's possible to transform
directly from object space to view space with a single matrix, it's often
preferable to skip over world space altogether. It's not mandatory that
vertices be stored for all these different spaces, but the coordinates in all
those spaces have to be calculated as intermediate steps anyway, so we might
as well keep them around for those occasions when they're needed.

Coming up: shading, hidden surfaces, and performance.