Journal: Dr. Dobb's Journal March 1992 v17 n3 p119(7)
-----------------------------------------------------------------------------
Title: Fast 3-D animation: meet X-Sharp. (Graphics Programming) (Column)
Author: Abrash, Michael.
AttFile: Program: GP-MAR92.ASC Source code listing.

Abstract: Three elements come into play to enable 12 solid three-dimensional cubes to rotate at an update rate of about 15 frames per second on a 20-MHz 386-based machine with a slow VGA. The first is fixed-point arithmetic, which provides an immediate order-of-magnitude performance boost. The second is the use of the 386's native 32-bit multiply and divide instructions. The third is maintaining and operating on only the relevant portions of transformation matrices and coordinates. These techniques are very powerful, but they have drawbacks. For example, points outside a 64K by 64K by 64K space cannot be handled, and imprecision can creep in over the course of multiple matrix concatenations.
-----------------------------------------------------------------------------
Descriptors:
Topic: Three-Dimensional Graphics Rendering Tutorial Performance Improvement Animation Programming.
Feature: illustration program.
-----------------------------------------------------------------------------
Full Text:

Across the lake from Vermont, a few miles into upstate New York, the Ausable River has carved out a fairly impressive gorge known as "Ausable Chasm." Impressive for the East, anyway; you might think of it as the poor man's Grand Canyon. This summer, I did the tour with my wife and five-year-old, and it was fun, although I confess that I didn't loosen my grip on my daughter's hand until we were on the bus and headed for home; that gorge is deep, and the railings tend to be of the single-bar, rusted-out variety.
New Yorkers can drive straight to this wonder of nature, but Vermonters must take their cars across on the ferry; the alternative is driving three hours around the south end of Lake Champlain. No problem; the ferry ride is an hour well spent on a beautiful lake. Or, rather, no problem -- once you're on the ferry. Getting to New York is easy, but, as we found out, the line of cars waiting to come back from Ausable Chasm gets lengthy about mid-afternoon. The ferry can hold only so many cars, and we wound up spending an unexpected hour exploring the wonders of the ferry docks. Not a big deal, with a good-natured kid and an entertaining mom; we got ice cream, explored the beach, looked through binoculars, and told stories. It was a fun break, actually, and before we knew it, the ferry was steaming back to pick us up.

A friend of mine, an elementary-school teacher, helped take 65 sixth graders to Ausable Chasm. Never mind the potential for trouble with 65 kids loose on a ferry. Never mind what it was like trying to herd that group around a gorge that looks like it was designed to swallow children and small animals without a trace. The hard part was getting back to the docks and finding they'd have to wait an hour for the next ferry. As my friend put it, "Let me tell you, an hour is an eternity with 65 sixth graders screaming the song 'You Are My Sunshine.'"

Apart from reminding you how lucky you are to be working in a quiet, air-conditioned room, in front of a gently humming computer, free to think deep thoughts and eat Cheetos to your heart's content, this story provides a useful perspective on the malleable nature of time. An hour isn't just an hour -- it can be forever, or it can be the wink of an eye. Just think of the last hour you spent working under a deadline; I bet it went past in a flash. Which is not to say, mind you, that I recommend working in a bus full of screaming kids in order to make time pass more slowly; there are quality issues here, as well.
In our 3-D animation work so far, we've used floating-point arithmetic. Floating-point arithmetic, even with a floating-point processor but especially without, is the microcomputer animation equivalent of working in a school bus: It takes forever to do anything, and you just know you're never going to accomplish as much as you want to. This month, it's time for fixed-point arithmetic, which will give us an instant order-of-magnitude performance boost. We'll also give our 3-D animation code a much more powerful and extensible framework, making it easy to add new and different sorts of objects. Taken together, these alterations will let us start to do some really interesting animation. Unfortunately, they take a lot of code, so I'll have to keep the text short. Therefore, without further ado, I give you real real-time animation.

Fixed Point, Native 386, and More

As of last month, we were at the point where we could rotate, move, and draw a solid cube in real time. This month's program, shown in Listings One through Ten (pages 134 through 138), goes a bit further, rotating 12 solid cubes at an update rate of about 15 frames per second (fps) on a 20-MHz 386 with a slow VGA. That's 12 transformation matrices, 72 polygons, and 96 vertices being handled in real time; not Star Wars, granted, but a giant step beyond a single cube. Run the program if you get a chance; you may be surprised at just how effective this level of animation is.

I'd like to point out, in case anyone missed it, that this is fully general 3-D. I'm not using any shortcuts or tricks, like prestoring coordinates or pregenerating bitmaps; if you were to feed in different rotations or vertices, the animation would change accordingly.

The keys to this month's performance are three. The first key is fixed-point arithmetic. In the last two months, we've worked with floating-point coordinates and transformation matrices.
Those values are now stored as 32-bit fixed-point numbers, in the form 16.16 (16 bits of whole number, 16 bits of fraction). 32-bit fixed-point numbers allow sufficient precision for 3-D animation, yet can be manipulated with fast integer operations rather than slow floating-point processor operations or excruciatingly slow floating-point emulator operations. Although the speed advantage of fixed-point varies depending on the operation, the processor, and whether a coprocessor is present, fixed-point multiplication can be as much as 100 times faster than the emulated floating-point equivalent. (I'd like to take a moment to thank Chris Hecker for his invaluable input in this area.)

The second performance key is the use of the 386's native 32-bit multiply and divide instructions. Real-mode C compilers, such as Borland C++ and Microsoft C, call library routines to perform multiplications and divisions involving 32-bit values, and those library functions are fairly slow, especially for division. On a 386, 32-bit multiplication and division can be handled with the bit of code in Listing Nine -- and most of even that code is only for rounding.

The third performance key is maintaining and operating on only the relevant portions of transformation matrices and coordinates. The bottom row of every transformation matrix we'll use (at least for the near future) is [0 0 0 1], so why bother using or recalculating it when concatenating transforms and transforming points? Likewise for the fourth element of a 3-D vector in homogeneous coordinates, which is always 1. Basically, transformation matrices are treated as consisting of a 3 X 3 rotation matrix and a 3 X 1 translation vector, and coordinates are treated as 3 X 1 vectors. This saves a great many multiplications in the course of transforming each point: 9 per point, rather than the 16 a full 4 X 4 matrix would require.

Just for fun, I reimplemented the animation of Listings One through Ten with floating-point instructions.
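The three keys can be illustrated together with a short sketch in portable C. This is not Listing Nine, which relies on the 386's native 32-bit multiply and divide instructions; here a 64-bit intermediate (C99's int64_t, which real-mode compilers of this era did not provide) stands in for the EDX:EAX register pair that holds the 64-bit product and dividend. The names Fixedpoint, fixed_mul(), fixed_div(), transform_point(), and concat_xforms() are hypothetical, and the sketch assumes each transform is stored as 3 rows of 4 values (a 3 X 3 rotation plus a translation column) with the [0 0 0 1] bottom row implied.

```c
#include <assert.h>
#include <stdint.h>

/* 16.16 fixed point: 16 bits of whole number, 16 bits of fraction. */
typedef int32_t Fixedpoint;

#define INT_TO_FIXED(x)    ((Fixedpoint)(x) * 65536)
#define DOUBLE_TO_FIXED(x) ((Fixedpoint)((x) * 65536.0 + ((x) < 0 ? -0.5 : 0.5)))

/* 16.16 times 16.16 gives a 32.32 product; keep the middle 32 bits.
   Adding 0x8000 (one half in 16.16) before shifting rounds to nearest.
   Assumes an arithmetic right shift for negative values (true of
   mainstream compilers) and a result that fits in 32 bits. */
Fixedpoint fixed_mul(Fixedpoint a, Fixedpoint b)
{
    int64_t product = (int64_t)a * b;
    return (Fixedpoint)((product + 0x8000) >> 16);
}

/* Shift the dividend left 16 bits into a 64-bit value, then divide,
   rounding to nearest by working on magnitudes and adding half the
   divisor. The caller must keep the quotient within 32 bits: dividing
   by a value smaller than 1 can overflow. This sketch does not detect
   that; the 386 divide instruction would raise interrupt 0 instead. */
Fixedpoint fixed_div(Fixedpoint a, Fixedpoint b)
{
    assert(b != 0);
    int64_t num = (int64_t)a * 65536;
    int64_t den = b;
    int negative = (num < 0) != (den < 0);
    if (num < 0) num = -num;
    if (den < 0) den = -den;
    int64_t quotient = (num + den / 2) / den;
    return (Fixedpoint)(negative ? -quotient : quotient);
}

/* Transform a point by a 3x4 matrix. The implied W = 1 of the point
   means the translation column is simply added: 9 multiplies, not 16. */
void transform_point(Fixedpoint m[3][4], const Fixedpoint in[3], Fixedpoint out[3])
{
    for (int row = 0; row < 3; row++) {
        out[row] = fixed_mul(m[row][0], in[0]) +
                   fixed_mul(m[row][1], in[1]) +
                   fixed_mul(m[row][2], in[2]) +
                   m[row][3];
    }
}

/* result = a * b, treating both as 4x4 matrices whose bottom row is
   [0 0 0 1]. That bottom row of b contributes nothing to columns 0..2
   and exactly a[row][3] to column 3. result must not alias a or b. */
void concat_xforms(Fixedpoint a[3][4], Fixedpoint b[3][4], Fixedpoint result[3][4])
{
    for (int row = 0; row < 3; row++) {
        for (int col = 0; col < 4; col++) {
            result[row][col] = fixed_mul(a[row][0], b[0][col]) +
                               fixed_mul(a[row][1], b[1][col]) +
                               fixed_mul(a[row][2], b[2][col]);
        }
        result[row][3] += a[row][3];
    }
}
```

Matrix entries such as sines and cosines would be converted to 16.16 once per frame (the article's code still computes those through the floating-point emulator); after that, every per-vertex operation above is pure integer arithmetic.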
Together, the preceding optimizations improve the performance of the entire animation -- including drawing time and overhead, not just math -- by more than ten times over the code that uses the floating-point emulator. Amazing what one can accomplish with a few dozen lines of assembler and a switch in number format, isn't it? Note that no assembly code other than the native 386 multiply and divide is used in Listings One through Ten, although the polygon fill code is of course mostly in assembler; we've achieved 12 cubes animated at 15 fps while doing the 3-D work almost entirely in Borland C++ -- and we're still doing sine and cosine via the floating-point emulator. Happily, we're still nowhere near the upper limit on the animation potential of the PC.

Drawbacks

The techniques we've used to turbo-charge 3-D animation are very powerful, but there's a dark side to them as well. Obviously, native 386 instructions won't work on 8088 and 286 machines. That's rectifiable; equivalent multiplication and division routines could be implemented for real mode (and I may just do that one of these months, especially if enough of you give me a hard time for taking the easy way out with 386 instructions), and performance would still be reasonable. It sure is nice to be able to plug in a 32-bit IMUL or DIV and be done with it, though.

More importantly, 32-bit fixed-point arithmetic has limitations in range and accuracy. Points outside a 64K X 64K X 64K space can't be handled, imprecision tends to creep in over the course of multiple matrix concatenations, and it's quite possible to generate the dreaded divide-by-0 interrupt if Z coordinates with absolute values less than one are used (projection divides by Z, and on the 386 a quotient too large to fit in 32 bits triggers the same interrupt 0 as a true division by zero). I don't have space to discuss these issues in detail now, but here are some brief thoughts. The working 64K X 64K X 64K fixed-point space can be paged into a larger virtual space.
Imprecision of a pixel or two rarely matters in terms of display quality, and deterioration of concatenated rotations can be corrected by restoring orthogonality, for example by periodically calculating one row of the matrix as the cross-product of the other two (forcing it to be perpendicular to both). 3-D clipping with a front clip plane of -1 or less can prevent divide overflow.

A New Animation Framework

Listings One through Ten represent not merely faster animation, but also a mostly complete, extensible, data-driven animation framework. Where earlier animation code was hardwired to demonstrate certain concepts, this month's code is intended to serve as the basis for a solid animation package. Objects are stored, in their entirety, in customizable structures; new structures can be devised for new sorts of objects. Drawing, preparing for drawing, and moving are all vectored functions, so that variations such as shading or texturing, or even radically different sorts of graphics objects, such as scaled bitmaps, could be supported. The cube initialization is entirely data-driven; more or different cubes, or other sorts of convex polyhedrons, could be added by simply changing the initialization data in Listing Eight.

The animation framework is not yet complete. Movement is supported only along the Z axis, and then in a non-general fashion. More interesting movement isn't supported at this point because of the one gaping hole in the package: hidden-surface removal. Until this is implemented -- and it will be, soon -- nothing can safely overlap. It would actually be easy enough to perform hidden-surface removal by keeping the cubes in different Z bands and drawing them back to front, but this gets into sorting and list issues, and is not a complete solution -- and I've crammed as much as will fit into this month's code, anyway.
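To make "vectored functions" and "data-driven initialization" concrete, here is a minimal C sketch of the general pattern; it is not the layout of X-Sharp's actual structures. Every object begins with the same three function pointers (move, prepare, draw), so the main loop can animate any kind of object through them. A hypothetical Cube type embeds that header as its first member, and a table of initialization data alone determines how many cubes exist and how they start. All names here (Object, Cube, CubeInit, cube_init_data, init_objects, animate_frame, and so on) are hypothetical, movement is Z-only as in the article, and drawing is reduced to a counter.

```c
#include <assert.h>
#include <stddef.h>

/* Common header shared by every kind of object: three vectored functions. */
typedef struct Object Object;
struct Object {
    void (*move)(Object *self);     /* update position and orientation */
    void (*prepare)(Object *self);  /* e.g. recalculate transform, project vertices */
    void (*draw)(Object *self);     /* render into the current page */
};

/* Hypothetical cube that moves back and forth along Z only. Because base
   is the first member, a Cube* converts to an Object* and back. */
typedef struct {
    Object base;
    int z, dz;           /* current Z distance and per-frame step */
    int z_min, z_max;    /* limits at which the cube reverses direction */
    int projected;       /* stand-in for projected vertices: apparent size */
    int draws;           /* stand-in for drawing: counts draw calls */
} Cube;

static void cube_move(Object *self)
{
    Cube *c = (Cube *)self;
    c->z += c->dz;
    if (c->z > c->z_max || c->z < c->z_min) {
        c->dz = -c->dz;       /* reverse at either limit */
        c->z += 2 * c->dz;    /* undo the overshoot and step back */
    }
}

static void cube_prepare(Object *self)
{
    Cube *c = (Cube *)self;
    c->projected = 10000 / c->z;   /* apparent size shrinks with distance */
}

static void cube_draw(Object *self)
{
    ((Cube *)self)->draws++;
}

/* Data-driven setup: more cubes means more rows here, not more code. */
typedef struct { int z, dz; } CubeInit;
static const CubeInit cube_init_data[] = { { 100, 2 }, { 150, -3 }, { 200, 1 } };
#define NUM_CUBES (sizeof cube_init_data / sizeof cube_init_data[0])
#define Z_MIN 50
#define Z_MAX 300

static Cube cubes[NUM_CUBES];
static Object *object_list[NUM_CUBES];

static void init_objects(void)
{
    for (size_t i = 0; i < NUM_CUBES; i++) {
        assert(cube_init_data[i].z >= Z_MIN && cube_init_data[i].z <= Z_MAX);
        cubes[i].base.move = cube_move;
        cubes[i].base.prepare = cube_prepare;
        cubes[i].base.draw = cube_draw;
        cubes[i].z = cube_init_data[i].z;
        cubes[i].dz = cube_init_data[i].dz;
        cubes[i].z_min = Z_MIN;
        cubes[i].z_max = Z_MAX;
        cubes[i].projected = 0;
        cubes[i].draws = 0;
        object_list[i] = &cubes[i].base;
    }
}

/* One frame: the loop knows nothing about cubes, only the three vectors. */
static void animate_frame(void)
{
    for (size_t i = 0; i < NUM_CUBES; i++)
        object_list[i]->move(object_list[i]);
    for (size_t i = 0; i < NUM_CUBES; i++)
        object_list[i]->prepare(object_list[i]);
    for (size_t i = 0; i < NUM_CUBES; i++)
        object_list[i]->draw(object_list[i]);
}
```

Adding a different kind of object, say a scaled bitmap, would mean a new struct with the same Object header and its own three functions; animate_frame() would not change. Drawing in list order, as this sketch does, is only correct while nothing overlaps, which is exactly the hidden-surface limitation described above.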
Where the Time Goes

The distribution of execution time in the animation code is no longer wildly biased toward transformation, but sine and cosine are certainly still sucking up cycles. Likewise, the overhead of the calls to FixedMul() and FixedDiv() is costly. Much of this is correctable with a little carefully crafted assembly language and a lookup table; expect that soon. When all that is firmly in place, we'll take a look at the number of pixels being drawn versus the bandwidth of display memory; that'll give us an idea of how close we are to the theoretical limit of VGA animation. Probably not too close, even with those optimizations; a faster 2-D clipping approach and still faster polygon-fill code will most likely be in order. (Yes, Virginia, there is an even faster way to fill polygons!) Regardless, this month we have made the critical jump to a usable level of performance and a serviceable framework. From here on out, it's the fun stuff.

X-Sharp

Three-dimensional animation is a complicated business, and it takes an astonishing amount of functionality just to get off the launching pad: page flipping, polygon filling, clipping, transformations, list management, and so forth. There's no way all of that could fit in a single column, and in fact I've been building toward a critical mass of animation functionality since my very first column in DDJ.

This month builds on code from no less than five previous columns. The code that's required in order to link this month's animation program is: Listing One from January (draw clipped line list); Listings One and Six from July 1991 (mode X mode set, rectangle fill); Listing Six from September 1991; Listing Four from March 1991 (polygon edge scan); and the FillConvexPolygon() function from Listing One from February 1991. The struct keywords in FillConvexPolygon() must be removed to reflect the switch to typedefs in the animation header file.
This is the last time I'm going to list all the code needed to build the animation package, and I am not going to print every change to every module from now on. The code is simply getting too large to show every bit of it, and the scope is only going to grow as I add functions such as shading, 3-D clipping, and hidden surfaces; also, I'm going to be tinkering with stuff such as converting key code to assembler and modifying functions slightly as structures change, and I hate to take up valuable space in DDJ for what's basically fine-tuning and housekeeping.

I think we've reached the point where we can call this an ongoing project and give it a name. In the spirit of Al Stevens's wonderful D-Flat, I hereby dub the animation package X-Sharp. (X for mode X, sharp because who wants a flat animation package?)

From now on, I will make the full source for X-Sharp available, complete with make files, online and otherwise. It will be available in the file XSHARPn.ARC in the DDJ Forum on CompuServe and on M&T Online. Alternatively, you can send me a 360K or 720K formatted diskette and an addressed, stamped diskette mailer, care of DDJ (411 Borel Ave., San Mateo, CA 94403-3522), and I'll send you the latest copy of X-Sharp. There's no charge, but, in the spirit of Al Stevens's "careware," it'd be very much appreciated if you'd slip in a dollar or so for the folks at the Vermont Association for the Blind and Visually Impaired, who help the visually impaired build productive, self-sufficient lives, and are amazingly successful at it. Imagine for a moment trying to do your work if you lost your sight (and it can be done; I got a request for the text of my book Zen of Assembly Language in computer-readable form from a blind programmer the other day) -- heck, imagine trying to cross the street -- and I suspect you'll understand why it matters so much. As Al says, it's purely voluntary, but both you and I will feel good.
I'm available on a daily basis to discuss X-Sharp on M&T Online and Bix (user name MABRASH in both cases).