Journal: Dr. Dobb's Journal April 1992 v17 n4 p139(7) ----------------------------------------------------------------------------- Title: Raw speed and more. (Graphics Programming column; X-Sharp 3-D animation program) (Tutorial) Author: Abrash, Michael. AttFile: Program: GP-APR92.ASC Source code listing. Abstract: X-Sharp three-dimensional (3-D) software has assembly language added for performance improvement, and for matrix math functions. Several functions can be eliminated by converting them to pure assembly language, such as the function to multiply a matrix by a vector, and the function to concatenate matrices. Depth sorting is useful for handling hidden surfaces, however it does not address the issue of nonconvex objects. Convex polyhedrons are addressed. Fast microcomputer-based animation is not seriously affected by problems with depth sorting, problems that can affect photo-realistic rendering to a greater extent. Features that are still to be added to X-Sharp include shading, antialiasing, support for non-convex objects, 3-D clipping, texture mapping and performance improvement. ----------------------------------------------------------------------------- Descriptors.. Topic: Assembly Language Tutorial Programming Instruction Graphics Software Optimization Performance Improvement Software Design Animation. Feature: illustration cartoon program. ----------------------------------------------------------------------------- Full Text: As usual, there's an awful lot to cover this month, so I'll have to make this quick. That's a shame, because this absolutely true story is even lovelier in the long version, as told by--well, I promised I wouldn't even hint at who he is, for reasons that will soon be obvious. Years ago, this friend of mine--let's call him Bert--went to Hawaii with three other fellows to celebrate their graduation from high school. This was an unchaperoned trip, and they behaved pretty much as responsibly as you'd expect four teenages to behave, which is to say, not; there's a story about a rental car that, to this day, Bert can't bring himself to tell. They had a good time, though, save for one thing: no girls. By and by, they met a group of girls by the pool, but the boys couldn't get past the hi-howya-doin stage, so they retired to their hotel room to plot a better approach. This being the early '70s, and them being slightly tipsy teenagers with raging hormones and the effective combined IQ of four eggplants, it took them no time at all to come up with a brilliant plan: streaking. The girls had mentioned their room number, so the boys piled into the elevator, pushed the button for the girls' floor, shucked their clothes as fast as they could, and sprinted to the girls' door. They knocked on the door and ran on down the hall. As the girls opened their door, Bert and his crew raced past, toward the elevator, laughing hysterically. Bert was by far the fastest of them all. He whisked between the elevator doors just as they started to close; by the time his friends got there, it was too late, and the doors slid shut in their faces. As the elevator began to move, Bert could hear the frantic pounding of six fists thudding on the closed doors. As Bert stood among the clothes littering the elevator floor, the thought of his friends stuck in the hall, naked as jaybirds, was just too much, and he doubled over with helpless laughter, tears streaming down his face. The universe had blessed him with one of those exceedingly rare moments of perfect timing and execution. The universe wasn't done with Bert quite yet, though. He was still contorted with laughter--and still quite thoroughly undressed--when the elevator doors opened again. On the lobby. And with that, we come to this month's topics, raw speed and hidden surfaces. Raw Speed, Part I: Assembly Language I would like to state, here and for the record, that I am not an assembly language fanatic. Frankly, I prefer programming in C; assembly language is hard work, and I can get a whole lot more done with fewer hassles in C. However, I am a performance fanatic, performance being defined as having programs be as nimble as possible in those areas where the user wants fast response. And, in the course of pursuing performance, there are times when a little assembly language goes a long way. We're now in the fourth month of working on the X-Sharp #-D animation package. In real-time animation, performance is sine qua non--Latin for "Make it fast or find another line of work"--so some judiciously applied assembly language is in order. Last month, we got up to a serviceable performance level by switching to fixed-point math, then implementing the fixed-point multiplication and division functions in assembler in order to take advantage of the 386's 32-bit capabilities. There's another area of the program that fairly cries out for assembly language: matrix math. The function to multiply a matrix by a vector (XformVec()) and the function to concatenate matrices (ConcatXforms()) both loop heavily around calls to FixedMul(); a lot of calling and looping can be eliminated by converting these functions to pure assembly language. Listing One, page 157, is the module FIXED.ASM from the current version of X-Sharp, with XformVec() and ConcatXforms() implemented in assembly language. The code is heavily optimized, to the extent of completely unrolling the loops via macros so that looping is eliminated althogether. FIXED.ASM is highly effective; the time taken for matrix math is now down to the point where it's a fairly minor component of execution time, representing less than ten percent of the total. It's time to turn our optimization sights elsewhere. Raw Speed, Part II: Look it Up It's a funny thing about Turbo Profiler: Time spent in the Borland C++ 80x87 emulator doesn't show up directly anywhere that I can see in the timing results. The only way to detect it is by way of the line that reports what percent of total time is represented by all the areas that were profiled; if you're profiling all areas, whatever's not explicitly accounted for seems to be the floating-point emulator time. This quirk fooled me for a while, leading me to think sine and and cosine weren't major drags on performance, because the sin() and cos() functions spend most of their time in the emulator, and that time doesn't show up in Turbo Profiler's statistics on those functions. Once I figured out what was going on, it turned out that not only were sin() and cos() major drags, they were taking up over half the total execution time by themselves. The solution is a lookup table. Listing One contains a function called CosSin() that calculates both the sine and cosine of an angle, via a lookup table. The function accepts agnles in tenths of degrees; I decided to use tenths of degrees rather than radians because that way it's always possible to look up the sine and cosine of the exact angle requested, rather than approximating, as would be required with radians. Tenths of degrees should be fine enough control for most purposes; if not, it's easy to alter CosSin() for finer gradations yet. GENCOS.C, the program used to generate the lookup table (COSTABLE.INC), included in Listing One, can be found in the X-Sharp archive. GENCOS.C can generate a cosine table with any integral number of steps per degree. FIXED.ASM speeds X-Sharp up quite a bit, and it changes the performance balance a great deal. When we started out with 3-D animation, calculation time was the dragon we faced; more than 90 percent of the total time was spent doing matrix and projection math. Additional optimizations in the area of math could still be made (using 32-bit multiplies in the backface-removal code, for example), but fixed-point math, the sine and cosine lookup, and selective assembler optimizations have done a pretty good job already. The bulk of the time taken by X-Sharp is now spent drawing polygons, drawing rectangles (to erase objects), and waiting for the page to flip. In other words, we've slain the dragon of 3-D math, or at least wounded it grievously; now we're back to the dragon of polygon filling. We'll address faster polygon filling soon, but for the moment, we have more than enough horsepower to have some fun with. First, though, we need one more feature: hidden surfaces. Hidden Surfaces So far, we've made a number of simplifying assumptions in order to get the animation to look good; for example, all objects must currently be convex polyhedrons. We'll deal with that one down the road a little, but first we have to address a still more fundamental limitation, which is that right now, objects can never pass behind or in front of each other. In short, it's time to have a look at hidden surfaces. There are a passel of ways to do hidden surfaces. Way off at one end (the slow end) of the spectrum is z-buffering, whereby each pixel of each polygon is checked as it's drawn to see whether it's the frontmost version of the pixel at those coordinates. At the other end is the technique of simply drawing the objects in back-to-front order, so that nearer objects are drawn on top of farther objects. The latter approach, depth sorting, is the one we'll take today. (Actually, true depth sorting involves detecting and resolving possible ambiguities when objects overlap in z; simply sorting on z, which we'll be doing this month, is more precisely known as the "painter's algorithm.") Depth sorting is fast but less than perfect. For one thing, it doesn't address the issue of nonconvex objects, so we'll have to stick with convex polyhedrons for now. For another, there's the question of what part of each object to use as the sorting key; the nearest point, the center, and the farthest point are all possibilities--and, whichever point is used, depth sorting doesn't handle some overlap cases properly. Figure 1 illustrates one case in which back-to-front sorting doesn't work, regardless of what point is used as the sorting key. For photo-realistic rendering, these are serious problems. For fast PC-based animation, however, they're manageable. Choose objects that aren't too elongated; arrange their paths of travel so they don't intersect in problematic ways; and, if they do overlap incorrectly, trust that the glitch will be lost in the speed of the animation and the complexity of the screen. Listing Two, page 159, shows the key routines for depth sorting, from the X-Sharp file OLIST.C. Objects are now stored in a linked list. The initial, empty list, created by InitializeObjectList(), consists of a sentinel entry at either end, one at the farthest possible z coordinate, and one at the nearest. New entries are inserted by AddObject() in z-sorted order. Each time the objects are moved, before they're drawn at their new locations, SortObjects() is called to z-sort the object list, so that drawing wil proceed from back to front. The z-sorting is done on the basis of the objects' center points; a center-point filed has been added to the object structure to support this, and the center point for each object is now transformed along with the vertices. That's really all there is to depth sorting--and now we can have objects that overlap in x and y. I wish I could reproduce the lastes X-Sharp demonstration program here; I really do. It's a lot of fun, with 11 spinning cubes and a whirling faceted ball bouncing back and forth between the screen and deep space at varying speeds. The number of faces has nearly doubled since last month, to 138, and the number of vertices is up to 150, but thanks to FIXED.ASM, the animation is snappier than ever, and of course there's a much-enhanced sense of depth, thanks to the visual cues made possible by depth sorting. We've reached the point of genuinely dynamic animation on a 386, even with a slowpoke VGA; my daughter thinks it's a little scary having those things hurtling toward her so fast. On a 486 with a fast (Tseng ET-4000) VGA, the animation is too fast; the illusion of reality is broken. We should all have such problems. I'd like to list the demo program in its entirety, but I can't. As I explained last month, X-Sharp is way too large, over 2000 lines now; even the changes since last month would blow my page budget out of the water. Instead, I've made the full source available in the file XSHARPn.ARC in the DDJ Forum on CompuServe, on M&T Online, and in the graphic.disp conference on Bix. Alternatively, you can send me a 360K or 720K formatted diskette and an addressed, stamped diskette mailer, care of DDJ, 411 Borel Ave., San Mateo, CA 94402, and I'll send you the latest copy of X-Sharp. There's no charge, but it'd be very much appreciated if you'd slip in a dollar or so to help out the folks at the Vermont Association for the Blind and Visually Impaired. It's not every day that you can make a difference so easily--and, believe me, their work does make a difference. I'm available on a daily basis to discuss X-Sharp on M&T Online and Bix (user name mabrash in both cases). Rounding FIXED.ASM containes the equate ROUNDING[underscore]ON. When this equate is 1, the results of multiplications and divisions are rounded to the nearest fixed-point values; when it's 0, the results are truncated. The difference between the results produced by the two approaches is, at most, [2.sup.-16]; you wouldn't think that would make much difference, now, would you? But it does. When the animation is run with rounding disabled, the cubes start to distort visibly after a few minutes, and after a few minutes more they look like they've been run over. In contrast, I've never seen any significant distortion with rounding on, even after a half-hour or so. I think the difference with rounding is not that it's so much more accurate, but rather that the errors are evenly distributed; with truncation, the errors are biased, and biased errors become very visible when they're applied to right-angle objects. Even with rounding, though, errors will eventually creep in, and reorthogonalization will become necessary at some point. We'll have a look at that soon. The performance cost of rounding is small, and the benefits are highly visible. Still, truncation errors become significant only when they accumulate over time, as, for example, when rotation matrices are repeatedly concatenated over the course of many transformations. Some time could be saved by rounding only in such cases. For example, division is performed only in the course of projection, and the results do not accumulate over time, so it would be reasonable to disable rounding for division. Having a Ball For three months, we've had nothing to look at but triangles and cubes. It's time for something a little more visually appealing, so the demonstration program now features a 72-sided ball. What's particularly interesting about this ball is that it's created by the GENBALL.C program in the BALL subdirectory of X-Sharp, and both the size of the ball and the number of bands of faces are programmable. GENBALL.C spits out to a file all the arrays of vertices and faces needed to create the ball, ready for inclusion in INITBALL.C. True, if you change the number of bands, you must change the Color array in INITBALL.C to match, but that's a tiny detail; by and large, the process of generating a ballshaped object is now automated. In fact, we're not limited to ball-shaped objects; substitute a different vertex and face generation program for GENBALL.C, and you can make whatever convex polyhedron you want; again, all you have to do is change the Color array correspondingly. You can easily create multiple versions of the base object, too; INITCUBE.C is an example of this, creating 11 different cubes. What we have here is the first glimmer of an object-editing system. GENBALL.C is the prototype for object definition, and INITBALL.C is the prototype for general-purpose object instantiation. Certainly, it would be nice to have an interactive 3-D object editing tool and resource management setup, and perhaps we'll do that someday. We have our hands full with the drawing end of things at the moment, though, and for now it's enough to be able to create objects in a semiautomated way. Eating Crow and Other Delicacies Back in November, I said, "Hicolor programming unavoidably requires handling broken rasters" (Hicolor mode features 32K colors, by means of the Sierra Hicolor DAC combined with a Hicolor-capable Super-VGA.) Any absolute assertion on my part seems to be the cue for readers to come swarming out of the woodworks with examples to the contrary. In this case, M&T Online user rfredrick was quick to point out that setting the bitmap width--th offset from the start of one line to the start of the next--to 1928 bytes in 640X480 Hicolor mode eliminates broken rasters (displayed lines that span banks). A bitmap width of 1928 is selected by setting the Row Offset register (CRTC register 13h) to 241, like so: outpw(0x3d4, 0x13 \ (241 << 8));. Once the width is set, all you have to do is use 1928 for the offset from one row to the next, rather than 1280, and you're all set. Actually, the breaks are still there, but they can be ignored because they happen off to the right of the displayed portion of the bitmap. To get the Hicolor drawing code from the November 1991 column to support 1928-wide Hicolor bitmaps, just change BitmapWidthInBytes from 640*2 to 1928. rfrederick credits this information to the folks at Everex. Thanks to all involved for passing it along. Coming Up There are a lot of wonderful things to add to X-Sharp, including shading of many sorts, antialiasing, support for non-convex objects, faster polygon drawing and clipping, 3-D clipping, texture mapping, still better performance--you get the idea. Personally, I'm excited as all get-out about this stuff, and I'll certainly cover more of it in the near future. Graphics is a broad field, though, and you folks have diverse interests; I'd like to hear what you're interested in seeing in this space. More 3-D animation? More animation of other types? Programming techniques for PC graphics hardware, such as 24-bpp VGAs, the S3 Super-VGA accelerator, and XGA? JPEG? Graphics operations such as seed fills and fast (and I do mean fast!) line drawing? Color mapping to a 256-color palette? Dithering? Something else? It's a big list, because this is a big--and tremendously exciting--field. So long as there's code to be written for a topic (after all, this is the Graphics Programming column), I'm game. Let me know what you'd like to see.