Journal: Dr. Dobb's Journal March 1991 v16 n3 p129(6) ----------------------------------------------------------------------------- Title: Fast convex polygons. (Graphics programming) (tutorial) Author: Abrash, Michael. AttFile: Program: GP-MAR91.ASC Source code listing. Summary: Assembly language programming can improve graphics performance and help programmers understand the function of their programs. Fast convex polygon filling can be performed by optimizing the existing polygon filling implementation. The drawing portion of the program is easily accelerated by using the 'memset' library function, which uses the REP STOS instruction. The code replacement yields a speed an order of magnitude faster than the old implementation. The polygon edge tracing procedures are slow because they use floating-point calculations, but integer calculations are both faster and more accurate. Modification of the drawing and edge tracing portions of the program increase performance almost 20 times. Assembly language can be used in place of C to improve speed even further. ----------------------------------------------------------------------------- Descriptors.. Topic: Graphics Software Programming Instruction Painting Tutorial Type-In Programs Solids Modeling. Feature: illustration table program. Caption: Execution times of various versions of the convex polygon drawing code. (table) Drawing polygons. (program) ----------------------------------------------------------------------------- Full Text: Fast Convex Polygons Last month, we explored the surprisingly intricate process of filling convex polygons. This month we're going to fill them an order of magnitude or so faster. Two thoughts may occur to some of you at this point: "Oh, no, he's not going to get into assembly language and device dependent code, is he?" and, "Why bother with polygon filling--or, indeed, any drawing primitives--anyway? Isn't that what GUIs and third-party libraries are for?" To which I answer, "Well, yes, I am," and, "If you have to ask, you've missed the magic of microcomputer programming." Actually, both questions ask the same thing, and that is: "Why should I, as a programmer, have any idea how my program actually works?" Put that way, it sounds a little different, doesn't it? GUIs, reusable code, portable code written entirely in high-level languages, and object-oriented programming are all the rage now, and promise to remain so for the foreseeable future. The thrust of this technology is to enhance the software development process by offloading as much responsibility as possible to other programmers, and by writing all remaining code in modular, generic form. This modular code then becomes a black box to be reused endlessly without another thought about what actually lies inside. GUIs also reduce development time by making many interface choices for you. That, in turn, makes it possible to create quickly and reliably programs that will be easy for new users to pick up, so software becomes easier to both produce and learn. This is, without question, a Good Thing. The "black box" approach does not, however, necessarily cause the software itself to become faster, smaller, or more innovative; quite the opposite, I suspect. I'll reserve judgement on whether that is a good thing or not, but I'll make a prediction: In the short run, the aforementioned techniques will lead to noticeably larger, slower programs, as programmers understand less and less of what the key parts of their programs do and rely increasingly on general-purpose code written by other people. (In the long run, programs will be bigger and slower yet, but computers will be so fast and will have so much memory that no one will care.) Over time, PC programs will also come to be more similar to one another--and to programs running on other platforms, such as the Mac--as regards both user interface and performance. Again, I am not saying that this is bad. It does, however, have major implications for the future nature of PC graphics programming, in ways that will directly affect the means by which many of you earn your livings. Not so very long from now, graphics programming--all programming, for that matter--will become mostly a matter of assembling in various ways components written by other people, and will cease to be the all-inclusively creative, mind-bendingly complex pursuit it is today. (Using legally certified black boxes is, by the way, one direction in which the patent lawyers are leading us; legal considerations may be the final nail in the coffin of homegrown code.) For now, a PC programmer, to understand and even control every single thing that happens on a computer if you so desire, to realize any vision you may have. Take advantage of this unique window of opportunity to create some magic! Neither does it hurt to understand what's involved in drawing, say, a filled polygon, even if you are using a GUI. You will better understand the performance implications of the available GUI functions, and you will be able to fill in any gaps in the functions provided. You may even find that you can out-perform the GUI on occasion by doing your own drawing into a system memory bitmap, then copying the result to the screen. You will also be able to understand why various quirks exist, and will be able to put them to good use. For example, X Window follows the polygon drawing rules described last month (although it's not obvious from the documentation); if you understood last month's discussion, you're in good shape to use polygons under X. In short, even though it runs counter to current trends, it helps to understand how things work, especially when they're very visible parts of the software you develop. That said, let's learn more about filling convex polygons. Fast Convex Polygon Filling When last we left the topic of filling convex polygons, the implementation we had met all of our functional requirements. In particular, it met stringent rules that guaranteed that polygons would never overlap at shared edges, an important consideration when building polygon-based images. Unfortunately, the implementation was also slow as molasses. This month we'll work up polygon filling code that's fast enough to be truly usable. Last month's polygon filling code involved three major tasks, each performed by a separate function: Tracing each polygon edge to generate a coordinate list (performed by the function ScanEdge); drawing the scanned-out horizontal lines that constitute the filled polygon (DrawHorizontalLineList); and characterizing the polygon and coordinating the tracing and drawing (FillConvexPolygon). The amount of time that the sample program from last month spent in each of these areas is shown in Table 1. As you can see, half the time was spent drawing and the other half was spent tracing the polygon edges (the time spent in FillConvexPolygon was relatively minuscule), so we have our choice of where to begin optimizing. Fast Drawing Let's start with drawing, which is easily sped up. The original code used a double-nested loop that called a draw-pixel function to plot each pixel in the polygon individually. That's a ridiculous approach in a graphics mode that offers linearly mapped memory, as does VGA mode 13h, the mode with which we're working. At the very least, we could point a far pointer to the left edge of each polygon scan line, then draw each pixel in that scan line in quick succession, using something along the lines of *ScrPtr++ = FillColor; inside a loop. However, it seems silly to use a loop when the 80x86 has an instruction, REP STOS, that's uniquely suited to filling linear memory buffers. There's no way to use REP STOS directly in C code, but it's a good bet that the memset library function uses REP STOS, so you could greatly enhance performance by using memset to draw each scan line of the polygon in a single shot. That, however, is easier said than done. The memset function linked in from the library is tied to the memory model in use; in small (tiny, small, or medium) data models memset accepts only near pointers, so it can't be used to access screen memory. Consequently, a large (compact, large, or huge) data model must be used to allow memset to draw to display memory--a clear case of the tail wagging the dog. This is an excellent example of why, although it is possible to use C to do virtually anything, it's sometimes much simpler just to use a little assembly code and be done with it. At any rate, Listing One (page 146) shows a version of DrawHorizontalLineList that uses memset to draw each scan line of the polygon in a single call. When linked to last month's test program, Listing One increases pure drawing speed (disregarding edge tracing and other nondrawing time) by more than an order of magnitude over last month's draw-pixel-based code. This despite the fact that Listing One requires a large (in this case, compact) data model. Listing One works fine with Turbo C++, but may not work with other compilers, for it relies on the aforementioned interaction between memset and the selected memory model. At this point, I'd like to mention that benchmarks are notoriously unreliable; the results in Table 1 are accurate only for the test program, and then only when running on a particular system. Results could be vastly different if smaller, larger, or more complex polygons were drawn, or if a faster or slower computer/VGA combination were used. These factors notwithstanding, the test program does fill a variety of polygons of varying complexity sized from large to small and in between, and certainly the order of magnitude difference between Listing One and the old version of DrawHorizontalLineList is a clear indication of which code is superior. Anyway, Listing One has the desired effect of vastly improving drawing time. There are cycles yet to be had in the drawing code, but as tracing polygon edges now takes 92 percent of the polygon filling time, it's logical to optimize the tracing code next. Fast Edge Tracing There's no secret as to why last month's ScanEdge was so slow: It used floating point calculations. One secret of fast graphics is using integer or fixed-point calculations, instead. (Sure, the floating point code would run faster if a math coprocessor were installed, but it would still be slower than the alternatives; besides, why require a math coprocessor when you don't have to?) Both integer and fixed-point calculations are fast. In many cases, fixed-point is faster, but integer calculations have one tremendous virtue: They're completely accurate. The tiny imprecision inherent in either fixed- or floating-point calculations can result in occasional pixels being one off from their proper location. This is no great tragedy, but after going to so much trouble to ensure that polygons don't overlap at common edges, why not get it exactly right? In fact, when I tested out the integer edge tracing code by comparing an integer-based test image to one produced by floating point calculations, two pixels out of the whole screen differed, leading me to suspect a bug in the integer code. It turned out, however, that in those two cases, the floating point results were sufficiently imprecise to creep from just under an integer value to just over it, so that the ceil function returned a coordinate that was one too large. Floating point is very accurate--but it is not precise. Integer calculations, properly performed, are. Listing Two (page 146) shows a C implementation of integer edge tracing. Vertical and diagonal lines, which are trivial to trace, are special-cased. Other lines are broken into two categories: Y-major (closer to vertical) and X-major (closer to horizontal). The handlers for the Y-major and X-major cases operate on the principle of similar triangles: The number of X pixels advanced per scan line is the same as the ratio of the X delta of the edge to the Y delta. Listing Two is more complex than the original floating point implementation, but not painfully so. In return for that complexity, Listing Two is more than 80 times faster--and, as just mentioned, it's actually more accurate than the floating point code. You gotta love that integer arithmetic. The Finishing Touch: Assembly Language The C implementation is now nearly 20 times as fast as the original, good enough for most purposes. Still, it requires that a large data model be used (for memset), and it's certainly not the fastest possible code. The obvious next step is assembly language. Listing Three (page 146) is an assembly language version of DrawHorizontalLineList. In actual use, it proved to be about 36 percent faster than Listing One; better than a poke in the eye, but just barely. There's more to these timing results than meets that eye, though. Display memory generally responds much more slowly than system memory, especially in 386 and 486 systems. That means that much of the time taken by Listing Three is actually spent waiting for display memory accesses to complete, with the processor forced to idle by wait states. If, instead, Listing Three drew to a local buffer in system memory or to a particularly fast VGA, the assembly implementation might well display a far more substantial advantage over the C code. And indeed it does. When the test program is modified to draw to a local buffer, both the C and assembly language versions get 0.29 seconds faster, that being a measure of the time taken by display memory wait states. With those wait states factored out, the assembly language version of DrawHorizontalLineList becomes almost three times as fast as the C code. There is a lesson here. An optimization has no fixed payoff; its value fluctuates according to the context in which it is used. There's relatively little benefit to further optimizing code that already spends half its time waiting for display memory; no matter how good your optimizations, you'll get only a two-times speedup at best, and generally much less than that. There is, on the other hand, potential for tremendous improvement when drawing to system memory, so if that's where most of your drawing will occur, optimizations such as Listing Three are well worth the effort. Know the environments in which your code will run, and know where the cycles go in those environments. Maximizing REP STOS Listing Three doesn't take the easy way out and use REP STOSB to fill each scan line; instead, it uses REP STOSW to fill as many pixel pairs as possible via word-sized accesses, using STOSB only to do odd bytes. Word accesses to odd addresses are always split by the processor into 2-byte accesses. Such word accesses take twice as long as word accesses to even addresses, so Listing Three makes sure that all word accesses occur at even addresses, by performing a leading STOSB first if necessary. Listing Three is another case in which it's worth knowing the environment in which your code will run. Extra code is required to perform aligned word-at-a-time filling, resulting in extra overhead. For very small or narrow polygons, that overhead might overwhelm the advantage of drawing a word at a time, making plain old REP STOSB faster. Faster Edge Tracing Finally, Listing Four (page 147) is an assembly language version of ScanEdge. Listing Four is a relatively straightforward translation from C to assembly, but is nonetheless almost twice as fast as Listing Two. The version of ScanEdge in Listing Four could certainly be sped up still further by unrolling the loops. FillConvexPolygon, the overall coordination routine, hasn't even been converted to assembly language, so that could be sped up as well. I haven't bothered with these optimizations because all code other than DrawHorizontalLineList takes only 14 percent of the overall polygon filling time when drawing to display memory; the potential return on optimizing nondrawing code simply isn't great enough to justify the effort. Part of the value of a profiler is being able to tell when to stop optimizing; with Listings Three and Four in use, more than two-thirds of the time taken to draw polygons is spent waiting for display memory, so optimization is pretty much maxed out. However, further optimization might be worthwhile when drawing to system memory, where wait states are out of the picture and the nondrawing code takes a significant portion (46 percent) of the overall time. Again, know where the cycles go. By the way, note that all the versions of ScanEdge and FillConvexPolygon that we've looked at are adapter independent, and that the C code is also machine independent; all adapterspecific code is isolated in DrawHorizontalLineList. This makes it easy to add support for other graphics formats, such as the 8514/A, the XGA, or, for that matter, a non-PC system. Books of the Month Two books share honors this month. One isn't new, but is well worth having if you're a PC graphics programmer. Programmer's Guide to PC & PS/2 Video Systems, by Richard Wilton (Microsoft Press, 1987) is pretty much the standard graphics reference for the PC, covering all standards up through the VGA, including MCGA, EGA, Hercules, CGA, and MDA (but not 8514/A). For 8514/A programming, Graphics Programming for the 8514/A, by Jake Richter and Bud Smith (M&T Books, 1990), is good; which is lucky, because it's the only 8514/A book I know of!