Document Title:
Master Class: Fantasm and C
Author: Stuart Ball
Additional attrib: Robert Probin.
Date: 11 June 1999.
Last Updated: 20 June 1999.
© Lightsoft 1999.
Abstract: This document will present an example of accelerating
a C function using Fantasm.
The example chosen is that of replacing a C based pixel doubling
blitter with a faster version written in PowerPC assembly language.
Aims: Primarily to give the reader knowledge of loading
and running a Fantasm fragment from
another language and secondly to present to the community a fast
Macintosh pixel doubler.
Detail: I know a fair few applications are written completely
with Fantasm (Bumbler & Rainbow Painter for example) which gives
me a nice warm feeling in my tummy; after all that's what it's for.
Another
place where Fantasm can be extremely handy is in accelerating
parts of other programs written in other
languages. This Master Class will concentrate on accelerating
a C function with Fantasm although the methods
apply equally well to any high level language capable of calling
MacOS functions such as Pascal or BASIC.
You may ask is there really any need to do this? Understand
that I am biased - I want people
to buy Fantasm, but even so I firmly believe there can be a time
when you simply can't get a compiler
to produce the fastest code. I want to show you how with a little
work you could speed up your game or
demo using Fantasm; even if you've never considered the idea of
assembly language before. The example
I will present is a pixel doubler; a very real world example.
Imagine you have a game
(called "Ezzy Loves Quazi") and one of the options you
would like to present the user is that of faster operation
through a pixel-doubled display (thus your drawing software has
a quarter of the work to do). A quick search
on the web pulls up some C to do the job. It uses memory to build
a "double" (or a Float64) out of an array of
8 chars which is then written to the screen most likely with a
stfdu instruction. Seems fast enough.
Then you think there might be a way of doing it without all
that writing to memory which is always
a slowdown. There is, but getting a compiler to output the right
instructions is nigh on impossible.
Here's the C:
void pixel_doubling_blit_8(
    const void * source_bitmap,
    int source_rowbytes,
    int source_width,
    int source_height,
    void * dest_bitmap,
    int dest_rowbytes)
{
    double temp[1];
    register double temp2;

    while (--source_height >= 0)
    {
        unsigned long * src = ((unsigned long *)source_bitmap)-1;
        double * dst1 = ((double *)dest_bitmap)-1;
        double * dst2 = dst1+dest_rowbytes/8;
        int w = source_width;

        while ((w-=4) >= 0)
        {
            unsigned long pixx = *(++src);
            unsigned char * mid = (unsigned char *)&temp[1];

            *(--mid) = pixx;
            *(--mid) = pixx;
            pixx >>= 8;
            *(--mid) = pixx;
            *(--mid) = pixx;
            pixx >>= 8;
            *(--mid) = pixx;
            *(--mid) = pixx;
            pixx >>= 8;
            *(--mid) = pixx;
            *(--mid) = pixx;

            temp2 = temp[0];
            *(++dst1) = temp2;
            *(++dst2) = temp2;
        }
        source_bitmap = ((char *)source_bitmap)+source_rowbytes;
        dest_bitmap = ((char *)dest_bitmap)+2*dest_rowbytes;
    }
}
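When swapping in a faster blitter it helps to have something slow but obviously right to compare against. The following is a minimal sketch of a byte-by-byte reference doubler. The name pixel_double_reference is made up for this example and is not part of the original code. Note that the C above, as written, relies on a 4-byte unsigned long and PowerPC (big-endian) byte order; this sketch assumes neither.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference routine (not from the article's sources):
   doubles an 8-bit bitmap one byte at a time. Slow, but independent
   of byte order and of sizeof(long). */
void pixel_double_reference(const unsigned char *src, int src_rowbytes,
                            int src_width, int src_height,
                            unsigned char *dst, int dst_rowbytes)
{
    int x, y;

    assert(src != NULL && dst != NULL);
    assert(dst_rowbytes >= 2 * src_width);   /* room for a doubled row */

    for (y = 0; y < src_height; y++) {
        const unsigned char *in = src + (size_t)y * src_rowbytes;
        /* each source row becomes two identical destination rows */
        unsigned char *out1 = dst + (size_t)(2 * y) * dst_rowbytes;
        unsigned char *out2 = out1 + dst_rowbytes;

        for (x = 0; x < src_width; x++) {
            unsigned char p = in[x];
            out1[2 * x] = out1[2 * x + 1] = p;   /* double horizontally */
            out2[2 * x] = out2[2 * x + 1] = p;   /* and vertically */
        }
    }
}
```

Run it and the fast routine over the same source bitmap into two separate buffers and compare the buffers; any difference points at the fast version.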
And the assembly replacement, which builds the doubled pixels in two registers we call reg3 and reg4:
******************************************************************MODULE HEADER*****
** FILENAME      : splat.s
** MODULE TITLE  : screen splats
** PROJECT       : EzzyLovesQuazi
** DATE STARTED  : 3 June 1999, Thur 12:00pm
** FIRST AUTHOR  : Stuart Ball
**
** COPYRIGHT (c) 1999 Lightsoft
**           (c) 1999 Robert Probin and Stuart Ball
**           http://www.lightsoft.co.uk/
**
** Lightsoft is a trading name of Robert Probin and Stuart Ball.
**
************************************************************************************
** HISTORY:
** Date Initial Descrip.
**
**
************************************************************************************
*
************************************************************************************
** INCLUDED FILES
**
*

;The params as passed and their registers are:
;void pixel_doubling_blit_8(
;       const void * source_bitmap,     = r3
;       int source_rowbytes,            = r4
;       int source_width,               = r5
;       int source_height,              = r6
;       void * dest_bitmap,             = r7
;       int dest_rowbytes)              = r8

;The code
splat_320:
        stw     r13,-0x0008(SP)         ;save these...and restore at end
        stw     r14,-0x0004(SP)
;setup counts etc.
        subi    r6,r6,1
        cmpwi   r6,0
        ble     .return
        subi    r5,r5,4
        srawi   r10,r8,3
        cmpwi   cr6,r5,0
        addze   r10,r10
        slwi    r8,r8,1
        slwi    r9,r10,3
.line_loop:
        subi    r11,r7,8
        subi    r12,r3,4
        add     r10,r11,r9
        blt     cr6,.next_line
        addi    r0,r5,4
        srawi   r0,r0,2
        mtctr   r0
.pixel_loop:
reg3:   requ    r13                     ;reg3 is really r13
reg4:   requ    r14                     ;reg4 is really r14
        lwzu    `reg3,0x0004(r12)       ;4 pixels from src and update the src pointer
        mr      `reg4,`reg3             ;copy input
;now our replacement code - we don't write to memory to build the double, we do it on-chip.
;rlwimi operands as follows:
;dest,source,shift left,mbegin,mend --- BIG NOTE: mbegin and mend are mask positions for
;TARGET data, therefore represent the FINISHING positions AFTER the data is moved!
        rlwimi  `reg3,`reg3,8,8,23      ; assemble two copies of bottom byte (byte 4), plus shift third byte up to second byte position
        rlwimi  `reg4,`reg4,32-8,8,23   ;32-8 is right shift 8. top byte (first) is duplicated into second byte, plus second byte shifted into third position
        rlwimi  `reg3,`reg3,8,0,7       ; duplicate byte in second position and put into top position (first byte).
        rlwimi  `reg4,`reg4,32-8,24,31  ; duplicate third into fourth.
;now store as a double on the stack
        stw     `reg3,-0x14(sp)         ;make double on stack
        stw     `reg4,-0x18(sp)
        lfd     f13,-0x0018(SP)         ;and load the 8 pixels
        stfdu   f13,0x0008(r11)         ;and store to output line 1
        stfdu   f13,0x0008(r10)         ;and line 2 - remove this to have a line skipped (and faster) display.
        bdnz    .pixel_loop             ;and keep going until we've done the whole line
.next_line:
        subi    r6,r6,1
        add     r3,r3,r4
        cmpwi   r6,0
        add     r7,r7,r8
        bge     .line_loop
.return:
        lwz     r13,-0x0008(SP)
        lwz     r14,-0x0004(SP)
        blr
(If anybody can improve on the core speed of this routine, I'd
love to see the code!)
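If the four rlwimi instructions look like black magic, it can help to step through them in C. This is a rough sketch only: rotl32, rlwimi and double_four_pixels are names invented for this example, and the rlwimi emulation handles only masks that do not wrap (mbegin <= mend), which is all the routine above needs. Four pixels ABCD come in as one 32-bit word; AABB and CCDD come out.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (0 < n < 32). */
static uint32_t rotl32(uint32_t v, unsigned n)
{
    return (v << n) | (v >> (32u - n));
}

/* Emulates "rlwimi dest,src,shift,mb,me" for non-wrapping masks.
   Bits are numbered PowerPC style: bit 0 is the MOST significant. */
static uint32_t rlwimi(uint32_t dest, uint32_t src, unsigned shift,
                       unsigned mb, unsigned me)
{
    uint32_t mask;

    assert(shift > 0 && shift < 32 && mb <= me && me < 32);
    mask = (0xFFFFFFFFu >> mb) & (0xFFFFFFFFu << (31u - me));
    /* rotated source goes in under the mask, the rest of dest is kept */
    return (rotl32(src, shift) & mask) | (dest & ~mask);
}

/* Doubles four 8-bit pixels ABCD into hi = AABB and lo = CCDD,
   using the same four steps as the assembly listing. */
void double_four_pixels(uint32_t pixels, uint32_t *hi, uint32_t *lo)
{
    uint32_t reg3 = pixels;
    uint32_t reg4 = pixels;

    reg3 = rlwimi(reg3, reg3, 8, 8, 23);        /* ABCD -> ACDD */
    reg4 = rlwimi(reg4, reg4, 32 - 8, 8, 23);   /* ABCD -> AABD */
    reg3 = rlwimi(reg3, reg3, 8, 0, 7);         /* ACDD -> CCDD */
    reg4 = rlwimi(reg4, reg4, 32 - 8, 24, 31);  /* AABD -> AABB */

    *hi = reg4;   /* first four output pixels */
    *lo = reg3;   /* last four output pixels */
}
```

For example, feeding in 0x11223344 gives hi = 0x11112222 and lo = 0x33334444, which is the same byte pattern the C version builds in memory on a PowerPC.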
So we put all that in a Fantasm source file and create an Anvil
project to build it. One of the neat things about Anvil (apart
from its ability to play MP3s without skipping :) is its ability to
place the code virtually anywhere. In this case we will create
a fragment and place it directly in the C based application; we
create a merged PowerPC project and then select Ezzy Loves Quazi
as the target.
While I'm here (and as this is a Master Class) a quick discussion
of the two fragment formats - XCOFF and PEF.
PEF (Preferred Executable Format) is the way to go - it's Apple's
"proprietary" format that's good for MacOS and OSX "client"
(they have to change that name!). It generally results in fragments
that load faster due to the hash entries for identifiers.
XCOFF (eXtended COmmon Object File Format) is typical IBM over-engineering.
It may have its place if you want some security. There are many
disassemblers for the Mac that will quickly disassemble a PEF
fragment but not many that will deal with an XCOFF fragment. This
is why we maintain our XCOFF linker. If you do need to make your
fragment more secure you may want to consider using the XCOFF
linker; otherwise the PEF one is recommended. If you really want
security tell the linker to produce "no header" and
load the binary modules manually at run-time. This is a bit fiddly
but does work. Back to the plot...
We now have Anvil building our PPC assembly fragment directly into the C application. How do we now load and call it from C?
We set Anvil to poke the assembly fragment into Ezzy Loves Quazi
as a resource of type Fppc with an ID of 128. So all we have
to do from our C application is load the resource, and then we can
call it via something like CallUniversalProc, viz:
Load the code at init time (once only!):
Handle splat_code_handle;
UniversalProcPtr splat_code_ptr;
unsigned long splatProcInfo=0xFFFC1; //7 params in, 0 out, C. Inside Mac PPC System Software Pg 2-17

void load_screen_splat()
{
    int hsize;
    OSErr my_err;
    CFragConnectionID splat_conn_id;
    Str255 ErrName;
    Str63 fragname = "\pFantScreenSplat";
    ProcPtr splat_main_addr;

    splat_code_handle=GetResource('Fppc',128);
    if (splat_code_handle==0)
        report_error("Corrupt App. Couldn't load Fppc 128 code resource.","\p",4);
    MoveHHi(splat_code_handle);
    HLock(splat_code_handle);
    hsize=GetHandleSize(splat_code_handle);
    my_err=GetMemFragment(*splat_code_handle, hsize, fragname, kPrivateCFragCopy,
                          &splat_conn_id, (Ptr *) &splat_main_addr, ErrName);
    if (my_err!=0)
        report_error("Couldn't link Fppc code resource. Try giving App more memory.","\p",my_err);
    splat_code_ptr=NewRoutineDescriptor(splat_main_addr, splatProcInfo, GetCurrentISA());
}
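The magic number 0xFFFC1 in splatProcInfo is not as arbitrary as it looks. As a rough sketch, assuming the procedure information layout described in Inside Macintosh (calling convention in the low 4 bits, a 2-bit result size code, then one 2-bit size code per parameter), it can be rebuilt as follows. The enum names and make_c_proc_info are local to this example and are not taken from a system header.

```c
#include <assert.h>

/* Field layout assumed from Inside Macintosh: PowerPC System Software.
   These names are local to this sketch. */
enum {
    CONV_C_STACK_BASED = 1,   /* calling convention, bits 0-3 */
    RESULT_SHIFT       = 4,   /* result size code, bits 4-5 */
    PARAM_SHIFT        = 6,   /* first parameter size code starts here */
    PARAM_BITS         = 2,   /* bits per parameter size code */
    SIZE_NONE          = 0,   /* no result */
    SIZE_FOUR_BYTES    = 3    /* 4-byte value (pointer or int) */
};

/* Builds procedure information for a C routine whose parameters are
   all 4 bytes wide. result_code is one of the SIZE_ codes above. */
unsigned long make_c_proc_info(int param_count, unsigned long result_code)
{
    unsigned long info = CONV_C_STACK_BASED;
    int i;

    assert(param_count >= 0 && param_count <= 13);   /* must fit in 32 bits */

    info |= result_code << RESULT_SHIFT;
    for (i = 0; i < param_count; i++)
        info |= (unsigned long)SIZE_FOUR_BYTES << (PARAM_SHIFT + PARAM_BITS * i);
    return info;
}
```

Seven 4-byte parameters and no result, make_c_proc_info(7, SIZE_NONE), comes out as 0xFFFC1, matching the constant used above.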
We can then call the code when we want to do the blit, as pixel_doubling_blit_8:
extern unsigned long splatProcInfo;
extern UniversalProcPtr splat_code_ptr;

void pixel_doubling_blit_8(
    const void * source_bitmap,
    int source_rowbytes,
    int source_width,
    int source_height,
    void * dest_bitmap,
    int dest_rowbytes)
{
    CallUniversalProc(splat_code_ptr, splatProcInfo,
                      source_bitmap, source_rowbytes, source_width, source_height,
                      dest_bitmap, dest_rowbytes, 0);   //params for 320*240 splat
    return;
}
There you go!
There are at least two alternative methods you could use, both slightly more complex:
1/ Get Anvil to output the code to another resource which is
copied to your application when the C project is built. Obvious
at first but a bit long-winded.
2/ Get Anvil to create a shared library and then dynamically link
to that in your C project. If the assembly fragment had more than
a few functions it may be worth doing it this way; but I personally
like the simplicity of the method given in this class.
END.
Notes:
1/ All code in this Master Class was built and tested with
Anvil 3.00b8 using Fantasm 6b3 for the assembly
language and MrC (driven via Anvil 3's MrC AIT) v3.01 for the
C. Debuggers were The PowerMac Debugger for the C and of course
good old Macsbug.
2/ This method (loading code from a resource) also works for
68K code should you need it; just change the ISA when creating
the UniversalProcPtr.
3/ You may note the last parameter passed in CallUniversalProc
is apparently not used by the assembly function. This parameter is
used to select a given function within the assembly code, although
that code is not shown here. For example a 1 could mean use another
routine to skip every other line during the blit.
4/ I use the words "splat" and "blit" interchangeably.
They mean the same thing - blit.
©Lightsoft 1999. All trademarks acknowledged.
We always enjoy feedback on our articles, so
if you have any constructive criticism, corrections or wish to
clarify something, please email
us or post to the Fantasm mailing list.
Other articles in this series: Memory Protection With Fantasm.