Document Title: Master Class: Fantasm and C
Author: Stuart Ball
Additional attrib: Robert Probin.
Date: 11 June 1999.
Last Updated: 20 June 1999.
© Lightsoft 1999.

Abstract: This document will present an example of accelerating a C function using Fantasm.
The example chosen is that of replacing a C based pixel doubling blitter with a faster version
written in PowerPC assembly language.

Aims: Primarily to give the reader knowledge of loading and running a Fantasm fragment from
another language and secondly to present to the community a fast Macintosh pixel doubler.

 

Detail: I know a fair few applications are written completely with Fantasm (Bumbler & Rainbow Painter
for example) which gives me a nice warm feeling in my tummy; after all that's what it's for. Another
place where Fantasm can be extremely handy is in accelerating parts of other programs written in other
languages. This Master Class will concentrate on accelerating a C function with Fantasm although the methods
apply equally well to any high level language capable of calling MacOS functions such as Pascal or BASIC.

You may ask, is there really any need to do this? Understand that I am biased - I want people
to buy Fantasm - but even so I firmly believe there are times when you simply can't get a compiler
to produce the fastest code. I want to show you how, with a little work, you could speed up your game or
demo using Fantasm, even if you've never considered assembly language before. The example
I will present is a pixel doubler; a very real world example. Imagine you have a game
(called "Ezzy Loves Quazi") and one of the options you would like to offer the user is faster operation
through a pixel-doubled display (your drawing software then has only a quarter of the pixels to draw). A quick search
on the web pulls up some C to do the job. It builds a "double" (or a Float64) in memory out of an array of
8 chars, which is then written to the screen, most likely with a stfdu instruction. Seems fast enough.

Then you think there might be a way of doing it without all that writing to memory which is always
a slowdown. There is, but getting a compiler to output the right instructions is nigh on impossible.

Here's the C:

void
pixel_doubling_blit_8(
	const void * source_bitmap,
	int          source_rowbytes,
	int          source_width,
	int          source_height,
	void *       dest_bitmap,
	int          dest_rowbytes)
{
	double temp[1];
	register double temp2;
	while (--source_height >= 0)
	{
		unsigned long * src = ((unsigned long *)source_bitmap)-1;
		double * dst1 = ((double *)dest_bitmap)-1;
		double * dst2 = dst1+dest_rowbytes/8;
		int w = source_width;
		while ((w-=4) >= 0)
		{

			unsigned long pixx = *(++src);
			unsigned char * mid = (unsigned char *)&temp[1];
			*(--mid) = pixx;
			*(--mid) = pixx;
			pixx >>= 8;
			*(--mid) = pixx;
			*(--mid) = pixx;
			pixx >>= 8;
			*(--mid) = pixx;
			*(--mid) = pixx;
			pixx >>= 8;
			*(--mid) = pixx;
			*(--mid) = pixx;

			temp2 = temp[0];
			*(++dst1) = temp2;
			*(++dst2) = temp2;
		} 
		source_bitmap = ((char *)source_bitmap)+source_rowbytes;
		dest_bitmap = ((char *)dest_bitmap)+2*dest_rowbytes;
	}
}
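
When experimenting with faster versions it helps to have something simple to compare against. The following is a minimal sketch of a byte-at-a-time doubler; the name pixel_doubling_blit_8_ref is hypothetical and the routine is not part of the original code. It assumes 8-bit pixels and a destination at least twice as wide and twice as tall as the source, and unlike the routine above it does not depend on byte order or on the width being a multiple of 4.

```c
/* Hypothetical reference routine (not from the original listing):
   doubles every 8-bit pixel horizontally and every row vertically,
   one byte at a time. Slow, but easy to check by eye. */
void pixel_doubling_blit_8_ref(
	const unsigned char * source_bitmap,
	int                   source_rowbytes,
	int                   source_width,
	int                   source_height,
	unsigned char *       dest_bitmap,
	int                   dest_rowbytes)
{
	int x, y;
	for (y = 0; y < source_height; y++)
	{
		const unsigned char * src  = source_bitmap + y * source_rowbytes;
		unsigned char *       dst1 = dest_bitmap + 2 * y * dest_rowbytes;
		unsigned char *       dst2 = dst1 + dest_rowbytes;
		for (x = 0; x < source_width; x++)
		{
			unsigned char p = src[x];
			dst1[2*x] = dst1[2*x + 1] = p;	/* output line 1 */
			dst2[2*x] = dst2[2*x + 1] = p;	/* output line 2 */
		}
	}
}
```

Running both routines over the same source bitmap and comparing the two destination buffers byte for byte is a quick way to confirm an optimised version still draws the same picture.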

 

 

And here is the assm replacement, which builds the doubled pixels in two registers we call reg3 and reg4:

******************************************************************MODULE HEADER*****
** FILENAME	 	 : splat.s
** MODULE TITLE	 : screen splats 
** PROJECT	 	 : EzzyLovesQuazi
** DATE STARTED	 : 3 June 1999, Thur 12:00pm
** FIRST AUTHOR	 : Stuart Ball
**
** COPYRIGHT (c) 1999 Lightsoft
**	 	  (c) 1999 Robert Probin and Stuart Ball
**	 	  http://www.lightsoft.co.uk/
**
** Lightsoft is a trading name of Robert Probin and Stuart Ball.
**
************************************************************************************
** HISTORY:
** Date Initial	 	   Descrip.
**
**
**
************************************************************************************
*
************************************************************************************
** INCLUDED FILES
**
*

;The params as passed and their registers are:
;void pixel_doubling_blit_8(
;	const void * source_bitmap,	= r3
;	int          source_rowbytes, = r4
;	int          source_width,	= r5
;	int          source_height,	= r6
;	void *       dest_bitmap,	= r7
;	int          dest_rowbytes)	= r8
;The code
splat_320:
	stw        r13,-0x0008(SP)  ;save these...and restore at end
	stw        r14,-0x0004(SP)
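;setup counts etc.	
	subi       r6,r6,1          ;r6 = source_height-1, counts lines down to 0
	cmpwi      r6,0 
	blt        .return          ;nothing to draw (as the C: while (--source_height >= 0))
	subi       r5,r5,4          ;r5 = source_width-4
	srawi      r10,r8,3         ;r10 = dest_rowbytes/8...
	cmpwi      cr6,r5,0         ;cr6 remembers whether we have at least 4 pixels per line
	addze      r10,r10          ;...rounded toward zero, as C division does
	slwi       r8,r8,1          ;r8 = 2*dest_rowbytes, the step to the next pair of output lines
	slwi       r9,r10,3         ;r9 = byte offset from output line 1 to output line 2
.line_loop:      
	subi       r11,r7,8         ;r11 = output line 1, pre-decremented for stfdu
	subi       r12,r3,4         ;r12 = source line, pre-decremented for lwzu
	add        r10,r11,r9       ;r10 = output line 2
	blt        cr6,.next_line   ;fewer than 4 pixels wide - skip the line
	addi       r0,r5,4   
	srawi      r0,r0,2          ;r0 = source_width/4
	mtctr      r0               ;loop count = number of 4 pixel groups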
	
.pixel_loop:
reg3:	requ	r13  ;reg3 is really r13
reg4:	requ	r14  ;reg4 is really r14

	lwzu	`reg3,0x0004(r12)	;4 pixels from src and update the src pointer
	mr	`reg4,`reg3		;copy input
;now our replacement code - we don't write to memory to build the double, we do it on-chip.
						;rlwimi operands as follows:
						;dest,source,shift left,mbegin,mend --- BIG NOTE: mbegin and mend are mask positions for 
						;TARGET data, therefore represent the FINISHING positions AFTER the data is moved! 
	rlwimi `reg3,`reg3,8,8,23    ; assemble two copies of bottom byte (byte 4), plus shift third byte up to second byte position
	rlwimi `reg4,`reg4,32-8,8,23	 ;32-8 is right shift 8. top byte (first) is duplicated into second byte, plus second byte shifted into third position
	rlwimi `reg3,`reg3,8,0,7	 ; duplicate byte in second position and put into top position (first byte).
	rlwimi `reg4,`reg4,32-8,24,31 ; duplicate third into fourth.
			
;now store as a double on the stack
	stw	 `reg3,-0x14(sp)	;low word of the double: source pixels 3 and 4, each doubled
	stw	 `reg4,-0x18(sp)	;high word of the double: source pixels 1 and 2, each doubled

	lfd    f13,-0x0018(SP) ;and load the 8 pixels
	stfdu  f13,0x0008(r11)   ;and store to output line 1
	stfdu  f13,0x0008(r10)   ;and line 2 - remove this to have a line skipped (and faster) display.
	bdnz   .pixel_loop	 ;and keep going until we've done the whole line

.next_line:  subi       r6,r6,1	
	add    r3,r3,r4 
	cmpwi  r6,0 
	add    r7,r7,r8 
	bge    .line_loop  
.return:   
	
	lwz    r13,-0x0008(SP) 
	lwz    r14,-0x0004(SP) 
	blr



(If anybody can improve on the core speed of this routine, I'd love to see the code!)
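
The four rlwimi instructions are easier to follow with a model you can run on any machine. The sketch below imitates rlwimi in portable C and applies the same four steps to one word of four pixels; the helper names rotl32, rlwimi and double_four_pixels are made up for this illustration, and the model only covers the non-wrapping mask case (mbegin <= mend) used here.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by n bits. */
static uint32_t rotl32(uint32_t x, unsigned n)
{
	n &= 31;
	return n ? (x << n) | (x >> (32 - n)) : x;
}

/* Model of "rlwimi dest,src,sh,mb,me" for mb <= me.
   PowerPC numbers bits from 0 (most significant) to 31. */
static uint32_t rlwimi(uint32_t dest, uint32_t src,
                       unsigned sh, unsigned mb, unsigned me)
{
	uint32_t mask = (0xFFFFFFFFu >> mb) & (0xFFFFFFFFu << (31 - me));
	return (rotl32(src, sh) & mask) | (dest & ~mask);
}

/* Hypothetical helper: expands the 4 pixels ABCD into AABB (hi) and CCDD (lo),
   using the same sequence as the assembly listing. */
void double_four_pixels(uint32_t pixels, uint32_t *hi, uint32_t *lo)
{
	uint32_t reg3 = pixels, reg4 = pixels;
	reg3 = rlwimi(reg3, reg3, 8, 8, 23);		/* ABCD -> ACDD */
	reg4 = rlwimi(reg4, reg4, 32 - 8, 8, 23);	/* ABCD -> AABD */
	reg3 = rlwimi(reg3, reg3, 8, 0, 7);		/* ACDD -> CCDD */
	reg4 = rlwimi(reg4, reg4, 32 - 8, 24, 31);	/* AABD -> AABB */
	*hi = reg4;	/* stored at the lower address, so it reaches the screen first */
	*lo = reg3;
}
```

With the input word 0x11223344, hi comes out as 0x11112222 and lo as 0x33334444, which is the order the two stw instructions lay them out for the lfd.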

 

 

 

So we put all that in a Fantasm source file and create an Anvil project to build it. One of the neat things about Anvil (apart from
its ability to play mp3's without skipping :) is its ability to place the code virtually anywhere. In this case we will create a
fragment and place it directly in the C based application; we create a merged PowerPC project and then select Ezzy Loves Quazi as the target.

While I'm here (and as this is a Master Class) a quick discussion of the two fragment formats - XCOFF and PEF.
PEF (Preferred Executable Format) is the way to go - it's Apple's "proprietary" format that's good for MacOS and OSX "client" (they have to change that name!). It generally results in fragments that load faster due to the hash entries for identifiers.

XCOFF (eXtended COmmon Object File Format) is typical IBM over-engineering. It may have its place if you want some security: there are many disassemblers for the Mac that will quickly disassemble a PEF fragment, but not many that will deal with an XCOFF fragment. This is why we maintain our XCOFF linker. If you do need to make your fragment more secure you may want to consider using the XCOFF linker; otherwise the PEF one is recommended. If you really want security, tell the linker to produce "no header" and load the binary modules manually at run-time. This is a bit fiddly but
does work. Back to the plot...

 

We now have Anvil building our PPC assm fragment directly into the C application. How now to load and call it from C?

We set Anvil to poke the assm fragment into Ezzy Loves Quazi as a resource of type Fppc with an ID of 128. So, all we have to
do from our C application is load the resource and then we can call it via something like CallUniversalProc, viz:

 

Load the code at init time (once only!):

Handle splat_code_handle;
UniversalProcPtr splat_code_ptr;

unsigned long splatProcInfo=0xFFFC1;	//7 four-byte params in, no result, C calling convention. Inside Mac PPC System Software Pg 2-17

void load_screen_splat()
{
	int hsize;
	OSErr my_err;
	CFragConnectionID splat_conn_id;
	Str255 ErrName;
	Str63 fragname = "\pFantScreenSplat";
	ProcPtr splat_main_addr;

	//fetch the fragment Anvil placed in our application as a resource
	splat_code_handle=GetResource('Fppc',128);
	if (splat_code_handle==0) report_error("Corrupt App. Couldn't load Fppc 128 code resource.","\p",4);
	//the code must not move once it is linked, so move it high and lock it
	MoveHHi(splat_code_handle);
	HLock(splat_code_handle);
	hsize=GetHandleSize(splat_code_handle);
	//have the Code Fragment Manager prepare the fragment and give us its main address
	my_err=GetMemFragment(*splat_code_handle, hsize, fragname, kPrivateCFragCopy, &splat_conn_id, 
	               (Ptr *) &splat_main_addr,ErrName);
	if (my_err!=0) report_error("Couldn't link Fppc code resource. Try giving App more memory.","\p",my_err);
	//wrap the address in a routine descriptor so CallUniversalProc can call it
	splat_code_ptr=NewRoutineDescriptor (splat_main_addr,splatProcInfo, GetCurrentISA());
}
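
The magic number 0xFFFC1 in splatProcInfo need not be taken on trust. The sketch below rebuilds it from its bit fields in portable C. The constant names are local stand-ins invented for this illustration and the field layout is an assumption based on the Mixed Mode procedure information format: calling convention in the low 4 bits, a 2-bit result size, then one 2-bit size code per parameter. In a real project the macros in MixedMode.h (such as STACK_ROUTINE_PARAMETER and SIZE_CODE) would be the usual way to build this value.

```c
/* Local stand-ins for the Mixed Mode constants. These names are
   hypothetical; the values are assumptions, not copied from MixedMode.h. */
enum {
	kSketchCStackBased     = 1,	/* calling convention, low 4 bits */
	kSketchFourByteCode    = 3,	/* size code for a 4-byte value */
	kSketchResultSizePhase = 4,	/* result size field starts at bit 4 */
	kSketchParamPhase      = 6,	/* first parameter field starts at bit 6 */
	kSketchParamWidth      = 2	/* each parameter field is 2 bits wide */
};

/* Builds a procInfo value for a C routine taking param_count
   4-byte parameters; result_size_code 0 means "no result". */
unsigned long sketch_proc_info(int param_count, int result_size_code)
{
	unsigned long info = kSketchCStackBased;
	int i;
	info |= (unsigned long)result_size_code << kSketchResultSizePhase;
	for (i = 0; i < param_count; i++)
		info |= (unsigned long)kSketchFourByteCode
		        << (kSketchParamPhase + i * kSketchParamWidth);
	return info;
}
```

Under those assumptions sketch_proc_info(7, 0) gives 0xFFFC1, matching the value used for splatProcInfo.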

We can then call the code when we want to do the blit as pixel_doubling_blit_8:

extern unsigned long splatProcInfo;
extern UniversalProcPtr splat_code_ptr;
void pixel_doubling_blit_8
(const void * source_bitmap, int source_rowbytes, int source_width, int source_height, void * dest_bitmap, 
int dest_rowbytes)
{
CallUniversalProc(splat_code_ptr,splatProcInfo, source_bitmap, source_rowbytes, 
                                               source_width, source_height, dest_bitmap, 
                                               dest_rowbytes,0);	//params for 320*240 splat
return;
}

 

There you go!

There are at least two alternative methods you could use, both slightly more complex:

1/ Get Anvil to output the code to another resource which is copied to your application when the C project is built. This seems the obvious way at first, but it is a bit
long-winded.
2/ Get Anvil to create a shared library and then dynamically link to that in your C project. If the assm fragment had more than a few functions
it may be worth doing it this way; but I personally like the simplicity of the method given in this class.

END.

 

Notes:

1/ All code in this Master Class was built and tested with Anvil 3.00b8 using Fantasm 6b3 for the assembly
language and MrC (driven via Anvil 3's MrC AIT) v3.01 for the C. Debuggers were The PowerMac Debugger for the C and of course
good old Macsbug.

2/ This method (loading code from a resource) also works for 68K code should you need it; just change the ISA when creating
the UniversalProcPtr.

3/ You may note that the last parameter passed in CallUniversalProc is apparently not used by the assm function. This parameter selects a given
routine within the assm fragment, although that code is not shown here. For example, a 1 could mean use another routine that skips every other line
during the blit.

4/ I use the words "splat" and "blit" interchangeably. They mean the same thing - blit.
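
To picture the selector parameter described in note 3, here is a small sketch in portable C. It is only an assumption about how such a selector might behave: the constant names and the routine splat_with_selector are hypothetical, and the real selection code in the assm fragment is not shown in this class. It works a byte at a time so the effect of each selector value is easy to see.

```c
/* Hypothetical selector values; the real ones are not given in this class. */
enum { SPLAT_DOUBLE_LINES = 0, SPLAT_SKIP_LINES = 1 };

/* Byte-at-a-time model of a blit whose last parameter picks the variant. */
void splat_with_selector(const unsigned char *src, int src_rowbytes,
                         int width, int height,
                         unsigned char *dst, int dst_rowbytes, int selector)
{
	int x, y;
	for (y = 0; y < height; y++)
	{
		const unsigned char *s  = src + y * src_rowbytes;
		unsigned char       *d1 = dst + 2 * y * dst_rowbytes;
		unsigned char       *d2 = d1 + dst_rowbytes;
		for (x = 0; x < width; x++)
		{
			d1[2*x] = d1[2*x + 1] = s[x];		/* always draw line 1 */
			if (selector == SPLAT_DOUBLE_LINES)
				d2[2*x] = d2[2*x + 1] = s[x];	/* line 2 left alone when skipping */
		}
	}
}
```

With SPLAT_SKIP_LINES every second destination line is left untouched, which corresponds to removing the second stfdu from the pixel loop.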

©Lightsoft 1999. All trademarks acknowledged.

We always enjoy feedback from articles, so if you have any constructive criticism, corrections or wish to clarify something, please email us or post to the Fantasm mailing list.


Other articles in this series: Memory Protection With Fantasm.