tricky wrote: ↑Wed Oct 07, 2020 6:17 pm
...
Main sprite routine:
Code:
.draw_sprite ; Y=sprite
ldx OLD_H,y : beq return : dex
lda OLD_T,y : sta ld0+sm_lo : sta ld1+sm_lo : sta ld2+sm_lo
lda OLD_X,y : and #3 : ORA #HI(SPRITES) : clc : sta ld0+sm_hi : adc #4 : sta ld1+sm_hi : adc #4 : sta ld2+sm_hi
lda OLD_X,y : and #&7C : asl A : sta st0+sm_lo : sta st0+3+sm_lo : adc #8 : sta st1+sm_lo : sta st1+3+sm_lo : clc : adc #8 : sta st2+sm_lo : sta st2+3+sm_lo
lda OLD_Y,y : lsr A : lsr A : lsr A : ora #HI(SCREEN) : sta st0+sm_hi : sta st0+3+sm_hi : sta st1+sm_hi : sta st1+3+sm_hi : sta st2+sm_hi : sta st2+3+sm_hi
lda OLD_Y,y : and #7 : tay
.row
.ld0 : lda SPRITES,x : .st0 : eor SCREEN,y : sta SCREEN,y
.ld1 : lda SPRITES,x : .st1 : eor SCREEN,y : sta SCREEN,y
.ld2 : lda SPRITES,x : .st2 : eor SCREEN,y : sta SCREEN,y
dex : bmi done
dey : bpl row
lda st0+sm_hi : sec : sbc #1 : ora #&60 : sta st0+sm_hi : sta st0+3+sm_hi : sta st1+sm_hi : sta st1+3+sm_hi : sta st2+sm_hi : sta st2+3+sm_hi
ldy #7 : bne row
.done : ldy #0
RTS
The subject of sprite plotting comes up quite often on *., so I thought a bit more elaboration might be useful here.
Assuming that we have a sprite X and Y in pixels, although many other representations are useful, we first need to calculate a screen address.
I'm assuming here that the screen is MODE 1 and has been shrunk horizontally to 256 pixels, so each character row is two pages (&200 bytes). I might look at full size later on.
The way that the screen (assuming SCREEN is, say, &4000) is laid out, we need:
A vertical offset within a character: lda spr_y : and #7
A character row address: lda spr_y : and #&F8 : lsr A : lsr A : adc #HI(SCREEN) ; if screen is &4000, we could use ora, but it is the same cost here
A row offset in bytes: lda spr_x : and #&FC : asl A ; A=LO(offset), C=HI(offset)
And probably an offset within a byte: lda spr_x : and #3 ; used to choose which pixel offset copy of the sprite to use.
Which sprite to draw: lda spr_z
You might also want animation and this could be combined with pixel offset as in the example above.
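As a rough cross-check of the pieces listed above, here is a small Python model of the address arithmetic. It is only a sketch: the function name screen_address and the constant SCREEN = &4000 are made up for illustration, and it assumes the shrunk 256 pixel wide MODE 1 screen described here.

```python
SCREEN = 0x4000  # assumed screen base, as in the text


def screen_address(spr_x, spr_y):
    """Return (byte address, pixel offset within the byte) for a pixel X,Y.

    Assumes MODE 1 shrunk to 256 pixels: 4 pixels per byte, 8 bytes per
    character cell, &200 bytes per character row.
    """
    line_in_char = spr_y & 7                        # lda spr_y : and #7
    row_hi = ((spr_y & 0xF8) >> 2) + (SCREEN >> 8)  # and #&F8 : lsr A : lsr A : adc #HI(SCREEN)
    col_offset = (spr_x & 0xFC) << 1                # and #&FC : asl A (a 9-bit result)
    pixel_offset = spr_x & 3                        # and #3
    return (row_hi << 8) + col_offset + line_in_char, pixel_offset


if __name__ == "__main__":
    for x, y in [(0, 0), (5, 9), (255, 255)]:
        address, offset = screen_address(x, y)
        print(f"x={x} y={y} -> &{address:04X}, pixel offset {offset}")
```

The last case lands on &7FFF, which is what you would expect for the bottom right pixel of a 16K screen starting at &4000.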
If we are drawing a sprite that can be at various pixel offsets within a byte and it is 4 pixels wide, we will need to draw 2 columns (8 pixels wide) for the three offsets that aren't 0. We can either optimise for the aligned case and draw 1, and 2 for the rest, or keep it simple and make the sprite 5 pixels wide.
We also need to know how we want to draw the sprite, and here I'm assuming that the sprites are accessed with indexed addressing and the screen with indirect addressing:
We might need to clear the sprite:
lda #0 : sta (screen0),y : sta (screen1),y
Just copy it to the screen: this is the fastest and simplest, but will erase any overlapping graphics.*
lda spr_col0,x : sta (screen0),y : lda spr_col1,x : sta (screen1),y
ORing sprites on gets around the issue of sprites erasing overlapping graphics, but works best with a single colour 3 sprite like my Pac-Man.
lda spr_col0,x : ora (screen0),y : sta (screen0),y : lda spr_col1,x : ora (screen1),y : sta (screen1),y
EOR (or XOR): this is useful as drawing it a second time puts everything back as it was.**
lda spr_col0,x : eor (screen0),y : sta (screen0),y : lda spr_col1,x : eor (screen1),y : sta (screen1),y
Masked drawing uses a second set of graphics to keep or discard the background pixels which looks much neater.
lda spr_msk0,x : and (screen0),y : ora spr_col0,x : sta (screen0),y : lda spr_msk1,x : and (screen1),y : ora spr_col1,x : sta (screen1),y
If you have a lot of sprite data, which is doubled with masks, you can "auto mask" which needs an extra 256 byte page aligned table of "keep" bits. ***
lda spr_col0,x : sta mask0+1 : .mask0 : lda masks : and (screen0),y : ora spr_col0,x : sta (screen0),y
lda spr_col1,x : sta mask1+1 : .mask1 : lda masks : and (screen1),y : ora spr_col1,x : sta (screen1),y
A similar technique can be used to draw mirrored (1 table) sprites and to create the misaligned 3 pixel offsets on the fly (6 tables) and even mirrored and shifted together (6 tables). This is what I do in my Donkey Kong demo as it would otherwise require about 100K of graphics data.
* If a black pixel is left around the sprite and it only moves a single pixel, then the copy method can erase as it goes. My Frogger port uses this technique for everything except the frog.
** If two identical sprites are on top of each other, they disappear.
*** sprite masking can be done much more efficiently on the 6502 (B/B+) than the 65C02 (Master/Compact) by using "undocumented opcodes".
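To make the difference between the drawing modes above concrete, here is a Python sketch that applies each one to a single screen byte. The function names are made up, and the "auto mask" table assumes colour 0 is the transparent colour, which the post does not state. In MODE 1 each byte holds 4 pixels, and pixel n (0 = leftmost) uses bits 7-n and 3-n.

```python
def keep_mask(sprite_byte):
    """Background bits to keep for one MODE 1 sprite byte.

    Assumption: colour 0 is transparent, so both bits of every
    colour 0 pixel are set in the mask.
    """
    solid = (sprite_byte | (sprite_byte >> 4)) & 0x0F  # one bit per non-zero pixel
    return ~(solid | (solid << 4)) & 0xFF


# The 256 byte "auto mask" table, indexed by the sprite byte itself.
MASKS = bytes(keep_mask(b) for b in range(256))


def plot_copy(screen, sprite):
    return sprite


def plot_or(screen, sprite):
    return screen | sprite


def plot_eor(screen, sprite):
    return screen ^ sprite


def plot_masked(screen, sprite):
    return (screen & MASKS[sprite]) | sprite


if __name__ == "__main__":
    background, sprite = 0x0F, 0x80  # four colour 1 pixels; one colour 2 pixel
    for plot in (plot_copy, plot_or, plot_eor, plot_masked):
        print(f"{plot.__name__}: &{plot(background, sprite):02X}")
```

With these values only the masked plot leaves the sprite pixel as colour 2 and the other three pixels untouched; OR and EOR both turn the overlapping pixel into colour 3, and copy wipes the background.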
Whether to use indexed addressing or indirect indexed addressing is usually a matter of which will be faster, but the indexed version usually requires self modifying code, so isn't suitable for code in ROM but is fine for code in RAM and data in ROM. Generally self modifying code ,x is slower to update and indirect indexed addressing (),y is slower to execute. If something is written once and then read a few times, it is probably quicker to use ,x but if the address is reused (EOR) then it can be faster to use (),y.
For this EOR (XOR) routine, we use the sprite address once and the screen address twice, so will go with ,x for sprite access and (),y for screen address. The sprite data is assumed to start at &3000 and be stored in columns with the four horizontal offsets for the first column before the second column. We are also assuming that the y coordinate of the sprite is for the bottom pixel, this allows a small optimisation in the inner loop.
Code:
lda spr_y : and #7 : tay ; offset within char row
eor spr_y : lsr A : lsr A : ora #&40 : tax ; eor saves lda,and
lda spr_x : and #&FC : asl A ; A=LO(column offset), C=HI(column offset)
{ bcc lhs : inx : clc : .lhs } stx screen0+1 : sta screen0
adc #8 { bcc lhs : inx : .lhs } stx screen1+1 : sta screen1 ; second column is 8 bytes further on
lda spr_x : and #3 : ora #&30 : sta sprite0+2 ; pixel offset indexes column/page
ora #4 : sta sprite1+2 ; second column is four pages after the first column.
lda spr_bottom : sta sprite0+1 : sta sprite1+1 ; sprite low byte starting at bottom
ldx spr_height_minus_one
This looks like a lot of setup code and it is, which is why sometimes it can be better to store the data in a way that is more friendly to drawing as well as pixel X,Y which is more friendly to your brain and collision detection.
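The screen pointer part of that setup can be followed step by step with 8-bit values. This Python sketch mirrors the register use (A, X, Y and the carry), with the second column's pointer going to screen1 as the draw loop expects. The function name setup_pointers is made up, and the screen base of &4000 is the one assumed in the text.

```python
def setup_pointers(spr_x, spr_y):
    """Model the screen0/screen1 setup; returns (screen0, screen1, y)."""
    y = spr_y & 7                       # lda spr_y : and #7 : tay
    a = ((y ^ spr_y) >> 2) | 0x40       # eor spr_y : lsr A : lsr A : ora #&40
    x = a                               # tax (high byte of the char row)

    a = (spr_x & 0xFC) << 1             # and #&FC : asl A
    carry, a = a >> 8, a & 0xFF         # bit 8 falls into the carry
    x += carry                          # bcc lhs : inx : clc
    screen0 = (x << 8) | a              # stx screen0+1 : sta screen0

    a += 8                              # adc #8 (carry is clear here)
    carry, a = a >> 8, a & 0xFF
    x += carry                          # bcc lhs : inx
    screen1 = (x << 8) | a              # stx screen1+1 : sta screen1
    return screen0, screen1, y


if __name__ == "__main__":
    for pos in [(5, 9), (124, 0), (128, 0)]:
        s0, s1, y = setup_pointers(*pos)
        print(f"{pos}: screen0=&{s0:04X} screen1=&{s1:04X} y={y}")
```

The (124, 0) case is the one where the second column crosses a page boundary, so only screen1 picks up the inx.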
To draw these arbitrarily high sprites that we are going to assume don't go off any edge of the screen with EOR drawing we could do:
Code:
.draw
.sprite0 : lda sprites,x : eor (screen0),y : sta (screen0),y
.sprite1 : lda sprites,x : eor (screen1),y : sta (screen1),y
dex : bmi done ; will only be taken once, so optimise for not taking branch
dey : bpl draw ; drawing up allows us to dey : bpl instead of inc : cmp #8 : bne
dec screen0+1 : dec screen0+1 : dec screen1+1 : dec screen1+1 ; several ways to do this
ldy #7 : bne draw ; ALWAYS, save a byte over jmp, but still takes 3 cycles
.done
RTS
The code in the loop could be "unrolled", but as we are using self modifying code it would cost more to write 8 copies of the addresses.
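For anyone who wants to poke at the loop logic off the Beeb, here is a Python sketch of the same control flow: draw the bottom row first, count X down through the sprite rows, count Y down within the character row, and step the pointers back &200 when Y wraps. The function name eor_draw and the flat bytearray standing in for memory are assumptions for illustration only.

```python
def eor_draw(memory, screen0, screen1, y, col0, col1):
    """EOR two sprite columns into memory, bottom row first, moving up."""
    x = len(col0) - 1                   # ldx spr_height_minus_one
    while True:
        memory[screen0 + y] ^= col0[x]  # lda sprites,x : eor (screen0),y : sta (screen0),y
        memory[screen1 + y] ^= col1[x]  # same for the second column
        x -= 1
        if x < 0:                       # dex : bmi done
            return
        y -= 1
        if y < 0:                       # dey : bpl draw not taken
            screen0 -= 0x200            # two decs of each pointer's high byte
            screen1 -= 0x200
            y = 7                       # ldy #7 : bne draw


if __name__ == "__main__":
    memory = bytearray(0x8000)
    left = bytes(range(1, 11))          # 10 rows of made-up sprite data
    right = bytes(range(101, 111))
    eor_draw(memory, 0x4208, 0x4210, 1, left, right)
    drawn = sum(1 for b in memory if b)
    eor_draw(memory, 0x4208, 0x4210, 1, left, right)
    print(drawn, "bytes changed, screen restored:", not any(memory))
```

Drawing the same sprite twice in the same place leaves the memory as it started, which is the property that makes EOR plotting convenient for erasing.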
Every time I write one of these routines, which is several times per game, they are slightly different and I have just written this in the browser as I am not on my dev machine, so this is probably not quite the same as any of my previous routines
AND IS UNTESTED!
The last thing to note is that adding more columns, making the sprite wider, is less than a linear cost increase as the looping cost doesn't change and calculating the additional addresses is much cheaper than the first. The only cost that is linear is moving to the next char row.
I do have versions of this code where I know I have 9 pixel high sprites, so I unroll the loop twice and only include the bmi after the dex in the second copy, or even 17 pixel high ones that are unrolled three times. There is a 16 pixel high one unrolled three times that checks x<0 between the second and third copies and in the third copy. Usually there isn't room for many versions of this code (drawing types, widths and many unrolled copies), but in Frogger that is exactly what there is, as there is so much sprite area to draw.
One thing that I do in a couple of my games including Frogger is:
Code:
.loop
lda sprite_row0,y : sta screen0_row0,y : sta screen1_row0,y : sta screen2_row0,y ; top half
lda sprite_row1,y : sta screen0_row1,y : sta screen1_row1,y : sta screen2_row1,y ; bottom half
dey : bpl loop
RTS
This only works as the sprites are always character row aligned and self erase.
In the actual games these routines are optimised for 12 and 14 pixel high sprites, and probably other sizes. These require the loops to be unrolled and the sprite read ,x while the screen is written ,y: for, say, a 12 pixel high sprite, Y must count 6 and skip 2 (top_row+2,y for the top and bottom_row,y for the bottom), but there isn't room to have gaps in the sprite data.
Enough for now, please feel free to point out any mistakes or improvements and ask any questions as I'm sure you won't be the only one asking "why do it like that?", to which the answer will probably be "because I couldn't think of a better way".