tricky wrote: Changing both colours takes 46 cycles, but a bit less from first register write to last.
There are 128 cycles per scanline, with 80 of them visible, leaving 48 cycles to change the palette and allow for which of the 7 cycles in the BEQ+BIT the vsync starts. As I haven't seen any flicker, I either have my maths wrong, or don't use the changed colour in the first/last byte!
This can be easily reduced to 44 cycles by calculating one of the EORs ahead of time and storing the result in X. This gives a first-write-to-last-write time of 41 cycles, which allows 7 cycles of jitter. More than enough!
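To make the jitter arithmetic explicit, here's a small sketch using only the figures quoted above (128 cycles per line, 80 visible, 41 cycles from first write to last). The function and constant names are mine, purely for illustration.

```python
# Figures taken from the discussion above; names are illustrative only.
CYCLES_PER_LINE = 128   # 2MHz cycles per scanline
VISIBLE_CYCLES = 80     # cycles during which pixels are displayed


def jitter_slack(write_span_cycles):
    """Cycles of start-time jitter tolerated if all palette writes
    must land inside the non-visible part of the line."""
    blank_cycles = CYCLES_PER_LINE - VISIBLE_CYCLES
    return blank_cycles - write_span_cycles


if __name__ == "__main__":
    print(jitter_slack(41))  # slack for the 41-cycle write span
```

The BEQ+BIT polling loop can be off by up to 7 cycles, so any result of 7 or more means the writes stay out of the visible area.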
But stepping back a bit: I have reason to believe cycle-exact synchronization is possible. I'm not aware of it having been done on the Beeb, but it's been done on other 6502-based systems, notably the C64 (magic phrase: "stable raster"). While I have no code (yet?), let's see what it would buy us if it could be achieved.
With precise timing, palette writes could happen anywhere, not just between lines. We have 256 lines * 128 cycles = 32768 cycles for the displayed part of the frame, plus the vblank interval. Assume that during vblank we can rewrite the entire palette plus A,X,Y. Assume also that memory is no object. Then code like
LDA #xx: STA FE21
writes an arbitrary palette value in 6 cycles[1]. We also gain 2*3=6 cycles because the 3 values already set up in A,X,Y don't need to be loaded. So this gets us roughly (32768+6)/6~=5462 palette writes per frame, or about 21.3 per line. This is much better than the 8 required to change two colours per line!
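Here's the same estimate as a quick calculation, in case anyone wants to play with the numbers. It assumes 2 cycles for an immediate load and 4 for an absolute store with no cycle stretching (see [1]); the names are made up for this sketch.

```python
# Rough palette-write budget for the displayed part of one frame.
LINES = 256
CYCLES_PER_LINE = 128
LOAD_CYCLES = 2          # LDA #imm
STORE_CYCLES = 4         # STA abs, assuming no cycle stretching
PRELOADED_REGISTERS = 3  # A, X, Y set up during vblank


def writes_per_frame():
    """Number of load+store pairs that fit, crediting the three loads
    that are skipped because A, X and Y are already set up."""
    budget = LINES * CYCLES_PER_LINE + PRELOADED_REGISTERS * LOAD_CYCLES
    return budget // (LOAD_CYCLES + STORE_CYCLES)


if __name__ == "__main__":
    total = writes_per_frame()
    print(total, round(total / LINES, 1))
```

This ignores any loop or jump overhead, so treat it as an upper bound rather than something achievable in practice.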
There's also the flash bit, which for the same price as a palette write can invert the RGB values of an arbitrarily-chosen subset of the entire palette. We want to write D8 or D9 to the control register (FE20). But since we don't really care about the top 3 bits, which adjust the cursor, there are 8 possibilities for each choice of the flash bit: {1,3,5,7,9,B,D,F}{8,9}. When writing to the palette register these correspond to setting any logical colour to either physical colour 14 or 15. Conveniently these are flashing colours themselves, so black, white, red and cyan are potentially cheaper to use than other colours due to the possibility of storing the same value into both the control and the palette register, saving two cycles.
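To see which bytes do double duty, here's a sketch that enumerates the 16 candidates and decodes each one as a palette write. It assumes the usual palette register layout (top nibble is the palette entry, bottom nibble is the physical colour EOR 7); that layout isn't spelled out above, so check it against the hardware documentation before relying on it.

```python
def candidate_values():
    """Control-register bytes matching D8/D9 in the low 5 bits,
    with the top 3 (cursor) bits free: {1,3,...,F}{8,9}."""
    return [(hi << 4) | lo for hi in range(1, 16, 2) for lo in (8, 9)]


def as_palette_write(value):
    """Decode a byte as a palette write.
    ASSUMPTION: bits 7-4 = palette entry, bits 3-0 = physical colour EOR 7."""
    entry = value >> 4
    physical = (value & 0x0F) ^ 7
    return entry, physical


if __name__ == "__main__":
    for v in candidate_values():
        entry, physical = as_palette_write(v)
        print(f"{v:02X}: entry {entry:2d} -> physical colour {physical}")
```

Under that assumption every candidate lands on physical colour 14 or 15, which matches the claim above.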
How much space would this take? "LDA #xx: STA FE21" is 5 bytes. 5462 copies is about 27K. Obviously this wouldn't fit on a model B with 20K also used for screen memory. Conceivably it might fit on a B with sideways RAM, but the non-contiguous nature of the code (we have 0000-3000 free, then 8000-C000) would complicate matters. With shadow screen the code would easily fit in RAM. On a Master we'd also have use of STZ for additional cycle savings.
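For the memory estimate, a quick sketch using the figures above (5 bytes per write, 5462 writes, and the two free regions 0000-3000 and 8000-C000). It deliberately ignores zero page, the stack and OS workspace, all of which eat into the low region, so the real headroom is smaller than it looks.

```python
BYTES_PER_WRITE = 5  # LDA #imm is 2 bytes, STA abs is 3 bytes

# Free regions quoted above as (start, end) with end exclusive.
# Simplification: OS workspace, zero page and stack are not subtracted.
FREE_REGIONS = [(0x0000, 0x3000), (0x8000, 0xC000)]


def code_size(writes):
    """Bytes needed for a fully unrolled sequence of palette writes."""
    return writes * BYTES_PER_WRITE


def free_bytes():
    """Total bytes across the (non-contiguous) free regions."""
    return sum(end - start for start, end in FREE_REGIONS)


if __name__ == "__main__":
    need = code_size(5462)
    print(need, round(need / 1024, 1), free_bytes())
```

So the unrolled code needs a bit under 27K against a nominal 28K split across two regions, which is why it's only "conceivably" a fit.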
Even targeting a memory-limited model B, clearly two colour changes per line is barely scratching the surface of what's possible. There's a lot of potential for optimization here!
Tom.
[1] I'm reasonably sure there's no cycle stretching and writes to the video ULA run at 2MHz.