Rich Talbot-Watkins wrote: ↑Thu May 06, 2021 12:15 pm
But then the final step I'd probably take is the same one that modern compilers take, which is to notice that the whole thing is just an arithmetic sequence, and thus you can calculate the result without a loop at all!
Code: Select all
10 PRINT FNsum(100)
20 END
30 :
40 DEF FNsum(max%)
50 =max%*(max%+1) DIV 2
You tempted me again! Feeling newly invigourated from my last escapade, I couldn't let this one off so lightly
The result is further down.
In 6502 assembler, the crux lies in being able to multiply without a
MUL-style instruction. It's done by right-shifting and adding based on fixed-width addition. For those not sure how this works, there is an excellent tutorial about it on page 221-225 of the Electron User Guide which is where I learned how to do it. The principle is that to multiply an
N-bit number by an
M-bit number you need (
N +
M) bits of space and either
N iterations or
M iterations, whichever way you want to do the multiplication.
This implementation has a fatter loop than the last implementation (maximum 74 clocks as opposed to 51 for the looped version). However, it always runs a constant 24 iterations which yields a bounded loop time of just 1776 clocks for 0 <~
X <= 92,681. The last implementation was around 4.7m clocks worst running time which was visible when you run it (51 x 92,681 clocks; over 2 seconds!). This new version is constant time and is instantaneous in comparison.
This new implementation is optimised for size. There is one place I could have unrolled it further for speed. However, doing so resulted in branch ranges being exceeded which meant a little more twiddling was required. I leave that as an "exercise for the reader"
Here's the code (again, tested at
https://bbcmic.ro/ but not on a real Electron).
Code: Select all
DIM srce 5, dest 6
DIM code 255
FOR I% = 0 TO 2 STEP 2
P%=code
[OPT I%
LDA srce+3 \ no point computing anything above &1FFFF.
BNE overflow
LDA srce+2
AND #&FE
BNE overflow
STA dest+3 \ to multiply two 24 bit numbers we'll need 48 bits of
STA dest+4 \ space. the result will be the bottom 32 of those 48 bits.
STA dest+5 \ initialise upper 24 of those 48 bits to zero.
LDA #24 \ use 24-bit counter at srce+4 to perform multiplication.
STA srce+4 \ objective is to calculate !srce * (!srce + 1).
CLC \ set up YX = !srce AND &FFFF (bottom 16 bits of the
LDX srce \ "multiplicator") and !dest = !srce + 1 ("multiplicand").
TXA
ADC #1
STA dest
LDY srce+1
TYA
ADC #0
STA dest+1
LDA srce+2 \ bits 23-16 of multiplicator not registerised.
ADC #0
STA dest+2
.multiply
CLC \ clear carry in preparation for next round of addition.
LDA dest \ test bottom bit of multiplicand. this tells us whether we
AND #1 \ need to add the multiplicator (YX) to top bits of result.
BEQ rotate \ no addition of multiplicator required? skip to rotate.
TXA \ add multiplicator (YX) to top 24 bits of the result.
ADC dest+3
STA dest+3
TYA
ADC dest+4
STA dest+4
LDA srce+2 \ bits 23-16 of multiplicator not registerised. note that
ADC dest+5 \ if we overflow (C=1) at this stage then it will ripple
STA dest+5 \ into the result on the rotate so no need to raise errors.
.rotate
ROR dest+5 \ advance multiplication. this implicitly rotates the
ROR dest+4 \ multiplicand, setting us up for the next bottom bit test.
ROR dest+3 \ previous error checks clamped (!srce + 1) to 18 bits, so
ROR dest+2 \ bit 23 of mulitplicand (carry result on 24th rotate) will
ROR dest+1 \ always be zero; no explicit CLC needed for 25th rotate.
ROR dest
DEC srce+4 \ one more bit of the multiplicand analysed. after 24
BEQ rotate \ iterations, the result is a factor of two too big. run a
BPL multiply \ 25th iteration with just the rotate (DIV 2) to finalise.
LDA dest+4
ORA dest+5 \ if all is good, we should have only 32 bits of result.
BNE overflow
RTS
.overflow
BRK
EQUB 20
EQUS "Too big"
BRK
]
NEXT
INPUT "Upper bound", !srce
CALL code
PRINT "Result: ";!dest;" (or in hex: &";RIGHT$("0000000"+STR$~!dest,8);")"
Edited: code indentation plus a few extra comments.