Semi-OT: what comes after 80 bit long doubles

BigEd · Post by **BigEd** » Mon Jun 14, 2021 1:45 pm

(This is a bit off-topic but follows on from various comments about floating point in BBC Basic implementations)

Since the original 8087 from Intel and the closely related IEEE standard for floating point arithmetic, we've had available the choice of floats, doubles, and long doubles. In many cases, these are 32, 64 and 80 bit floating point representations, supported directly by the hardware.

But since the rise of ARM and the fall of most other architectures, and especially since Apple's introduction of the M1 product line, we're in a position that not all common laptops and computers offer the 80 bit long double. It's just not there in the hardware.

William Kahan, who had a lot to do with the original 8087 and the closely related IEEE standard, apparently said in 2002:

For now the 10-byte Extended format is a tolerable compromise between the value of extra-precise arithmetic and the price of implementing it to run fast; very soon two more bytes of precision will become tolerable, and ultimately a 16-byte format ... That kind of gradual evolution towards wider precision was already in view when IEEE Standard 754 for Floating-Point Arithmetic was framed.

So, perhaps we should say the 80 bit format is a great idea but not the final story. It turns out there's already a standard for quad precision, a 128 bit format, which might in the future be the one to look for. Whether it will be offered in commodity machines is another question - I'm guessing not in low-end machines. It seems the market is satisfied with 64 bit floats, and has forgotten or discounted the arguments which led to the 80bit compromise.

It turns out that ARM64 procedure call standard defines long double as quad precision, which is not to say that quad precision is natively available. And there's a library for GCC which offers quad precision. See
Wikipedia's Quadruple-precision floating-point format.

See also Paul Zimmermann's slide deck "How Slow is Quadruple Precision?"

Quadruple precision is indeed slow, but we can do much better!
We saved a factor of 10 with little effort, probably we can save another factor of 2 with more effort.

Possibly, if we're unlikely to get 128bit float support in hardware, we're better off with double-double floats, which are not an IEEE standard but are (presumably? hopefully?) higher performance than quad precision. This presentation suggests no worse than a 10x slowdown.

Deleted User 9295 · Post by **Deleted User 9295** » Mon Jun 14, 2021 2:08 pm

BigEd wrote: ↑Mon Jun 14, 2021 1:45 pm Possibly, if we're unlikely to get 128bit float support in hardware, we're better off with double-double floats

I'd like to see a (speed) comparison between Apple's implementation of 80-bit floats in Rosetta 2 and GCC's implementation of 128-bit floats in the libquadmath library. Naïvely one might expect 80-bit floats to be less work, but I expect in practice it's not that simple, especially if the double-double approach can be used for any of the computations.

BigEd · Post by **BigEd** » Mon Jun 14, 2021 2:26 pm

Ah, yes, me too: I don't have an M1 system to play with, but I think perhaps I know someone who does.

Would any simple loop of trig functions in BBC Basic also be an interesting measurement? I suppose I really ought to be comparing the M1's 64 bit floats with Rosetta's 80 bit floats, to try to factor out the raw speed of the machine. Or perhaps if the slower answer is fast enough, we're all happy.

(I should perhaps also add: just possibly the future is 64 bit floats. But part of me would prefer to see a 128 bit future, even if I can't say exactly why.)

Deleted User 9295 · Post by **Deleted User 9295** » Mon Jun 14, 2021 2:44 pm

BigEd wrote: ↑Mon Jun 14, 2021 2:26 pm I suppose I really ought to be comparing the M1's 64 bit floats with Rosetta's 80 bit floats, to try to factor out the raw speed of the machine.

I can easily run the same BBC BASIC program in both native (64-bit double) mode and Rosetta (80-bit long double) mode on the same hardware. That won't give a true comparison of the FP computation speeds, however, because of the overhead of moving the 80-bit values around compared with 64-bits. However I would expect the true 80-bit speed to be higher, rather than lower, than the crude comparison would suggest.

Deleted User 9295 · Post by **Deleted User 9295** » Mon Jun 14, 2021 3:15 pm

Make of this what you will. The figures are for native 64-bit floats (AArch64, M1) and emulated 80-bit floats (Rosetta 2), respectively, on the same hardware, but the overhead of executing the x86 code is likely to be a significant factor and hence the emulated 80-bit floats faster than this would suggest:

: native.png (28.39 KiB) Viewed 4650 times

: rosetta.png (30.37 KiB) Viewed 4650 times

BigEd · Post by **BigEd** » Mon Jun 14, 2021 3:57 pm

Thanks - interesting, especially the slowness of exponentiation. But anyhow, 10x slower is the order of magnitude, and that's good to know. It's likely that the Rosetta version on the M1 isn't going to run much faster than the native version on the machine being replaced, and it might even run a bit slower.

In practical terms, the question would be whether this is fast enough - does a Mandelbrot still render, does a physics engine still evaluate, at a pleasing rate.

Deleted User 9295 · Post by **Deleted User 9295** » Mon Jun 14, 2021 5:38 pm

BigEd wrote: ↑Mon Jun 14, 2021 3:57 pm In practical terms, the question would be whether this is fast enough - does a Mandelbrot still render, does a physics engine still evaluate, at a pleasing rate.

I can put it into context for you. Here are the same timings for a Raspberry Pi 4 (native 64-bit doubles):

: rpi4timing.png (15.25 KiB) Viewed 4623 times

So in most cases 80-bit floats emulated by Rosetta 2 are significantly faster than native 64-bit floats on a Raspberry Pi!

Post by **flibble** » Mon Jun 14, 2021 8:51 pm

81 bit long doubles...

BigEd · Post by **BigEd** » Mon Jun 14, 2021 9:42 pm

There's a long video of a talk, from 1995, in which (at about 1h20 mark) William Kahan talks about 80 bit floats, vs SPARC's 64 bit floats, and the fact that some 70+ instructions are needed to perform exponentiation adequately well. Without which, if I understand correctly, you can readily lose several bits of accuracy.
Floating Point - Past, present and Future

There's a program Qtest which explores and reports a machine's accuracy, on page 20 of this paper. We see, for example, Cray's Y-MP getting only 24 bits of accuracy from their 48 bits of precision.

BigEd · Post by **BigEd** » Tue Jun 15, 2021 2:28 pm

I had an idea for a test. So, I compared BBC Basic on my Intel-based MacBook with the in-browser version of BBC Basic. I think this is comparing 80 bit extended precision with 64 bit double precision.

Here's the Intel-based result:

Code: Select all

>@%=&1919
>PRINT (10^256)^(1+2^-30)/2^788          
    6142761521963062542.5

or in hex (showing just 63 bits)

Code: Select all

    553F 790E F88B D50E

The in-browser version returns

Code: Select all

    6142761521963065344

which is hex

Code: Select all

    553F 790E F88B E000

showing, I think, just 49 bits correct, out of 53 significant bits. (This being the expected consequence of not having extra bits for intermediate results, which is the motivation for extended precision)

The correct answer would be

Code: Select all

    6142761521963062543.8188...

and so the extended precision result is just under 3 ULPs adrift.

Deleted User 9295 · Post by **Deleted User 9295** » Tue Jun 15, 2021 3:58 pm

BigEd wrote: ↑Tue Jun 15, 2021 2:28 pmThis being the expected consequence of not having extra bits for intermediate results, which is the motivation for extended precision

Indeed, but counter-intuitively using extra precision for intermediate results doesn't always make things better, as is described here. I imagine this is a rare occurrence, but in this specific case performing a division with 64-bit-mantissa precision ('80 bits' long double) and then rounding the result to 53-bit-mantissa precision ('64-bits' double) gives the 'wrong' answer, in the sense that it's not the value closest to the exact result. However performing the division directly with 53-bit-mantissa precision does give the correct answer. Go figure!

BigEd · Post by **BigEd** » Tue Jun 15, 2021 4:54 pm

Good point, and thanks for the link. I see the blogger references Goldberg's "What Every Computer Scientist Should Know About Floating-Point Arithmetic" which is always a good sign!

Deleted User 9295 · Post by **Deleted User 9295** » Tue Jun 15, 2021 5:12 pm

BigEd wrote: ↑Tue Jun 15, 2021 4:54 pm Good point, and thanks for the link. I see the blogger references Goldberg's "What Every Computer Scientist Should Know About Floating-Point Arithmetic" which is always a good sign!

Here is his example in BBC BASIC:

Code: Select all

   10 x = 50178230318
   20 y = 100000000000
   30 ratio# = x / y
   40 @% = &1212
   50 PRINT ratio#

Run this in one of my BASICs on an x86 machine and it will print 0.501782303179999944 but on an ARM (or in-browser) it will print 0.501782303180000055 - slightly closer to the precise result - exactly as he reported. I wonder how exceptional it is?

Soruk · Post by **Soruk** » Sat Jun 19, 2021 10:08 pm

Richard Russell wrote: ↑Tue Jun 15, 2021 5:12 pm
BigEd wrote: ↑Tue Jun 15, 2021 4:54 pm Good point, and thanks for the link. I see the blogger references Goldberg's "What Every Computer Scientist Should Know About Floating-Point Arithmetic" which is always a good sign!
Here is his example in BBC BASIC:
Code: Select all
   10 x = 50178230318
   20 y = 100000000000
   30 ratio# = x / y
   40 @% = &1212
   50 PRINT ratio#
Run this in one of my BASICs on an x86 machine and it will print 0.501782303179999944 but on an ARM (or in-browser) it will print 0.501782303180000055 - slightly closer to the precise result - exactly as he reported. I wonder how exceptional it is?

With slight adjustments for Matrix Brandy (ratio with no #), I get 0.501782303179999944 on x86-32 (CentOS 6), but 0.501782303180000055 on x86-64 (CentOS 8 ) and ARM!

Edit: If I change the temporary float store to "long double" (80 bits on gcc) x86-64 now gives 0.501782303179999944. It looks like the compiler (or libraries) on CentOS 6 are doing FP division at 80 bits then casting the value, whereas on CentOS 8 the division is done at the bit depth of the target variable. In this example, it looks like doing the whole thing in 64-bit space renders the more accurate result.

Deleted User 9295 · Post by **Deleted User 9295** » Sat Jun 19, 2021 10:32 pm

Soruk wrote: ↑Sat Jun 19, 2021 10:08 pm In this example, it looks like doing the whole thing in 64-bit space renders the more accurate result.

Yes, that's the point: 'double rounding' gives rise to a (slightly) less accurate result in this case. But the question is: how exceptional is it? Has the author of that article hit upon an extraordinarily rare combination of numerator and denominator which exhibits this behaviour, or does it happen fairly often?

There's no doubt, though, that when chaining multiple operations, using 80-bit floats to hold the intermediate results is beneficial. So on balance it has to be the better way. I presume that's impossible in Matrix Brandy, or can it keep intermediate values within an expression in 80-bit format?

Soruk · Post by **Soruk** » Sat Jun 19, 2021 10:42 pm

Richard Russell wrote: ↑Sat Jun 19, 2021 10:32 pm
Soruk wrote: ↑Sat Jun 19, 2021 10:08 pm In this example, it looks like doing the whole thing in 64-bit space renders the more accurate result.
Yes, that's the point: 'double rounding' gives rise to a (slightly) less accurate result in this case. But the question is: how exceptional is it? Has the author of that article hit upon an extraordinarily rare combination of numerator and denominator which exhibits this behaviour, or does it happen fairly often?

There's no doubt, though, that when chaining multiple operations, using 80-bit floats to hold the intermediate results is beneficial. So on balance it has to be the better way. I presume that's impossible in Matrix Brandy, or can it keep intermediate values within an expression in 80-bit format?

If it's multiple steps within the C code associated with a single BASIC operation it should be feasible enough, but passing 80-bit floats back to BASIC to be passed into another operation, there's no capability for it. I've previously attempted to change the typedef for "float64" to use "long double" but it broke really badly - I'd need to trawl the code to see if assumptions are being made of the size of "float64"

Coeus · Post by **Coeus** » Wed Jun 23, 2021 1:25 am

I think this discussion reinforces something which should be common knowledge, but perhaps is not, that even with well designed hardware and software (compiler or interpreter) a programmer still needs be careful. It is naive to assume that the obvious way of implementing a mathematical formula is the best or that things that are mathematically equivalent will give the same answer or even the same degree of accuracy.

scruss · Post by **scruss** » Wed Jun 23, 2021 4:38 am

flibble wrote: ↑Mon Jun 14, 2021 8:51 pm 81 bit long doubles...

Just that one bit better …

Curiously, all the attention to floats I've been aware of recently are towards narrower ones, not wider. Some of them have support from accelerated graphics cards, so can be calculated at phenomenal rates with adequate (if you're careful) precision

BigEd · Post by **BigEd** » Wed Jun 30, 2021 4:27 pm

Just to note that Richard recently reported a performance improvement on Raspberry Pi 4 when running the whole machine in 64 bit mode, only recently possible, compared to the usual 32 bit mode. With one effect being that the emulation of 80 bit floats, while still slower than native 64 bit floats, is not nearly so much slower - if I read the numbers correctly.
https://www.raspberrypi.org/forums/view ... 2#p1881512

The much more noticeable effect, and somewhat unexpected, is that everything runs a lot faster in 64 bit mode - twice as fast, or more. (I believe this to be somewhat unexpected, even though 64 is obviously twice as large as 32.)

Deleted User 9295 · Post by **Deleted User 9295** » Wed Jun 30, 2021 6:12 pm

BigEd wrote: ↑Wed Jun 30, 2021 4:27 pm Richard recently reported a performance improvement on Raspberry Pi 4 when running the whole machine in 64 bit mode, only recently possible

It's not yet possible (for anybody except me, anyway!) to run BBC BASIC for SDL 2.0 on 64-bit Pi OS; you can't even build it from source (I haven't updated GitHub with the necessary mods). So this remains of theoretical interest only for the time being.

My current plan is to include an experimental edition for the 64-bit Pi OS (currently in beta test) with the next release of BBCSDL, and of course GitHub will be updated at the same time.

With one effect being that the emulation of 80 bit floats, while still slower than native 64 bit floats, is not nearly so much slower

There's no emulation of 80-bit floats on ARM editions of BBC BASIC (Android, iOS, 64-bit Pi OS, Apple Silicon Mac) nor the in-browser edition; they support 64-bit floats only and therefore perform calculations with less accuracy. If I'd known that ARM CPUs don't support 80-bit floats I probably would never have used them in the x86 editions either.

The truth is that, not having at the time ever owned or used an ARM-based device, I naïvely assumed that modern ARM CPUs did support 80-bit floats. I knew that the Acorn FP Co-Processor included such support, and I never guessed that it would be dropped when the FPU was integrated with the CPU. It still seems crazy!

BigEd · Post by **BigEd** » Wed Jun 30, 2021 6:28 pm

Ah, my mistake about the 80 bit float emulation. I inferred it from

Code: Select all

NEXT (80-bit real):         39 ns

but evidently shouldn't have.

Deleted User 9295 · Post by **Deleted User 9295** » Wed Jun 30, 2021 7:06 pm

BigEd wrote: ↑Wed Jun 30, 2021 6:28 pm I inferred it from
Code: Select all
NEXT (80-bit real):         39 ns

What would you rather it said? I can't fit:

Code: Select all

NEXT (80-bit real or 64-bit real depending on edition):         39 ns

Is there a shorthand term for the default floating point variable type, i.e. the type of a non-integer numeric variable with no suffix character, that everybody would understand?

BigEd · Post by **BigEd** » Wed Jun 30, 2021 7:07 pm

Oh, I see, it was default size. That makes sense. Perhaps just "float"?

stardot.org.uk

Semi-OT: what comes after 80 bit long doubles

Semi-OT: what comes after 80 bit long doubles

Re: Semi-OT: what comes after 80 bit long doubles

Re: Semi-OT: what comes after 80 bit long doubles

Re: Semi-OT: what comes after 80 bit long doubles

Re: Semi-OT: what comes after 80 bit long doubles

Re: Semi-OT: what comes after 80 bit long doubles

Re: Semi-OT: what comes after 80 bit long doubles

Re: Semi-OT: what comes after 80 bit long doubles

Re: Semi-OT: what comes after 80 bit long doubles

Re: Semi-OT: what comes after 80 bit long doubles

Re: Semi-OT: what comes after 80 bit long doubles

Re: Semi-OT: what comes after 80 bit long doubles

Re: Semi-OT: what comes after 80 bit long doubles

Re: Semi-OT: what comes after 80 bit long doubles

Re: Semi-OT: what comes after 80 bit long doubles

Re: Semi-OT: what comes after 80 bit long doubles

Re: Semi-OT: what comes after 80 bit long doubles

Re: Semi-OT: what comes after 80 bit long doubles

Re: Semi-OT: what comes after 80 bit long doubles

Re: Semi-OT: what comes after 80 bit long doubles

Re: Semi-OT: what comes after 80 bit long doubles

Re: Semi-OT: what comes after 80 bit long doubles

Re: Semi-OT: what comes after 80 bit long doubles