New: Console Mode editions of BBC BASIC
Re: New: Console Mode editions of BBC BASIC
which file is item() defined in ?
Re: New: Console Mode editions of BBC BASIC
It does look like a compiler issue. I slightly suspect making them normal non register variables with LTO working will have similar performance.
Re: New: Console Mode editions of BBC BASIC
I was wondering about the impact of LTO on global register variables, I wouldn't be too surprised if they are mutually exclusive. I do enable LTO on my Web Assembly (in-browser) build, but of course that doesn't support register variables anyway - not least because it uses LLVM rather than GCC as the compiler (and I don't know whether the concept of 'register' is entirely meaningful in Wasm). Sadly the performance improvement was minimal.
Re: New: Console Mode editions of BBC BASIC
I do hope that you don't consider me one of your detractors. I am only trying to help get to the bottom of the Pico issue. As I have said previously, I value your continuing efforts with BBC BASIC.Richard Russell wrote: ↑Mon Nov 08, 2021 4:13 pm So I do feel rather in a 'no win' situation. No matter how much circumstantial evidence I produce, I cannot prove that my code isn't at fault. Therefore the only thing that my detractors will accept is discontinuing my BASICs altogether. And since I was already considering doing that, I will add this sorry affair to the list of arguments for doing so.
Personally, I would avoid the use of r9 and r11 on 32bit ARM since the ARM APCS document has other uses for those registers, (r11 can be used as the frame pointer) and r9 can be used as the 'platform register' - the meaning of which is defined by the platform standard!?Richard Russell wrote: ↑Mon Nov 08, 2021 4:13 pmI admit that they are not exactly 'extensively documented'; one thing that I've failed to find, for example, is an explicit list of which registers may, and which may not, be used for global register variables in each CPU architecture. So I've resorted to applying common sense, i.e. that global register variables must be called-saved in the ABI, and not reserved for special purposes like SP (r13) or LR (r14). But that is somewhat unsatisfying given how crucial they are to the correct functioning of the interpreter.
I would also avoid r10 since I seem to recall that under one of the old ARM APCS standards, (can't find it at the moment) it was used as part of stack extension, or limit checking.
Again, Richard, thank you for all your efforts.
Regards,
Rob.
Re: New: Console Mode editions of BBC BASIC
Yeah LTO can be hit and miss. On cortex-m0 where the register file has limitations LTO should help.
Re: New: Console Mode editions of BBC BASIC
ARM R9 (APCS' v6) used to be the one typically selected in ye olden days as one to use for a global register; I have used that myself down the years (albeit not in gcc, but Norcroft C).
I note that this particular target appears to be compiling for Thumb mode, where access to R8 upwards is 'limited'. It may be that our gcc friends don't expect you to be using such registers. Might be worth seeing if a simple code fragment can reproduce it for submission to gcc as a bug, otherwise it's unlikely to be fixed.
@Rob: Yes, R10 is used on APCSes on RISC OS as the stack limit register (APCS' sl) for software stack checking, unless -apcs /noswst or #pragma no_check_stack is used.
I note that this particular target appears to be compiling for Thumb mode, where access to R8 upwards is 'limited'. It may be that our gcc friends don't expect you to be using such registers. Might be worth seeing if a simple code fragment can reproduce it for submission to gcc as a bug, otherwise it's unlikely to be fixed.
@Rob: Yes, R10 is used on APCSes on RISC OS as the stack limit register (APCS' sl) for software stack checking, unless -apcs /noswst or #pragma no_check_stack is used.
Miserable old curmudgeon who still likes a bit of an ARM wrestle now and then. Pi 4, 3, ARMX6, SA Risc PC, A540, A440
Re: New: Console Mode editions of BBC BASIC
And, boom, since r0-r3 cannot be used (they aren't callee-saved) and r4-r7 are too precious to lose from general purpose use in Thumb2 mode, that leaves only r8. And I don't think it makes much sense to have only one global register variable in BBC BASIC rather than two.RobMcK wrote: ↑Mon Nov 08, 2021 5:28 pm Personally, I would avoid the use of r9 and r11 on 32bit ARM since the ARM APCS document has other uses for those registers, (r11 can be used as the frame pointer) and r9 can be used as the 'platform register' - the meaning of which is defined by the platform standard!?
I would also avoid r10 since I seem to recall that under one of the old ARM APCS standards, (can't find it at the moment) it was used as part of stack extension, or limit checking.
I argue that in the absence of explicit documentation it's acceptable to make empirical decisions based on whether they actually seem to work in practice in the real world. There is a chasm between us in this regard.
Re: New: Console Mode editions of BBC BASIC
It's all the Cortex-M0+ can execute - no ARM 32 execution engine at all! Jumping to an even address results in an immediate exception, reporting something like 'entering ARM mode', but it can't!
Re: New: Console Mode editions of BBC BASIC
It certainly looks like gcc -Os is being a bit dim (which is to say, broken) for the Pico's Cortex-M0+ - but fortunately we also see that -Os isn't helping much either, so we don't need it.
Good digging Dave - I was reading about OpenOCD, but haven't tried it. Perhaps now you have gdb in the mix, you can also find out how the stack is laid out within item()? Maybe one or more of these will help (with a breakpoint somewhere inside item()):
Good digging Dave - I was reading about OpenOCD, but haven't tried it. Perhaps now you have gdb in the mix, you can also find out how the stack is laid out within item()? Maybe one or more of these will help (with a breakpoint somewhere inside item()):
Edit: maybe see also this cut-out-and-keep reference card.For the current stack frame:
- info frame lists general info about the frame (where things start in memory, etc.)
- info args lists arguments to the function
- info locals lists local variables stored in the frame
Re: New: Console Mode editions of BBC BASIC
I think it's worth taking a step back here and summarising....
The global register "problem" only happens on the Pico BBC Basic build, and then only with the -Os (optize for size) optimizer.
It highly likely this is a bug in the C compiler; the evidence (the generated code) is compelling, and I see no other plausible explanation.
There is, however, no reason to want to use -Os optimizer on the Pico; benchmarking shows it to be about 10% slower than -O1 for BBC Basic.
Given this, I see no reason to change the use of global register variables for esp/esi (r10/11).
Dave
The global register "problem" only happens on the Pico BBC Basic build, and then only with the -Os (optize for size) optimizer.
It highly likely this is a bug in the C compiler; the evidence (the generated code) is compelling, and I see no other plausible explanation.
There is, however, no reason to want to use -Os optimizer on the Pico; benchmarking shows it to be about 10% slower than -O1 for BBC Basic.
Given this, I see no reason to change the use of global register variables for esp/esi (r10/11).
Dave
Re: New: Console Mode editions of BBC BASIC
I agree but having limited test coverage hinders this. With the help of others, it looks like we (the community of which you are a valued member) are getting to the bottom of the register reuse issue.Richard Russell wrote: ↑Mon Nov 08, 2021 5:44 pm I argue that in the absence of explicit documentation it's acceptable to make empirical decisions based on whether they actually seem to work in practice in the real world.
I feel that it would be good to see if it only impacts item(), or if it is much wider spread.
Hopefully we can also make a nice small self-contained piece of code which the maintainers of GCC can use to fix the bug and in the meantime put warnings in the source where appropriate.
Regards,
Rob
Re: New: Console Mode editions of BBC BASIC
If you're able to cook up a test case and submit a bug to the gcc team, Rob, that would be excellent!
Re: New: Console Mode editions of BBC BASIC
I believe it's much wider spread: many functions seem to push/pop r10,r11,
Here's another example that matters: comma()
Code: Select all
void comma (void)
{
if (nxt () != ',')
error (5, NULL) ; // 'Missing ,'
esi++ ;
}
Code: Select all
10004210 <comma>:
10004210: b510 push {r4, r14}
10004212: 46de mov r14, r11
10004214: 4654 mov r4, r10
10004216: b510 push {r4, r14}
10004218: f7ff fb50 bl 100038bc <nxt>
1000421c: 282c cmp r0, #44 ; 0x2c
1000421e: d003 beq.n 10004228 <comma+0x18>
10004220: 2100 movs r1, #0
10004222: 2005 movs r0, #5
10004224: f7ff fb18 bl 10003858 <error>
10004228: 2301 movs r3, #1
1000422a: 449a add r10, r3 <<<<<<<<<<<< esi++
1000422c: bc18 pop {r3, r4}
1000422e: 46a3 mov r11, r4
10004230: 469a mov r10, r3
10004232: bd10 pop {r4, r15}
If I change the global register variables to normal global variables, then r10, r11 are no longer stacked, and are mostly not used.
So It seems the -Os optimizer is niavely looking at what callee-save registers (r4..r11) are used by the function, and is saving them on entry, and restoring them on exit. Which is precisely the wrong thing to do if they are global register variables.
Here's a minimal example (single file, test.c) that reproduces the problem:
Code: Select all
#include <stdio.h>
register char *esi asm ("r10");
void _exit(int i) {
while (1);
}
void comma (void)
{
esi++ ;
}
int main(int argc, char *argv[]) {
esi = argv[0];
comma();
return (int) esi;
}
Code: Select all
arm-none-eabi-gcc test.c -Os -mthumb -o test
Code: Select all
arm-none-eabi-objdump -M reg-names-raw -D test > test.dis
Code: Select all
000081da <comma>:
81da: 4653 mov r3, r10 ; Push r10 onto the stack
81dc: b408 push {r3}
81de: 2301 movs r3, #1 ; Increment r10
81e0: 449a add r10, r3
81e2: bc08 pop {r3} ; Pop r10 off the stack
81e4: 469a mov r10, r3
81e6: 4770 bx r14
Code: Select all
000081c2 <comma>:
81c2: 2301 movs r3, #1
81c4: 469c mov r12, r3
81c6: 44e2 add r10, r12
81c8: 4770 bx r14
Dave
Re: New: Console Mode editions of BBC BASIC
Thumb is the Work of the Devil (with apologies to Dave Jaggar!)
Miserable old curmudgeon who still likes a bit of an ARM wrestle now and then. Pi 4, 3, ARMX6, SA Risc PC, A540, A440
Re: New: Console Mode editions of BBC BASIC
I'm not sure at all where it should be reported....Richard Russell wrote: ↑Mon Nov 08, 2021 7:43 pm Do you know where this should be reported, and will you do it?
I'm wondering if any of the other contributors to this thread are better connected to the Arm/GCC folks.
Re: New: Console Mode editions of BBC BASIC
What version of GCC are you using. 9.2.1 creates the bugged code you quote but anything newer than 9.3 produces the following:
Code: Select all
comma:
add r10, r10, #1
bx lr
Re: New: Console Mode editions of BBC BASIC
OK, definitely using 10.3.1 (phew!) session log attached at the end.
Chris, it looks like you generated ARM code not THUMB code:
Code: Select all
comma:
add r10, r10, #1
bx lr
Try passing in the -mthumb flag.
Dave
Code: Select all
$ cat test.c
#include <stdio.h>
register char *esi asm ("r10");
void _exit(int i) {
while (1);
}
void comma (void)
{
esi++ ;
}
int main(int argc, char *argv[]) {
esi = argv[0];
comma();
return (int) esi;
}
$ arm-none-eabi-gcc -v
Using built-in specs.
COLLECT_GCC=arm-none-eabi-gcc
COLLECT_LTO_WRAPPER=/disk1/home/dmb/atom/PiTubeDirect/not_in_git/gcc-arm-none-eabi-10.3-2021.07/bin/../lib/gcc/arm-none-eabi/10.3.1/lto-wrapper
Target: arm-none-eabi
Configured with: /mnt/workspace/workspace/GCC-10-pipeline/jenkins-GCC-10-pipeline-260_20210727_1627371386/src/gcc/configure --target=arm-none-eabi --prefix=/mnt/workspace/workspace/GCC-10-pipeline/jenkins-GCC-10-pipeline-260_20210727_1627371386/install-native --libexecdir=/mnt/workspace/workspace/GCC-10-pipeline/jenkins-GCC-10-pipeline-260_20210727_1627371386/install-native/lib --infodir=/mnt/workspace/workspace/GCC-10-pipeline/jenkins-GCC-10-pipeline-260_20210727_1627371386/install-native/share/doc/gcc-arm-none-eabi/info --mandir=/mnt/workspace/workspace/GCC-10-pipeline/jenkins-GCC-10-pipeline-260_20210727_1627371386/install-native/share/doc/gcc-arm-none-eabi/man --htmldir=/mnt/workspace/workspace/GCC-10-pipeline/jenkins-GCC-10-pipeline-260_20210727_1627371386/install-native/share/doc/gcc-arm-none-eabi/html --pdfdir=/mnt/workspace/workspace/GCC-10-pipeline/jenkins-GCC-10-pipeline-260_20210727_1627371386/install-native/share/doc/gcc-arm-none-eabi/pdf --enable-languages=c,c++ --enable-plugins --disable-decimal-float --disable-libffi --disable-libgomp --disable-libmudflap --disable-libquadmath --disable-libssp --disable-libstdcxx-pch --disable-nls --disable-shared --disable-threads --disable-tls --with-gnu-as --with-gnu-ld --with-newlib --with-headers=yes --with-python-dir=share/gcc-arm-none-eabi --with-sysroot=/mnt/workspace/workspace/GCC-10-pipeline/jenkins-GCC-10-pipeline-260_20210727_1627371386/install-native/arm-none-eabi --build=x86_64-linux-gnu --host=x86_64-linux-gnu --with-gmp=/mnt/workspace/workspace/GCC-10-pipeline/jenkins-GCC-10-pipeline-260_20210727_1627371386/build-native/host-libs/usr --with-mpfr=/mnt/workspace/workspace/GCC-10-pipeline/jenkins-GCC-10-pipeline-260_20210727_1627371386/build-native/host-libs/usr --with-mpc=/mnt/workspace/workspace/GCC-10-pipeline/jenkins-GCC-10-pipeline-260_20210727_1627371386/build-native/host-libs/usr --with-isl=/mnt/workspace/workspace/GCC-10-pipeline/jenkins-GCC-10-pipeline-260_20210727_1627371386/build-native/host-libs/usr --with-libelf=/mnt/workspace/workspace/GCC-10-pipeline/jenkins-GCC-10-pipeline-260_20210727_1627371386/build-native/host-libs/usr --with-host-libstdcxx='-static-libgcc -Wl,-Bstatic,-lstdc++,-Bdynamic -lm' --with-pkgversion='GNU Arm Embedded Toolchain 10.3-2021.07' --with-multilib-list=rmprofile,aprofile
Thread model: single
Supported LTO compression algorithms: zlib
gcc version 10.3.1 20210621 (release) (GNU Arm Embedded Toolchain 10.3-2021.07)
$ arm-none-eabi-gcc test.c -Os -mthumb -o test
$ arm-none-eabi-objdump -M reg-names-raw -D test > test.dis
$ grep -A8 "<comma>:" test.dis
000081da <comma>:
81da: 4653 mov r3, r10
81dc: b408 push {r3}
81de: 2301 movs r3, #1
81e0: 449a add r10, r3
81e2: bc08 pop {r3}
81e4: 469a mov r10, r3
81e6: 4770 bx r14
$ arm-none-eabi-gcc test.c -O1 -mthumb -o test
$ arm-none-eabi-objdump -M reg-names-raw -D test > test.dis
$ grep -A4 "<comma>:" test.dis
000081c2 <comma>:
81c2: 2301 movs r3, #1
81c4: 469c mov r12, r3
81c6: 44e2 add r10, r12
81c8: 4770 bx r14
Re: New: Console Mode editions of BBC BASIC
That may not be sufficient. I use Thumb code for my 32-bit Raspberry Pi build of BBCSDL and haven't seen the problem, so it may need the particular Embedded ABI subset (i.e. gcc-arm-none-eabi).
Re: New: Console Mode editions of BBC BASIC
adds r0,r0,#1 is a valid instruction on the Pico, add r0,r0,#1 isn't; neither is valid for r10.
Last edited by Deleted User 9295 on Mon Nov 08, 2021 9:18 pm, edited 1 time in total.
Re: New: Console Mode editions of BBC BASIC
Ahh yes... wrong compiler in Compiler Explorer. Sorry.
Now I get:
Code: Select all
comma:
mov r3, r10
push {r3}
movs r3, #1
add r10, r10, r3
pop {r3}
mov r10, r3
bx lr
edit: any of R0-R7 produce e.g.
Code: Select all
comma:
adds r7, r7, #1
bx lr
https://godbolt.org/z/eef7j3oTn
Re: New: Console Mode editions of BBC BASIC
I don't understand why the compiler thinks that adding 1 to r10, and then immediately restoring r10 to the value it had before the addition. is a sensible thing to do. It's almost as though the save-and-restore of r10 has been added as an afterthought, because r3 was also a poor choice of register to save it in. Pushing something onto the stack is surely quite unnecessary here.cmorley wrote: ↑Mon Nov 08, 2021 9:16 pmCode: Select all
comma: mov r3, r10 push {r3} movs r3, #1 add r10, r10, r3 pop {r3} mov r10, r3 bx lr
Re: New: Console Mode editions of BBC BASIC
What it has done is save/restore r10 around the *esi++ which clobbers r10. This would make sense if r10 is reserved for internal use.
Same thing happens if you just clobber r10. (with or without the global register declared)
https://godbolt.org/z/3Wsx1sano
edit: so it could be a bug in the thumb generator - save/restore r10 always - or it could be by design... gcc is open source... the back ends are pretty hard going but checkable.
Same thing happens if you just clobber r10. (with or without the global register declared)
Code: Select all
void comma(void)
{
asm volatile ( "adds r10,r10,#1" :::"r10");
}
edit: so it could be a bug in the thumb generator - save/restore r10 always - or it could be by design... gcc is open source... the back ends are pretty hard going but checkable.
Re: New: Console Mode editions of BBC BASIC
I have decided to delete all the Pico-specific stuff from my GitHub repository, and return it to the earlier state where the only versions that can be built from it are BBC BASIC for SDL 2.0 and the other Console Mode editions.
Therefore this arm-none-eabi compilation issue will cease to be my problem (thank goodness!) and if you want the code of the interpreter modified to work around it you'll need to take it up with the developers of the Pico version, Eric Olson and Memotech-Bill.
Re: New: Console Mode editions of BBC BASIC
Right I think I have found what it is.
With a core with partial thumb-2 support like the M0+ I think TARGET_THUMB2 is not true. So it defaults to thumb1 in many code paths where HI_REGs are forbidden or does different things. arm.c is 33,500 lines & arm.h is 2,500 lines so I am not going to dig deeper. A flag for register save is probably set wrong in a HI_REG path and with Os on.
This can be confirmed by changing the mcpu to something which does support full thumb2 like cortex-m3 or m7. r10 works fine then in the comma test case. (https://godbolt.org/z/f6rxaMv4b)
Solution for M0+ is to remove the register qualifier and replace with a normal extern global or simply avoid -Os. If the RP2040 has DTCM then create a linker memory section for DTCM and put it in there for lowest waitstate access. (edit: wikipedia says M0+ has no TCM)
edit:
If someone is going to submit a bug report then the following is enough in CE to invoke the behaviour.
With a core with partial thumb-2 support like the M0+ I think TARGET_THUMB2 is not true. So it defaults to thumb1 in many code paths where HI_REGs are forbidden or does different things. arm.c is 33,500 lines & arm.h is 2,500 lines so I am not going to dig deeper. A flag for register save is probably set wrong in a HI_REG path and with Os on.
This can be confirmed by changing the mcpu to something which does support full thumb2 like cortex-m3 or m7. r10 works fine then in the comma test case. (https://godbolt.org/z/f6rxaMv4b)
Solution for M0+ is to remove the register qualifier and replace with a normal extern global or simply avoid -Os. If the RP2040 has DTCM then create a linker memory section for DTCM and put it in there for lowest waitstate access. (edit: wikipedia says M0+ has no TCM)
edit:
If someone is going to submit a bug report then the following is enough in CE to invoke the behaviour.
Code: Select all
register char* esi asm("r10");
void comma(void)
{
esi++;
}
Re: New: Console Mode editions of BBC BASIC
I have updated the Console Mode editions of BBC BASIC to version 0.39. The changes in this version are:
- Fixed an off-by-one bug which resulted in a crash if the response to an INPUT LINE statement was a maximum length string (255 characters).
- Added a new example program patience.bbc which is a simple Patience (Solitaire) program, adapted from one first published in Practical Computing nearly 40 years ago! Note that it uses VT100 line-drawing characters, which may need to be enabled in some terminal emulators (particularly PuTTY). See the code for more information.
Re: New: Console Mode editions of BBC BASIC
Richard has updated the Console Mode editions of BBC BASIC to version 0.40. The changes in this version are:
* Fixed a bug in the handling of @% whereby setting the field width (LS byte) to zero didn't have the correct effect.
* Hexadecimal addresses passed to *commands are now assumed to be unsigned. This fixes a problem (on 32-bit editions only) when the address is in the top half of the memory map.
* Fixed a warning issued by GCC 11 related to the way the pseudo-random number prand is handled. This doesn't affect operation.
* Added a new example program solve.bbc which is a console version of the program of the same name supplied with the GUI editions (it solves simultaneous equations by inverting a matrix).
Version 0.40 may be downloaded from the usual place.
* Fixed a bug in the handling of @% whereby setting the field width (LS byte) to zero didn't have the correct effect.
* Hexadecimal addresses passed to *commands are now assumed to be unsigned. This fixes a problem (on 32-bit editions only) when the address is in the top half of the memory map.
* Fixed a warning issued by GCC 11 related to the way the pseudo-random number prand is handled. This doesn't affect operation.
* Added a new example program solve.bbc which is a console version of the program of the same name supplied with the GUI editions (it solves simultaneous equations by inverting a matrix).
Version 0.40 may be downloaded from the usual place.
Matrix Brandy BASIC VI (work in progress) The Distillery (another work in progress) Note Quiz (New educational software for the BBC and modern kit)
BBC Master 128, PiTubeDirect (Pi 3B), Pi1MHz, 5.25+3.5in dual floppy.
BBC Master 128, PiTubeDirect (Pi 3B), Pi1MHz, 5.25+3.5in dual floppy.