COMAL - some thoughts and a high version and disaasembly

Coeus
Posts: 3557
Joined: Mon Jul 25, 2016 12:05 pm

Re: COMAL - some thoughts and a high version and disaasembly

Post by Coeus »

As far as I can remember, Microsoft 6502 BASIC has an assembly-time option to use 4-byte floats (3-byte mantissa) or 5-byte floats (4-byte mantissa), so any machine that ran a version of MS BASIC would be a possibility. The routines by Steve Wozniak at http://www.6502.org/source/floats/wozfp1.txt also look to be 4-byte floats so they would probably have ended up in something for the Apple 8-bit series.
Coeus
Posts: 3557
Joined: Mon Jul 25, 2016 12:05 pm

Re: COMAL - some thoughts and a high version and disaasembly

Post by Coeus »

Following up on the speed discrepancy between Acornsoft COMAL and 6502 BBC BASIC, here is a BASIC program to calculate primes by trial division:

Code: Select all

   10 REM Primes by Trial Division
   20 N%=100
   30 F%=0
   40 C%=2
   50 DIM P%(N%)
   60 REPEAT
   70   F%=F%+1
   80   P%(F%)=C%
   90   PRINT C%
  100   C%=C%+1
  110   I%=1
  120   REPEAT
  130     IF C% MOD P%(I%)=0 THEN C%=C%+1:I%=0
  140     I%=I%+1
  150   UNTIL I% > F%
  160 UNTIL F% >= N%
  170 END
(indentation not part of the stored program) and here is a partial profile of the BASIC interpreter running it, i.e. which addresses are executed the most frequently:

Code: Select all

F:99F4 136861
F:99F5 136861
F:99F7 136861
F:99F9 136861
F:99FB 136861
F:99FD 136861
F:99FF 136861
F:98F3 75712
F:98F5 75712
F:98F7 75712
F:98F9 70304
F:98FA 70304
F:98FC 70304
F:ADEC 66922
F:ADEE 66922
F:ADF0 66922
F:ADF2 66922
F:ADF4 66922
F:9E24 55116
F:9E26 55116
F:9E28 55116
F:9E2A 55116
F:9E2C 55116
So the code being executed most frequently is this:

Code: Select all

    F:99F4: 88          DEY         
    F:99F5: F0 41       BEQ 9A38    
    F:99F7: 06 39       ASL 39      
    F:99F9: 26 3A       ROL 3A      
    F:99FB: 26 3B       ROL 3B      
    F:99FD: 26 3C       ROL 3C      
    F:99FF: 10 F3       BPL 99F4    
Given this is doing trial division, this looks like it could be part of a divide routine. Looking at the surrounding code gives this:

Code: Select all

    F:99E0: 84 3D       STY 3D      
    F:99E2: 84 3E       STY 3E      
    F:99E4: 84 3F       STY 3F      
    F:99E6: 84 40       STY 40      
    F:99E8: A5 2D       LDA 2D      
    F:99EA: 05 2A       ORA 2A      
    F:99EC: 05 2B       ORA 2B      
    F:99EE: 05 2C       ORA 2C      
    F:99F0: F0 B5       BEQ 99A7    
    F:99F2: A0 20       LDY #20     
    F:99F4: 88          DEY         
    F:99F5: F0 41       BEQ 9A38    
    F:99F7: 06 39       ASL 39      
    F:99F9: 26 3A       ROL 3A      
    F:99FB: 26 3B       ROL 3B      
    F:99FD: 26 3C       ROL 3C      
    F:99FF: 10 F3       BPL 99F4    
    F:9A01: 26 39       ROL 39      
    F:9A03: 26 3A       ROL 3A      
    F:9A05: 26 3B       ROL 3B      
    F:9A07: 26 3C       ROL 3C      
    F:9A09: 26 3D       ROL 3D      
    F:9A0B: 26 3E       ROL 3E      
    F:9A0D: 26 3F       ROL 3F    
    F:9A0F: 26 40       ROL 40      
    F:9A11: 38          SEC         
    F:9A12: A5 3D       LDA 3D      
    F:9A14: E5 2A       SBC 2A      
    F:9A16: 48          PHA         
    F:9A17: A5 3E       LDA 3E      
    F:9A19: E5 2B       SBC 2B      
    F:9A1B: 48          PHA         
    F:9A1C: A5 3F       LDA 3F      
    F:9A1E: E5 2C       SBC 2C      
    F:9A20: AA          TAX         
    F:9A21: A5 40       LDA 40      
    F:9A23: E5 2D       SBC 2D      
    F:9A25: 90 0C       BCC 9A33    
    F:9A27: 85 40       STA 40      
    F:9A29: 86 3F       STX 3F      
    F:9A2B: 68          PLA         
    F:9A2C: 85 3E       STA 3E      
    F:9A2E: 68          PLA         
    F:9A2F: 85 3D       STA 3D      
    F:9A31: B0 02       BCS 9A35    
    F:9A33: 68          PLA         
    F:9A34: 68          PLA         
    F:9A35: 88          DEY         
    F:9A36: D0 C9       BNE 9A01    
    F:9A38: 60          RTS 
which does indeed seem to be a divide routine - it is entered with Y=0 and there is code before that for handling the sign of the dividend and divisor.
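To make the shape of that routine easier to follow, here is a rough Python model of the same technique: skip the leading zero bits of the dividend, then do shift-and-subtract for the bits that remain. This is only a sketch of my reading of the disassembly, not a byte-exact transcription; the function name and the extra "skipped" return value are made up for illustration, and the sign handling that comes before the routine is left out.

```python
MASK32 = 0xFFFFFFFF


def divide_u32(dividend, divisor):
    """Shift-and-subtract division on unsigned 32-bit values.

    Returns (quotient, remainder, skipped) where skipped is the number of
    leading zero bits shifted out before any subtraction is attempted.
    """
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    dividend &= MASK32
    divisor &= MASK32
    quotient = 0
    remainder = 0
    bits_left = 32
    skipped = 0
    # Normalisation loop (the hot loop at 99F4 in the profile): shift the
    # dividend left until its top bit is set, counting down as we go.
    while bits_left and not (dividend & 0x80000000):
        dividend = (dividend << 1) & MASK32
        bits_left -= 1
        skipped += 1
    # Main loop (9A01 onwards): move one dividend bit into the remainder,
    # try to subtract the divisor, and record a quotient bit.
    for _ in range(bits_left):
        remainder = (remainder << 1) | (dividend >> 31)
        dividend = (dividend << 1) & MASK32
        quotient <<= 1
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
    return quotient, remainder, skipped


print(divide_u32(100, 7))  # (14, 2, 25)
```

With the small numbers in the primes program most of the 32 bits are leading zeros (25 of them for a dividend of 100), which would explain why the short normalisation loop is executed far more often than the subtract loop.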

So this is good - the BASIC interpreter is spending more time executing the code that does the work rather than interpreting the source code. The second most commonly executed code is part of:

Code: Select all

    F:98F1: A4 0A       LDY 0A      
    F:98F3: B1 0B       LDA (0B),Y  
    F:98F5: C9 0D       CMP #0D     
    F:98F7: F0 09       BEQ 9902    
    F:98F9: C8          INY         
    F:98FA: C9 8B       CMP #8B     
    F:98FC: D0 F5       BNE 98F3    
which is searching the current line for CR (end-of-line) or ELSE, whichever comes first.
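As a rough illustration of how little work that loop does per byte, here is the same scan modelled in Python. The token values are the ones visible in the disassembly (0D for CR, 8B for ELSE); the function name and the return convention are just assumptions for the sketch.

```python
CR = 0x0D          # end-of-line marker, as compared at 98F5
ELSE_TOKEN = 0x8B  # ELSE token, as compared at 98FA


def skip_to_else_or_eol(line, offset):
    """Scan forward from offset for ELSE or CR, whichever comes first.

    Returns (new_offset, found_else). On CR the offset is left pointing at
    the CR; on ELSE it points just past the token, mirroring the INY that
    sits between the two comparisons.
    """
    while True:
        byte = line[offset]
        if byte == CR:
            return offset, False
        offset += 1
        if byte == ELSE_TOKEN:
            return offset, True


# A made-up tokenised fragment: two ordinary bytes, ELSE, one more byte, CR.
print(skip_to_else_or_eol(bytes([0x20, 0x41, 0x8B, 0x42, 0x0D]), 0))  # (3, True)
```

There is no tokeniser state and no classification of the bytes it passes over, which is what keeps the loop down to six instructions.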

To be continued....
Last edited by Coeus on Fri Dec 23, 2022 2:11 pm, edited 2 times in total.
Coeus
Posts: 3557
Joined: Mon Jul 25, 2016 12:05 pm

Re: COMAL - some thoughts and a high version and disaasembly

Post by Coeus »

Here is a roughly equivalent COMAL program:

Code: Select all

   10 t#:=TIME
   20 numprimes:=100
   30 DIM primes(numprimes)
   40 found:=0
   50 cnt:=2
   60 WHILE found < numprimes DO
   70   found:+1
   80   primes(found):=cnt
   90   PRINT cnt
  100   cnt:+1
  110   index:=1
  120   REPEAT
  130     IF cnt MOD primes(index) = 0 THEN
  140       cnt:+1
  150       index:=0
  160     END IF
  170     index:+1
  180   UNTIL index > found
  190 END WHILE
  200 t#:=TIME -t#
  210 PRINT "Completed in ";t#/100;"s"
Here's a partial profile of this executing:

Code: Select all

8993 448790
8990 362047
8991 362047
B309 288504
B30B 288504
B30D 288504
B30F 288504
896B 271040
896D 271040
896F 263994
8971 263994
B311 253083
B313 253083
B314 253083
B315 253083
B316 253083
B308 240419
897D 177749
8980 177749
AD1D 177747
AD1E 177747
AD20 177747
AD23 177747
ACD4 177715
ACD5 177715
ACD7 177715
ACD9 177715
ACEC 146758
ACEE 146758
9F5A 142088
9F5C 142088
9F5D 142088
9F5E 142088
ACF0 140805
ACF2 140805
96BA 136861
96BB 136861
96BD 136861
96BF 136861
The most executed instruction, though not by quite the same margin, is:

Code: Select all

    8993: 60          RTS         
The code around this is:

Code: Select all

    896A: 8A          TXA         
    896B: C9 7B       CMP #7B     
    896D: B0 23       BCS 8992    
    896F: C9 61       CMP #61     
    8971: B0 1D       BCS 8990    
    8973: C9 5B       CMP #5B     
    8975: B0 1B       BCS 8992    
    8977: C9 41       CMP #41     
    8979: B0 15       BCS 8990    
    897B: 90 15       BCC 8992    
    897D: 20 6B 89    JSR 896B    
    8980: 90 0E       BCC 8990    
    8982: C9 5F       CMP #5F     
    8984: F0 0A       BEQ 8990    
    8986: C9 3A       CMP #3A     
    8988: B0 08       BCS 8992    
    898A: C9 30       CMP #30     
    898C: B0 02       BCS 8990    
    898E: 90 02       BCC 8992    
    8990: 18          CLC         
    8991: 24 38       BIT 38      
    8993: 60          RTS         
    8994: 20 86 89    JSR 8986    
    8997: 90 F7       BCC 8990    
    8999: C9 61       CMP #61     
    899B: 90 08       BCC 89A5    
    899D: C9 67       CMP #67     
    899F: B0 F1       BCS 8992    
    89A1: 29 DF       AND #DF     
    89A3: 18          CLC         
    89A7: 90 E9       BCC 8992    
    89A9: C9 47       CMP #47     
    89AB: B0 E5       BCS 8992    
    89AD: 18          CLC         
    89AE: 60          RTS         
which seems to be all about classifying, and possibly manipulating, the characters of the program text. The second most executed instructions are part of this routine:

Code: Select all

    B305: E6 15       INC 15      
    B307: 24 48       BIT 48      
    B309: A4 15       LDY 15      
    B30B: B1 13       LDA (13),Y  
    B30D: C9 20       CMP #20     
    B30F: F0 F4       BEQ B305    
    B311: 85 6A       STA 6A      
    B313: AA          TAX         
    B314: 68          PLA         
    B315: 18          CLC         
    B316: 60          RTS         
which gets the next non-space character of program text and returns it in X. The third entry is back to the character classification routine, the fourth is back to "get the next non-space character". We have to go as far as entry number 10 in the set of most commonly executed instructions to land in the divide routine:

Code: Select all

    96BA: 88          DEY         
    96BB: F0 41       BEQ 96FE    
    96BD: 06 52       ASL 52      
    96BF: 26 53       ROL 53      
    96C1: 26 54       ROL 54      
    96C3: 26 55       ROL 55      
    96C5: 10 F3       BPL 96BA    
    96C7: 26 52       ROL 52      
    96C9: 26 53       ROL 53      
    96CB: 26 54       ROL 54      
    96CD: 26 55       ROL 55      
    96CF: 26 56       ROL 56      
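which looks like it may be exactly the same one as in BASIC: the opcodes and branch offsets match, only the zero-page addresses differ, and the first instruction is executed 136861 times in both profiles.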
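For anyone who doesn't read 6502, the two COMAL hotspots can be modelled roughly as below. This is a sketch of my reading of the disassembly with "carry clear" treated as True; the function names are invented, and the routine at 8994 (which looks like a hex digit test) is left out because part of its listing is missing.

```python
def is_alpha(ch):
    """Model of 896B: true for A-Z or a-z."""
    return 0x41 <= ch <= 0x5A or 0x61 <= ch <= 0x7A


def is_digit(ch):
    """Model of 8986: true for 0-9."""
    return 0x30 <= ch <= 0x39


def is_name_char(ch):
    """Model of 897D: letter, underscore or digit."""
    return is_alpha(ch) or ch == 0x5F or is_digit(ch)


def next_non_space(text, pos):
    """Model of the loop at B305/B309: step over spaces (0x20) and return
    the first non-space byte together with its position."""
    while text[pos] == 0x20:
        pos += 1
    return text[pos], pos


line = b"   found:+1\r"
ch, pos = next_non_space(line, 0)
print(chr(ch), pos, is_name_char(ch))  # f 3 True
```

Seen this way, each character of a long variable name costs one call to the classifier, so a name like numprimes is nine calls where n# would be far fewer.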
Coeus
Posts: 3557
Joined: Mon Jul 25, 2016 12:05 pm

Re: COMAL - some thoughts and a high version and disaasembly

Post by Coeus »

OK, so I looked back over those two posts and noticed that the BASIC one is using the resident integer variables. It is no surprise that COMAL spends more time scanning the program text when there is more of it because the variable names are longer. So here is a version in COMAL that looks much more like the BASIC version:

Code: Select all

   10 t#:=TIME
   20 n#:=100
   30 DIM p#(n#)
   40 f#:=0
   50 c#:=2
   60 WHILE f#<n# DO
   70   f#:+1
   80   p#(f#):=c#
   90   PRINT c#
  100   c#:+1
  110   i#:=1
  120   REPEAT
  130     IF c# MOD p#(i#)=0 THEN
  140       c#:+1
  150       i#:=0
  160     END IF
  170     i#:+1
  180   UNTIL i#>f#
  190 END WHILE
  200 t#:=TIME -t#
  210 PRINT "Completed in ";t#/100;"s"
This one runs in 42.49s compared to the similarly optimised BASIC version which runs in 23.26 seconds. Here's the partial profile of the optimised COMAL version:

Code: Select all

B309 278195
B30B 278195
B30D 278195
B30F 278195
B311 266388
B313 266388
B314 266388
B315 266388
B316 266388
B308 222763
8993 167134
B27C 145072
B27E 145072
B281 145072
B282 145072
96BA 136861
96BB 136861
96BD 136861
96BF 136861
96C1 136861
96C3 136861
96C5 136861
So now getting the next non-space character is in 1st place and the divide routine is in sixth place.
scruss
Posts: 653
Joined: Sun Jul 01, 2018 4:12 pm
Location: Toronto

Re: COMAL - some thoughts and a high version and disaasembly

Post by scruss »

That's some intriguingly bizarre behaviour you've uncovered. Wish I had the same know-how to compare on Vice and the C64 COMAL cartridge.

The fact that the interpreter is spending so long scanning the program text is surprising. I would have thought that the syntax check on line entry would be a good place to tokenize the program
Coeus
Posts: 3557
Joined: Mon Jul 25, 2016 12:05 pm

Re: COMAL - some thoughts and a high version and disaasembly

Post by Coeus »

scruss wrote: Fri Dec 23, 2022 6:41 pm That's some intriguingly bizarre behaviour you've uncovered. Wish I had the same know-how to compare on Vice and the C64 COMAL cartridge.
I am running the Acornsoft COMAL from within B-Em, one of the emulators. One feature of the B-Em debugger is that it can do simple profiling. All it is really doing is counting how many times an instruction opcode is fetched from each of a range of addresses. When the task being profiled is done, it can write the data to a file and will sort based on number of fetches, i.e. the most used addresses are at the start of the list.

After that I am just disassembling the surrounding code to see what that is about. A more useful tool would be one that also counts where execution had come from. For example, if we have a small subroutine to fetch the next non-space character, that could be called from many different places in the interpreter but the simple profiling will simply show that this routine is the hotspot as it is only counting total times executed.
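The difference between the two kinds of profile can be sketched in a few lines of Python. This is not how B-Em implements it, just an illustration of the idea; the trace data and the call-site addresses 9000 and 9100 are invented.

```python
from collections import Counter


def flat_profile(fetch_addresses):
    """Count opcode fetches per address, most frequent first."""
    return Counter(fetch_addresses).most_common()


def call_profile(calls):
    """Count (call_site, target) pairs, most frequent first."""
    return Counter(calls).most_common()


# Hypothetical trace: addresses an opcode was fetched from, in order.
fetches = [0xB309, 0xB30B, 0xB309, 0xB30B, 0x8993, 0xB309]
# Hypothetical JSR records: where the call came from and where it went.
calls = [(0x9000, 0xB308), (0x9100, 0xB308), (0x9000, 0xB308)]

for address, count in flat_profile(fetches):
    print(f"{address:04X} {count}")
for (site, target), count in call_profile(calls):
    print(f"{site:04X} -> {target:04X} {count}")
```

The flat profile only says that B309 is hot; the second table is what would show which caller is responsible for most of the traffic.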

It may be the case that BASIC avoids having a hotspot routine like that by often inlining it, especially in cases where it doesn't need to do any other testing on the character fetched. An obvious example would be comparing the name of a variable in the program text with the name of a variable in the heap. If the interpreter has been diligent about not letting punctuation, tokens and end of line markers into the variable name as stored in the heap, all it needs to do when looking up a variable is compare the program text with the heap version. If the variable name in the program text finishes before the one being compared to in the heap the comparison will simply fail and it will move on to trying the next variable.
scruss wrote: Fri Dec 23, 2022 6:41 pm The fact that the interpreter is spending so long scanning the program text is surprising. I would have thought that the syntax check on line entry would be a good place to tokenize the program
The text of keywords in Acornsoft COMAL definitely is converted into single-byte token values and the token value stored, just like BASIC does. If you remember, this is how the Acornsoft version has END IF and END WHILE with the space, as these are being converted into two tokens and the tokeniser insists on a space or other marker after a token, i.e. not an alphabetical character that starts a new token.

Another possibility is scanning forward for ELSE or END IF taking rather too long. BASIC doesn't allow multi-line IF statements so, upon evaluating an IF condition and finding it false, it has a very tight loop that looks for the ELSE token or end-of-line (CR). It has been discussed on here before that BASIC even uses a strange encoding of the line numbers that follow the GOTO and GOSUB tokens to make sure the ELSE and CR tokens cannot occur in the middle of a line number to keep this search for ELSE really tight. BASIC also doesn't have WHILE..END WHILE so avoids another reason to search forward in the program text. For the loops it does offer, BASIC uses a stack to remember the position of the opening keyword (FOR, REPEAT) and so doesn't have to search backwards for that either when it hits the NEXT or UNTIL.
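For reference, here is a Python sketch of that line number encoding as I remember it from the published descriptions of the tokenised program format: a marker token followed by three bytes that are each forced into the range 40 to 7F. The constants (8D, 54, the bit masks) are quoted from memory, so treat them as an assumption and check them against a disassembly before relying on them.

```python
LINE_NUMBER_TOKEN = 0x8D  # assumed marker byte that precedes an encoded line number
CR = 0x0D
ELSE_TOKEN = 0x8B


def encode_line_number(number):
    """Spread a 16-bit line number over three bytes in the range 0x40-0x7F."""
    lo, hi = number & 0xFF, (number >> 8) & 0xFF
    b1 = (((lo & 0xC0) >> 2) | ((hi & 0xC0) >> 4)) ^ 0x54
    b2 = (lo & 0x3F) | 0x40
    b3 = (hi & 0x3F) | 0x40
    return bytes([LINE_NUMBER_TOKEN, b1, b2, b3])


def decode_line_number(encoded):
    """Inverse of encode_line_number."""
    _, b1, b2, b3 = encoded
    lo = (((b1 ^ 0x54) & 0x30) << 2) | (b2 & 0x3F)
    hi = (((b1 ^ 0x54) & 0x0C) << 4) | (b3 & 0x3F)
    return (hi << 8) | lo


# Line 13 would contain a raw 0x0D byte if stored as plain binary.
encoded = encode_line_number(13)
print(encoded.hex(), decode_line_number(encoded))
```

Whatever the exact constants, the property that matters for the ELSE search is that none of the three data bytes can ever be 0D or 8B, so the scan never needs to know it is passing over a line number.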

It would be interesting to know what the C64 version does differently. Do we have a binary archived anywhere or even a disassembly? Also, how big is it? BASIC is a simpler language and the version on the BBC Micro has a whole 16K ROM. COMAL has to fit a more complex language into the same space.
scruss
Posts: 653
Joined: Sun Jul 01, 2018 4:12 pm
Location: Toronto

Re: COMAL - some thoughts and a high version and disaasembly

Post by scruss »

Coeus wrote: Sat Dec 24, 2022 7:21 pm It would be interesting to know what the C64 version does differently. Do we have a binary archived anywhere or even a disassembly? Also, how big is it? BASIC is a simpler language and the version on the BBC Micro has a whole 16K ROM. COMAL has to fit a more complex language into the same space.
The COMAL cartridge for C64 is 4 banked 16K ROMs. The version I use is this one: Comal 80 (1985)(Commodore)[a].crt
julie_m
Posts: 587
Joined: Wed Jul 24, 2019 9:53 pm
Location: Derby, UK

Re: COMAL - some thoughts and a high version and disaasembly

Post by julie_m »

This is one of the advantages of compiling to intermediate code! If done right, you get to offload all the slowest bits -- checking for syntax errors, parsing expressions, looking up variable names, and so forth -- to the compiler, and then the runtime interpreter only has to deal with binary representations of numbers and addresses of variables.

(Another advantage is that once you have a runtime interpreter for one architecture, you can rewrite the compiler in the intermediate language, and then you only need to port the runtime interpreter to any architecture you wish to support .....)
Coeus
Posts: 3557
Joined: Mon Jul 25, 2016 12:05 pm

Re: COMAL - some thoughts and a high version and disaasembly

Post by Coeus »

scruss wrote: Sun Dec 25, 2022 1:26 am The COMAL cartridge for C64 is 4 banked 16K ROMs...
That certainly seems like it might be a factor. COMAL being a more complicated language, implementing it in the same 16K as BASIC may have caused trade-offs to be made that save space but reduce speed. Having 64K to use means the opposite can be done.

It is probably worth noting that some of the Acornsoft languages also don't fit into a single ROM. Pascal is supplied as two 16K ROMs, one of which contains an editor, command line environment and intermediate code interpreter, the other contains the compiler as intermediate code. LOGO is supplied as two ROMs. BCPL has only one ROM but that only contains the command line environment, intermediate code interpreter and run-time library. Both the editor and compiler are supplied as files on disc. In the case of the compiler, there is a driver program and a number of compiler passes.

If one wished to make the Acornsoft COMAL faster it would almost certainly be necessary to split it into two ROMs. An obvious split would be one which contains an editor that works rather like the Advanced BASIC Editor, in that the editor knows the format in which the program is stored in memory so each line is parsed as soon as you have finished editing it. That could print error messages and get the user to correct the line or abandon the edit to that line. With a little more space it could also correct the ENDIF and ENDWHILE incompatibility. The other ROM could be just the interpreter, with the ability to load files from disc so it could be used without the editor to run already written programs.
Last edited by Coeus on Sun Dec 25, 2022 10:06 pm, edited 2 times in total.
Coeus
Posts: 3557
Joined: Mon Jul 25, 2016 12:05 pm

Re: COMAL - some thoughts and a high version and disaasembly

Post by Coeus »

julie_m wrote: Sun Dec 25, 2022 6:52 pm This is one of the advantages of compiling to intermediate code! If done right, you get to offload all the slowest bits -- checking for syntax errors, parsing expressions, looking up variable names, and so forth -- to the compiler, and then the runtime interpreter only has to deal with binary representations of numbers and addresses of variables.
Yes, dividing languages into compiled vs. interpreted is probably too simplistic. One of the constraints on languages that work the way BBC BASIC and the Acornsoft COMAL do is that the program text as held in memory is the version that gets listed and edited without a separate source. That limits the ways in which the program text can be transformed. As an example, in BASIC:

T%=TIME+60*60*100

cannot be stored as T%=TIME+(binary 360000), i.e. with 360000 stored in binary, as that means the original version cannot be listed. With this way of working, if the text is to be semi-compiled upon entry, the intermediate code version has to be slotted into the text, using special tokens, so that the original text remains for listing and editing but the intermediate code version is what actually executes.

Languages such as Perl, while still interpreted, don't have the same problem as the program is edited as a standard text file and then the interpreter is invoked separately. As an approximation, what Perl does is to build the syntax tree in memory, as a compiler would, but then rather than generating code interprets the syntax tree. Because this syntax tree is never stored in a file it doesn't have to be capable of regenerating the original text and the interpreter can change the internal representation from one version to the next. Recent versions of Perl, interestingly, seem to be doing more semantic analysis before starting execution than earlier versions did.
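The "build a syntax tree, then interpret it" approach can be illustrated with Python's own ast module, which exposes the parser separately from execution. This says nothing about Perl's internals; it is only a sketch of the general idea, handling a handful of integer operators, and the evaluate function is written just for this example.

```python
import ast
import operator

# Map syntax-tree operator nodes to the functions that implement them.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
}


def evaluate(node, variables):
    """Walk the syntax tree and compute its value directly."""
    if isinstance(node, ast.Expression):
        return evaluate(node.body, variables)
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Name):
        return variables[node.id]
    if isinstance(node, ast.BinOp):
        op = _BINARY_OPS[type(node.op)]
        return op(evaluate(node.left, variables), evaluate(node.right, variables))
    raise ValueError(f"unsupported syntax: {type(node).__name__}")


# The text is parsed once; the tree can then be evaluated many times.
tree = ast.parse("TIME + 60 * 60 * 100", mode="eval")
print(evaluate(tree, {"TIME": 1234}))  # 361234
```

The parsing cost is paid once up front, and nothing in the tree has to be able to reproduce the original spacing or spelling of the source.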

BASIC running on old mainframes would also not have had this limitation as the original text would have been on punched cards or paper tape so again the interpreter can transform the text for storage in RAM any way it likes.
julie_m wrote: Sun Dec 25, 2022 6:52 pm (Another advantage is that once you have a runtime interpreter for one architecture, you can rewrite the compiler in the intermediate language, and then you only need to port the runtime interpreter to any architecture you wish to support .....)
Yes, for most languages that are supposed to be reasonable system programming languages, the compiler is written in the language itself. This is true of C, C++, BCPL, Pascal and Java as far as I know. BCPL, in particular, uses an intermediate code called O-code which is then further transformed into either native code or a second intermediate code that is executed by an interpreter (virtual machine). In the case of micros the second code is called CINTCODE. So, in porting BCPL from CP/M to the BBC Micro what would have been needed was to create a CINTCODE interpreter for the 6502 and then port the standard library. A compiler, compiled into CINTCODE, would run without changes.
julie_m
Posts: 587
Joined: Wed Jul 24, 2019 9:53 pm
Location: Derby, UK

Re: COMAL - some thoughts and a high version and disaasembly

Post by julie_m »

Coeus wrote: Sun Dec 25, 2022 9:32 pm As an example, in BASIC:

T%=TIME+60*60*100

cannot be stored as T%=TIME+(binary 360000), i.e. with 360000 stored in binary, as that means the original version cannot be listed.
I suppose you could store each of the 60s and the 100 in a binary representation; possibly even rearranged somehow like "var&0450 read_TIME decimal_constant60 decimal_constant60 mult decimal_constant100 mult add setvar". If you stored that instead of the original, you would add complexity to the lister, which would now have to be able to recreate the expression in human-readable form (even then, you might not recreate the original spacing faithfully, and extraneous brackets would disappear by themselves, but that's probably tolerable); if you stored it as well as the original, you would be taking up more space. It's yet another trade-off .....
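To show both halves of that trade-off, here is a small Python sketch of the postfix form quoted above: one function executes it on a stack, the other is the more complicated lister that rebuilds readable text from it. The tuple representation, the operation names and the precedence table are all invented for illustration.

```python
# T%=TIME+60*60*100 in the postfix form suggested above.
CODE = [
    ("var", 0x0450), ("read_time",), ("const", 60), ("const", 60), ("mult",),
    ("const", 100), ("mult",), ("add",), ("setvar",),
]

PRECEDENCE = {"add": 1, "mult": 2}
SYMBOL = {"add": "+", "mult": "*"}


def run(code, memory, time_now):
    """Execute the postfix code; memory maps variable addresses to values."""
    stack = []
    for op, *args in code:
        if op in ("var", "const"):
            stack.append(args[0])
        elif op == "read_time":
            stack.append(time_now)
        elif op in ("add", "mult"):
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if op == "add" else left * right)
        elif op == "setvar":
            value, address = stack.pop(), stack.pop()
            memory[address] = value


def list_source(code, names):
    """Rebuild infix text, adding brackets only where precedence needs them."""
    stack = []  # entries are (text, precedence)
    for op, *args in code:
        if op == "var":
            stack.append((names[args[0]], 3))
        elif op == "const":
            stack.append((str(args[0]), 3))
        elif op == "read_time":
            stack.append(("TIME", 3))
        elif op in PRECEDENCE:
            (right, rp), (left, lp) = stack.pop(), stack.pop()
            p = PRECEDENCE[op]
            if lp < p:
                left = "(" + left + ")"
            if rp <= p:
                right = "(" + right + ")"
            stack.append((left + SYMBOL[op] + right, p))
        elif op == "setvar":
            (value, _), (target, _) = stack.pop(), stack.pop()
            stack.append((target + "=" + value, 0))
    return stack.pop()[0]


memory = {}
run(CODE, memory, time_now=1000)
print(memory[0x0450])                      # 361000
print(list_source(CODE, {0x0450: "T%"}))   # T%=TIME+60*60*100
```

Because the lister only knows about precedence, a line typed as T%=TIME+(60*60)*100 would produce the same postfix code and so come back without its brackets, which is the loss of fidelity mentioned above.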
Coeus wrote: Sun Dec 25, 2022 9:32 pm BASIC running on old mainframes would also not have had this limitation as the original text would have been on punched cards or paper tape so again the interpreter can transform the text for storage in RAM any way it likes.
Yes -- if an interpreter did not need to be able to run interactively, it could store programs in the most machine-friendly way possible.

I don't suppose there's any reason why multiple different high-level languages couldn't be compiled to the same intermediate code .....
scruss
Posts: 653
Joined: Sun Jul 01, 2018 4:12 pm
Location: Toronto

Re: COMAL - some thoughts and a high version and disaasembly

Post by scruss »

julie_m wrote: Sun Dec 25, 2022 11:23 pm Yes -- if an interpreter did not need to be able to run interactively, it could store programs in the most machine-friendly way possible.
The way the original Dartmouth interpreter ran seems bizarre today: the interpreter and interactive front-end ran on a small terminal server machine. This machine converted user commands and numbered lines into compiled batch jobs that it submitted to a connected mainframe. The terminal server polled all of the connected teletypes for input, and sent the various lines of output from the mainframe to the correct terminal. So neither computer held the complete state of any program. Several pre-microcomputer BASICs (from HP, DEC and others) kept this two computer architecture going for quite some time.
julie_m wrote: Sun Dec 25, 2022 11:23 pm I don't suppose there's any reason why multiple different high-level languages couldn't be compiled to the same intermediate code .....
For a while, there were other languages (apart from Pascal) that would run on the UCSD p-System. Apple's FORTRAN compiler for the Apple II produced p-code. It probably wouldn't set any records for execution speed, though.
Coeus
Posts: 3557
Joined: Mon Jul 25, 2016 12:05 pm

Re: COMAL - some thoughts and a high version and disaasembly

Post by Coeus »

scruss wrote: Thu Dec 29, 2022 3:04 am The way the original Dartmouth interpreter ran seems bizarre today: the interpreter and interactive front-end ran on a small terminal server machine. This machine converted user commands and numbered lines into compiled batch jobs that it submitted to a connected mainframe. The terminal server polled all of the connected teletypes for input....
That's interesting and reminds me of the kind of hierarchical way IBM mainframes worked with terminals. Part of that seemed to be an effort to keep interrupts down on the main processor as most I/O devices seemed to be quite smart, carrying out reasonably big operations. It would also have worked well when remote offices shared a central mainframe. The interface between the mainframe and a terminal worked at the block level where a block was either a complete screen or the set of changed fields. I am not sure that intelligence was actually in the terminals, though, at least not at the start - there was some kind of terminal controller. But the latency-critical communication would be between the terminal and the controller and the controller would be local to the terminals, so most people wouldn't notice if the mainframe itself was at the end of a phone line.
scruss wrote: Thu Dec 29, 2022 3:04 am
I don't suppose there's any reason why multiple different high-level languages couldn't be compiled to the same intermediate code .....
For a while, there were other languages (apart from Pascal) that would run on the UCSD p-System. Apple's FORTRAN compiler for the Apple II produced p-code. It probably wouldn't set any records for execution speed, though.
FORTRAN is an interesting choice as that is usually compiled to native code and there has been some competition to keep it fast.

The Java VM is also the target of some non-Java languages. Isn't there also a common intermediate code for the Microsoft .NET system, i.e. languages such as C# and F#? I also think there was an initiative to do the same for some of the open source interpreted languages, such as Perl and Python and maybe others, but I have not heard anything of it recently.
scruss
Posts: 653
Joined: Sun Jul 01, 2018 4:12 pm
Location: Toronto

Re: COMAL - some thoughts and a high version and disaasembly

Post by scruss »

Yes, Jython (Python embeddable inside Java) was a thing for a while, but it hasn't seen much love over the last few years and is stuck on Python 2. This might be related to the decline in popularity and support for the gcj (GCC targeting JVM) compiler. WASM in the browser seems to be a more popular target: Richard Russell's BBC BASIC has a web-compiled version, and MicroPython* recently got a WASM port.

---
*: small embedded subset of Python. I use it a lot with small development boards: https://micropython.org/
lushprojects
Posts: 148
Joined: Mon Jan 18, 2021 4:02 pm

Re: COMAL - some thoughts and a high version and disaasembly

Post by lushprojects »

Sort-of related - I found this while browsing PCW the other day:
"It seems that before the decision on Basic was finalised, some lively debates took place as to the ‘best’ language to use on the BBC machine, with Pascal and Comal devotees being notably anxious to push their languages on the theory that if the public was to be taught programming, it should be taught tidy, academically-satisfying structured programming from the start."

(https://archive.org/details/PersonalCom ... 2/mode/2up)

It would be interesting to see how the beeb would have done if Pascal or Comal were the default language.
BigEd
Posts: 6261
Joined: Sun Jan 24, 2010 10:24 am
Location: West Country

Re: COMAL - some thoughts and a high version and disaasembly

Post by BigEd »

That's a good find - thanks for the link. I think it's worth quoting a few surrounding paragraphs:
BBC Basic

While a fairly public controversy raged around the BBC’s choice of Acorn as the manufacturer of its microcomputer, a quieter stir was caused by the decision to make Basic the machine’s ‘natural’ language. For entirely inexplicable reasons, programming languages arouse strong emotions in their devotees’ hearts. Each language attracts its band of followers and it sometimes seems that the more obscure or difficult or awkwardly-syntaxed the language, the more fanatical its proponents.

Basic in particular seems to anger more people than just about any other aspect of microcomputing, yet it has helped thousands of newcomers to get to grips with their machines, which would certainly not be the case were Pascal, say or APL the most commonly-implemented languages on micros.

It seems that before the decision on Basic was finalised, some lively debates took place as to the ‘best’ language to use on the BBC machine, with Pascal and Comal devotees being notably anxious to push their languages on the theory that if the public was to be taught programming, it should be taught tidy, academically-satisfying structured programming from the start. Although other languages are planned for the BBC Micro, it was Basic which won in the end, probably because it is so easy to learn, although the fact that a Basic was under development for the Proton before it became the BBC Micro must have been an important factor. Once the Acorn and Basic decisions had been made, a further frisson circulated when people remembered what the Basic was like on the Acorn Atom; would the BBC Micro have the same very non-standard Basic, people wondered?
[...]
For the BBC machine, the same team which developed Atom Basic evolved a far more standard implementation of the language which makes the conversion to or from another machine (both of user and of programs) much easier, while retaining some of Atom Basic’s more elegant features.
paulb
Posts: 1766
Joined: Mon Jan 20, 2014 9:02 pm

Re: COMAL - some thoughts and a high version and disaasembly

Post by paulb »

Coeus wrote: Thu Dec 29, 2022 3:18 pm The Java VM is also the target of some non-Java languages. Isn't there also a common intermediate code for the Microsoft .NET system, i.e. languages such as C# and F#? I also think there was an initiative to do the same for some of the open source interpreted languages, such as Perl and Python and maybe others, but I have not heard anything of it recently.
As already noted, the JVM was targeted by a few non-Java languages, notably Python as JPython (later renamed to Jython) followed by some existing languages such as Ruby and also some more or less original ones like Groovy, Scala and Clojure. Sun did support Jython development for a while, but they actively bet on Ruby and Groovy instead for some absurd reason. Absurd because Groovy in particular was notoriously unstable in terms of the language features and behaviour at one point. These days, Oracle seems to be promoting their Graal VM as a multi-language environment, but nobody in their right mind should rely on Oracle for anything.

Microsoft's rival to the JVM is/was the Common Language Runtime (CLR). Again, there was a Python implementation called IronPython which was done by the same guy that did JPython/Jython after someone else had done a pilot project and concluded that running Python on the CLR wasn't viable. I guess that there are other languages that were made to work with it, but I don't pay attention to Microsoft's product selection for similar reasons to those applying to Oracle.

There is/was a project to do a runtime for dynamic languages such as Perl and Python which took the name Parrot, to acknowledge a prior prank/joke made on the topic that suggested that such a project was something of a pipedream, itself taking the name of Parrot to presumably acknowledge the Monty Python "dead parrot" sketch. (I don't remember the whole story now.) Although there have been some attempts to target this runtime, I don't think it has been a viable platform for implementations. Personally, I don't think the people designing the bytecode had enough knowledge or experience about the issues.
Coeus
Posts: 3557
Joined: Mon Jul 25, 2016 12:05 pm
Contact:

Re: COMAL - some thoughts and a high version and disaasembly

Post by Coeus »

lushprojects wrote: Sun Jan 22, 2023 8:31 pm It would be interesting to see how the beeb would have done if Pascal or Comal were the default language.
I wonder if the decisions were as sequential in nature as the larger fragment quoted by BigEd suggests. There was some discussion in the thread about the Dragon 32, whose manufacturers were, like Sinclair, unhappy that the contract for the BBC Micro had gone to Acorn. Two things in particular were mentioned along the way: standards compliance and the importance of a reliable system for using cassette storage. Part of standards compliance was a preference for a machine that could run CP/M, so as to be able to run the large library of software for CP/M, but this was incompatible with the desire to be able to use cassette tape, as the assumption of using floppy discs was integral to CP/M. That's not to say the same machine couldn't do both, just not at the same time and, when Research Machines finally released their equivalent to the BBC Micro, it was indeed a machine that could boot a variant of CP/M called CP/NOS from a network or work stand-alone as a tape machine with a very simple tape OS/Monitor and BASIC in ROM.

Another part of that standards compliance was a desire for the included BASIC to be Microsoft compatible, i.e. that programs written in Microsoft BASIC would run on the BBC Micro. The version of BBC BASIC delivered may have mostly achieved that, though there are definitely some edge cases we have found since, such as using GOTO in combination with loops, including FOR...NEXT. As such I wonder how much serious thought was given to shipping a default language other than BASIC.

I wasn't there and Richard Russell has left this forum. We know the government was concerned about a lack of awareness of how microprocessors and microcomputers would affect people's lives and work, and we know they wanted the BBC to do something to educate the public. What I don't know is whether people from government were also involved with the BBC in drawing up the spec for the BBC Micro. Were there people from various different places with different priorities? It does seem likely that some were keen on compatibility with MS BASIC while others wanted to encourage people to adopt structured programming, that these aims pulled in different directions, and that BBC BASIC was the compromise between the two.

Personally, considering the language itself and not the lack of speed of the Acornsoft implementation, I would have been quite happy to have COMAL as the default language. To me, it retains a lot of the ease of getting started that BASIC has while offering better support for structured programming. The documentation of the Acornsoft version is also good.

For me, Pascal would not have been as suitable. Being a compiled language, it lacks the immediacy of an interpreted language like BASIC or COMAL and is particularly fussy about syntax, for example objecting to an extra semicolon before an 'end' rather than treating it as a statement that does nothing. But the biggest difficulty of all is the inability, at least in the level 0 implementation, to do anything with strings. Pascal doesn't have a string type; instead it has arrays of characters. When these are passed around they are type checked, including the length, which makes writing functions or procedures to process pieces of text of varying length impossible. This issue should be solved, in a level 1 implementation, by conformant arrays.
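To make the length problem concrete, here is a rough analogy in Go rather than Pascal itself, so it says nothing about how Acornsoft's compiler behaves. Go's fixed-size arrays also carry their length in the type, much like a level 0 `packed array[1..5] of char`, while a Go slice carries its length at run time, which is loosely what a level 1 conformant array parameter gives you. The names `Name5`, `lenFixed` and `lenAny` are made up for this sketch.

```go
package main

import "fmt"

// Name5 plays the role of Pascal's `packed array[1..5] of char`:
// the length is part of the type.
type Name5 [5]byte

// lenFixed only accepts arrays of exactly five bytes.
func lenFixed(s Name5) int {
	return len(s)
}

// lenAny takes a slice, which carries its own length at run time,
// loosely analogous to a conformant array parameter.
func lenAny(s []byte) int {
	return len(s)
}

func main() {
	short := Name5{'C', 'O', 'M', 'A', 'L'}
	long := [9]byte{'B', 'B', 'C', ' ', 'B', 'A', 'S', 'I', 'C'}

	fmt.Println(lenFixed(short)) // fine: the types match
	// lenFixed(long) would not compile: [9]byte is not Name5.
	fmt.Println(lenAny(short[:]), lenAny(long[:])) // one routine, any length
}
```

With only the fixed-length form available you would need a separate routine for every length of text, which is the difficulty described above.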
OracPrime
Posts: 1
Joined: Tue Mar 12, 2024 1:16 am
Contact:

Re: COMAL - some thoughts and a high version and disaasembly

Post by OracPrime »

Coeus wrote: Sun Dec 25, 2022 9:06 pm
scruss wrote: Sun Dec 25, 2022 1:26 am The COMAL cartridge for C64 is 4 banked 16K ROMs...
That certainly seems like it might be a factor. COMAL being a more complicated language, implementing it in the same 16K as BASIC may have caused trade-offs to be made that save space but reduce speed. Having 64K to use means the opposite can be done.

It is probably worth noting that some of the Acornsoft languages also don't fit into a single ROM. Pascal is supplied as two 16K ROMs, one of which contains an editor, command line environment and intermediate code interpreter, the other contains the compiler as intermediate code. LOGO is supplied as two ROMs. BCPL has only one ROM but that only contains the command line environment, intermediate code interpreter and run-time library. Both the editor and compiler are supplied as files on disc. In the case of the compiler, there is a driver program and a number of compiler passes.

If one wished to make the Acornsoft COMAL faster it would almost certainly be necessary to split it into two ROMs. An obvious split would be one which contains an editor that works rather like the Advanced BASIC Editor, in that the editor knows the format in which the program is stored in memory so each line is parsed as soon as you have finished editing it. That could print error messages and get the user to correct the line or abandon the edit to that line. With a little more space it could also correct the ENDIF and ENDWHILE incompatibility. The other ROM could be just the interpreter, with the ability to load files from disc so it could be used without the editor to run already written programs.
As the lead developer of Acornsoft COMAL I can confirm that space was indeed a problem. The initial working version was 23K and we had to make an awful lot of performance-impairing trade-offs to fit it in the ROM. The release version used 16381 of 16384 available bytes!

David Christensen
(apologies for the late reply, only just stumbled across this forum)
BigEd
Posts: 6261
Joined: Sun Jan 24, 2010 10:24 am
Location: West Country
Contact:

Re: COMAL - some thoughts and a high version and disaasembly

Post by BigEd »

Welcome David - I bet there's a lot we could hear from you!