Options for embedding BBC BASIC code in URL

Post by Deleted User 9295 »

I expect this has been discussed before, but a quick search didn't find anything. I'm thinking of extending the in-browser edition of BBC BASIC for SDL 2.0 so that it will accept a (relatively short) BBC BASIC program embedded in the URL. The question is, what encoding format should it use?

It could accept the same format as jsbeeb, i.e. a URI-encoded plain-text program, but that seems quite inefficient. Or it could accept a URI-encoded tokenised program, or perhaps a base64-encoded tokenised program. Or even a base64-encoded zipped file (not sure how I'd unzip it though - it would have to be done in BASIC!).

Any thoughts?

Post by jgharston »

Richard Russell wrote: Wed Dec 30, 2020 12:14 am I expect this has been discussed before, but a quick search didn't find anything. I'm thinking of extending the in-browser edition of BBC BASIC for SDL 2.0 so that it will accept a (relatively short) BBC BASIC program embedded in the URL. The question is, what encoding format should it use?
Unzipping uncompressed (stored) data is fairly simple: skip the header and the data follows as a solid block.

I think having an option to feed in a pre-existing format - the jsbeeb format - would be useful.

Thinking further along the lines of not re-inventing rotational transport devices, how about the telesoftware format, which is essentially the GSTRANS format. So, PRINT "Hello"<cr> would be |!q "Hello"|M. URL character entities would have to be escaped, so I think that comes to |!q%20%22Hello%22|M.

Checking Wiki, RFC1738 specifies that binary data should be passed with all non-AZaz09 percent encoded*, so it would be %f1%20%22Hello%22%0d.

*plus a few others, but AZaz09 is the simplest universal set.
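
For comparison, here is a minimal Python sketch of both forms of the example above. The escape rules in `gstrans_encode` (a hypothetical helper name) are assumptions based on the usual GSTRANS conventions: `|!` sets the top bit of the following character, `|` plus a letter gives a control code, `|?` is DEL and `||` is a literal bar. As in the post, the double quote is left unescaped. Note that `quote` emits upper-case hex digits, which are equivalent to the lower-case ones shown above.

```python
from urllib.parse import quote


def gstrans_encode(data: bytes) -> str:
    """Encode bytes using GSTRANS-style '|' escapes (assumed rule set)."""
    out = []
    for b in data:
        if b >= 0x80:
            out.append("|!")        # top-bit-set prefix
            b -= 0x80
        if b < 0x20:
            out.append("|" + chr(b + 0x40))   # control code, e.g. CR -> |M
        elif b == 0x7F:
            out.append("|?")        # DEL
        elif b == 0x7C:
            out.append("||")        # literal '|'
        else:
            out.append(chr(b))
    return "".join(out)


# Token &F1 (PRINT), then the rest of the line and a CR, as in the post.
tokenised = b'\xf1 "Hello"\r'

telesoftware = gstrans_encode(tokenised)
print(telesoftware)                      # GSTRANS form
print(quote(telesoftware, safe="|!"))    # GSTRANS form with URL escapes
print(quote(tokenised, safe=""))         # fully percent-encoded binary
```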

Post by Deleted User 9295 »

jgharston wrote: Wed Dec 30, 2020 12:49 am binary data should be passed with all non-AZaz09 percent encoded*, so it would be %f1%20%22Hello%22%0d.
That's more attractive than plain text, but it still feels a little wasteful not to use any kind of lossless compression (other than tokenising). Of course it may be that the length of program that could practically be sent this way is so limited that there's not much opportunity for efficient compression anyway.

Post by Deleted User 9295 »

Here are some comparisons for the trivial program:

Code:

PRINT "Hello world!"

Plain text (CR-terminated), URI-encoded, 33 characters:

Code:

PRINT%20%22Hello%20world%21%22%0D

Tokenised, URI-encoded, 31 characters:

Code:

%F1%20%22Hello%20world%21%22%0D

Tokenised, URL-safe base64-encoded, 23 characters:

Code:

8SAiSGVsbG8gd29ybGQhIg0

In the case of this particular example base64 wins. Trying it with a 'real' tokenised program chosen at random, URI-encoding gave 2274 characters and base64-encoding 1347 characters, which seems like a real win. Are there any downsides?
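
The three figures for the trivial program can be reproduced with a short Python sketch. The byte strings mirror the example as posted (the token &F1 followed by the rest of the line and a CR); they are not a complete BBC BASIC program file, and the variable names are only illustrative.

```python
import base64
from urllib.parse import quote

plain = b'PRINT "Hello world!"\r'        # CR-terminated plain text
tokenised = b'\xf1 "Hello world!"\r'     # PRINT replaced by the token &F1

# safe="" makes quote() escape everything except letters, digits and _.-~
uri_plain = quote(plain, safe="")
uri_tokenised = quote(tokenised, safe="")

# URL-safe alphabet from RFC 4648 section 5, with the '=' padding dropped
b64url = base64.urlsafe_b64encode(tokenised).rstrip(b"=").decode("ascii")

for label, text in [
    ("plain, URI-encoded", uri_plain),
    ("tokenised, URI-encoded", uri_tokenised),
    ("tokenised, base64url", b64url),
]:
    print(f"{len(text):2d}  {label}: {text}")
```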

Post by BigEd »

Hmm, I'm swimming against the current here, but I'd say there's no harm or cost in long URLs, so keeping it simple would be a win, as would following jsbeeb's example.

Post by lurkio »

BigEd wrote: Wed Dec 30, 2020 12:17 pm Hmm, I'm swimming against the current here, but I'd say there's no harm or cost in long URLs, so keeping it simple would be a win, as would following jsbeeb's example.
<coward>Phew! I'm glad someone else jumped in first!</coward>

I have to agree. The option to pass the program as URI-encoded but uncompressed would be very handy and easier to use and would therefore probably get more use.

An uncompressed parameter would also be easier to add as an option to tools like the Beeb Link Console, which has already come in handy for generating links to BBC BASIC programs that run in JSBeeb. (I've been in contact with the developer of the Beeb Link Console and I could probably persuade him to add a BBC SDL option.)

:idea:

Post by Deleted User 9295 »

BigEd wrote: Wed Dec 30, 2020 12:17 pm I'd say there's no harm or cost in long URLs
What about the practical limit of 2000 characters? Some browsers will accept more, but the advice there is that even in 2020 "staying under 2000 chars is the best general policy". If base64 encoding will fit your program in under 2000 characters but URI encoding won't, isn't that a valid argument?
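
To put rough numbers on that argument: base64url spends 4 characters on every 3 bytes, whereas percent-encoding spends 1 character on an unreserved byte and 3 on anything else. The Python sketch below works out how many program bytes fit in a given character budget. The function names and the 60-character allowance for the rest of the URL are hypothetical, chosen only for illustration.

```python
def b64url_len(n_bytes: int) -> int:
    """Characters needed for n_bytes as unpadded base64url: ceil(4n/3)."""
    return (4 * n_bytes + 2) // 3


def max_bytes_b64url(budget: int) -> int:
    """Largest payload whose unpadded base64url text fits in budget chars."""
    return budget * 3 // 4


def max_bytes_percent_worst_case(budget: int) -> int:
    """Payload that fits if every byte needs a three-character %XX escape."""
    return budget // 3


LIMIT = 2000     # the "practical limit" discussed in the thread
OVERHEAD = 60    # hypothetical: scheme, host, path and "?embed="
budget = LIMIT - OVERHEAD

print("base64url payload:", max_bytes_b64url(budget), "bytes")
print("percent-encoded, worst case:", max_bytes_percent_worst_case(budget), "bytes")
# An unpadded base64url length of 1347 characters corresponds to 1010 bytes.
print("1010 bytes ->", b64url_len(1010), "characters")
```

A real tokenised program falls somewhere between the best and worst percent-encoding cases, since letters and digits pass through unescaped.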

And I don't really see why URI-encoding is any 'easier'. Either way you're going to need some software tool or online resource to do the encoding, and having coded both in BBC BASIC today for the purposes of the comparison there's not a great deal to choose in terms of complexity (URI: 5 lines of BASIC, base64: 8 lines).

As far as I know base64-encoding is still the standard for sending binary data in emails, and RFC4648 specifically documents the URL-clean variant, so why not use it? I'm rapidly convincing myself it's the right way to go.

Post by BigEd »

I've posted some pretty long URLs... just over 2k that time. But of course you should proceed as you see fit.

Post by Deleted User 9295 »

BigEd wrote: Wed Dec 30, 2020 1:23 pm I've posted some pretty long URLs... just over 2k that time. But of course you should proceed as you see fit.
Even if the longer URL works, it still feels wasteful to me to send more data than is necessary over the internet, store it on servers etc. OK we're talking about tiny amounts of data compared to the overall traffic, so it's much more an emotional thing than having any practical significance. But every little helps!

I have ascertained that there's a standard Javascript function btoa() for converting binary to base64 (although conceivably it might require a couple of character substitutions to convert the result to the URL-clean variety), which might be handy should somebody want to provide an online conversion tool.
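
The substitutions in question are '+' to '-' and '/' to '_', plus dropping the trailing '=' padding if it is not wanted (RFC 4648 section 5). A minimal Python sketch of that post-processing, applied to ordinary base64 output of the kind btoa() produces, might look like this; `to_base64url` is a hypothetical helper name.

```python
import base64


def to_base64url(standard_b64: str) -> str:
    """Convert standard base64 text to the unpadded URL-safe variant."""
    return standard_b64.translate(str.maketrans("+/", "-_")).rstrip("=")


# Bytes chosen so that the standard encoding contains '+', '/' and '='.
sample = bytes([0xFB, 0xFF, 0xFE, 0xF1])
standard = base64.b64encode(sample).decode("ascii")

print(standard)                 # standard alphabet, padded
print(to_base64url(standard))   # URL-safe alphabet, unpadded
```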

Post by Deleted User 9295 »

I've gone ahead and implemented the base64-approach experimentally, at least to see how it behaves; apologies to those who don't approve. Here's a link you can try if you have a suitable browser (most desktop browsers except IE or Safari, or Chrome for Android with chrome://flags/#enable-webassembly-threads): long URL (2228 characters).

The syntax I'm using for the URL is ?embed= followed by the base64url-encoded tokenised program (according to RFC4648 section 5), with optional padding. The short program I've chosen for the demonstration is Fractal Pyramid, adapted for BBC BASIC for SDL 2.0.
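
As an illustration of that syntax, the Python sketch below builds an ?embed= URL and reads the program back, accepting the parameter with or without padding. The base URL and the function names are hypothetical, and this is not the code BBC BASIC for SDL 2.0 itself uses.

```python
import base64
from urllib.parse import parse_qs, urlsplit

BASE_URL = "https://example.invalid/bbcsdl.html"   # hypothetical address


def make_embed_url(program: bytes, pad: bool = False) -> str:
    """Return BASE_URL with the program as a base64url ?embed= parameter."""
    text = base64.urlsafe_b64encode(program).decode("ascii")
    if not pad:
        text = text.rstrip("=")
    return f"{BASE_URL}?embed={text}"


def read_embed_url(url: str) -> bytes:
    """Extract and decode the ?embed= parameter; padding is optional."""
    text = parse_qs(urlsplit(url).query)["embed"][0]
    text += "=" * (-len(text) % 4)     # restore any padding that was dropped
    return base64.urlsafe_b64decode(text)


program = b'\xf1 "Hello world!"\r'
for pad in (False, True):
    url = make_embed_url(program, pad)
    print(url, read_embed_url(url) == program)
```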
Last edited by Deleted User 9295 on Wed Dec 30, 2020 6:46 pm, edited 2 times in total.

Post by guesser »

In terms of browser compatibility, huge URLs are pretty much a non-issue. It was Internet Explorer that was the problem, and that's not worth trying to support in new projects any more, particularly ones using shiny new web technologies that IE can't do.

Web server and search engine URL length limits could still be an issue. One option is to do what the web based teletext editors do, and don't put your giant lump of data in the request at all. The teletext editors put all the data in the URI fragment, which is entirely decoded by the client and not sent as part of the request when things are configured correctly (so doesn't appear in server logs etc.). Whether that is a good or bad thing depends on your outlook/use case :)
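
A small Python sketch of the distinction: the fragment is the part after '#', and it is not part of the path and query that a client sends in its request. The URL and the `#embed=` form are hypothetical, used only to illustrate the idea; the thread's actual syntax is the ?embed= query parameter.

```python
from urllib.parse import urlsplit

# Hypothetical URL carrying the program in the fragment instead of the query.
url = "https://example.invalid/bbcsdl.html#embed=8SAiSGVsbG8gd29ybGQhIg0"

parts = urlsplit(url)

# What the server is asked for: path plus query, never the fragment.
request_target = parts.path + ("?" + parts.query if parts.query else "")

print("request target:", request_target)
print("kept client-side:", parts.fragment)
```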

Post by Deleted User 9295 »

guesser wrote: Wed Dec 30, 2020 6:24 pm Web server and search engine URL length limits could still be an issue.
Remember that I'm providing this option in addition to the existing methods that BBC BASIC for SDL 2.0 uses to fetch code or data, i.e. the ?load= and ?chain= URL parameters. They should usually be used in preference, but the ?embed= approach provides an alternative, when the program is short.

Here's the full set of URL parameters that I currently support:

?chain=<remote file URL>
?load=<remote file URL>
?run=<local filename>
?dir=<local directory>
?embed=<base64url-encoded program>

Post by BigEd »

(Hmm, thinking of portability and transparency, and longevity too, I'd much prefer the Basic in question - however encoded - not to be tokenised, but to be plain text. However, I don't wish to be argumentative, and would rather see the ability to put programs into URLs than not be able to.)

Post by sweh »

FWIW, this may help with practical limits: https://stackoverflow.com/a/812962/6569796

Hmm https://stackoverflow.com/a/417184/6569796 also indicates some CDN networks may restrict maximum URL sizes
Rgds
Stephen

Post by Deleted User 9295 »

BigEd wrote: Wed Dec 30, 2020 6:51 pm (Hmm, thinking of portability and transparency, and longevity too, I'd much prefer the Basic in question - however encoded - not to be tokenised, but to be plain text.
I'm afraid I take the opposite view: for embedding in the URL a program should be as compressed as (reasonably) possible, and tokenising is a worthwhile compression scheme. I really can't see what objection there could be to lossless compression, so long as it uses a well-documented algorithm. You might as well say that ZIP compression should never be used! :wink:

One could always arrange to tokenise at the same time as doing the base64url encoding.

Post by Deleted User 9295 »

sweh wrote: Wed Dec 30, 2020 6:52 pm FWIW, this may help with practical limits
I have made sure that the in-browser edition of BBC BASIC for SDL 2.0 can successfully fetch files from Dropbox (using a shared link) so there should be little cause to embed long files in the URL. I had hoped to keep my sample URL to less than 2048 characters, but couldn't quite make it with the program I chose (which was too nice not to use). In practice the only browser with a limit that small is Internet Explorer, and that doesn't support WebAssembly Threads anyway, so can't run BBCSDL.

Post by Deleted User 9295 »

Here's how I'm doing the base64url decoding (input in url$):

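Code:

            REM base64url alphabet (RFC 4648 section 5), in order of 6-bit value
            dec$ = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"
            F% = OPENOUT(@tmp$ + "untitled")
            REM Process the input in groups of four characters (24 bits)
            FOR I% = 0 TO LENurl$-1 STEP 4
              FOR J% = 1 TO 4
                REM INSTR is 1-based, so subtract 1 to get the 6-bit value
                D% = (D% << 6) + (INSTR(dec$,MID$(url$,I%+J%,1))-1 AND &3F)
              NEXT
              REM Write the low 24 bits of D% as three bytes, high byte first
              BPUT#F%,D% >> 16:BPUT#F%,D% >> 8:BPUT#F%,D%
            NEXT
            CLOSE #F%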
If there's a faster or nicer way let me know.
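
For anyone wanting to cross-check the output outside BBC BASIC, here is a Python sketch of the same four-characters-to-three-bytes scheme. It is a reference sketch rather than a translation: it returns the bytes instead of writing a file, resets the accumulator for each group, and trims a short final group, which the BASIC version above does not attempt. `decode_base64url` is a hypothetical name.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"


def decode_base64url(text: str) -> bytes:
    """Decode base64url text, with or without '=' padding."""
    text = text.rstrip("=")
    out = bytearray()
    for i in range(0, len(text), 4):
        group = text[i:i + 4]
        acc = 0
        for ch in group:
            acc = (acc << 6) | ALPHABET.index(ch)   # 6 bits per character
        acc <<= 6 * (4 - len(group))    # left-align a short final group
        n_bytes = len(group) * 6 // 8   # 4 chars -> 3 bytes, 3 -> 2, 2 -> 1
        out += acc.to_bytes(3, "big")[:n_bytes]
    return bytes(out)


decoded = decode_base64url("8SAiSGVsbG8gd29ybGQhIg0")
print(decoded)
# Cross-check against the standard library decoder (which needs the padding).
print(decoded == base64.urlsafe_b64decode("8SAiSGVsbG8gd29ybGQhIg0="))
```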