Options for embedding BBC BASIC code in URL
I expect this has been discussed before, but a quick search didn't find anything. I'm thinking of extending the in-browser edition of BBC BASIC for SDL 2.0 so that it will accept a (relatively short) BBC BASIC program embedded in the URL. The question is, what encoding format should it use?
It could accept the same format as jsbeeb, i.e. a URI-encoded plain-text program, but that seems quite inefficient. Or it could accept a URI-encoded tokenised program, or perhaps a base64-encoded tokenised program. Or even a base64-encoded zipped file (not sure how I'd unzip it though - it would have to be done in BASIC!).
Any thoughts?
Re: Options for embedding BBC BASIC code in URL
Richard Russell wrote: ↑Wed Dec 30, 2020 12:14 am I expect this has been discussed before, but a quick search didn't find anything. I'm thinking of extending the in-browser edition of BBC BASIC for SDL 2.0 so that it will accept a (relatively short) BBC BASIC program embedded in the URL. The question is, what encoding format should it use?
Unzipping uncompressed data is fairly simple: skip the header, and the rest is a solid block of data.
I think having an option to feed in a pre-existing format - the jsbeeb format - would be useful.
Thinking further along the lines of not re-inventing rotational transport devices, how about the telesoftware format, which is essentially the GSTRANS format. So, PRINT "Hello"<cr> would be |!q "Hello"|M. URL character entities would have to be escaped, so I think that comes to |!q%20%22Hello%22|M.
Checking Wiki, RFC1738 specifies that binary data should be passed with all non-AZaz09 percent encoded*, so it would be %f1%20%22Hello%22%0d.
*plus a few others, but AZaz09 is the simplest universal set.
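As a rough illustration of the two encodings described above, here is a minimal Python sketch. The function names `gstrans_encode` and `percent_encode_strict` are made up for this example, and the GSTRANS side only covers the basic escapes (control characters, top-bit-set bytes, DEL and the bar itself), so treat it as an approximation rather than a complete implementation.

```python
import string

# Bytes that are left unescaped by the strict percent-encoder: A-Z, a-z, 0-9.
SAFE = (string.ascii_letters + string.digits).encode("ascii")


def gstrans_encode(data: bytes) -> str:
    """Encode bytes with the basic GSTRANS escapes (illustrative subset)."""
    out = []
    for b in data:
        if b >= 0x80:
            out.append("|!")          # |! sets the top bit of what follows
            b &= 0x7F
        if b < 0x20:
            out.append("|" + chr(b + 0x40))   # e.g. CR (&0D) becomes |M
        elif b == 0x7F:
            out.append("|?")
        elif b == ord("|"):
            out.append("||")
        else:
            out.append(chr(b))
    return "".join(out)


def percent_encode_strict(data: bytes) -> str:
    """Percent-encode every byte that is not in A-Z, a-z or 0-9."""
    return "".join(chr(b) if b in SAFE else "%%%02x" % b for b in data)


program = b'\xf1 "Hello"\r'           # &F1 is the PRINT token
print(gstrans_encode(program))
print(percent_encode_strict(program))
```

The two printed lines should correspond to the `|!q "Hello"|M` and `%f1%20%22Hello%22%0d` forms quoted above.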
Code: Select all
$ bbcbasic
PDP11 BBC BASIC IV Version 0.45
(C) Copyright J.G.Harston 1989,2005-2024
>_
Re: Options for embedding BBC BASIC code in URL
That's more attractive than plain text, but it still feels a little wasteful not to use any kind of lossless compression (other than tokenising). Of course it may be that the length of program that could practically be sent this way is so limited that there's not much opportunity for efficient compression anyway.
Re: Options for embedding BBC BASIC code in URL
Here are some comparisons for the trivial program:
Code: Select all
PRINT "Hello world!"
Plain text (CR-terminated), URI-encoded, 33 characters:
Code: Select all
PRINT%20%22Hello%20world%21%22%0D
Tokenised, URI-encoded, 31 characters:
Code: Select all
%F1%20%22Hello%20world%21%22%0D
Tokenised, URL-safe base64-encoded, 23 characters:
Code: Select all
8SAiSGVsbG8gd29ybGQhIg0
In the case of this particular example base64 wins. Trying it with a 'real' tokenised program chosen at random, URI-encoding gave 2274 characters and base64-encoding 1347 characters, which seems like a real win. Are there any downsides?
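For anyone wanting to reproduce the three figures in this comparison, a short Python sketch along these lines should do it. It is not the code used for the post (that was written in BBC BASIC); the variable names are arbitrary, and the tokenised form is simply the byte sequence shown in the example, with &F1 standing for PRINT.

```python
import base64
from urllib.parse import quote

plain = b'PRINT "Hello world!"\r'          # plain text, CR-terminated
tokenised = b'\xf1 "Hello world!"\r'       # same line with PRINT as token &F1

# safe="" makes quote() escape everything except letters, digits and _.-~
uri_plain = quote(plain, safe="")
uri_tokenised = quote(tokenised, safe="")

# URL-safe alphabet (RFC 4648 section 5) with the trailing '=' padding dropped
b64url = base64.urlsafe_b64encode(tokenised).rstrip(b"=").decode("ascii")

for label, text in [
    ("Plain text, URI-encoded", uri_plain),
    ("Tokenised, URI-encoded", uri_tokenised),
    ("Tokenised, base64url", b64url),
]:
    print(f"{label}: {len(text)} characters: {text}")
```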
Re: Options for embedding BBC BASIC code in URL
Hmm, I'm swimming against the current here, but I'd say there's no harm or cost in long URLs, so keeping it simple would be a win, as would following jsbeeb's example.
Re: Options for embedding BBC BASIC code in URL
<coward>Phew! I'm glad someone else jumped in first!</coward>
I have to agree. The option to pass the program as URI-encoded but uncompressed would be very handy and easier to use and would therefore probably get more use.
An uncompressed parameter would also be easier to add as an option to tools like the Beeb Link Console, which has already come in handy for generating links to BBC BASIC programs that run in JSBeeb. (I've been in contact with the developer of the Beeb Link Console and I could probably persuade him to add a BBC SDL option.)
Re: Options for embedding BBC BASIC code in URL
What about the practical limit of 2000 characters? Some browsers will accept more, but the advice there is that even in 2020 "staying under 2000 chars is the best general policy". If base64 encoding will fit your program in under 2000 characters but URI encoding won't, isn't that a valid argument?
And I don't really see why URI-encoding is any 'easier'. Either way you're going to need some software tool or online resource to do the encoding, and having coded both in BBC BASIC today for the purposes of the comparison there's not a great deal to choose in terms of complexity (URI: 5 lines of BASIC, base64: 8 lines).
As far as I know base64-encoding is still the standard for sending binary data in emails, and RFC4648 specifically documents the URL-clean variant, so why not use it? I'm rapidly convincing myself it's the right way to go.
Re: Options for embedding BBC BASIC code in URL
I've posted some pretty long URLs... just over 2k that time. But of course you should proceed as you see fit.
Re: Options for embedding BBC BASIC code in URL
BigEd wrote: ↑Wed Dec 30, 2020 1:23 pm I've posted some pretty long URLs... just over 2k that time. But of course you should proceed as you see fit.
Even if the longer URL works, it still feels wasteful to me to send more data than is necessary over the internet, store it on servers etc. OK we're talking about tiny amounts of data compared to the overall traffic, so it's much more an emotional thing than having any practical significance. But every little helps!
I have ascertained that there's a standard JavaScript function btoa() for converting binary to base64 (although conceivably it might require a couple of character substitutions to convert the result to the URL-clean variety), which might be handy should somebody want to provide an online conversion tool.
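The character substitutions in question are small. As a sketch in Python rather than JavaScript (so `base64.b64encode` stands in for `btoa()` here, and `to_base64url` is just an illustrative name), converting standard base64 to the URL-clean alphabet of RFC 4648 section 5 might look like this:

```python
import base64


def to_base64url(data: bytes, keep_padding: bool = False) -> str:
    """Encode with the standard alphabet, then swap the two URL-unfriendly
    characters: '+' becomes '-' and '/' becomes '_'."""
    standard = base64.b64encode(data).decode("ascii")
    urlsafe = standard.replace("+", "-").replace("/", "_")
    # The trailing '=' padding can be dropped if the decoder tolerates that.
    return urlsafe if keep_padding else urlsafe.rstrip("=")


print(to_base64url(b"\xfb\xff"))                      # standard form is "+/8="
print(to_base64url(b"\xfb\xff", keep_padding=True))
```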
Re: Options for embedding BBC BASIC code in URL
I've gone ahead and implemented the base64 approach experimentally, at least to see how it behaves; apologies to those who don't approve. Here's a link you can try if you have a suitable browser (most desktop browsers except IE or Safari, or Chrome for Android with chrome://flags/#enable-webassembly-threads): long URL (2228 characters).
The syntax I'm using for the URL is ?embed= followed by the base64url-encoded tokenised program (according to RFC4648 section 5), with optional padding. The short program I've chosen for the demonstration is Fractal Pyramid, adapted for BBC BASIC for SDL 2.0.
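To make the syntax concrete, here is a hedged Python sketch of how a link generator might build such a URL and read the payload back. The base address `https://example.com/bbcsdl/` is a placeholder and not the real location of the in-browser edition, the function names are invented for illustration, and the sample bytes are the one-line example from earlier in the thread.

```python
import base64
from urllib.parse import parse_qs, urlsplit

BASE_URL = "https://example.com/bbcsdl/"   # placeholder address (assumption)


def make_embed_url(tokenised: bytes, base_url: str = BASE_URL) -> str:
    """Append ?embed= followed by the unpadded base64url form of the program."""
    payload = base64.urlsafe_b64encode(tokenised).rstrip(b"=").decode("ascii")
    return f"{base_url}?embed={payload}"


def read_embed_param(url: str) -> bytes:
    """Recover the program bytes; padding is optional, so restore it first."""
    payload = parse_qs(urlsplit(url).query)["embed"][0]
    payload += "=" * (-len(payload) % 4)
    return base64.urlsafe_b64decode(payload)


url = make_embed_url(b'\xf1 "Hello world!"\r')
print(url)
print(read_embed_param(url))
```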
Last edited by Deleted User 9295 on Wed Dec 30, 2020 6:46 pm, edited 2 times in total.
Re: Options for embedding BBC BASIC code in URL
In terms of browser compatibility, huge URLs are pretty much a non-issue. It was Internet Explorer that was the problem, and that's not worth trying to support in new projects any more, particularly ones using shiny new web technologies that IE can't do.
Web server and search engine URL length limits could still be an issue. One option is to do what the web-based teletext editors do, and not put your giant lump of data in the request at all. The teletext editors put all the data in the URI fragment, which is entirely decoded by the client and not sent as part of the request when things are configured correctly (so doesn't appear in server logs etc.). Whether that is a good or bad thing depends on your outlook/use case.
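The difference between the two placements can be sketched in Python. The URLs below use a placeholder host, and `request_target` is a made-up helper that approximates what a browser puts in the HTTP request line: the path and query are included, the fragment never is.

```python
from urllib.parse import urlsplit


def request_target(url: str) -> str:
    """Return the path-plus-query portion of a URL, i.e. the part that is
    sent to the server. The fragment after '#' stays in the client."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target


payload = "8SAiSGVsbG8gd29ybGQhIg0"
query_url = f"https://example.com/editor/?embed={payload}"      # data in query
fragment_url = f"https://example.com/editor/#embed={payload}"   # data in fragment

print(request_target(query_url))      # server sees the payload
print(request_target(fragment_url))   # server sees only the path
print(urlsplit(fragment_url).fragment)
```

With the fragment form, any decoding has to happen in client-side script, since the server never receives the data.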
Various teletext things including a web based teletext editor which can export as mode 7 screens.
Join the Teletext Discord for teletext chat.
Re: Options for embedding BBC BASIC code in URL
Remember that I'm providing this option in addition to the existing methods that BBC BASIC for SDL 2.0 uses to fetch code or data, i.e. the ?load= and ?chain= URL parameters. They should usually be used in preference, but the ?embed= approach provides an alternative, when the program is short.
Here's the full set of URL parameters that I currently support:
?chain=<remote file URL>
?load=<remote file URL>
?run=<local filename>
?dir=<local directory>
?embed=<base64url-encoded program>
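A tool that generates or inspects such links might pick the parameters out of the query string roughly as follows. This is only a sketch: the host and the file name `prog.bbc` are hypothetical, `startup_params` is an invented name, and whether the real implementation allows the parameters to be combined is not something this thread states.

```python
from urllib.parse import parse_qs, urlsplit

# The five parameter names listed above.
KNOWN_PARAMS = ("chain", "load", "run", "dir", "embed")


def startup_params(url: str) -> dict:
    """Return the recognised parameters present in the URL's query string,
    ignoring anything else."""
    query = parse_qs(urlsplit(url).query)
    return {name: query[name][0] for name in KNOWN_PARAMS if name in query}


print(startup_params("https://example.com/bbcsdl/?dir=demo&run=prog.bbc&foo=1"))
print(startup_params("https://example.com/bbcsdl/?embed=8SAiSGVsbG8gd29ybGQhIg0"))
```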
Re: Options for embedding BBC BASIC code in URL
(Hmm, thinking of portability and transparency, and longevity too, I'd much prefer the Basic in question - however encoded - not to be tokenised, but to be plain text. However, I don't wish to be argumentative, and would rather see the ability to put programs into URLs than not be able to.)
Re: Options for embedding BBC BASIC code in URL
FWIW, this may help with practical limits: https://stackoverflow.com/a/812962/6569796
Hmm https://stackoverflow.com/a/417184/6569796 also indicates some CDN networks may restrict maximum URL sizes
Rgds
Stephen
Re: Options for embedding BBC BASIC code in URL
I'm afraid I take the opposite view: for embedding in the URL a program should be as compressed as (reasonably) possible, and tokenising is a worthwhile compression scheme. I really can't see what objection there could be to lossless compression, so long as it uses a well-documented algorithm. You might as well say that ZIP compression should never be used!
One could always arrange to tokenise at the same time as doing the base64url encoding.
Re: Options for embedding BBC BASIC code in URL
I have made sure that the in-browser edition of BBC BASIC for SDL 2.0 can successfully fetch files from Dropbox (using a shared link) so there should be little cause to embed long files in the URL. I had hoped to keep my sample URL to less than 2048 characters, but couldn't quite make it with the program I chose (which was too nice not to use). In practice the only browser with a limit that small is Internet Explorer, and that doesn't support WebAssembly Threads anyway, so can't run BBCSDL.
Re: Options for embedding BBC BASIC code in URL
Here's how I'm doing the base64url decoding (input in url$):
Code: Select all
dec$ = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"
F% = OPENOUT(@tmp$ + "untitled")
FOR I% = 0 TO LENurl$-1 STEP 4
  FOR J% = 1 TO 4
    D% = (D% << 6) + (INSTR(dec$,MID$(url$,I%+J%,1))-1 AND &3F)
  NEXT
  BPUT#F%,D% >> 16:BPUT#F%,D% >> 8:BPUT#F%,D%
NEXT
CLOSE #F%
If there's a faster or nicer way let me know.
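For cross-checking the output of the BASIC routine on a desktop machine, an equivalent decode can be sketched in Python. This is not part of BBC BASIC for SDL 2.0; `decode_embedded` is an illustrative name, and the only assumption is that the input uses the RFC 4648 section 5 alphabet with optional padding, as described earlier in the thread.

```python
import base64


def decode_embedded(url_text: str) -> bytes:
    """Decode a base64url string whose '=' padding may have been omitted."""
    # Pad the text back up to a multiple of four characters before decoding.
    padded = url_text + "=" * (-len(url_text) % 4)
    return base64.urlsafe_b64decode(padded)


print(decode_embedded("8SAiSGVsbG8gd29ybGQhIg0"))
```

One difference to bear in mind when comparing results: the BASIC loop writes three bytes for every group of four characters, so when the final group is incomplete its output file is one or two bytes longer than what this function returns.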