Idea: Fastest Conway's Game of Life in BBC Basic

for discussion of bbc basic for windows/sdl, brandy and more
User avatar
BigEd
Posts: 6261
Joined: Sun Jan 24, 2010 10:24 am
Location: West Country
Contact:

Idea: Fastest Conway's Game of Life in BBC Basic

Post by BigEd »

Having seen Richard Russell's presentation including demos of 3D shaders running within BBC Basic, and having previously seen the idea of using a shader to run Conway's Life, I thought the combination might be well worth exploring.

If the shader idea works on the Raspberry Pi, so much the better!

Here's a browser demo which uses the shader in WebGL.
User avatar
Diminished
Posts: 1235
Joined: Fri Dec 08, 2017 9:47 pm
Contact:

Re: Idea: Fastest Conway's Game of Life in BBC Basic

Post by Diminished »

Very neat!

I have nothing to add except a tangent, but in case anyone hasn't seen them before, Chapters 17 and 18 of Michael Abrash's venerable Graphics Programming Black Book tackle ways to optimise Life (although this was all on x86 CPUs).
User avatar
BigEd
Posts: 6261
Joined: Sun Jan 24, 2010 10:24 am
Location: West Country
Contact:

Re: Idea: Fastest Conway's Game of Life in BBC Basic

Post by BigEd »

Yes indeed! In a related tangent, there's a juicy thread elsewhere which points back nearer home: Conway's Life on 6502.

But my feeling is that a GPU is going to do pretty well.
Deleted User 9295

Re: Idea: Fastest Conway's Game of Life in BBC Basic

Post by Deleted User 9295 »

BigEd wrote: Sat Jan 09, 2021 8:49 pm But my feeling is that a GPU is going to do pretty well.
There are several shader implementations of The Game of Life at shadertoy.com. They are of course stupendously fast, and the cell size can be as small as a single pixel, but if anything I find them less visually impressive as a result.
Deleted User 9295

Re: Idea: Fastest Conway's Game of Life in BBC Basic

Post by Deleted User 9295 »

BigEd wrote: Sat Jan 09, 2021 7:15 pmI thought the combination might be well worth exploring.
Right, I have a shader-based Game of Life running in BBC BASIC! This involved something I've never attempted before: multi-stage shader programming. This is essential for GoL because each iteration's input is the previous iteration's output, which breaks the normal rendering 'pipeline'. This opens up a whole new set of possibilities, but my existing shaderlib.bbc library would need some revisions to support it properly.

It works OK in the desktop edition, but I've not yet tried it with the in-browser edition (there's no guarantee it will work without tweaks, because that's WebGL rather than OpenGL). I might try it tomorrow.
User avatar
BigEd
Posts: 6261
Joined: Sun Jan 24, 2010 10:24 am
Location: West Country
Contact:

Re: Idea: Fastest Conway's Game of Life in BBC Basic

Post by BigEd »

Exciting!
Deleted User 9295

Re: Idea: Fastest Conway's Game of Life in BBC Basic

Post by Deleted User 9295 »

BigEd wrote: Sun Jan 10, 2021 11:32 amExciting!
I was right, it didn't work in WebGL. I had forgotten that it has a restriction that textures must have power-of-two dimensions (unlike OpenGL which can have any dimensions). So I've modified it to use a 512x512 grid and now it works: click here using a suitable browser.

Obviously running it on a desktop platform is much better from the point of view of experimenting; you can try things like changing the grid size, altering the initial condition (currently it's random), slowing down the generations etc. Here's the source if you want to play. It shouldn't be too incomprehensible:

Code: Select all

      REM Conway's 'Game of Life': Shader version for BBC BASIC for SDL 2.0
      REM Adapted from various examples at Shadertoy.com by Richard Russell

      REM Size of grid (max = window size, must be power-of-two for WebGL):
      VDU 23,22,512;512;8,16,16,0
      GridW% = 512
      GridH% = 512

      REM Initial random pattern:
      FOR C% = 1 TO 80*25 : VDU 32+RND(94) : NEXT
      OSCLI "GSAVE """ + @tmp$ + "initial.bmp"" 0,0," + STR$(GridW%*2) + "," + STR$(GridH%*2)
      CLS

      REM Install libraries:
      INSTALL @lib$ + "shaderlib"

      REM Define constants:
      GL_FRAMEBUFFER = &8D40
      GL_COLOR_ATTACHMENT0 = &8CE0

      REM Create arrays and structures:
      DIM Vertex$(10), Fragment$(999), Final$(10), Float{0%,1%}

      REM Fill vertex and fragment shader arrays from DATA statements:
      PROC_readshader(Vertex$())
      PROC_readshader(Fragment$())
      PROC_readshader(Final$())

      REM Initialise window and get its size:
      ON MOVE IF @msg% <> 5 RETURN ELSE PROCcleanup
      VDU 26
      IF POS REM SDL thread sync
      ScrW% = @size.x%
      ScrH% = @size.y%

      REM Ensure cleanup on exit:
      ON CLOSE PROCcleanup : QUIT
      ON ERROR PROCcleanup : MODE 3 : PRINT REPORT$ : END

      REM Initialise shader library:
      PROC_shaderinit(oVertex%, oFragment%)
      SYS `glCreateShader`, GL_FRAGMENT_SHADER, @memhdc% TO oFinal%
      `glGenFramebuffers` = FN_gpa("glGenFramebuffers")
      `glBindFramebuffer` = FN_gpa("glBindFramebuffer")
      `glFramebufferTexture2D` = FN_gpa("glFramebufferTexture2D")
      `glDeleteFramebuffers` = FN_gpa("glDeleteFramebuffers")

      REM Compile shaders:
      PROC_compileshader(oVertex%, Vertex$(), "Vertex")
      PROC_compileshader(oFragment%, Fragment$(), "Fragment")
      PROC_compileshader(oFinal%, Final$(), "Final")

      REM Link shaders:
      oProgram1% = FN_useshaders(oVertex%, oFragment%)
      oProgram2% = FN_useshaders(oVertex%, oFinal%)

      REM Get pointers to uniforms:
      SYS `glGetUniformLocation`, oProgram1%, "iResolution", @memhdc% TO pResolution1%
      SYS `glGetUniformLocation`, oProgram1%, "iChannel0",   @memhdc% TO pChannel0_1%
      SYS `glGetUniformLocation`, oProgram2%, "iResolution", @memhdc% TO pResolution2%
      SYS `glGetUniformLocation`, oProgram2%, "iChannel0",   @memhdc% TO pChannel0_2%

      REM Load and bind initial random texture:
      SYS `glActiveTexture`, GL_TEXTURE0, @memhdc%
      Texture1% = FN_loadtexture1(@tmp$ + "initial.bmp")
      Texture2% = FN_loadtexture1(@tmp$ + "initial.bmp")
      IF Texture1% = 0 OR Texture2% = 0 ERROR 100, "Couldn't load initial.bmp"
      SYS `glUniform1i`, pChannel0_1%, 0, @memhdc%
      SYS `glUniform1i`, pChannel0_2%, 0, @memhdc%

      REM Create Framebuffer object:
      DIM align{fbo%}
      SYS `glGenFramebuffers`, 1, align{}, @memhdc%
      oBuffer% = align.fbo%
      IF oBuffer% = 0 ERROR 100, "Couldn't create framebuffer"

      REM Render loop:
      REPEAT
        REM Stage one: render to texture:
        Float.0% = FN_f4(GridW%) : Float.1% = FN_f4(GridH%)
        SYS `glBindFramebuffer`, GL_FRAMEBUFFER, oBuffer%, @memhdc%
        SYS `glFramebufferTexture2D`, GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, \
        \                                     GL_TEXTURE_2D, Texture2%, 0, @memhdc%
        SYS `glBindTexture`, GL_TEXTURE_2D, Texture1%, @memhdc%
        SYS `glUseProgram`, oProgram1%, @memhdc%
        SYS `glUniform2fv`, pResolution1%, 1, Float{}, @memhdc%
        SYS `glDrawArrays`, GL_TRIANGLES, 0, 6, @memhdc%

        REM Stage two: render to canvas:
        Float.0% = FN_f4(ScrW%) : Float.1% = FN_f4(ScrH%)
        SYS `glBindFramebuffer`, GL_FRAMEBUFFER, 0, @memhdc%
        SYS `glBindTexture`, GL_TEXTURE_2D, Texture2%, @memhdc%
        SYS `glUseProgram`, oProgram2%, @memhdc%
        SYS `glUniform2fv`, pResolution2%, 1, Float{}, @memhdc%
        PROC_render

        SWAP Texture1%, Texture2%
      UNTIL FALSE
      END

      DEF PROCcleanup
      ON ERROR OFF
      oBuffer%% += 0  : IF oBuffer%   SYS `glDeleteFramebuffers`, 1, ^oBuffer%, @memhdc%
      Texture1% += 0  : IF Texture1%  SYS `glDeleteTextures`, 1, ^Texture1%, @memhdc%
      Texture2% += 0  : IF Texture2%  SYS `glDeleteTextures`, 1, ^Texture2%, @memhdc%
      oProgram2% += 0 : IF oProgram2% SYS `glDeleteProgram`, oProgram2%, @memhdc%
      oFinal% += 0    : IF oFinal%    SYS `glDeleteShader`, oFinal%, @memhdc%
      oProgram1% += 0 : oVertex% += 0 : oFragment% += 0
      PROC_shaderexit(oProgram1%, oVertex%, oFragment%)
      *REFRESH ON
      ENDPROC

      REM Minimal vertex shader:
      DATA "attribute vec4 vPosition;"
      DATA "void main()"
      DATA "{"
      DATA "gl_Position = vPosition ;"
      DATA "}"
      DATA ""

      REM Intermediate fragment shader (outputs to texture):
      DATA "uniform vec2 iResolution;"
      DATA "uniform sampler2D iChannel0;"

      DATA "int cell( in vec2 p )"
      DATA "{"
      DATA "return (texture2D(iChannel0, p/iResolution.xy).g > 0.5 ) ? 1 : 0;"
      DATA "}"

      DATA "void main()"
      DATA "{"
      DATA "vec2 px = gl_FragCoord.xy;"
      DATA "int k = cell(px+vec2(-1,-1)) + cell(px+vec2(0,-1)) + cell(px+vec2(1,-1))"
      DATA "      + cell(px+vec2(-1, 0))                       + cell(px+vec2(1, 0))"
      DATA "      + cell(px+vec2(-1, 1)) + cell(px+vec2(0, 1)) + cell(px+vec2(1, 1));"
      DATA "int e = cell(px);"
      DATA "float f = (((k==2) && (e==1)) || (k==3)) ? 1.0 : 0.0;"
      DATA "gl_FragColor = vec4( f, f, f, 1.0 );"
      DATA "}"
      DATA ""

      REM Final fragment shader (outputs to canvas):
      DATA "uniform vec2 iResolution;"
      DATA "uniform sampler2D iChannel0;"
      DATA "void main()"
      DATA "{"
      DATA "gl_FragColor = texture2D(iChannel0, gl_FragCoord.xy/iResolution.xy);"
      DATA "}"
      DATA ""
Last edited by Deleted User 9295 on Sun Jan 10, 2021 1:02 pm, edited 1 time in total.
User avatar
BigEd
Posts: 6261
Joined: Sun Jan 24, 2010 10:24 am
Location: West Country
Contact:

Re: Idea: Fastest Conway's Game of Life in BBC Basic

Post by BigEd »

Excellent, thanks! One very nice thing about this sort of resolution and speed is that gliders really do glide in a very satisfactory way.

Edit: is this the fastest? I rather expect it is the fastest.
Deleted User 9295

Re: Idea: Fastest Conway's Game of Life in BBC Basic

Post by Deleted User 9295 »

BigEd wrote: Sun Jan 10, 2021 12:09 pm Edit: is this the fastest? I rather expect it is the fastest.
It's locked to frame sync (60 generations per second on most setups). You could make it faster by turning off vSync, but some of the generations would never be displayed and gliders would no longer move smoothly, so is that meaningful? I would argue not.

That's one reason why I'm not sure that "fastest" is a sensible challenge: any half decent GPU is going to manage 60 generations a second, unless you have a bigger grid. You could instead ask: "how big can the grid be whilst still running at 60 Hz?".
User avatar
BigEd
Posts: 6261
Joined: Sun Jan 24, 2010 10:24 am
Location: West Country
Contact:

Re: Idea: Fastest Conway's Game of Life in BBC Basic

Post by BigEd »

Ah, I wasn't quite asking if this is as fast as it can go, rather asking if any conventional BBC Basic program, not using a shader, might manage such a high update rate.
Deleted User 9295

Re: Idea: Fastest Conway's Game of Life in BBC Basic

Post by Deleted User 9295 »

BigEd wrote: Sun Jan 10, 2021 1:44 pmasking if any conventional BBC Basic program, not using a shader, might manage such a high update rate.
By "conventional" do you mean including assembly language, or 'pure' BASIC code only? Do you consider shader code to be distinctly different from assembler code, or in the same general category?

An interesting question would be: is there any way of leveraging BBC BASIC's array arithmetic to compute multiple cells at once? If there is that would be a game-changer for a pure BASIC solution, but I'm somewhat doubtful.
User avatar
BigEd
Posts: 6261
Joined: Sun Jan 24, 2010 10:24 am
Location: West Country
Contact:

Re: Idea: Fastest Conway's Game of Life in BBC Basic

Post by BigEd »

I was thinking of portable Basic code, so that rules out assembly language. I was thinking of any conventional approach which has two arrays of cell states, and reads one while writing the other. (It might be that your BBC Basic could do this operation to be programmed explicitly to use multiple threads??)

And indeed, it might be that a sufficiently modern BBC Basic can do those array operations without explicit FOR loops, and avoid a lot of overhead. I hadn't thought of that.
Deleted User 9295

Re: Idea: Fastest Conway's Game of Life in BBC Basic

Post by Deleted User 9295 »

BigEd wrote: Sun Jan 10, 2021 3:18 pm I was thinking of portable Basic code, so that rules out assembly language.
Assuming that by "portable" you mean it would have to work in ARM BASIC 5 and Matrix Brandy, that rules out a lot of possible 'tricks' too (I was thinking that to leverage array arithmetic it would need some evil poking of array descriptors, but that wouldn't be portable).

I have so often regretted that array descriptors have to be contiguous with the array itself (that's true in my versions, anyway). It means you can't pretend that part of a large array is itself a small array, which would have been possible had there been an extra level of indirection. Do you happen to know whether the same is true in ARM BASIC 5 and/or Brandy?
User avatar
BigEd
Posts: 6261
Joined: Sun Jan 24, 2010 10:24 am
Location: West Country
Contact:

Re: Idea: Fastest Conway's Game of Life in BBC Basic

Post by BigEd »

Oh, nested or overlapping arrays! I have no idea about that.

By portable I mainly meant your implementations, but for Pi, desktop, and browser.
Deleted User 9295

Re: Idea: Fastest Conway's Game of Life in BBC Basic

Post by Deleted User 9295 »

BigEd wrote: Sun Jan 10, 2021 5:11 pm By portable I mainly meant your implementations, but for Pi, desktop, and browser.
I'm allowed array-descriptor hacking then! What's the current pure-BASIC speed record for, say, a 512 x 512 grid? Just so I know what I'm aiming for.
User avatar
BigEd
Posts: 6261
Joined: Sun Jan 24, 2010 10:24 am
Location: West Country
Contact:

Re: Idea: Fastest Conway's Game of Life in BBC Basic

Post by BigEd »

Oh, I don't know - but I think it would be terribly slow. I was trying to write a (hopefully decent) Basic implementation last year, but my powers of concentration were just not up to it. It was intended to be a follow on of the program presented here.
Deleted User 9295

Re: Idea: Fastest Conway's Game of Life in BBC Basic

Post by Deleted User 9295 »

BigEd wrote: Sun Jan 10, 2021 8:38 pm Oh, I don't know - but I think it would be terribly slow.
With array arithmetic and descriptor hacking, I'm getting just over 3 generations per second with a 512x512 grid on this laptop. The trouble is that I have to do such contortions to make array arithmetic do things that it was never intended to, it may well not end up being any faster than the naive but less wasteful approach.
Deleted User 9295

Re: Idea: Fastest Conway's Game of Life in BBC Basic

Post by Deleted User 9295 »

Here's the pure BASIC (but specific to my current versions) listing. It achieves in the region of 3 generations per second here with a 512x512 grid (compared with the shader version's 60 per second with time to spare). I'd be interested to know if anybody can improve on it - without using a faster interpreter like Brandy of course.

Code: Select all

      REM Conway's 'Game of Life' in pure BASIC (with some cheating!)
      REM By Richard Russell, http://www.bbcbasic.co.uk/, 10-Jan-2021

      HIMEM = PAGE + 8000000

      REM Set grid size and initialise screen:
      W% = 512 : H% = 512
      VDU 23,22,W%;H%;8,8,16,0
      OFF

      REM Define arrays:
      DIM g&(W%-1,H%-1), ft%(9)

      REM Initialise the grid to a random state:
      FOR Y% = 0 TO H%-1
        FOR X% = 0 TO W%-1
          g&(X%,Y%) = RND(2) - 1
        NEXT
      NEXT

      REM Create 8 copy arrays with gaps:
      G% = W% * 2 + 2 : p%% = 0
      DIM k&(W%-1,H%-1), p%% G%, l&(W%-1,H%-1), p%% G%, m&(W%-1,H%-1), p%% G%, n&(W%-1,H%-1), p%% G%
      DIM o&(W%-1,H%-1), p%% G%, p&(W%-1,H%-1), p%% G%, q&(W%-1,H%-1), p%% G%, r&(W%-1,H%-1), p%% G%

      REM Do some evil descriptor hacking to offset the arrays:
      PROCoffset(k&(),W%+1) : PROCoffset(l&(),W%+1) : PROCoffset(m&(),W%+1) : PROCoffset(n&(),W%+1)
      PROCoffset(o&(),W%+1) : PROCoffset(p&(),W%+1) : PROCoffset(q&(),W%+1) : PROCoffset(r&(),W%+1)

      REM Repeat for each generation:
      *REFRESH OFF
      GCOL 15
      ref% = TIME
      @%=&102010A
      ft%() = 30
      REPEAT
        REM Display grid
        CLS
        FOR Y% = 0 TO H%-1
          FOR X% = 0 TO W%-1
            IF g&(X%,Y%) PLOT X%*2, Y%*2
          NEXT
        NEXT

        REM Make 8 copies of grid:
        k&() = g&() : l&() = g&() : m&() = g&() : n&() = g&()
        o&() = g&() : p&() = g&() : q&() = g&() : r&() = g&()

        REM Offset arrays:
        PROCoffset(k&(),-W%-1) : PROCoffset(l&(),-W%) : PROCoffset(m&(),-W%+1)
        PROCoffset(n&(),-1)    : PROCoffset(o&(),+1)
        PROCoffset(p&(),+W%-1) : PROCoffset(q&(),+W%) : PROCoffset(r&(),+W%+1)

        REM Sum the border cells:
        k&() += l&() + m&() + n&() + o&() + p&() + q&() + r&()

        REM Calculate next state:
        l&() = k&() * k&() DIV 4
        o&() = l&() DIV 2
        m&() = l&() DIV -8
        m&() += 1
        l&() AND= m&() AND g&()  : REM survives
        p&() = o&() DIV -2
        p&() += 1
        o&() AND= p&() AND 1 : REM survives or reproduces

        g&() = l&() OR o&()

        REM Undo the offsets:
        PROCoffset(k&(),+W%+1) : PROCoffset(l&(),+W%) : PROCoffset(m&(),+W%-1)
        PROCoffset(n&(),+1)    : PROCoffset(o&(),-1)
        PROCoffset(p&(),-W%+1) : PROCoffset(q&(),-W%) : PROCoffset(r&(),-W%-1)

        *REFRESH

        T% = (T% + 1) MOD (DIM(ft%(),1) + 1)
        new% = TIME
        ft%(T%) = new% - ref%
        ref% = new%
        SYS "SDL_SetWindowTitle", @hwnd%, STR$(1000/SUM(ft%())) + " generations per second", @memhdc%
      UNTIL FALSE
      END

      REM This is non-portable code:
      DEF PROCoffset(RETURN a&(), O%)
      LOCAL p%%, W%, H% : p%% = PTR(a&())
      W% = DIM(a&(),1) + 1 : H% = DIM(a&(),2) + 1
      ?p%% = 0 : p%%!1 = 0  : p%%!5 = 0
      p%% += O% : PTR(a&()) = p%%
      ?p%% = 2 : p%%!1 = W% : p%%!5 = H%
      ENDPROC
Last edited by Deleted User 9295 on Mon Jan 11, 2021 5:36 pm, edited 1 time in total.
User avatar
BigEd
Posts: 6261
Joined: Sun Jan 24, 2010 10:24 am
Location: West Country
Contact:

Re: Idea: Fastest Conway's Game of Life in BBC Basic

Post by BigEd »

Thanks Richard - I'll give that a go tomorrow.
Deleted User 9295

Re: Idea: Fastest Conway's Game of Life in BBC Basic

Post by Deleted User 9295 »

Richard Russell wrote: Sun Jan 10, 2021 9:00 pmit may well not end up being any faster than the naive but less wasteful approach.
I've tried to assess that by writing a naive non-optimised implementation myself; on the same PC it runs at around one generation per second so the array-arithmetic version is quite a bit faster despite its messiness.

I wonder if anybody can improve on the code I devised to compute the next state of each cell. To be usable with entire arrays it can't use any comparisons, so the challenge is to implement a functional equivalent to this but without using the '=', '<' or '>' operators, where k is in the range 0 to 8 and g is 0 or 1:

Code: Select all

      IF (k = 2) AND (g = 1) OR (k = 3) THEN g = 1 ELSE g = 0
I've currently got this:

Code: Select all

        l = k * k DIV 4
        o = l DIV 2
        m = l DIV -8
        m += 1
        l AND= m AND g
        p = o DIV -2
        p += 1
        o AND= p AND 1
        g = l OR o
Writing comparison-less code is quite challenging!
dp11
Posts: 1757
Joined: Sun Aug 12, 2012 9:47 pm
Contact:

Re: Idea: Fastest Conway's Game of Life in BBC Basic

Post by dp11 »

sounds a bit like a palette look up to me, k concatenated with g to form an index. :D
Deleted User 9295

Re: Idea: Fastest Conway's Game of Life in BBC Basic

Post by Deleted User 9295 »

dp11 wrote: Mon Jan 11, 2021 5:46 pm sounds a bit like a palette look up to me, k concatenated with g to form an index. :D
Yeah, but again not something that can work with entire arrays. The only array operators available (AFAIK) are: + - * / DIV AND OR EOR. Can you suggest some actual code that's simpler than mine?
User avatar
BigEd
Posts: 6261
Joined: Sun Jan 24, 2010 10:24 am
Location: West Country
Contact:

Re: Idea: Fastest Conway's Game of Life in BBC Basic

Post by BigEd »

I think it's true, and it might help, that g gets 1 if and only if k+g == 3, which may be slightly simpler - we only need the inclusive count of the 9-cell.

Edit: this is wrong! See below. Posting while tired, that's my excuse.
Last edited by BigEd on Tue Jan 12, 2021 9:03 am, edited 1 time in total.
Deleted User 9295

Re: Idea: Fastest Conway's Game of Life in BBC Basic

Post by Deleted User 9295 »

BigEd wrote: Mon Jan 11, 2021 6:54 pm I think it's true, and it might help, that g gets 1 if and only if k+g == 3, which may be slightly simpler
g remains 1 also if (k = 3 AND g = 1), which means k+g == 4. :?

I've found a way of speeding it up a little by changing the way it's displayed, but there's a bug (must be in the previous version too) because it never seems to stabilise however long it's left running. I won't publish it until I can find a fix.
Deleted User 9295

Re: Idea: Fastest Conway's Game of Life in BBC Basic

Post by Deleted User 9295 »

New version below; changes are as follows:
  • Grid corruption bug fixed (hopefully).
  • Grid displayed by mimicking a BMP file, so quite a lot faster.
  • Compatible with both BB4W and BBCSDL (only updating the title bar is affected).

Code: Select all

      REM Conway's 'Game of Life' in pure BASIC (with some cheating!)
      REM By Richard Russell, http://www.bbcbasic.co.uk/, 11-Jan-2021

      HIMEM = PAGE + 8000000

      REM Set grid size and initialise screen:
      W% = 512 : H% = 512
      VDU 23,22,W%;H%;8,8,16,0
      OFF

      REM Define structures and arrays:
      DIM bm{bfType{l&,h&}, bfSize%, bfReserved%, bfOffBits%, \
      \      biSize%, biWidth%, biHeight%, biPlanes{l&,h&}, biBitCount{l&,h&}, \
      \      biCompression%, biSizeImage%, biXPelsPerMeter%, biYPelsPerMeter%, \
      \      biClrUsed%, biClrImportant%, palette%(255)}, g&(W%-1,H%-1), ft%(9)

      REM Initialise bitmap structure:
      bm.bfType.l& = ASC"B" : bm.bfType.h& = ASC"M"
      bm.bfOffBits% = ^g&(0,0) - bm{} : bm.bfSize% = bm.bfOffBits% + W%*H%
      bm.biSize% = 40 : bm.biWidth% = W% : bm.biHeight% = -H%
      bm.biPlanes.l& = 1 : bm.biBitCount.l& = 8
      bm.palette%(0) = &FF000000 : bm.palette%(1) = &FFFFFFFF

      REM Initialise the grid to a random state:
      FOR Y% = 0 TO H%-1
        FOR X% = 0 TO W%-1
          g&(X%,Y%) = RND(2) - 1
        NEXT
      NEXT

      REM Create 8 copy arrays with gaps:
      G% = W% * 2 + 2 : p%% = 0
      DIM k&(W%-1,H%-1), p%% G%, l&(W%-1,H%-1), p%% G%, m&(W%-1,H%-1), p%% G%, n&(W%-1,H%-1), p%% G%
      DIM o&(W%-1,H%-1), p%% G%, p&(W%-1,H%-1), p%% G%, q&(W%-1,H%-1), p%% G%, r&(W%-1,H%-1), p%% G%

      REM Do some evil descriptor hacking to offset the arrays:
      PROCoffset(k&(),W%+1) : PROCoffset(l&(),W%+1) : PROCoffset(m&(),W%+1) : PROCoffset(n&(),W%+1)
      PROCoffset(o&(),W%+1) : PROCoffset(p&(),W%+1) : PROCoffset(q&(),W%+1) : PROCoffset(r&(),W%+1)

      REM Repeat for each generation:
      *REFRESH OFF
      GCOL 15
      R% = TIME
      @%=&102010A
      ft%() = 30
      REPEAT
        REM Display grid:
        OSCLI "MDISPLAY " + STR$~bm{}
  
        REM Make 8 copies of grid:
        k&() = g&() : l&() = g&() : m&() = g&() : n&() = g&()
        o&() = g&() : p&() = g&() : q&() = g&() : r&() = g&()
  
        REM Offset arrays:
        PROCoffset(k&(),-W%-1) : PROCoffset(l&(),-W%) : PROCoffset(m&(),-W%+1)
        PROCoffset(n&(),-1)    : PROCoffset(o&(),+1)
        PROCoffset(p&(),+W%-1) : PROCoffset(q&(),+W%) : PROCoffset(r&(),+W%+1)
  
        REM Sum the border cells:
        k&() += l&() + m&() + n&() + o&() + p&() + q&() + r&()
  
        REM Calculate next state (by Svein Svensson):
        k&() OR= g&()
        g&() = k&() * 80 DIV 240 : REM Relies on 8-bit wraparound
 
        REM Undo the offsets:
        PROCoffset(k&(),+W%+1) : PROCoffset(l&(),+W%) : PROCoffset(m&(),+W%-1)
        PROCoffset(n&(),+1)    : PROCoffset(o&(),-1)
        PROCoffset(p&(),-W%+1) : PROCoffset(q&(),-W%) : PROCoffset(r&(),-W%-1)
  
        *REFRESH
  
        REM Report speed in title bar:
        T% = (T% + 1) MOD (DIM(ft%(),1) + 1)
        N% = TIME
        ft%(T%) = N% - R%
        R% = N%
        title$ = STR$(1000/SUM(ft%())) + " generations per second"
        IF INKEY$(-256) = "W" THEN
          SYS "SetWindowText", @hwnd%, title$
        ELSE
          SYS "SDL_SetWindowTitle", @hwnd%, title$, @memhdc%
        ENDIF
  
      UNTIL FALSE
      END

      REM This is non-portable code:
      DEF PROCoffset(RETURN a&(), O%)
      LOCAL i%%, p%%, W%, H% : p%% = PTR(a&())
      W% = DIM(a&(),1) + 1 : H% = DIM(a&(),2) + 1
      IF O% < 0 FOR i%% = p%% + O% + 9 TO p%% + 8 : ?i%% = 0 : NEXT
      IF O% > 0 FOR i%% = p%% + W%*H% + 9 TO p%% + W%*H% + O% + 8 : ?i%% = 0 : NEXT
      p%% += O% : PTR(a&()) = p%%
      ?p%% = 2 : p%%!1 = W% : p%%!5 = H%
      ENDPROC
Last edited by Deleted User 9295 on Tue Jan 12, 2021 11:31 pm, edited 2 times in total.
User avatar
BigEd
Posts: 6261
Joined: Sun Jan 24, 2010 10:24 am
Location: West Country
Contact:

Re: Idea: Fastest Conway's Game of Life in BBC Basic

Post by BigEd »

BigEd wrote: Mon Jan 11, 2021 6:54 pm I think it's true, and it might help, that ...
Oops, quite right, I got that quite wrong. I was dubious, but evidently not careful enough. I'm sure I've seen somewhere before some kind of optimisation of the life-or-death decision.

Thanks for the updated code.
Deleted User 9295

Re: Idea: Fastest Conway's Game of Life in BBC Basic

Post by Deleted User 9295 »

dp11 wrote: Mon Jan 11, 2021 5:46 pm k concatenated with g to form an index.
Although it can't be used as an index as such, 'concatenating' k with g (in practice I used k*2 + g) was key to the latest improvement. This reduces the total number of operations required to calculate the next state (using only whole-array operators) from 12 to 7:

Code: Select all

      p = k*2 + g + 8
      q = p + 3
      g = q DIV 16 - p DIV 16
I've updated the listing in the earlier post to use this; it's increased the speed to around 6 generations per second on this laptop.
Deleted User 9295

Re: Idea: Fastest Conway's Game of Life in BBC Basic

Post by Deleted User 9295 »

Richard Russell wrote: Mon Jan 11, 2021 5:35 pmI wonder if anybody can improve on the code
Svein Svensson came up with this suggestion at the Discussion Group, which is either brilliant or ugly depending on your attitude to relying on 'side effects' (it works only because byte variables 'wrap around' on overflow, like 8-bit indirection does, rather than reporting a 'Number too big' error):

Code: Select all

        REM Calculate next state (by Svein Svensson):
        k&() OR= g&()
        g&() = k&() * 80 DIV 240 : REM Relies on 8-bit wraparound
Since it provides another worthwhile speed-up I've edited the program in the earlier post (again) to use it.
Deleted User 9295

Re: Idea: Fastest Conway's Game of Life in BBC Basic

Post by Deleted User 9295 »

I've listed below what I'm tentatively calling the 'final' version of the program. The last tweak, to squeeze the very last drop of performance, was to remove the synchronisation to vertical sync, because that had become redundant with the new display method.

On my fastest desktop PC it manages a sustained 18 generations-per-second or so, with a 512x512 grid (262,144 cells), which I think is pretty impressive for an entirely-interpreted language. It will run in BBC BASIC for SDL 2.0 or BBC BASIC for Windows, but in the latter case it needs the as-yet unreleased v6.14a.

To achieve this I have made maximum use of BBC BASIC's features:
  • Array arithmetic, allowing me to calculate the next state of every cell all at once.
  • Byte arrays, used for compactness and to take advantage of 8-bit numeric wraparound.
  • Structures, making the main grid array look like a BMP file for fast display.
  • Poking directly into array descriptors, to effectively offset the data within the arrays.
Some of these features are specific to my BASICs, but there may nevertheless be opportunities for similar cheats in other implementations.

Code: Select all

      REM Conway's 'Game of Life' in pure BASIC (with some cheating!)
      REM By Richard Russell, http://www.bbcbasic.co.uk/, 13-Jan-2021

      HIMEM = PAGE + 8000000

      REM Set grid size and initialise screen:
      W% = 512 : H% = 512
      VDU 23,22,W%;H%;8,8,16,0
      OFF

      REM Define structures and arrays:
      DIM bm{bfType{l&,h&}, bfSize%, bfReserved%, bfOffBits%, \
      \      biSize%, biWidth%, biHeight%, biPlanes{l&,h&}, biBitCount{l&,h&}, \
      \      biCompression%, biSizeImage%, biXPelsPerMeter%, biYPelsPerMeter%, \
      \      biClrUsed%, biClrImportant%, palette%(255)}, g&(H%-1,W%-1), ft%(9)

      REM Initialise bitmap structure:
      bm.bfType.l& = ASC"B" : bm.bfType.h& = ASC"M"
      bm.bfOffBits% = ^g&(0,0) - bm{} : bm.bfSize% = bm.bfOffBits% + W%*H%
      bm.biSize% = 40 : bm.biWidth% = W% : bm.biHeight% = -H%
      bm.biPlanes.l& = 1 : bm.biBitCount.l& = 8
      bm.palette%(0) = &FF000000 : bm.palette%(1) = &FFFFFFFF

      REM Initialise the grid to a random state:
      FOR Y% = 0 TO H%-1
        FOR X% = 0 TO W%-1
          g&(Y%,X%) = RND(2) - 1
        NEXT
      NEXT

      REM Create 8 copy arrays with gaps:
      G% = W% * 2 + 2 : p%% = 0
      DIM k&(H%-1,W%-1), p%% G%, l&(H%-1,W%-1), p%% G%, m&(H%-1,W%-1), p%% G%, n&(H%-1,W%-1), p%% G%
      DIM o&(H%-1,W%-1), p%% G%, p&(H%-1,W%-1), p%% G%, q&(H%-1,W%-1), p%% G%, r&(H%-1,W%-1), p%% G%

      REM Do some evil descriptor hacking to offset the arrays:
      PROCoffset(k&(),W%+1) : PROCoffset(l&(),W%+1) : PROCoffset(m&(),W%+1) : PROCoffset(n&(),W%+1)
      PROCoffset(o&(),W%+1) : PROCoffset(p&(),W%+1) : PROCoffset(q&(),W%+1) : PROCoffset(r&(),W%+1)

      REM Repeat for each generation:
      GCOL 15
      R% = TIME
      @%=&102010A
      ft%() = 30
      REPEAT
        REM Display grid:
        OSCLI "MDISPLAY " + STR$~bm{}

        REM Make 8 copies of grid:
        k&() = g&() : l&() = g&() : m&() = g&() : n&() = g&()
        o&() = g&() : p&() = g&() : q&() = g&() : r&() = g&()

        REM Offset arrays:
        PROCoffset(k&(),-W%-1) : PROCoffset(l&(),-W%) : PROCoffset(m&(),-W%+1)
        PROCoffset(n&(),-1)    : PROCoffset(o&(),+1)
        PROCoffset(p&(),+W%-1) : PROCoffset(q&(),+W%) : PROCoffset(r&(),+W%+1)

        REM Sum the border cells:
        k&() += l&() + m&() + n&() + o&() + p&() + q&() + r&()

        REM Calculate next state (by Svein Svensson):
        k&() OR= g&()
        g&() = k&() * 80 DIV 240 : REM Relies on 8-bit wraparound

        REM Undo the offsets:
        PROCoffset(k&(),+W%+1) : PROCoffset(l&(),+W%) : PROCoffset(m&(),+W%-1)
        PROCoffset(n&(),+1)    : PROCoffset(o&(),-1)
        PROCoffset(p&(),-W%+1) : PROCoffset(q&(),-W%) : PROCoffset(r&(),-W%-1)

        REM Report speed in title bar:
        T% = (T% + 1) MOD (DIM(ft%(),1) + 1)
        N% = TIME
        ft%(T%) = N% - R%
        R% = N%
        title$ = STR$(1000/SUM(ft%())) + " generations per second"
        IF INKEY$(-256) = "W" THEN
          SYS "SetWindowText", @hwnd%, title$
        ELSE
          SYS "SDL_SetWindowTitle", @hwnd%, title$, @memhdc%
        ENDIF

      UNTIL FALSE
      END

      REM This is non-portable code:
      DEF PROCoffset(RETURN a&(), O%)
      LOCAL i%%, p%%, W%, H% : p%% = PTR(a&())
      W% = DIM(a&(),1) + 1 : H% = DIM(a&(),2) + 1
      IF O% < 0 FOR i%% = p%% + O% + 9 TO p%% + 8 : ?i%% = 0 : NEXT
      IF O% > 0 FOR i%% = p%% + W%*H% + 9 TO p%% + W%*H% + O% + 8 : ?i%% = 0 : NEXT
      p%% += O% : PTR(a&()) = p%%
      ?p%% = 2 : p%%!1 = W% : p%%!5 = H%
      ENDPROC
Last edited by Deleted User 9295 on Thu Jan 14, 2021 6:25 pm, edited 1 time in total.
User avatar
BigEd
Posts: 6261
Joined: Sun Jan 24, 2010 10:24 am
Location: West Country
Contact:

Re: Idea: Fastest Conway's Game of Life in BBC Basic

Post by BigEd »

Very impressive! And very educational too as your successive versions got faster.

It would be nice to see what can be done in, say, Basic V, which runs on reasonably fast machines (the Pi) but doesn't have the benefit of the overlapping array techniques. Perhaps someone could have a go at that one day.
Post Reply

Return to “modern implementations of classic programming languages”