6502 instruction behaviour tester

tom_seddon · Post by **tom_seddon** » Sun Aug 21, 2022 1:23 am

I've put together a 6502 behaviour test suite that runs out of the box on BBC Micro-type systems and tests some of the undocumented behaviour. More info in the README in its GitHub repo: https://github.com/tom-seddon/6502-tests

Previous brief discussion here: viewtopic.php?f=4&t=24906 - but I thought I'd give it its own thread.

Plans for this in the longer term:

- more automation-friendly behaviour, so I can run it as part of the CI process for b2. At the moment, the only real way to tell if things succeeded or failed is to have an actual person look at it running using their eyes
- add a test for SBX, which is reportedly consistent on C64, but has a time-consuming 24 bits of input. I'll probably make the main test program check 131,072 combinations, then have a separate test that checks all 16,777,216
- some tests of some kind for the inconsistent illegal opcodes. At the very least, figure out whether they're inconsistent on the Beeb as well
- maybe add some portability stuff? The code is Beeb-specific at the moment, but in principle all it needs is 4 KB RAM and ideally some way of producing output
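
For scale, the SBX input space comes from its three 8-bit inputs (A, X and the immediate operand). Below is a minimal Python sketch of a reference model, assuming the behaviour commonly documented for NMOS parts: X = (A AND X) - operand, with N, Z and C set as for a compare. The function and constant names are illustrative and are not taken from the test suite.

```python
def sbx(a, x, operand):
    """Model of SBX #imm as commonly documented for NMOS 6502s (assumption).

    Returns (new_x, carry, zero, negative).
    """
    diff = (a & x) - operand
    new_x = diff & 0xFF
    carry = diff >= 0              # no borrow, as for CMP
    zero = new_x == 0
    negative = bool(new_x & 0x80)
    return new_x, carry, zero, negative


FULL_INPUT_SPACE = 256 ** 3        # A, X and operand: 24 bits of input
REDUCED_INPUT_SPACE = 1 << 17      # the 131,072 cases mentioned for the main program
```

How the reduced set of 131,072 cases would be picked is not stated here, so the sketch only records the two sizes.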

--Tom

0xC0DE · Post by **0xC0DE** » Sun Aug 21, 2022 10:25 am

Very interesting, Tom! Will have to see if I can get it working on an Electron and, more importantly, to test my Acorn Electron emulator with this.

gtoal · Post by **gtoal** » Sun Aug 21, 2022 11:29 am

OK, this is a very specialised request that will only be useful to a very small number of emulator writers, and I understand and expect it's not something you're likely to accommodate, but I'll get it in print here for the record just in case...

This kind of test usually runs on a real machine to establish 'ground truth' as well as running in emulators to confirm their accuracy. But there's a third environment where they're needed, where it turns out to be very difficult to run this kind of test efficiently: a static binary translator converts binary code into another source language - perhaps a different instruction set for a different micro, or (as in my case) into a high-level language such as C to run anywhere. These translators are a bit like dynamic translating emulators but they do the translation completely first before compiling and executing. So a test suite that generates a sequence and calls the interpreter would have to generate a file to be translated, run the translator, compile the translator's output, run that (or dynamically link it in), gather the output and compare to the expected register and memory values.
I did all this with Soren Roug's test suite for the 6809 and it was pretty awkward to integrate. So if at all possible, when designing the harness for a new test suite for the 6502, could you do it in a way that adding a static binary translation environment doesn't require an almost complete rewrite, please?

btw there are some existing tests saved in https://gtoal.com/SBTPROJECT/6502sbt/tests/

Thanks,

Graham
PS If SBTs interest you, I'm trying to build a suite of them for 6809, 6502 and z80. It's a slow low-priority background task so they're not complete - the 6809 one seems to be working, the others are unfinished but fairly advanced. Code in: https://gtoal.com/SBTPROJECT/6809sbt/ https://gtoal.com/SBTPROJECT/6502sbt/ and https://gtoal.com/SBTPROJECT/z80sbt/ (also cinematronics CPU - as in the Tailgunner vector cabinet game: https://gtoal.com/SBTPROJECT/cinesbt/ - and in parallel I'm taking some of the lessons from doing those and starting on a more generic version, https://gtoal.com/SBTPROJECT/generic/ - though that spin-off is just at the stage of working on the disassembly part of the process).

tom_seddon · Post by **tom_seddon** » Sun Aug 21, 2022 12:24 pm

gtoal wrote: ↑Sun Aug 21, 2022 11:29 am So if at all possible, when designing the harness for a new test suite for the 6502, could you do it in a way that adding a static binary translation environment doesn't require an almost complete rewrite, please?

Certainly - what should I bear in mind? There is some self-modifying code as part of the driver (saving cycles in the main loop - it runs millions of times!) and, more necessarily, as part of the tests. Some of the opcodes tested have immediate operands.

--Tom

gtoal · Post by **gtoal** » Sun Aug 21, 2022 11:30 pm

> Certainly - what should I bear in mind? There is some self-modifying code as part of the driver (saving cycles in the main loop - it runs millions of times!) and, more necessarily, as part of the tests. Some of the opcodes tested have immediate operands.

Well, there's one right there! Self-modifying code won't work in a static translator. Well, it might work but only because the translator would be coded to fall back to an interpreter if the code is changed after the initial translation, but then you wouldn't be testing the translation any more.

The main thing is to avoid too granular a translation unit, i.e. you don't want to have to invoke the translator and the C compiler millions of times - and there has to be a clean separation between the harness that generates the tests, and the actual running of the tests and examination of the registers etc to determine if the test was successful. If possible the test generation should be done by high-level code on the host system, not by 6502 code in the emulated environment. Then call the emulator, return to the controller, and examine the virtual memory and registers of the emulated system from the controller.
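
A minimal Python sketch of the separation described above: the host generates every case up front, hands the whole batch to a backend in a single call (so a translator and compiler would only be invoked once), and then compares the results on the host. All names here are hypothetical, and `reference_backend` is only a stand-in for the "translate, compile, run, collect state" step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    opcode: int     # instruction under test
    operand: int    # immediate operand, fixed at generation time
    a_in: int       # accumulator before execution


@dataclass(frozen=True)
class Result:
    a_out: int
    zero: bool
    negative: bool


def generate_cases():
    """Host-side generation: every case exists before anything is executed."""
    return [Case(0x29, operand, a)              # 0x29 = AND #imm
            for operand in range(0, 256, 17)
            for a in range(0, 256, 51)]


def expected(case):
    value = case.a_in & case.operand
    return Result(value, value == 0, bool(value & 0x80))


def run_batch(cases, backend):
    """Call the backend once for the whole batch, then compare on the host."""
    results = backend(cases)
    return [(c, r) for c, r in zip(cases, results) if r != expected(c)]


def reference_backend(cases):
    # Stand-in for an emulator or a translated-and-compiled binary (hypothetical).
    return [expected(c) for c in cases]
```

Because the operands are baked into each case at generation time, nothing in the batch needs to modify itself while it runs.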

Thanks.

Graham

0xC0DE · Post by **0xC0DE** » Mon Aug 22, 2022 8:32 am

Shouldn't this work out of the box on an Electron, Tom? I tested on 3 different Elk emulators and all I get is a very long list of i/o/s lines. It wasn't until I ran the tests in b2 that I realised what the actual output should be.

Am I missing something here?

tom_seddon · Post by **tom_seddon** » Mon Aug 22, 2022 5:16 pm

0xC0DE wrote: ↑Mon Aug 22, 2022 8:32 am Shouldn't this work out of the box on an Electron, Tom? I tested on 3 different Elk emulators and all I get is a very long list of i/o/s lines. It wasn't until I ran the tests in b2 that I realised what the actual output should be.
Am I missing something here?

Ahh - very interesting, and thanks for the report! Yes, it should work out of the box, though (as you're finding) it's untested on an Electron, and the user experience for failing tests isn't ideal either.

Which test(s) is it failing on? If you do a Ctrl+N before running it by hand, it should at least pause after the first screenful of output. What's the last test it started running? This might not be the easiest thing to debug remotely, but that'll at least be some initial info...

--Tom

0xC0DE · Post by **0xC0DE** » Mon Aug 22, 2022 6:40 pm

It looks like every test fails because the screen keeps on scrolling with i/o/s lines. Here's a screenshot from the very beginning:

tom_seddon · Post by **tom_seddon** » Mon Aug 22, 2022 8:45 pm

0xC0DE wrote: ↑Mon Aug 22, 2022 6:40 pm It looks like every test fails because the screen keeps on scrolling with i/o/s lines. Here's a screenshot from the very beginning:

Sorry I didn't notice this earlier, but coming back to this thread now I've just noticed that you're trying this on emulators. If running on an emulator, the output simply indicates that the emulator is getting that case wrong, and it needs fixing!

The i/o/s lines describe the test case and the results: i is the input to the instruction, o is the output from actually executing the instruction, and s is the output from simulating the instruction's execution. (o and s match on my BBC B, which has an R6502A.) The values on each line are: O=operand, A=accumulator, X=X, Y=Y, S=stack pointer, and then the 8 status register bits (upper case = set, lower case = clear).

For immediate instructions, O is the immediate value; for read, read-modify-write and write instructions, O is the value in memory at each point. (For read and immediate instructions, all 3 O values will then be the same; for write instructions, the input O value is of course irrelevant.)
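
As an illustration of the upper case = set, lower case = clear convention for the status bits, here is a small Python sketch. The letter order (bit 7 down to bit 0) and the `U` placeholder for the unused bit 5 are assumptions; the tester's exact output format may differ.

```python
FLAG_LETTERS = "NVUBDIZC"   # bit 7 down to bit 0; 'U' for bit 5 is a placeholder


def format_status(p):
    """Render the status register: upper case = set, lower case = clear."""
    return "".join(
        letter if p & (0x80 >> i) else letter.lower()
        for i, letter in enumerate(FLAG_LETTERS)
    )


def parse_status(text):
    """Inverse of format_status: rebuild the byte from the letter cases."""
    p = 0
    for i, ch in enumerate(text):
        if ch.isupper():
            p |= 0x80 >> i
    return p
```

Comparing the parsed o and s values bit by bit is one way to see which flag an emulator is getting wrong.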

Does that make it any clearer? I'll try to update the README with a better explanation if so!

If you get the chance to run it on a real Electron I'd be interested to hear what results you get.

--Tom

tom_seddon · Post by **tom_seddon** » Mon Aug 22, 2022 9:22 pm

If iterating on a particular set of tests, you can set up the tester to run just those test(s). Put the names of the test(s) of interest in the tests_to_run list, and re-assemble. (The test names can be seen in the ".if should_run_test('blah blah')" lines.)

For example, if you want to test just the BCD stuff:

tests_to_run=['adc_bcd_nmos','sbc_bcd_nmos']
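
For readers unfamiliar with this kind of build-time filter, the selection logic might look like the following Python sketch. It assumes that an empty list means "run every test", which is a guess rather than something confirmed by the source, and `some_other_test` is a made-up name.

```python
tests_to_run = ['adc_bcd_nmos', 'sbc_bcd_nmos']


def should_run_test(name, selected=tests_to_run):
    # Assumption: an empty selection means "run every test".
    return not selected or name in selected
```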

--Tom

0xC0DE · Post by **0xC0DE** » Mon Aug 22, 2022 9:37 pm

tom_seddon wrote: ↑Mon Aug 22, 2022 8:45 pm If you get the chance to run it on a real Electron I'd be interested to hear what results you get.

Good news: all the tests pass ("ok") on a real Acorn Electron.
(It was also trying to write something to disk (MMFS) at the end but the disk was protected)

Bad news: there is some work to do for emulator authors, including myself!

tom_seddon · Post by **tom_seddon** » Mon Aug 22, 2022 10:39 pm

0xC0DE wrote: ↑Mon Aug 22, 2022 9:37 pm
tom_seddon wrote: ↑Mon Aug 22, 2022 8:45 pm If you get the chance to run it on a real Electron I'd be interested to hear what results you get.
Good news: all the tests pass ("ok") on a real Acorn Electron.
(It was also trying to write something to disk (MMFS) at the end but the disk was protected)

Bad news: there is some work to do for emulator authors, including myself!

Lovely, thanks for the confirmation!

I wonder what it was trying to write at the end? That's not intentional. It's supposed to simply re-enter the original language ROM, which it reads with OSBYTE 252 and re-enters using OSBYTE 142. I had a quick skim through the Electron AUG and it looks like that should be supported on the Electron too.

(One unusual thing it does do is write to $fffe on startup and $ffff on exit, so that b2's instruction logging can start/stop at the right times. Those locations aren't write-sensitive on the Electron, are they? I can easily change this if they are.)

--Tom

0xC0DE · Post by **0xC0DE** » Tue Aug 23, 2022 9:58 am

tom_seddon wrote: ↑Mon Aug 22, 2022 10:39 pm I wonder what it was trying to write at the end? That's not intentional. It's supposed to simply re-enter the original language ROM, which it reads with OSBYTE 252 and re-enters using OSBYTE 142. I had a quick skim through the Electron AUG and it looks like that should be supported on the Electron too.

Yes those OSBYTEs work on Elk as well. I saw it re-enter BASIC (printed "BASIC") at the end and then "Disk read only" or something like that. Wouldn't worry too much about it; could be my MMFS cartridge acting up.

tom_seddon wrote: ↑Mon Aug 22, 2022 10:39 pm (One unusual thing it does do is write to $fffe on startup and $ffff on exit, so that b2's instruction logging can start/stop at the right times. Those locations aren't write-sensitive on the Electron, are they? I can easily change this if they are.)

I saw those in your source and was wondering about their purpose, but they aren't write-sensitive on an Elk.

scarybeasts · Post by **scarybeasts** » Sat Aug 27, 2022 6:54 pm

tom_seddon wrote: ↑Sun Aug 21, 2022 1:23 am Plans for this in the longer term:

more automation-friendly behaviour, so I can run it as part of the CI process for b2. At the moment, the only real way to tell if things succeeded or failed is to have an actual person look at it running using their eyes

Hi Tom -- this is a great test case for emulators. 2 great bugs identified in beebjit, so it would be nice to run it as part of the test suite.

For beebjit, I just recently started running a few functional tests, inspired by the way jsbeeb does it (nice one Matt / Rich!). The jsbeeb trick is to wire into the built-in debugger and monitor for a certain 6502 PC being reached, which indicates success. Adding to that trick, we could equally well have some 6502 PCs that indicate failure.

Is your code structured in a way where there's a distinct PC for printing an error line, and a distinct PC for finishing the test? If so, it might already be automation friendly for emulators with powerful debuggers.

Cheers
Chris

tom_seddon · Post by **tom_seddon** » Sat Aug 27, 2022 8:06 pm

scarybeasts wrote: ↑Sat Aug 27, 2022 6:54 pm Is your code structured in a way where there's a distinct PC for printing an error line, and a distinct PC for finishing the test? If so, it might already be automation friendly for emulators with powerful debuggers.

Not currently, but that's the sort of thing I was thinking of doing - I'll probably put some little routines somewhere, with documented addresses, that the test calls when specific events happen. By default these do nothing (probably just rts:nop:nop), but they'd serve as somewhere you could poke your own hooks into and/or somewhere you could put breakpoints for debugging use.

(I'm also planning on adding a hook that gets called before each test is run, that will allow tests to be skipped. This would serve as a way of potentially introducing some more dynamic mechanism for running specific tests while testing, meaning you could control that from the test runner without having to rebuild the code.)

--Tom

dp11 · Post by **dp11** » Sun Aug 28, 2022 11:51 am

tom_seddon wrote: ↑Sat Aug 27, 2022 8:06 pm
scarybeasts wrote: ↑Sat Aug 27, 2022 6:54 pm Is your code structured in a way where there's a distinct PC for printing an error line, and a distinct PC for finishing the test? If so, it might already be automation friendly for emulators with powerful debuggers.
Not currently, but that's the sort of thing I was thinking of doing - I'll probably put some little routines somewhere, with documented addresses, that the test calls when specific events happen. By default these do nothing (probably just rts:nop:nop), but they'd serve as somewhere you could poke your own hooks into and/or somewhere you could put breakpoints for debugging use.

(I'm also planning on adding a hook that gets called before each test is run, that will allow tests to be skipped. This would serve as a way of potentially introducing some more dynamic mechanism for running specific tests while testing, meaning you could control that from the test runner without having to rebuild the code.)

--Tom

Would allocating a spare address in &FExx where writes to it are trapped be better? So a write of zero is a pass and a write of non-zero is the test number that failed?
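
On the emulator side, that scheme could be handled along these lines. This is a minimal Python sketch with hypothetical names; the trapped address is a parameter because no particular address has been settled at this point.

```python
class TestPort:
    """Hypothetical memory-mapped test port: 0 = pass, non-zero = failing test number."""

    def __init__(self, address):
        self.address = address
        self.outcome = None     # None until the program under test reports

    def on_write(self, address, value):
        """Call from the emulator's memory-write path; returns True if trapped."""
        if address != self.address:
            return False
        self.outcome = "pass" if value == 0 else f"fail: test {value}"
        return True
```

A test runner would poll `outcome` after each emulated instruction, or stop the emulator from inside `on_write`.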

1024MAK · Post by **1024MAK** » Sun Aug 28, 2022 1:23 pm

Is there an I/O address that is already used or allocated for testing? E.g. by other test code or by hardware?

Mark

scarybeasts · Post by **scarybeasts** » Sun Aug 28, 2022 6:34 pm

dp11 wrote: ↑Sun Aug 28, 2022 11:51 am Would allocating a spare address in &FExx where writes to it are trapped be better? So a write of zero is a pass and a write of non-zero is the test number that failed?

I like it, it's more explicit than the scheme I proposed in the other thread.
I have something similar for the way I currently run the little integration tests. Instead of the tube, $FEE0 is used as a test assist. If you write to $FEE0, the emulator deliberately crashes.

Cheers
Chris

dp11 · Post by **dp11** » Sun Aug 28, 2022 9:16 pm

scarybeasts wrote: ↑Sun Aug 28, 2022 6:34 pm
dp11 wrote: ↑Sun Aug 28, 2022 11:51 am Would allocating a spare address in &FExx where writes to it are trapped be better? So a write of zero is a pass and a write of non-zero is the test number that failed?
I like it, it's more explicit than the scheme I proposed in the other thread.
I have something similar for the way I currently run the little integration tests. Instead of the tube, $FEE0 is used as a test assist. If you write to $FEE0, the emulator deliberately crashes.

Cheers
Chris

Would using, say, &FEFF be sensible to trap?

scarybeasts · Post by **scarybeasts** » Mon Aug 29, 2022 4:17 am

dp11 wrote: ↑Sun Aug 28, 2022 9:16 pm Would using, say, &FEFF be sensible to trap?

My knowledge of the tube is woeful, but the b-em source code is a favorite resource of mine.
It looks like a range of 8 bytes is mirrored 4 times.
Writing to $FEFF would be a write to tube index 7. b-em calls that "Register 4" and it seems to do something. Index 6 is not handled for write at all, so maybe a write to $FEFE is more harmless?
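
The index arithmetic can be checked with a few lines of Python. The sketch assumes the tube range starts at $FEE0 and spans 32 bytes (8 registers mirrored 4 times), as described above; the names are illustrative.

```python
TUBE_BASE = 0xFEE0      # start of the tube range in SHEILA
TUBE_SIZE = 0x20        # 8 registers mirrored 4 times


def tube_index(address):
    """Map an address in the tube range to its register index (0-7)."""
    if not TUBE_BASE <= address < TUBE_BASE + TUBE_SIZE:
        raise ValueError(f"${address:04X} is outside the tube range")
    return (address - TUBE_BASE) & 7
```

With this mapping, $FEFF is index 7 and $FEFE is index 6, and $FEE8 aliases the same register as $FEE0.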

Cheers
Chris

TobyLobster · Post by **TobyLobster** » Mon Aug 29, 2022 7:20 am

For reference http://www.mdfsnet.f9.co.uk/Docs/Comp/B ... SHEILAddrs

tom_seddon · Post by **tom_seddon** » Mon Aug 29, 2022 1:07 pm

$fefe is fifo 4 status on the parasite, and the app note says "write (sets IRQ)" - but what that actually means I don't know.

I don't see that these addresses really need to be in page $fe though. The way I see it, you'll have a special test mode in the emulator for running tests such as this, so that the addresses can do destructive stuff to simplify automated testing (e.g., immediately quit the emulator with process exit code 1). You don't want that behaviour active normally.

And if you've got a special mode, the addresses can be anywhere! So you might as well decide that they're somewhere inside the program binary, as the effect of writing/executing those locations will be guaranteed benign when the test is running on normal hardware or in the emulator's normal mode.

--Tom

dp11 · Post by **dp11** » Mon Aug 29, 2022 1:39 pm

I was thinking there could be some sort of standard. Using code space may not always be possible.

Rich Talbot-Watkins · Post by **Rich Talbot-Watkins** » Mon Aug 29, 2022 1:47 pm

The Dormann tests work by branching to themselves in an infinite loop on success or failure, something which you can detect in an emulator and cross-reference the PC against the assembly listing to see what went wrong.

I guess a similar trick, which is potentially easier to detect in an emulator, is JSR &FFFF for success, and JSR &FFFE for failure (two addresses which it never makes sense to execute), and then look at the stack to see where they came from.
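
Both tricks are cheap to detect from an emulator's step loop. The Python sketch below uses hypothetical helper names and relies on JSR pushing the address of its own last byte (high byte first), so the calling JSR sits two bytes below the return address found on the stack.

```python
def is_self_branch_trap(pc_before, pc_after):
    """Dormann-style trap: an instruction that jumps or branches to itself."""
    return pc_before == pc_after


def jsr_caller(memory, s):
    """Address of the JSR whose return address is on top of the stack.

    memory is any indexable 64 KB image; s is the stack pointer after the JSR.
    """
    lo = memory[0x0100 + ((s + 1) & 0xFF)]
    hi = memory[0x0100 + ((s + 2) & 0xFF)]
    return (((hi << 8) | lo) - 2) & 0xFFFF
```

When the PC lands on &FFFF or &FFFE, `jsr_caller` gives the address to look up in the assembly listing.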

I don't really see that a standard is necessary though. Better to define success and failure as macros in your assembler source (as the Dormann tests do), so that you can customise them for your own test environment if you wish.

1024MAK · Post by **1024MAK** » Mon Aug 29, 2022 1:49 pm

tom_seddon wrote: ↑Mon Aug 29, 2022 1:07 pm And if you've got a special mode, the addresses can be anywhere! So you might as well decide that they're somewhere inside the program binary, as the effect of writing/executing those locations will be guaranteed benign when the test is running on normal hardware or in the emulator's normal mode.

Or use an unused or little-used I/O address, say in FRED - 0xFCxx (then maybe a simple 1MHz bus device could display some information on a simple LED display if used on real hardware).

The internal I/O space (SHEILA - 0xFExx) has lots of partial address decoding, so it is harder to spot unused addresses.

Mark

dp11 · Post by **dp11** » Mon Aug 29, 2022 9:10 pm

I've chosen &FCD0.

tom_seddon · Post by **tom_seddon** » Thu Dec 15, 2022 11:17 pm

tom_seddon wrote: ↑Sat Aug 27, 2022 8:06 pm
scarybeasts wrote: ↑Sat Aug 27, 2022 6:54 pm Is your code structured in a way where there's a distinct PC for printing an error line, and a distinct PC for finishing the test? If so, it might already be automation friendly for emulators with powerful debuggers.
Not currently, but that's the sort of thing I was thinking of doing - I'll probably put some little routines somewhere, with documented addresses, that the test calls when specific events happen. By default these do nothing (probably just rts:nop:nop), but they'd serve as somewhere you could poke your own hooks into and/or somewhere you could put breakpoints for debugging use.

(I'm also planning on adding a hook that gets called before each test is run, that will allow tests to be skipped. This would serve as a way of potentially introducing some more dynamic mechanism for running specific tests while testing, meaning you could control that from the test runner without having to rebuild the code.)

I got round to sorting this out in the end, as part of doing some actual meaningful work on b2 again over the past couple of months. There's now a generic version, that runs at $2000, uses specific areas of zero page, and has callbacks at documented addresses for events of interest that might occur as it runs. On a real system, poke whatever code you like there; on an emulator, same, and/or trap opcodes at those specific addresses. See the relevant section of the README: https://github.com/tom-seddon/6502-test ... her-system

You can see how b2 uses this here: https://github.com/tom-seddon/b2/blob/0 ... _tests.cpp - it pokes opcode $12 HLT in appropriate places, then has its 6502 emulator treat opcode $12 specially, figuring out which event occurred by checking the program counter.
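
The general shape of that approach, as a minimal Python sketch: poke a HLT opcode at each callback address, then classify the event by the program counter when the emulator fetches it. The addresses and event names below are hypothetical placeholders (the real callback addresses are the ones documented in the README), and this is not b2's actual code.

```python
HLT_OPCODE = 0x12

# Hypothetical callback addresses and event names, for illustration only.
HOOKS = {0x2003: "test_started", 0x2006: "test_failed", 0x2009: "all_done"}


def install_hooks(memory, hooks=HOOKS):
    """Overwrite the first byte of each callback with HLT."""
    for address in hooks:
        memory[address] = HLT_OPCODE


def classify_fetch(memory, pc, hooks=HOOKS):
    """Return the event name if the opcode at pc is a poked HLT, else None."""
    if memory[pc] == HLT_OPCODE:
        return hooks.get(pc)
    return None
```

A HLT fetched from any other address returns None, so a genuine halt elsewhere in the program is not mistaken for a test event.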

--Tom

P.S. the 6502 instruction tester does not test any of the HLTs, as it is intended to be something that you can see run to a successful completion on a real system. But maybe some HLT tests could be worth adding. Testing an emulator: either the test gets stuck for 1,000 cycles, or it fails. Testing a real system: you run it and watch for the hang. Has to be a manual process, but there's a finite number of opcodes to test.

stardot.org.uk
