py8dis - a programmable static tracing 6502 disassembler in Python

handy tools that can assist in the development of new software
dp11
Posts: 1757
Joined: Sun Aug 12, 2012 9:47 pm
Contact:

Re: py8dis - a programmable static tracing 6502 disassembler in Python

Post by dp11 »

More feature requests/ideas to consider.

If a label is always call via jsr then prepend function_

If a label looks like data then prepend data_

Now with the basic static analysis if you can detect a simple loop then prepend loop_ maybe also indent the loop? I suggest there is a line limit

Now getting advanced on 6502 because bra didn't exist sometimes a conditional branch was used. So maybe comment branch always taken. E. G.

Code: Select all

 beq label1
 Bne label2 ; branch alway taken.
Maybe add a blank line after rts, Jmp, bra? Ie we have come to the end of this code block.

My thinking here is when scrolling through the output to try and get a feel of how the code breaks down to blocks quickly.
User avatar
TobyLobster
Posts: 618
Joined: Sat Aug 31, 2019 7:58 am
Contact:

Re: py8dis - a programmable static tracing 6502 disassembler in Python

Post by TobyLobster »

SteveF wrote: Sun Sep 19, 2021 12:09 am My inclination is to just go with "mos_labels()", since I think the OS 1.20 API is a subset of the OS 3.20 API.
Yes, sounds good to me.
SteveF
Posts: 1663
Joined: Fri Aug 28, 2015 9:34 pm
Contact:

Re: py8dis - a programmable static tracing 6502 disassembler in Python

Post by SteveF »

TobyLobster wrote: Sun Sep 19, 2021 7:01 am
SteveF wrote: Sun Sep 19, 2021 12:09 am My inclination is to just go with "mos_labels()", since I think the OS 1.20 API is a subset of the OS 3.20 API.
Yes, sounds good to me.
Thanks, I've pushed this change to the autoexpr branch.
dp11 wrote: Sun Sep 19, 2021 5:38 am More feature requests/ideas to consider.
These are good ideas, thanks. I will try to make the code analysis a bit less hacky first (have a simple Python function for each opcode which can update an emulated CPU state) and then see what I can do with them...
SteveF
Posts: 1663
Joined: Fri Aug 28, 2015 9:34 pm
Contact:

Re: py8dis - a programmable static tracing 6502 disassembler in Python

Post by SteveF »

dp11 wrote: Sun Sep 19, 2021 5:38 am If a label is always call via jsr then prepend function_
Done, although I used "sub_" as a prefix instead.

At the moment a code label has to be referenced by nothing but "JSR" instructions to qualify for the "sub_" prefix; I did wonder about changing this so there has to be at least one JSR but otherwise accept unconditional branches (BRA, JMP abs) as well, since those might just be tail calls.
dp11 wrote: Sun Sep 19, 2021 5:38 am If a label looks like data then prepend data_
This was kind of there already, since labels get a "c" prefix if they are code and "l" if they don't look like code.

I could make the prefixes used for the above configurable if that would be helpful.
dp11 wrote: Sun Sep 19, 2021 5:38 am Now with the basic static analysis if you can detect a simple loop then prepend loop_ maybe also indent the loop? I suggest there is a line limit
I've done this, although I'm a bit dubious about the indentation. Currently any label which is referenced by a *single* other instruction which is a backwards branch (conditional or unconditional) to it and no more than config.loop_limit bytes away (default 32) will get a "loop_" prefix. Indentation is on by default for now but can be turned off with "config.indent_loops = False".

Suggestions for alternate heuristics are welcome. Perhaps I'm being overly strict in insisting on a single reference to a label for it to qualify for the "loop_" prefix, for example; maybe as long as all references meet the conditions that should be fine? Then again, it might make the indentation even worse if there's more than one label.
dp11 wrote: Sun Sep 19, 2021 5:38 am Now getting advanced on 6502 because bra didn't exist sometimes a conditional branch was used. So maybe comment branch always taken. E. G.

Code: Select all

 beq label1
 Bne label2 ; branch alway taken.
I haven't done this yet. The "code analysis" framework has been improved somewhat and should now be better at ignoring "irrelevant" instructions and understanding TAX/TAY/TXA/TYA when auto-labelling OS call operands, but I'm not sure it's really good enough to do this. I'll give this a bit more thought anyway. It could probably be done via simple pattern recognition instead of trying to fit it in with the code analysis.

At the moment the inferences of the code analysis are shown in comments to the right of the hex dump by default. These inferences are probably not helpful for anything except debugging py8dis, since they're "at least one execution of this instruction leaves the CPU in this state" inferences, not "every execution of this instruction leaves the CPU in this state" inferences. You can get rid of them with "config.show_cpu_state = False"; sooner or later I'll probably make that the default.
dp11 wrote: Sun Sep 19, 2021 5:38 am Maybe add a blank line after rts, Jmp, bra? Ie we have come to the end of this code block.
I've done this; I must say it seems to really help break the disassembly up nicely for such a simple change.

These changes are on the new analyse branch. As I always seem to say, the implementation is pretty hacky but it does work for me. (I vaguely intend to tidy the code up once there are no further suggestions for improvement; I'm trying to avoid putting a lot of effort into structuring it neatly then finding out that structure doesn't work for new feature X...)
User avatar
TobyLobster
Posts: 618
Joined: Sat Aug 31, 2019 7:58 am
Contact:

Re: py8dis - a programmable static tracing 6502 disassembler in Python

Post by TobyLobster »

Here's a few suggestions. I've had a bit of a test using Chuckie Egg as an example to disassemble.

1. The inclusion of '|' seems superfluous here:

Code: Select all

    lda #osbyte_clear_escape ; '|'                          ; 29d8: a9 7c
    jsr osbyte                                              ; 29da: 20 f4 ff
where the constant has already been identified.

2. Replace immediate constants #$00 to #$09 with #0 to #9 for readability. The 9 limit is fairly arbitrary, but there are a lot of small constant values typically and it scans more easily (for me at least).

3. Handle strings with the first byte actually being the length (maybe this is already possible with hooks?)

4. Categorise an address / range as a 'vdu string' (somehow), then annotate the VDU control codes with explanatory text (also note decimal numbers), e.g.:

Code: Select all

.string_enteryourname
    EQUB string_enteryourname_end - string_enteryourname_start
.string_enteryourname_start
    EQUB 24                     ;
    EQUW 0, 0, 1279, 892        ; Set graphics window to 0,0,1279,892
    EQUB 16                     ; CLG
    EQUB 26                     ; Restore default windows
    EQUB 18, 0, 1               ; GCOL 0,1
    EQUB 25, 4                  ;
    EQUW 160, 160               ; MOVE 160,160
    EQUS "ENTER YOUR NAME"      ;
    EQUB 25, 4                  ;
    EQUW 384, 100               ; MOVE 384,100
    EQUB 18, 0, 2               ; GCOL 0,2
    EQUS "Player "              ;
.string_enteryourname_end
5. The ability to show some bytes/words/dwords in a table (i.e. each line having a set number of bytes / words / dwords). Then additionally allow binary data to be displayed (often useful for sprite data), e.g. The large G in 'CHUCKIE EGG' banner:

Code: Select all

.sprite_bigg
	EQUB %00000011, %10000000
	EQUB %00001111, %11100000
	EQUB %00011111, %11110000
	EQUB %00011111, %11111000
	EQUB %00111111, %11111000
	EQUB %00111111, %11111100
	EQUB %01111111, %11111100
	EQUB %01111111, %11111100
	EQUB %01111110, %01111100
	EQUB %01111100, %00111000
	EQUB %11111000, %00011000
	EQUB %11111000, %00000000
	EQUB %11110000, %00000000
	EQUB %11110000, %00000000
	EQUB %11110000, %00000000
	EQUB %11110000, %00000000
	EQUB %11110000, %11111000
	EQUB %11110000, %11111000
	EQUB %11111000, %11111100
	EQUB %11111000, %01111100
	EQUB %01111100, %00111100
	EQUB %01111110, %00111100
	EQUB %01111111, %01111100
	EQUB %01111111, %11111100
	EQUB %00111111, %11111100
	EQUB %00111111, %11111100
	EQUB %00011111, %11111100
	EQUB %00011111, %11111000
	EQUB %00001111, %11111000
	EQUB %00000011, %11110000
I know Acme can also do dots and hashes which looks nice too e.g.: !byte %......##, %####....
(not sure if BeebAsm can?)

6. Add a newline before !pseudopc and also after the terminating '}'

7. It would be fun if this: label(0x193f, "+") worked where + is a local label. But it looks like the + as the label isn't included in the output?

8.Identify internal/inkey key numbers, e.g. instead of:

Code: Select all

    ldx #$ab                                                ; 1a3b: a2 ab
    ldy #$ff                                                ; 1a3d: a0 ff
    lda #osbyte_inkey                                       ; 1a3f: a9 81
    jsr osbyte                                              ; 1a41: 20 f4 ff
This:

Code: Select all

    ldx #inkey_key_H                                        ; 1a3b: a2 ab
    ldy #$ff                                                ; 1a3d: a0 ff
    lda #osbyte_inkey                                       ; 1a3f: a9 81
    jsr osbyte                                              ; 1a41: 20 f4 ff
SteveF
Posts: 1663
Joined: Fri Aug 28, 2015 9:34 pm
Contact:

Re: py8dis - a programmable static tracing 6502 disassembler in Python

Post by SteveF »

TobyLobster wrote: Mon Sep 20, 2021 9:28 pm Here's a few suggestions. I've had a bit of a test using Chuckie Egg as an example to disassemble.
Thanks, some good ideas here. I've had a look at a few of them - I will probably consider the others too, I just haven't done them yet...
TobyLobster wrote: Mon Sep 20, 2021 9:28 pm 1. The inclusion of '|' seems superfluous here:
Fixed - we no longer emit the character constant where we have an expression.
TobyLobster wrote: Mon Sep 20, 2021 9:28 pm 2. Replace immediate constants #$00 to #$09 with #0 to #9 for readability. The 9 limit is fairly arbitrary, but there are a lot of small constant values typically and it scans more easily (for me at least).
Good idea, done.
TobyLobster wrote: Mon Sep 20, 2021 9:28 pm 3. Handle strings with the first byte actually being the length (maybe this is already possible with hooks?)
I've added stringn(addr), which interprets the byte at addr as a length and the following length bytes as a string, returning the address of the first byte after the string. There's also a stringn_hook function which you can use if you have a subroutine that takes such a string inline.
TobyLobster wrote: Mon Sep 20, 2021 9:28 pm 6. Add a newline before !pseudopc and also after the terminating '}'
Done.
TobyLobster wrote: Mon Sep 20, 2021 9:28 pm 7. It would be fun if this: label(0x193f, "+") worked where + is a local label. But it looks like the + as the label isn't included in the output?
The basic problem here is that the checks for "simple" label names didn't allow "+", so that looked like an expression label internally and thus would only be output if something actually referred to that address. (I think; I haven't looked into this too deeply). I have experimentally allowed label names which consist of nothing but all-"+" or all-"-" to be treated as "simple", so the "+" label should now be output. Is this actually useful as-is? Will you be able to generate any references to the "+" label?
TobyLobster wrote: Mon Sep 20, 2021 9:28 pm 8.Identify internal/inkey key numbers, e.g. instead of:
Done. This was easy, except for typing in the inkey table :-), and I think it's a pretty cool feature. I've only tested it with your code fragment so please let me know if you find any errors in the inkey table.

All this is on the analyse branch.
User avatar
TobyLobster
Posts: 618
Joined: Sat Aug 31, 2019 7:58 am
Contact:

Re: py8dis - a programmable static tracing 6502 disassembler in Python

Post by TobyLobster »

Thanks, that was quick work!

For 2. I was thinking that only operands e.g. LDA #$00 would be converted to LDA #0, but data bytes are considered too e.g.

Code: Select all

    !byte $10, $12, 0  , 4  , $19, 4  , $e0, 1              ; 0ab3: ........
This looks a bit untidy to me, what do you think?
SteveF
Posts: 1663
Joined: Fri Aug 28, 2015 9:34 pm
Contact:

Re: py8dis - a programmable static tracing 6502 disassembler in Python

Post by SteveF »

I think you're right and it's probably better to just use hex for byte data, so I've pushed a change to the analyse branch which does that.
User avatar
TobyLobster
Posts: 618
Joined: Sat Aug 31, 2019 7:58 am
Contact:

Re: py8dis - a programmable static tracing 6502 disassembler in Python

Post by TobyLobster »

Sorry to keep giving shopping lists, but here we are:

1. Sadly 'stringn()' isn't working for me:

Code: Select all

Traceback (most recent call last):
  File "chuckie_orig.py", line 147, in <module>
    stringn(0x0AB2)
  File "/Users/tobynelson/Code/6502/py8dis/py8dis/classification.py", line 312, in stringn
    add_expression(addr, utils.LazyString("%s - %s", disassembly.get_label(addr + 1 + length, addr), disassembly.get_label(addr + 1, addr)))
TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'
2. I can declare a local "+" label once e.g. 'label(0x193f, "+")', but a second such e.g. 'label(0x1dca, "+")' causes:

Code: Select all

Traceback (most recent call last):
  File "chuckie_orig.py", line 835, in <module>
    label(0x1dca, "+")
  File "/Users/tobynelson/Code/6502/py8dis/py8dis/commands.py", line 82, in label
    disassembly.add_label(addr, name)
  File "/Users/tobynelson/Code/6502/py8dis/py8dis/disassembly.py", line 90, in add_label
    ensure_annotation(addr, s)
  File "/Users/tobynelson/Code/6502/py8dis/py8dis/disassembly.py", line 79, in ensure_annotation
    assert all_simple_labels[s] == addr
AssertionError
This "+" trick could also be error prone if a later branch instruction tries to branch backwards to "+". This might need (a) some care to avoid incorrect output OR (b) just disallowing the whole thing.

3. The following causes an assertion error in acme:

Code: Select all

expr(0x2208, "MapId_Egg OR MapId_Seed")
leads to:

Code: Select all

!error: Assertion failed: MapId_Egg OR MapId_Seed == $0c
It needs brackets in the assertion:

Code: Select all

!if (MapId_Egg OR MapId_Seed) != $0c {
    !error "Assertion failed: MapId_Egg OR MapId_Seed == $0c"
}
Workaround is to use:

Code: Select all

expr(0x2208, "(MapId_Egg OR MapId_Seed)")
4. Perhaps the ability to report a list of as yet unnamed labels?
SteveF
Posts: 1663
Joined: Fri Aug 28, 2015 9:34 pm
Contact:

Re: py8dis - a programmable static tracing 6502 disassembler in Python

Post by SteveF »

TobyLobster wrote: Tue Sep 21, 2021 11:45 am Sorry to keep giving shopping lists, but here we are:
No problem, the interest is gratifying! :-)
TobyLobster wrote: Tue Sep 21, 2021 11:45 am 1. Sadly 'stringn()' isn't working for me:

Code: Select all

Traceback (most recent call last):
  File "chuckie_orig.py", line 147, in <module>
    stringn(0x0AB2)
  File "/Users/tobynelson/Code/6502/py8dis/py8dis/classification.py", line 312, in stringn
    add_expression(addr, utils.LazyString("%s - %s", disassembly.get_label(addr + 1 + length, addr), disassembly.get_label(addr + 1, addr)))
TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'
I'm sure this is a silly bug on my part but I did give it a quick test and it worked for me. Can you let me have a test case I can use to reproduce this? It doesn't need to be cut down or anything. Attached as a zip here or on a github issue would be fine, or if you'd rather not share the code yet you could e-mail me at the address on the git commit log.

I'll take a look at your other points later but since this one needs further information I thought I'd ask for it ASAP...

Edited to add: with respect to acme local labels, py8dis really assumes all label names are unique and I suspect the best way to support them is to write some Python code which will analyse the labels using the "label X is referenced at points Y and Z" reference table it builds up and swizzle in some local labels at the last minute. This would be a little bit like the loop detection code. I will probably give this a try later and see if I can make it work.
SteveF
Posts: 1663
Joined: Fri Aug 28, 2015 9:34 pm
Contact:

Re: py8dis - a programmable static tracing 6502 disassembler in Python

Post by SteveF »

Thanks Toby.

The problem here is that you're doing stringn() using an address in the move()d region *before* you do the move(), so memory[0xab2] is None. If you move the stringn() after the move() it should work.

I am not sure there's any way to "fix" this as such in py8dis. Obviously the error message could be improved, although this is generally quite poor anyway - to some extent this sort of unfriendliness goes hand-in-hand with the "debugger as a library" approach, although I could certainly add some extra assertions (perhaps as part of the vaguely-planned-for tidying phase one the code stops changing to accomodate new features). It would also perhaps be possible (but probably quite annoying) to insist that move() is done early in the control file to reduce the scope for confusion, but I'm not all that keen on that.

Anyway, please try rearranging this and let know how you get on.

Incidentally I have pushed a fix for the missing brackets on acme assertions to the assembly branch. (Because it was easy. I'll have a look at the other stuff later...)
User avatar
TobyLobster
Posts: 618
Joined: Sat Aug 31, 2019 7:58 am
Contact:

Re: py8dis - a programmable static tracing 6502 disassembler in Python

Post by TobyLobster »

Thanks that works. It does make some sense, but it didn't occur to me at the time. For the most part the order of the commands doesn't really matter, so I missed when it did. Perhaps a note in the documentation would help.

I'm all for meaningful error messages.
SteveF
Posts: 1663
Joined: Fri Aug 28, 2015 9:34 pm
Contact:

Re: py8dis - a programmable static tracing 6502 disassembler in Python

Post by SteveF »

TobyLobster wrote: Tue Sep 21, 2021 5:45 pm Thanks that works. It does make some sense, but it didn't occur to me at the time. For the most part the order of the commands doesn't really matter, so I missed when it did. Perhaps a note in the documentation would help.

I'm all for meaningful error messages.
I've put a couple of TODOs in the code to address these points later, which is almost as good, right? :-)
TobyLobster wrote: Tue Sep 21, 2021 11:45 am 4. Perhaps the ability to report a list of as yet unnamed labels?
I've just pushed support for this to the analyse branch - they appear in a comment just above the assertions. Lightly tested but seems to work, please let me know if it shows stuff it shouldn't or doesn't show stuff it should...
User avatar
TobyLobster
Posts: 618
Joined: Sat Aug 31, 2019 7:58 am
Contact:

Re: py8dis - a programmable static tracing 6502 disassembler in Python

Post by TobyLobster »

SteveF wrote: Tue Sep 21, 2021 6:05 pm please let me know if it shows stuff it shouldn't or doesn't show stuff it should...
That's useful and seems to work fine. In my case it has highlighted labels I have now fixed and pushed. However I'm not sure why l0c06 is being generated, it doesn't seem to be used?
SteveF
Posts: 1663
Joined: Fri Aug 28, 2015 9:34 pm
Contact:

Re: py8dis - a programmable static tracing 6502 disassembler in Python

Post by SteveF »

Thanks Toby. I've had a look at the l0c06 issue and pushed a semi-fix to the analyse branch.

I'm not sure I'm explaining this well, but FWIW: the heart of the problem was that the code was fighting with itself a bit over whether to split or merge byte data classifications. There was a 13-byte classification at 0xc06 and although the code was deciding to split that to put your labels in at 0xc09, it *also* decided to emit l0c06 so it could address 0x0c09 as l0c06+3, which would have been necessary if the 13-byte classification had remained intact. Since it *wasn't* actually necessary, the l0c06 label was emitted redundantly.

This should now be fixed but the fix makes the code a bit less aggressive about forcing in labels like l0c00 in your disassembly and these are now emitted as expressions rather than "inline labels". I probably need to tweak the behaviour here. In the meantime, if you want to change this you can explicitly label these addresses and that will make them appear as "inline labels".
User avatar
TobyLobster
Posts: 618
Joined: Sat Aug 31, 2019 7:58 am
Contact:

Re: py8dis - a programmable static tracing 6502 disassembler in Python

Post by TobyLobster »

SteveF wrote: Tue Sep 21, 2021 9:12 pm This should now be fixed but the fix makes the code a bit less aggressive about forcing in labels like l0c00 in your disassembly and these are now emitted as expressions rather than "inline labels". I probably need to tweak the behaviour here. In the meantime, if you want to change this you can explicitly label these addresses and that will make them appear as "inline labels".
Yes this works now.
User avatar
TobyLobster
Posts: 618
Joined: Sat Aug 31, 2019 7:58 am
Contact:

Re: py8dis - a programmable static tracing 6502 disassembler in Python

Post by TobyLobster »

Here's a few more relating to comments:

1.
SteveF wrote: Fri Sep 10, 2021 3:17 am The output includes a hex dump for each line; this can be turned off later
Is there a command line option to turn off the hex dump and other commentary ("; Referenced 21 times by ..." / "Label references by decreasing frequency:" / "Automatically generated labels:")

2. The ability to output a newline before a label definitions and before its comment would help prettify the output e.g. perhaps

Code: Select all

newline(0x0cd0)
3. Adding comments at the end of a line would be nice, e.g.:

Code: Select all

.map0data
    EQUB (map0platform_end - map0platform_start) / 3    ; number of platforms
    EQUB (map0ladder_end - map0ladder_start) / 3        ; number of ladders
    EQUB 0                                              ; has lifts flag
    EQUB (map0seed_end - map0seed_start) / 2            ; number of seeds
    EQUB 2                                              ; initial number of birds
4. The ability to indent a comment by four spaces to align with the instructions / data.
Last edited by TobyLobster on Wed Sep 22, 2021 9:55 am, edited 1 time in total.
dp11
Posts: 1757
Joined: Sun Aug 12, 2012 9:47 pm
Contact:

Re: py8dis - a programmable static tracing 6502 disassembler in Python

Post by dp11 »

A longer term feature request, which not everyone will like , but might make it easier to understand pages of assembler.

In the past for micros , one thing I have done is create a bunch of assembler macros. These included ADD16 , SUB16 INC16 etc, but you can actually go further if the macros assembler is powerful enough, for instance you can create FOR(X,255) and NEXT(X--) . These will take effort to find and substitute, but will make the code that little bit quicker to understand. I'm sure that others will have other macro suggestions. It is easy to create a regression library to test the macros .
User avatar
TobyLobster
Posts: 618
Joined: Sat Aug 31, 2019 7:58 am
Contact:

Re: py8dis - a programmable static tracing 6502 disassembler in Python

Post by TobyLobster »

The beebasm version is not assembling the latest on my repo (https://github.com/TobyLobster/ChuckieTest):

Code: Select all

chuckie_orig_beebasm.asm:5512: error: Symbol already defined.

.spritetable
 ^
The acme version works fine.
EDIT: For some reason .spritetable is being added twice. Maybe an edge case as .spritetable is at the load address of &1100? Also pydis_start is added twice. acme doesn't seem to mind because they have the same value as when they were previously defined (due to the !pseudopc).
SteveF
Posts: 1663
Joined: Fri Aug 28, 2015 9:34 pm
Contact:

Re: py8dis - a programmable static tracing 6502 disassembler in Python

Post by SteveF »

TobyLobster wrote: Wed Sep 22, 2021 9:06 am Is there a command line option to turn off the hex dump and other commentary ("; Referenced 21 times by ..." / "Label references by decreasing frequency:" / "Automatically generated labels:")
There isn't a command line option, but you can add the following to your control file:

Code: Select all

config.set_label_references(False)
config.set_hex_dump(False)
config.set_bytes_as_ascii(False)
config.show_autogenerated_labels = False
There's an inconsistency here between function calls and direct variable assignments because it's all a big pile of hacks at the moment. ;-)

Genuine question: do you think this would be better as a command line option instead/as well? My working hypothesis so far has been that it's better to keep this kind of stuff in the control file as it's in some ways a reflection on how "finished" the disassembly is. I made assembler choice a command-line option on the grounds that two people might be collaborating on a disassembly and have different assembler preferences. (Or someone might check out a "finished" disassembly from someone else and want to use their own preferred assembler.) But I could certainly see "collaborative" arguments for making all of these command-line options.

Having to specify a load of command line options every time could be annoying, but I suspect in practice most use of py8dis will be done via a shell script wrapper to build-and-compare anyway.
TobyLobster wrote: Wed Sep 22, 2021 1:47 pm The beebasm version is not assembling the latest on my repo (https://github.com/TobyLobster/ChuckieTest):

Code: Select all

chuckie_orig_beebasm.asm:5512: error: Symbol already defined.

.spritetable
 ^
The acme version works fine.
EDIT: For some reason .spritetable is being added twice. Maybe an edge case as .spritetable is at the load address of &1100? Also pydis_start is added twice. acme doesn't seem to mind because they have the same value as when they were previously defined (due to the !pseudopc).
Thanks for reporting this and diagnosing it. I think you're right about this being an edge case; I have pushed a hacky fix for this to the analyse branch.
SteveF
Posts: 1663
Joined: Fri Aug 28, 2015 9:34 pm
Contact:

Re: py8dis - a programmable static tracing 6502 disassembler in Python

Post by SteveF »

The py8dis code is definitely getting to be a bit of a mess and I'm starting to get buried in the piles of hacks. I am mentally sketching out an improved internal design which I hope will make things cleaner and more understandable. I want to make sure this accommodates as many of the desired features as is practical, so I've skimmed this thread and drawn up a list of features we don't yet have but which have been suggested. Did I miss anything off?

Does anyone have any additional thoughts on what is/isn't worth supporting?
  • Allow inline comments to be added by the control file.
  • Allow more control over comments and white space:
    • Control over indentation
    • Perhaps word-wrapping for long comments?
    • Allow blank lines to be added
    • Preserve the order of multiple annotations at a given address
  • Allow more control over the formatting of data blocks:
    • Specify how many columns are used for byte/word data blocks
    • Allow binary, "picture binary" (if supported), decimal, hex or 'character' output to be specified for individual bytes/regions
  • Allow more control over the formatting of literal operands (same format options as for data blocks)
  • Support for analysing the decoded instruction stream and annotating it or amending it, e.g. to put an inline comment on where we "know" a branch is always taken or to rewrite an instruction sequence to use a macro.
    • While I imagine I will initially write some code to do simple pattern matching on this, ultimately there should be a standard way for a user control file to walk/rewrite the decoded instruction stream itself.
    • It's tempting to want the analysis of the decoded instruction stream to recursively inform the decoding process - e.g. if we infer a branch is always taken, we don't automatically assume the following bytes are code. This is not a bad idea, but it makes me uneasy because the disassembly process then becomes a bit more unpredictable (what if we only inferred a branch wasn't taken because of tracing some code following that branch). My inclination is to keep py8dis "single pass" and output suggestions to the user who can then update the control file accordingly if they wish.
    • Ideally we would be able to have "overlapping" decodings and have some code which decides how to resolve them. I don't think this is too big a deal, but I would like to resolve things down to a single view of what's code and what's data fairly early on in the disassembly process so most code can ignore this possibility.
  • Support for disassembling programs which dynamically copy multiple bits of code to the same address in memory (e.g. a sideways ROM copying different utility routines into RAM depending on what it's doing). This is always going to require some assistance from the user, but at the moment I think it's mostly unimplementable because of the single "consistent" 64K memory array used everywhere.
  • Evaluate expressions internally to check they generate the same value as the input without relying on assembler assertions to do this.
  • Support for assigning local labels (beebasm scopes, acme scopes/+/-, xa?)
    • According to the user's explicit instructions
    • Semi-automatically?
      • acme +/- labels could probably be assigned in a similar way to automatic loop detection.
      • If the user identified a scoped region (for beebasm at least), we could emit labels in that region as local by default but emit them as globals if they're referenced outside it.
  • Simpler support (without writing a user label maker) for using different label names depending on context; see this post for an example and a possible syntax.
I'm not saying I will implement any of this immediately, but I think these are all broadly reasonable features to have and I'd like the data structures and division of code to at least not get in the way of implementing them later.
Last edited by SteveF on Fri Oct 22, 2021 4:23 pm, edited 1 time in total.
User avatar
TobyLobster
Posts: 618
Joined: Sat Aug 31, 2019 7:58 am
Contact:

Re: py8dis - a programmable static tracing 6502 disassembler in Python

Post by TobyLobster »

Re: configs
I could see a use for having one configuration for a 'debug' mode (where some or probably everything is turned on), and one for a 'release' mode (everything off).
These configs could be tweaked in the control script, but perhaps --release and --debug (the default) could control which config was used?

Re: Further requests:

* Being able to add a comment before a set of related constants are defined (with a newline before and after).

Code: Select all

; Collision map ids
MapId_Platform = $01
MapId_Ladder = $02
MapId_Egg = $04
MapId_Seed = $08
* Naming these groups of related constants e.g. 'sprites', 'colours', etc and a version of expr() where you specify the address and group name, and py8dis fills in the appropriate constant.

* Defining a constant as an expression involving other constants, e.g. the output would be "Colour1 = Colour2 + Colour3"

* I assume a pattern matching/static analysis system could find cases such as:

Code: Select all

LDX #<addr
LDY #>addr
where 'addr' is a known label? This pattern in particular seems quite common. Support for finding these automatically would be useful.

* Do I have access to the label names, e.g. something like 'label_name = get_label(addr)' returning None if there is no label for that address?

* I like the idea of detecting labels pointing to RTS and naming the "rts1" etc. and "zp_" prefix for zero page variables seems fair enough.

* Are the "zp_" / "sub_" / "data_" / "loop_" / "rts" prefixes configurable?

* Context-specific labels: If more than one label is specified for a memory location (e.g. for a zero page location that is reused) then offer a hook to decide which to use.

* Option to output a list of labels that are defined but never used?

* With config:

Code: Select all

config.indent_loops = False
config.set_label_references(False)
config.set_hex_dump(False)
config.set_bytes_as_ascii(False)
config.show_autogenerated_labels = False
config.show_cpu_state = False
I am still getting these comments:

Code: Select all

    adc #$79 ; 'y'
Is there a way to turn these off too?

EDIT:
Re: Naming these groups of related constants e.g. 'sprites', 'colours', etc and a version of expr() where you specify the address and group name, and py8dis fills in the appropriate constant.
This + static analysis allows you to find pattern e.g.:

Code: Select all

LDA #<constant>
STA sprite_number
and fill in the appropriate constant without special intervention for each instance.
SteveF
Posts: 1663
Joined: Fri Aug 28, 2015 9:34 pm
Contact:

Re: py8dis - a programmable static tracing 6502 disassembler in Python

Post by SteveF »

Those are some good suggestions, thanks. I will look at updating the wishlist in the post above at some point in the next day or two. A few notes/observations (I've slightly re-ordered your post in quoting it):
TobyLobster wrote: Wed Sep 22, 2021 7:55 pm * Are the "zp_" / "sub_" / "data_" / "loop_" / "rts" prefixes configurable?

* Context-specific labels: If more than one label is specified for a memory location (e.g. for a zero page location that is reused) then offer a hook to decide which to use.
Although not very well documented (I made a bit of an effort to document the "master" branch, but all the recent work is sufficiently experimental that it's only documented via vague mentions in this thread) you should already be able to do this. You need to define a function to say how you want labelling done and register it, something like this:

Code: Select all

def my_label_maker(addr, context, suggestion):
    if addr < 0x100:
        return "zp_%04x" % addr
    return suggestion
    
set_label_maker_hook(my_label_maker)
That example is ridiculously simple/useless, but it shows the basic idea. "addr" is the address we want a label for, "context" is the address the label is being used at and "suggestion" is a suggestion for a label. The return value does *not* need to be a label you predeclared, it will be created if it doesn't already exist. Going to extremes, you could copy the body of the existing our_label_maker() function in disassembly.py into my_label_maker() and then tweak its logic as you wish.

I don't expect to change this basic idea in the planned rework either, although the precise details might alter. For example, there might be additional hooks to tweak parts of the label generation without needing to replace everything.
TobyLobster wrote: Wed Sep 22, 2021 7:55 pm * Do I have access to the label names, e.g. something like 'label_name = get_label(addr)' returning None if there is no label for that address?
I thought I knew the answer to this but as I try to explain it I realise I don't. I think what I've written below is true and reasonable and avoids the areas I'm currently still trying to think about before engaging in mass refactoring. :-)

If you're talking simply about labels declared by the control file, there's absolutely no reason such a function couldn't be provided. There is currently a get_label(addr, context) function which probably does what you want; context is "where you intend to use the label" and is just passed through to the label maker, so it's more of a hint than anything concrete. Edit: it will return a utils.LazyString object which you can use to construct other utils.LazyString objects but probably isn't really what you want. If you're actually wanting to do this now let me know and I'll see what I can come up with, if you're asking this in the context of the wishlist I agree it's useful and it should be included.

If you're talking about *any* label, bear in mind that when the control file is executing (unless you're writing code in some sort of hook or callback), we haven't done any tracing yet and any semi-automated marking of regions as code/data hasn't happened yet. We therefore don't know that we're going to need a way to refer to &2004 because there's an "LDA &2004,Y" instruction, so the answer to the question "is there a label at &2004?" is "dunno yet", even before we get onto the question of what that label might be called if it does come into existence.

I hope that makes sense. If you think that's bad/wrong/nonsense please argue with me. :-)
TobyLobster wrote: Wed Sep 22, 2021 7:55 pm I am still getting these comments:

Code: Select all

    adc #$79 ; 'y'
Is there a way to turn these off too?
I've just pushed a change which allows you to do:

Code: Select all

config.show_char_literals = False
to disable these.
User avatar
TobyLobster
Posts: 618
Joined: Sat Aug 31, 2019 7:58 am
Contact:

Re: py8dis - a programmable static tracing 6502 disassembler in Python

Post by TobyLobster »

Thanks for the detailed reply, all very interesting. I can confirm that turning off the character literals works.
if you're asking this in the context of the wishlist I agree it's useful and it should be included.
I am :-).
User avatar
TobyLobster
Posts: 618
Joined: Sat Aug 31, 2019 7:58 am
Contact:

Re: py8dis - a programmable static tracing 6502 disassembler in Python

Post by TobyLobster »

Corrections for the INKEY keys in acorn.py:

Additionally:

Code: Select all

     -27: "inkey_key_keypad_6",
     -28: "inkey_key_keypad_7",
     -33: "inkey_key_f0", 
Subtractionally:

Code: Select all

     -94: "inkey_key_equals",   # ?
     -95: "inkey_key_left",     # ?
     -96: "inkey_key_right",    # ?
SteveF
Posts: 1663
Joined: Fri Aug 28, 2015 9:34 pm
Contact:

Re: py8dis - a programmable static tracing 6502 disassembler in Python

Post by SteveF »

Thanks Toby, I've pushed those corrections to the "analyse" branch. -94 is shown on https://beebwiki.mdfs.net/Negative_INKEY_numbers as "=/+" but having double-checked a photo of a BBC B keyboard I can see that this must be for some other system. The others just look like simple errors as I bashed my way through the list. :-)
User avatar
TobyLobster
Posts: 618
Joined: Sat Aug 31, 2019 7:58 am
Contact:

Re: py8dis - a programmable static tracing 6502 disassembler in Python

Post by TobyLobster »

SteveF wrote: Wed Sep 22, 2021 5:05 pm The py8dis code is definitely getting to be a bit of a mess and I'm starting to get buried in the piles of hacks. I am mentally sketching out an improved internal design which I hope will make things cleaner and more understandable.
Hi Steve. Have you had any further thoughts about restructuring the code into a clean version? No pressure, just curious :-)
SteveF
Posts: 1663
Joined: Fri Aug 28, 2015 9:34 pm
Contact:

Re: py8dis - a programmable static tracing 6502 disassembler in Python

Post by SteveF »

Hi Toby, thanks for your continued interest. :-)

I have been tinkering with this intermittently. What's been eating at me is the handling of move() - this is clearly a very useful feature but implementing it cleanly so it:
  • doesn't tie the implementation in knots
  • doesn't force users doing simple things to learn py8dis's internal model for handling move()
  • is flexible enough to handle a variety of uses
has been a persistent headache. As a result I've been continually hacking at the code, so the code is an utter mess and that in turn is a little dispiriting, but there's no point cleaning it up when I'm still trying to get the move() implementation right.

(The current implementation of move() - as being used by the people reading this thread - is a hack which just barely works in limited cases and introduces some quirks which are hard to fix nicely.)

I think - famous last words - I have finally managed to come up with a model for this that should mostly "just work" for the user while having the flexibility to handle more complex cases and which won't be too confusing for the user or in the py8dis internals. If I can get this working, implementing the other neat features which have been suggested should actually be relatively easy and fun and might come together fairly quickly. So there is some prospect of a new version being announced here in the near-ish future, I haven't just quietly given up yet. :-)
Post Reply

Return to “development tools”