py8dis - a programmable static tracing 6502 disassembler in Python

Post by hoglet »

Hi Steve,

I have a question about whether it's possible to do something.

Yesterday I was working on a disassembly of the Psion Edit/Debug ROM for the Atom.

This ROM contains two entirely separate components: an editor and a debugger. These share the same zero page locations. I've currently labelled the zero page locations as per the editor use, but that means reading the debugger code is very confusing.

I would like to be able to constrain the use of a label to a subset of the code being disassembled, for example:

# Editor
label(0x007C, "MARKER1L", 0xA000, 0xAB91)
label(0x007D, "MARKER1H", 0xA000, 0xAB91)

# Debugger
label(0x007C, "REG_A", 0xAB92, 0xAFFF)
label(0x007D, "REG_X", 0xAB92, 0xAFFF)

Any thoughts about whether something like this is possible, or might be in the future?

Dave

Post by SteveF »

hoglet wrote: Fri Oct 22, 2021 10:45 am This ROM contains two entirely separate components: an editor and a debugger. These share the same zero page locations. I've currently labelled the zero page locations as per the editor use, but that means reading the debugger code is very confusing.

I would like to be able to constrain the use of a label to a subset of the code being disassembled, for example:

# Editor
label(0x007C, "MARKER1L", 0xA000, 0xAB91)
label(0x007D, "MARKER1H", 0xA000, 0xAB91)

# Debugger
label(0x007C, "REG_A", 0xAB92, 0xAFFF)
label(0x007D, "REG_X", 0xAB92, 0xAFFF)

Any thoughts about whether something like this is possible, or might be in the future?

Hi Dave,

You should be able to do this by supplying your own "label maker" function, as sketched in this post. The context passed to the label maker is the address at which the label is being referenced, so if you check to see whether it falls in the editor or debugger region of code in your ROM you should be able to return different labels for the same zp addresses.

Please give that a try and let me know how you get on. It's possible there are bugs in this area as it's probably one of the least tested bits of code, but if you do get stuck I'll be happy to take a look.

It wouldn't be a bad idea to extend the label() function to allow you to do this more simply, but it hasn't been done yet; I've added it to the todo list earlier up the thread. (You could implement this yourself by writing a mylabel() function which populates its own data structure and then have your label maker hook reference that data structure, but it's probably not worth the effort.)

Cheers.

Steve

Post by hoglet »

SteveF wrote: Fri Oct 22, 2021 4:25 pm Please give that a try and let me know how you get on. It's possible there are bugs in this area as it's probably one of the least tested bits of code, but if you do get stuck I'll be happy to take a look.
Thanks Steve, I'll give that a try over the weekend.

Dave

Post by hoglet »

SteveF wrote: Fri Oct 22, 2021 4:25 pm Please give that a try and let me know how you get on. It's possible there are bugs in this area as it's probably one of the least tested bits of code, but if you do get stuck I'll be happy to take a look.
This is what I ended up with, and it worked very well:

def my_label_maker(addr, context, suggestion):
    # context is the address the reference comes from: the editor code
    # lies below 0xAB92 and the debugger code from 0xAB92 upwards.
    if context < 0xAB92:
        if addr == 0x000a: return "ALTTOPX"
        if addr == 0x000b: return "ALTTOLH"
        if addr == 0x000c: return "ALTPAGEH"
        if addr == 0x0072: return "LINEPTRL"
        if addr == 0x0073: return "LINEPTRH"
        if addr == 0x0079: return "l0079"
        if addr == 0x007C: return "MARKER1L"
        if addr == 0x007D: return "MARKER1H"
        if addr == 0x007E: return "MARKER2L"
        if addr == 0x007F: return "MARKER2H"
    else:
        if addr == 0x000a: return "BRKADRL"
        if addr == 0x000b: return "BRLADRH"
        if addr == 0x000c: return "BRKINSTR"
        if addr == 0x0072: return "REGPCL"
        if addr == 0x0073: return "REGPCH"
        if addr == 0x0079: return "REGSP"
        if addr == 0x007C: return "REGA"
        if addr == 0x007D: return "REGX"
        if addr == 0x007E: return "REGY"
        if addr == 0x007F: return "REGP"
    # Any other address keeps the label py8dis suggested.
    return suggestion

Thanks very much for the suggestion.
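
For a longer list of addresses, the same idea can be made table-driven, roughly along the lines of the mylabel() helper Steve mentions. This is only a sketch: mylabel() and scoped_label_maker() are hypothetical names, not part of py8dis, and it assumes the hook keeps the (addr, context, suggestion) signature used above. How the hook gets registered with py8dis is not shown.

```python
# Hypothetical helper, not part of py8dis: labels scoped to a range of
# referencing addresses.
_scoped_labels = []  # entries are (addr, name, first_ref, last_ref)

def mylabel(addr, name, first_ref, last_ref):
    """Use `name` for `addr` when referenced from first_ref..last_ref inclusive."""
    _scoped_labels.append((addr, name, first_ref, last_ref))

def scoped_label_maker(addr, context, suggestion):
    """Label maker hook: `context` is the address the reference comes from."""
    for label_addr, name, first_ref, last_ref in _scoped_labels:
        if label_addr == addr and first_ref <= context <= last_ref:
            return name
    return suggestion  # fall back to the label that was suggested

# Editor
mylabel(0x007C, "MARKER1L", 0xA000, 0xAB91)
mylabel(0x007D, "MARKER1H", 0xA000, 0xAB91)
# Debugger
mylabel(0x007C, "REGA", 0xAB92, 0xAFFF)
mylabel(0x007D, "REGX", 0xAB92, 0xAFFF)
```

The linear scan is fine for a handful of zero page locations; a dictionary keyed on addr would be the obvious next step if the table grew.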

Dave

Post by SteveF »

Thanks Dave, it's good to know this works!

Post by SteveF »

As I said back in October (!), I've been intermittently tinkering with py8dis and fighting myself over the handling of move(). This has sapped my enthusiasm somewhat, but I think I've now got something which is good enough to ask for user feedback on. I can't stress enough how alpha quality this version of py8dis is; please do give it a try and please let me know how you get on, but if things don't seem to be working it's quite possibly a bug, flaw or piece of idiocy in the py8dis code, so don't spend ages bashing your head against a wall - post here with a link to your control file and binary and I'll take a look.

Before I forget, the code is on GitHub here. (The branch name is a bit arbitrary, don't read anything into it!) The README has *not* been updated since this thread started as it's all been experimental changes.

Before I get onto move(), there are some other simpler but hopefully useful changes:
  • The formatting of blocks of byte and word data has been tidied up. byte() and word() now take an optional "cols=n" argument which specifies how many items should be placed on each line.
  • The hex dump has been tidied up a bit.
  • Comments added with comment() are now word-wrapped automatically; newlines within them are still respected. You can avoid this by using formatted_comment(); the name is intended to convey the idea that you've already formatted the comment, so py8dis should leave it alone.
  • annotate(address, string) will include the raw string at the indicated point in the disassembly. This may or may not be useful in taking advantage of assembler features which py8dis can't use automatically.
  • blank(address) will add a blank line at the indicated point in the disassembly.
  • You can provide hints on how literal values (immediate operands or bytes/words within a byte()/word() block) should be represented. All these commands take an address and an optional number of bytes (defaulting to 1). It's harmless to specify these hints on bytes/words which don't turn into literals in the disassembly; they will just be ignored.
    • uint(addr, n=1) specifies that the value should be represented as an unsigned integer; values <10 will be shown as decimal, other values as hex with leading zeros.
    • char(addr, n=1) specifies that the value is an ASCII character and should be expressed as a character literal like 'A'; if it isn't actually representable in this way, the same conventions as uint() will be applied.
    • binary(addr, n=1) specifies that the value should be expressed in binary.
    • picture_binary(addr, n=1) specifies that the value should be expressed in "picture binary" (e.g. %...##...) if supported by the current assembler (binary otherwise)
    • decimal(addr, n=1) specifies that the value should be expressed in decimal.
    • hexadecimal(addr, n=1) specifies that the value should be expressed in hexadecimal.
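
To make the representations above concrete, here is a rough standalone sketch of the formatting rules as described. It is not the py8dis implementation: the function names, the "$" hex prefix, the two-digit lower-case hex and the eight-digit "%" binary form are all assumptions made for illustration.

```python
def as_uint(value, hex_prefix="$"):
    """Values below 10 as decimal, anything else as two-digit hex."""
    return str(value) if value < 10 else "%s%02x" % (hex_prefix, value)

def as_char(value):
    """Printable ASCII as a character literal, otherwise fall back to as_uint()."""
    if 32 <= value < 127 and chr(value) != "'":
        return "'%s'" % chr(value)
    return as_uint(value)

def as_binary(value):
    """Eight binary digits with a % prefix."""
    return "%" + format(value, "08b")

def as_picture_binary(value):
    """Binary with '.' for 0 bits and '#' for 1 bits."""
    return as_binary(value).replace("0", ".").replace("1", "#")

print(as_uint(9), as_uint(10), as_char(0x41), as_picture_binary(0x18))
```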
Coming back to move(), it's tempting to elaborate at length on the underlying model but I won't do that yet, because I hope it's not necessary for most users to care. I will elaborate later in response to problems being reported or (in the unlikely event that doesn't happen) make a separate post later for reference. I hope most of what you need to know is covered by the following:
  • The move() arguments are the same as before - move(dest, source, length). I don't believe in practice you could get away with it before anyway, but the source address must now always identify a section of the file you load() - you can't move a chunk of data, then move from that chunk of data.
  • You can now move() more than one block of code/data to the same address. This is helpful, for example, when disassembling a ROM which copies different code fragments into the same part of main RAM (e.g. the NMI space at &D00) at different times. py8dis will semi-magically try to keep everything straight when you do this, so (for example) automatically generated labels are defined inline in the correct place.
  • You cannot move() the same block of load()ed code/data more than once. If you try to do this, the last move applying to any particular byte is what counts; this is allowed rather than treating this as an error to make it easier to move sub-chunks of data to different places without needing to manually split things up into non-overlapping ranges in the control file.
  • move() returns an object representing the moved data. You can save this in a variable and subsequently write "with move_object_a:" to help py8dis understand the context of the addresses you provide in calls like entry() and label().
  • In order to avoid confusion, it's probably best to do all the move() calls as early as possible in the control file. (You'll probably want to break this rule later, but do it like this to start with.)
  • To get the best results, you should probably move() as little as possible. For example, if you're dealing with a large binary that relocates itself on startup, it's probably best to load() it at the final address and use move() to put the self-relocation code at the correct address, rather than load()ing it at the address it would be loaded on a real machine and using move() to express what the self-relocation code does.
  • The hex dump shows both the "source" and "destination" addresses for move()d regions, with an integer in square brackets indicating the move ID (the value returned by move()).
  • move() should more-or-less work on all assemblers, but acme has the "nicest" output, in my opinion.
examples/move3.py is probably the best simple demonstration of using move() with overlapping regions.
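
The "last move applying to any particular byte is what counts" rule can be pictured with a small standalone model. This is a sketch of the rule as described above, not py8dis code: resolve_moves() is a hypothetical name, the addresses are invented, and the move IDs are simply positions in the list.

```python
def resolve_moves(moves):
    """Map each loaded source address to its final (dest_address, move_id).

    moves is a list of (dest, source, length) tuples in control-file order,
    mirroring the move(dest, source, length) argument order.
    """
    final = {}
    for move_id, (dest, source, length) in enumerate(moves, start=1):
        for offset in range(length):
            # A later move covering the same source byte replaces the earlier one.
            final[source + offset] = (dest + offset, move_id)
    return final

moves = [
    (0x0D00, 0x9000, 0x100),  # move a whole page of the ROM to &D00
    (0x0400, 0x9080, 0x20),   # then send a sub-chunk of it somewhere else
]
placement = resolve_moves(moves)
print([hex(a) for a in placement[0x9080][:1]], placement[0x9080][1])
```

Bytes &9080-&909F end up belonging to the second move, while the rest of the page still belongs to the first, which is the "no need to split things into non-overlapping ranges" convenience the post describes.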

If you're interested, particularly if you're a user of move(), please give this a try and let me know what you think. If we can hammer out a satisfactory and implementable behaviour for move() it would be a lot easier to push forward and make various other improvements knowing I am building on relatively firm foundations.

Post by TobyLobster »

Hi Steve,

I have tried some of the changes and they work great. In particular the formatting with decimal() and hexadecimal(), the columns specified by byte(), comment/formatted_comment all work well. There are a few minor issues:

* I couldn't get the columns specifier of word() to work.
* The hex dump for word()s is incomplete - looks like it might be confusing the number of words with the number of bytes?
* We have a large comment output starting with 'All labels by address and move ID' - but can we switch this output off?

I've only tested against my test case of the Chuckie Egg disassembly, but it has a single regular move() and the move() works fine.

Toby

Post by TobyLobster »

Also the picture sprite formatting works nicely:

sprite_bigbirdright2
    !byte %........, %.###....                              ; 
    !byte %........, %#####..#                              ; 
    !byte %.......#, %###.#.#.                              ; 
    !byte %.......#, %###.##..                              ; 
    !byte %.......#, %######..                              ; 
    !byte %.......#, %#####.#.                              ; 
    !byte %........, %####...#                              ; 
    !byte %........, %###.....                              ; 
    !byte %........, %.##.....                              ; 
    !byte %....###., %.###....                              ; 
    !byte %...#####, %.###....                              ; 
    !byte %..######, %#####...                              ; 
    !byte %.###...#, %#####...                              ; 
    !byte %###.###., %.####...                              ; 
    !byte %##.#####, %#.###...                              ; 
    !byte %#.######, %######..                              ; 
    !byte %########, %######..                              ; 
    !byte %########, %######..                              ; 
    !byte %.#######, %######..                              ; 
    !byte %.#######, %######..                              ; 
    !byte %..######, %#####...                              ; 
    !byte %...#####, %#####...                              ; 
    !byte %....####, %####....                              ; 
    !byte %......##, %###.....                              ; 

Post by SteveF »

Thanks for trying that Toby, I'm glad it's mostly working!

I've pushed some changes to the same branch to fix the problems you found with word().

The cols problem was simply that I'd missed that argument off the word() function so at least that was an easy fix.

Your diagnosis of the hex dump problem was right - it was outputting n bytes where n was the number of words, not n*item_size bytes. (The number of bytes output is still capped at 3, but it was just wrong before as it would do things like output only 2 bytes when there were 4 eligible for output.)
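
As a one-line illustration of that fix (a sketch only; hex_dump_byte_count() is a hypothetical name and the real code is organised differently), the byte count needs to be the item count multiplied by the item size, then capped:

```python
def hex_dump_byte_count(item_count, item_size, cap=3):
    """Bytes to show in the hex dump: every byte of every item, up to the cap."""
    return min(item_count * item_size, cap)

# Two words are four bytes, so the dump is capped at three bytes rather
# than stopping at two as it did when the item count was used directly.
print(hex_dump_byte_count(2, 2))
```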

I've added a config.show_all_labels boolean to allow the "All labels by address and move ID" to be controlled. This defaults to False (off) and that's probably sensible, but I may end up committing a version at some point during development with it set to True (this is useful for trying to figure out why the automagic handling of labels and moves isn't doing what it should), so you might want to explicitly set it to False in your control file.

Please let me know if these fixes work or not and of course if you find any more problems.

Cheers.

Steve

Post by TobyLobster »

Looking good - that all works: The word col works, the show_all_labels option is off by default, and the hex dump is right for word()s.

Post by SteveF »

Excellent, thanks!

Post by MarkMoxon »

Hi Steve.

Yesterday I started a disassembly of Revs, and just thought I'd say how much I'm enjoying working with py8dis. Nothing to report yet, except to say that I'm already up and running with a working reassembly in far less time than it took with my previous disassembly solution. I'm looking forward to getting stuck in to the feature set - especially as Revs does quite a lot of code-moving. I'll shout if I get stuck!

Bravo, what a great tool. =D>

Mark
Last edited by MarkMoxon on Sat Jan 08, 2022 7:41 pm, edited 1 time in total.

Post by SteveF »

Thanks Mark! I'm glad it's working for you so far, I look forward to hearing how you get on - I'll keep an eye on this thread with a certain amount of trepidation. :-) I'm really glad you're taking a look at Revs, I'm sure the results will be as fascinating as your work on Elite.

Post by MarkMoxon »

Hi Steve.
SteveF wrote: Fri Jan 07, 2022 2:17 pm Thanks Mark! I'm glad it's working for you so far, I look forward to hearing how you get on - I'll keep an eye on this thread with a certain amount of trepidation. :-) I'm really glad you're taking a look at Revs, I'm sure the results will be as fascinating as your work on Elite.
As promised, here's some feedback on my experience with Revs, and in particular with move(). Sorry about the length of this post - I'm trying to condense a few days of heavy py8dis work into one message, and it's making my eyes cross! :-)

py8dis has been brilliant - much quicker than my previous solution. The output is easy to work with, it seems to get almost everything right without needing hints, and I love it. Great job!

I've been using it to knock together a Revs reassembly with all the data blocks identified, all the code blocks disassembled, and everything being assembled at their correct addresses, and I've got there in record time. I've now moved to the "text editor" stage of the commentary, but py8dis has given me a great start.

That said, it wasn't all plain sailing. In particular, the move() command worked fine for most blocks, but not for all. In the end I had to split up the game binary into smaller files, load them individually, and then add COPYBLOCKs at the end using annotate()... which got things working, but it feels like a bit of a hack. See the last part of this message for more thoughts on the split-file approach.

The issue is with Revs, and in particular with its unpacking process reusing memory locations. The binary loads into &1200-&6FFF, and the track data loads into &70DB-&7813, and then there's an unpacking process with multiple steps:
  • Move &1200-&12FF to &7900-&79FF
  • Move &5300-&5949 to &70DB-&7724
  • Move &1500-&15DA to &7000-&70DA
  • Move &1300-&14FF to &0B00-&0CFF
  • Move &5A80-&645B to &0D00-&16DB
  • Move &64D0-&6BFF to &5FD0-&66FF
The first four moves work fine with a normal move() command, but the last two fail. This is because the destinations for these moves overlap the previous move blocks. If I add a move() for each of the blocks above, the BeebAsm output does the following:
  • Assemble code at &7900-&79FF then move it to &1200-&12FF
  • Assemble code at &0B00-&0CFF then move it to &1300-&14FF
  • Assemble code at &7000-&70DA then move it to &1500-&15DA
  • Assemble code at &70DB-&7724 then move it to &5300-&5949
  • Assemble code at &0D00-&16DB then move it to &5A80-&645B
  • Assemble code at &5FD0-&66FF then move it to &64D0-&6BFF
The penultimate step fails because trying to assemble code at &0D00-&16DB will overlap the code that's already at &1200 (from the first step) and &1300 (from the second step) and &1500 (from the third step). Similarly, the last step fails because &5FD0-&66FF clashes with the code from the previous step.
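
The clashes can be checked mechanically. The following is a simplified standalone model of the situation described above, not anything py8dis does: find_clashes() is a hypothetical name, blocks are processed in packed-binary order, and a clash is reported when a block's runtime range overlaps the packed location of a block that has already been placed. The runtime end address is derived from the block length.

```python
# (packed_start, packed_end, runtime_start), packed range inclusive.
REVS_MOVES = [
    (0x1200, 0x12FF, 0x7900),
    (0x5300, 0x5949, 0x70DB),
    (0x1500, 0x15DA, 0x7000),
    (0x1300, 0x14FF, 0x0B00),
    (0x5A80, 0x645B, 0x0D00),
    (0x64D0, 0x6BFF, 0x5FD0),
]

def overlaps(a_start, a_end, b_start, b_end):
    """True if two inclusive address ranges share at least one byte."""
    return a_start <= b_end and b_start <= a_end

def find_clashes(moves):
    """Return (runtime_start, [packed_start of each block it collides with])."""
    clashes = []
    placed = []  # packed ranges occupied by blocks emitted so far
    for packed_start, packed_end, runtime_start in sorted(moves):
        runtime_end = runtime_start + (packed_end - packed_start)
        hits = [p for p in placed if overlaps(runtime_start, runtime_end, *p)]
        if hits:
            clashes.append((runtime_start, [p[0] for p in hits]))
        placed.append((packed_start, packed_end))
    return clashes

for runtime_start, hit_starts in find_clashes(REVS_MOVES):
    print("&%04X clashes with %s" % (runtime_start, ["&%04X" % h for h in hit_starts]))
```

Under this model only the blocks that run at &0D00 and &5FD0 are flagged, which matches the two failing moves.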

It might be possible to get around this somehow by moving the COPYBLOCK commands to the end of the file... but this might be pretty tricky to do generically. Also, the output assembles code in the order in which it appears in the binary file, so you can't rearrange the steps above. Finally, these move blocks form well-formed code when put together, but the splits can happen in the middle of instructions, meaning that a block can start and end with truncated instructions, which is a problem when each block is treated as a self-contained ORG...COPYBLOCK construct.

I spent a day or so trying to hack it to work using annotate(), but in the end I figured that perhaps this kind of James Webb-style unpacking might be better implemented by splitting up the binary, which is what I did (see below).

If you are interested in seeing the problem, check out this folder in the py8dis-move branch of my repo, which contains the Revs.bin game binary, and a super-simple revs-source.py file that demonstrates the issue (I've added comments to the move() commands that cause the problem):

https://github.com/markmoxon/revs-beeba ... les/py8dis

***

Personally, I like the move() command as it is, and I wouldn't necessarily try to build in support for this use case. Here are some thoughts on this.

I eventually went for the option of splitting up the binary file into smaller parts, and loading each one at the correct assembly address (rather than the address in the binary file). I then use annotate() to insert a bunch of COPYBLOCK commands at the end of the generated source to pack everything back into the final file. It works... but it's clearly a kludge. See the master branch for the revs-source.py file:

https://github.com/markmoxon/revs-beeba ... les/py8dis

I like this method as it generates a BeebAsm file where the code is shown in the order that it lives in memory when the game is running, rather than the order in the packed binary file. The blocks in the packed binary file only make sense when they are in this order; indeed, as mentioned above, the splitting process doesn't care whether it's cutting instructions in half, so ordering the code in the packed order rather than the unpacked order makes things hard to follow.

To support slicing of assembled code into blocks that get packed arbitrarily into the binary, you could consider adding an extension to the load() command that could load a portion of a file at a specific address, and then you could use that information to add a COPYBLOCK instruction to the end of the generated file. For example, when I split up Revs into its constituent blocks, I used this kind of thing:

load(0x7900, "1200-12ff.bin")
load(0x0b00, "1300-14ff.bin")
load(0x7000, "1500-15da.bin")
load(0x16dc, "16dc-5a7f.bin")
load(0x0d00, "5a80-645b.bin")
load(0x5fd0, "64d0-6bff.bin")
load(0x6c00, "6c00-6fff.bin")

A reminder that the Revs binary file gets loaded at &1200-&6FFF, so the filenames above reflect where each file sits in the packed binary (the gap at &15DB-&16DB is intentional - it just contains background noise in the game binary). The load() commands load these individual files at the memory locations where they end up being unpacked to.

The above correctly generates disassembly at the right places, so the first one is at ORG &7900, the second one at ORG &0B00, and so on. There are no overlap problems, as this is the unpacked state of the game.

I then have the following at the end of the file (0x8101 happens to be the last address, so this inserts the commands at the end):

annotate(0x8101, "COPYBLOCK &5FD0, &6700, &64D0")
annotate(0x8101, "COPYBLOCK &0D00, &16DC, &5A80")
annotate(0x8101, "COPYBLOCK &7000, &70DB, &1500")
annotate(0x8101, "COPYBLOCK &0B00, &0D00, &1300")
annotate(0x8101, "COPYBLOCK &7900, &7A00, &1200")

There are fewer annotate() commands than load() commands because the 16dc-5a7f.bin and 6c00-6fff.bin files don't get moved.

This works, but it's obviously a hack (and I also have to change the SAVE command, as that's using the assembly addresses, rather than the binary file addresses).

If you did want to support arbitrarily cut-up blocks of code without needing this hack, then you could add, say, a move_to= parameter, like this:

load(0x7900, "1200-12ff.bin", move_to=0x1200)
load(0x0b00, "1300-14ff.bin", move_to=0x1300)
load(0x7000, "1500-15da.bin", move_to=0x1500)
load(0x16dc, "16dc-5a7f.bin")
load(0x0d00, "5a80-645b.bin", move_to=0x5a80)
load(0x5fd0, "64d0-6bff.bin", move_to=0x64d0)
load(0x6c00, "6c00-6fff.bin")

This adds a parameter containing the address of this code within the final binary. Along with the file size, you could generate the COPYBLOCK commands to insert at the end.
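
As a rough illustration of that last step, the COPYBLOCK lines could be derived from the load address, the file size and the proposed move_to value. This is a standalone sketch, not py8dis code: copyblock_lines() is a hypothetical name, move_to is the suggested (not existing) parameter, and the sizes are worked out from the filenames. The lines are emitted in reverse load order simply because that reproduces the hand-written list for Revs; in general the copy order matters and may need more thought.

```python
# (load_addr, size_in_bytes, move_to); move_to is None for unmoved blocks.
REVS_LOADS = [
    (0x7900, 0x0100, 0x1200),
    (0x0B00, 0x0200, 0x1300),
    (0x7000, 0x00DB, 0x1500),
    (0x16DC, 0x43A4, None),
    (0x0D00, 0x09DC, 0x5A80),
    (0x5FD0, 0x0730, 0x64D0),
    (0x6C00, 0x0400, None),
]

def copyblock_lines(loads):
    """Build 'COPYBLOCK start, end, dest' lines (end is exclusive)."""
    lines = []
    for load_addr, size, move_to in reversed(loads):
        if move_to is not None:
            lines.append("COPYBLOCK &%04X, &%04X, &%04X"
                         % (load_addr, load_addr + size, move_to))
    return lines

for line in copyblock_lines(REVS_LOADS):
    print(line)
```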

Taking this to the next level, you could even extend load() to allow for loading of just a portion of a file, so we wouldn't have to split the binary up into separate files. Say Revs.bin is our game binary, then we could have:

load(0x7900, "Revs.bin", slice_start=0x0000, slice_end=0x00ff, move_to=0x1200)
load(0x0b00, "Revs.bin", slice_start=0x0100, slice_end=0x02ff, move_to=0x1300)
load(0x7000, "Revs.bin", slice_start=0x0300, slice_end=0x03da, move_to=0x1500)
load(0x16dc, "Revs.bin", slice_start=0x04dc, slice_end=0x487f)
load(0x0d00, "Revs.bin", slice_start=0x4880, slice_end=0x525b, move_to=0x5a80)
load(0x5fd0, "Revs.bin", slice_start=0x52d0, slice_end=0x59ff, move_to=0x64d0)
load(0x6c00, "Revs.bin", slice_start=0x5a00, slice_end=0x5dff)

The slice_* parameters are the start and end offsets of the blocks within the binary file.
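
The offsets are just the packed addresses minus the address the binary loads at (&1200 for Revs). A small standalone sketch of that arithmetic, with hypothetical helper names and both ends treated as inclusive, as in the example above:

```python
PACKED_BASE = 0x1200  # Revs.bin loads at &1200

def slice_for(packed_start, packed_end, base=PACKED_BASE):
    """Return inclusive (slice_start, slice_end) file offsets for a packed range."""
    return packed_start - base, packed_end - base

def take_slice(data, slice_start, slice_end):
    """Extract the bytes for an inclusive offset range."""
    return data[slice_start:slice_end + 1]

start, end = slice_for(0x5A80, 0x645B)
print("slice_start=0x%04x, slice_end=0x%04x" % (start, end))
```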

This would let you support binaries that unpack in overlapping and arbitrarily complex ways, without having to manually split the binary into files. Rather than trying to extend the move() command to support such complex packing (which might open up a can of worms), adding support for file-splitting might make a nice companion to move(), for disassemblies that need it.

Anyway, these are just vague ramblings - I just thought I'd write them down in case anything sounds good.

Thanks again for py8dis. It's great!

Mark

Post by SteveF »

MarkMoxon wrote: Sun Jan 09, 2022 3:51 pm As promised, here's some feedback on my experience with Revs, and in particular with move(). Sorry about the length of this post - I'm trying to condense a few days of heavy py8dis work into one message, and it's making my eyes cross! :-)

py8dis has been brilliant - much quicker than my previous solution. The output is easy to work with, it seems to get almost everything right without needing hints, and I love it. Great job!
Thanks Mark, I'm glad it worked well for you. I appreciate you taking the time to give such detailed feedback, including a test case - that's really helpful. Sorry I didn't reply earlier but I wanted to take the time to try to understand your points properly. This post is me thinking out loud to some extent.

The root of the problem here (apart from laziness on my part :-) ) is that py8dis is trying to support the beebasm, acme and xa assemblers. In general the difference between assemblers is fairly minor syntactic stuff, but beebasm has a completely different model for "relocated" code than the other two.

Any experts/aficionados of these assemblers please correct me if I'm wrong, but to summarise the behaviour (for the benefit of readers in general):

beebasm treats memory like a canvas and you can assemble code (or data, of course) anywhere in the 64K address space then copy it around afterwards. So if you have a block of code that runs at &1200 and a block of code that runs at &3000, you assemble them at those addresses. If you want them at &5300 and &7000 in the generated binary, you then use COPYBLOCK to relocate the results of those other assemblies there, then SAVE exactly the portion of memory you want.
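
A toy model of the "canvas" idea, for illustration only: this is not how beebasm is implemented, the function names are made up, and the three-byte code fragments are placeholders. It uses the same addresses as the example in the paragraph above and treats the end address as exclusive.

```python
memory = bytearray(0x10000)  # the 64K canvas

def assemble(address, data):
    """Place assembled bytes at the address the code runs at."""
    memory[address:address + len(data)] = data

def copyblock(start, end, dest):
    """Copy memory[start:end] to dest, end exclusive."""
    memory[dest:dest + (end - start)] = memory[start:end]

def save(start, end):
    """Return the portion of the canvas that would be written to disc."""
    return bytes(memory[start:end])

assemble(0x1200, b"\xa9\x01\x60")  # code that runs at &1200
assemble(0x3000, b"\xa9\x02\x60")  # code that runs at &3000
copyblock(0x1200, 0x1203, 0x5300)  # where it lives in the saved binary
copyblock(0x3000, 0x3003, 0x7000)
image = save(0x5300, 0x7003)
print(len(image))
```

Because every block shares the one canvas, the order of the assemble and copy steps starts to matter as soon as the ranges overlap.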

acme and xa turn the stream of assembler input into a stream of output bytes, with the order preserved. If there's some code starting at &7000 which includes a block of code that will be relocated to &D00 at runtime, you'd write (using the acme syntax):

* = $7000
    lda #42           ; 7000: a9 2a
    jmp $ffee         ; 7002: 4c ee ff
!pseudopc $d00 {
    lda #25           ; 7005: a9 19
foo                   ; label foo has value $d02
    sta $70           ; 7007: 85 70
    rts               ; 7009: 60
}
    lda #42           ; 700a: a9 2a

The acme/xa approach is less flexible but by the same token simpler for py8dis to generate output for as there are no decisions to be made about the ordering; we have no choice but to follow the ordering in the binary input.
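
The address arithmetic inside a !pseudopc block is a straight offset from the start of the block. A minimal sketch (pseudopc_address() is a hypothetical helper, not an acme or py8dis function), using the block above, which starts at file address $7005 and runs at $d00:

```python
def pseudopc_address(file_addr, block_file_start, pseudo_base):
    """Runtime address of a byte that sits at file_addr inside a pseudopc block."""
    return pseudo_base + (file_addr - block_file_start)

# The byte at $7007 (the sta instruction) is two bytes into the block.
print(hex(pseudopc_address(0x7007, 0x7005, 0x0D00)))
```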

beebasm is much more flexible but that comes at a cost. I had naively been trying to emulate the acme/xa approach with the beebasm output, but as your example at https://github.com/markmoxon/revs-beeba ... les/py8dis shows, that doesn't work. Note that if I run that with the -a flag to generate acme output, the output correctly reassembles the input. I do think it's bad that the beebasm output doesn't assemble; py8dis should really generate something that works, even if it's not optimal. (However, it may be that getting this right automatically is too difficult; further wafflings on this below.) I was aware there was almost certainly a lurking problem in this area, but I thought I'd wait until a concrete example of a problem turned up.

I like your suggestions to extend load() and I don't think they'd be at all hard to implement, as long as I made these extensions only supported for beebasm output. This feels a little bit hacky, but perhaps it just reflects the different approach taken to relocation by beebasm, so it's OK. If you're working on a project with a lot of memory reshuffling where the differences between the assemblers start to become more important, it's probably not unreasonable for the control file to only work with a particular assembler.

(Incidentally, I had been wondering about the usefulness of allowing multiple load() commands in a control file, but since you turned out to be able to use it to work round other limitations I'm glad I didn't arbitrarily restrict things to just one load().)

Even with the extended load(), I suspect there'd still be some lingering scope for tying things in knots by using move() on beebasm, but in practice it would probably be OK, and I could always worry about that when it happens. :-) move() is still needed to handle some cases where (in a sideways ROM, for example) multiple fragments of code can be copied to the same address at different times (e.g. different NMI handlers in a filing system).

It feels like it should be possible to make move() alone work on beebasm, but it would probably need some tricksy logic to assemble and COPYBLOCK things in exactly the right sequence. That is an interesting problem but I'm not sure it's one I really want to get into here; it just feels like I'd never be quite confident I had it right and even when it worked it would probably generate output which wasn't as good as a human could produce using your extended load() approach. (Edit: For example, I suspect it's possible - if maybe not realistic - to have a case where generating "correct" beebasm output would require a COPYBLOCK to a temporary area before a second COPYBLOCK to the final location. I could be wrong about this, but I have a feeling that solving the ordering problem for beebasm output automatically is probably not rocket science but surprisingly fiddly to do in a completely general way.)

As I said above, I'm kind of thinking out loud here. If you or anyone else have any further thoughts on this I'm interested in them, if not I will probably have a look at implementing the extended load() in the near-ish future and then wait to see what breaks the next time someone tries to use py8dis. :-)

(And if I've somehow got completely the wrong end of the stick here please let me know!)

Post by MarkMoxon »

Hi Steve.

Thanks for the reply.
SteveF wrote: Tue Jan 11, 2022 10:03 pm The root of the problem here ... is that py8dis is trying to support the beebasm, acme and xa assemblers.
Ah, OK, that explains a lot!

Looking at the acme and xa pseudopc syntax, I'm guessing that it wouldn't be easy to support Revs-like block moving without hacks of their own. This is because Revs chops up the assembled code into blocks that it then sticks together to create the final binary, but it does this chopping regardless of instruction boundaries (so instructions get split between different blocks). I suppose the only way to support this in acme and xa would be to split any instructions that lie across pseudopc boundaries into EQUBs, and put them on either side of the pseudopc brackets, which wouldn't be very elegant.

I suspect that trying to support arbitrary file structures like Revs in py8dis is probably a lot of effort for little gain. The extra parameters for load() would probably be useful (certainly the split= parameter would save the effort of slicing the binary file), but they aren't crucial, as the same effect can already be achieved manually. I suspect that most binaries don't do this kind of exploded-puzzle approach, so this might be pretty specialist stuff.

Here are a couple of other suggestions that spring to mind:
  • The annotate() hack is brilliant for solving this kind of issue, but adding an annotation containing, say, COPYBLOCK will presumably break the output for non-BeebAsm assemblers. Would it be worth adding assembler-specific versions, say annotate_beebasm(), for things like this?
  • Not sure how difficult it would be, but would it be possible to add an option to change the order of the disassembly to reflect the assembly address, rather than the address within the binary? Disassembling in binary order is a really sensible default, but when the binary is all over the place, and splitting the file is the only option, the latter would be useful.
As always, these are just thoughts - py8dis already does 99% of what I want from a disassembler, and it's been brilliant to work with!

Mark
SteveF
Posts: 1697
Joined: Fri Aug 28, 2015 9:34 pm
Contact:

Re: py8dis - a programmable static tracing 6502 disassembler in Python

Post by SteveF »

Thanks Mark! Those are good suggestions and something like annotate_beebasm() would not be hard to do. Changing the order of the disassembly might be a bit fiddlier, but is probably doable. If I understand your suggestion and how the different assemblers work correctly, this would only really be feasible for beebasm output, because the !pseudopc-style approach seems to me to require output in binary order.

Just waffling generally (reminded by your annotate_beebasm() suggestion, not arguing against it): While I don't plan to make it unnecessarily hard for the same control file to be used to generate output for different assemblers, as soon as you get into things like move() they are sufficiently different that I think that requiring the author of the disassembly to decide "I want output for assembler X" isn't all that unreasonable. In other words, if Fred likes beebasm, Jim likes acme and Sheila likes xa, they should all be able to use py8dis, but Jim can't expect to take one of Fred's control files but use it to generate acme output and have it Just Work.

I think that since the code seems to be working OK, what I will do next is look to tidy up the code a bit and smooth off some of the many rough edges. Once that's done I can probably look at providing a beebasm-specific bit of code which is capable of emitting the disassembly in assembly address order; by doing the tidying up first, I hope to reduce the amount of code duplication involved in this (or at least, reduce the amount of changes needed to two pieces of near-duplicate code :-) ).

I might look at implementing your load() suggestions and annotate_beebasm() before I get bogged down in the tidying up process though, as they are both useful and probably pretty easy to do.
TobyLobster
Posts: 622
Joined: Sat Aug 31, 2019 7:58 am
Contact:

Re: py8dis - a programmable static tracing 6502 disassembler in Python

Post by TobyLobster »

SteveF wrote: Wed Jan 19, 2022 9:50 pm if Fred likes beebasm, Jim likes acme and Sheila likes xa, they should all be able to use py8dis, but Jim can't expect to take one of Fred's control files but use it to generate acme output and have it Just Work.
I hope that for 'simple' moves, Fred, Jim and Sheila could still share the same control file. Revs is a particularly tricky case that may need some manual intervention for non-Beebasm assemblers, perhaps by manipulating the binary after the assembler has finished. But I think most binaries are not in this category. A lot of games will have simpler relocation routines. The current move command works fine for Chuckie Egg over all assemblers, for example.

On a separate note I would like to see more 'smart pattern' detection (or the ability for a user to add this) - e.g. automatically labelling constants for the A parameter to OSBYTE/OSWORD etc, perhaps even adding comments to describe the remaining parameters.
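
To illustrate the kind of pattern meant here, a rough Python sketch follows. It scans raw bytes for `LDA #n` followed immediately by `JSR &FFF4` (OSBYTE) and suggests a constant name for `n`. The function name, the name table and the generated labels are all hypothetical and are not part of py8dis; a real implementation would presumably work on traced instructions rather than raw bytes, so that data is not mistaken for code.

```python
# Hypothetical sketch: find "LDA #n : JSR OSBYTE" and suggest a name for n.
OSBYTE = 0xFFF4                 # Acorn MOS OSBYTE entry point
LDA_IMM, JSR_ABS = 0xA9, 0x20   # 6502 opcodes

# A tiny, illustrative subset of OSBYTE calls; the names are made up.
OSBYTE_NAMES = {
    0x13: "osbyte_vsync",               # wait for vertical sync
    0x7E: "osbyte_acknowledge_escape",  # acknowledge ESCAPE condition
}

def find_osbyte_calls(data, base):
    """Yield (address_of_lda, a_value, suggested_name) for each match."""
    for i in range(len(data) - 4):
        if (data[i] == LDA_IMM and data[i + 2] == JSR_ABS
                and data[i + 3] | (data[i + 4] << 8) == OSBYTE):
            a = data[i + 1]
            yield base + i, a, OSBYTE_NAMES.get(a, "osbyte_%02x" % a)

code = bytes([0xA9, 0x13, 0x20, 0xF4, 0xFF,   # lda #19  : jsr osbyte
              0xA9, 0x7E, 0x20, 0xF4, 0xFF])  # lda #126 : jsr osbyte
for addr, a, name in find_osbyte_calls(code, 0x2000):
    print("%04x: A=%d -> %s" % (addr, a, name))
```

The same shape would extend to OSWORD or to comments describing X and Y, at the cost of a larger table.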
SteveF
Posts: 1697
Joined: Fri Aug 28, 2015 9:34 pm
Contact:

Re: py8dis - a programmable static tracing 6502 disassembler in Python

Post by SteveF »

TobyLobster wrote: Wed Jan 19, 2022 10:44 pm
SteveF wrote: Wed Jan 19, 2022 9:50 pm if Fred likes beebasm, Jim likes acme and Sheila likes xa, they should all be able to use py8dis, but Jim can't expect to take one of Fred's control files but use it to generate acme output and have it Just Work.
I hope that for 'simple' moves, Fred, Jim and Sheila could still share the same control file. Revs is a particularly tricky case that may need some manual intervention for non-Beebasm assemblers, perhaps by manipulating the binary after the assembler has finished. But I think most binaries are not in this category. A lot of games will have simpler relocation routines. The current move command works fine for Chuckie Egg over all assemblers, for example.
I agree; I don't plan to break things that currently work (and if I do by accident please let me know and I'll fix them, unless it turns out there was a good reason for breaking them).
TobyLobster wrote: Wed Jan 19, 2022 10:44 pm On a separate note I would like to see more 'smart pattern' detection (or the ability for a user to add this) - e.g. automatically labelling constants for the A parameter to OSBYTE/OSWORD etc, perhaps even adding comments to describe the remaining parameters.
This is a good idea. I hope that once I've tidied the code up a bit it will be easier to add support for this kind of thing. Now that the basic move() design has probably settled down I hope I can gradually refactor and polish up the code bit by bit, but we'll see how it goes. :-)
TobyLobster
Posts: 622
Joined: Sat Aug 31, 2019 7:58 am
Contact:

Re: py8dis - a programmable static tracing 6502 disassembler in Python

Post by TobyLobster »

Hi Steve,

I realise you probably haven't looked at this for a while but I have a failure case when using the "typed-addr" branch of py8dis (is this the best branch to use?). Or perhaps I'm doing it wrong (I haven't looked at this for a while either.)

There's a simple example of the failure here:

https://github.com/TobyLobster/starship_test/tree/main

This example fails (the generated ASM doesn't assemble; it gives a number of assertion failures):

Code: Select all

warning: move boundary at binary address $61b0 splits a classification
Error - File starcommand_acme.asm, line 10204 (Zone <untitled>): !error: Assertion failed: <(l21c8) == $10
Error - File starcommand_acme.asm, line 10207 (Zone <untitled>): !error: Assertion failed: <(l21d0) == $fff4
Error - File starcommand_acme.asm, line 10210 (Zone <untitled>): !error: Assertion failed: <(l21d8) == $ff
Error - File starcommand_acme.asm, line 10213 (Zone <untitled>): !error: Assertion failed: <(l21f8) == $2a82
Error - File starcommand_acme.asm, line 10216 (Zone <untitled>): !error: Assertion failed: <(l2200) == $0e54
Error - File starcommand_acme.asm, line 10219 (Zone <untitled>): !error: Assertion failed: <(l44e1) == $e1
Error - File starcommand_acme.asm, line 10222 (Zone <untitled>): !error: Assertion failed: >(l2008) == $16
Error - File starcommand_acme.asm, line 10225 (Zone <untitled>): !error: Assertion failed: >(l2184) == $0e54
Error - File starcommand_acme.asm, line 10228 (Zone <untitled>): !error: Assertion failed: >(l21c8) == $fff4
Error - File starcommand_acme.asm, line 10231 (Zone <untitled>): !error: Assertion failed: >(l2208) == $73
If I remove the move() command, it works.
SteveF
Posts: 1697
Joined: Fri Aug 28, 2015 9:34 pm
Contact:

Re: py8dis - a programmable static tracing 6502 disassembler in Python

Post by SteveF »

TobyLobster wrote: Sun Feb 20, 2022 12:28 pm I realise you probably haven't looked at this for a while but I have a failure case when using the "typed-addr" branch of py8dis (is this the best branch to use?). Or perhaps I'm doing it wrong (I haven't looked at this for a while either.)
Hi Toby,

Thanks for reporting this, and sorry I didn't get back to you earlier. You're not doing anything wrong; acorn.py's code to auto-label calls to OS routines wasn't correctly respecting the distinction between the "binary" and "runtime" addresses which are used (mostly) behind the scenes to help implement move().

I've pushed a fix to the typed-addr branch. It's lightly tested, but it does still work with some of my test cases, and your test case now builds and re-assembles the input binary correctly even with move().
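
For anyone unfamiliar with the binary/runtime distinction, here is a minimal Python sketch of the idea. It assumes each move can be modelled as a (runtime_start, binary_start, length) tuple; the names, the tuple layout and the example region are all hypothetical, and py8dis's real internals are not shown in this thread and may well differ.

```python
# Hypothetical model: each move is (runtime_start, binary_start, length).
# The region below is illustrative only, not taken from a real control file.
MOVES = [(0x3A00, 0x4B00, 0x0200)]

def binary_to_runtime(addr, moves=MOVES):
    """Return the address at which the byte loaded at `addr` will execute."""
    for runtime_start, binary_start, length in moves:
        if binary_start <= addr < binary_start + length:
            return runtime_start + (addr - binary_start)
    return addr  # bytes outside any move run where they were loaded

print(hex(binary_to_runtime(0x4BC6)))  # inside the moved region
print(hex(binary_to_runtime(0x2000)))  # outside it, so unchanged
```

Operands such as a JSR target refer to runtime addresses, whereas the position of a byte in the loaded file is a binary address, so mixing the two up produces wrong labels.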

Let me know how you get on with it! (No rush, of course...)

Cheers.

Steve
TobyLobster
Posts: 622
Joined: Sat Aug 31, 2019 7:58 am
Contact:

Re: py8dis - a programmable static tracing 6502 disassembler in Python

Post by TobyLobster »

A possible small improvement (same Starship Command example as before):

Code: Select all

l3ac6
    !text '"', "furious", '"', "displeased", '"', "disap"             ; 4bc6: 22 66 75... "fu :3ac6[1]
    !text "pointed", '"', "disappointed", '"', "satisfie"             ; 4bdf: 70 6f 69... poi :3adf[1]
    !text "d", '"', "pleased", '"', "impressed", '"', "d"             ; 4bfc: 64 22 70... d"p :3afc[1]
    !text "elighted", '"'                                             ; 4c11: 65 6c 69... eli :3b11[1]
    !byte $0d                                                         ; 4c1a: 0d          .   :3b1a[1]
This is correct, but double quotation marks can be encoded using a backslash, so I think something along these lines would look more readable:

Code: Select all

l3ac6
    !text "\"furious\"displeased\"disappointed\"disappointed\"satisfied\"pleased\"impressed\"delighted\""
    !byte $0d
Encoding a backslash itself this way needs two backslashes.
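
A minimal Python sketch of that escaping rule, assuming printable ASCII input and an acme version that accepts backslash escapes in double-quoted strings. The function name `acme_text` is made up for illustration and is not py8dis's API; it also makes no attempt to split long strings across lines.

```python
def acme_text(s):
    """Format s as a single acme !text directive using backslash escapes."""
    # Escape backslashes first so the backslashes added for quotes
    # are not themselves doubled.
    escaped = s.replace("\\", "\\\\").replace('"', '\\"')
    return '    !text "%s"' % escaped

print(acme_text('"furious"displeased"'))
print(acme_text("a\\b"))
```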
SteveF
Posts: 1697
Joined: Fri Aug 28, 2015 9:34 pm
Contact:

Re: py8dis - a programmable static tracing 6502 disassembler in Python

Post by SteveF »

Thanks Toby, I've pushed a small change which should implement this for acme output.

In testing that I've realised your test case also shows that py8dis generates broken output for both beebasm and xa! :-(

For beebasm I think the problem is something to do with trying to do the right thing with copyblock and I'll need to give it some thought.

For xa the problem seems to be that "^" in strings isn't treated as a regular character:

Code: Select all

$ xa --version
xa (xa65) v2.3.8
[...]
$ cat test.xa
* = $e00
    .asc "vL^+"
    .asc "foo"
$ xa -o z.out test.xa
$ xxd z.out
00000000: 764c 0b66 6f6f                           vL.foo ; should be vL^+foo
Any advice on this from xa users would be appreciated! (Edit: I could probably just hack round this, e.g. emit ^ characters as a 94 byte on xa, but I'd like to know if there's something fundamental I'm missing here.)
TobyLobster
Posts: 622
Joined: Sat Aug 31, 2019 7:58 am
Contact:

Re: py8dis - a programmable static tracing 6502 disassembler in Python

Post by TobyLobster »

Thanks.

Looking at the xa source:

https://github.com/fachat/xa65/blob/eff ... at.c#L2131

It looks like ^ is used as an escape character for characters " and ' (and ^), with any other character ANDed with 31 before being output (for encoding control characters).
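
As a sanity check on that reading, here is a small Python model of the behaviour as described above, plus a matching escaper. It is only a sketch based on the description in this post, not a port of xa's C code; the function names are hypothetical, and what xa does with a lone trailing ^ is an assumption.

```python
def xa_decode(s):
    """Model how xa appears to turn the text of a string into bytes."""
    out = bytearray()
    i = 0
    while i < len(s):
        c = s[i]
        if c == "^" and i + 1 < len(s):
            i += 1
            nxt = s[i]
            if nxt in "\"'^":
                out.append(ord(nxt))        # escaped literal
            else:
                out.append(ord(nxt) & 31)   # control character
        else:
            out.append(ord(c))              # assumption: lone trailing ^ is literal
        i += 1
    return bytes(out)

def xa_escape(s):
    """Prefix the characters that need it with ^ so they survive xa_decode."""
    return "".join("^" + c if c in "\"'^" else c for c in s)

print(xa_decode("vL^+"))             # the failing string from the earlier post
print(xa_escape("vL^+"))             # what a disassembler would need to emit
print(xa_decode(xa_escape("vL^+")))
```

In this model "^+" becomes $2b AND 31 = $0b, which agrees with the 0b byte in the xxd output in the previous post.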
SteveF
Posts: 1697
Joined: Fri Aug 28, 2015 9:34 pm
Contact:

Re: py8dis - a programmable static tracing 6502 disassembler in Python

Post by SteveF »

Thanks Toby, I've pushed a fix for xa using that information. I am still having problems, though: there seems to be some weird interaction between having "/" in a string and using "//" to open a comment on the same line:

Code: Select all

$ cat test.xa
* = $e00
    .asc "SIEx256SN" // foo
    .asc "SIE/256SN" // bar
    .asc "SIEy256SN" // baz
$ xa -o z.out test.xa && xxd z.out
    .asc "SIE/256SN" // bar
test.xa:line 3: 0e09:Syntax error
Break after 1 errors
If I use ";" to open the comment, that works, but the documentation says that in ";" comments a colon will terminate the comment unless masm mode (-M) is used, so I'm reluctant to do that. I could replace ":" with ";" in inline comments on xa, but that feels a bit hacky (and could get confusing if ":" appears in the hex dump).

If I manually remove the comments which cause this problem, xa does correctly reassemble your Starship Command disassembly.
TobyLobster
Posts: 622
Joined: Sat Aug 31, 2019 7:58 am
Contact:

Re: py8dis - a programmable static tracing 6502 disassembler in Python

Post by TobyLobster »

This looks like a bug in xa. Looking at the source:

https://github.com/fachat/xa65/blob/eff ... xap.c#L926

The preprocessor tries to remove a '//' comment in a line. If it finds the first '/', it checks the next character is also '/'. If so we have found a comment so remove the rest of the line, otherwise stop looking. But this method fails to find the actual comment if a single forward slash is on the line prior to the actual comment (as in your example).

Further, two forward slashes within a string silently fail! e.g.

Code: Select all

$ cat test.xa
* = $e00
    .asc "SIEx256SN" // foo
    .asc "SIE//TOBY"
    .asc "SIEy256SN" // baz
$ xa -o z.out test.xa && xxd z.out
00000000: 5349 4578 3235 3653 4e53 4945 5349 4579  SIEx256SNSIESIEy
00000010: 3235 3653 4e                             256SN
...failing to output the 'TOBY' part of the string.
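
To make the failure mode concrete, here is a simplified Python model of the logic as described above. It is a sketch of the described behaviour only, not xa's actual preprocessor code, and `strip_comment_model` is a made-up name.

```python
def strip_comment_model(line):
    """Look only at the FIRST '/' on the line, as in the description above."""
    i = line.find("/")
    if i != -1 and line[i + 1:i + 2] == "/":
        return line[:i]   # treat everything from here on as a comment
    return line           # otherwise stop looking

for line in ['    .asc "SIEx256SN" // foo',
             '    .asc "SIE/256SN" // bar',
             '    .asc "SIE//TOBY"']:
    print(repr(strip_comment_model(line)))
```

In this model the first line loses its comment as intended, the second keeps its "// bar" (so the assembler then sees it and reports a syntax error), and the third is cut off inside the string after "SIE", which is consistent with the output shown above.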

A workaround is to encode any forward slashes as $2f bytes separately from the .asc "..." format.

For the same reason it also fails in cases such as:

Code: Select all

    lda #16/8           // this is a comment
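
The string workaround mentioned above (emitting each forward slash as a separate $2f byte) could be sketched in Python roughly as follows. The function name is hypothetical and this is not py8dis's code; it only deals with "/", so any ^ or quote escaping would still have to be applied to each run separately.

```python
def xa_asc_lines(s, indent="    "):
    """Emit .asc for each run without '/', and .byt $2f for each '/'."""
    lines = []
    for n, run in enumerate(s.split("/")):
        if n > 0:
            lines.append(indent + ".byt $2f")   # one byte per slash
        if run:
            lines.append(indent + '.asc "%s"' % run)
    return lines

print("\n".join(xa_asc_lines("SIE/256SN")))
print("\n".join(xa_asc_lines("SIE//TOBY")))
```

Because no "/" ever appears inside a quoted string in the output, a trailing "//" comment on any of these lines would be the first slash on its line.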
SteveF
Posts: 1697
Joined: Fri Aug 28, 2015 9:34 pm
Contact:

Re: py8dis - a programmable static tracing 6502 disassembler in Python

Post by SteveF »

Thanks very much for investigating that! I've pushed a change to use the workaround you suggested and py8dis now generates xa output which reassembles the input file correctly for Starship Command. (I still haven't tried to fix beebasm, but I will.)

We should probably report this upstream, would you rather do it or shall I?