A question on 6502 branching

geraldholdsworth · Post by **geraldholdsworth** » Mon Nov 13, 2023 7:14 pm

I'm trying to get my head around how the 6502 stores the branching instructions - well, more the address offset. This is my understanding:

3000 .backward
3000 60		RTS
3001 60		RTS
3002 60		RTS
3003 80 FB	BRA backward
3005 80 03	BRA forward
3007 60		RTS
3008 60		RTS
3009 60		RTS
300A .forward
300A 60		RTS

Forward branches are counted from the next instruction:
3005 has BRA forward, which is 80 03. The next instruction is at location 3007.
forward is at 300A, which is 3007+3

Backward branches are counted from the point of the branch distance:
3003 has BRA backward, which is 80 FB.
FB xor FF = 4
or
FF - FB = 4
The value of FB is at location 3004, so 3004 - 4 = 3000 which is where backward is.

Or, is it:
100 - FB = 5
Next instruction is at 3005, and 3005 - 5 = 3000?

Would I be correct in my assumptions?

SteveF · Post by **SteveF** » Mon Nov 13, 2023 7:28 pm

I think the natural interpretation is to view the branch offset as a signed two's complement 8-bit value. So &FB is -5, as in your last example, and &03 is (obviously) 3. This offset is then applied to the first byte after the branch, so for "BRA backward" the target is &3005+(-5)=&3000 and "BRA forward" the target is &3007+3=&300A.

Why is &FB -5? In binary it's 11111011, so that's (-128)+64+32+16+8+2+1=-5. But in reality I'd work it out by doing &100-&FB=5 and negating it in my head, as you did.
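As a rough illustration of the rule above, here is a minimal Python sketch. The function names `signed8` and `branch_target` are made up for this example and say nothing about how the CPU does it internally; they only reproduce the arithmetic for the two branches in the listing.

```python
def signed8(byte: int) -> int:
    """Interpret an unsigned byte (0..255) as two's complement (-128..127)."""
    return byte - 0x100 if byte & 0x80 else byte


def branch_target(branch_addr: int, offset_byte: int) -> int:
    """Target of a two-byte relative branch whose opcode is at branch_addr."""
    next_instruction = branch_addr + 2  # first byte after the branch
    return (next_instruction + signed8(offset_byte)) & 0xFFFF


print(hex(branch_target(0x3003, 0xFB)))  # BRA backward
print(hex(branch_target(0x3005, 0x03)))  # BRA forward
```

With the addresses from the original listing this gives &3000 and &300A respectively.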

Edit: In the light of the discussion below, I'll just clarify that I'm making no claims about the underlying implementation in the CPU. For me, thinking about it like this seems natural and it does give the right result, even if what's going on behind the scenes is more involved.

geraldholdsworth · Post by **geraldholdsworth** » Mon Nov 13, 2023 9:58 pm

Cool, cheers Steve.

julie_m · Post by **julie_m** » Tue Nov 14, 2023 11:42 am

The 6502 program counter increases by one as it reads each byte of each instruction it is processing. This means that branch offsets are actually calculated from the byte holding the offset. But the program counter is going to be increased by one again in the last T-state of the branch instruction, so the offset needs to point to the byte before the instruction to which you wish to branch. And because both addresses are one shy of what's expected, this works out the same as the offset from the first byte of the instruction after the branch (where you would fall through to if the test failed and the branch did not happen) to the first byte of the branch destination.

Where the question of "what the program counter is actually pointing to" really matters is with JSR and RTS instructions. The address pushed onto the stack by a JSR is the address of the last byte of the instruction (i.e., the address of the high byte of the destination address); and one will get added to this, giving the address of the instruction after JSR, in the final T-state of the RTS instruction.

If you are making some creative use of JSR -- perhaps to pass some parameters to the function "inline" between the JSR and the next wanted instruction, with intent to correct the return address -- you need to take this into account. To get the address of the JSR itself, you need to deduct two from the address pulled from the stack; or to get the address following the JSR, you need to add one to it (or just LDY #1 if you are going to be using LDA (zp), Y to read the inline parameters).

Note also that it is entirely legitimate (and may even make more sense) to return from your function with another JMP, instead of pushing the return address minus one back onto the stack and using RTS.
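The return-address arithmetic described above can be sketched in Python. This is only a model of the address bookkeeping, not of the CPU; the helper names and the example memory contents (a JSR at &2000 followed by two inline parameter bytes) are assumptions invented for the illustration.

```python
def jsr_pushed_address(jsr_addr: int) -> int:
    """Address pushed by a JSR at jsr_addr: that of its own last byte."""
    return (jsr_addr + 2) & 0xFFFF


def rts_resume_address(pulled: int) -> int:
    """RTS adds one to the pulled address before execution resumes."""
    return (pulled + 1) & 0xFFFF


def read_inline_params(memory: dict, pulled: int, count: int):
    """Read count bytes after the JSR, as LDA (zp),Y would starting at Y=1.

    Also returns the corrected address to push back so that a later RTS
    resumes just past the parameters.
    """
    params = [memory[(pulled + y) & 0xFFFF] for y in range(1, count + 1)]
    corrected = (pulled + count) & 0xFFFF
    return params, corrected


# Hypothetical layout: JSR &3000 at &2000, then two inline bytes, then code.
memory = {0x2000: 0x20, 0x2001: 0x00, 0x2002: 0x30, 0x2003: 0x41, 0x2004: 0x42}
pulled = jsr_pushed_address(0x2000)
params, corrected = read_inline_params(memory, pulled, 2)
print(hex(pulled), [hex(p) for p in params], hex(rts_resume_address(corrected)))
```

Here the pulled address is &2002, the parameters are read from &2003 and &2004, and pushing back &2004 makes RTS resume at &2005.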

BigEd · Post by **BigEd** » Tue Nov 14, 2023 12:08 pm

Umm that’s not quite how I understand it. The way I count is that zero is the do-nothing branch to the following instruction. And then count forwards or backwards (in hex!) until you reach the desired target instruction.

Unless perhaps I’ve misunderstood you?

As for
> The address pushed onto the stack by a JSR is the address of the last byte of the instruction
I completely agree!

julie_m · Post by **julie_m** » Tue Nov 14, 2023 12:57 pm

BigEd wrote: ↑Tue Nov 14, 2023 12:08 pm Umm that’s not quite how I understand it. The way I count is that zero is the do-nothing branch to the following instruction. And then count forwards or backwards (in hex!) until you reach the desired target instruction.

Yes, it's correct that an offset of zero makes no difference (except to the timing) whether the test succeeded or failed. But that's because the program counter is already pointing to the byte before the wanted instruction, and after the offset is added will still need to point to the byte before the wanted instruction. At this point within the execution of the branch instruction, the PC is still pointing to the second byte which holds the offset. The extra T-state (entered if the test succeeded) adds the offset to the PC (and there may be another extra T-state if the low byte overflowed and the high byte needs to be changed), and the final T-state of the branch instruction will increase the PC by one whether the test succeeded or failed.

Of course, (B-1) - (A-1) ≡ B-A is one of the fundamental properties of mathematics; so at least for the purpose of conditional branches, there is no real harm in counting bytes as though the PC had already advanced to the next instruction.

BigEd · Post by **BigEd** » Tue Nov 14, 2023 2:21 pm

Ah, so you're looking at PC +1 +offset +1, perhaps, where PC is the location of the branch, and I'm looking at PC+2 + offset, which is of course the same arithmetically. I might have a quick look at visual6502... here we go. So it turns out the ALU calculates PC+2 plus offset, in the case of a branch taken. Does that match up with what you expect?

julie_m · Post by **julie_m** » Tue Nov 14, 2023 5:27 pm

You have to be very, very careful when talking about "the Program Counter", because its value will actually change during the fetching of a multi-byte instruction. Ordinarily, though, you don't have to think very hard about it not pointing where a human thinking at the instruction level (as opposed to the byte level) might expect, unless you are doing something really fancy.

All you really need to remember is:

For a forward branch, the offset is equal to the number of bytes to skip over, as you would expect.
For a backward branch, you have to remember to include the two bytes of the branch instruction itself in the count of what's being skipped over.
If you want to go retrieving subroutine return addresses from the stack, Here Be Dragons.
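Going the other way, from addresses to the offset byte, might look like the sketch below. `encode_branch_offset` is a hypothetical helper, roughly what an assembler does for you, and it rejects targets outside the -128..+127 range.

```python
def encode_branch_offset(branch_addr: int, target: int) -> int:
    """Offset byte for a two-byte relative branch at branch_addr to target."""
    delta = target - (branch_addr + 2)  # measured from the next instruction
    if not -128 <= delta <= 127:
        raise ValueError(f"branch out of range: {delta}")
    return delta & 0xFF


# Forward: three one-byte RTS instructions are skipped, so the offset is 3.
print(hex(encode_branch_offset(0x3005, 0x300A)))
# Backward: three RTS bytes plus the two bytes of the branch itself, so -5.
print(hex(encode_branch_offset(0x3003, 0x3000)))
```

This reproduces the 03 and FB operand bytes from the listing at the top of the thread.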

Post by **Rich Talbot-Watkins** » Tue Nov 14, 2023 8:44 pm

Sounds a bit complicated to me! I just remember it as:

- consider the operand to be a signed 8 bit value (0...127, -128...-1)
- the branch destination is the address of the instruction which follows the branch, plus the signed offset.

lovebug · Post by **lovebug** » Tue Nov 14, 2023 11:38 pm

Rich Talbot-Watkins wrote: ↑Tue Nov 14, 2023 8:44 pm - consider the operand to be a signed 8 bit value (0...127, -128...-1)
- the branch destination is the address of the instruction which follows the branch, plus the signed offset.

Is the way I do it too

Diminished · Post by **Diminished** » Tue Nov 14, 2023 11:54 pm

Rich Talbot-Watkins wrote: ↑Tue Nov 14, 2023 8:44 pm Sounds a bit complicated to me! I just remember it as:

- consider the operand to be a signed 8 bit value (0...127, -128...-1)
- the branch destination is the address of the instruction which follows the branch, plus the signed offset.

I guess this makes BRA 0 a no-op, which makes sense.

sweh · Post by **sweh** » Wed Nov 15, 2023 12:00 am

Rich Talbot-Watkins wrote: ↑Tue Nov 14, 2023 8:44 pm Sounds a bit complicated to me! I just remember it as:

- consider the operand to be a signed 8 bit value (0...127, -128...-1)
- the branch destination is the address of the instruction which follows the branch, plus the signed offset.

To confuse matters, in the assembler you need to add 2 if referencing P% (e.g. "BEQ P%+4" will be "F0 02"). I just let the assembler do the hard work for me.
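A small sketch of that P% adjustment, assuming P% holds the address of the branch opcode while the line is being assembled. The function name is invented for the example.

```python
def operand_for_p_plus(k: int) -> int:
    """Offset byte for a branch written as 'P%+k' (P% = the branch's own address)."""
    delta = k - 2  # the offset is measured from P%+2, not from P%
    if not -128 <= delta <= 127:
        raise ValueError(f"branch out of range: {delta}")
    return delta & 0xFF


print(f"BEQ P%+4 -> F0 {operand_for_p_plus(4):02X}")
print(f"BEQ P%   -> F0 {operand_for_p_plus(0):02X}")
```

The second line is the familiar branch-to-self case, which assembles with an operand of FE.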

regregex · Post by **regregex** » Wed Nov 15, 2023 8:47 pm

BigEd wrote: ↑Tue Nov 14, 2023 2:21 pm Ah, so you're looking at PC +1 +offset +1, perhaps, where PC is the location of the branch, and I'm looking at PC+2 + offset, which is of course the same arithmetically. I might have a quick look at visual6502... here we go. So it turns out the ALU calculates PC+2 plus offset, in the case of a branch taken. Does that match up with what you expect?

Arithmetically there's no difference, but for timing-critical code it determines where page crossings occur. Take the case of branching forwards to an opcode on a page boundary:

30FC          .P
30FC 18       CLC
30FD 90 01    BCC R
30FF          .Q
30FF 38       SEC
3100          .R
3100 EA       NOP

Under 1+n+1 the carry out of PCL would be handled in the incrementer for free, but with 2+n the ALU updates PCH, taking an extra cycle.
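To make the timing consequence concrete, here is a sketch that assumes the commonly documented branch timings (2 cycles not taken, 3 taken, 4 taken across a page) and assumes the page comparison is made against the address of the instruction following the branch. `branch_cycles` is a hypothetical helper, not taken from any emulator.

```python
def branch_cycles(branch_addr: int, offset_byte: int, taken: bool) -> int:
    """Cycle count for a relative branch under the assumptions stated above."""
    if not taken:
        return 2
    next_instruction = (branch_addr + 2) & 0xFFFF
    offset = offset_byte - 0x100 if offset_byte & 0x80 else offset_byte
    target = (next_instruction + offset) & 0xFFFF
    # The extra cycle is needed when the high byte of the address changes.
    crossed = (target & 0xFF00) != (next_instruction & 0xFF00)
    return 4 if crossed else 3


print(branch_cycles(0x30FD, 0x01, taken=True))  # BCC R in the listing above
```

For the BCC at &30FD the next instruction is at &30FF and the target at &3100, so this model reports 4 cycles.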

BigEd · Post by **BigEd** » Thu Nov 16, 2023 6:32 pm

Ah, that's an interesting probe. I was vaguely thinking that while JSR exposes a value of PC which isn't aligned to an opcode, other than that there's no external hint that the PC increments when it does. (visual6502 of course gives us the inside view.)

tricky · Post by **tricky** » Fri Nov 17, 2023 7:59 am

Wasn't there a recent thread about JSR from code on the stack that gets overwritten by the return address as it executes?
This is a slightly different aspect of how instructions execute.

BigEd · Post by **BigEd** » Fri Nov 17, 2023 2:33 pm

probably "Tripping up emulators with a special JSR"

julie_m · Post by **julie_m** » Sat Nov 18, 2023 8:21 am

The 6502 has to push both bytes of the address of the final byte of the JSR instruction (i.e., the location of the high byte of the jump destination) onto the stack; and to avoid (more) weird effects at page boundaries, this pushing has to take place in T-states while the PC is pointing exactly there, i.e. after it has already read the low byte of the destination. But it can't trash the program counter by latching the second byte into PCL just yet, because it hasn't finished reading the instruction. So it stores DestL temporarily in the adder, which is available because it's not doing any maths, and advances PC; writes PCH and PCL, now pointing to the location of DestH, to the stack; and only once the old value of PC is no longer needed does it read DestH into PCH, transfer the sum (=DestL, because the adder was configured to do nothing) into PCL and continue straight from that address without increasing PC. When the function returns, it doesn't matter about the address on the stack being one shy of what it needs to be, because the RTS instruction can take care of that itself -- it's quite normal to increase PC to move on to the next instruction.

It all makes perfect sense if and only if you read it all in the light of having only 8 fingers to count on ..... Someone writing an emulator on a more powerful host system without fully appreciating just how much functionality the 6502 designers squoze out of so little hardware could easily get this wrong. (It also makes me wonder if any of them might have been entertainers on the side, because there's a definite similarity with juggling tricks that involve doing other things with your hands while the balls are in the air!)

BigEd · Post by **BigEd** » Sat Nov 18, 2023 8:59 am

(It's even more interesting than that - the S register is used as a temporary. You can't see that from the outside, but visual6502 shows it. I think Arlet's 6502 core does the same, for the same reasons - it minimises the hardware.

And for JSR, indeed, the first operand byte has to be stored, and it turns out to be stored temporarily in the stack pointer register S, which is fine because the value in S has to take a trip through the ALU to be decremented. See this happening in visual6502 here.)

julie_m · Post by **julie_m** » Sat Nov 18, 2023 9:57 am

That's even more mental than what I thought it was doing! So the address "latch" is actually a two-way counter that can be advanced or backed up independently to read/write successive locations once it has been preloaded from the PC or SP, but the stack pointer is just a simple bank of flip-flops and relies on the adder to modify the value in it?

BigEd · Post by **BigEd** » Sat Nov 18, 2023 10:16 am

Umm, no, I don't think there's any down counter anywhere: the PC can increment (or not) and the ALU can do the usual, and that's it. If you press "trace more" two or three times in the visual6502 simulation, you get a lot of internal info. It's possible to copy and paste the tabulated results, which I've done a few times, but the alignment is all out of whack so it takes a bit of fettling.

Edit: my pull quote is from this thread on 6502.

Edit: first finding as far as I know is in this post, shortly after visual6502 came to life on the web.

In the case of JSR, I was surprised and delighted to find that the stack pointer is used to hold the first operand for 4 cycles, while the SP is passed to the ALU for decrement and the PC is written to the stack. In the cycle that the new SP value is written back to the register, the LSB can be written to PC.

julie_m · Post by **julie_m** » Sat Nov 18, 2023 11:16 am

BigEd wrote: ↑Sat Nov 18, 2023 10:16 am Umm, no, I don't think there's any down counter anywhere: the PC can increment (or not) and the ALU can do the usual, and that's it.

So what's deducting one from the address on the address bus for the second push, once S has been transferred to the bowels of the adder and destL is in S?

I'm sure I'd be able to work it out for myself, if I knew for sure what each of the columns in the table represented, not just the obvious ones; but anything that looks like a helpful link seems to go nowhere .....

BigEd · Post by **BigEd** » Sat Nov 18, 2023 1:32 pm

This link should be useful, for a tabulation of activity at some level of verbosity. We can add more if we need it.

I'll see if I can transcribe the table (noting that we're a little offtopic from the thread title...)

cycle  ab    db  rw  Fetch    pc    a   x   y   s   p         Execute  State  ir  tcstate  pd  adl  adh  sb  alu  alucin  alua  alub  alucout  aluvout  dasb
0      0000  20  1   JSR_Abs  0000  aa  00  00  fd  nv‑BdIZc  BRK      T1     00  101111   00  00   00   ff  00   0       ff    00    0        0        ff
0      0000  20  1   JSR_Abs  0000  aa  00  00  fd  nv‑BdIZc  BRK      T1     00  101111   20  01   00   ff  ff   0       ff    00    0        0        ff
1      0001  11  1            0001  aa  00  00  fd  nv‑BdIZc  JSR_Abs  T2     20  110111   20  01   00   ff  ff   0       ff    ff    0        0        ff
1      0001  11  1            0001  aa  00  00  fd  nv‑BdIZc  JSR_Abs  T2     20  110111   11  fd   01   ff  fe   0       ff    ff    1        0        ff
2      01fd  00  1            0002  aa  00  00  11  nv‑BdIZc  JSR_Abs  T3     20  111011   11  fd   01   11  fe   0       00    fd    1        0        11
2      01fd  00  1            0002  aa  00  00  11  nv‑BdIZc  JSR_Abs  T3     20  111011   00  fd   ff   ff  fd   0       00    fd    0        0        ff
3      01fd  00  0            0002  aa  00  00  11  nv‑BdIZc  JSR_Abs  T4     20  111101   00  fd   ff   ff  fd   0       ff    fd    0        0        ff
3      01fd  00  0            0002  aa  00  00  11  nv‑BdIZc  JSR_Abs  T4     20  111101   00  fc   ff   ff  fc   0       ff    fd    1        0        ff
4      01fc  00  0            0002  aa  00  00  11  nv‑BdIZc  JSR_Abs  T5     20  111110   00  fc   ff   ff  fc   0       ff    fc    1        0        ff
4      01fc  02  0            0002  aa  00  00  11  nv‑BdIZc  JSR_Abs  T5     20  111110   02  02   00   fb  fb   0       ff    fc    1        0        fb
5      0002  22  1            0002  aa  00  00  11  nv‑BdIZc  JSR_Abs  T0     20  011111   02  02   00   fb  fb   0       fb    00    1        0        fb
5      0002  22  1            0002  aa  00  00  11  nv‑BdIZc  JSR_Abs  T0     20  011111   22  11   ff   fb  fb   0       fb    00    0        0        fb
6      2211  88  1   DEY      2211  aa  00  00  fb  nv‑BdIZc  JSR_Abs  T1     20  101111   22  11   22   fb  fb   0       00    22    0        0        fb
6      2211  88  1   DEY      2211  aa  00  00  fb  nv‑BdIZc  JSR_Abs  T1     20  101111   88  12   22   22  22   0       00    22    0        0        22
7      2212  88  1            2212  aa  00  00  fb  nv‑BdIZc  DEY      T0+T2  88  010111   88  12   22   22  22   0       22    22    0        0        22
7      2212  88  1            2212  aa  00  00  fb  nv‑BdIZc  DEY      T0+T2  88  010111   88  12   22   ff  44   0       22    22    0        0        ff
8      2212  88  1   DEY      2212  aa  00  00  fb  nv‑BdIZc  DEY      T1     88  101111   88  12   22   00  44   0       00    ff    0        0        00
8      2212  88  1   DEY      2212  aa  00  00  fb  nv‑BdIZc  DEY      T1     88  101111   88  13   22   ff  ff   0       00    ff    0        0        ff
9      2213  00  1            2213  aa  00  ff  fb  Nv‑BdIzc  DEY      T0+T2  88  010111   88  13   22   ff  ff   0       ff    ff    0        0        ff

Edit: so the fc on the address bus comes from the alu - the alu output. It's latched in the adl, while the value is steered back into the alu by way of alub, such that when fb comes out of the alu, it travels by way of sb (the special bus) into s.
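For anyone who finds the table hard to follow, here is a much-simplified Python model of the same JSR, one bus access per cycle. It is a sketch based only on the order of events visible in the trace above, not a gate-accurate description; the function and variable names are invented, and the ALU path is reduced to a plain decrement.

```python
def jsr_bus_trace(memory: dict, pc: int, s: int):
    """Return (events, new_pc, new_s) for a JSR at pc, following the trace order."""
    events = []
    opcode = memory[pc]
    assert opcode == 0x20, "expected a JSR opcode"
    events.append(("read", pc, opcode))
    pc += 1
    adl = memory[pc]                      # low byte of the destination
    events.append(("read", pc, adl))
    pc += 1                               # PC now points at the high byte
    stack_ptr = s                         # old S travels through the ALU path
    s_reg = adl                           # S register parks the low byte
    events.append(("read", 0x0100 | stack_ptr, memory.get(0x0100 | stack_ptr, 0)))
    memory[0x0100 | stack_ptr] = pc >> 8  # push PCH
    events.append(("write", 0x0100 | stack_ptr, pc >> 8))
    stack_ptr = (stack_ptr - 1) & 0xFF
    memory[0x0100 | stack_ptr] = pc & 0xFF  # push PCL
    events.append(("write", 0x0100 | stack_ptr, pc & 0xFF))
    stack_ptr = (stack_ptr - 1) & 0xFF
    adh = memory[pc]                      # high byte of the destination
    events.append(("read", pc, adh))
    new_pc = (adh << 8) | s_reg           # low byte comes back out of S
    return events, new_pc, stack_ptr      # decremented pointer returns to S


memory = {0x0000: 0x20, 0x0001: 0x11, 0x0002: 0x22}
events, new_pc, new_s = jsr_bus_trace(memory, 0x0000, 0xFD)
for kind, addr, value in events:
    print(f"{kind:5} {addr:04x} {value:02x}")
print(f"pc={new_pc:04x} s={new_s:02x}")
```

Run against the same three bytes as the trace (20 11 22 at &0000, S=&FD), the bus addresses come out as 0000, 0001, 01fd, 01fd, 01fc, 0002, and the model finishes with PC=&2211 and S=&FB.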

gfoot · Post by **gfoot** » Sat Nov 18, 2023 5:11 pm

I'd also suggest this diagram for understanding what the various buses are and what the various control signals do: https://www.witwright.com/DonPub/6502-Block-Diagram.pdf

stardot.org.uk
