A 6502 assembly idiom in Excitebike

10 December 2017

Sometimes I like to disassemble ROM dumps for old NES games to figure out how they work. It’s sort of like putting together a puzzle without knowing what the picture will look like beforehand. You start disassembling opcodes at a known entry point (the NES’s 6502-based CPU will jump to the address stored at $FFFC-$FFFD in the ROM cartridge when the machine resets) and follow the control flow through the rest of the ROM. The puzzle isn’t the disassembly itself, which is a purely mechanical process. It’s figuring out what the assembly code is doing, and why it’s doing it. And along the way you can sometimes find interesting bits of code that are unlike what a modern programmer would (or could) write in a high level language. I found one such bit of code in the initialization routines for Excitebike.
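
As an illustration of that first step, here is a short Python sketch that finds the reset vector in a ROM dump. It assumes an iNES-format file (a 16-byte header, an optional 512-byte trainer, then PRG ROM) and a cartridge board that maps the last byte of PRG ROM to $FFFF, as NROM boards do. The function names `prg_from_ines` and `reset_vector` are made up for this example. The vector is stored low byte first because the 6502 is little-endian.

```python
def prg_from_ines(data: bytes) -> bytes:
    """Extract PRG ROM from an iNES image (16-byte header, optional trainer)."""
    if data[:4] != b"NES\x1a":
        raise ValueError("not an iNES file")
    prg_size = data[4] * 16 * 1024                # byte 4: PRG size in 16 KB units
    start = 16 + (512 if data[6] & 0x04 else 0)   # flags 6, bit 2: trainer present
    return data[start:start + prg_size]


def reset_vector(prg: bytes) -> int:
    """Return the address stored at $FFFC-$FFFD.

    Assumes the last byte of `prg` is mapped at $FFFF, as on an NROM board
    (a 16 KB image is mirrored into $C000-$FFFF).
    """
    if len(prg) < 6:  # the NMI, reset and IRQ vectors fill the last 6 bytes
        raise ValueError("PRG image too small to contain the vectors")
    # $FFFF is prg[-1], so $FFFC is prg[-4] and $FFFD is prg[-3].
    lo, hi = prg[-4], prg[-3]
    return lo | (hi << 8)  # little-endian: low byte first


# Usage: a fake iNES image with one blank 16 KB PRG bank whose reset
# vector points at $C000, followed by one 8 KB CHR bank.
prg = bytearray(16 * 1024)
prg[-4:-2] = b"\x00\xc0"
rom = b"NES\x1a" + bytes([1, 1, 0, 0]) + bytes(8) + bytes(prg) + bytes(8 * 1024)
print(f"reset vector: ${reset_vector(prg_from_ines(rom)):04X}")  # reset vector: $C000
```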

This article isn’t about the details of NES programming, but there are a couple of things worth mentioning before getting to the code. First, the NES uses memory-mapped IO, so hardware is controlled by writing to certain memory locations instead of by using special CPU instructions. Second, the NES has a custom chip called the Picture Processing Unit (PPU) that can, among other things, generate a non-maskable interrupt (NMI) at the beginning of the vertical blanking interval (or “vblank”). That’s the time between when the last line of one frame is rendered and when the first line of the next frame starts, and it’s when games typically process input and update sprites.

When the NES starts up, the PPU disables NMIs until bit 7 of address $2000 (the PPUCTRL register) is set. That gives the game’s startup code time to perform whatever initialization it requires - clearing RAM, setting up hardware, etc. Once all that’s done, most games will enable NMIs and enter an infinite loop that gets interrupted once per vblank to let the main game logic run. Sometimes games need to disable NMIs again after that. Excitebike, for example, jumps back into the reset handler after exiting the track editor so it can restart at the main menu.¹ And that brings us to the code fragment that I wanted to talk about:²

disable_nmi:
        lda $10         ; First load the cached value for PPUCTRL,
        and #$7f        ; then clear bit 7 (disable NMI), and continue to...
set_ppuctrl:
        sta $2000       ; First write the value to the PPUCTRL register,
        sta $10         ; then cache it in the zero page,
        rts             ; and return to last jsr.

I’ll start with the second routine, set_ppuctrl. This routine simply writes the value in the CPU’s accumulator to the PPUCTRL register, and also caches it in address $10 of the zero page before returning. The value needs to be cached in RAM because the PPUCTRL register is write-only, so the CPU can’t read its current value back. If the value weren’t also stored somewhere else, it would be impossible to update individual bits. And that’s exactly what the disable_nmi routine does. It loads the cached value, clears bit 7 (the one that controls NMIs), and then flows right into the set_ppuctrl routine. That is, the two routines overlap in memory. And why not? Duplicating the code would be a waste, and since you control exactly where code sits in memory when programming in assembly, you can arrange things to avoid an extra jmp or jsr instruction.

But you can’t always arrange things so conveniently. There’s another routine in Excitebike’s initialization code that needs to write a value to PPUCTRL, but it can’t just let control fall through to set_ppuctrl, because disable_nmi is in the way:³

some_init_routine:
        ; I've omitted some irrelevant instructions here.
        sta $fc         ; Clear a flag in the 6502's "zero page".
        lda #$10        ; We also want to set PPUCTRL to a fixed value.
        bne set_ppuctrl ; This unconditionally skips over disable_nmi.
disable_nmi:
        ...             ; As before.
set_ppuctrl:
        ...             ; As before.

The 6502 instruction set doesn’t include an unconditional relative branch, so this routine uses a bne instruction that is always taken: lda with a non-zero operand ($10 here) clears the CPU’s zero flag, and bne branches whenever that flag is clear. So instead of using a regular unconditional jmp (3 bytes, 3 cycles) or a jsr subroutine call (3 bytes, 6 cycles, and a bit of stack space) to get where it needs to go next, this routine gets almost as much benefit from being near set_ppuctrl as disable_nmi does, at the cost of a single “unconditional” bne (2 bytes, 3 cycles when the target is on the same page).
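
To check the control flow, here is a toy interpreter in Python that executes just the handful of opcodes used in the two fragments. It is a sketch under several assumptions: the load address $C000 is arbitrary, the bytes are hand-assembled from the listings above (the instructions omitted from some_init_routine are left out, and A is assumed to be zero on entry so that sta $fc clears the flag), a write to $2000 simply lands in a flat RAM array rather than reaching a PPU, and rts halts the interpreter instead of popping a return address.

```python
# Opcode -> (mnemonic, addressing mode, base cycles), NMOS 6502 timings.
# Only the instructions used in the two fragments are supported.
OPCODES = {
    0xA5: ("lda", "zp", 3), 0xA9: ("lda", "imm", 2), 0x29: ("and", "imm", 2),
    0x85: ("sta", "zp", 3), 0x8D: ("sta", "abs", 4), 0xD0: ("bne", "rel", 2),
    0x60: ("rts", "imp", 6),
}
SIZE = {"imp": 1, "imm": 2, "zp": 2, "rel": 2, "abs": 3}


def run(mem: bytearray, pc: int, a: int = 0):
    """Run from pc until rts. Returns (trace, cycles), where trace lists
    (address, mnemonic) pairs. rts stops the run because no stack is modeled."""
    trace, cycles, zero = [], 0, False
    while True:
        name, mode, cost = OPCODES[mem[pc]]
        trace.append((pc, name))
        size = SIZE[mode]
        lo = mem[pc + 1] if size > 1 else 0
        addr = (lo | (mem[pc + 2] << 8)) if size == 3 else lo
        next_pc = pc + size
        cycles += cost
        if name == "rts":
            return trace, cycles
        if name == "lda":
            a = lo if mode == "imm" else mem[addr]
            zero = a == 0
        elif name == "and":
            a &= lo
            zero = a == 0
        elif name == "sta":
            mem[addr] = a  # stores leave the flags alone
        elif name == "bne" and not zero:
            offset = lo - 0x100 if lo >= 0x80 else lo  # signed 8-bit displacement
            target = (next_pc + offset) & 0xFFFF
            cycles += 1  # a taken branch costs one extra cycle...
            if target >> 8 != next_pc >> 8:
                cycles += 1  # ...and one more if it crosses a page
            next_pc = target
        pc = next_pc


BASE = 0xC000  # arbitrary load address chosen for this sketch
code = bytes([
    0x85, 0xFC,        # some_init_routine: sta $fc
    0xA9, 0x10,        #                    lda #$10
    0xD0, 0x04,        #                    bne set_ppuctrl (next pc + 4)
    0xA5, 0x10,        # disable_nmi:       lda $10
    0x29, 0x7F,        #                    and #$7f
    0x8D, 0x00, 0x20,  # set_ppuctrl:       sta $2000
    0x85, 0x10,        #                    sta $10
    0x60,              #                    rts
])
SOME_INIT, DISABLE_NMI, SET_PPUCTRL = BASE, BASE + 6, BASE + 10

mem = bytearray(0x10000)  # flat 64 KB; $2000 is just a byte here
mem[BASE:BASE + len(code)] = code

mem[0x10] = 0x98  # cached PPUCTRL value with NMI enabled (bit 7 set)
trace, cycles = run(mem, DISABLE_NMI)
print([n for _, n in trace], cycles, hex(mem[0x2000]))
# ['lda', 'and', 'sta', 'sta', 'rts'] 18 0x18

mem[0x10], mem[0xFC] = 0x98, 0x01
trace, cycles = run(mem, SOME_INIT, a=0)
print([n for _, n in trace], cycles, hex(mem[0x2000]))
# ['sta', 'lda', 'bne', 'sta', 'sta', 'rts'] 21 0x10
```

The second trace never reaches disable_nmi’s instructions: lda #$10 clears the zero flag, so the bne is taken for 3 cycles (its target is on the same page). A relative branch encodes a signed byte that is added to the address of the following instruction, so it can only reach from 128 bytes before to 127 bytes after that point. That limited range is why the routine can take this shortcut only because it sits close to set_ppuctrl.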

¹ That’s also an interesting process. The reset handler checks for a couple of values that it sets at particular addresses in RAM to decide whether or not to clear the memory used by the track editor. That way it can behave differently if it’s jumped to than if there was an actual hardware reset.
² The labels and comments are my own. I have no idea what the original programmer called these routines, so coming up with good names and comments is part of the disassembly puzzle.
³ I’m not actually sure what else the routine does, or why. This disassembly project is a work in progress.