My challenge was to build an SNES controller to Genesis/Megadrive adapter. A standard controller
typically uses a multiplexer (a 74157, for instance) that reacts very quickly (e.g. within 27 ns)
to the selection signal. But I wanted to use my
multiuse PCB2 circuit, which does not have a multiplexer, only an Atmega8 clocked at 16 MHz (the maximum).
Of course, I wired the selection signal to an external interrupt pin. However, for various reasons,
the corresponding code is not executed instantly when the interrupt occurs. This is where things
become interesting.
On this page I write about the path I followed until the response time became acceptable and the
adapter reliable. But first, here's a quick overview of how the Genesis controller works.
The controller uses a DB9-F style connector and runs on 5 volts. There are 6 output signals (controller to console)
and 1 input signal (console to controller) used to select which set of buttons the output signals must
report.
DB9 pin   Function
5         +5 volt
8         GND
7         Selection signal (SELECT)
DB9 pin   Function when SELECT==0   Function when SELECT==1
1         D-Pad up                  D-Pad up
2         D-Pad down                D-Pad down
3         0                         D-Pad left
4         0                         D-Pad right
6         Button A                  Button B
9         Button START              Button C
For more details, I recommend reading the following:
segasix.txt
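The pin tables above can be summarised as a small function. This is only a sketch that builds on a PC: the `genesis_buttons` structure, the `PINx` bit layout and the function names are made up for illustration and are not taken from the adapter firmware. It assumes the lines are active low (a pressed button reads as 0), which is consistent with pins 3 and 4 reading 0 when SELECT is low.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical button state, for illustration only (true = pressed). */
struct genesis_buttons {
    bool up, down, left, right, a, b, c, start;
};

/* Hypothetical bit layout: one bit per DB9 output pin. */
enum {
    PIN1 = 1 << 0, PIN2 = 1 << 1, PIN3 = 1 << 2,
    PIN4 = 1 << 3, PIN6 = 1 << 4, PIN9 = 1 << 5,
    ALL_HIGH = 0x3F
};

/* Returns the pin mask if the button is pressed, 0 otherwise. */
static uint8_t low_if(bool pressed, uint8_t pin_mask)
{
    return pressed ? pin_mask : 0;
}

/* Levels of the six output lines for a given SELECT level.
   Assumption: active-low lines, so a pressed button clears its bit. */
uint8_t genesis_outputs(const struct genesis_buttons *btn, bool select)
{
    uint8_t low = 0;                       /* bits to pull low */

    low |= low_if(btn->up, PIN1);          /* same for both SELECT levels */
    low |= low_if(btn->down, PIN2);

    if (select) {
        low |= low_if(btn->left, PIN3);
        low |= low_if(btn->right, PIN4);
        low |= low_if(btn->b, PIN6);
        low |= low_if(btn->c, PIN9);
    } else {
        low |= PIN3 | PIN4;                /* always 0 when SELECT == 0 */
        low |= low_if(btn->a, PIN6);
        low |= low_if(btn->start, PIN9);
    }
    return (uint8_t)(ALL_HIGH & ~low);
}
```

Calling `genesis_outputs()` once with `select` false and once with `select` true gives the two bytes an adapter has to keep ready for the two SELECT levels.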
First C-only implementation
I wired the adapter such that all outputs were on a single AVR port (PORTC)
and that this port's sole use was to control said outputs. This makes
it possible to set all the outputs in a single operation (one write) rather
than in 3 operations (read, modify, write).
The global variables S0_PC and S1_PC are updated by the main loop following
each poll of the SNES controller. S0_PC holds the value to present on PORTC
when SELECT is low and S1_PC holds the value to present when SELECT is high.
INT0 is configured to trigger on both rising and falling edges. The interrupt
handler is therefore executed each time SELECT changes state.
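The logic described above can be sketched as follows. This is a PC-buildable model, not the original listing: `sim_PORTC` and `sim_PIND` are hypothetical stand-ins for the AVR registers PORTC and PIND, and `mainloop_update()` and `int0_handler()` are illustrative names. Only S0_PC, S1_PC and the use of INT0 (on PD2) come from the text.

```c
#include <stdint.h>

/* Stand-ins for the AVR I/O registers so the sketch builds on a PC.
   On the ATmega8 these would be PORTC and PIND. */
volatile uint8_t sim_PORTC;
volatile uint8_t sim_PIND;

#define SELECT_BIT 2            /* SELECT is wired to INT0, which is PD2 */

/* Values prepared by the main loop after each poll of the SNES controller. */
volatile uint8_t S0_PC = 0xFF;  /* presented when SELECT is low  */
volatile uint8_t S1_PC = 0xFF;  /* presented when SELECT is high */

/* Hypothetical helper: what the main loop does after each poll. */
void mainloop_update(uint8_t select_low_value, uint8_t select_high_value)
{
    S0_PC = select_low_value;
    S1_PC = select_high_value;
}

/* Body of what would be the INT0 handler on the AVR: it runs on every
   SELECT edge and writes the whole port in one operation. */
void int0_handler(void)
{
    if (sim_PIND & (1 << SELECT_BIT)) {
        sim_PORTC = S1_PC;
    } else {
        sim_PORTC = S0_PC;
    }
}
```

On the real hardware, the body of `int0_handler()` would sit inside `ISR(INT0_vect)` and use the real registers.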
The code the compiler generated for this handler was inefficient: the write to PORTC (out 0x15) is done far too late! Obviously the compiler cannot guess
the need for updating PORTC as soon as possible. It also does not worry much about using registers
without a reason, which means they must be saved and restored with push/pop. I'm not very impressed
by the useless initialisation of r1 (the __zero_reg__) to zero when it is not used at all. But
since the eor instruction is used, SREG (0x3f) is changed and must therefore be saved too...
Note that this was compiled with the -Os option. -O3 was not better.
"Naked" interrupt with inline assembler
Here is a new interrupt handler with the ISR_NAKED flag, which prevents the compiler
from generating code at the beginning and end of the handler. This is now our responsibility.
Very good, since we can write a very simple handler.
With this new version, reaction time fell to 960 ns! The adapter began to work,
but not reliably. For example, when the jump button was held, the character would repeatedly jump.
The timing was probably right at the edge of what the console accepts. I had to do better.
The push and lds instructions use two cycles each. If the full firmware
were written in assembler, it would be easy to select two registers to hold the S0_PC and S1_PC variables,
making it possible to access them using only one cycle with the mov instruction. Moreover,
preserving r16 could be done away with by reserving a third register.
But I want to keep as much C code as possible. It might be tempting to declare a global
register variable (e.g. register unsigned char value asm("r3")) but it would
then be necessary to make sure libraries or other sources in the same project don't touch
the reserved registers. The gcc option -ffixed-3 would be useful for this. But I did not
want this project to depend on a specially compiled avr-libc, nor did I want to manually
make sure the registers are not used by the library by disassembling. (Even if it did
work now, you never know with future versions). So I decided not to take this
approach.
Taking advantage of unused peripheral registers
That said, there is another way to access values in a single cycle.
Unused peripheral registers can be used if you make sure there won't be side
effects. But this depends on which peripherals your project uses.
I decided to use UBRRL (baud rate low byte) to store the S0_PC value, and OCR2
(output compare 2) to store S1_PC. Also, r16 is saved in EEDR (EEPROM data register).
Since the corresponding peripherals are not used in this project, writing to and reading
from those registers has no effect on the program.
Now this is a technique I'd be impressed to see a compiler use…
This gives us the following handler:
ISR(INT0_vect, ISR_NAKED)
{
    asm volatile(
        " out 0x1D, r16 ; EEDR \n"
        " in r16, 0x23 ; OCR2 \n"
        " sbis 0x10, 2 ; PIND2 \n"
        " in r16, 0x09 ; UBRRL \n"
        " out 0x15, r16 ; PORTC \n"
        " in r16, 0x1D ; EEDR \n"
        " reti \n"
    ::);
}
Since a few cycles were saved, the reaction time is now around 800 ns. And
the adapter seems to be reliable. But I think we are still close to the unreliability
threshold. No problem since it is still possible to improve!
Lowering interrupt latency by placing code directly in the vector
By default, the interrupt vector table is at flash address 0x0000. When an
interrupt handler is implemented, a rjmp instruction is placed at the corresponding
offset to jump to the actual interrupt handler code. Here, __vector_1 is the address
of our interrupt handler.
This rjmp instruction is wasting 2 cycles before our code is executed. Since
I know INT0 is the only interrupt this project uses, I also know I can place
the handler code directly in the vector at address 0x0002.
The atmega8 supports moving the interrupt vector table from address 0x0000 to the
start of the bootloader section. The effective address depends on how the
"fuses" are configured. In my case, the address is 0x1800 (word address 0xC00).
I created a .boot section by adding -Wl,--section-start=.boot=0x1800 when
linking. The interrupt handler "function" that I will place there will therefore
have to be marked with __attribute__((section(".boot"))).
This same "function" also requires the "naked" attribute
to make sure the compiler does not place code around the inline
assembler block. The assembler code is the same as before,
except for the two nop instructions used to skip the first
vector (reset). Note that defining .boot one word later would
make the two nop instructions unnecessary. But I prefer it that way.
Updating PORTC in even less time is possible if we already know the
state of the SELECT line. And we do! The interrupt handler is executed
each time the SELECT line changes. If we sample the SELECT line while
the interrupt is executing, a value of 0 means the next transition is
to 1, and vice versa. I did not realize this before this step! I only
did when I began thinking of the 6 button implementation that would
follow.
I also began exploiting ICR1L to store the value to put on PORTC on
the next transition:
asm volatile(
    " nop\nnop\n \n" // VECTOR 1 : RESET
    " out 0x1D, r16 ; EEDR \n"
    " in r16, 0x26 ; ICR1L \n"
    " out 0x15, r16 ; PORTC \n"
    // Prepare ICR1L for the next transition
    " in r16, 0x09 ; UBRRL \n"
    " sbis 0x10, 2 ; PIND2 \n"
    " in r16, 0x23 ; OCR2 \n"
    " out 0x26, r16 ; ICR1L \n"
    " in r16, 0x1D ; EEDR \n"
    " reti \n"
::);
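To make the "output first, prepare later" idea easier to follow, here is a plain C model of the handler above that builds on a PC. The `sim_*` variables are hypothetical stand-ins for the AVR registers of the same name, and `prime_next_value()` is an assumed start-up step (the text does not say how ICR1L is initialised before the first edge).

```c
#include <stdint.h>

/* Hypothetical stand-ins for the AVR registers used by the handler. */
volatile uint8_t sim_PORTC;   /* outputs to the console            */
volatile uint8_t sim_PIND;    /* bit 2 is the SELECT line          */
volatile uint8_t sim_UBRRL;   /* holds the SELECT-low value  (S0)  */
volatile uint8_t sim_OCR2;    /* holds the SELECT-high value (S1)  */
volatile uint8_t sim_ICR1L;   /* value queued for the next edge    */

#define SELECT_MASK (1 << 2)

/* SELECT high now means the next edge is falling, so the SELECT-low
   value must be queued, and vice versa. */
static uint8_t value_for_next_edge(void)
{
    return (sim_PIND & SELECT_MASK) ? sim_UBRRL : sim_OCR2;
}

/* Assumed initialisation, run once before interrupts are enabled. */
void prime_next_value(void)
{
    sim_ICR1L = value_for_next_edge();
}

/* Model of the handler: the port is written before anything is decided. */
void int0_handler_predictive(void)
{
    sim_PORTC = sim_ICR1L;               /* react immediately        */
    sim_ICR1L = value_for_next_edge();   /* then prepare the next    */
}
```

The point of the design is that the test on SELECT is moved out of the critical path: by the time the handler looks at PIND, the console already has its answer.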
I was looking at the source code above and wondered what I could
do about the cycle wasted by saving r16... Then I realized I
could use r1, which is also known as __zero_reg__, a register
kept at 0 by gcc.
Because this interrupt handler is executed with other interrupts
disabled, __zero_reg__ can be freely used, but its value of 0
must be restored before returning. No need to save it first since
it should have been zero. However, we must be careful. The
clr instruction has an effect on flags, so SREG would
need to be saved. Also, loading a 0 with ldi is not
possible because this instruction requires a register from r16
and above (__zero_reg__ is r1). So I used lds to load
__zero_reg__ with a zero from memory, which has no effect on
the flags.
uint8_t zero = 0;
asm volatile(
    " nop\nnop\n \n" // VECTOR 1 : RESET
    " in __zero_reg__, 0x26 ; ICR1L \n"
    " out 0x15, __zero_reg__ ; PORTC \n"
    // Now, let's prepare for the next transition.
    " in __zero_reg__, 0x09 ; UBRRL \n"
    " sbis 0x10, 2 ; PIND2 \n"
    " in __zero_reg__, 0x23 ; OCR2 \n"
    " out 0x26, __zero_reg__ ; ICR1L \n"
    " lds __zero_reg__, zero \n"
    " reti \n"
::);
Reaction time: from 490 ns to 630 ns.
The graphic represents the final timing. The bottom trace is the SELECT line. The top trace falls to 0 to transmit the state of a pressed START button within an average time of 560 ns. Note that one CPU cycle is 62.5 ns. The jitter is approximately 3 cycles and depends on the moment the falling edge occurs relative to the CPU clock phase, but also on the instruction currently being executed by the main loop (multi-cycle instructions must complete before the interrupt handler runs).
It would still be possible to save one cycle by reserving a register to store the
next PORTC value. The initial in instruction would not be needed then. But
at this point, the performance seems high enough.
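As a quick cross-check of the figures quoted on this page, the measured times can be converted to CPU cycles at 16 MHz. The helper names below are made up for illustration; the only inputs are the clock frequency and the times given in the text.

```c
/* CPU clock of the Atmega8 in this project. */
#define F_CPU_HZ 16000000UL

/* Duration of a number of CPU cycles, in nanoseconds. */
double cycles_to_ns(double cycles)
{
    return cycles * 1e9 / F_CPU_HZ;
}

/* Number of CPU cycles that fit in a duration given in nanoseconds. */
double ns_to_cycles(double ns)
{
    return ns * F_CPU_HZ / 1e9;
}
```

With these, one cycle is 62.5 ns, 960 ns is about 15.4 cycles, 800 ns is 12.8 cycles and the final 560 ns average is just under 9 cycles. The spread between 490 ns and 630 ns is 140 ns, a little over 2 cycles.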
The rest and conclusion
From this point, I made many changes to have the adapter appear as a 6 button controller
to the Genesis console. The code has been changed to put different values on PORTC
according to a sequence of SELECT pulses. Conditional access to OCR2 and UBRRL was
therefore replaced by memory access. But thanks to the optimisations presented here,
reaction time has not increased at all.
If you'd like to see the final code, you may download the project sources through
the project page.