Making a SNES Outrun Clone

#hobby #retrodev #assembly #graphics

Hoyt Summers Pittman Aug 4, 2021 ・7 min read

A group of users from the nesdev forums have organized a Super Nintendo-themed game jam. Many hobbyist developers such as myself have joined and are working hard building games for this 30-year-old machine. I’ve chosen to make an Outrun-style arcade racer featuring the SuperFX chip and 30 FPS sprite scaling.

Outrun is a racing game released in 1986 that used custom arcade hardware made by Sega to render screen-filling scaled graphics at 60 FPS with a resolution of 512x256. Home video game systems wouldn’t be able to reproduce these effects at arcade speeds and resolutions until the late ‘90s, when 3D accelerators became commonplace. For the SNES+SuperFX platform I’m targeting a more manageable resolution of 256x160 at 30 FPS, and my early tests are showing promise.

I have had some success using the Super Nintendo and the SuperFX chip to draw scaling sprites at near-arcade speeds and reasonable resolutions. The image above is generated from a list of sprites with position and scaling attributes provided by the SNES main CPU to the SuperFX chip. The SuperFX processes and renders this list, and the result is transferred to the SNES graphics memory to be displayed.

The image above also demonstrates the timings that my code has to hit. The yellow line is the scanline where the SuperFX finishes rendering, and the purple area represents the memory copy to VRAM. If that yellow line crosses into purple, the game crashes.

For me, hitting 30 frames per second is the goal that everything else is being built around. Most SuperFX games ran at 20 to 25 frames per second, and none of them were sprite scaling arcade racers. However, recently a lot of good SuperFX source code has become available. Molive’s NICCC demo and Randy Linden’s DOOMFX were my primary sources of inspiration. The program pulls several tricks to hit my frame target. First, it only renders 128 texels per scanline and writes each pixel twice to fill the 256 pixel wide screen. Second, the image is only 160 scanlines tall. Finally, the images are only 4 bits per pixel, which limits them to 15 colors plus transparency.
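To make the first and last tricks concrete, here is a rough sketch of scaling one row of sprite data into a 128 texel line and then doubling it to 256 pixels. It is written in Python for readability and is not the SuperFX code from the game; the function names, the 8.8 fixed-point step, and the nearest-neighbour sampling are all assumptions made for illustration.

```python
LINE_TEXELS = 128  # texels rendered per scanline before doubling


def scale_row(texels, out_width):
    """Nearest-neighbour scale of one row of colour indices to out_width texels."""
    if out_width <= 0 or not texels:
        return []
    step = (len(texels) << 8) // out_width  # 8.8 fixed-point step through the source
    pos = 0
    out = []
    for _ in range(out_width):
        out.append(texels[pos >> 8])
        pos += step
    return out


def draw_row(line, texels, x, out_width):
    """Draw a scaled row into the line at x; colour 0 is treated as transparent."""
    for i, colour in enumerate(scale_row(texels, out_width)):
        dst = x + i
        if colour != 0 and 0 <= dst < len(line):
            line[dst] = colour


def double_pixels(line):
    """Write every texel twice so 128 texels fill 256 screen pixels."""
    out = []
    for colour in line:
        out.extend((colour, colour))
    return out


line = [0] * LINE_TEXELS
draw_row(line, [1, 0, 2, 3], x=10, out_width=8)  # 4 source texels stretched to 8
screen_row = double_pixels(line)                 # 256 pixels wide
```

Halving the horizontal resolution this way halves the number of texels that have to be sampled per scanline, at the cost of chunkier pixels.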

Operation

The Super Nintendo is a tile-based graphics system; small pieces of images, called tiles, are transferred from the game cartridge into VRAM. Every frame the main CPU in the SNES sends commands to the PPU (picture processing unit), which controls how those tiles are arranged on screen. While the PPU can translate, order, and blend tiles between the sprite and background layers, tiles cannot be scaled¹. To scale tiles on the Super Nintendo, developers had to use software rendering algorithms and transfer the rendered tile data to VRAM using the main CPU. The main CPU ran at about 3 MHz, which wasn’t fast enough for more than very small, simple transformations; however, Nintendo would use extra hardware in their game cartridges to enhance the console’s performance. The SuperFX was one of those hardware enhancements.
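For readers who have not seen SNES tile data before, the sketch below packs an 8x8 block of colour indices into the 32 byte planar layout the PPU reads for 4bpp tiles. It is a Python illustration only; `encode_tile_4bpp` is a made-up helper and not part of the game or its asset pipeline.

```python
def encode_tile_4bpp(pixels):
    """Pack 8 rows of 8 colour indices (0-15) into a 32 byte SNES 4bpp tile.

    Bitplanes 0 and 1 are interleaved row by row in the first 16 bytes,
    bitplanes 2 and 3 in the last 16. Bit 7 of each byte is the leftmost pixel.
    """
    tile = bytearray(32)
    for y, row in enumerate(pixels):
        for x, colour in enumerate(row):
            bit = 0x80 >> x
            if colour & 1:
                tile[2 * y] |= bit
            if colour & 2:
                tile[2 * y + 1] |= bit
            if colour & 4:
                tile[16 + 2 * y] |= bit
            if colour & 8:
                tile[16 + 2 * y + 1] |= bit
    return bytes(tile)


pixels = [[0] * 8 for _ in range(8)]
pixels[0][0] = 15  # top-left pixel uses colour 15
pixels[7][7] = 5   # bottom-right pixel uses colour 5
tile = encode_tile_4bpp(pixels)
```

Because one pixel is spread across four different bytes, software rendering on this system means either writing in this layout directly or converting to it before the data reaches VRAM.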

The SuperFX chip is a custom RISC processor designed to be very fast at the types of math that make scaling, rotating, and 3D graphics possible. It had its own memory and CPU and, at the end of the console’s life, could run at 20 MHz. The chip was not a 3D accelerator in the modern sense; it ran more like a CPU with extra math functions. After rendering graphics it would stop, and the Super Nintendo would transfer the SuperFX graphics into VRAM to be displayed by the PPU. This was done using DMA during V-Blank.

DMA on the Super Nintendo was the primary way to transfer data into VRAM, and VRAM could only be written to while the screen was not being drawn. This meant that writes to VRAM had to be synced up with the vertical blanking period². This limited the amount of data that could be moved into VRAM to a few kilobytes per frame. A full-screen image at four bits per pixel would take several vertical blanking periods to transfer, which limited the framerate. DMA time could be extended by using forced blanking, which is why games like Star Fox would only draw a fraction of the screen. More blank lines on the screen gave DMA more time to transfer data while also reducing the amount of data to transfer.

In my game, I have tuned the forced blanking to let me transfer a 256x160 4bpp frame over two frames of blanking. This gives the player gameplay at 30 frames per second. Of course, bandwidth is only half of the problem. Finishing the frame at this speed requires the processor to run as fast as possible, and that requires learning a lot about how the SuperFX and SNES manage access to memory, then writing my code and laying out my graphics around those limits.
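A back-of-the-envelope version of that budget is sketched below in Python. The framebuffer size follows directly from the resolution and bit depth. The 262 scanlines per frame (NTSC) and the DMA rate of roughly 165 bytes per scanline are assumptions I am plugging in for illustration, not measurements from the game, and the sketch ignores any setup overhead.

```python
WIDTH, HEIGHT, BITS_PER_PIXEL = 256, 160, 4

# Assumptions, not taken from the game's source:
SCANLINES_PER_FRAME = 262      # NTSC frame, including vertical blanking
DMA_BYTES_PER_SCANLINE = 165   # approximate DMA throughput
DISPLAY_HZ = 60                # nominal NTSC refresh rate

frame_bytes = WIDTH * HEIGHT * BITS_PER_PIXEL // 8
blank_lines = SCANLINES_PER_FRAME - HEIGHT           # natural plus forced blanking
bytes_per_blank = blank_lines * DMA_BYTES_PER_SCANLINE
blanks_needed = -(-frame_bytes // bytes_per_blank)   # ceiling division
frames_per_second = DISPLAY_HZ / blanks_needed

print(frame_bytes, blank_lines, bytes_per_blank, blanks_needed, frames_per_second)
```

Under these assumptions one blanking period moves about 16 kilobytes, which is short of the 20 kilobyte framebuffer, so the copy has to be split across two blanking periods and the display updates at half the refresh rate.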

Getting the Game Running

A game on the SNES with a SuperFX chip has several pools of memory, and access to each has special management rules. There is the Read Only Memory (ROM) on the game cartridge where the game is stored, 128 kbytes of work RAM in the SNES unit, 64 kbytes of cartridge RAM, and 64 kbytes of VRAM. VRAM, as mentioned before, is only writable during blanking intervals. Work RAM is accessible only from the main CPU, and the SuperFX and main CPU have to share ROM and cartridge RAM. If both CPUs access shared memory at the same time, the game will crash.

The ROM contains the programs for both CPUs as well as all of the graphics data. Because both programs are in ROM and neither CPU may access ROM at the same time as the other, one CPU must run its program out of RAM. For this game, the main CPU will run in work RAM and the SuperFX will have exclusive access to ROM and cartridge RAM while it is running.

The main CPU boots the SNES, and the SuperFX chip is idle when the console first turns on. To give the SuperFX control of ROM, the main CPU needs to copy its program into work RAM, and any interrupts (such as the VBlank interrupt) need to be pointed at instructions in RAM. Once the program is copied, the CPU jumps to the first instruction in RAM and then the SuperFX chip may be started. While the SuperFX chip is running, the ROM and cartridge RAM are not accessible to the main CPU. This precludes transferring the framebuffer from the cartridge RAM to VRAM during blanking, and is the reason the game will crash if the drawing hasn’t finished³.
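The ownership rule can be modelled in a few lines of Python. This is a toy model, not the game's code: the `Cartridge` class and `on_vblank` function are made up, the copy is shown as a single step rather than being split over two blanking periods, and the guard in `on_vblank` is the frame-dropping alternative from footnote 3 rather than what the game currently does.

```python
FRAME_BYTES = 256 * 160 // 2  # one 4bpp frame


class Cartridge:
    """Toy model of cartridge RAM that only one processor may use at a time."""

    def __init__(self):
        self.superfx_running = False
        self.framebuffer = bytearray(FRAME_BYTES)

    def read_framebuffer(self):
        # On real hardware this access would misbehave; here it raises instead.
        if self.superfx_running:
            raise RuntimeError("main CPU read cartridge RAM while the SuperFX was running")
        return bytes(self.framebuffer)


def on_vblank(cart, vram):
    """Copy the framebuffer to VRAM, or skip the frame if the SuperFX is still busy."""
    if cart.superfx_running:
        return False  # dropped frame: leave VRAM untouched
    data = cart.read_framebuffer()
    vram[:len(data)] = data
    return True


cart = Cartridge()
vram = bytearray(64 * 1024)
cart.framebuffer[0] = 7

cart.superfx_running = True
late = on_vblank(cart, vram)     # rendering overran, nothing is copied

cart.superfx_running = False
on_time = on_vblank(cart, vram)  # rendering finished, the frame is copied
```

The guard trades a crash for a repeated frame, which is exactly the trade-off footnote 3 describes.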

The SuperFX Program

With exclusive access to the ROM and cartridge RAM, the SuperFX chip is off to the races. Before the SuperFX is started, the SNES main CPU writes to the cartridge RAM a sprite list that instructs the chip where to draw sprites as well as how to scale them. The SuperFX program has several responsibilities: clear the framebuffer, generate an edge table, and draw sprites.

The SuperFX framebuffer is a segment of cartridge RAM that is copied to VRAM during blanking. If it is not cleared between frames, then each frame will have leftover pixels from the previous frame unless the rendering algorithm draws over every pixel every time. I chose to clear the framebuffer by writing 0s to it as opposed to forcing every pixel to be drawn. As it turned out, most pixels end up transparent (having a value of 0), and this was much faster than drawing transparent pixels during rendering⁴.

With the framebuffer cleared, it is time to set up the edge table. Each row of the edge table represents a line on the screen, and each row holds the drawing instructions for that line: how many sprites are on the line and, for each one, where on the line its data begins, how long it is, and where the source data is located in ROM. The table is generated by iterating through the sprite list and, for every screen line a sprite appears on, writing the drawing instructions for that row of sprite data into the matching row of the table. Once the edge table has been constructed, the SuperFX reads it from top to bottom and draws the sprite data in order. When the edge table is consumed, the SuperFX halts and makes the RAM and ROM available to the main CPU again.
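The shape of that table can be sketched in Python as follows. The post does not describe the real sprite or table formats, so everything here is hypothetical: the `Sprite` and `Span` fields, the assumption that sprite rows are stored back to back in ROM, and the integer math used to pick a source row.

```python
from dataclasses import dataclass

SCREEN_ROWS = 160


@dataclass
class Sprite:        # hypothetical sprite-list entry written by the main CPU
    x: int           # left edge on screen, in texels
    y: int           # top edge on screen, in scanlines
    out_width: int   # scaled width on screen
    out_height: int  # scaled height on screen
    src_height: int  # unscaled height of the source image, in rows
    row_bytes: int   # bytes per source row in ROM
    rom_addr: int    # address of the first source row


@dataclass
class Span:          # hypothetical edge-table entry for one sprite on one line
    x: int
    length: int
    rom_addr: int


def build_edge_table(sprites):
    """Return one list of spans per screen row, in sprite-list order."""
    table = [[] for _ in range(SCREEN_ROWS)]
    for s in sprites:
        for r in range(s.out_height):
            row = s.y + r
            if 0 <= row < SCREEN_ROWS:  # clip rows that fall off the screen
                src_row = r * s.src_height // s.out_height
                table[row].append(
                    Span(s.x, s.out_width, s.rom_addr + src_row * s.row_bytes)
                )
    return table


sprites = [
    Sprite(x=5, y=158, out_width=16, out_height=4, src_height=2,
           row_bytes=8, rom_addr=0x1000),
    Sprite(x=0, y=-1, out_width=8, out_height=2, src_height=2,
           row_bytes=4, rom_addr=0x2000),
]
edge_table = build_edge_table(sprites)
```

Drawing is then a single pass over `edge_table` from row 0 to row 159, and the number of sprites on a line is just the length of that row's list.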

And That’s It

When the blanking interrupt is called, the SNES CPU copies the framebuffer from the SuperFX to VRAM and updates the display. Then the process repeats: the SNES writes a new sprite list and starts the SuperFX, the SuperFX draws the scene, and the main CPU copies the framebuffer once more and updates the display.

There are a lot of details I’ve skipped over: the sprite data format, the nuances of building and drawing the edge table, what the SNES CPU does while the SuperFX is drawing⁵, my build system, and my asset pipeline⁶. Each of these probably deserves a few pages of its own, but that will have to wait for another post; stay tuned and feel free to leave comments or ask questions.

Demo and Sources

[YouTube Demo] - https://www.youtube.com/watch?v=ZLw0QR4dAsM
[GitHub Repo ] - https://github.com/secondsun/snes-sfx-demo
[NESdev Forum] - http://forums.nesdev.com
[Jam homepage] - https://itch.io/jam/snesdev-game-jam

Footnotes

1: We’re ignoring Mode 7. Mode 7 can scale and rotate a single background layer, with restrictions.

2: When the SNES was released, screens used CRT technology; an electron beam excited individual phosphors on glass to produce an image. The beam scanned from left to right and top to bottom. The time it took to move the beam back to the top of the screen is the vertical blanking period.

3: Could I check if the SuperFX chip is running and accept having a dropped frame? Yes I could, but then the game wouldn’t be running at 30 fps any more.

4: Drawing eight pixels takes several cycles and has the added overhead of updating several registers and the pixel cache. A write word instruction writes four pixels’ worth of data without this overhead. Most of the frame data is transparent, so the overdraw caused by clearing is small compared to the time saved.

5: It is in an infinite loop doing nothing until I have the game part of the game written.

6: This is written in Java, let me know if you want details.
