Playing with the Pico Part 6 - SNES like sprites and tilemap with VGA
In Part 5 we produced a simple video test pattern. How about something more interesting, like this?
Full code can be found on GitHub.
This is the final result of the code discussed below, so how does it all work?
It would be straightforward to display a static image or framebuffer. With the 320x240 resolution and 2 bytes per pixel that’s 150 KiB, which fits in the Pico’s memory. The problem is we can only fit one frame’s worth of image in memory. What if we want a changing image, e.g. some kind of game?
Say we did want to write a game. We could just use a single frame buffer, but if we update the buffer whilst it’s being displayed we end up seeing a mix of the old and new frames. We can avoid this by only updating the buffer during the vertical blanking interval, but this gives limited time to draw the frame. With care you can ‘chase the beam’: update the frame buffer as it’s being drawn, only changing a particular pixel once it’s been output. This gives more total frame draw time but is more awkward to get right.
Even with careful programming to allow use of a single buffer without visual artifacts, we’re still using a lot of memory. With 150 KiB of the Pico’s 264 KiB total taken by the framebuffer, there’s limited memory left for everything else needed to draw the frame.
We’re going to choose another tactic: drawing the frame one scanline at a time. Whilst we’re outputting one scanline we’ll draw the next one. This is a form of the ‘chasing the beam’ technique that avoids a full frame buffer. We will have two line buffers, one for the line we’re currently outputting and one for the line we’re currently drawing.
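To make the memory trade-off concrete, here is a small sketch of the arithmetic. The function and constant names are illustrative only; the sizes come from the figures quoted above (320x240, 2 bytes per pixel, 264 KiB of RAM).

```c
#include <stddef.h>

enum {
    SCREEN_WIDTH    = 320,
    SCREEN_HEIGHT   = 240,
    BYTES_PER_PIXEL = 2,
    PICO_SRAM_BYTES = 264 * 1024,
};

/* Bytes needed to hold `lines` scanlines of 16-bit pixels. */
size_t buffer_bytes(size_t lines) {
    return lines * SCREEN_WIDTH * BYTES_PER_PIXEL;
}

/* A full frame buffer holds every line of the frame. */
size_t full_frame_bytes(void) {
    return buffer_bytes(SCREEN_HEIGHT);
}

/* The scanline approach only ever holds two lines. */
size_t line_buffers_bytes(void) {
    return buffer_bytes(2);
}
```

Under these assumptions the full frame buffer needs 153,600 bytes (150 KiB) while the pair of line buffers needs 1,280 bytes.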
Let’s make some alterations to the code from the previous blog to support this. We already had two separate line buffers, one for a white line and the other for a red, green, blue, black pattern. We’ll switch this to two generic buffers, using one for the even lines (0, 2, 4 etc) and the other for the odd lines (1, 3, 5 etc). Whilst one buffer is being streamed out by DMA the other will be filled by software. This can be done with the following code in the interrupt handler:
There’s a subtlety here as we’re outputting 320x240 graphics at a 640x480 resolution, so every line gets repeated twice. This is what the current_display_line & 2 test accomplishes. Display lines 0 and 1 ((current_display_line & 2) == 0) both output the first even line. Display lines 2 and 3 ((current_display_line & 2) != 0) both output the first odd line. Note the parentheses: in C, == binds more tightly than &, and the masked value is either 0 or 2, never 1.
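As a minimal sketch of that mapping (the helper names and the line_buffers array are assumptions for illustration, not the post’s actual code), the buffer choice and the source line for a given display line could be written like this:

```c
#include <stdint.h>

#define LINE_WIDTH 320

/* Two line buffers: index 0 holds even source lines, index 1 odd ones. */
static uint16_t line_buffers[2][LINE_WIDTH];

/* The 320x240 source line shown on a given 640x480 display line. */
int source_line_for(int current_display_line) {
    return current_display_line >> 1;
}

/* The buffer to stream out for a display line. The parentheses matter:
 * `x & 2 == 0` parses as `x & (2 == 0)`, which is always 0. */
uint16_t *buffer_for(int current_display_line) {
    return (current_display_line & 2) == 0 ? line_buffers[0]
                                           : line_buffers[1];
}
```

Display lines 0 and 1 pick buffer 0, lines 2 and 3 pick buffer 1, lines 4 and 5 pick buffer 0 again, and so on.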
How do we coordinate this with whatever software is filling the line buffers? We don’t want to do that in the interrupt handler as generally they should be kept as short as possible (because they block other interrupts from occurring). I’ll use a very simple polling loop. It’ll use a boolean flag new_line_needed. The polling loop will wait for the interrupt handler to set new_line_needed and start filling the appropriate buffer when it is set. A separate next_line variable will hold which (320x240) line needs to be drawn next. We’ll call a function draw_line(int line_y, uint16_t* line) which will draw whatever should be at the line line_y position on the screen into the buffer line.
Here’s the polling loop:
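A minimal sketch of such a loop, written so it can also be compiled off-target. The line_buffers array, poll_once and the stand-in draw_line body are assumptions for illustration; only new_line_needed, next_line and the draw_line signature come from the description above.

```c
#include <stdbool.h>
#include <stdint.h>

#define LINE_WIDTH 320

/* Shared with the interrupt handler, hence volatile. */
static volatile bool new_line_needed = false;
static volatile int  next_line = 0;

static uint16_t line_buffers[2][LINE_WIDTH];

/* Stand-in for the real draw_line: fills the line with a value derived
 * from line_y so the effect is easy to observe. */
void draw_line(int line_y, uint16_t *line) {
    for (int x = 0; x < LINE_WIDTH; x++) {
        line[x] = (uint16_t)line_y;
    }
}

/* One pass of the polling loop. Returns true if a line was drawn. */
bool poll_once(void) {
    if (!new_line_needed) {
        return false;
    }
    /* Clear the flag before drawing so a request raised while we are
     * still drawing is not lost. */
    new_line_needed = false;
    int line_y = next_line;
    /* Even lines go to buffer 0, odd lines to buffer 1. */
    draw_line(line_y, line_buffers[line_y & 1]);
    return true;
}

/* On the Pico this would be the forever loop at the end of main. */
void polling_loop(void) {
    for (;;) {
        poll_once();
    }
}
```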
Here’s the code from the interrupt handler that sets new_line_needed and next_line:
We’re not quite done yet; we need to think a little more about the beginning of the frame. The interrupt that sets new_line_needed does so when the line DMA channel is done sending buffer data. This is fine in the middle of the frame, but for the first 2 display lines of the frame there are no previous display lines to trigger the interrupt and kick off draw_line for line 0. An easy way to handle this is with some dummy lines. We’ll start outputting visible lines before the display region, just streaming out zero-valued pixels. This won’t alter the output on the pins but will give us the DMA interrupt with the exact same timing we get during the visible lines.
We’ll output 3 dummy lines by adding some visible lines to the beginning of the command words we send to the sync PIO. At the end of the first dummy line we’ll signal new_line_needed, setting next_line to 0. This gives the software the same time to draw the first line that it gets for all other lines (the time taken to output a line twice).
To save a little bit of memory we’ll set the line DMA not to increment its read address for the dummy lines, pointing it at a 32-bit memory location containing a zero which it will read over and over. This requires a little extra code to alter the line DMA config between the dummy lines and the real display lines. Here’s the final code for the interrupt handler:
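As a rough, host-side sketch of the bookkeeping such a handler might do (this is not the post’s actual handler: lines_done, on_line_done and next_transfer_is_dummy are hypothetical names, and the DMA reconfiguration itself is only described in a comment):

```c
#include <stdbool.h>

#define NUM_DUMMY_LINES 3
#define DISPLAY_LINES   480   /* each 320x240 line is output twice */
#define SOURCE_LINES    240

static volatile bool new_line_needed = false;
static volatile bool new_frame = false;
static volatile int  next_line = 0;

/* Line transfers (dummy ones included) completed so far this frame. */
static int lines_done = 0;

/* True while the next transfer should stream zero pixels. On the Pico
 * this is where the channel would be set not to increment its read
 * address and pointed at a single zero word. */
bool next_transfer_is_dummy(void) {
    return lines_done < NUM_DUMMY_LINES;
}

/* Called once per completed line transfer (the DMA interrupt). */
void on_line_done(void) {
    lines_done++;

    if (lines_done == NUM_DUMMY_LINES + DISPLAY_LINES) {
        /* Whole frame sent: start again and tell the polling loop. */
        lines_done = 0;
        new_frame = true;
        return;
    }

    /* Request source line k two display lines before it is first shown,
     * which is when lines_done == NUM_DUMMY_LINES + 2k - 2. */
    int ahead = lines_done - NUM_DUMMY_LINES + 2;
    if (ahead >= 0 && (ahead & 1) == 0 && ahead / 2 < SOURCE_LINES) {
        next_line = ahead / 2;
        new_line_needed = true;
    }
}
```

With 3 dummy lines, line 0 is requested at the end of the first dummy line, leaving the two remaining dummy lines as drawing time, the same budget every later line gets.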
This adds a new_frame flag we can use in our polling loop:
We’ll write a draw_line and end_of_frame to produce another test pattern, one that changes line by line and frame by frame. It has a 1 pixel white border like the previous one and produces alternating bars of red, green and blue with a colour gradient in each going from black to full red, green or blue. Every frame we adjust the start point of this gradient. Here’s the code:
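A sketch of what such a pair of functions could look like. The bar height of 32 lines, the gradient_start variable and the pixel layout (5 bits per channel with red in the low bits) are assumptions made for this example and may not match the real code.

```c
#include <stdint.h>

#define SCREEN_WIDTH  320
#define SCREEN_HEIGHT 240
#define BAR_HEIGHT    32   /* one gradient step per line, levels 0..31 */

/* Assumed pixel layout: 5 bits per channel, red in the low bits. */
uint16_t rgb(unsigned r, unsigned g, unsigned b) {
    return (uint16_t)(r | (g << 5) | (b << 10));
}

static int gradient_start = 0;

/* Shift the gradient by one step each frame. */
void end_of_frame(void) {
    gradient_start = (gradient_start + 1) % BAR_HEIGHT;
}

void draw_line(int line_y, uint16_t *line) {
    const uint16_t white = rgb(31, 31, 31);

    /* Top and bottom border lines are entirely white. */
    if (line_y == 0 || line_y == SCREEN_HEIGHT - 1) {
        for (int x = 0; x < SCREEN_WIDTH; x++) {
            line[x] = white;
        }
        return;
    }

    /* Brightness within the bar, offset by the per-frame start point. */
    unsigned level = (unsigned)(line_y + gradient_start) % BAR_HEIGHT;
    uint16_t colour;
    switch ((line_y / BAR_HEIGHT) % 3) {
    case 0:  colour = rgb(level, 0, 0); break;
    case 1:  colour = rgb(0, level, 0); break;
    default: colour = rgb(0, 0, level); break;
    }

    /* Left and right border pixels with the bar colour in between. */
    line[0] = white;
    for (int x = 1; x < SCREEN_WIDTH - 1; x++) {
        line[x] = colour;
    }
    line[SCREEN_WIDTH - 1] = white;
}
```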
Finally, an important point. new_line_needed, new_frame and next_line need to be declared volatile, like this:
This tells the compiler the value might change unexpectedly, so it has to read or write the actual memory holding the variable every time rather than optimising the access away. Without it the compiler could leave the value in a register, or work out that the variable must hold some fixed value at a particular point in the code and not check it at all. Look at this simplified version of our polling loop to see what happens without volatile:
If new_line_needed is false at the start of this loop the compiler could just optimise it to an infinite loop that does nothing. However we know an interrupt will eventually set new_line_needed to true. By using a volatile variable for new_line_needed the compiler won’t do this optimisation and will instead loop, checking the memory that holds new_line_needed each time.
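A minimal sketch of a volatile flag and a wait loop that depends on it. The function name wait_for_new_line is made up for this example; on the Pico the flag would be set from the DMA interrupt handler.

```c
#include <stdbool.h>

/* Written by the interrupt handler, read by the polling loop. */
static volatile bool new_line_needed = false;

/* Because the flag is volatile, each pass of this loop re-reads it from
 * memory, so a write made by an interrupt handler is eventually seen.
 * Without volatile the compiler may hoist the read out of the loop. */
void wait_for_new_line(void) {
    while (!new_line_needed) {
        /* spin until the interrupt handler sets the flag */
    }
    new_line_needed = false;
}
```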
Here’s a video of the test pattern in action:
At the beginning I promised something more interesting and we’ve just produced yet another test pattern, but we now have everything we need to move on to drawing complex frames. First up, let’s look at sprites.
A sprite is just an image you draw on the display. You provide the sprite image data, a width and height, an X and Y position and let the graphics library or graphics hardware do the rest. They were a major feature of earlier 8-bit and 16-bit consoles (like the NES, SNES and Megadrive) as well as some 80s and early 90s home computers (like the Commodore 64 and Amiga). The hardware of these machines had native sprite support, allowing you to draw complex frames without needing enough memory for a full frame buffer.
Here’s an example set of sprites for a character with a few basic animations (Art by Charles Gabriel). The image size has been increased so it doesn’t look too tiny on modern monitor resolutions.
Generally sprites can have transparent pixels, allowing you to draw complex objects over a background and this implementation will be no exception. Any bright pink (R = 31, G = 0, B = 31) pixels in a sprite will be transparent.
Given a list of sprites we can determine which are on the scanline we are currently drawing; these are the active sprites for that scanline. After determining the active sprites we just copy the non-transparent pixels from the relevant line of each sprite’s data out to the scanline. Overlapping sprites are dealt with using the order of the list. A sprite that appears earlier in the list will be drawn on top of a sprite that appears later in the list.
What does this look like in code? To keep things simple we’ll keep sprites to a fixed width of 16, but allow varying height. Using a power of 2 for the width is important as it allows a faster calculation for the pixel data address of a specific line.
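To illustrate the address calculation, here is a small sketch assuming each sprite’s rows are stored back to back as 16-bit pixels. The names SPRITE_WIDTH_LOG2 and sprite_row are hypothetical.

```c
#include <stdint.h>

#define SPRITE_WIDTH_LOG2 4                       /* 2^4 = 16 pixels */
#define SPRITE_WIDTH      (1 << SPRITE_WIDTH_LOG2)

/* Pointer to the first pixel of row `row` of a sprite. With a power of
 * 2 width the multiply by the row length is just a left shift. */
const uint16_t *sprite_row(const uint16_t *pixels, int row) {
    return pixels + ((unsigned)row << SPRITE_WIDTH_LOG2);
}
```

A compiler will usually make this substitution itself when the width is a compile-time constant, so the explicit shift here is mainly to show why the power of 2 helps.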
We’ll begin by creating a structure to describe a sprite, defining a couple of constants and creating an array to hold our sprite info.
We have a maximum number of sprites we can store in the array, NUM_SPRITES, as well as a maximum number of sprites per scanline, MAX_SPRITES_PER_LINE. We need to limit this to ensure our scanline is drawn fast enough. The more sprites per line, the more pixels it needs to copy around and the slower the line draw will be. Take too long and it won’t be ready in time and the final frame will be messed up. I chose 20 sprites per line, which allows us to fully fill a scanline with side by side sprites (though of course they can overlap), and 128 sprites in total as that felt like a decent number. If these are set too high we’ll get graphical errors as we’ll take too long to draw a scanline. I’ll return to looking at how we can make scanline drawing faster and what the maximum limits are in a later blog.
Next we’ll have a specific active sprite structure and a way to determine which sprites are on a given scanline (and so become the active sprites for that scanline).
Note that the active sprite slots only need to store two things:
- X coordinate of the sprite
- Address of the sprite data that contains the pixels for the scanline we’re drawing
Finally, drawing the active sprites looks like this:
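Putting the steps above together, a hedged sketch of the whole sprite path might look like the following. The structure fields and find_active_sprites are assumptions for illustration; the constants, the transparent colour and the draw_sprites_line name follow the text. The 0x7C1F value assumes 5 bits per channel with the top bit clear.

```c
#include <stdbool.h>
#include <stdint.h>

#define SCREEN_WIDTH         320
#define SPRITE_WIDTH         16
#define NUM_SPRITES          128
#define MAX_SPRITES_PER_LINE 20
#define TRANSPARENT          0x7C1F   /* R = 31, G = 0, B = 31 */

typedef struct {
    bool enabled;
    int16_t x, y;
    uint16_t height;
    const uint16_t *data;        /* height rows of SPRITE_WIDTH pixels */
} sprite_t;

typedef struct {
    int16_t x;
    const uint16_t *line_data;   /* pixel row for the current scanline */
} active_sprite_t;

static sprite_t sprites[NUM_SPRITES];
static active_sprite_t active_sprites[MAX_SPRITES_PER_LINE];

/* Collect, in list order, the sprites that cover scanline line_y. */
int find_active_sprites(int line_y) {
    int count = 0;
    for (int i = 0; i < NUM_SPRITES && count < MAX_SPRITES_PER_LINE; i++) {
        const sprite_t *s = &sprites[i];
        int row = line_y - s->y;
        if (!s->enabled || row < 0 || row >= s->height) {
            continue;
        }
        active_sprites[count].x = s->x;
        active_sprites[count].line_data = s->data + row * SPRITE_WIDTH;
        count++;
    }
    return count;
}

/* Draw back to front so sprites earlier in the list end up on top. */
void draw_sprites_line(int line_y, uint16_t *line) {
    int count = find_active_sprites(line_y);
    for (int i = count - 1; i >= 0; i--) {
        const active_sprite_t *a = &active_sprites[i];
        for (int px = 0; px < SPRITE_WIDTH; px++) {
            int x = a->x + px;
            uint16_t colour = a->line_data[px];
            /* Skip off-screen and transparent pixels. */
            if (x >= 0 && x < SCREEN_WIDTH && colour != TRANSPARENT) {
                line[x] = colour;
            }
        }
    }
}
```

This per-pixel loop is the simplest form; it makes no attempt at the speed a real implementation would need.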
We can call draw_sprites_line directly from draw_line and we’ve now got sprites being drawn on top of our test pattern. Using the sprites from Charles Gabriel, I put a little test together where I set up multiple sprites and animated them. The sprite sheet gives 6 different characters with some walking animations. To do the animation in end_of_frame you simply iterate through the enabled sprites and change which frame they’re pointing to, so each sprite on screen goes through every available sprite from the sprite sheet. Here’s the setup code and the additions to end_of_frame. In draw_line we just add a call to draw_sprites_line.
Here’s a video of the result:
All looks to be working well. In particular the code can handle 20 sprites per line and 128 sprites in total without any graphical issues.
How do we do something more useful with the background, to replace our RGB test pattern? The answer is tilemaps. These are somewhat like sprites but more constrained. The idea is we have a collection of tiles, each of a fixed size (I will use 16x16 here), that are placed on a fixed grid (in contrast to a sprite, which can go anywhere and can overlap with other sprites). The map itself specifies, for each cell on the grid, which tile you should use.
Here’s an example. First we have the collection of tiles we’ll use, the tileset (from LimeZu on itch.io). Overlaid on each tile is an index number (which isn’t present in the actual tileset, it just illustrates how things work). Again I’ve increased the image size so it doesn’t look too tiny.
Then for each cell in the fixed grid we specify which tile is placed there:
This will produce a map that looks like this:
I wrote a small Python program that takes a tileset image and a tilemap defined using CSV. It outputs a couple of C headers: one gives the tileset data, the other gives the tilemap data. It can also draw a preview image of the whole map. It can be found on GitHub. The example tilemap above will be my demo map.
Drawing the tilemap is straightforward. First we define a structure to hold our tilemap data. Note it includes scroll_x and scroll_y fields. Our tilemap will be bigger than a single screen so we want to be able to move it around.
Then we need a function to draw a tilemap to a line. This is simpler than sprite drawing as everything is arranged to a fixed grid and there’s no overlapping, priority order or transparency to deal with. Scrolling adds some slight complexity as you have to deal with partial tiles at the beginning and end of the line.
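A sketch of how such a structure and line-drawing function could fit together. Apart from scroll_x and scroll_y, the field names, the 8-bit tile indices and the function name draw_tilemap_line are assumptions for this example, and it assumes the scrolled screen always stays inside the map.

```c
#include <stdint.h>

#define SCREEN_WIDTH 320
#define TILE_SIZE    16

typedef struct {
    int width, height;        /* map size in tiles */
    const uint8_t *tiles;     /* width * height tile indices, row major */
    const uint16_t *tileset;  /* TILE_SIZE * TILE_SIZE pixels per tile */
    int scroll_x, scroll_y;   /* top-left of the screen, in map pixels */
} tilemap_t;

void draw_tilemap_line(const tilemap_t *map, int line_y, uint16_t *line) {
    int map_y = line_y + map->scroll_y;
    const uint8_t *map_row = map->tiles + (map_y / TILE_SIZE) * map->width;
    int row_in_tile = map_y % TILE_SIZE;

    int tile_x = map->scroll_x / TILE_SIZE;
    int px = map->scroll_x % TILE_SIZE;   /* first visible pixel of tile */
    int x = 0;

    while (x < SCREEN_WIDTH) {
        /* Row of pixels within the tile this grid cell refers to. */
        const uint16_t *src = map->tileset
            + map_row[tile_x] * (TILE_SIZE * TILE_SIZE)
            + row_in_tile * TILE_SIZE;

        /* Partial tiles at either edge are handled by starting at px
         * and stopping at the edge of the screen. */
        for (; px < TILE_SIZE && x < SCREEN_WIDTH; px++) {
            line[x++] = src[px];
        }
        px = 0;
        tile_x++;
    }
}
```

With scroll_x at 0 the line is exactly 20 whole tiles; with any other scroll value it is a partial tile, 19 whole tiles and another partial tile.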
To bring everything together I added scrolling to the sprite drawing code (see GitHub for the code), dropped the test pattern generation from the draw_line function and added the tile drawing, leaving us with a very simple function:
I also introduced an ‘entity’, another structure that tracks a sprite along with some information on how it should be moving and the animation to use. Here’s the entity structure; check out GitHub to see the full code:
Using this you can place a few animated entities around the map that move with appropriate walking animations. I also added some code to scroll the map and sprites together, ‘bouncing’ when the view hits the edge of the map by reversing the scroll direction. Here’s a video of the final result:
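The bouncing scroll can be sketched per axis as below. The scroll_axis_t type and scroll_bounce function are hypothetical names, not taken from the real code; max stands for the map size minus the screen size on that axis.

```c
typedef struct {
    int pos;    /* current scroll offset in pixels */
    int step;   /* pixels moved per frame; the sign gives the direction */
    int max;    /* largest valid offset: map size minus screen size */
} scroll_axis_t;

/* Advance one frame, reversing direction when an edge is reached. */
void scroll_bounce(scroll_axis_t *axis) {
    int next = axis->pos + axis->step;
    if (next < 0 || next > axis->max) {
        axis->step = -axis->step;
        next = axis->pos + axis->step;
    }
    /* Clamp in case the range is smaller than one step. */
    if (next < 0) {
        next = 0;
    }
    if (next > axis->max) {
        next = axis->max;
    }
    axis->pos = next;
}
```

Calling this once per frame for each of the X and Y axes, and using the same offsets for both the tilemap and the sprites, keeps the two layers moving together.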
So what’s next? We can look at further layers, we could have multiple tilemap layers allowing transparency in some (so we can have tiles overlap sprites) or add a text layer (potentially implemented as another tilemap just with different tile sizes). Of course we will hit performance limits at some point (we only have so many CPU cycles to draw a scanline in) and start getting graphical artifacts. So first up we need to look into our current performance, see how many spare cycles we have to work with and see if there’s any way to improve our performance.
Whilst experimenting I found an interesting bug that looks like a performance problem. I recreated my initial sprite test (with the 128 sprites cycling through different animation frames) with a tilemap background. It worked fine until I started scrolling around as well, at which point graphical artifacts started appearing.
After spending some time investigating I realised the graphical artifacts were only seen when the tilemap was scrolled in the X direction an odd number of pixels. Here’s a little video that demonstrates the bug. It scrolls one pixel right roughly twice per second. We see the artifacts come and go with every pixel step until we’ve got 8 or fewer sprites per line and suddenly everything works fine. A clear sign of a performance issue. For some reason a tilemap scrolled an odd number of pixels needs noticeably more cycles to draw than a tilemap scrolled an even number of pixels.
I have yet to look into the root cause of this. I suspect it’s something to do with aligned vs unaligned memory access. With a scroll of 0, pixels 0 and 1 of the first tile are copied to pixels 0 and 1 of the scanline buffer, which can be done with a single aligned 32-bit copy. With a scroll of 1, pixels 1 and 2 of the first tile (pixel 0 is off screen) are copied to pixels 0 and 1 of the scanline buffer. That takes either two 16-bit copies or one unaligned 32-bit copy. I’ll investigate in more detail in the next blog.
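To illustrate the suspicion (not to confirm it), this sketch checks whether the first copy for a given scroll value could use aligned 32-bit accesses. Both function names are made up for the example, and it assumes the tile row and line buffer each start on a 32-bit boundary.

```c
#include <stdbool.h>
#include <stdint.h>

#define TILE_SIZE 16

/* True if a pointer can be used for 32-bit (two pixel) loads or stores. */
bool is_word_aligned(const void *p) {
    return ((uintptr_t)p & 3u) == 0;
}

/* Given a tile row and line buffer that both start on a 32-bit boundary,
 * report whether the first visible pixel of the tile is also on one. */
bool first_copy_is_aligned(const uint16_t *tile_row, const uint16_t *line,
                           int scroll_x) {
    const uint16_t *src = tile_row + (scroll_x % TILE_SIZE);
    return is_word_aligned(src) && is_word_aligned(line);
}
```

Since pixels are 2 bytes each, the source pointer lands on a 32-bit boundary for even scroll values and halfway between boundaries for odd ones, which matches the pattern of artifacts seen in the video.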