Now that I have a terrain generator, I can start refining the rendering systems.
Even though the terrain generator was working, it was hard to see since the tops and sides of the blocks were the same bright green color. To make the shape of the terrain easier to see, and to make my eyes hurt less, I decided to add ambient occlusion.
I used this article as a guide. It uses vertex coloring to darken the corners of blocks, based on whether the blocks neighboring each corner are solid.
There isn’t a lighting system yet, so everything in the game is rendered at the same brightness level. I started writing a floodfill based lighting system similar to the one described here. The system has bugs when propagating between chunks or when adding a new chunk.
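The flood-fill idea can be sketched for a single chunk like this. This is my own minimal version, not the engine's actual code: the chunk size, the 0–15 light scale, and every name here are assumptions, and the real system also has to propagate across chunk boundaries, which is exactly where my bugs are.

```cpp
#include <array>
#include <cstdint>
#include <queue>

// Hypothetical single-chunk flood-fill lighting sketch.
constexpr int N = 16;

struct Chunk {
    std::array<uint8_t, N * N * N> solid{};  // 1 = solid block
    std::array<uint8_t, N * N * N> light{};  // light level, 0..15
    static int idx(int x, int y, int z) { return (z * N + y) * N + x; }
};

// BFS from a light source: each step away from the source loses one level.
void propagateLight(Chunk& c, int sx, int sy, int sz, uint8_t level) {
    std::queue<std::array<int, 3>> q;
    c.light[Chunk::idx(sx, sy, sz)] = level;
    q.push({sx, sy, sz});
    const int dirs[6][3] = {{1,0,0},{-1,0,0},{0,1,0},{0,-1,0},{0,0,1},{0,0,-1}};
    while (!q.empty()) {
        auto [x, y, z] = q.front();
        q.pop();
        uint8_t next = c.light[Chunk::idx(x, y, z)];
        if (next == 0) continue;
        --next;  // falloff of one light level per block
        for (const auto& d : dirs) {
            int nx = x + d[0], ny = y + d[1], nz = z + d[2];
            if (nx < 0 || ny < 0 || nz < 0 || nx >= N || ny >= N || nz >= N)
                continue;  // a real engine would hand off to the neighbor chunk here
            int i = Chunk::idx(nx, ny, nz);
            if (c.solid[i] || c.light[i] >= next) continue;  // solid or already brighter
            c.light[i] = next;
            q.push({nx, ny, nz});
        }
    }
}
```

Light ends up as `source level minus Manhattan distance` in open air, clamped at zero, and stops at solid blocks.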
With the lighting system in place, I could achieve ambient occlusion in a different way than the article’s implementation. A similar effect comes from averaging the light values of the four potentially-air blocks near a vertex. If one of those blocks is solid, its light value is zero, which lowers the average brightness for that vertex.
In the image above, vertex A has two air blocks above it. The two blocks are both 100% light, so A is 100% bright. Vertex B has one air block and one dirt block above it. The dirt block is 0% light, so the average is 50%. The face drawn connecting A and B would have a gradient, giving the ambient occlusion effect.
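The averaging trick boils down to one tiny function. This is a sketch under my own assumptions (a 0–15 light scale and a made-up function name); solid neighbors contribute a light value of 0, so each one darkens the vertex by 25%:

```cpp
// Hypothetical vertex-brightness helper: average the light values (0..15)
// of the four blocks touching a vertex, normalized to 0..1.
// A solid block contributes 0, darkening the vertex by a quarter.
float vertexBrightness(int l0, int l1, int l2, int l3) {
    return (l0 + l1 + l2 + l3) / (4.0f * 15.0f);
}
```

For vertex A above, both blocks at full light give 1.0; for vertex B, one full-light air block and one dirt block at zero give 0.5, and interpolation across the face produces the gradient.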
There are artifacts at chunk boundaries, since chunks don’t read from their neighbors consistently.
I’ve realized that there is now noticeable hitching whenever new chunks are loaded. Some of the hitches can be as high as 100 ms, which is about 6 frames.
The hitching is caused by the allocation and deallocation of chunk data. Essentially, when you move, the game deallocates the chunks that are too far away and allocates new chunks that are now closer.
The fix is to recycle chunks instead of allocating and deallocating constantly. When a chunk is too far away from the player, it gets moved to the position of a new chunk that is being loaded. The old block data is all overwritten, so the chunk can be recycled transparently.
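The recycling pattern is essentially an object pool. Here is a minimal sketch of the idea, with all names and sizes invented for illustration; the point is that `acquire` reuses a previously released chunk instead of allocating a fresh one:

```cpp
#include <algorithm>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical chunk pool: far-away chunks are released back to the pool
// and reused for newly loaded positions, with their block data overwritten.
struct Chunk {
    int x = 0, z = 0;
    std::vector<uint8_t> blocks = std::vector<uint8_t>(16 * 256 * 16, 0);
    void reset(int nx, int nz) {
        x = nx;
        z = nz;
        std::fill(blocks.begin(), blocks.end(), 0);  // overwrite the old data
    }
};

class ChunkPool {
    std::vector<std::unique_ptr<Chunk>> free_;
public:
    std::unique_ptr<Chunk> acquire(int x, int z) {
        if (free_.empty()) {  // pool empty: pay the allocation cost once
            auto c = std::make_unique<Chunk>();
            c->reset(x, z);
            return c;
        }
        auto c = std::move(free_.back());  // recycle: no allocation, no free
        free_.pop_back();
        c->reset(x, z);
        return c;
    }
    void release(std::unique_ptr<Chunk> c) { free_.push_back(std::move(c)); }
    std::size_t freeCount() const { return free_.size(); }
};
```

Because the block data is fully overwritten on reuse, the rest of the engine never needs to know whether a chunk is fresh or recycled.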
I refactored mesh generation onto its own thread, instead of sharing a thread with the chunk updater. Now the terrain generator, the chunk updater (which runs the light propagation step), and the chunk mesh generator each have their own worker thread.
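The per-stage worker threads all follow the same pattern: each stage owns a blocking queue, upstream stages push work in, and the stage’s thread pops and processes it. A minimal sketch of such a queue (names are my own, not the engine’s):

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>

// Hypothetical blocking work queue shared between a producer stage
// (e.g. the terrain generator) and a consumer stage's worker thread.
template <typename T>
class WorkQueue {
    std::queue<T> items_;
    std::mutex m_;
    std::condition_variable cv_;
public:
    void push(T item) {
        {
            std::lock_guard<std::mutex> lk(m_);
            items_.push(std::move(item));
        }
        cv_.notify_one();  // wake the worker thread
    }
    T pop() {  // blocks until an item is available
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [&] { return !items_.empty(); });
        T item = std::move(items_.front());
        items_.pop();
        return item;
    }
};
```

Chaining three of these gives the terrain → lighting → meshing pipeline, with each stage decoupled from the frame loop.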
I added raycasting and block selection. This will be the foundation for the ability to add and remove blocks later.
The raycasting algorithm is based on this paper, and specifically on this implementation of it. At a high level, the algorithm steps forward from the camera until it hits a solid block or exceeds a maximum distance. If a block is hit, that block is selected. Then, a simple graphic is drawn around the selected block.
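The stepping loop can be sketched like this. This is my own compressed version of the grid-traversal idea (walk the ray one voxel boundary at a time, always crossing the nearest boundary next); the `isSolid` callback and all parameter names are assumptions, not the engine’s API:

```cpp
#include <cmath>
#include <functional>

struct Hit { bool found; int x, y, z; };

// Hypothetical voxel raycast: step from the origin along (dx, dy, dz),
// crossing one voxel boundary per iteration, until a solid block is hit
// or the travel distance exceeds maxDist.
Hit raycast(float ox, float oy, float oz, float dx, float dy, float dz,
            float maxDist, const std::function<bool(int, int, int)>& isSolid) {
    int x = (int)std::floor(ox), y = (int)std::floor(oy), z = (int)std::floor(oz);
    int stepX = dx > 0 ? 1 : -1, stepY = dy > 0 ? 1 : -1, stepZ = dz > 0 ? 1 : -1;
    // Ray parameter t at which we first cross the next boundary on each axis.
    auto firstBoundary = [](float o, float d, int s) {
        if (d == 0) return INFINITY;
        float next = std::floor(o) + (s > 0 ? 1.0f : 0.0f);
        return (next - o) / d;
    };
    float tMaxX = firstBoundary(ox, dx, stepX);
    float tMaxY = firstBoundary(oy, dy, stepY);
    float tMaxZ = firstBoundary(oz, dz, stepZ);
    // How much t advances per full voxel crossed on each axis.
    float tDeltaX = dx != 0 ? std::abs(1.0f / dx) : INFINITY;
    float tDeltaY = dy != 0 ? std::abs(1.0f / dy) : INFINITY;
    float tDeltaZ = dz != 0 ? std::abs(1.0f / dz) : INFINITY;
    float t = 0;
    while (t <= maxDist) {
        if (isSolid(x, y, z)) return {true, x, y, z};
        // Step across whichever boundary is nearest along the ray.
        if (tMaxX < tMaxY && tMaxX < tMaxZ) { x += stepX; t = tMaxX; tMaxX += tDeltaX; }
        else if (tMaxY < tMaxZ)             { y += stepY; t = tMaxY; tMaxY += tDeltaY; }
        else                                { z += stepZ; t = tMaxZ; tMaxZ += tDeltaZ; }
    }
    return {false, 0, 0, 0};
}
```

Unlike naive fixed-increment stepping, this visits every voxel the ray passes through exactly once, so thin blocks can’t be skipped.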
The graphic is just a cube with a transparent texture, drawn with no depth testing.
I added the ability to add and remove blocks. The machinery for updating a chunk’s block data and propagating the lighting already existed. Newly generated chunks get placed on an update queue. Modifications simply put the modified chunk back onto the update queue.
Light can be added when removing blocks, for example breaking through the roof of a cave, but for right now light can’t be removed when adding blocks.
After the post for Week 2 was shared on reddit, I started receiving feedback from readers. Reddit user /u/Syracuss on the /r/GraphicsProgramming subreddit suggested a possible optimization I could make.
His suggestion was to use a small number of large vertex buffers to be shared between multiple chunks, instead of the current system of using three separate vertex buffers for each chunk. This optimization had a lot of potential and only a small refactoring cost, so I implemented it.
I’m using vertex buffers that are 256 MiB in size, so each buffer has its own dedicated VkDeviceMemory. When a chunk wants to create a mesh, it suballocates a small region of one of these vertex buffers. This avoids the cost of creating and binding new VkBuffers for each chunk. The cost of suballocation still exists, but it has moved from the VulkanMemoryAllocator library into my own custom allocation code.
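A suballocator over one big buffer can be as simple as a sorted free list with first-fit allocation. This is a sketch of the concept, not my actual allocator (class and member names are invented, and a production version would merge adjacent free regions on release):

```cpp
#include <cstdint>
#include <map>

// Hypothetical first-fit suballocator over one large vertex buffer.
// Tracks free regions as offset -> size, sorted by offset.
class BufferArena {
    std::map<uint64_t, uint64_t> free_;
public:
    explicit BufferArena(uint64_t size) { free_[0] = size; }

    // Returns a byte offset into the big buffer, or UINT64_MAX on failure.
    uint64_t alloc(uint64_t size) {
        for (auto it = free_.begin(); it != free_.end(); ++it) {
            if (it->second < size) continue;  // region too small, keep looking
            uint64_t offset = it->first;
            uint64_t remaining = it->second - size;
            free_.erase(it);
            if (remaining > 0) free_[offset + size] = remaining;  // keep the tail
            return offset;
        }
        return UINT64_MAX;  // no region fits
    }

    void release(uint64_t offset, uint64_t size) {
        free_[offset] = size;  // a real allocator would coalesce neighbors here
    }
};
```

Each chunk mesh then records only its offset and size, and the single VkBuffer stays bound across all chunk draws.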
These changes provided only a marginal increase in framerate, however, so I started looking into what the game’s main bottlenecks actually were. Until now, I had been using Visual Studio’s built-in CPU profiler to measure CPU usage, but I wasn’t using any GPU profiler.
Since I have an AMD GPU, I downloaded the Radeon GPU Profiler. This provides a lot of detailed information about how work is submitted, how work is scheduled, and how hardware resources are used. It also provides a very simple summary of GPU and CPU usage.
The GPU is 93% idle! The bottleneck for this game is on the CPU side. GPU optimizations barely make a difference since the GPU isn’t even stressed. The CPU optimizations I’ve made so far have barely made a difference.
ChunkRenderer::render is the single most expensive function call in the game. It iterates over every loaded chunk and decides whether each one needs to be rendered, calling three other functions along the way, including VoxelEngine::Mesh::drawIndexed (the image shows them in order of execution time).
60% of overall frame time is spent iterating over the chunks and recording command buffers. Note that 22 + 11 + 9 is less than 60: around ~17% of frame time is spent in ChunkRenderer::render itself and ~43% in the functions it calls. These functions are the most obvious target for optimization.
isEmpty accounts for 11% of overall frame time, even though the function only checks whether a uint32_t variable is equal to 0. Such a simple function takes up so much frame time because about 75% of chunks are empty (the world is 256 blocks tall and the median terrain height is 64, so roughly three quarters of each chunk column is air) and get skipped by the renderer. The renderer iterates over thousands of chunks, but only 25% of them pass the isEmpty check, so isEmpty is called four times as often as the other two functions.
Always use a profiler, kids. Next week will start by optimizing the CPU side of the game.