Over the last few weeks, I’ve been working on solving problems relating to performance, occlusion and lighting. Our main challenge is that the game is randomly generated, meaning we can’t directly use the fancy-pants pre-baked lighting or occlusion culling that our game engine supports. Instead, we have to use more costly dynamic lighting, or cheaply bake the lighting just after the level has been generated. Of course, it doesn’t help that the type of game we are making depends heavily on lighting and shadows for atmosphere!
The design of our game demands that we have a light system that fits with a ship. Ships have lights. In almost every room. And in corridors. Lots of lights. To start with, we just put normal lights in every room, with no shadows. The performance was acceptable, since shadows are the really expensive part of lighting. This was fine and dandy, but unfortunately lights in rooms were ‘overflowing’ into each other through walls, ruining any potential lighting design; shadows were necessary to prevent this.
The good news: when it comes down to it, there usually weren’t that many lights on screen at once, except in long, straight corridors. Frustum culling wasn’t enough though, since the camera frustum would often pick up lights in other rooms that couldn’t actually be seen. And you can’t just link a light to the occlusion of the room it’s in, since a light’s effect often extends outside the room, such as lighting up a corridor through a room’s door.
So, I have been experimenting with a light culling system that can often significantly boost the fps of the game, sometimes jumping from 20 to 40 fps when enabled. Of course, it’s still in development and this might all change, but I am hopeful that this system will be of use.
The algorithm begins with what is known about the lights: each light has an area of effect, and only geometry inside that area will be affected by it. Each light is given a 3D model of that shape, i.e. a sphere for point lights and a cone for spot lights. The theory is this: a light will be rendered if its shape can be seen when rendered in conjunction with the rest of the scene, except when the camera is inside the shape, which needs to be covered by a separate check. The algorithm effectively uses the depth buffer to identify whether a light can be seen.
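The core bookkeeping here is giving each light a unique ID colour for its proxy shape. As a minimal sketch (in Python rather than Unity C#, and with hypothetical helper names – this isn’t the actual project code), one common approach is to pack a light’s index into a 24-bit RGB colour, which can later be decoded from the readback texture:

```python
def light_index_to_colour(index: int) -> tuple:
    """Encode a light's index as a unique 24-bit RGB colour.

    Hypothetical helper: with one colour per light, up to 2**24
    lights can be told apart in the ID texture.
    """
    if not 0 <= index < 2 ** 24:
        raise ValueError("index must fit in 24 bits")
    return ((index >> 16) & 0xFF, (index >> 8) & 0xFF, index & 0xFF)


def colour_to_light_index(colour: tuple) -> int:
    """Decode an RGB colour back to the light index it encodes."""
    r, g, b = colour
    return (r << 16) | (g << 8) | b
```

Packing the index directly into the colour keeps the colour-to-light mapping trivially collision-free, at the cost of assuming the ID pass is rendered without lighting, fog, or anti-aliasing that would alter the exact pixel values.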
The above image shows light shapes that were identified as visible
To do this, each shape is assigned a unique colour at the start. Each frame, a low-resolution scene depth texture is rendered first. Next, the light shapes are rendered onto the same texture with depth writing switched off. Then comes the most expensive part of the algorithm: the texture is copied off the GPU and the image is scanned for which colours can be seen. A dictionary look-up associates colours with lights, and these lights are then enabled when rendering the main scene.
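The CPU-side scan described above can be sketched as follows – again a simplified Python illustration, not the project’s actual code, where `pixels` stands in for the low-resolution texture read back from the GPU as RGB tuples:

```python
def visible_lights(pixels, colour_to_light):
    """Scan a readback of the ID pass and return the set of lights
    whose proxy colour survived the depth test in at least one pixel.

    pixels: iterable of (r, g, b) tuples from the low-res texture.
    colour_to_light: dict mapping each light's unique colour to the
    light object (or name) it identifies.
    """
    seen = set()
    for colour in pixels:
        light = colour_to_light.get(colour)
        if light is not None:  # background/scene pixels match no light
            seen.add(light)
    return seen
```

For example, with a colour table `{(255, 0, 0): "lamp_a", (0, 255, 0): "lamp_b"}`, a buffer containing only black and red pixels yields `{"lamp_a"}`. Keeping the ID texture low-resolution is what makes this per-pixel loop, and the GPU readback before it, affordable every frame.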
This picked up most of the lights, but there was still a problem: larger light shapes could hide smaller ones, meaning a smaller light might be falsely marked as occluded. To resolve this, the algorithm runs again whenever a pass finds at least one light, but without rendering the lights it has already seen. It also doesn’t re-render the whole scene; only the light shapes are drawn again. Usually, this meant it ended up running 4-8 times per frame.

As one might guess, this could lead to a performance problem. To get around it, all the lights that were rendered on the current frame get a ‘free hall pass’ and are rendered the next frame regardless of culling, since they very likely need to be rendered anyway; this means they don’t need to be tested for visibility. Well, when I said all the lights, I actually meant all but one – that one light is effectively being tested for whether it can be switched off, and a different light is tested each frame. This means that if the player turns very quickly, the number of lights marked as visible increases rapidly, but then quickly returns to normal, as the system works out over time that lights that were once visible no longer are – an acceptable compromise that brings the average number of iterations down from 4-8 to 2-3.
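The repeated-pass structure can be sketched like this (a hedged Python illustration, not the real implementation): `render_pass` is a stand-in for re-rendering the proxy shapes of the remaining lights against the cached scene depth and scanning the result, and each iteration removes the lights already found so that shapes hidden behind larger ones get their chance to show:

```python
def cull_lights(lights, render_pass, max_passes=8):
    """Repeat the ID pass until no new light is found.

    render_pass(remaining) is a hypothetical callback that draws only
    the proxy shapes of `remaining` (scene depth is reused, not
    re-rendered) and returns the set of lights visible in that pass.
    """
    visible = set()
    remaining = set(lights)
    for _ in range(max_passes):
        found = render_pass(remaining)
        if not found:  # nothing new appeared, so stop early
            break
        visible |= found
        remaining -= found  # don't redraw lights already known visible
    return visible
```

With a fake pass where a "big" shape covers a "small" one at the same pixels, the first iteration finds only "big"; removing it lets the second iteration find "small", which is exactly the falsely-occluded case the re-runs exist to recover.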
This single pass shows that the light source in the room on the right is still visible
I would like to try moving much of this testing to the GPU, since copying the texture back from the GPU, along with looping through all the pixels of the rendered image, is one of the main performance bottlenecks. Despite these bottlenecks, the frame rate still increases considerably, as the system often slashes hundreds of draw calls and hundreds of shadow casters from what Unity reports.
Despite these drawbacks, I am hopeful that the development of this system will let the artists have their way with the lighting, without killing the precious frame rate. Other ideas/suggestions are welcome!