I am going to start a little less technical here. For modern game development, it is worth everyone on the team’s time to know a little bit about shaders. Shaders are small programs that run a specific function on each vertex and pixel rendered. Programmable shaders are the big visual difference between modern games and PS2-era games. Most modern platforms, including iOS and Android devices, have a GPU that supports programmable shaders.
An extremely simple description of the graphics pipeline is this: the CPU passes vertex information to the GPU, such as vertex positions and texture (UV) coordinates. The vertex shader takes this data, processes it, and pushes it onward in the GPU. The GPU then assembles these vertices into triangles, and for every pixel covered by a triangle, runs the pixel shader, with the vertex data interpolated per pixel. The pixel shader processes that data and returns a pixel color for the GPU to render to the screen at that position. So a simple unlit shader that renders with a texture will have a vertex shader that passes the vertex UV coordinates through to the pixel shader, and the pixel shader will sample the texture at the interpolated UV coordinates it was passed, and output the color at that position in the texture.
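The unlit texture shader just described can be sketched as a toy software model. This is a minimal sketch in plain Python, not any real graphics API: the function names, the 2×2 texture, and the nearest-neighbor sampling are all illustrative assumptions.

```python
# Toy model of the unlit texture shader: the vertex shader passes UVs
# through, and the pixel shader samples a texture at the interpolated UV.

# A 2x2 "texture": rows of RGB tuples (illustrative data).
texture = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]

def vertex_shader(position, uv):
    # Pass the UV coordinates through unchanged to the pixel shader.
    return {"position": position, "uv": uv}

def pixel_shader(interpolated_uv):
    # Sample the texture at the interpolated UV (nearest-neighbor lookup).
    u, v = interpolated_uv
    x = min(int(u * 2), 1)
    y = min(int(v * 2), 1)
    return texture[y][x]

# The GPU would interpolate UVs across each triangle; here we just pick one.
color = pixel_shader((0.25, 0.75))  # samples the bottom-left texel: blue
```

On real hardware the interpolation and texture sampling are done by the GPU; the point here is only the data flow from vertex shader to pixel shader.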
A quick note on terminology: The terms I have been using aren’t entirely accurate. Pixel shaders are more accurately called fragment shaders. Fragment shaders output information per fragment, not per pixel, and a fragment does not always end up as a visible pixel on screen. Within the world of shaders, you will find that terms are used differently in different places, and often even misused. However, if you know shaders well enough to know the difference between these terms, you are probably way ahead of most developers on this subject.
So, let’s take that simple, unlit texture shader I just described and expand it into something a little more interesting. Let’s say you want some real-time lighting applied to the shader. The simplest lighting model is diffuse lighting. Diffuse lighting is the lighting you might see off of concrete: no reflection, no “shiny,” just brighter when light is pointing at it and darker when there is no light. To achieve this, you expand your vertex shader to take in lighting and normal information and pass it on to the pixel shader. The normal of a vertex is its facing direction, the “forward” of that vertex. Then, using some basic physics involving the dot product between the direction of the light hitting the pixel and the normal, the pixel shader can figure out how brightly lit that pixel is, and multiply that value against the color value of the texture at that pixel to create lit geometry.
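The dot-product math above can be shown in a few lines. This is a sketch of the standard Lambertian diffuse term in plain Python, assuming both vectors are already normalized; the function names are illustrative.

```python
# Toy Lambertian (diffuse) lighting: brightness is the dot product of the
# surface normal and the direction toward the light, clamped at zero so
# surfaces facing away from the light go dark rather than negative.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def diffuse(tex_color, normal, light_dir):
    # normal and light_dir are assumed to be normalized 3-vectors.
    intensity = max(0.0, dot(normal, light_dir))
    return tuple(c * intensity for c in tex_color)

# Surface facing straight up, light shining down onto it: fully lit.
lit = diffuse((200, 100, 50), (0, 1, 0), (0, 1, 0))
# Same surface with the light behind it: completely dark.
dark = diffuse((200, 100, 50), (0, 1, 0), (0, -1, 0))
```

A real pixel shader would do this per pixel with the interpolated normal, but the arithmetic is exactly this multiply of texture color by a clamped dot product.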
You can imagine that every shader feature you want after this, such as fancier lighting, cool-looking special effects, or transparency, adds more calculations to each step in this process. This is where shader performance comes in. Every single pixel rendered to the screen runs through a shader of some type. A simple way to view the time spent rendering a single frame of a game is as the total time it takes to run the shader program on every vertex, and then on every pixel. So if we take the two previous example shaders, the GPU will render the unlit shader much faster than the lit shader, because it does fewer calculations per pixel on the screen.
To take this to a more tangible example, let’s imagine you have a scene rendering a single character in an environment. Chances are, the character makes up far fewer pixels of the final screen than the environment geometry does. If you use an expensive shader to render both the environment and the character, then every pixel on screen will render with that expensive shader, leading to framerate slowdown. However, if you render the environment geometry with a simple, cheaper shader and the character with the more expensive shader, the framerate will take a much smaller hit, because far fewer of the pixels rendered to the screen use the expensive shader.
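Some rough arithmetic makes the trade-off concrete. All the numbers here are made up for illustration: a hypothetical 640×960 screen, a character covering about 10% of it, and a fancy shader assumed to cost four times the simple one per pixel.

```python
# Back-of-the-envelope cost model: GPU frame time is roughly the sum over
# all screen pixels of the per-pixel shader cost. Numbers are hypothetical.

screen_pixels = 640 * 960              # 614,400 pixels on screen
character_pixels = 60_000              # character covers ~10% of the screen
environment_pixels = screen_pixels - character_pixels

cheap_cost = 1.0      # arbitrary cost units per pixel, simple shader
expensive_cost = 4.0  # per pixel, fancy shader (assumed 4x more work)

# Option A: everything uses the expensive shader.
all_expensive = screen_pixels * expensive_cost
# Option B: cheap shader on the environment, expensive only on the character.
mixed = environment_pixels * cheap_cost + character_pixels * expensive_cost
```

Under these assumed numbers the mixed approach does roughly a third of the per-pixel work, which is the whole argument for spending your shader budget where it is most visible.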
What is the practical application of this? GPU performance is going to be based much more on average pixel shader complexity than on pure vertex count. Vertex count can become a large performance issue, especially if your meshes have more vertices than pixels rendered to the screen, but artists are generally going to be reasonably efficient when building their geometry; only very sloppy artists will create geometry comprised of more vertices than pixels, so vertex count is a metric you will rarely have to worry about. Your concern for GPU performance is going to be the average complexity of your shaders on screen.
Hopefully you now have a basic understanding of what shaders are, and what they mean for managing your framerate versus improving your visuals. The next step in understanding shaders is alpha, or pixels with transparency, and overdraw.
Rendering opaque pixels, pixels with no transparency, is easy. When rendering opaque geometry, the CPU sorts everything from nearest to the camera to farthest from the camera. This minimizes overdraw. Overdraw is when the GPU renders a pixel over another pixel that has already been rendered. Generally, when a pixel is rendered to the screen, it registers itself in the depth buffer: a representation of how far from the camera each pixel has been rendered, used to make sure that an object behind another object does not draw in front of it. By rendering opaque geometry front to back, anything behind it will check the depth buffer at the screen position it intends to render, and if something has already been drawn in front of it, it skips that pixel.
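The depth-buffer check described above can be modeled in a few lines. This is a toy sketch, not how a real GPU stores its depth buffer: depths here live in a plain dictionary keyed by pixel coordinate, and the function names are invented for illustration.

```python
# Toy depth test: the depth buffer remembers the nearest depth drawn so far
# at each pixel; a new pixel only renders (and updates the buffer) if it is
# nearer to the camera than whatever is already there.

import math

depth_buffer = {}  # (x, y) -> nearest depth rendered so far

def try_render(pixel, depth):
    nearest = depth_buffer.get(pixel, math.inf)
    if depth < nearest:
        depth_buffer[pixel] = depth
        return True   # pixel shader runs and the color is written
    return False      # occluded: the pixel is skipped, no shading cost

# Front-to-back opaque sorting: the near surface draws first, so the far
# surface at the same screen position fails the depth test and is skipped.
drew_near = try_render((10, 20), depth=1.0)   # True
drew_far = try_render((10, 20), depth=5.0)    # False: overdraw avoided
```

Drawing the same two surfaces back to front would shade the pixel twice, which is exactly the overdraw that front-to-back sorting avoids.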
Transparent geometry involves a different rendering technique. Transparent geometry is rendered in a second pass after all opaque geometry is rendered. This ensures that the geometry blends with the opaque geometry behind it, and skips rendering transparent geometry with opaque geometry in front of it. Transparent geometry has to be rendered from farthest from the camera to closest, so that transparent objects in front of each other blend properly. This is where transparent geometry can get expensive: for a given pixel on your screen, if there is transparent geometry in front of opaque geometry, the GPU will run that pixel through a pixel shader twice, once for the opaque geometry and a second time for the transparent geometry blending with it. Transparent geometry in front of other transparent geometry can result in the GPU rendering the same screen pixels multiple times, over and over.
Now to apply all of this information to an actual platform. The iPhone 4 had a GPU that was roughly twice as powerful as its predecessor, the iPhone 3GS. However, the iPhone 4 had a screen resolution with four times the pixels of the 3GS. The iPhone 3GS had a screen resolution of 320×480, for 153,600 total pixels to be rendered. The iPhone 4 had a screen resolution of 640×960, for a total of 614,400 pixels to be rendered. With four times the pixels to render, and only twice the GPU to render them with, the average per-pixel rendering power of the iPhone 4 was half that of the iPhone 3GS. This is a large reason why transparency is so expensive on the iPhone 4, iPad 1, iPad 3, and other devices that were not a full step forward from their predecessors.
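The arithmetic behind that claim is worth spelling out, taking the "roughly twice the GPU" figure from above as a given:

```python
# Per-pixel budget comparison, iPhone 3GS vs. iPhone 4.
gs_pixels = 320 * 480   # 153,600 pixels
p4_pixels = 640 * 960   # 614,400 pixels

resolution_ratio = p4_pixels / gs_pixels    # 4x the pixels to fill
gpu_ratio = 2.0                             # ~2x the GPU power (as stated)

# GPU power available per pixel, relative to the 3GS:
per_pixel_power = gpu_ratio / resolution_ratio   # 0.5
```

So every pass over the screen, including each layer of transparent overdraw, costs relatively more of the frame budget on the iPhone 4 than it did on the 3GS.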
I am going to walk you through an optimization technique next, to see how to apply this knowledge to real-world game development. Let’s say you want to apply a full-screen visual effect, maybe darkening everything on screen when the player pauses the game. Let’s assume that your game is largely opaque geometry. The first thing you might do is put a semi-transparent black rectangle over the screen to force everything behind it to darken slightly. This will achieve the desired visual effect, but you are now rendering every pixel on screen an extra time while the effect is active, and this might have a major impact on framerate. The optimization step you might take here is to create alternate shaders for everything on screen that take a blend color as input. When you pause the game, you can visit every material in the scene and set the blend color to black. This adds a slight per-pixel performance cost, but achieves a similar visual effect to rendering the rectangle over your screen.
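The blend-color idea can be sketched as a toy shader function. This is an illustrative Python model, not a real shader: it assumes the blend is a simple per-channel multiply, and it uses a gray tint to show partial darkening (a multiply by pure black would darken the output completely).

```python
# Toy version of the pause-darken optimization: instead of drawing a
# translucent black quad over the whole screen, each material's shader
# multiplies its output color by a blend color it takes as input.

def shade(tex_color, blend_color=(1.0, 1.0, 1.0)):
    # White blend color (the default) leaves the texture color unchanged.
    return tuple(c * b for c, b in zip(tex_color, blend_color))

normal = shade((200, 100, 50))                    # unpaused: unchanged
paused = shade((200, 100, 50), (0.5, 0.5, 0.5))   # paused: darkened by half
```

The multiply costs one extra instruction per pixel on every material, versus a second full pass over every covered pixel for the overlay rectangle, which is the trade the optimization makes.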
Another iOS performance concern is cutout transparency. Cutout transparency is a kind of mix between blended transparent geometry and opaque geometry. When rendering something with cutout, all pixels with an alpha value above a minimum threshold are rendered as if they were opaque geometry, and the rest of the pixels are skipped. When building games, it is easy to assume that geometry rendered with a cutout shader, such as HUD elements or two-dimensional characters, will be inexpensive to render; you will assume the cost is only that of the pixels you see. Unfortunately, on platforms such as iOS, the per-pixel operation of testing whether a pixel should render, and discarding the pixels that should not, is a very expensive process.
To give another example of optimizing a game for GPU performance, imagine a game where the player character is two-dimensional, very large on screen, and rendered with a cutout shader. When you first get this up and running, you will probably render your character onto a square piece of geometry. Unless your character is a weird square block man, many pixels are going to be tested and rejected by the cutout shader, and framerate might suffer. To improve the render time of this character without changing the end visual result, you can split your character graphic up into multiple pieces of art. Render each arm, each leg, the torso, and the head onto a separate rectangular piece of geometry. There will probably be some slight overlap, a pixel discarded twice somewhere the arm overlaps the torso, but the total number of discarded pixels will be far less. Here is a visual representation of this information: http://i.imgur.com/RNohNSu.png.
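Rough numbers show why the split pays off. Everything here is hypothetical: a 300×400 on-screen character whose visible pixels cover about 40% of the bounding square, and per-limb quads whose combined area, overlaps included, comes to about 55% of the original quad.

```python
# Pixel-count comparison for the cutout split, with made-up proportions.

full_quad = 300 * 400                 # 120,000 pixels alpha-tested
visible = int(full_quad * 0.40)       # 48,000 pixels actually drawn

# Tighter per-limb quads: combined area assumed ~55% of the original,
# including the slight overlaps between pieces.
split_quads = int(full_quad * 0.55)   # 66,000 pixels alpha-tested

wasted_before = full_quad - visible   # discarded pixels, single quad
wasted_after = split_quads - visible  # discarded pixels, split quads
```

The visible pixels cost the same either way; the win is cutting the discarded pixels, which on iOS-class hardware were the expensive part.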
Hopefully that is enough information that those of you who previously had a poor understanding of shaders can now communicate a little more meaningfully with the people on your team who have a stronger grasp of them, or at the very least understand shaders well enough to keep your project’s framerate from stalling on the GPU.