Knowing which mipmap levels are needed

Tom Forsyth, 27 January 2008 (created 19 August 2007)
There's a fundamental signal-processing concept called various different names, mostly with the word Nyquist in them, which (if you wave your hands a lot and ignore the screams of people who actually know what they're talking about) says that an object that is 32 pixels high on screen only needs to use a 32x32 texture when you're rendering it. If you give it a 64x64 texture, not only will the mipmapping hardware ignore that extra data and use the 32x32 anyway, but it's doing the right thing - if you disable the mipmapping and force it to use the larger version, you will get sparkling and ugliness.

Obviously it's more complex than that, and I'll get into the details, but the important point is - the size of texture you need for an object is directly proportional to the object's size on-screen. And that's what texturing hardware does - it picks appropriate mipmap levels to ensure that this is the size it's actually using. Every graphics programmer should be nodding their head and saying "well of course" at this point. But there's another thing that is subtly different, but important to realise. And that is that so far I haven't said how big my texture is, i.e. what size the top mipmap level is - because it doesn't matter. Whether you give the graphics card a 2048x2048 or a 64x64, it's always just going to use the 32x32 version - the only difference is how much memory you're wasting on data that never gets used.

If you are streaming your textures off disk, or creating them on-demand (e.g. procedural terrain systems), this can be incredibly useful to know. If you don't need to spend cycles and memory loading big mipmap levels for objects in the distance, everything is going to load much quicker and take less memory, which means you can use the extra space and time to increase your scene complexity. Obviously my approximation above is very coarse - it's not just size in pixels that matters, because we use texture coordinates to map textures onto objects. So how do you actually find the largest mipmap you need?

As a thought experiment (this is going to start out absurd - but bear with me), take each triangle, and calculate how many texels the artist mapped to that triangle for the top mipmap level. Then calculate how many pixels that triangle occupies on the screen. If the number of texels is more than the number of pixels, then by the above rule you don't need the top mipmap level - throw it away, and divide the number of texels by 4. Keep doing that until the number of texels in the new top mipmap level is no more than the number of screen pixels. (Keen readers will spot that trilinear filtering means you actually need one more mipmap level than this - to keep things simple, I'm going to ignore that for now, and we can add one at the end.) Do this for all the triangles that use a certain texture, and throw away all the mipmap levels that none of the triangles need.
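The thought experiment for a single triangle can be written down directly. This is only a sketch of the loop as described, not something you would ship; the function name and arguments are made up for illustration, and the trilinear "plus one" is ignored just as in the text.

```python
def mips_to_drop_by_loop(top_mip_texels: float, screen_pixels: float) -> int:
    """Count the mip levels one triangle does not need, the slow way.

    top_mip_texels: texels the artist mapped to the triangle in the top mip.
    screen_pixels:  pixels the triangle covers on screen this frame.
    """
    if top_mip_texels <= 0 or screen_pixels <= 0:
        raise ValueError("areas must be positive")
    dropped = 0
    texels = top_mip_texels
    # Each mip level down has a quarter of the texels of the one above it.
    while texels > screen_pixels:
        texels /= 4.0
        dropped += 1
    return dropped


# A 128x128 mapping drawn onto a 32x32 pixel square.
print(mips_to_drop_by_loop(128 * 128, 32 * 32))  # -> 2
```

For a whole texture you would run this over every triangle that uses it and keep the smallest result, since the most demanding triangle decides which levels must stay.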

That's the concept. Obviously far too expensive to do in practice. The first step is to get rid of the loop and say that the number of mipmap levels you can throw away = 0.5*log2 (top_mipmap_texel_area / screen_area). Quick sanity check - a 128x128 texture has area 16384 texels. If you draw it to a 32x32 pixel square on screen, that's 1024 pixels, so you need to throw away 0.5*log2(16384/1024) = 0.5*log2(16) = 2 mipmap levels, which is correct. Why multiply by half? Because we actually don't want log2(x), we want log4(x), because the number of texels drops by 4x each mipmap level, not 2x. But log4(x) = 0.5*log2(x).
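Here is the closed form as a minimal sketch. The formula is the one from the text; the helper that turns the fractional answer into a whole number of levels is my own assumption (round down and clamp at zero, so any error keeps one level too many rather than one too few).

```python
import math


def mips_to_drop(top_mip_texel_area: float, screen_area: float) -> float:
    """Fractional number of mip levels that can be discarded.

    0.5 * log2(x) is log4(x): texel count drops by 4x per mip level.
    """
    return 0.5 * math.log2(top_mip_texel_area / screen_area)


def whole_mips_to_drop(top_mip_texel_area: float, screen_area: float) -> int:
    """Assumed rounding policy: floor, and never go below zero levels."""
    return max(0, math.floor(mips_to_drop(top_mip_texel_area, screen_area)))


# The sanity check from the text: 128x128 texels onto 32x32 pixels.
print(mips_to_drop(128 * 128, 32 * 32))  # -> 2.0
```

A negative fractional result just means the texture is being magnified, so there is nothing to throw away.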

Obviously, you can precalculate the value "top_mipmap_texel_area" - once a mesh is texture-mapped, it's a constant value for each triangle. But calculating the screen-area of each triangle each frame is expensive, so we want to approximate. If we pick an area approximation that is too high, then the value of screen_area we use will be higher than the true one for the current frame, and we'll throw away fewer mipmap levels than we could have done. This isn't actually that bad - yes, we waste a bit of memory, but the texturing hardware still does the right thing, and will produce the right image. So approximating too high doesn't change image quality at all. Given that, the approximation we make is to assume that every triangle is always parallel to the screen plane - that its normal is pointing directly at the viewer. This is the largest the triangle can possibly be in screen pixels. In practice it will be at an angle and take fewer screen pixels.

So how large is this maximum size? Well, if we assume mesh deformation and skinning don't do crazy things and ignore them, a triangle always stays the same size in world units (e.g. meters). At distance D from the viewer, with a horizontal screen field of view of F radians, and a horizontal pixel size of P, a length of m world meters covers (m/D)*(P/tan(F)) pixels on screen. That's world length to screen length, but we want world area to screen area, so we square the result. So if we feed in the world area of the triangle, M=m^2, the screen area is (M/(D^2))*((P/tan(F))^2) = M*((P/(D*tan(F)))^2). As a quick sanity-check: double the distance away = quarter the screen pixels. Double the screen resolution = four times the pixels. Widen the FOV = smaller screen area.
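A small sketch of that projection formula, with the three sanity checks run as code. The function and parameter names are mine. One thing worth flagging: the expression P/tan(F) is exact for a standard perspective projection if F is taken as the half-angle of the field of view and P as half the screen width in pixels; with other conventions it differs by a constant factor, which only shifts the per-frame constant.

```python
import math


def max_screen_area(world_area: float, distance: float,
                    fov: float, pixels_across: float) -> float:
    """Upper bound on a triangle's screen area, assuming it faces the viewer.

    Implements M * (P / (D * tan(F)))^2 from the text.
    """
    scale = pixels_across / (distance * math.tan(fov))
    return world_area * scale * scale


base = max_screen_area(world_area=1.0, distance=10.0, fov=0.6, pixels_across=640)

# Double the distance: a quarter of the pixels.
print(math.isclose(max_screen_area(1.0, 20.0, 0.6, 640), base / 4))   # True
# Double the resolution: four times the pixels.
print(math.isclose(max_screen_area(1.0, 10.0, 0.6, 1280), base * 4))  # True
# Widen the FOV: smaller screen area.
print(max_screen_area(1.0, 10.0, 0.8, 640) < base)                    # True
```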

(P/tan(F))^2 gets calculated once a frame, so that's a trivial cost. And we'll make another approximation: the distance D is not measured per triangle, it's measured once, from the closest point of the bounding volume of the object. Again, we're being conservative - we're assuming the triangles are closer (i.e. larger) than they really are. So for a single mesh, we can calculate ((P/(D*tan(F)))^2) just once, and then use it for all the triangles. For brevity, I'll call this factor Q, so for each triangle, screen_area <= (world_area * Q).
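As a rough sketch of computing Q once per mesh: the text only says "bounding volume", so the bounding sphere here is my assumption, as are the function names and the small minimum distance that stops Q blowing up when the camera is inside the sphere.

```python
import math


def closest_distance(camera_pos, sphere_centre, sphere_radius,
                     min_distance: float = 1e-3) -> float:
    """Distance from the camera to the nearest point of a bounding sphere."""
    d = math.dist(camera_pos, sphere_centre) - sphere_radius
    # Clamp so a camera inside the sphere still gives a usable distance.
    return max(d, min_distance)


def q_factor(distance: float, fov: float, pixels_across: float) -> float:
    """Q = (P / (D * tan(F)))^2, so that screen_area <= world_area * Q."""
    return (pixels_across / (distance * math.tan(fov))) ** 2


d = closest_distance((0.0, 0.0, 0.0), (0.0, 0.0, 10.0), 2.0)
print(d)  # -> 8.0
q = q_factor(d, fov=0.6, pixels_across=640)
```

Every triangle in the mesh then uses the same q, which is what makes the per-triangle work disappear in the next step.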

So looking back at the mipmap calculation, we calculate 0.5*log2 (top_mipmap_texel_area / screen_area ) for each triangle, take the minimum of all those, and that's how many mipmap levels we can actually throw away.

.... min_over_mesh (0.5*log2 (top_mipmap_texel_area / screen_area))
>= min_over_mesh (0.5*log2 (top_mipmap_texel_area / (world_area * Q)))
..= min_over_mesh (0.5*log2 ((top_mipmap_texel_area / world_area) / Q))
..= 0.5*log2 (min_over_mesh(top_mipmap_texel_area / world_area) / Q)

And of course the value (top_mipmap_texel_area/world_area) is a constant for each triangle in the mesh (again, assuming skinning and deformation don't do extreme things), so the minimum value of that is a constant for the entire mesh that you can precalculate. The result is that we haven't done any per-triangle calculations at all at runtime. If you grind through the maths a bit more, you find that it all boils down to A+B+log2(D), where A is a per-frame constant, B is a per-mesh constant, and D is the distance of the mesh from the camera (remember that log2(x/y) = log2(x)-log2(y)). Again, quick sanity check - if you double the distance of a mesh from the viewer, log2(D) increases by 1, so you drop one mipmap level. Which is correct.
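Grinding through the maths gives A = -log2(P/tan(F)) and B = 0.5*log2(min texel density). Here is a sketch that checks this against the un-simplified expression; the function names and the (texel_area, world_area) tuple layout are invented for the example, and skipping zero-area triangles is my own guard.

```python
import math


def per_mesh_constant(triangles) -> float:
    """B: precalculated offline from (top_mip_texel_area, world_area) pairs."""
    min_density = min(t / w for t, w in triangles if w > 0)
    return 0.5 * math.log2(min_density)


def per_frame_constant(fov: float, pixels_across: float) -> float:
    """A: depends only on the camera and screen, so once per frame."""
    return -math.log2(pixels_across / math.tan(fov))


def mips_to_drop(a: float, b: float, distance: float) -> float:
    """The whole runtime cost per mesh: one log2 and two adds."""
    return a + b + math.log2(distance)


tris = [(4096.0, 1.0), (1024.0, 1.0), (8192.0, 4.0)]  # min density is 1024
a = per_frame_constant(fov=0.6, pixels_across=1280)
b = per_mesh_constant(tris)

# Compare with 0.5*log2(min_density / Q) computed the long way.
q = (1280 / (10.0 * math.tan(0.6))) ** 2
print(math.isclose(mips_to_drop(a, b, 10.0), 0.5 * math.log2(1024.0 / q)))  # True
# Doubling the distance drops exactly one more level.
print(math.isclose(mips_to_drop(a, b, 20.0), mips_to_drop(a, b, 10.0) + 1))  # True
```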

This is pretty nifty. If you're doing a streaming texture system, it means that for each mesh that uses a texture, at runtime you can do a log2 and two adds each frame and you get a result saying how many mipmap levels you didn't need. If you remember that throwing away just one mipmap level saves 75% of the texture's memory, this can be a huge benefit - who doesn't want their textures to take 75% less memory?
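The 75% figure is easy to check by adding up a mip chain. This sketch assumes a square, power-of-two texture with the same bytes per texel at every level; the function name is mine.

```python
def mip_chain_texels(top_size: int) -> int:
    """Total texels in a full mip chain from top_size x top_size down to 1x1."""
    total = 0
    size = top_size
    while size >= 1:
        total += size * size
        size //= 2
    return total


full = mip_chain_texels(1024)        # 1024x1024 and everything below it
without_top = mip_chain_texels(512)  # the same chain minus the top level
saved = 1.0 - without_top / full
print(f"{saved:.1%}")  # -> 75.0%
```

Each further level you drop saves 75% of what is left, so two levels gets you to about a sixteenth of the original footprint.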

This isn't just theory - I implemented it in the streaming system used for Blade II on the Xbox and PS2 engines at MuckyFoot, and it saved a lot of memory per texture. This meant we could keep a lot more textures in memory. As well as allowing more textures per frame, it also meant that we could keep a lot more textures cached in memory than we actually needed for that frame. This allowed us to stream far more aggressively than we originally intended. The initial design for the PS2 engine was to not stream - we assumed that DVD seek times would cripple a streaming system. But because there was a lot more memory available for textures, we could prefetch further ahead and reorder the streaming requests to reduce seek times. And the system actually worked pretty well.

Flip the Question

The next neat trick is instead of asking "how many mipmap levels do I throw away", you ask "how many do I need". This is obviously just the same number, subtracted from log2(texture_size). If we go back to the original equation of 0.5*log2 (top_mipmap_texel_area / screen_area) - look at how you'd actually calculate "top_mipmap_texel_area" for a triangle. You find the area in the idealised [0-1] UV texture coordinate space, and then you multiply by the number of texels in the texture. So the answer to the new question "how many mipmaps do I need" is:

...log2(texture_size) - 0.5*log2 ((texture_size^2) * area_in_uv_space / screen_area)
= log2(texture_size) - 0.5*log2 (area_in_uv_space / screen_area) - 0.5*log2(texture_size^2)
= log2(texture_size) - 0.5*log2 (area_in_uv_space / screen_area) - log2(texture_size)
= - 0.5*log2 (area_in_uv_space / screen_area)
= 0.5*log2 (screen_area / area_in_uv_space)

Hey! What happened to the texture size? It all canceled out! Yep - that's right. It goes back to something I mentioned right at the top. The texture selection hardware does not care what size the top mipmap level is - unless it wants one that doesn't exist, obviously. But if you give it a large enough texture, it doesn't matter how big that texture is.
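To see the cancellation numerically, here is a sketch that computes "how many do I need" both ways for several texture sizes. The names are illustrative, and rounding up to a whole level is my assumption (the conservative direction when asking what to load).

```python
import math


def mips_needed(screen_area: float, area_in_uv_space: float) -> float:
    """Short form: no texture size anywhere in sight."""
    return 0.5 * math.log2(screen_area / area_in_uv_space)


def mips_needed_long_way(texture_size: int, screen_area: float,
                         area_in_uv_space: float) -> float:
    """First line of the derivation: log2(size) minus the levels to drop."""
    top_mip_texel_area = texture_size ** 2 * area_in_uv_space
    return math.log2(texture_size) - 0.5 * math.log2(top_mip_texel_area / screen_area)


def required_top_mip_size(screen_area: float, area_in_uv_space: float) -> int:
    """Edge length of the largest mip worth loading (assumed: round up)."""
    return 2 ** max(0, math.ceil(mips_needed(screen_area, area_in_uv_space)))


# A triangle using a quarter of UV space, covering 1024 pixels on screen.
for size in (64, 2048, 16384):
    print(size, mips_needed_long_way(size, 1024.0, 0.25))  # same answer each time

print(required_top_mip_size(1024.0, 0.25))  # -> 64
```

The 64 makes sense: a quarter of a 64x64 mip is 1024 texels, matching the 1024 pixels on screen. If the texture on disk is smaller than that, you simply load everything it has.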

Again, every graphics coder is saying "right, I knew that, so?" But this is actually quite a non-obvious thing. It means that if you have a streaming texture system, the only thing that matters is "how many mipmaps do I need to load into memory" - the rest are left on the DVD and not loaded into memory. And the answer to that question is independent of the size of the original texture - it only depends on the mesh and the UV mapping and the distance from the camera. So the actual graphics engine - the streaming and rendering - doesn't care how large the textures are. And that means that the rendering speed, the space taken in memory, and the time needed to read it off disk are all identical even if the artists make every texture in the entire game 16kx16k.

To say it again: the only limit to the size of textures the artists can make is the total space on the DVD, and the time taken to make those textures.

I finally grokked this when I was making the MuckyFoot engine to be used for the games after Blade II and I kicked myself. It had actually been true for the Blade II engine - I just hadn't realised it at the time. But it was a pretty cool thing to go and tell the artists that there were no practical limits to texture sizes. They didn't really believe me at first - it's an almost heretical thing to tell 15 artists who have spent their professional lives with coders yelling at them for making a texture 64x64 instead of 32x32. Nevertheless, if you have a streaming system that only loads the mipmap levels you need, it is absolutely true.

Practical Problems

Of course there are still some practical limits. One thing that breaks this is the standard artist trick of using fewer texels for less important things, such as the soles of shoes or the undersides of cars. What happened was that they would use all this new texture space for interesting things like faces and hands, so the texture size would grow. But the soles of the feet would stay the same size in texels - because who cares about them? Then the preprocessor would calculate the mipmap level it would need for the texture, and the soles of the feet would dominate the result, because it would try to still map them correctly - it doesn't know that they're less important than everything else. There are a few ways to solve this - ideally the artists would be able to tag those faces so that they are ignored in the calculation, but this can be tricky to retrofit to some asset pipelines. A way that I found worked pretty well was to take the minimum linear texel density for each triangle (this copes with stretches and skewed texture mappings), and then for the mesh take the maximum of those minimums. This will find triangles on things like the face and hands, and that will be the density you assume the artist intended. If they didn't give as many texels per meter to other triangles, you assume that was intentional - that they deliberately wanted lower resolution on those areas.
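A rough sketch of the max-of-minimums heuristic. Measuring linear density along the three edges of each triangle is a simplification I've chosen for brevity; a more careful version would look at the smallest stretch of the UV mapping in any direction. The data layout and names are hypothetical.

```python
import math


def edge_densities(tri, texture_size: int):
    """Texels per world unit along each edge of one triangle.

    tri is three (world_xyz, uv) pairs; degenerate edges are skipped.
    """
    densities = []
    for i in range(3):
        (p0, uv0), (p1, uv1) = tri[i], tri[(i + 1) % 3]
        world_len = math.dist(p0, p1)
        if world_len > 0:
            densities.append(math.dist(uv0, uv1) * texture_size / world_len)
    return densities


def intended_texel_density(tris, texture_size: int) -> float:
    """Max over the mesh of each triangle's minimum linear density."""
    per_tri_min = []
    for tri in tris:
        d = edge_densities(tri, texture_size)
        if d:
            per_tri_min.append(min(d))
    return max(per_tri_min)  # raises ValueError if the mesh has no usable triangle


face = [((0, 0, 0), (0.0, 0.0)), ((1, 0, 0), (0.5, 0.0)), ((0, 1, 0), (0.0, 0.5))]
sole = [((0, 0, 0), (0.0, 0.0)), ((1, 0, 0), (0.01, 0.0)), ((0, 1, 0), (0.0, 0.01))]
print(intended_texel_density([face, sole], 1024))  # about 512 texels per unit; the sole is ignored
```

Taking a plain minimum over the mesh would instead return the sole's density (about 10 texels per unit here), which is exactly the failure described above.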

One counter-intuitive thing is that the dead-space borders between areas in a texture atlas need to grow as well. If your artists were making 256x256 textures with 4 texels of space between parts, when they up-rez to 1024x1024, the space needs to grow to 16 texels as well. The border space is there to deal with bleeding when you create mipmaps, so you need to make sure that e.g. the 64x64 mipmap is the same in both cases.
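The border arithmetic in sketch form, assuming square textures and a border that scales linearly with texture size; both function names are invented for the example.

```python
def scaled_border(base_border: int, base_size: int, new_size: int) -> int:
    """Border width in texels after up-rezzing from base_size to new_size."""
    return base_border * new_size // base_size


def border_at_mip(border: int, texture_size: int, mip_size: int) -> float:
    """How wide that border is once the texture is shrunk to mip_size."""
    return border * mip_size / texture_size


print(scaled_border(4, 256, 1024))  # -> 16
# Both versions end up with the same 1-texel border in the 64x64 mip.
print(border_at_mip(4, 256, 64), border_at_mip(16, 1024, 64))  # -> 1.0 1.0
```

If the artists had kept 4 texels at 1024x1024, the 64x64 mip would only have a quarter of a texel of border, and neighbouring atlas regions would bleed into each other.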