If you are not into game development or don’t know what a texture atlas is, this post is not for you.
The fact is that we have thousands of images that we need to pack into texture atlases for the game to perform properly. Most of them can go into an atlas and be PVR-compressed for iOS devices to save space. Some must stay in their own files because tiling does not work otherwise. And others, mainly pixel-precision user interface textures, can’t be compressed. These and many other details led us to write our own tool.
So we start with thousands of images like this that we need to pack tightly.
The algorithm for packing the textures uses binary trees and is based on this blackpawn article. I recommend reading it if you want to fully understand this post because I’m going to talk only about the notable differences: transparent pixels trimming, multiple simultaneous atlas processing, and a map/reduce process that takes advantage of multicore CPUs.
Transparent pixels trimming
The characters and objects in our game have multiple states and the majority of them are animated. This means that we have hundreds of frames for each character. For multiple reasons (Blender as editor, among others) all frames for an object have the same dimensions. This means they contain a lot of transparent pixels, because we have to leave space for the animations to fit in (A).
To reduce the memory footprint, the atlases are created by trimming out the transparent regions. This means that some areas of the atlas that were previously transparent now have pixels from other frames.
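The trimming step boils down to finding the opaque bounding box of each frame. Here is a minimal pure-Python sketch, assuming the frame’s alpha channel is available as a flat row-major buffer (the function name and buffer layout are illustrative, not the tool’s actual API):

```python
def trim_rect(alpha, width, height):
    """Bounding box (x, y, w, h) of the non-transparent pixels of a frame.

    `alpha` is a flat row-major buffer with one alpha value per pixel.
    """
    opaque = [i for i, a in enumerate(alpha) if a > 0]
    if not opaque:
        return (0, 0, width, height)  # fully transparent frame: keep it whole
    xs = [i % width for i in opaque]
    ys = [i // width for i in opaque]
    return (min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)
```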
If we use the same geometry to draw the trimmed frame, it will be scaled and distorted, which is not what we want (B). If we draw the full frame, parts of other frames will appear.
To solve the problem we need to modify the geometry and remove the trimmed areas. Once the engine selects the frame to draw, according to the properties of the object (time in the animation, state, etc.), it calculates the inset from the original geometry, creates a couple of triangles with those coordinates (C), and adds them to the draw list.
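The inset computation can be sketched like this, assuming the trim rectangle from the packing step and the frame’s position inside the atlas are known (all names here are illustrative, not the engine’s real API):

```python
def trimmed_geometry(trim, atlas_pos, atlas_size):
    """Vertices, UVs and indices for the two triangles of a trimmed frame.

    trim       -- (x, y, w, h): opaque region inside the original frame
    atlas_pos  -- (u, v): top-left pixel of the trimmed region in the atlas
    atlas_size -- (aw, ah): atlas dimensions, used to normalise the UVs
    """
    x, y, w, h = trim
    u, v = atlas_pos
    aw, ah = atlas_size
    # quad corners, inset from the full frame by the trimmed borders
    verts = [(x, y), (x, y + h), (x + w, y + h), (x + w, y)]
    uvs = [((u + dx) / aw, (v + dy) / ah)
           for dx, dy in ((0, 0), (0, h), (w, h), (w, 0))]
    tris = [(0, 1, 2), (0, 2, 3)]  # the couple of triangles for the draw list
    return verts, uvs, tris
```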
Multiple simultaneous atlas processing
If you read the blackpawn article you’ll see that it always refers to a single atlas file. We have too many images for that, so we need multiple atlas files.
We create a binary tree and start loading it with images. When an image does not fit, we create a second binary tree and put it there. We proceed like this, adding as many binary trees as we need, until we run out of images.
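The spill-over into new trees can be sketched like this. The Node class follows the blackpawn splitting scheme, and all names are illustrative:

```python
class Node:
    """One node of the packing tree: a rectangle that is either split
    into two children or occupied by a single image (blackpawn scheme)."""
    def __init__(self, x, y, w, h):
        self.x, self.y, self.w, self.h = x, y, w, h
        self.child = None   # (left, right) after a split
        self.used = False

    def insert(self, w, h):
        """Return (x, y) where a w*h image fits in this subtree, or None."""
        if self.child:
            return self.child[0].insert(w, h) or self.child[1].insert(w, h)
        if self.used or w > self.w or h > self.h:
            return None
        if (self.w, self.h) == (w, h):
            self.used = True
            return (self.x, self.y)
        # split the leftover space along the longer axis
        if self.w - w > self.h - h:
            self.child = (Node(self.x, self.y, w, self.h),
                          Node(self.x + w, self.y, self.w - w, self.h))
        else:
            self.child = (Node(self.x, self.y, self.w, h),
                          Node(self.x, self.y + h, self.w, self.h - h))
        return self.child[0].insert(w, h)

def pack_all(sizes, atlas_w, atlas_h):
    """Pack every image, opening a new tree whenever the current ones fill up."""
    trees, placed = [], []
    for w, h in sizes:
        for i, tree in enumerate(trees):
            pos = tree.insert(w, h)
            if pos is not None:
                placed.append((i, pos))
                break
        else:
            trees.append(Node(0, 0, atlas_w, atlas_h))
            placed.append((len(trees) - 1, trees[-1].insert(w, h)))
    return trees, placed
```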
Multicore CPU optimized
The algorithm has the following steps:
- walk directories and find the images to process, preparing “batches” of them.
- read each image, calculate trimming, and cache the data for later use.
- calculate the atlas coordinates for tiling the images.
- write the output atlas files, based on the atlas coordinates.
- write the atlas coordinates to a json file to be used by the rendering engine.
The first and last steps are not run in parallel because they don’t take much time. In the first step we compile the batches: an array of tuples, each containing the images that will be part of the same sequence of atlas files.
The second step is heavy: it reads all the images and walks their pixels, finding the transparent regions to trim off. To execute this in parallel we use the fantastic Pool object from Python’s multiprocessing package, like this:
from multiprocessing import Pool

pool = Pool(processes=number_of_cores)
img_data = pool.map(read_image_data, img_paths)  # read_image_data reads and stores the data in a global var
Then we proceed to calculate the tiling using the cached image data.
atlas = pool.map(prepare_atlas, batches)  # run prepare_atlas for each batch, in parallel
atlas = reduce(lambda x, y: x + y, atlas)  # concat all the results
The prepare_atlas function is executed for each batch, calculating the trimming and creating the binary tree in memory. The reduce() step only concatenates the binary trees for the next step. The parallelism here comes only from having many batches: filling a single binary tree is not that slow, and making the tree-filling algorithm itself parallel is not trivial.
If you’ve ever used the multiprocessing package you know that the called function receives a single parameter. This is why each batch is a tuple: besides the images to process, it also carries an environment variable that contains, for example, the image data cached in the previous step.
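A minimal sketch of that single-parameter convention, with illustrative field names (the real batches carry more than image names):

```python
def prepare_atlas(batch):
    """Unpack one batch: the list of images plus the shared environment."""
    images, env = batch
    cache = env["img_data"]          # image data cached in the previous step
    # stand-in for the real work: just look up each image's cached size
    return [cache[name] for name in images]

# everything travels through Pool.map's single argument
env = {"img_data": {"hero_01": (32, 48), "hero_02": (30, 48)}}
batch = (["hero_01", "hero_02"], env)
```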
The next step is actually writing the atlas files. This is the heaviest step (more so when PVR compression is used). Again we use Pool:
frame_locations = pool.map(write_atlas, atlas)
frame_locations = reduce(lambda x, y: dict(x.items() + y.items()), frame_locations)
Here write_atlas does all the work, and then the results are merged for the last step: writing the texture atlas layout to a json file.
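A side note: dict(x.items() + y.items()) only works on Python 2, where items() returns lists. On Python 3 the same merge can be written as:

```python
from functools import reduce  # reduce is no longer a builtin in Python 3

def merge_dicts(dicts):
    """Merge a list of dicts, later entries winning on key collisions."""
    return reduce(lambda x, y: {**x, **y}, dicts, {})
```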
These are the benchmarks for 1500 frames, on a MacBook Pro with 4 HT cores, without PVR-compressed output.
- 1 process: 119 sec
- 4 processes: 49 sec
This gist contains the source code of the script in its current state. It’s too specific to our process and the way our KGL texture loader works, but some of the implementation ideas or code may be useful to others.