Zombie simulator: performance woes (and solutions)

Thursday, 28 January 2010 18:12
More than one car

The video I showed recently with the spinning car demonstrated the car model in the "World of Zombie" simply by loading the model using SceneManager::getMesh(). I still had a lot of work to do to get the car into the engine properly, in a form that was usable. So, yesterday, I wrote the VehicleDef class for loading the model, dismantling and identifying the parts and storing them in a buffer for later recall. Following this I created the Vehicle class, which includes a VehicleDef and reconstitutes it as an IMeshSceneNode. Then I upgraded it all to include an XML file so that more than one car model could be defined at the same time. Once I had that out of the way, I took the car model for a spin by creating 50 random red and blue hatchbacks, and dotted them around the top corner of the map!
Ouch. The frame rate dropped to below 40 frames per second. Time to do some performance tweaking.

The biggest problem is...

I ran the engine for a few minutes with performance profiling enabled, then fired up GProf. The surprising result was that the worst offender, consuming 44% of the frame time, was Irrlicht itself, specifically a method of SColor called toOpenGLColor(). It turns out that Irrlicht stores colours in DirectX format: a 32-bit value with eight bits for each colour component, arranged with the alpha transparency first, followed by the RGB components, otherwise known as A8R8G8B8. OpenGL, on the other hand, stores the colour in the format R8G8B8A8. This isn't a big deal until it comes to rendering. One of the objects passed to OpenGL is a colour buffer, which contains the colour of every vertex in OpenGL format. Because Irrlicht stores the colour in DirectX format, every colour needs to be converted. A typical scene in my Zombie World has about 25,000 vertices, all of which need their colour components rearranged for every frame. That's the ouch.

The solution is...

I'll have to fix it myself. Irrlicht simply can't support OpenGL colours because Microsoft's proprietary way of doing things clashes with it. What I have to do to fix this is remove support for DirectX from the SColor class in Irrlicht and switch to OpenGL's way of doing things. Sure, I won't have access to Microsoft's latest proprietary furry tongue pixel shader in DirectX 11 or whatever else they've been cooking up, but I don't think that will really matter. I've got enough on my plate trying to keep the engine's performance up with 1990s graphics technology.

A partial fix

I did come up with one partial fix that's pretty specific to what I'm doing, but may help other users who are similarly afflicted.
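To make the cost concrete, the per-vertex conversion described above, together with the kind of shortcut a partial fix can take, looks roughly like the sketch below. This is not Irrlicht's code: argbToRgbaBytes and buildColourBuffer are hypothetical names, and the sketch assumes the colour is held as a plain 32-bit integer with alpha in the top byte.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not Irrlicht's code: unpack one A8R8G8B8 colour
// (alpha in the top byte) into four bytes in R, G, B, A order.
inline void argbToRgbaBytes(std::uint32_t argb, std::uint8_t* dest)
{
    assert(dest != nullptr);
    if (argb == 0xFFFFFFFFu) {
        // Opaque white is 255 in every channel, so the byte order is irrelevant.
        dest[0] = dest[1] = dest[2] = dest[3] = 0xFF;
        return;
    }
    dest[0] = static_cast<std::uint8_t>((argb >> 16) & 0xFF); // red
    dest[1] = static_cast<std::uint8_t>((argb >> 8) & 0xFF);  // green
    dest[2] = static_cast<std::uint8_t>(argb & 0xFF);         // blue
    dest[3] = static_cast<std::uint8_t>((argb >> 24) & 0xFF); // alpha
}

// Convert a whole array of vertex colours. With colours stored as A8R8G8B8,
// this work has to be repeated for every frame.
inline std::vector<std::uint8_t> buildColourBuffer(const std::vector<std::uint32_t>& colours)
{
    std::vector<std::uint8_t> out(colours.size() * 4);
    for (std::size_t i = 0; i < colours.size(); ++i)
        argbToRgbaBytes(colours[i], &out[i * 4]);
    return out;
}
```

Whether the early-out for white actually pays off depends on how many vertices are white and on what the compiler does with the branch, so it is worth re-profiling rather than assuming.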
I modified SColor.h, changing the toOpenGLColor() method starting at line 234 so that it checks if the colour requested is all-white, with all colour components equal to 255. If so, the function simply copies the colour to the output buffer without executing the conversion. The reason I chose this value is that most of my models have all-white vertices, and all-white is the same in OpenGL or DirectX format.

void toOpenGLColor(u8* dest) const

Re-profiling the code after this change showed that toOpenGLColor() was now consuming only 21% of the frame time, so it's a pretty big saving, but I will be able to save a lot more once I've dumped the Microsoft evil from the SColor class. Unfortunately, that will take time, so for now, I'll live with it. Who knows, it might turn out fast enough as-is?

Multiple IMeshSceneNodes with the same material cause big slowdowns

Checking through the code, I can see that Irrlicht orders the nodes prior to rendering by the texture they are using, presumably to send the texture to the video card once, rather than once for each node. This is a known problem and a good solution. Unfortunately, it doesn't appear to work as efficiently as it could.

I've previously created a custom scene node for rendering the buildings, since I didn't want to use the standard frustum culling, which doesn't work properly with the particular orthogonal camera perspective I'm using. To test out rendering efficiency, I commented out the "driver->setMaterial(material);" line in my custom node's render function, added the line to the main engine loop (specifying the building's material) prior to the call to SceneManager::drawAll(), disabled all the other mesh nodes, recompiled and ran the code. I was very surprised by the result. The engine went from about 350fps to over 450fps, with no change to the appearance of the buildings!
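The experiment above boils down to hoisting the material change out of the per-node render call. Here is a minimal sketch of the two approaches using a stand-in driver that only counts calls. Everything in it (Material, CountingDriver, Building, renderPerNode, renderShared) is hypothetical and exists purely for illustration; none of it is Irrlicht's API.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for illustration only; none of this is Irrlicht's API.
struct Material { int textureId; };

struct CountingDriver {
    int materialChanges = 0;
    int drawCalls = 0;
    void setMaterial(const Material&) { ++materialChanges; }
    void drawMeshBuffer() { ++drawCalls; }
};

struct Building { Material material; bool visible; };

// Per-node approach: every visible building sets its material before drawing.
inline void renderPerNode(CountingDriver& driver, const std::vector<Building>& buildings)
{
    for (const Building& b : buildings) {
        if (!b.visible) continue;
        driver.setMaterial(b.material);
        driver.drawMeshBuffer();
    }
}

// Hoisted approach: set the shared material once, then draw everything visible.
// Only valid when all the buildings really do share one material.
inline void renderShared(CountingDriver& driver, const Material& shared,
                         const std::vector<Building>& buildings)
{
    driver.setMaterial(shared);
    for (const Building& b : buildings) {
        if (!b.visible) continue;
        assert(b.material.textureId == shared.textureId);
        driver.drawMeshBuffer();
    }
}
```

With N visible buildings, the first version makes N material changes and the second makes one, for the same number of draw calls. How much time each material change costs is something only profiling the real driver can tell you.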
To check whether it was a problem with Irrlicht switching the texture in and out of video memory, I put the code back the way it was, then removed all the calls to set the textures for the building nodes. Running the engine showed all the buildings in white without textures, and the engine's speed had fallen back to 350fps. The problem must be that simply changing the material for each mesh buffer belonging to each mesh node is consuming about 20% of the frame time.

Planning ahead was a big help

I had suspected that something like this might come up, so I planned ahead when I started this and have been putting all the related textures into single files. This means that all my building textures (walls, roofs, etc.) are in one big image, all my car textures are in one image, and all my sprites have been divided up into a few images grouped by sprite size. This has permitted me to make a couple of changes that have taken the engine from about 40fps back to over 300fps with no changes to the number of objects on screen.

Based on my testing, the fastest way to render is not just to have few textures, but to avoid changing materials as well. Unfortunately, you generally won't have the ability to keep the materials constant, as Irrlicht apparently will reload all the material properties even if nothing has changed. With this in mind, I've switched the buildings to using a custom render loop, and removed the call to set the material in the scene node itself. Instead, I make a single call to set the building material, then draw all the buildings that are visible. The change was actually pretty minor, just a case of writing a 20 line render function and commenting out a couple of lines. The result, as mentioned, was a saving of about 20% of a frame's render time.

The second big change was to switch from having a separate mesh buffer for each sprite to having a separate mesh buffer for each sprite texture.
When a sprite is created, its vertices are simply appended to the mesh buffer that has been assigned the texture it wants to use, with the correct texture coordinates of course. This increases the efficiency of Irrlicht because at render time, when SceneManager::drawAll() is called, it will draw all the sprites using the current texture before moving on, drastically cutting down the number of material changes. This has been a far better result than the change I made to the buildings, because I can now have thousands of billboard sprites with little effect on the frame rate, though admittedly, it's a little harder to work with moving groups of vertices around rather than a node. It's well worth the pain though.
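The per-texture batching described above can be sketched roughly as follows. The types here (Vertex, SpriteBatch, UvRect, SpriteBatcher) are hypothetical and deliberately simple, not Irrlicht's vertex or mesh buffer classes, and the triangle winding and the orientation of the v coordinate are arbitrary choices for the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical types for illustration; not Irrlicht's vertex or mesh buffer classes.
struct Vertex { float x, y, z; float u, v; };

struct SpriteBatch {
    std::vector<Vertex> vertices;
    std::vector<std::uint32_t> indices;
};

// A sub-rectangle of a shared texture, in normalised texture coordinates.
struct UvRect { float u0, v0, u1, v1; };

class SpriteBatcher {
public:
    // Append one quad to the batch for textureId and return the index of its
    // first vertex, which acts as the sprite's handle within that batch.
    std::size_t addSprite(int textureId, float x, float y, float w, float h,
                          const UvRect& uv)
    {
        SpriteBatch& batch = batches_[textureId]; // created on first use
        const std::uint32_t base = static_cast<std::uint32_t>(batch.vertices.size());
        batch.vertices.push_back({x,     y,     0.0f, uv.u0, uv.v1});
        batch.vertices.push_back({x + w, y,     0.0f, uv.u1, uv.v1});
        batch.vertices.push_back({x + w, y + h, 0.0f, uv.u1, uv.v0});
        batch.vertices.push_back({x,     y + h, 0.0f, uv.u0, uv.v0});
        // Two triangles per quad.
        const std::uint32_t quad[6] = {base, base + 1, base + 2,
                                       base, base + 2, base + 3};
        batch.indices.insert(batch.indices.end(), quad, quad + 6);
        return base;
    }

    // Moving a sprite means shifting its four vertices; there is no node to move.
    void moveSprite(int textureId, std::size_t firstVertex, float dx, float dy)
    {
        SpriteBatch& batch = batches_.at(textureId);
        assert(firstVertex + 4 <= batch.vertices.size());
        for (std::size_t i = firstVertex; i < firstVertex + 4; ++i) {
            batch.vertices[i].x += dx;
            batch.vertices[i].y += dy;
        }
    }

    const std::map<int, SpriteBatch>& batches() const { return batches_; }

private:
    std::map<int, SpriteBatch> batches_; // one batch per texture
};
```

At render time each entry in batches() needs one material change and one draw, however many sprites it holds. The cost shows up in moveSprite: every move has to touch four vertices directly, which is the awkward part mentioned above.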

