Friday, October 14, 2011

62 – WTF

Look at your tree! Now back at mine! Clearly, your tree looks better. In my defense, I did not model that ugly tree; I generated it procedurally from a sphere and a cylinder. 

Fortunately, Stewart Bridge, aka BrewStew, offered to model a few objects for me. You may have seen the barrel in the comments section and now I also have a lovely table at my disposal. Thank you, BrewStew! He can’t fully exercise his talent because I have some strict polygon/face count requirements, but even with low poly models the assets look great! I would advise you to check out his future work when or if he makes it available.

So what do you do with such assets? You place them in the world. But I do have the problem that I can render at most 300,000 faces. Above that, performance becomes way too choppy for me to be comfortable with it. This is not Crysis, so I can’t really release something for futuristic PCs. Also, a friend pointed me towards some statistics showing that even low end GPUs can render millions of triangles per second. Since my performance seems fairly constant on most platforms, one would be forgiven for thinking that I am not limited by the GPU itself, but by the speed at which I can transfer data from the CPU to the GPU: something bus related.

Haha, it’s hardware buffer time! Tanananananananananananananana… Or so I thought. This seemingly simple and short task turned out to be quite long and full of WTF moments and complete randomness. The first sign that things would be random actually came while I was implementing borders. Everything worked fine until I loaded the last texture. The entire scene disappeared. Remove the last texture: it was back. I significantly reduced GPU RAM consumption by removing all unwanted textures, including the one from the 2D engine that still got loaded for some reason. Same thing. It did not matter what texture I loaded or where. One more => no scene. So I batch converted all PNGs to BMP. The scene was back and I no longer experienced any issues. I reverted to using the original PNGs: no issue. I re-added the 2D engine textures: no issues. Very weird. But this was only the tip of the iceberg.

Using the new barrels and hardware buffers, I managed to stick quite a few in a scene: 

That’s 15000 barrels with high res textures rendered with filtering at a smooth FPS of 60 or more (FRAPS caps it to 30). Normally one would not have so many objects in the scene. Gameplay dictates the maximum number of objects and you can do tricks, like not rendering any small items on levels that are not the current one. But let’s say you want to render so many objects. Let’s say you want more! 

Using the low poly version of the barrel provided by BrewStew, I managed to fill each cell: 90000 barrels! This was the last time everything went smoothly. I saw the framerate, I was happy, so I was curious how fast it would go with only a few low poly barrels. The answer was: 40 FPS. Even with a single low res barrel: 40 FPS. Replacing it with the high poly barrel: over 800. WTF? WTF? WTF? Do buffers have a minimum size requirement? How about a single buffer with hundreds of low poly barrels? 40 FPS. A few thousand? 40 FPS. Even more? Back to normal performance.

Maybe it is still related to poly count. I export the barrel with some “normal normals”, not “high quality normals” from Blender, which for some strange reason causes Irrlicht to report that the barrel has more faces. Problem solved. It works with a lot or a few, but with a lot the performance is lower because now I have more faces.

Meanwhile, OpenGL has not a single problem and works admirably with all scene setups. This is the first time in my life that OpenGL not only works (which is rare), but works a lot better than DirectX. But I think it is an Irrlicht issue. Irrlicht somehow does something with DirectX that messes it up.

So to recapitulate, now I can render 15000 high res barrels or 90000 low res barrels, but the low res ones only render at about 80% speed because I am forced to render more faces in order to prevent DirectX from rendering at 40 FPS. Camera movement is not that smooth because Irrlicht containers are really not that well suited for making common operations “faster than they can be”. You know, using a container in a special way to amortize the cost of adding to it based on the nature and frequency of these operations. Anyway, I look over the Irrlicht source code, see what it does with the containers and manage to optimize it in such a way that camera movement is smooth again.

The problem is that I create all objects in the beginning. It would be nice to be able to repopulate a single map section on demand so I can do dynamic LOD switching. I implement this with such a trivial change that there is no way you can break anything. I get under 10 FPS. WWWWWTTTTTTFFFFF. I only populate one section. Over 800 FPS. Two sections? 40. Three sections? 20! WTTTTFFFFFFF?

The only difference is the number of scene objects Irrlicht sees. Adding more than one does something with the hardware buffers. Meanwhile, OpenGL is chilling in the corner, barely breaking a sweat and rendering anything I give it at high performance. 

So I rethink the entire hardware buffer system, adding pooling to it. Irrlicht is very poor at giving you any control. I can’t really even free a hardware buffer, so buffers are never freed, just overwritten and reused. I start reading Ogre3D tutorials. Ogre3D seems to be a lot more about control. Using the pooling system, and still using Irrlicht, I manage to create a dynamic LOD switching system:

There are fewer objects, but the focus is not on the number of objects, but on the LOD switching. You can’t tell the difference, but distant barrels are rendered with the low poly model and textures. In case you were wondering why the pattern: this is how many high res barrels I can fit inside a single buffer, and for this sample I wanted to have one buffer per section.
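The switching decision itself can be sketched in a few lines. This is only an illustration of picking a LOD per map section by distance, not the code used in the project: `Vec2`, `Lod`, `selectLod` and the `switchDistance` threshold are hypothetical names and values, and nothing here is an Irrlicht API.

```cpp
#include <cassert>

// Two levels of detail, matching the high res and low poly barrels.
enum class Lod { High, Low };

// Minimal 2D position on the map plane (hypothetical helper type).
struct Vec2 {
    float x;
    float y;
};

// Picks the LOD for a whole section from the camera's distance to the
// section center. Assumption: one buffer per section, so the choice is
// made per section rather than per barrel.
Lod selectLod(Vec2 camera, Vec2 sectionCenter, float switchDistance) {
    assert(switchDistance >= 0.0f);
    const float dx = sectionCenter.x - camera.x;
    const float dy = sectionCenter.y - camera.y;
    // Compare squared distances so no square root is needed.
    const float distanceSquared = dx * dx + dy * dy;
    return distanceSquared <= switchDistance * switchDistance ? Lod::High
                                                              : Lod::Low;
}
```

A section whose result changes between frames is the one that would need its buffer refilled with the other model.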

So to recapitulate, a lot of high res objects, a lot more low res objects and LOD switching all work, but only under OpenGL. Turning off hardware buffers for OpenGL we get 38 FPS, which is very close to the 40 FPS that DirectX keeps giving me. So I guess that DirectX ignores my request for the buffers to be directly accessible to the GPU or whatever.


Did I do something with the GUI or does it use some primitive that causes DirectX under Irrlicht to die? I don’t know. I’ll have to investigate. So framerate is smooth now with both renderers. Using the low poly barrels, I render 90000 of them at 100 FPS (30 in FRAPS). If you look at the video you will see that it is reporting over 50 million triangles. The labels are wrong and computations are not quite accurate. The map is 300 cells on each side, so 300 * 300 = 90000 barrels, and we have 96 indices per barrel, or 32 triangles. That is 90000 * 32 = 2880000 triangles. But the display is not about exactly how many triangles are on screen, but how buffers are allocated and how much potential storage they have. Because the magical number of buffer redundancy that I introduced is 18, we get 2880000 * 18 = 51840000, which is almost as big as the number of triangles reported. The rest of the difference is because of buffer rounding, because I am not going to allocate a buffer of 5. Buffer sizes and memory boundary requirements make them larger than needed:

Clearly there is a ton of optimizing to do. The 18 redundancy buffer pooler needs to become more dynamic.
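For illustration, here is a rough sketch of what a more dynamic pool could look like: slots are handed out on demand, never freed, and simply overwritten when reused. It is a CPU-side stand-in under my own assumptions, not the pooler described above; `MeshSlot` and `SlotPool` are hypothetical names and no Irrlicht types or calls are involved.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for one hardware buffer (hypothetical, not an Irrlicht type).
struct MeshSlot {
    std::vector<unsigned short> indices;  // contents are overwritten on reuse
    bool inUse = false;
};

// Pool that grows only when no released slot is available.
class SlotPool {
public:
    // Returns the id of a reusable slot, creating a new one if needed.
    std::size_t acquire() {
        if (!freeIds_.empty()) {
            const std::size_t id = freeIds_.back();
            freeIds_.pop_back();
            slots_[id].inUse = true;
            return id;
        }
        slots_.emplace_back();
        slots_.back().inUse = true;
        return slots_.size() - 1;
    }

    // Marks a slot as reusable. Its storage is kept, never freed.
    void release(std::size_t id) {
        assert(id < slots_.size() && slots_[id].inUse);
        slots_[id].inUse = false;
        freeIds_.push_back(id);
    }

    // Access by id; the reference is only valid until the next acquire().
    MeshSlot& slot(std::size_t id) {
        assert(id < slots_.size());
        return slots_[id];
    }

    // Number of slots ever created.
    std::size_t capacity() const { return slots_.size(); }

private:
    std::vector<MeshSlot> slots_;
    std::vector<std::size_t> freeIds_;
};
```

With this shape the pool's capacity follows the peak number of sections populated at once, instead of being a fixed multiple of the map size.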

GUI is turned off because of the issue, but things finally work. 50 million faces are a tad bit more than 300000. The question is what to do with this power. 
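As a sanity check on those numbers, the arithmetic behind the triangle counter can be written down as compile-time constants. The input figures come from the post; the constant names are made up for this sketch, and it assumes one barrel per cell and an indexed triangle list (3 indices per triangle).

```cpp
#include <cstdint>

// Figures quoted in the post.
constexpr std::uint64_t kMapSide = 300;          // cells along one side of the map
constexpr std::uint64_t kIndicesPerBarrel = 96;  // low poly barrel
constexpr std::uint64_t kBufferRedundancy = 18;  // redundancy factor of the pooler

// Assumption: one barrel per cell, 3 indices per triangle.
constexpr std::uint64_t kBarrels = kMapSide * kMapSide;
constexpr std::uint64_t kTrianglesPerBarrel = kIndicesPerBarrel / 3;
constexpr std::uint64_t kTrianglesOnMap = kBarrels * kTrianglesPerBarrel;

// What the counter roughly reports: allocated capacity, not visible geometry.
constexpr std::uint64_t kTrianglesAllocated = kTrianglesOnMap * kBufferRedundancy;

static_assert(kBarrels == 90000, "300 x 300 cells");
static_assert(kTrianglesPerBarrel == 32, "96 indices / 3");
static_assert(kTrianglesOnMap == 2880000, "triangles actually on the map");
static_assert(kTrianglesAllocated == 51840000, "capacity before buffer rounding");
```

The gap between 51840000 and the figure shown on screen is the buffer rounding, which this sketch does not model.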

Obviously, non dedicated GPUs that do not support hardware buffering won’t be able to take advantage of this. So I’m thinking of creating three quality settings. On medium, I’ll have at most 300000 faces. On low, at most 150000 faces, even if I have to billboard everything. And a high quality renderer, with 1 to 3 million polygons in the busiest scene possible.
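One simple way to encode these tiers is a face budget per setting. This is just a sketch: `Quality`, `maxFaces` and `fitsBudget` are hypothetical names, and for the high setting it takes the upper end of the 1 to 3 million range as the cap, which is my assumption.

```cpp
#include <cstdint>

// The three planned quality settings.
enum class Quality { Low, Medium, High };

// Maximum number of faces per scene for each setting.
// Assumption: High is capped at the top of the "1 to 3 million" range.
constexpr std::uint32_t maxFaces(Quality quality) {
    return quality == Quality::Low      ? 150000u
           : quality == Quality::Medium ? 300000u
                                        : 3000000u;
}

// True when a scene with faceCount faces fits the budget of the setting.
constexpr bool fitsBudget(Quality quality, std::uint32_t faceCount) {
    return faceCount <= maxFaces(quality);
}
```

For example, the 2880000 triangles of the 90000 barrel scene would only pass `fitsBudget` on the high setting.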

I need to fix the GUI somehow, and I am sure this is only the first set of surprises that 3D has in store for me. We are no longer in the realm of rationality.

I’ll try to put together some realistic scenes with level geometry, trees, barrels and tables.

1 comment:

  1. Hehehe I don't know which is more awesome... seeing so many of my barrels on screen or knowing that you could potentially (on highest quality setting) have even more detail shown on screen than that at once!

    at least I know now that you'll only need two LODs for the models. It's having to make the same object compatible in multiple levels of detail that take so much time.