Friday, February 10, 2012

Performance discussion & graphical updates

If you don't have the patience to read through this long post just skip to the end: there are two screenshots with the way the game is looking right now.

This is one of three posts, each devoted to a specific theme. Today's is about performance; another will cover the design of the game systems and the key differences from DF; and finally I'll describe all the playable races in great (anatomical) detail.

So I was working on improving terrain quality and was trying a new technique that would make it prettier than ever. Getting the color blending right was near impossible, but I was getting good results. The caveat was that this method was two times slower, used the CPU too much and not enough GPU and used three times as much memory. This was the first alarm signal that I must greatly mind the performance implications of what I implement because this is not a normal game.

The second alarm signal came when I put together everything that I have ever implemented (except for shadows): fully editable 300x300x100 maps with level changing, rendering up to 60 levels, great item diversity and density. This was the result:

Yikes! Sure, most of the new items and the trees don't have low LOD variants, but still, the performance is way too low. Ten million triangles? Almost 500 MiB of mesh memory? A very heavy round of optimization ensued after seeing this and you will see the results at the end of the post.

So why is performance such an issue?

First of all, this is a DF inspired game. Just creating a game with half as much depth as DF will be heavy on the CPU. This is the first challenge.

Creating a game with the same depth and with the same scope, scale and frequency of events as DF is the second challenge. This one is very hard, almost impossible. No one man in the history of gaming has managed this yet. You may be thinking: but what about Toady? Here's the thing: he is very close, but the game is not out of alpha yet so you can't really tell. Are the jokes that go around in geek circles about needing a supercomputer to run DF under a given load caused by the game still being in alpha, or is this an issue that can't really be resolved? Could multi-threading help? There are a lot of questions and right now DF might be 99% there, but the way to 100% is still very long. Ah, and don't forget about the bugs.

The third challenge is the 3D. Creating an engine that can render everything and provide an interactive environment is not that hard. Creating one that does these things with a big item density, no billboarding and unlimited view distance is very hard. Creating one that responds to these interactions when they happen multiple times a second, while still being responsive and while still running pathfinding, scheduling, time compression, adjusting the needs/mood of dwarves and doing world simulation in the background is a huge challenge.

So in order to have good performance there are a few things I must do and a few that you must do.

I need to be very careful with my ambitions. The pretty terrain renderer mentioned above was too much. I replaced it with a new one that is light on resources. This one uses color blending and some visual illusions to give the terrain a less rectangular and less Lego-like look:

Sure, the predominant theme is still rectangles, but now they seem to be varied in shape. Here is a close up where the black spot seems smaller than the yellow one, even though they are exactly the same size:

Once in a while this system produces a rather ugly transition, but this is not a problem I'll worry about. I also removed blending from selections, making them perfectly rectangular and with clear edges. This reinforces the illusion that "this is a free-form world where dwarves select perfectly rectangular shapes because this is how they think and plan their tasks" rather than "this is a world made out of rectangles":

The next thing I must do is have a very clear idea of a target performance on the target hardware and make sure not to sacrifice visual quality in order to achieve higher and unneeded performance. My engine can handle 90000 items and more. But a normal map won't have such an item density. Rather than scaling my engine to handle absurdly high item densities well and have extremely good and unneeded performance under normal densities, I'll scale it so it has very good performance under normal densities and runs acceptably with absurd densities.

The performance will also be based on the main camera: the top down one (and later the isometric one). Top down is a lot faster than first person because it has a narrower field of view and frustum culling works extremely well with it. Furthermore, in top down you can't really turn on very extreme rendering ranges because you won't be able to see what you need to if you do.
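To put a rough number on why the top down view is so cheap, here is a minimal sketch that computes how much of the ground a downward-looking perspective camera can see. It assumes the camera looks straight down; `topDownFootprint`, `visibleFraction` and all the parameters are names made up for this illustration and are not part of the engine or of Irrlicht.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Ground rectangle covered by the view, measured in blocks.
struct Footprint {
    double width;  // extent along the screen's horizontal axis
    double depth;  // extent along the screen's vertical axis
};

// Rectangle seen by a camera looking straight down from `height` blocks
// above the level, with vertical field of view `fovYRadians` and the given
// viewport aspect ratio (width / height). Hypothetical helper.
inline Footprint topDownFootprint(double height, double fovYRadians, double aspect) {
    assert(height > 0.0 && aspect > 0.0);
    assert(fovYRadians > 0.0 && fovYRadians < 3.14159265358979323846);
    const double depth = 2.0 * height * std::tan(fovYRadians / 2.0);
    Footprint result;
    result.width = depth * aspect;
    result.depth = depth;
    return result;
}

// Fraction of one level of a square map (side `mapSide` blocks) that lies
// inside the footprint, capped at 1. Everything outside is culled.
inline double visibleFraction(const Footprint& f, double mapSide) {
    assert(mapSide > 0.0);
    const double w = std::min(f.width, mapSide);
    const double d = std::min(f.depth, mapSide);
    return (w * d) / (mapSide * mapSide);
}
```

With made-up values of 30 blocks of height, a 60 degree vertical field of view and a 16:9 viewport, the footprint is about 62 by 35 blocks, which is a little over 2% of one level of a 300x300 map. A first person camera looking towards the horizon has no such bound, which is why it is the more expensive mode.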

The first person camera is a bonus. A very useful bonus! In first person mode you'll see farther but have lower performance and you may wish to increase the rendering range because it benefits you, unlike in top down mode.

So for hardware compatible with the minimal setup (which I have not precisely determined yet), my target is to have a smooth 60+ FPS in top down mode. In first person mode I'll accept lower performance, that is 40+ FPS. This is the bare minimum. If you have the required hardware and you don't have this performance I did not do a good job with the engine because I did not reach my target.
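Expressed as time per frame, these targets mean roughly 16.7 ms in top down mode and 25 ms in first person mode. The sketch below only does that arithmetic; the names `CameraMode`, `minimumFps`, `frameBudgetMs` and `meetsTarget` are invented for the example and do not come from the engine.

```cpp
#include <cassert>

// The two camera modes discussed in the post.
enum class CameraMode { TopDown, FirstPerson };

// Minimum acceptable frame rate per mode, taken from the targets above.
constexpr double minimumFps(CameraMode mode) {
    return mode == CameraMode::TopDown ? 60.0 : 40.0;
}

// Time available to produce one frame, in milliseconds.
inline double frameBudgetMs(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// True when a measured frame time fits inside the budget for the mode.
inline bool meetsTarget(CameraMode mode, double measuredFrameMs) {
    return measuredFrameMs <= frameBudgetMs(minimumFps(mode));
}
```

Thinking in milliseconds rather than FPS makes it easier to see what a feature costs: something that adds 5 ms per frame fits in the first person budget much more comfortably than in the top down one.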

If you are curious, on my laptop I am getting 200-400 FPS in top down mode and 80-150 FPS in first person mode. These measurements are for fullscreen.

Also, in the future I'll make sure to cater for people with thousand-dollar video cards too. This means better quality models, postprocessing effects and a very GPU intensive but pretty LOD switching profile.

So taking these into consideration I perfected the way of mixing smooth and flat shading in the same model to obtain the best look possible. Analyzing the item density requirements I am now comfortable with using meshes with up to 120% polygon count. I don't want to sacrifice a minimum level of visual quality for some unrealistic performance expectations. Normal games won't have 90000 items at once. So I made sure to obtain the best looking barrels and also managed to mostly eliminate the seam:

If you look carefully you can still see it, but I am fine with it. Annoyingly enough, from a distance it is more visible than from close up.

I have not managed to get antialiasing to work with Irrlicht yet. But the engine is capable of it. Using the NVidia Control Panel I forced the game to use antialiasing. Here is a set of barrels as rendered normally:

You can notice the aliasing effect at the edges of the barrels. And now with anti-aliasing:

This looks a lot better! There is a noticeable performance hit, but my laptop that routinely runs the game at 150+ FPS under normal circumstances does not care. I'll try to add options for direct antialiasing, but for now using the NVidia Control Panel will have to do.

I remodeled the mesh for the dawnsider:

I am not 100% happy with it, but it is better than the previous one. Here is an alternative dawnsider when bearing fruit.

I adjusted the weapons rack. It has the same concept, but now the polygon count is in line with the rest of the objects:

The armor stand was very hard to see from top down view so I made it thicker. Anyway, these are dwarves, not elves! They need thick and bulky armor stands. Maybe I'll increase the width even more:

I also modeled a very basic bed. It is just a placeholder and has no sheets on top, but here it is:

That's about it for what I modeled, but then Bryan sent this to me:

A workshop! A very high quality and detailed workshop! Thank you Bryan! The patch that finally brings workshops to Dwarves & Holes has long since been heralded as the Messianic patch. If you have an interactive world with raw resources and workshops to process them, you are just about done! Am I right? Am I right? Guys? Am I right? Hello?

In all seriousness, workshops will be a major milestone, and while they are not planned until version 0.3.5-0.4, it can't hurt to start thinking about and planning the modelling part a little even now. This workshop is too high polygon for starters. While you will never have a huge amount of workshops at once, this workshop has at least ten times the polygon count that it should. I'll reduce the polygon count and send a modified model back to Bryan when I have time. Another concern is the height of it. The little legs are probably not a good idea and I'll adjust it to suit the height of a block and the relative size of a dwarf. That is, if I still allow for single Z level workshops. Another concern is that I might add levels to workshops. When starting out, you create a low level and less productive workshop. Later, when you have time and resources, you will upgrade it. I'm thinking improvised, normal and finally masterwork workshops. The way this is modeled, with all the detail, it does not look like something a dwarf threw together in a few hours. This looks like the highest level workshop possible. And a final concern is that while I want workshops to have a fixed size, all workshops will come with their own built-in, variable sized rectangular stockpile area. This is a major change and departure from the DF workshop model and I will detail it some other time.

Short version: the model looks great, hits all the right ideas with the detail, but I need to remove the feet, scale it on the Z axis and reduce polygon count.

So putting things together, after a healthy dose of optimization, here is how a first person fly by render of the game looks with a 300x300x100 map, with the engine trying to render up to 60 levels:

As you can see, polygon count is not so high and memory consumption is lower. While item density has been adjusted a little (there were too many trees before), this scene is largely equivalent to the previous one that ran poorly and had the white trees. The trees are too dark because I did this on a very bright monitor and did not notice the dark colors. An easy fix. Most of the new items have no low LOD meshes yet, so the polygon count is a lot higher than it should be. This includes the trees. I need to figure out a way to do low poly distant trees that do not look like crap.
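For readers unfamiliar with LOD meshes, here is a minimal sketch of distance-based selection and of what happens when an item has no low LOD variant. It is not how the engine actually picks meshes; the distance limits and the names `kLodMaxDistance`, `selectLod` and `clampToAvailable` are assumptions made up for the illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical distance limits (in blocks) for each detail level.
// Level 0 is the full mesh; higher levels are coarser.
constexpr std::array<double, 3> kLodMaxDistance = {20.0, 60.0, 150.0};

// Returns the detail level wanted at `distance`. Anything beyond the last
// limit gets the coarsest level, whose index equals kLodMaxDistance.size().
inline std::size_t selectLod(double distance) {
    assert(distance >= 0.0);
    for (std::size_t level = 0; level < kLodMaxDistance.size(); ++level) {
        if (distance <= kLodMaxDistance[level]) {
            return level;
        }
    }
    return kLodMaxDistance.size();
}

// Items without enough variants fall back to the coarsest mesh they have.
inline std::size_t clampToAvailable(std::size_t wantedLevel, std::size_t variantsAvailable) {
    assert(variantsAvailable > 0);
    return wantedLevel < variantsAvailable ? wantedLevel : variantsAvailable - 1;
}
```

An item with a single mesh always ends up at level 0, so it is drawn at full detail no matter how far away it is. That is the situation the trees and the new items are in right now, and it is why they inflate the triangle count in a scene with a long view distance.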

In the next screenshot we have a more zoomed in view of the same map and thus performance is a lot better:

These two screenshots do not contain the adjusted dawnsider, weapons rack and armor stand, the bed and workshops are not present, but you can still make out most of the meshes.

So what do you think? I say that with a little better terrain shading that takes into account the position of the sun, this engine has the potential to look quite good! I'll create some screenshots with shadows too, but shadows are something that I have not managed to get working yet.


  1. The screenshot looks great! I don't know how it would be with many more varieties of meshes (since the game would have more than a few kinds of object) but I had very little difficulty making them out. The one exception is the armor stand, the legs seem a bit too thin. But one advantage of using 3D is that you can zoom in when something isn't clear enough, so I wouldn't be bothered at all by such issues with an engine like this.

    And about shadows: This is only my personal experience, but whenever a game runs poorly, the first thing I reduce is shadows. They just contribute very little to a game- I'd be happy turning them off completely, except without simple blob shadows it can be difficult to tell what objects are in the air and which are on the ground. The only time it ever matters is atmospheric games, such as first person horror games set in dark, torchlit corridors, which I don't think is exactly what you're going for.

    Why is it that you consider 500 MB memory usage a problem? It's rare to see a computer with less than 2 GB nowadays, and even with Windows 7 consuming a generous 1 GB, that still leaves twice as much as what you've seen used. And besides, realistically a typical computer (especially one where a reasonable person would expect to run a DF-style game with good performance) would have at least 4-6 GB, probably more a year or two in the future.

    And last, you spoke about the complexity of DF, but I would rather opine that the difficulty in "creating DF" is the design of such complexity and its implementation, not the performance. First off, there isn't much that can be done about the performance- most of the heavy work is, AFAIK, pathfinding and fluid physics (which is again mostly pathfinding). You can only optimize that so much. When you have 200 dwarfs, there's no escaping the need to do pathfinding for 200 dwarfs. From what I understand, Toady has already made all the low-hanging-fruit optimizations, and even some not so low hanging ones. But anyway, all of this is for the CPU to do. The graphics load is orthogonal to this- the GPU would be handling the graphics anyway. So shouldn't the performance impact of even very demanding graphics be ultimately negligible, unless you don't actually have a graphics card?

  2. Well, there is one thing to consider: my engine has one particular peculiarity that others don't have. It uses no billboards and has "infinite" view distance. Well, as large as the map. This is on default settings.

    In order to get this to work the engine ties CPU work, GPU work and memory consumption very tightly together. Mesh memory is mirrored, so it uses both RAM and GPU RAM, and every single extra MiB has some impact on performance. All the data is constantly shifted and moved around. I determined experimentally that under 200 MiB seems to be the sweet spot where everything runs well. As you can see, with almost 500 MiB I was getting 20 FPS, but in the second screenshot with under 200 MiB I was getting 40, and in the third, where I have around 150, even more.