Friday, November 25, 2011

First generation 3D engine done!

Finally! 'Tis done! The first generation of my 3D engine is done! I finished it a couple of days ago, but it has been a terribly busy week so I did not have time to post.

So I'll go over the final touches quickly, but compensate by talking a little about the next steps.

First, I finished the fine tuning and profiling for LOD switching. The problem was with the buffers: I have two arrays for each buffer, one for vertices and one for indices. Pretty standard stuff. In Irrlicht these arrays are instances of the standard Irrlicht array class, and you use push_back to add items, which in turn calls a standard insert. Normally I am a huge proponent of using standard containers everywhere, along with amortized constant cost growing containers, but here it was a performance bottleneck.

Check out the code for insert. First it checks if it needs to grow the container. Because of the volume of data I have to process, I made sure ages ago that containers never need to grow: array capacity is computed and stored for future use, and when you create a buffer the arrays are automatically set to the required capacity. So insert will never enter the grow path, but it is still a completely unnecessary if in my case. Then it checks whether the item is inserted at the end or not. I always insert at the end, so this is a second unnecessary if. So for every single vertex or index I get two unnecessary branches, and only after falling through them do you get to the desired code. On top of that, I have one extra if per vertex/index that is necessary and that I introduced at a higher level.

So why am I bitching about 3 ifs? Isn't this premature optimization? Well, no: given the volume of data processed and the requirement of real-time camera movement, those 3 ifs have a noticeable impact on the smoothness of movement. I eliminated them and rewrote the insertion code to use pointers. The solution is less elegant, but it was worth it. The great part is that this approach could also work in C#, which supports pointers in unsafe code. I couldn't implement something this computationally intensive in a fully managed environment, because there arrays are checked both on insertion and on access.
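To make the difference concrete, here is a minimal C++ sketch. The buffer's capacity is computed up front, and append writes through a cursor pointer, with no per-element "grow?" or "at the end?" branch. `PresizedBuffer` and `Vertex` are hypothetical names, not Irrlicht types, and this is not the engine's actual code. In Irrlicht terms the same idea might roughly correspond to sizing a `core::array` once (for example with `set_used`) and writing through `pointer()`, but that mapping is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical vertex layout for illustration; not Irrlicht's S3DVertex.
struct Vertex { float x, y, z; };

// A buffer whose final capacity is known when it is created, so appends
// never need a "does it need to grow?" or "is it at the end?" branch.
template <typename T>
class PresizedBuffer {
public:
    explicit PresizedBuffer(std::size_t capacity)
        : storage_(capacity), cursor_(storage_.data()) {}

    // cursor_ points into storage_, so copying would leave it dangling.
    PresizedBuffer(const PresizedBuffer&) = delete;
    PresizedBuffer& operator=(const PresizedBuffer&) = delete;

    // Writes the item and advances the cursor. The assert is compiled out
    // under NDEBUG, leaving a branch-free store in release builds.
    void append(const T& item) {
        assert(size() < storage_.size() && "capacity was computed too small");
        *cursor_++ = item;
    }

    std::size_t size() const {
        return static_cast<std::size_t>(cursor_ - storage_.data());
    }
    const T* data() const { return storage_.data(); }

private:
    std::vector<T> storage_;  // allocated once, at full capacity
    T* cursor_;               // next free slot
};
```

The trade-off is the one described above. Correctness now depends on the capacity calculation, which only the `assert` checks, and only in debug builds.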

Then I managed to get XEffects - Reloaded to work. I had to chase down a new modified Irrlicht DLL, but in the end it worked as advertised. Only it does not fit my needs. You see, lighting is an art, more so than a lot of other things. And it is hard as hell to do. Lighting is single-handedly the most difficult task I have encountered in this project. I went through several iterations of it, and a few nights ago I spent two hours just tweaking lighting. Right now I am using a 3-light system: a spot light and two directional lights to simulate outdoor lighting conditions. The lights only illuminate items. Terrain is not illuminated, because I can only get it either very dark and not uniform, or extremely bright. With stencil shadows, lighting and shadows are somewhat decoupled, but with XEffects and shadow mapping they are very tightly coupled, so I was forced to illuminate the terrain. In the end I almost managed to get the illumination to look like an outdoor scene, but this is where I hit the first big obstacle: realistic lighting came at the price of having to position the lights very far away from the terrain. Shadow mapping is very dependent on the distance between the light and the object it illuminates, because it works with a fixed-resolution shadow map. So at the distance I was using, the shadows of objects were barely a few pixels wide in the shadow map.
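A rough estimate puts numbers on the distance problem. The sketch below assumes a perspective (spot light) shadow map with a square field of view. Directional lights use orthographic projections, and XEffects may set things up differently, so treat the function and its parameters as illustrative only.

```cpp
#include <cmath>

// Rough estimate of how many shadow-map texels span an object of width
// objectSize at the given distance from a spot light whose shadow map
// covers a cone of full angle fovRadians at mapResolution texels across.
double shadowTexelsAcross(double objectSize, double distance,
                          double fovRadians, int mapResolution) {
    // Width of the area covered by the shadow map at that distance.
    const double coveredWidth = 2.0 * distance * std::tan(fovRadians / 2.0);
    // Texels per world unit, multiplied by the object's size.
    return objectSize * mapResolution / coveredWidth;
}
```

Take a 1024x1024 map and a 90 degree cone. A 1-unit object gets about 51 texels across at distance 10, but only about half a texel at distance 1000. That is consistent with the "barely a few pixels" symptom.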

The second problem is probably related to the complexity of the scene: it is so high that I had to modify Irrlicht to keep it from crashing, and XEffects crashes too. I could only add shadows to the terrain and other discrete meshes like dwarves. When I added them to the item meshes, it crashed just like vanilla Irrlicht does.

So in the end I did not manage to get XEffects to take over shadows for me. I did learn some important lessons, and I think I know what can be done: either figure out a new lighting model that does not rely on distant lights, or create a special version of the XEffects shadow mapping techniques that decouples lights from shadows. Use the isolated lights to illuminate everything, plus a directional shadow with a fixed "reach". That is similar to every object casting its shadow from its own very close light, one that still follows the direction of a sun ray.
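The "fixed reach" idea can be sketched as a placement rule. Keep the sun's direction, but put the shadow-casting light a short, fixed distance from the area being viewed, so the shadow map only has to cover a small region. This is a hypothetical helper: the names and parameters are mine, not from XEffects, and it only computes the light position. The shadow mapping itself is not shown.

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };

// Hypothetical helper: place the shadow-casting light `reach` units away
// from a focus point (for example what the camera looks at), back along
// the sun direction. sunDirection points from the sun toward the ground
// and must be non-zero; it does not need to be normalized.
Vec3 shadowLightPosition(const Vec3& focus, const Vec3& sunDirection, double reach) {
    const double len = std::sqrt(sunDirection.x * sunDirection.x +
                                 sunDirection.y * sunDirection.y +
                                 sunDirection.z * sunDirection.z);
    return {focus.x - sunDirection.x / len * reach,
            focus.y - sunDirection.y / len * reach,
            focus.z - sunDirection.z / len * reach};
}
```

Because the light moves with the focus point, the shadow map's resolution stays concentrated near the camera. The cost is recomputing the light position whenever the camera moves.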

Needless to say, my skill level is nowhere near what it needs to be to implement such lighting and shadows, and it would take me months to implement and test it. So I'll stick with stencil shadows for now. For Tech Demo 1 there will be two options, one controlling item shadows and one controlling terrain shadows. Low LOD objects do not cast shadows. With low item density, performance with stencil shadows is more than acceptable. With medium density it is way too slow, so shadows must be turned off, and with very high density the engine crashes.

The engine can also handle animations as long as the number of actors is low (hundreds at most). I have yet to find a good format for exporting meshes with skeletal animations from Blender, but meshes that are already exported work fine.

Reading this you may be wondering why I am not using an existing engine for this game. Well, first of all, Irrlicht is considered a 3D engine, so the discussion should probably be about higher level engines. Finding such an engine is not a trivial task. Even if I were to assume that porting to the new engine would take zero time and effort, I have very specific needs: a fairly high level engine that can do a lot without the standard overhead of high level engines, while being focused on supporting huge numbers of objects and destructible terrain. I am not aware of such an engine, and if one exists it would probably have steep licensing fees.

When working with engines, it is the small details that get to you. Let me give Ogre3D as an example. Ogre3D requires each scene node to be identified by a unique identifier. The documentation says that it checks the identifiers and gives an error if it finds duplicates. Identifiers are strings. Now let us consider my video about 300k barrels, and let's assume that each barrel was a unique scene node (in my implementation this was not true; I used batching for performance reasons). I would need to create a fast algorithm that assigned 300k strings to nodes on the fly, and pray that Ogre3D finishes the checking fast enough. You may think that this should be fast, but I had problems with 3 if statements. Even if this discussion is theoretical, who knows what Ogre3D does behind the scenes and how fast it is. So before I can choose an engine, I would need to do a very thorough evaluation of each one and make sure that such details are never an obstacle.
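For a sense of the client-side cost, here is a sketch that generates 300k unique names on the fly and checks them for duplicates with a hash set, at an expected O(1) per name. This is not Ogre3D code. The naming scheme and function names are hypothetical, and what Ogre3D actually does internally when it checks names is not covered here.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical naming scheme: a prefix plus a running counter, which is
// unique by construction.
std::vector<std::string> makeNodeNames(std::size_t count, const std::string& prefix) {
    std::vector<std::string> names;
    names.reserve(count);
    for (std::size_t i = 0; i < count; ++i)
        names.push_back(prefix + std::to_string(i));
    return names;
}

// Duplicate check with a hash set: expected O(1) per name, O(n) in total.
bool allUnique(const std::vector<std::string>& names) {
    std::unordered_set<std::string> seen;
    seen.reserve(names.size());
    for (const std::string& name : names)
        if (!seen.insert(name).second) return false;  // insert fails on a duplicate
    return true;
}
```

Even when the check is linear, it still allocates and hashes 300k strings. That is exactly the kind of hidden per-object overhead the paragraph above worries about.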

So what next?

I don't know about you, but I am getting tired of working on the engine and would very much appreciate a palate cleanser. A second generation 3D engine will be created, but I sincerely hope that I won't start working on it until March 2012, and that I can spend that time on finishing workshops and crafting. Workshops will be my next focus, with some effort spent on the overworld map.

But December is not going to be a productive month. With 3 official holidays and at least two larger events, I won't get that much work done. Still, updates will come. I'll put the code base on feature freeze for Tech Demo 1, iron out a few bugs and search for a good installer generator so I can make it available.

Since most of the changes are implementation details, I can't find a screenshot that summarizes the progress; it would look just like the previous screenshots. Or maybe I can:


Here's a lovely levitating land trout. A formidable foe indeed. A bane to the existence of any peaceful fortress! Thank you very much InfecteD for modeling this beast!

PS: I really need to add a test skybox, maybe something from here:

Friday, November 18, 2011

So Random... - 02 - And a little bit of color

Good news everybody! The first generation 3D engine is nearing the end of its development process. I am so close! Still, I need a few more days, so until then let's get some semi-filler going!

Last time I left you with a grayish height map generated with Perlin noise that could be considered a fair approximation of a section from a natural land mass. I concluded with some musings about merging several height maps, one used for giving shape and another for giving detail, but today I am going to continue in a new direction.

First, I chose a standard water level. Based on the relative distance to this water level I tried to assign colors to the resulting map to make it seem more "natural". Of course, only using height for coloring is less than ideal. You get a few cases where huge jumps in height generate some colors that don't really make that much sense geologically speaking, but we can get some nice results:


Blue is the water mass. The darker the blue, the deeper the water is. On the shore we have sand that is represented by shades of yellow. Then green and brown for different elevations, up to white as we get to snow. The snow also has a nice little "bloomish" effect. All color transitions except for sand and white snow are perfectly smooth, while sand and snow show some discrete color shifts. This was done to enhance the visuals and is purely cosmetic. The height change on these types of terrain is still smooth.
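A height-only colorizer along these lines can be sketched as follows. The band thresholds and RGB values are hypothetical placeholders, not the ones used for the maps above. The structure mirrors the description: smooth blends for water and for green-to-brown land, discrete steps for sand and snow, and everything measured relative to the water level.

```cpp
#include <algorithm>
#include <cstdint>

struct Color { std::uint8_t r, g, b; };

// Linear blend between two colors; t is clamped to [0, 1].
Color mix(Color a, Color b, double t) {
    t = std::clamp(t, 0.0, 1.0);
    auto channel = [t](std::uint8_t from, std::uint8_t to) {
        return static_cast<std::uint8_t>(from + (to - from) * t + 0.5);
    };
    return {channel(a.r, b.r), channel(a.g, b.g), channel(a.b, b.b)};
}

// Maps a height in [0, 1] to a color based on its distance to waterLevel
// (0 < waterLevel < 1). All thresholds and colors are hypothetical.
Color colorForHeight(double h, double waterLevel) {
    const Color deepWater{0, 0, 80}, shallowWater{60, 120, 255};
    const Color lowland{40, 160, 40}, highland{120, 90, 50};
    if (h < waterLevel)                       // smooth: darker blue = deeper
        return mix(deepWater, shallowWater, h / waterLevel);
    const double above = h - waterLevel;
    if (above < 0.03)                         // sand: two discrete shades
        return above < 0.015 ? Color{240, 220, 130} : Color{210, 190, 100};
    if (above < 0.45)                         // smooth: green to brown
        return mix(lowland, highland, (above - 0.03) / 0.42);
    return above < 0.55 ? Color{225, 225, 235}   // snow: discrete steps
                        : Color{255, 255, 255};
}
```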

To test the generator I tried to make it render 10000 maps, but I grew bored after 1200+. Still, it works and can generate a pretty varied set of maps, like this mostly water based one:


I will probably have to exclude such maps because they have too much water. The algorithm is not capable of generating an infinite number of realistic maps, though it can generate a huge number of maps, bounded by integer/floating point limits, including ones that are ugly, unrealistic or undesirable. Currently it outputs 20000 unique maps. This one I really like:


So generating small maps in great numbers works fine. It is even fast enough that you could create an interactive map navigator. It won't be snappy, but it will work. But what about creating really big maps with high detail? Like I said last time, one of the great advantages of Perlin noise is that you don't need to create and store the entire map with high detail. Or low detail. You create just as much as you need, cache it if you think you'll need it later, and discard it otherwise. But still, it is a good experiment. I created an 8192x8192 high detail image. This took over 4 minutes and 45 seconds and ended up occupying 270 MiB of RAM. Saved to disk, it is a 10 MiB PNG. Blogger won't allow me to embed such large images, so I will post a screenshot of the image:


I created an even larger image. This one is 13312x13312, and creating it, saving it and immediately trying to load it almost crashed my system. It took Windows a few minutes to recover:


But colorizing based on height is not the only thing you can do. You can also try to texture everything. To better illustrate, I'll show the before/after results, with before being just Perlin noise and after being the fully textured result:


This post is already too long so I won't explain today how you texture this. Maybe next time. 

Especially since there is one more thing I must talk about right now: the above image, while it looks pretty good, has one huge, obvious fault. Look at the top of the snow surface. It is perfectly flat. You see, these maps are generated 100% from Perlin noise. It is just noise, nothing more. Luckily, you have a bunch of parameters and a random seed. After a lot of experimenting with these values (except for the random seed), I managed to turn noise into something that looks a little like a land mass, with "looks" being the key word here. These parameters were chosen based on aesthetics. I did not look at a 3D render of the map and decide what is best for that. And what is best for that is not the best for the aesthetics of a color coded top-down map, which in turn is not the best for the aesthetics of the textured map. Long story short: I have chosen a set of parameters that creates good looking maps, but sacrifices detail in parts that are less visible in favor of parts where this detail has more of an impact. In the color coded map renders you did not notice that the top of the snow area is flat, because I used most of the detail information to render a more convincing set of low heights. The parameters will need to be rebalanced for this texture based rendering. The flatness is more apparent on a "colder" map:


As a first step I'll need to make oceans not take up any detail resolution and use the freed up resolution for mountain tops. If this is not enough, I'll try the combination method I wrote about last time. I'll leave you for today with a render of a larger section:

Monday, November 14, 2011

So Random... - 01 - Noise World

I have three more major things to do before Tech Demo 1. First, I must finish the LOD switching. The coding is done, but I have tons of profiling to do, and since this will not result in posts it will be a longer but silent task. Then I need to decide what to do with the shadows. Stencil shadows are fast enough only with very low item density, and I had zero luck with shadow mapping. For the Irrlicht solution I tried XEffects, but that package is far too old and can't be compiled with modern Irrlicht. XEffects - Reloaded has several problems, the largest being that it won't work with the out-of-the-box Irrlicht DLL. You need a special DLL. Of course, the link to the DLL is dead. I still got my hands on it, but because I have also modified Irrlicht, I get some weird failure in the creation of the 3D device. I do not have the sources of the modified Irrlicht that XEffects was built from, so I can't merge the two. And the biggest problem is that XEffects requires GPUs that are more powerful and newer than the ones I am targeting. And speaking of dead links, at least 90% of the links you get when you google a topic like shadow mapping are either incredibly outdated or, more often, just dead. The final thing I must do is the administrative tasks before the release, which again do not involve coding.

So I need some filler content that can occupy my time for the time being. I decided to do some very experimental stuff (even for me) while the above points are resolved, but while still following these rules:
  • What I implement must potentially advance the project by adding features or at least improving existing features while adding new lines of code. I could spend days tweaking values and balancing stuff, but I am not allowed to do that.
  • It must not be completely outlandish and unrelated, like e.g. simulating the movement of bird wings in thick mustard.
  • Every session must add at least 200 lines of code.
For the first topic I thought I'd start with something that is missing, and very obviously so: "overworld" generation. I need a larger and fairly realistic land mass from which you can select your embark location.

Right now I am using midpoint displacement for the embark location, but I will switch over to Perlin noise. What is Perlin noise you ask? This:


It is coherent noise (smooth and without seams, optionally tileable) that has a bunch of interesting properties for world generation. I hear the big boys are moving away from Perlin noise to other types of noise that have even better properties, but I'll stick to what is clearly documented. Also, Perlin noise has a large number of advantages over midpoint displacement:
  • Midpoint displacement is slow. You are basically subdividing your area and randomly shifting the midpoint and then recursively repeating the process until you get a small (typically size of 1) square. So basically recursive computations with floating point numbers, where the size of the map directly influences the depth of the recursion. Slow. Perlin noise is no speedster, but by adjusting the number of octaves and interpolation method you can get some faster results for testing purposes. You can also do a fast calculation followed by a slower but higher fidelity calculation based on some information you have learned from the fast run.
  • It is hard to control the shape of midpoint displacement. You are dividing and randomly raising the middle point. If you want a better shape you use the diamond-square method, which makes it even harder to control. The dimensions of the result are also more or less fixed. It is not a lot easier to control the shape of Perlin noise, but you get to discover some interesting tricks to compensate for this. Also, a good Perlin implementation is very customizable. See what I can do with the same algorithm:

  • Midpoint displacement must be computed down to the smallest unit, and the end result must be kept in memory. Perlin noise does not need to be kept in memory: you can compute it on the fly for every point you desire, and either cache the results or just recompute them (as long as that is fast enough). So you can have a virtually infinite world. Diamond-square displacement, on the other hand, requires enough memory to be allocated for the whole generated area, and the value of a point depends on the points around it. Here is a nice comparison shot of the same "world" generated once with low detail and once with high detail (the two sets were generated in complete isolation; one does not need to know about the other):

  • Perlin noise offers "infinite" detail as you zoom in. As you increase the zoom level you need to increase the detail parameters, but theoretically the only limit is how fast you want the generation to be. The memory consumption is constant (and zero if you decide not to cache anything; not a good idea). Using the above high detail map, let me show how it looks when zoomed in 20 times on a point near one of the darker spots in the middle:


As you can see, the zoomed in portion is smooth and still detailed. Generating the zoomed in portion takes the same amount of time as generating the zoomed out one. If you consider these results the values in a height map, you can begin to see how this could be used for world generation: the darker a pixel is, the lower the elevation.
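For readers who want to experiment, here is a minimal sketch of 2D Perlin-style gradient noise in C++. It sums several octaves and adds a window sampler that computes only the requested patch. It follows the general shape of Ken Perlin's improved noise (the 6t^5 - 15t^4 + 10t^3 fade curve and a shuffled 256-entry permutation table), but uses four diagonal gradients and `std::mt19937` seeding. The class name, the parameters and these choices are assumptions, not the generator used for the images here. Also, `std::shuffle` is not guaranteed to give the same permutation across standard library implementations, so a seed reproduces a map only with the same library.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

class Perlin2D {
public:
    explicit Perlin2D(std::uint32_t seed) {
        std::array<int, 256> p{};
        std::iota(p.begin(), p.end(), 0);                     // 0..255
        std::mt19937 rng(seed);
        std::shuffle(p.begin(), p.end(), rng);                // seed-dependent order
        for (int i = 0; i < 512; ++i) perm_[i] = p[i & 255];  // doubled: no index wrap
    }

    // One octave: exactly 0 at integer lattice points, |result| <= 1 elsewhere.
    double noise(double x, double y) const {
        const double fx = std::floor(x), fy = std::floor(y);
        const int xi = static_cast<int>(fx) & 255;            // lattice cell, period 256
        const int yi = static_cast<int>(fy) & 255;
        const double xf = x - fx, yf = y - fy;                // position inside the cell
        const double u = fade(xf), v = fade(yf);
        const int aa = perm_[perm_[xi] + yi], ab = perm_[perm_[xi] + yi + 1];
        const int ba = perm_[perm_[xi + 1] + yi], bb = perm_[perm_[xi + 1] + yi + 1];
        const double bottom = lerp(grad(aa, xf, yf), grad(ba, xf - 1, yf), u);
        const double top = lerp(grad(ab, xf, yf - 1), grad(bb, xf - 1, yf - 1), u);
        return lerp(bottom, top, v);
    }

    // Sum of octaves, normalized back to [-1, 1].
    double fbm(double x, double y, int octaves, double persistence = 0.5) const {
        double total = 0.0, amplitude = 1.0, frequency = 1.0, norm = 0.0;
        for (int i = 0; i < octaves; ++i) {
            total += amplitude * noise(x * frequency, y * frequency);
            norm += amplitude;
            amplitude *= persistence;  // each octave contributes less...
            frequency *= 2.0;          // ...but adds finer detail
        }
        return norm > 0.0 ? total / norm : 0.0;
    }

    // Computes only the requested width x height patch; nothing else is stored.
    // A smaller unitsPerPixel zooms in; the cost per pixel does not change.
    std::vector<double> sampleWindow(double centerX, double centerY, double unitsPerPixel,
                                     int width, int height, int octaves) const {
        std::vector<double> out(static_cast<std::size_t>(width) * height);
        for (int py = 0; py < height; ++py)
            for (int px = 0; px < width; ++px) {
                const double x = centerX + (px - width / 2.0) * unitsPerPixel;
                const double y = centerY + (py - height / 2.0) * unitsPerPixel;
                out[static_cast<std::size_t>(py) * width + px] = fbm(x, y, octaves);
            }
        return out;
    }

private:
    static double fade(double t) { return t * t * t * (t * (t * 6.0 - 15.0) + 10.0); }
    static double lerp(double a, double b, double t) { return a + t * (b - a); }
    // Picks one of four diagonal gradients and dots it with the offset (x, y).
    static double grad(int hash, double x, double y) {
        switch (hash & 3) {
            case 0:  return  x + y;
            case 1:  return -x + y;
            case 2:  return  x - y;
            default: return -x - y;
        }
    }
    std::array<int, 512> perm_{};
};
```

Zooming in 20 times only means dividing `unitsPerPixel` by 20, typically together with raising `octaves` to keep detail. The work per pixel depends on the octave count, not on the zoom level.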

But the above patch does not look realistic at all. What piece of land looks like this? Maybe a section full of hills somewhere in the middle of a big continental map, but such maps are boring. I'll present a very intuitive approach, one that can be discovered by experimentation, that gives some fair results.

First, one might argue that you can't really tell what is going on because of the smooth transition in the shades of gray. So one might be tempted to render the same "map" with some form of edge detection/strengthening algorithm. Or just simply increase the amplitude:


Wow! Is that it? That's the result we get by simply increasing the amplitude? The image on the right is the same map, but rendered at a zoom of 20 while centered on a random point. This gives an interesting clue: the zoomed out map is still too busy to be realistic, but the zoomed in one looks like a section of a continental mass. Maybe we need to start at such a high zoom level and zoom in even more to get to the expedition section. But before I talk about that, let me prove that the two maps are identical, and also demonstrate some interesting composition properties. Let me show you what happens if we take the two previous pictures and place them on top of each other, with white used for transparency (please ignore the picture on the right; only the one on the left is important; but still, the right section looks very interesting if you imagine it with selectively inverted heights):


Back to the use of zoom for more realistic land masses! By applying the zoom factor of 20 for the normal map, and again 20 (so 400 total) for the zoomed in section and after adjusting some parameters to make edges more pronounced at such zoom levels, we get this:


The image on the left is our section of the continental map and it looks fairly realistic. One could choose the coordinates to zoom in randomly, do some computation to achieve some ratio of water/land or some desired height average or just do what I did: take coordinates (0, 0). The local map on the other hand does not look that good. It is very hard to find a coordinate in which we don't get an image similar to the one on the right: a clear and relatively straight separation of water and land.
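The two image operations used in this post can be sketched on plain height arrays, with heights running from 0 (black, low) to 1 (white, high). The post does not spell out what "increasing the amplitude" does to the pixels. One plausible reading, assumed here, is to scale each height around mid-gray and clamp to [0, 1], which pushes most pixels to pure black or white and gives sharp water/land boundaries. The overlay follows the stated rule that white in the top image is transparent. Function names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Assumed meaning of "amplitude": scale around 0.5, then clamp to [0, 1].
std::vector<double> applyAmplitude(const std::vector<double>& heights, double amplitude) {
    std::vector<double> out(heights.size());
    for (std::size_t i = 0; i < heights.size(); ++i)
        out[i] = std::clamp(0.5 + (heights[i] - 0.5) * amplitude, 0.0, 1.0);
    return out;
}

// Places `top` over `bottom` (same size), treating pure white (1.0) in
// `top` as transparent so the bottom image shows through.
std::vector<double> overlayWhiteTransparent(const std::vector<double>& bottom,
                                            const std::vector<double>& top) {
    std::vector<double> out(bottom.size());
    for (std::size_t i = 0; i < bottom.size(); ++i)
        out[i] = (top[i] >= 1.0) ? bottom[i] : top[i];
    return out;
}
```

Under this reading, blending an amplitude 10 map with an amplitude 100 map is just `overlayWhiteTransparent` applied to two `applyAmplitude` results of the same noise.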

I tried to compensate for this by lowering the zoom factor and making edges even harder. Also, choosing a starting location that is more interesting and has a lake also helps:


The problem with this approach is that you lose detail. So I tried to generate two maps, one with an amplitude of 10 and one with an amplitude of 100. Overlaying the two gives this result:


Hmmm, not great, but we have some detail in the elevation progression. Let's try different parameters:


Well, obviously this is going to take a lot more fine tuning. Here is another blend, but this one is more advanced because it was done in Photoshop. If I were to add some kind of normalization to make the border between water and land smoother, I could get (hopefully) better results:


Total number of lines of code: 267. One question before I leave: do you think it would provide some value if I were to post the source code for the example above?

Thursday, November 10, 2011

67 – Frosting Culling

If you don't want to read this post but are interested in Tech Demo 1, check out the end of the post. Or just read through it!

Check out this screenshot:


We only get 37 FPS. But it is a full map view, with 5,507,481 vertices and 10,040,364 triangles (assuming I did not make a mistake in the code that computes these values). That is 371,493,468 triangles a second. My GPU is working really hard trying to pump out this volume of triangles and it is understandable that performance is low. Now let's get a little bit closer:


What! We now have 46 FPS! A little bit closer:


What? 64! What is happening? What!! Even closer:


Uuuuuu! What? 159? Ooooo! What? What is happening? Must get closer:


Ooooo! Uuuuuu! What is happening? What? What is happening? Uuuuuuuuu! 318? Ooooo! What? What is happening? What is happening? Uuuuuu!

I wish you were here beside me, because I am making the most obnoxious and annoying "u" and "o" sounds humanly possible. And a Stifler face.

You may have guessed what is happening: the engine now supports frustum culling. When all the scene components are in view you have no choice but to render them. But as they get out of view they are eliminated, and rendering performance increases. Actually, they are only hidden; the buffers still take up resources. Frustum culling depends only on camera position and direction, and camera movement can be very twitchy. Imagine a first person shooter where you can swipe your mouse around very fast. Doing a few axis-aligned bounding box vs. frustum tests is doable. Building and destroying buffers, on the other hand, is not, because it would make your mouse less responsive. So the bottom line is better performance with the same resource use.
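The per-node test really is cheap. Below is a minimal sketch of a conservative axis-aligned bounding box vs. frustum test using the common "positive vertex" trick, assuming the six frustum planes have already been extracted from the camera. It is an illustration, not Irrlicht's implementation. In Irrlicht the culling mode is chosen per scene node with `setAutomaticCulling`, using values such as `scene::EAC_FRUSTUM_BOX`.

```cpp
#include <array>

struct Vec3 { double x, y, z; };
// Plane stored as n . p + d = 0, with the normal pointing into the frustum.
struct Plane { Vec3 n; double d; };
struct AABB { Vec3 min, max; };

double signedDistance(const Plane& p, const Vec3& v) {
    return p.n.x * v.x + p.n.y * v.y + p.n.z * v.z + p.d;
}

// Conservative test: a box is rejected only if it lies entirely outside at
// least one plane. Boxes near frustum corners may be kept even when they are
// invisible (a little wasted work), but a visible box is never culled.
bool boxIntersectsFrustum(const AABB& box, const std::array<Plane, 6>& planes) {
    for (const Plane& p : planes) {
        // The box corner farthest along the plane normal ("positive vertex").
        const Vec3 pv{p.n.x >= 0 ? box.max.x : box.min.x,
                      p.n.y >= 0 ? box.max.y : box.min.y,
                      p.n.z >= 0 ? box.max.z : box.min.z};
        if (signedDistance(p, pv) < 0) return false;  // fully outside this plane
    }
    return true;
}
```

Six dot products per node per frame are cheap enough to keep up with twitchy camera movement. Rebuilding buffers is not.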

Normally I would do a LL3DLGLD on frustum culling, but I did most of the job with the capabilities provided by Irrlicht. Irrlicht culls automatically, but by default it uses an approach that is computationally very cheap but not very effective at getting nodes out of the rendering pipeline. I had long suspected that culling was on by default, because every time I messed up bounding box creation some nodes started disappearing randomly. Changing the culling criterion to box vs. frustum produced the above results. I only wish Irrlicht would remember the results of its previous frustum check so I could use them later, but I can repeat the check manually if I need it for other things, like not switching LOD for nodes that are out of view.

Frustum culling scales well with almost all camera modes and should give you an extra, substantial boost in performance. It even makes the game run on poor Intel integrated chips that have no business rendering millions of polygons, as long as you keep the camera very close to the action.

Only first person view is not ideally served by frustum culling. For it I will need to add an additional camera distance check.
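A camera distance check of that kind could be as simple as the sketch below. It compares squared distances to avoid a square root per node. The function name and the idea of a single maximum draw distance are assumptions for illustration.

```cpp
struct Vec3 { double x, y, z; };

// Hypothetical extra culling step for first person views: nodes whose
// center is farther than maxDistance from the camera are skipped.
bool withinDrawDistance(const Vec3& camera, const Vec3& nodeCenter, double maxDistance) {
    const double dx = nodeCenter.x - camera.x;
    const double dy = nodeCenter.y - camera.y;
    const double dz = nodeCenter.z - camera.z;
    return dx * dx + dy * dy + dz * dz <= maxDistance * maxDistance;
}
```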

I can't wait to see the performance data after LOD switching is working as intended. As explained last time, LOD switching is fully implemented, but I do not have low poly models yet for most objects, so the low LOD mesh points to the high LOD mesh, making LOD switching generate a constant level of detail.
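The LOD decision described here can be sketched as a small function. A missing low LOD falls back to the high LOD, which is the current temporary setup. Nodes that are out of view keep their current mesh, so no switching work is spent on them. `Mesh`, `LodSet` and `updateLod` are hypothetical names, and a single switch distance is an assumption; the engine's real criteria may differ.

```cpp
// Hypothetical mesh handle; only the triangle count is kept for illustration.
struct Mesh { int triangleCount; };

struct LodSet {
    const Mesh* high;
    const Mesh* low;        // may be null while low poly models are missing
    double switchDistance;  // beyond this distance the low LOD is used
};

// Returns the mesh a node should use after this update.
const Mesh* updateLod(const LodSet& lod, const Mesh* current,
                      double distanceToCamera, bool visible) {
    if (!visible) return current;  // out of view: skip the switch entirely
    const Mesh* low = lod.low != nullptr ? lod.low : lod.high;  // fallback
    return distanceToCamera < lod.switchDistance ? lod.high : low;
}
```

With the fallback in place, every "switch" simply returns the same high LOD mesh, so the GPU load stays constant as described.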

But still, I just can't help feeling that the view in the screenshot below is just epic, with the sheer amount of objects visible:


Frustum culling was a worthy addition to the engine and to the feature set of Tech Demo 1. Useful LOD switching and some better camera movement will also make it into the first version.

And here is my question to you, my loyal readers: what else would you like to see in Tech Demo 1/the engine as a whole? Keep in mind that the Tech Demo is designed to test whether the engine runs on the targeted spec and to provide information on how to make it run better. It is all about the engine. So Tech Demo 1 will not have any of the following (and more): GUI, dwarves, actions, property inspectors, time compression, path finding, etc. It is going to be a pure engine test, so it will have rendering of one or more random levels, different densities of objects, level switching, camera modes, camera movement, etc. And of course all the technology behind it, like buffer lifetime management, frustum culling, lighting, LOD switching, etc.

So keeping the above constraints in mind, what else would you like to see in TD1?

PS: This is not for TD1, but I have an idea that would assure that a future TD would be tried by a lot of people: making the tech demo able to load DF and/or Minecraft maps. It won't be the primary or secondary function of it, but it would work at a satisfactory level. This is a task I think I will need help on :).

Wednesday, November 9, 2011

66 – Not about Nyan Cat

Today I will be documenting my most recent failure. A very pretty failure. 

Using what I learned yesterday about lighting effects I created a new lighting model. Together with stencil shadows this creates considerably prettier scenes:


The lighting model is far far faaaar from perfect, but already it is very useful at helping you figure out what the content of the scene is when zoomed out. Here is a close-up:


There is just a little bit of shininess to objects, as there should be when they are outside in the sunlight. The column, chest and trough are not textured. Without the lighting effect they would look like a white outline filled with white, a completely uniform surface, but with the effects I have applied, the shape of the object gives it shading so you can tell what it is:


Compared to how it looked in my last video, the trough is much more pronounced. The shadow helps, but even without it the object looks much better:


Here we have a chair:


Another composite shot:


And finally a column showing shading and a shadow cast on the wall:


It looks very good. 

But you may have noticed a problem: the object density is very low. Normally I create absurdly dense scenes to demonstrate what big cojones my engine has. Not only that, but check out the framerate: it is very low on most shots.

It is not because of lighting. I could not find any performance decrease caused by the lighting effects and it is probably almost free. The slow down is caused by the stencil shadows. This is the maximum object density I could afford with shadows and it still runs very poorly.

I used a custom world population algorithm that is no longer random and flexible. Instead, it generates a pseudo-random (more like pseudo-pseudo-random) distribution that has some random elements in it but is specially designed to make shadow creation as fast as possible. With it I managed to raise the performance to slightly over 30 FPS with the same very low density.

So I think stencil shadows are out of the question. First I'll do an experiment to see if I can make Irrlicht cast shadows for some dummy low poly objects instead of the real objects. Maybe this can help. If this fails, I'll abandon stencil shadows and try shadow mapping. Irrlicht does not support shadow mapping, but there are some third party packages and information on this subject, so I'm hoping I'll have no problem implementing it. The question is whether it will handle a huge number of objects with good performance.

Using the above-mentioned population algorithm, I recorded a short video showing a very sparse world in action.

Some of you will be happy to hear that I get it. I really do. I see now why you do not like the "fly" camera movement, how you would like it to behave and why it is confusing to you. I have no problems with it personally, but I'll add the option for a more "natural" camera movement that does not roll. 

But not now, because I have bigger fish to fry, so enjoy the video with the old camera:

Tuesday, November 8, 2011

Screens of the day 19 - C'mon baby light my fire


I was supposed to finish the option dialog but instead I tried to add some effects to the game to eliminate the blandness.


Currently objects are very bland and uniform as you could see in my last video (3D Engine Preview #1). All objects look the same. Always. If I remove the textures you get the same result as if I was using only ambient shading: an outline filled with one single color. Textures help you make out what the actual shape of the objects is, but it is not enough.

So I tried adding light at first. Using the default point light from Irrlicht I did not get too far:

  • lighting is surprisingly binary: you either have some levels of lighting or your entire object is black. I would like a model where the general visuals of my scene are preserved, but some areas are made slightly more dark or light.
  • the shape of the spot light caused a lot of trouble: objects close to the light were extremely bright and shiny, while objects far away were too dark. There was a sweet spot where lighting looked perfect, but this area was relatively small.
  • even when objects were not too bright, they had a small area that was extremely shiny.
So the only good thing I got out of this (except for experience) was working stencil shadows. Unfortunately I have special needs for my game, with the large number of objects and all, and shadows are a huge performance hog. I can only use shadows if I reduce the object count a lot (about 20 times). Otherwise it is either too slow or the scene refuses to render. And Irrlicht only allows you to add shadows to animated meshes, so I had to create animated dummy scene nodes with a single frame instead of normal nodes for shadows to be visible. To my surprise this did not turn out to be a performance penalty.

Then I tried to "fake" lighting by using some directional lights. These lights are a strange beast and the effect is in no way believable, but it is a huge improvement over the blandness of flat shading. The effect is not realistic, but more of an artistic choice. I still need to fine-tune the parameters a lot and see if I can fix shadows, because the directional light broke them.

Next I'll try spot lights. But I have a feeling I will get the results that I want using shaders. This raises an interesting issue, because DirectX uses HLSL while OpenGL uses GLSL. If I don't create two sets of equivalent shaders, I am losing one of my back ends.

In the following video you can see the effect, together with a larger assortment of items (some of the items are not textured yet; the better to see the shading):

Monday, November 7, 2011

65 – To the tech demo mobile

Finally, I managed to solve the problems that I was having with buffers. I ended up with an extremely complex pooling and life cycle management system for these buffers that still was not enough and did not cover all the cases, so occasionally the game would crash. I needed to create an even more complex system to solve it. I'm sorry, but there are some things that I just won't do! That's what she said!

So I ended up scrapping everything and went with a brute force approach. I needed the complex system to fool Irrlicht into doing something it had no support for. And since this workaround/hack was both too complex and did not work too well, I ended up doing the only other logical thing: modifying the Irrlicht sources! This raises an interesting issue with licenses. Irrlicht has a permissive license that allows me to modify it and I am using dynamic linking, but this does not matter because, thanks to the... "magic" of C++, the Irrlicht DLL was not modified. I do not mean that I recompiled and ended up with the same binary. I did not recompile. I modified the sources, but I did not recompile Irrlicht and am running an out-of-the-box Irrlicht (the code I changed lives in the headers, so it gets compiled into my own executable). I can imagine the copyright dispute. Did I modify the sources? Yes. Is it derived software? Yes. Do I have binary evidence for this modification? No. As I have said for years, current licenses are too vague and incomplete, plus a real hassle for non-lawyer folk to decipher.

I also fixed the memory leaks, except for 4. I still have 4 memory leaks, meaning that exactly 4 blocks get allocated and are never freed. I'll fix this, but this is not a real issue.

Here is a screenshot with a distant view of a field full of different objects with different colors:


Color is not a global property for all objects. The performance is still good but lower than before. The scene is saturated with a large number of high poly meshes that abide by the 64k rule, so their number is higher than before. Edit: and because I am stupid and had v-sync on for those screenshots. Anyway, performance is lower under this higher strain. No real game situation should have such a density of objects, and at this viewpoint LOD switching kicks in. Actually, LOD switching is implemented and used in the above screenshot. However, I only have low LOD models for some of the objects, and I did not want to differentiate between objects that have all LOD levels and those that don't (a temporary distinction, because all objects will be required to have two LOD levels). So I made the high LOD model serve for low LOD rendering too. As you move around the map, LOD levels are changed dynamically, but because both sets are identical this does not reduce the strain on the GPU. A good stress test. Because of the high workload of high LOD meshes, camera movement encounters some small snags in debug mode. Normally, LOD switching changes between low and high, not high to high, but now we get the same mesh recreated with the same detail, taking up unnecessary CPU time. In optimized mode the snags are gone, and once all the low LOD models are in, switching should be super fast.
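
To make the fallback concrete, here is a minimal C++ sketch of per-section LOD selection with the temporary "use the high model when there is no low model" rule. It is not Irrlicht code and not the engine's actual implementation. `Lod`, `MeshSet`, `pickLod`, `meshFor` and the single distance threshold are hypothetical names and assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <string>

enum class Lod { High, Low };

// Hypothetical description of the models available for one kind of object.
struct MeshSet {
    std::string high;  // mesh name for the high LOD model
    std::string low;   // empty when no low LOD model exists yet
};

// Picks the LOD for a map section from the horizontal camera distance.
// Sections farther than switchDistance use the low LOD (assumed threshold).
Lod pickLod(float dx, float dz, float switchDistance) {
    assert(switchDistance > 0.0f);
    return std::hypot(dx, dz) > switchDistance ? Lod::Low : Lod::High;
}

// Temporary fallback: if an object has no low LOD model, the high LOD mesh
// is used for low LOD rendering as well.
const std::string& meshFor(const MeshSet& set, Lod lod) {
    if (lod == Lod::Low && !set.low.empty()) return set.low;
    return set.high;
}
```

With this fallback, a "switch" for an object without a low model rebuilds the same high LOD mesh. That is the unnecessary CPU work described above.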

Here is a closer view point on the scene:


As you can see objects are rendered with full detail and this becomes more and more apparent the closer you get. Even from this distance you can tell that there are metal plates on the table (not the middle circle, that is the pattern; the plates are more to the south). I won't zoom in even closer because then you'll notice that not all the objects are fully modeled :).

And because of my v-sync blooper, here is a comparable screenshot where FPS goes a little over 60:


And one where object density is not that high:


And a final one with very low object density (but still at high LOD) so you can get an idea how the performance scales:


Now, I could sit all day and wallow in ignorance by just testing on the computers I have access to, but in order to get some real results I need a large number of people to test whether it runs on whatever setup they have. So I'll be doing official tech demos a few posts from now.

Now you may have been spoiled by what big companies call tech demos or public betas: often these are little more than glorified publicity stunts created to generate hype and get you excited and interested in the game. But not with me. My tech demos will be real tech demos, where things will break and blow up (DISCLAIMER: THINGS WILL NOT ACTUALLY BLOW UP. NO FIRE HAZARD HERE). Each tech demo will have a strict agenda of things I need to test to ensure general compatibility with the computers out there. You won't get hyped for the game by trying the tech demos because all gameplay (at least in the first tech demos) will be stripped out. If some gameplay remains in the first version, this is considered a bug and I expect a bug report. Which brings me to the main point: you will have no reason to download and try the first tech demos unless you are curious to see if the engine runs on your computer. If you are curious and willing to help, a very short report on your setup, what FPS you got, how smooth it was and whether it seemed to work as intended would be greatly appreciated.

I'll post more info about the goals for the tech demo as it gets closer. To better facilitate the process I created this launcher for the game:


The empty rectangle is where the log will go. Very old school! Pressing "Launch Editor" will launch the editor you know and love. The editor will probably not make it into tech demo 1, which will be strictly a "will it run/blend" scenario, but by TD2-3 it will be included. The editor will always be free (this is not a legally binding guarantee :P), so it won't be a premium or whatnot feature. Since it is a custom tool, I'm thinking it will do the most good if it is as widely available as possible.

The options button will open the game options, where you will choose resolution and other settings. An in-game options panel will also be available, but for starters the options that require reinitializing the entire game context (all the 3D stuff) will only be changeable from the launcher (or the INI file). This is not the most consumer friendly solution, but it is practical. The options dialog should be done by the next post.

Friday, November 4, 2011

Status update

It has been a while since my last post. After over 5 weeks of working round the clock on the 3D engine to compensate for the unplanned 3D conversion, I really needed a break.

Things should get back to regular scheduling now. But even if I took a break, I still got to do some work.

First, the integration between the regular game engine and the 3D stuff has been strengthened, and the way you specify the 3D meshes and textures is no longer hard coded. I now have a fairly flexible system, but unfortunately it does not manage to hide the implementation details 100%.

I have a fair amount of new assets courtesy of BrewStew. I also experimented a lot with first person mode. At this stage I do not have the resources, but if I ever do, the "adventure" mode for this game will be Elder Scrolls meets Minecraft, but without so many cubes.

I also have a fair amount of unanswered mails in my inbox. Sorry about that. Breaks are total commitment affairs for me, so I don't check my email that often :). I'll get to answering your questions ASAP.

With all the new resources, game start-up is no longer that fast, and (for my own convenience) I tried to add multithreaded texture loading. The idea was for the game to boot up instantly and load the textures in the background. As soon as a texture loaded, it would pop in. This was only meant for me, so I can test more easily. The consumer really does not want extra pop-in. But this attempt failed completely and I did not manage to get it to work in any acceptable fashion.

I complained in the past about the randomness of the GPU + Irrlicht. With new knowledge and techniques, I now have a predictable model for medium to high end GPUs. Actually, these GPUs run the game very well and with predictable behavior and performance. The low end GPUs are the problem. I will be buying a cheapish (around $700) laptop soon and I hope it will run the game perfectly with performance to spare. This is one of the hardware categories I am targeting.

As a lot of you may know, a mesh is just a collection of vertices and indices. The indices tell you in what order the vertices form faces. In Irrlicht with hardware buffers, indices are 16 bit. This means that you can address at most 65536 vertices in a buffer. But if you repeat vertices, you can have more than 65536 indices. It just so happens that meshes tend to have a larger number of indices than vertices, and this is very true for my meshes. So it makes sense to have more than 65536 indices in the same buffer. Better GPUs not only have no problems with this, you even get a nice performance boost. But at least some low end GPUs complain and do not allow you to have more than 65536 indices.

To make matters worse, low end GPUs also have a maximal total number of vertices and indices, and if you go over either one of these, the entire scene fails to render under DirectX.

So for low end hardware I needed a way to break buffers up into smaller chunks. The problem is that Irrlicht does not give you direct control over the hardware mapping for the buffer, and it does not like very dynamic and drastic changes in the structure and content of buffers. Once you have created them, if you change their structure Irrlicht will crash deep inside the DLL, where I really don't want to debug why. The problem seems to be that the correlation from software buffers to hardware buffers is done only at the render step, and if it does not find something it expects to find, it crashes.
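
The usual way around the 16-bit limit is to split one big mesh into several smaller buffers, each with its own local vertex numbering. Below is a minimal, generic C++ sketch of that idea. It is not Irrlicht code, and `Chunk` and `splitMesh` are hypothetical names. It walks a triangle list and starts a new chunk whenever the next triangle would push the current chunk past either limit. The limits are parameters because the exact caps of low end GPUs are hardware specific (an assumption, not a measured value). The vertex data itself would still have to be copied into each chunk using `vertexIds`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// One chunk that fits the limits of a 16-bit index buffer.
struct Chunk {
    std::vector<std::uint32_t> vertexIds;  // original vertex ids used by this chunk
    std::vector<std::uint16_t> indices;    // local indices into vertexIds
};

// Splits a triangle list (32-bit indices) into chunks so that each chunk
// references at most maxVertices distinct vertices and holds at most
// maxIndices indices. Triangles are never split across chunks.
std::vector<Chunk> splitMesh(const std::vector<std::uint32_t>& indices,
                             std::size_t maxVertices = 65536,
                             std::size_t maxIndices = 65536) {
    assert(indices.size() % 3 == 0);
    assert(maxVertices >= 3 && maxVertices <= 65536 && maxIndices >= 3);
    std::vector<Chunk> chunks(1);
    std::unordered_map<std::uint32_t, std::uint16_t> remap;  // global -> local id

    for (std::size_t t = 0; t < indices.size(); t += 3) {
        // Count how many new vertices this triangle would add (conservative).
        std::size_t newVerts = 0;
        for (std::size_t k = 0; k < 3; ++k)
            if (!remap.count(indices[t + k])) ++newVerts;

        const Chunk& cur = chunks.back();
        if (cur.vertexIds.size() + newVerts > maxVertices ||
            cur.indices.size() + 3 > maxIndices) {
            chunks.emplace_back();  // start a fresh chunk with its own numbering
            remap.clear();
        }

        Chunk& dst = chunks.back();
        for (std::size_t k = 0; k < 3; ++k) {
            const std::uint32_t v = indices[t + k];
            auto it = remap.find(v);
            if (it == remap.end()) {
                const auto local = static_cast<std::uint16_t>(dst.vertexIds.size());
                dst.vertexIds.push_back(v);
                it = remap.emplace(v, local).first;
            }
            dst.indices.push_back(it->second);
        }
    }
    return chunks;
}
```

For low end hardware one would call it with a lower `maxIndices`. For better GPUs the defaults keep only the hard 16-bit vertex limit.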

I managed to find a needlessly complicated solution for this. When you need to get rid of a buffer, you create a new empty dummy buffer. You clone all the hardware identification info and put this buffer in place of the original, while you take the original and add it to a pool. You also mark the new empty buffer as dirty. You do not remove the old scene node; instead, you add a new one with new real buffers. When Irrlicht renders, it will think that the new empty dummy buffer is the old one, but has been emptied. So it will clear up whatever reference it holds to it, the same reference that causes the system to crash if I remove the buffer directly.

The system is not perfect yet. There are still some memory leaks that will eventually cause the system to run out of RAM. Level changing is a little bit slow (but a lot faster than without pooling). And the system gets inundated by empty scene nodes, which must be present for at least one render cycle in order to clean up some buffers. These nodes don't seem to eat up resources, but I need to find a way to clean them up.

Like I said, all this mess is for low end GPUs.

As for Irrlicht, I don't have the resources right now to abandon it, but our collaboration will not be long lived. It is a good workhorse, but it is not well suited for very advanced hardware related tasks. Or maybe it is poorly documented, and there is somewhere a function cleanUpThisStupidMessOfABuffer or illLetYouManageStuffOnYourOwnBecauseIAmTooStupid. My theory is that (like so often in open source) the more advanced features were not extensively tested and almost nobody except for the one who implemented them has used them for anything that is not trivial. For example I am sending double the amount of MiBs my GPU has to it with buffers and it works fine. Only it is frustratingly hard with Irrlicht. With every single line of code you are fighting it. I'm sure nobody has tried this before. If someone claims that "Irrlicht has top world leading buffer management capabilities" they deserve a slapping. No, Irrlicht has buffer management capabilities so simple and painless to use that a monkey could do it, as long as you are only doing simple stuff. That is somewhat of a compliment. Too bad it does not like destroying these buffers you so easily created.

I'll leave you with a few places this project has been referenced on the Internet. Most topics only mention it and then the subject is changed, but still decent exposure. I need to really settle on the final name for the game before this becomes a PR disaster. But finding this name is a lot harder than making Irrlicht delete a stupid buffer.



GamersWithJobs
DwarvesH (Dwarf Fortress for the rest of us?)

GoblinCamp
DwarvesH - New Dwarf Fortress inspired game in development

Reddit
Searching for a game similar to Dwarf Fortress

Is the market ready for a new dungeon keeper?

Bay12 Forums
Dwarf Fortress Clone?

Wednesday, October 19, 2011

64 – 3D Week 4 in review

Ohhhh yeahhhh! One month of working on the 3D engine. Nice progress if I do say so myself: 


No more barrellands! This time you get to see the table and stool created by BrewStew. To limit performance loss, only the current level is populated with items, but in the future I’ll add an option to render all levels with full detail. There is no reason not to allow people with beefy computers to use their full potential. It is also useful for anyone taking high detail screenshots.

Also stick around until the second part of the video, where you get to see a long lost feature that has made its glorious return: selection with the mouse. The problem is that selection with the mouse is far too slow. You won’t see that in the video, but it is. Optimizing this will be a low priority task. First I must make sure that the geometry generation time for a volume unit is as fast as it gets. Then I must make sure that a minimal number of volume units are updated. 

The high priority task is to figure out which algorithm to use for world construction, geometry building and buffer management. I have at least 6 strong contenders, each with their own advantages and disadvantages, plus probably a bunch of buffer interaction and fragmentation properties that I am not yet aware of. I also need to stick a huge number of different entities in the same scene. Up to this moment I have mostly stuck a huge number of the same entity in the scene (trees, barrels). BTW, the trees are not visible in the video because they are not mesh based; they are procedural. Because I transitioned to the new hardware buffer based rendering, I need to update the procedural generation for trees, and I have not gotten to it yet.

The algorithm that I am using right now is different from the one that created the 300.000 low poly barrels. This one focuses on fewer, higher quality objects. It is also lossy. Each section has a potential for holding items based on maximal buffer sizes, and when you go over this potential, items start to get skipped. At the density from the video skipping should not occur, but maybe one or two items were dropped from the busiest section. The dropping is class based, meaning that if you have a ton of tables and only a few barrels, the excess tables will not cause the barrels to disappear. Only excess items within a category are lost. If you can have at most 300 tables, 400 barrels or 500 stools alone in a section (example numbers), you can have all of them visible at once in the same section. But if you have 302 tables, 405 barrels and 600 stools, 2 tables, 5 barrels and 100 stools will be dropped.
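
The per-category dropping rule can be written as a tiny function. This is a sketch with illustrative category names and caps (the same example numbers as above), not the engine's code. `Counts` and `droppedPerCategory` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>

// Number of items per category (category names are illustrative).
using Counts = std::map<std::string, int>;

// Returns how many items of each category get dropped in one section.
// Each category is checked only against its own cap, so an excess of
// tables never evicts barrels or stools.
Counts droppedPerCategory(const Counts& requested, const Counts& capacity) {
    Counts dropped;
    for (const auto& [category, wanted] : requested) {
        auto cap = capacity.find(category);
        assert(cap != capacity.end() && "every category needs a capacity");
        dropped[category] = std::max(0, wanted - cap->second);
    }
    return dropped;
}
```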

A natural evolution of this algorithm will be to determine how many high poly entities to convert to low poly in order to not drop a single item. So instead of trying to render 405 high poly barrels and ending up rendering only 400 high poly barrels and dropping 5, the engine should render 395 high poly barrels and 10 low poly barrels. Again, example numbers. This can be easily done and I’ll implement it soon, but first I must add an analysis and caching step to world creation, so you know the totals for each section.
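
The "demote instead of drop" step is a small budget calculation. Here is a sketch that assumes every high LOD copy costs `highCost` units of the section budget and every low LOD copy costs `lowCost` units. The units (faces, indices, etc.), the struct `LodSplit` and the function `splitLod` are all assumptions for illustration. It keeps as many high LOD copies as possible, demotes the rest, and only drops items when even an all-low section does not fit.

```cpp
#include <cassert>

struct LodSplit {
    int high;     // items rendered with the high LOD model
    int low;      // items demoted to the low LOD model
    int dropped;  // items that do not fit even at low LOD
};

// Largest number of high LOD copies h such that
// h * highCost + (n - h) * lowCost <= budget.
LodSplit splitLod(int n, long budget, long highCost, long lowCost) {
    assert(n >= 0 && highCost > lowCost && lowCost > 0);
    if (n * highCost <= budget) return {n, 0, 0};  // everything fits at high LOD
    if (n * lowCost > budget) {                    // even all-low does not fit
        const int fit = static_cast<int>(budget / lowCost);
        return {0, fit, n - fit};
    }
    const int h = static_cast<int>((budget - n * lowCost) / (highCost - lowCost));
    return {h, n - h, 0};
}
```

With made-up costs of 10 and 5 and a budget of 4000 (exactly 400 high LOD barrels), 405 barrels come out as 395 high plus 10 low, and nothing is dropped. How many items must be demoted depends on how much cheaper the low LOD model is.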

One problem that is especially visible with a highly zoomed out top down view is that it is sometimes very hard to tell what is going on. I need to figure out a method to make things pop or something, so you can tell what you are looking at. This is not a problem for the first person camera, where the current semi realistic proportions and first person viewpoint make it a lot easier to tell what is going on. One trick I’ll try is to scale up small objects based on the vertical distance between the camera position and the surface plane of the current level. When you are zoomed in close, proportions are similar to the current ones. As you zoom out, small objects become larger and larger. I can also experiment with adding a few black borders to the textures to fake a cel shading style outline, but I am not prepared for real cel shading right now.

How does Starcraft 2 do it?
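
One possible shape for the zoom-dependent scaling of small objects is sketched below. The scale stays at 1 while the camera is close to the current level and grows linearly up to a maximum as the camera moves away. The thresholds, the linear ramp and the name `zoomScale` are assumptions. The post only says that small objects should grow as you zoom out.

```cpp
#include <algorithm>
#include <cassert>

// Scale factor for small objects, based on the camera's height above the
// surface plane of the current level. Hypothetical tuning values: below
// nearHeight objects keep their real size, above farHeight they are drawn at
// maxScale, and in between the scale is interpolated linearly.
float zoomScale(float cameraY, float levelY, float nearHeight = 10.0f,
                float farHeight = 100.0f, float maxScale = 3.0f) {
    assert(farHeight > nearHeight && maxScale >= 1.0f);
    const float height = cameraY - levelY;
    const float t = std::clamp((height - nearHeight) / (farHeight - nearHeight), 0.0f, 1.0f);
    return 1.0f + t * (maxScale - 1.0f);
}
```

Only objects flagged as small would get this factor, so large objects like tables keep their real proportions.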

Sunday, October 16, 2011

63 – Barrellands

Using the technique I created and used for adding 15000 high poly barrels or 90000 low poly barrels to a scene, I started experimenting to see if I could make this my main method of world creation. I also had a eureka moment, and by changing a single line of code I got an extra 50 FPS on the 90000 barrel scene.

So first step was to improve stability of the system and see how many low poly barrels I can fit in one scene:


That's 307800 barrels! I'm sure that I can fit more than 15000 high poly barrels now, but not by a lot. 307800 is the rough equivalent of 15000 when we change detail level. Don't get me wrong, I can actually fit a lot more high poly barrels, but I end up exhausting the resources of the GPU and this causes it to render the shape of the meshes, but with a fully white texture. The places where the GPU drops textures is actually quite indicative of the make up of the scene: I can practically see my algorithm mirrored in that pattern.

So, considering this a successful test, I started using this method for terrain. I reimplemented the terrain generator using the new systems, and it was quite a hard task. Again, the GPU did not behave in a rational way, and my successes were due more to trial and error than to applying my knowledge of the GPU and having it behave accordingly. But I managed to create a world renderer capable of rendering an empty landscape with full elevation at over 1000 FPS. Without borders.

Adding borders drops the FPS by 200-300. I tried adding more materials to a scene node to avoid this drop, but in the end it did not work so now I have multiple scene nodes, one for each border type.
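
The "one scene node per border type" approach boils down to bucketing the border geometry by material before building buffers. Here is a small sketch, with hypothetical `BorderType` values and a `BorderQuad` record standing in for real border geometry. Each resulting bucket would become one mesh buffer with a single material and its own scene node. No Irrlicht calls are shown.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical border types; each one uses its own texture/material.
enum class BorderType { Stone, Dirt, Grass };

struct BorderQuad {
    BorderType type;
    int x, y;  // cell coordinates of the border piece
};

// Buckets border pieces by type so that each bucket can be turned into its
// own single-material buffer and scene node.
std::map<BorderType, std::vector<BorderQuad>> groupByBorder(
        const std::vector<BorderQuad>& quads) {
    std::map<BorderType, std::vector<BorderQuad>> batches;
    for (const BorderQuad& q : quads) batches[q.type].push_back(q);
    return batches;
}
```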

Bringing all the techniques together, I created this video where we have adjustable elevation slicing, digging of walls and a nice topping of barrels on top of the elevation:


As a stress test, I render all of the barrels, ignoring slicing. In the final engine I'll stick to rendering items only on active levels. The performance is still good enough for FRAPS to almost keep up at 60 FPS. There are some drops below this value, but if you look carefully at the video, you will see that these drops coincide with level changes, which are very CPU intensive operations.

So with digging implemented (change of landscape) and adding meshes (populating with items) the 3D engine is theoretically finished. I need to add meshes to everything and polish, polish, polish! I hope that in the next post we'll see dwarves actually doing the digging.

But not everything is rosy. I still can't turn on the GUI under DirectX because it will lock my FPS to 40. I started investigating this and I can say that neither the calculations the GUI does (which do drop the framerate by a negligible 10 at most) nor image blitting are the cause. The primary suspect is rectangle drawing, but I can't say for sure yet. And LOD switching needs to be reimplemented for the new system.

Friday, October 14, 2011

62 – WTF

Look at your tree! Now back at mine! Clearly, your tree looks better. In my defense, I did not model that ugly tree; I generated it procedurally from a sphere and a cylinder. 

Fortunately, Stewart Bridge, aka BrewStew, offered to model a few objects for me. You may have seen the barrel in the comments section, and now I also have a lovely table at my disposal. Thank you, BrewStew! He can’t fully exercise his talent because I have some strict polygon/face count requirements, but even with low poly models the assets look great! I would advise you to check out his future work when or if he makes it available.

So what do you do with such assets? You place them in the world. But I do have the problem that I can render at most 300.000 faces. Above that, performance becomes way too choppy for me to be comfortable with it. This is not Crysis, so I can’t really release something for futuristic PCs. Also, a friend pointed me towards some statistics showing that even low end GPUs can render millions of triangles per second. Since performance seems fairly constant on most platforms, one would be forgiven for thinking that I am not limited by GPU rendering speed, but by the speed at which I can transfer data from the CPU to the GPU, something bus related.

Haha, it’s hardware buffer time! Tanananananananananananananana… Or so I thought. This seemingly simple and short task turned out to be quite long and full of WTF moments and complete randomness. The first sign that things would be random actually appeared while I was implementing borders. Everything worked fine until I loaded the last texture. The entire scene disappeared. Remove the last texture: it was back. I significantly reduced GPU RAM consumption by removing all unwanted textures, including the one from the 2D engine that still got loaded for some reason. Same thing. It did not matter what texture I loaded or where. One more => no scene. So I batch converted all PNGs to BMP. The scene was back and I no longer experienced any issues. I reverted to using the original PNGs: no issue. I re-added the 2D engine textures: no issues. Very weird. But this was only the tip of the iceberg.

Using the new barrels and hardware buffers, I managed to stick quite a few in a scene: 


That’s 15000 barrels with high res textures rendered with filtering at a smooth FPS of 60 or more (FRAPS caps it to 30). Normally one would not have so many objects in the scene. Gameplay dictates the maximum number of objects and you can do tricks, like not rendering any small items on levels that are not the current one. But let’s say you want to render so many objects. Let’s say you want more! 

Using the low poly version of the barrel provided by BrewStew, I managed to fill each cell: 90000 barrels! This was the last time everything went smooth. I saw the framerate, I was happy, so I was curious how fast it would go with only a few low poly barrels. The answer was: 40 FPS. Even with a single low res barrel 40 FPS. Replacing it with the high poly barrel: over 800. WTF? WTF? WTF? Do buffers have a minimum size requirement? How about a single buffer with hundreds of low poly barrels? 40 FPS. A few thousand? 40 FPS. Even more? Back to normal performance. 

Maybe it is still related to poly count. I export the barrel from Blender with “normal normals”, not “high quality normals”, which for some strange reason causes Irrlicht to report that the barrel has more faces. Problem solved. It works with both a lot of barrels and a few, but with a lot the performance is lower because now I have more faces.

Meanwhile, OpenGL has not a single problem and works admirably with all scene setups. This is the first time in my life when OpenGL not only works (which is rare), but works a lot better than DirectX. But I think it is an Irrlicht issue. Irrlicht somehow does something with DirectX messing it up. 

So to recapitulate, now I can render 15000 high res barrels or 90000 low res barrels, but the low res ones only render at about 80% speed, because I am forced to render more faces in order to prevent DirectX from rendering at 40 FPS. Camera movement is not that smooth because Irrlicht containers are really not that well suited for making common operations “faster than they can be”. You know, using a container in a special way to amortize the cost of adding to it, based on the nature and frequency of these operations. Anyway, I look over the Irrlicht source code, see what it does with the containers and manage to optimize it in such a way that camera movement is smooth again.
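
Here is a minimal sketch of the general idea, assuming the final size of each buffer is computed before it is filled. A general `push_back`/`insert` has to check on every call whether it needs to grow and whether it is inserting in the middle. A pre-sized buffer can append with a plain pointer write. `PresizedBuffer` is a hypothetical class, not Irrlicht's `core::array` and not the author's modified code.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Fixed-capacity buffer whose size is computed up front. append() is a plain
// pointer write: no growth check and no "insert in the middle" check.
template <typename T>
class PresizedBuffer {
public:
    explicit PresizedBuffer(std::size_t capacity)
        : data_(std::make_unique<T[]>(capacity)),
          end_(data_.get()),
          limit_(data_.get() + capacity) {}

    // The caller guarantees there is room; this is checked only in debug builds.
    void append(const T& value) {
        assert(end_ < limit_);
        *end_++ = value;
    }

    std::size_t size() const { return static_cast<std::size_t>(end_ - data_.get()); }
    const T* data() const { return data_.get(); }
    void clear() { end_ = data_.get(); }  // reuse the storage for the next rebuild

private:
    std::unique_ptr<T[]> data_;
    T* end_;    // one past the last written element
    T* limit_;  // one past the end of the allocation
};
```

The trade-off is the one described in the post: it is less elegant and less safe than a growing container, but the per-vertex and per-index cost drops to a single store.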

The problem is that I create all objects at the beginning. It would be nice to be able to repopulate a single map section on demand so I can do dynamic LOD switching. I implement this with such a trivial change that there is no way you can break anything. I get under 10 FPS. WWWWWTTTTTTFFFFF. I only populate one section. Over 800 FPS. Two sections? 40. Three sections? 20! WTTTTFFFFFFF?

The only difference is the number of scene objects Irrlicht sees. Adding more than one does something with the hardware buffers. Meanwhile, OpenGL is chilling in the corner, barely breaking a sweat and rendering anything I give it at high performance. 

So I rethink the entire hardware buffer system, adding pooling to it. Irrlicht is very poor at giving you any control, so I can’t really even free a hardware buffer; buffers are never freed, just overwritten and reused. I start reading Ogre3D tutorials. Ogre3D seems to be a lot more about control. Using the pooling system, and still using Irrlicht, I manage to create a dynamic LOD switching system:


There are fewer objects, but the focus is not on the number of objects but on the LOD switching. You can’t tell the difference, but distant barrels are rendered with the low poly model and textures. In case you were wondering about the pattern: this is how many high res barrels I can fit inside a single buffer, and for this sample I wanted to have one buffer per section.
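
Here is a minimal sketch of the "overwrite and reuse" pooling in plain C++. `Buffer` and `BufferPool` are hypothetical. `Buffer` only stands in for a hardware buffer, and no Irrlicht or Direct3D calls are made. The pool never frees anything while the game runs. Released slots go on a free list, and the next request overwrites one of them instead of allocating a new buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Stand-in for a hardware buffer; here it only stores the data it was given.
struct Buffer {
    std::vector<float> vertices;
};

class BufferPool {
public:
    // Returns the id of a buffer holding `data`, reusing a released slot if any.
    std::size_t acquire(std::vector<float> data) {
        std::size_t id;
        if (!free_.empty()) {
            id = free_.back();
            free_.pop_back();
        } else {
            id = buffers_.size();
            buffers_.emplace_back();
        }
        buffers_[id].vertices = std::move(data);  // overwrite the old contents
        return id;
    }

    // Marks a buffer as reusable; its storage is kept, its contents are stale.
    void release(std::size_t id) {
        assert(id < buffers_.size());
        free_.push_back(id);
    }

    std::size_t allocated() const { return buffers_.size(); }
    const Buffer& get(std::size_t id) const { return buffers_[id]; }

private:
    std::vector<Buffer> buffers_;
    std::vector<std::size_t> free_;
};
```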

So to recapitulate, a lot of high res objects, a lot more low res objects and LOD switching works, but only under OpenGL. Turning off hardware buffers for OpenGL we get 38 FPS, which is very close to the 40. So I guess that DirectX ignores my request for the buffers to be directly accessible to the GPU or whatever. 

I continue testing and debugging for hours, trying everything, until I decide to test turning off the GUI. ISSUES ARE RESOLVED AND DIRECTX IS COMPARABLE WITH OPENGL. WTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF?

Did I do something with the GUI, or does it use some primitive that causes DirectX under Irrlicht to die? I don’t know. I’ll have to investigate. So the framerate is smooth now with both renderers. Using the low poly barrels, I render 90000 of them at 100 FPS (30 in FRAPS). If you look at the video, you will see that it reports over 50 million triangles. The labels are wrong and the computations are not quite accurate. The map is 300 by 300 cells with one barrel per cell, and we have 96 indices (32 triangles) per barrel. That is 2880000 triangles. But the display is not about exactly how many triangles are on screen, but about how buffers are allocated and how much potential storage they have. Because the magic buffer redundancy number that I introduced is 18, we get 2880000 * 18 = 51840000, which is almost as big as the number of triangles reported. The rest of the difference comes from buffer rounding, because I am not going to allocate a buffer of 5. Buffer sizes and memory boundary requirements make them larger than needed:


Clearly there is a ton of optimizing to do. The 18 redundancy buffer pooler needs to become more dynamic.
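
For reference, here is the arithmetic behind the reported number, written out. It uses one barrel per cell on a 300 by 300 map, 96 indices per barrel and the redundancy factor of 18. The extra that comes from buffer rounding is not included.

```latex
\begin{aligned}
\text{barrels} &= 300 \times 300 = 90\,000 \\
\text{triangles per barrel} &= 96 / 3 = 32 \\
\text{triangles on the map} &= 90\,000 \times 32 = 2\,880\,000 \\
\text{reported buffer capacity} &\approx 2\,880\,000 \times 18 = 51\,840\,000
\end{aligned}
```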

GUI is turned off because of the issue, but things finally work. 50 million faces are a tad bit more than 300000. The question is what to do with this power. 

Obviously, non-dedicated GPUs that do not support hardware buffering won’t be able to take advantage of this. So I’m thinking of creating three quality settings. On medium, I’ll have at most 300000 faces. On low, at most 150000 faces, even if I have to billboard everything. And on high, a renderer with 1 to 3 million polygons in the busiest scene possible.

I need to somehow fix the GUI, and I am sure this is only the first set of surprises that 3D has in store for me. We are no longer in the realm of rationality.

I’ll try to put together some realistic scenes with level geometry, trees, barrels and tables.