Tuesday, April 3, 2012

March of madness in review

You may have seen my video with that awesome disco scene and wondered what it was all about. Is it subtle (but surprisingly deep if looked at from a certain perspective) social commentary on the state of disco as a musical genre and the entire movement after its collapse in the 80s? I also put out another video, showing a close-up of the stone stool:

The model is the same as it has been since its creation, yet somehow it looks a lot better. You guessed it: it is lighting! But let us rewind. 

Be warned! Massive massive rant incoming. I love to write and not having done so in quite a while makes the ranty go all bendy! OK, I never claimed to be good at writing! And I am always way too lazy to spell-check!

The original plan was for March to be a month of wild experimentation: try out a bunch of seemingly divergent techniques to get a feel for them, then, based on the results of each experiment and on how well they could be merged into a single project, decide how to continue, what exactly to do with this game and which style to follow. But I encountered far more obstacles than expected, which led me to an increasingly disproportionate amount of research versus coding and experimenting. After a few days of trying unsuccessfully to implement caves I was on the net trying to find solutions, and before I knew it I was neck deep in research papers. Research papers are a strange beast. They are often very obtuse, math heavy and otherwise barely readable by mere mortals, and in the end they only give you hints on how to overcome the problems that are presented. I guess you need to protect your ideas. They also come in exactly two templates, together with that blasted font that a lot of them use. What is this? A lot of them seem to be created with LaTeX. I'll never understand the fascination of academia with LaTeX. Sure, it has a deep tradition and can solve a very specific set of typesetting issues with unparalleled ease. I am a strong believer in the right tool for the right job, so when one of those typesetting issues arises I will point you in the direction of LaTeX. But if you are not in that situation and are basically writing a Word document, but would like to hide that fact with fancy words and by editing it in LaTeX, I have news for you: 2012, Unicode 6.1 and printing industry wide standards called! They said to tell you to stop being a hipster. Heck, you were a hipster before hipsters existed!

Wow, way to come back from an absence of writing: making up for lack of content with ranting and alienating all LaTeX users. As said before, I like LaTeX when it is used properly. But we have a better standard that keeps getting better and better, yet in 2012 the general implementations for it are still lacking. I don't want to type \c{c} to get a ç. I'll just write a ç, it will appear as ç on the screen and it will be encoded as ç in the saved document. And yes, that character is completely different from c̦, and if they show up the same to you, then you can see why I said that support is still lacking.

But back on track. I encountered several blocking issues. One of them was caves. Caves are what I worked on and researched the least, because this research quickly degenerated into general terrain algorithm research. I went over CLOD (Continuous LOD), which is quick and simple, offering medium sized CPU terrain with an average amount of pop-in. I studied ROAM and ROAM2. ROAM2 seems hell-bent on solving all the issues with terrain rendering and thus raises quite a few issues that it tries to cover in its implementation, making it a tough pill to swallow at first. Geo-clipmapping seems easier to understand and has more implementation attempts available for study. Also, a distinct move away from the CPU can be seen in the realm of terrain rendering. Even I was able to get a tiny patch generated on the GPU, just to see if it is possible.

Now one problem that you will find when researching terrain, and come to hate, is seams! Seams in the terrain. Places where it is not watertight. Changing terrain LOD levels always results in seams. This is not a problem in research papers. They will acknowledge the problem, almost all will give a high level solution and quite a few will go into detail. The problem arises with third parties of similar status to myself (dudes interested in terrain that implement something based on what they have read). Almost all implementations lack any form of stitching. Another interesting thing I have discovered is the infiltration of such geeky topics into social media. With today's level of technological incursion into popular culture and the prevalence of gaming, you see young people on social media who have never coded in their life having a genuine interest in game engines, rendering and often terrain. Just check out YouTube and the amount of terrain rendering going on there, and the amount of comments from random people written in typical YouTube comment style ("LOL, TEH TERNAIN ROKS WUH IS THAT, roam2?" or "my cat is called poo-poo-cacku! how hard is to do that land in unity?"). Anyway, what I'm trying to say is: most of the random implementations lack support for any kind of stitching; it is almost always acknowledged and then hand-waved ("lol, stitching. i'll do that latter"). I find that funny for some reason, and it motivates me to never-ever release any terrain without stitching, even if I know for sure that the very next update, which I can do in 30 minutes with one hand tied behind my back, will add great stitching.

One thing that caught my eye is CDLOD by Filip Strugar. A very unique and quite elegant idea and implementation. I think that at the time of writing there are no commercial games or other serious big budget software using CDLOD for their terrain, but I tried it extensively and I think it is quite innovative and works great. The research paper is very readable but kind of short. OH MY GOD! What have these papers done to me. I am complaining that a research paper is too short! It also comes with full source code that for once works as claimed and without issues, in a normal brute force variant and a more advanced streaming variant. And a few large data sets are provided. The terrain needs to be pre-processed and written to disk in a custom format, but after that is done, the streaming client is capable of loading even gibibytes of terrain data near instantly (relatively speaking of course, based on the actual size of the data) and then live streaming it. The performance is great and you can adjust the detail and processing power needed to render the often hundreds of kilometers wide terrain both before pre-processing and on the fly while running the application. You can reach several compromises this way. I created a setup where terrain was rendered at very high detail, but far from maximum, and had over 60 FPS. Then I created a setup where terrain renders at over 180 FPS. And I created a setup where the same terrain is always rendered with between 300k-500k triangles. And I did not mention one of the biggest advantages (IMHO) of CDLOD: morphing! Traditional terrain systems suffer from pop-in. You take a step and somewhere in your view frustum the geometry of the terrain changes. Often this is very jarring, and if you had a sound effects specialist make a sound for that effect just based on the visual cues, it would probably be some sort of "POP" or a metallic, slightly reverberating "SCHLONG". All terrain implementations that feature LOD have pop-in. In the good ones you won't notice it. CDLOD is no exception, as in the geometry of the terrain changes. But it is not a pop effect. It is a morph effect. CDLOD does not switch abruptly between LOD levels; instead it always morphs smoothly between two adjacent LOD levels. So "pop-in" is always present, at every step, not just at fixed thresholds, but it is not a pop, but a smooth morph. If you misconfigure your detail parameters, sampling rate and view distance you will get very ugly LOD changing artifacts. Only they are not a pop, but a morph. If on the other hand you created a good compromise of quality/rendering speed there are almost zero visible morphs. And if you go all out and create a very high setting, I think you get an incredibly smooth engine, capable of rendering a closeup of the land from the view-point of a normal person, and then flying up like Superman at hundreds of kilometers/hour, gaining a great overview of the land, while doing all this seamlessly.
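To make the morph idea a bit more concrete, here is a minimal sketch of how I read it from the paper. It is not Strugar's code: the function names, the linear ramp and the integer grid indices are my own assumptions for illustration. A morph factor goes from 0 to 1 across a "morph zone" at the far end of an LOD range, and every odd grid vertex slides toward its even neighbour, so that at factor 1 the fine mesh has the exact shape of the mesh with half the resolution.

```python
def morph_factor(distance, morph_start, morph_end):
    """Return 0 before the morph zone, 1 at its end, linear in between.

    morph_start and morph_end are hypothetical per-LOD distances.
    """
    t = (distance - morph_start) / (morph_end - morph_start)
    return max(0.0, min(1.0, t))


def morph_vertex(ix, iy, k):
    """Slide a grid vertex (integer indices ix, iy) toward the coarser grid.

    Vertices with even indices exist in both LOD levels and stay put.
    Odd ones move toward an even neighbour by k in [0, 1]; at k == 1
    the fine mesh coincides with the mesh of half the resolution.
    """
    return (ix - (ix % 2) * k, iy - (iy % 2) * k)


# A vertex halfway through the morph zone is halfway to its coarse position.
k = morph_factor(distance=150.0, morph_start=100.0, morph_end=200.0)
print(k, morph_vertex(3, 5, k))
```

In a real implementation this would run per vertex in the vertex shader, with the height sampled again at the morphed position.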

If you can't tell, I am quite impressed and excited about CDLOD. I encourage you to download the demo and try it out for yourselves. It is a fairly large download, but if you are interested in terrain rendering, I think it is well worth it and the algorithm will worm its way into people's hearts. Just check out the elegance of this morphing sequence from a higher LOD level to a lower LOD level, especially when compared to most approaches that simply exchange the chunk with one that uses 1/4 as many faces:

That image is from Filip Strugar's paper and all intellectual rights regarding CDLOD belong to him.

Did I lose you already? I am only getting started! The first thing I researched was caves, but that became terrain rendering research. Very interesting, but not helping me directly right now. A second major issue exists that is an obstacle in my plans to create this engine. Let me show a screenshot, again from an external source, that should give you a hint on what this problem is:

That, ladies and gentlemen, is Dungeon Keeper 2. Released in 1999. I have never played a Dungeon Keeper game before. I was somewhat aware of it when it was released and I can recall reading back then an article about it in a printed magazine that specialized in games. Back then it was hard to get a hold of games and I greatly enjoyed reading those magazines, even if I would rarely if ever get my hands on specific games. Probably the only reason I remember DK is because of the very iconic red devil dude, who I think is the Dungeon Keeper. He is no Mario, but still a recognizable icon today. Well, maybe. If you are old! And so is that woman in the black leather that often appears near him. I have no idea who these people are and can't really find out, because being from 1999, DK2 has problems running on Windows 7 64 bit, even the GoG version. I can play about half a level before a crash, 1 and 1/2 if I'm lucky, in software mode and I can't save. But from what I managed to play, I can certainly see the similarities. So if I ever get accused of cloning DK2, while technically not true, I can see why that person would think so. Squad of imps digging out caves, squad of dwarves digging out caves. Sure, the similarities are just superficial and this becomes apparent from even playing just 2 levels of DK2, but you know the drill by now with appearances and the way they influence our conclusion reaching mechanism and impulsive behavior. But I was aware even back then of the great Bullfrog studio.

I have a higher polygon count, better textures and even fancy bump mapping techniques, yet there is something about that screenshot that screams "I am prettier than you!". I am talking about the lighting. It is even worse if you run the game, because in software mode those lights even manage to flicker. Now 1999 was a long time ago and, as said, back then I had limited resources to get my hands on games so I did not get to see every game like I do today. But still, if my memory serves me well, this looked pretty good for 1999. Not quite cutting edge technologically speaking, but still very solid lighting. Imps even cast shadows from multiple light sources (limited to 4; in software mode at least; determined empirically) and the charm of the character design, color palette, overall personality of the world and the narrator/dungeon master makes this a pleasing experience, even today.

So how did they do it? Maybe it is all trickery. Clearly, that is not vertex lighting. Unless they actually used a super high vertex count and made it intentionally look like the vertex count was low (you can count the polygons in that screenshot with ease), that looks like (not the best by today's standards) pixel lighting. The top of the map, the tiles that are not dug out, are clearly vertex colored and involve no actual lighting. Or at least I can create with ease an almost identical effect with vertex colors. But the walls? The trickery argument would imply texturing tricks combined with lighting and some vertex colors. The other option is that they had an actual no trickery lighting scheme, as in if you see an area lit, then there is a light placed there and there are lighting calculations done to render the scene. The lighting model would not work that well for a first person view point (judging from how it looks from top down; I read that you can possess an imp and control him in first person, but I have no idea how to do this), but it is a blast from the top down view.

And how would such a game look in our modern ages? Maybe something like this:

Oh god, the lighting on that is crazy. Not especially pleasing aesthetically, but still good. That (if I did not upload the wrong picture) is a screenshot from Dungeons, developed by Realmforge Studios and published in 2011. Another game that I did not try or have any real contact with before this research phase, but this can be easily remedied and I hope to get my hands on it soon and do some speculative reverse engineering based purely on sight: looking at the lighting and trying to think of ways to achieve a similar lighting engine. The game did not have too high reviews, but some reviewers liked it, so it may even be worth a shot just on its own merits. I think there were some crushed hopes of it being a new DK or some other form of complicated history. Anyway, I would love to have such a lighting engine.

But why is lighting so hard? First of all, it is just plain hard even if you have an engine with infinite and unimaginable lighting capabilities, because lighting comes down to aesthetics. You need to create an aesthetically pleasing lighting scheme first. This is very hard and completely dependent on your artistic aptitude. Even AAA titles sometimes look muddy. Why is that? Often a combination of factors, but more often than not it is either poor resolution and wrongly chosen textures (the composite albedo of your scene, which without lighting determines the look of your game, can have an equally disastrous effect if poorly chosen as very bad lighting) or just plain bad lighting. Or both. In the indie scene, on those projects that never make it out of some alpha/beta stage and never become a real and finished product (fingers-crossed that this is not fate's way of doing some fore-shadowing) you can often see competent lighting, but which is still somehow wrong.

And then there is the technical implementation of lighting, which, depending on who you ask, may or may not be a lot harder. There are a series of obstacles. Let's take standard hardware lighting. This is limited to 8 lights per object and is only per-vertex. Having a limited amount of lights per object defines the way you build your world. You need to figure out/compromise on the maximum number of moving lights (people carrying torches, projectile light, etc.), take that value into consideration and then divide the world into small enough pieces that while rendering the fixed lighting for those pieces you leave enough buffer space for the dynamic lights. Let's say you divide the world into axis aligned square areas and pick a reasonably large chunk. You can't pick a very small chunk, because of the constraint on the maximum number of batches for the GPU. The chunk size is also dependent on how mobile and flexible your camera needs to be. So you pick a fairly large chunk size and most of the time you get only 2-4 lights on a chunk, but sometimes you get 7, let's say in an area where a corridor merges with a room near a corner. Then if you have a single worker there you are fine. If you have more than one, you need to turn off either a worker light or one of the other lights. And you can't just turn off a light at random, because as workers walk the light totals will shift wildly and you will get very ugly flickering. You need a very complex light management component. And this without taking into account that dynamic entity movement can be truly freeform. In such an environment worker movement patterns may not mix well with the cell structure created by the axis-aligned boxes.
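The "don't turn off lights at random" part can be sketched roughly like this. This is only an illustration and not the light manager from my engine: the scoring formula, the `bonus` factor and all names are made up for the example. The idea is to rank lights per chunk and give lights that were already on a head start, so two lights of similar importance do not swap places every frame and flicker.

```python
MAX_HW_LIGHTS = 8  # the fixed-function limit mentioned above


def pick_lights(lights, chunk_center, active_ids,
                max_lights=MAX_HW_LIGHTS, bonus=1.5):
    """Choose which lights a chunk gets this frame.

    lights: list of (light_id, (x, y, z), intensity) tuples.
    active_ids: ids that lit this chunk last frame; they get a score
    bonus (hysteresis) so that a newcomer must be clearly more
    important before it replaces one of them.
    """
    def score(light):
        light_id, pos, intensity = light
        d2 = sum((p - c) ** 2 for p, c in zip(pos, chunk_center))
        s = intensity / (1.0 + d2)  # hypothetical importance measure
        return s * bonus if light_id in active_ids else s

    ranked = sorted(lights, key=score, reverse=True)
    return {light_id for light_id, _, _ in ranked[:max_lights]}


# Two workers with torches almost equally far from the chunk centre,
# but only one free light slot: the one that was already on stays on.
torches = [("a", (1.0, 0.0, 0.0), 1.0), ("b", (0.9, 0.0, 0.0), 1.0)]
print(pick_lights(torches, (0.0, 0.0, 0.0), active_ids={"a"}, max_lights=1))
```

Without the bonus, torch "b" would win as soon as it got a hair closer, and the two would trade the slot back and forth as the workers walk.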

And after you coded this beast, you get only vertex lighting.

If you want pixel lighting, you need to use shaders. Here is where things become even more complicated. You need to take into account the capabilities of the GPU. You'll have a hard time coding such a scheme with pixel shader versions prior to 3. With pixel shader 3 I created a lighting scheme that supports any number of lights. Theoretically. But there are issues. First of all, the shader has a maximum number of lights. If you create a shader that can render up to M lights, but you parametrize it to render N, where N <= M, your performance will depend on both N and M. A shader that currently renders only 1 light but can render up to 4 will be faster than one that currently renders 1 light but can render up to 8. Or so have all my practical experiments shown. So you need a bunch of shaders, probably one for 1, 2, 4, 6, 8 lights, and then probably a few more to account for a few extreme circumstances where you get a lot more lights. The lighting manager becomes simpler because it is less under strain to chunk the world in such a way that you have at most 8 lights active at once. You can target it for common scenarios and then suffer a performance penalty for uncommon ones. On the other hand, the lighting manager becomes more complicated because you need a bunch of shaders. A whole lot actually. Not only do you need a set with different maximum light counts, but a shader that can render a point light and a directional light is not identical to one that can render two point lights, a point and a spot or all other combinations. If you need more light types than spot lights, the number of shaders increases dramatically.
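Here is a small sketch of the bookkeeping this implies. The capacity list is the 1, 2, 4, 6, 8 set from above; the function names are hypothetical, and the count assumes that the order of lights inside a shader does not matter, which is my simplification. It picks the smallest shader variant that fits N lights and counts how many variants you would need if every mix of light types got its own shader.

```python
from bisect import bisect_left
from itertools import combinations_with_replacement

VARIANT_CAPS = [1, 2, 4, 6, 8]  # maximum light count M of each shader


def pick_variant(n_lights):
    """Smallest M with M >= N, since cost depends on both N and M.

    If N exceeds the largest variant, the largest one is returned and
    the caller has to drop lights or fall back to something else.
    """
    i = bisect_left(VARIANT_CAPS, n_lights)
    return VARIANT_CAPS[min(i, len(VARIANT_CAPS) - 1)]


def count_type_permutations(light_types, max_lights):
    """Number of distinct light-type mixes for 1..max_lights lights."""
    return sum(
        1
        for n in range(1, max_lights + 1)
        for _ in combinations_with_replacement(light_types, n)
    )


print(pick_variant(3))
print(count_type_permutations(["point", "directional", "spot"], 8))
```

With three light types and up to 8 lights that already comes to 164 mixes, before doubling or tripling for any effects, which is why in practice you end up generating shaders or branching inside them instead of writing each one by hand.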

And this without taking into account effects. Two very important effects are normal and parallax bump mapping. These depend on lighting. So if you want bump mapping, you take the total number of lighting shaders, and create a variant for each with bump mapping. And repeat for parallax. I have created a shader system that handles bump mapping for a theoretically unlimited number of lights. I still need to write one for parallax. How does one do that? First you study the lighting shader really in depth. Make sure you fully understand it. Then you study the normal mapping shader. Do this until it becomes trivial and you smirk arrogantly and say "that's it?". Normal mapping is a really easy trick and involves the same normal based calculations you do to normally light an object. Make sure you can switch between the two with ease. Then study the multiple lighting shader, marking the differences in calculations from the single lighting one. By now, you should instinctively be able to update the multiple lights calculations to take into account normal mapping. Then study the differences between normal and parallax mapping, and repeat the process, starting from the multiple light normal mapped shader.

Easy! The hard part comes next if you want to support pixel shader 2. Small indie titles do well IMO to support a wide range of hardware. Pixel shader 2 is quite limited when compared to pixel shader 3. I managed to create a PS2 shader that supports up to two lights in single pass mode and up to 5 lights in multi pass mode, as in one light per pass. I am working right now on a shader that supports up to 2 lights in one pass, up to 4 lights in two passes and up to 5 lights in 3 passes. For now I can't seem to get over 2 lights per pass because I reach the limit of shader instructions, and I can't get over 5 lights total because I run out of constant registers to pass the lighting parameters. This while using "object local" multipass, as in an object is rendered using a single multi-pass shader. I am researching to see if it is possible to do true multipass, as in you render either each object in turn or the entire scene for each pass, and each pass has its own set of parameters. Maybe this way I can get more than 5 lights. And of course, multipass gets almost linearly slower as the number of passes increases. Whatever batch count you obtained by chunking in your light manager, multiply that by the number of passes.
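The arithmetic behind that last sentence, as a tiny sketch. The 2 lights per pass figure is my current PS2 limit from above; the function names and the batch count of 300 are made up for the example.

```python
import math


def passes_needed(n_lights, lights_per_pass=2):
    """Passes required when each pass can shade lights_per_pass lights."""
    return max(1, math.ceil(n_lights / lights_per_pass))


def draw_calls(batches, n_lights, lights_per_pass=2):
    """Every batch is drawn once per pass, so the cost multiplies."""
    return batches * passes_needed(n_lights, lights_per_pass)


# 5 lights at 2 lights per pass need 3 passes; 300 batches become 900.
print(passes_needed(5), draw_calls(300, 5))
```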

Then there is normal mapping, which for now I only managed to do with two lights, one pass with pixel shader 2. Parallax is in the works, and so is normal mapping multi pass, but I'm not sure if it will turn out okay.

And all this to get the first half of the universal lighting model. A set of shaders for each maximum light count (or a single one if you want to be slow, but general), doubled for normal mapping, tripled for parallax mapping and multiplied by who knows what if you want pixel shader 2 fallbacks. Oh, and you can mix vertex with pixel lighting as a fallback method. If you pull this off and you also create a good light manager, you can get results as good as Dungeons. I hope! Is this how the big boys code lights? Seems prohibitively hard for indies.

And one final very important note: a lot of per-pixel lights, even if single pass, are slow. No matter how good your chunking is, you will hit a brick wall sooner or later depending on the fillrate of your GPU. It is easy to find this limit even with small scenes and a modest number of lights. A forward rendering scheme has the worst case scenario of number of objects x number of lights.

And then there is the second half of the universal lighting model: shadows. There are also post processing effects, but let's just stop at shadows. Shadows are an especially hard beast to tame and to date, now in 2012, we do not have a perfect shadowing model. Shadow mapping is king, and the same principle in some form or another offers pretty much the only scalable solution. The only problem is that it produces very aliased edges. You can smooth them out, but this is not perfect. The quality also degrades a lot with the size of the scene. Anyway, shadow mapping is widely used and instantly recognizable. I think I can tell both the size of the shadow map and the scope of the scene from a Skyrim screenshot. Lately, anything that I play gets inspected for all the lighting and shadowing visual cues I can gather. Cascaded shadow maps and parallel split seem to greatly improve shadow quality. And even smoothing is starting to give great results using percentage closer filtering. I am not a native English speaker (DUH!!!!!!), but does "percentage closer filtering" sound awkward to anybody else? Like they did not finish their sentence. So while shadow mapping is full of faults, there are solutions out there to correct them and they keep getting better and better. But I am having a hard time wrapping my head around normal plain-and-simple shadow mapping. With normal lights. You can just forget about omnidirectional lights and those damned cube maps and volumetric rendering.
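For what it is worth, the awkward name makes more sense once you see the filter written out. Below is a minimal CPU-side sketch of the general idea, not taken from any particular engine; the function name, the `bias` value and the square kernel are assumptions for illustration. Instead of one depth comparison you do several around the sample point and return the percentage of them in which the fragment is closer to the light than whatever is stored in the shadow map.

```python
def pcf_shadow(shadow_map, u, v, fragment_depth, bias=0.005, radius=1):
    """Return the lit fraction in [0, 1] for one fragment.

    shadow_map: 2D list of depths as seen from the light.
    (u, v): integer texel the fragment projects to.
    fragment_depth: the fragment's own depth from the light.
    The kernel is (2 * radius + 1) squared taps, clamped at the edges.
    """
    height = len(shadow_map)
    width = len(shadow_map[0])
    lit = 0
    taps = 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            x = min(max(u + dx, 0), width - 1)
            y = min(max(v + dy, 0), height - 1)
            # The tap counts as lit if nothing nearer to the light
            # was recorded in that texel.
            if fragment_depth - bias <= shadow_map[y][x]:
                lit += 1
            taps += 1
    return lit / taps


# An occluder at depth 0.2 covers the left column of a 3x3 map.
shadow_map = [
    [0.2, 1.0, 1.0],
    [0.2, 1.0, 1.0],
    [0.2, 1.0, 1.0],
]
print(pcf_shadow(shadow_map, 1, 1, fragment_depth=0.5))
```

A fragment right at the shadow edge gets a value between 0 and 1 instead of a hard on/off, which is what softens the aliased edge.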

Lightmaps are another tool that can be used in conjunction with the other techniques. Like shadow mapping, lightmapping is again hard to wrap your mind around, and even with a fully GPU based implementation it is not fast enough for real time updates to a dynamic scene. If you have a game with fixed levels created by a level designer, baked lightmaps are probably your best friend. If your engine supports real time blending of lightmaps with dynamic shadows, even better. Just bake your lightmaps and you have great scene ambiance. But my "levels" are all procedurally generated. And there are volumetric lightmaps, which affect dynamic geometry that passes through the lightmapped volume so that it renders correctly, similar to dynamic shadows. Yup, they exist. I can certainly confirm that. Those are a real thing. Yup.

And there are also volumetric lights. I just love what you can achieve with those: turn a bland scene into an almost eerily charming fairy tale scene full of warmth and hot pockets. I would love to have a partially volumetric lights based implementation. Would suit the look I am going for very well. Another thing I am studying.

I think the best results would be given by soft shadow casting lights that can be blended with a procedurally incremental pseudo-dynamic lightmap and a pixel shader based volume effect added to standardish spot lights.

Shadow mapping also has implications on your batch count. You already have a high batch count and shadow mapping renders the scene from the perspective of the light first. Now, my engine has been created from the start to support a very high number of objects. More than other engines. I have my own method, with which I am fairly satisfied, that offers the possibility to render a huge amount of objects. You saw it, I kept showing it off. It does so by using a lot of GPU, a lot of CPU and a lot of normal RAM and GPU RAM. So it is fairly intense, and the only thing that makes it as practical as it is, is that my algorithm is designed to have a very good batching behavior. So if you combine my algorithm with the need of the light manager to have small objects and a high batch count, the whole thing goes to hell. I also expanded upon my algorithm and created a new super fast but super blurry version. Too blurry to use for normal objects, but for distant objects it is about 20 times faster in the worst case scenario. 3 to 6 times faster for normal scenarios. This implementation is slightly lighter on the CPU, but it is also undone by forced high batch counts.

So while I plan to keep this implementation around, being the crown jewel of my engine and all, I have also researched other ways to get a huge number of items rendered: instancing. I now know about 5 variants of instancing that give mixed results, all better than no instancing, and all except one far worse than my implementation. That one good instancing implementation is not as strong as my own method, which routinely handles 90k objects. It is more suited for 65K items. But it does so with very low CPU use and considerably lower RAM requirements, so it might be a solution for the batching issues with lighting.

So as you can see, creating such an engine is extremely hard and will probably take me over a year. But take a look at this picture:

And the Disco video. A large number of lights and bump mapping. Actually, the disco video has a whole bunch of wrong normals used for mapping. The picture above has these issues corrected thanks to some further splendid effort from BrewStew.

Another closeup of how I can render stuff now, in case that the video is too blurry to see the details: 

So how was this done? With deferred rendering. Not deferred lighting. Deferred rendering offers some huge advantages. It is fairly geometry agnostic, as in the performance of lighting does not depend a lot on the complexity of the scene, so you can really go overboard with a huge poly count if you have great batching. Countering this is shadowing. Current shadowing schemes care a lot about geometric complexity, so even if you can light your super complex scene with ease, you won't be able to shadow it. Still, this means that you need to balance your load around shadows, not around lighting. Another advantage is that it is very fast, even with hundreds of lights. With very small lights you can even get thousands of lights at once on the screen. My disco video uses 550 lights. Another advantage is that it is fairly simple. When you first read about it, the idea sounds very outlandish and has absolutely nothing to do with the fixed lighting pipeline. Zero in common. But it is still pretty easy to do.
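The core of the idea fits in a few lines if you strip away the GPU. The sketch below is a toy CPU version with invented names, a scalar albedo and directional Lambert lights only, so it is an illustration of the principle and not how my renderer (or anybody's) is actually written. The first pass writes the nearest surface's attributes per pixel into a G-buffer; the second pass lights each pixel from that buffer and never looks at the geometry again.

```python
def geometry_pass(fragments):
    """Build the G-buffer: nearest (depth, albedo, normal) per pixel.

    fragments: iterable of (x, y, depth, albedo, normal) produced by
    rasterizing every object once, with no lighting at all.
    """
    gbuffer = {}
    for x, y, depth, albedo, normal in fragments:
        if (x, y) not in gbuffer or depth < gbuffer[(x, y)][0]:
            gbuffer[(x, y)] = (depth, albedo, normal)
    return gbuffer


def lighting_pass(gbuffer, lights):
    """Shade every stored pixel with every light (Lambert term only).

    lights: list of (unit_direction_to_light, intensity).
    Cost is pixels x lights, regardless of how many objects or
    triangles went into the geometry pass.
    """
    image = {}
    for pixel, (_, albedo, normal) in gbuffer.items():
        total = 0.0
        for to_light, intensity in lights:
            n_dot_l = sum(n * l for n, l in zip(normal, to_light))
            total += max(0.0, n_dot_l) * intensity
        image[pixel] = albedo * total
    return image


# Two surfaces overlap at pixel (0, 0); only the nearer one is lit.
fragments = [
    (0, 0, 5.0, 0.3, (0.0, 1.0, 0.0)),
    (0, 0, 2.0, 0.8, (0.0, 0.0, 1.0)),
]
lights = [((0.0, 0.0, 1.0), 0.5)]
print(lighting_pass(geometry_pass(fragments), lights))
```

Compare that with the objects x lights worst case of forward rendering: here hidden surfaces are never shaded at all, and a real implementation goes one step further and only touches the pixels that each light's volume actually covers.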

So these are the advantages: huge number of lights, very fast, geometry agnostic. What are the disadvantages? Quite the truckload! No out of the box hardware antialiasing. Do you like your games with MSAA? Can you tell the difference and swear by the quality improvement and performance of CSAA? Well, they don't work with deferred rendering. In the screenshots and videos you can see antialiasing, and there are quite a few solutions for it. They don't give superb results, but are good enough. The only problem is that they are quite the performance hogs, especially since deferred rendering is really strapped for bandwidth. DirectX 10.1 gives you access in the shader to each sample when using multisampling and this makes hardware MSAA viable again, but it does limit your choices of GPU and operating system. I don't think that I have a setup capable of this, and anyway I don't yet understand the code for it. Using deferred lighting you can again get MSAA working, but this implies an extra full geometry pass. Which does not play well with another disadvantage: performance and fill rate. Deferred rendering eats a huge amount of resources, on paper at least. I was genuinely surprised by how fast it is, being faster with 550 lights than forward rendering with 20. Still, deferred rendering is very fillrate dependent and completely changes the GPU compatibility graph. With forward rendering you have a curve, with the low end working so poorly that you probably won't get interactive framerates and steadily increasing from there. With deferred rendering you get a plateau where no older GPUs are capable of even creating the context for deferred rendering, and the curve starts up from the plateau of zero compatibility and the first jump is a big one. My Disco demo runs with the integrated Intel GPU on my laptop with Optimus.

Another disadvantage is that I can't handle transparency. As in alpha blending. Alpha masking works fine, as seen in the video. There are several solutions, but the prominent one, depth peeling, is not really feasible yet in a general way. You need to multiply the memory and fill rate requirements by the number of depth layers you want to "peel", making an already expensive pipeline a lot more expensive.

Another disadvantage is that it forces your lighting model on the entire scene. It is harder to create a setup where you do tricks by lighting different parts of the scene in different ways. And of course, having multiple materials is again very hard. I am currently tackling this problem of having universal specularity, which makes a polished metal object as shiny as a simple cardboard box. Even if I manage to get specular mapping working, making truly different materials available is even harder. Like rendering one object with Phong, another with Blinn Phong and another with Lambert shading. Raise your hand! Who here does not love micro-facets?

So I am not really sure if the advantages outweigh the disadvantages. I'll keep experimenting with both forward and deferred rendering for a while.

But how did I do this? I must whole-heartedly thank Catalin Zima. His blog, explanation and implementation of deferred shading for XNA 2.0 were pivotal in getting me to understand the technique. Then I must thank Roy Triesscheijn, who ported the implementation over to XNA 4.0. And last, but by no means least, Emil Persson (a.k.a. Humus), who had the second implementation of deferred rendering I managed to understand and who works in a more familiar C++ environment. Anyway, Humus's site is absolutely incredible. There are samples there, some even really old, that do very impressive things. Like that gold glow effect from 2003 written with GLSL assembly shaders. Or that interior scene with Phong lighting model and moving omnidirectional shadow-casting lights. Wow! Also let's not forget all the reading I've done beforehand, from which I will only mention the chapter from GPU Gems dedicated to deferred rendering and a very interesting paper written by the developers of S.T.A.L.K.E.R. about their use of deferred rendering in the game with the same name. There was also a forum whose name escapes me right now.

Now, in all honesty, I really don't think it is possible for one man to create the engine and then code the game I am trying to create in any reasonable amount of time. So besides researching things that will allow me to create the engine my game deserves, but not the engine it needs, I also started researching engines in the hope that I can find one that offers all the features I need, so that I can concentrate 100% on the game. This is not an easy task because most engines seem to be centered around pre-designed level structures while my game world is procedurally populated. Even if I do not transition over to an engine, I am so done with Irrlicht. I have had my gripes with it for ages now, and researching any advanced topic and how to do it in Irrlicht is painfully depressing. You will find some ancient forum post with some promising results, but eventually all the links stop working and none of the concerns and incomplete implementations ever get finished and made production ready. Also, I have been suspecting for a while now that Irrlicht is partially to blame for the difficulties I had in creating such a large number of objects in the engine. I whipped together a little DirectX sample that does shader-based per-vertex lighting and renders 90k cubes without any frustum culling or instancing. It runs at 2 FPS on one machine and at 13 FPS with high AA and AF on another. 13 is really low, but with Irrlicht I can't get even 1/4 as many items to render at that framerate. Maybe Irrlicht is doing a lot of useful stuff I am not doing right now in my DirectX mock-up test, things that I will eventually have to do, and then we'll get the same performance. Even so, I don't consider this a good start for Irrlicht. Taking into consideration all my other issues, including that while abstracting away all the low-level stuff it makes it extremely hard to implement things it was not designed for, it is time that we end our collaboration.
Still, Irrlicht is not all that bad and if you don't have an overly ambitious project it may be a good place to learn the basics. And I will miss being able to switch at will between DirectX and OpenGL.
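
The 90k cube test skipped frustum culling entirely, so for context here is roughly what the simplest form of it looks like: a bounding sphere tested against six planes. This is a generic C++ sketch with made-up names, not code from the DirectX sample or from Irrlicht, and it assumes the plane normals point into the frustum.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Plane in the form n.p + d = 0, with the normal n pointing into the frustum.
struct Plane { Vec3 n; float d; };

struct Sphere { Vec3 center; float radius; };

using Frustum = std::array<Plane, 6>;

float signedDistance(const Plane& p, Vec3 point) {
    return p.n.x * point.x + p.n.y * point.y + p.n.z * point.z + p.d;
}

// Conservative test: a sphere is rejected only if it lies completely behind
// at least one plane. Spheres near the frustum corners may pass even though
// they are not actually visible.
bool isVisible(const Frustum& f, const Sphere& s) {
    assert(s.radius >= 0.0f);
    for (const Plane& p : f) {
        if (signedDistance(p, s.center) < -s.radius) return false;
    }
    return true;
}

// Returns the indices of the objects worth submitting for drawing.
std::vector<std::size_t> visibleIndices(const Frustum& f,
                                        const std::vector<Sphere>& bounds) {
    std::vector<std::size_t> result;
    for (std::size_t i = 0; i < bounds.size(); ++i) {
        if (isVisible(f, bounds[i])) result.push_back(i);
    }
    return result;
}
```

A linear pass like this over 90k spheres is not free either, which is why engines usually pair it with some spatial structure, but it does show the kind of per-object work a general-purpose engine may be doing that a bare mock-up is not.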

Next I investigated Ogre and I did not like it at all. It is "just" a rendering engine, so you need a lot of plugins to get the things you need working. Even without any plugins, with the absolute bare minimum hello world application, it takes ages to compile. Maybe it gets better with precompiled headers (which I am not in the habit of using and do not need), but precompiled headers are at most a band-aid solution to fix the symptoms of the absolutely abysmal module support in C++. More precisely, zero module support. They are not enough and you need other tools, like making sure you do not have a horrendous include hierarchy. I ran that hello world Ogre example through the preprocessor and it had over 800k lines of code. Geeee... I wonder why it takes so long to compile? Look, we are programmers. As programmers we wait a lot for stuff to compile, to load, to process. The sad truth is that there is no way around it. Even modern languages like Java or the .NET based ones eventually start to compile slowly once your code base becomes large. Sure, the margin is much higher than for C++. But eventually we'll have to wait a lot for stuff to finish and there is no way around that. At the workplace! Not on my personal projects. Those compile blazingly fast regardless of what language I use. They load fast and are generally fast. A 3 second compile and link is very slow for what I am used to. So there is no way I wouldn't get annoyed as hell while working with Ogre. To make matters worse, Ogre uses a resource abstraction system where you set up folders with textures, materials, etc. and then you load the resources based on identifiers, not full disk-based paths. So far so good, I really want and need such a system. The problem is that in Ogre this is super slow. Granted, my hello world was only loading a single mesh (the ogre head), but the resource locator was configured to use the entire Ogre Demo resource bundle, almost 50 MiB of data.
This causes a pretty large delay when the application loads, even if you only use one mesh. I tested and created some resource folders where only the needed mesh files were present and start-up was instant. You mean to tell me that start-up takes so long with only 50 MiB of data? That is not actually used? My game will use more than that. In the time it takes Ogre to index, or whatnot, those folders, I'm pretty sure I can load 3 times as many resources in anything else, even in Irrlicht. Heck, when compared to Ogre, the instant gratification provided by Irrlicht makes me remember our time together through slightly more rose-colored glasses. And while the Ogre samples are more advanced than the Irrlicht samples, they still seem pretty last-gen. The DirectX samples from the SDK are more advanced and interesting than that. So a massive no-no for Ogre.

Speaking of DirectX, I became quite comfortable with it, more precisely with DXUT. It is one of my primary candidates for my tutorial series (not the game). What tutorials, you say? As said in a previous post, once I finish developing shaders I will do a small series on them. I have since changed my mind, and I won't be doing just some small series. I will do full-fledged tutorials with full public source code, including of course the shader code. I don't know when exactly, but soon. My other candidate for the same tutorials is XNA 4.0. While I am not the biggest fan of C#, I like and respect it enough to work comfortably with it, and as a plus I have nothing but good things to say about XNA. Sure, the way you sometimes have to write custom pipeline classes can seem awkward, but generally speaking I believe that XNA is the go-to platform for young coders who wish to learn the ropes of graphics programming. And I am not saying that because I am some Xbox fanboy. I have PCs and a PS3. But being able to port easily to the Xbox is a huge plus. Too bad about the licensing agreement and all the limitations on your port though, including the cost. So my tutorial will be either in DXUT or under XNA. The shaders are going to be identical anyway. Just the code that invokes them is different. XNA is easier to learn and develop for, but I find the amount of tutorial material more than enough and XNA seems well represented on the learning front. DXUT is more powerful but a lot harder to learn and there is not enough learning material out there, so I feel like DXUT tutorials for my shaders would do more good. I was dead set on XNA, but while writing the lesson plan on complexity flow, I realized that I underestimated the amount of glue code needed under DXUT. I have no problems with the glue code, especially since you can kind of repeat it unchanged from project to project, but I'm not sure I can present it in an approachable manner.

What do you think? XNA or DXUT?

As for my game, I am probably not going to port it to either XNA or DXUT, but if I ever start a new 3D project, it will 98% be an XNA one, even if this means that I need to write tighter code to compensate for the relative speed difference between C++ and .NET.

As for real finished engines that I could use for my game, after going over Ogre, DXUT and XNA, or during, I first investigated C4. C4 is pretty powerful and scalable. It has marching cubes terrain with built-in LOD switching and stitching, so it is seemingly perfect for caves. It also features quite a bit of coding in addition to the drag & drop stuff, so building levels procedurally seems to be in. On the other hand, while the engine looks good and has extremely good bump mapping capabilities (I don't remember off the top of my head what the technique was called, but it even has some self-shadowing and works superbly with brick-like surfaces), I can't say that the engine looks that great. Everything created in it looks slightly muddy, and the lighting is somewhat foggy, as if there were some grain filter over the scene, which there isn't. If this were the only option, I would gladly accept it, because it still looks good, tons better than what I had, and it has the marching cubes terrain. As they say, you do not look a gift horse in the mouth. There is a licensing cost. The basic version is 100$ and you don't get the sources or the ability to sell a game created with it. The standard version is 250$ and has the sources plus the ability to sell. While I won't be using it on this project, I will probably buy the basic version.

Another engine I considered is Torque3D. I did not get the opportunity to play around with this one in depth and don't understand its capabilities as well, but it certainly seems more advanced than C4. If I understand correctly, this one focuses more on drag & drop functionality, with scripting tying together the capabilities of the engine. I'm not sure if this is ideal for my needs. Also, the scripting language it has uses a C++ inspired syntax. Really? Really? Out of all the well-designed scripting languages out there... The information on licensing is slightly more contradictory here, so I can only say that it is between 75$ and 179$ (with some super expensive extra premium version that offers something important, I think), but I don't know exactly which price range offers which features.

Another engine I superficially reviewed is the NeoAxis engine. This one is based on Ogre3D, but it is in C# and you build the world in an editor, though it does have quite a lot of coding (too much for C# and Ogre?) behind the curtains. It scales well (except for shadows, which work as expected on adequately powerful machines but pretty much disappear on low-end hardware). It has a start-up time, but this time it makes sense given the amount of resources the demos actually load. The indie version is 95$ but has no source code. The commercial version (including source code) is 395$. Out of all these engines I am the least inclined to go with this one, because it is just a (as far as I can tell) very competent engine at what it wants to do, without anything to recommend for or against it when taking my needs into account. It is also more expensive, but from 250$ to 395$ the jump is not that big, and you can always buy the indie version first and then upgrade if you are fully committed.

And then there is Unity! Without Unity the things on this list seem more attractive. But Unity is huge, powerful, easy to use and fast. Or so they tell me. It is definitely, together with XNA to a somewhat lesser degree, the focal point of this generation's development efforts, big AAA producing companies not included. There are 3 potential obstacles. One, I don't know how to use it yet, because I only just installed it today. Two, it is heavily based on the editor and I'm not sure how complex procedural levels will work out. And three, let's not forget the price, which is a whopping 1500$. Sure, there is always the free version, but that one does not support dynamic shadows and deferred rendering: the very two features I want the engine to do so I don't have to do them myself. Also, I hope that C# in Unity is good (not a standard C#; it is based on Mono, so it should do fine though), because there is no way I am using JavaScript. I want to see JS destroyed and begging for mercy, not to be coding in it :).

And then there are the two overkill engines: the CryEngine and the Unreal engine. Both have a free SDK (UDK for Unreal). Unreal is old and venerable. CryEngine is not as old. Both of these engines would be serious overkill. Sure, Unity can also be considered so, but Unity is quite widely used, so not all games made with it will look good and be high profile. But using CryEngine for this? What's next? The Frostbite engine, with its three absolutely gorgeous cutting edge next super-duper envelope pushing games, Battlefield 3, Need for Speed: The Run and Dwarves & Holes!

So I'll keep you updated and in April I'll focus more on the practical, leaving a few unread research papers for others too.

Oh, just because I have studied a lot and am starting to amass quite the knowledge base, it does not mean that I am not talking out of my ass. Take everything I say with a grain of salt because I may be wrong. I'm sure there is a graphics guru out there who, when reading this, would do several quadruple out of phase facepalms while coding with their feet an engine that is 10 billion times as advanced as anything I will ever have.