DwarvesH Development: 2013

Wednesday, September 25, 2013

Tworenas Snapshot 11

Yeah, you know what this snapshot is about: the thing I've been talking about all last week. This is the first snapshot with terrain LOD!

The first-generation terrain LOD is finished, tested and quite stable, and I even managed to squeeze in a first round of hefty optimization. On the other hand, the system is barely a week old and has already reached its limit! I have just started on the second-generation LOD system!

But let's not get ahead of ourselves! Last time I talked about fixing seams and left off with adding a new level to the LOD system that I didn't manage back then to add the seam fixing functionality in for (what a phrase; I R engrish):

I just can't leave you with that screenshot, so here is a fixed one:

I have added further levels since then, and the system now uses 5 LOD levels, each with 1/4 of the polygons of the previous level (half the resolution along each side).

With this system the standard view distance is 2 km, with fog starting at 1.9 km. The old default was 500 meters, with fog starting at around 400 meters. Even though the view distance is 4 times longer, the polygon count is about half and performance is a lot better. The engine really flies, even on Intel on-board graphics.

I also optimized loading a bit, so it should be faster. This is still C#, and the garbage collector really hates running when RAM use is high and there is no free system RAM. For best results, make sure you have about 1 GiB of free physical RAM before loading a map. If you do, loading is really fast. If you are low on RAM, loading can take 4-10 times as long as normal. Why did I ever leave C++?

This is also the best-behaved version ever in terms of RAM. Pooling is far better, and resources are no longer held onto longer than needed.

The system can handle view distances beyond 2 km. The batch count is not great, but it works. Let's start with the bad: heights! If you are somewhere very high, there is no level geometry to block your view, everything you see is below you, and the view distance seems very short. The 4 km view distance barely seems longer than the 2 km one:

Yup, that is 4 km! It seems so short. Here is the same shot with another style of fog:

On the other hand, if you are low, the view distance is really great. Check out this shot, where the peak far in the distance above the cursor is 4 km away from you:

So how can you fix the problem with heights? One solution would be to make sure that from every high point there is tall geometry somewhere in the distance hiding the unwanted effect. A lot of games do this, but it is very difficult without an artist designing the terrain, and my terrain is procedural. Another solution would be to increase the view distance even further.

My LOD scheme scales the polygon count very well with view distance. What it does not scale is the number of draw calls: there are as many draw calls as there would be without any LOD scheme. What does this mean? It means that if you don't care about interactive frame rates or the number of draw calls, the current system can already give you impressive results:

The above screenshot is a 64 square km map without fog, viewed from a height of 4 km. It has 546k polygons, 5335 draw calls and 11 MiB of mesh data, and runs at 17 FPS. The only value that is not great here is the draw call count, and consequently the frame rate. It is actually abysmal. So the second-generation LOD method will try to do something about the number of draw calls.
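Some simple arithmetic shows why polygons and draw calls diverge. This is a Python sketch under assumed numbers, not engine code. It assumes one draw call per chunk and that each LOD level keeps 1/4 of the triangles of the previous one. It also assumes a hypothetical 7x7 block of 124x124-unit chunks, each with 2 x 124 x 124 = 30752 triangles at full resolution. Of those 49 chunks, 9 are at level 0, 16 at level 1 and 24 at level 2.

```python
def terrain_cost(chunks_per_level, full_res_triangles):
    """Triangles and draw calls for a set of visible terrain chunks.

    chunks_per_level[k]: number of chunks drawn at LOD level k (level 0 is
    full resolution; each level keeps 1/4 of the previous level's triangles).
    Assumes one draw call per chunk, whatever its level.
    """
    triangles = sum(count * full_res_triangles // 4 ** level
                    for level, count in enumerate(chunks_per_level))
    draw_calls = sum(chunks_per_level)
    return triangles, draw_calls


no_lod = terrain_cost([49], 30752)              # every chunk at full resolution
with_lod = terrain_cost([9, 16, 24], 30752)     # rings of coarser chunks
print(no_lod, with_lod)
```

With those numbers the triangle count drops to under a third (445904 instead of 1506848). The draw call count stays at 49, so it becomes the limiting factor as the view distance grows.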

So Snapshot 11 is mostly view distance eye candy and better performance overall. And it is very buggy. In Snapshot 10 the bugs were accidental. This time I know about them but did not have time to fix them. Basically, all the bugs come from the fact that the terrain editing features were not designed with LOD in mind. Try to edit a chunk at a lower LOD: crash!

I'll fix these bugs by Snapshot 12, so feel free to skip this one.

BY AGREEING TO DOWNLOAD THIS SOFTWARE, YOU TAKE TOTAL RESPONSIBILITY FOR ANY CONSEQUENCE RESULTING FROM THE USE OR MISUSE OF THE SOFTWARE. I AM NOT TO BE HELD RESPONSIBLE FOR ANY POSSIBLE HARM THAT CAN BE CAUSED BY THE SOFTWARE NOR DO I GUARANTEE THAT IT IS FIT FOR ANY PARTICULAR PURPOSE OR THAT THE INTEGRITY OF THE SOFTWARE WILL BE PRESERVED WHILE BEING TRANSFERRED ON ANY AND ALL MEANS OF COMMUNICATION.

ALL RESOURCES INCLUDED FALL UNDER MY COPYRIGHT AND ARE ORIGINAL CREATIONS, EXCEPT FOR THE TERRAIN, GRASS AND SKYBOX TEXTURES, WHICH WILL BE REPLACED SOON WITH ORIGINAL OR OWNED CREATIONS. FULL LIST OF BORROWED ASSETS:

data\textures\bush: grass01.c.dds, grass02.c.dds, grass03.c.dds, grass04.c.dds
data\textures\terrain: region1.c.dds, region1.n.dds, region2.c.dds, region2.n.dds, region3.c.dds, region3.n.dds, region4.c.dds, region4.n.dds
data\textures\skybox: clouds-back.c.dds, clouds-front.c.dds, clouds-left.c.dds, clouds-right.c.dds, clouds-top.c.dds

Link: Snapshot 11

Thursday, September 19, 2013

Screens of the day 37 – G(h)e(tt)o-mipmapping

Now, after adequate testing, I can tell that Snapshot 10 is quite buggy, mostly the grass. Let me explain why.

This is all due to my pseudo geo-mipmapping. While it is not active in the code base by default, I am doing a dual-mode implementation so you can switch between normal inefficient rendering and the LOD solution on the fly. This means that some changes made for the LOD solution also affect the normal one, like the chunk size. I mentioned before that my chunk size of 127 units was inappropriate for LOD because it is odd. Meanwhile I have added a third LOD level, and the new size of 126 was no longer good either, because it is not divisible by 4. So from Snapshot 10 onwards my chunk size is 124 units. But I forgot to adjust the grass, so grass chunks no longer align with terrain chunks, causing visual bugs.
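The constraint behind these chunk sizes can be written down directly. Below is a minimal Python sketch, not engine code (the engine is C#). `largest_chunk_size` is a hypothetical helper, and the sketch assumes that LOD level k samples every 2^k-th vertex.

```python
def largest_chunk_size(max_units, lod_levels):
    """Largest chunk size (in logical units) <= max_units usable with lod_levels levels.

    Level k samples every 2**k-th vertex, so the coarsest level (lod_levels - 1)
    only lines up with the chunk border if the size is a multiple of its stride.
    """
    stride = 2 ** (lod_levels - 1)
    return (max_units // stride) * stride


for levels in (1, 2, 3):
    print(levels, largest_chunk_size(127, levels))
```

Starting from 127 units, this gives 127, 126 and 124 for one, two and three levels. That is the same progression the chunk size went through as LOD levels were added.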

I won't show the results of the terrain with 3 LOD levels yet, because we must talk about seams. For seam testing I'll use a very tight LOD margin so that seams are very apparent and not hidden by distance. Here is a problematic area:

To fix this, I first write code to determine LOD boundaries and compute which edges have a seam. Then I determine which points on those edges are causing the problem. To test that my detection is correct, I apply a very visible and unique pattern to the problematic vertices:

Those points are the ones that need to be fixed. Here is what we get with a proper fixing algorithm:

Once more, this time with wireframe:

This is of course only one edge. Slightly different code is needed to fix the other kinds of edges.
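Below is a sketch of what seam detection and fixing can look like, in Python for brevity. Several assumptions are not taken from the post:

* Level 0 is full resolution and higher levels are coarser.
* Edges are named by compass direction.
* The fix uses a common geo-mipmapping approach: the finer chunk's extra edge vertices are snapped onto the straight line drawn by the coarser chunk.

The post does not say which fixing method it uses, so treat this as an illustration only. Edge vertices are evenly spaced along a straight line in x/z, so only their heights need to change. Gathering each edge's vertices from the chunk grid, which is where the different edge kinds differ, is left out.

```python
def seam_edges(level, neighbor_levels):
    """Edges of a chunk that border a coarser chunk (higher level = coarser)."""
    return [edge for edge, other in neighbor_levels.items() if other > level]


def seam_vertices(edge_vertex_count, fine_level, coarse_level):
    """Indices along a shared edge that exist only on the finer chunk."""
    if coarse_level <= fine_level:
        return []  # the neighbor is at least as detailed: nothing to fix here
    step = 2 ** (coarse_level - fine_level)
    return [i for i in range(edge_vertex_count) if i % step != 0]


def fix_seam_heights(edge_heights, fine_level, coarse_level):
    """Move the finer chunk's extra edge vertices onto the coarser chunk's edge."""
    fixed = list(edge_heights)
    if coarse_level <= fine_level:
        return fixed
    step = 2 ** (coarse_level - fine_level)
    last = len(edge_heights) - 1
    for i in seam_vertices(len(edge_heights), fine_level, coarse_level):
        a = i - i % step          # previous vertex shared with the coarse chunk
        b = min(a + step, last)   # next shared vertex
        t = (i - a) / (b - a)
        fixed[i] = edge_heights[a] + (edge_heights[b] - edge_heights[a]) * t
    return fixed
```

Snapping heights keeps the triangle layout untouched. The alternative is to change the finer chunk's triangulation along the edge; both approaches close the crack.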

So now I have three LOD levels and full edge seam fixing, but only from level 2 to level 1. I still need to fix the seams from level 1 to level 0. After that, the first version of the LOD system is done. I tested it and it seems mostly bug free. On the other hand, it is far from optimal. It can be optimized a ton, and it will slowly get better and better.

So let's see the results. I will test the same scene in a problematic area, first without LOD and then with LOD. To make sure that the data is relevant, switching between LOD and non-LOD mode in either direction does three things. It clears all pools and caches, calls GC.Collect to force a full garbage collection, and re-streams all the data from disk.

The before shot:

We get 2.798 million triangles rendered at 27 FPS. The data uses 148.6 MiB and a further 64.91 MiB for physics. The physics memory use is constant because I'm not doing LOD on physics. So let's see the after shot:

We get 0.676 million triangles, so we only use 24% of the original triangle count. We now get 45 FPS, up from 27. Not quite double. While the polygon count is way down, the draw calls and the fill area remain the same. Still, 45 FPS is pretty good. In the past, on an Intel on-board GPU you would really struggle to get 30 FPS at 720p and low settings. But now, from my limited testing, I can say that you consistently get between 40 and 52 FPS, depending on the complexity of the terrain. On a better system I have measured an improvement of up to 80 FPS.

And finally, let's not forget RAM use. Unfortunately, .NET RAM reporting is completely useless and inconsistent. It reports values I can't make sense of, which bear little relation to what the Windows Task Manager reports. It also gets completely lost on resources that live on the GPU. So I will only consider RAM use as reported by my engine.

The original scene used 148.6 MiB and the new one with LOD uses 27.95 MiB, which is about 19% of the original. This is strange; I was expecting it to be closer to 24%, like the polygon count. Anyway, it eats up over 100 MiB less RAM, and the Task Manager confirms this.

Lol, I might eventually be able to run the engine on an Xbox without problems :)).

Anyway, great improvement. I managed to implement this in a couple of days, so not bad. But it was difficult, and fixing seams is no fun. So what is left to do?

Fix the seams from level 1 to level 0, like I did from 2 to 1. See if you can spot the seams in the above screenshot.
Create an awesome video showing off the new performance and view distance possibilities.
Enhance the system to make it easy to add more levels. I think if I go from 3 levels to 6, I could have an acceptable view distance. The number of levels should be dynamic and changeable on the fly.
Rebalance the margins. After I finish my comparison video, I will have to rebalance all the margins and frustum constraints and create a new render profile optimized for the LOD system. This will mean more than double the default view distance at the same performance. The new normal will have the same view distance as the current high, and the new high will be a lot bigger.
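Among other things, the margins decide at which distance each coarser LOD level starts. Below is a minimal Python sketch of that selection. The margin values are hypothetical, and distance is measured in chunks using the Chebyshev metric, which produces square rings. `lod_level_for_chunk` is not engine code.

```python
def lod_level_for_chunk(chunk, viewer_chunk, margins):
    """LOD level for a chunk; level 0 is full resolution.

    margins: ascending, distinct ring distances (in chunks) at which each
    coarser level starts, e.g. [3, 6] for three levels. Hypothetical values,
    not the engine's real margins.
    """
    distance = max(abs(chunk[0] - viewer_chunk[0]),
                   abs(chunk[1] - viewer_chunk[1]))  # Chebyshev distance: square rings
    return sum(1 for start in margins if distance >= start)
```

With distinct, increasing margins, neighboring chunks differ by at most one level, because their ring distances differ by at most one. This keeps every seam a one-level seam, so the seam fixing only has to handle adjacent levels.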

So Snapshot 11 is on track to have the first version of the LOD system turned on by default. I will probably have to add a LOD aggressiveness option. The above screenshot uses a sensible default that trades quality for speed. This is the high option:

This uses more RAM and has a higher polygon count. But even without the finished seam fixing you can't notice seams, because the high-quality render distance is larger.

I'll probably add an "ultra" option too. Those people with dual Titans need to use the available power :).

Wednesday, September 18, 2013

Tworenas Snapshot 10

Today's snapshot is not really rich in obvious features and should be considered more of a bug-fixing release. The primary feature that took most of the time was designing and implementing the new world chunk traversal system. The rest of the time was spent on the LOD system, which of course I am not using because seam elimination is not implemented yet.

With the new traversal system, and the balancing act of adding a LOD system to the game engine without breaking anything, this snapshot might even have a few extra bugs.

But I did manage to prototype a more compact UI:

The main idea is still the same as it has been: two modes, a building mode and a combat mode. Building mode requires you to be fully calm and in tune with nature, so you can't use it in combat. This is simply because I can't balance combat if you can drop a house on your foe. Each mode will have a panel in the corner with the appropriate content, as seen in the screenshot, and a hotkey bar. In combat mode each hotkey will trigger a spell. In build mode, each hotkey will map to the sub-menu of the tool currently selected from the panel. Hotkey bars are coming in Snapshot 11.

I also fine-tuned and adjusted the texture for the stone table, so it is included in the build. Here are a bunch of screenshots with it:

The full implementation of the first generation terrain LOD scheme is scheduled somewhere between snapshots 12 and 15 and this will take up most of my time, so most of the next few posts are going to be about LOD.

Link: Snapshot 10

Tuesday, September 17, 2013

Screens of the day 36 – LOADING LOD!

This is probably the most important screens post ever, followed by the one introducing spherical harmonics rendering, because it introduces my first attempts at terrain LOD.

I have a very short view distance in my engine. This is because I have no LOD system, and rendering a lot of terrain at full resolution quickly gets out of hand. At the default view size of 7 you can get up to 1.1 million triangles from terrain alone. At the maximum view size of 19, you can get up to 2.8 million triangles. This is quite a lot, and it has led to my default and maximum values still having a relatively short view distance.

So it is time to add a LOD system. For now I won't be using any established LOD scheme. They are either too difficult, rely on technology that is too recent (like geometry shaders), or would take too much time to implement. In some cases they came with a demo, and I really didn't like how the system looked in action. So I'll be rolling my own based on what I have read. That's right!

It's time for ghetto ass programming! Today we'll be going for a ghetto version of geo-mipmapping.

Since this method is meant to be implemented in a few hours, it won't have all the bells and whistles of the real method, nor will it have the same level of quality. I predict that it will handle naturally generated terrain very well and terrain edited by hand by the player very poorly.

We'll start by dropping 3/4 of the terrain polygons, going from this:

... to this:

Now we only have 1/4 of the original number of polygons. Next we need to change which vertices are used for the creation of the polygons, in order to fill up the holes.
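Filling the holes amounts to building a new index list that connects every 2^level-th vertex of the unchanged vertex grid. Below is a minimal Python sketch of that idea. It assumes a square chunk stored row by row, and `build_lod_indices` is a hypothetical helper, not engine code. In this form the full-resolution vertex buffer is kept, so it saves triangles but not vertex memory. The stride check also shows why the chunk size matters, as the next paragraph explains.

```python
def build_lod_indices(verts_per_side, level):
    """Triangle index list for a square grid chunk drawn at a given LOD level.

    verts_per_side: vertices along one side of the full-resolution chunk,
    stored row by row in the vertex buffer.
    level: 0 = full resolution; each level skips every other vertex of the
    previous one, keeping 1/4 of the triangles.
    """
    stride = 2 ** level
    cells = verts_per_side - 1
    if cells % stride != 0:
        raise ValueError("chunk size is not divisible by the LOD stride")
    indices = []
    for z in range(0, cells, stride):
        for x in range(0, cells, stride):
            top_left = z * verts_per_side + x
            top_right = top_left + stride
            bottom_left = top_left + stride * verts_per_side
            bottom_right = bottom_left + stride
            # two triangles per (stretched) cell; winding must match the renderer's culling mode
            indices += [top_left, bottom_left, top_right]
            indices += [top_right, bottom_left, bottom_right]
    return indices
```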

This was easy, but I ran into some large problems. My engine is meant to run on both the XNA Reach and HiDef profiles. This means that there is a limit on how many vertices you can render in one draw call, so the size of a chunk has a maximum. I wanted to make good use of this limit, so I decided to have a chunk be 128x128 vertices, covering an area of 127x127 logical units. This was done before any thoughts of LOD, so I didn't think that an area of odd size would be a problem. But it is a problem when going to a lower LOD level, which halves the resolution. To keep the odd size, I would have had to move vertices around at a LOD change, and I did not want that. So I changed the chunk to 127x127 vertices covering an area of 126x126 units. The same area can be covered by a lower LOD chunk with 64x64 vertices. Most of the engine is parametrized with these values, but some parts were not, and I had to track down and fix those spots.

So, after all these fixes, I added a secret hotkey to instantly switch from high LOD to low LOD. Before:

After:

Another comparison:

And now with grass:

As you can see, the lower LOD is pretty close to the high one. You can easily tell the difference up close, but the farther away the terrain is, the smaller the difference. LOD is meant to be used for distant chunks only. So I enhanced the debug mini-map to show LOD levels and added a generous radius of high-LOD chunks:

We went from 1.174 million polygons at 30 FPS to 0.722 million polygons at 39 FPS, with no noticeable decrease in quality. I increased the aggressiveness of the LOD distribution and did some tests with edited terrain. As I predicted, edited terrain with a distinct shape has trouble with this implementation:

The bump can be seen clearly popping in. The effect is not as bad with strong fog:

But the real benefit comes when trying to render longer view distances. Here is a shot without LOD at the current maximum view distance:

It runs at 23 FPS and tries to render 2.698 million triangles. But with LOD, it runs at 33 FPS and 0.920 million triangles:

This is of course on an Intel on-board chip. My dedicated high-end GPU has no problems running the above scene without LOD, at high quality with 4x MSAA, at around 150 FPS. I'm curious how it will behave with LOD.

This is only the first attempt at a LOD system. It won't be featured in Snapshot 10, which is due out tomorrow. There are lots of things to improve. First, while the engine renders less, it still uses the same amount of memory. I need to update it to use less memory for low LOD chunks, and I expect memory use to drop roughly threefold as well.

As a hobbyist engine developer, I do a lot of research and stumble upon other people who try to solve the same problems that I do. Most of them try to implement terrain. A lot of them, once they begin implementing LOD, create a solution that has seams and vow to fix the seams in a future update, but never do. I have vowed never to do that. My LOD system is theoretically capable of showing seams, since it has no seam compensation implemented. In over an hour of testing I never saw one, but they could appear. So I will implement seam compensation, and only after that will I include the LOD system in a public build! There will be a secret key you can press to try it out, but I won't tell you which one :).

SAY NO TO SEAMS!

Monday, September 16, 2013

109 – Who best shape? Circle best shape!

I'm stretching the features of Snapshot 10 over 3 snapshots so that I can give a final round of polish to some existing features that I want to consider complete. I also want to add some sorely missing functionality to others, while otherwise staying in a semi feature freeze. So Snapshots 10-12 will not really have any new features, just iterations on the existing ones.

One of the problematic features is the algorithm that keeps track of a grid of objects and manages their streaming. I want my engine to support very large terrains. Storage space concerns will keep me from having truly huge maps: a map larger than 256 square kilometers would probably eat up a Blu-ray's worth of data. Still, the algorithm should be able to handle a map of any size. Currently it does, but its cost depends on the map size. This was a problem when I first implemented grass, because the grid was so big. I lost about 30-50 FPS on a top-end PC just from the CPU work of managing grass visibility, without even streaming anything in. Back then I did a quick hack, but now it is time for a proper solution.

We are rewriting a system from scratch, a task that must be done but doesn't really push the project ahead in a way that screams "new feature" or "gameplay". So we are going to try to be as clever as possible about it. I'm going to implement an algorithm that solves our problem, decoupling the map size from the time it takes to manage the world in the vicinity of the character, and only that. But the way we solve it will implicitly give us a bunch of strong benefits that are almost features on their own.

For this we'll consider an algorithm that works on a grid-like structure. Each element in the grid has a property that tells whether it should be loaded, one that tells whether it is loaded, and other properties. We'll also keep 3 lists:

* the loaded chunk list L, which tracks the currently loaded chunks;
* the priority chunk list P, which stores the chunks that are in range and inside the view frustum;
* the secondary chunk list S, which stores the chunks that are in range but not in the view frustum.

To determine if a chunk is in range, we'll use a set of concentric circles with increasing radii. They are constructed so that every chunk inside the largest circle lies on the border of exactly one circle. This way, if we go from a radius of 1 chunk to V, we can iterate over all the chunks inside that radius in order of increasing circles, processing each chunk only once. So here is the algorithm.

Determine the chunk you are standing on and let it be the center of all the circles. Clear the priority list P and the secondary list S; these are computed each frame. The loaded chunk list L is persistent, and its contents are updated when needed from frame to frame.
Go over the list of currently loaded chunks and set "ShouldBeLoaded" to false.
From v = 1 to V, where V is the current view radius, go over all the chunks on the border of the circle with radius v. For each chunk, set "ShouldBeLoaded" to true. If the chunk is within the view frustum, add it to the priority list P; if not, add it to the secondary list S.
Go over the L list again, and if you find a chunk that is loaded but should not be loaded, free it, ideally returning the freed resources to the pool.
If the current frame allows for chunk loading (I base that on time, but other mechanisms can be used), determine how many chunks you can load while keeping your frame rate. It is simplest to consider this 1 and thus load 1 chunk every DELTA milliseconds. Try to load the required number of chunks from the priority list. If you have gone over all the items in the priority list and have not reached your desired number, go over the secondary list.
If the above steps have not updated the active list L, you can update it here in a final loop.
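The steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the engine's code:

* A set stands in for the "ShouldBeLoaded" flags and another for the list L.
* `in_frustum` is a caller-supplied stand-in for the real frustum test.
* Loading and freeing are reduced to set operations.
* The rings are precomputed once with integer math, which plays the role of the circle bitmap. A chunk at distance d belongs to ring ceil(d).

```python
import math


def build_rings(view_radius):
    """Chunk offsets grouped by ring: ring v holds offsets with v - 1 < distance <= v.

    Every offset within view_radius lands in exactly one ring (ring 0 is the
    player's own chunk), so walking the rings in order visits each chunk once,
    nearest first. Computed once, not every frame.
    """
    rings = [[] for _ in range(view_radius + 1)]
    for dz in range(-view_radius, view_radius + 1):
        for dx in range(-view_radius, view_radius + 1):
            d2 = dx * dx + dz * dz
            ring = math.isqrt(d2)
            if ring * ring < d2:
                ring += 1  # round the distance up to a whole ring
            if ring <= view_radius:
                rings[ring].append((dx, dz))
    return rings


def update_streaming(map_size, loaded, center, rings, in_frustum, max_loads):
    """One frame of the ring-based streaming update.

    map_size: (width, depth) of the map in chunks.
    loaded: set of loaded chunk coordinates (the persistent list L), updated in place.
    center: chunk the player is standing on.
    in_frustum: callable (x, z) -> bool standing in for the real frustum test.
    max_loads: chunks this frame's time budget allows us to load.
    Returns the priority list P, the secondary list S and the freed chunks.
    """
    priority, secondary = [], []  # rebuilt from scratch every frame
    should_load = set()           # plays the role of the "ShouldBeLoaded" flags
    cx, cz = center
    for ring in rings:            # nearest ring first
        for dx, dz in ring:
            x, z = cx + dx, cz + dz
            if not (0 <= x < map_size[0] and 0 <= z < map_size[1]):
                continue          # outside the map
            should_load.add((x, z))
            (priority if in_frustum(x, z) else secondary).append((x, z))

    freed = [c for c in loaded if c not in should_load]
    for c in freed:
        loaded.discard(c)         # real code would return the resources to a pool

    budget = max_loads
    for c in priority + secondary:  # frustum chunks first, each list nearest first
        if budget == 0:
            break
        if c not in loaded:
            loaded.add(c)         # stand-in for actually streaming the chunk in
            budget -= 1
    return priority, secondary, freed
```

The per-frame cost is proportional to the number of chunks within the view radius plus the number of loaded chunks. The map size only enters through the bounds check. Swapping the ring formula for `max(abs(dx), abs(dz))` would give square rings instead of circles.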

The above algorithm has the property that it does not depend on the world size, only on your view radius. Whether your world has 1 chunk or 1 million chunks, it will finish in approximately the same amount of time. The old algorithm that this one replaces did not have this property, and its CPU time increased very fast with the world size. As said before, I lost 30-50 FPS on a large map. The new algorithm has no such problems, especially since it is not only independent of the map size but also pretty fast. Its cost is negligible compared to the cost of streaming in a single chunk.

So let's see what clever implicit behaviors we managed to squeeze out of this implementation:

It favors chunks in the view frustum. I don't start you off with a completely un-streamed map at map load, so this is hard to notice. But if I were to drop you in an empty map and you had to wait for it to stream in from zero, you'd notice that the terrain in front of you gets streamed in first. If you were to turn around very fast, you might notice that there is nothing behind you. This is actually good, desired behavior if you combine it with generous pre-streaming, so that only distant chunks in front of you get streamed in.
It gives an implicit, approximately sorted view depth for chunks. Since the priority list is created from circles with increasing radius, the closest chunks will be streamed in first. Not only that, they will also be rendered first. This generally gives better Z (depth buffer) behavior, since nearby geometry fills the depth buffer first and hidden fragments behind it can be rejected early. I can't test this on a setup with a very high frame rate. If I run the engine on a computer strong enough to do 150+ FPS, it fluctuates unpredictably between 150 and 170. So I tested at 720p on a weak computer that can't do more than 30 FPS. Just by switching the algorithm, and considering only the benefit of better-sorted depth buffer writes, I got a bonus of at least 1-2 FPS. Sometimes as much as 5, but that was when staring directly at a steep cliff side from close up. Hey, free FPS. What is there not to like?
It gives an implicit, approximately sorted view depth for objects in the chunk pool. Since we render based on chunk proximity, closer chunks are rendered first, and so are the items in them. A large, close object's draw call therefore tends to happen earlier in the frame, which increases the chance that it hides smaller objects behind it.
The relationship between the items in a chunk and the two frustum-based chunk lists lets us pre-cull a big portion of the item list without touching the item list.
The order in which we construct the lists is very cache friendly.
We use a bitmap to define the circles, so we can easily change their shape, or replace them with rectangles or some other shape if needed. Anything goes, including the shape of two overlapping owls if that is needed, but I'll go with circles:

This is the current list of benefits that are actively used when rendering. There is one more, a pretty big one, that I have not implemented yet:

It allows for a single-cache approach when rendering the scene multiple times. If I cache all the CPU computations for the items that will be rendered, the render procedure only reads from that cache, without doing any computations. This is very helpful if you render the scene more than once for different effects. You do the CPU matrix calculations and CPU lighting bounds calculations only once, and can render as many times as you wish with different shaders. I'll add this in the future.
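Below is a minimal Python sketch of that single-cache idea. All names are hypothetical (`build_render_cache`, `render_pass`, `compute_item_data`, `draw`). The expensive per-item work runs once per frame, and every pass, whatever its shader, only reads the cached results.

```python
def build_render_cache(items, compute_item_data):
    """Do the per-item CPU work (matrices, light bounds, ...) once per frame."""
    return [(item, compute_item_data(item)) for item in items]


def render_pass(cache, shader, draw):
    """A pass that only reads the cache; draw(shader, item, data) issues the draw call."""
    for item, data in cache:
        draw(shader, item, data)
```

The design choice is to separate "what to draw and with which precomputed data" from "how to draw it". Adding a depth pre-pass or a shadow pass then costs draw calls, but no extra CPU-side math.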

Implementing this was not as easy as it seems from the description. To test that it works correctly, I implemented a primordial debug map:

I should have added the map before developing the algorithm, not after. That would have made things a lot easier and faster :)).

The debug view is updated in real time and color coded. Red marks problematic chunks, the ones that are technically leaks if they persist. It is normal for them to appear, but they should disappear within a few frames. If they hang around, that is a warning sign of a memory leak. Currently there are no known leaks: red dots almost never show up, and when they do they last just one frame. Gray marks chunks that should be loaded but that the streamer has not had time to schedule yet. You can't see any in that screenshot, and in real-life scenarios gray dots only appear at the far edge of your view radius. Cyan marks chunks that are loaded. Dark cyan/gray marks chunks outside the view frustum, and light cyan/gray marks chunks inside it. This color coding greatly helps with navigation because you know which direction you are heading.
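The color scheme can be summarized as a small classification function. This Python sketch rests on one interpretation of the text: red means loaded but no longer needed. The function name and the flag arguments are hypothetical.

```python
def debug_color(should_load, loaded, in_frustum):
    """Debug map color for one chunk, or None if the chunk is not drawn."""
    if loaded and not should_load:
        return "red"  # loaded but out of range: a leak if it persists for long
    if not should_load:
        return None   # neither needed nor loaded
    base = "cyan" if loaded else "gray"  # gray: waiting to be scheduled for loading
    shade = "light" if in_frustum else "dark"
    return shade + " " + base
```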

This debug map will eventually evolve into the game's mini-map. The scale must be changed first. The mini-map looks deceptively small, as if the current map were very small. This is because the chunk size is pretty big, and the map you see there is 16 square kilometers. For the mini-map to be useful, it shouldn't render the whole map, but rather an area of about 4 to 9 chunks centered on the character.

Friday, September 13, 2013

The future of the engine

Let's take a minute to talk about the future of the engine. The fate of the engine is now forever tied to the fate of the game, which is very slowly but surely moving forward. The project is split into two parts: the game and a semi-general-purpose 3D engine. The split is not clean enough yet to take the general-purpose library and build a new game from it. A few key features, like my new physics behavior, are in the game. This might be remedied as time goes by. The general-purpose library will eventually be open-sourced, but I don't have time for that right now. I am psychologically unable to open-source code that is not near perfect.

So the engine of the game is the library plus some parts of the game code, and that is becoming relatively stable. Two main features still need to be added or finished: the general unified lighting scheme and terrain LOD. Once these are done, this version of the engine will be considered done. The schedule for these features is closely tied to the schedule for the game. Except for shadows, I really can't think of any other important features to add to the engine. Maybe occlusion culling. So if this won't be the final version of the engine, there will probably be only one more.

I'm currently in a full throttle 22 week development cycle that focuses mostly on the game but also advances the engine.

As a conclusion, the game development is going well and the engine is nearing its current and potential design objectives.

But this is just the near future of the engine. What about the rest?

The biggest problem with the engine is that it is built on XNA. XNA is going away and I'm getting tired of its limitations, so this will have to be addressed. The second biggest problem is that it is in C#. Now, I love C# and I would like to work more and more in C#, but for such a buffer-heavy application C# is ridiculously slow. I spend more time trying to find hacks to compensate for the slowness of C# than working on the features. All those people who say that C# is not significantly slower than C++ need to try implementing in C# something that would already be too slow in C++, and then we'll talk. Unfortunately, this can't be addressed. The code base is far too big to port to C++ in under 6 months, so I'm not going to do it. Even if I could port the code in a shorter period of time, BEPUphysics is in C#. If I were to replace it with a C++ physics engine, the feel of the game would change noticeably. Physics engines are not exact simulations and are far from interchangeable. I have balanced my game around how physics behaves under BEPU, and it would take me months more to re-balance it for a new physics engine.

So it's C# and BEPUphysics! All the way, baby!

But XNA has got to go. I have known this for quite a while and have taken a number of steps. My game and engine don't use any real XNA-specific features:

* I'm not using the content pipeline to load or do anything with the game's assets.
* I'm not using the XNA component framework to render my game or manage my components.
* I'm not using the XNA content importers.

These and more have been eliminated. I still need to slowly phase out the reliance on SpriteBatch and friends. After this is done, the only things I'd be left using from XNA are:

* the vector, matrix and color types;
* the API to create the device and render targets;
* the API to set the renderer state (buffers, textures, shader parameters, etc.).

These parts won't be easy to port, but they are a relatively small part of the code base. I basically need to port my Game, Renderer and Cache classes to whatever I'll be using, and 90% of the porting is done.

I will be porting to SharpDX, which hopefully offers the full power of DirectX, and I will go about the porting process as smartly as possible. If I were to abandon development today and focus 100% on porting, I could have a working version in about 3 weeks of working round the clock. But I don't want to disturb the development process, so I'll be going much slower.

First, I'll split the game and engine code into two parts: the XNA-dependent one and the non-XNA-dependent one. This will be done as a background task over the next month and is very easy: just minimize dependencies and move files into new folders. Then I'll start using fewer and fewer XNA data structures and less of its math library, and I'll control which version is in use with compilation flags. From this point on, you will get two executables: a native XNA one that runs at full speed and one that uses less XNA but will be slightly slower.

This process will continue until summer 2014! So not 3 weeks but 9 months, making sure that I don't sacrifice any development speed for porting. I really don't want to spend more than a couple of hours a week porting. Maybe even one hour. Maybe on the clock. During this period there will eventually be two versions of the executable, the main feature-rich one and a secondary SharpDX version. Both will look identical, pixel perfect, but one will have fewer features and be less stable.

And I also have a plan for learning SharpDX and for the direct port itself. Before I started this 22-week development sprint, you may have noticed a long period of inactivity, both on the blog and development-wise. But I did not sit on my hands the entire time. I wanted to port the engine to C++, so I thought I would revive my old, aborted attempt at a DXUT tutorial series. I wanted to write 15-20 tutorials and then take the last one and turn it into the port of the game. I got to tutorial 7 before I realized that the physics engine was going to be a big problem, so I got my shit together and went forth on my development sprint.

So this is the plan. Develop the game as usual. Reduce dependencies on XNA. Make sure that new code does not use XNA. Publish my DXUT tutorial series, but instead of going up to 15-20 as originally planned, only publish the first 7. Create a new 8th tutorial that ports the parts of DXUT that I'm using to raw and very simple DirectX, while maintaining the same API. You will be able to run a diff on tutorials 7 and 8 and they will look the same. Then, take tutorial 8, which uses just DirectX but has a DXUT-style API, and port it to SharpDX for tutorial 9. Then take the SharpDX tutorial 9 and add the same camera, terrain and renderer classes that I'm using in the game now with XNA.

So how does this sound?

Wednesday, September 11, 2013

Tworenas Snapshot 09

It's snapshot time! Actually, this snapshot was pretty much done on Sunday and I wanted to package it on Monday, but on Monday I had a particularly hard day of physical training. It wasn't so bad right after the session, but after a few hours I became all sore and stiff, and I actually went to sleep as soon as I got home and slept for almost 16 hours.

I also wanted to do a post about the new physics features, but since I didn't have time for either, I will mix them together here in a shorter form. The object descriptors now have a bunch of new properties, like a singular and plural name:

Of course, you don't want all objects to display their name. Imagine a cave with hundreds or even thousands of meshes, all having mouse-over labels and most of them being "rock" or "cave wall". So I added a new boolean property that controls whether an object should have a mouse-over label or not. Most objects shouldn't have one, only important ones, like doors, containers, objects of interest, loot, activatable objects, etc.

Another new property of objects is one that controls if they can be picked up. Here is a pot that has a text descriptor and it is set to display a label:

Now the same object, with the "pick-up" flag set to true:

The label will inform you that the object can be taken. Once ownership is added, the label will be able to say "steal" when appropriate. Then it will inform you about a few basic properties, like weight and value. Finally, I made the text and interaction more readable:
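The label rules above fit in a few lines. This is a minimal Python sketch, not the engine's C# code; the field names (`show_label`, `can_pick_up`, `weight`, `value`) and the `owned_by_other` flag are hypothetical, the latter standing in for ownership, which doesn't exist yet.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ObjectDescriptor:
    # Hypothetical field names mirroring the properties described above.
    singular_name: str
    plural_name: str
    show_label: bool = False   # most objects ("rock", "cave wall") get no label
    can_pick_up: bool = False  # the "pick-up" flag
    weight: float = 0.0
    value: int = 0


def mouse_over_label(desc: ObjectDescriptor, owned_by_other: bool = False) -> Optional[str]:
    """Return the mouse-over label text, or None if the object shows no label."""
    if not desc.show_label:
        return None
    if not desc.can_pick_up:
        return desc.singular_name
    # owned_by_other is a placeholder until ownership is implemented.
    verb = "Steal" if owned_by_other else "Take"
    return f"{verb} {desc.singular_name}\nWeight: {desc.weight:g}  Value: {desc.value}"


rock = ObjectDescriptor("rock", "rocks")
pot = ObjectDescriptor("pot", "pots", show_label=True, can_pick_up=True, weight=2.5, value=10)
```

Keeping the label decision in one function means a cave full of rocks costs nothing: the check stops at `show_label`.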

The final and most important property is "stickiness", which controls where the object can be placed and on what surfaces. This is in its primordial stages, but works quite well. The problem is that if you can have a sticky surface for item placement, that surface can be anywhere and is no longer guaranteed to be as flat or well behaved as terrain.

So I needed a brand new physics subsystem that manages placement and clipping detection and avoidance. Brand new equations that can handle a lot more situations gracefully:

In this video you can only see static items being placed, but if I were to repeat the test of placing the pot on the top corner of the column with a mobile mesh, it would initially have the same position, but would immediately proceed to fall down under the effect of gravity.

Because of the new physics resolver I bumped up the version number. There are also a ton of bug fixes, most of them for system mode crashes caused by it lagging behind the vast changes in the module format.

So I said that for snapshot 10 I'm going to do something cool. I am no longer going to do that, since I realized that the engine is already pretty cool. This is very subjective, but I'm liking it more and more and I think it has great potential. So snapshot 10 won't be lacking in coolness, whatever I decide to do. What I'm lacking in general is polish. So snapshot 10 will be a polish snapshot, focusing on user interaction. Snapshots 11 and 12 will be maintenance versions of snapshot 10. So I hope that snapshot 12 will have a pleasant enough interface that anybody can pick it up and use it, including a controls list.

data\textures\bush: grass01.c.dds, grass02.c.dds, grass03.c.dds, grass04.c.dds
data\textures\terrain: region1.c.dds, region1.n.dds, region2.c.dds, region2.n.dds, region3.c.dds, region3.n.dds, region4.c.dds, region4.n.dds
data\textures\skybox: clouds-back.c.dds, clouds-front.c.dds, clouds-left.c.dds, clouds-right.c.dds, clouds-top.c.dds

Link: Snapshot 009

PS: The snapshots are getting pretty big. By snapshot 20 I'll probably be at about half a CD's worth of data per snapshot, so I need to figure out a better way to distribute them.

Ideal situation: registering a domain, creating a site with a forum and repository, and adding an incremental packer to the game distribution that updates you to the latest version. I'll look into this, but don't expect it too soon. It will probably be both hard and expensive to do. I'll keep you up to date.
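An incremental updater of this kind usually boils down to comparing a manifest of file hashes on the client with the one on the server and fetching only what changed. Here is a minimal Python sketch of just that comparison; the manifest format, the use of SHA-256 and the function names are assumptions, and the download and packing steps are left out.

```python
import hashlib


def build_manifest(files: dict) -> dict:
    """Map each relative path to a SHA-256 hash of its contents."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}


def plan_update(local: dict, remote: dict):
    """Compare two manifests; return (paths to download, paths to delete)."""
    download = sorted(p for p, h in remote.items() if local.get(p) != h)
    delete = sorted(p for p in local if p not in remote)
    return download, delete


local = build_manifest({
    "data/textures/bush/grass01.c.dds": b"old grass",
    "data/textures/terrain/region1.c.dds": b"region",
})
remote = build_manifest({
    "data/textures/bush/grass01.c.dds": b"new grass",
    "data/textures/terrain/region1.c.dds": b"region",
    "data/textures/skybox/clouds-top.c.dds": b"sky",
})
to_download, to_delete = plan_update(local, remote)
```

With per-file hashes, a snapshot that only touches a few textures costs a few textures' worth of download instead of the whole package.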

And speaking of expensive, I forgot to show the textured shelves:

The texturing is not done yet; it is temporary and currently in the seam reduction phase. But it is still included in the snapshot. This is the outsourced asset of the week. The asset made by me is the pot. I'll continue with the two-assets-per-week approach, even if they don't get included in their final form.

But with one paid asset per week, the cost will quickly add up. I don't mind working for free while developing the engine because I really like what I'm doing and the way the engine is shaping up. I am also starting to see the game slowly emerge from the engine, which is quite motivating. And I don't mind spending my personal funds on a few dozen assets. A good investment in my mind. But I do mind spending my personal funds on the hundreds of assets that will eventually be needed. Two assets per week is way too slow. At least 5 would be needed. So I'll need to find a way to monetize the game after I'm done with the current snapshot plan.

Thursday, September 5, 2013

108 – Summer cleanup!

It's still summer, am I right?

So the plan for this week was to get the container list going and implement inventories, together with their persistence, but I realized that containers are not the simplest form of statics and I should finish the foundation first before attempting something more complicated.

I updated the module system to support lists, for now a single list: statics! As I mentioned, most gameplay rules can be expressed as lists, and the implications for items derive from the nature of the list and its properties. The static list is the simplest list and it has the simplest implications: if a mesh is added to the static list, that mesh will be available to be added to any map/interior on the terrain/floor and it will persist. Items can be added to several lists at the same time, so if you define an item in the static list, you can also have it in another list that, let's say, implies that the object is dynamic and you can interact with it, but that it will expire if left alone for 20 minutes.
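One way to picture "implications derive from list membership" is to give each list a few rule flags and merge the flags of every list an item belongs to. This Python sketch is only an illustration: the flag names, the second "dynamics" list and the merge rule (any list can grant a capability, the shortest expiry wins) are assumptions, not the actual module format.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ModuleList:
    name: str
    placeable: bool = False      # items may be added to any map/interior
    persistent: bool = False     # placed instances are saved with the map
    interactive: bool = False    # the player can interact with instances
    expire_after_s: Optional[float] = None  # None means never expires
    items: set = field(default_factory=set)


def implications(item, lists):
    """Merge the rules of every list that contains the item (assumed merge rule)."""
    member_of = [lst for lst in lists if item in lst.items]
    expiries = [lst.expire_after_s for lst in member_of if lst.expire_after_s is not None]
    return {
        "placeable": any(lst.placeable for lst in member_of),
        "persistent": any(lst.persistent for lst in member_of),
        "interactive": any(lst.interactive for lst in member_of),
        "expire_after_s": min(expiries) if expiries else None,
    }


statics = ModuleList("statics", placeable=True, persistent=True, items={"pot", "barrel"})
dynamics = ModuleList("dynamics", interactive=True, expire_after_s=20 * 60, items={"pot"})
```

Here the pot picks up the static list's placement rules plus the hypothetical dynamic list's interaction and 20-minute expiry, while the barrel stays purely static.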

But first let me describe the complete structure of game objects:

Raw textures. These are stored on disk and their name is usually their path on disk without the extension, so that the engine can differentiate between multiple versions of a texture used for different things. For example, when running low on RAM the engine might decide to reduce the texture resolution. It could do this by loading only a lower mip-map, but alternatively it could load a specially prepared texture from disk meant for this scenario. Right now textures can be loaded multiple times from the same path and you will get independent copies, but if I determine that there is no need for this, the engine could load each path as a single texture in memory and use that for all the "copies". You do not have explicit control over textures as resources in the editor because they are governed by materials.
Materials. These are basically a structure describing how a material should look. They usually have a diffuse texture and a normal map texture, but also properties relating to the strength of the material channels, specular highlights, etc. This is what you change when you need to change the way an object looks, not the raw texture objects, and there is full support in the editor for this.
Meshes. These describe a 3D mesh as exported by a 3D modeling program, in a neutral stance (position, scale, rotation). Example: an upright barrel. A mesh has multiple parts, and each can have a material. In the past a mesh object used to have properties related to its scale when introduced into the world and to its physics, but starting with this version a mesh is size and physics agnostic, just raw data, similar to the raw textures. The old properties are moved to a new object, which has the same relationship with meshes as materials have with textures.
Object descriptors. These take a mesh and add size and physics properties to it. Let's say your exported mesh is a cylinder with dimensions (200, 450, 200). This size results from the scale the 3D artist used. It would be great if all meshes were created with the same scale, but with the army of modelers that I'll surely have soon, this is not very likely. The size of the mesh informs its use in the descriptor, so you could choose for the descriptor to use the exact size given by the mesh. But more likely you'll choose another option. This way you could have a static barrel in an object descriptor which specifies that the barrel mesh will be used with static physics and a size of (0.8, 1.2, 0.8). This item descriptor will be added to a list and then used. Or you could specify a static physics model that uniformly scales your input mesh to a maximum height of 1.5 units. Or some ranges. Or make it dynamic and force it to a convex wrap. Or a mobile cylinder with some dimensions.
A model instance. This is an actual instance of an object descriptor, inheriting all properties from it, but it also has its own location in the world and other properties, like content. The entire world is made up of model instances. Their specific properties are saved to disk to enable map loading.
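The chain from raw texture to model instance can be sketched as plain data types. This Python sketch uses hypothetical class and field names; `TextureCache` shows the optional "load each path once" behavior mentioned for raw textures, and `world_size()` shows the "uniformly scale to a maximum height of 1.5 units" descriptor option applied to the (200, 450, 200) cylinder example.

```python
from dataclasses import dataclass, field


class TextureCache:
    """Loads each path once and hands out the same object for later requests."""

    def __init__(self, loader):
        self._loader = loader  # hypothetical callable: path -> texture
        self._textures = {}

    def get(self, path):
        if path not in self._textures:
            self._textures[path] = self._loader(path)
        return self._textures[path]


@dataclass
class Material:
    diffuse: object
    normal_map: object
    specular_strength: float = 1.0


@dataclass
class Mesh:
    size: tuple      # (x, y, z) extent as exported, in the modeler's units
    materials: list  # one material per mesh part


@dataclass
class ObjectDescriptor:
    mesh: Mesh
    physics: str       # e.g. "static" or "mobile-box" (hypothetical values)
    max_height: float  # uniform scale so that the mesh height (y) becomes this

    def scale(self) -> float:
        return self.max_height / self.mesh.size[1]

    def world_size(self) -> tuple:
        s = self.scale()
        return tuple(round(c * s, 4) for c in self.mesh.size)


@dataclass
class ModelInstance:
    descriptor: ObjectDescriptor
    position: tuple
    contents: list = field(default_factory=list)  # per-instance data saved with the map


textures = TextureCache(loader=lambda path: {"path": path})
pot_material = Material(textures.get("textures/pot.c"), textures.get("textures/pot.n"))
cylinder = Mesh(size=(200, 450, 200), materials=[pot_material])
descriptor = ObjectDescriptor(cylinder, physics="static", max_height=1.5)
instance = ModelInstance(descriptor, position=(10.0, 0.0, 4.0))
```

With these numbers the cylinder ends up 1.5 units tall and about 0.67 units wide, and the mesh itself never stores a world size.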

Seems complicated, but here is the short version: a game object is an instance of an object descriptor, which describes a mesh with its physics properties and size. A mesh is a 3D model that can use multiple materials, which in turn load multiple textures. Here is a screenshot of the mesh editor for the skybox, showing multiple materials:

So now we know what resources we can have and our final goal: use this information to populate our world with model instances. But what do you do with those instances? You add them to a slot. Currently there are two slots types:

The master model pool. This can handle anything. It can handle static meshes with one or multiple sub-parts and any shape, or dynamic objects, but only with one submesh and currently forced to become a box or a cylinder. As the need arises, I'll add spheres, convex wraps and compound shapes as valid shape choices. The master pool can handle any number of objects in any mix, added in any order, with perfect batching. Or at least as I currently understand perfect batching. It frustum culls and can further cull your objects based on distance, but it has the downside of having to go through all the objects you added to it every time it determines what to render. This fact, combined with the ease with which it handles mobile objects, makes the master pool an ideal candidate for mobile objects, which you should have fewer of than static entities.
The terrain chunk pool. Actually, there is one such pool per terrain chunk and it can theoretically hold the same things the master pool can, but I never tested it with mobile objects because it is meant for static objects. If you add a static object to the game world and choose the terrain pool as destination, only the chunk that contains the object will be updated. It has perfect batching, but only for the objects inside a chunk, since chunk boundaries break batching. As an optimization, this pool first frustum culls the entire chunk with a single test, so if you have a terrain chunk with 10000 objects and the entire chunk is outside your view frustum, all 10000 objects are eliminated with one test. After the chunk test, the objects inside are further tested if needed and there is distance-based culling. So unlike the master pool, not all terrain pools must be fully traversed, but this pool doesn't handle mobile objects well.
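The two-step culling in the terrain chunk pool (one test for the whole chunk, then per-object tests plus distance culling) can be sketched like this. To keep it short and runnable, the view frustum is replaced by a 2D axis-aligned box on the ground plane; a real engine would test bounding volumes against the frustum planes (XNA's `BoundingFrustum` offers such tests). Class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Box2D:
    """Axis-aligned box on the ground plane (x, z); a stand-in for a frustum."""
    min_x: float
    min_z: float
    max_x: float
    max_z: float

    def intersects(self, other: "Box2D") -> bool:
        return (self.min_x <= other.max_x and other.min_x <= self.max_x
                and self.min_z <= other.max_z and other.min_z <= self.max_z)

    def contains(self, x: float, z: float) -> bool:
        return self.min_x <= x <= self.max_x and self.min_z <= z <= self.max_z


@dataclass
class ChunkPool:
    bounds: Box2D                                # bounds of the whole terrain chunk
    objects: list = field(default_factory=list)  # (name, x, z) static objects

    def visible(self, view: Box2D, cam_x: float, cam_z: float, max_distance: float):
        # Step 1: a single test rejects every object in the chunk at once.
        if not self.bounds.intersects(view):
            return []
        # Step 2: per-object view test plus distance-based culling.
        r2 = max_distance * max_distance
        return [name for name, x, z in self.objects
                if view.contains(x, z)
                and (x - cam_x) ** 2 + (z - cam_z) ** 2 <= r2]


chunk = ChunkPool(Box2D(0, 0, 64, 64), [("pot_a", 10.0, 10.0), ("pot_b", 50.0, 50.0)])
```

The master pool, by contrast, would have to run step 2 over every object it owns each frame.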

Again, to clarify, let me give you the short story on how objects should be added to the game world. If an object is static and should be cullable, add it to the terrain chunk pool. If it is mobile, add it to the master model pool. The engine has default behaviors, and since the object descriptor tells it how the object is meant to be used, you generally don't need to worry about or know this stuff. Just select a descriptor and do level.Add(descriptor, position), as an example. The engine will select the pools as appropriate, and will do this efficiently, with constant cost and low overhead.
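A sketch of the routing that `level.Add(descriptor, position)` is described as doing: mobile objects go to the master pool, static ones to the pool of the chunk that contains them, found with a constant-time grid lookup. The Python names, the `mobile` descriptor flag and the 64 m chunk size are assumptions for illustration.

```python
import math

CHUNK_SIZE = 64.0  # hypothetical chunk edge length in meters


class Level:
    def __init__(self):
        self.master_pool = []  # mobile objects, traversed in full each frame
        self.chunk_pools = {}  # (chunk_x, chunk_z) -> static objects in that chunk

    def add(self, descriptor, position):
        x, _, z = position
        instance = {"descriptor": descriptor, "position": position}
        if descriptor["mobile"]:
            self.master_pool.append(instance)
        else:
            # floor() keeps negative coordinates in the correct chunk.
            key = (math.floor(x / CHUNK_SIZE), math.floor(z / CHUNK_SIZE))
            self.chunk_pools.setdefault(key, []).append(instance)
        return instance
```

Both branches are a dictionary lookup and an append, which is what keeps the cost constant regardless of how many objects the level already holds.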

This is why I called this post summer cleanup. Most of these systems existed before, but now I formalized and cleaned up everything related to them and added the new object descriptor system. This is what I wanted to say, so you can stop reading now.

But below I'll detail parts related to the development of this system and some distance-based filtering. After a first round of cleanup, creating support for object descriptors, adding list support to the module system and adding a first item to the static list, I needed to finish the support for adding such a descriptor. I started with the foundation dropping code, but made it use the new object descriptor system:

Above you can see it working, maybe a little too well. The pots are the size of foundations, and the terrain gets deformed and its texturing changed. I stripped out the unneeded parts and compared a pot added by the old system with one that uses the object descriptors:

Great! Things look the same. But they are not! Before the object descriptor system you needed to give some physics properties to a mesh directly, and if you did not, it defaulted to a box. The new descriptors coexist with the old properties, but since we selected an object from the static list, their physics behavior is quite different:

The one on the left is a static pot that will stay there until somebody or something removes it, while the one on the right is a mobile one mapped to a box, which you can just push around by walking. From this screenshot I can draw two conclusions: maybe, just maybe, the pot mesh is too high poly. But for sure, the physics mesh, which can be different from the render mesh, is too high poly. I'll model a new physics mesh that will have at most half as many vertices and use it as a physics impostor.

The static mesh is in the terrain chunk pool, the dynamic one is in the master pool. They both have support for different ways of culling as described earlier, but also support for distance based filtering and actions. I created this very approximate grid of pots:

I intentionally set the distance on the pool to something very low, like 20 meters. See what happens as I move back:

And move back more:

Objects further away than 20 meters get culled as I move about. This partially solves the problem of very ugly pop-in as you move around. Do you remember my old video where I did a 64 square kilometer map with hundreds of thousands of physics-enabled objects and ran diagonally from one corner to the other at high speed? Because of YouTube's blurring you could hardly tell, but distant chunks would pop in all at once: all the hundreds of objects inside the boundaries of a chunk would appear as soon as you got close enough to it. This was very immersion breaking and was caused by the single per-chunk frustum pre-test. Now, if I were to repeat the same test, internally the same pop-in would happen, but the engine would only render things within a set radius around you, so you would see objects pop in one by one instead of the whole chunk at once. And in practice, I won't set the view distance to 20 meters.

I will set a generous radius and add other methods to enlarge it, like omitting very small objects and using meshes with LOD levels. In the above screenshots, instead of hiding the objects, I could switch to a lower LOD mesh. If I had one. I need to model that when I sit down to do the physics impostor.

Switching to a lower LOD will still have some pop-in, but it will be minor. Full chunk pop-in: VERY VERY BAD. Single object pop-in: PRETTY BAD. LOD transition and/or smart billboard: pretty good. That is what the AAA studios are using, so it will be good enough for me.
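Picking a LOD by distance instead of simply hiding the object can be as simple as a table of distance thresholds. A minimal Python sketch; the thresholds are made-up example values, and `None` stands for "culled" (or a billboard, if one exists).

```python
def select_lod(distance, lod_ranges):
    """Return the LOD level for a distance, or None if beyond the last range.

    lod_ranges holds ascending maximum distances: index 0 is the full-detail
    mesh, higher indices are progressively cheaper meshes.
    """
    for level, max_distance in enumerate(lod_ranges):
        if distance <= max_distance:
            return level
    return None


EXAMPLE_RANGES = [20.0, 60.0, 150.0]  # hypothetical thresholds in meters
```

Adding a small hysteresis band around each threshold would stop an object sitting right on a boundary from flickering between two levels.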

I couldn't test LOD transition, but I could test pixel shader complexity transition. I set the view distance to something very small, like 1 meter. If I am inside the radius, everything looks as normal:

If I take just one step back, the normal mapping effect is omitted:

Seeing this screenshot I realize that I did do a fair job at modeling and texturing this simple object, but I did a horrible job of normal mapping it. The pot looks a lot better without the normal map. I'll have to redo it.

Anyway, the change is very jarring. But as you increase the radius, the difference becomes less apparent, like at 5 meters:

At 10 meters, the change is barely noticeable:

So if I set a larger radius, the change won't be noticeable. But this alone won't give you any great performance benefit: the object will be only a few pixels wide and you will save a few pixel shader instructions, which in the long run will barely affect your framerate. The way to actually get a benefit is to use the radius to count the uses of each normal map. If the count is zero for a given raw texture, you can unload that texture and save VRAM. As an example, if your object is so small that from 200 meters you can't tell the difference between high LOD with normal mapping and low LOD without normal mapping, you make it so that at 220 meters the normal map is unloaded and maybe the full-size diffuse texture is replaced with only its 64x64 mip. The extra 20 meters give you a little leeway in case the texture streamer is very busy: they give you a better chance that by the time you reach the 200 meter mark, it is done streaming the previously unloaded textures back in. And even if it doesn't quite make it until 180 meters, you still won't notice the change unless you are really looking for it.
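The normal map bookkeeping described above, using the 200 m and 220 m figures from the example, might look roughly like this. It is a Python sketch under assumptions: positions are 2D, the names are hypothetical, and the streamer is reduced to a `loaded` set that only contains textures whose streaming has finished.

```python
RENDER_RADIUS = 200.0  # normal mapping is used inside this distance
STREAM_RADIUS = 220.0  # normal map stays (or becomes) resident inside this distance


def update_residency(texture_users, camera, wanted):
    """Count in-range users of each normal map and decide what to stream.

    texture_users: normal map name -> list of (x, z) positions of its users.
    wanted: set of normal maps that should be resident; updated in place.
    Returns (to_load, to_unload) for a hypothetical texture streamer.
    """
    cam_x, cam_z = camera
    r2 = STREAM_RADIUS * STREAM_RADIUS
    to_load, to_unload = [], []
    for tex, positions in texture_users.items():
        users_in_range = sum(
            1 for x, z in positions if (x - cam_x) ** 2 + (z - cam_z) ** 2 <= r2
        )
        if users_in_range == 0 and tex in wanted:
            wanted.discard(tex)
            to_unload.append(tex)
        elif users_in_range > 0 and tex not in wanted:
            wanted.add(tex)
            to_load.append(tex)  # requested as soon as a user is within STREAM_RADIUS
    return to_load, to_unload


def use_normal_map(tex, distance, loaded):
    """Normal map only if close enough AND the streamer has finished loading it."""
    return distance <= RENDER_RADIUS and tex in loaded
```

If streaming is late, `use_normal_map` simply keeps returning `False`, so the object renders without its normal map for a little longer instead of stalling.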