Wednesday, March 27, 2013

Engine features 01 (A.K.A. I am really really done with the terrain; fo' reals...)

OK, a little bit later than promised, here is the first video showing off engine features that are in near final form:



First time doing any "real" video editing. Took me quite some time and hopefully my productivity and results will improve on future videos.

Since this video is late and I did not stop working in the meantime, I have enough new features for another two similar (but probably shorter) videos that I could record/post right now. I still need to space out the content on the blog and YouTube, so today I am going to ignore the rest of the features and focus only on terrain.

So let me walk you quickly through the contents of the video. First I show the dynamic quality switching. It might be hard to tell, but every 1-2 seconds I change the quality of the terrain as I am moving around. I am fairly happy with the preset quality levels and any concerns are more stylistic than implementation related. The presets have predictable performance, with higher quality being slower, but unfortunately my terrain is bus capped so I can't get accurate GPU times on the rendering. A future version 2 of the terrain might go back to using vertex buffers to reduce the CPU time of passing the terrain chunks around, but this is fairly complicated since a complex cache system must be implemented. Terrain is meant to be editable in real time and creating/destroying vertex buffers is too slow for this. So this theoretical caching system should attach timers to chunks. If a chunk has been edited in the last N seconds it is very likely that it will be edited again, so no vertex buffer should be used. A chunk that hasn't been poked in ages should use vertex buffers. And the system should handle fast buffer migration for high speed terrain traversal. Complicated.
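Just to illustrate the idea (none of this exists in the engine yet), here is a minimal C# sketch of the timer-driven decision; the class, field names and threshold are all made up:

```csharp
using Microsoft.Xna.Framework.Graphics;

// Illustration only: a hypothetical "hot chunk" timer, not the engine's actual code.
class TerrainChunk
{
    const double HotSeconds = 10.0;      // made-up threshold: edited recently => likely to be edited again

    public double LastEditTime;          // game time (in seconds) of the last edit
    public VertexBuffer CachedBuffer;    // only built once the chunk has gone cold

    public bool IsHot(double now)
    {
        return now - LastEditTime < HotSeconds;
    }

    public void PrepareForDraw(GraphicsDevice device, double now)
    {
        if (IsHot(now))
        {
            // Recently edited: keep streaming vertices from CPU memory each frame,
            // because creating/destroying a VertexBuffer on every edit is too slow.
            if (CachedBuffer != null) { CachedBuffer.Dispose(); CachedBuffer = null; }
        }
        else if (CachedBuffer == null)
        {
            // Cold chunk: migrate it into a static VertexBuffer once and render from that.
            // ... build the buffer, SetData(vertices), then draw via device.SetVertexBuffer ...
        }
    }
}
```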

Next I show dynamic view distance. Again, every 1-2 seconds I increase the view distance, from 7 to 19 (radius). A farther view distance is not really feasible, not because of the far plane, but because the terrain does not support LOD and is also bus capped, so after a point you spend a fixed and far too large time span just passing blocks along, more than it takes the GPU to render them. Version 2 of the terrain should have at least a simple LOD solution to approximately halve the resolution of distant blocks, maybe even insert them into the same buffer.

Next is adaptive detail mapping. The engine supports this feature separately for medium and high quality terrain rendering, each with its own settings. By default medium uses a distance of 700 and high a distance of 2000, but for the video I lowered it to 100 to show what the effect does. Alternatively, there is a super fast variant of the algorithm that does not do anything for pixels outside the radius, instead of the smooth blending. This can boost performance somewhat, but it does give a visible circular edge around the character. At a distance of 2000 this edge is only noticeable when you look for it, but by default I use the higher quality blending. Performance be damned, look good!
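For reference, the two falloff styles boil down to something like this (a generic sketch of the math, written here in C# rather than shader code, and not the engine's actual values):

```csharp
using Microsoft.Xna.Framework;

static class DetailFalloff
{
    // Smooth blending: full detail inside the radius, fading out over fadeRange.
    public static float Smooth(float distance, float radius, float fadeRange)
    {
        return MathHelper.Clamp((radius - distance) / fadeRange, 0f, 1f);
    }

    // The "super fast" variant: nothing is done outside the radius,
    // which is what produces the visible circular edge around the character.
    public static float Hard(float distance, float radius)
    {
        return distance < radius ? 1f : 0f;
    }
}
```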

Next is a new arrival: day and night cycles! I have a very complicated lighting scheme and making it behave consistently and also transition smoothly from day to night is a real nightmare. I am not 100% happy with the results, but it is a good start. The night is intentionally bright and clear.

And finally we have the personal light source/torch. During the day it over-illuminates everything as it should and by night it gives good visibility for close-by objects. It was very hard to find a configuration that did not produce very desaturated/washed out colors on objects. Oh god, dynamic lighting is so hard. I don't like the way the point light illuminates the terrain though. It is unrealistic, with some points creating a wrong illusion of depth and facing. But I'll keep it like this for now.

With version 1 the terrain is finished. I did not show it in this video, but the terrain is still editable in real time, changing the height and texturing. I will try to release the source code of the core of it soon. I won't be starting work on version 2 of the terrain just yet; that is the version that should one day allow for LOD switching and smart vertex buffer usage.

The next engine feature videos will show off the in game material editor and container/character inventory.

With everything going seemingly so well, it is important to mention things that don't work properly.

Number one is point lighting. It is just wrong. I have the feeling that there is a very basic error somewhere that permeates everything and once I find it, everything will fall into place. I upgraded the normal G-buffer to show normal-map enhanced normals, not just vertex based normals, and this causes the normals to rotate around wildly as objects rotate. This clearly illustrates that there is a fundamental flaw in the normals and the way point lights interpret them.

Second is physics timing. The calculations I had in the past were wrong. I have fixed them but I'm not sure yet that the values are correct. With real values, physics actually is a huge performance bottleneck (it was before too, only the measurements were lying). The good news is that I added multi-threaded support for physics and it scales very well with the number of cores, so if you have 2/4 cores instead of one you should notice a good improvement in physics times. The bad news is that I will probably have to scale back the map size. My i7 can handle the huge 64 square kilometer map decently, but slower machines might have a lot of trouble. Maybe I'll even split the map into a lot of small areas separated by caves and passages to hide the loading bars and limit the physics to only one region at a time.

Thursday, March 14, 2013

Understanding texture compression - 01 - History & overview

I have achieved significant progress on the engine part and even some gameplay, progress that I'll be slowly showing using brand new higher production value videos, but I'm far too lazy to create such a video during the week. I'll try to do a first one during the weekend. Also, I am kind of overworking myself and I should take it easy and make sure not to burn out on development.

In the meantime, I'll write a short series of articles on texture compression. This domain of computer graphics can be incredibly complex and confusing and I'll be writing these articles as I am learning the ropes myself, so please excuse any mistakes I might make or poorly researched information.

So what is texture compression? Before we answer that, let's go more general: since textures are images in a format that is meant for very specific hardware to access (in our case PC GPUs), what is image compression? Traditionally you use image compression to reduce the size of an image when saved to disk. That's it. That's the primary motivation. Current storage oriented hardware is bigger and faster than ever, but you still can't ignore image compression. Back in the day, a 32 bit 640x480 (VGA) image occupied 1.17 MiB, which was quite a lot of memory for the hardware that was available then. A 1920x1080 (1080p) image occupies 7.91 MiB. The width is 3 times as large and the height 2.25 times, so the area is 6.75 times larger, and if you do the math this makes sense. Today, in early 2013, a Samsung Galaxy 2 is a quite common and still great phone, but definitely last gen. It takes pictures with a resolution of 3264x2448, so an uncompressed picture takes up 30.48 MiB. This would eat up the relatively small on-board storage quite quickly. But this doesn't happen, because no one uses uncompressed images. The above mentioned images occupy hundreds of KiB or a few MiB depending on compression and quality settings.

Image compression is even more important in the case of video, especially with upcoming technology. A 4K 3D video running at 48 FPS that would allow you to see The Hobbit part 3 on your future tech TV would have serious problems today, because it can't fit comfortably on a single optical disc and has a bitrate so high that you can't stream it over the Internet (on Netflix or something).
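If you want to check the numbers above, the uncompressed size is just width times height times 4 bytes:

```csharp
static class ImageSizes
{
    // Uncompressed size of a 32 bpp image in MiB: width * height * 4 bytes.
    public static double UncompressedMiB(int width, int height)
    {
        return (long)width * height * 4 / (1024.0 * 1024.0);
    }

    // UncompressedMiB(640, 480)   =  1.17 MiB
    // UncompressedMiB(1920, 1080) =  7.91 MiB
    // UncompressedMiB(3264, 2448) = 30.48 MiB
}
```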

That was an overlong, fully unnecessary and yet too basic introduction. Back to the subject. In the case of texture compression you don't care about the space the file occupies on disk (though you do get reduced disk space as a bonus); instead you care about occupied video memory. But there is an even more important benefit to compression: reduced bandwidth usage and better cache behavior. If you compress your 4 MiB image down to 1 MiB, your GPU will access it faster, even if some form of decompression is needed for each individual access.

Taking this into consideration, several things that can be considered compression in general terms are not texture compression. Here are a few important conditions that must be satisfied and the behavior the GPU will have:
  • Texture compression is GPU oriented, so the GPU must receive the raw compressed data. If you decompress your image on the CPU before sending it to the GPU, you may be using less disk space, but you get no benefit from texture compression.
  • The GPU will not decompress and cache the full data once it has received it. Doing that would void the reduced memory consumption and bandwidth advantages. So the GPU stores the texture in the compressed format and uses it like that when sampling.
  • Texture decompression must be fast. Since the GPU accesses the raw compressed data when it samples a texture, this process must be fast.
  • The GPU needs fast, constant time random access to any point of the texture. So streaming compression, where the image must be decompressed into a temporary memory location and read until the desired pixel is reached, is out of the question.

So it seems that creating such a compression format is no easy task. Something like a simpler JPEG compression can't be used. The scheme must be a lot simpler but still give good compression results. A 10% reduction in size is not good enough to outweigh the cost of decompression.

A long time ago, in a galaxy far away, a company called S3 Graphics that used to produce graphics chips laid the foundation of a compression scheme that is both in use today and was the basis for other techniques developed in the meantime. They developed the S3 Texture Compression algorithm (S3TC for short) and presumably only their graphics chips could decompress from this format. It was a block based algorithm. The image was split into chunks of 4x4 pixels. There were multiple variants of the method, each suited to a different purpose, but the main idea was the same: you would store two key colors at a high bit depth and the rest would be approximated by storing the difference from those key colors at a low bit depth. This was based on the observation that over small surfaces there is generally little change in color in most images. But why does this give a high compression ratio? A 4x4 block has 16 unique pixels, so it would consume 64 bytes. How do you encode 64 bytes in far fewer bytes, using an algorithm that is fast to decompress? Well, you can't. Not unless you use lossy compression. And S3 chose a very lossy scheme. I'll detail all the schemes soon, but for now it is enough to mention that this compression always results in a fixed compression ratio of 1:8 or 1:4. That's right, the 64 byte block is compressed into an 8 or 16 byte block. Needless to say, this worked better on some images than on others and there are tons of cases where you shouldn't use this compression.

The block structure nicely satisfies the conditions for GPU decompression. It has fast, constant cost random access because for any coordinate you can easily compute the block location. Decompression is very fast because for a block the decompression algorithm is a fixed set of arithmetic operations without any branching. It also takes advantage of a very common scenario: when rendering a textured polygon, a texture sample operation will almost always be followed by another texture sample operation for a nearby texel (a texel is a texture "pixel"). This meant that decompressing the block and storing the result in cache would greatly improve performance and have a very low rate of cache misses.
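To make the constant time access concrete, here is roughly how the block containing a given texel is located (a sketch, not actual hardware or driver code):

```csharp
static class BlockMath
{
    // Byte offset of the 4x4 block containing texel (x, y).
    // blockBytes is 8 for DXT1 and 16 for DXT2-5; the width is assumed to be a multiple of 4.
    public static int BlockOffset(int x, int y, int textureWidth, int blockBytes)
    {
        int blocksPerRow = textureWidth / 4;
        int blockX = x / 4;
        int blockY = y / 4;
        return (blockY * blocksPerRow + blockX) * blockBytes;
    }
}
```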

So texture compression seems very advantageous, even with the reduced visual quality. Especially since when it was introduced, video memory was very scarce. Today you can easily buy a GPU with 2 GiB of on-board GDDR5 RAM, so memory consumption is less of an issue, but memory bandwidth is still as important as ever. Probably even more important than it was, because RAM is falling behind: compared to the instruction execution speed of modern CPUs/GPUs, memory access is a performance bottleneck.

But what use was this method if it only worked on S3 chips? Especially since S3 is no longer producing such chips? Well, other chips/APIs started adding reliable support for texture compression and paying royalties to S3. And this is the last thing I'll mention about S3, because I probably got the entire history part messed up and S3 will try and sue me.

OpenGL added support for S3TC starting with version 1.3. They kept the name and supported it with the "EXT_texture_compression_s3tc" API or extension or whatever OpenGL uses in these cases. I am not targeting OpenGL so I won't talk about it anymore. DirectX also adopted the technique starting with DirectX 6.0. Ahhh, I remember DirectX 5.0. It sucked :P! In a move completely atypical for Microsoft, they renamed it to DXT.

DXT came in 5 variants: DXT1, DXT2, DXT3, DXT4 and DXT5! I'll summarize the differences between them in the following table:

Method   Components   Encodes            As                              Premultiplied   Bytes
DXT1     3/4          RGB, optional A    RGB(5:6:5), A(0)/A(1)           N/A             8
DXT2     4            RGBA               RGB(5:6:5), explicit A(4)       Yes             16
DXT3     4            RGBA               RGB(5:6:5), explicit A(4)       No              16
DXT4     4            RGBA               RGB(5:6:5), interpolated A(8)   Yes             16
DXT5     4            RGBA               RGB(5:6:5), interpolated A(8)   No              16

I love making HTML tables!

OK, now let's try to understand the table. In part two I will go into a lot of detail regarding the structure and implementation of each method, but really the information in the table is all you need.

DXT1 is the base of all the methods and is the simplest, while DXT2-DXT5 are very similar in structure and build upon DXT1. The last column of the table gives the size of the block in bytes. Since an uncompressed block takes up 64 bytes, this means that DXT1 provides 1:8 compression and uses 4 bpp (bits per pixel). The rest of the methods provide 1:4 compression and use 8 bpp. This is why compression gained traction: you are compressing a normally 32 bits per pixel image down to 4/8 bits per pixel. In the case of 24 bpp images that don't have an alpha channel, the DXT1 compression ratio is 1:6.

Now that we understand the basic size difference, let's see what we actually encode. Images can have several channels and we traditionally work with images encoded in the RGB format, which has 3 channels: one for red, one for green and one for blue. These channels normally use 8 bits, but in very specialized graphics processing they can use more. You can also have a fourth channel specifying the transparency of the pixels, the alpha channel. This fourth 8 bit channel creates the very common RGBA 32 bpp pixel format. All DXT variants are four-channel formats that encode RGBA, with the exception of DXT1, which is either opaque, having an alpha of 100% and encoding only 3 channels, or can optionally encode RGBA, but the alpha of a given pixel can only be either 0% (fully transparent) or 100% (fully opaque).

Now that we know what is encoded, the question of how it is encoded remains: the fourth column in the table. All DXT methods encode the RGB components in the 5:6:5 format, meaning that green uses 6 bits, while red and blue use only 5. DXT1 uses 1 bit for alpha to signal 0%/100%. DXT2 and DXT3 use 4 bits per alpha, while DXT4 and DXT5 use 8 bits. This is where the interesting part starts: since DXT5 uses twice as many bits for alpha as DXT3, it should consume more memory. But if you look at the final column, they both use the same amount. This is because they store alpha differently. DXT2/3 use explicit alpha, each pixel having its own 4 bit value. DXT4/5 use interpolated alpha, with a scheme similar to the DXT1 RGB compression: two key alpha values are stored at a high bit depth and the rest are interpolated between them, with each pixel storing only a low bit depth selector. So even though DXT5 has more bits per key alpha, these values are not explicit. Each pixel does not have its own alpha, but an interpolated value.
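A quick bit count shows why the two alpha schemes end up taking exactly the same space (this is just the arithmetic behind the table, not an encoder):

```csharp
static class DxtAlphaSizes
{
    // DXT2/3: 16 texels * 4 explicit alpha bits                         = 64 bits = 8 bytes
    public const int ExplicitAlphaBytes = 16 * 4 / 8;

    // DXT4/5: two 8-bit key alphas + 16 low bit depth selectors
    // (3 bits each in the standard layout) into the interpolated range  = 64 bits = 8 bytes
    public const int InterpolatedAlphaBytes = (2 * 8 + 16 * 3) / 8;

    // Either alpha block sits next to the 8-byte DXT1-style color block, giving 16 bytes total.
}
```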


Let's skip what premultiplied means for now and give a few key guidelines and observations about these methods and which you should choose.

One key observation is that all 5 methods encode RGB data the same way and provide the same quality. So if you don't care about alpha values, you should always use DXT1 because it has the best compression ratio. This also has a downside: if your RGB-only image looks like crap with DXT1, you can't switch over to DXT2-5 to get better quality. The RGB encoding is deterministic across all methods. With one exception. Say you care about alpha, but one bit is enough and you use DXT1: the RGB encoding will look worse than DXT1 without alpha or DXT2-5. The extra alpha encoding reduces the RGB color space. So if your DXT1 looked bad, your DXT1 with alpha will look even worse.

Now let's start caring about alpha. If 1 bit is enough, consider DXT1. You will probably need to apply an alpha threshold in the pixel shader to compensate for some unwanted black borders, but it will work. But if the RGB quality drops in a disturbing way when adding the alpha bit, you can consider DXT2-5. Or you must consider DXT2-5 if you need more than 1 bit.

And the rule here is very simple: DXT3 is good at images with sharp alpha changes while DXT5 is good at images with smooth alpha changes, like alpha gradients.

And finally let's address the elephant in the room: premultiplied alpha. DXT2 and DXT4 use premultiplied alpha. This means that the alpha channel is encoded as is (like in DXT3 and DXT5 respectively), but the RGB data is considered to have been premultiplied with the alpha before encoding. So choosing DXT2 over DXT3 changes only the values of the RGB components. In practice it turned out that there was not a lot of use for premultiplied alpha. So little, in fact, that when the DXT reform occurred these two methods were left out. So don't use DXT2/4 unless you have really good reasons for it.

The DXT reform renamed some methods and added a few more to solve some common problems.

DXT, while pretty good, is not 100% general. It gives poor visual results when used with a lot of photographic material, very detailed textures, smooth gradients, large color variation, diagonal detail, a few very specific images where the block encoding aligns very badly with another blocky pattern resulting from the content of the image and... normal maps! Normal maps look absolutely horrible when compressed with DXT and give rise to a typical blocky bump mapping effect. Newer compression methods address some of these issues.

But people are clever! Long before the new methods were created and incorporated into newer consumer level hardware, people came up with ways to fix, at least partially, the shortcomings of DXT.

Let's take normal maps as an example. DXT is generally a 4 channel compression, but not enough bit depth is available for the 3 channels of a normal map, which needs very smooth transitions between normals that are meant to follow a surface. One clever trick is the so called DXT5n(m) (I'm not 100% sure if DXT5n and DXT5nm are the same format). What is DXT5n? It is DXT5! There are absolutely no differences between the two formats, except for what you store in them. Instead of writing the 3 components of the normal into the RGB channels of the image, you move the red channel into the alpha, you keep the green in place and you fill the now unused red and blue channels with the same value. The alpha and green channels have a higher bit depth, so the encoding becomes less lossy. Since DXT is based on saving differences from two key colors, filling red and blue with the same value minimizes unnecessary differences and gives better detail precision. The final component of the normal is computed in the pixel shader since normals have unit length. The benefit of texture compression generally outweighs the extra cost of computing the third component. This is a clever trick that makes most normal maps compressible with good results, unlike DXT1, which generally fails to give good results. But we are saving only 2 channels in a format created for saving 4 channels. This method would greatly benefit from a compression format optimized to store only two channels with a greater bit depth than DXT. Foreshadowing!
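As an illustration, the pre-compression swizzle could look something like this (a sketch of the trick described above, not a production normal map pipeline):

```csharp
using Microsoft.Xna.Framework;

static class NormalMapSwizzle
{
    // DXT5n-style swizzle: X goes into alpha, Y stays in green, and red/blue are
    // filled with one constant so the color endpoints are spent almost entirely on Y.
    public static Color SwizzleForDxt5n(Vector3 normal)
    {
        int x = (int)((normal.X * 0.5f + 0.5f) * 255f);   // map [-1, 1] to [0, 255]
        int y = (int)((normal.Y * 0.5f + 0.5f) * 255f);
        int filler = 128;                                  // any constant; identical R and B minimizes color deltas

        return new Color(filler, y, filler, x);            // (R, G, B, A)
        // Z is rebuilt in the pixel shader from the unit length: z = sqrt(1 - x*x - y*y).
    }
}
```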

But normals are not the only thing that can be improved. How about plain RGB images? What do you do when DXT1 (and thus DXT2-5) gives poor results, full of artifacts and whatnot? You use another clever trick! Normal DXT1 is a 4 bpp format and we want comparable memory usage with greater visual quality. For this we first convert the image to the YCbCr format: a luma component followed by blue difference and red difference chroma components. We save the luma in the green channel of a DXT1 texture. We encode the Cb and Cr into the alpha and green channels of another texture saved as DXT5. The first texture already uses the same storage space as our entire DXT1 image, that is, it takes 4 bpp. And we still have a second texture that is stored at 8 bpp, for a total of 12 bpp! Not to mention the extra sampling cost! Not a good idea. The trick here is to down-sample the CbCr texture so that at the new resolution it is effectively 2 bpp, giving a total of 6 bpp. We can even do another trick, sampling the second image at a lower mip-map level. While the memory consumption is still 6 bpp, this behaves more like 4.5 bpp. This improves quality a lot over DXT1 but is still not that great when DXT1 really doesn't like your input image. How great it would be if we could use a format optimized for saving 1 channel images and one for 2 channel images! More foreshadowing!
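A sketch of preparing the two source images, before compression and down-sampling; the BT.601-style constants are my choice here, the important part is only that the shader uses the matching inverse:

```csharp
using Microsoft.Xna.Framework;

static class YCbCrSplit
{
    // Split one RGB texel into the two source textures of the luma/chroma trick.
    // The luma texture is compressed as DXT1 (luma in green), the chroma texture
    // as DXT5 (Cr in green, Cb in alpha) and then down-sampled.
    public static void Split(Color input, out Color lumaTexel, out Color chromaTexel)
    {
        float r = input.R / 255f, g = input.G / 255f, b = input.B / 255f;

        float y  = 0.299f * r + 0.587f * g + 0.114f * b;   // BT.601-style weights (an assumption)
        float cb = 0.5f + 0.564f * (b - y);
        float cr = 0.5f + 0.713f * (r - y);

        lumaTexel   = new Color(0f, y, 0f);                // luma in the green channel
        chromaTexel = new Color(0f, cr, 0f, cb);           // Cr in green, Cb in alpha
    }
}
```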

As you can see, DXT is not that hard to understand and master. The real challenge is to compensate for its weaknesses with all sorts of tricks!

As a final point, let's go over that DXT reform I mentioned earlier. More precisely, a DirectX change. DirectX is actually dead. Has been for quite some time now! Out of inertia/misinformation it is still commonly referred to as DirectX, but what is actually evolving is Direct3D. Initially a sub-API of DirectX, Direct3D is the only rendering part that gets attention. The DirectX SDK hasn't been updated in quite a while, causing some unnecessary panic. How do you get the new versions of the Direct3D SDK? Well, the Direct3D SDK has been more or less silently incorporated into the Windows SDK. So anyway, Direct3D is evolving, and Direct3D changed a few things in the domain of compression.

It renamed DXT1 as BC1, DXT3 as BC2 and DXT5 as BC3. DXT2 and DXT4 were left out because of their low use.

From DirectX 6 to DirectX/Direct3D 10 new formats were introduced by different manufacturers. 3Dc+/ATI1 was created and is a block format very similar to DXT, but it only encodes 1 channel. 3Dc/ATI2 encodes 2 channels using a similar method. ATI1 became BC4 and ATI2 became BC5. Using BC5 for the normal encoding trick described a few paragraphs above gives the best quality compressed normal maps available, and BC4 and BC5 can be used for the two-texture YCbCr trick, again with great results.

Direct3D 11 added BC6 and BC7, two formats that are very complicated, but when used correctly they give extremely good results. Better than BC4/5. I will ignore them, especially since XNA is Direct3D 9.

So let's summarize in a new table:

Method   Components   Encodes            As                              Old name     Bytes
BC1      3/4          RGB, optional A    RGB(5:6:5), A(0)/A(1)           DXT1         8
BC2      4            RGBA               RGB(5:6:5), explicit A(4)       DXT3         16
BC3      4            RGBA               RGB(5:6:5), interpolated A(8)   DXT5         16
BC4      1            1 channel          (8)                             ATI1/3Dc+    8
BC5      2            2 channels         (8:8)                           ATI2/3Dc     16

This article really didn't turn out the way I planned, but I'll go with it anyway. Part two will go into more detail regarding BC1-5.

Monday, March 11, 2013

98 – Terrain? More like ter-done! Am I right??!

I think I finally finished with the base of the terrain texturing!

I refactored the terrain shaders, using a very modular approach with tons of function calls and a clean design. I sure hope that the shader compiler is really good. If not, a final version of the shader might have to be written someday that flattens out the implementation and uses all manner of optimizations.

I also started using branching heavily. I am using both the good and the bad kind of branching. The good one relies on constants passed to the pixel shader body, and I am pretty sure the compiler does compile time evaluation of the constants and removes unnecessary branches from the code. The bad kind of branching is the use of run-time "if"s in shaders. GPUs really don't love branching. Pixels are evaluated in a clustered fashion and branches can cause the entire cluster to wait for a sync. This can be mitigated if there is a high probability that all the parallel shaders executed for the cluster will take the same path. I did some testing and the results are inconclusive, maybe tending a little bit toward lower performance if I use branching, even though the body of the branch that is skipped is more expensive.

Using these methods I created two shader implementations, one very basic for the low quality and one that handles higher quality rendering. These are further parameterized with compile time flags to create all variants that I need. I also managed to greatly optimize the implementation, giving a 10-15 FPS increase on weak hardware. On strong hardware I can't tell, because currently I am bus capped.

I also implemented adaptive detail mapping, allowing you to specify a radius for detail mapping. The medium quality setting uses this, not for the performance, but because it reduces repeating patterns in the terrain somewhat. On high I am not using it because the small performance gain is not worth it when compared to the quality loss. It is a high quality setting for a reason.

The final step was to do something about the view distance. I determined that the landscape looks best when I use a very distant and aggressive fog. The farther away the fog starts, the larger the terrain seems. The fog is exponential and does a good job (but not a perfect one) of hiding polygons entering through the far plane. This small pop-in is so minor that you won't notice it unless you are really looking for it.
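For reference, exponential fog of this kind usually boils down to a single expression per pixel; a generic sketch, not the engine's exact parameters:

```csharp
static class Fog
{
    // Exponential fog factor: 1 right at the camera, approaching 0 with distance.
    // Raising the density (or moving the fog start closer) hides far plane pop-in better.
    public static float Factor(float distance, float fogStart, float density)
    {
        float d = System.Math.Max(0f, distance - fogStart);
        return (float)System.Math.Exp(-density * d);
    }

    // finalColor = lerp(fogColor, sceneColor, Factor(dist, fogStart, density))
}
```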

One thing that I still need to do is make the view distance adjustable at run-time.

Using all the above I finished my hardest task: out of the dozens of permutations, choose only 5 quality settings. This was ridiculously hard because all of them were tough compromises. Just now I changed the spherical harmonics computation on a whim and I'm not sure which version I like better. Anyway, there are 3 quality settings: low, medium and high. You can also choose to enhance the harmonics for better quality, but this does not work for low quality, thus giving 5 quality levels instead of 6. I am fairly happy with these settings. I also made sure that they have comparable color warmth and intensity, but some minor differences are present.

Here is a video showing a 64 square kilometer map with small view distance at maximum terrain quality, large item density using 8xMSAA and SMAA while the character is running at very high speed traversing the map not quite diagonally (I wanted to go from corner to corner but I messed up :) ):



Now that the terrain shaders are finished (I hope) I need to add day and night cycles to it and see about those lights.

For the rest of the post let me entertain you with some very interesting shader variants I managed to produce:



These are not photoshopped and don't use any textures other than the ones from the video. Just a shader variant that produces strange colors and a more wet look:





If I ever need an alien looking landscape, I know where to start. I did not manage to produce workable shaders out of this method because the output is too noisy and weird in a lot of places. It also has pretty bad temporal aliasing.



Thursday, March 7, 2013

Screens of the day 33 - Science, bitch!

Ignoring the variants that support point lighting and looking only at the directional lighting ones, I have the following terrain shaders:
  • A simple 4 texture blending shader.
  • A variant that adds normal mapping.
  • One that adds normal mapping for polygons facing away from the light.
  • One that does the same thing as the above, but is a "hard" variant with a much more pronounced normal mapping effect.
  • One that is a variant of the above, but this time it reduces the tiling texture look a little. Heavy normal mapping, even with repeating texture compensation, is still prone to repeating patterns and this shader tries to remedy that, but it produces somewhat brighter output. This was my preferred solution up to this point.

The third variant was a key step forward, adding normal mapping to every inch of the map.

And the last two variants were again key, because the first 3 produced far too soft normal mapping effects. But these two methods were not perfect. I basically made them up based on practical observations after getting the hang of shaders and things that seemed like a good idea. There were hours of trial and error before I got acceptable results and even more hours before I arrived at variant 5.

But I am still not 100% happy. Time to fix this. But how? No more trial and error and monkey patching. This time I'll use science, maths and a lot of research. Why try stuff out when you can do maths? Let's see the preliminary results of this.

Here is a screenshot using method 5:


This was my best method. Now let's see one of the new methods (I have several variants and I can't tell which one I like better):


Fantastic! Another screen with method 5:


Without the bright texture this looks a lot better and the repeating normal mapping pattern is not as visible as it would be with method 4. Now let's try the new method:


Huge improvement, but I can't say I am in love with the black parts of the mapping effect. Another sample with method 5:


And the new method:


Again, some blackness, but still interesting results!

What do you think? Do you like the new shaders? Are the black parts disturbing? The number of variants is getting out of control so I would like to arrive at only a handful of variants.

But how about the point light? I am no longer computing it correctly because of the new method, but the results are passable:





I'll continue to experiment and try to finalize the new shaders. The terrain shader file is in bad need of a monster cleanup!

Oh, but before I go: not all the shaders were a success. Take a look at this one, which creates nice little swirly artifacts that have a surprisingly high amount of fake 3D popping effect:


Changelog 07/Mar/2013

Another change-log post, and the drill is the same: bullet points detailing the changes and in bold what is still to be done on that feature.

Let me start by apologizing for the mistake I made in the last change-log post: I showed an untextured screenshot with ambient and diffuse, claiming that it was ambient only. Here is how the engine looks with ambient only:


And ambient plus diffuse:


Changelog:
  • Cleaned up terrain editing functions. This feature was present for quite a while, but now it is more bug-free and robust, with better UI support. A new API is also in place to facilitate editing. Let me show you what it takes to make keys '1' through '4' change terrain texturing under the cursor and keys '5' and '6' change terrain height:
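Something along these lines; every type and call below is invented for illustration, the real API may well look different:

```csharp
using Microsoft.Xna.Framework;
using Microsoft.Xna.Framework.Input;

// Hypothetical editing interface, just to make the sketch self-contained.
interface ITerrainEditor
{
    void PaintTextureUnderCursor(Ray cursorRay, int textureIndex);
    void ChangeHeightUnderCursor(Ray cursorRay, float amount);
}

static class TerrainEditKeys
{
    public static void Handle(ITerrainEditor terrain, Ray cursorRay)
    {
        KeyboardState ks = Keyboard.GetState();

        // '1'..'4': paint one of the four terrain textures under the cursor.
        for (int i = 0; i < 4; i++)
            if (ks.IsKeyDown(Keys.D1 + i))
                terrain.PaintTextureUnderCursor(cursorRay, i);

        // '5'/'6': raise or lower the terrain under the cursor.
        if (ks.IsKeyDown(Keys.D5))
            terrain.ChangeHeightUnderCursor(cursorRay, +0.5f);
        if (ks.IsKeyDown(Keys.D6))
            terrain.ChangeHeightUnderCursor(cursorRay, -0.5f);
    }
}
```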

Texturing changes are fully real time and take next to no CPU time to finish. Height editing is not as robust because both normals and tangents must be recomputed. Editing height lowers your FPS and the mouse becomes sluggish. It will be a real programming challenge to optimize this in such a way that only the affected vertices are recomputed so that it can run in real time without impacting game performance.
  • Moved the camera and the physics based character controller to the library. There is still a lot of work to do on these, improving robustness and feel, but they will have to do for now. Library source size is 361 KiB and going up.

  • Implemented skybox. There are still some problems with it, like being able to see the borders:

I won't bother fixing this for now because I'm sure it is an easy fix. Famous last words yet again. What is not that easy to fix is the view distance. Without a skybox, with just a uniform color and strong fog, you can get away with a very irregular distant landscape having a chunked look and strong pop-in. The non-uniform nature of a skybox makes this pop-in and chunked look very disturbing. To illustrate this, take a look at this screenshot with a black sky for contrast:


A chunked distant look. Take one step and massive pop-in:


I improved this by lowering the view distance and adjusting fog parameters, making it so that distant chunks that pop in near the border of the visible area do so before they reach the far plane. View distance is worse, but pop-in is less noticeable. A far better solution for view distance is needed.

  • The bloom component is no longer an XNA Component and it has been moved to the library. It is now a plain class that implements the functionality without complying with any interface dictated by a framework. When, some day in a distant future, the engine moves away from XNA, its replacement won't have to emulate the XNA Component class in order for bloom to work. Or any other effect.

  • Terrain uses a world matrix. In the past the terrain vertices used absolute coordinates and the world matrix was set to identity. Now they are all local coordinates and the world matrix is used both for rendering and physics. It is a central point of the terrain now and if I want to do something with it there is just one entity to change. I did some tests and made the world centered around zero, but I am not sold yet on this solution. It might double the range of shadows because smaller numbers have less rounding error, but it makes the conversion from logical to physical coordinates and vice-versa more complicated and less intuitive. So for now I'm leaving the world corner at zero. This can also allow me to speed up chunk creation since local coordinates are the same for each chunk. The first creation takes exactly the same time, but further creations will be faster. Not implemented yet.
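For illustration, with chunk-local vertices the per-chunk world matrix is nothing more than a translation (the chunk size and grid layout here are assumptions, not the engine's actual values):

```csharp
using Microsoft.Xna.Framework;

static class ChunkPlacement
{
    // Place a chunk whose vertices are in local space by translating it into the world.
    public static Matrix ChunkWorldMatrix(int chunkX, int chunkZ, float chunkSize)
    {
        return Matrix.CreateTranslation(chunkX * chunkSize, 0f, chunkZ * chunkSize);
    }
}
```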

  • Implemented point lighting as seen in the previous post. Currently only one point light is used, for the character light source. I will detail the exact powers and abilities your character will have in a future post, but for now let's say that abilities will be physical and magical, and for the magical ones some will be selectable, like choosing that the hotkey '1' will trigger some spell you selected for that slot, and some will have a predefined key and behave more like an ability you have acquired. You will still have to invest skill points into these, but once you do, the special hotkey will become active. 'F' is your personal light, Shift is your skiing ability, 'B' your blink spell and so on. The number of intrinsic hotkeys won't be that large and they are used to portray the special nature of your character, who can naturally do out of the ordinary stuff and has superhuman mobility. The light spell actually causes your body to emit light. It will be upgradeable with different abilities, like threat highlighting and "stealth light", allowing you to sneak with the light on at the cost of mana. If you don't want this power, you will have to carry a torch with you. Torches will expire and their use takes up one free hand.

  • Created new entity hierarchy to represent mesh instances. I am still not happy with the design. It is very hard to come up with a general solution that has near perfect batching at the same time. A few more redesigns will probably come, but the Draw method of the engine is a lot cleaner now.

  • Cleaned up render target support in the engine. Things are still not perfect because with 3 different AA options, bloom and post processing effects, the number of permutations is very high and some combinations still don't work. This being an engine, you can turn on any of the effects and their associated render targets on the fly, and if you don't wish to have 50 render targets but instead want to reuse them cleverly, this becomes quite complicated.

  • You can now change the level of MSAA on the fly. There is a debug key that goes through 0, 2, 4, 8 or 16x MSAA. The debug text displays the intended level of MSAA and the actual one if the GPU can't handle it. It also displays the type of post-processing AA used. You can have both at the same time for ultimate visual quality, but I don't recommend MSAA + FXAA. Or FXAA. Period. Keyboard handling has been split in two parts, one handling the debug functionality of the engine and one for gameplay. In the final game you probably won't want a hotkey for turning on bloom or wireframe, so you just turn off/fine tune the debug keys. Debug keys have been restructured to use Ctrl-Fn keys so they don't interfere with normal game keys. Some exceptions are present, like Alt-Enter to toggle fullscreen. The text is also less verbose, showing information only if it is relevant. A mode to choose FXAA/SMAA quality still needs to be added. I'm still pretty sure that you can't have CSAA with XNA, but I need to confirm it. And see about TXAA.
Here are two random screens with MSAA + SMAA:



Tuesday, March 5, 2013

Screens of the day 32 - Oh god, my wrists

Let me show you some results from Friday. They are kind of late. Here's the thing: I've been training at ever increasing intensity since late October, and last week I had one of the most intense workouts ever, counting the number of different exercises. I was supposed to have this week off to recuperate, but instead I started my handstand training and, besides that, this week is shaping up to be the second most intense one, so I was in no mood to post or work.

For my game I want the following lighting scenarios:
  • A directional light simulating the sun/bright moon, used for day/bright night scenes, plus several point lights. This will produce very good visibility and the difference between shaded and unshaded areas will be small.
  • A directional light plus point lights simulating darker environments, where the values are tuned so you have good visibility only when there is light because there is a big difference between shaded and unshaded.
  • Ambient plus point lights, where ambient gives good visibility, but the main parts are lit by point lights. This will be default for interiors, especially houses, but also caves and other structures.
  • Point lighting. This will be for very dark interiors, that have absolutely no light sources. If a light is not shining on something, it will be pitch black. In order to not make it annoying, most places will not use this lighting, and when they do, there will be a good reason for it.
Last time I showed scenario A, but with the terrain not being illuminated by point lights. Let's fix this, but before that, let us take care of scenario D, which is the simplest one:


In the above image one white point light is illuminating your surroundings. Ignore the tree canopies. Canopies are rendered by a permutation of the shaders that alpha blends and I always update them only at the end, when the normal shaders are perfect.

Some other textured terrain seems to have more color:


Terrain does not have specularity because I don't want to pay the cost of specular mapping for the terrain yet, since it is already more than expensive enough to render. Without specular mapping, only with standard specularity, I find terrain to be far too unrealistic and mirror-like. Going to a grassy area gives even better color reproduction:


And finally, some soil:


The perception of light is not yet 100% identical for terrain and objects. If you look closely you may get the impression that there are two light sources with different intensities at the same position, one illuminating only objects and one illuminating only terrain. This needs to be tightened up, but I am generally happy with the results.

Now let's get back to scenario A:


This is a bright sunny day where you are holding a bright-ass light source. This plus bloom makes for the ridiculous brightness. Again, in game you will rarely turn on a light source outside, but when you do, it will work.

And a couple more shots showing lighting in action:



Now I need to add a hotkey to turn your light source on or off. I'm still not 100% sure about the equations. There is also a very annoying part where the theory says 1 - N, but this gives bad results, so I am using 1 / N. Taking into consideration my educational background, this is driving me crazy. I need to write up a mathematical proof that one is correct and the other one is not.

After that I need to figure out how many point lights you can squeeze into a single pass. At least 8 with unrolled loops. And speaking of loops, why can't XNA set a single index of an array-of-structures shader variable?
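The usual workaround I know of is to avoid arrays of structures altogether and keep parallel arrays that can each be uploaded with a single SetValue call; a sketch with invented parameter names:

```csharp
using Microsoft.Xna.Framework;
using Microsoft.Xna.Framework.Graphics;

// Sketch only: parallel arrays instead of an array of structs in the shader.
// The effect parameter names are made up.
class PointLightUploader
{
    public const int MaxPointLights = 8;

    Vector3[] positions = new Vector3[MaxPointLights];
    Vector3[] colors    = new Vector3[MaxPointLights];
    float[]   radii     = new float[MaxPointLights];

    public void Upload(Effect effect, int activeLights)
    {
        effect.Parameters["PointLightPositions"].SetValue(positions);
        effect.Parameters["PointLightColors"].SetValue(colors);
        effect.Parameters["PointLightRadii"].SetValue(radii);
        effect.Parameters["ActivePointLights"].SetValue(activeLights);
    }
}
```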

PS: I've also run into an interesting issue. A version of the engine compiled on some other machine runs like a charm on my new, giga beefy home computer. The engine compiled locally (on Windows 8) gives horrible results, with terrible aliasing, as if the render targets are scaled to some arbitrary size without any filtering, and horrible performance. I need to figure out if I broke the code or the build environment is to blame.

Friday, March 1, 2013

I have no idea what I'm doing

After further investigation I realized that I did not fix anything with the new Blender export options, I just rotated the issue 90 degrees.

And I think I found something! Here is a barrel illuminated by a red point light with blue specular highlights that is positioned at the camera (like a head mounted light):


Looks OK to me. Highlights seem to be in the correct places and diffuse illumination seems to taper off as it should. The cap of the barrel gets very little light. As we get closer, the cap starts receiving as much light as the sides:


And the second problem case:


The barrel near the boulder has a roughly uniform light intensity. Neither the sides nor the cap flare up with bright light while the rest remains black. And as we move closer, but on a diagonal path:


Again, seems correct-ish. And finally:


Here the cap of the barrel is almost perpendicular to the light rays so it is darker.

The point light system is of course not meant to only serve as a mobile light source; you can also pin it down to a specific position. Here is a shot with the light pinned down somewhere to the left of the camera at a fair distance:


And now let's see the whole illumination scheme in action:


Terrain is not illuminated in that shot by the point light, but otherwise we have the standard directional light that represents the sun, spherical harmonics and a point light. The point lights are meant as an added highlight so they don't do advanced stuff (like another spherical harmonics application). And the shot is very bright, but this is normal. The point lights are meant to illuminate dark scenes. Interiors and caves. Turning on a powerful light in full sunny conditions is not the way you are supposed to explore the world.

And finally a shot with tons of barrels:


I'll consider this issue fixed for now, but my overall confidence in the shaders is very low right now. The original shaders that I've been using were done in a straightforward way, implementing the equations from the Blinn-Phong model and still they did not work correctly. Now I am using similar but different equations that seemed correct to me. They may not be. I will be keeping a very close eye on everything the engine renders for the next two weeks to make sure that at least all major inconsistencies and eye-sores are dealt with.

Meanwhile, I need to add point light support to the terrain and figure out exactly how many point lights I can fit into the shader in a single pass.

A long time ago, when I was far less experienced, I tried deferred rendering. It worked fine but I was unable to fine-tune it for my needs. Before this point light fiasco, I thought about giving it another go. But right now I think this would be a mistake because...