"Capitalism is neither evil or good, but it's an excellent tool to save the world."
- Birgitta Ohlsson (fp)
More pages: 1 2 3 4 5 6 7 ... 11 ... 12
Framework 3 (Last updated: June 26, 2012)
Framework 2 (Last updated: October 8, 2006)
Framework (Last updated: October 8, 2006)
Libraries (Last updated: September 16, 2004)
Really old framework (Last updated: September 16, 2004)
Deferred shading 2
Tuesday, October 21, 2008 | Permalink

Executable
Source code
DeferredShading2.zip (1.2 MB)

Required:
Direct3D 10.1
While my previous deferred shading demo showed deferred shading at its best using an extreme amount of lights this demo shows a more real-world scenario. Hence it's also implemented differently.

The main bullet point of this demo versus the previous one is that multisampling is supported. In D3D10 we have full access to the samples in multisampled surfaces, and in D3D10.1 this is extended to depth buffer as well; however, it's still tricky to make it work and perform well. Since the geometry for the lighting (which in this demo is a sphere) is separated from the underlying buffers we don't automatically get any information about which pixels belong to edges and need to be evaluated for all the samples and which pixels only need a single sample evaluated. This demo separates these two cases using a stencil mask.

First the scene texture, normal and depth is rendered into multisampled render targets. Pixels that are partly covered are detected using centroid sampling on the SV_Position register. If the sampling position shifted from the center, it's an edge pixel. This is written to the alpha channel.
An ambient lighting pass is then rendered as a fullscreen pass to the backbuffer (which is not multisampled) and the resolved edge flag comes out in the alpha channel.
The backbuffer is then sampled as a texture in a pass to create a stencil mask from the alpha channel by writing 1 to stencil and discarding fragments where alpha is zero.
The lighting is then rendered as two passes using the stencil mask to select pixels. One pass with a simple shader that only uses the first sample to compute the lighting, which is used for the vast majority of the pixels, and one that computes the lighting for all samples, which is used for edge pixels.

Using keys F5-F7 you can compare this technique (F5) to the naive approach of evaluating the samples on all pixels (F6) and that of using single sample shader on all pixels (F7). On F8 you can see what's in the stencil mask.

This demo should run on the Radeon HD 3000 series and up. Since this demo uses Direct3D 10.1 you need to have Windows Vista SP1 installed.

Interior Mapping
Sunday, August 3, 2008 | Permalink

Executable
Source code
InteriorMapping.zip (1.4 MB)
InteriorMapping.7z (926 KB)

Required:
Direct3D 10.1
Interior mapping is a technique that renders the interior of buildings without requiring additional geometry. Everything is done in the pixel shader. The advantage is that when you're rendering large cities you can draw all those buildings as simple boxes.

The idea was to my knowledge first proposed here.

My implementation is somewhat different. To begin with, I represent the room with a cubemap. In DX10 this has the advantage that it'll filter across cubemap faces, making the edge between walls/floor/ceiling antialiased, unlike the original implementation. It also means you only need one texture lookup, rather than separate for floor planes, walls and ceiling. Finally, with cubemap arrays in D3D10.1 it's also possible to have several different types of rooms which can be selected by the shader. I have implemented this, although due to lack of artwork (it was time consuming enough to produce a single room) I'm simply shuffling the walls around in 8 different combinations.

This demo should run on the Radeon HD 3000 series and up. Since this demo uses Direct3D 10.1 you need to have Windows Vista SP1 installed.

GPU Texture Compression 2
Sunday, July 6, 2008 | Permalink

Executable
Source code
GPUTextureCompression2.zip (380 KB)

Required:
Direct3D 10.1
This demo is very similar to the first GPU Texture Compression demo. The difference is that this demo compresses to DXT1 or BC1 as it's called in DX10. BC1 textures have a wider range of uses than BC4 because they are color rather than grayscale, but on the other hand they are more complex to compress to. This demo implements a fairly quick and quite decent compression which is good enough for realtime use. The tricky part in DXT1 is to choose the two base colors for a tile. The algorithm basically looks for which channel of RGB has the greatest contrast and uses that as the base. Then it does a least squares fitting of the other two channels against the base channel to find the most suitable values. Once the base colors are chosen it loops over the 16 colors and matches each color to whichever of the two base colors or the two interpolated colors is the closest.

In addition this demo has a more trivial implementation which simply uses the maximum and minimum of all channels as the base colors. It produces surprisingly useful results. I would have expected nasty discolorations, but this appears to not be nearly as much of a problem as one might have thought. Quality is somewhat lower than the above method though, but on the other hand it's somewhat faster as well. You can toggle between the two with F6.

This demo should run on the Radeon HD 3000 series and up. Since this demo uses Direct3D 10.1 you need to have Windows Vista SP1 installed.

GPU Texture Compression
Saturday, April 12, 2008 | Permalink

Executable
Source code
GPUTextureCompression.zip (335 KB)

Required:
Direct3D 10.1
Compressed textures is one of the most important features and have saved us a lot of memory and bandwidth over the years. However, its use has long been practically limited to static textures only because the compression is slow and there's no compression hardware, just decompression. Even with a simple compression algorithm the bottleneck of transferring GPU data to the CPU for compression and then transferring compressed data back makes it impractical for most uses to compress render targets.

Direct3D 10.1 introduced the ability to do a copy to block compressed formats from integer formats of the same size as a compressed block. For instance, a 256x256 RG32_UINT texture can be copied into a 1024x1024 BC4 texture. This allows an application to write a compression shader rendering to an integer render target and then copying the results into a compressed texture and thus keeping the entire texture compression process on the GPU.

This demo compresses a texture (although static) on the GPU to a BC4 format. Because this is a single channel texture I also use the Gather texture fetch method (AKA Fetch4) that was also introduced in Direct3D 10.1 to reduce the number of texture fetches from 16 to 4. The shader is surprisingly short at 49 ALU instructions on a HD 3870, which for a 4x4 block is barely over 3 instructions per pixel.

For render targets that stay the same across many frames it's likely that compressing it will improve performance. Even compressing every frame is definitively practical, although whether you get a performance increase or not depends on how many times you read it for each compression cycle. Another interesting use is to avoid the harddrive bottleneck when reading textures from disk, which you can do by storing textures as jpegs on disk and recompressing to a GPU format on the fly. Having it on the GPU ensures the compression process will be fast, although an implementation would be likely to do the jpeg decompression on the CPU.

This demo should run on the Radeon HD 3000 series. Since this demo uses Direct3D 10.1 you need to have Windows Vista SP1 installed.

Custom resolve
Thursday, March 27, 2008 | Permalink

Executable
Source code
CustomResolve.zip (3.3 MB)
CustomResolve.7z (2.4 MB)

Required:
Direct3D 10
One of the more interesting features of D3D10 is access to individual samples in a multisampled texture. This has a lot of uses and one of the most obvious is to implement a custom resolve of the multisampled buffer. This allows an application to combine the samples in the buffer in any way it likes in a shader. One reason an application may want to do that is because of HDR rendering. Any decent tonemap operator will have a non-linear response, which makes regular resolves that average samples produce low quality antialiasing. In fact, even with just relatively modest contrast ratios the effect of antialiasing will diminish or even disappear. The solution to this problem is to make a custom resolve shader that tonemaps all samples individually and average the results after tonemapping, rather than before, which will be the case with a regular resolve.

For a more in-depth explanation of this technique check out my article "Post-tonemapping resolve for high quality HDR antialiasing in D3D10" in ShaderX6.

This demo should run on Radeon HD 2000 series and above and GeForce 8000 series and above.

Order Independent Translucency
Tuesday, December 11, 2007 | Permalink

Executable
Source code
OrderIndependentTranslucency.zip (2.1 MB)

Required:
Direct3D 10
This demo implements order independent translucency using a Stencil Routed A-Buffer, which is explained in this exemplarily short paper. The basic idea is that you hijack MSAA for the purpose of storing multiple incoming fragments in each pixel. Each stencil sample is initially cleared to a separate value and then the geometry is rasterized with multisampling disabled. The stencil test will then route the incoming fragments one by one to each of the available samples, as long as there are unwritten samples available. This allows up to 8 layers to be stored with current hardware. Once the buffer has been filled it is sorted in back-to-front order and blended. The paper suggests using bitonic sort; however, this demo uses odd-even mergesort instead because that requires fewer compare-and-swap operations to be performed (19 instead of 24).

This demo uses D3D10, but it could have benefited from D3D10.1 in at least two ways. First of all, multisampled depth buffers can't be used for texturing in D3D10, so this demo uses a separate render target for this purpose. Also, now the depth bits of the depth-stencil buffer is entirely unused, which is wasteful. Secondly, and probably more important, is that multisampled buffers can't be CopyResource'd in D3D10. Currently a significant chunk of the frame time is consumed just initializing the stencil buffer. A better way to handle this would be to initially set up a stencil clear-buffer just once, and then clear the active stencil buffer by copying that stencil clear-buffer into it. A copy is likely a good deal faster than 8 fullscreen passes with a sample write mask, which is required now.

The advantage of using this technique compared to depth peeling is that you don't need multiple passes. If you want up to 8 layers, you need 8 passes with depth peeling, whereas this technique only needs a single pass, plus a sort pass. The disadvantage is that once the buffer is full fragments will be discarded. If you limit yourself to 8 layers with depth peeling, you'll get the 8 front-most layers, whereas with this technique the layers are destroyed based on the order they arrived, meaning that the front-most layer could be the one that got discarded if you're unlucky, which of course is a lot more disturbing than dropping layers in the back, which may not be noticable at all.

This demo should run on Radeon HD 2000 series and up and GeForce 8000 series and up.

Deep deferred shading
Monday, November 26, 2007 | Permalink

Executable
Source code
DeepDeferredShading.zip (3.9 MB)
DeepDeferredShading.7z (3.3 MB)

Required:
Direct3D 10
One of the main drawbacks of deferred shading is that it doesn't handle blended objects very well, and typical rendering scenes include at least some translucent objects. The usual solution to this problem is to render translucent objects using traditional forward rendering on top of the deferred rendered scene. This is obviously far from ideal since it forces you to write and maintain a forward rendering path in addition to the deferred shading path, and it also has negative performance implications.

This demo shows a way to solve this problem such that translucent objects can be used with deferred rendering. This is done by using a deep buffer approach. Instead of having a single buffer containing the attributes such as diffuse and normal this technique stores several layers of these attributes in a texture array. Similarly, the depth buffer is also a texture array. In this demo three layers are used, which allows for up to two translucent layers in front of an opaque surface. More layers may be needed in more complex scenes; however, it would still typically suffice to maintain only a few layers. The different layers are extracted using depth peeling.

In the deferred lighting passes the lighting contribution from each layer is computed. However, the cost will not multiply with the number of layers, but instead be only somewhat more expensive than traditional deferred shading since we can stop once we hit an opaque surface, and typical scenes will have a lot more opaque surfaces than translucent, so the majority of the pixels will only have to evaluate the first layer.

The drawback of this technique is naturally in the memory consumption. Deferred shading already consumes a large amount of memory for the render targets, and this technique multiplies that with that number of layers. This may not be much of an issue on PC, but on consoles it could be more problematic. The cost of the light rendering passes also increases, although not too bad. However, the cost of filling the buffers in the beginning increases by a larger number.

This demo should run on Radeon HD 2000 series and up and the GeForce 8000 series. Since it's Direct3D10, Vista is also required.

Deferred shading
Tuesday, October 23, 2007 | Permalink

Executable
Source code
DeferredShading.zip (973 KB)

Required:
Direct3D 10
A technique gaining increasing attention these days is deferred shading. The main idea behind deferred shading is that you initially fill a set of buffers with common data, such as diffuse texture, normals and various material properties. Then for the lighting you just render the light extents and fetch data from these buffers for the lighting computation. The main advantage of this technique is that it decouples lighting from the geometry. In regular forward rendering you have to resubmit all the geometry that's within the light radius to add another light. This might include changing a lot of states and shaders and issuing numerous draw calls. With deferred shading only a single call is neccesary, in fact, you can apply several lights with a single draw call. This makes it scale much better than forward rendering when the number of lights increases. On the downside, it typically consumes more memory, bandwidth and shader instructions than forward rendering.

This demo takes deferred shading to the extreme. A particle system in generated entirely on the GPU and spews loads of particles in all directions, and every particle is a light source. Hence in this demo there are 1024 light sources active at once. Yet, the performance is in the order of hundreds of frames per second.

The geometry shader is used in this demo to compute the light bounding boxes and each visible light is drawn as a rectangle in clip-space. Note that the extents in z direction is computed as well. This allows the pixel shader to skip computations where the stored depth value is not in range. This eliminates a lot of unlit pixels.

This demo should run on Radeon HD 2000 series and GeForce 8000 series on Vista.

More pages: 1 2 3 4 5 6 7 ... 11 ... 12