Framework 4 (Last updated: October 25, 2019)Framework 3 (Last updated: February 6, 2017)Framework 2 (Last updated: October 8, 2006)Framework (Last updated: October 8, 2006)Libraries (Last updated: September 16, 2004)Really old framework (Last updated: September 16, 2004)
Saturday, February 28, 2009 | Permalink
One of the least advertised features of shader model 4.0 is that temporaries are indexable. This means you can declare an array in your shader and arbitrarily read and write to it. This opens up for lots of interesting algorithm. What this demo does is to use this feature to do recursion. An array of temporaries is used to emulate a stack, together with an index that's used as the stack pointer. The actual algorithm implemented recursively here is perhaps not the most exciting, it's just a simple tessellation of a sphere. More interesting is that fact that it's possible.
A couple of notes about the implementation. HLSL does not support recursive function calls. If it did, the code would have looked very straightforward. In fact, HLSL hardly ever creates function calls at all. The only way I've found to convince it to do so is to use a switch case with the [call] directive. And even in that case it only works from main(), not from any subroutine. So what I had to do was to emulate the whole function calling mechanism manually in a while-loop.
The end result is a shader that does tessellation up to three levels. It could've gone much higher if geometry shaders didn't have an output limitation of 1024 floats. The demo renders the base geometry with tessellation levels 0, 1, 2 and 3.
It should be noted that while indexing into temporaries is cool and sometimes also convenient it's a feature that should be used sparingly because the performance implications could be very nasty. One reason is that it's harder for the compiler to optimize since indexed reads and writes are hard to analyze. The other reason is that the number of temporaries your shader needs could go up dramatically, which reduces your GPU's ability to hide latencies. Even a small array could easily double or triple the temporary count your shader needs.
One cool thing about recursion though is that you don't need a very deep stack to get a lot of stuff done. In this demo I only need a stack of 12 registers. The shader as a whole needs 25 registers according to GPU Shader Analyzer. One could have implemented the same algorithm in a flat matter where it would tessellate the results from the previous pass in a loop. However, this would require that the array is large enough to fit the entire results of a pass, which in this demo would've meant 192 registers.
This demo should run on Radeon HD 2000 series and above and GeForce 8000 series and above.