"September 11, 2001, already a day of immeasurable tragedy, cannot be the day liberty perished in this country."
- Judge Gerald Tjoflat
New particle trimming tool
Thursday, May 21, 2009 | Permalink

GPUs continue to get more powerful at an amazing rate. The performance boost is not evenly distributed though. While ALU power is shooting through the roof, some other parts of the chip are lagging behind, most notably the ROPs. A Radeon HD 4890 has 16 ROPs, which is the same as even an old X800XT. This means that the only fillrate boost comes from higher clocks. So while the HD 4890 has roughly 15x the raw ALU power (if my math is right), it only has 70% higher fillrate than the X800XT. This of course makes sense since we're no longer in a "more pixels" race; it's all about "better pixels" these days. Monitor resolutions aren't going up very fast. In fact, we had a drop with the transition from CRT to LCD, and I suppose we're now roughly back to where we were before.

The relatively low ROP performance can be a problem though when dealing with things like particle systems, which can fill large portions of the screen with a lot of overdraw and often only use a trivial shader. To deal with this, people have come up with ideas such as rendering to a smaller buffer and then merging the result with the main buffer, but there are also simpler solutions like simply using fewer and denser particles.

Particles are often rendered as simple quads. This is particularly wasteful of our limited ROPs. It may not seem all that bad, but a relatively normal particle texture can contain a surprisingly large amount of empty space. Take for instance this smoke puff I found in the darkest corners of my texture collection. It may look like less, but half of it is actually empty space, which means that we're rendering twice as many pixels as we have to.

So the first approach at dealing with this situation is to shrink the quad to the bounding box of the particle. It may look like a small gain, but that's actually a saving of over 30%. However, most particles aren't shaped like aligned rectangles, so we can optimize this further. So this afternoon I wrote a tool that takes a particle image as input and generates an optimized polygon enclosing the particle. It can be downloaded from the Cool stuff section. How much difference does it make to use an optimized polygon over the naive approach? Quite a lot actually. In fact, the triangle case above is nearly as efficient as the aligned rectangle, whereas the optimized quad shaves off roughly another 9 percentage points. There's of course a diminishing return from using more vertices, but depending on the shape of the particle, it could be worth going higher. This is the result from this particular smoke puff:

Original: 100%
Aligned rect: 69.23%
Optimized 3 verts: 70.66%
Optimized 4 verts: 60.16%
Optimized 5 verts: 55.60%
Optimized 6 verts: 53.94%
Optimized 7 verts: 52.31%
Optimized 8 verts: 51.90%
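
As a rough illustration of where an "aligned rect" number like the one above comes from, here is a minimal C++ sketch that finds the tight bounding box of all pixels above an alpha threshold and reports how much of the original quad it still covers. The Rect struct, the function names and the flat 8-bit alpha array are assumptions made for this example, not the tool's actual code.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical type for illustration: inclusive pixel bounds.
struct Rect { int x0, y0, x1, y1; };

// Tight bounding box of all pixels whose alpha exceeds `threshold`.
// If no pixel passes the test, the result is empty (x1 < x0).
Rect alphaBounds(const std::vector<std::uint8_t>& alpha, int w, int h,
                 std::uint8_t threshold) {
    Rect r{w, h, -1, -1};
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            if (alpha[y * w + x] > threshold) {
                r.x0 = std::min(r.x0, x);
                r.y0 = std::min(r.y0, y);
                r.x1 = std::max(r.x1, x);
                r.y1 = std::max(r.y1, y);
            }
        }
    }
    return r;
}

// Fraction of the full w x h quad that the trimmed rect still covers.
double coveredFraction(const Rect& r, int w, int h) {
    if (r.x1 < r.x0) return 0.0;  // nothing visible at all
    const double area = double(r.x1 - r.x0 + 1) * double(r.y1 - r.y0 + 1);
    return area / (double(w) * double(h));
}
```

The quad's positions and texture coordinates would then be scaled to the returned bounds, so fewer pixels are rasterized while the visible part of the texture stays where it was.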

So how is the tool doing its magic? First the source image is thresholded so that you have a bitmap of zeros and ones. Then it loops through the pixels again and adds all points that are ones into a convex hull that's continuously updated as new points are added. In order to not throw tens of thousands of points at it I'm doing a basic check with neighboring pixels so I'm only adding potential corner pixels. If the number of neighbors that are set is 5 or more, the pixel is either in the middle of a straight edge or at a corner that's pointing inwards, and thus is not added. This reduces it to hundreds of points to consider. After this pass we have a convex hull enclosing the particle. For this smoke puff it had 23 edges in the end, so it appears to not get overly complex in normal cases. Finally it goes over all combinations of the edges for the provided target vertex count, checks for each combination that a valid fully enclosing polygon can be constructed with those edges, and among all valid ones picks the one that has the smallest area. As always, the source code is included.
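
The first two stages described above (threshold, filter by neighbor count, build the hull) could be sketched roughly as follows. This is not the code from the download: the Pt struct and function names are made up for the example, pixel centers stand in for pixels (so the hull does not account for the full extent of each pixel), and the hull is built in one go with Andrew's monotone chain instead of being updated incrementally as points are added.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Pt { long long x, y; };  // hypothetical point type (pixel center)

// Cross product of (a - o) and (b - o); positive for a counter-clockwise turn
// in a y-up coordinate system.
static long long cross(const Pt& o, const Pt& a, const Pt& b) {
    return (a.x - o.x) * (b.y - o.y) - (a.y - o.y) * (b.x - o.x);
}

// Threshold the alpha channel and keep only set pixels that have fewer than
// 5 set neighbors out of 8, i.e. the potential corner pixels.
std::vector<Pt> cornerCandidates(const std::vector<std::uint8_t>& alpha,
                                 int w, int h, std::uint8_t threshold) {
    auto isSet = [&](int x, int y) {
        return x >= 0 && y >= 0 && x < w && y < h && alpha[y * w + x] > threshold;
    };
    std::vector<Pt> pts;
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            if (!isSet(x, y)) continue;
            int neighbors = 0;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    if ((dx != 0 || dy != 0) && isSet(x + dx, y + dy)) ++neighbors;
            if (neighbors < 5) pts.push_back({x, y});
        }
    }
    return pts;
}

// Convex hull via Andrew's monotone chain (a substitute for the incremental
// hull in the post). Collinear points are dropped.
std::vector<Pt> convexHull(std::vector<Pt> p) {
    std::sort(p.begin(), p.end(), [](const Pt& a, const Pt& b) {
        return a.x < b.x || (a.x == b.x && a.y < b.y);
    });
    if (p.size() < 3) return p;
    std::vector<Pt> hull(2 * p.size());
    std::size_t k = 0;
    for (std::size_t i = 0; i < p.size(); ++i) {  // lower chain
        while (k >= 2 && cross(hull[k - 2], hull[k - 1], p[i]) <= 0) --k;
        hull[k++] = p[i];
    }
    for (std::size_t i = p.size() - 1, t = k + 1; i > 0; --i) {  // upper chain
        while (k >= t && cross(hull[k - 2], hull[k - 1], p[i - 1]) <= 0) --k;
        hull[k++] = p[i - 1];
    }
    hull.resize(k - 1);  // the last point repeats the first
    return hull;
}
```

For a filled 3x3 block of pixels, the neighbor test keeps only the four corner pixels (3 set neighbors each); the edge pixels have 5 and the center has 8, so they never reach the hull.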

I haven't tested this tool extensively yet, so it won't surprise me if there are bugs or if it's not 100% robust, but it seems to work very nicely and provides the resulting polygon as good as instantaneously with the textures I have tested. I also haven't proven that the result is actually the optimal solution, but if it's not, it is probably close to optimum in real world scenarios.

If anyone finds this tool useful and plans to use it, or has any feedback, just let me know.




Friday, May 22, 2009

We had been cropping like this to produce a general quad with 4 corners (what you call the optimized 4 verts case), saving these 4 values in a uniform buffer along with the texture data for the particle system. We have a tool like yours to calculate the convex hull and the optimal quadrilateral.
Then the GS used these values as UVs and as position offsets.
This approach gave us about a 15-20% speedup in high-fillrate scenarios.

Friday, May 22, 2009

So how does this stack up with using Discard in the fragment shader?

Eric Haines
Friday, May 22, 2009

Very cool, thanks! As particles get smaller, the vertex shader eventually becomes more important. Trimming quads is always going to be a win (no extra vertices). Any sense of what the break-even size point is for, say, 8 vertices? It is likely to be slower than a quad if the particle covers very few pixels.

My guess is that particles have to be tiny to make 8 vertices perform worse, but you never know without actual testing... Even then, of course, GPUs vary; but it would still be nice to know, to have some stick in the sand. I hope you'll give it a try and let us know.

John Burnett
Friday, May 22, 2009

Awesome - thanks for sharing! As a sidenote, while looking at the code, it seems like there's a copy/paste bug on line 91? ("const int h = img.getWidth();" should be "getHeight()"?)

Saturday, May 23, 2009

Overlord, using discard in the fragment shader is unlikely to speed things up at all. In fact, it's more likely to slow you down. Unless of course you have a very heavy pixel shader.

Eric, as you say, the particle would have to be tiny. Exactly how tiny is hard to say without any actual test. I'd guesstimate it to be a few hundred pixels or so.

John, thanks for catching this. I didn't catch it because I've only tested on square textures. I'll make an updated tool tomorrow with a few other minor improvements as well.

Saturday, May 23, 2009

I should add that while this tool finds something that intuitively would be the optimal solution (without any particular proof for this being the case), or at least close to it, this is just for convex polygons. There can of course be concave solutions that are better. For instance a tree impostor with its trunk and top.

I also found a case where performance really suffered: an entirely circular particle. It'll generate a very high number of edges in the convex hull. Up to six vertices ran on the order of seconds, but my brute force approach really slowed down at seven vertices, which took almost a minute. The eight-vertex case would probably be like an hour then. I gave up after a few minutes and generated that manually instead.
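
To make the blow-up concrete, here is a rough sketch of a brute-force search of this kind. It is only an approximation of the idea, not the tool's source: Vec2, SearchResult and minAreaEnclosing are hypothetical names, and the hull is assumed to be counter-clockwise with no collinear vertices. With n hull edges and a target of k vertices it visits every one of the C(n, k) edge subsets, which is why a near-circular particle with many hull edges gets expensive so quickly as k grows.

```cpp
#include <algorithm>
#include <limits>
#include <vector>

struct Vec2 { double x, y; };  // hypothetical 2D point type

static double cross(Vec2 a, Vec2 b) { return a.x * b.y - a.y * b.x; }

struct SearchResult {
    double area = std::numeric_limits<double>::infinity();
    std::vector<Vec2> polygon;            // stays empty if nothing valid is found
    unsigned long long subsetsTried = 0;  // equals C(n, k) after a full search
};

// Picks k of the hull's n edges, extends them to lines, and intersects
// neighboring lines to get a k-gon that encloses the hull. Keeps the smallest.
SearchResult minAreaEnclosing(const std::vector<Vec2>& hull, int k) {
    SearchResult best;
    const int n = static_cast<int>(hull.size());
    if (k < 3 || k > n) return best;

    std::vector<int> pick(n, 0);
    std::fill(pick.begin(), pick.begin() + k, 1);  // first subset: edges 0..k-1
    do {
        ++best.subsetsTried;
        std::vector<int> e;  // indices of the chosen edges, in hull order
        for (int i = 0; i < n; ++i) if (pick[i]) e.push_back(i);

        std::vector<Vec2> poly;
        for (int j = 0; j < k; ++j) {
            const int a = e[j], b = e[(j + 1) % k];
            const Vec2 p = hull[a], q = hull[b];
            const Vec2 d{hull[(a + 1) % n].x - p.x, hull[(a + 1) % n].y - p.y};
            const Vec2 f{hull[(b + 1) % n].x - q.x, hull[(b + 1) % n].y - q.y};
            const double denom = cross(d, f);
            // A turn of 180 degrees or more cannot close into a bounded polygon.
            // The epsilon assumes coordinates in pixel units.
            if (denom <= 1e-9) break;
            const double t = cross(Vec2{q.x - p.x, q.y - p.y}, f) / denom;
            poly.push_back(Vec2{p.x + t * d.x, p.y + t * d.y});
        }
        if (static_cast<int>(poly.size()) != k) continue;  // invalid subset

        double twiceArea = 0.0;  // shoelace formula
        for (int j = 0; j < k; ++j)
            twiceArea += cross(poly[j], poly[(j + 1) % k]);
        if (0.5 * twiceArea < best.area) {
            best.area = 0.5 * twiceArea;
            best.polygon = poly;
        }
    } while (std::prev_permutation(pick.begin(), pick.end()));
    return best;
}
```

For an 8-edge hull and k = 4 this tries C(8, 4) = 70 subsets. The count grows combinatorially with the number of hull edges, so reducing the hull first (for example by merging nearly collinear edges) would be one way to keep a circular particle tractable, at the cost of a slightly looser fit.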

Saturday, May 23, 2009

Have you checked actual speedup? Something that might counteract the speedup is the extra triangles generate extra internal edges. GPUs shade in at least 2x2 blocks, and some have larger internal groupings - 4x2, 4x4. At an internal edge, these blocks can be hit twice (one per tri). In some cases the GPU can coalesce them again, but not always. It would be interesting to see the falloff.

Sunday, May 24, 2009

TomF, yeah, the speedup is pretty much exactly the same as the reduction in area, but then most of the particles are large. If you're using smaller particles the gain may not be nearly as linear, or there may be no gain at all. I guess it could be a good idea to use fewer vertices for smaller particles, perhaps using the GS.
