"It is more blessed to give than to receive."
- Acts 20:35
More pages: 1 2
Wednesday, July 22, 2009

The key to getting good performance is properly measuring it. This is something that every game developer should know, but far from every academic figure understands it. When a paper is published on a particular rendering technique, it's important to me as a game developer to get a good picture of what the cost of using this technique is. It bothers me that the performance part of the majority of technical papers is poorly done. Take for instance this paper on SSDO. This paper isn't necessarily worse than the rest, but just an example of common bad practice.

First of all, FPS is useless. Gamers like it, and it often shows up in benchmarks, but game developers generally talk in milliseconds. Why? Because frames per second by definition includes everything in the frame, even things that are not core components of the rendering technique. For SSDO, which is a post-process technique, any time the GPU needs for buffer clears, main scene rendering etc. is irrelevant. But it's part of the fps number, and you don't know how much of the fps number it is. Therefore a claimed overhead of 2.4% for SSDO vs. SSAO, for instance, is meaningless. The SSDO part may actually be twice as costly as SSAO, but large overhead in other parts of the scene rendering could hide this fact.

For instance, if the GPU needs 12 milliseconds to complete main scene rendering and 0.5ms to complete SSAO, that'll give you 12.5ms in total, for a framerate of 80 fps. If SSDO increases the cost to 1ms, which would then be twice that of SSAO, the total frame runs in 13ms, for a framerate of 77 fps. A 4% higher cost doesn't seem bad, but it's the 0.5ms vs. 1ms that's relevant to me: twice as expensive, and thus a totally different picture of the cost/quality of it versus SSAO. Not that these numbers are likely to be true for SSDO though. This paper was kind enough to mention a "pure rendering" number (124fps at 1600x1200), which is rare to see, so at least in this case we can reverse engineer some actually useful numbers from the useless fps table, although only for the 1600x1200 resolution.

So main scene rendering is 1000/124 = 8.1ms, which is definitely not an insignificant number.
With SSAO it's 1000/24.8 = 40.3ms
With SSDO it's 1000/24.2 = 41.3ms

So the cost of SSAO is 40.3ms - 8.1ms = 32.2ms, and for SSDO it's 41.3ms - 8.1ms = 33.2ms
Therefore, SSDO is actually about 3% more costly than SSAO, rather than 2.4%. Still pretty decent, if you don't consider that their SSAO implementation obviously is much slower than any SSAO implementation ever used in any game. 32.2ms means it doesn't matter how fast the rest of the scene is, you'll still never get much above 30fps. Clearly not useful in games.

Another reason that FPS is a poor measure is that it skews the perception of what's expensive and not. If you have a game running at 20fps and you add an additional 1ms worth of work, you're now running at 19.6fps, which may not look like you really affected performance all that much. If you have a game running at 100fps and add the exact same 1ms workload you're now running at 90.9fps, a substantially larger decrease in performance both in absolute fps numbers and as a percentage.

Percentage numbers, btw, are also generally useless. When comparing two techniques in isolation, using only the actual time cost of each technique, a percentage can be valid. Like how many percent costlier SSDO is vs. SSAO (when eliminating the rest of the rendering cost from the equation). But when comparing optimizations or the cost of adding new features, it's misleading. Assume you have a game running at 50fps, which is a frame time of 20ms. You find an optimization that eliminates 4ms. You're now running at 16ms (62.5fps). The framerate increased by 25%, and the frame time went down by 20%. Already there it's a bit unclear which number to quote. Gamers would probably quote 25% and game developers -20% (unless the game developer wants to brag, which is not entirely an unlikely scenario). Now let's say the next day another developer finds an optimization that shaves off another 4ms. The game now runs at 12ms (83.3fps). The framerate went up 33% and the frame time was reduced by 25%. Looking at those numbers, it would appear that the second developer did the better optimization. However, both are equally good since they shaved off the same amount of work, and had the second developer done his optimization first, he would instead have gotten the lower score.

Another thing to note is that it's never valid to average framerate numbers. If you render three frames at 4ms, 4ms and 100ms you get an average framerate of (1000/4 + 1000/4 + 1000/100) / 3 = 170fps. If you render at 10ms, 10ms and 10ms you get an average framerate of 1000/10 = 100fps. Not only does the latter complete in just 30ms versus 108ms, but it also did so hitch free, yet gets a lower average framerate.

The only really useful number is time taken. Framerate can be mentioned as a convenience, but the performance chapter of any technical paper should report its results in milliseconds (or microseconds if you so prefer). There's really no excuse for using fps. These days there are excellent tools for measuring performance, so if the researcher would just run his app through PIX he could separate the relevant cost from everything else and provide the reader with accurate numbers.




Thursday, July 23, 2009

We can also plot FPS vs Frame Time, and we'll see that it's a non-linear function.

here: http://www.mvps.org/directx/articles/fps_versus_frame_time.htm

Thursday, July 23, 2009

From my experience of using PIX, I can say it is only really usable on the Xbox. On Windows, chances are it will crash, deadlock or simply work incorrectly. Simply speaking, it only works on simple, mostly error-free applications.
For OpenGL the situation is even worse. I do not know of any such free tool.

Thursday, July 23, 2009

Sopyer, I can only say that that is your experience.
PIX can take up more resources than a game normally does, but I've profiled a number of games and it worked perfectly fine.
You're not saying that Crysis, Mirror's Edge, Portal and CoD4 are simple applications, are you?

Thursday, July 23, 2009

I'm not saying they are simple. But when you support legacy code with potential D3D errors, you have a better chance of running into trouble. That's not what you should expect from a debugging instrument. I've experienced this numerous times with different DXSDKs. PIX does not support MRT. It has problems with mesh presentation. Support for DX10 is awful. I'm not saying it's useless, but it heavily depends on your application. And I want to stress this once more: PIX on Xbox and PIX on PC are two very different versions. I do not want to start a flame war. But measuring time in graphics applications is not an easy topic. I just disagreed with the thought that PIX can help.

Thursday, July 23, 2009

I'm running our engine through PIX on a regular basis, and while I agree that it can be unstable at times, it mostly works fine. When PIX is unstable it's usually because I'm on a CrossFire/SLI system, including any "X2" type of card, or because I have enabled some kind of driver override, for instance for Nvidia's stereo glasses. The use of queries across frames also seems to be an issue. I also agree PIX for Xbox is better, but PIX for Windows is certainly a useful tool. In addition to optimization, I've also found many bugs with it.

For OpenGL there's gDebugger. I don't know how good its performance tools are today since I haven't used it in a long while, but I know both ATI and Nvidia worked together with them to get performance counters into it.

And if you don't like external tools you can always write your own performance measuring framework. In DirectX there are timestamp queries, and in OpenGL there's the GL_EXT_timer_query extension.
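For those who haven't used it, a rough sketch of timing a pass with GL_EXT_timer_query looks something like this (fragment only; it assumes a current GL context with the extension's entry points already loaded):

```cpp
// Sketch only: assumes a current GL context supporting GL_EXT_timer_query.
GLuint query;
glGenQueries(1, &query);

glBeginQuery(GL_TIME_ELAPSED_EXT, query);
// ... issue the draw calls you want to measure ...
glEndQuery(GL_TIME_ELAPSED_EXT);

// Reading the result stalls until the GPU is done with the commands,
// so in a real app you'd poll GL_QUERY_RESULT_AVAILABLE or read the
// result a frame or two later.
GLuint64EXT elapsed_ns = 0;
glGetQueryObjectui64vEXT(query, GL_QUERY_RESULT, &elapsed_ns);
double elapsed_ms = elapsed_ns / 1.0e6;
```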

Friday, July 24, 2009


Could you let us know how long it took you to write that post? But please do not use "Words-per-Second"... use the more accurate "Letters-per-Second".


Saturday, July 25, 2009

@Michael: shouldn't it be "Milliseconds-per-letter"?

@Humus: wouldn't it be a good idea to follow up with a demo that implements GL_EXT_timer_query and measures how long different parts of the demo take?

Saturday, July 25, 2009

"Still pretty decent, if you don't consider that their SSAO implementation obviously is much slower than any SSAO implementation ever used in any game."


I am also sick of hearing performance comparisons in fps. The worst is when you get something like "changing that option increased the framerate by 5 fps". 5 fps?!? That could mean anything depending on what the total framerate was.

Your percentage point reminds me of when I first tried AMD's GPU PerfStudio and it would only tell you what percent of the frame a given event took. Not very useful.

Sopyer, I have also found PIX for windows to be god-awful wrt stability and usability. Anytime I get close to finding some useful information after sorting through pages of irrelevant render state setting, the program crashes. There's a difference of night and day between PIX for windows and PIX for xbox 360. When I have a bug that needs PIX the first thing I do is try to repro it on the xbox.
