"Nothing is more difficult, and therefore more precious, than to be able to decide."
- Napoleon I.

Rules of optimization
Monday, July 2, 2018 | Permalink

So I tweeted a thing that got retweeted, liked and commented on a bit more than my usual stuff, so I thought I should expand a bit on my thoughts on it.

Rules of optimization:
1) Design for performance from day 1
2) Profile often
3) Be vigilant on performance regressions
4) Understand the data
5) Understand the HW
6) Help the compiler
7) Verify your assumptions
Performance is everyone's responsibilityhttps://t.co/CwwJffnHGz

— Emil Persson (@_Humus_) June 27, 2018

Basically Programming Wisdom (which btw often is a great sources for actual programming wisdom) posted a quote that basically suggested more or less that thereís never a good time to think about performance. Even experts should defer it until later! This is way worse advice than your usual ďpremature optimization is the root of all evilĒ tirade. The premature optimization at least addresses something that can be a bit of a problem, i.e. that programmers go ahead and obfuscate code in order to make it faster, without even looking at whether the code had any performance problem to begin with or verifying that the new code actually is any faster, and in the process of doing this introducing bugs and reducing readability. Yes, thatís a real problem and poor engineering practice. And thatís what Knuth was going on about in that quote. But the often omitted continuation is equally important: ďYet we should not pass up our opportunities in that critical 3%." Basically what he was saying is that you should profile your code first, see where your bottlenecks are, and then work to optimize those parts. Thatís good advice and I agree.

But I would go a step further than Knuthís famous quote. If you work in an environment where you have a decent performance culture, you may in fact be able to just fire up the profiler, find your top 3% functions (or top 3% shaders), optimize those and have a shippable product. But if performance work has been neglected for most of the product development cycle, chances are that when you finally fire up the profiler itíll be a uniformly slow mess with deep systemic issues. Fixing your top 3% function may only lead to a minor observable performance gain, or none at all. You may find that your product wonít be able to reach a shippable state without large scale redesign, delays, budget overrun or cutting features. You donít want to put your entire team on a 6 months crunch to salvage a fundamentally broken product when you had hoped a small performance push would suffice.

What you need is a performance culture. Understand that performance is a core feature of your product. Poor performance is poor user experience, or in the case of games, possibly unplayable and unshippable. When you design new systems, you need to think about performance from the start. Yes, you can hack away at prototypes and proof of concept implementations without being overly concerned about micro-optimizations. You can run with it for a while and get a feel for how things hold up. Itís fine, you donít have all the answers at this point. But put some thought into what sort of workload it should be able to handle once it goes into production. Does it run fine with 100 items? How about 1,000? A million? Is it conceivable that there will be a million items? Donít just integrate a prototype into mainline without first having thought about the scale it will run at eventually. The idea isnít that you should optimize everything down to the last SIMD instruction and last byte of memory in the first pass. But you should prepare your solution to be able to operate at the intended scale. If the code will only ever need a handful of objects, donít obsess over performance. Will there be hundreds? See if you can simplify your math. Maybe group things sensibly. Will there be thousands? You should probably make sure it operates in batches, think about memory consumption, access pattern and bandwidth, perhaps separate active and inactive sets, or hierarchical subdivision. Tens of thousands? Time to think about threading perhaps? Millions? Start counting cycles and you can essentially not have cache misses anymore.

So thatís the item 1 and I guess also item 8 on my list, and a pinch of 4 and 5 I guess. Yeah, the list isnít sorted. My tweet wasnít carefully crafted, but a spur of the moment frustration over terrible advice. Now, itís also true that most of the time you work with existing code rather than writing new, and most days youíre probably spending more time debugging, maintaining and fixing code rather than optimizing. The idea isnít that performance should override all these other concerns, but itís a concern that is of equal value to the rest. You wouldnít advice anyone to defer bug fixing to the end of the product cycle, and neither should you defer performance work until the end. And just like you couldnít have a single engineer on the team be ďthe bug guyĒ who gets the fix all issues when the rest of the team implements features, neither is it a reasonable thing to have a single engineer be ďthe performance guyĒ who gets to fix all performance issues.

Performance is everyoneís responsibility and it needs to be part of the process along the way. And that leads to item 3, you need to take performance regressions seriously. If performance unexpectedly drops, it should be a top priority to investigate and fix. And to catch performance regressions you should have systems in place detecting it, such as auto tests with daily graphs over performance. But engineers on the team should also profile often enough to have a good idea in their head of the overall performance characteristics of their game or application as a whole, and of their particular systems in particular, so that whenever it looks different from the usual they know that either something broke or thereís a team member that needs an extra pat on their back for the awesome performance work.

Naturally, what priority you assign to performance may vary depending on what you are developing and how critical performance is and what your current state is. Not every piece of software needs a lot of performance work. Most of my tweet should be interpreted in the context of a team within an organization, and my perspective comes from a rendering engineer working in game development. In my line of work performance is absolutely crucial. I need to crunch through up to tens of thousands of items 60 times per second, and I need to shade millions of pixels in milliseconds. Maybe if you build applications for yourself to run on your own servers you may consider just throwing hardware at the problem. But that doesnít work in game development, and probably not for most lines of business. The software is not for us and the hardware isnít ours. We can throw hardware at internal tools, like build servers, to optimize our own processes, but customer facing products must run on the customersí machines. If our PS4 SKU doesnít run satisfactorily on a PS4, itís not going to ship until it does.

Thatís not to say that performance work is always needed. When I write my personal code I donít necessarily spend a lot of time tuning performance. At least not more than necessary, or should I say, not any more than I enjoy doing. Thereís time for just brute-forcing something. If I just want to generate a lightmap for a scene I can just write some dumb code that throws loads of rays into the scene until all shadows are smooth and nice. Itís fine if it takes 30 minutes if it only needs to run once. So unless I can optimize the code to be much faster in less than 30 minutes, it is probably more productive to just let it churn away on poorly optimized code. In practice though, to be honest, chances are that Iíll run it tens of times before Iím satisfied with the results.

Also, I do agree that if you are working with a system that is currently buggy, itís not time to optimize it. But donít defer performance work just because there exists bugs elsewhere in the code. Itís fine to optimize the occlusion culling system while thereís glitches in the character animation, but you should of course first make sure the culling logic is correct and you arenít falsely culling objects before you spend any time tuning it.

So item 1, 2, 3 and 8 were mostly concerning performance culture. I would say those are the most important points. The rest mostly concern practical optimization tips when you actually sit down to do performance work. I donít have a lot to add about 4 and 5, but regarding 6, helping the compiler. Yes, compilers can be super smart and employ some really clever tricks to make your code faster. But itís important to know that the compiler often has its hands tied in a number of ways that makes it not able to optimize in the ways you expect. Many so call zero-cost abstractions can turn into very costly abstractions when you stack several layers of them. Or when you run in Debug. A compiler is limited not just by the semantics of the language, which can constrain it in surprising ways sometimes, but also in time. A compiler may try an optimization, which wouldíve been successful if it went all the way, but after a number iterations bail out because it doesnít seem to be converging, and then just generate the code instead of the inline constant you hoped would result. Verifying your assumptions is key. Look at the actual assembly code. Does it look like the compiler generate the code you thought? Did the zero-cost abstraction actually generate zero instructions? Did it fail to inline code that you wanted inlined? Did it inline code thatís pointless to inline?

There's a lot more to be said about practical optimization, but what I really wanted to say is that performance matters and neglecting it until the end of development cycle is a recipe for disaster.

[ 6 comments | Last comment by Mikkel Gjoel (2018-07-13 09:41:39) ]