Another addition to the Persson family
Monday, July 8, 2019
Our family welcomes another baby boy.
Final baby now, I promise.
Rules of optimization
Monday, July 2, 2018
So I tweeted a thing that got retweeted, liked and commented on a bit more than my usual stuff, and I thought I should expand a bit on my thoughts on it.
Basically, Programming Wisdom (which btw is often a great source of actual programming wisdom) posted a quote suggesting, more or less, that there's never a good time to think about performance. Even experts should defer it until later! This is way worse advice than your usual "premature optimization is the root of all evil" tirade. The premature optimization quote at least addresses something that can be a bit of a problem: programmers go ahead and obfuscate code in order to make it faster, without even checking whether the code had any performance problem to begin with or verifying that the new code actually is any faster, and in the process introduce bugs and reduce readability. Yes, that's a real problem and poor engineering practice. And that's what Knuth was going on about in that quote. But the often omitted continuation is equally important: "Yet we should not pass up our opportunities in that critical 3%." Basically what he was saying is that you should profile your code first, see where your bottlenecks are, and then work to optimize those parts. That's good advice and I agree.
But I would go a step further than Knuth's famous quote. If you work in an environment with a decent performance culture, you may in fact be able to just fire up the profiler, find your top 3% of functions (or top 3% of shaders), optimize those and have a shippable product. But if performance work has been neglected for most of the product development cycle, chances are that when you finally fire up the profiler it'll be a uniformly slow mess with deep systemic issues. Fixing your top 3% of functions may only lead to a minor observable performance gain, or none at all. You may find that your product won't be able to reach a shippable state without large-scale redesign, delays, budget overruns or cut features. You don't want to put your entire team on a six-month crunch to salvage a fundamentally broken product when you had hoped a small performance push would suffice.
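To put a number on why fixing the top functions in a uniformly slow profile may barely move the needle, Amdahl's law (a standard result, swapped in here for illustration rather than taken from the tweet) gives the overall speedup when only part of the work gets faster. A minimal C++ sketch; `overallSpeedup` and `maxOverallSpeedup` are made-up names:

```cpp
// Amdahl's law: the overall speedup when a part of the work that takes
// `fraction` of the total runtime is made `localSpeedup` times faster.
double overallSpeedup(double fraction, double localSpeedup) {
    return 1.0 / ((1.0 - fraction) + fraction / localSpeedup);
}

// Upper bound on the overall speedup if that part became infinitely fast
// (i.e. its cost dropped to zero).
double maxOverallSpeedup(double fraction) {
    return 1.0 / (1.0 - fraction);
}
```

For example, if the hottest function is only 5% of the frame, making it twice as fast gives about a 2.6% overall speedup, and even removing it entirely caps out around 5.3%. If the same function were 50% of the frame, the same 2× would give about 1.33× overall.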
What you need is a performance culture. Understand that performance is a core feature of your product. Poor performance is poor user experience, or in the case of games, possibly unplayable and unshippable. When you design new systems, you need to think about performance from the start. Yes, you can hack away at prototypes and proof-of-concept implementations without being overly concerned about micro-optimizations. You can run with it for a while and get a feel for how things hold up. It's fine, you don't have all the answers at this point. But put some thought into what sort of workload it should be able to handle once it goes into production. Does it run fine with 100 items? How about 1,000? A million? Is it conceivable that there will be a million items? Don't just integrate a prototype into mainline without first having thought about the scale it will eventually run at. The idea isn't that you should optimize everything down to the last SIMD instruction and last byte of memory in the first pass. But you should prepare your solution to be able to operate at the intended scale. If the code will only ever need a handful of objects, don't obsess over performance. Will there be hundreds? See if you can simplify your math. Maybe group things sensibly. Will there be thousands? You should probably make sure it operates in batches, and think about memory consumption, access patterns and bandwidth, perhaps separate active and inactive sets, or hierarchical subdivision. Tens of thousands? Time to think about threading, perhaps. Millions? Start counting cycles; at that point you essentially can't afford cache misses anymore.
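As one possible reading of "separate active and inactive sets", here is a minimal C++ sketch, assuming items with nothing but a position and a velocity (`ItemSet` and its members are hypothetical names). Data is stored as separate arrays, and active items are kept packed at the front, so the per-frame update is a linear walk over contiguous memory that never touches inactive items:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

class ItemSet {
public:
    // Adds an item; an active item is swapped into the packed front range.
    void add(float position, float velocity, bool active) {
        positions.push_back(position);
        velocities.push_back(velocity);
        if (active) {
            swapItems(positions.size() - 1, activeCount);
            ++activeCount;
        }
    }

    // Deactivates the item at `index` (must be < activeSize()) by swapping it
    // just past the end of the active range. This reorders items, so an
    // index is not a stable handle.
    void deactivate(std::size_t index) {
        --activeCount;
        swapItems(index, activeCount);
    }

    // Per-frame update: a tight loop over the contiguous active items only.
    void update(float dt) {
        for (std::size_t i = 0; i < activeCount; ++i)
            positions[i] += velocities[i] * dt;
    }

    std::size_t activeSize() const { return activeCount; }
    std::size_t size() const { return positions.size(); }
    float position(std::size_t i) const { return positions[i]; }

private:
    void swapItems(std::size_t a, std::size_t b) {
        std::swap(positions[a], positions[b]);
        std::swap(velocities[a], velocities[b]);
    }

    std::vector<float> positions;   // structure-of-arrays storage
    std::vector<float> velocities;
    std::size_t activeCount = 0;    // items [0, activeCount) are active
};
```

The trade-off is that swapping reorders items, so outside code that holds on to items would need an extra indirection (for example a handle-to-index table) in a real system.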
So that's item 1, and I guess also item 8 on my list, with a pinch of 4 and 5. Yeah, the list isn't sorted. My tweet wasn't carefully crafted; it was a spur-of-the-moment frustration over terrible advice. Now, it's also true that most of the time you work with existing code rather than writing new code, and most days you're probably spending more time debugging, maintaining and fixing code than optimizing it. The idea isn't that performance should override all these other concerns, but it's a concern of equal value to the rest. You wouldn't advise anyone to defer bug fixing to the end of the product cycle, and neither should you defer performance work until the end. And just like you wouldn't have a single engineer on the team be "the bug guy" who gets to fix all issues while the rest of the team implements features, it isn't reasonable to have a single engineer be "the performance guy" who gets to fix all performance issues.
Performance is everyone's responsibility and it needs to be part of the process along the way. And that leads to item 3: you need to take performance regressions seriously. If performance unexpectedly drops, it should be a top priority to investigate and fix. To catch performance regressions you should have systems in place to detect them, such as automated tests with daily graphs of performance. But engineers on the team should also profile often enough to have a good idea in their heads of the overall performance characteristics of the game or application as a whole, and of their own systems in particular, so that whenever it looks different from the usual they know that either something broke or there's a team member who needs an extra pat on the back for the awesome performance work.
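A minimal sketch of the automated part, assuming a nightly test that records one frame time per run. The function names, the median baseline and the 5% threshold are illustrative choices, not a description of any particular system. In the spirit of the paragraph above, it flags changes in both directions: a regression to investigate, or an improvement worth a pat on the back.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

enum class PerfChange { Normal, Regression, Improvement };

// Median of the samples; less sensitive than the mean to one noisy run.
double median(std::vector<double> samples) {
    if (samples.empty()) return 0.0;
    const std::size_t mid = samples.size() / 2;
    std::nth_element(samples.begin(), samples.begin() + mid, samples.end());
    const double upper = samples[mid];
    if (samples.size() % 2 != 0) return upper;
    // Even count: average with the largest value of the lower half.
    const double lower = *std::max_element(samples.begin(), samples.begin() + mid);
    return 0.5 * (lower + upper);
}

// Compares the latest measurement against the median of recent history.
// `tolerance` is a relative threshold, e.g. 0.05 for 5%.
PerfChange classify(const std::vector<double>& historyMs, double latestMs,
                    double tolerance) {
    if (historyMs.empty()) return PerfChange::Normal;
    const double baseline = median(historyMs);
    if (latestMs > baseline * (1.0 + tolerance)) return PerfChange::Regression;
    if (latestMs < baseline * (1.0 - tolerance)) return PerfChange::Improvement;
    return PerfChange::Normal;
}
```

Using the median of several recent runs as the baseline keeps a single outlier (say, a run on a busy test machine) from hiding a real regression or triggering a false alarm.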
Naturally, the priority you assign to performance may vary depending on what you are developing, how critical performance is, and what your current state is. Not every piece of software needs a lot of performance work. Most of my tweet should be interpreted in the context of a team within an organization, and my perspective is that of a rendering engineer working in game development. In my line of work performance is absolutely crucial. I need to crunch through up to tens of thousands of items 60 times per second, and I need to shade millions of pixels in milliseconds. Maybe if you build applications for yourself to run on your own servers, you can consider just throwing hardware at the problem. But that doesn't work in game development, and probably not in most lines of business. The software is not for us and the hardware isn't ours. We can throw hardware at internal tools, like build servers, to optimize our own processes, but customer-facing products must run on the customers' machines. If our PS4 SKU doesn't run satisfactorily on a PS4, it's not going to ship until it does.
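Those numbers translate into very tight budgets. A back-of-the-envelope sketch; the 1920×1080 resolution and the 2 ms shading pass used below are assumed figures for illustration, not numbers from any specific title:

```cpp
// Milliseconds available per frame at a given frame rate.
constexpr double frameBudgetMs(double framesPerSecond) {
    return 1000.0 / framesPerSecond;
}

// Nanoseconds available per item when `count` items share `budgetMs`,
// optimistically assuming they get that whole budget to themselves.
constexpr double nsPerItem(double budgetMs, double count) {
    return budgetMs * 1.0e6 / count;
}
```

At 60 frames per second a frame is about 16.7 ms; spread over 10,000 items that is roughly 1.7 µs per item even if nothing else ran that frame. Shading the roughly 2.07 million pixels of a 1080p image in an assumed 2 ms leaves under 1 ns per pixel.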
That's not to say that performance work is always needed. When I write my personal code I don't necessarily spend a lot of time tuning performance, at least not more than necessary, or should I say, not any more than I enjoy doing. There's a time for just brute-forcing something. If I just want to generate a lightmap for a scene, I can write some dumb code that throws loads of rays into the scene until all shadows are smooth and nice. It's fine if it takes 30 minutes if it only needs to run once. So unless I can make the code much faster with less than 30 minutes of work, it's probably more productive to just let the poorly optimized code churn away. In practice though, to be honest, chances are that I'll run it tens of times before I'm satisfied with the results.
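The trade-off in the lightmap example is a simple break-even calculation. A tiny sketch, assuming you can estimate the optimization effort and the resulting run time up front (`worthOptimizing` is a made-up name):

```cpp
// Returns true if spending `optimizeMinutes` to make a job faster pays off
// over `runs` executions, counting only wall-clock minutes.
bool worthOptimizing(double runMinutes, double optimizedRunMinutes,
                     double optimizeMinutes, int runs) {
    const double bruteForceTotal = runMinutes * runs;
    const double optimizedTotal = optimizeMinutes + optimizedRunMinutes * runs;
    return optimizedTotal < bruteForceTotal;
}
```

With a 30-minute brute-force run, an hour of optimization work that brings it down to 3 minutes loses for a single run but wins easily if you end up running it 20 times. The sketch deliberately ignores that a brute-force run can churn in the background while you do something else.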
Also, I do agree that if you are working with a system that is currently buggy, it's not the time to optimize it. But don't defer performance work just because there are bugs elsewhere in the code. It's fine to optimize the occlusion culling system while there are glitches in the character animation, but you should of course first make sure the culling logic is correct and that you aren't falsely culling objects before you spend any time tuning it.
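One way to make sure a culling system is correct before and while tuning it is to keep a simple reference implementation around and check the tuned version against it. A minimal sketch, assuming bounding spheres tested against planes whose normals point into the visible volume; the types and function names are hypothetical, not from any particular engine:

```cpp
#include <cstddef>
#include <functional>
#include <vector>

struct Sphere { float x, y, z, radius; };
struct Plane { float nx, ny, nz, d; };  // inside when n.p + d >= 0

// Straightforward reference: a sphere is visible unless it lies entirely
// on the outside of some plane.
bool isVisibleReference(const Sphere& s, const std::vector<Plane>& planes) {
    for (const Plane& p : planes) {
        const float dist = p.nx * s.x + p.ny * s.y + p.nz * s.z + p.d;
        if (dist < -s.radius) return false;
    }
    return true;
}

using CullFn = std::function<bool(const Sphere&, const std::vector<Plane>&)>;

// Counts objects where the tuned implementation disagrees with the
// reference. Anything but zero means the tuned path is wrong (for example,
// falsely culling visible objects), however fast it is.
std::size_t countMismatches(const CullFn& reference, const CullFn& tuned,
                            const std::vector<Sphere>& spheres,
                            const std::vector<Plane>& planes) {
    std::size_t mismatches = 0;
    for (const Sphere& s : spheres)
        if (reference(s, planes) != tuned(s, planes)) ++mismatches;
    return mismatches;
}
```

For example, a "faster" version that only tests sphere centers would falsely cull spheres straddling a plane, and the mismatch count would catch it.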
So items 1, 2, 3 and 8 mostly concern performance culture. I would say those are the most important points. The rest mostly concern practical optimization tips for when you actually sit down to do performance work. I don't have a lot to add about 4 and 5, but regarding 6, helping the compiler: yes, compilers can be super smart and employ some really clever tricks to make your code faster. But it's important to know that the compiler often has its hands tied in a number of ways that keep it from optimizing the way you expect. Many so-called zero-cost abstractions can turn into very costly abstractions when you stack several layers of them. Or when you run in Debug. A compiler is limited not just by the semantics of the language, which can constrain it in surprising ways, but also by time. A compiler may try an optimization that would've succeeded if it had gone all the way, but bail out after a number of iterations because it doesn't seem to be converging, and then just generate the code instead of the inline constant you hoped for. Verifying your assumptions is key. Look at the actual assembly code. Did the compiler generate the code you thought it would? Did the zero-cost abstraction actually generate zero instructions? Did it fail to inline code that you wanted inlined? Did it inline code that's pointless to inline?
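A small illustration of what can and can't be verified without looking at the output. `Meters` is a hypothetical strong-typedef wrapper around a float. The `static_assert`s check layout assumptions at compile time, but they say nothing about the instructions generated, which is what the comment on `addDistances` is about:

```cpp
#include <type_traits>

// A thin "zero-cost" wrapper: a strongly typed distance in meters.
struct Meters {
    float value;
};

constexpr Meters operator+(Meters a, Meters b) { return Meters{a.value + b.value}; }

// Layout assumptions can be checked at compile time...
static_assert(sizeof(Meters) == sizeof(float), "wrapper should add no padding");
static_assert(std::is_trivially_copyable<Meters>::value,
              "wrapper should copy like a plain float");

// ...but whether this compiles down to a single float add, in an optimized
// build and in a Debug build, can only be confirmed by reading the generated
// assembly (e.g. `g++ -O2 -S` versus `g++ -O0 -S`).
float addDistances(float a, float b) {
    return (Meters{a} + Meters{b}).value;
}
```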
There's a lot more to be said about practical optimization, but what I really wanted to say is that performance matters, and neglecting it until the end of the development cycle is a recipe for disaster.
Epic Games Stockholm
Wednesday, June 6, 2018
So I'm a bit late in blogging about this, and for that matter in updating the "About" page here, but back in March I joined Epic Games at a new studio here in Stockholm. I will work on various Unreal Engine rendering stuff, but for now I'm still very much a n00b at everything, so it'll take a little while longer before I'm fully productive. At the time of writing we are only 3 people here, but there are more people in the pipeline joining shortly and we expect the team to grow. We are currently in a temporary space close to the central station, but expect to move to the gamedev cluster on Södermalm later this year.
If you're in Stockholm and would like to work with me, now is a great time to join. Just let me know and I'll get you in touch with our recruiters, or simply apply here
Monday, February 6, 2017
Like my last demo, this is sort of a modern take on an old idea. Sorting rendering back-to-front with a BSP isn't exactly a new idea, but a parallel traversal in a compute shader might be. I've been pondering this idea for a lot longer than I care to admit; it's probably been on my TODO list for over 5 years, but I finally got around to playing with it to see if it was meaningful. My original idea was actually to traverse the BSP on the fly in the vertex shader, which of course is entirely possible, but I ultimately decided that it makes more sense to do it in a separate compute pass.
Either way, enjoy the demo, and I hope it's useful for something.
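For reference, here is the old idea being modernized: the classic serial, recursive back-to-front BSP traversal on the CPU. This is the textbook version, not the parallel compute-shader traversal from the demo, and `BspNode` and its fields are hypothetical:

```cpp
#include <vector>

struct Vec3 { float x, y, z; };

// Splitting plane n.p + d = 0 with the polygons lying on it. The "front"
// child is the side where n.p + d > 0; -1 means no child.
struct BspNode {
    Vec3 normal;
    float d;
    std::vector<int> polygons;
    int front = -1;
    int back = -1;
};

// Visits the subtree on the far side of the plane from the eye first, then
// this node's polygons, then the near side. Appends polygon ids to `order`
// in painter's (back-to-front) order.
void backToFront(const std::vector<BspNode>& nodes, int index, const Vec3& eye,
                 std::vector<int>& order) {
    if (index < 0) return;
    const BspNode& node = nodes[index];
    const float side = node.normal.x * eye.x + node.normal.y * eye.y +
                       node.normal.z * eye.z + node.d;
    const int nearChild = side >= 0.0f ? node.front : node.back;
    const int farChild = side >= 0.0f ? node.back : node.front;
    backToFront(nodes, farChild, eye, order);
    order.insert(order.end(), node.polygons.begin(), node.polygons.end());
    backToFront(nodes, nearChild, eye, order);
}
```

The recursion is inherently sequential, since each subtree's output must land between its neighbors', which is what makes a parallel GPU formulation the interesting part.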
Tuesday, October 11, 2016
Well, it's been a while, but it's time for a demo again!
This time it's a sort of hybrid between Clustered Shading and lightmapping, which essentially forms a lightmapping solution that's compatible with normal mapping. The lights need to be stationary, but most attributes other than position can be changed dynamically. In this demo you can interact with all the lights: turn them on and off, change their color, or switch to an animated mode.
Head over to the 3D section and give it a try!
Update to MakeID.h
Friday, April 3, 2015
If you've been using MakeID.h, I suggest you download the latest version. The previous version had a minor bug that made it copy one element too many when it reduced the number of ranges. This didn't impact functionality, but in edge cases it could result in a read outside of allocated memory, which, if you're really unlucky, could cause a crash. Thanks to Markus Billeter for reporting the issue.
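The kind of bug described above is a classic off-by-one when shrinking an array of ranges. A generic sketch of removing one element with the correct copy count; `Range` and `destroyRange` are hypothetical and not taken from MakeID.h:

```cpp
#include <cstdint>
#include <cstring>

// A contiguous run of free IDs, [first, last].
struct Range {
    std::uint32_t first;
    std::uint32_t last;
};

// Removes ranges[index] from an array holding `count` live ranges and
// returns the new count. Only count - index - 1 elements follow `index`;
// copying count - index of them would read one element past the live data,
// which goes unnoticed until the array happens to end at an allocation
// boundary.
std::uint32_t destroyRange(Range* ranges, std::uint32_t count, std::uint32_t index) {
    const std::uint32_t tail = count - index - 1;
    if (tail > 0)
        std::memmove(ranges + index, ranges + index + 1, tail * sizeof(Range));
    return count - 1;
}
```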
Sunday, March 29, 2015
I've done some internal plumbing in the website scripts. Hopefully you shouldn't see any difference at all, and I've clicked around quite a bit to verify that everything still works. If you see anything that's not working on this site, please let me know.
Tuesday, March 24, 2015
So, uhm, it's been a while since I last posted a demo here. I have valid excuses though. This whole kids situation sure makes a dent in your available time, and for the time that remains it sure has an impact on how much energy you have left at the end of the day. But it's a great joy too, so I'm not unhappy about it in any way. The other thing is of course that I moved into a research position at work, which reduces my itch to do so much research at home during my spare hours. But with that said, I recently ended up browsing my site with another guy at work for reasons I don't remember, and realizing that it had been nearly three years since the last demo was a little too depressing for me. So I made it a priority to make another one, and now it's here.
So for this demo I'm poking around a little with Clustered Shading, but exploring an alternative implementation of the underlying principles. The main differences are world-space clustering, storing active lights as a bitfield, and going full forward shading instead of deferred. Well, except for the reference I'm comparing to.
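A minimal CPU-side sketch of the first two ideas, world-space clusters and a per-cluster light bitfield, assuming a uniform grid, point lights bounded by spheres, and at most 64 lights. `ClusterGrid` and its methods are hypothetical; the demo's actual data layout and cluster assignment may well differ:

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Uniform world-space grid; each cluster stores a 64-bit mask where bit i
// means "light i may affect this cluster".
struct ClusterGrid {
    float minX, minY, minZ;  // world-space origin of the grid
    float cellSize;          // cluster edge length in world units
    int dimX, dimY, dimZ;    // clusters per axis
    std::vector<std::uint64_t> masks;

    ClusterGrid(float ox, float oy, float oz, float cell, int nx, int ny, int nz)
        : minX(ox), minY(oy), minZ(oz), cellSize(cell), dimX(nx), dimY(ny), dimZ(nz),
          masks(static_cast<std::size_t>(nx) * ny * nz, std::uint64_t{0}) {}

    // World coordinate -> cluster coordinate along one axis, clamped to the grid.
    int cell(float v, float origin, int dim) const {
        const int c = static_cast<int>(std::floor((v - origin) / cellSize));
        return c < 0 ? 0 : (c >= dim ? dim - 1 : c);
    }

    std::size_t index(int x, int y, int z) const {
        return (static_cast<std::size_t>(z) * dimY + y) * dimX + x;
    }

    // Conservatively sets the light's bit (light < 64) in every cluster
    // overlapped by the axis-aligned box around its sphere of influence.
    void addLight(int light, float x, float y, float z, float radius) {
        const std::uint64_t bit = std::uint64_t{1} << light;
        for (int cz = cell(z - radius, minZ, dimZ); cz <= cell(z + radius, minZ, dimZ); ++cz)
            for (int cy = cell(y - radius, minY, dimY); cy <= cell(y + radius, minY, dimY); ++cy)
                for (int cx = cell(x - radius, minX, dimX); cx <= cell(x + radius, minX, dimX); ++cx)
                    masks[index(cx, cy, cz)] |= bit;
    }

    // Bitfield of lights that may affect a world-space shading point.
    std::uint64_t lightsAt(float x, float y, float z) const {
        return masks[index(cell(x, minX, dimX), cell(y, minY, dimY), cell(z, minZ, dimZ))];
    }
};

// Number of lights a forward-shading loop would evaluate for a mask:
// one per set bit.
int lightCount(std::uint64_t mask) {
    int n = 0;
    for (; mask != 0; mask &= mask - 1) ++n;  // clear the lowest set bit
    return n;
}
```

Because the clusters live in world space rather than view space, the masks don't depend on the camera and only need rebuilding when lights move or change range.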
Some useful shortcuts for the demo:
F5 - Clustered Shading
F6 - Visualize clusters
F7 - Classic Deferred Shading
F8 - Visualize stencil mask for Deferred Shading with MSAA