"Doom3 is up to 112% slower than the 6600GT"
- Nvidia PR doing the math. The X700 apparently renders at a negative framerate in Doom3.
More pages: 1 2 3 4 5 6 ... 11 ... 21 ... 31 ... 41 ... 48
Another Metaballs2 update
Friday, October 25, 2019 | Permalink

I have update the Metaballs2 demo again. The main addition is a prepass optimization that reduces the total GPU time by about 33% with default settings, or up to 50% with the densest grid. The field() function is evaluated for the entire grid as a pre-pass, but only the sign is stored as a single bit, with 8 values in a 2x2x2 cube per pixel. This allows the storage to stay at a relatively modest 16MB for the densest grid setting of 512x512x512, and allows you to fetch multiple values with few fetches.

A number of improvements have also been done to the Framework4, most notably the PipelineCompiler has been updated to work with the latest Vulkan SDK.

[ 13 comments | Last comment by DimucaTheDev (2024-09-30 19:34:31) ]

Updated Metaballs2 demo
Sunday, September 15, 2019 | Permalink

The Metaball2 demo has been updated with a compute shader based fallback path, which allows a much wider range of GPUs to be able to run it. If your GPU supports mesh shaders, you'll be able to toggle between the paths and compare performance. In the end, the performance delta between mesh shaders and compute wasn't all that great, in fact, they're performing roughly the same. There are a number of options available on the F1 dialog you can toggle to compare, like number of balls, size and grid density. It's also interesting to toggle MSAA settings and see that this affects compute path a bit heavier than mesh shader, which is probably because the greater bandwidth needed by the compute path as it needs to write its intermediate data to memory. A similar effect can be seen if you run at higher resolutions.

[ 7 comments | Last comment by Phentermineinfo (2020-10-02 12:44:42) ]

Metaballs 2
Sunday, August 25, 2019 | Permalink

It's been quite a while, but now I have a new demo available.

This demo explores the new Mesh Shader pipeline available in NVIDIA Turing GPUs. It is a bit similar to my old Metaballs demo from 2005, except much greater mesh density and a lot more balls, and implements the Marching Cubes algorithm on the GPU instead of on the CPU.

Also noteworthy is that this demo is based on a new framework I've been working on for a while, which is based on DX12 and Vulkan. This particular demo uses Vulkan, but I plan to release more demos in the future based on DX12 as well, depending on what makes more sense for a given demo.

Some of the code in this framework has been more or less just adapted from code in my previous framework, like model loading and other utils, but a lot of stuff is new also. I spent probably a little too much time on putting in SIMD support in various places, like the vector classes and image utilities, but it's fun exploring. But the biggest difference is of course implementing a renderer for DX12 and Vulkan with a completely new abstraction, a relatively thin layer over the commonalities of these APIs. This is based on modern concepts like pre-built pipeline objects, descriptor sets, root constants etc. These APIs are not stable as I'm still exploring a lot, so I expect things to change as I add more stuff that I need for future demos. There are plenty of gaps in functionality.

Another new thing in this framework is a PipelineCompiler. This allows you to specify your root signature with a clear syntax together with all shaders and constant buffer structs in the same .pipeline file so that shaders and C++ code links up together minimizing risk for mismatch. The PipelineCompiler compiles all shaders, automatically generating the resource binding according to the specified root signature, and outputs a header file for the C++ code to use. In this header you will find the binary code for the shaders and resource binding enums matching what you put in the .pipeline as well as the constant buffer structs, so that the shader and C++ code can agree on inputs. Earlier frameworks relied on runtime shader compilation, but with this framework the shaders are precompiled and linked into the exe file of the demo. To see this in action, check the MarchingCubes.pipeline file in the new Metaballs2 demo.

Enjoy this release. Everything free and open source as usual.

[ 4 comments | Last comment by Barisa Abdulaziz (2024-04-08 05:29:34) ]

Another addition to the Persson family
Monday, July 8, 2019 | Permalink

Our family welcomes another baby boy.



Final baby now, I promise.

[ 6 comments | Last comment by JEREMIAH KELLY (2024-11-05 21:04:34) ]

Rules of optimization
Monday, July 2, 2018 | Permalink

So I tweeted a thing that got retweeted, liked and commented on a bit more than my usual stuff, so I thought I should expand a bit on my thoughts on it.

Rules of optimization:
1) Design for performance from day 1
2) Profile often
3) Be vigilant on performance regressions
4) Understand the data
5) Understand the HW
6) Help the compiler
7) Verify your assumptions
Performance is everyone's responsibilityhttps://t.co/CwwJffnHGz

— Emil Persson (@_Humus_) June 27, 2018


Basically Programming Wisdom (which btw often is a great sources for actual programming wisdom) posted a quote that basically suggested more or less that there’s never a good time to think about performance. Even experts should defer it until later! This is way worse advice than your usual “premature optimization is the root of all evil” tirade. The premature optimization at least addresses something that can be a bit of a problem, i.e. that programmers go ahead and obfuscate code in order to make it faster, without even looking at whether the code had any performance problem to begin with or verifying that the new code actually is any faster, and in the process of doing this introducing bugs and reducing readability. Yes, that’s a real problem and poor engineering practice. And that’s what Knuth was going on about in that quote. But the often omitted continuation is equally important: “Yet we should not pass up our opportunities in that critical 3%." Basically what he was saying is that you should profile your code first, see where your bottlenecks are, and then work to optimize those parts. That’s good advice and I agree.

But I would go a step further than Knuth’s famous quote. If you work in an environment where you have a decent performance culture, you may in fact be able to just fire up the profiler, find your top 3% functions (or top 3% shaders), optimize those and have a shippable product. But if performance work has been neglected for most of the product development cycle, chances are that when you finally fire up the profiler it’ll be a uniformly slow mess with deep systemic issues. Fixing your top 3% function may only lead to a minor observable performance gain, or none at all. You may find that your product won’t be able to reach a shippable state without large scale redesign, delays, budget overrun or cutting features. You don’t want to put your entire team on a 6 months crunch to salvage a fundamentally broken product when you had hoped a small performance push would suffice.

What you need is a performance culture. Understand that performance is a core feature of your product. Poor performance is poor user experience, or in the case of games, possibly unplayable and unshippable. When you design new systems, you need to think about performance from the start. Yes, you can hack away at prototypes and proof of concept implementations without being overly concerned about micro-optimizations. You can run with it for a while and get a feel for how things hold up. It’s fine, you don’t have all the answers at this point. But put some thought into what sort of workload it should be able to handle once it goes into production. Does it run fine with 100 items? How about 1,000? A million? Is it conceivable that there will be a million items? Don’t just integrate a prototype into mainline without first having thought about the scale it will run at eventually. The idea isn’t that you should optimize everything down to the last SIMD instruction and last byte of memory in the first pass. But you should prepare your solution to be able to operate at the intended scale. If the code will only ever need a handful of objects, don’t obsess over performance. Will there be hundreds? See if you can simplify your math. Maybe group things sensibly. Will there be thousands? You should probably make sure it operates in batches, think about memory consumption, access pattern and bandwidth, perhaps separate active and inactive sets, or hierarchical subdivision. Tens of thousands? Time to think about threading perhaps? Millions? Start counting cycles and you can essentially not have cache misses anymore.

So that’s the item 1 and I guess also item 8 on my list, and a pinch of 4 and 5 I guess. Yeah, the list isn’t sorted. My tweet wasn’t carefully crafted, but a spur of the moment frustration over terrible advice. Now, it’s also true that most of the time you work with existing code rather than writing new, and most days you’re probably spending more time debugging, maintaining and fixing code rather than optimizing. The idea isn’t that performance should override all these other concerns, but it’s a concern that is of equal value to the rest. You wouldn’t advice anyone to defer bug fixing to the end of the product cycle, and neither should you defer performance work until the end. And just like you couldn’t have a single engineer on the team be “the bug guy” who gets the fix all issues when the rest of the team implements features, neither is it a reasonable thing to have a single engineer be “the performance guy” who gets to fix all performance issues.

Performance is everyone’s responsibility and it needs to be part of the process along the way. And that leads to item 3, you need to take performance regressions seriously. If performance unexpectedly drops, it should be a top priority to investigate and fix. And to catch performance regressions you should have systems in place detecting it, such as auto tests with daily graphs over performance. But engineers on the team should also profile often enough to have a good idea in their head of the overall performance characteristics of their game or application as a whole, and of their particular systems in particular, so that whenever it looks different from the usual they know that either something broke or there’s a team member that needs an extra pat on their back for the awesome performance work.

Naturally, what priority you assign to performance may vary depending on what you are developing and how critical performance is and what your current state is. Not every piece of software needs a lot of performance work. Most of my tweet should be interpreted in the context of a team within an organization, and my perspective comes from a rendering engineer working in game development. In my line of work performance is absolutely crucial. I need to crunch through up to tens of thousands of items 60 times per second, and I need to shade millions of pixels in milliseconds. Maybe if you build applications for yourself to run on your own servers you may consider just throwing hardware at the problem. But that doesn’t work in game development, and probably not for most lines of business. The software is not for us and the hardware isn’t ours. We can throw hardware at internal tools, like build servers, to optimize our own processes, but customer facing products must run on the customers’ machines. If our PS4 SKU doesn’t run satisfactorily on a PS4, it’s not going to ship until it does.

That’s not to say that performance work is always needed. When I write my personal code I don’t necessarily spend a lot of time tuning performance. At least not more than necessary, or should I say, not any more than I enjoy doing. There’s time for just brute-forcing something. If I just want to generate a lightmap for a scene I can just write some dumb code that throws loads of rays into the scene until all shadows are smooth and nice. It’s fine if it takes 30 minutes if it only needs to run once. So unless I can optimize the code to be much faster in less than 30 minutes, it is probably more productive to just let it churn away on poorly optimized code. In practice though, to be honest, chances are that I’ll run it tens of times before I’m satisfied with the results.

Also, I do agree that if you are working with a system that is currently buggy, it’s not time to optimize it. But don’t defer performance work just because there exists bugs elsewhere in the code. It’s fine to optimize the occlusion culling system while there’s glitches in the character animation, but you should of course first make sure the culling logic is correct and you aren’t falsely culling objects before you spend any time tuning it.

So item 1, 2, 3 and 8 were mostly concerning performance culture. I would say those are the most important points. The rest mostly concern practical optimization tips when you actually sit down to do performance work. I don’t have a lot to add about 4 and 5, but regarding 6, helping the compiler. Yes, compilers can be super smart and employ some really clever tricks to make your code faster. But it’s important to know that the compiler often has its hands tied in a number of ways that makes it not able to optimize in the ways you expect. Many so call zero-cost abstractions can turn into very costly abstractions when you stack several layers of them. Or when you run in Debug. A compiler is limited not just by the semantics of the language, which can constrain it in surprising ways sometimes, but also in time. A compiler may try an optimization, which would’ve been successful if it went all the way, but after a number iterations bail out because it doesn’t seem to be converging, and then just generate the code instead of the inline constant you hoped would result. Verifying your assumptions is key. Look at the actual assembly code. Does it look like the compiler generate the code you thought? Did the zero-cost abstraction actually generate zero instructions? Did it fail to inline code that you wanted inlined? Did it inline code that’s pointless to inline?

There's a lot more to be said about practical optimization, but what I really wanted to say is that performance matters and neglecting it until the end of development cycle is a recipe for disaster.

[ 6 comments | Last comment by Mikkel Gjoel (2018-07-13 09:41:39) ]

Epic Games Stockholm
Wednesday, June 6, 2018 | Permalink

So I'm a bit late at blogging about this, and for that matter updating the "About" page here, but back in March I joined Epic Games in a new studio here in Stockholm. I will work on various Unreal Engine rendering stuff, but for now I'm still very much a n00b at everything, so it'll take a little while more until I get fully productive. At the time of writing we are only 3 people here, but there are more people in the pipeline joining shortly and we are expecting the team to grow. We are currently in a temporary space close to the central station, but expect to move to the gamedev cluster on Södermalm later this year.

If you're in Stockholm and would like to work with me, now is a great time to join. Just let me know and I'll get you in touch with our recruiters, or simply apply here.

[ 4 comments | Last comment by Duke (2018-11-26 13:59:26) ]

BSP Blending
Monday, February 6, 2017 | Permalink

Like my last demo, this is sort of a modern take on an old idea. Sorting rendering back-to-front with a BSP isn't exactly a new idea, but a parallel traversal in a compute shader might be. I've been pondering on this idea for a lot longer than I care to admit, it's probably been in my TODO list for over 5 years, but I finally got around to play with it and see if it was meaningful. My original idea was actually to traverse the BSP on the fly in the vertex shader, which of course is entirely possible, but ultimately decided that it makes more sense to do it in a separate compute pass.

Either way, enjoy the demo, and hope it's useful for something.

[ 10 comments | Last comment by xPepe (2021-01-24 16:01:37) ]

Modern LightMapping
Tuesday, October 11, 2016 | Permalink

Well, it's been a while, but it's time for a demo again!

This time it's a sort of hybrid between Clustered Shading and LightMapping, which essentially forms a lightmapping solution that's compatible with normal mapping, and while the lights need to be stationary, allows most other attributes other than position to be dynamically changed. In this demo you can interact with all the lights, turn them on and off, change color, or switch to an animated mode.

Head over to the 3D section and give it a try!

[ 3 comments | Last comment by Aaaaaa (2016-11-19 13:43:53) ]

More pages: 1 2 3 4 5 6 ... 11 ... 21 ... 31 ... 41 ... 48