Stop blaming teeth for Cities: Skylines 2 performance problems, say devs

alyaza [they/she]@beehaw.org · 2 years ago

BorgDrone · 2 years ago

You conveniently deleted important parts of my comment, such as “at least with low-graphics settings” and “adjust for a few years of hardware inflation”,

No, that just supports my theory. Graphics settings usually scale really well; that’s the reason they are adjustable by the end user in the first place. Those should not cause any of the issues you are talking about. The problems lie in the parts of the engine that take advantage of certain architectural differences.

A hypothetical example that highlights a real architectural difference between consoles and PCs:

Say you have a large chunk of data and you need to perform some kind of operation on all of it. Say, adjust the contents of buffer A based on the contents of buffer B. It’s all pretty much the same: read some data from A and B, perform some operation on it, write the results back to A. Just for millions of data points. There are many things you could be doing that follow such a pattern. You know who’s really good at doing a similar operation millions of times? The GPU! It was made specifically to perform such operations. So as a smart console game developer you decide to leverage the GPU for this task instead of doing it on the CPU. You write a small compute kernel and a few lines of CPU code to invoke it. Boom, super fast operation.
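To make the pattern concrete, here is a minimal sketch in plain Python. The function name `update_in_place` and the operation itself (`a[i] + 0.5 * b[i]`) are made up for illustration; the point is only that each element is handled independently, which is what makes this kind of loop a natural fit for a GPU compute kernel.

```python
def update_in_place(a, b):
    """Adjust every element of buffer a using the matching element of buffer b.

    The arithmetic here is a hypothetical stand-in for whatever the engine does.
    """
    for i in range(len(a)):
        # Each iteration only reads a[i] and b[i] and writes a[i], so no
        # iteration depends on any other. A GPU could run them all in parallel.
        a[i] = a[i] + 0.5 * b[i]
    return a


a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]
update_in_place(a, b)
print(a)  # [2.0, 4.0, 6.0]
```

On a GPU the loop body would become the kernel and the loop index would become the thread index, but the data flow stays the same.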

Now imagine you’re tasked with porting this code to the PC. Suddenly this super fast operation is dog slow. Why? Because the data is generated by the CPU, and the result is needed by the CPU. The console developer was just using the GPU for this one operation, as part of a larger piece of code, to take advantage of the GPU’s parallel performance. On PC, however, this won’t fly. The GPU cannot access this data directly because it’s on a separate card with its own RAM. The only way to get data between the CPU and the GPU is through the (relatively slow) PCIe bus. So now you have to copy the data to the GPU, perform the operation, and then copy the results back to system RAM. All over the limited bandwidth of the PCIe bus, which is already being used for graphics-related tasks as well. On a console this is essentially free: the GPU and CPU share the same memory, so handing data back and forth costs next to nothing. On PC the copies may take so much time that it’s actually faster to do the whole thing on the CPU, even though the CPU takes much longer to perform the operation itself, simply to avoid the overhead of copying the data back and forth.
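A rough back-of-the-envelope model of that trade-off, as a sketch. Every number below (buffer size, effective PCIe bandwidth, compute times) is an assumption picked for illustration, not a measurement, and the model ignores latency, driver overhead, and any overlap between copies and compute.

```python
def discrete_gpu_time(n_bytes, gpu_compute_s, pcie_bytes_per_s):
    """Seconds for: copy to the GPU, run the kernel, copy the result back."""
    copy_s = n_bytes / pcie_bytes_per_s  # one direction over PCIe
    return copy_s + gpu_compute_s + copy_s


def unified_memory_time(gpu_compute_s):
    """Seconds on a shared-memory system: no copies, only the kernel."""
    return gpu_compute_s


def best_pc_option(n_bytes, cpu_compute_s, gpu_compute_s, pcie_bytes_per_s):
    """Return ("cpu" or "gpu", seconds) for whichever is faster on the PC."""
    gpu_total = discrete_gpu_time(n_bytes, gpu_compute_s, pcie_bytes_per_s)
    if cpu_compute_s <= gpu_total:
        return ("cpu", cpu_compute_s)
    return ("gpu", gpu_total)


# Hypothetical workload: 256 MB of data, kernel takes 2 ms on the GPU,
# the same work takes 20 ms on the CPU, PCIe delivers 16 GB/s in practice.
N_BYTES = 256_000_000
GPU_S = 0.002
CPU_S = 0.020
PCIE_BYTES_PER_S = 16_000_000_000

print("console (shared memory):", unified_memory_time(GPU_S), "s")
print("PC, via discrete GPU:   ", discrete_gpu_time(N_BYTES, GPU_S, PCIE_BYTES_PER_S), "s")
print("PC, best option:        ", best_pc_option(N_BYTES, CPU_S, GPU_S, PCIE_BYTES_PER_S))
```

With these made-up numbers each copy takes 16 ms, so the GPU route on PC costs about 34 ms in total. The 20 ms CPU version wins on PC, and both are far behind the 2 ms the console needs.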

If an engine uses such an optimisation, it will never run well on the PC, regardless of how fast your GPU is. You’d need a lot of years of ‘hardware inflation’ before either doing it on the CPU, or doing it on the GPU plus two times the copy overhead, is faster than just doing it on the GPU of the console with zero overhead.

In fact, things like this are why Apple moved away from dedicated GPUs in favour of a unified memory model. If you design your engine around such an architecture you can reach impressive performance gains. A good example of this is how Affinity Photo designed their app around the ‘ideal GPU’ that didn’t exist yet at the time, but which they expected to arrive in the future: one with unified memory. When Apple released its M-series SoCs they finally had a GPU architecture that matched their predictions, and when benchmarked with their code the M1 Max beat the crap out of a $6000 AMD Radeon Pro W6900X. Note that the AMD part is still much faster if you measure raw performance; it’s just that the system architecture doesn’t allow you to leverage that power in this particular use case.

It’s not just how fast the individual components are, it’s how well they are integrated, and with a modular system like a PC this is always going to cause a performance bottleneck.