- In D3D, shaders could be compiled on the main thread, blocking
startup.
- Reduced the latency between a pipeline being requested and used in all
backends in hybrid ubershader mode, when no shader stages were present.
- Fixed a case where async compilation could cause the same UID to be
appended multiple times to the UID cache.
- Fix incorrect number of threads being used when immediately compile
shaders was enabled.
Lowest hanging fruit I could find with a profiler.
Not sure this stuff actually needs to be done, but assuming it is, why
not do it quickly? 10x faster, goes from 1% CPU to 0.09%.
This enables shaders to be compiled while the game is starting, instead
of blocking startup. If a shader is needed before it is compiled,
emulation will block.
As these are stored in a map, operator< will become a hot function when
doing lookups, which happen every frame. std::tie generated a rather
large function here with quite a few branches.
tl;dr: This PR speedups dolphin on mobiles with the Mali GPU and ES 3.2
drivers by a factor of 10 by using the method with the biggest overhead.
Please keep care not to buy this shit!
The ARM driver team seems to care very well about their customers. But
bad luck, users and open source developers are *not* their customers. So
even device-independent feature requests are just ignored for *years*:
https://community.arm.com/graphics/f/discussions/4645/gl_ext_buffer_storage-support
The bad point, they neither implement any of the other common ways to
stream dynamic content in unextented GL:
- They just ignore the GL_MAP_UNSYNCHRONIZED_BIT flag
- They don't support on-device buffer updates and just stall with
glBufferSubData
It seems like no benchmark is using any dynamic content - and like no
customer cares about anything but benchmarks, or users...
We have a flag to disable the glBufferSubData way, this PR adds the flag
to also disable the unsychronized mapping way. The second one is
available since their ES 3.2 update, but slow as hell.
So how to continue? The last remaining technical way to stream dynamic
content at all is to alloc a new buffer per draw call with glBufferData.
This is very gross, but still a factor 10 speedup compared to stalling
the GPU. Small tests shows that you can expect another 3-5 times speedup
with EXT_buffer_data, so Mali would be on pair with Adreno here. So if
you have bought such a device unfortunately, please try to make noise on
your vendor forums/support and ask for this extension. If you are going
to buy a new mobile, I'd recormend to avoid *any* mobile with a Mali GPU
in it.
We now differentiate between a resize event and surface change/destroyed
event, reducing the overhead for resizes in the Vulkan backend. It is
also now now safe to change the surface multiple times if the video thread
is lagging behind.
The option is named DisableCopyToVRAM under the Hacks section in
GFX.ini. It is intentionally not exposed to the GUI, as users should not
need to use it under normal circumstances. The main use is debugging
issues in the EFB-to-RAM shaders.
The console appears to behave against standard IEEE754 specification
here, in particular around how NaNs are handled. NaNs appear to have no
effect on the result, and are treated the same as positive or negative
infinity, based on the sign bit.
However, when the result would be NaN (inf - inf, or (-inf) - (-inf)),
this results in a completely fogged color, or unfogged color
respectively. We handle this by returning a constant zero for the A
varaible, and positive or negative infinity for C depending on the sign
bits of the A and C registers. This ensures that no NaN value is passed
to the GPU in the first place, and that the result of the fog
calculation cannot be NaN.
Otherwise we might get UB if the value we cast is larger than the
max value of the underlying type that the compiled picked for the enum.
I haven't done any extensive check through Dolphin to find cases
of this, I'm just fixing the cases I already know of.