Note: It's not 100% perfect, as some of the GPU capablities leak into the
pixel shader UID.
Currently our UIDs don't get exported, so there is no issue. But someone
might want to fix this in the future.
As much as possible, the asserts have been moved out of the GetUID
function. But there are some places where asserts depend on variables
that aren't stored in the shader UID.
Bug Fix: It was theoretically possible for a shader with depth writes
disabled to map to the same UID as a shader with late depth
writes.
No known test cases trigger this.
Bug Fix: The normal stage UIDs were randomly overwriting indirect
stage texture map UID fields. It was possible for multiple
shaders with diffrent indirect texture targets to map to
the same UID.
Once again, it dpesn't look like this bug was ever triggered.
This frees up 21 bits and allows us to shorten the UID struct by an entire
32 bits.
It's not strictly needed (as it's encoded into the length) but I added a
bit for per-pixel lighiting to make my life easier in the following
commits.
The only code which touches xfmem is code which writes directly into
uid_data.
All the rest now read their parameters out of uid_data.
I also simplified the lighting code so it always generated seperate
codepaths for alpha and color channels instead of trying to combine
them on the off-chance that the same equation works for all 4 channels.
As modern (post 2008) GPUs generally don't calcualte all 4 channels
in a single vector, this optimisation is pointless. The shader compiler
will undo it during the GLSL/HLSL to IR step.
Bug Fix: The about optimisation was also broken, applying the color light
equation to the alpha light channel instead of the alpha light
euqation. But doesn't look like anything trigged this bug.
Fixes a major preformance regression in Skies of Arcadia during
battle transisions.
I had plans for a more advanced version of this code after 5.0,
but here is a minimal implemenation for now.
Using glMapBufferRange to read back the contents of the SSBO is extremely
slow on NVIDIA drivers. This is more noticeable at higher internal
resolutions. Using glGetBufferSubData instead does not seem to exhibit
this slowdown.
This is an oversight from pr https://github.com/dolphin-emu/dolphin/pull/3266 . Thanks to degasus for pointing this out.
It's possible that MAX_TEXTURE_BINARY_SIZE can be optimised, but i wanted to play it safe considering the 5.0 stable release.
Drivers that don't support GL_ARB_shading_language_420pack require that
the storage qualifier be specified even when inside an interface block.
AMD's driver throws a compile error when "centroid in/out" is used within
an interface block.
Our previous behavior was to include the storage qualifier regardless, but
this wasn't working on AMD, therefore we should check for the presence of
the extension and include based on this, instead.
I'm not entirely sure what is happening, but this optimisation is causing an issue in Sonic Riders: Zero Gravity. Apparently the issue would also be fixed by PR#3747, but this PR should also fix similar issues.
Games that use partial updates might get slower with this, so some performance regression testing would be nice. Games like New Super Mario Bros, RS2, Zelda TP and Silent Hill. Testing with high graphics settings makes sense, since this would mostly end up in more work for the GPU.
The D3D backend was always forcing Anisotropic filtering when that is enabled regardless of how the game chose to configure the texture filtering registers; this causes the same issues as "Force Filtering" without Anisotropy, such as causing game UI elements to no longer line up adjacent correctly. Historically, OpenGL's Anisotropy support has always worked "better" than D3D's due to seeming to not have this problem; unfortunately, OpenGL's Anisotropy specification only gives GL_LINEAR based filtering modes defined behavior, with only the mipmap setting being required to be considered. Some OpenGL implementations were implicitly disabling Anisotropy when the min/mag filters were set to GL_NEAREST, but this behavior is not required by the spec so cannot be relied on.