The output of instructions like fabsx and ps_sel is store-safe
if and only if the relevant inputs are. The old code was always
marking the output as store-safe if the output was a single,
and never otherwise.
Also, the old code was treating the output of psq_l/psq_lu as
store-safe, which seems incorrect (if dequantization is disabled).
instruction tables
Previously, all of the internals that handled how the instruction tables
are initialized were exposed externally. However, this can all be made
private to each CPU backend.
If each backend has an Init() function, then this is where the instruction
tables should be initialized, it shouldn't be the responsibility of
external code to ensure internal validity.
This allows for getting rid of all the table initialization shenanigans
within JitInterface and PPCTables.
The class itself already acts as a namespace trailer, so '_interpreter'
isn't necessary. This also gets rid of a duplicate typedef in the
Interpreter_Tables.
This is unused, and since it had the same value as FL_OUT_D, it was unnecessarily setting the rS register as an output, even on instructions that only have FL_OUT_D set.
Updated ARAM DMA and FIFO write exception checking to uses these types.
Conflicts:
Source/Core/Core/PowerPC/Interpreter/Interpreter_Tables.cpp
Source/Core/Core/PowerPC/PPCTables.h
This should dramatically reduce code size in the case of blocks with
lots of branches, and certainly doesn't hurt elsewhere either.
This can probably be improved a good bit through smarter tracking of register
usage, e.g. discarding registers that are going to be overwritten, but this
is a good start and should help reduce code size and register pressure.
Unlike that sort of change, this is a "safe" patch; it only flushes registers,
which can't affect correctness, unlike actually discarding data.
As part of this, refactor PPCAnalyst to support distinguishing between
float and integer registers (to properly handle instructions that access
both, like floating-point loads and stores).
Also update every instruction in the interpreter flags table I could find
that didn't have all the correct flags.
Tries as hard as possible to push carry-using operations (like addc and adde)
next to each other. Refactor the instruction reordering to be more flexible
and allow multiple passes.
353 -> 192 x86 instructions on a carry-heavy code block in Pokemon Puzzle.
12% faster overall in Pokemon Puzzle; probably less in typical games (Virtual
Console games seem to be carry-heavy for some reason; maybe a different
compiler?)
Doesn't support all the FPSCR flags, just the FPRF ones.
Add PPCAnalyzer support to remove unnecessary FPRF calculations.
POV-ray benchmark with enableFPRF forced on for an extreme comparison:
Before: 1500s
After, fmul/fmadd only: 728s
After, all float: 753s
In real games that use FPRF, like F-Zero GX, FPRF previously cost a few percent
of total runtime.
Since FPRF is so much faster now, if enableFPRF is set, just do it for every
float instruction, not just fmul/fmadd like before. I don't know if this will
fix any games, but there's little good reason not to.