The previous implementation of Force25BitPrecision was essentially a
translation of the x86-64 implementation. It worked, but we can make it
more efficient by using URSHR, an AArch64 instruction that I don't
believe has an x86-64 equivalent. The latency is the same as before,
but the instruction count and register count are both reduced.
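To illustrate why a rounding shift can stand in for a longer sequence,
here is a minimal sketch on plain integers. It assumes the operation
rounds away the low 28 bits of the double's bit pattern, and that the
older sequence was a mask-and-add; neither detail is spelled out in this
message, and all function names are hypothetical. It does not model any
special-casing the real JIT code may do.

```cpp
#include <cassert>
#include <cstdint>

// Assumed older form: keep bits 27 and up, then add bit 27 again so that
// it carries into bit 28 when it is set.
constexpr std::uint64_t RoundMaskAdd(std::uint64_t bits)
{
  return (bits & 0xFFFFFFFFF8000000ULL) + (bits & 0x0000000008000000ULL);
}

// Software model of an unsigned rounding shift right by 28 (what URSHR
// computes), written so the rounding increment cannot overflow.
constexpr std::uint64_t RoundingShiftRight28(std::uint64_t bits)
{
  return (bits >> 28) + ((bits >> 27) & 1);
}

// Rounding-shift form: one rounding shift right, then a plain shift left.
constexpr std::uint64_t RoundShift(std::uint64_t bits)
{
  return RoundingShiftRight28(bits) << 28;
}

// Hypothetical helper: checks that both forms agree for one bit pattern.
inline void CheckAgreement(std::uint64_t bits)
{
  assert(RoundMaskAdd(bits) == RoundShift(bits));
}
```

Writing the input as H * 2^28 + b * 2^27 + L, both forms produce
(H + b) * 2^28 modulo 2^64, which is why they agree for every input.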
The new `dispatcher_no_timing_check` is the same as `dispatcher_no_check`
except it includes the "stepping check" in debug mode. This lets us avoid
the `m_enable_debugging ? dispatcher : dispatcher_no_check` dance.
Maybe "tail call" isn't quite the right term for what this code
is doing, since it's jumping to the dispatcher rather than
returning, but it's the same optimization as for a tail call.
Jumping to `dispatcher` requires first subtracting the downcount,
otherwise `dispatcher` may unpredictably jump to CoreTiming::Advance,
which could break determinism compatibility with JitArm64. We should
jump to `dispatcher_no_check` instead.
The breakpoint check in Jit.cpp makes it redundant.
Normally this redundant check doesn't cause any issues, but if you
create a breakpoint and enable logging without breaking, you get two
log messages if the breakpoint is at the beginning of a block. See
https://bugs.dolphin-emu.org/issues/13044.
This is also a tiny performance improvement for when debugging is
active, since we no longer check for breakpoints for blocks that never
had any breakpoints to begin with.
base is an unsigned variable, so we can make things a little more
consistent by making the loop index unsigned as well, so that we aren't
doing bit arithmetic with signed types.
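A small sketch of the pattern, not the code touched by this change: the
helper name, its parameters, and what it computes are all hypothetical.
The point is only that the index shares the unsigned type of base, so
the shift amount never mixes signed and unsigned operands.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: sets `count` consecutive bits starting at bit `base`.
inline std::uint32_t MakeMask(std::uint32_t base, std::uint32_t count)
{
  assert(base + count <= 32);  // keep every shift amount below 32

  std::uint32_t mask = 0;
  // The index is unsigned like `base`, so `base + i` stays unsigned.
  for (std::uint32_t i = 0; i < count; ++i)
    mask |= std::uint32_t{1} << (base + i);
  return mask;
}
```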
MemoryInterface already does this, so we can leave it alone.
No behavioral changes, just a consistency thing.
Micro-optimization. Some CPUs can fuse CMP+B, TST+B, arith+CBZ, etc.
I also moved things around for CMP+CSET and TST+CSET. I'm not sure
whether any CPUs fuse those, but it doesn't hurt anything, so I might
as well.
Improves accuracy but isn't known to affect any games.
This turned out to be fairly convenient to implement; ORing with the
PPC default NaN will quieten SNaNs and do nothing to QNaNs.
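The OR trick can be sketched on raw bit patterns as follows. The PPC
default NaN is 0x7FF8000000000000: all exponent bits set plus the quiet
bit. The function and constant names here are hypothetical, and the
sketch assumes the caller has already established that the input is a
NaN, since ORing any other value with this constant would corrupt it.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t PPC_DEFAULT_NAN = 0x7FF8000000000000ULL;
constexpr std::uint64_t DOUBLE_EXP_MASK = 0x7FF0000000000000ULL;
constexpr std::uint64_t DOUBLE_FRAC_MASK = 0x000FFFFFFFFFFFFFULL;

// A double is a NaN when its exponent is all ones and its fraction is
// non-zero.
constexpr bool IsNaNBits(std::uint64_t bits)
{
  return (bits & DOUBLE_EXP_MASK) == DOUBLE_EXP_MASK &&
         (bits & DOUBLE_FRAC_MASK) != 0;
}

// For a NaN the exponent bits are already set, so the OR only sets the
// quiet bit (bit 51). Sign and payload are preserved.
inline std::uint64_t QuietenNaNBits(std::uint64_t bits)
{
  assert(IsNaNBits(bits));  // only meaningful for NaN inputs
  return bits | PPC_DEFAULT_NAN;
}
```

An SNaN such as 0x7FF0000000000001 becomes 0x7FF8000000000001, while a
QNaN already has bit 51 set and comes out unchanged.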