It became irrelevant in 952dfcd610, when the define was removed; now, the code the comment is referring to is in JitRegister.cpp, and oprofile is controlled by cmake.
This is done entirely through interpreter fallbacks. It would
probably be possible to implement this using host exception
handlers instead, but I think it would be a lot of complexity
for a rarely used feature, so let's not do it for now.
For performance reasons, there are two settings for this feature:
One setting which does enables just what True Crime: New York City
needs and one setting which enables it all. The latter makes
almost all float instructions fall back to the interpreter.
This reverts commit 66b992cfe4.
A new (additional) correctness issue was revealed in the old
AArch64 code when applying it on top of modern JitArm64:
LSR was being used when LSRV was intended. This commit uses LSRV.
SPDX standardizes how source code conveys its copyright and licensing
information. See https://spdx.github.io/spdx-spec/1-rationale/ . SPDX
tags are adopted in many large projects, including things like the Linux
kernel.
Repeated erase() + iteration on a std::multimap is extremely slow.
Slow enough that it causes a 7 second long stutter during some
transitions in F-Zero X (a N64 VC game that triggers many, many icache
invalidations).
And slow enough that JitBaseBlockCache::DestroyBlock shows up on a
flame graph as taking >50% of total CPU time on the CPU-GPU thread:
https://i.imgur.com/vvqiFL6.png
This commit optimises those block link queries by replacing the
std::multimap (which is typically implemented with red-black trees)
with hash tables.
Master: https://i.imgur.com/vvqiFL6.png / 7s stutters
(starting from 5.0-2021 and with branch following disabled)
This commit: https://i.imgur.com/hAO74fy.png / ~0.7s stutters, which
is pretty close to 5.0 stable. (5.0-2021 introduced the performance
regression and it is especially noticeable when branch following
is disabled, which is the case for all N64 VC games since 5.0-8377.)
If we know at compile time that the PPC carry flag definitely
has a certain value, we can bake that value into the emitted code
and skip having to read from PPCState.
Add a function to calculate the magic constants required to optimize
signed 32-bit division.
Since this optimization is not exclusive to any particular architecture,
JitCommon seemed like a good place to put this.
PR #2663 added a Jit64 implementation of dcbX and a fast path to skip JIT cache invalidation. Unfortunately, there is a mismatch between address spaces in this optimization. It tests the effective address (with the top 3 bits cleared) against the valid block bitset which is intended to be indexed by physical address. While this works in the common case, it fails (for example) when the effective address is in the 7E... region (a.k.a. "fake VMEM"). This may also fall apart under more complex memory mapping scenarios requiring full MMU emulation.
The good news is that even without this fast path, the underlying call to JitInterface::InvalidateICache() still does very little work in the common case. It correctly translates the effective address to a physical address which it tests against the valid block bitset, skipping invalidation if it is not necessary. As such, the cost of removing the fast path should not be too high.
The Jit64 implementation is retained, though all it does now is emit a call. This is marginally more efficient than simple interpreter fallback, which involves an extra call. The JitArm64 implementation has also been fixed.
The game Happy Feet is fixed by this change, as it loads code in the 7E... address region and depends upon JIT cache invalidation in reponse to dcbf.
https://bugs.dolphin-emu.org/issues/12133
Remove the warning:
warning: offsetof within non-standard-layout type ‘JitBlock’ is conditionally-supported
JitBlock contains non-trival types now. Split the fields with trival
types that needs to be access from JIT code into JitBlockData structure.
This global belongs in the JitOptions structure, as it's a conditional
setting (A.K.A. option) that changes the behavior of what the JIT does.
Plus it keeps the scope of the variable constrained to the general area
it's intended to be used and nothing further.
In both cases of the x64 and AArch64 JITs, these would have const casted
away from them, followed by them being placed within an emitter and
having breakpoint instructions written in them.
In this case, we shouldn't be using const period if we're writing to the
emitted data.
Given JitBase shouldn't include platform specifics, we can generalize this
preprocessor define and allow any JIT to use it to indicate that generated code should be logged.
While we're at it, also move these defines beneath the includes with the
rest of the defines.
Given the code buffer is something truly common to all JIT
implementations, we can centralize it in the base class and avoid
duplicating it all over the place, while still allowing for differently
sized buffers.
This is just used as a means of carting around routines. It's not meant
to directly have functionality embedded within it--this is the job of
the inheriting data structure--so we can just make this a basic struct.
Particularly given all the data members were public to begin with.
PowerPC.h at this point is pretty much a general glob of stuff, and it's
unfortunate, since it means pulling in a lot of unrelated header
dependencies and a bunch of other things that don't need to be seen by
things that just want to read memory.
Breaking this out into its own header keeps all the MMU-related stuff
together and also limits the amount of header dependencies being
included (the primary motivation for this being the former reason).
Paired single (ps) instructions can call asm_routines that try to update
PowerPC::ppcState.pc. At the time the asm_routine is built, emulation has
not started and the PC is invalid (0). If the ps instruction causes an
exception (e.g. DSI), SRR0 gets clobbered with the invalid PC.
This change makes the relevant ps instructions store PC before calling out
to asm_routines, and prevents the asm_routine from trying to store PC
itself.
Gets rid of the need to construct UReg_MSR values around the the actual
member in order to query information from it (without using shifts and
masks). This makes it more concise in some areas, while helping with
readability in some other places (such as copying the ILE bit to the LE
bit in the exception checking functions).
Ensures that upon construction of a JitBase instance, that all
underlying members within the option and state structs are guaranteed
to be initialized.
This prevents potentially using a member uninitialized in some form.