Before dbf5dca, the dirty flag had no meaning for an immediate value,
so we made sure to always set the dirty flag when switching a register
from Immediate to Register. But after dbf5dca, that is no longer the
case. If an immediate is marked as not dirty, we can keep the register
marked as not dirty after materializing the value. This way we skip
having to write it back to ppcState later.
Without this change, non-dirty immediates don't actually get flushed.
This can be a problem if we for instance are flushing all registers in
order to execute an interpreter fallback. If that interpreter fallback
writes to a register that contained a non-dirty immediate, the JIT will
keep using the old value instead of loading the updated value.
Dolphin's JITs have a minor terminology problem: The term "fastmem" can
refer to either the system of switching between a fast path and a slow
path using backpatching, or to the fast path itself. To hopefully make
things clearer, I'm adding some new terms, defining the old and new
terms as follows:
Fastmem: The system of switching from a fast path to a slow path by
backpatching when an invalid memory access occurs.
Fast access: A code path that accesses guest memory without calling C++
code.
Slow access: A code path that accesses guest memory by calling C++ code.
With this, situations where multiple arguments need to be moved
from multiple registers become easy to handle, and we also get
compile-time checking that the number of arguments is correct.
If dcache is enabled when the game starts, initializing the fastmem
arena is still useful in case the user changes the dcache setting.
And initializing it doesn't really cost anything.
Preparation for the next commit.
JitArm64 has been conflating these two flags. Most of the stuff that's
been guarded by fastmem_arena checks in fact requires fastmem.
When we have fastmem_arena without fastmem, it would be possible to do
things a bit more efficiently than what this commit does, but it's
non-trivial and therefore I will leave it out of this PR. With this
commit, we effectively have the same behavior as before this PR - plus
the added ability to toggle fastmem with a cache clear.
This is needed so that the checks added in the previous commit will be
reevaluated if the value of m_enable_dcache changes.
JitArm64 was already recompiling its asm routines on cache clear by
necessity. It doesn't have the same setup as Jit64 where the asm
routines are in a separate region, so clearing the JitArm64 cache
results in the asm routines being cleared too.
Some code paths in EmuCodeBlock.cpp that were checking fastmem_arena
should really also be checking m_enable_dcache.
Because JitArm64 centralizes more or less all memory access to the
EmitBackpatchRoutine function and because that function already
contained a check, JitArm64 works fine without the additional checks
added by this commit. Regardless, I added the checks to MMU.cpp instead
of EmuCodeBlock.cpp where applicable so they would be available to
JitArm64. Maybe one day JitArm64 will need them if its code gets
restructured.
Instructions referencing registers r8-r15 take an additional byte to
encode. `reg_downcount` may be assigned to one of these registers, so it
is a small size win to store the downcount value in `RSCRATCH` first.
Before:
33 D2 xor edx,edx
44 8B 6D 64 mov r13d,dword ptr [rbp+64h]
45 85 ED test r13d,r13d
7E 30 jle 0000023546B43F6D
44 8B B5 D4 02 00 00 mov r14d,dword ptr [rbp+2D4h]
41 8B C5 mov eax,r13d
BF 07 00 00 00 mov edi,7
F7 F7 div eax,edi
After:
33 D2 xor edx,edx
8B 45 64 mov eax,dword ptr [rbp+64h]
85 C0 test eax,eax
7E 30 jle 000001AFBBAE359D
44 8B B5 D4 02 00 00 mov r14d,dword ptr [rbp+2D4h]
44 8B E8 mov r13d,eax
BF 07 00 00 00 mov edi,7
F7 F7 div eax,edi
This value is used in a multiplication. The result of this
multiplication is then subtracted from m_base. By negating m_dec, we are
free to use an addition instead.
On x64, this saves an instruction.
When evaluating whether jo.fastmem should be set to true, we check the
value of jo.fastmem_arena. However, due to a change made in 28e8117b90,
jo.fastmem_arena wasn't set until after the first time we set
jo.fastmem, so jo.fastmem would end up always being false until the next
time RefreshConfig was called.
Fixes https://bugs.dolphin-emu.org/issues/13364.
And in order to avoid a double dereference in the dispatcher, directly store
the normalEntry in the map.
The index to the block map becomes ((((DR<<1) | IR) << 30) | (address >> 2)).
This has been chosen since the msr bits change less often than the address,
thus we keep nearby entries together.
Also do not call the C dispatcher in case the assembly dispatcher didn't
find a block, since it wouldn't find a block either due to the 1:1 mapping,
except when falling back to the non shm segment lookup table.
This bug has been lurking in the code ever since I added the discard
functionality. It doesn't seem to be triggered all that often,
and on top of that the emitted code only runs conditionally, so I'm not
sure if people have been affected by this bug in practice or not.
We need to check for the address of the *previous* instruction, because
checking fifoWriteAddresses happens not at the end of the instruction
that triggered it but at the start of the next instruction.
Whenever JitBaseBlockCache::Clear() got called, it threw away the memory mapping for the fast block map and created a new one. This new mapping typically got mapped at the same address at the old one, but this is not guaranteed. The pointer to the mapping gets embedded in the generated dispatcher code in Jit64AsmRoutineManager::Generate(), which is only called once on game boot, so if the new mapping ended up at a different address than the old one, the pointer in the ASM pointed at garbage, leading to a crash.
This fixes the issue by guaranteeing that the new mapping is mapped at the same address.
While both fastmem and the BLR optimization depend on fault handling,
the BLR optimization doesn't depend on fastmem, and there are cases
where you might want the BLR optimization but not fastmem. For me
personally, it's useful when I try to use a debugger on Android and have
to disable fastmem so I don't get SIGSEGVs all the time, but it would be
especially useful for iOS users.
This fixes a problem where changing the JIT debug settings on
Android while a game was running wouldn't cause the changed settings
to apply to code blocks that already had been compiled.
Modify PPCSTATE_OFF and PPCSTATE_OFF_ARRAY macros when using GCC to
avoid useless log spam. Specifically, use a consteval lambda with gcc
_Pragma statements to disable the -Winvalid-offsetof warning inside the
macros.
Each successful build (and many failing ones) on the Android buildbot
generates almost 300 cases of -Winvalid-offsetof, resulting in thousands
of lines of log spam per build. In addition to bloating the log filesize
these spurious warnings make it harder to find actual warnings.
These warnings are generated by calls to the macros PPCSTATE_OFF and
PPCSTATE_OFF_ARRAY, which in turn are used by many other macros used by
the JIT. The ultimate cause is that offsetof is only conditionally
supported on non-standard-layout types, which includes the PowerPCState
struct.
To address potential questions of whether there's a better way to handle
this:
The obvious solution would be to modify PowerPCState so that it does
have a standard layout. This is unfortunately impractical.
To have a standard layout a type can only contain other types with
standard layouts. None of the stl containers are guaranteed to have
standard layouts, and PowerPCState contains a std::tuple and std::array.
PowerPCState also contains a PowerPC::Cache and InstructionCache which
themselves contain std:arrays and std::vectors.
Furthermore InstructionCache derives from Cache, and a derived class can
only have standard layout if at most one class in its hierarchy has a
non-static data member, but both classes have such members. Making
InstructionCache have a standard layout would require duplicating all
the functionality of Cache so it no longer derived from it, as well as
replacing the stl containers. This might require having a raw pointer to
said containers, with the manual memory management that implies.
All of that would be much more disruptive than would be justified to get
rid of some warnings (however annoying they might be). This is
compounded by the fact that PowerPCState hasn't had a standard layout
for a long time, if ever, and if the PPCSTATE_OFF macros weren't working
reliably it would have become obvious a long time ago.
As to why I picked the lambda solution over other potential changes:
- Keeping the define as-is and wrapping some gcc #pragmas around it
doesn't work because the pragmas don't get included when the define is
substituted to the call site.
- Keeping the define as a non-lambda expression and using inline
_Pragma() statements would ideally be better and works fine for msvc,
but fails for GCC with "'#pragma' is not allowed here".
- Turning off -Winvalid-offsetof globally for gcc would work, but there
might be other contexts where offsetof is problematic and GCC seems to
be the only compiler warning about it.