Commit Graph

3113 Commits

Author SHA1 Message Date
76d605639b Merge pull request #11881 from JosJuice/aarch64-function-call
JitArm64: Add utility for calling a function with arguments
2023-11-25 17:30:42 +01:00
c3c0c7dc1c Jit: Get rid of short-lived std::vectors
Let's aim for making as few heap allocations as possible while jitting.
2023-11-18 23:15:37 +01:00
aa1311cd78 Merge pull request #12268 from JosJuice/fastmem-terminology
Jit: Define new terms related to fastmem
2023-11-12 19:44:45 +00:00
2333fc2701 MMU: Use VSID in segment register as additional TLB lookup key 2023-11-11 15:59:47 +00:00
620fbcdfb7 Merge pull request #12274 from JosJuice/jitarm64-non-dirty-immediates
JitArm64: Fix some oddities with non-dirty immediates
2023-11-08 20:44:32 +01:00
18d777095b MMU: on DSI exception, don't set store bit on read 2023-11-08 16:06:11 +00:00
40bf452ac8 Merge pull request #12182 from JosJuice/jit64-ps-sum1
Jit64: Use MOVSD in ps_sum1 and ps_merge01
2023-11-06 14:50:33 +01:00
f9dd13a309 JitArm64: Preserve dirty flag when materializing immediate
Before dbf5dca, the dirty flag had no meaning for an immediate value,
so we made sure to always set the dirty flag when switching a register
from Immediate to Register. But after dbf5dca, that is no longer the
case. If an immediate is marked as not dirty, we can keep the register
marked as not dirty after materializing the value. This way we skip
having to write it back to ppcState later.
2023-11-05 09:21:58 +01:00
1b7bd32ac1 JitArm64: Correctly flush non-dirty immediates
Without this change, non-dirty immediates don't actually get flushed.
This can be a problem if we for instance are flushing all registers in
order to execute an interpreter fallback. If that interpreter fallback
writes to a register that contained a non-dirty immediate, the JIT will
keep using the old value instead of loading the updated value.
2023-11-05 09:15:08 +01:00
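The two JitArm64 commits above (f9dd13a309 and 1b7bd32ac1) both describe how a cached immediate interacts with the dirty flag. The sketch below is a toy model of that behavior, not Dolphin's register cache: `GuestRegCache`, `RegState`, and the `ppc_state_value` field standing in for ppcState are all hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of one guest register as tracked by a JIT register cache.
// Every name here is hypothetical; none of it is taken from Dolphin's source.
enum class RegState { NotLoaded, Immediate, HostRegister };

struct GuestRegCache
{
  RegState state = RegState::NotLoaded;
  uint32_t value = 0;            // cached value (immediate or host register contents)
  bool dirty = false;            // true if ppc_state_value is out of date
  uint32_t ppc_state_value = 0;  // stand-in for the copy stored in ppcState

  void SetImmediate(uint32_t imm, bool is_dirty)
  {
    state = RegState::Immediate;
    value = imm;
    dirty = is_dirty;
  }

  // Immediate -> HostRegister. The dirty flag is preserved, so a clean
  // immediate stays clean and needs no write-back later.
  void Materialize()
  {
    assert(state == RegState::Immediate);
    state = RegState::HostRegister;
  }

  // Write back only if dirty, but always forget the cached value,
  // including when it is a non-dirty immediate.
  void Flush()
  {
    if (state == RegState::NotLoaded)
      return;
    if (dirty)
      ppc_state_value = value;
    state = RegState::NotLoaded;
    dirty = false;
  }

  // Read the register, reloading from the ppcState stand-in if nothing is cached.
  uint32_t Read()
  {
    if (state == RegState::NotLoaded)
    {
      value = ppc_state_value;
      state = RegState::HostRegister;
      dirty = false;
    }
    return value;
  }
};
```

In this model, a flush followed by an external write to `ppc_state_value` (playing the role of an interpreter fallback) makes the next `Read()` return the updated value, which is the situation the second commit message describes.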
482da7975b Jit: Define new terms related to fastmem
Dolphin's JITs have a minor terminology problem: The term "fastmem" can
refer to either the system of switching between a fast path and a slow
path using backpatching, or to the fast path itself. To hopefully make
things clearer, I'm adding some new terms, defining the old and new
terms as follows:

Fastmem: The system of switching from a fast path to a slow path by
backpatching when an invalid memory access occurs.

Fast access: A code path that accesses guest memory without calling C++
code.

Slow access: A code path that accesses guest memory by calling C++ code.
2023-11-02 21:30:12 +01:00
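The three terms defined in 482da7975b can be illustrated with a toy model. This is only a sketch under loud assumptions: a real JIT detects the invalid access through a fault handler and rewrites machine code, whereas here an explicit range check and a per-site flag stand in for both. `ToyGuestMemory`, `AccessSite`, and `Read` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

enum class AccessPath { Fast, Slow };

// Hypothetical guest memory: a small directly addressable region plus a
// general-purpose handler for everything else.
struct ToyGuestMemory
{
  std::array<uint8_t, 16> ram{};
  int slow_calls = 0;

  bool IsDirect(uint32_t addr) const { return addr < ram.size(); }

  // "Slow access": goes through a C++ function that can handle any address.
  uint8_t SlowRead(uint32_t addr)
  {
    ++slow_calls;
    return IsDirect(addr) ? ram[addr] : 0xFF;
  }
};

// One memory access instruction in the generated code (toy stand-in).
struct AccessSite
{
  AccessPath path = AccessPath::Fast;
};

// "Fastmem" in this toy: start on the fast path, and permanently switch the
// site to the slow path the first time the fast path hits an invalid address.
inline uint8_t Read(AccessSite& site, ToyGuestMemory& mem, uint32_t addr)
{
  if (site.path == AccessPath::Fast)
  {
    if (mem.IsDirect(addr))
      return mem.ram[addr];        // "Fast access": no handler call
    site.path = AccessPath::Slow;  // stands in for backpatching
  }
  return mem.SlowRead(addr);
}
```

Once a site has been switched, later accesses through it use the slow path even for addresses the fast path could have handled, which mirrors the one-way nature of backpatching in the definition above.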
13c70eeb31 Merge pull request #12257 from Wack0/patch-1
JIT64 (Jit_Integer): for twx instructions, raise exception with correct SRR0
2023-11-01 21:10:39 +01:00
c248a69268 JitArm64: Add utility for calling a function with arguments
With this, situations where multiple arguments need to be moved
from multiple registers become easy to handle, and we also get
compile-time checking that the number of arguments is correct.
2023-11-01 19:01:58 +01:00
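The compile-time argument count check mentioned in c248a69268 could be achieved along the following lines. This is a minimal sketch, not the utility added by the commit: `HostReg`, `Move`, `PlanArgumentMoves`, and `ExampleCallee` are hypothetical, and the sketch only records which source register feeds which argument slot. It does not order the moves to cope with overlapping source and destination registers.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical handle for a host register holding an argument value.
struct HostReg
{
  int index;
};

// One planned move: source register -> Nth argument register.
struct Move
{
  int dst_arg_reg;
  int src_reg;
};

// The callee's parameter count is deduced from its function pointer type and
// compared against the number of source registers at compile time.
template <typename Ret, typename... Params, typename... Sources>
constexpr std::array<Move, sizeof...(Sources)> PlanArgumentMoves(Ret (*)(Params...),
                                                                 Sources... sources)
{
  static_assert(sizeof...(Params) == sizeof...(Sources),
                "wrong number of arguments for the callee");
  int next_arg_reg = 0;
  // Elements of a braced initializer list are evaluated left to right.
  return {Move{next_arg_reg++, sources.index}...};
}

// Hypothetical two-argument callee used for illustration.
inline int ExampleCallee(int a, int b)
{
  return a + b;
}
```

Calling `PlanArgumentMoves(&ExampleCallee, HostReg{19}, HostReg{20})` yields two moves, while passing one or three registers fails to compile because of the `static_assert`.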
d04e67be3d Add fastmem arena setting
Just for debugging.
2023-10-31 19:43:49 +01:00
8686536d7d Jit: Always initialize fastmem arena
If dcache is enabled when the game starts, initializing the fastmem
arena is still useful in case the user changes the dcache setting.
And initializing it doesn't really cost anything.
2023-10-31 19:43:49 +01:00
0606433404 JitArm64: Check fastmem instead of fastmem_arena
Preparation for the next commit.

JitArm64 has been conflating these two flags. Most of the stuff that's
been guarded by fastmem_arena checks in fact requires fastmem.

When we have fastmem_arena without fastmem, it would be possible to do
things a bit more efficiently than what this commit does, but it's
non-trivial and therefore I will leave it out of this PR. With this
commit, we effectively have the same behavior as before this PR - plus
the added ability to toggle fastmem with a cache clear.
2023-10-31 19:43:49 +01:00
b3bfcc5d7f PowerPC: Allow toggling write-back cache during emulation
Now that PR 10575 is merged, the JIT automatically clears its cache
when this setting is changed, making this reasonable to implement.
2023-10-31 19:43:49 +01:00
899d61bc7d Jit64: Recompile asm routines on cache clear
This is needed so that the checks added in the previous commit will be
reevaluated if the value of m_enable_dcache changes.

JitArm64 was already recompiling its asm routines on cache clear by
necessity. It doesn't have the same setup as Jit64 where the asm
routines are in a separate region, so clearing the JitArm64 cache
results in the asm routines being cleared too.
2023-10-31 19:43:49 +01:00
5e74a8b850 Jit64: Don't make use of fastmem arena when dcache is enabled
Some code paths in EmuCodeBlock.cpp that were checking fastmem_arena
should really also be checking m_enable_dcache.

Because JitArm64 centralizes more or less all memory access to the
EmitBackpatchRoutine function and because that function already
contained a check, JitArm64 works fine without the additional checks
added by this commit. Regardless, I added the checks to MMU.cpp instead
of EmuCodeBlock.cpp where applicable so they would be available to
JitArm64. Maybe one day JitArm64 will need them if its code gets
restructured.
2023-10-31 19:43:40 +01:00
2ccc2bfb2e Merge pull request #12250 from Sintendo/dcbx-nit
Jit_LoadStore: Minor dcbx register optimizations
2023-10-28 02:33:51 +01:00
c9cd0b626b JIT64: for twx instruction, raise exception with correct SRR0 2023-10-27 13:27:36 +01:00
171f76ae07 Jit_LoadStore: Another minor dcbx optimization
The multiplication needs the value from RSCRATCH2, but shouldn't
overwrite it as it is still needed later. The original code solved this
by copying RSCRATCH2 to another register first.

As it turns out, the other register involved in the multiplication can
safely be overwritten, so we can swap the operands around and use
RSCRATCH2 directly without making a copy.

Before:
33 D2                xor         edx,edx
8B 45 64             mov         eax,dword ptr [rbp+64h]
85 C0                test        eax,eax
7E 30                jle         000002D4DF373F6B
44 8B B5 D4 02 00 00 mov         r14d,dword ptr [rbp+2D4h]
44 8B E8             mov         r13d,eax
BF 07 00 00 00       mov         edi,7
F7 F7                div         eax,edi
41 8D 56 FF          lea         edx,[r14-1]
3B C2                cmp         eax,edx
0F 42 D0             cmovb       edx,eax
44 2B F2             sub         r14d,edx
44 89 B5 D4 02 00 00 mov         dword ptr [rbp+2D4h],r14d
8B C2                mov         eax,edx
0F AF C7             imul        eax,edi
44 2B E8             sub         r13d,eax
44 89 6D 64          mov         dword ptr [rbp+64h],r13d
44 8D 72 01          lea         r14d,[rdx+1]

After:
33 D2                xor         edx,edx
8B 45 64             mov         eax,dword ptr [rbp+64h]
85 C0                test        eax,eax
7E 2E                jle         0000021C01013F69
44 8B B5 D4 02 00 00 mov         r14d,dword ptr [rbp+2D4h]
44 8B E8             mov         r13d,eax
BF 07 00 00 00       mov         edi,7
F7 F7                div         eax,edi
41 8D 56 FF          lea         edx,[r14-1]
3B C2                cmp         eax,edx
0F 42 D0             cmovb       edx,eax
44 2B F2             sub         r14d,edx
44 89 B5 D4 02 00 00 mov         dword ptr [rbp+2D4h],r14d
0F AF FA             imul        edi,edx
44 2B EF             sub         r13d,edi
44 89 6D 64          mov         dword ptr [rbp+64h],r13d
44 8D 72 01          lea         r14d,[rdx+1]
2023-10-24 00:42:35 +02:00
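The operand swap in 171f76ae07 can be modeled in a few lines. This is a sketch, assuming that `edx` in the listings plays the role of RSCRATCH2 and `edi` holds the other factor; the function names and byte-count constants are hypothetical, and the byte counts are read off the listings above.

```cpp
#include <cassert>
#include <cstdint>

// Before: copy RSCRATCH2 so that the multiplication cannot clobber it.
constexpr uint32_t ProductBefore(uint32_t rscratch2, uint32_t other)
{
  uint32_t copy = rscratch2;  // 8B C2     mov  eax,edx
  copy *= other;              // 0F AF C7  imul eax,edi
  return copy;
}

// After: multiply into the other register, which is free to be overwritten.
// RSCRATCH2 is only read, so no copy is needed.
constexpr uint32_t ProductAfter(uint32_t rscratch2, uint32_t other)
{
  other *= rscratch2;  // 0F AF FA  imul edi,edx
  return other;
}

// Multiplication is commutative, so both forms give the same product.
static_assert(ProductBefore(5, 7) == ProductAfter(5, 7), "products must match");

// Encoded sizes taken from the listings: mov (2) + imul (3) versus imul (3).
constexpr int kBytesBefore = 2 + 3;
constexpr int kBytesAfter = 3;
static_assert(kBytesBefore - kBytesAfter == 2, "the swap saves two bytes");
```

The two-byte saving is consistent with the `jle` displacement shrinking from 30 to 2E between the two listings.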
dd58a8d65e Jit_LoadStore: Minor dcbx register optimization
Instructions referencing registers r8-r15 take an additional byte to
encode. `reg_downcount` may be assigned to one of these registers, so it
is a small size win to store the downcount value in `RSCRATCH` first.

Before:
33 D2                xor         edx,edx
44 8B 6D 64          mov         r13d,dword ptr [rbp+64h]
45 85 ED             test        r13d,r13d
7E 30                jle         0000023546B43F6D
44 8B B5 D4 02 00 00 mov         r14d,dword ptr [rbp+2D4h]
41 8B C5             mov         eax,r13d
BF 07 00 00 00       mov         edi,7
F7 F7                div         eax,edi

After:
33 D2                xor         edx,edx
8B 45 64             mov         eax,dword ptr [rbp+64h]
85 C0                test        eax,eax
7E 30                jle         000001AFBBAE359D
44 8B B5 D4 02 00 00 mov         r14d,dword ptr [rbp+2D4h]
44 8B E8             mov         r13d,eax
BF 07 00 00 00       mov         edi,7
F7 F7                div         eax,edi
2023-10-22 15:13:52 +02:00
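The size claim in dd58a8d65e can be checked against the bytes in the listings. The extra byte is the REX prefix that x86-64 requires for r8-r15. In the sketch below the encodings are copied from the listings above, while the helper and constant names are hypothetical. `IsRexPrefix` is only meaningful for the first byte of an instruction that has no legacy prefixes, which holds for these examples.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// In 64-bit mode, a leading byte in the range 0x40..0x4F is a REX prefix.
constexpr bool IsRexPrefix(uint8_t first_byte)
{
  return (first_byte & 0xF0) == 0x40;
}

// Encodings copied from the Before/After listings.
constexpr std::array<uint8_t, 4> kLoadToR13d{0x44, 0x8B, 0x6D, 0x64};  // mov r13d,[rbp+64h]
constexpr std::array<uint8_t, 3> kLoadToEax{0x8B, 0x45, 0x64};         // mov eax,[rbp+64h]
constexpr std::array<uint8_t, 3> kTestR13d{0x45, 0x85, 0xED};          // test r13d,r13d
constexpr std::array<uint8_t, 2> kTestEax{0x85, 0xC0};                 // test eax,eax

static_assert(IsRexPrefix(kLoadToR13d[0]) && !IsRexPrefix(kLoadToEax[0]),
              "only the r13d load needs a REX prefix");
static_assert(IsRexPrefix(kTestR13d[0]) && !IsRexPrefix(kTestEax[0]),
              "only the r13d test needs a REX prefix");

// Bytes saved by loading and testing through eax instead of r13d.
constexpr std::size_t kBytesSaved =
    (kLoadToR13d.size() - kLoadToEax.size()) + (kTestR13d.size() - kTestEax.size());
static_assert(kBytesSaved == 2, "one byte saved per instruction");
```

The register-to-register `mov` that follows has a REX prefix in both listings (41 8B C5 before, 44 8B E8 after), so it does not change the total.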
3c3168706c PowerPC: Negate m_dec values in frsqrte table
This value is used in a multiplication. The result of this
multiplication is then subtracted from m_base. By negating m_dec, we are
free to use an addition instead.

On x64, this saves an instruction.
2023-10-21 21:08:21 +02:00
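The identity behind 3c3168706c is that subtracting a product equals adding the product of the negated factor. The sketch below assumes 32-bit unsigned table fields so that wraparound is well defined. The struct and function names are hypothetical, the real table's field types may differ, and the sample values are made up.

```cpp
#include <cassert>
#include <cstdint>

// Table entry as it was before the change: the product is subtracted.
struct EntryOld
{
  uint32_t m_base;
  uint32_t m_dec;
};

// Table entry after the change: m_dec is stored negated, the product is added.
struct EntryNew
{
  uint32_t m_base;
  uint32_t m_dec;  // holds the negated value
};

constexpr uint32_t EstimateOld(EntryOld e, uint32_t x)
{
  return e.m_base - e.m_dec * x;
}

constexpr uint32_t EstimateNew(EntryNew e, uint32_t x)
{
  return e.m_base + e.m_dec * x;
}

// Convert an old entry by negating m_dec (modulo 2^32).
constexpr EntryNew Negate(EntryOld e)
{
  return {e.m_base, 0u - e.m_dec};
}

// Made-up sample values, only to show that both forms agree.
static_assert(EstimateOld({1000, 3}, 10) == EstimateNew(Negate({1000, 3}), 10),
              "negating m_dec turns the subtraction into an addition");
```

Because the negation is done once when the table is written, the generated code pays nothing for it at run time.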
1b7a590b4b Merge pull request #12209 from JosJuice/frsqrte-exp-lsb
PowerPC: Flip the order of frsqrte_expected
2023-10-10 10:38:07 +02:00
219610d8a0 Jit64: Increase nearcode/farcode size 2023-10-04 13:05:09 -07:00
02d76ba2a0 Jit: Fix fastmem initialization order
When evaluating whether jo.fastmem should be set to true, we check the
value of jo.fastmem_arena. However, due to a change made in 28e8117b90,
jo.fastmem_arena wasn't set until after the first time we set
jo.fastmem, so jo.fastmem would end up always being false until the next
time RefreshConfig was called.

Fixes https://bugs.dolphin-emu.org/issues/13364.
2023-09-30 16:16:54 +02:00
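The ordering problem described in 02d76ba2a0 can be reproduced in miniature. This is a sketch under assumptions: `ToyJitOptions`, `ToyConfig`, and both refresh functions are hypothetical, and the real condition for `jo.fastmem` may involve more than the two inputs shown here.

```cpp
#include <cassert>

// Hypothetical stand-ins for the JIT options and the user configuration.
struct ToyJitOptions
{
  bool fastmem_arena = false;
  bool fastmem = false;
};

struct ToyConfig
{
  bool fastmem_enabled;
  bool arena_available;
};

// Buggy order: fastmem is computed from a fastmem_arena value that has not
// been refreshed yet, so the first call always sees the initial false.
inline void RefreshBuggy(ToyJitOptions& jo, const ToyConfig& cfg)
{
  jo.fastmem = cfg.fastmem_enabled && jo.fastmem_arena;
  jo.fastmem_arena = cfg.arena_available;
}

// Fixed order: refresh fastmem_arena first, then derive fastmem from it.
inline void RefreshFixed(ToyJitOptions& jo, const ToyConfig& cfg)
{
  jo.fastmem_arena = cfg.arena_available;
  jo.fastmem = cfg.fastmem_enabled && jo.fastmem_arena;
}
```

With the buggy order, `fastmem` only becomes true on the second refresh, matching the commit message's description of the flag staying false until the next RefreshConfig call.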
9e0fea8fc7 PowerPC: Flip the order of frsqrte_expected
This makes the frsqrte routines one instruction shorter.
2023-09-30 11:41:27 +02:00
60e331e2e1 PowerPC: reduce location assert cost 2023-09-28 19:57:03 -04:00
8dbbae76de Jit64: Optimize ps_merge01 with Rd == Rb 2023-09-21 18:03:39 +02:00
a79fe768e3 Jit64: Simplify ps_sum1 2023-09-17 11:04:37 +02:00
fd9c970621 JitArm64/Jit64: Extend the fast lookup mmap-ed segment further to avoid needing to check the msr bits.
And in order to avoid a double dereference in the dispatcher, directly store
the normalEntry in the map.

The index to the block map becomes ((((DR<<1) | IR) << 30) | (address >> 2)).
This has been chosen since the msr bits change less often than the address,
thus we keep nearby entries together.

Also do not call the C dispatcher in case the assembly dispatcher didn't
find a block, since it wouldn't find a block either due to the 1:1 mapping,
except when falling back to the non shm segment lookup table.
2023-09-15 19:46:15 +03:00
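The index formula quoted in fd9c970621 can be written out as a small function. This is a sketch: the function name and parameter types are assumptions, and only the bit layout comes from the commit message. A 32-bit guest address shifted right by two occupies 30 bits, so the DR and IR bits land in the top two bits of a 32-bit index.

```cpp
#include <cassert>
#include <cstdint>

// index = ((((DR << 1) | IR) << 30) | (address >> 2))
constexpr uint32_t FastLookupIndex(bool dr, bool ir, uint32_t address)
{
  const uint32_t msr_bits = (static_cast<uint32_t>(dr) << 1) | static_cast<uint32_t>(ir);
  return (msr_bits << 30) | (address >> 2);
}

// Consecutive 4-byte instructions map to consecutive indices for fixed MSR bits.
static_assert(FastLookupIndex(true, true, 0x80003104) ==
                  FastLookupIndex(true, true, 0x80003100) + 1,
              "neighboring instructions have neighboring entries");

// Changing the MSR bits moves to a different quarter of the index space.
static_assert(FastLookupIndex(false, false, 0x80003100) == 0x20000C40, "DR=0, IR=0");
static_assert(FastLookupIndex(true, true, 0x80003100) == 0xE0000C40, "DR=1, IR=1");
```

Placing the MSR bits above the address bits is what keeps entries for nearby addresses adjacent, as the commit message notes.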
d16bedd5c4 Merge pull request #12178 from JosJuice/jit-gp-pc
Jit: Use correct address when checking fifoWriteAddresses
2023-09-10 15:58:23 +02:00
92d67df4e9 Merge pull request #12138 from JosJuice/jit-gp-check-discard
Jit: Don't discard before gather pipe interrupt check
2023-09-10 15:10:37 +02:00
34b0a6ea90 Jit: Check for discarded registers when flushing
This adds a check for the bug addressed by the previous commit.
2023-09-10 12:54:52 +02:00
5902b5b113 PPCAnalyst: Don't discard before gather pipe interrupt check
This bug has been lurking in the code ever since I added the discard
functionality. It doesn't seem to be triggered all that often,
and on top of that the emitted code only runs conditionally, so I'm not
sure if people have been affected by this bug in practice or not.
2023-09-10 12:54:52 +02:00
f7f4da2be8 Jit: Use correct address when checking fifoWriteAddresses
We need to check for the address of the *previous* instruction, because
checking fifoWriteAddresses happens not at the end of the instruction
that triggered it but at the start of the next instruction.
2023-09-10 12:54:18 +02:00
1a0f0e7e96 Merge pull request #12081 from JosJuice/jitarm64-debug-exit-pc
JitArm64: Store PC on debug exit
2023-09-10 02:10:29 +02:00
cf2a1f29b7 Core/JitCache: Don't try to allocate the fast block map on 32-bit builds. 2023-09-07 14:48:57 +02:00
bd57d17dee Merge pull request #12079 from JosJuice/blr-no-fastmem
Jit: Allow BLR optimization without fastmem
2023-09-02 12:45:39 -04:00
f1c1c6ded6 JitCache: Fix potentially dangling pointer to fast block map.
Whenever JitBaseBlockCache::Clear() got called, it threw away the memory mapping for the fast block map and created a new one. This new mapping typically got mapped at the same address as the old one, but this is not guaranteed. The pointer to the mapping gets embedded in the generated dispatcher code in Jit64AsmRoutineManager::Generate(), which is only called once on game boot, so if the new mapping ended up at a different address than the old one, the pointer in the ASM pointed at garbage, leading to a crash.

This fixes the issue by guaranteeing that the new mapping is mapped at the same address.
2023-09-02 04:03:22 +02:00
4131dffae9 Jit: Allow BLR optimization without fastmem
While both fastmem and the BLR optimization depend on fault handling,
the BLR optimization doesn't depend on fastmem, and there are cases
where you might want the BLR optimization but not fastmem. For me
personally, it's useful when I try to use a debugger on Android and have
to disable fastmem so I don't get SIGSEGVs all the time, but it would be
especially useful for iOS users.
2023-08-29 22:55:29 +02:00
af2c32635a Jit: Add more error checking to ProtectStack 2023-08-29 22:46:50 +02:00
1b2d0c0507 Merge pull request #10575 from JosJuice/jitbase-auto-clear
Jit: Automatically clear cache when JIT settings are updated
2023-08-29 15:56:25 -04:00
7daa19f40d JitArm64: Avoid loading compilerPC multiple times if it's already in a register. 2023-08-26 18:14:07 +03:00
85281e76ee Jit: Remove unnecessary member variables 2023-08-26 17:05:04 +02:00
28e8117b90 Jit: Automatically clear cache when JIT settings are updated
This fixes a problem where changing the JIT debug settings on
Android while a game was running wouldn't cause the changed settings
to apply to code blocks that already had been compiled.
2023-08-26 17:04:56 +02:00
cd31da97d6 Merge pull request #11191 from JosJuice/jitarm64-no-checked-entry
JitArm64: Never check downcount on block entry
2023-08-26 17:00:08 +02:00
2502e412b3 Merge pull request #12117 from JosJuice/config-callback-cpu
Don't call RunAsCPUThread in config callbacks
2023-08-26 16:34:46 +02:00
58ab94c30c GCC: Suppress PPCSTATE_OFF invalid-offsetof warnings
Modify PPCSTATE_OFF and PPCSTATE_OFF_ARRAY macros when using GCC to
avoid useless log spam. Specifically, use a consteval lambda with gcc
_Pragma statements to disable the -Winvalid-offsetof warning inside the
macros.

Each successful build (and many failing ones) on the Android buildbot
generates almost 300 cases of -Winvalid-offsetof, resulting in thousands
of lines of log spam per build. In addition to bloating the log file size,
these spurious warnings make it harder to find actual warnings.

These warnings are generated by calls to the macros PPCSTATE_OFF and
PPCSTATE_OFF_ARRAY, which in turn are used by many other macros used by
the JIT. The ultimate cause is that offsetof is only conditionally
supported on non-standard-layout types, which includes the PowerPCState
struct.

To address potential questions of whether there's a better way to handle
this:

The obvious solution would be to modify PowerPCState so that it does
have a standard layout. This is unfortunately impractical.

To have a standard layout a type can only contain other types with
standard layouts. None of the stl containers are guaranteed to have
standard layouts, and PowerPCState contains a std::tuple and std::array.
PowerPCState also contains a PowerPC::Cache and InstructionCache which
themselves contain std::arrays and std::vectors.

Furthermore InstructionCache derives from Cache, and a derived class can
only have standard layout if at most one class in its hierarchy has a
non-static data member, but both classes have such members. Making
InstructionCache have a standard layout would require duplicating all
the functionality of Cache so it no longer derived from it, as well as
replacing the stl containers. This might require having a raw pointer to
said containers, with the manual memory management that implies.

All of that would be much more disruptive than would be justified to get
rid of some warnings (however annoying they might be). This is
compounded by the fact that PowerPCState hasn't had a standard layout
for a long time, if ever, and if the PPCSTATE_OFF macros weren't working
reliably it would have become obvious a long time ago.

As to why I picked the lambda solution over other potential changes:

- Keeping the define as-is and wrapping some gcc #pragmas around it
  doesn't work because the pragmas don't get included when the define is
  substituted to the call site.

- Keeping the define as a non-lambda expression and using inline
  _Pragma() statements would ideally be better and works fine for msvc,
  but fails for GCC with "'#pragma' is not allowed here".

- Turning off -Winvalid-offsetof globally for gcc would work, but there
  might be other contexts where offsetof is problematic and GCC seems to
  be the only compiler warning about it.
2023-08-21 14:01:11 -07:00
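The lambda approach described in 58ab94c30c might look roughly like the following. This is a sketch, not the actual PPCSTATE_OFF definition: `ToyState`, `ToyBase`, `TOY_STATE_OFF`, and `TOY_CONSTEVAL` are hypothetical, and the fallback to `constexpr` when `consteval` is unavailable is an addition made here so that the sketch also builds as C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical non-standard-layout type: both the base and the derived class
// have non-static data members, and it contains a standard library container.
struct ToyBase
{
  int base_member = 0;
};

struct ToyState : ToyBase
{
  int pc = 0;
  std::vector<int> scratch;
  unsigned gpr[32] = {};
};

// Use consteval where the compiler supports it, constexpr otherwise.
#if defined(__cpp_consteval)
#define TOY_CONSTEVAL consteval
#else
#define TOY_CONSTEVAL constexpr
#endif

#if defined(__GNUC__)
// The _Pragma operators sit in statement position inside the lambda body,
// so they travel with the macro to every call site.
#define TOY_STATE_OFF(member)                                                  \
  ([]() TOY_CONSTEVAL {                                                        \
    _Pragma("GCC diagnostic push")                                             \
    _Pragma("GCC diagnostic ignored \"-Winvalid-offsetof\"")                   \
    return offsetof(ToyState, member);                                         \
    _Pragma("GCC diagnostic pop")                                              \
  }())
#else
#define TOY_STATE_OFF(member) offsetof(ToyState, member)
#endif
```

A call such as `TOY_STATE_OFF(gpr)` then evaluates to a compile-time `std::size_t`, with the warning suppression scoped to the lambda body rather than to the whole translation unit.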
f19651e49b Merge pull request #11025 from AdmiralCurtiss/hle-printf
HLE_OS: Manually handle printfs from emulated software to prevent emulated software from crashing Dolphin with an invalid printf formatting string.
2023-08-20 01:31:49 +02:00