dolphin

mirror of https://github.com/dolphin-emu/dolphin.git synced 2025-08-05 12:39:00 -06:00

Author	SHA1	Message	Date
Léo Lam	51bf2dca21	Merge pull request #9675 from JosJuice/jit64-div-80000000 Jit64: Fix UB/infinite loop when compiling division by 0x80000000	2021-04-26 23:50:27 +02:00
JosJuice	7d4b87e7ae	Jit64: Fix UB/infinite loop when compiling division by 0x80000000	2021-04-26 23:42:03 +02:00
JosJuice	ac679eb24d	Merge pull request #9666 from leoetlino/jit-block-hashtable Jit: Optimize block link queries by using hash tables	2021-04-25 18:45:41 +02:00
JosJuice	69c14d6ec3	JitArm64: Fix frspx with single precision source I haven't observed this breaking any game, but it didn't match the behavior of the interpreter as far as I could tell from reading the code, in that denormals weren't being flushed.	2021-04-25 15:56:59 +02:00
JosJuice	54451ac731	JitArm64: Use ConvertSingleToDoubleLower in RW when faster	2021-04-25 15:56:59 +02:00
JosJuice	9d6263f306	JitArm64: Add unit tests for single/double conversion	2021-04-25 15:56:58 +02:00
JosJuice	2a9d88739c	JitArm64: Skip accurate single/double conversion if store-safe	2021-04-25 15:56:58 +02:00
JosJuice	1d106ceaf5	JitArm64: Optimize ConvertSingleToDouble, part 2 If we can prove that FCVT will provide a correct conversion, we can use FCVT. This makes the common case a bit faster and the less likely cases (unfortunately including zero, which FCVT actually can convert correctly) a bit slower.	2021-04-25 15:56:19 +02:00
JosJuice	018e247624	JitArm64: Optimize ConvertSingleToDouble, part 1	2021-04-25 15:56:19 +02:00
JosJuice	28e4869c43	JitArm64: Optimize ConvertDoubleToSingle	2021-04-25 15:56:19 +02:00
JosJuice	6e0a5876ef	JitArm64: Use accurate single/double conversions Our old conversion approach became a lot more inaccurate when enabling flush-to-zero, to the point of obviously breaking games.	2021-04-25 15:56:19 +02:00
JosJuice	39eccf6603	JitArm64: Call RW before FCMPE in fselx Needed because the next commit will make RW clobber flags.	2021-04-25 15:56:19 +02:00
JosJuice	949686bbe7	JitArm64: Factor out single/double conversion code to functions Preparation for following commits. This commit intentionally doesn't touch paired stores, since paired stores are supposed to flush to zero. (Consistent with Jit64.)	2021-04-25 15:56:19 +02:00
JosJuice	fdf7744a53	JitArm64: Move float conversion code out of EmitBackpatchRoutine This simplifies some of the following commits. It does require an extra register, but hey, we have 32 of them. Something I think would be nice to add to the register cache in the future is the ability to keep both the single and double version of a guest register in two different host registers when that is useful. That way, the extra register we write to here can be read by a later instruction, saving us from having to perform the same conversion again.	2021-04-25 15:56:19 +02:00
Léo Lam	aa3a96f048	Merge pull request #9644 from JosJuice/jit-fallback-discard Jits: Fix interpreter fallback handling of discarded registers	2021-04-25 13:20:41 +02:00
JosJuice	b3b5016f54	Jits: Fix interpreter fallback handling of discarded registers When the interpreter writes to a discarded register, its type must be changed so that it is no longer considered discarded. Fixes a `62ce1c7` regression.	2021-04-25 13:01:40 +02:00
Léo Lam	c812ab6a63	Jit: Optimize block link queries by using hash tables Repeated erase() + iteration on a std::multimap is extremely slow. Slow enough that it causes a 7 second long stutter during some transitions in F-Zero X (a N64 VC game that triggers many, many icache invalidations). And slow enough that JitBaseBlockCache::DestroyBlock shows up on a flame graph as taking >50% of total CPU time on the CPU-GPU thread: https://i.imgur.com/vvqiFL6.png This commit optimises those block link queries by replacing the std::multimap (which is typically implemented with red-black trees) with hash tables. Master: https://i.imgur.com/vvqiFL6.png / 7s stutters (starting from 5.0-2021 and with branch following disabled) This commit: https://i.imgur.com/hAO74fy.png / ~0.7s stutters, which is pretty close to 5.0 stable. (5.0-2021 introduced the performance regression and it is especially noticeable when branch following is disabled, which is the case for all N64 VC games since 5.0-8377.)	2021-04-24 17:20:59 +02:00
Lioncash	adebc499f9	Jit64: Indicate explicit [[fallthrough]] within load helper	2021-04-19 17:37:44 -04:00
JMC47	5322256065	Merge pull request #9625 from leoetlino/mmu-sdr-update MMU: Fix SDR updates being silently dropped in some cases	2021-04-06 20:23:13 -04:00
Pokechu22	dad309d365	Disable ICache emulation for some games Specifically, 'Scooby-Doo! Mystery Mayhem', 'Scooby-Doo! Unmasked', 'Ed, Edd n Eddy: The Mis-Edventures', and the Wii version of 'Happy Feet'. The JIT cache causes problems with emulated icache invalidation in these games, resulting in areas failing to load.	2021-04-06 12:44:10 -07:00
Léo Lam	49edd5f482	MMU: Remove a bunch of useless swaps The swaps are confusing and don't accomplish much. It was originally written like this: u32 pte = bswap((u32)&base_mem[pteg_addr]); then bswap was changed to Common::swap32, and then the array access was replaced with Memory::Read_U32, leading to the useless swaps.	2021-04-06 18:25:29 +02:00
Léo Lam	960d957f4f	MMU: Fix SDR updates being silently dropped in some cases While 6xx_pem.pdf §7.6.1.1 mentions that the number of trailing zeros in HTABORG must be equal to the number of trailing ones in the mask (i.e. HTABORG must be properly aligned), this is actually not a hard requirement. Real hardware will just OR the base address anyway. Ignoring SDR changes would lead to incorrect emulation. Logging a warning instead of dropping the SDR update silently is a saner behaviour.	2021-04-06 18:25:09 +02:00
JMC47	5222a4b7e5	Merge pull request #9585 from JosJuice/jitarm64-skip-carry JitArm64: Skip calculating carry flag when not needed	2021-04-06 04:41:16 -04:00
JMC47	99d43362e6	Merge pull request #9351 from JosJuice/discard-registers Jits: Discard registers which we know will be overwritten	2021-04-06 04:40:26 -04:00
Pokechu22	004dfd1586	Replace uses of cassert with Common/Assert.h	2021-04-02 10:18:18 -07:00
JosJuice	b3f71f7cdc	JitArm64: Allow DoJit at address 0 (fix launching Wii titles) JitArm64::DoJit contains a check where it prints a warning and tries to pause emulation if instructed to compile code at address 0. I'm assuming this was done in order to provide a nicer error behavior in cases where PC was accidentally set to null. Unfortunately, it has started causing us problems recently, as `688bd61` writes and runs some code at address 0 to simulate the PPC being held in reset. What makes this worse is that calling Core::SetState from the CPU thread is actually not allowed and will cause a deadlock instead of the intended behavior. I don't believe there is anything on a real console that would stop you from executing code at address 0 (as long as the MMU has been set up to allow it), and Jit64::DoJit doesn't contain any check like this, so let's remove the check.	2021-04-01 11:28:53 +02:00
Léo Lam	c915b780cf	Merge pull request #9596 from Minty-Meeo/apply-moar-RunAsCPUThread Apply More Core::RunAsCPUThread	2021-03-27 01:11:34 +01:00
JosJuice	62ce1c7653	Jits: Discard registers which we know will be overwritten This commit adds a new "discarded" state for registers. Discarding a register is like flushing it, but without actually writing its value back to memory. We can discard a register only when it is guaranteed that no instruction will read from the register before it is next written to. Discarding reduces the register pressure a little, and can also let us skip a few flushes on interpreter fallbacks.	2021-03-24 20:48:44 +01:00
JosJuice	901170e299	PPCTables: Use u64 for instruction flags We've run out of space :(	2021-03-24 20:48:36 +01:00
JosJuice	1845c5948d	PPCAnalyst: Rework the store-safe logic The output of instructions like fabsx and ps_sel is store-safe if and only if the relevant inputs are. The old code was always marking the output as store-safe if the output was a single, and never otherwise. Also, the old code was treating the output of psq_l/psq_lu as store-safe, which seems incorrect (if dequantization is disabled).	2021-03-24 12:02:09 +01:00
JosJuice	3bd920638d	JitArm64: Use STP for pc/npc, part 2 I missed one place in `dd8e504`.	2021-03-23 21:27:07 +01:00
LC	15ebb1d9e4	Merge pull request #9566 from Sintendo/jit64divwx Jit64: Optimize divwx	2021-03-22 14:40:02 -04:00
JosJuice	baecddd262	JitArm64: Skip calculating carry flag when not needed	2021-03-19 23:02:24 +01:00
Markus Wick	bcd572a820	Merge pull request #9593 from JosJuice/jitarm64-constant-carry JitArm64: Constant carry flag optimizations	2021-03-19 22:58:17 +01:00
JosJuice	4c2cdb61df	JitArm64: Constant carry flag optimizations If we know at compile time that the PPC carry flag definitely has a certain value, we can bake that value into the emitted code and skip having to read from PPCState.	2021-03-19 22:40:19 +01:00
JosJuice	c5abcba77a	JitArm64: Fix broken format strings in Arm64RegCache	2021-03-19 16:14:20 +01:00
Minty-Meeo	db7f3f8f25	Apply More Core::RunAsCPUThread In places where applicable, Core::RunAsCPUThread has replaced Core::SetState workarounds to pause and resume emulation for thread-sensitive operations. - void Core::SaveScreenShot() - void Core::SaveScreenShot(std::string_view name) - void JitInterface::GetProfileResults(Profiler::ProfileStats *prof_stats) - void MainWindow::OnExportRecording()	2021-03-18 22:31:28 -05:00
JosJuice	621b5b8e1a	JitArm64: Optimize general case of srawx Same approach as Jit64. A lot simpler, don't you think? :)	2021-03-17 00:15:23 +01:00
JosJuice	a45a0a2066	Merge pull request #9494 from Dentomologist/convert_arm64reg_to_enum_class Arm64Gen: Convert ARM64Reg to enum class	2021-03-17 00:05:23 +01:00
JosJuice	c0f840525f	JitArm64: Improve srawx special case carry calculation At a first glance it may look like a part of the code I added to srawx in `efeda3b` has a bug when a == s. The code actually happens to work correctly, but in the interest of making the code easier to reason about, I'd like to change the way it's implemented. This change should improve the pipelining a little in the a == s case too.	2021-03-14 18:55:42 +01:00
Dentomologist	f0f206714f	Arm64Gen: Convert ARM64Reg to enum class Most changes are just adding ARM64Reg:: in front of the constants.	2021-03-13 10:10:59 -08:00
Sintendo	defe7162f5	Jit64: divwx - Simplify divisor == -1 case Suggested by @MerryMage. Thanks! Co-authored-by: merry <MerryMage@users.noreply.github.com>	2021-03-07 18:29:12 +01:00
Sintendo	83f38388a1	Jit64: divwx - Micro-optimize default case Both the normal path and the overflow path end with the same instruction, so their tails can be merged. Before: 41 8B C7 mov eax,r15d 45 85 C0 test r8d,r8d 74 0D je overflow 3D 00 00 00 80 cmp eax,80000000h 75 0E jne normal_path 41 83 F8 FF cmp r8d,0FFFFFFFFh 75 08 jne normal_path overflow: C1 F8 1F sar eax,1Fh 44 8B F0 mov r14d,eax EB 07 jmp done normal_path: 99 cdq 41 F7 F8 idiv eax,r8d 44 8B F0 mov r14d,eax done: After: 41 8B C7 mov eax,r15d 45 85 C0 test r8d,r8d 74 0D je overflow 3D 00 00 00 80 cmp eax,80000000h 75 0B jne normal_path 41 83 F8 FF cmp r8d,0FFFFFFFFh 75 05 jne normal_path overflow: C1 F8 1F sar eax,1Fh EB 04 jmp done normal_path: 99 cdq 41 F7 F8 idiv eax,r8d done: 44 8B F0 mov r14d,eax	2021-03-07 18:29:12 +01:00
Sintendo	1865035798	Jit64: divwx - Optimize division by 2 ...and let's optimize a divisor of 2 ever so slightly for good measure. I wouldn't have bothered, but most GameCube games seem to hit this on launch. - Division by 2 Before: 41 BE 02 00 00 00 mov r14d,2 41 8B C2 mov eax,r10d 45 85 F6 test r14d,r14d 74 0D je overflow 3D 00 00 00 80 cmp eax,80000000h 75 0E jne normal_path 41 83 FE FF cmp r14d,0FFFFFFFFh 75 08 jne normal_path overflow: C1 F8 1F sar eax,1Fh 44 8B F0 mov r14d,eax EB 07 jmp done normal_path: 99 cdq 41 F7 FE idiv eax,r14d 44 8B F0 mov r14d,eax done: After: 45 8B F2 mov r14d,r10d 41 C1 EE 1F shr r14d,1Fh 45 03 F2 add r14d,r10d 41 D1 FE sar r14d,1	2021-03-07 18:29:12 +01:00
Sintendo	0637a7ec59	Jit64: divwx - Optimize power-of-two divisors Power-of-two divisors can be done more elegantly, so handle them separately. - Division by 4 Before: 41 BD 04 00 00 00 mov r13d,4 41 8B C0 mov eax,r8d 45 85 ED test r13d,r13d 74 0D je overflow 3D 00 00 00 80 cmp eax,80000000h 75 0E jne normal_path 41 83 FD FF cmp r13d,0FFFFFFFFh 75 08 jne normal_path overflow: C1 F8 1F sar eax,1Fh 44 8B E8 mov r13d,eax EB 07 jmp done normal_path: 99 cdq 41 F7 FD idiv eax,r13d 44 8B E8 mov r13d,eax done: After: 45 85 C0 test r8d,r8d 45 8D 68 03 lea r13d,[r8+3] 45 0F 49 E8 cmovns r13d,r8d 41 C1 FD 02 sar r13d,2	2021-03-07 18:29:12 +01:00
Sintendo	530475dce8	Jit64: divwx - Micro-optimize certain divisors When the multiplier is positive (which is the most common case), we can generate slightly better code. - Division by 30307 Before: 49 63 C5 movsxd rax,r13d 48 69 C0 65 6B 32 45 imul rax,rax,45326B65h 4C 8B C0 mov r8,rax 48 C1 E8 3F shr rax,3Fh 49 C1 F8 2D sar r8,2Dh 44 03 C0 add r8d,eax After: 49 63 C5 movsxd rax,r13d 4C 69 C0 65 6B 32 45 imul r8,rax,45326B65h C1 E8 1F shr eax,1Fh 49 C1 F8 2D sar r8,2Dh 44 03 C0 add r8d,eax	2021-03-07 18:29:12 +01:00
Sintendo	95698c5ae1	Jit64: divwx - Optimize constant divisor Optimize division by a constant into multiplication. This method is also used by GCC and LLVM. We also add optimized paths for divisors 0, 1, and -1, because they don't work using this method. They don't occur very often, but are necessary for correctness. - Division by 1 Before: 41 BF 01 00 00 00 mov r15d,1 41 8B C5 mov eax,r13d 45 85 FF test r15d,r15d 74 0D je overflow 3D 00 00 00 80 cmp eax,80000000h 75 0E jne normal_path 41 83 FF FF cmp r15d,0FFFFFFFFh 75 08 jne normal_path overflow: C1 F8 1F sar eax,1Fh 44 8B F8 mov r15d,eax EB 07 jmp done normal_path: 99 cdq 41 F7 FF idiv eax,r15d 44 8B F8 mov r15d,eax done: After: 45 8B FD mov r15d,r13d - Division by 30307 Before: 41 BA 63 76 00 00 mov r10d,7663h 41 8B C5 mov eax,r13d 45 85 D2 test r10d,r10d 74 0D je overflow 3D 00 00 00 80 cmp eax,80000000h 75 0E jne normal_path 41 83 FA FF cmp r10d,0FFFFFFFFh 75 08 jne normal_path overflow: C1 F8 1F sar eax,1Fh 44 8B C0 mov r8d,eax EB 07 jmp done normal_path: 99 cdq 41 F7 FA idiv eax,r10d 44 8B C0 mov r8d,eax done: After: 49 63 C5 movsxd rax,r13d 48 69 C0 65 6B 32 45 imul rax,rax,45326B65h 4C 8B C0 mov r8,rax 48 C1 E8 3F shr rax,3Fh 49 C1 F8 2D sar r8,2Dh 44 03 C0 add r8d,eax - Division by 30323 Before: 41 BA 73 76 00 00 mov r10d,7673h 41 8B C5 mov eax,r13d 45 85 D2 test r10d,r10d 74 0D je overflow 3D 00 00 00 80 cmp eax,80000000h 75 0E jne normal_path 41 83 FA FF cmp r10d,0FFFFFFFFh 75 08 jne normal_path overflow: C1 F8 1F sar eax,1Fh 44 8B C0 mov r8d,eax EB 07 jmp 00000000161737E7 normal_path: 99 cdq 41 F7 FA idiv eax,r10d 44 8B C0 mov r8d,eax done: After: 49 63 C5 movsxd rax,r13d 4C 69 C0 19 25 52 8A imul r8,rax,0FFFFFFFF8A522519h 49 C1 E8 20 shr r8,20h 44 03 C0 add r8d,eax C1 E8 1F shr eax,1Fh 41 C1 F8 0E sar r8d,0Eh 44 03 C0 add r8d,eax	2021-03-07 18:29:01 +01:00
Sintendo	5bb8798df6	JitCommon: Signed 32-bit division magic constants Add a function to calculate the magic constants required to optimize signed 32-bit division. Since this optimization is not exclusive to any particular architecture, JitCommon seemed like a good place to put this.	2021-03-07 18:27:36 +01:00
Sintendo	c9adc60d73	Jit64: divwx - Special case dividend == 0 Zero divided by any number is still zero. For whatever reason, this case shows up frequently too. Before: B8 00 00 00 00 mov eax,0 85 F6 test esi,esi 74 0C je overflow 3D 00 00 00 80 cmp eax,80000000h 75 0C jne normal_path 83 FE FF cmp esi,0FFFFFFFFh 75 07 jne normal_path overflow: C1 F8 1F sar eax,1Fh 8B F8 mov edi,eax EB 05 jmp done normal_path: 99 cdq F7 FE idiv eax,esi 8B F8 mov edi,eax done: After: Nothing!	2021-03-07 18:27:30 +01:00
Sintendo	c081e3f2b3	Jit64: divwx - Optimize constant dividend When the dividend is known at compile time, we can eliminate some of the branching and precompute the result for the overflow case. Before: B8 54 D3 E6 02 mov eax,2E6D354h 85 FF test edi,edi 74 0C je overflow 3D 00 00 00 80 cmp eax,80000000h 75 0C jne normal_path 83 FF FF cmp edi,0FFFFFFFFh 75 07 jne normal_path overflow: C1 F8 1F sar eax,1Fh 8B F8 mov edi,eax EB 05 jmp done normal_path: 99 cdq F7 FF idiv eax,edi 8B F8 mov edi,eax done: After: 85 FF test edi,edi 75 04 jne normal_path 33 FF xor edi,edi EB 0A jmp done normal_path: B8 54 D3 E6 02 mov eax,2E6D354h 99 cdq F7 FF idiv eax,edi 8B F8 mov edi,eax done: Fairly common with constant dividend of zero. Non-zero values occur frequently in Ocarina of Time Master Quest.	2021-03-07 18:25:08 +01:00
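
For readers following the divwx commits above, the arithmetic behind the "After" listings can be sketched in plain C++. This is only an illustration, not code from the Dolphin tree: the function names are hypothetical, and the multiplier 0x45326B65 and shift 0x2D are simply copied from the "Division by 30307" listing. The sketch also assumes that `>>` on a negative signed integer is an arithmetic shift, which C++20 guarantees and which mainstream compilers did before that.

```cpp
#include <cstdint>

// Hypothetical helper names; none of these come from the Dolphin source.

// Signed division by 2, mirroring the shr/add/sar sequence:
// add the sign bit so negative values round toward zero, then shift.
inline int32_t DivideBy2(int32_t n)
{
  const int32_t sign_bit = static_cast<int32_t>(static_cast<uint32_t>(n) >> 31);
  return (n + sign_bit) >> 1;
}

// Signed division by 2^shift for 1 <= shift <= 30, mirroring the
// test/lea/cmovns/sar sequence: bias negative values by 2^shift - 1.
inline int32_t DivideByPowerOfTwo(int32_t n, int shift)
{
  const int32_t bias = (int32_t{1} << shift) - 1;
  return (n < 0 ? n + bias : n) >> shift;
}

// Signed division by 30307 via multiplication. The constants are taken
// from the commit's listing: multiplier 0x45326B65, shift 45 (0x2D).
inline int32_t DivideBy30307(int32_t n)
{
  const int64_t product = int64_t{n} * 0x45326B65;
  // The shift rounds toward negative infinity, so add 1 for negative n
  // to get the round-toward-zero result that `/` produces.
  const int32_t floored = static_cast<int32_t>(product >> 45);
  return floored + static_cast<int32_t>(static_cast<uint32_t>(n) >> 31);
}

// Compares each helper against the built-in `/` operator on a few edge cases.
inline bool MatchesBuiltinDivision()
{
  const int32_t samples[] = {0,     1,      -1,     2,         -2,       3,
                             -3,    7,      -7,     30306,     30307,    -30306,
                             -30307, -30308, 60614, INT32_MAX, INT32_MIN};
  for (const int32_t n : samples)
  {
    if (DivideBy2(n) != n / 2)
      return false;
    if (DivideByPowerOfTwo(n, 2) != n / 4)
      return false;
    if (DivideBy30307(n) != n / 30307)
      return false;
  }
  return true;
}
```

The bias step is what separates these sequences from a bare arithmetic shift: shifting alone rounds toward negative infinity, while the `/` operator used as the reference here rounds toward zero.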

2414 Commits