dolphin

mirror of https://github.com/dolphin-emu/dolphin.git synced 2025-07-23 06:09:50 -06:00

Author	SHA1	Message	Date
Mai	c54e8d733f	Merge pull request #12347 from JosJuice/jitarm64-paired-offset JitArm64: Use ADDI2R for psq_lXX/psq_stXX immediate offsets	2023-12-01 20:15:06 -05:00
JosJuice	b1987d0187	JitArm64: Use ADDI2R for psq_lXX/psq_stXX immediate offsets This simplifies the source code, and slightly improves the emitted code in some cases.	2023-12-01 21:31:11 +01:00
JosJuice	67791d227c	JitArm64: Add special zero case to ADDI2R This normally doesn't reduce the instruction count, but is nonetheless useful on CPUs that can do 0-cycle moves.	2023-12-01 21:31:11 +01:00
JosJuice	25ffb0dbfc	JitArm64: Mask input to 32-bit ADDI2R In case the input was a s32 that got sign extended as part of conversion to u64.	2023-12-01 21:26:37 +01:00
Mai	5f7e9d3bf1	Merge pull request #12320 from JosJuice/jitarm64-mmu-order PowerPC: Unify "FromJit" MMU functions	2023-11-30 18:34:32 -05:00
Mai	d85cb749c0	Merge pull request #11382 from skyfloogle/traversal-fix-2 Traversal: Use low TTL for probe packet	2023-11-30 18:03:50 -05:00
Mai	d67f54b175	Merge pull request #12186 from TellowKrinkle/MultiTextureComputeMetal VideoBackends:Metal: Support multiple compute textures	2023-11-30 17:46:02 -05:00
Admiral H. Curtiss	163acb5d2c	Merge pull request #12339 from Tilka/bruise GameSettings: add patch to disable interlacing in Black & Bruised	2023-11-30 21:08:15 +01:00
Admiral H. Curtiss	529a51d653	Merge pull request #12341 from JosJuice/jitarm64-msr-pc-order JitArm64: Fix JitAsm without entry points map	2023-11-30 20:44:33 +01:00
JosJuice	4b50a38cf6	JitArm64: Fix JitAsm without entry points map This must have broken in a rebase of one of my recently merged PRs. Dolphin still worked correctly with this bug, for two reasons: 1. Most AArch64 users are not on Windows, and therefore normally do have the entry points map. 2. When the bug was triggered, Dolphin would fall back to the slower path rather than crashing.	2023-11-30 20:11:02 +01:00
TellowKrinkle	394dd02d0a	VideoBackends:Metal: Support multiple compute textures	2023-11-29 18:45:11 -06:00
TellowKrinkle	a399dc43a1	VideoBackends:Metal: Align utility uniform sizes Prevents complaining from validation layers	2023-11-29 18:45:11 -06:00
Tillmann Karras	d12642b392	GameSettings: add patch to disable interlacing in Black & Bruised	2023-11-29 23:59:33 +00:00
Mai	89963c287c	Merge pull request #11958 from JosJuice/jitarm64-dispatcher-microopt JitArm64: Dispatcher optimizations	2023-11-29 16:54:09 -05:00
Mai	2d0e577f8f	Merge pull request #12340 from JosJuice/jit-gp-check-discard-cr PPCAnalyst: Don't discard CR before gather pipe interrupt check	2023-11-29 16:51:03 -05:00
JosJuice	bddcf60673	PPCAnalyst: Don't discard CR before gather pipe interrupt check This fixes a frequently occurring JitArm64 assert caused by merging `6cc4f593e5` without adapting it to the changes made in `5902b5b113`.	2023-11-29 21:53:13 +01:00
JosJuice	06c7862160	JitArm64: Rearrange dispatcher instructions to improve scheduling Loads can take a little while to complete.	2023-11-29 19:13:09 +01:00
JosJuice	9e970bcb30	JitArm64: Optimize shifting and masking PC in slow dispatcher Instead of shifting left by 1, we can first shift right by 2 and then left by 3. This is both faster and smaller, because we get the right shift for free with the masking and the left shift for free with the address calculation. It also happens to match the pseudocode more closely, which is always nice for readability.	2023-11-29 19:13:09 +01:00
JosJuice	c9347a2a19	JitArm64: Use LDP in slow dispatcher With one LDP instruction, we can replace two LDR instructions.	2023-11-29 19:13:09 +01:00
JosJuice	4a4e7d9b8a	Jit: Swap locations of effectiveAddress and msrBits This slightly improves instruction-level parallelism in Jit64's slow dispatcher by shifting the PC left instead of the MSR. In the past, this also enabled an optimization in JitArm64's fast path where we could use LDP to load normalEntry and msrBits in one instruction, but this was superseded by `fd9c970`.	2023-11-29 19:13:09 +01:00
JosJuice	3df09f349d	JitArm64: Prefer X8 and up for temporary registers in JitAsm Just to make the code easier to understand at a glance. I especially found it a bit annoying to reason about whether callee-saved registers like W28 were being used because we needed a callee-saved register or just for no reason in particular. X8 and up is what compilers normally use when they're not register starved.	2023-11-29 19:13:03 +01:00
Mai	0a62b30cd4	Merge pull request #11906 from noahpistilli/request-register-user-id IOS/KD: Implement Request Register User ID	2023-11-29 03:31:59 -05:00
Mai	02de58eb2c	Merge pull request #12337 from Tilka/imm16 Jit64: fix invalid instruction encoding	2023-11-29 01:10:22 -05:00
Tillmann Karras	f6131e9703	Jit64: fix invalid instruction encoding This is a recent regression introduced in `c70dcf99dd`.	2023-11-29 05:49:02 +00:00
Mai	a7216a3035	Merge pull request #9857 from JosJuice/jitarm64-cr-analysis PPCAnalyst: Allow more reordering of CR operations	2023-11-28 21:01:07 -05:00
Sketch	f2607cdd74	IOS/KD: Implement Request Register User ID	2023-11-28 20:40:15 -05:00
Mai	b7435be90a	Merge pull request #12298 from Shoegzer/master Update default IP for HLE BBA	2023-11-28 22:45:17 +01:00
Mai	d095bddbe7	Merge pull request #12141 from JosJuice/jit-blr-msr Jit: Check MSR state in BLR optimization	2023-11-28 22:35:35 +01:00
Mai	934418a289	Merge pull request #12092 from JosJuice/jitarm64-last-nan JitArm64: Skip checking last input for NaN for non-SIMD operations	2023-11-28 22:30:50 +01:00
JosJuice	fc95d59805	JitArm64: Further optimize NaN handling in ps_sumX So short that using farcode is pointless!	2023-11-28 21:45:44 +01:00
JosJuice	8274dcbfe4	JitArm64: Skip checking last input for NaN for non-SIMD operations AArch64's handling of NaNs in arithmetic instructions matches PowerPC's as long as no more than one of the operands is NaN. If we know that all inputs except the last input are non-NaN, we can therefore skip checking the last input. This is an optimization that in principle only works for non-SIMD operations, but ps_sumX effectively is non-SIMD as far as the arithmetic part of it is concerned, so we can use it there too.	2023-11-28 21:45:40 +01:00
Mai	95f06ef231	Merge pull request #12122 from JosJuice/jit-imm-msr Jit: Handle imm msr in EmitStoreMembase	2023-11-28 21:34:23 +01:00
Mai	8cf0597d5f	Merge pull request #12091 from JosJuice/jitarm64-skip-quiet-bit JitArm64: Use one instruction for making NaNs quiet	2023-11-28 21:33:25 +01:00
Mai	e99ead0a68	Merge pull request #12124 from JosJuice/jitarm64-mfsrin-mtsrin-addr JitArm64: Optimize mfsrin/mtsrin address calculations	2023-11-28 21:30:30 +01:00
Mai	b53ecd73fb	Merge pull request #12143 from JosJuice/jitarm64-loadstore-pc JitArm64: Write PC when calling MMU.cpp	2023-11-28 21:29:37 +01:00
Mai	1df685b2d7	Merge pull request #12123 from JosJuice/jit-mcrxr Jit: Some mcrxr optimizations	2023-11-28 19:32:47 +01:00
Mai	20b13df507	Merge pull request #12179 from JosJuice/jitarm64-gp-deduplicate JitArm64: Deduplicate the gather pipe exception check	2023-11-28 19:21:58 +01:00
Mai	ac53766058	Merge pull request #12215 from JosJuice/android-si-devices Android: Add more GameCube controller types	2023-11-28 19:21:29 +01:00
Mai	bfc6bca583	Merge pull request #12235 from JosJuice/jitarm64-float-cls JitArm64: Use LSL+CLS for classifying floats	2023-11-28 19:20:01 +01:00
JosJuice	80171adf1e	PPCTables: Retire FL_EVIL FL_EVIL is only used for blocking instructions from being reordered. There are three types of instructions which have FL_EVIL set: 1. CR operations. The previous commits improved our CR analysis and removed FL_EVIL from these instructions. 2. Load/store operations. These are always blocked from reordering due to always having canCauseException set. 3. isync. I don't know if we actually need to prevent reordering around this one, since as far as I know we only do reorderings that are guaranteed to not change the behavior of the program. But just in case, I've renamed FL_EVIL to FL_NO_REORDER instead of removing it entirely, so that it can be used for this instruction.	2023-11-28 18:59:34 +01:00
JosJuice	f494a3d9e8	PPCAnalyst: Remove CanSwapAdjacentOps's OPCD check Other than the CR instructions, which we now analyze properly, all the covered instructions are not integer operations and also have either FL_ENDBLOCK or FL_EVIL set, so there are two other checks in CanSwapAdjacentOps that will reject them.	2023-11-28 18:59:34 +01:00
JosJuice	96d622bb61	PPCAnalyst: Run cror reordering after cmp reordering We would rather have cror be close to the cmp than the branch.	2023-11-28 18:59:34 +01:00
JosJuice	40e0dd93be	PPCAnalyst: Allow more reordering of CR operations This is possible with the improved CR analysis implemented in the previous commits.	2023-11-28 18:59:34 +01:00
JosJuice	da63cee711	PPCAnalyst: More strict a_flags checks in CanSwapAdjacentOps If for instance instruction a sets OE and instruction b reads it, we shouldn't permit reordering.	2023-11-28 18:59:34 +01:00
JosJuice	8e9609df6e	JitArm64: Add flush/discard support for condition registers By flushing the condition registers as soon as we no longer need them, we reduce the register pressure.	2023-11-28 18:59:31 +01:00
JosJuice	6cc4f593e5	PPCAnalyst: Add in-register/discard analysis for CR This brings the analysis done for condition registers more in line with the analysis done for GPRs and FPRs. This gets rid of the old wantsCR member, which wasn't actually used anyway. In case someone wants it again in the future, they can compute the bitwise inverse of crDiscardable.	2023-11-28 18:58:47 +01:00
JosJuice	d6987b98be	PPCAnalyst: Perform CR analysis for crXXX	2023-11-28 18:51:03 +01:00
JosJuice	4ecdb9e57e	JitArm64: Use one instruction for making NaNs quiet Instead of materializing the quiet bit in a register and ORing the NaN with it, we can perform an arithmetic operation on the NaN. This is a cycle or two slower on some CPUs in cases where generating the quiet bit pipelined well, but this is farcode that rarely runs, so instruction fetch latency is the bigger concern. And for non-SIMD cases, we also save a register.	2023-11-28 18:49:30 +01:00
JosJuice	d5ec5c005a	JitArm64: Some more FPRF optimization By using MOVI2R+MOVI2R+CSEL in the zero case instead of doing bitwise operations on the output of the other MOVI2R+MOVI2R+CSEL, we avoid using BFI, an instruction that takes two cycles on most CPUs. The instruction count is the same and the pipelining should be at least equally good.	2023-11-28 18:30:55 +01:00
JosJuice	255ee3fdce	JitArm64: Use LSL+CLS for classifying floats This is a little trick I came up with that lets us restructure our float classification code so we can exit earlier when the float is normal, which is the case more often than not. First we shift left by 1 to get rid of the sign bit, and then we count the number of leading sign bits. If the result is less than 10 (for doubles) or 7 (for floats), the float is normal. This is because, if the float isn't normal, the exponent is either all zeroes or all ones.	2023-11-28 18:30:45 +01:00
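
The LSL+CLS trick described in `255ee3fdce` can be sketched in a few lines. This is an illustration of the bit-level reasoning only, not the code JitArm64 emits: Python stands in for the AArch64 instructions, and the helper names `cls`, `is_normal_double` and `is_normal_single` are hypothetical. The sketch assumes CLS counts the bits directly below the top bit that are equal to it, excluding the top bit itself.

```python
import struct


def cls(value: int, width: int) -> int:
    """Count leading sign bits: bits directly below the top bit that equal it."""
    top = (value >> (width - 1)) & 1
    count = 0
    for bit in range(width - 2, -1, -1):
        if (value >> bit) & 1 != top:
            break
        count += 1
    return count


def is_normal_double(x: float) -> bool:
    # Reinterpret the double as a 64-bit integer.
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    # Shift left by 1 to drop the sign bit; the 11 exponent bits are now on top.
    shifted = (bits << 1) & 0xFFFFFFFFFFFFFFFF
    # An all-zero or all-one exponent yields at least 10 leading sign bits.
    return cls(shifted, 64) < 10


def is_normal_single(x: float) -> bool:
    # Same idea for a 32-bit float, which has 8 exponent bits.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    shifted = (bits << 1) & 0xFFFFFFFF
    return cls(shifted, 32) < 7
```

With the sign bit shifted out, a zero, denormal, infinity or NaN has an exponent field whose bits are all equal, so the count reaches the threshold. Any normal value has at least one differing exponent bit and the count stops short, which is what allows the early exit mentioned in the commit message.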

40924 Commits