Commit Graph

107 Commits

Author SHA1 Message Date
Minty-Meeo
05bebee802 Replace BitUtils with C++20: Counting Zeroes
With the upgrade to C++20, std::countl_zero and std::countr_zero can replace these home-spun implementations from the BitUtil.h library.
2022-12-21 04:17:00 -06:00
JosJuice
454537d53e Replace BitUtils with C++20: RotateLeft/RotateRight
Now that we've flipped the C++20 switch, let's start making use of
the nice new <bit> header.

I'm planning on handling this move away from BitUtils.h incrementally
in a series of PRs. There may be a few functions remaining in
BitUtils.h by the end that C++20 doesn't have any equivalents for.
2022-12-11 08:59:18 +01:00
JosJuice
06e60ac327 JitArm64: Implement accurate NaNs
For quite some time now, we've had a setting on x86-64 that makes Dolphin
handle NaNs in a more accurate but slower way. There's only one game that
cares about this, Dragon Ball: Revenge of King Piccolo, and what that game
cares about more specifically is that the default NaN (or "generated NaN"
as I believe it's called in PowerPC documentation) is the same as on
PowerPC. On ARM, the default NaN is the same as on PowerPC, so for the
longest time we didn't need to do anything special to get Dragon Ball:
Revenge of King Piccolo working. However, in 93e636a I changed how we
handle FMA instructions in a way that resulted in the sign of NaNs
becoming inverted for nmadd/nmsub instructions, breaking the game.
To fix this, let's implement the AccurateNaNs setting, like on x86-64.
2022-12-03 19:41:32 +01:00
JosJuice
f45d3a6a2c JitArm64: Optimize ps_mergeXX
1. In some cases, ps_merge01 can be implemented using one instruction.
2. When we need two instructions for ps_merge01, it's best to start with
   a MOV to avoid false dependencies on the destination register.
3. ps_merge10 can be implemented using a single EXT instruction.
2022-11-26 18:14:58 +01:00
JosJuice
d64c3dc267 Arm64Emitter: Add MOVPage2R utility function
This new function is like MOVP2R, except it masks out the lower 12 bits,
returning them instead of writing them to the register. These lower
12 bits can then be used as an offset for LDR/STR. This lets us turn
ADRP+ADD+LDR sequences with a zero offset into ADRP+LDR sequences with
a non-zero offset, saving one instruction.
2022-11-21 23:24:06 +01:00
Bram Speeckaert
ae6ce1df48 Arm64Emitter: Add ArithOption with ExtendSpecifier
ARM64 can do perform various types of sign and zero extension on a
register value before using it. The Arm64Emitter already had support for
this, but it was kinda hidden away.

This commit exposes the functionality by making the ExtendSpecifier enum
available everywhere and adding a new ArithOption constructor.
2022-11-01 12:15:56 +01:00
JosJuice
4dbf0b8e90 JitArm64: Reimplement Force25BitPrecision
The previous implementation of Force25BitPrecision was essentially a
translation of the x86-64 implementation. It worked, but we can make a
more efficient implementation by using an AArch64 instruction I don't
believe x86-64 has an equivalent of: URSHR. The latency is the same as
before, but the instruction count and register count are both reduced.
2022-10-22 10:03:52 +02:00
JosJuice
84375a91d9 Arm64Emitter: Combine immh and immb for Emit(Scalar)ShiftImm
This simplifies the callers of EmitShiftImm and EmitScalarShiftImm.
2022-10-19 20:20:39 +02:00
Merry
0d947ed6fe Arm64Emitter: Simplify LogicalImm further
h/t @dougallj
2022-07-10 22:17:09 +01:00
Merry
3092f40e9f Arm64Emitter: Simplify LogicalImm logic
Heavily simplify logical immediate encoding.

This is based on the observation that if a valid repeating element
exists, it repeats through `value`. Thus it does not matter which
one you analyse. Thus we skip over the least significent element
if LSB = 1 by masking it out with `inverse_mask_from_trailing_ones`,
to avoid the degenerate case of a stretch of 1 bits going 'round
the end' of the word.
2022-07-07 22:53:36 +01:00
JMC47
e5a4a86672
Merge pull request #10055 from JosJuice/jitarm64-reuse-memory
JitArm64: Codegen space reuse
2021-11-20 17:35:24 -05:00
Merry
7c2b09e156 Arm64Emitter: Add FRINTI instruction 2021-11-06 19:15:26 +00:00
JosJuice
44beaeaff5 Arm64Emitter: Check end of allocated space when emitting code
JitArm64 port of 5b52b3e.
2021-10-13 21:52:16 +02:00
JMC47
3bfb3fa52b
Merge pull request #9884 from JosJuice/jitarm64-paired-loadstore-addr
JitArm64: Improve psq_l/psq_st address checking
2021-10-11 16:49:26 -04:00
JosJuice
de21dc5fd9 JitArm64: Add bitset constants for caller saved registers 2021-09-08 21:26:05 +02:00
JosJuice
9889e7eb33 JitArm64: divwx - Optimize power-of-two divisors
Power-of-two divisors can be done more elegantly, so handle them
separately.
2021-08-26 15:04:55 +02:00
JosJuice
09cdb076a3 JitArm64: divwx - Optimize constant dividend
When the dividend is known at compile time, we can eliminate some
of the branching and precompute the result for the overflow case.
2021-08-26 14:50:01 +02:00
JosJuice
a90b0a1c93 JitArm64: Implement mtfsfx
The sixth and final part of implementing the FPSCR system register
instructions.
2021-07-31 23:50:20 +02:00
JosJuice
8af5095ff4 JitArm64: Stop using hand-encoded logical immediates 2021-07-12 22:25:49 +02:00
JosJuice
9e80db123f JitArm64: Encode logical immediates at compile-time where possible
Manually encoding and decoding logical immediates is error-prone.
Using ORRI2R and friends lets us avoid doing the work manually,
but in exchange, there is a runtime performance penalty. It's
probably rather small, but still, it would be nice if we could
let the compiler do the work at compile-time. And that's exactly
what this commit does, so now I have no excuse for trying to
manually write logical immediates anymore.
2021-07-10 20:43:59 +02:00
JosJuice
10861ed8ce JitArm64: Turn IsImmLogical into a constexpr constructor 2021-07-10 20:31:28 +02:00
JosJuice
ab024b218e JitArm64: Accept LogicalImm struct as bitwise inst parameter 2021-07-10 20:13:11 +02:00
Pierre Bourdon
e149ad4f0a
treewide: convert GPLv2+ license info to SPDX tags
SPDX standardizes how source code conveys its copyright and licensing
information. See https://spdx.github.io/spdx-spec/1-rationale/ . SPDX
tags are adopted in many large projects, including things like the Linux
kernel.
2021-07-05 04:35:56 +02:00
Mai M
1054abc9cc
Merge pull request #9712 from JosJuice/jitarm64-fmul-rounding
JitArm64: Fix fmul rounding issues
2021-05-20 10:25:02 -04:00
JosJuice
11be2314fe JitArm64: Fix fmul rounding issues
This is a port of 4f18f60 to JitArm64.
2021-05-15 23:27:34 +02:00
JosJuice
85226e09f0 JitArm64: Implement fres 2021-05-15 19:16:32 +02:00
JosJuice
1d106ceaf5 JitArm64: Optimize ConvertSingleToDouble, part 2
If we can prove that FCVT will provide a correct conversion,
we can use FCVT. This makes the common case a bit faster
and the less likely cases (unfortunately including zero,
which FCVT actually can convert correctly) a bit slower.
2021-04-25 15:56:19 +02:00
JosJuice
4c2cdb61df JitArm64: Constant carry flag optimizations
If we know at compile time that the PPC carry flag definitely
has a certain value, we can bake that value into the emitted code
and skip having to read from PPCState.
2021-03-19 22:40:19 +01:00
Dentomologist
f0f206714f Arm64Gen: Convert ARM64Reg to enum class
Most changes are just adding ARM64Reg:: in front of the constants.
2021-03-13 10:10:59 -08:00
JosJuice
9ad4f724e4 Arm64Emitter: Use ORR in MOVI2R 2021-02-13 21:04:13 +01:00
JosJuice
0d5ed06daf Arm64Emitter: Improve MOVI2R
More or less a complete rewrite of the function which aims
to be equally good or better for each given input, without
relying on special cases like the old implementation did.

In particular, we now have more extensive support for
MOVN, as mentioned in a TODO comment.
2021-02-13 20:23:03 +01:00
JosJuice
4e107935ac Arm64Emitter: Allow specifying 21th bit of ADRP imm 2021-02-13 11:33:27 +01:00
JosJuice
d226b8f825 Arm64Emitter: Remove optimize parameter from MOVI2R
I don't really see the use of this. (Maybe in the past it
was used for when we need a constant number of instructions
for backpatching? But we don't use MOVI2R for that now.)
2021-02-13 11:33:27 +01:00
JosJuice
efeda3b759 JitArm64: More constant propagation optimizations
PR 9262 added a bunch of Jit64 optimizations, some of
which were already in JitArm64 and some which weren't.
This change ports the latter ones to JitArm64.
2021-02-07 13:55:35 +01:00
MerryMage
be6aec9932 Arm64Emitter: Add BFXIL 2021-01-31 12:04:57 +00:00
JosJuice
67491979ab JitArm64: Avoid using X30 with BLR
At least on some CPUs (I found out about this from the
Arm Cortex-A76 Software Optimization Guide), using X30
with BLR is one cycle slower than using another register.
2021-01-23 10:32:44 +01:00
Dentomologist
e3237661ec Arm64Emitter: Convert ShiftType to enum class 2021-01-17 16:21:38 -08:00
Dentomologist
70c54065ab Arm64Emitter: Convert IndexType to enum class 2021-01-15 23:27:11 -08:00
Lioncash
36af39853d Arm64Emitter: Remove unused OpType enum
This isn't used anywhere, so we can remove it.
2021-01-01 11:06:05 -05:00
Lioncash
95cc53edec Arm64Emitter: Convert ArithOption enums into enum classes
Makes the enums strongly typed. While we're at it, we can also make
these enums private.
2021-01-01 07:10:41 -05:00
Lioncash
cca0dffebd Arm64Emitter: Add shorthand member functions for hint instructions
Allows for more concise code.
2020-12-30 20:49:20 -05:00
Lioncash
6046a15267 Arm64Emitter: Make ShiftAmount enum an enum class
Reduces namespace pollution and makes the enum strongly typed.
2020-12-30 20:49:20 -05:00
Lioncash
fab2053439 Arm64Emitter: Make RoundingMode enum an enum class
Prevents namespace pollution and makes the enum members strongly typed.
2020-12-30 20:49:20 -05:00
Lioncash
d87ec71615 Arm64Emitter: Make PStateField enum an enum class
Prevents namespace pollution and makes the enum members strongly typed.
2020-12-30 20:49:20 -05:00
Lioncash
5c3f2fde22 Arm64Emitter: Make BarrierType enum an enum class
Prevents namespace pollution and enforces strong typing.
2020-12-30 20:49:20 -05:00
Lioncash
f21c740919 Arm64Emitter: Make SystemHint enum an enum class
Avoids polluting the namespace and makes the members strongly typed.
2020-12-30 20:49:20 -05:00
Lioncash
5011c155ec Arm64Emitter: Make type member of FixupBranch an enum class
Eliminates some magic numbers and makes the type member strongly typed.
2020-12-30 20:49:20 -05:00
Admiral H. Curtiss
5b52b3e9cb x64Emitter: Check end of allocated space when emitting code. 2020-08-24 19:31:32 +02:00
Techjar
ff972e3673 Reformat repo to clang-format 7.0 rules 2019-05-06 18:48:04 +00:00
Lioncash
208be26bb4 Arm64Emitter: Make the Align* functions return a non-const data pointer
Similar in nature to e28d063539 in which
this same change was applied to the x64 emitter.

There's no real requirement to make this const, and this should also
be decided by the calling code, considering we had places that would
simply cast away the const and carry on
2018-08-27 09:44:38 -04:00