JitArm64: Implement accurate NaNs

For quite some time now, we've had a setting on x86-64 that makes Dolphin handle NaNs in a more accurate but slower way. There's only one game that cares about this, Dragon Ball: Revenge of King Piccolo, and what that game cares about more specifically is that the default NaN (or "generated NaN" as I believe it's called in PowerPC documentation) is the same as on PowerPC. On ARM, the default NaN is the same as on PowerPC, so for the longest time we didn't need to do anything special to get Dragon Ball: Revenge of King Piccolo working. However, in 93e636a I changed how we handle FMA instructions in a way that resulted in the sign of NaNs becoming inverted for nmadd/nmsub instructions, breaking the game. To fix this, let's implement the AccurateNaNs setting, like on x86-64.
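As an illustration of the nmadd/nmsub problem described above, here is a minimal standalone sketch. It is not the code from this commit. It assumes the negation step of nmadd/nmsub is modeled as an unconditional sign-bit flip, which is one plausible way the NaN sign could end up inverted. All helper names (`ToBits`, `FromBits`, `NaiveNegate`, `NaNPreservingNegate`, `SelfCheck`) are hypothetical and do not exist in Dolphin.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Bit pattern of the positive quiet NaN that PowerPC and ARM produce by default.
constexpr std::uint64_t DEFAULT_NAN_BITS = 0x7FF8000000000000ULL;
constexpr std::uint64_t SIGN_BIT = 0x8000000000000000ULL;

// Reinterpret a double as its raw 64-bit pattern.
std::uint64_t ToBits(double value)
{
  std::uint64_t bits;
  std::memcpy(&bits, &value, sizeof(bits));
  return bits;
}

// Reinterpret a raw 64-bit pattern as a double.
double FromBits(std::uint64_t bits)
{
  double value;
  std::memcpy(&value, &bits, sizeof(value));
  return value;
}

// Assumption: models a plain hardware negate, which flips the sign bit
// whether or not the input is a NaN.
double NaiveNegate(double value)
{
  return FromBits(ToBits(value) ^ SIGN_BIT);
}

// Hypothetical "accurate" variant: NaNs pass through with their sign untouched,
// all other values are negated as usual.
double NaNPreservingNegate(double value)
{
  return std::isnan(value) ? value : NaiveNegate(value);
}

// Small self-check showing the difference between the two variants.
void SelfCheck()
{
  const double default_nan = FromBits(DEFAULT_NAN_BITS);

  // The naive negate turns the default NaN into a negative NaN.
  assert(ToBits(NaiveNegate(default_nan)) == (DEFAULT_NAN_BITS | SIGN_BIT));

  // The NaN-preserving negate leaves the default NaN bit pattern unchanged.
  assert(ToBits(NaNPreservingNegate(default_nan)) == DEFAULT_NAN_BITS);

  // Ordinary values are still negated.
  assert(NaNPreservingNegate(1.5) == -1.5);
}
```

A game that compares the raw bits of a result against the default NaN would see a mismatch with the naive variant, which is the kind of difference the AccurateNaNs setting is meant to avoid.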
2025-07-30 01:29:42 -06:00 · 2022-12-03 17:37:51 +01:00
parent 5c41d3b602
commit 06e60ac327
5 changed files with 329 additions and 15 deletions
--- a/Source/Core/Common/Arm64Emitter.h
+++ b/Source/Core/Common/Arm64Emitter.h
@@ -1130,6 +1130,13 @@ public:
  void FRECPE(ARM64Reg Rd, ARM64Reg Rn);
  void FRSQRTE(ARM64Reg Rd, ARM64Reg Rn);

+  // Scalar - pairwise
+  void FADDP(ARM64Reg Rd, ARM64Reg Rn);
+  void FMAXP(ARM64Reg Rd, ARM64Reg Rn);
+  void FMINP(ARM64Reg Rd, ARM64Reg Rn);
+  void FMAXNMP(ARM64Reg Rd, ARM64Reg Rn);
+  void FMINNMP(ARM64Reg Rd, ARM64Reg Rn);
+
  // Scalar - 2 Source
  void ADD(ARM64Reg Rd, ARM64Reg Rn, ARM64Reg Rm);
  void FADD(ARM64Reg Rd, ARM64Reg Rn, ARM64Reg Rm);
@@ -1296,6 +1303,7 @@ private:
  void EmitThreeSame(bool U, u32 size, u32 opcode, ARM64Reg Rd, ARM64Reg Rn, ARM64Reg Rm);
  void EmitCopy(bool Q, u32 op, u32 imm5, u32 imm4, ARM64Reg Rd, ARM64Reg Rn);
  void EmitScalar2RegMisc(bool U, u32 size, u32 opcode, ARM64Reg Rd, ARM64Reg Rn);
+  void EmitScalarPairwise(bool U, u32 size, u32 opcode, ARM64Reg Rd, ARM64Reg Rn);
  void Emit2RegMisc(bool Q, bool U, u32 size, u32 opcode, ARM64Reg Rd, ARM64Reg Rn);
  void EmitLoadStoreSingleStructure(bool L, bool R, u32 opcode, bool S, u32 size, ARM64Reg Rt,
                                    ARM64Reg Rn);