Commit Graph

221 Commits

Author SHA1 Message Date
a46a500b94 Core: Fix warnings on Linux related to the JIT 2014-08-02 16:15:20 -04:00
8f768e5a54 Merge pull request #714 from lioncash/gen
Core: Remove using namespace statements from the Jit and Interpreter headers
2014-08-02 10:36:31 -07:00
8467c82f79 Core: Clean up coding style in the Interpreter 2014-08-02 07:49:11 -04:00
afb539699e Core: Remove using statements from the Jit and Interpreter headers 2014-08-02 01:48:02 -04:00
4e2f487741 Core: Get rid of a cast in JitRegCache.cpp 2014-08-01 20:25:39 -04:00
5bb9a74759 Merge pull request #527 from delroth/flags-opt
[RFC] PowerPC flags emulation optimization
2014-07-31 02:51:48 -04:00
fda2190a37 Support the 64bit CR flags in the ARM JIT. 2014-07-30 21:41:18 -07:00
3627bd21f1 Remove JitArmIL files from the project.
Due to how the new CR-flags work, it isn't possible without some hefty work in the JITIL backend to support this on 32bit systems.
2014-07-30 21:41:17 -07:00
f27940478d JitIL: Attempt to constant-fold more aggressively. 2014-07-30 21:41:17 -07:00
79ecdf5fd0 JitIL: Misc small optimizations. 2014-07-30 21:41:17 -07:00
c8dd557dde JITIL: compare instruction folding. 2014-07-30 21:41:17 -07:00
5bb428c685 JITIL: optimize branches. 2014-07-30 21:41:17 -07:00
79cc000d62 JITIL: Optimize compare instruction. 2014-07-30 21:41:17 -07:00
1429fccb97 Initial unoptimized JITIL flag optimization. 2014-07-30 21:41:17 -07:00
5506e57ab8 CR: Replace some magic values with constants. 2014-07-30 21:41:17 -07:00
0ff1481494 Optimize PPC CR emulation by using magic 64 bit values
PowerPC has a 32 bit CR register, which is used to store flags for results of
computations. Most instructions have an optional bit that tells the CPU whether
the flags should be updated. This 32 bit register actually contains 8 sets of 4
flags: Summary Overflow (SO), Equals (EQ), Greater Than (GT), Less Than (LT).
These 8 sets are usually called CR0-CR7 and accessed independently. In the most
common operations, the flags are computed from the result of the operation in
the following fashion:
  * EQ is set iff result == 0
  * LT is set iff result < 0
  * GT is set iff result > 0
  * (Dolphin does not emulate SO)

While X86 architectures have a similar concept of flags, it is very difficult
to access the FLAGS register directly to translate its value to an equivalent
PowerPC value. With the current Dolphin implementation, updating a PPC CR
register requires CPU branching, which has a few performance issues: it uses
space in the BTB, and in the worst case (!GT, !LT, EQ) requires 2 branches not
taken.

After some brainstorming on IRC about how this could be improved, calc84maniac
figured out a neat trick that makes common CR operations way more efficient to
JIT on 64 bit X86 architectures. It relies on emulating each CRn bitfield with
a 64 bit register internally, whose value is the result of the operation from
which flags are updated, sign extended to 64 bits. Then, checking if a CR bit
is set can be done in the following way:
  * EQ is set iff LOWER_32_BITS(cr_64b_val) == 0
  * GT is set iff (s64)cr_64b_val > 0
  * LT is set iff bit 62 of cr_64b_val is set

To take a few examples, if the result of an operation is:
  * -1 (0xFFFFFFFFFFFFFFFF) -> lower 32 bits not 0       => !EQ
                            -> (s64)val (-1) is not > 0  => !GT
                            -> bit 62 is set             =>  LT
            !EQ, !GT, LT

  *  0 (0x0000000000000000) -> lower 32 bits are 0       =>  EQ
                            -> (s64)val (0) is not > 0   => !GT
                            -> bit 62 is not set         => !LT
            EQ, !GT, !LT

  *  1 (0x0000000000000001) -> lower 32 bits not 0       => !EQ
                            -> (s64)val (1) is > 0       =>  GT
                            -> bit 62 is not set         => !LT
            !EQ, GT, !LT

Sometimes we need to convert PPC CR values to these 64 bit values. The
following convention is used in this case:
  * Bit 0 (LSB) is set iff !EQ
  * Bit 62 is set iff LT
  * Bit 63 is set iff !GT
  * Bit 32 always set to disambiguize between EQ and GT

Some more examples:
  * !EQ, GT, LT -> 0x4000000100000001 (!B63, B62, B32, B0)
                -> lower 32 bits not 0          => !EQ
                -> (s64)val is > 0              =>  GT
                -> bit 62 is set                =>  LT
  * EQ, GT, !LT -> 0x0000000100000000
                -> lower 32 bits are 0          =>  EQ
                -> (s64)val is > 0 (note: B32)  =>  GT
                -> bit 62 is not set            => !LT
2014-07-30 21:41:17 -07:00
f507827399 Merge pull request #681 from phire/non-sse4_1
Fix PPC_FP on non-sse4.1 code paths.
2014-07-31 00:31:02 -04:00
8c857b45f8 Fix PPC_FP on non-sse4.1 code paths.
The Invalid bit on the x87 fpu is sticky, so once a single NaN goes
through the old code on CPUs without sse4.1 all future floats are
mutilated.

Patch to emulate PTEST by Fiora.

Fixes issue 7237 and issue 7510.
2014-07-31 16:00:27 +12:00
ad3ade1510 Core: Use an enum for the Gekko exception flags instead of defines 2014-07-29 21:51:30 -04:00
b03c12764d Really get rid of the MSVC 2005 workaround completely 2014-07-29 21:20:43 -04:00
412196a055 Core: Remove defines used to work around an MSVC 2005 bug 2014-07-29 19:33:08 -04:00
e225e3679a Merge pull request #685 from lioncash/invalid-reg
Core: Use the enum constant for an invalid reg in JitRegCache
2014-07-27 19:37:43 -04:00
0bb84a1f65 Core: Use the enum constant for an invalid reg in JitRegCache 2014-07-27 19:29:52 -04:00
38c8a4efb2 MMIO: Cleanup Mapping class by using templates instead of macros. 2014-07-27 19:23:19 +02:00
008b907bf0 Merge pull request #655 from lioncash/crashes
Fix two possible crashes related to the debugger.
2014-07-23 21:24:03 +12:00
c3eb45f7bc Revert "Merge pull request #473 from Tilka/frsp"
This reverts commit d369627d70, reversing
changes made to 67ff926f1e.
2014-07-21 20:40:21 -07:00
d369627d70 Merge pull request #473 from Tilka/frsp
Jit64: implement frsp
2014-07-21 14:35:40 -04:00
44e43fe5c3 Core: Make CPU_POWERDOWN the initial CPU state.
This isn't a correct state for the CPU to begin in when starting the application.

Also CPU_STEPPING as an initial state causes the emulator to crash if you use any of the debugger stepping buttons, since it checks if the CPU is in the CPU_STEPPING state before performing their functions.
2014-07-19 19:43:53 -04:00
98f352a4e6 Core: Fix warnings in JitRegCache 2014-07-18 02:44:29 -04:00
771a30da36 Jit64: implement frsp 2014-07-17 23:22:41 +02:00
6521929f99 Jit64: implement fctiw/fctiwz 2014-07-15 23:58:09 +02:00
7e79806efc remove unused globals
Also change globals into statics which are only used in one file
2014-07-11 16:10:20 +02:00
81ed17be53 avoid the extern keyword in .cpp files 2014-07-11 16:10:20 +02:00
6d3f249dcc mark all local variables as static 2014-07-11 16:10:20 +02:00
22e1aa5bb4 mark all local functions as static 2014-07-11 16:07:23 +02:00
b0b70381f7 Revert "Don't add segfault handler in interpreter mode" 2014-07-07 05:30:06 +02:00
4ec8c3714d Merge pull request #328 from Tilka/enum_cpubackend
Don't add segfault handler in interpreter mode
2014-07-06 19:28:10 +02:00
a798548c30 Merge pull request #546 from workhorsy/header_guard_to_pragma_once
Changed lingering header include guards to pragma once.
2014-07-06 14:19:32 +02:00
311e9e655a CoreParameter: add enum CPUBackend 2014-07-05 11:02:41 +02:00
20a16beabd enum CPUState: rename CPU_* to STATE_* 2014-07-05 11:01:49 +02:00
20dc0e7819 Remove unused variables 2014-07-04 03:56:58 +02:00
124210c50f Changed lingering header include guards to pragma once.
Some headers where using #ifndef to guard being including multiple times. But most were using pragma once. So for consistency I changed them all to use pragma once.
2014-07-01 22:17:33 -07:00
bd377b9580 Merge pull request #443 from magumagu/loadstore-cleanup
Loadstore cleanup
2014-06-26 21:32:59 -04:00
a40ae6883a Move CoreTiming::downcount to PowerPC::ppcState.
This isn't technically the correct place to have the downcount variable, but it is similar to what PPSSPP does to gain a bit of extra speed on ARM.
We access this variable quite a bit, with each exit in a block it is subtracted from.
On ARM this required four instructions to load and store the value, while now it only requires two.

This gives an average of 1FPS gain to most games.
Examples:
Crazy Taxi: 54FPS -> 55FPS
Luigi's Mansion: 20FPS -> 21FPS
Wind Waker(Save Screen): 27FPS -> 28FPS

This seems to average a 6mhz to 16mhz CPU core emulation improvement in the few games I've tested.
2014-06-26 01:48:00 +00:00
416a3201ed Merge pull request #480 from Sonicadvance1/PR371-Revert
Revert "PPCAnalyst now detects internal branches better"
2014-06-25 16:02:52 +02:00
eeb7fc25f4 JIT: Make lmw/stmw use safe load/stores. 2014-06-20 12:54:10 -07:00
6c9095ebab JITIL: fix some unsafe loads.
I'm planning on making changes to the memory code, so I don't want this
stuff to get in the way.  The slight performance hit to JITIL isn't really
important at the moment.
2014-06-20 12:53:22 -07:00
85724dd78a JIT: Remove dead code. 2014-06-20 12:53:21 -07:00
06864e9fee JIT: Clean up float loads and stores.
Less code is good, and this should make future changes to memory handling
easier.
2014-06-20 12:52:39 -07:00
9f22b2378d Merge pull request #485 from magumagu/packed-fp-reciprocal
Interpreter: return single-precision results for ps_rsqrte.
2014-06-19 16:51:33 +02:00