GX_TF_RGBA8 texture decoder optimized with SSE2 producing a ~68% speed increase over reference C implementation.
TABified the entire document per NeoBrainX. :)
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6706 8ced0084-cf51-0410-be5f-012b33b47a6e
~25% speed improvement in decoding GX_TF_RGB5A3 textures which aren't used very much. I thought it would help for movie playback but I misled myself. Video playback has nothing to do with this texture format.
Next I'll see if I can knock out some of these other texture decoders. Byte swizzling I'm sure can somehow be accomplished using _mm_unpacklo_epi8 trickery, so that'd be another big win I hope.
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6698 8ced0084-cf51-0410-be5f-012b33b47a6e
Changed the EFB color order from RGBA to ABGR to emulate it correctly on little-endian platforms. Added some enumerations to clear up what components are which colors. Fixed the TEV alpha input LUT which would have caused problems if anything was doing alpha comparisons.
Changed box filter for EFB copies from 3x3 to 2x2 because that is probably correct. Also makes the math nicer.
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6696 8ced0084-cf51-0410-be5f-012b33b47a6e
TextureDecoder.cpp: merged two functionally identical decode5A3RGBA and decode5A3rgba methods.
OpcodeDecoding.cpp and DLCache.cpp: optimization for GX_LOAD_XF_REG. The PSUHFB solution sounds better for SSSE3, but this is a small win for the default case.
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6692 8ced0084-cf51-0410-be5f-012b33b47a6e
Adjusted the code work without ToMask.
This code should be functionally identical for all inputs to the previous code.
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6691 8ced0084-cf51-0410-be5f-012b33b47a6e
* Optimised out the R11 ImmPtr instructions
* Made the whitespacing a little more consistent
* Disabled block linking while it is being reworked
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6688 8ced0084-cf51-0410-be5f-012b33b47a6e
functions accidentally added. Fixed the jitted ar register arithmetic.
Added a CMakeList.txt for the UnitTests, but did not add the subdirectory
to Source/CMakeLists.txt.
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6687 8ced0084-cf51-0410-be5f-012b33b47a6e
A lot of tests are failing right now, not sure yet if this is caused by the recent reg changes or other stuff (at least broken according to tests: iar, subarn, addarn, 'ir, 'nr, 'l, 's, 'sn, 'ln, 'lsn, 'lsm, 'lsnm, 'sl etc - err, most/all that use increase/decrease)
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6683 8ced0084-cf51-0410-be5f-012b33b47a6e
* Fixed a bug in the JCC instruction.
* Changed the cycle count from u32 to u16.
* Added cycle counting to the block linker.
* Optimised the branch exit slightly.
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6681 8ced0084-cf51-0410-be5f-012b33b47a6e
(acc and ax) and product register with one read/write.
Gives a minuscule speedup of not more than 4%. In exchange, breaks all
your out-of-tree changes to dsp. Tests are not building again, yet.
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6680 8ced0084-cf51-0410-be5f-012b33b47a6e
Whatever that means, it fixes that stupid Super Mario Sunshine glitch and possibly lots of other stuff, so test as many glitchy games as possible with this ;)
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6669 8ced0084-cf51-0410-be5f-012b33b47a6e
Should fix almost all regressions of the recent ClearScreen changes and keep the fixed stuff.
The Super Mario Sunshine glitch is caused by another issue and will be addressed in my next commit.
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6668 8ced0084-cf51-0410-be5f-012b33b47a6e
assembler for JIT. Replace JIT ToMask() with a different variant. Remove
superfluous zeroWriteBackLog calls(added by me).
Core/Common: Don't bother creating a string and calling into a Logs trigger()
when there is noone listening. Change AtomicLoadAcquire for gcc to just
make the compiler not reorder memory accesses around it instead of doing
a full memory barrier, per the comment in the win32 variant.
Core/AudioCommon: Fix a use of uninitialized variable inside libalsa.
Microbenchmarking results for ToMask variants:(1 000 000 000 iterations):
cpu\variant| shifts | bit scan
intel mobile C2D@2.5GHz | 5.5s | 4.0s
amd athlon64x2@3GHz | 6.1s | 6.4s
(including some constant overhead identical to both variants)
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6667 8ced0084-cf51-0410-be5f-012b33b47a6e
corrected peek color and peek z to correctly emulate real hardware formats.
implements native gamma correction.(i don't own any game that uses this functionality so i will appreciate feedback)
i need a lot of feedback in this changes please
enjoy
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6664 8ced0084-cf51-0410-be5f-012b33b47a6e
make some changes to the Clear code. please test a lot , the point of this commit is to determine the correct behavior of the efb clearing so feedback is welcome
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6663 8ced0084-cf51-0410-be5f-012b33b47a6e
1) What is the FIFO? The fifo is a ring queue for write (CPU) and read (GPU) the graphics commands.
2) What is the Brakpoints? The breakpoint is the FIFO mark to allow parallel work (CPU-GPU) When the GPU reached the breakpoint must stop read immediately until this Breakpoint will be removed for the CPU.
3) What is an overflow? The CPU write all room FIFO possible, and like a ring overwrite commands not processed yet.
4) ¿Why you have an overflow? In theory should not have an overflow never because the fifo has another mark (High Watermark) When the CPU Write reach this mark raise a CP interruption and the FIFO CPU WRITES should stop write until distance between READ POINTER AND WRITE POINTER will be equal to another mark (LO Watemark to prevent and overflow.
5) ¡So if impossible why you have overflows? Simple, the CP interruption is processed later and the Overflow happens. (there is a lot of theories about this)
6) ¿Why is no so simple like when CPU WRITE POINTER is near to the end of the FIFO only process pending graphics command?
Because when this happens sometimes we are in BREAKPOINT and is IMPOSIBLE process the graphics commands.
- This HACK process the pending data when CPU WRITE POINTER is 32 bytes before the end of the fifo, and if there is a Breakpoint force the situation to process the commands and prevent an overflown.
In theory you have not see "FIFO is overflown by GatherPipe nCPU thread is too fast!" anymore. But if you have a hang in game where you had this please read the NOTICE LOG in user\logs, I've added this message "FIFO is almost in overflown, BreakPoint" when the hack is activated. (I will delete this message very soon)
Good Luck!! PD: Shuffle sorry for the large commit description :P
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6662 8ced0084-cf51-0410-be5f-012b33b47a6e
on OS X as on other Unices so OS X initial installs aren't so easily
broken by files being created in constructors.
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6661 8ced0084-cf51-0410-be5f-012b33b47a6e
* Completed the JIT versions of the DSP Multiplier instructions (5 instructions added).
* Bug fixed the dec and lsr16 instructions.
* Minor code clean-up.
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6657 8ced0084-cf51-0410-be5f-012b33b47a6e