The old calculation was stride * (max_index + 1), which fails if stride is less than the size of a component. For instance, if float XYZ positions are used and the stride was set to 4 (i.e. sizeof(float)) instead of 12 (i.e. 3 * sizeof(float)), the last 8 bytes of the final element in the array would be missing. And if stride was set to 0, no bytes would be recorded at all (though that's not a useful configuration, so it's unlikely to actually occur).
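A minimal sketch of the corrected size computation, with illustrative names (the idea, not the exact Dolphin code): the final element still occupies a full component_size bytes even when stride is smaller than component_size or is zero.

    #include <cstddef>

    // Old calculation: stride * (max_index + 1). It undercounts when
    // stride < component_size and records nothing when stride == 0.
    size_t GetArrayBytes(size_t stride, size_t max_index, size_t component_size)
    {
      return stride * max_index + component_size;
    }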
I'm not aware of any games affected by this issue.
This should fix recording the wall in the staircase leading to the basement in Luigi's Mansion (though I haven't tested it, as I don't own a copy of Luigi's Mansion). That wall uses NormalIndex3, and its index for the normal vector (generally 0x02XX or 0x01XX) is always lower than the index for the tangent or binormal (generally 0x07XX). Other games usually seem to have a similar range of indices for the normal, tangent, and binormal, so this issue wouldn't affect them.
In most cases, games use the same type for all vertex components (either Index8, Index16, or Direct). However, RS2's deflection towers use Index16 for the texture coordinate and Index8 for everything else, meaning the texture coordinates were recorded incorrectly (only the first byte was used, so only indices 0 and 1 were recorded instead of 0 through 0x0192). Worse still, some background elements in RS2 use direct positions but indexed normals or texture coordinates, and those would not be recorded at all.
This is a regression from b5fd35f951.
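A hedged sketch of the per-component handling described above (the enum and function are illustrative, not Dolphin's actual types): each component's index is read using that component's own format, so an Index16 texture coordinate next to Index8 positions keeps both of its bytes, and Direct components simply contribute no index.

    #include <cstdint>

    enum class ComponentFormat { NotPresent, Direct, Index8, Index16 };

    // Read one component's index using that component's format.
    uint16_t ReadIndex(const uint8_t* src, ComponentFormat format)
    {
      switch (format)
      {
      case ComponentFormat::Index8:
        return src[0];
      case ComponentFormat::Index16:
        return static_cast<uint16_t>((src[0] << 8) | src[1]);  // vertex data is big-endian
      default:
        return 0;  // Direct / NotPresent components carry no index to track
      }
    }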
`count` is the number of stereo samples to write (where each stereo sample is two shorts), while `BUFFER_SIZE` is the size of the buffer in shorts. So `count` needs to be multiplied by `2`, not `BUFFER_SIZE`. Also, when this check failed, the previous code logged a warning and then clobbered whatever was past the end of the buffer anyway, which corrupted `basename` and eventually made Dolphin crash.
This affected Datel's Wii-compatible Action Replay, which uses a block size of 2298, or 18384 stereo samples (36768 shorts), which is bigger than the buffer size of 32768. (However, the previous commit means that only one block is transferred at a time, eliminating this issue; fixing the bounds check is now just a general safety measure rather than an actual bugfix.)
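A hypothetical sketch of the corrected check, using the names from the description above: `count` is in stereo samples (two shorts each) and `BUFFER_SIZE` is in shorts, so the comparison needs `count * 2`, and a failed check should skip the block instead of writing past the end of the buffer.

    if (count * 2 > BUFFER_SIZE)
    {
      WARN_LOG_FMT(AUDIO, "Block of {} stereo samples doesn't fit in the buffer, skipping", count);
      return;
    }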
The previous implementation of Force25BitPrecision was essentially a
translation of the x86-64 implementation. It worked, but we can make a
more efficient implementation by using an AArch64 instruction I don't
believe x86-64 has an equivalent of: URSHR. The latency is the same as
before, but the instruction count and register count are both reduced.
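As a scalar illustration (not the JIT code, and assuming the usual masking formula (x & 0xFFFFFFFFF8000000) + (x & 0x8000000) that the old implementation translated): a rounding right shift by 28 followed by a left shift by 28 gives the same result, which is what URSHR + SHL express in two instructions.

    #include <bit>
    #include <cstdint>

    double Force25BitViaRoundingShift(double d)
    {
      uint64_t x = std::bit_cast<uint64_t>(d);
      // URSHR #28 adds the rounding constant (1 << 27) before shifting right;
      // SHL #28 then clears the discarded low mantissa bits.
      x = ((x + (UINT64_C(1) << 27)) >> 28) << 28;
      return std::bit_cast<double>(x);
    }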
The new `dispatcher_no_timing_check` is the same as `dispatcher_no_check`
except it includes the "stepping check" in debug mode. This lets us avoid
the `m_enable_debugging ? dispatcher : dispatcher_no_check` dance.
Maybe "tail call" isn't quite the right term for what this code
is doing, since it's jumping to the dispatcher rather than
returning, but it's the same optimization as for a tail call.
fregsIn will include FD for double-precision instructions, since for
dependency tracking purposes the instruction does read the upper
half of FD. This is not what we want in HandleNaNs.
The consequence of this bug is that if an instruction was supposed to
output a NaN and FD happens to contain a NaN and FD happens to be the
same register as an unused register in the instruction encoding, the
NaN in FD could get used as the output instead of the correct NaN.
This isn't known to affect any games, which isn't especially surprising
considering that there's only one game that needs AccurateNaNs anyway.
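A hypothetical sketch of the idea (the exact fix may look different; fregsIn and FD as in PPCAnalyst and the instruction decoding): drop FD from the set of inputs before HandleNaNs scans it, since FD only appears there for dependency tracking of the unwritten upper half.

    BitSet32 inputs = op.fregsIn;
    inputs[inst.FD] = false;  // FD is not a real input for NaN handling purposes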
Jumping to `dispatcher` requires first subtracting the downcount;
otherwise, `dispatcher` may unpredictably jump to CoreTiming::Advance,
which could break determinism compatibility with JitArm64. We should
jump to `dispatcher_no_check` instead.
The breakpoint check in Jit.cpp makes it redundant.
Normally this redundant check doesn't cause any issues, but if you
create a breakpoint and enable logging without breaking, you get two
log messages if the breakpoint is at the beginning of a block. See
https://bugs.dolphin-emu.org/issues/13044.
This is also a tiny performance improvement for when debugging is
active, since we no longer check for breakpoints for blocks that never
had any breakpoints to begin with.
Nothing currently uses it. It could theoretically be replaced with fmt support, but I don't think the LOG_VULKAN_ERROR macro is that useful and it'd be better to replace it with regular logging instead.
base is an unsigned variable, so we can make things a little more
consistent by also making the loop index unsigned, so we aren't doing
bit arithmetic with signed types.
MemoryInterface already does this, so we can leave it alone.
No behavioral changes, just a consistency thing.
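A purely illustrative example of the signed/unsigned point (made-up function, not the actual loop): with an unsigned index, expressions like base | i stay entirely within unsigned arithmetic.

    #include <cstdint>

    void RegisterRange(uint32_t base)
    {
      for (uint32_t i = 0; i < 16; ++i)  // unsigned index, matching base
      {
        const uint32_t address = base | i;  // no signed operand in the bit arithmetic
        (void)address;
      }
    }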
Rather than marking some parts of VertexLoaderManager dirty in some places and other parts in others, do it all in VideoState. Also, since CPState no longer contains pointers or non-CP data after d039b1bc0d, we can just use p.Do on it instead of manually saving each field.
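A hedged sketch of the serialization simplification (a fragment relying on Dolphin's PointerWrap and CPState; the function name is illustrative): since CPState is now trivially copyable, a single p.Do call covers the whole struct.

    static void DoCPState(PointerWrap& p, CPState& state)
    {
      static_assert(std::is_trivially_copyable_v<CPState>);
      p.Do(state);  // instead of one p.Do per field
    }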
Micro-optimization. Some CPUs can fuse CMP+B, TST+B, arith+CBZ, etc.
I also moved things around for CMP+CSET and TST+CSET (which I'm not
sure any CPUs support), but it doesn't hurt anything, so I might as well.
Improves accuracy but isn't known to affect any games.
This turned out to be fairly convenient to implement; ORing with the
PPC default NaN will quieten SNaNs and do nothing to QNaNs.
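A scalar illustration of the trick (assuming the PPC default NaN bit pattern 0x7FF8000000000000, and that the input is already known to be a NaN): ORing in those bits sets the quiet bit of an SNaN and leaves a QNaN's bits untouched.

    #include <bit>
    #include <cstdint>

    constexpr uint64_t PPC_DEFAULT_NAN_BITS = 0x7FF8000000000000ULL;

    // Only meaningful when value is already a NaN.
    double QuietNaN(double value)
    {
      return std::bit_cast<double>(std::bit_cast<uint64_t>(value) | PPC_DEFAULT_NAN_BITS);
    }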
This existed in the initial megacommit (though I don't know why) as IO_SIZE. It was used in Memmap's Init() to compute totalMemSize, but I don't know if it actually did anything then. That use was removed in 2d0f714546, but the constant persisted until cc858c63b8, when it became a static variable.
This was added in 385d8e2b15, but became somewhat redundant with Do in 4c7bbd96e4, and completely redundant now that std::is_trivially_copyable_v is well-supported.
This is the first step of getting rid of the controller indirection
on Android. (Needing a way for touch controls to provide input
to the emulator core is the reason why the controller indirection
exists to begin with as far as I understand it.)
This lets the TAS input code use a higher-level interface for
overriding inputs instead of having to fiddle with raw bits.
WiiTASInputWindow in particular was messy with how much
controller code it had to re-implement.
Fixes a Rogue Squadron II regression from 9d73583.
This set_dirty stuff is pretty tricky to reason about. I thought I
was clever when coming up with set_dirty, but maybe I was too clever
for my own good...
If the register we're binding is the same as the register holding the
immediate, we should fetch the immediate before calling BindToRegister.
The way the register cache currently works, calling GetImm after
BindToRegister does actually work, but it's better not to rely on that.
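A hypothetical sketch of the safer ordering (register cache calls as in the ARM64 JIT; the guest register names are illustrative):

    const u32 imm = gpr.GetImm(s);  // read the immediate first...
    gpr.BindToRegister(d, false);   // ...then bind, which may reuse s's state
    MOVI2R(gpr.R(d), imm);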
Avoid waiting for earlier submissions when we flush more often.
The vertex manager will flush more often if the game accesses the EFB
on the CPU, to give the GPU a head start.
The screen real estate is already reserved, and the values are dumped and
restored by the on-DSP code, so why not make something out of them?
This allows following:
- where exactly send_back was called from ($st1)
- the boundaries and progress of the innermost BLOOP{,I} ($st0, 2 and 3)
up to send_back's call
Before, only the symbols box would update. However, if you edited the symbol of a function in the call stack (which seems like something that would happen reasonably often while debugging), the call stack would be out of date until you clicked on it to refresh it. Callers and calls were more of an edge case; for them to be out of date, you would have to right-click an instruction in a function other than the one containing the currently selected instruction (though it would also affect recursive functions).
Tested on an official DOL-014 (251 blocks) memory card by executing the
0xf4 command on a card with content along its entire length and then
dumping the whole card: it reads as 0xff all the way through.
Therefore, the current implementation is already consistent with hardware.
Compared to the previous solution of using big `synchronized` blocks,
this makes GameFileCacheManager's executor thread release and re-lock
the lock when possible, giving the GUI thread a chance to do a
(comparatively) quick getOrAdd call if it needs to.
This reverts commit fb265b610d.
The optimization in that commit is safe when the executor thread is
writing and the GUI thread is reading, but I had failed to take into
account that it's unsafe when the GUI thread is writing and the executor
thread is reading. (The native UpdateAdditionalMetadata function loops
through m_cached_files, which is unsafe if another thread is adding
elements to m_cached_files simultaneously.)
Losing out on this optimization isn't too bad, because
719930bb39 makes it very unlikely that
both threads will want the lock at the same time.
Texture dumping can already be done using VideoCommon's system (and in fact the same setting already enabled *both* of these). Dumping objects/TEV stages/texture fetches doesn't currently have an equivalent, but could be added to the FIFO player instead.
A (partial) port of #9481 to ARM64. This commit adds special cases for
immediate values equal to 0 or 0xFFFFFFFF, allowing more efficient code
(or no code at all) to be generated.
When a guest register is an immediate, it may be necessary to move this
value into a register. This is handled by gpr.R(), which lacks context
on how the register will be used. This leads to cases where the
immediate is written to a register, only for it to be overwritten. Take
for example this code generated by srwx:
0x5280031b mov w27, #0x18
0x53187edb lsr w27, w22, #24
gpr.BindToRegister() does have this context through its do_load
parameter, but it didn't handle immediates. By adding this logic, we can
intelligently skip the write when do_load is false.
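A hypothetical sketch of the added logic (member and type names are illustrative, not the exact Dolphin code): when the guest register holds an immediate and do_load is false, allocate a host register but skip writing the immediate into it.

    void RegCache::BindToRegister(size_t preg, bool do_load)
    {
      OpArg& guest_reg = m_guest_registers[preg];
      if (guest_reg.GetType() == RegType::Immediate)
      {
        const ARM64Reg host_reg = GetReg();
        if (do_load)
          m_emit->MOVI2R(host_reg, guest_reg.GetImm());  // only materialize if it will be read
        guest_reg.Load(host_reg);
        return;
      }
      // (handling for values already in host registers stays as before)
    }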