* GPU: Access non-prefetch command buffers directly
Saves constantly allocating new arrays for them. They can be quite small, so the allocations are very wasteful. This was about 0.4% of GPU thread time in SMO, and a bit higher in S/V when I checked.
Assumes that non-prefetch command buffers won't be randomly clobbered before they finish executing, though that's probably a safe bet.
* Small change while I'm here
* Address feedback
I did this on ncbuffer2 when we were using it for LDN 3, but I noticed that it can apply to the current buffer manager too, and it's an easy performance win.
The only buffer access that can come from another thread is the overlap search for buffers that have been unmapped. Everything else, including modifications, comes from the main GPU thread. That means we only need to lock the range list when it's being modified, as that's the only time we can race with the unmapped handler.
This gives a significant performance improvement in situations where FIFO is high, like the other two PRs. Joined together they give a nice boost (73.6 master -> 79 -> 83 fps in SMO).
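The locking rule above can be sketched roughly as follows. This is a minimal Python illustration, not the actual buffer manager code: `RangeList`, its method names, and the tuple layout are all hypothetical.

```python
import threading


def _overlaps(ranges, address, size):
    """Return every (address, size) range that intersects the query range."""
    end = address + size
    return [(a, s) for (a, s) in ranges if a < end and address < a + s]


class RangeList:
    """Hypothetical stand-in for the buffer range list."""

    def __init__(self):
        self._lock = threading.Lock()
        self._ranges = []  # list of (address, size) tuples

    def add(self, address, size):
        # Modifications only happen on the main GPU thread, but they must
        # lock so the unmapped handler never sees a half-updated list.
        with self._lock:
            self._ranges.append((address, size))

    def remove(self, address, size):
        with self._lock:
            self._ranges.remove((address, size))

    def find_overlaps_gpu_thread(self, address, size):
        # Reads on the main GPU thread skip the lock: this thread is the
        # only writer, so nothing can change the list underneath it.
        return _overlaps(self._ranges, address, size)

    def find_overlaps_unmapped_handler(self, address, size):
        # The unmapped handler runs on another thread, so it takes the lock
        # to avoid racing with a modification.
        with self._lock:
            return _overlaps(self._ranges, address, size)
```

The point of the split is that the common path (searches from the GPU thread) no longer pays for a lock at all.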
Since we moved to .NET 7, JsonSerializer now needs to have explicit options as arguments, which leads to some warnings in the Avalonia project. This is fixed by using our `JsonHelper` class.
* Update to LibHac 0.17.0
* Don't clear SD card saves when starting the emulator
This was an old workaround for errors that happened when a user's SD card encryption seed changed. SD card saves have been unencrypted for over a year, so we should be fine to remove the workaround.
* unicorn: Add modified version of unicorn's const gen
* unicorn: Use upstream consts
These consts were generated from the dev branch of unicorn
* unicorn: Split common consts into multiple enums
* unicorn: Remove arch prefix from consts
* unicorn: Add new windows dll
Windows 10 - MSVC x64 shared build
* unicorn: Use absolute path for const generation
* unicorn: Remove fspcr patch
* unicorn: Fix using the wrong file extension
For some reason _NativeLibraryExtension evaluates to ".so" even on Windows.
* unicorn: Add linux shared object again
* unicorn: Add DllImportResolver
* unicorn: Try to import unicorn using an absolute path
* unicorn: Add clean target
* unicorn: Replace IsUnicornAvailable() methods
* unicorn: Skip tests instead of silently passing them if unicorn is missing
* unicorn: Write error message to stderr
* unicorn: Make Interface static
* unicorn: Include prefixed unicorn libs (libunicorn.so)
Co-authored-by: merry <git@mary.rs>
* unicorn: Add lib prefix to shared object for linux
Co-authored-by: merry <git@mary.rs>
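The file naming that the extension and `lib` prefix fixes above deal with can be sketched like this. It is a small Python illustration only; `native_library_file_name` and the platform strings are hypothetical and not part of the test project.

```python
def native_library_file_name(base_name, platform):
    """Build the expected shared library file name for a platform.

    Only the two platforms mentioned in the commits above are covered.
    """
    if platform == "windows":
        # Windows DLLs have no "lib" prefix.
        return f"{base_name}.dll"
    if platform == "linux":
        # Linux shared objects are conventionally prefixed with "lib".
        return f"lib{base_name}.so"
    raise ValueError(f"unsupported platform: {platform}")
```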
A quick fix to prevent reading the wrong value of Count when reregistering ranges for a new target buffer. Buffer flushes from another thread can modify the range list when the lock isn't active, which can change the count.
This prevents some crashes in Pokemon Scarlet/Violet. It's likely that buffer migration during flush is causing some other issues in this game, but this at least prevents the crashing.
* Vulkan: Don't create preload buffer outside a render pass
The preload command buffer is used to avoid render pass splits and barriers when updating buffer data. However, when a render pass is not active (for example, at the start of a pass, or during compute invocations) buffer uploads can be performed at any time, so the optimization isn't as useful.
This PR makes it so that the preload command buffer is only used for buffer updates while a render pass is active. It's still used for textures, as I don't want to shake things up right now regarding how the preload buffer is obtained before some other changes, and texture updates are a lot rarer anyway.
Improves performance slightly in Pokemon Scarlet/Violet (43 -> 48), as it was switching to compute, writing a bunch of buffers inline, then dispatching, then flushing commands... It uses 1 command buffer instead of 2 every time it does this now. Maybe it would be nice to find a faster way to sync without creating so many command buffers in a short period of time.
* Address feedback
* Prune ForceDirty and CheckModified caches on unmap
Since we're now using this for modified checks on the HLE indirect draw method, I'm worried that leaving these to forever gather cache entries isn't the best idea for performance in the long term, and it could keep old buffer objects alive for longer than they should be.
This PR adds the ability to prune invalid entries before checking these caches, and queues a prune whenever GPU memory is unmapped. It also aligns modified checks to the page size, as I figured a huge number of overlapping entries could otherwise accumulate over a game's runtime.
This prevents Super Mario Odyssey from having tens of thousands of entries in the modified cache in Metro Kingdom, and prevents them from duplicating when entering and leaving a building (they should be cleared, as they were unmapped).
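Aligning the checks to the page size means that many slightly different ranges collapse onto the same cache key. A minimal Python sketch of that alignment, assuming a 4 KiB page (`PAGE_SIZE` and `align_to_pages` are hypothetical names, not the ones used in the code):

```python
PAGE_SIZE = 0x1000  # assumed 4 KiB page size


def align_to_pages(address, size):
    """Expand (address, size) outwards so it starts and ends on page boundaries."""
    start = address & ~(PAGE_SIZE - 1)
    end = (address + size + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1)
    return start, end - start


# Two different ranges inside the same page produce the same cache key.
key_a = align_to_pages(0x1234, 0x10)
key_b = align_to_pages(0x1800, 0x40)
```

Here both `key_a` and `key_b` are `(0x1000, 0x1000)`, so they would share one cache entry instead of creating two overlapping ones.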
* Address Feedback
* am: Stub GetSaveDataSizeMax()
* am: Remove todo comment for GetSaveDataSizeMax()
* am: saveDataSize & journalDataSize should be of type long
* am: Add explanation for returning default values in GetSaveDataSizeMax()
* Use ReadOnlySpan<byte> compiler optimization in more places
* Revert changes in ShaderBinaries.cs
* Remove unused using;
* Use ReadOnlySpan<byte> compiler optimization in more places
* Allow _volatile to be set from MultiRegionHandle checks again
Tracking handles have a `_volatile` flag which indicates that the resource being tracked is modified every time it is used under a new sequence number. This is used to reduce the time spent reprotecting memory for tracking writes to commonly modified buffers, like constant buffers.
This optimisation works by detecting if a buffer is modified every time a check happens. If a buffer is checked but it is not dirty, then that data is likely not modified every sequence number, and should use memory protection for write tracking. If the opposite is the case all the time, it is faster to just assume it's dirty as we'd just be wasting time protecting the memory.
The new MultiRegionBitmap could not notify handles that they had been checked as part of the fast bitmap lookup, so bindings larger than 4096 bytes wouldn't trigger it at all. This meant that they would be subject to a ton of reprotection if they were modified often.
This does mean there are two separate sources for a _volatile set: VolatileOrDirty + _checkCount, and the bitmap check. These shouldn't interfere with each other, though.
This fixes performance regressions from #3775 in Pokemon Sword, and hopefully Yu-Gi-Oh! RUSH DUEL: Dawn of the Battle Royale. May affect other games.
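The idea behind the `_volatile` detection can be sketched as below. This is a simplified Python model under stated assumptions: the class, the threshold value, and the `reprotect_count` counter are hypothetical, and the real handles also involve sequence numbers and the bitmap path, which are left out here.

```python
class TrackingHandle:
    """Simplified model of a tracking handle with a volatile flag."""

    VOLATILE_THRESHOLD = 5  # hypothetical number of consecutive dirty checks

    def __init__(self):
        self.dirty = False
        self.volatile = False
        self.reprotect_count = 0  # how often we paid for memory protection
        self._consecutive_dirty = 0

    def signal_write(self):
        # Stands in for the write tracking callback firing.
        self.dirty = True

    def check(self):
        """Return True if the resource should be treated as modified."""
        if self.volatile:
            # Assume dirty every time and skip reprotecting the memory.
            return True
        if self.dirty:
            self.dirty = False
            self.reprotect_count += 1  # stands in for reprotecting memory
            self._consecutive_dirty += 1
            if self._consecutive_dirty >= self.VOLATILE_THRESHOLD:
                self.volatile = True
            return True
        # A clean check means the data is not modified every time it is used.
        self._consecutive_dirty = 0
        return False
```

A handle that is written before every check becomes volatile after the threshold and stops paying for reprotection, while one that is only written occasionally keeps using write tracking.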
* Fix stupid mistake
The `texOp` in the textureSize instruction doesn't carry the exact texture type on SPIR-V (for example, it is missing the Array flag). This PR gives it the proper type before passing it to the unscaling helper.
This fixes the ground textures being broken on Pokemon Scarlet/Violet when scaling. It wasn't finding the texture, so the descriptor index it provided was -1...
* Eliminate CB0 accesses
Still some work to do, decouple from hle?
* Forgot the important part somehow
* Fix and improve alignment test
* Address Feedback
* Remove some complexity when checking storage buffer alignment
* Update Ryujinx.Graphics.Shader/Translation/Optimizations/GlobalToStorage.cs
Co-authored-by: gdkchan <gab.dark.100@gmail.com>
Thread ID Register, Floating-point Control Register, and Floating-point Status Register all have "Register" capitalized, so the "register" in "Processor State register" should be capitalized as well.
For some reason, my fresh installation of Fedora 36 (KDE) doesn't have a
symlink for libX11.so.
This commit fixes this by first trying to import the library with its major
version, and falling back to the unversioned name otherwise.
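The fallback order described above looks roughly like this. It is a Python sketch using `ctypes` rather than the project's actual import code; `load_first_available` is a hypothetical helper, and `libX11.so.6` is used as the versioned name for illustration.

```python
import ctypes


def load_first_available(names, loader=ctypes.CDLL):
    """Try each library name in order and return the first that loads."""
    last_error = None
    for name in names:
        try:
            return loader(name)
        except OSError as error:
            # Remember the failure and move on to the next candidate.
            last_error = error
    raise OSError(f"none of {names} could be loaded") from last_error


# Versioned name first, unversioned name as the fallback:
# x11 = load_first_available(["libX11.so.6", "libX11.so"])
```

The `loader` parameter only exists so the lookup order can be exercised without the real library being installed.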
* Revert "Add support for releasing a semaphore to DmaClass (#2926)"
This reverts commit 521a07e612.
* Revert "Revert "Add support for releasing a semaphore to DmaClass (#2926)""
This reverts commit ec8a5fd05362f04cc77436ee3e45a9188777f75e.
* Strip non-visible control codes from strings before they are sent to the software keyboard to prevent ugly Unicode blocks from being shown on the UI.
* remove debugging junk
* Initialize stringbuilder capacity at the start to prevent resizing (a tiny tiny microoptimization)
* Update remarks documentation. Remove unneeded imports.
* Removing a test that's actually just redundant
Co-authored-by: Logan Stromberg <lostromb@microsoft.com>
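For reference, stripping control codes from a string can be sketched like this in Python. This is an assumption-laden illustration: it drops everything in the Unicode "Cc" category (which includes newlines and tabs), and the exact set of characters the software keyboard change removes may differ.

```python
import unicodedata


def strip_control_codes(text):
    """Remove characters in the Unicode control category (Cc) from text."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cc")
```

For example, `strip_control_codes("Ma\x00rio\x1f")` gives `"Mario"`.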
`MB` and `GB` can be interpreted as either base-10 or base-2 units.
`MiB` and `GiB` remove this ambiguity, so units of memory are always
interpreted as base-2.
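To make the difference concrete, here is a small Python sketch that formats byte counts using base-2 units only. `format_size` is a hypothetical helper, not something from the codebase.

```python
def format_size(num_bytes):
    """Format a byte count with base-2 units (1 KiB = 1024 B)."""
    units = ["B", "KiB", "MiB", "GiB"]
    value = float(num_bytes)
    for unit in units:
        # Stop at the first unit that fits, or at the largest one we know.
        if value < 1024 or unit == units[-1]:
            return f"{value:.1f} {unit}"
        value /= 1024
```

With this, `format_size(1024 ** 2)` is `"1.0 MiB"`, while a base-10 "gigabyte" of `10 ** 9` bytes comes out as `"953.7 MiB"`, which is exactly the discrepancy the unit names avoid.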
* Implement HLE macro for DrawElementsIndirect
* Shader cache version bump
* Use GL_ARB_shader_draw_parameters extension on OpenGL
* Fix DrawIndexedIndirectCount on Vulkan when extension is not supported
* Implement DrawIndex
* Alignment
* Fix some validation errors
* Rename BaseIds to DrawParameters
* Fix incorrect index buffer and vertex buffer size in some cases
* Add HLE macros for DrawArraysInstanced and DrawElementsInstanced
* Perform a regular draw when indirect data is not modified
* Use non-indirect draw methods if indirect buffer was not GPU modified
* Only check if draw parameters match if the shader actually uses them
* Expose Macro HLE setting on GUI
* Reset FirstVertex and FirstInstance after draw
* Update shader cache version again since some people already tested this
* PR feedback
Co-authored-by: riperiperi <rhy3756547@hotmail.com>
* Ava: Keep command line args when restarting
* UI: Move common UI functions to ProgramHelper
Add command line option to override the configured graphics backend
* Ava: Add CleanupUpdate task back
* Remove unused usings
* Revert combining common UI functions
Rename ProgramHelper to CommandLineState
Move command line parsing to CommandLineState
* Rename CommandLineProfile to Profile
* Fix assigning the wrong array to Arguments
* Update readme to mention .NET 7
* infra: Migrate to .NET 7
.NET 7 is still in preview, but this prepares for the release coming up
next month.
* Use Random.Shared in CreateRandom
* Move UInt128Utils.cs to Ryujinx.Common project
* Fix inverted parameters in System.UInt128 constructor
* Fix Visual Studio complaints in Ryujinx.Graphics.Vic
* time: Fix missing alignment enforcement in SystemClockContext
Fixes at least Smash
* time: Fix missing alignment enforcement in SteadyClockContext
Fixes games (like recent versions of Smash) that use time shared memory
* Switch to .NET 7.0.100 release
* Enable Tiered PGO
* Ensure CreateId validity requirements are met when doing random generation
Also enforce correct packing layout for other Mii structures.
This fixes Mario Kart 8 crashes related to the default Miis.
* Vulkan: Implement multisample <-> non-multisample copies and depth-stencil resolve
* FramebufferParams is no longer required there
* Implement Specialization Constants and merge CopyMS Shaders (#15)
* Vulkan: Initial Specialization Constants
* Replace with specialized helper shader
* Reimplement everything
Fix nonexistent interaction with Ryu pipeline caching
Decouple specialization info from data and relocate them
Generalize mapping and add type enum to better match spv types
Use local fixed scopes instead of global unmanaged allocs
* Fix misses in initial implementation
Use correct info variable in Create2DLayerView
Add ShaderStorageImageMultisample to required feature set
* Use texture for source image
* No point in using ReadOnlyMemory
* Apply formatting feedback
Co-authored-by: gdkchan <gab.dark.100@gmail.com>
* Apply formatting suggestions on shader source
Co-authored-by: gdkchan <gab.dark.100@gmail.com>
* Support conversion with samples count that does not match the requested count, other minor changes
Co-authored-by: mageven <62494521+mageven@users.noreply.github.com>