Changed several enums from Memmap.h to be static vars and implemented Get functions to query them. This seems to have boosted speed a bit in some titles? The new variables and some previously statically initialized items are now initialized via Memory::Init() and the new AddressSpace::Init(). s_ram_size_real and the new s_exram_size_real in particular are initialized from new OnionConfig values "MAIN_MEM1_SIZE" and "MAIN_MEM2_SIZE", only if "MAIN_RAM_OVERRIDE_ENABLE" is true.
GUI features have been added to Config > Advanced to adjust the new OnionConfig values.
A check has been added to State::doState to ensure savestates with memory configurations different from the current settings aren't loaded. The STATE_VERSION is now 115.
FIFO Files have been updated from version 4 to version 5, now including the MEM1 and MEM2 sizes from the time of DFF creation. FIFO Logs not using the new features (OnionConfig MAIN_RAM_OVERRIDE_ENABLE is false) are still backwards compatible. FIFO Logs that do use the new features have a MIN_LOADER_VERSION of 5. Thanks to the order of function calls, FIFO logs are able to automatically configure the new OnionConfig settings to match what is needed. This is a bit hacky, though, so I also threw in a failsafe for if the conditions that allow this to work ever go away.
I took the liberty of adding a log message to explain why the core fails to initialize if the MIN_LOADER_VERSION is too great.
Some IOS code has had the function "RAMOverrideForIOSMemoryValues" appended to it to recalculate IOS Memory Values from retail IOSes/apploaders to fit the extended memory sizes. Worry not, if MAIN_RAM_OVERRIDE_ENABLE is false, this function does absolutely nothing.
A hotfix in DolphinQt/MenuBar.cpp has been implemented for RAM Override.
Utilizing constexpr, we can eliminate the need to construct the tables
at runtime and just do all the work at compile-time. Making for less
moving parts overall.
The general structure is more or less the same, however rather than one
single initialization function, each table is built off an immediately
executed lambda function. This is nice, since it narrows the scope of
the table building logic down to the tables that actually need it.
While having motion control emulation of IR enabled by default
makes sense in situations like using a DualShock 4 on a PC,
Android has the additional option of touch emulation of IR
which seems to be better liked, and the default value which
was chosen with PC in mind was carried over to Android
without any particular consideration. This change disables
motion control emulation of IR by default on Android only.
The code was actually already rather well adapted for this.
We more or less just have to skip ParseDisc and run
ParsePartitionData directly. This required the PartitionHeader
struct to be removed (which wasn't that useful anyway).
If we start 31 KiB into a 32 KiB block and want to mark 2 KiB
of data as used, we need to mark 2 blocks as used, not just 1.
This problem is avoided when calling MarkAsUsed from
MarkAsUsedE, since MarkAsUsedE aligns to 32 KiB on its own.
Most calls to MarkAsUsed are from MarkAsUsedE, which is why
this hasn't been a noticeable problem in the past.
The constant DESIRED_BUFFER_SIZE was determined by multiplying the
old hardcoded value 32 with the default GCZ block size 16 KiB.
Not sure if it actually is the best value, but it seems fine.
Similar to what we do for addx. Since we're calculating b - a and
because subtraction is not communitative, we can only apply this when
source register a holds the constant.
Before:
45 8B EE mov r13d,r14d
41 83 ED 08 sub r13d,8
After:
45 8D 6E F8 lea r13d,[r14-8]
We can get away with skipping the addition when we know we're dealing
with a constant zero. Just a MOV will suffice in this case.
Once again, we don't bother to add separate handling for when overflow
is needed, because no titles would ever hit that path during my testing.
Before:
8B 7D F8 mov edi,dword ptr [rbp-8]
83 C7 00 add edi,0
After:
8B 7D F8 mov edi,dword ptr [rbp-8]
ADD has a smaller encoding for immediates that can be expressed as an
8-bit signed integer (in other words, between -128 and 127). MOV lacks
this compact representation.
Since addition allows us to swap the source registers, we can always get
the shortest sequence here by carefully checking if we're dealing with a
small immediate first. If we are, move the other source into the
destination and add the small immediate onto that. For large immediates
the reverse is preferrable.
Before:
41 BE 40 00 00 00 mov r14d,40h
44 03 75 A8 add r14d,dword ptr [rbp-58h]
After:
44 8B 75 A8 mov r14d,dword ptr [rbp-58h]
41 83 C6 40 add r14d,40h
Before:
44 8B 7D F8 mov r15d,dword ptr [rbp-8]
41 81 C7 00 68 00 CC add r15d,0CC006800h
After:
41 BF 00 68 00 CC mov r15d,0CC006800h
44 03 7D F8 add r15d,dword ptr [rbp-8]
When the source registers are a simple register and a constant zero and
overflow isn't needed, emitting LEA is kinda silly.
This will occasionally save a single byte for certain registers due to
how x86 encoding works. More importantly, LEA takes up execution
resources while MOV does not.
Before:
41 8D 7D 00 lea edi,[r13]
After:
41 8B FD mov edi,r13d