melonDS/src/GPU3D_Soft.cpp

1819 lines
58 KiB
C++
Raw Normal View History

2017-02-10 08:50:26 -07:00
/*
2024-06-15 09:01:19 -06:00
Copyright 2016-2024 melonDS team
2017-02-10 08:50:26 -07:00
This file is part of melonDS.
melonDS is free software: you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free
Software Foundation, either version 3 of the License, or (at your option)
any later version.
melonDS is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with melonDS. If not, see http://www.gnu.org/licenses/.
*/
Allow for a more modular renderer backends (#990) * Draft GPU3D renderer modularization * Update sources C++ standard to C++17 The top-level `CMakeLists.txt` is already using the C++17 standard. * Move GLCompositor into class type Some other misc fixes to push towards better modularity * Make renderer-implementation types move-only These types are going to be holding onto handles of GPU-side resources and shouldn't ever be copied around. * Fix OSX: Remove 'register' storage class specifier `register` has been removed in C++17... But this keyword hasn't done anything in years anyways. OSX builds consider this "warning" an error and it stops the whole build. * Add RestartFrame to Renderer3D interface * Move Accelerated property to Renderer3D interface There are points in the code base where we do: `renderer != 0` to know if we are feeding an openGL renderer. Rather than that we can instead just have this be a property of the renderer itself. With this pattern a renderer can just say how it wants its data to come in rather than have everyone know that they're talking to an OpenGL renderer. * Remove Accelerated flag from GPU * Move 2D_Soft interface in separate header Also make the current 2D engine an "owned" unique_ptr. * Update alignment attribute to standard alignas Uses standardized `alignas` rather than compiler-specific attributes. https://en.cppreference.com/w/cpp/language/alignas * Fix Clang: alignas specifier Alignment must be specified before the array to align the entire array. https://en.cppreference.com/w/cpp/language/alignas * Converted Renderer3D Accelerated to variable This flag is checked a lot during scanline rasterization. So rather than having an expensive vtable-lookup call during mainline rendering code, it is now a public constant bool type that is written to only once during Renderer3D initialization.
2021-02-09 15:38:51 -07:00
#include "GPU3D_Soft.h"
2021-08-20 17:54:09 -06:00
#include <algorithm>
2017-02-10 08:50:26 -07:00
#include <stdio.h>
#include <string.h>
#include "NDS.h"
2017-03-01 16:49:44 -07:00
#include "GPU.h"
2017-02-10 08:50:26 -07:00
namespace melonDS
2017-02-10 08:50:26 -07:00
{
void RenderThreadFunc();
2017-02-10 08:50:26 -07:00
Allow for a more modular renderer backends (#990) * Draft GPU3D renderer modularization * Update sources C++ standard to C++17 The top-level `CMakeLists.txt` is already using the C++17 standard. * Move GLCompositor into class type Some other misc fixes to push towards better modularity * Make renderer-implementation types move-only These types are going to be holding onto handles of GPU-side resources and shouldn't ever be copied around. * Fix OSX: Remove 'register' storage class specifier `register` has been removed in C++17... But this keyword hasn't done anything in years anyways. OSX builds consider this "warning" an error and it stops the whole build. * Add RestartFrame to Renderer3D interface * Move Accelerated property to Renderer3D interface There are points in the code base where we do: `renderer != 0` to know if we are feeding an openGL renderer. Rather than that we can instead just have this be a property of the renderer itself. With this pattern a renderer can just say how it wants its data to come in rather than have everyone know that they're talking to an OpenGL renderer. * Remove Accelerated flag from GPU * Move 2D_Soft interface in separate header Also make the current 2D engine an "owned" unique_ptr. * Update alignment attribute to standard alignas Uses standardized `alignas` rather than compiler-specific attributes. https://en.cppreference.com/w/cpp/language/alignas * Fix Clang: alignas specifier Alignment must be specified before the array to align the entire array. https://en.cppreference.com/w/cpp/language/alignas * Converted Renderer3D Accelerated to variable This flag is checked a lot during scanline rasterization. So rather than having an expensive vtable-lookup call during mainline rendering code, it is now a public constant bool type that is written to only once during Renderer3D initialization.
2021-02-09 15:38:51 -07:00
void SoftRenderer::StopRenderThread()
{
if (RenderThreadRunning.load(std::memory_order_relaxed))
{
Protect savestates while the threaded software renderer is running (#1864) * First crack at ensuring the render thread doesn't touch GPU state while it's being serialized * Get rid of the semaphore wait * Add some extra fields into GPU3D's serialization * Oops, TempVertexBuffer is already serialized * Move vertex serialization into its own method * Lock the GPU3D state when rendering on the render thread or serializing it * Revert "Lock the GPU3D state when rendering on the render thread or serializing it" This reverts commit 2f49a551c13934b9dc815bbda67a45098f0482a7. * Add comments that describe the synchronization within GPU3D_Soft - I need to understand it before I can solve my actual problem - Now I do * Revert "Revert "Lock the GPU3D state when rendering on the render thread or serializing it"" This reverts commit 1977566a6d8671d72bd94ba4ebf832c3bf08933a. * Let's try locking the GPU3D state throughout NDS::RunFrame - Just to see what happens * Slim down the lock's scope * Narrow the lock's scope some more * Remove the lock entirely * Try protecting the GPU3D state with just a mutex - I'll clean this up once I know it works * Remove a duplicate method definition * Add a missing `noexcept` specifier * Remove an unused function * Cut some non-hardware state from `GPU3D`'s savestate * Assume that the next frame after loading a savestate won't be identical * Actually, it _is_ worth it * Don't serialize the clip matrix - It's recalculated anyway * Serialize `RenderPolygonRAM` as an array of indexes * Clean up some comments - I liked the dialogue style, but oh well * Try restarting the render thread instead of using the lock - Let's see what happens * Put the lock back * Fix some polygon and vertex indexes being saved incorrectly - Taking the difference between two pointers results in the number of elements, not the number of bytes * Remove `SoftRenderer::StateBusy` since it turns out we don't need it - The real synchronization was the friends we made along the way
2024-01-07 15:39:43 -07:00
// Tell the render thread to stop drawing new frames, and finish up the current one.
RenderThreadRunning = false;
Protect savestates while the threaded software renderer is running (#1864) * First crack at ensuring the render thread doesn't touch GPU state while it's being serialized * Get rid of the semaphore wait * Add some extra fields into GPU3D's serialization * Oops, TempVertexBuffer is already serialized * Move vertex serialization into its own method * Lock the GPU3D state when rendering on the render thread or serializing it * Revert "Lock the GPU3D state when rendering on the render thread or serializing it" This reverts commit 2f49a551c13934b9dc815bbda67a45098f0482a7. * Add comments that describe the synchronization within GPU3D_Soft - I need to understand it before I can solve my actual problem - Now I do * Revert "Revert "Lock the GPU3D state when rendering on the render thread or serializing it"" This reverts commit 1977566a6d8671d72bd94ba4ebf832c3bf08933a. * Let's try locking the GPU3D state throughout NDS::RunFrame - Just to see what happens * Slim down the lock's scope * Narrow the lock's scope some more * Remove the lock entirely * Try protecting the GPU3D state with just a mutex - I'll clean this up once I know it works * Remove a duplicate method definition * Add a missing `noexcept` specifier * Remove an unused function * Cut some non-hardware state from `GPU3D`'s savestate * Assume that the next frame after loading a savestate won't be identical * Actually, it _is_ worth it * Don't serialize the clip matrix - It's recalculated anyway * Serialize `RenderPolygonRAM` as an array of indexes * Clean up some comments - I liked the dialogue style, but oh well * Try restarting the render thread instead of using the lock - Let's see what happens * Put the lock back * Fix some polygon and vertex indexes being saved incorrectly - Taking the difference between two pointers results in the number of elements, not the number of bytes * Remove `SoftRenderer::StateBusy` since it turns out we don't need it - The real synchronization was the friends we made along the way
2024-01-07 15:39:43 -07:00
Platform::Semaphore_Post(Sema_RenderStart);
Protect savestates while the threaded software renderer is running (#1864) * First crack at ensuring the render thread doesn't touch GPU state while it's being serialized * Get rid of the semaphore wait * Add some extra fields into GPU3D's serialization * Oops, TempVertexBuffer is already serialized * Move vertex serialization into its own method * Lock the GPU3D state when rendering on the render thread or serializing it * Revert "Lock the GPU3D state when rendering on the render thread or serializing it" This reverts commit 2f49a551c13934b9dc815bbda67a45098f0482a7. * Add comments that describe the synchronization within GPU3D_Soft - I need to understand it before I can solve my actual problem - Now I do * Revert "Revert "Lock the GPU3D state when rendering on the render thread or serializing it"" This reverts commit 1977566a6d8671d72bd94ba4ebf832c3bf08933a. * Let's try locking the GPU3D state throughout NDS::RunFrame - Just to see what happens * Slim down the lock's scope * Narrow the lock's scope some more * Remove the lock entirely * Try protecting the GPU3D state with just a mutex - I'll clean this up once I know it works * Remove a duplicate method definition * Add a missing `noexcept` specifier * Remove an unused function * Cut some non-hardware state from `GPU3D`'s savestate * Assume that the next frame after loading a savestate won't be identical * Actually, it _is_ worth it * Don't serialize the clip matrix - It's recalculated anyway * Serialize `RenderPolygonRAM` as an array of indexes * Clean up some comments - I liked the dialogue style, but oh well * Try restarting the render thread instead of using the lock - Let's see what happens * Put the lock back * Fix some polygon and vertex indexes being saved incorrectly - Taking the difference between two pointers results in the number of elements, not the number of bytes * Remove `SoftRenderer::StateBusy` since it turns out we don't need it - The real synchronization was the friends we made along the way
2024-01-07 15:39:43 -07:00
Platform::Thread_Wait(RenderThread);
Platform::Thread_Free(RenderThread);
RenderThread = nullptr;
}
}
void SoftRenderer::SetupRenderThread(GPU& gpu)
{
if (Threaded)
{
if (!RenderThreadRunning.load(std::memory_order_relaxed))
Protect savestates while the threaded software renderer is running (#1864) * First crack at ensuring the render thread doesn't touch GPU state while it's being serialized * Get rid of the semaphore wait * Add some extra fields into GPU3D's serialization * Oops, TempVertexBuffer is already serialized * Move vertex serialization into its own method * Lock the GPU3D state when rendering on the render thread or serializing it * Revert "Lock the GPU3D state when rendering on the render thread or serializing it" This reverts commit 2f49a551c13934b9dc815bbda67a45098f0482a7. * Add comments that describe the synchronization within GPU3D_Soft - I need to understand it before I can solve my actual problem - Now I do * Revert "Revert "Lock the GPU3D state when rendering on the render thread or serializing it"" This reverts commit 1977566a6d8671d72bd94ba4ebf832c3bf08933a. * Let's try locking the GPU3D state throughout NDS::RunFrame - Just to see what happens * Slim down the lock's scope * Narrow the lock's scope some more * Remove the lock entirely * Try protecting the GPU3D state with just a mutex - I'll clean this up once I know it works * Remove a duplicate method definition * Add a missing `noexcept` specifier * Remove an unused function * Cut some non-hardware state from `GPU3D`'s savestate * Assume that the next frame after loading a savestate won't be identical * Actually, it _is_ worth it * Don't serialize the clip matrix - It's recalculated anyway * Serialize `RenderPolygonRAM` as an array of indexes * Clean up some comments - I liked the dialogue style, but oh well * Try restarting the render thread instead of using the lock - Let's see what happens * Put the lock back * Fix some polygon and vertex indexes being saved incorrectly - Taking the difference between two pointers results in the number of elements, not the number of bytes * Remove `SoftRenderer::StateBusy` since it turns out we don't need it - The real synchronization was the friends we made along the way
2024-01-07 15:39:43 -07:00
{ // If the render thread isn't already running...
RenderThreadRunning = true; // "Time for work, render thread!"
RenderThread = Platform::Thread_Create([this, &gpu]() {
RenderThreadFunc(gpu);
});
}
Protect savestates while the threaded software renderer is running (#1864) * First crack at ensuring the render thread doesn't touch GPU state while it's being serialized * Get rid of the semaphore wait * Add some extra fields into GPU3D's serialization * Oops, TempVertexBuffer is already serialized * Move vertex serialization into its own method * Lock the GPU3D state when rendering on the render thread or serializing it * Revert "Lock the GPU3D state when rendering on the render thread or serializing it" This reverts commit 2f49a551c13934b9dc815bbda67a45098f0482a7. * Add comments that describe the synchronization within GPU3D_Soft - I need to understand it before I can solve my actual problem - Now I do * Revert "Revert "Lock the GPU3D state when rendering on the render thread or serializing it"" This reverts commit 1977566a6d8671d72bd94ba4ebf832c3bf08933a. * Let's try locking the GPU3D state throughout NDS::RunFrame - Just to see what happens * Slim down the lock's scope * Narrow the lock's scope some more * Remove the lock entirely * Try protecting the GPU3D state with just a mutex - I'll clean this up once I know it works * Remove a duplicate method definition * Add a missing `noexcept` specifier * Remove an unused function * Cut some non-hardware state from `GPU3D`'s savestate * Assume that the next frame after loading a savestate won't be identical * Actually, it _is_ worth it * Don't serialize the clip matrix - It's recalculated anyway * Serialize `RenderPolygonRAM` as an array of indexes * Clean up some comments - I liked the dialogue style, but oh well * Try restarting the render thread instead of using the lock - Let's see what happens * Put the lock back * Fix some polygon and vertex indexes being saved incorrectly - Taking the difference between two pointers results in the number of elements, not the number of bytes * Remove `SoftRenderer::StateBusy` since it turns out we don't need it - The real synchronization was the friends we made along the way
2024-01-07 15:39:43 -07:00
// "Be on standby, but don't start rendering until I tell you to!"
Platform::Semaphore_Reset(Sema_RenderStart);
Protect savestates while the threaded software renderer is running (#1864) * First crack at ensuring the render thread doesn't touch GPU state while it's being serialized * Get rid of the semaphore wait * Add some extra fields into GPU3D's serialization * Oops, TempVertexBuffer is already serialized * Move vertex serialization into its own method * Lock the GPU3D state when rendering on the render thread or serializing it * Revert "Lock the GPU3D state when rendering on the render thread or serializing it" This reverts commit 2f49a551c13934b9dc815bbda67a45098f0482a7. * Add comments that describe the synchronization within GPU3D_Soft - I need to understand it before I can solve my actual problem - Now I do * Revert "Revert "Lock the GPU3D state when rendering on the render thread or serializing it"" This reverts commit 1977566a6d8671d72bd94ba4ebf832c3bf08933a. * Let's try locking the GPU3D state throughout NDS::RunFrame - Just to see what happens * Slim down the lock's scope * Narrow the lock's scope some more * Remove the lock entirely * Try protecting the GPU3D state with just a mutex - I'll clean this up once I know it works * Remove a duplicate method definition * Add a missing `noexcept` specifier * Remove an unused function * Cut some non-hardware state from `GPU3D`'s savestate * Assume that the next frame after loading a savestate won't be identical * Actually, it _is_ worth it * Don't serialize the clip matrix - It's recalculated anyway * Serialize `RenderPolygonRAM` as an array of indexes * Clean up some comments - I liked the dialogue style, but oh well * Try restarting the render thread instead of using the lock - Let's see what happens * Put the lock back * Fix some polygon and vertex indexes being saved incorrectly - Taking the difference between two pointers results in the number of elements, not the number of bytes * Remove `SoftRenderer::StateBusy` since it turns out we don't need it - The real synchronization was the friends we made along the way
2024-01-07 15:39:43 -07:00
// "Oh, sorry, were you already in the middle of a frame from the last iteration?"
if (RenderThreadRendering)
Protect savestates while the threaded software renderer is running (#1864) * First crack at ensuring the render thread doesn't touch GPU state while it's being serialized * Get rid of the semaphore wait * Add some extra fields into GPU3D's serialization * Oops, TempVertexBuffer is already serialized * Move vertex serialization into its own method * Lock the GPU3D state when rendering on the render thread or serializing it * Revert "Lock the GPU3D state when rendering on the render thread or serializing it" This reverts commit 2f49a551c13934b9dc815bbda67a45098f0482a7. * Add comments that describe the synchronization within GPU3D_Soft - I need to understand it before I can solve my actual problem - Now I do * Revert "Revert "Lock the GPU3D state when rendering on the render thread or serializing it"" This reverts commit 1977566a6d8671d72bd94ba4ebf832c3bf08933a. * Let's try locking the GPU3D state throughout NDS::RunFrame - Just to see what happens * Slim down the lock's scope * Narrow the lock's scope some more * Remove the lock entirely * Try protecting the GPU3D state with just a mutex - I'll clean this up once I know it works * Remove a duplicate method definition * Add a missing `noexcept` specifier * Remove an unused function * Cut some non-hardware state from `GPU3D`'s savestate * Assume that the next frame after loading a savestate won't be identical * Actually, it _is_ worth it * Don't serialize the clip matrix - It's recalculated anyway * Serialize `RenderPolygonRAM` as an array of indexes * Clean up some comments - I liked the dialogue style, but oh well * Try restarting the render thread instead of using the lock - Let's see what happens * Put the lock back * Fix some polygon and vertex indexes being saved incorrectly - Taking the difference between two pointers results in the number of elements, not the number of bytes * Remove `SoftRenderer::StateBusy` since it turns out we don't need it - The real synchronization was the friends we made along the way
2024-01-07 15:39:43 -07:00
// "Tell me when you're done, I'll wait here."
Platform::Semaphore_Wait(Sema_RenderDone);
Protect savestates while the threaded software renderer is running (#1864) * First crack at ensuring the render thread doesn't touch GPU state while it's being serialized * Get rid of the semaphore wait * Add some extra fields into GPU3D's serialization * Oops, TempVertexBuffer is already serialized * Move vertex serialization into its own method * Lock the GPU3D state when rendering on the render thread or serializing it * Revert "Lock the GPU3D state when rendering on the render thread or serializing it" This reverts commit 2f49a551c13934b9dc815bbda67a45098f0482a7. * Add comments that describe the synchronization within GPU3D_Soft - I need to understand it before I can solve my actual problem - Now I do * Revert "Revert "Lock the GPU3D state when rendering on the render thread or serializing it"" This reverts commit 1977566a6d8671d72bd94ba4ebf832c3bf08933a. * Let's try locking the GPU3D state throughout NDS::RunFrame - Just to see what happens * Slim down the lock's scope * Narrow the lock's scope some more * Remove the lock entirely * Try protecting the GPU3D state with just a mutex - I'll clean this up once I know it works * Remove a duplicate method definition * Add a missing `noexcept` specifier * Remove an unused function * Cut some non-hardware state from `GPU3D`'s savestate * Assume that the next frame after loading a savestate won't be identical * Actually, it _is_ worth it * Don't serialize the clip matrix - It's recalculated anyway * Serialize `RenderPolygonRAM` as an array of indexes * Clean up some comments - I liked the dialogue style, but oh well * Try restarting the render thread instead of using the lock - Let's see what happens * Put the lock back * Fix some polygon and vertex indexes being saved incorrectly - Taking the difference between two pointers results in the number of elements, not the number of bytes * Remove `SoftRenderer::StateBusy` since it turns out we don't need it - The real synchronization was the friends we made along the way
2024-01-07 15:39:43 -07:00
// "All good? Okay, let me give you your training."
// "(Maybe you're still the same thread, but I have to tell you this stuff anyway.)"
// "This is the signal you'll send when you're done with a frame."
// "I'll listen for it when I need to show something to the frontend."
Platform::Semaphore_Reset(Sema_RenderDone);
Protect savestates while the threaded software renderer is running (#1864) * First crack at ensuring the render thread doesn't touch GPU state while it's being serialized * Get rid of the semaphore wait * Add some extra fields into GPU3D's serialization * Oops, TempVertexBuffer is already serialized * Move vertex serialization into its own method * Lock the GPU3D state when rendering on the render thread or serializing it * Revert "Lock the GPU3D state when rendering on the render thread or serializing it" This reverts commit 2f49a551c13934b9dc815bbda67a45098f0482a7. * Add comments that describe the synchronization within GPU3D_Soft - I need to understand it before I can solve my actual problem - Now I do * Revert "Revert "Lock the GPU3D state when rendering on the render thread or serializing it"" This reverts commit 1977566a6d8671d72bd94ba4ebf832c3bf08933a. * Let's try locking the GPU3D state throughout NDS::RunFrame - Just to see what happens * Slim down the lock's scope * Narrow the lock's scope some more * Remove the lock entirely * Try protecting the GPU3D state with just a mutex - I'll clean this up once I know it works * Remove a duplicate method definition * Add a missing `noexcept` specifier * Remove an unused function * Cut some non-hardware state from `GPU3D`'s savestate * Assume that the next frame after loading a savestate won't be identical * Actually, it _is_ worth it * Don't serialize the clip matrix - It's recalculated anyway * Serialize `RenderPolygonRAM` as an array of indexes * Clean up some comments - I liked the dialogue style, but oh well * Try restarting the render thread instead of using the lock - Let's see what happens * Put the lock back * Fix some polygon and vertex indexes being saved incorrectly - Taking the difference between two pointers results in the number of elements, not the number of bytes * Remove `SoftRenderer::StateBusy` since it turns out we don't need it - The real synchronization was the friends we made along the way
2024-01-07 15:39:43 -07:00
// "This is the signal I'll send when I want you to start rendering."
// "Don't do anything until you get the message."
Platform::Semaphore_Reset(Sema_RenderStart);
Protect savestates while the threaded software renderer is running (#1864) * First crack at ensuring the render thread doesn't touch GPU state while it's being serialized * Get rid of the semaphore wait * Add some extra fields into GPU3D's serialization * Oops, TempVertexBuffer is already serialized * Move vertex serialization into its own method * Lock the GPU3D state when rendering on the render thread or serializing it * Revert "Lock the GPU3D state when rendering on the render thread or serializing it" This reverts commit 2f49a551c13934b9dc815bbda67a45098f0482a7. * Add comments that describe the synchronization within GPU3D_Soft - I need to understand it before I can solve my actual problem - Now I do * Revert "Revert "Lock the GPU3D state when rendering on the render thread or serializing it"" This reverts commit 1977566a6d8671d72bd94ba4ebf832c3bf08933a. * Let's try locking the GPU3D state throughout NDS::RunFrame - Just to see what happens * Slim down the lock's scope * Narrow the lock's scope some more * Remove the lock entirely * Try protecting the GPU3D state with just a mutex - I'll clean this up once I know it works * Remove a duplicate method definition * Add a missing `noexcept` specifier * Remove an unused function * Cut some non-hardware state from `GPU3D`'s savestate * Assume that the next frame after loading a savestate won't be identical * Actually, it _is_ worth it * Don't serialize the clip matrix - It's recalculated anyway * Serialize `RenderPolygonRAM` as an array of indexes * Clean up some comments - I liked the dialogue style, but oh well * Try restarting the render thread instead of using the lock - Let's see what happens * Put the lock back * Fix some polygon and vertex indexes being saved incorrectly - Taking the difference between two pointers results in the number of elements, not the number of bytes * Remove `SoftRenderer::StateBusy` since it turns out we don't need it - The real synchronization was the friends we made along the way
2024-01-07 15:39:43 -07:00
// "This is the signal you'll send every time you finish drawing a line."
// "I might need some of your scanlines before you finish the whole buffer,"
// "so let me know as soon as you're done with each one."
Platform::Semaphore_Reset(Sema_ScanlineCount);
}
else
{
StopRenderThread();
}
}
Protect savestates while the threaded software renderer is running (#1864) * First crack at ensuring the render thread doesn't touch GPU state while it's being serialized * Get rid of the semaphore wait * Add some extra fields into GPU3D's serialization * Oops, TempVertexBuffer is already serialized * Move vertex serialization into its own method * Lock the GPU3D state when rendering on the render thread or serializing it * Revert "Lock the GPU3D state when rendering on the render thread or serializing it" This reverts commit 2f49a551c13934b9dc815bbda67a45098f0482a7. * Add comments that describe the synchronization within GPU3D_Soft - I need to understand it before I can solve my actual problem - Now I do * Revert "Revert "Lock the GPU3D state when rendering on the render thread or serializing it"" This reverts commit 1977566a6d8671d72bd94ba4ebf832c3bf08933a. * Let's try locking the GPU3D state throughout NDS::RunFrame - Just to see what happens * Slim down the lock's scope * Narrow the lock's scope some more * Remove the lock entirely * Try protecting the GPU3D state with just a mutex - I'll clean this up once I know it works * Remove a duplicate method definition * Add a missing `noexcept` specifier * Remove an unused function * Cut some non-hardware state from `GPU3D`'s savestate * Assume that the next frame after loading a savestate won't be identical * Actually, it _is_ worth it * Don't serialize the clip matrix - It's recalculated anyway * Serialize `RenderPolygonRAM` as an array of indexes * Clean up some comments - I liked the dialogue style, but oh well * Try restarting the render thread instead of using the lock - Let's see what happens * Put the lock back * Fix some polygon and vertex indexes being saved incorrectly - Taking the difference between two pointers results in the number of elements, not the number of bytes * Remove `SoftRenderer::StateBusy` since it turns out we don't need it - The real synchronization was the friends we made along the way
2024-01-07 15:39:43 -07:00
void SoftRenderer::EnableRenderThread()
{
if (Threaded && Sema_RenderStart)
{
Platform::Semaphore_Post(Sema_RenderStart);
}
}
Compute shader renderer (#2041) * nothing works yet * don't double buffer 3D framebuffers for the GL Renderer looks like leftovers from when 3D+2D composition was done in the frontend * oops * it works! * implement display capture for compute renderer it's actually just all stolen from the regular OpenGL renderer * fix bad indirect call * handle cleanup properly * add hires rendering to the compute shader renderer * fix UB also misc changes to use more unsigned multiplication also fix framebuffer resize * correct edge filling behaviour when AA is disabled * fix full color textures * fix edge marking (polygon id is 6-bit not 5) also make the code a bit nicer * take all edge cases into account for XMin/XMax calculation * use hires coordinate again * stop using fixed size buffers based on scale factor in shaders this makes shader compile times tolerable on Wintel - beginning of the shader cache - increase size of tile idx in workdesc to 20 bits * apparently & is not defined on bvec4 why does this even compile on Intel and Nvidia? * put the texture cache into it's own file * add compute shader renderer properly to the GUI also add option to toggle using high resolution vertex coordinates * unbind sampler object in compute shader renderer * fix GetRangedBitMask for 64 bit aligned 64 bits pretty embarassing * convert NonStupidBitfield.h back to LF only new lines * actually adapt to latest changes * fix stupid merge * actually make compute shader renderer work with newest changes * show progress on shader compilation * remove merge leftover
2024-05-13 09:17:39 -06:00
SoftRenderer::SoftRenderer() noexcept
: Renderer3D(false)
2017-02-10 08:50:26 -07:00
{
Sema_RenderStart = Platform::Semaphore_Create();
Sema_RenderDone = Platform::Semaphore_Create();
Sema_ScanlineCount = Platform::Semaphore_Create();
RenderThreadRunning = false;
2017-05-25 17:22:11 -06:00
RenderThreadRendering = false;
RenderThread = nullptr;
2017-02-10 08:50:26 -07:00
}
SoftRenderer::~SoftRenderer()
2017-02-10 08:50:26 -07:00
{
StopRenderThread();
2017-05-25 17:22:11 -06:00
Platform::Semaphore_Free(Sema_RenderStart);
Platform::Semaphore_Free(Sema_RenderDone);
Platform::Semaphore_Free(Sema_ScanlineCount);
2017-02-10 08:50:26 -07:00
}
void SoftRenderer::Reset(GPU& gpu)
2017-02-10 08:50:26 -07:00
{
memset(ColorBuffer, 0, BufferSize * 2 * 4);
memset(DepthBuffer, 0, BufferSize * 2 * 4);
memset(AttrBuffer, 0, BufferSize * 2 * 4);
2017-05-25 20:00:15 -06:00
PrevIsShadowMask = false;
SetupRenderThread(gpu);
Protect savestates while the threaded software renderer is running (#1864) * First crack at ensuring the render thread doesn't touch GPU state while it's being serialized * Get rid of the semaphore wait * Add some extra fields into GPU3D's serialization * Oops, TempVertexBuffer is already serialized * Move vertex serialization into its own method * Lock the GPU3D state when rendering on the render thread or serializing it * Revert "Lock the GPU3D state when rendering on the render thread or serializing it" This reverts commit 2f49a551c13934b9dc815bbda67a45098f0482a7. * Add comments that describe the synchronization within GPU3D_Soft - I need to understand it before I can solve my actual problem - Now I do * Revert "Revert "Lock the GPU3D state when rendering on the render thread or serializing it"" This reverts commit 1977566a6d8671d72bd94ba4ebf832c3bf08933a. * Let's try locking the GPU3D state throughout NDS::RunFrame - Just to see what happens * Slim down the lock's scope * Narrow the lock's scope some more * Remove the lock entirely * Try protecting the GPU3D state with just a mutex - I'll clean this up once I know it works * Remove a duplicate method definition * Add a missing `noexcept` specifier * Remove an unused function * Cut some non-hardware state from `GPU3D`'s savestate * Assume that the next frame after loading a savestate won't be identical * Actually, it _is_ worth it * Don't serialize the clip matrix - It's recalculated anyway * Serialize `RenderPolygonRAM` as an array of indexes * Clean up some comments - I liked the dialogue style, but oh well * Try restarting the render thread instead of using the lock - Let's see what happens * Put the lock back * Fix some polygon and vertex indexes being saved incorrectly - Taking the difference between two pointers results in the number of elements, not the number of bytes * Remove `SoftRenderer::StateBusy` since it turns out we don't need it - The real synchronization was the friends we made along the way
2024-01-07 15:39:43 -07:00
EnableRenderThread();
2017-02-10 08:50:26 -07:00
}
void SoftRenderer::SetThreaded(bool threaded, GPU& gpu) noexcept
{
Clean up the 3D renderer for enhanced flexibility (#1895) * Give `GPU2D::Unit` a virtual destructor - Undefined behavior avoided! * Add an `array2d` alias * Move various parts of `GPU2D::SoftRenderer` to `constexpr` - `SoftRenderer::MosaicTable` is now initialized at compile-time - Most of the `SoftRenderer::Color*` functions are now `constexpr` - The aforementioned functions are used with a constant value in at least one place, so they'll be at least partially computed at compile-time * Generalize `GLRenderer::PrepareCaptureFrame` - Declare it in the base `Renderer3D` class, but make it empty * Remove unneeded `virtual` specifiers * Store `Framebuffer`'s memory in `unique_ptr`s - Reduce the risk of leaks this way * Clean up how `GLCompositor` is initialized - Return it as an `std::optional` instead of a `std::unique_ptr` - Make `GLCompositor` movable - Replace `GLCompositor`'s plain arrays with `std::array` to simplify moving * Pass `GPU` to `GLCompositor`'s important functions instead of passing it to the constructor * Move `GLCompositor` to be a field within `GLRenderer` - Some methods were moved up and made `virtual` * Fix some linker errors * Set the renderer in the frontend * Remove unneeded `virtual` specifiers * Remove `RenderSettings` in favor of just exposing the relevant member variables * Update the frontend to accommodate the core changes * Add `constexpr` and `const` to places in the interpolator * Qualify references to `size_t` * Construct the `optional` directly instead of using `make_optional` - It makes the Linux build choke - I think it's because `GLCompositor`'s constructor is `private`
2023-11-29 07:23:11 -07:00
if (Threaded != threaded)
{
Threaded = threaded;
SetupRenderThread(gpu);
Protect savestates while the threaded software renderer is running (#1864) * First crack at ensuring the render thread doesn't touch GPU state while it's being serialized * Get rid of the semaphore wait * Add some extra fields into GPU3D's serialization * Oops, TempVertexBuffer is already serialized * Move vertex serialization into its own method * Lock the GPU3D state when rendering on the render thread or serializing it * Revert "Lock the GPU3D state when rendering on the render thread or serializing it" This reverts commit 2f49a551c13934b9dc815bbda67a45098f0482a7. * Add comments that describe the synchronization within GPU3D_Soft - I need to understand it before I can solve my actual problem - Now I do * Revert "Revert "Lock the GPU3D state when rendering on the render thread or serializing it"" This reverts commit 1977566a6d8671d72bd94ba4ebf832c3bf08933a. * Let's try locking the GPU3D state throughout NDS::RunFrame - Just to see what happens * Slim down the lock's scope * Narrow the lock's scope some more * Remove the lock entirely * Try protecting the GPU3D state with just a mutex - I'll clean this up once I know it works * Remove a duplicate method definition * Add a missing `noexcept` specifier * Remove an unused function * Cut some non-hardware state from `GPU3D`'s savestate * Assume that the next frame after loading a savestate won't be identical * Actually, it _is_ worth it * Don't serialize the clip matrix - It's recalculated anyway * Serialize `RenderPolygonRAM` as an array of indexes * Clean up some comments - I liked the dialogue style, but oh well * Try restarting the render thread instead of using the lock - Let's see what happens * Put the lock back * Fix some polygon and vertex indexes being saved incorrectly - Taking the difference between two pointers results in the number of elements, not the number of bytes * Remove `SoftRenderer::StateBusy` since it turns out we don't need it - The real synchronization was the friends we made along the way
2024-01-07 15:39:43 -07:00
EnableRenderThread();
Clean up the 3D renderer for enhanced flexibility (#1895) * Give `GPU2D::Unit` a virtual destructor - Undefined behavior avoided! * Add an `array2d` alias * Move various parts of `GPU2D::SoftRenderer` to `constexpr` - `SoftRenderer::MosaicTable` is now initialized at compile-time - Most of the `SoftRenderer::Color*` functions are now `constexpr` - The aforementioned functions are used with a constant value in at least one place, so they'll be at least partially computed at compile-time * Generalize `GLRenderer::PrepareCaptureFrame` - Declare it in the base `Renderer3D` class, but make it empty * Remove unneeded `virtual` specifiers * Store `Framebuffer`'s memory in `unique_ptr`s - Reduce the risk of leaks this way * Clean up how `GLCompositor` is initialized - Return it as an `std::optional` instead of a `std::unique_ptr` - Make `GLCompositor` movable - Replace `GLCompositor`'s plain arrays with `std::array` to simplify moving * Pass `GPU` to `GLCompositor`'s important functions instead of passing it to the constructor * Move `GLCompositor` to be a field within `GLRenderer` - Some methods were moved up and made `virtual` * Fix some linker errors * Set the renderer in the frontend * Remove unneeded `virtual` specifiers * Remove `RenderSettings` in favor of just exposing the relevant member variables * Update the frontend to accommodate the core changes * Add `constexpr` and `const` to places in the interpolator * Qualify references to `size_t` * Construct the `optional` directly instead of using `make_optional` - It makes the Linux build choke - I think it's because `GLCompositor`'s constructor is `private`
2023-11-29 07:23:11 -07:00
}
}
void SoftRenderer::TextureLookup(const GPU& gpu, u32 texparam, u32 texpal, s16 s, s16 t, u16* color, u8* alpha) const
2017-02-13 19:29:02 -07:00
{
2017-03-01 16:49:44 -07:00
u32 vramaddr = (texparam & 0xFFFF) << 3;
2017-05-21 12:14:03 -06:00
s32 width = 8 << ((texparam >> 20) & 0x7);
s32 height = 8 << ((texparam >> 23) & 0x7);
2017-03-01 16:49:44 -07:00
s >>= 4;
t >>= 4;
// texture wrapping
// TODO: optimize this somehow
2017-04-21 14:40:15 -06:00
// testing shows that it's hardly worth optimizing, actually
if (texparam & (1<<16))
{
if (texparam & (1<<18))
{
if (s & width) s = (width-1) - (s & (width-1));
else s = (s & (width-1));
}
else
s &= width-1;
}
else
{
if (s < 0) s = 0;
else if (s >= width) s = width-1;
}
if (texparam & (1<<17))
{
if (texparam & (1<<19))
{
if (t & height) t = (height-1) - (t & (height-1));
else t = (t & (height-1));
}
else
t &= height-1;
}
else
{
if (t < 0) t = 0;
else if (t >= height) t = height-1;
}
2017-03-01 16:49:44 -07:00
u8 alpha0;
if (texparam & (1<<29)) alpha0 = 0;
else alpha0 = 31;
2017-03-01 16:49:44 -07:00
switch ((texparam >> 26) & 0x7)
{
case 1: // A3I5
2017-03-01 16:49:44 -07:00
{
vramaddr += ((t * width) + s);
u8 pixel = gpu.ReadVRAMFlat_Texture<u8>(vramaddr);
2017-03-01 16:49:44 -07:00
texpal <<= 4;
*color = gpu.ReadVRAMFlat_TexPal<u16>(texpal + ((pixel&0x1F)<<1));
*alpha = ((pixel >> 3) & 0x1C) + (pixel >> 6);
}
break;
case 2: // 4-color
{
vramaddr += (((t * width) + s) >> 2);
u8 pixel = gpu.ReadVRAMFlat_Texture<u8>(vramaddr);
pixel >>= ((s & 0x3) << 1);
pixel &= 0x3;
2017-03-01 16:49:44 -07:00
texpal <<= 3;
*color = gpu.ReadVRAMFlat_TexPal<u16>(texpal + (pixel<<1));
*alpha = (pixel==0) ? alpha0 : 31;
2017-03-01 16:49:44 -07:00
}
break;
case 3: // 16-color
2017-03-01 16:49:44 -07:00
{
vramaddr += (((t * width) + s) >> 1);
u8 pixel = gpu.ReadVRAMFlat_Texture<u8>(vramaddr);
if (s & 0x1) pixel >>= 4;
else pixel &= 0xF;
2017-03-01 16:49:44 -07:00
texpal <<= 4;
*color = gpu.ReadVRAMFlat_TexPal<u16>(texpal + (pixel<<1));
*alpha = (pixel==0) ? alpha0 : 31;
}
break;
case 4: // 256-color
{
vramaddr += ((t * width) + s);
u8 pixel = gpu.ReadVRAMFlat_Texture<u8>(vramaddr);
texpal <<= 4;
*color = gpu.ReadVRAMFlat_TexPal<u16>(texpal + (pixel<<1));
*alpha = (pixel==0) ? alpha0 : 31;
}
break;
case 5: // compressed
{
vramaddr += ((t & 0x3FC) * (width>>2)) + (s & 0x3FC);
vramaddr += (t & 0x3);
vramaddr &= 0x7FFFF; // address used for all calcs wraps around after slot 3
u32 slot1addr = 0x20000 + ((vramaddr & 0x1FFFC) >> 1);
if (vramaddr >= 0x40000)
slot1addr += 0x10000;
u8 val;
if (vramaddr >= 0x20000 && vramaddr < 0x40000) // reading slot 1 for texels should always read 0
val = 0;
else
{
val = gpu.ReadVRAMFlat_Texture<u8>(vramaddr);
val >>= (2 * (s & 0x3));
}
u16 palinfo = gpu.ReadVRAMFlat_Texture<u16>(slot1addr);
u32 paloffset = (palinfo & 0x3FFF) << 2;
texpal <<= 4;
switch (val & 0x3)
{
case 0:
*color = gpu.ReadVRAMFlat_TexPal<u16>(texpal + paloffset);
*alpha = 31;
break;
case 1:
*color = gpu.ReadVRAMFlat_TexPal<u16>(texpal + paloffset + 2);
*alpha = 31;
break;
case 2:
if ((palinfo >> 14) == 1)
{
u16 color0 = gpu.ReadVRAMFlat_TexPal<u16>(texpal + paloffset);
u16 color1 = gpu.ReadVRAMFlat_TexPal<u16>(texpal + paloffset + 2);
u32 r0 = color0 & 0x001F;
u32 g0 = color0 & 0x03E0;
u32 b0 = color0 & 0x7C00;
u32 r1 = color1 & 0x001F;
u32 g1 = color1 & 0x03E0;
u32 b1 = color1 & 0x7C00;
u32 r = (r0 + r1) >> 1;
u32 g = ((g0 + g1) >> 1) & 0x03E0;
u32 b = ((b0 + b1) >> 1) & 0x7C00;
*color = r | g | b;
}
else if ((palinfo >> 14) == 3)
{
u16 color0 = gpu.ReadVRAMFlat_TexPal<u16>(texpal + paloffset);
u16 color1 = gpu.ReadVRAMFlat_TexPal<u16>(texpal + paloffset + 2);
u32 r0 = color0 & 0x001F;
u32 g0 = color0 & 0x03E0;
u32 b0 = color0 & 0x7C00;
u32 r1 = color1 & 0x001F;
u32 g1 = color1 & 0x03E0;
u32 b1 = color1 & 0x7C00;
u32 r = (r0*5 + r1*3) >> 3;
u32 g = ((g0*5 + g1*3) >> 3) & 0x03E0;
u32 b = ((b0*5 + b1*3) >> 3) & 0x7C00;
*color = r | g | b;
}
else
*color = gpu.ReadVRAMFlat_TexPal<u16>(texpal + paloffset + 4);
*alpha = 31;
break;
case 3:
if ((palinfo >> 14) == 2)
{
*color = gpu.ReadVRAMFlat_TexPal<u16>(texpal + paloffset + 6);
*alpha = 31;
}
else if ((palinfo >> 14) == 3)
{
u16 color0 = gpu.ReadVRAMFlat_TexPal<u16>(texpal + paloffset);
u16 color1 = gpu.ReadVRAMFlat_TexPal<u16>(texpal + paloffset + 2);
u32 r0 = color0 & 0x001F;
u32 g0 = color0 & 0x03E0;
u32 b0 = color0 & 0x7C00;
u32 r1 = color1 & 0x001F;
u32 g1 = color1 & 0x03E0;
u32 b1 = color1 & 0x7C00;
u32 r = (r0*3 + r1*5) >> 3;
u32 g = ((g0*3 + g1*5) >> 3) & 0x03E0;
u32 b = ((b0*3 + b1*5) >> 3) & 0x7C00;
*color = r | g | b;
*alpha = 31;
}
else
{
*color = 0;
*alpha = 0;
}
break;
}
}
break;
case 6: // A5I3
{
vramaddr += ((t * width) + s);
u8 pixel = gpu.ReadVRAMFlat_Texture<u8>(vramaddr);
texpal <<= 4;
*color = gpu.ReadVRAMFlat_TexPal<u16>(texpal + ((pixel&0x7)<<1));
*alpha = (pixel >> 3);
2017-03-01 16:49:44 -07:00
}
break;
case 7: // direct color
{
vramaddr += (((t * width) + s) << 1);
*color = gpu.ReadVRAMFlat_Texture<u16>(vramaddr);
*alpha = (*color & 0x8000) ? 31 : 0;
}
2017-03-01 16:49:44 -07:00
break;
}
}
// depth test is 'less or equal' instead of 'less than' under the following conditions:
// * when drawing a front-facing pixel over an opaque back-facing pixel
// * when drawing wireframe edges, under certain conditions (TODO)
//
// range is different based on depth-buffering mode
// Z-buffering: +-0x200
// W-buffering: +-0xFF
bool DepthTest_Equal_Z(s32 dstz, s32 z, u32 dstattr)
{
s32 diff = dstz - z;
if ((u32)(diff + 0x200) <= 0x400)
return true;
return false;
}
bool DepthTest_Equal_W(s32 dstz, s32 z, u32 dstattr)
{
s32 diff = dstz - z;
if ((u32)(diff + 0xFF) <= 0x1FE)
return true;
return false;
}
bool DepthTest_LessThan(s32 dstz, s32 z, u32 dstattr)
2017-03-01 16:49:44 -07:00
{
if (z < dstz)
return true;
return false;
}
bool DepthTest_LessThan_FrontFacing(s32 dstz, s32 z, u32 dstattr)
{
if ((dstattr & 0x00400010) == 0x00000010) // opaque, back facing
2017-02-13 19:29:02 -07:00
{
if (z <= dstz)
2017-03-12 17:45:26 -06:00
return true;
2017-02-13 19:29:02 -07:00
}
else
{
if (z < dstz)
return true;
}
2017-02-13 19:29:02 -07:00
2017-03-12 17:45:26 -06:00
return false;
}
2017-02-13 19:29:02 -07:00
u32 SoftRenderer::AlphaBlend(const GPU3D& gpu3d, u32 srccolor, u32 dstcolor, u32 alpha) const noexcept
{
u32 dstalpha = dstcolor >> 24;
if (dstalpha == 0)
return srccolor;
u32 srcR = srccolor & 0x3F;
u32 srcG = (srccolor >> 8) & 0x3F;
u32 srcB = (srccolor >> 16) & 0x3F;
if (gpu3d.RenderDispCnt & (1<<3))
{
u32 dstR = dstcolor & 0x3F;
u32 dstG = (dstcolor >> 8) & 0x3F;
u32 dstB = (dstcolor >> 16) & 0x3F;
alpha++;
srcR = ((srcR * alpha) + (dstR * (32-alpha))) >> 5;
srcG = ((srcG * alpha) + (dstG * (32-alpha))) >> 5;
srcB = ((srcB * alpha) + (dstB * (32-alpha))) >> 5;
alpha--;
}
if (alpha > dstalpha)
dstalpha = alpha;
return srcR | (srcG << 8) | (srcB << 16) | (dstalpha << 24);
}
u32 SoftRenderer::RenderPixel(const GPU& gpu, const Polygon* polygon, u8 vr, u8 vg, u8 vb, s16 s, s16 t) const
2017-03-12 17:45:26 -06:00
{
u8 r, g, b, a;
2017-03-01 16:49:44 -07:00
u32 blendmode = (polygon->Attr >> 4) & 0x3;
u32 polyalpha = (polygon->Attr >> 16) & 0x1F;
bool wireframe = (polyalpha == 0);
2017-03-01 16:49:44 -07:00
if (blendmode == 2)
{
if (gpu.GPU3D.RenderDispCnt & (1<<1))
{
// highlight mode: color is calculated normally
// except all vertex color components are set
// to the red component
// the toon color is added to the final color
vg = vr;
vb = vr;
}
else
{
// toon mode: vertex color is replaced by toon color
u16 tooncolor = gpu.GPU3D.RenderToonTable[vr >> 1];
vr = (tooncolor << 1) & 0x3E; if (vr) vr++;
vg = (tooncolor >> 4) & 0x3E; if (vg) vg++;
vb = (tooncolor >> 9) & 0x3E; if (vb) vb++;
}
}
if ((gpu.GPU3D.RenderDispCnt & (1<<0)) && (((polygon->TexParam >> 26) & 0x7) != 0))
{
2017-03-01 16:49:44 -07:00
u8 tr, tg, tb;
u16 tcolor; u8 talpha;
TextureLookup(gpu, polygon->TexParam, polygon->TexPalette, s, t, &tcolor, &talpha);
tr = (tcolor << 1) & 0x3E; if (tr) tr++;
tg = (tcolor >> 4) & 0x3E; if (tg) tg++;
tb = (tcolor >> 9) & 0x3E; if (tb) tb++;
2017-03-01 16:49:44 -07:00
2017-04-21 14:40:15 -06:00
if (blendmode & 0x1)
{
// decal
if (talpha == 0)
{
r = vr;
g = vg;
b = vb;
}
else if (talpha == 31)
{
r = tr;
g = tg;
b = tb;
}
else
{
r = ((tr * talpha) + (vr * (31-talpha))) >> 5;
g = ((tg * talpha) + (vg * (31-talpha))) >> 5;
b = ((tb * talpha) + (vb * (31-talpha))) >> 5;
}
a = polyalpha;
}
else
{
// modulate
r = ((tr+1) * (vr+1) - 1) >> 6;
g = ((tg+1) * (vg+1) - 1) >> 6;
b = ((tb+1) * (vb+1) - 1) >> 6;
a = ((talpha+1) * (polyalpha+1) - 1) >> 5;
}
2017-03-01 16:49:44 -07:00
}
else
{
r = vr;
g = vg;
b = vb;
a = polyalpha;
2017-03-01 16:49:44 -07:00
}
if ((blendmode == 2) && (gpu.GPU3D.RenderDispCnt & (1<<1)))
{
u16 tooncolor = gpu.GPU3D.RenderToonTable[vr >> 1];
vr = (tooncolor << 1) & 0x3E; if (vr) vr++;
vg = (tooncolor >> 4) & 0x3E; if (vg) vg++;
vb = (tooncolor >> 9) & 0x3E; if (vb) vb++;
r += vr;
g += vg;
b += vb;
if (r > 63) r = 63;
if (g > 63) g = 63;
if (b > 63) b = 63;
2017-04-13 11:53:09 -06:00
}
// checkme: can wireframe polygons use texture alpha?
if (wireframe) a = 31;
2017-03-12 17:45:26 -06:00
return r | (g << 8) | (b << 16) | (a << 24);
2017-02-13 19:29:02 -07:00
}
void SoftRenderer::PlotTranslucentPixel(const GPU3D& gpu3d, u32 pixeladdr, u32 color, u32 z, u32 polyattr, u32 shadow)
{
u32 dstattr = AttrBuffer[pixeladdr];
u32 attr = (polyattr & 0xE0F0) | ((polyattr >> 8) & 0xFF0000) | (1<<22) | (dstattr & 0xFF001F0F);
if (shadow)
{
// for shadows, opaque pixels are also checked
if (dstattr & (1<<22))
{
if ((dstattr & 0x007F0000) == (attr & 0x007F0000))
return;
}
else
{
if ((dstattr & 0x3F000000) == (polyattr & 0x3F000000))
return;
}
}
else
{
// skip if translucent polygon IDs are equal
if ((dstattr & 0x007F0000) == (attr & 0x007F0000))
return;
}
// fog flag
if (!(dstattr & (1<<15)))
attr &= ~(1<<15);
color = AlphaBlend(gpu3d, color, ColorBuffer[pixeladdr], color>>24);
if (z != -1)
DepthBuffer[pixeladdr] = z;
ColorBuffer[pixeladdr] = color;
AttrBuffer[pixeladdr] = attr;
}
void SoftRenderer::SetupPolygonLeftEdge(SoftRenderer::RendererPolygon* rp, s32 y) const
2017-05-21 12:14:03 -06:00
{
Polygon* polygon = rp->PolyData;
while (y >= polygon->Vertices[rp->NextVL]->FinalPosition[1] && rp->CurVL != polygon->VBottom)
{
rp->CurVL = rp->NextVL;
if (polygon->FacingView)
{
rp->NextVL = rp->CurVL + 1;
if (rp->NextVL >= polygon->NumVertices)
rp->NextVL = 0;
}
else
{
rp->NextVL = rp->CurVL - 1;
if ((s32)rp->NextVL < 0)
rp->NextVL = polygon->NumVertices - 1;
}
}
rp->XL = rp->SlopeL.Setup(polygon->Vertices[rp->CurVL]->FinalPosition[0], polygon->Vertices[rp->NextVL]->FinalPosition[0],
polygon->Vertices[rp->CurVL]->FinalPosition[1], polygon->Vertices[rp->NextVL]->FinalPosition[1],
polygon->FinalW[rp->CurVL], polygon->FinalW[rp->NextVL], y);
2017-05-21 12:14:03 -06:00
}
void SoftRenderer::SetupPolygonRightEdge(SoftRenderer::RendererPolygon* rp, s32 y) const
2017-05-21 12:14:03 -06:00
{
Polygon* polygon = rp->PolyData;
while (y >= polygon->Vertices[rp->NextVR]->FinalPosition[1] && rp->CurVR != polygon->VBottom)
{
rp->CurVR = rp->NextVR;
if (polygon->FacingView)
{
rp->NextVR = rp->CurVR - 1;
if ((s32)rp->NextVR < 0)
rp->NextVR = polygon->NumVertices - 1;
}
else
{
rp->NextVR = rp->CurVR + 1;
if (rp->NextVR >= polygon->NumVertices)
rp->NextVR = 0;
}
}
rp->XR = rp->SlopeR.Setup(polygon->Vertices[rp->CurVR]->FinalPosition[0], polygon->Vertices[rp->NextVR]->FinalPosition[0],
polygon->Vertices[rp->CurVR]->FinalPosition[1], polygon->Vertices[rp->NextVR]->FinalPosition[1],
polygon->FinalW[rp->CurVR], polygon->FinalW[rp->NextVR], y);
2017-05-21 12:14:03 -06:00
}
void SoftRenderer::SetupPolygon(SoftRenderer::RendererPolygon* rp, Polygon* polygon) const
2017-05-21 12:14:03 -06:00
{
u32 nverts = polygon->NumVertices;
u32 vtop = polygon->VTop, vbot = polygon->VBottom;
s32 ytop = polygon->YTop, ybot = polygon->YBottom;
rp->PolyData = polygon;
rp->CurVL = vtop;
rp->CurVR = vtop;
if (polygon->FacingView)
{
rp->NextVL = rp->CurVL + 1;
if (rp->NextVL >= nverts) rp->NextVL = 0;
rp->NextVR = rp->CurVR - 1;
if ((s32)rp->NextVR < 0) rp->NextVR = nverts - 1;
}
else
{
rp->NextVL = rp->CurVL - 1;
if ((s32)rp->NextVL < 0) rp->NextVL = nverts - 1;
rp->NextVR = rp->CurVR + 1;
if (rp->NextVR >= nverts) rp->NextVR = 0;
}
if (ybot == ytop)
{
vtop = 0; vbot = 0;
int i;
i = 1;
if (polygon->Vertices[i]->FinalPosition[0] < polygon->Vertices[vtop]->FinalPosition[0]) vtop = i;
if (polygon->Vertices[i]->FinalPosition[0] > polygon->Vertices[vbot]->FinalPosition[0]) vbot = i;
i = nverts - 1;
if (polygon->Vertices[i]->FinalPosition[0] < polygon->Vertices[vtop]->FinalPosition[0]) vtop = i;
if (polygon->Vertices[i]->FinalPosition[0] > polygon->Vertices[vbot]->FinalPosition[0]) vbot = i;
rp->CurVL = vtop; rp->NextVL = vtop;
rp->CurVR = vbot; rp->NextVR = vbot;
rp->XL = rp->SlopeL.SetupDummy(polygon->Vertices[rp->CurVL]->FinalPosition[0]);
rp->XR = rp->SlopeR.SetupDummy(polygon->Vertices[rp->CurVR]->FinalPosition[0]);
2017-05-21 12:14:03 -06:00
}
else
{
SetupPolygonLeftEdge(rp, ytop);
SetupPolygonRightEdge(rp, ytop);
}
}
void SoftRenderer::RenderShadowMaskScanline(const GPU3D& gpu3d, RendererPolygon* rp, s32 y)
2017-05-21 12:14:03 -06:00
{
Polygon* polygon = rp->PolyData;
u32 polyattr = (polygon->Attr & 0x3F008000);
if (!polygon->FacingView) polyattr |= (1<<4);
2017-05-21 12:14:03 -06:00
u32 polyalpha = (polygon->Attr >> 16) & 0x1F;
bool wireframe = (polyalpha == 0);
bool (*fnDepthTest)(s32 dstz, s32 z, u32 dstattr);
2017-05-21 12:14:03 -06:00
if (polygon->Attr & (1<<14))
fnDepthTest = polygon->WBuffer ? DepthTest_Equal_W : DepthTest_Equal_Z;
else if (polygon->FacingView)
fnDepthTest = DepthTest_LessThan_FrontFacing;
2017-05-21 12:14:03 -06:00
else
fnDepthTest = DepthTest_LessThan;
2017-05-21 12:14:03 -06:00
if (!PrevIsShadowMask)
2017-05-22 14:29:21 -06:00
memset(&StencilBuffer[256 * (y&0x1)], 0, 256);
PrevIsShadowMask = true;
if (polygon->YTop != polygon->YBottom)
{
if (y >= polygon->Vertices[rp->NextVL]->FinalPosition[1] && rp->CurVL != polygon->VBottom)
{
SetupPolygonLeftEdge(rp, y);
}
if (y >= polygon->Vertices[rp->NextVR]->FinalPosition[1] && rp->CurVR != polygon->VBottom)
{
SetupPolygonRightEdge(rp, y);
}
}
Vertex *vlcur, *vlnext, *vrcur, *vrnext;
s32 xstart, xend;
bool l_filledge, r_filledge;
s32 l_edgelen, r_edgelen;
s32 l_edgecov, r_edgecov;
Interpolator<1>* interp_start;
Interpolator<1>* interp_end;
xstart = rp->XL;
xend = rp->XR;
s32 wl = rp->SlopeL.Interp.Interpolate(polygon->FinalW[rp->CurVL], polygon->FinalW[rp->NextVL]);
s32 wr = rp->SlopeR.Interp.Interpolate(polygon->FinalW[rp->CurVR], polygon->FinalW[rp->NextVR]);
s32 zl = rp->SlopeL.Interp.InterpolateZ(polygon->FinalZ[rp->CurVL], polygon->FinalZ[rp->NextVL], polygon->WBuffer);
s32 zr = rp->SlopeR.Interp.InterpolateZ(polygon->FinalZ[rp->CurVR], polygon->FinalZ[rp->NextVR], polygon->WBuffer);
// right vertical edges are pushed 1px to the left as long as either:
// the left edge slope is not 0, or the span is not 0 pixels wide, and it is not at the leftmost pixel of the screen
if (rp->SlopeR.Increment==0 && (rp->SlopeL.Increment!=0 || xstart != xend) && (xend != 0))
xend--;
// if the left and right edges are swapped, render backwards.
if (xstart > xend)
{
vlcur = polygon->Vertices[rp->CurVR];
vlnext = polygon->Vertices[rp->NextVR];
vrcur = polygon->Vertices[rp->CurVL];
vrnext = polygon->Vertices[rp->NextVL];
interp_start = &rp->SlopeR.Interp;
interp_end = &rp->SlopeL.Interp;
rp->SlopeR.EdgeParams<true>(&l_edgelen, &l_edgecov);
rp->SlopeL.EdgeParams<true>(&r_edgelen, &r_edgecov);
2021-08-20 17:54:09 -06:00
std::swap(xstart, xend);
std::swap(wl, wr);
std::swap(zl, zr);
2023-11-03 17:21:46 -06:00
// CHECKME: edge fill rules for swapped opaque shadow mask polygons
if ((gpu3d.RenderDispCnt & ((1<<4)|(1<<5))) || ((polyalpha < 31) && (gpu3d.RenderDispCnt & (1<<3))) || wireframe)
{
l_filledge = true;
r_filledge = true;
}
else
{
l_filledge = (rp->SlopeR.Negative || !rp->SlopeR.XMajor)
|| (y == polygon->YBottom-1) && rp->SlopeR.XMajor && (vlnext->FinalPosition[0] != vrnext->FinalPosition[0]);
r_filledge = (!rp->SlopeL.Negative && rp->SlopeL.XMajor)
|| (!(rp->SlopeL.Negative && rp->SlopeL.XMajor) && rp->SlopeR.Increment==0)
|| (y == polygon->YBottom-1) && rp->SlopeL.XMajor && (vlnext->FinalPosition[0] != vrnext->FinalPosition[0]);
}
}
else
{
vlcur = polygon->Vertices[rp->CurVL];
vlnext = polygon->Vertices[rp->NextVL];
vrcur = polygon->Vertices[rp->CurVR];
vrnext = polygon->Vertices[rp->NextVR];
interp_start = &rp->SlopeL.Interp;
interp_end = &rp->SlopeR.Interp;
rp->SlopeL.EdgeParams<false>(&l_edgelen, &l_edgecov);
rp->SlopeR.EdgeParams<false>(&r_edgelen, &r_edgecov);
2023-11-03 17:21:46 -06:00
// CHECKME: edge fill rules for unswapped opaque shadow mask polygons
if ((gpu3d.RenderDispCnt & ((1<<4)|(1<<5))) || ((polyalpha < 31) && (gpu3d.RenderDispCnt & (1<<3))) || wireframe)
{
l_filledge = true;
r_filledge = true;
}
else
{
l_filledge = ((rp->SlopeL.Negative || !rp->SlopeL.XMajor)
|| (y == polygon->YBottom-1) && rp->SlopeL.XMajor && (vlnext->FinalPosition[0] != vrnext->FinalPosition[0]))
|| (rp->SlopeL.Increment == rp->SlopeR.Increment) && (xstart+l_edgelen == xend+1);
r_filledge = (!rp->SlopeR.Negative && rp->SlopeR.XMajor) || (rp->SlopeR.Increment==0)
|| (y == polygon->YBottom-1) && rp->SlopeR.XMajor && (vlnext->FinalPosition[0] != vrnext->FinalPosition[0]);
}
}
// color/texcoord attributes aren't needed for shadow masks
// all the pixels are guaranteed to have the same alpha
// even if a texture is used (decal blending is used for shadows)
// similarly, we can perform alpha test early (checkme)
if (wireframe) polyalpha = 31;
if (polyalpha <= gpu3d.RenderAlphaRef) return;
// in wireframe mode, there are special rules for equal Z (TODO)
int yedge = 0;
if (y == polygon->YTop) yedge = 0x4;
else if (y == polygon->YBottom-1) yedge = 0x8;
int edge;
s32 x = xstart;
Interpolator<0> interpX(xstart, xend+1, wl, wr);
if (x < 0) x = 0;
s32 xlimit;
// for shadow masks: set stencil bits where the depth test fails.
// draw nothing.
// part 1: left edge
edge = yedge | 0x1;
xlimit = xstart+l_edgelen;
if (xlimit > xend+1) xlimit = xend+1;
if (xlimit > 256) xlimit = 256;
if (!l_filledge) x = xlimit;
else
for (; x < xlimit; x++)
{
u32 pixeladdr = FirstPixelOffset + (y*ScanlineWidth) + x;
interpX.SetX(x);
s32 z = interpX.InterpolateZ(zl, zr, polygon->WBuffer);
u32 dstattr = AttrBuffer[pixeladdr];
if (!fnDepthTest(DepthBuffer[pixeladdr], z, dstattr))
StencilBuffer[256*(y&0x1) + x] = 1;
if (dstattr & 0xF)
{
pixeladdr += BufferSize;
if (!fnDepthTest(DepthBuffer[pixeladdr], z, AttrBuffer[pixeladdr]))
StencilBuffer[256*(y&0x1) + x] |= 0x2;
}
}
// part 2: polygon inside
edge = yedge;
xlimit = xend-r_edgelen+1;
if (xlimit > xend+1) xlimit = xend+1;
if (xlimit > 256) xlimit = 256;
if (wireframe && !edge) x = std::max(x, xlimit);
else for (; x < xlimit; x++)
{
u32 pixeladdr = FirstPixelOffset + (y*ScanlineWidth) + x;
interpX.SetX(x);
s32 z = interpX.InterpolateZ(zl, zr, polygon->WBuffer);
u32 dstattr = AttrBuffer[pixeladdr];
if (!fnDepthTest(DepthBuffer[pixeladdr], z, dstattr))
StencilBuffer[256*(y&0x1) + x] = 1;
if (dstattr & 0xF)
{
pixeladdr += BufferSize;
if (!fnDepthTest(DepthBuffer[pixeladdr], z, AttrBuffer[pixeladdr]))
StencilBuffer[256*(y&0x1) + x] |= 0x2;
}
}
// part 3: right edge
edge = yedge | 0x2;
xlimit = xend+1;
if (xlimit > 256) xlimit = 256;
2023-11-03 17:21:46 -06:00
if (r_filledge)
for (; x < xlimit; x++)
{
u32 pixeladdr = FirstPixelOffset + (y*ScanlineWidth) + x;
interpX.SetX(x);
s32 z = interpX.InterpolateZ(zl, zr, polygon->WBuffer);
u32 dstattr = AttrBuffer[pixeladdr];
if (!fnDepthTest(DepthBuffer[pixeladdr], z, dstattr))
StencilBuffer[256*(y&0x1) + x] = 1;
if (dstattr & 0xF)
{
pixeladdr += BufferSize;
if (!fnDepthTest(DepthBuffer[pixeladdr], z, AttrBuffer[pixeladdr]))
StencilBuffer[256*(y&0x1) + x] |= 0x2;
}
}
rp->XL = rp->SlopeL.Step();
rp->XR = rp->SlopeR.Step();
}
void SoftRenderer::RenderPolygonScanline(const GPU& gpu, RendererPolygon* rp, s32 y)
{
Polygon* polygon = rp->PolyData;
u32 polyattr = (polygon->Attr & 0x3F008000);
if (!polygon->FacingView) polyattr |= (1<<4);
u32 polyalpha = (polygon->Attr >> 16) & 0x1F;
bool wireframe = (polyalpha == 0);
bool (*fnDepthTest)(s32 dstz, s32 z, u32 dstattr);
if (polygon->Attr & (1<<14))
fnDepthTest = polygon->WBuffer ? DepthTest_Equal_W : DepthTest_Equal_Z;
else if (polygon->FacingView)
fnDepthTest = DepthTest_LessThan_FrontFacing;
else
fnDepthTest = DepthTest_LessThan;
PrevIsShadowMask = false;
2017-05-25 20:00:15 -06:00
2017-05-21 12:14:03 -06:00
if (polygon->YTop != polygon->YBottom)
{
if (y >= polygon->Vertices[rp->NextVL]->FinalPosition[1] && rp->CurVL != polygon->VBottom)
{
SetupPolygonLeftEdge(rp, y);
}
if (y >= polygon->Vertices[rp->NextVR]->FinalPosition[1] && rp->CurVR != polygon->VBottom)
{
SetupPolygonRightEdge(rp, y);
}
}
Vertex *vlcur, *vlnext, *vrcur, *vrnext;
s32 xstart, xend;
bool l_filledge, r_filledge;
s32 l_edgelen, r_edgelen;
s32 l_edgecov, r_edgecov;
Interpolator<1>* interp_start;
Interpolator<1>* interp_end;
2017-05-21 12:14:03 -06:00
xstart = rp->XL;
xend = rp->XR;
s32 wl = rp->SlopeL.Interp.Interpolate(polygon->FinalW[rp->CurVL], polygon->FinalW[rp->NextVL]);
s32 wr = rp->SlopeR.Interp.Interpolate(polygon->FinalW[rp->CurVR], polygon->FinalW[rp->NextVR]);
s32 zl = rp->SlopeL.Interp.InterpolateZ(polygon->FinalZ[rp->CurVL], polygon->FinalZ[rp->NextVL], polygon->WBuffer);
s32 zr = rp->SlopeR.Interp.InterpolateZ(polygon->FinalZ[rp->CurVR], polygon->FinalZ[rp->NextVR], polygon->WBuffer);
2023-11-03 17:21:46 -06:00
// right vertical edges are pushed 1px to the left as long as either:
// the left edge slope is not 0, or the span is not 0 pixels wide, and it is not at the leftmost pixel of the screen
if (rp->SlopeR.Increment==0 && (rp->SlopeL.Increment!=0 || xstart != xend) && (xend != 0))
xend--;
2017-05-21 12:14:03 -06:00
// if the left and right edges are swapped, render backwards.
// on hardware, swapped edges seem to break edge length calculation,
// causing X-major edges to be rendered wrong when filled,
// and resulting in buggy looking anti-aliasing on X-major edges
2017-05-21 12:14:03 -06:00
if (xstart > xend)
{
vlcur = polygon->Vertices[rp->CurVR];
vlnext = polygon->Vertices[rp->NextVR];
vrcur = polygon->Vertices[rp->CurVL];
vrnext = polygon->Vertices[rp->NextVL];
interp_start = &rp->SlopeR.Interp;
interp_end = &rp->SlopeL.Interp;
rp->SlopeR.EdgeParams<true>(&l_edgelen, &l_edgecov);
rp->SlopeL.EdgeParams<true>(&r_edgelen, &r_edgecov);
2017-05-21 12:14:03 -06:00
2021-08-20 17:54:09 -06:00
std::swap(xstart, xend);
std::swap(wl, wr);
std::swap(zl, zr);
// edge fill rules for swapped opaque edges:
// * right edge is filled if slope > 1, or if the left edge = 0, but is never filled if it is < -1
// * left edge is filled if slope <= 1
// * the bottom-most pixel of negative x-major slopes are filled if they are next to a flat bottom edge
// edges are always filled if antialiasing/edgemarking are enabled,
// if the pixels are translucent and alpha blending is enabled, or if the polygon is wireframe
// checkme: do swapped line polygons exist?
if ((gpu.GPU3D.RenderDispCnt & ((1<<4)|(1<<5))) || ((polyalpha < 31) && (gpu.GPU3D.RenderDispCnt & (1<<3))) || wireframe)
{
l_filledge = true;
r_filledge = true;
}
else
{
l_filledge = (rp->SlopeR.Negative || !rp->SlopeR.XMajor)
|| (y == polygon->YBottom-1) && rp->SlopeR.XMajor && (vlnext->FinalPosition[0] != vrnext->FinalPosition[0]);
r_filledge = (!rp->SlopeL.Negative && rp->SlopeL.XMajor)
|| (!(rp->SlopeL.Negative && rp->SlopeL.XMajor) && rp->SlopeR.Increment==0)
|| (y == polygon->YBottom-1) && rp->SlopeL.XMajor && (vlnext->FinalPosition[0] != vrnext->FinalPosition[0]);
}
2017-05-21 12:14:03 -06:00
}
else
{
vlcur = polygon->Vertices[rp->CurVL];
vlnext = polygon->Vertices[rp->NextVL];
vrcur = polygon->Vertices[rp->CurVR];
vrnext = polygon->Vertices[rp->NextVR];
interp_start = &rp->SlopeL.Interp;
interp_end = &rp->SlopeR.Interp;
rp->SlopeL.EdgeParams<false>(&l_edgelen, &l_edgecov);
rp->SlopeR.EdgeParams<false>(&r_edgelen, &r_edgecov);
// edge fill rules for unswapped opaque edges:
// * right edge is filled if slope > 1
// * left edge is filled if slope <= 1
// * edges with slope = 0 are always filled
// * the bottom-most pixel of negative x-major slopes are filled if they are next to a flat bottom edge
// * edges are filled if both sides are identical and fully overlapping
// edges are always filled if antialiasing/edgemarking are enabled,
// if the pixels are translucent and alpha blending is enabled, or if the polygon is wireframe
if ((gpu.GPU3D.RenderDispCnt & ((1<<4)|(1<<5))) || ((polyalpha < 31) && (gpu.GPU3D.RenderDispCnt & (1<<3))) || wireframe)
{
l_filledge = true;
r_filledge = true;
}
else
{
l_filledge = ((rp->SlopeL.Negative || !rp->SlopeL.XMajor)
|| (y == polygon->YBottom-1) && rp->SlopeL.XMajor && (vlnext->FinalPosition[0] != vrnext->FinalPosition[0]))
|| (rp->SlopeL.Increment == rp->SlopeR.Increment) && (xstart+l_edgelen == xend+1);
r_filledge = (!rp->SlopeR.Negative && rp->SlopeR.XMajor) || (rp->SlopeR.Increment==0)
|| (y == polygon->YBottom-1) && rp->SlopeR.XMajor && (vlnext->FinalPosition[0] != vrnext->FinalPosition[0]);
}
2017-05-21 12:14:03 -06:00
}
// interpolate attributes along Y
s32 rl = interp_start->Interpolate(vlcur->FinalColor[0], vlnext->FinalColor[0]);
s32 gl = interp_start->Interpolate(vlcur->FinalColor[1], vlnext->FinalColor[1]);
s32 bl = interp_start->Interpolate(vlcur->FinalColor[2], vlnext->FinalColor[2]);
2017-05-21 12:14:03 -06:00
s32 sl = interp_start->Interpolate(vlcur->TexCoords[0], vlnext->TexCoords[0]);
s32 tl = interp_start->Interpolate(vlcur->TexCoords[1], vlnext->TexCoords[1]);
2017-05-21 12:14:03 -06:00
s32 rr = interp_end->Interpolate(vrcur->FinalColor[0], vrnext->FinalColor[0]);
s32 gr = interp_end->Interpolate(vrcur->FinalColor[1], vrnext->FinalColor[1]);
s32 br = interp_end->Interpolate(vrcur->FinalColor[2], vrnext->FinalColor[2]);
2017-05-21 12:14:03 -06:00
s32 sr = interp_end->Interpolate(vrcur->TexCoords[0], vrnext->TexCoords[0]);
s32 tr = interp_end->Interpolate(vrcur->TexCoords[1], vrnext->TexCoords[1]);
2017-05-21 12:14:03 -06:00
// in wireframe mode, there are special rules for equal Z (TODO)
int yedge = 0;
if (y == polygon->YTop) yedge = 0x4;
else if (y == polygon->YBottom-1) yedge = 0x8;
int edge;
2017-05-21 12:14:03 -06:00
s32 x = xstart;
Interpolator<0> interpX(xstart, xend+1, wl, wr);
2017-06-01 06:59:41 -06:00
if (x < 0) x = 0;
s32 xlimit;
2017-06-01 06:59:41 -06:00
s32 xcov = 0;
// part 1: left edge
edge = yedge | 0x1;
xlimit = xstart+l_edgelen;
if (xlimit > xend+1) xlimit = xend+1;
if (xlimit > 256) xlimit = 256;
if (l_edgecov & (1<<31))
{
xcov = (l_edgecov >> 12) & 0x3FF;
if (xcov == 0x3FF) xcov = 0;
}
if (!l_filledge) x = xlimit;
else
for (; x < xlimit; x++)
2017-05-21 12:14:03 -06:00
{
u32 pixeladdr = FirstPixelOffset + (y*ScanlineWidth) + x;
u32 dstattr = AttrBuffer[pixeladdr];
2017-05-21 12:14:03 -06:00
// check stencil buffer for shadows
if (polygon->IsShadow)
2017-05-21 12:14:03 -06:00
{
u8 stencil = StencilBuffer[256*(y&0x1) + x];
if (!stencil)
continue;
if (!(stencil & 0x1))
pixeladdr += BufferSize;
if (!(stencil & 0x2))
dstattr &= ~0xF; // quick way to prevent drawing the shadow under antialiased edges
}
interpX.SetX(x);
s32 z = interpX.InterpolateZ(zl, zr, polygon->WBuffer);
// if depth test against the topmost pixel fails, test
// against the pixel underneath
if (!fnDepthTest(DepthBuffer[pixeladdr], z, dstattr))
{
if (!(dstattr & 0xF) || pixeladdr >= BufferSize) continue;
pixeladdr += BufferSize;
dstattr = AttrBuffer[pixeladdr];
if (!fnDepthTest(DepthBuffer[pixeladdr], z, dstattr))
continue;
}
u32 vr = interpX.Interpolate(rl, rr);
u32 vg = interpX.Interpolate(gl, gr);
u32 vb = interpX.Interpolate(bl, br);
s16 s = interpX.Interpolate(sl, sr);
s16 t = interpX.Interpolate(tl, tr);
u32 color = RenderPixel(gpu, polygon, vr>>3, vg>>3, vb>>3, s, t);
u8 alpha = color >> 24;
// alpha test
if (alpha <= gpu.GPU3D.RenderAlphaRef) continue;
if (alpha == 31)
{
u32 attr = polyattr | edge;
if (gpu.GPU3D.RenderDispCnt & (1<<4))
{
// anti-aliasing: all edges are rendered
2017-06-03 14:33:14 -06:00
// calculate coverage
s32 cov = l_edgecov;
if (cov & (1<<31))
{
cov = xcov >> 5;
if (cov > 31) cov = 31;
xcov += (l_edgecov & 0x3FF);
}
2017-06-03 14:33:14 -06:00
attr |= (cov << 8);
// push old pixel down if needed
if (pixeladdr < BufferSize)
{
ColorBuffer[pixeladdr+BufferSize] = ColorBuffer[pixeladdr];
DepthBuffer[pixeladdr+BufferSize] = DepthBuffer[pixeladdr];
AttrBuffer[pixeladdr+BufferSize] = AttrBuffer[pixeladdr];
}
}
DepthBuffer[pixeladdr] = z;
ColorBuffer[pixeladdr] = color;
AttrBuffer[pixeladdr] = attr;
}
else
{
if (!(polygon->Attr & (1<<11))) z = -1;
PlotTranslucentPixel(gpu.GPU3D, pixeladdr, color, z, polyattr, polygon->IsShadow);
// blend with bottom pixel too, if needed
if ((dstattr & 0xF) && (pixeladdr < BufferSize))
PlotTranslucentPixel(gpu.GPU3D, pixeladdr+BufferSize, color, z, polyattr, polygon->IsShadow);
}
}
// part 2: polygon inside
edge = yedge;
xlimit = xend-r_edgelen+1;
if (xlimit > xend+1) xlimit = xend+1;
if (xlimit > 256) xlimit = 256;
if (wireframe && !edge) x = std::max(x, xlimit);
else
for (; x < xlimit; x++)
{
u32 pixeladdr = FirstPixelOffset + (y*ScanlineWidth) + x;
u32 dstattr = AttrBuffer[pixeladdr];
// check stencil buffer for shadows
if (polygon->IsShadow)
{
u8 stencil = StencilBuffer[256*(y&0x1) + x];
if (!stencil)
continue;
if (!(stencil & 0x1))
pixeladdr += BufferSize;
if (!(stencil & 0x2))
dstattr &= ~0xF; // quick way to prevent drawing the shadow under antialiased edges
}
interpX.SetX(x);
s32 z = interpX.InterpolateZ(zl, zr, polygon->WBuffer);
// if depth test against the topmost pixel fails, test
// against the pixel underneath
if (!fnDepthTest(DepthBuffer[pixeladdr], z, dstattr))
{
if (!(dstattr & 0xF) || pixeladdr >= BufferSize) continue;
pixeladdr += BufferSize;
dstattr = AttrBuffer[pixeladdr];
if (!fnDepthTest(DepthBuffer[pixeladdr], z, dstattr))
continue;
}
u32 vr = interpX.Interpolate(rl, rr);
u32 vg = interpX.Interpolate(gl, gr);
u32 vb = interpX.Interpolate(bl, br);
s16 s = interpX.Interpolate(sl, sr);
s16 t = interpX.Interpolate(tl, tr);
u32 color = RenderPixel(gpu, polygon, vr>>3, vg>>3, vb>>3, s, t);
u8 alpha = color >> 24;
// alpha test
if (alpha <= gpu.GPU3D.RenderAlphaRef) continue;
if (alpha == 31)
{
u32 attr = polyattr | edge;
2023-11-03 17:21:46 -06:00
if ((gpu.GPU3D.RenderDispCnt & (1<<4)) && (attr & 0xF))
{
// anti-aliasing: all edges are rendered
// set coverage to avoid black lines from anti-aliasing
attr |= (0x1F << 8);
// push old pixel down if needed
if (pixeladdr < BufferSize)
{
ColorBuffer[pixeladdr+BufferSize] = ColorBuffer[pixeladdr];
DepthBuffer[pixeladdr+BufferSize] = DepthBuffer[pixeladdr];
AttrBuffer[pixeladdr+BufferSize] = AttrBuffer[pixeladdr];
}
}
DepthBuffer[pixeladdr] = z;
ColorBuffer[pixeladdr] = color;
AttrBuffer[pixeladdr] = attr;
}
else
{
if (!(polygon->Attr & (1<<11))) z = -1;
PlotTranslucentPixel(gpu.GPU3D, pixeladdr, color, z, polyattr, polygon->IsShadow);
// blend with bottom pixel too, if needed
if ((dstattr & 0xF) && (pixeladdr < BufferSize))
PlotTranslucentPixel(gpu.GPU3D, pixeladdr+BufferSize, color, z, polyattr, polygon->IsShadow);
}
}
// part 3: right edge
edge = yedge | 0x2;
xlimit = xend+1;
if (xlimit > 256) xlimit = 256;
if (r_edgecov & (1<<31))
{
xcov = (r_edgecov >> 12) & 0x3FF;
if (xcov == 0x3FF) xcov = 0;
}
if (r_filledge)
for (; x < xlimit; x++)
{
u32 pixeladdr = FirstPixelOffset + (y*ScanlineWidth) + x;
u32 dstattr = AttrBuffer[pixeladdr];
2017-05-21 12:14:03 -06:00
// check stencil buffer for shadows
if (polygon->IsShadow)
{
u8 stencil = StencilBuffer[256*(y&0x1) + x];
if (!stencil)
2017-05-21 12:14:03 -06:00
continue;
if (!(stencil & 0x1))
pixeladdr += BufferSize;
if (!(stencil & 0x2))
dstattr &= ~0xF; // quick way to prevent drawing the shadow under antialiased edges
2017-05-21 12:14:03 -06:00
}
interpX.SetX(x);
s32 z = interpX.InterpolateZ(zl, zr, polygon->WBuffer);
// if depth test against the topmost pixel fails, test
// against the pixel underneath
if (!fnDepthTest(DepthBuffer[pixeladdr], z, dstattr))
{
if (!(dstattr & 0xF) || pixeladdr >= BufferSize) continue;
pixeladdr += BufferSize;
dstattr = AttrBuffer[pixeladdr];
if (!fnDepthTest(DepthBuffer[pixeladdr], z, dstattr))
continue;
}
2017-05-21 12:14:03 -06:00
u32 vr = interpX.Interpolate(rl, rr);
u32 vg = interpX.Interpolate(gl, gr);
u32 vb = interpX.Interpolate(bl, br);
s16 s = interpX.Interpolate(sl, sr);
s16 t = interpX.Interpolate(tl, tr);
u32 color = RenderPixel(gpu, polygon, vr>>3, vg>>3, vb>>3, s, t);
2017-05-21 12:14:03 -06:00
u8 alpha = color >> 24;
// alpha test
if (alpha <= gpu.GPU3D.RenderAlphaRef) continue;
2017-05-21 12:14:03 -06:00
if (alpha == 31)
{
u32 attr = polyattr | edge;
if (gpu.GPU3D.RenderDispCnt & (1<<4))
2017-05-21 12:14:03 -06:00
{
// anti-aliasing: all edges are rendered
2017-05-21 12:14:03 -06:00
2017-06-03 14:33:14 -06:00
// calculate coverage
s32 cov = r_edgecov;
if (cov & (1<<31))
{
cov = 0x1F - (xcov >> 5);
if (cov < 0) cov = 0;
xcov += (r_edgecov & 0x3FF);
}
2017-06-03 14:33:14 -06:00
attr |= (cov << 8);
// push old pixel down if needed
if (pixeladdr < BufferSize)
{
2017-06-03 14:33:14 -06:00
ColorBuffer[pixeladdr+BufferSize] = ColorBuffer[pixeladdr];
DepthBuffer[pixeladdr+BufferSize] = DepthBuffer[pixeladdr];
AttrBuffer[pixeladdr+BufferSize] = AttrBuffer[pixeladdr];
}
}
2017-05-21 12:14:03 -06:00
DepthBuffer[pixeladdr] = z;
ColorBuffer[pixeladdr] = color;
AttrBuffer[pixeladdr] = attr;
2017-05-21 12:14:03 -06:00
}
else
{
if (!(polygon->Attr & (1<<11))) z = -1;
PlotTranslucentPixel(gpu.GPU3D, pixeladdr, color, z, polyattr, polygon->IsShadow);
2017-05-21 12:14:03 -06:00
// blend with bottom pixel too, if needed
if ((dstattr & 0xF) && (pixeladdr < BufferSize))
PlotTranslucentPixel(gpu.GPU3D, pixeladdr+BufferSize, color, z, polyattr, polygon->IsShadow);
2017-05-21 12:14:03 -06:00
}
}
rp->XL = rp->SlopeL.Step();
rp->XR = rp->SlopeR.Step();
}
void SoftRenderer::RenderScanline(const GPU& gpu, s32 y, int npolys)
2017-05-22 14:22:26 -06:00
{
for (int i = 0; i < npolys; i++)
{
RendererPolygon* rp = &PolygonList[i];
Polygon* polygon = rp->PolyData;
if (y >= polygon->YTop && (y < polygon->YBottom || (y == polygon->YTop && polygon->YBottom == polygon->YTop)))
{
if (polygon->IsShadowMask)
RenderShadowMaskScanline(gpu.GPU3D, rp, y);
else
RenderPolygonScanline(gpu, rp, y);
}
2017-05-22 14:22:26 -06:00
}
}
2017-05-26 07:14:22 -06:00
u32 SoftRenderer::CalculateFogDensity(const GPU3D& gpu3d, u32 pixeladdr) const
{
u32 z = DepthBuffer[pixeladdr];
u32 densityid, densityfrac;
if (z < gpu3d.RenderFogOffset)
{
densityid = 0;
densityfrac = 0;
}
else
{
// technically: Z difference is shifted right by two, then shifted left by fog shift
// then bit 0-16 are the fractional part and bit 17-31 are the density index
// on hardware, the final value can overflow the 32-bit range with a shift big enough,
// causing fog to 'wrap around' and accidentally apply to larger Z ranges
z -= gpu3d.RenderFogOffset;
z = (z >> 2) << gpu3d.RenderFogShift;
densityid = z >> 17;
if (densityid >= 32)
{
densityid = 32;
densityfrac = 0;
}
else
densityfrac = z & 0x1FFFF;
}
// checkme (may be too precise?)
u32 density =
((gpu3d.RenderFogDensityTable[densityid] * (0x20000-densityfrac)) +
(gpu3d.RenderFogDensityTable[densityid+1] * densityfrac)) >> 17;
if (density >= 127) density = 128;
return density;
}
void SoftRenderer::ScanlineFinalPass(const GPU3D& gpu3d, s32 y)
{
2017-06-03 14:33:14 -06:00
// to consider:
// clearing all polygon fog flags if the master flag isn't set?
// merging all final pass loops into one?
if (gpu3d.RenderDispCnt & (1<<5))
2017-06-03 14:33:14 -06:00
{
// edge marking
// only applied to topmost pixels
2017-06-03 14:33:14 -06:00
for (int x = 0; x < 256; x++)
{
u32 pixeladdr = FirstPixelOffset + (y*ScanlineWidth) + x;
u32 attr = AttrBuffer[pixeladdr];
if (!(attr & 0xF)) continue;
u32 polyid = attr >> 24; // opaque polygon IDs are used for edgemarking
2017-06-03 14:33:14 -06:00
u32 z = DepthBuffer[pixeladdr];
if (((polyid != (AttrBuffer[pixeladdr-1] >> 24)) && (z < DepthBuffer[pixeladdr-1])) ||
((polyid != (AttrBuffer[pixeladdr+1] >> 24)) && (z < DepthBuffer[pixeladdr+1])) ||
((polyid != (AttrBuffer[pixeladdr-ScanlineWidth] >> 24)) && (z < DepthBuffer[pixeladdr-ScanlineWidth])) ||
((polyid != (AttrBuffer[pixeladdr+ScanlineWidth] >> 24)) && (z < DepthBuffer[pixeladdr+ScanlineWidth])))
{
u16 edgecolor = gpu3d.RenderEdgeTable[polyid >> 3];
2017-06-03 14:33:14 -06:00
u32 edgeR = (edgecolor << 1) & 0x3E; if (edgeR) edgeR++;
u32 edgeG = (edgecolor >> 4) & 0x3E; if (edgeG) edgeG++;
u32 edgeB = (edgecolor >> 9) & 0x3E; if (edgeB) edgeB++;
ColorBuffer[pixeladdr] = edgeR | (edgeG << 8) | (edgeB << 16) | (ColorBuffer[pixeladdr] & 0xFF000000);
// break antialiasing coverage (checkme)
AttrBuffer[pixeladdr] = (AttrBuffer[pixeladdr] & 0xFFFFE0FF) | 0x00001000;
}
}
}
if (gpu3d.RenderDispCnt & (1<<7))
2017-05-26 07:14:22 -06:00
{
// fog
// hardware testing shows that the fog step is 0x80000>>SHIFT
// basically, the depth values used in GBAtek need to be
// multiplied by 0x200 to match Z-buffer values
// fog is applied to the topmost two pixels, which is required for
// proper antialiasing
// TODO: check the 'fog alpha glitch with small Z' GBAtek talks about
bool fogcolor = !(gpu3d.RenderDispCnt & (1<<6));
2017-05-26 07:14:22 -06:00
u32 fogR = (gpu3d.RenderFogColor << 1) & 0x3E; if (fogR) fogR++;
u32 fogG = (gpu3d.RenderFogColor >> 4) & 0x3E; if (fogG) fogG++;
u32 fogB = (gpu3d.RenderFogColor >> 9) & 0x3E; if (fogB) fogB++;
u32 fogA = (gpu3d.RenderFogColor >> 16) & 0x1F;
2017-05-26 07:14:22 -06:00
for (int x = 0; x < 256; x++)
2017-05-26 07:14:22 -06:00
{
u32 pixeladdr = FirstPixelOffset + (y*ScanlineWidth) + x;
u32 density, srccolor, srcR, srcG, srcB, srcA;
u32 attr = AttrBuffer[pixeladdr];
if (attr & (1<<15))
{
density = CalculateFogDensity(gpu3d, pixeladdr);
srccolor = ColorBuffer[pixeladdr];
srcR = srccolor & 0x3F;
srcG = (srccolor >> 8) & 0x3F;
srcB = (srccolor >> 16) & 0x3F;
srcA = (srccolor >> 24) & 0x1F;
if (fogcolor)
{
srcR = ((fogR * density) + (srcR * (128-density))) >> 7;
srcG = ((fogG * density) + (srcG * (128-density))) >> 7;
srcB = ((fogB * density) + (srcB * (128-density))) >> 7;
}
2017-05-26 07:14:22 -06:00
srcA = ((fogA * density) + (srcA * (128-density))) >> 7;
2017-05-26 07:14:22 -06:00
ColorBuffer[pixeladdr] = srcR | (srcG << 8) | (srcB << 16) | (srcA << 24);
}
// fog for lower pixel
// TODO: make this code nicer, but avoid using a loop
2017-05-26 07:14:22 -06:00
if (!(attr & 0xF)) continue;
pixeladdr += BufferSize;
2017-05-26 07:14:22 -06:00
attr = AttrBuffer[pixeladdr];
if (!(attr & (1<<15))) continue;
density = CalculateFogDensity(gpu3d, pixeladdr);
2017-05-26 07:14:22 -06:00
srccolor = ColorBuffer[pixeladdr];
srcR = srccolor & 0x3F;
srcG = (srccolor >> 8) & 0x3F;
srcB = (srccolor >> 16) & 0x3F;
srcA = (srccolor >> 24) & 0x1F;
if (fogcolor)
{
srcR = ((fogR * density) + (srcR * (128-density))) >> 7;
srcG = ((fogG * density) + (srcG * (128-density))) >> 7;
srcB = ((fogB * density) + (srcB * (128-density))) >> 7;
}
srcA = ((fogA * density) + (srcA * (128-density))) >> 7;
ColorBuffer[pixeladdr] = srcR | (srcG << 8) | (srcB << 16) | (srcA << 24);
2017-05-26 07:14:22 -06:00
}
}
if (gpu3d.RenderDispCnt & (1<<4))
{
// anti-aliasing
2017-10-01 16:55:44 -06:00
// edges were flagged and their coverages calculated during rendering
// this is where such edge pixels are blended with the pixels underneath
for (int x = 0; x < 256; x++)
{
u32 pixeladdr = FirstPixelOffset + (y*ScanlineWidth) + x;
u32 attr = AttrBuffer[pixeladdr];
if (!(attr & 0xF)) continue;
u32 coverage = (attr >> 8) & 0x1F;
if (coverage == 0x1F) continue;
if (coverage == 0)
{
ColorBuffer[pixeladdr] = ColorBuffer[pixeladdr+BufferSize];
continue;
}
u32 topcolor = ColorBuffer[pixeladdr];
u32 topR = topcolor & 0x3F;
u32 topG = (topcolor >> 8) & 0x3F;
u32 topB = (topcolor >> 16) & 0x3F;
u32 topA = (topcolor >> 24) & 0x1F;
u32 botcolor = ColorBuffer[pixeladdr+BufferSize];
u32 botR = botcolor & 0x3F;
u32 botG = (botcolor >> 8) & 0x3F;
u32 botB = (botcolor >> 16) & 0x3F;
u32 botA = (botcolor >> 24) & 0x1F;
coverage++;
// only blend color if the bottom pixel isn't fully transparent
if (botA > 0)
{
topR = ((topR * coverage) + (botR * (32-coverage))) >> 5;
topG = ((topG * coverage) + (botG * (32-coverage))) >> 5;
topB = ((topB * coverage) + (botB * (32-coverage))) >> 5;
}
// alpha is always blended
topA = ((topA * coverage) + (botA * (32-coverage))) >> 5;
ColorBuffer[pixeladdr] = topR | (topG << 8) | (topB << 16) | (topA << 24);
}
}
2017-05-22 14:22:26 -06:00
}
void SoftRenderer::ClearBuffers(const GPU& gpu)
2017-02-10 08:50:26 -07:00
{
u32 clearz = ((gpu.GPU3D.RenderClearAttr2 & 0x7FFF) * 0x200) + 0x1FF;
u32 polyid = gpu.GPU3D.RenderClearAttr1 & 0x3F000000; // this sets the opaque polygonID
// fill screen borders for edge marking
for (int x = 0; x < ScanlineWidth; x++)
{
ColorBuffer[x] = 0;
DepthBuffer[x] = clearz;
AttrBuffer[x] = polyid;
}
for (int x = ScanlineWidth; x < ScanlineWidth*193; x+=ScanlineWidth)
{
ColorBuffer[x] = 0;
DepthBuffer[x] = clearz;
AttrBuffer[x] = polyid;
ColorBuffer[x+257] = 0;
DepthBuffer[x+257] = clearz;
AttrBuffer[x+257] = polyid;
}
for (int x = ScanlineWidth*193; x < ScanlineWidth*194; x++)
{
ColorBuffer[x] = 0;
DepthBuffer[x] = clearz;
AttrBuffer[x] = polyid;
}
// clear the screen
if (gpu.GPU3D.RenderDispCnt & (1<<14))
{
u8 xoff = (gpu.GPU3D.RenderClearAttr2 >> 16) & 0xFF;
u8 yoff = (gpu.GPU3D.RenderClearAttr2 >> 24) & 0xFF;
for (int y = 0; y < ScanlineWidth*192; y+=ScanlineWidth)
{
for (int x = 0; x < 256; x++)
{
u16 val2 = gpu.ReadVRAMFlat_Texture<u16>(0x40000 + (yoff << 9) + (xoff << 1));
u16 val3 = gpu.ReadVRAMFlat_Texture<u16>(0x60000 + (yoff << 9) + (xoff << 1));
// TODO: confirm color conversion
u32 r = (val2 << 1) & 0x3E; if (r) r++;
u32 g = (val2 >> 4) & 0x3E; if (g) g++;
u32 b = (val2 >> 9) & 0x3E; if (b) b++;
u32 a = (val2 & 0x8000) ? 0x1F000000 : 0;
u32 color = r | (g << 8) | (b << 16) | a;
u32 z = ((val3 & 0x7FFF) * 0x200) + 0x1FF;
u32 pixeladdr = FirstPixelOffset + y + x;
ColorBuffer[pixeladdr] = color;
DepthBuffer[pixeladdr] = z;
AttrBuffer[pixeladdr] = polyid | (val3 & 0x8000);
xoff++;
}
yoff++;
}
}
else
{
// TODO: confirm color conversion
u32 r = (gpu.GPU3D.RenderClearAttr1 << 1) & 0x3E; if (r) r++;
u32 g = (gpu.GPU3D.RenderClearAttr1 >> 4) & 0x3E; if (g) g++;
u32 b = (gpu.GPU3D.RenderClearAttr1 >> 9) & 0x3E; if (b) b++;
u32 a = (gpu.GPU3D.RenderClearAttr1 >> 16) & 0x1F;
u32 color = r | (g << 8) | (b << 16) | (a << 24);
polyid |= (gpu.GPU3D.RenderClearAttr1 & 0x8000);
2017-03-12 17:45:26 -06:00
for (int y = 0; y < ScanlineWidth*192; y+=ScanlineWidth)
{
for (int x = 0; x < 256; x++)
{
u32 pixeladdr = FirstPixelOffset + y + x;
ColorBuffer[pixeladdr] = color;
DepthBuffer[pixeladdr] = clearz;
AttrBuffer[pixeladdr] = polyid;
}
}
}
}
2017-05-25 17:22:11 -06:00
void SoftRenderer::RenderPolygons(const GPU& gpu, bool threaded, Polygon** polygons, int npolys)
{
2017-05-22 14:22:26 -06:00
int j = 0;
2017-05-21 12:14:03 -06:00
for (int i = 0; i < npolys; i++)
{
if (polygons[i]->Degenerate) continue;
SetupPolygon(&PolygonList[j++], polygons[i]);
2017-05-22 14:22:26 -06:00
}
RenderScanline(gpu, 0, j);
for (s32 y = 1; y < 192; y++)
2017-03-15 08:53:36 -06:00
{
RenderScanline(gpu, y, j);
ScanlineFinalPass(gpu.GPU3D, y-1);
if (threaded)
Protect savestates while the threaded software renderer is running (#1864) * First crack at ensuring the render thread doesn't touch GPU state while it's being serialized * Get rid of the semaphore wait * Add some extra fields into GPU3D's serialization * Oops, TempVertexBuffer is already serialized * Move vertex serialization into its own method * Lock the GPU3D state when rendering on the render thread or serializing it * Revert "Lock the GPU3D state when rendering on the render thread or serializing it" This reverts commit 2f49a551c13934b9dc815bbda67a45098f0482a7. * Add comments that describe the synchronization within GPU3D_Soft - I need to understand it before I can solve my actual problem - Now I do * Revert "Revert "Lock the GPU3D state when rendering on the render thread or serializing it"" This reverts commit 1977566a6d8671d72bd94ba4ebf832c3bf08933a. * Let's try locking the GPU3D state throughout NDS::RunFrame - Just to see what happens * Slim down the lock's scope * Narrow the lock's scope some more * Remove the lock entirely * Try protecting the GPU3D state with just a mutex - I'll clean this up once I know it works * Remove a duplicate method definition * Add a missing `noexcept` specifier * Remove an unused function * Cut some non-hardware state from `GPU3D`'s savestate * Assume that the next frame after loading a savestate won't be identical * Actually, it _is_ worth it * Don't serialize the clip matrix - It's recalculated anyway * Serialize `RenderPolygonRAM` as an array of indexes * Clean up some comments - I liked the dialogue style, but oh well * Try restarting the render thread instead of using the lock - Let's see what happens * Put the lock back * Fix some polygon and vertex indexes being saved incorrectly - Taking the difference between two pointers results in the number of elements, not the number of bytes * Remove `SoftRenderer::StateBusy` since it turns out we don't need it - The real synchronization was the friends we made along the way
2024-01-07 15:39:43 -07:00
// Notify the main thread that we're done with a scanline.
2017-05-25 17:22:11 -06:00
Platform::Semaphore_Post(Sema_ScanlineCount);
2017-03-15 08:53:36 -06:00
}
ScanlineFinalPass(gpu.GPU3D, 191);
if (threaded)
Protect savestates while the threaded software renderer is running (#1864) * First crack at ensuring the render thread doesn't touch GPU state while it's being serialized * Get rid of the semaphore wait * Add some extra fields into GPU3D's serialization * Oops, TempVertexBuffer is already serialized * Move vertex serialization into its own method * Lock the GPU3D state when rendering on the render thread or serializing it * Revert "Lock the GPU3D state when rendering on the render thread or serializing it" This reverts commit 2f49a551c13934b9dc815bbda67a45098f0482a7. * Add comments that describe the synchronization within GPU3D_Soft - I need to understand it before I can solve my actual problem - Now I do * Revert "Revert "Lock the GPU3D state when rendering on the render thread or serializing it"" This reverts commit 1977566a6d8671d72bd94ba4ebf832c3bf08933a. * Let's try locking the GPU3D state throughout NDS::RunFrame - Just to see what happens * Slim down the lock's scope * Narrow the lock's scope some more * Remove the lock entirely * Try protecting the GPU3D state with just a mutex - I'll clean this up once I know it works * Remove a duplicate method definition * Add a missing `noexcept` specifier * Remove an unused function * Cut some non-hardware state from `GPU3D`'s savestate * Assume that the next frame after loading a savestate won't be identical * Actually, it _is_ worth it * Don't serialize the clip matrix - It's recalculated anyway * Serialize `RenderPolygonRAM` as an array of indexes * Clean up some comments - I liked the dialogue style, but oh well * Try restarting the render thread instead of using the lock - Let's see what happens * Put the lock back * Fix some polygon and vertex indexes being saved incorrectly - Taking the difference between two pointers results in the number of elements, not the number of bytes * Remove `SoftRenderer::StateBusy` since it turns out we don't need it - The real synchronization was the friends we made along the way
2024-01-07 15:39:43 -07:00
// If this renderer is threaded, notify the main thread that we're done with the frame.
Platform::Semaphore_Post(Sema_ScanlineCount);
}
2017-03-15 08:53:36 -06:00
void SoftRenderer::VCount144(GPU& gpu)
{
if (RenderThreadRunning.load(std::memory_order_relaxed) && !gpu.GPU3D.AbortFrame)
Platform::Semaphore_Wait(Sema_RenderDone);
}
void SoftRenderer::RenderFrame(GPU& gpu)
{
auto textureDirty = gpu.VRAMDirty_Texture.DeriveState(gpu.VRAMMap_Texture, gpu);
auto texPalDirty = gpu.VRAMDirty_TexPal.DeriveState(gpu.VRAMMap_TexPal, gpu);
bool textureChanged = gpu.MakeVRAMFlat_TextureCoherent(textureDirty);
bool texPalChanged = gpu.MakeVRAMFlat_TexPalCoherent(texPalDirty);
FrameIdentical = !(textureChanged || texPalChanged) && gpu.GPU3D.RenderFrameIdentical;
if (RenderThreadRunning.load(std::memory_order_relaxed))
{
Protect savestates while the threaded software renderer is running (#1864) * First crack at ensuring the render thread doesn't touch GPU state while it's being serialized * Get rid of the semaphore wait * Add some extra fields into GPU3D's serialization * Oops, TempVertexBuffer is already serialized * Move vertex serialization into its own method * Lock the GPU3D state when rendering on the render thread or serializing it * Revert "Lock the GPU3D state when rendering on the render thread or serializing it" This reverts commit 2f49a551c13934b9dc815bbda67a45098f0482a7. * Add comments that describe the synchronization within GPU3D_Soft - I need to understand it before I can solve my actual problem - Now I do * Revert "Revert "Lock the GPU3D state when rendering on the render thread or serializing it"" This reverts commit 1977566a6d8671d72bd94ba4ebf832c3bf08933a. * Let's try locking the GPU3D state throughout NDS::RunFrame - Just to see what happens * Slim down the lock's scope * Narrow the lock's scope some more * Remove the lock entirely * Try protecting the GPU3D state with just a mutex - I'll clean this up once I know it works * Remove a duplicate method definition * Add a missing `noexcept` specifier * Remove an unused function * Cut some non-hardware state from `GPU3D`'s savestate * Assume that the next frame after loading a savestate won't be identical * Actually, it _is_ worth it * Don't serialize the clip matrix - It's recalculated anyway * Serialize `RenderPolygonRAM` as an array of indexes * Clean up some comments - I liked the dialogue style, but oh well * Try restarting the render thread instead of using the lock - Let's see what happens * Put the lock back * Fix some polygon and vertex indexes being saved incorrectly - Taking the difference between two pointers results in the number of elements, not the number of bytes * Remove `SoftRenderer::StateBusy` since it turns out we don't need it - The real synchronization was the friends we made along the way
2024-01-07 15:39:43 -07:00
// "Render thread, you're up! Get moving."
Platform::Semaphore_Post(Sema_RenderStart);
}
else if (!FrameIdentical)
{
ClearBuffers(gpu);
RenderPolygons(gpu, false, &gpu.GPU3D.RenderPolygonRAM[0], gpu.GPU3D.RenderNumPolygons);
}
}
void SoftRenderer::RestartFrame(GPU& gpu)
Allow for a more modular renderer backends (#990) * Draft GPU3D renderer modularization * Update sources C++ standard to C++17 The top-level `CMakeLists.txt` is already using the C++17 standard. * Move GLCompositor into class type Some other misc fixes to push towards better modularity * Make renderer-implementation types move-only These types are going to be holding onto handles of GPU-side resources and shouldn't ever be copied around. * Fix OSX: Remove 'register' storage class specifier `register` has been removed in C++17... But this keyword hasn't done anything in years anyways. OSX builds consider this "warning" an error and it stops the whole build. * Add RestartFrame to Renderer3D interface * Move Accelerated property to Renderer3D interface There are points in the code base where we do: `renderer != 0` to know if we are feeding an openGL renderer. Rather than that we can instead just have this be a property of the renderer itself. With this pattern a renderer can just say how it wants its data to come in rather than have everyone know that they're talking to an OpenGL renderer. * Remove Accelerated flag from GPU * Move 2D_Soft interface in separate header Also make the current 2D engine an "owned" unique_ptr. * Update alignment attribute to standard alignas Uses standardized `alignas` rather than compiler-specific attributes. https://en.cppreference.com/w/cpp/language/alignas * Fix Clang: alignas specifier Alignment must be specified before the array to align the entire array. https://en.cppreference.com/w/cpp/language/alignas * Converted Renderer3D Accelerated to variable This flag is checked a lot during scanline rasterization. So rather than having an expensive vtable-lookup call during mainline rendering code, it is now a public constant bool type that is written to only once during Renderer3D initialization.
2021-02-09 15:38:51 -07:00
{
SetupRenderThread(gpu);
Protect savestates while the threaded software renderer is running (#1864) * First crack at ensuring the render thread doesn't touch GPU state while it's being serialized * Get rid of the semaphore wait * Add some extra fields into GPU3D's serialization * Oops, TempVertexBuffer is already serialized * Move vertex serialization into its own method * Lock the GPU3D state when rendering on the render thread or serializing it * Revert "Lock the GPU3D state when rendering on the render thread or serializing it" This reverts commit 2f49a551c13934b9dc815bbda67a45098f0482a7. * Add comments that describe the synchronization within GPU3D_Soft - I need to understand it before I can solve my actual problem - Now I do * Revert "Revert "Lock the GPU3D state when rendering on the render thread or serializing it"" This reverts commit 1977566a6d8671d72bd94ba4ebf832c3bf08933a. * Let's try locking the GPU3D state throughout NDS::RunFrame - Just to see what happens * Slim down the lock's scope * Narrow the lock's scope some more * Remove the lock entirely * Try protecting the GPU3D state with just a mutex - I'll clean this up once I know it works * Remove a duplicate method definition * Add a missing `noexcept` specifier * Remove an unused function * Cut some non-hardware state from `GPU3D`'s savestate * Assume that the next frame after loading a savestate won't be identical * Actually, it _is_ worth it * Don't serialize the clip matrix - It's recalculated anyway * Serialize `RenderPolygonRAM` as an array of indexes * Clean up some comments - I liked the dialogue style, but oh well * Try restarting the render thread instead of using the lock - Let's see what happens * Put the lock back * Fix some polygon and vertex indexes being saved incorrectly - Taking the difference between two pointers results in the number of elements, not the number of bytes * Remove `SoftRenderer::StateBusy` since it turns out we don't need it - The real synchronization was the friends we made along the way
2024-01-07 15:39:43 -07:00
EnableRenderThread();
Allow for a more modular renderer backends (#990) * Draft GPU3D renderer modularization * Update sources C++ standard to C++17 The top-level `CMakeLists.txt` is already using the C++17 standard. * Move GLCompositor into class type Some other misc fixes to push towards better modularity * Make renderer-implementation types move-only These types are going to be holding onto handles of GPU-side resources and shouldn't ever be copied around. * Fix OSX: Remove 'register' storage class specifier `register` has been removed in C++17... But this keyword hasn't done anything in years anyways. OSX builds consider this "warning" an error and it stops the whole build. * Add RestartFrame to Renderer3D interface * Move Accelerated property to Renderer3D interface There are points in the code base where we do: `renderer != 0` to know if we are feeding an openGL renderer. Rather than that we can instead just have this be a property of the renderer itself. With this pattern a renderer can just say how it wants its data to come in rather than have everyone know that they're talking to an OpenGL renderer. * Remove Accelerated flag from GPU * Move 2D_Soft interface in separate header Also make the current 2D engine an "owned" unique_ptr. * Update alignment attribute to standard alignas Uses standardized `alignas` rather than compiler-specific attributes. https://en.cppreference.com/w/cpp/language/alignas * Fix Clang: alignas specifier Alignment must be specified before the array to align the entire array. https://en.cppreference.com/w/cpp/language/alignas * Converted Renderer3D Accelerated to variable This flag is checked a lot during scanline rasterization. So rather than having an expensive vtable-lookup call during mainline rendering code, it is now a public constant bool type that is written to only once during Renderer3D initialization.
2021-02-09 15:38:51 -07:00
}
void SoftRenderer::RenderThreadFunc(GPU& gpu)
{
for (;;)
2017-05-22 14:22:26 -06:00
{
Protect savestates while the threaded software renderer is running (#1864) * First crack at ensuring the render thread doesn't touch GPU state while it's being serialized * Get rid of the semaphore wait * Add some extra fields into GPU3D's serialization * Oops, TempVertexBuffer is already serialized * Move vertex serialization into its own method * Lock the GPU3D state when rendering on the render thread or serializing it * Revert "Lock the GPU3D state when rendering on the render thread or serializing it" This reverts commit 2f49a551c13934b9dc815bbda67a45098f0482a7. * Add comments that describe the synchronization within GPU3D_Soft - I need to understand it before I can solve my actual problem - Now I do * Revert "Revert "Lock the GPU3D state when rendering on the render thread or serializing it"" This reverts commit 1977566a6d8671d72bd94ba4ebf832c3bf08933a. * Let's try locking the GPU3D state throughout NDS::RunFrame - Just to see what happens * Slim down the lock's scope * Narrow the lock's scope some more * Remove the lock entirely * Try protecting the GPU3D state with just a mutex - I'll clean this up once I know it works * Remove a duplicate method definition * Add a missing `noexcept` specifier * Remove an unused function * Cut some non-hardware state from `GPU3D`'s savestate * Assume that the next frame after loading a savestate won't be identical * Actually, it _is_ worth it * Don't serialize the clip matrix - It's recalculated anyway * Serialize `RenderPolygonRAM` as an array of indexes * Clean up some comments - I liked the dialogue style, but oh well * Try restarting the render thread instead of using the lock - Let's see what happens * Put the lock back * Fix some polygon and vertex indexes being saved incorrectly - Taking the difference between two pointers results in the number of elements, not the number of bytes * Remove `SoftRenderer::StateBusy` since it turns out we don't need it - The real synchronization was the friends we made along the way
2024-01-07 15:39:43 -07:00
// Wait for a notice from the main thread to start rendering (or to stop entirely).
Platform::Semaphore_Wait(Sema_RenderStart);
if (!RenderThreadRunning) return;
Protect savestates while the threaded software renderer is running (#1864) * First crack at ensuring the render thread doesn't touch GPU state while it's being serialized * Get rid of the semaphore wait * Add some extra fields into GPU3D's serialization * Oops, TempVertexBuffer is already serialized * Move vertex serialization into its own method * Lock the GPU3D state when rendering on the render thread or serializing it * Revert "Lock the GPU3D state when rendering on the render thread or serializing it" This reverts commit 2f49a551c13934b9dc815bbda67a45098f0482a7. * Add comments that describe the synchronization within GPU3D_Soft - I need to understand it before I can solve my actual problem - Now I do * Revert "Revert "Lock the GPU3D state when rendering on the render thread or serializing it"" This reverts commit 1977566a6d8671d72bd94ba4ebf832c3bf08933a. * Let's try locking the GPU3D state throughout NDS::RunFrame - Just to see what happens * Slim down the lock's scope * Narrow the lock's scope some more * Remove the lock entirely * Try protecting the GPU3D state with just a mutex - I'll clean this up once I know it works * Remove a duplicate method definition * Add a missing `noexcept` specifier * Remove an unused function * Cut some non-hardware state from `GPU3D`'s savestate * Assume that the next frame after loading a savestate won't be identical * Actually, it _is_ worth it * Don't serialize the clip matrix - It's recalculated anyway * Serialize `RenderPolygonRAM` as an array of indexes * Clean up some comments - I liked the dialogue style, but oh well * Try restarting the render thread instead of using the lock - Let's see what happens * Put the lock back * Fix some polygon and vertex indexes being saved incorrectly - Taking the difference between two pointers results in the number of elements, not the number of bytes * Remove `SoftRenderer::StateBusy` since it turns out we don't need it - The real synchronization was the friends we made along the way
2024-01-07 15:39:43 -07:00
// Protect the GPU state from the main thread.
// Some melonDS frontends (though not ours)
// will repeatedly save or load states;
// if they do so while the render thread is busy here,
// the ensuing race conditions may cause a crash
// (since some of the GPU state includes pointers).
2017-05-25 17:22:11 -06:00
RenderThreadRendering = true;
if (FrameIdentical)
Protect savestates while the threaded software renderer is running (#1864) * First crack at ensuring the render thread doesn't touch GPU state while it's being serialized * Get rid of the semaphore wait * Add some extra fields into GPU3D's serialization * Oops, TempVertexBuffer is already serialized * Move vertex serialization into its own method * Lock the GPU3D state when rendering on the render thread or serializing it * Revert "Lock the GPU3D state when rendering on the render thread or serializing it" This reverts commit 2f49a551c13934b9dc815bbda67a45098f0482a7. * Add comments that describe the synchronization within GPU3D_Soft - I need to understand it before I can solve my actual problem - Now I do * Revert "Revert "Lock the GPU3D state when rendering on the render thread or serializing it"" This reverts commit 1977566a6d8671d72bd94ba4ebf832c3bf08933a. * Let's try locking the GPU3D state throughout NDS::RunFrame - Just to see what happens * Slim down the lock's scope * Narrow the lock's scope some more * Remove the lock entirely * Try protecting the GPU3D state with just a mutex - I'll clean this up once I know it works * Remove a duplicate method definition * Add a missing `noexcept` specifier * Remove an unused function * Cut some non-hardware state from `GPU3D`'s savestate * Assume that the next frame after loading a savestate won't be identical * Actually, it _is_ worth it * Don't serialize the clip matrix - It's recalculated anyway * Serialize `RenderPolygonRAM` as an array of indexes * Clean up some comments - I liked the dialogue style, but oh well * Try restarting the render thread instead of using the lock - Let's see what happens * Put the lock back * Fix some polygon and vertex indexes being saved incorrectly - Taking the difference between two pointers results in the number of elements, not the number of bytes * Remove `SoftRenderer::StateBusy` since it turns out we don't need it - The real synchronization was the friends we made along the way
2024-01-07 15:39:43 -07:00
{ // If no rendering is needed, just say we're done.
Platform::Semaphore_Post(Sema_ScanlineCount, 192);
}
else
{
ClearBuffers(gpu);
RenderPolygons(gpu, true, &gpu.GPU3D.RenderPolygonRAM[0], gpu.GPU3D.RenderNumPolygons);
}
Protect savestates while the threaded software renderer is running (#1864) * First crack at ensuring the render thread doesn't touch GPU state while it's being serialized * Get rid of the semaphore wait * Add some extra fields into GPU3D's serialization * Oops, TempVertexBuffer is already serialized * Move vertex serialization into its own method * Lock the GPU3D state when rendering on the render thread or serializing it * Revert "Lock the GPU3D state when rendering on the render thread or serializing it" This reverts commit 2f49a551c13934b9dc815bbda67a45098f0482a7. * Add comments that describe the synchronization within GPU3D_Soft - I need to understand it before I can solve my actual problem - Now I do * Revert "Revert "Lock the GPU3D state when rendering on the render thread or serializing it"" This reverts commit 1977566a6d8671d72bd94ba4ebf832c3bf08933a. * Let's try locking the GPU3D state throughout NDS::RunFrame - Just to see what happens * Slim down the lock's scope * Narrow the lock's scope some more * Remove the lock entirely * Try protecting the GPU3D state with just a mutex - I'll clean this up once I know it works * Remove a duplicate method definition * Add a missing `noexcept` specifier * Remove an unused function * Cut some non-hardware state from `GPU3D`'s savestate * Assume that the next frame after loading a savestate won't be identical * Actually, it _is_ worth it * Don't serialize the clip matrix - It's recalculated anyway * Serialize `RenderPolygonRAM` as an array of indexes * Clean up some comments - I liked the dialogue style, but oh well * Try restarting the render thread instead of using the lock - Let's see what happens * Put the lock back * Fix some polygon and vertex indexes being saved incorrectly - Taking the difference between two pointers results in the number of elements, not the number of bytes * Remove `SoftRenderer::StateBusy` since it turns out we don't need it - The real synchronization was the friends we made along the way
2024-01-07 15:39:43 -07:00
// Tell the main thread that we're done rendering
// and that it's safe to access the GPU state again.
Platform::Semaphore_Post(Sema_RenderDone);
Protect savestates while the threaded software renderer is running (#1864) * First crack at ensuring the render thread doesn't touch GPU state while it's being serialized * Get rid of the semaphore wait * Add some extra fields into GPU3D's serialization * Oops, TempVertexBuffer is already serialized * Move vertex serialization into its own method * Lock the GPU3D state when rendering on the render thread or serializing it * Revert "Lock the GPU3D state when rendering on the render thread or serializing it" This reverts commit 2f49a551c13934b9dc815bbda67a45098f0482a7. * Add comments that describe the synchronization within GPU3D_Soft - I need to understand it before I can solve my actual problem - Now I do * Revert "Revert "Lock the GPU3D state when rendering on the render thread or serializing it"" This reverts commit 1977566a6d8671d72bd94ba4ebf832c3bf08933a. * Let's try locking the GPU3D state throughout NDS::RunFrame - Just to see what happens * Slim down the lock's scope * Narrow the lock's scope some more * Remove the lock entirely * Try protecting the GPU3D state with just a mutex - I'll clean this up once I know it works * Remove a duplicate method definition * Add a missing `noexcept` specifier * Remove an unused function * Cut some non-hardware state from `GPU3D`'s savestate * Assume that the next frame after loading a savestate won't be identical * Actually, it _is_ worth it * Don't serialize the clip matrix - It's recalculated anyway * Serialize `RenderPolygonRAM` as an array of indexes * Clean up some comments - I liked the dialogue style, but oh well * Try restarting the render thread instead of using the lock - Let's see what happens * Put the lock back * Fix some polygon and vertex indexes being saved incorrectly - Taking the difference between two pointers results in the number of elements, not the number of bytes * Remove `SoftRenderer::StateBusy` since it turns out we don't need it - The real synchronization was the friends we made along the way
2024-01-07 15:39:43 -07:00
2017-05-25 17:22:11 -06:00
RenderThreadRendering = false;
}
}
Allow for a more modular renderer backends (#990) * Draft GPU3D renderer modularization * Update sources C++ standard to C++17 The top-level `CMakeLists.txt` is already using the C++17 standard. * Move GLCompositor into class type Some other misc fixes to push towards better modularity * Make renderer-implementation types move-only These types are going to be holding onto handles of GPU-side resources and shouldn't ever be copied around. * Fix OSX: Remove 'register' storage class specifier `register` has been removed in C++17... But this keyword hasn't done anything in years anyways. OSX builds consider this "warning" an error and it stops the whole build. * Add RestartFrame to Renderer3D interface * Move Accelerated property to Renderer3D interface There are points in the code base where we do: `renderer != 0` to know if we are feeding an openGL renderer. Rather than that we can instead just have this be a property of the renderer itself. With this pattern a renderer can just say how it wants its data to come in rather than have everyone know that they're talking to an OpenGL renderer. * Remove Accelerated flag from GPU * Move 2D_Soft interface in separate header Also make the current 2D engine an "owned" unique_ptr. * Update alignment attribute to standard alignas Uses standardized `alignas` rather than compiler-specific attributes. https://en.cppreference.com/w/cpp/language/alignas * Fix Clang: alignas specifier Alignment must be specified before the array to align the entire array. https://en.cppreference.com/w/cpp/language/alignas * Converted Renderer3D Accelerated to variable This flag is checked a lot during scanline rasterization. So rather than having an expensive vtable-lookup call during mainline rendering code, it is now a public constant bool type that is written to only once during Renderer3D initialization.
2021-02-09 15:38:51 -07:00
u32* SoftRenderer::GetLine(int line)
{
if (RenderThreadRunning.load(std::memory_order_relaxed))
{
if (line < 192)
Protect savestates while the threaded software renderer is running (#1864) * First crack at ensuring the render thread doesn't touch GPU state while it's being serialized * Get rid of the semaphore wait * Add some extra fields into GPU3D's serialization * Oops, TempVertexBuffer is already serialized * Move vertex serialization into its own method * Lock the GPU3D state when rendering on the render thread or serializing it * Revert "Lock the GPU3D state when rendering on the render thread or serializing it" This reverts commit 2f49a551c13934b9dc815bbda67a45098f0482a7. * Add comments that describe the synchronization within GPU3D_Soft - I need to understand it before I can solve my actual problem - Now I do * Revert "Revert "Lock the GPU3D state when rendering on the render thread or serializing it"" This reverts commit 1977566a6d8671d72bd94ba4ebf832c3bf08933a. * Let's try locking the GPU3D state throughout NDS::RunFrame - Just to see what happens * Slim down the lock's scope * Narrow the lock's scope some more * Remove the lock entirely * Try protecting the GPU3D state with just a mutex - I'll clean this up once I know it works * Remove a duplicate method definition * Add a missing `noexcept` specifier * Remove an unused function * Cut some non-hardware state from `GPU3D`'s savestate * Assume that the next frame after loading a savestate won't be identical * Actually, it _is_ worth it * Don't serialize the clip matrix - It's recalculated anyway * Serialize `RenderPolygonRAM` as an array of indexes * Clean up some comments - I liked the dialogue style, but oh well * Try restarting the render thread instead of using the lock - Let's see what happens * Put the lock back * Fix some polygon and vertex indexes being saved incorrectly - Taking the difference between two pointers results in the number of elements, not the number of bytes * Remove `SoftRenderer::StateBusy` since it turns out we don't need it - The real synchronization was the friends we made along the way
2024-01-07 15:39:43 -07:00
// We need a scanline, so let's wait for the render thread to finish it.
// (both threads process scanlines from top-to-bottom,
// so we don't need to wait for a specific row)
Platform::Semaphore_Wait(Sema_ScanlineCount);
}
return &ColorBuffer[(line * ScanlineWidth) + FirstPixelOffset];
2017-02-10 08:50:26 -07:00
}
}