-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use fullscreen triangles for fullscreen passes. #21541
base: develop
Are you sure you want to change the base?
Conversation
Instead of using a vertex buffer with a quad triangle strip, generate a single fullscreen triangle in the vertex shader. This has a number of advantages: - Avoids fragment quad overshading along the diagonal. - Avoids creating, filling, reading, and destroying a vertex buffer.
Do you have any numbers to how it affects it? |
I want to test it with my raspberry pi |
No. The game currently crashes under DLL injection (see Discord for a discussion), which makes it difficult to profile externally, and explicit timing queries have not yet been added. With the (lack of) complexity of these fragment shaders, I'd be shocked if there was a measurable GPU performance difference in either direction, though. The main win here in practice is less code and fewer allocations to track. |
I'm not so sure about the allocations, the shader program is initialized once. I'm a bit hesitant in doing any changes here, the code we currently have is at least battle tested, if there is not a huge gain then I don't really see the justification at the moment. |
Also gonna tag @LRFLEW, he is the original author for most of the code |
It's been a while since I've been active in this project, but thank you for the ping. Yes, the code for ApplyPaletteShader and ApplyTransparencyShader were my contributions as part of #6541. However, a lot of that code was based on even older code for the now defunct CopyFramebufferShader (which I replaced with Using the fullscreen triangle makes sense. It's not something I was taught (so wouldn't have thought of back in the day), but looking into the idea, it does seem legit, so I support that change. It's the other part of this PR I'm unsure of. Pretty much all the vertex shaders used to use an index like this PR uses, though implemented in a more complicated way and still used a VBO. I opted to replace that system with a simpler VBO-based system in my rewrite, as it seemed like an improvement.
The creation, filling, and destruction of the VBO is only done once per process, so I don't see that as a major performance concern. Similarly, the data is only 16 floats, so I don't think it's a significant VRAM consideration. The only question, then, is whether having the GPU read the VBO is significantly slower than computing the values in the shader. I would assume not, since GPUs are optimized for processing VBO and Texture data, and having only 3 vertices reduces the GPU's ability to parallelize the computation. In terms of code quality, I guess the question is whether we'd rather more complexity in the shader or more complexity in the supporting C++ code. Currently, I'm not sold on the change, and would like to see some performance numbers or a reference for this idea if possible. |
Removing the buffer isn't really about raw performance, but complexity reduction. Code that doesn't exist can't have bugs.
There's two ways to look at this:
On its own this is not a particularly inspiring change, I agree. I wanted to start with something small to learn the PR / review process. |
We did a bit of house cleaning, this needs to rebased. I looked at the changes and I think they seem fine, when I had a brief glance initially I thought its doing something different. Should also wait for @LRFLEW, but I'm fine with this. |
I've left review comments on things I'd like to see changed in the shaders before this gets merged. If those are all addressed, I have no qualms with this being merged. If we're using the VertexID trick here, it might be worth considering it for the other shaders, which use a small VBO in conjunction with instancing for drawing. It doesn't have to be with this PR, but I did just want to make a note of that. |
@@ -1,12 +1,14 @@ | |||
#version 150 | |||
|
|||
in int gl_VertexID; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC, this is unnecessary (and may cause complications), as the gl_*
variables are built-in and don't need to be declared. (eg. There's no declaration for gl_Position
before using it).
Ditto for the other shader.
@@ -1,12 +1,14 @@ | |||
#version 150 | |||
|
|||
in int gl_VertexID; | |||
|
|||
in vec4 vPosition; | |||
in vec2 vTextureCoordinate; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If vPosition
and vTextureCoordinate
are no longer used, then the declarations here should be removed.
Ditto for the other shader.
fTextureCoordinate = vTextureCoordinate; | ||
gl_Position = vPosition; | ||
fTextureCoordinate = vec2((gl_VertexID << 1) & 2, gl_VertexID & 2); | ||
gl_Position = vec4(fTextureCoordinate * 2.0 + -1.0, 0.0, 1.0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if reading from an output variable like this is well defined or well supported across the wide range of GLSL interpreters. Personally, I'd change this to be three lines, where the first creates a temporary variable that's read by the other two lines. Personally, I'd also make the temporary variable use only 0.0 and 1.0 values, and make the other two lines use * 2.0
and * 4.0 - 1.0
respectively, though I don't think that's a big deal either way. I think there should also be a comment here to help explain what this is doing (i.e. computing vertex data from vertex id).
All these ditto for the other shader.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the same as my baseline on ARM SBC
@ACEfanatic02 Could you address the line notes that @LRFLEW added? |
This pull request has been marked as stale and will be closed in 14 days if no action is taken. To keep it open, leave a comment or remove the |
Instead of using a vertex buffer with a quad triangle strip, generate a single fullscreen triangle in the vertex shader. This has a number of advantages:
Avoids fragment quad overshading along the diagonal.
Avoids creating, filling, reading, and destroying a vertex buffer.