Use fullscreen triangles for fullscreen passes. #21541

ACEfanatic02 · 2024-03-07T10:00:16Z

Instead of using a vertex buffer with a quad triangle strip, generate a single fullscreen triangle in the vertex shader. This has a number of advantages:

Avoids fragment quad overshading along the diagonal.
Avoids creating, filling, reading, and destroying a vertex buffer.

Instead of using a vertex buffer with a quad triangle strip, generate a single fullscreen triangle in the vertex shader. This has a number of advantages: - Avoids fragment quad overshading along the diagonal. - Avoids creating, filling, reading, and destroying a vertex buffer.

ZehMatt · 2024-03-07T13:18:52Z

Do you have any numbers to how it affects it?

janisozaur · 2024-03-07T15:07:22Z

I want to test it with my raspberry pi

ACEfanatic02 · 2024-03-07T16:43:58Z

Do you have any numbers to how it affects it?

No. The game currently crashes under DLL injection (see Discord for a discussion), which makes it difficult to profile externally, and explicit timing queries have not yet been added. With the (lack of) complexity of these fragment shaders, I'd be shocked if there was a measurable GPU performance difference in either direction, though. The main win here in practice is less code and fewer allocations to track.

ZehMatt · 2024-03-07T17:43:45Z

I'm not so sure about the allocations, the shader program is initialized once. I'm a bit hesitant in doing any changes here, the code we currently have is at least battle tested, if there is not a huge gain then I don't really see the justification at the moment.

ZehMatt · 2024-03-07T17:53:01Z

Also gonna tag @LRFLEW, he is the original author for most of the code

LRFLEW · 2024-03-07T19:01:39Z

It's been a while since I've been active in this project, but thank you for the ping.

Yes, the code for ApplyPaletteShader and ApplyTransparencyShader were my contributions as part of #6541. However, a lot of that code was based on even older code for the now defunct CopyFramebufferShader (which I replaced with glBlitFramebuffer).

Using the fullscreen triangle makes sense. It's not something I was taught (so wouldn't have thought of back in the day), but looking into the idea, it does seem legit, so I support that change.

It's the other part of this PR I'm unsure of. Pretty much all the vertex shaders used to use an index like this PR uses, though implemented in a more complicated way and still used a VBO. I opted to replace that system with a simpler VBO-based system in my rewrite, as it seemed like an improvement.

Avoids creating, filling, reading, and destroying a vertex buffer.

The creation, filling, and destruction of the VBO is only done once per process, so I don't see that as a major performance concern. Similarly, the data is only 16 floats, so I don't think it's a significant VRAM consideration. The only question, then, is whether having the GPU read the VBO is significantly slower than computing the values in the shader. I would assume not, since GPUs are optimized for processing VBO and Texture data, and having only 3 vertices reduces the GPU's ability to parallelize the computation. In terms of code quality, I guess the question is whether we'd rather more complexity in the shader or more complexity in the supporting C++ code. Currently, I'm not sold on the change, and would like to see some performance numbers or a reference for this idea if possible.

ACEfanatic02 · 2024-03-08T07:33:48Z

The creation, filling, and destruction of the VBO is only done once per process, so I don't see that as a major performance concern.

Removing the buffer isn't really about raw performance, but complexity reduction. Code that doesn't exist can't have bugs.

The only question, then, is whether having the GPU read the VBO is significantly slower than computing the values in the shader. I would assume not, since GPUs are optimized for processing VBO and Texture data, and having only 3 vertices reduces the GPU's ability to parallelize the computation.

There's two ways to look at this:

In isolation, looking at only the vertex shader, yes this will be faster. Three integer ops, two conversions, and two FMAs will be faster than loading a cache line to read the vertex buffer. The number of vertices is actually irrelevant in this case, because shaders run in a wide (32- or 64-thread) SIMT model. Rendering one triangle versus two takes exactly the same number of VS threadgroups.
In practice, this will not be faster or slower, because there's no hazard with this vertex shader and so it will (most likely) spawn its one threadgroup in amongst the tail end of the previous pass's FS work where there are unoccupied shader units. So either way the VS for fullscreen passes tends to be "free" unless it has an actual dependency (or if there is a manual barrier in place.)

On its own this is not a particularly inspiring change, I agree. I wanted to start with something small to learn the PR / review process.

ZehMatt · 2024-03-10T20:53:40Z

We did a bit of house cleaning, this needs to rebased. I looked at the changes and I think they seem fine, when I had a brief glance initially I thought its doing something different. Should also wait for @LRFLEW, but I'm fine with this.

LRFLEW · 2024-03-11T03:57:34Z

I've left review comments on things I'd like to see changed in the shaders before this gets merged. If those are all addressed, I have no qualms with this being merged.

If we're using the VertexID trick here, it might be worth considering it for the other shaders, which use a small VBO in conjunction with instancing for drawing. It doesn't have to be with this PR, but I did just want to make a note of that.

LRFLEW · 2024-03-07T19:03:52Z

data/shaders/applypalette.vert

@@ -1,12 +1,14 @@
 #version 150

+in int gl_VertexID;


IIRC, this is unnecessary (and may cause complications), as the gl_* variables are built-in and don't need to be declared. (eg. There's no declaration for gl_Position before using it).

Ditto for the other shader.

LRFLEW · 2024-03-07T19:05:15Z

data/shaders/applypalette.vert

@@ -1,12 +1,14 @@
 #version 150

+in int gl_VertexID;
+
 in vec4 vPosition;
 in vec2 vTextureCoordinate;


If vPosition and vTextureCoordinate are no longer used, then the declarations here should be removed.

Ditto for the other shader.

LRFLEW · 2024-03-11T03:47:55Z

data/shaders/applypalette.vert

-    fTextureCoordinate = vTextureCoordinate;
-    gl_Position = vPosition;
+    fTextureCoordinate = vec2((gl_VertexID << 1) & 2, gl_VertexID & 2);
+    gl_Position = vec4(fTextureCoordinate * 2.0 + -1.0, 0.0, 1.0);


I'm not sure if reading from an output variable like this is well defined or well supported across the wide range of GLSL interpreters. Personally, I'd change this to be three lines, where the first creates a temporary variable that's read by the other two lines. Personally, I'd also make the temporary variable use only 0.0 and 1.0 values, and make the other two lines use * 2.0 and * 4.0 - 1.0 respectively, though I don't think that's a big deal either way. I think there should also be a comment here to help explain what this is doing (i.e. computing vertex data from vertex id).

All these ditto for the other shader.

janisozaur

It's the same as my baseline on ARM SBC

Gymnasiast · 2024-03-30T20:13:38Z

@ACEfanatic02 Could you address the line notes that @LRFLEW added?

github-actions · 2024-05-01T01:49:38Z

This pull request has been marked as stale and will be closed in 14 days if no action is taken. To keep it open, leave a comment or remove the stale-pr label. If you're awaiting feedback from a developer, please send us a reminder (either here or on Discord).

LRFLEW suggested changes Mar 11, 2024

View reviewed changes

janisozaur approved these changes Mar 11, 2024

View reviewed changes

github-actions bot added the stale-pr label May 1, 2024

janisozaur removed the stale-pr label May 1, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use fullscreen triangles for fullscreen passes. #21541

Use fullscreen triangles for fullscreen passes. #21541

ACEfanatic02 commented Mar 7, 2024

ZehMatt commented Mar 7, 2024

janisozaur commented Mar 7, 2024

ACEfanatic02 commented Mar 7, 2024

ZehMatt commented Mar 7, 2024

ZehMatt commented Mar 7, 2024

LRFLEW commented Mar 7, 2024 •

edited

ACEfanatic02 commented Mar 8, 2024

ZehMatt commented Mar 10, 2024

LRFLEW commented Mar 11, 2024

LRFLEW Mar 7, 2024

LRFLEW Mar 7, 2024

LRFLEW Mar 11, 2024

janisozaur left a comment

Gymnasiast commented Mar 30, 2024

github-actions bot commented May 1, 2024

Use fullscreen triangles for fullscreen passes. #21541

Are you sure you want to change the base?

Use fullscreen triangles for fullscreen passes. #21541

Conversation

ACEfanatic02 commented Mar 7, 2024

ZehMatt commented Mar 7, 2024

janisozaur commented Mar 7, 2024

ACEfanatic02 commented Mar 7, 2024

ZehMatt commented Mar 7, 2024

ZehMatt commented Mar 7, 2024

LRFLEW commented Mar 7, 2024 • edited

ACEfanatic02 commented Mar 8, 2024

ZehMatt commented Mar 10, 2024

LRFLEW commented Mar 11, 2024

LRFLEW Mar 7, 2024

Choose a reason for hiding this comment

LRFLEW Mar 7, 2024

Choose a reason for hiding this comment

LRFLEW Mar 11, 2024

Choose a reason for hiding this comment

janisozaur left a comment

Choose a reason for hiding this comment

Gymnasiast commented Mar 30, 2024

github-actions bot commented May 1, 2024

LRFLEW commented Mar 7, 2024 •

edited