I have this little experiment where I am trying to render a voxel-like landscape map. I take a black and white image generated using the Gimp solid noise render script, and interpret it as a height map to generate land and sea. The height of each pixel in the image is rendered as a "cube" at that height. Not really a cube actually - I generate the top surface quad, and then generate a quad on each side where the neighbouring pixel is of a different height. So the mesh ends up being a plane, and I don't render sides that are not visible. I am not merging surface squares into larger areas when there are many pixels of the same height - not yet.
Anyway, I end up rendering a lot of cubes (quads). Each quad has 4 vertexes and 6 indexes. I am rendering the whole thing in a single glDrawElements call. So I first traverse the map counting how many quads I am going to need. I then malloc separate buffers for vertexes, uvs (the quads are textured) and lights. I am not using normals, as I am not implementing a light source shader - each quad has a hard-coded light level. The vertex and fragment shaders are very simple.
So a quad is made up of 4 vertexes, one for each corner, with 6 indexes. The first quad's indexes look like this:
{0, 1, 2, 1, 2, 3}
Which represents the two triangles that make up the quad. The second quad's indexes look like this:
{4, 5, 6, 5, 6, 7}
And the third:
{8, 9, 10, 9, 10, 11}
The index buffer looks like this from the beginning:
{0, 1, 2, 1, 2, 3, 4, 5, 6, 5, 6, 7, 8, 9, 10, 9, 10, 11...
And so on for every quad. Anyway, at some point I bump up against the fact that index values have a size of uint16_t, giving a maximum index range of 0 to 65535. If I have more than this number of indexes, only part of the map renders and the rest is garbage. So I think to myself: oh, I need to batch these calls to glDrawElements! Not being particularly good at OpenGL, my solution for batching involved creating a completely new set of vertexes, uvs and lights for every batch of indexes. I started down this road and it got very complex and ugly very fast. Then I had an ah-ha moment!
I had this faint memory of being able to specify a start offset value into a data buffer. So I look up the documentation and I see this:
void glVertexAttribPointer(GLuint index,
GLint size,
GLenum type,
GLboolean normalized,
GLsizei stride,
const GLvoid *pointer); <--- look at meee
That last parameter (that I always set to 0 without thinking) is what I was looking for. I can leave all my vertexes, uvs and lights alone in their original buffers, and just adjust this pointer value for each batch, resetting the attribute pointer start for each batch render. So my render function is: bind the shader program, then enter a loop over the batches of indexes. Inside that loop I call glVertexAttribPointer with pointer offsets for each of my buffers, call glDrawElements with my index batch, and then increment those pointer offsets for the next iteration.
So I write that code, but I still haven't changed my index buffer. It is still the monolith buffer with funny values. I sigh at the thought of having to build these batches, so out of procrastination I decide to build the program as-is, run it with my monolith index buffer, and see how it explodes.
And it works perfectly...
Ok, but how exactly does it work? I suspect something really strange is happening, and I don't want to leave it like this. But I make a savepoint commit anyway and ponder this in the commit message:
Experiment in "batch" rendering to get around uint16_t size limit of indexes
So this seems to work as-is, but I have absolutely no idea how...
I am breaking up the rendering into batches that reduce the index count below
65535 indexes per batch. Each batch adds an offset when binding the vertex, uv
and light data, so on each batch cycle it is moving in chunks through these
loaded buffers. I thought I was going to have to create an array of `indexes`,
`nr_indexes` and `index_id`s for each batch - so each batch would call a
different set of indexes, each one starting at zero. But I was very
surprised when I just left it rendering the same `index_id` and `nr_indexes`
for each batch!
I think I might know how now, and it's pretty nasty: Each batch loop advances
the pointer forward for the attributes. When we get to `glDrawElements` we
re-use the same index set over and over (3 batches), each time starting the
indexes at zero. However the *sequence* of the indexes is the same for
every quad (0,1,2,1,2,3). So OpenGL is quite happy with this, and because we
have advanced the other attribute pointers forward by an amount compatible
with the indexes, all is ok.
Except in the last batch, when `nr_indexes` exceeds the float count for the
attributes. My guess is that OpenGL takes an index value, sees that it
exceeds the bound data, and just drops it - not erroring and not blowing up.
(Strictly speaking, indexing past the end of the bound attribute data is
undefined behaviour in OpenGL, so this is the driver being forgiving, not
something to rely on.)
If this is true, then in theory I could just load an index buffer of 6 indexes,
set my batch size to render 1 quad, and have thousands of batches (but
obviously not ideal). Better would be to precalculate the batch size when
loading data, and load only batch size number of indexes. Then in the last pass
of the final batch, reduce `nr_indexes` to match the size of the last batch.
Then I am not storing thousands of unused indexes.
To explain it another way: let's say I went ahead and built these batches of index values as I had planned to. At the beginning of each batch I would start the indexes again from 0, because the offset pointers in my vertex attribute binding "shift" the location of what 0 means deeper into the buffers. My batches would have the exact same index sequences as each other - except for the last batch, which has fewer vertexes and so would have fewer indexes. This means I don't have to create multiple identical index buffers - I can just create one and re-use it, like I was accidentally doing.
I create a single index buffer that stores the incrementing indexes for a single batch. I then create my other buffers as usual, with the full vertex, uv and light data for all batches. When rendering I enter a loop advancing the buffer offsets for the "graphics" buffers, and reuse the one index buffer for each draw call. In the last batch I adjust the nr_indexes argument to glDrawElements down to match the lesser quad count of the last batch. It works like a freak'n charm.
One confusing thing I found about the pointer argument was the fact that it was a pointer (a GLvoid *). In my limited understanding of OpenGL this buffer exists in RAM on the GPU (unless Mesa is doing software rendering), so a pointer doesn't make any sense! The memory I allocated for vertex data was free'd after I loaded the data onto the GPU, and that chunk of memory was from a completely different set of chips. The documentation says this about it:
pointer
Specifies a pointer to the first component of the first generic vertex
attribute in the array. The initial value is 0.
I looked at the typedef for GLvoid in my GL header and it was indeed this: typedef void GLvoid;. After some searching and Stack Overflow it turns out this is an API quirk from days gone by, and what it actually means is a byte offset into the data buffer. My vertex data is a big list of float triplets, and a float is 4 bytes long, so I have to calculate the byte offset for each batch and pass that. The API expects a pointer value, so I have to cast, so I have to use a uint64_t type for my offset variable, otherwise the compiler complains that I am casting the wrong sized thing into a pointer. I guess my code will only run on 64 bit hardware without some macro funkyness, but meh. Or I could use a variable of type size_t I suppose.
Play more, and try to run things even if I think they are broken. They might work, or they might break in a way I wasn't expecting, which is an opportunity for deeper learning.
This inspired me to experiment with rendering larger and larger chunks of the map and noting how smooth the rendering was. I discovered I could render up to 141 batches before there was some stuttering. This was a map chunk size of 800x800 pixels, so 640,000 surface quads at least, plus the sides, which I estimate at least doubles that to 1,280,000 quads. Or 2,560,000 textured triangles. On an 8 year old laptop! At that number of quads I have to zoom out a lot to see them, which makes the texturing pointless, meaning I could have a zoom out feature that switches to plain colour rendering after some level.
Also, write about stuff more, because I just realized I can make my batch sizes larger: the first 6 indexes have a maximum index value of 3, and the first 12 indexes have a maximum value of 7. What I need to avoid is an index *value* within a batch greater than 65535, not a count of indexes.