Water forms a big part of the earthsim3 engine. There are two major challenging components to this:
- Simulating a large number of interconnected bodies of water in real time
- Rendering a large scale landscape that has dynamic simulated water on it
Like pretty much everything in the world of computers, the fundamental limit to what you can do is down to memory: how much of it you can access and how fast you can get to it.
Water simulation is no different. The maths for how to do it is pretty straightforward. The challenges come in making a fast, usable implementation that can work for a game.
Real-time water simulations work very well on the GPU, which is ideally suited for massive batch processing of the water data. But we want the Earthsim landscape to be big: 8k, 16k and even 32k landscapes should be possible for people that have a machine set up for it.
For example, a 16k x 16k tile of landscape gives you 268 million cells (each cell is one pixel of land data). If the memory use per cell of landscape (water simulation variables + land type layers) is, say, 64 bytes, a 16k landscape takes over 17 gigabytes of memory!
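The arithmetic is easy to check for the other landscape sizes too. This is a minimal sketch, assuming square landscapes and the illustrative 64 bytes per cell from above; the function name is made up for this example.

```python
def landscape_bytes(side_cells: int, bytes_per_cell: int) -> int:
    """Total memory for a square landscape of side_cells x side_cells."""
    return side_cells * side_cells * bytes_per_cell

# 64 bytes per cell is the illustrative figure from the text, not a measured value.
for side_k in (8, 16, 32):
    side = side_k * 1024
    total = landscape_bytes(side, 64)
    print(f"{side_k}k: {side * side / 1e6:.0f} M cells, {total / 1e9:.1f} GB")
```

Under these assumptions the 16k case comes to about 17.2 GB, and every doubling of the side length multiplies the footprint by four.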
It is not possible to fit this scale of landscape on the GPU and leave a decent amount of space for both the frame buffer and atmosphere voxel buffer.
It becomes clear from memory alone that land and water simulation have to go on the CPU. This leaves the GPU to do rendering, and atmospheric simulation. That’s plenty to keep a GPU busy, especially if we ray trace the dynamic atmosphere as we hope to do.
Luckily there are two features of modern CPUs that make this doable.
- Multi core is happening in a big way, especially with the latest AMD processors.
- SIMD code on the CPU is beginning to bring you into the ballpark of GPU speed.
AMD has the edge right now on core counts, but Intel has 16-way SIMD (on 32-bit values) with AVX512. In principle this should give it up to a 2x advantage over a similar AMD core that only has 8-way AVX2.
It’s going to be fascinating to build a simulation benchmark out of this and try out some of the latest CPUs.
The landscape is divided up into tiles, each of which can run on an independent CPU core. This video shows the water running in separate tiles (looking like aquariums in the video). This was the first step, which we used to prove whether the performance of a CPU SIMD implementation was good enough. It was.
The next step is to get the code to run on multiple cores and hand the data between the cores at the tile edges in a fast enough way. It’s coming along well but still has some bugs and work to do before it’s done.
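One common way to hand data across tile edges is to give each tile a one-cell "ghost" border that mirrors its neighbours' edge cells. The sketch below shows that idea only. It is not the engine's code: the function names, the tile size and the serial exchange are all assumptions for illustration, and it skips corner cells, which a 4-neighbour update does not read.

```python
def split_into_tiles(grid, tile):
    """Cut a square grid into tiles, each padded with a 1-cell ghost border."""
    n = len(grid) // tile
    tiles = {}
    for ty in range(n):
        for tx in range(n):
            t = [[0.0] * (tile + 2) for _ in range(tile + 2)]
            for y in range(tile):
                for x in range(tile):
                    t[y + 1][x + 1] = grid[ty * tile + y][tx * tile + x]
            tiles[(tx, ty)] = t
    return tiles

def exchange_edges(tiles, tile):
    """Copy each tile's interior edge cells into its neighbours' ghost cells."""
    for (tx, ty), t in tiles.items():
        right = tiles.get((tx + 1, ty))
        if right is not None:
            for y in range(1, tile + 1):
                right[y][0] = t[y][tile]       # my last column -> their left ghost
                t[y][tile + 1] = right[y][1]   # their first column -> my right ghost
        below = tiles.get((tx, ty + 1))
        if below is not None:
            for x in range(1, tile + 1):
                below[0][x] = t[tile][x]       # my last row -> their top ghost
                t[tile + 1][x] = below[1][x]   # their first row -> my bottom ghost

# Tiny demo: a 4x4 grid cut into four 2x2 tiles.
grid = [[float(y * 4 + x) for x in range(4)] for y in range(4)]
tiles = split_into_tiles(grid, 2)
exchange_edges(tiles, 2)
```

After the exchange, each tile can run its water step reading only its own padded array, which is what lets one core work on one tile without touching another core's data mid-step.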
Once again, the challenge in rendering the water is all about memory. For our 16k x 16k example tile of 268 Mega cells, if the water heights alone were 2 bytes, that’s a half gigabyte to get from the CPU to the GPU every frame.
Add to this that the land in earthsim3 is also alive with:
- real-time erosion
- sand/earth/gravel settling
- lava moving
That adds another 3 bytes of data per cell (2 for land height and 1 for type), giving over a gigabyte of data to upload to the GPU every frame. That’s too much to run at 60Hz.
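To put a number on "too much", here is the upload rate that figure implies. This is a rough sketch using only the byte counts quoted above; the names are invented for the example.

```python
CELLS = 16 * 1024 * 16 * 1024
BYTES_PER_CELL = 2 + 2 + 1  # water height, land height, land type (from the text)

def upload_rate_gb_per_s(cells: int, bytes_per_cell: int, hz: float) -> float:
    """Sustained CPU-to-GPU bandwidth needed to re-upload every cell each frame."""
    return cells * bytes_per_cell * hz / 1e9

per_frame_gb = CELLS * BYTES_PER_CELL / 1e9
print(f"{per_frame_gb:.2f} GB per frame")
print(f"{upload_rate_gb_per_s(CELLS, BYTES_PER_CELL, 60):.1f} GB/s at 60Hz")
```

That works out to about 1.34 GB per frame, or roughly 80 GB/s at 60Hz. For comparison, a PCIe 3.0 x16 link tops out at roughly 16 GB/s in theory, so a naive full upload is several times over budget before anything else uses the bus.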
This 1gig+ upload is then turned into a pyramid of LODs dynamically by the GPU to render the landscape. It is worth adding view frustum culling to the uploader so that only the visible LODs are sent to the GPU, but that means calculating the LODs on the CPU. This should be relatively cost-free to combine into the water and talus engines.
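Building the pyramid on the CPU can be as simple as repeatedly averaging 2x2 blocks. The sketch below assumes a square, power-of-two heightmap and plain averaging; the real engine might prefer a different filter (for example, keeping the maximum height), and frustum culling is left out.

```python
def downsample(level):
    """Halve resolution by averaging each 2x2 block of heights."""
    n = len(level) // 2
    return [
        [
            (level[2 * y][2 * x] + level[2 * y][2 * x + 1]
             + level[2 * y + 1][2 * x] + level[2 * y + 1][2 * x + 1]) / 4.0
            for x in range(n)
        ]
        for y in range(n)
    ]

def build_lod_pyramid(heights):
    """Return [full-res, half-res, ..., 1x1] for a square power-of-two heightmap."""
    levels = [heights]
    while len(levels[-1]) > 1:
        levels.append(downsample(levels[-1]))
    return levels

# Tiny demo on a 4x4 heightmap.
heights = [[float(y * 4 + x) for x in range(4)] for y in range(4)]
pyramid = build_lod_pyramid(heights)
```

All the coarser levels together add only about a third on top of the full-resolution data, so the win comes from uploading coarse levels for distant terrain instead of the full-resolution level, not in addition to it.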
Additionally, the land updates don’t need to run quite as fast as the water. Water looks great at a full framerate of 60 or 120 Hz, but land updates can happen 2x or 3x slower and it’s hardly noticeable, as sand movement, land settling and erosion are all much less dynamic effects.
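A sketch of what that split rate looks like in a frame loop, and what it could do to the average upload size. It assumes land data only needs uploading on frames where the land step ran, which is a guess about the uploader rather than something already built; the step bodies are stand-ins.

```python
def simulate(frames: int, land_every: int = 3):
    """Step water every frame and land every land_every-th frame; return step counts."""
    water_steps = 0
    land_steps = 0
    for frame in range(frames):
        water_steps += 1              # stand-in for the water step
        if frame % land_every == 0:
            land_steps += 1           # stand-in for erosion / settling / lava
    return water_steps, land_steps

def average_upload_bytes_per_cell(water_bytes=2, land_bytes=3, land_every=3):
    """Average bytes per cell per frame if land is uploaded only when it changes."""
    return water_bytes + land_bytes / land_every

print(simulate(60, land_every=3))
print(average_upload_bytes_per_cell(land_every=3))
```

With land running at a third of the water rate, the average drops from 5 bytes per cell per frame to 3, under the assumption above.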
After all this is working, the next step will be to look into a few simple real-time compression schemes for the data. Even a 2x reduction in data would make a large difference. Real-time DXT compression should do the trick and bring in another 2-4x on size.
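To show the flavour of block compression, here is a toy scheme loosely modelled on how DXT formats store two endpoints plus small per-texel indices for each 4x4 block. It is not DXT itself and not a format any GPU decodes: the 16-bit endpoints, 4-bit indices and function names are all assumptions chosen to suit 2-byte heights.

```python
RAW_BYTES = 16 * 2           # sixteen 16-bit heights per 4x4 block
PACKED_BYTES = 2 + 2 + 8     # lo, hi, and sixteen 4-bit indices

def compress_block(block):
    """block: 16 ints in [0, 65535]. Returns (lo, hi, indices), indices in 0..15."""
    lo, hi = min(block), max(block)
    span = hi - lo
    if span == 0:
        return lo, hi, [0] * 16
    # Round each height to the nearest of 16 evenly spaced levels between lo and hi.
    indices = [(15 * (v - lo) + span // 2) // span for v in block]
    return lo, hi, indices

def decompress_block(lo, hi, indices):
    """Rebuild approximate heights from the endpoints and indices (lossy)."""
    span = hi - lo
    return [lo + (i * span + 7) // 15 for i in indices]

block = [1000 + 37 * i for i in range(16)]
lo, hi, indices = compress_block(block)
restored = decompress_block(lo, hi, indices)
print(f"ratio {RAW_BYTES / PACKED_BYTES:.2f}x")
```

This packs 32 bytes into 12, about 2.7x, and keeps the block's minimum and maximum exact. The error elsewhere scales with the height range inside the block, so flat water compresses almost losslessly while steep cliffs lose the most precision.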
Finally, because we have a map of what type of surface each cell is, we can build an intelligent landscape extrapolator on the GPU to give us 2-4x extra detail for rendering. The extrapolator is well-understood technology, so it will come later, once the parts we are less sure about are finished off.