Dave and I have started working together on Earthsim.
We have started with a clean slate and are writing pretty much everything from scratch. But times have changed: there is so much open source available now that we can base many components on existing technologies. That leaves us free to focus on just the pieces we want to specialize in.
We have already set out some of our goals for the game and what the engine needs to do:
- Simulate an 80 km square tile of landscape at 16k × 16k resolution.
- Simulate over a million plants and trees across the 80 km square, down to centimeter accuracy.
Above is our first screenshot, taken from the first rendering prototype drawing a test landscape. Dave is building out the DirectX parts of the engine while I work on the model for how all our plants and trees grow.
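A quick back-of-envelope check of the targets listed above. It assumes that "16k" means 16,384 samples per side and that a global position is measured in meters from one corner of the tile; both are assumptions made here, not part of the plan.

$$\Delta x = \frac{80\,000\ \text{m}}{16\,384} \approx 4.88\ \text{m per heightfield sample}$$

$$\left\lceil \log_2 \frac{80\,000\ \text{m}}{0.01\ \text{m}} \right\rceil = \left\lceil \log_2\left(8 \times 10^{6}\right) \right\rceil = \lceil 22.93 \rceil = 23\ \text{bits per axis}$$

A 32-bit float has a 24-bit significand, so near the far edge of the tile (between $2^{16}$ and $2^{17}$ m) its spacing is $2^{16-23}\ \text{m} = 2^{-7}\ \text{m} \approx 0.78\ \text{cm}$, which is only just enough. Half precision, with an 11-bit significand and a maximum value of 65,504, cannot hold these global positions at all. This is one motivation for the local cell coordinates discussed below.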
Hierarchical grids and real time data compression
The most fundamental limit on performance is how fast a processor can get at its memory. It does not matter how fast your processing is: if data I/O takes longer than the computation, I/O speed sets the limit on performance.
Essentially, if you want to process big things fast, you have to make the memory use smaller.
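One simplified way to state this argument is a roofline-style model. The model and its symbols are an illustration added here, not something taken from the engine. For a pass that performs $W$ operations at throughput $F$ and moves $B$ bytes over a memory bus with bandwidth $\beta$:

$$T \approx \max\left(\frac{W}{F},\ \frac{B}{\beta}\right)$$

When the $B/\beta$ term dominates, raising $F$ changes nothing, while halving $B$ roughly halves $T$. For example, at a hypothetical 20 GB/s, one pass over a million plant positions stored as three 32-bit floats (12 MB) costs about 0.6 ms of memory traffic. The same positions stored as three bytes each (3 MB) cost about 0.15 ms.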
Storing vast data sets at high performance requires real-time data compression. Otherwise you can easily run out of cache and fail to read your data in fast enough to keep the CPU or GPU running at full speed. The processor stalls.
One way to achieve this is to use hierarchical grids at multiple resolutions, which divide the world into successively smaller cells, each with its own sub-coordinate system.
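A minimal sketch of locating a point in such a hierarchy. It assumes, hypothetically, that each level splits its parent cell 4 × 4, so level 7 cells are 80,000 / 4^7 ≈ 4.88 m across, matching a 16k heightfield. The names `CellSize`, `Locate` and `GridLocation` are illustrative and not the engine's API.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

constexpr double kTileSizeM = 80000.0;  // 80 km tile edge, in meters
constexpr int kSplit = 4;               // assumed children per axis at each level

// Result of locating a point at one grid level.
struct GridLocation {
    std::int32_t cellX, cellY;  // global cell index at this level
    double localX, localY;      // offset from the cell's corner, in meters
};

// Cell edge length at a level: level 0 is the whole tile, and each
// further level divides its parent by kSplit along each axis.
constexpr double CellSize(int level) {
    double size = kTileSizeM;
    for (int i = 0; i < level; ++i) size /= kSplit;
    return size;
}

// Find which cell a world-space point falls in at the given level,
// plus its offset inside that cell (always in [0, CellSize(level))).
GridLocation Locate(double x, double y, int level) {
    const double size = CellSize(level);
    const auto cx = static_cast<std::int32_t>(std::floor(x / size));
    const auto cy = static_cast<std::int32_t>(std::floor(y / size));
    return {cx, cy, x - cx * size, y - cy * size};
}
```

For example, `Locate(12345.6, 70000.0, 7)` lands in cell (2528, 14336) with a local offset of about (1.85 m, 0 m). Because an offset never exceeds one cell width, it needs far fewer bits than a global coordinate.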
These sub-coordinate systems let us store plant and creature positions relative to their closest grid cell. As a result we can often use half-precision floating point (half the size of a 32-bit float) and sometimes even 8-bit fixed point (a quarter of the size). That saving translates directly into performance gains.
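A minimal sketch of the 8-bit fixed-point case. It assumes a plant's cell is implied by which per-cell array it is stored in, so only its offset within the cell needs encoding; the struct and function names are hypothetical. Half precision is left out here because standard C++ only gained an optional `std::float16_t` in C++23.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// World-space position as three 32-bit floats: 12 bytes per plant.
struct PlantPosFloat { float x, y, z; };

// Offset within the plant's grid cell as three 8-bit fixed-point values: 3 bytes.
struct PlantPosFixed8 { std::uint8_t x, y, z; };

static_assert(sizeof(PlantPosFloat) == 12, "three 32-bit floats");
static_assert(sizeof(PlantPosFixed8) == 3, "three bytes, a 4x reduction");

// Map an offset in [0, cellSize) onto one of 256 equal bins;
// out-of-range input is clamped to the first or last bin.
std::uint8_t EncodeFixed8(double offset, double cellSize) {
    const double bin = std::floor(offset / cellSize * 256.0);
    return static_cast<std::uint8_t>(std::fmin(std::fmax(bin, 0.0), 255.0));
}

// Decode to the centre of the bin, so the round-trip error is at most cellSize / 512.
double DecodeFixed8(std::uint8_t q, double cellSize) {
    return (q + 0.5) * cellSize / 256.0;
}
```

With a 4.8828125 m cell, the worst-case error is 4.8828125 / 512 ≈ 0.95 cm. So in this sketch, 8-bit offsets at a 16k-resolution cell would still meet the centimeter target.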
Here is the design I just finished for our simulation grids. This shows all the different grid resolutions we are using to model and simulate our ecosystem.
These grids also make the algorithms easier to multithread for many-core parallel performance. They also let us tune the code precisely so that, by design, all the data for a complete processing loop fits in the first- and second-level CPU caches. The CPU then does not need to fetch new data from main memory until a given cell's work is complete and its results are written back out of the cache to main memory.
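A minimal sketch of that loop structure. Several details are hypothetical: the 256 KiB L2 budget, the 8-byte `PlantState` record and the placeholder growth rule. The sketch also assumes that cells can be updated independently within a step, with no cross-cell interactions; a real ecosystem step would have to handle neighbouring cells.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Assumed per-core L2 budget; real cache sizes vary by CPU.
constexpr std::size_t kL2BudgetBytes = 256 * 1024;

// Hypothetical compact per-plant record.
struct PlantState {
    std::uint8_t x, y, z;   // 8-bit fixed-point offsets within the cell
    std::uint8_t species;
    std::uint16_t ageDays;
    std::uint16_t biomass;  // fixed-point units
};
static_assert(sizeof(PlantState) == 8, "8 bytes per plant");

// Cap each cell at half the budget, leaving room for other per-cell data.
constexpr std::size_t kMaxPlantsPerCell = kL2BudgetBytes / 2 / sizeof(PlantState);

struct Cell {
    std::vector<PlantState> plants;  // contiguous, so a cell's data streams into cache together
};

bool FitsCacheBudget(const Cell& cell) {
    return cell.plants.size() <= kMaxPlantsPerCell;
}

// Finish all work on one cell before touching the next, so its data is
// loaded once, updated while cache-resident, then written back.
void StepCell(Cell& cell) {
    for (PlantState& p : cell.plants) {
        p.ageDays += 1;  // placeholder growth rule
        p.biomass += 1;
    }
}

// Assumes cells are independent within a step, so each thread can own a
// disjoint, interleaved subset of cells without locking.
void StepAllCells(std::vector<Cell>& cells, unsigned numThreads) {
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < numThreads; ++t) {
        workers.emplace_back([&cells, t, numThreads] {
            for (std::size_t i = t; i < cells.size(); i += numThreads) {
                StepCell(cells[i]);
            }
        });
    }
    for (std::thread& w : workers) w.join();
}
```

With these assumed numbers, a cell can hold up to 16,384 plants (128 KiB) and still leave half of the L2 budget free.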
We want all performance-critical code to run this way, so that performance is gated by first- or second-level cache speed rather than main-memory speed.