01 October 2008

Ensure a database doesn't undo all your hard work in increasing concurrency

Caching, Parallelism and Scalability | Javalobby

So you make your application super scalable... don't just stick a database in the way....

Databases touch disks, and disks inherently involve sequential, serial processing; and there is no way you can change your database vendor's code. Any system that makes more than the simplest, most infrequent use of a database faces a massive bottleneck thanks to this serialization. Databases are just one common example of process serialization; there are others. Serialization is the real enemy here: it undoes the throughput gains parallelism has to offer, letting only a single core in a multi-core system work at a time and limiting usable computing power. This gets even worse in a cluster or grid. A bigger, beefier database server eases the pain somewhat, but it doesn't solve the underlying problem: as you add more servers to your grid, process serialization is still going on inside your database server.
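The article doesn't name it, but Amdahl's law puts a number on this: if a fraction s of the work is serialized, speedup can never exceed 1/s no matter how many cores or servers you add. A quick back-of-the-envelope sketch (the numbers here are illustrative, not from the article):

/** Amdahl's law: the speedup ceiling imposed by a serialized fraction. */
public class Amdahl {

    // speedup = 1 / (s + (1 - s) / n), where s is the serial fraction
    // and n is the number of cores working on the parallel part.
    static double speedup(double serialFraction, int cores) {
        return 1.0 / (serialFraction + (1.0 - serialFraction) / cores);
    }

    public static void main(String[] args) {
        // If just 10% of the work is serialized behind the database,
        // 8 cores buy you only ~4.7x, 64 cores ~8.8x, and even
        // infinitely many cores cap out at 10x.
        System.out.printf("8 cores:  %.1fx%n", speedup(0.10, 8));
        System.out.printf("64 cores: %.1fx%n", speedup(0.10, 64));
    }
}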

Distributed caches are a solution to this problem. They are simple to implement where the data is immutable (read-only); things get more complex when the distributed data is mutable. Ideally, the cache dynamically distributes data across nodes based on runtime usage.
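To make the read-through idea concrete, here is a minimal single-node sketch of the easy, immutable case; a real distributed cache adds replication and cross-node invalidation on top of this. The Loader interface and the lambda standing in for a database SELECT are hypothetical:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/**
 * Read-through cache: concurrent lookups are served from memory, so
 * only a cache miss pays the serialized trip to the database.
 */
public class ReadThroughCache<K, V> {

    /** Stand-in for whatever actually fetches a row, e.g. a SELECT. */
    public interface Loader<K, V> {
        V load(K key);
    }

    private final ConcurrentMap<K, V> cache = new ConcurrentHashMap<>();
    private final Loader<K, V> loader;

    public ReadThroughCache(Loader<K, V> loader) {
        this.loader = loader;
    }

    public V get(K key) {
        // Readers of different keys proceed in parallel; only a miss
        // touches the loader (and therefore the database).
        return cache.computeIfAbsent(key, loader::load);
    }

    // Mutable data is where it gets hard: once a row can change, every
    // node's cached copy has to be invalidated or updated in step.
    public void invalidate(K key) {
        cache.remove(key);
    }

    public static void main(String[] args) {
        ReadThroughCache<Integer, String> users =
                new ReadThroughCache<>(id -> "user-" + id); // fake "database"
        System.out.println(users.get(42)); // miss: hits the loader
        System.out.println(users.get(42)); // hit: served from memory
    }
}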