Clustrix Sierra Clustered Database Engine
In the light of publicly announcing customers, I wanted to read a bit about Clustrix Clustered Database Systems.
The company homepage is describing the product:
- scalable database appliaces for Internet-scale work loads
- Linearly Scalable: fully distributed, parallel architecture provides unlimited scale
- SQL functionality: full SQL relational and data consistency (ACID) functionality
- Fault-Tolerant: highly available providing fail-over, recovery, and self-healing
- MySQL Compatible: seamless deployment without application changes.
All these sounded pretty (
too) good. And I’ve seen a very similar presentation for Xeround: Elastic, Always-on Storage Engine for MySQL.
So, I’ve continued my reading with the Sierra Clustered Database Engine whitepaper (PDF).
Here are my notes:
- Sierra is composed of:
- database personality module: translates queries into internal representation
- distributed query planner and compiler
- distributed shared-nothing execution engine
- persistent storage
- NVRAM transactional storage for journal changes
- inter-node Infiniband
- queries are decomposed into query fragments which are the unit of work. Query fragments are sent for execution to nodes containing the data.
- query fragments are atomic operations that can:
- insert, read, update data
- execute functions and modify control flow
- perform synchronization
- send data to other nodes
- format output
- query fragments can be executed in parallel
- query fragments can be cached with parameterized constants at the node level
- determining where to sent the query fragments for execution is done using either range-based rules or hash function
- tables are partitioned into slices, each slice having redundancy replicas
- size of slices can be automatically determined or configured
- adding new nodes to the cluster results in rebalancing slices
- slices contained on a failed device are reconstructed using their replicas
- one of the slices is considered primary
- writes go to all replicas and are transactional
- all reads fo the the slice primary
The paper also exemplifies the execution of 4 different queries:
SELECT * FROM T1 WHERE uid=10 SELECT uid, name FROM T1 JOIN T2 on T1.gid = T2.gid WHERE uid=10 SELECT * FROM T1 WHERE uid<100 and gid>10 ORDER BY uid LIMIT 5 INSERT INTO T1 VALUES (10,20)
- who is coordinating transactions that may be executed on different nodes?
- who is maintains the topology of the slices? In case of a node failure, you’d need to determine:
- what slices where on the failing node
- where are the replicas for each of these slices
- where new replicas will be created
- when will new replicas become available for writes
- who elects the slice primary?