clustrix

Clustrix Sierra Clustered Database Engine

With Clustrix now publicly announcing customers, I wanted to read a bit about the Clustrix clustered database system.

The company homepage describes the product as:

  • scalable database appliances for Internet-scale workloads
  • Linearly Scalable: fully distributed, parallel architecture provides unlimited scale
  • SQL functionality: full SQL relational and data consistency (ACID) functionality
  • Fault-Tolerant: highly available providing fail-over, recovery, and self-healing
  • MySQL Compatible: seamless deployment without application changes.

All of this sounded pretty (too) good. And I’ve seen a very similar presentation for Xeround: Elastic, Always-on Storage Engine for MySQL.

So I continued my reading with the Sierra Clustered Database Engine whitepaper (PDF).

Here are my notes:

  • Sierra is composed of:
    • database personality module: translates queries into internal representation
    • distributed query planner and compiler
    • distributed shared-nothing execution engine
    • persistent storage
    • NVRAM transactional storage for journal changes
    • inter-node Infiniband
  • queries are decomposed into query fragments which are the unit of work. Query fragments are sent for execution to nodes containing the data.
  • query fragments are atomic operations that can:
    • insert, read, update data
    • execute functions and modify control flow
    • perform synchronization
    • send data to other nodes
    • format output
  • query fragments can be executed in parallel
  • query fragments can be cached with parameterized constants at the node level
  • determining where to send the query fragments for execution is done using either range-based rules or a hash function
  • tables are partitioned into slices, each slice having redundancy replicas
    • size of slices can be automatically determined or configured
    • adding new nodes to the cluster results in rebalancing slices
    • slices contained on a failed device are reconstructed using their replicas
  • one of the slices is considered primary
  • writes go to all replicas and are transactional
  • all reads go to the slice primary
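
To make the routing and replica notes above concrete for myself, here is a minimal sketch in Python. It is not Clustrix code: the `Slice` class, the helper names, and the convention that the first replica in the list is the primary are all my own assumptions for illustration. It only shows the two routing styles the whitepaper mentions (range-based rules and a hash function), plus "reads go to the primary, writes go to all replicas".

```python
import bisect
import hashlib
from dataclasses import dataclass


@dataclass
class Slice:
    """One partition of a table, stored as several replicas (hypothetical model)."""
    slice_id: int
    replicas: list[str]  # node names; replicas[0] is treated as the primary (assumption)

    @property
    def primary(self) -> str:
        return self.replicas[0]


def hash_route(key: int, slices: list[Slice]) -> Slice:
    """Pick a slice by hashing the key; sha256 keeps the choice stable across processes."""
    digest = hashlib.sha256(str(key).encode()).digest()
    return slices[int.from_bytes(digest[:8], "big") % len(slices)]


def range_route(key: int, upper_bounds: list[int], slices: list[Slice]) -> Slice:
    """Pick a slice by range: slice i holds keys below upper_bounds[i], the last slice holds the rest.

    Expects len(slices) == len(upper_bounds) + 1 and upper_bounds sorted ascending.
    """
    return slices[bisect.bisect_right(upper_bounds, key)]


def read_target(s: Slice) -> str:
    """Reads are served by the slice primary only."""
    return s.primary


def write_targets(s: Slice) -> list[str]:
    """Writes must reach every replica of the slice."""
    return list(s.replicas)
```

With three slices and range bounds `[100, 200]`, `range_route(10, ...)` lands on the first slice, so a query fragment for `uid=10` would be sent to that slice's primary node.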

The paper also walks through the execution of four different queries:

SELECT * FROM T1 WHERE uid=10

SELECT uid, name FROM T1 JOIN T2 on T1.gid = T2.gid WHERE uid=10

SELECT * FROM T1 WHERE uid<100 and gid>10 ORDER BY uid LIMIT 5

INSERT INTO T1 VALUES (10,20)
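
As a way to picture how the third query (`ORDER BY uid LIMIT 5`) might run as parallel fragments, here is a generic scatter-gather sketch in Python. This is my own illustration of the general technique, not the plan described in the whitepaper: the function names and the in-memory row lists standing in for slices are assumptions. Each slice filters and keeps its own five smallest rows, and a coordinating step merges the partial results.

```python
import heapq
from itertools import islice
from typing import Iterable

Row = tuple[int, int]  # (uid, gid), a simplified stand-in for a row of T1


def fragment_top_n(rows: Iterable[Row], n: int) -> list[Row]:
    """Per-slice fragment: apply uid<100 and gid>10 locally, return at most n rows sorted by uid."""
    matching = (r for r in rows if r[0] < 100 and r[1] > 10)
    return heapq.nsmallest(n, matching, key=lambda r: r[0])


def merge_top_n(partials: list[list[Row]], n: int) -> list[Row]:
    """Coordinator step: merge the already-sorted partial results and keep the first n rows."""
    merged = heapq.merge(*partials, key=lambda r: r[0])
    return list(islice(merged, n))


# Three hypothetical slices of T1, each living on a different node.
slice_a = [(5, 20), (150, 30), (7, 5), (50, 11)]
slice_b = [(1, 12), (3, 50), (99, 11), (2, 9)]
slice_c = [(4, 40), (6, 60), (8, 80)]

partials = [fragment_top_n(s, 5) for s in (slice_a, slice_b, slice_c)]
result = merge_top_n(partials, 5)
```

The point of the per-slice limit is that no node ever ships more than five rows, regardless of how large its slice is.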

Questions:

  • who is coordinating transactions that may be executed on different nodes?
  • who maintains the topology of the slices? In case of a node failure, you’d need to determine:
    1. what slices were on the failing node
    2. where are the replicas for each of these slices
    3. where new replicas will be created
    4. when will new replicas become available for writes
  • who elects the slice primary?
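
To show the kind of bookkeeping the node-failure question implies, here is a small Python sketch. It does not answer how Clustrix actually does it (that is exactly what I am asking); the slice-to-nodes map, the `plan_recovery` name, and the "least loaded node" placement rule are all assumptions of mine. It covers the first three steps: which slices were on the failed node, where their surviving replicas are, and where a new replica could be created.

```python
def plan_recovery(
    topology: dict[int, list[str]], failed: str, nodes: list[str]
) -> dict[int, dict]:
    """For every slice that had a replica on `failed`, list survivors and pick a new home.

    topology maps slice_id -> node names holding a replica (hypothetical structure).
    """
    # Count replicas currently held by each surviving node.
    load = {n: 0 for n in nodes if n != failed}
    for replicas in topology.values():
        for n in replicas:
            if n in load:
                load[n] += 1

    plan = {}
    for slice_id, replicas in sorted(topology.items()):
        if failed not in replicas:
            continue  # this slice is unaffected
        survivors = [n for n in replicas if n != failed]
        # A new replica must go to a node that does not already hold this slice.
        candidates = [n for n in load if n not in survivors]
        target = min(candidates, key=lambda n: (load[n], n)) if candidates else None
        if target is not None:
            load[target] += 1
        plan[slice_id] = {"survivors": survivors, "new_replica_on": target}
    return plan


topology = {0: ["n1", "n2"], 1: ["n2", "n3"], 2: ["n3", "n1"]}
plan = plan_recovery(topology, failed="n2", nodes=["n1", "n2", "n3", "n4"])
```

Even this toy version needs a single consistent view of the topology, which is why I would like to know which component owns it.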

Original title and link: Clustrix Sierra Clustered Database Engine (NoSQL databases © myNoSQL)

Rakuten's New DataCenter Infrastructure

These are my notes from a session I attended at Rakuten Technology Conference 2013.

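The talk was about Rakuten’s overhauled infrastructure and its database platform.

Infrastructure

  • Moved from physical servers to virtual servers.
  • The network had a deeply branching, tree-like layout, so it was flattened to reduce the nesting. This reportedly changed things from $10,000 / 6 weeks to $1,800 / 5 days.

Database platform

For MySQL, they had built many sets of one master with N slaves, but the number of databases the applications had to reference had grown very large and complicated, and 90% of the CPU resources sat unused. By switching to Clustrix, applications can reference a single database server, and the database forms its own internal network to balance the load. Providing a management tool made operations easier.

After listening

Clustrix does not appear to use a master-slave configuration. Multi-master is hard to do with MySQL, so this certainly seems effective for a site with as much traffic as Rakuten.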

Clustrix: Creating the World’s Leading NewSQL Database

Clustrix is the developer of the Sierra NewSQL database engine. The company is headquartered in San Francisco, California. It was founded by Paul Mikesell (formerly with EMC Isilon) and Sergei Tsarev (who developed the Simple Time-series Database). Robin Purohit (formerly with HP) heads the company. The privately held company is backed by ATA Ventures, US Venture Partners, and Sequoia Capital. It has offices in London and Seattle, and markets the Sierra database engine pre-installed on servers as an appliance under the Clustrix brand name.

Clustrix’s vision was to create a “limitless” database: limitless in database size, limitless in table size, and limitless in query complexity and performance. The database provides dynamic online scaling, flawless fault tolerance, full transactional and relational capabilities, and MySQL wire-protocol compatibility, while appearing to applications as a single-instance database.

Clustrix designed the database to be simple while still delivering a fault-tolerant appliance with high speed and scale, making it a good replacement for an existing MySQL infrastructure. For an online business, Clustrix offers a simple approach to scaling for large numbers of users, data, and transactions, with full ACID compliance and without database sharding. This has earned Clustrix a steadily growing client base.

This is the way Clustrix has reinvented the relational database from the bottom up. It is safe to say that the company has created an entirely new category in the database market. Clustrix is positioned as the first scalable SQL database system capable of handling applications with huge volumes of transactional data.

In one instance, CEO Purohit said, “As more global clients deploy Clustrix, we see our vision become reality—a radically simple and scalable distributed database seamlessly deployed and scaled, so our clients can focus 100% on innovation.”

via TechCrunch

Databases are the spine of the tech industry: unsung, invisible, but critical–and beyond disastrous when they break or are deformed. This makes database people cautious. For years, only the Big Three–Oracle, IBM’s DB2, and (maybe) SQL Server–were serious options. Then the open-source alternatives–MySQL, PostgreSQL–became viable. …And then, over the last five years, things got interesting.

Some history: around the turn of this millennium, more and more people began to recognize that formal, structured, normalized relational databases, interrogated by variants of SQL, often hindered rather than helped development. Over the following decade, a plethora of new databases bloomed, especially within Google, which had a particular need for web-scale datastore solutions: hence BigTable, Megastore and Spanner.

Meanwhile, Apache brought us Cassandra, HBase, and CouchDB; Clustrix offered a plug-and-play scalable MySQL replacement; Redis became a fundamental component of many Rails (and other) apps; and, especially, MongoDB became extremely popular among startups, despite vociferous criticism — in particular, of its write lock which prevented concurrent write operations across entire databases. This will apparently soon be much relaxed, after which there will presumably be much rejoicing. (For context: I’m a developer, and have done some work with MongoDB, and I’m not a fan.)

As interesting as these new developments–called “NoSQL databases”–were, though, only bleeding-edge startups and a tiny handful of other dreamers were really taking them seriously. Databases are beyond mission-critical, after all. If your database is deformed, you’re in real trouble. If your database doesn’t guarantee the integrity of its data and your transactions–i.e. if it doesn’t substantially support what are known as “ACID transactions”–then real database engineers don’t take it seriously:

MongoDB is not ACID compliant. Neither is Cassandra. Neither is Riak. Neither is Redis. Etc etc etc. In fact, it was sometimes claimed that NoSQL databases were fundamentally incompatible with ACID compliance. This isn’t true — Google’s Megastore is basically ACID compliant, and their Spanner is even better — but you can’t use Megastore outside of Google unless you’re willing to build your entire application on their idiosyncratic App Engine platform.

Which is why I was so intrigued a couple of years ago when I stumbled across a booth at TechCrunch Disrupt whose slogan was “NoSQL, YesACID.” It was hosted by a company named FoundationDB, who have performed the remarkable achievement of building an ACID-compliant[1] key-value datastore while also providing a standard SQL access layer on top of that. Earlier this week they announced the release of FoundationDB 3.0, a remarkable twenty-five times faster than their previous version, thanks to what their co-founder and COO compares to a “heart and lungs transplant” for their engine. This new engine scales up to a whopping 14.4 million writes per second.

That is quite a feat of engineering. To quote their blog post, this isn’t just 14 million writes per second, it’s 14 million “in a fully-ordered, fully-transactional database with 100% multi-key cross-node transactions […] in the public cloud […] Said another way, FoundationDB can do 3.6 million database writes per penny.”

Impressive stuff. Impressive enough to capture the attention of enterprise database engineers, maybe. And obviously a great fit with the forthcoming Internet of Things, and the enormous amount of data that billions of connected devices will soon be constantly capturing.

But most importantly, this will push their competitors to do even better — which, in turn, will hopefully nudge the enormous numbers of enterprises still in the database Bronze Ages, running off Oracle and DB2, to consider maybe, just maybe, beginning to slowly, cautiously, carefully move into the bold new present day, in which developers are spoiled with simple key-value semantics, the full power of classic SQL queries, and distributed ACID transactions, all at the same time. In the long run that will make life better. In the interim, hats off to all the unsung database engineers out there pushing the collective envelope. You may not realize it, but they’re doing us all a huge service.

[1] If you click through you’ll note they elide discussion of the “C” in ACID, “consistency.” Suffice to say that the discussion of consistency is abstruse enough to make medieval debates about angels on the head of a pin sound like knock-knock jokes; but for the technically inclined, they are strongly rather than merely eventually consistent.

Using as a pretext a comparison with MongoDB — why MongoDB? — Sergei Tsarev provides some details about Clustrix data distribution, fault tolerance, and availability models.

At Clustrix, we think that Consistency, Availability, and Performance are much more important than Partition tolerance. Within a cluster, Clustrix keeps availability in the face of node loss while keeping strong consistency guarantees. But we do require that more than half of the nodes in the cluster group membership are online before accepting any user requests. So a cluster provides fully ACID compliant transactional semantics while keeping a high level of performance, but you need majority of the nodes online.
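
The "more than half of the nodes" rule in the quote is easy to state as code. The sketch below is only my reading of that sentence, with hypothetical function names; it says nothing about how Clustrix tracks group membership internally.

```python
def accepts_requests(online_nodes: int, cluster_size: int) -> bool:
    """True when strictly more than half of the cluster membership is online."""
    if cluster_size <= 0 or not 0 <= online_nodes <= cluster_size:
        raise ValueError("need 0 <= online_nodes <= cluster_size and cluster_size > 0")
    return 2 * online_nodes > cluster_size


def tolerated_failures(cluster_size: int) -> int:
    """Largest number of nodes that can be lost while a strict majority stays online."""
    return (cluster_size - 1) // 2
```

Under this reading, a 5-node cluster keeps serving with 3 nodes up, while a 4-node cluster with only 2 nodes up does not, since exactly half is not a majority.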

Original title and link: Clustrix: Distribution, Fault Tolerance, and Availability Models (NoSQL databases © myNoSQL)

This made the rounds yesterday. And it got some long comments on both Hacker News and Reddit.

While I haven’t gone through the benchmark details, the first thing that made me raise an eyebrow was this comment early in the post:

Well, that’s just bullshit. There is absolutely nothing about SQL or the relational model preventing it from scaling out.

I’m afraid I’ll have to disagree with the second part.

Original title and link: MongoDB vs Clustrix Performance Comparison (NoSQL databases © myNoSQL)