Cassandra Architecture

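Cassandra is built on a masterless ring architecture: every node plays the same role, so there is no master node that can become a bottleneck or a single point of failure. The benefit of this design is that it is elegant and easy to set up and maintain.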

What is a Cluster?

A cluster is a group of servers and other resources that act as a single system and enable high availability. Clusters are usually deployed to improve performance and availability. A Cassandra cluster consists of the following parts:

Node: It is a single Cassandra instance running on a machine.
Rack: It is a set of nodes.
Data Center: It is a set of racks.
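These three levels nest inside each other: nodes belong to racks, and racks belong to data centers. A minimal sketch of that hierarchy, using a hypothetical `Node` record (not a Cassandra API) and made-up node, rack and data center names:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical record for one Cassandra instance and its location."""
    name: str
    rack: str
    dc: str


def group_topology(nodes):
    """Group nodes as data center -> rack -> list of node names."""
    topology = defaultdict(lambda: defaultdict(list))
    for node in nodes:
        topology[node.dc][node.rack].append(node.name)
    return topology


nodes = [
    Node("n1", "rack1", "dc1"),
    Node("n2", "rack1", "dc1"),
    Node("n3", "rack2", "dc1"),
    Node("n4", "rack1", "dc2"),
]
topology = group_topology(nodes)
for dc, racks in topology.items():
    for rack, names in racks.items():
        print(dc, rack, names)
```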

What is a Coordinator?

The coordinator is the node that receives a particular read or write request from the client; the client driver selects it according to the load-balancing policy you have configured. At first, requests are sent to the nodes the driver already knows about (its contact points). Once the driver connects and learns the cluster topology, it may switch to a closer coordinator.
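The switch from "whatever node I know" to "a closer node" can be sketched with a hypothetical policy that round-robins over nodes in the client's local data center. Real drivers ship their own policies (for example, `DCAwareRoundRobinPolicy` in the DataStax Python driver); the `LocalFirstPolicy` class and the addresses below are illustrative assumptions, not a driver API:

```python
from itertools import cycle


class LocalFirstPolicy:
    """Hypothetical policy: round-robin over nodes in the local data center,
    falling back to remote nodes only when no local node is known."""

    def __init__(self, local_dc):
        self.local_dc = local_dc
        self._rr = None

    def on_topology(self, nodes):
        """Update the known nodes; `nodes` is a list of (address, dc) pairs."""
        local = [addr for addr, dc in nodes if dc == self.local_dc]
        remote = [addr for addr, dc in nodes if dc != self.local_dc]
        self._rr = cycle(local or remote)

    def pick_coordinator(self):
        """Return the next node to use as coordinator for a request."""
        return next(self._rr)


policy = LocalFirstPolicy(local_dc="dc1")

# Before discovery the driver only knows its contact point, here in dc2.
policy.on_topology([("10.1.0.1", "dc2")])
first = policy.pick_coordinator()

# After connecting, it learns the full topology and prefers dc1 nodes.
policy.on_topology([("10.1.0.1", "dc2"), ("10.0.0.1", "dc1"), ("10.0.0.2", "dc1")])
later = [policy.pick_coordinator() for _ in range(4)]
print(first, later)
```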

The coordinator node is typically chosen based on network distance. The coordinator forwards the request to the replicas determined by the keyspace's Replication Factor (the number of nodes a write should be copied to) and enforces the Consistency Level (the number of replicas that must acknowledge a read or write request before it succeeds).
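The relationship between Replication Factor and Consistency Level is simple arithmetic. A minimal sketch, assuming a single data center and covering only a few common levels (the function names are hypothetical):

```python
def required_acks(level, rf):
    """Number of replicas that must acknowledge a request at `level`,
    assuming a single data center with replication factor `rf`."""
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if level in fixed:
        needed = fixed[level]
    elif level == "QUORUM":
        needed = rf // 2 + 1          # a strict majority of the replicas
    elif level == "ALL":
        needed = rf
    else:
        raise ValueError(f"level not covered by this sketch: {level}")
    if needed > rf:
        raise ValueError(f"{level} needs {needed} replicas but RF is {rf}")
    return needed


def request_succeeds(acks, level, rf):
    """The coordinator reports success once enough replicas have acknowledged."""
    return acks >= required_acks(level, rf)


rf = 3
for level in ("ONE", "QUORUM", "ALL"):
    print(level, required_acks(level, rf))
```

With RF = 3, QUORUM writes and QUORUM reads each need 2 replicas, and since 2 + 2 > 3 every quorum read overlaps at least one replica that acknowledged the latest quorum write.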

What is a Partitioner?

A partitioner determines how data is distributed across the nodes in the cluster. It is a hash function that computes a token from a row's partition key, and each row is placed in the cluster according to the value of that token.

For example, rows whose partition key tokens fall in the range 1 to 100 may reside on node A, rows with tokens from 101 to 200 on node B, and so on.
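The example above can be sketched as a sorted token ring. This assumes the toy tokens 100, 200 and 300 for nodes A, B and C, and the usual convention that a node owns the range from the previous node's token (exclusive) up to its own token (inclusive):

```python
import bisect

# Hypothetical ring from the example: node A holds token 100, B holds 200,
# C holds 300. Each node owns the range (previous token, its own token].
ring = [(100, "A"), (200, "B"), (300, "C")]
ring_tokens = [token for token, _ in ring]


def owner(token):
    """Return the node owning `token`: the first node whose token is >= it,
    wrapping around to the first node when `token` is past the last one."""
    i = bisect.bisect_left(ring_tokens, token)
    return ring[i % len(ring)][1]


for t in (1, 100, 101, 200, 250, 301):
    print(t, owner(t))
```

The wrap-around means the node with the smallest token also owns everything above the largest token, which is what turns the sorted token list into a ring.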

Cassandra supports the following partitioners:

Murmur3Partitioner: This is the default partitioner. It distributes data uniformly across the cluster based on MurmurHash hash values. The MurmurHash function creates a 64-bit hash value of the partition key, so the hash values range from -2^63 to +2^63 - 1.
RandomPartitioner: It distributes data uniformly across the cluster based on MD5 hash values. The hash values range from 0 to 2^127 - 1.
ByteOrderedPartitioner: It keeps data ordered lexically by the raw key bytes. It is included for backward compatibility.
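The difference between hashing and byte ordering can be sketched in Python. Murmur3 is not in the Python standard library, so this sketch uses MD5 folded into the range 0 to 2^127 - 1; it only approximates the idea behind RandomPartitioner and is not Cassandra's exact token computation. The function names and keys are hypothetical:

```python
import hashlib


def md5_token(key):
    """Approximate RandomPartitioner-style token: MD5 digest of the key,
    folded into 0 .. 2**127 - 1 (not Cassandra's exact computation)."""
    return int.from_bytes(hashlib.md5(key).digest(), "big") % (2**127)


def byte_ordered_token(key):
    """ByteOrderedPartitioner-style token: the raw key bytes themselves."""
    return key


keys = [b"user:001", b"user:002", b"user:003"]
print(sorted(keys, key=byte_ordered_token))  # lexical order of the keys
print(sorted(keys, key=md5_token))           # order set by the hash, not the key
```

Byte ordering keeps neighbouring keys on neighbouring tokens, which allows ordered range scans but can concentrate sequential keys on a few nodes; hashing scatters them evenly.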

What are Virtual Nodes?

Prior to Cassandra 1.2, the user had to manually calculate a single token and assign it to each node in the cluster. Each token determined the node's position in the ring and the portion of data, by hash value, that the node owned.
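Manually calculated tokens were usually spaced evenly by dividing the partitioner's token range by the number of nodes. A minimal sketch of that arithmetic for both hash partitioners; the function names are hypothetical:

```python
def murmur3_initial_tokens(num_nodes):
    """Evenly spaced single tokens in the Murmur3 range -2**63 .. 2**63 - 1."""
    return [(2**64 // num_nodes) * i - 2**63 for i in range(num_nodes)]


def random_initial_tokens(num_nodes):
    """Evenly spaced single tokens in the RandomPartitioner range 0 .. 2**127 - 1."""
    return [(2**127 // num_nodes) * i for i in range(num_nodes)]


print(murmur3_initial_tokens(4))
print(random_initial_tokens(4))
```

Adding a fifth node to such a cluster means recalculating and reassigning tokens, which is the manual work that vnodes remove.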

In Cassandra 1.2 and later, each node can own many tokens; this paradigm is called virtual nodes (vnodes). Vnodes allow each node to own a large number of small partition ranges distributed throughout the cluster. Data is still placed using consistent hashing, with each vnode token acting as one point on the ring.
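A minimal sketch of a vnode ring, assuming each node picks `num_tokens` random tokens from the Murmur3 range. The parameter name mirrors the `num_tokens` setting in cassandra.yaml; newer Cassandra versions also offer a token allocation algorithm instead of purely random selection. The helper functions are hypothetical:

```python
import bisect
import random

MIN_TOKEN, MAX_TOKEN = -2**63, 2**63 - 1  # Murmur3Partitioner token range


def build_vnode_ring(nodes, num_tokens, seed=42):
    """Give every node `num_tokens` random tokens; return a sorted ring."""
    rng = random.Random(seed)
    ring = []
    for node in nodes:
        for _ in range(num_tokens):
            ring.append((rng.randint(MIN_TOKEN, MAX_TOKEN), node))
    ring.sort()
    return ring


def owner(ring, token):
    """Consistent-hashing lookup: first vnode token >= token, wrapping around."""
    tokens = [t for t, _ in ring]
    i = bisect.bisect_left(tokens, token)
    return ring[i % len(ring)][1]


ring = build_vnode_ring(["A", "B", "C"], num_tokens=8)
for token, node in ring[:6]:
    print(node, token)  # ownership is interleaved around the ring
```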

The advantages of virtual nodes are:

Tokens are automatically calculated and assigned to each node.
The cluster rebalances itself when nodes are added or removed, without manual token reassignment.
Machines of different capacity can be mixed in one cluster, for example by giving more powerful machines a larger number of vnodes.
If a node fails, the load is spread evenly across other nodes in the cluster.
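The rebalancing and load-spreading advantages can be sketched by asking which existing nodes hand data to a joining node. Under the ownership rule used earlier, each new token takes over part of the range of the node holding the next token on the ring. The rings below are hypothetical toy examples with small integer tokens:

```python
import bisect


def donors(ring, new_tokens):
    """Nodes that hand ranges to a joining node.

    `ring` is a sorted list of (token, node). Each new token t takes over the
    range (previous token, t] from the node owning the next token >= t.
    """
    tokens = [t for t, _ in ring]
    result = set()
    for t in new_tokens:
        i = bisect.bisect_left(tokens, t)
        result.add(ring[i % len(ring)][1])
    return result


# Single token per node: the joining node splits one neighbour's range.
single = [(100, "A"), (200, "B"), (300, "C")]
print(donors(single, [150]))                 # {'B'}

# Toy vnode ring: A, B and C each own three interleaved tokens.
vnodes = sorted((t, n) for n, start in (("A", 0), ("B", 10), ("C", 20))
                for t in range(start, 90, 30))
print(sorted(donors(vnodes, [5, 15, 25])))   # ['A', 'B', 'C']
```

The same reasoning applies in reverse when a node fails: with one token per node its whole range falls on a single neighbour, while with vnodes its many small ranges are picked up by many different nodes.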

Tech Blog

Tuesday, 24 July 2018