Tuesday 24 July 2018

Cassandra Architecture: A Complete Guide

1. Objective

In our last Cassandra tutorial, we looked at Cassandra applications and why Cassandra is so popular. Today, we will learn about Cassandra architecture. Before we start, we should be familiar with some of its key terms.
So, let’s learn Cassandra architecture in detail.

2. Key Terms Of Cassandra Architecture

Below are some key terms used in the architecture of Cassandra:

a. Cassandra Nodes

A node is the basic unit of Cassandra: a single computer or server on which data is stored.

b. Cassandra Data Center

A Cassandra data center is a collection of related Cassandra nodes. It may correspond to a physical data center, that is, a centralized facility that houses the computing and networking systems serving an organization’s information technology needs.

c. Cassandra Rack

A rack is a unit that holds multiple servers stacked on top of one another. A node is a single server in a rack.

d. Cassandra Cluster

A collection of one or more data centers forms a Cassandra cluster. A cluster can span multiple physical locations.
Figure: Cassandra Architecture - Cassandra Cluster

e. Cassandra Commit log

Every write operation is first recorded in the commit log to ensure the durability of the data. Once the corresponding data has been flushed to an SSTable, the commit log entries can be archived, deleted, or recycled. The commit log serves as a crash-recovery mechanism.

f. MemTables

A memtable is a temporary in-memory structure where Cassandra buffers writes, including updates and deletions. Data is written to the memtable after it has been written to the commit log. Once a memtable is full, its data is flushed to disk to form an SSTable.

g. SSTables

SSTables (Sorted String Tables) are the immutable data files to which Cassandra periodically flushes memtables. Once written, an SSTable is never modified in place; new data goes into new SSTables, which keeps disk writes sequential. SSTables are maintained for each Cassandra table.
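The interplay of commit log, memtable, and SSTables described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not Cassandra's storage engine: the class `ToyNode`, its `memtable_limit` flush threshold, and the in-memory "disk" are all hypothetical names made up for illustration.

```python
class ToyNode:
    """Toy model of one node's write path; not Cassandra's real storage engine."""

    def __init__(self, memtable_limit=3):
        self.commit_log = []   # append-only record of writes, used for crash recovery
        self.memtable = {}     # in-memory buffer of recent writes
        self.sstables = []     # immutable, key-sorted snapshots standing in for disk files
        self.memtable_limit = memtable_limit  # hypothetical flush threshold

    def write(self, key, value):
        # 1. Append to the commit log first, for durability.
        self.commit_log.append((key, value))
        # 2. Apply the write to the memtable.
        self.memtable[key] = value
        # 3. Flush when the memtable is full.
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # An SSTable is written once, sorted by key, and never modified afterwards.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        # Flushed data no longer needs the commit log for recovery.
        self.commit_log = []

    def read(self, key):
        # Newest data wins: check the memtable, then SSTables from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):
            for k, v in sstable:
                if k == key:
                    return v
        return None


node = ToyNode(memtable_limit=2)
node.write("user:1", "alice")
node.write("user:2", "bob")      # memtable reaches its limit and is flushed
node.write("user:1", "alicia")   # newer value stays in the memtable for now
print(node.read("user:1"))       # alicia
print(len(node.sstables))        # 1
```

Note that the old value of `user:1` is still present in the first SSTable; it is simply shadowed by the newer value in the memtable.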

h. Data Replication

Imagine that one of the nodes in a data center goes down: part of the data would be lost. To overcome this, Cassandra keeps replicas of data on several nodes. This is called replication, and it provides fault tolerance and reliability.

3. Cassandra Architecture

Cassandra is designed with hardware failure in mind, so it has contingency mechanisms to survive such failures. Its nodes are logically arranged in a ring, and there are no master or slave nodes. Cassandra keeps replicas of data on several homogeneous nodes of the cluster, and the nodes exchange state information with one another every second. On each node, a sequentially written commit log captures write activity to ensure data durability. The data is then indexed and written to a memtable. Once the memtable is full, the data is written to disk as an SSTable data file. All data is partitioned and replicated to other nodes automatically. Through a process known as compaction, Cassandra periodically consolidates SSTables and removes outdated data and tombstones (markers for deleted data).
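As a rough illustration of compaction, the sketch below merges several SSTables into one and drops overwritten values and tombstones. It is a simplification under stated assumptions: the `compact` function and the `TOMBSTONE` marker are hypothetical, and the sketch decides which value is newest by table order, whereas Cassandra compares write timestamps and keeps tombstones for a grace period before removing them.

```python
TOMBSTONE = object()  # hypothetical marker for a deleted key


def compact(sstables):
    """Merge SSTables (oldest first) into one, keeping only the newest live value per key."""
    merged = {}
    for sstable in sstables:           # later tables overwrite earlier ones
        for key, value in sstable:
            merged[key] = value
    # Drop deleted keys, then emit the result sorted by key.
    live = {k: v for k, v in merged.items() if v is not TOMBSTONE}
    return tuple(sorted(live.items()))


old = (("a", 1), ("b", 2), ("c", 3))
new = (("b", 20), ("c", TOMBSTONE))   # "b" was updated, "c" was deleted
print(compact([old, new]))            # (('a', 1), ('b', 20))
```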
A client can send a read or write request to any node in the cluster. That node, called the coordinator, acts as a proxy between the client application and the nodes that hold the requested data.
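The coordinator role can be sketched as a lookup on a token ring. This is a minimal sketch under stated assumptions: `ToyRing`, `token_for`, the node names, and the token values are all hypothetical, and MD5 is used only as a convenient hash (Cassandra's default partitioner is based on Murmur3).

```python
import bisect
import hashlib


def token_for(key, ring_size=2**32):
    """Hash a partition key to a position on the ring (MD5 is for illustration only)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % ring_size


class ToyRing:
    def __init__(self, node_tokens):
        # node_tokens: hypothetical mapping of node name -> token that node owns
        self.entries = sorted((token, name) for name, token in node_tokens.items())
        self.tokens = [token for token, _ in self.entries]

    def owner(self, key):
        # The owner is the first node whose token is >= the key's token,
        # wrapping around to the first node at the end of the ring.
        index = bisect.bisect_left(self.tokens, token_for(key))
        return self.entries[index % len(self.entries)][1]

    def coordinate(self, contacted_node, key):
        # Any node can coordinate: it looks up the owner and forwards the request.
        return {"coordinator": contacted_node, "forwarded_to": self.owner(key)}


ring = ToyRing({"node-a": 0, "node-b": 2**30, "node-c": 2**31, "node-d": 3 * 2**30})
print(ring.coordinate("node-a", "user:42"))
print(ring.coordinate("node-c", "user:42"))  # same owner, different coordinator
```

Whichever node the client contacts, the request ends up at the same owner, which is why no node needs to act as a master.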

a. Data Replication

As we now know, Cassandra avoids a single point of failure by keeping replicas of data on several nodes. Two things are important for understanding the process correctly:
  1. Replication Factor: The replication factor is the number of copies of the data maintained on different nodes. A replication factor of 3 means that 3 copies of the data are maintained on 3 different nodes, so if 2 of those nodes go down, one copy of the data is still safe.
  2. Replication Strategy: There are two replication strategies.
Simple strategy: This strategy is used when there is only one data center. The first replica goes to the node chosen by the partitioner, and the remaining replicas are placed on the next nodes clockwise around the ring.
Figure: Cassandra Architecture - Simple Strategy
Network topology strategy: This strategy is used when the cluster spans, or may later span, more than one data center. It is highly recommended because it makes future expansion easier.
Figure: Cassandra Architecture - Network Topology Strategy
Here, replicas are placed separately for each data center: Cassandra walks the ring clockwise and places replicas on nodes in different racks of the same data center, continuing until it reaches the first node in another rack.
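The two placement strategies can be compared with a small sketch. It is a simplified model under stated assumptions, not Cassandra's actual placement code: the functions `simple_strategy` and `network_topology_strategy`, the six-node ring, and the node, data center, and rack names are all hypothetical.

```python
def simple_strategy(ring, start, rf):
    """Pick rf replicas by walking the ring clockwise from index `start`."""
    count = min(rf, len(ring))
    return [ring[(start + step) % len(ring)]["name"] for step in range(count)]


def network_topology_strategy(ring, start, rf_per_dc):
    """Pick replicas per data center, preferring racks not yet used in that data center."""
    chosen = {dc: [] for dc in rf_per_dc}
    used_racks = {dc: set() for dc in rf_per_dc}
    skipped = {dc: [] for dc in rf_per_dc}  # nodes passed over because their rack was taken
    for step in range(len(ring)):
        node = ring[(start + step) % len(ring)]
        dc = node["dc"]
        if dc not in rf_per_dc or len(chosen[dc]) >= rf_per_dc[dc]:
            continue
        if node["rack"] in used_racks[dc]:
            skipped[dc].append(node["name"])
        else:
            chosen[dc].append(node["name"])
            used_racks[dc].add(node["rack"])
    # If a data center has fewer racks than replicas, fall back to skipped nodes.
    for dc, rf in rf_per_dc.items():
        missing = max(rf - len(chosen[dc]), 0)
        chosen[dc].extend(skipped[dc][:missing])
    return chosen


# Hypothetical six-node ring, listed in clockwise token order.
ring = [
    {"name": "n1", "dc": "dc1", "rack": "r1"},
    {"name": "n2", "dc": "dc1", "rack": "r1"},
    {"name": "n3", "dc": "dc2", "rack": "r1"},
    {"name": "n4", "dc": "dc1", "rack": "r2"},
    {"name": "n5", "dc": "dc2", "rack": "r2"},
    {"name": "n6", "dc": "dc2", "rack": "r1"},
]

print(simple_strategy(ring, start=0, rf=3))
# ['n1', 'n2', 'n3']
print(network_topology_strategy(ring, start=0, rf_per_dc={"dc1": 2, "dc2": 2}))
# {'dc1': ['n1', 'n4'], 'dc2': ['n3', 'n5']}
```

In this example the simple strategy puts two replicas on the same rack of `dc1` (`n1` and `n2`), while the topology-aware version skips `n2` and picks `n4` on a different rack, so a single rack failure cannot take out every copy in that data center.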
So, this was all about Cassandra architecture and its key terms. We hope you liked our explanation.

4. Conclusion

Hence, we saw the Cassandra architecture. We discussed its key terms, such as Cassandra nodes, data centers, SSTables, memtables, the Cassandra cluster, and the commit log. We also looked at data replication, the replication factor, and replication strategies, including Simple Strategy and Network Topology Strategy. In the next article, we will learn about the Cassandra Data Model. If you have any questions, feel free to ask in the comment section.
