Data Read in Cassandra
During a read operation, Cassandra consults the following structures in order:
- Check the memtable
- If the memtable has the desired partition data, that data is read and then merged with the data from the SSTables.
- Check the row cache, if enabled
- The row cache improves performance for read-intensive workloads. When enabled, it keeps in memory a subset of the partition data stored on disk in the SSTables. If the requested partition data is in the row cache, it is read from there, which makes the read faster.
You can configure how many rows are stored in the row cache, which makes "last 100 records" style queries fast. When the row cache is full, it reclaims memory using an LRU (least-recently-used) policy.
- Check the Bloom filter
- Each SSTable has an associated Bloom filter, which is used to discover which SSTables are likely to hold the requested partition data. This speeds up partition key lookup.
- Check the partition key cache, if enabled
- A Bloom filter can return false positives, so an SSTable it identifies may not actually hold the data. If the Bloom filter does not rule out an SSTable, Cassandra checks the partition key cache.
The partition key cache stores a cache of the partition index. If the partition key is found in the cache, Cassandra goes directly to the compression offset map; otherwise it checks the partition summary.
- Partition Summary
- A partition index contains all partition keys, whereas a partition summary samples every Xth key and maps it to its location in the index file. For example, if the partition summary is set to sample every 100 keys, it stores the location of the first key as the beginning of the SSTable file, then the 100th key and its location in the file, and so on.
- Partition Index
- The partition index resides on disk and maps every partition key to its offset.
- Compression offset map
- The compression offset map stores pointers to the exact location on disk where the requested partition data will be found.
- Fetch the data from the SSTable on disk.
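
The lookup order above can be sketched as a small in-memory model. This is only a toy illustration, not Cassandra's implementation: `ToySSTable` and `read_partition` are hypothetical names, a Python set stands in for the Bloom filter (a real one can return false positives), list positions stand in for disk offsets, partitions are sorted by key rather than by token, and the newest value simply wins on merge instead of being resolved by timestamp.

```python
from bisect import bisect_right
from collections import OrderedDict


class ToySSTable:
    """Hypothetical stand-in for one SSTable; not Cassandra's on-disk format."""

    def __init__(self, rows, sample_every=2):
        # "Data file": partitions sorted by key; list position acts as the disk offset.
        self.data = sorted(rows.items())
        # Partition index: every key mapped to its offset.
        self.index = [(key, offset) for offset, (key, _) in enumerate(self.data)]
        # Partition summary: every Nth index entry as (key, position in the index).
        self.summary = [(self.index[i][0], i)
                        for i in range(0, len(self.index), sample_every)]
        # Assumption: an exact set replaces the probabilistic Bloom filter.
        self.bloom = set(rows)
        # Partition key cache: key -> offset, filled after a successful index lookup.
        self.key_cache = {}

    def might_contain(self, key):
        return key in self.bloom

    def read(self, key):
        offset = self.key_cache.get(key)
        if offset is None:
            # The summary narrows the search to one slice of the partition index.
            sampled_keys = [k for k, _ in self.summary]
            slot = bisect_right(sampled_keys, key) - 1
            if slot < 0:
                return None
            start = self.summary[slot][1]
            end = (self.summary[slot + 1][1]
                   if slot + 1 < len(self.summary) else len(self.index))
            for k, off in self.index[start:end]:
                if k == key:
                    offset = off
                    break
            else:
                return None
            self.key_cache[key] = offset
        # "Fetch from disk": the offset points straight at the partition.
        return self.data[offset][1]


def read_partition(key, memtable, row_cache, sstables, row_cache_size=100):
    newest = memtable.get(key, {})            # 1. memtable
    on_disk = row_cache.get(key)              # 2. row cache (SSTable data only)
    if on_disk is not None:
        row_cache.move_to_end(key)            # mark as most recently used
    else:
        on_disk = {}
        for sstable in sstables:              # ordered oldest to newest
            if sstable.might_contain(key):    # 3. Bloom filter
                on_disk.update(sstable.read(key) or {})  # 4+. cache, summary, index
        if on_disk:
            row_cache[key] = on_disk
            if len(row_cache) > row_cache_size:
                row_cache.popitem(last=False)  # evict the least recently used entry
    merged = {**on_disk, **newest}            # memtable values override SSTable values
    return merged or None


old = ToySSTable({"a": {"x": 1}, "c": {"x": 3}, "e": {"x": 5}})
new = ToySSTable({"c": {"y": 30}})
cache = OrderedDict()
print(read_partition("c", {"c": {"x": 99}}, cache, [old, new]))  # {'x': 99, 'y': 30}
```

After the first read of "c", a second read is served from the row cache, and each SSTable's key cache remembers the offset, so the summary and index are skipped for that key.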