Composite Partition Key in Cassandra

A composite partition key uses two or more columns to determine which partition, and therefore which node, a row is stored on. Composite partition keys are useful when the data under a single-column key would grow too large for one partition. Splitting the data into smaller partitions also helps when a Cassandra cluster experiences hotspotting, that is, congestion caused by repeatedly writing data to the same node.
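To make the splitting effect concrete, here is a minimal Python sketch. It only counts how many rows share a partition key; it does not model Cassandra's partitioner or token ownership, which decide the actual node placement. The sample rows and the helper name rows_per_partition are hypothetical and chosen for illustration.

```python
from collections import Counter

# Hypothetical sample data: 100 movies from a single year, spread over four genres.
GENRES = ["horror", "comedy", "drama", "action"]
rows = [(2016, GENRES[i % 4], f"movie_{i}") for i in range(100)]


def rows_per_partition(rows, key_columns):
    """Count rows per partition when the partition key is built from the
    given column positions (0 = year, 1 = genre)."""
    return Counter(tuple(row[c] for c in key_columns) for row in rows)


# Single-column partition key: every 2016 row lands in the same partition.
by_year = rows_per_partition(rows, key_columns=(0,))

# Composite partition key: the same rows are split into one partition per genre.
by_year_genre = rows_per_partition(rows, key_columns=(0, 1))

print(by_year)
print(by_year_genre)
```

With the year alone as the key, all rows pile up in one partition. Adding genre to the key yields four smaller partitions that the cluster is free to place on different nodes.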

Data is retrieved using the partition key. Unless a secondary index is used, a query must supply values for all columns defined in the partition key. A composite partition key is defined inside an extra pair of parentheses in the PRIMARY KEY clause:

CREATE TABLE movies_by_year_genre (
  year int,
  genre text,
  movie_name text,
  PRIMARY KEY ((year, genre), movie_name)
);

Here (year, genre) is the composite partition key for this table, and both columns must be supplied to read the data. The general rule is that a query must restrict every partition key column; after that, clustering columns (here movie_name) may be added in the order they are declared. For example, valid queries are:

SELECT * FROM movies_by_year_genre where year=2016 and genre='horror';

SELECT * FROM movies_by_year_genre where year=2016 and genre='horror' and movie_name='xyz';

Invalid queries, each of which omits at least one partition key column, are:

SELECT * FROM movies_by_year_genre where year=2016;

SELECT * FROM movies_by_year_genre where genre='horror' and movie_name='xyz';

SELECT * FROM movies_by_year_genre where year=2016 and movie_name='xyz';
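The rule behind the valid and invalid queries above can be sketched as a small Python check. This is a simplified model, not Cassandra's actual query validation: it assumes equality restrictions only, no secondary indexes, and no ALLOW FILTERING. The function name is_valid_where is hypothetical.

```python
# Key layout of the movies_by_year_genre table from this post.
PARTITION_KEY = ("year", "genre")
CLUSTERING_KEY = ("movie_name",)


def is_valid_where(restricted, partition_key=PARTITION_KEY,
                   clustering_key=CLUSTERING_KEY):
    """Return True if a WHERE clause restricting exactly these columns
    (by equality) follows the simplified rule described above."""
    restricted = set(restricted)

    # Only key columns may be restricted in this simplified model.
    if not restricted <= set(partition_key) | set(clustering_key):
        return False

    # Every partition key column must be supplied.
    if not set(partition_key) <= restricted:
        return False

    # Clustering columns must be restricted in declared order, without gaps.
    gap_seen = False
    for column in clustering_key:
        if column in restricted:
            if gap_seen:
                return False
        else:
            gap_seen = True
    return True


# The five queries from this post, reduced to the columns they restrict.
print(is_valid_where(["year", "genre"]))
print(is_valid_where(["year", "genre", "movie_name"]))
print(is_valid_where(["year"]))
print(is_valid_where(["genre", "movie_name"]))
print(is_valid_where(["year", "movie_name"]))
```

The first two calls correspond to the valid queries and the last three to the invalid ones. In each invalid case the check fails at the partition key step, before the clustering columns are considered.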

Tech Blog

Tuesday, 24 July 2018