Using Collections in Cassandra

Cassandra provides collection types as a way to group and store related data together in a single column. Use a collection when the amount of data to be stored in it is limited. If the data can grow without bound, use a table with a compound primary key instead, where the data is stored in the clustering columns. The supported collection types are:

list
set
map
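For the unbounded case mentioned above, the following is a minimal sketch of the compound-primary-key alternative. The table name user_emails and its columns are hypothetical, chosen only for illustration; each email becomes its own row, keyed by the clustering column, instead of an element of a collection.

```cql
-- Hypothetical table: one row per email instead of one set column.
-- user_id is the partition key, email is the clustering column.
CREATE TABLE user_emails (
  user_id text,
  email text,
  PRIMARY KEY (user_id, email)
);

-- Each address is inserted as a separate row in the same partition.
INSERT INTO user_emails (user_id, email) VALUES ('123', 'ajay1@gmail.com');
INSERT INTO user_emails (user_id, email) VALUES ('123', 'ajay2@gmail.com');

-- Read all addresses for one user.
SELECT email FROM user_emails WHERE user_id = '123';
```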

Using the set type

Create a table with a set column:

CREATE TABLE users (
  id text PRIMARY KEY,
  name text,
  emails set<text>
);

Insert the data into the table:

INSERT INTO users (id, name, emails)
  VALUES ('123', 'Ajay', {'ajay1@gmail.com', 'ajay2@gmail.com'});

Select the record from the table:

SELECT id, emails FROM users WHERE id = '123';

This returns the following result:

 id  | emails
-----+-----------------------------------------
 123 | {'ajay1@gmail.com', 'ajay2@gmail.com'}

You can add more emails:

UPDATE users
  SET emails = emails + {'ajay3@gmail.com'} WHERE id = '123';

You can delete an email:

UPDATE users
  SET emails = emails - {'ajay3@gmail.com'} WHERE id = '123';
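Two further set operations may be useful here. The sketch below reuses the users table from above: it shows that adding a value already in the set has no effect, since set elements are unique, and two ways to empty the set.

```cql
-- Adding an address that is already present leaves the set unchanged.
UPDATE users SET emails = emails + {'ajay1@gmail.com'} WHERE id = '123';

-- Either of these statements removes every element from the set.
UPDATE users SET emails = {} WHERE id = '123';
DELETE emails FROM users WHERE id = '123';
```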

Using the list type

When the order of elements matters, use the list type. A list stores its elements in the order in which they were added, and it can contain duplicate values. For example, add a new column place_visited to the users table:

ALTER TABLE users ADD place_visited list<text>;

Add some data:

UPDATE users
  SET place_visited = [ 'agra', 'delhi' ] WHERE id = '123';

You can append or prepend elements to the list:

UPDATE users
  SET place_visited = place_visited + [ 'mumbai' ] WHERE id = '123';

UPDATE users
  SET place_visited = [ 'kolkata' ] + place_visited WHERE id = '123';

You can set the element at a particular position. This replaces the existing element at that index, so the index must already exist in the list:

UPDATE users SET place_visited[2] = 'jaipur' WHERE id = '123';

When you set an element at a particular position, Cassandra reads the entire list and then writes only the updated element. This results in greater latency than appending or prepending an element to a list.

Delete the element at a particular position:

DELETE place_visited[3] FROM users WHERE id = '123';
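Besides deleting by position, a list element can be removed by value. The sketch below reuses the place_visited column from above; note that it removes every occurrence of the value. As far as the Cassandra documentation describes it, this operation also involves an internal read before the write, so it is likely to be slower than a plain append or prepend.

```cql
-- Remove all occurrences of 'agra' from the list.
UPDATE users
  SET place_visited = place_visited - [ 'agra' ] WHERE id = '123';
```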

Using the map type

A map stores data as key-value pairs. Keys are unique, and given a key you can look up its value quickly. Each element can have its own time-to-live (TTL) and expires when that TTL ends. Internally, each element of the map is stored as one Cassandra column. Add a new column, todo, to the users table:

ALTER TABLE users ADD todo map<timestamp, text>;

Add some values to the user's todo map:

UPDATE users
  SET todo =
  { '2016-1-2 17:00' : 'Buy icecream',
    '2016-1-2 12:00' : 'Call seller' }
  WHERE id = '123';

Update the user's todo map:

UPDATE users SET todo['2016-1-2 12:00'] = 'call seller2'
  WHERE id = '123';

Delete an element from the map:

DELETE todo['2016-1-2 12:00'] FROM users WHERE id = '123';

View the map with a SELECT statement:

SELECT id, todo FROM users WHERE id = '123';
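The per-element time-to-live mentioned earlier in this section can be sketched as follows. The key, the value, and the TTL of 86400 seconds (one day) are hypothetical, chosen only for illustration; the TTL applies to the element written by this statement, not to the rest of the map.

```cql
-- Add one map entry that expires 86400 seconds (one day) after the write.
-- Other entries in the todo map keep their own expiry, if any.
UPDATE users USING TTL 86400
  SET todo['2016-1-3 09:00'] = 'Pay bill'
  WHERE id = '123';
```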

Limitations

The maximum number of keys for a map collection is 65,535.
The maximum size of an item in a set collection is 65,535 bytes.
Cassandra can query up to 2 billion items in a collection, so do not insert more than that many elements.
The maximum size of an item in a list or a map collection is 2GB.
Cassandra reads a collection in its entirety, so keep collections small to prevent delays during querying. The collection is not paged internally.

Tech Blog

Tuesday, 24 July 2018