Cassandra also supports collections in its data model for storing small amounts of data. Collections are complex types that provide considerable flexibility. Three collections are supported: set, list, and map. The type of data stored in each collection must be declared; for example, a set of timestamps is defined as set<timestamp>, a list of text values as list<text>, a map with a text key and a text value as map<text, text>, and so on. Also, only native data types can be used in collections.
Cassandra reads a collection in its entirety; the collection is not paged internally. A collection can hold at most 64K items, and each item can be at most 64K in size.
To better demonstrate the CQL support for these collections, let us create a table in the packt keyspace with one column of each collection type and insert some data into it, as shown in the following screenshot:
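A minimal sketch of such a table and insert follows. It assumes the packt keyspace already exists. The table name collections_demo, the rowkey column, and the mapfield column are illustrative names chosen here; setfield and listfield follow the column names used later in this section.

```cql
-- Hypothetical table: one column per collection type.
CREATE TABLE packt.collections_demo (
    rowkey    ascii PRIMARY KEY,
    setfield  set<text>,
    listfield list<text>,
    mapfield  map<text, text>
);

-- Set and map literals use curly braces; list literals use square brackets.
INSERT INTO packt.collections_demo (rowkey, setfield, listfield, mapfield)
VALUES (
    '1',
    {'Lemon', 'Orange', 'Apple'},
    ['Lemon', 'Orange', 'Apple'],
    {'fruit1': 'Apple', 'fruit2': 'Orange', 'fruit3': 'Lemon'}
);

-- Read the row back to inspect how each collection is returned.
SELECT * FROM packt.collections_demo WHERE rowkey = '1';
```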
How to update or delete a collection?
CQL also supports updating and deleting elements in a collection. You can refer to the relevant information in DataStax's documentation at http://www.datastax.com/documentation/cql/3.1/cql/cql_using/use_collections_c.html.
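As a rough illustration, a whole collection can be replaced or removed much like a native column. The sketch below assumes a hypothetical table packt.collections_demo with primary key rowkey, a set<text> column setfield, and a map<text, text> column mapfield; these names are illustrative.

```cql
-- Assumes packt.collections_demo (rowkey ascii PRIMARY KEY,
-- setfield set<text>, mapfield map<text, text>) already exists.

-- Replace the entire set with a new one.
UPDATE packt.collections_demo
SET setfield = {'Kiwi', 'Mango'}
WHERE rowkey = '1';

-- Delete the entire map column from the row.
DELETE mapfield
FROM packt.collections_demo
WHERE rowkey = '1';
```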
As in the case of native data types, let us walk through each collection below.
CQL uses sets to keep a collection of unique elements. The benefit of a set is that Cassandra automatically enforces the uniqueness of the elements, so we, as application developers, do not need to worry about it.
CQL uses curly braces ({}) to represent a set of values separated by commas. An empty set is simply {}. In the previous example, although we inserted the set as {'Lemon', 'Orange', 'Apple'}, the input order was not preserved. Why?
The reason lies in how Cassandra stores the set. Internally, Cassandra stores each element of the set as a single column whose column name is the original column name suffixed by a colon and the element value. As shown previously, the ASCII values of 'Apple', 'Lemon', and 'Orange' are 0x4170706c65, 0x4c656d6f6e, and 0x4f72616e6765, respectively. So they are stored in three columns named setfield:4170706c65, setfield:4c656d6f6e, and setfield:4f72616e6765. Because Cassandra keeps columns sorted by column name, the elements of a set are sorted automatically.
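The uniqueness and ordering behavior described above can be observed with a few element-level updates. This is a sketch that assumes a hypothetical table packt.collections_demo with primary key rowkey and a set<text> column setfield.

```cql
-- Assumes packt.collections_demo (rowkey ascii PRIMARY KEY,
-- setfield set<text>) already exists.

-- Insert the elements in an arbitrary order.
INSERT INTO packt.collections_demo (rowkey, setfield)
VALUES ('1', {'Lemon', 'Orange', 'Apple'});

-- Add an element to the set.
UPDATE packt.collections_demo
SET setfield = setfield + {'Kiwi'}
WHERE rowkey = '1';

-- Adding an element that is already present leaves the set unchanged.
UPDATE packt.collections_demo
SET setfield = setfield + {'Apple'}
WHERE rowkey = '1';

-- Remove an element from the set.
UPDATE packt.collections_demo
SET setfield = setfield - {'Lemon'}
WHERE rowkey = '1';

-- The set is returned in sorted order, not in insertion order.
SELECT setfield FROM packt.collections_demo WHERE rowkey = '1';
```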
A list keeps its elements in the order in which they are inserted and allows duplicate values. Hence, it is suitable when uniqueness is not required but maintaining order is.
CQL uses square brackets ([]) to represent a list of values separated by commas. An empty list is []. In contrast to a set, the input order of a list is preserved by Cassandra. Cassandra also stores each element of the list as a column. But this time, each column name is composed of the original column name (listfield in our example), a colon, and a UUID generated at the time of update. The element value of the list is stored in the value of the column.
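The following sketch shows common list operations. It assumes a hypothetical table packt.collections_demo with primary key rowkey and a list<text> column listfield.

```cql
-- Assumes packt.collections_demo (rowkey ascii PRIMARY KEY,
-- listfield list<text>) already exists.

-- Insert a list; the input order is preserved.
INSERT INTO packt.collections_demo (rowkey, listfield)
VALUES ('1', ['Lemon', 'Orange', 'Apple']);

-- Append an element; duplicates are allowed in a list.
UPDATE packt.collections_demo
SET listfield = listfield + ['Apple']
WHERE rowkey = '1';

-- Prepend an element.
UPDATE packt.collections_demo
SET listfield = ['Banana'] + listfield
WHERE rowkey = '1';

-- Overwrite the element at a given position (positions start at 0).
UPDATE packt.collections_demo
SET listfield[1] = 'Mango'
WHERE rowkey = '1';

-- Delete the element at a given position.
DELETE listfield[0]
FROM packt.collections_demo
WHERE rowkey = '1';

-- Remove every occurrence of a value.
UPDATE packt.collections_demo
SET listfield = listfield - ['Apple']
WHERE rowkey = '1';

SELECT listfield FROM packt.collections_demo WHERE rowkey = '1';
```

Appending and prepending are plain writes, whereas setting or deleting by position requires Cassandra to read the list first, so the positional forms are better used sparingly.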
A map in Cassandra is a dictionary-like data structure with keys and values. It is useful when you want to store table-like data within a single Cassandra row.
CQL also uses curly braces ({}) to represent a map of keys and values separated by commas. Within each pair, the key and the value are separated by a colon. An empty map is simply represented as {}. As you might expect, each key/value pair is stored in a column whose column name is composed of the original map column name followed by a colon and the key of that pair. The value of the pair is stored in the value of the column. Similar to a set, the map sorts its items automatically, in this case by key. As a result, a map can be imagined as a hybrid of a set and a list.
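Map entries can be added, overwritten, and deleted by key, as the following sketch shows. It assumes a hypothetical table packt.collections_demo with primary key rowkey and a map<text, text> column mapfield.

```cql
-- Assumes packt.collections_demo (rowkey ascii PRIMARY KEY,
-- mapfield map<text, text>) already exists.

-- Insert a map; each pair is written as key: value.
INSERT INTO packt.collections_demo (rowkey, mapfield)
VALUES ('1', {'fruit3': 'Lemon', 'fruit1': 'Apple', 'fruit2': 'Orange'});

-- Add a new pair, or overwrite the value if the key already exists.
UPDATE packt.collections_demo
SET mapfield['fruit4'] = 'Kiwi'
WHERE rowkey = '1';

-- Merge several pairs into the map at once.
UPDATE packt.collections_demo
SET mapfield = mapfield + {'fruit5': 'Mango', 'fruit1': 'Apricot'}
WHERE rowkey = '1';

-- Delete a single pair by its key.
DELETE mapfield['fruit2']
FROM packt.collections_demo
WHERE rowkey = '1';

-- The pairs are returned sorted by key.
SELECT mapfield FROM packt.collections_demo WHERE rowkey = '1';
```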