skip to main content
About the Apache Cassandra Driver : Complex Type Normalization : Collection Types
  

Try DataDirect Drivers Now

Collection Types

Cassandra collection types include the Map, List, and Set types. If collection types are discovered, the driver normalizes the native data into a set of parent-child tables. Primitive types are normalized in a parent table, while each collection type is normalized in a child table that has a foreign key relationship to the parent table. Take for example the following Cassandra table:
CREATE TABLE employee (
empid int PRIMARY KEY,
phone map<varchar, varint>,
client list<varchar>,
review set<date>);
The following employee table is a tabular representation of the native Cassandra table with data included. In this example, four distinct relational tables are created. A parent table is created based on the empid column, and a child table is created for each of the three collection types (Map, List, and Set).
Table 2. employee (native)
empid
(primary key)
phone
client
review
int
map<varchar, varint>
list<varchar>
set<date>
103
home: 2855551122
mobile: 2855552347
office: 2855555566
spouse: 2855556782
Li
Kumar
Jones
2013-12-07
2015-01-22
2016-01-10
105
home: 2855555678
mobile: 2855553335
office: 2855555462
Yanev
Bishop
Bogdanov
2015-01-22
2016-01-12

The Parent Table

The parent table is comprised of the primitive integer type column empid and takes its name from the native table. A SQL statement would identify the column as employee.empid.
Table 3. employee (relational parent)
empid
(primary key)
int
103
105
A SQL insert on the employee parent table would take the form:
INSERT INTO employee (empid) VALUES (107)

The Map Child Table

The Map collection is normalized into a three column child table called employee_phone. The name of the table is formulated by concatenating the name of the native table and the name of the Map column. A foreign key relationship to the parent table is maintained via the employee_empid column, and the Map's key value pairs are resolved into separate keycol and valuecol columns. In a SQL statement, these columns would be identified as the employee_phone.employee_empid, employee_phone.keycol, and employee_phone.valuecol, respectively.
Table 4. employee_phone (relational child of the map column)
employee_empid
(foreign key)
keycol
valuecol
int
varchar
varint
103
home
2855551122
103
mobile
2855552347
103
office
2855555566
103
spouse
2855556782
105
home
2855555678
105
mobile
2855553335
105
office
2855555462
A SQL insert on the employee_phone child table would take the form1:
INSERT INTO employee_phone (employee_empid, keycol, valuecol)
VALUES (107, 'mobile', 2855552391)

The List Child Table

The List collection is normalized into a three column child table called employee_client. The name of the table is formulated by concatenating the name of the native table and the name of the List column. A foreign key relationship to the parent table is maintained via the employee_empid column; the order of the elements in the List is maintained via the current_list_index column; and the elements themselves are contained in the client column. SQL statements would identify these columns as employee_client.employee_empid, employee_client.current_list_index, and employee_client.client, respectively.
Table 5. employee_client (relational child of the list column)
employee_empid
(foreign key)
current_list_index
client
int
int
varchar
103
0
Li
103
1
Kumar
103
2
Jones
105
0
Yanev
105
1
Bishop
105
2
Bogdanov
A SQL insert on the employee_client child table would take the form2:
INSERT INTO employee_client (employee_empid, client) VALUES (107, 'Nelson')

The Set Child Table

The Set collection is normalized into a two column child table called employee_review. The name of the table is formulated by concatenating the name of the native table and the name of the Set column. A foreign key relationship to the parent table is maintained via the employee_empid column, while the elements of the Set are given in natural order in the review column. In this child table, SQL statements would identify these columns as employee_review.employee_empid and employee_review.review
Table 6. employee_review (relational child of the set column)
employee_empid
(foreign key)
review
int
date
103
2013-12-07
103
2015-01-22
103
2016-01-10
105
2015-01-22
105
2016-01-12
A SQL insert on the employee_client child table would take the form3:
INSERT INTO employee_review (employee_empid, review) VALUES (107, '2015-01-20')

Update Support

Update is supported for primitive types, non-nested Tuple types, and non-nested user-defined types. Update is also supported for value columns (valuecol) in non-nested Map types. The driver does not support updates on List types, Set types, or key columns (keycol) in Map types because the values in each are part of the primary key of their respective child tables and primary key columns cannot be updated. If an Update is attempted when not allowed, the driver issues the following error message:
[DataDirect][Cassandra ODBC Driver][Cassandra]syntax error or access rule violation: UPDATE not permitted for column: column_name

1 The driver supports an insert on a child table prior to an insert on a parent table, circumventing referential integrity constraints associated with traditional RDBMS. To maintain integrity between parent and child tables, it is recommended that an insert first be performed on the parent table for each foreign key value added to the child. If such an insert is not first performed, the driver automatically inserts a row into the parent table that contains only the primary key values and NULL values for all non-primary key columns.

2 The driver supports an insert on a child table prior to an insert on a parent table, circumventing referential integrity constraints associated with traditional RDBMS. To maintain integrity between parent and child tables, it is recommended that an insert first be performed on the parent table for each foreign key value added to the child. If such an insert is not first performed, the driver automatically inserts a row into the parent table that contains only the primary key values and NULL values for all non-primary key columns.

3 The driver supports an insert on a child table prior to an insert on a parent table, circumventing referential integrity constraints associated with traditional RDBMS. To maintain integrity between parent and child tables, it is recommended that an insert first be performed on the parent table for each foreign key value added to the child. If such an insert is not first performed, the driver automatically inserts a row into the parent table that contains only the primary key values and NULL values for all non-primary key columns.