Consuming and managing CDC change data

The database populates CDC tables with Change data for ETL and other applications for the intended use. For most normal ETL and other CDC applications, Change data instances should be used only once. If multiple applications, have to use Change data instances, then each of the applications should query for and use the Change data once individually. In order to ensure that each application accesses change data only once, the application consuming the change data should delete the data after consuming it.

Use a SQL DELETE statement, for the Change Tracking table, with the same WHERE clause and WHERE clause constants that were used to Select the Change data for consuming.

The SELECT statement, for consuming change data, should return the change sequence value, from the _Change-sequence column in the Change Tracking table, to the application. The low and high change sequence values from a successful Select should then limit the data deleted.

This use of the high change sequence prevents deleting data that was in a concurrent, conflicting EXCL transaction, and hence was not read by the Select statement.

The DELETE statement to remove consumed change data should always use the change sequence values to define the range of Change table rows to delete. This form of range will automatically handle any “split records”.

select ct.”_Change-Sequence”,
c.* -- specify Change table fields here.
from pub."_Cdc-Change-Tracking" ct
inner join
pub.CDC_Customer c -- well-known Change table name.
on ct."_Change-Sequence" = c."_Change-Sequence"
where ct."_ Source-Table-Number" = ? -- from 1st query.
and ct.”_Time-Stamp” between ? and ? -- time range parameters from app
order by ct."_Policy-Id", ct."_Change-Sequence";

Once the ETL application reads the Result Set from this query, it saves the _Change-Sequence value for each row. This ensures that the high _Change-sequence will be known when the query is complete from the last row in the Result Set. The ETL application should then save the first change sequence value selected as the “low” _Change-sequence value.

delete from pub."_Cdc-Change-Tracking" ct
where ct."_ Source-Table-Number" = ? -- from 1st query.
and ct.”_Time-Stamp” between ? and ? -- time range parameters from app.
and ct. .”_Change-Sequence” < ?; -- saved high change sequence from app.

delete from pub.CDC_Customer c -- well-known Change table name.
where c. .”_Change-Sequence” between ? and ?;
-- low, high _Change-sequence
-- values from originating Select.

Step 3 uses a query which gives a Result Set of the set of change values for each source table row R from the preliminary query, in change sequence order.

select c.* -- specify change table fields here.
from pub."_Cdc-Change-Tracking" ct
inner join
pub.CDC_Customer c -- well-known Change table name.
on ct."_Change-Sequence" = c."_Change-Sequence"
where ct.”_Source-Table-Number" = ? -- from 2nd query.
and ct."_Partition-Id" = ? -- partition id from preliminary query.
and ct."_Recid" = ? -- recid from preliminary query.
and c."_Operation" != 3 -- optional - exclude Before Update images.
order by ct.”_Source-Table-Number", ct."_Partition-Id", ct."_Recid",
ct."_Change-Sequence" ;
-- Order By keys map to index _Part-Rec-Id and should skip sorting.

These changes could be Insert, Delete, or Update operations and the application logic can decide how to transfer each change to the Data Warehouse. Step 1 and Step 2 of the query can be embedded inside an application in nested loops as shown in the following table:

Query #1
save rowid, partitioned for row R
query #2
build query #3
do
query #3
exitloop if no data
accumulate type of changes, aggregate changes, to Source row
send changed Source row values to Data Warehouse.
enddo