Streaming ETL with Azure Data Factory and CDC – Enabling CDC

In this series we look at building a Streaming ETL with Azure Data Factory and CDC – Enabling CDC. This is Part 1, The rest of the series is below.

  1. Enabling CDC
  2. Setting up Audit Tables
  3. Provisioning Azure Data Factory
  4. Provisioning Azure Blog Storage
  5. Create Data Source Connection in ADF
  6. Create Incremental Pipeline in ADF
  7. Create a Parameter Driven Pipeline
  8. Create a Rolling Trigger

This series uses the Adventureworks database. For more information on how to get that set up see my Youtube video for Downloading and Restoring the database.

Enabling CDC on SQL Server

  1. First, enable Change Data Capture at the Database and Table Level using the following scripts. More information is available on the Microsoft Site. If you are not using SQL Server or a tool that has organic CDC build it, a similar process can be hand built a read only view by leveraging timestamps on last created date and last updated date.

/********Code**********/

sys.sp_cdc_enable_db

EXECUTE sys.sp_cdc_enable_table

    @source_schema = N’dbo’

  , @source_name = N’DimProduct’

  , @role_name = N’cdc_Admin’;

GO

  • This step will enable CDC on the database as well as add it to the table “DimProduct”. SQL Agent must be running as two jobs are created during this process, one to load the table and one to clean it out. This is a new process that will consume resources to be sure to do your research before turning this on for all tables.
  • Once the CDC is enabled, SQL Server will automatically create the following tables “schema_tablename_CT” in the System folder section. This table will now automatically track all data changes that occur on that table. You can reference the _$operationcode to determine what change occurred with the legend below. If you wish to capture changes to only certain fields, see the Microsoft documentation on CDC to see how that can be set. If you are handwriting your SQL, this can also be programmed in when building your staging query.

1 = delete

2 = insert

3 = update (captured column values are those before the update operation). This value applies only when the row filter option ‘all update old’ is specified.

4 = update (captured column values are those after the update operation)

  • Now once you add, edit or delete a record, you should be able to find it in the new CDC table. Next, we will look at scanning this table and turning the data to JSON to send to an Event Hub! Run the code below to insert and update into the DimProduct table.

Insert

INSERT INTO [dbo].[DimProduct]

           ([ProductAlternateKey]

           ,[ProductSubcategoryKey]

           ,[WeightUnitMeasureCode]

           ,[SizeUnitMeasureCode]

           ,[EnglishProductName]

           ,[SpanishProductName]

           ,[FrenchProductName]

           ,[StandardCost]

           ,[FinishedGoodsFlag]

           ,[Color]

           ,[SafetyStockLevel]

           ,[ReorderPoint]

           ,[ListPrice]

           ,[Size]

           ,[SizeRange]

           ,[Weight]

           ,[DaysToManufacture]

           ,[ProductLine]

           ,[DealerPrice]

           ,[Class]

           ,[Style]

           ,[ModelName]

           ,[LargePhoto]

           ,[EnglishDescription]

           ,[FrenchDescription]

           ,[ChineseDescription]

           ,[ArabicDescription]

           ,[HebrewDescription]

           ,[ThaiDescription]

           ,[GermanDescription]

           ,[JapaneseDescription]

           ,[TurkishDescription]

           ,[StartDate]

           ,[EndDate]

           ,[Status])

     VALUES

           (‘VV-2903’

           ,NULL

           ,NULL

           ,NULL

           ,’Test Product’

           ,’Test Product’

           ,’Test Product’

           ,0

           ,0

           ,’Black’

           ,NULL

           ,NULL

           ,NULL

           ,NULL

           ,NULL

           ,NULL

           ,NULL

           ,NULL

           ,NULL

           ,NULL

           ,NULL

           ,NULL

           ,NULL

           ,NULL

           ,NULL

           ,NULL

           ,NULL

           ,NULL

           ,NULL

           ,NULL

           ,NULL

           ,NULL

           ,NULL

           ,NULL

           ,NULL)

GO

Update

UPDATE [dbo].[DimProduct]

SET FinishedGoodsFlag = 1

WHERE [ProductAlternateKey] = ‘VV-2903’

  • Now you can query the table “cdc.dbo_DimProduct_CT” to see the changes that were recorded.

Streaming ETL with Azure Data Factory and CDC – Enabling CDC


Posted

in

, , , ,

by