Streaming ETL with Azure Data Factory and CDC – Setting up Audit Tables

In this series we look at building a streaming ETL with Azure Data Factory and CDC. This post covers setting up the audit tables; the rest of the series is below.

  1. Enabling CDC
  2. Setting up Audit Tables
  3. Provisioning Azure Data Factory
  4. Provisioning Azure Blob Storage
  5. Create Data Source Connection in ADF
  6. Create Incremental Pipeline in ADF
  7. Create a Parameter Driven Pipeline
  8. Create a Rolling Trigger

This series uses the AdventureWorks database. For more information on how to get it set up, see my YouTube video on downloading and restoring the database.

Setting up Audit Tables

  1. An audit table is required to track what has been loaded and sent to the streaming process of the ETL. The second statement below seeds this table with its first row of data.

/********Code************/

-- Create a separate offset table to manage the last position/row sent to Azure
CREATE TABLE [dbo].[Audit_Streaming_ETL]
(
    [TableName] [varchar](50) NOT NULL,
    [MaxVal] [binary](10) NOT NULL,
    [LastUpdateDateTime] [datetime] NOT NULL DEFAULT GETDATE(),
    [LastCheckedDateTime] [datetime] NOT NULL DEFAULT GETDATE(),
    CONSTRAINT [PK_Audit_Streaming_ETL] PRIMARY KEY NONCLUSTERED
    (
        [TableName] ASC
    )
)
GO

-- Seed the table with a starting row: an empty table name, a zeroed
-- 10-byte LSN, and a minimum date for both datetime columns
INSERT INTO [dbo].[Audit_Streaming_ETL]
SELECT '', 0x00000000000000000000, '1900-01-01 00:00:00', '1900-01-01 00:00:00'
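Once the pipeline is running, each successful load will need to advance the stored offset so the next run picks up where the last one finished. As a rough sketch of how that update might look (the table name 'Person' is a placeholder for illustration, not part of the series setup):

-- Hypothetical example: advance the audit row after a successful load.
-- sys.fn_cdc_get_max_lsn() returns the highest LSN currently recorded
-- in the CDC change tables, as a binary(10) value matching [MaxVal].
DECLARE @MaxLsn binary(10) = sys.fn_cdc_get_max_lsn();

UPDATE [dbo].[Audit_Streaming_ETL]
SET [MaxVal] = @MaxLsn,
    [LastUpdateDateTime] = GETDATE(),
    [LastCheckedDateTime] = GETDATE()
WHERE [TableName] = 'Person';

Storing the LSN per table name is what lets one audit table serve several source tables in the same pipeline.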


