We have an existing database that replicates data from the operational database. Currently we do a full load: we pull all the data from the replicated database and insert it into staging tables. In the staging tables we do the data conversion, and then
we compare business keys between the staging tables and the data warehouse (DW). If a record is not found in the DW, we insert it from staging into the DW.
If we find a matching business key, we update the record (either Type 1 or Type 2).
If a business key exists in the DW but not in staging, we delete the record from the DW.
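To make the current comparison concrete, here is a minimal Python sketch of the logic (the real work happens in SSIS and SQL; the function name `classify_keys` and the dictionary layout are hypothetical, used only for illustration). It treats every matched business key as an update candidate and leaves the Type 1 / Type 2 decision out.

```python
def classify_keys(staging, dw):
    """Compare rows keyed by business key and decide the action per key.

    staging, dw: dict mapping business key -> row attributes (hypothetical layout).
    Returns three sorted lists: keys to insert, keys to update, keys to delete.
    """
    staging_keys = set(staging)
    dw_keys = set(dw)

    inserts = sorted(staging_keys - dw_keys)   # in staging, not in DW
    updates = sorted(staging_keys & dw_keys)   # matched business key
    deletes = sorted(dw_keys - staging_keys)   # in DW, no longer in staging
    return inserts, updates, deletes


staging = {"A": {"name": "a2"}, "B": {"name": "b"}, "D": {"name": "d"}}
dw = {"A": {"name": "a1"}, "B": {"name": "b"}, "C": {"name": "c"}}
inserts, updates, deletes = classify_keys(staging, dw)
print(inserts, updates, deletes)
```

Because both sides hold the full data set, every run touches every key, which is where the cost comes from.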
Because we do a full load from the data source and compare all of the data with the DW, the SSIS packages perform badly.
I am going to replace the current design with an incremental load.
I have two options for the incremental load (we only need the net change for each day in the SSIS load):
Option 1: Change tracking
Change tracking only records that a row has changed, identified by its primary key and a version number, so it is a lightweight solution, and the primary key is unlikely to undergo schema changes. I am not sure where the change data is stored. Is it included in backups? Is there any risk of losing change data? What should be done if change data is lost? I cannot find enough information about SQL Server Change Tracking.
Option 2: CDC
Concerns:
1. CDC on a replicated database may lose change data if replication for a replicated table needs to be reinitialized.
2. If the CDC-enabled tables have a schema change, how should replication and CDC be handled together for that change? What should be done if change data is lost?
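To clarify what I mean by "net change each day", here is a rough Python sketch. It is only an illustration of the idea, not how Change Tracking or CDC work internally; the function name `net_changes` and the `(key, op)` event format with `"I"`, `"U"`, `"D"` are my own assumptions.

```python
def net_changes(events):
    """Collapse one day's ordered change events into one net operation per key.

    events: iterable of (business_key, op), op in {"I", "U", "D"} (assumed format).
    Returns dict key -> "I", "U" or "D"; keys with no net effect are omitted.
    """
    first = {}
    last = {}
    for key, op in events:
        first.setdefault(key, op)  # remember the first operation seen for the key
        last[key] = op             # keep overwriting to end with the last one

    net = {}
    for key in first:
        existed_before = first[key] != "I"  # first op was U or D: row already existed
        exists_after = last[key] != "D"     # last op was I or U: row still exists
        if existed_before and exists_after:
            net[key] = "U"
        elif existed_before:
            net[key] = "D"
        elif exists_after:
            net[key] = "I"
        # inserted and deleted within the same day: nothing to load
    return net


events = [(1, "I"), (1, "U"), (2, "U"), (2, "D"), (3, "I"), (3, "D"),
          (4, "D"), (4, "I"), (5, "U")]
result = net_changes(events)
print(result)
```

With either option, the SSIS load would only need to process the keys that end up in this result.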
Many thanks
Sea Cloud