This content has been marked as final. Show 2 replies
And if it never catches up? If it is always behind? Seems to me the issue is one of design. The ETL job needs to be written so it can handle that which exists and not worry about what does not yet exist.
The approach is still evolving and I'm open to suggestions. I'm thinking that the ETL job will wait for a while but eventually it will pull whatever is available and then send an email to the support group letting them know that CDC is behind and we may not have all the data we should. This ETL job has been in production for a year and it has been pleasantly reliable. It pulls whatever is available at 2 am and loads it to the warehouse.
The change we are working on is in response to a secondary project that is responsible for verifying that the data we have in the warehouse is accurate compared to the source. On those days when CDC is behind (by as much as 6 hours sometimes!) the comparison is wrong because the warehouse doesn't have the data it should.
The challenge for the ETL is figuring out if and how far behind CDC is.