We are loading records into a target table via an external table and a staging table, applying multiple validation rules to check the quality of the data. One of the rules needs to check that a staged record has not already been loaded into the target table (a duplicate check).
We want to validate the entire staging table record set for duplicate entries in the target table, and load nothing into the target table if any duplicates are identified in the staged data - it is all or nothing.
The target table has a surrogate primary key and a unique key - the unique key holds the business key for the data, so it can be used to check for duplicates.
Will flow control give us the means to do this? The documentation suggests (to me!) that the data will be loaded excluding the bad records, which I do not want to happen. The maximum number of errors property also suggests that if I set it to 0, no further processing takes place - the integration will fail immediately as soon as a single error is identified, without validating the entire staged record set. Any number other than 0 will allow some records to be loaded until the threshold is breached.
It is important that we can report all validation issues back to the source data provider in one communication rather than trickle-feeding them issues. We can do this for all the other validation rules, but I'm not sure whether ODI gives us an easy way to validate all staged records for duplicates in the target, report on them, and load no data if one or more duplicates are identified.
If flow control does not offer this then I am left with doing a lookup between the staging and target tables (which could become expensive as the target table grows).
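For what it's worth, the manual fallback is essentially an anti-join on the business key: flag every staged row whose key already exists in the target, and only load when nothing is flagged. A minimal sketch using SQLite with hypothetical table and column names (STG_ORDERS, TGT_ORDERS, business key ORDER_REF) - not ODI itself, just the shape of the check:

```python
import sqlite3

# Hypothetical schema: ORDER_REF is the business (unique) key in both tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TGT_ORDERS (ORDER_ID INTEGER PRIMARY KEY,
                             ORDER_REF TEXT UNIQUE, AMOUNT REAL);
    CREATE TABLE STG_ORDERS (ORDER_REF TEXT, AMOUNT REAL);
    INSERT INTO TGT_ORDERS VALUES (1, 'A-100', 10.0), (2, 'A-101', 20.0);
    INSERT INTO STG_ORDERS VALUES ('A-101', 25.0), ('A-102', 30.0);
""")

# Find every staged row whose business key already exists in the target.
duplicates = conn.execute("""
    SELECT s.ORDER_REF
    FROM STG_ORDERS s
    WHERE EXISTS (SELECT 1 FROM TGT_ORDERS t
                  WHERE t.ORDER_REF = s.ORDER_REF)
""").fetchall()

# All or nothing: load only when no duplicates were found.
if duplicates:
    print("Load aborted; duplicate keys:", [r[0] for r in duplicates])
else:
    conn.execute("""INSERT INTO TGT_ORDERS (ORDER_REF, AMOUNT)
                    SELECT ORDER_REF, AMOUNT FROM STG_ORDERS""")
```

With an index on the target's unique key (which the unique constraint already gives you) the EXISTS probe stays cheap per staged row, so the lookup approach scales better with target growth than it might first appear.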
Any guidance will be appreciated!
Flow control will support your requirement if you set the maximum number of errors allowed to 1. The checks run against the flow (I$) table before anything is inserted into the target: all constraints are evaluated and every violating row is written to the error (E$) table first, and only then is the error count compared with the threshold. If the threshold is reached the load step never runs - so you still get the complete set of validation errors to report in one go, and no rows are loaded.