I forgot to add one more thing. In production, the record volumes are:
Stage1: 120 million
stage2: 50 million
Final_stage: 128 million (without duplicates), 131 million (with duplicates)
I think this is probably not the right forum for this question. It does not appear to be related to the Enterprise Data Quality product.
You could of course resolve the problem using EDQ if required: for example, by combining the data sets using Merge Data Streams and removing duplicates using Group and Merge.
To try to help anyway: are you sure you don't have duplicates within each data set? UNION ALL would not remove these (plain UNION does, but only rows that match exactly). And are they exact duplicates? Have you tested using SELECT DISTINCT?
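One way to run the checks suggested above is to compare row counts before and after de-duplication, per stage and for the combined result. The sketch below uses Python's built-in `sqlite3` as a stand-in for the production database. The table names `stage1`/`stage2` and the columns `id`/`name` are hypothetical, and the sample rows are made up purely to show the effect. The trimmed count illustrates rows that look like duplicates but are not exact matches (here, a trailing space), which neither DISTINCT nor UNION will collapse.

```python
import sqlite3


def count(conn, sql):
    """Run a single-value COUNT query and return the integer result."""
    return conn.execute(sql).fetchone()[0]


def duplicate_report(conn):
    """Compare raw, DISTINCT, and whitespace-trimmed counts (hypothetical schema)."""
    report = {}
    for table in ("stage1", "stage2"):
        report[f"{table}_rows"] = count(conn, f"SELECT COUNT(*) FROM {table}")
        # Exact duplicates within one data set
        report[f"{table}_distinct"] = count(
            conn, f"SELECT COUNT(*) FROM (SELECT DISTINCT * FROM {table})")
        # Near-duplicates: rows differing only by surrounding whitespace in name
        report[f"{table}_distinct_trimmed"] = count(
            conn, f"SELECT COUNT(*) FROM (SELECT DISTINCT id, TRIM(name) FROM {table})")
    # UNION ALL keeps every row; UNION removes exact duplicates across AND within sets
    report["union_all"] = count(
        conn, "SELECT COUNT(*) FROM (SELECT * FROM stage1 UNION ALL SELECT * FROM stage2)")
    report["union"] = count(
        conn, "SELECT COUNT(*) FROM (SELECT * FROM stage1 UNION SELECT * FROM stage2)")
    return report


def build_sample():
    """Create an in-memory database with small, made-up sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stage1 (id INTEGER, name TEXT)")
    conn.execute("CREATE TABLE stage2 (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO stage1 VALUES (?, ?)",
                     [(1, "alice"), (1, "alice"), (2, "bob")])
    conn.executemany("INSERT INTO stage2 VALUES (?, ?)",
                     [(2, "bob"), (3, "carol"), (3, "carol ")])
    return conn


if __name__ == "__main__":
    for key, value in duplicate_report(build_sample()).items():
        print(f"{key}: {value}")
```

If `stageN_distinct` is lower than `stageN_rows`, the duplicates already exist inside that stage; if `stageN_distinct_trimmed` is lower still, some "duplicates" differ only in formatting, and you would need to normalize the values before de-duplicating. The same queries should translate to most SQL databases, although string comparison rules (case, trailing spaces, NULLs) vary between engines.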