You need to set partial update locks on exactly the input files that you reference in your partial_pipeline.epx diagram: at least one, and all of the required ones (if you have several and some are optional).
Out of the box, the load_partial_test_data script sets flags for adds.txt.gz, updates.txt.gz, and deletes.txt.gz. Open the load_partial_test_data script to see how it sets these flags, then swap out the file names it references for the ones your partial pipeline references. For example, change to
After you do that, running the partial_update script should work just fine.
Thank you very much. It worked.
I changed the file names in load_partial_test_data script to the ones my pipeline references and it worked.
But what do you mean by "You need to set partial update locks on exactly the input files that you reference in your partial_pipeline.epx diagram - at least one and all of the required ones (if you have several and some are optional)"? Can you please explain in detail?
Not much else to say, you did it!
But let's say your partial update pipeline had 4 sources, and 2 of them were set to data required. I just meant that you would need to make sure you set the flags for those two, and optionally for the remaining two.
This framework ensures that a partial update will only process data files that you have explicitly marked as ready via these flags. I'll leave it to you to think through the implications, especially if you have a complex partial update strategy or complex requirements.
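To make the required/optional rule concrete, here is a small Python model of the check described above. It is not Endeca's implementation and uses no Endeca API: the `Source` class, the `files_to_process` function, and the file names are all hypothetical, and refusing the run with an error when a required file is unflagged (or nothing is flagged) is an assumption about how such a gate could behave.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """One input referenced by a (hypothetical) partial pipeline."""
    filename: str
    required: bool  # True if the pipeline marks this source as data required


def files_to_process(sources, flagged):
    """Return the inputs a partial update would read, in pipeline order.

    `flagged` is the set of filenames whose ready flag has been set.
    Assumption (not Endeca behavior): the run is refused with RuntimeError
    if any required source is unflagged or if nothing at all is flagged.
    """
    missing = [s.filename for s in sources if s.required and s.filename not in flagged]
    if missing:
        raise RuntimeError("required inputs not flagged as ready: " + ", ".join(missing))
    # Only files the pipeline references AND that are flagged get processed;
    # unflagged optional sources are skipped, and flags on unreferenced files are ignored.
    ready = [s.filename for s in sources if s.filename in flagged]
    if not ready:
        raise RuntimeError("no inputs flagged as ready")
    return ready


if __name__ == "__main__":
    # Four sources, two of them required (hypothetical file names).
    pipeline = [
        Source("source1.txt.gz", required=True),
        Source("source2.txt.gz", required=True),
        Source("source3.txt.gz", required=False),
        Source("source4.txt.gz", required=False),
    ]
    print(files_to_process(pipeline, {"source1.txt.gz", "source2.txt.gz", "source4.txt.gz"}))
```

In this model, flagging both required files plus one optional file processes those three and skips `source3.txt.gz`. Flagging only the default adds.txt.gz, updates.txt.gz, and deletes.txt.gz would raise, because none of them are referenced by this pipeline, which loosely mirrors why the out-of-the-box flags did not match your pipeline.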
Great! Thanks again.