Self-Replacing Data in a Data Flow

Hi,
I am looking to implement a manual tracker for one of my ongoing projects.
There is time-series information that I will be uploading to the system manually on a regular basis.
Basically, I've created a data flow that looks like the one below.
The idea is that I run the query on a regular basis to update the table. However, I've found it won't execute if I try to use "Dataset A" as the save target while also using it as one of the sources.
Unfortunately, I won't be able to set up a proper database for this tracker to use as a connection.
Seeking advice.
Thank you,
Regards
Best Answer
-
Hello! Data flows will not allow you to read from and overwrite the same "Dataset A" within one data flow. Here's a pattern you can use instead: two data flows placed in a single sequence that can be run on a schedule -
Data Flow 1
- Read Dataset A
- Write it out immediately as Dataset A_backup (in case you need to keep a copy)
- Union your “new portion of A” uploads with Dataset A
- Write the combined result to Dataset A_temp
Data Flow 2
- Read Dataset A_temp
- Overwrite Dataset A with this refreshed dataset
Place these two data flows in one sequence.
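To make the data movement concrete, here is a minimal pandas sketch of the same two-step pattern. The dataset names and the sample rows are hypothetical stand-ins for the OAC datasets; the point is that step 1 only ever writes to the backup and staging datasets, and step 2 only reads the staging dataset, so no data flow reads from and writes to the same dataset:

```python
import pandas as pd

# Hypothetical stand-ins for the OAC datasets.
dataset_a = pd.DataFrame({"date": ["2025-05-01"], "value": [10]})  # existing Dataset A
new_rows  = pd.DataFrame({"date": ["2025-05-21"], "value": [12]})  # manual upload

# Data Flow 1: read Dataset A, back it up, union with the new upload,
# and write the result to a staging dataset (never back onto A itself).
dataset_a_backup = dataset_a.copy()
dataset_a_temp = pd.concat([dataset_a, new_rows], ignore_index=True)

# Data Flow 2: read the staging dataset and overwrite Dataset A with it.
# Safe, because this step does not read Dataset A at all.
dataset_a = dataset_a_temp.copy()
```

After both steps, `dataset_a` holds the old rows plus the new upload, and `dataset_a_backup` still holds the pre-refresh copy in case the run needs to be rolled back.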
Let me know if this works for you.
Answers
-
Hi Prichard,
Thank you for the input. While the individual data flows did work, when I placed them into a sequence they would not execute properly.
It seems to fail on the second data flow step.
Below is the failed log for Sequence: Work A
************************************************************
* 2025-05-21 11:52:33 'XXXX'.'Work A'
************************************************************
74 INFO 2025-05-21 11:52:33.375 SGT Starting to process request, run id: 711F5E10946E19AC71E1282BBAEB2E after being in queue for 21 second(s), 366 millisecond(s).
230 INFO 2025-05-21 11:52:39.792 SGT Will run up to 5 task(s) in parallel
234 INFO 2025-05-21 11:52:39.826 SGT Creating necessary connections.
238 INFO 2025-05-21 11:52:39.835 SGT Sources created.
239 INFO 2025-05-21 11:52:39.841 SGT 143.156 (GB) of 2.50 (TB) used (5.592 %).
248 INFO 2025-05-21 11:52:39.868 SGT Loading tasks for a new run.
251 FINER 2025-05-21 11:52:40.03 SGT Loading of tasks completed
256 INFO 2025-05-21 11:52:40.128 SGT A total of 3 task(s) need to be executed for this run.
259 INFO 2025-05-21 11:52:40.144 SGT Executing tasks.
306 INFO 2025-05-21 11:52:45.626 SGT Task "DFLW : Job 1" completed
343 SEVERE 2025-05-21 11:52:51.027 SGT Execution of "dataflow : 'XXXX'.'Job 2' ('XXXX'.'Work A')" on BIJDBC failed. OBIS job id 40000000-5383-1a0b-8029-ac1f4c100000
[nQSError: 46236] Dataset Service error during Bulk Finalize - {"prefix":"DSS","code":50000,"message":"DSS understood the request, but was unable to perform the operation due to an internal error","subMessage":""}
[nQSError: 43224] The Dataflow "Job 2" failed during the execution.
[nQSError: 43204] Asynchronous Job Manager failed to execute the asynchronous job.
346 SEVERE 2025-05-21 11:52:51.061 SGT An error occurred, which could not be interpreted.
352 SEVERE 2025-05-21 11:52:51.262 SGT An error occurred, which could not be interpreted.
358 SEVERE 2025-05-21 11:52:51.362 SGT An error occurred, which could not be interpreted.
362 SEVERE 2025-05-21 11:52:51.369 SGT An error occurred, which could not be interpreted.
363 SEVERE 2025-05-21 11:52:51.377 SGT Task "DFLW : Job 2 - Work A" failed. Check the tasks log for more information
386 INFO 2025-05-21 11:52:51.563 SGT Finished.
390 INFO 2025-05-21 11:52:51.623 SGT
************************************************************
* Summary
************************************************************
392 INFO 2025-05-21 11:52:51.629 SGT Job Run Summary
Total Number Of Tasks: 3
Total Number Of Successful Rows across all tasks: 0
Total Data Size across all tasks: .00 (BYTE)
394 INFO 2025-05-21 11:52:51.635 SGT Job finishing up....
398 ALL 2025-05-21 11:52:51.75 SGT The process took 18 Second(s) 216 Milli-second(s)
405 INFO 2025-05-21 11:52:51.769 SGT Closing connections...
410 ALL 2025-05-21 11:52:51.781 SGT The process failed.

Below is the failed log for Dataflow: Job 2
**************************************************************************
* 2025-05-21 11:52:45 DFLW : Job 2 - Work A
**************************************************************************
339 INFO 2025-05-21 11:52:45.974 SGT "dataflow : 'XXXX'.'Job 2' ('XXXX'.'Work A')" on BIJDBC submitted with job id 40000000-5383-1a0b-8029-ac1f4c100000
341 INFO 2025-05-21 11:52:51.01 SGT Finished execution with status Failed
342 SEVERE 2025-05-21 11:52:51.014 SGT Execution failed.
[nQSError: 46236] Dataset Service error during Bulk Finalize - {"prefix":"DSS","code":50000,"message":"DSS understood the request, but was unable to perform the operation due to an internal error","subMessage":""}
[nQSError: 43224] The Dataflow "Job 2" failed during the execution.
[nQSError: 43204] Asynchronous Job Manager failed to execute the asynchronous job.
365 INFO 2025-05-21 11:52:51.386 SGT Execution time :18 second(s), 377 millisecond(s)
367 INFO 2025-05-21 11:52:51.39 SGT
***********************************************************************************
* Historical load summary for dataflow : 'XXXX'.'Job 2'
***********************************************************************************
Number Of Successful Loads: 1
Number Of Failed Loads: 0
Load Duration: 14 second(s), 0 millisecond(s)
Load Date: 2025-05-21 10:55:15.0
Average Load Duration: 14 second(s), 0 millisecond(s)
Minimum Load Duration: 14 second(s), 0 millisecond(s)
Maximum Load Duration: 14 second(s), 0 millisecond(s)

Is there any way I can get this to run as a sequence?
-
You should be able to use one data flow's output as another data flow's input in a sequence. Do the task-level logs provide any more information?