We use data flows a lot and have seen them improve over the years.
There's an annoying issue with how text data types are handled that can cause very slow data flow performance when you save the output to a Database Connection (red data set) rather than CSV (green data set). The root cause is text lengths.
So:
- Converting to Text using the UI (right click > Convert to Text) will convert using VARCHAR(4000), regardless of the actual length of the data in the column
- Using CONCAT or || to add columns or transform columns will often create a CLOB object
We have seen the runtime of a data flow drop from 2-3 hours to 2-3 minutes just by using CAST statements to convert the relevant columns.
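As a sketch of the kind of fix described above (the column names here are purely illustrative, not from a real data flow):

```sql
-- Hypothetical example: FULL_NAME built with || would otherwise land as a CLOB.
-- Wrapping the expression in an explicit CAST keeps it a bounded VARCHAR:
CAST(FIRST_NAME || ' ' || LAST_NAME AS VARCHAR2(100))
```

The same pattern applies to any concatenated or transformed text column that would otherwise be written out as CLOB or VARCHAR(4000).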
The issue is that nothing in the data flow UI tells you this is happening (it shows "Text" only, not VARCHAR(xx), CLOB etc.); all you can do is run the data flow (create a "red" data set) and inspect the resulting table in SQL Developer.
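For reference, this is the sort of check we end up running in SQL Developer today (the table name is a placeholder for whatever your data flow writes to):

```sql
-- Inspect what data types the data flow actually created
SELECT column_name, data_type, data_length
FROM   user_tab_columns
WHERE  table_name = 'MY_DATAFLOW_OUTPUT';
```

Having this information surfaced in the UI would remove the need for that round trip.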
Can something be done about this? I can think of the following solutions:
- Surface detailed data types on the save screen, e.g. "Column1 is Text and stored as a CLOB"
- Some sort of automatic text-length conversion that can be run to fix the columns automatically, e.g. "Column A is a CLOB/VARCHAR(4000) but its maximum character length is 20, so we'll convert it to VARCHAR(25) to be on the safe side and avoid cropping anything"
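The sizing step in that second suggestion is essentially this query, which we run by hand today (table and column names are illustrative):

```sql
-- Find the real maximum length before picking a right-sized VARCHAR
SELECT MAX(LENGTH(column_a)) AS max_len
FROM   my_dataflow_output;
```

An automated version would run this per text column and round the result up with a small safety margin.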