3 Replies Latest reply on May 2, 2014 12:18 PM by Mike-Matthews-Oracle

    Suggestions and Best practices for Performance

    Ravi Shankar Danturti

      Hi Team,


      I've developed a complex project in EDQ with multiple processes that read from flat files, perform the data cleansing (quite complex), and write to Oracle tables.

      At present, cleansing 1 million records takes 1 hr 42 min for the job to complete.

      It is expected to handle a load of 30 million records in the near future, and we cannot afford to spend that much time on data cleansing alone.


      Can you please suggest ways to improve performance for such a heavy load? Is parallel processing done by default in EDQ, or do any parameters need to be set?

      Thanks in advance for the inputs.
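
      For scale, here is a back-of-the-envelope projection from the figures quoted above. It assumes runtime grows linearly with record count, which may not hold in practice, so treat it as a rough sketch rather than a measurement.

```python
# Linear extrapolation from the figures in the post (assumption: runtime
# scales proportionally with the number of records).
measured_records = 1_000_000
measured_minutes = 1 * 60 + 42           # 1 hr 42 min = 102 minutes
target_records = 30_000_000

records_per_second = measured_records / (measured_minutes * 60)
projected_minutes = measured_minutes * target_records / measured_records
projected_hours = projected_minutes / 60

print(f"Throughput: {records_per_second:.0f} records/s")
print(f"Projected time for 30 million records: {projected_hours:.1f} hours")
```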




        • 1. Re: Suggestions and Best practices for Performance

          The first thing to do is to make sure you are minimizing writing - see the Performance Tuning Guide in the online help.


          The fastest way to run data from source to target is to run the job as a 'stream', with no data written to snapshots, results or staged data. Processes should be linked together with data interfaces or, if staged data links the processes, by disabling the writing of that staged data when you configure the job. In most cases all processes can run as a single chain in a single phase, together with reading the source data and exporting the output. It will be a little faster still if run with a run label, as this ensures no metrics are written unless they are configured to be.
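
          The idea behind running as a stream can be illustrated outside EDQ. The sketch below is only a conceptual analogy in Python, not EDQ configuration (jobs are configured in the EDQ user interface): every function name is hypothetical. The streamed version passes each record through the whole chain without storing intermediate results, while the staged version writes out a full intermediate copy between steps.

```python
from typing import Iterable, Iterator

def read_source(n: int) -> Iterator[dict]:
    """Hypothetical source reader: yields records one at a time."""
    for i in range(n):
        yield {"id": i, "name": f"  name {i} "}

def trim(records: Iterable[dict]) -> Iterator[dict]:
    """Hypothetical cleansing step: strip surrounding whitespace."""
    for r in records:
        yield {**r, "name": r["name"].strip()}

def upper(records: Iterable[dict]) -> Iterator[dict]:
    """Hypothetical cleansing step: upper-case the name."""
    for r in records:
        yield {**r, "name": r["name"].upper()}

def export(records: Iterable[dict]) -> int:
    """Hypothetical target writer: consumes records and counts them."""
    count = 0
    for _ in records:
        count += 1
    return count

# Streamed: one chain, nothing materialized between steps.
streamed_count = export(upper(trim(read_source(1000))))

# Staged: each step stores its full output before the next one starts.
staged_after_trim = list(trim(read_source(1000)))
staged_after_upper = list(upper(staged_after_trim))
staged_count = export(staged_after_upper)

print(streamed_count, staged_count)
```

          Both variants produce the same output; the staged one simply does extra writing along the way, which is the overhead the advice above aims to remove.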


          After that, it comes down to process design, and in general avoiding some of the more expensive processors if you can (such as set-based processors like Group and Merge/Profilers).


          It is not normally necessary to tune the server itself, provided it has adequate memory etc. EDQ jobs will automatically use as many threads as they can when they run.

          • 2. Re: Suggestions and Best practices for Performance
            Ravi Shankar Danturti

            Thanks Mike for your inputs.


            Streaming can definitely be considered. We've not used data interfaces in our project though.


            Where would I be able to find more information on run labels?


            The performance tuning guide mentions setting

            processengine.processThreads = [number of threads]

            in the override.properties file. Should we explicitly do this for Unix installations?





            • 3. Re: Suggestions and Best practices for Performance

              No, the documentation on that is a little misleading. Runtime threads are set automatically on all operating systems, so don't adjust the setting unless someone advises you to.
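
              To confirm that an installation is not already overriding the thread count, a small check along these lines could be used. This is a minimal sketch: the property name is the one quoted from the tuning guide, but the helper `find_override` and the sample file contents are hypothetical, and the parser handles only simple `key = value` lines rather than the full Java properties syntax.

```python
PROPERTY = "processengine.processThreads"  # name quoted from the tuning guide

def find_override(text: str, key: str = PROPERTY):
    """Return the value assigned to key in properties-style text, or None."""
    value = None
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith(("#", "!")):
            continue
        name, sep, rest = line.partition("=")
        if sep and name.strip() == key:
            value = rest.strip()  # a later assignment replaces an earlier one
    return value

# Hypothetical file contents, for illustration only.
sample = """\
# hypothetical override.properties contents
processengine.processThreads = 8
"""

print(find_override(sample))                  # explicit override present
print(find_override("# nothing set here\n"))  # no override: automatic threads
```

              Against a real installation, the text would come from reading the override.properties file in that server's configuration directory, whose location depends on the setup. A result of None means the automatic thread setting is in effect.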