The first thing to do is make sure the job writes as little data as possible; see the Performance Tuning Guide in the online help.
The fastest way to move data from source to target is to run the job as a 'stream', with no data written to snapshots, results or staged data. When you configure the job, link processes together with data interfaces or, where staged data links two processes, disable the writing of that staged data. In most cases all processes can run as a single chain in a single phase, together with reading the source data and exporting the output. It will be a little faster still if you run the job with a run label, as this ensures that no metrics are written unless they are explicitly configured to be.
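As a rough analogy only (this is plain Python, not EDQ configuration, and every function name here is made up for illustration), the difference between a streamed chain and one that stages data between steps can be sketched like this:

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical stand-ins for a reader and two processes; not EDQ APIs.
def read_source(rows: Iterable[dict]) -> Iterator[dict]:
    """Yield records one at a time, like reading source data directly."""
    for row in rows:
        yield row

def standardize(rows: Iterable[dict]) -> Iterator[dict]:
    """First 'process': trim and upper-case the name field."""
    for row in rows:
        yield {**row, "name": row["name"].strip().upper()}

def flag_missing_email(rows: Iterable[dict]) -> Iterator[dict]:
    """Second 'process': mark whether an email value is present."""
    for row in rows:
        yield {**row, "email_ok": bool(row.get("email"))}

def run_streamed(source: Iterable[dict]) -> List[dict]:
    """Single chain: each record flows straight through both steps,
    and only the final output is materialized."""
    return list(flag_missing_email(standardize(read_source(source))))

def run_staged(source: Iterable[dict]) -> Tuple[List[dict], int]:
    """Same logic, but the intermediate result is written out in full
    before the next step starts. Returns the output and the number of
    intermediate records written."""
    staged = list(standardize(read_source(source)))  # the extra write
    output = list(flag_missing_email(staged))
    return output, len(staged)
```

Both functions produce the same output; the staged version simply does extra work writing every intermediate record, which is the cost that streaming avoids.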
After that, it comes down to process design. In general, avoid the more expensive processors where you can, such as set-based processors like Group and Merge, or the Profilers.
It is not normally necessary to tune the server itself, provided it has adequate memory and other resources. EDQ jobs automatically use as many threads as they can when they run.
Thanks for your input, Mike.
Streaming can definitely be considered, although we haven't used data interfaces in our project.
Where would I be able to find more information on run labels?
The Performance Tuning Guide mentions setting
processengine.processThreads = [number of threads]
in the override.properties file. Should we set this explicitly for Unix installations?
No, the documentation on that is a little misleading. Runtime threads are set automatically on all operating systems, so don't adjust this setting unless someone advises you to.