
cursor vs bulk collect

francy77
francy77 Member Posts: 258 Bronze Badge

Hi all,

I'm asking which one is better:

select distinct cli.k_id, cli.n_id
  BULK COLLECT INTO in_user_id_list
  from clienti cli
 where cli.n_user_aggiorna = 'ATI'
   and cli.D_FINE_VAL is null


OR


CURSOR cur IS
  SELECT distinct cli.k_id, cli.n_id
    FROM wiam_clienti cli
   where cli.n_user_aggiorna = 'ATI'
     and cli.D_FINE_VAL is null;

In the first case I need an array to hold the retrieved values, while in the second I don't. Also, in the first case the processing would happen entirely in the SGA, while in the second case it would use disk I/O? Is that true?

Answers

  • BEDE
    BEDE Oracle Developer Member Posts: 2,222 Silver Trophy

    Not really true, that assertion concerning disk I/O. From the cursor you may fetch row by row, which causes a context switch between the SQL statement executor and the PL/SQL engine for every fetch, or you may fetch with BULK COLLECT, which needs only one switch per bulk fetch. (A sketch of a bulk fetch from such a cursor follows at the end of this post.)

    And what do you wish to do with that SELECT DISTINCT? If its result is to be used in another select to get other data, then it would be much better to try to get all the data in a single select, something like the below:

    with tb as (
      SELECT distinct cli.k_id, cli.n_id
        FROM wiam_clienti cli
       where cli.n_user_aggiorna = 'ATI'
         and cli.D_FINE_VAL is null
    )
    select tb.*, ...
      from tb
      join ...
     where ...
    ;
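    To illustrate the context-switch point above, here is a minimal sketch of fetching the cursor from the question in batches with BULK COLLECT and a LIMIT; the collection type, the batch size and the processing step are assumptions, not something given in the thread:

    DECLARE
      CURSOR cur IS
        SELECT DISTINCT cli.k_id, cli.n_id
          FROM wiam_clienti cli
         WHERE cli.n_user_aggiorna = 'ATI'
           AND cli.d_fine_val IS NULL;

      TYPE t_row_tab IS TABLE OF cur%ROWTYPE;   -- collection type is an assumption
      l_rows t_row_tab;
    BEGIN
      OPEN cur;
      LOOP
        -- one context switch per batch of 500 rows instead of one per row
        FETCH cur BULK COLLECT INTO l_rows LIMIT 500;
        EXIT WHEN l_rows.COUNT = 0;
        FOR i IN 1 .. l_rows.COUNT LOOP
          NULL;   -- process l_rows(i).k_id / l_rows(i).n_id here
        END LOOP;
      END LOOP;
      CLOSE cur;
    END;
    /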

  • User_H3J7U
    User_H3J7U Member Posts: 86 Blue Ribbon

    A cursor FOR loop (FOR ... IN cursor) uses an implicit array fetch of 100 rows when plsql_optimize_level >= 2, so its performance is comparable to BULK COLLECT with LIMIT 100 (see the sketch below).
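    For comparison, a minimal sketch of that cursor FOR loop form, which needs no collection variable; table and column names are taken from the question and the processing step is a placeholder:

    BEGIN
      FOR r IN (SELECT DISTINCT cli.k_id, cli.n_id
                  FROM wiam_clienti cli
                 WHERE cli.n_user_aggiorna = 'ATI'
                   AND cli.d_fine_val IS NULL)
      LOOP
        NULL;   -- with plsql_optimize_level >= 2, rows are fetched 100 at a time; process r.k_id / r.n_id here
      END LOOP;
    END;
    /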

  • Billy Verreynne
    Billy Verreynne Member Posts: 28,182 Red Diamond

    ALL SQLs ARE EXECUTED AS CURSORS. Every single time.

    So the 1st code snippet of a PL/SQL bulk collect creates an implicit cursor, executes it, and uses a bulk fetch.

    The 2nd code snippet is an explicit PL/SQL definition of a cursor only. Additional code is needed to execute the cursor and fetch data from it. This fetch can also be a bulk fetch.

    An identical SQL cursor will be created in the Shared Pool, and used.

    Which approach is better DOES NOT depend on the bulk fetch, or on misguided opinions about SGA or disk I/O.

    It is about code readability, maintainability, modularisation, reusability - which approach serves the design and technical requirements best.

  • francy77
    francy77 Member Posts: 258 Bronze Badge

    Thanks, no, I do not have to retrieve other data; I have to update another table row by row (I mean every update should be followed by a commit), and I have to cycle over one million rows. So the context switching could be huge; is bulk collect better in terms of speed, or not?

  • francy77
    francy77 Member Posts: 258 Bronze Badge

    I don't know the value of plsql_optimize_level; I do know I have to process over one million records, so I need to know whether one way is much faster than the other.
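    As an aside, a way to check it: the optimization level compiled into a PL/SQL unit is visible in the data dictionary, and the instance/session default in the parameter. The object name below is a placeholder, not a real name from this system.

    -- compiled setting of a specific PL/SQL unit (object name is a placeholder)
    SELECT name, type, plsql_optimize_level
      FROM user_plsql_object_settings
     WHERE name = 'MY_PROCEDURE';

    -- current instance/session default
    SELECT value
      FROM v$parameter
     WHERE name = 'plsql_optimize_level';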

  • francy77
    francy77 Member Posts: 258 Bronze Badge

    Indeed the procedure should run one and only one time, so aspects like readability, maintainability, modularisation and reusability are not important at the moment; I only care about the time spent. The procedure has to process over one million records, and the time spent is very important because other activities have to start when this one has finished. If you say the two ways are almost equal in terms of time spent, I prefer the one with no bulk collect; thanks.

  • BEDE
    BEDE Oracle Developer Member Posts: 2,222 Silver Trophy

    I do hope you will not use something like the below:

    for r in (select ...) loop
      update x_table t set ...
       where t.x_column = r.x_column;
      commit;
    end loop;

    Rather than doing so, use the facilities of dbms_parallel_execute - that is, if a merge won't do.

    Usually, for this kind of problem - that is, when I have to update one table using the results of a select - I use merge. RTM on merge if you haven't used it until now.
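    A minimal sketch of what such a merge might look like here; the target table, the column to set and the join condition are assumptions (none of them are given in the thread), and it presumes each k_id maps to a single n_id:

    MERGE INTO target_table t                          -- target table name is an assumption
    USING (SELECT DISTINCT cli.k_id, cli.n_id
             FROM wiam_clienti cli
            WHERE cli.n_user_aggiorna = 'ATI'
              AND cli.d_fine_val IS NULL) src
       ON (t.k_id = src.k_id)                          -- join condition is an assumption
     WHEN MATCHED THEN
       UPDATE SET t.n_id = src.n_id;                   -- column to set is an assumption

    COMMIT;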

  • cormaco
    cormaco Member Posts: 1,535 Gold Trophy

    I second BEDE's suggestion of using MERGE, and I have a link to an example of using merge which shows that it is at least 10 times faster than any other method:

    The output shows the straight MERGE statement is an order of magnitude faster than its nearest rival. The update/insert performs at almost twice the speed of the insert/update and even outperforms the row-by-row MERGE.

    Just comparing the update/insert and the insert/update methods in isolation, we have to remember the comparisons will vary depending on the data in the table. If most of the data will not be present already, the insert/update approach may be better. If most of the data is already present, the update/insert approach will probably be better. If you are not sure, just use merge as it is clearer.

    In addition to the straight MERGE statement being faster, because it is a DML statement it can easily be run in parallel to improve performance further, provided your server can handle the extra load.
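    For reference, a sketch of running such a merge in parallel, reusing the assumed target table and join condition from the earlier sketch; the degree of parallelism is arbitrary:

    ALTER SESSION ENABLE PARALLEL DML;

    MERGE /*+ parallel(t 4) */ INTO target_table t     -- target name and DOP are assumptions
    USING (SELECT DISTINCT cli.k_id, cli.n_id
             FROM wiam_clienti cli
            WHERE cli.n_user_aggiorna = 'ATI'
              AND cli.d_fine_val IS NULL) src
       ON (t.k_id = src.k_id)
     WHEN MATCHED THEN
       UPDATE SET t.n_id = src.n_id;

    COMMIT;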

  • User_H3J7U
    User_H3J7U Member Posts: 86 Blue Ribbon

    The output shows the straight MERGE statement is an order of magnitude faster than its nearest rival.

    A single merge statement should first be compared to a single update (join) plus a single insert (left join / not exists). That gives two joins across two statements instead of one join for the merge, so the difference is not "an order of magnitude".

    Another approach, especially when working with a collection, is to use FORALL (one merge, or two statements: update + insert).
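    A minimal sketch of that FORALL approach, combining BULK COLLECT with a LIMIT and an array update; the target table, its columns, the batch size and the per-batch commit are assumptions:

    DECLARE
      TYPE t_kid_tab IS TABLE OF wiam_clienti.k_id%TYPE;
      TYPE t_nid_tab IS TABLE OF wiam_clienti.n_id%TYPE;
      l_k_ids t_kid_tab;
      l_n_ids t_nid_tab;

      CURSOR cur IS
        SELECT DISTINCT cli.k_id, cli.n_id
          FROM wiam_clienti cli
         WHERE cli.n_user_aggiorna = 'ATI'
           AND cli.d_fine_val IS NULL;
    BEGIN
      OPEN cur;
      LOOP
        FETCH cur BULK COLLECT INTO l_k_ids, l_n_ids LIMIT 1000;
        EXIT WHEN l_k_ids.COUNT = 0;

        -- one array update per batch of 1000 rows instead of one update per row
        FORALL i IN 1 .. l_k_ids.COUNT
          UPDATE target_table t                        -- target table/columns are assumptions
             SET t.n_id = l_n_ids(i)
           WHERE t.k_id = l_k_ids(i);

        COMMIT;                                        -- one commit per batch rather than per row
      END LOOP;
      CLOSE cur;
    END;
    /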

  • BluShadow
    BluShadow Member, Moderator Posts: 40,930 Red Diamond


    Collecting data into a collection in expensive PGA memory just to carry out an insert/update on the database is generally not the best idea; it wastes memory resources and is typically less performant than using SQL directly. Whenever one considers doing inserts/updates on a table based on relational data from another table, a MERGE statement would typically be the first and preferred way, and most definitely if it involves both inserts and updates depending on record existence. Collecting data into memory first should really be a last resort, for when there's no easy way to achieve the task in pure SQL. In all the years of database/data-processing code I've written, it's been very rare for me to have any use for collections, as the processing is typically so easily done with the power of SQL.

  • Jonathan Lewis
    Jonathan Lewis Member Posts: 9,587 Gold Crown

    @francy77

    I think you're seeing the result of asking about a detail of a strategy you've already picked, rather than explaining what you want to achieve as an introduction to your current thoughts. With the bigger picture, people might avoid giving you suggestions that point you in the wrong direction, or might suggest a much better overall strategy.

    You've said something about updating another table, a million rows, and commits after each row. Is the result of this "select distinct" the thing that gives you 1M rows, and are you going to be identifying one row in another table for each row in this select? Or do you have something else in mind?

    Why do you have to commit after every row?

    Do you know about the LIMIT construct for BULK COLLECT that allows you to set an array fetch size?

    Do you know about the FORALL construct that could allow you to do an array update with the array you fetched with the BULK COLLECT (with LIMIT)?

    Have you considered the option of a merge (or an updatable join view, or a correlated update)? How big is the table that you are updating, and how many rows will be updated?

    Regards

    Jonathan Lewis
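    For reference, a minimal sketch of the correlated-update option mentioned above; the target table, its columns and the join condition are assumptions, and it presumes each k_id in the source identifies a single n_id:

    UPDATE target_table t                              -- target table name is an assumption
       SET t.n_id = (SELECT cli.n_id
                       FROM wiam_clienti cli
                      WHERE cli.k_id = t.k_id
                        AND cli.n_user_aggiorna = 'ATI'
                        AND cli.d_fine_val IS NULL)
     WHERE EXISTS (SELECT NULL
                     FROM wiam_clienti cli
                    WHERE cli.k_id = t.k_id
                      AND cli.n_user_aggiorna = 'ATI'
                      AND cli.d_fine_val IS NULL);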

  • BluShadow
    BluShadow Member, Moderator Posts: 40,930 Red Diamond

    I missed the bit about committing on every update (i.e. potentially every row) as Mr Lewis has pointed out.

    Committing every row is a very bad thing to do (and typically has no business sense to it).

    Not only is it harder, in a logical sense, to recover from issues if anything goes wrong, because you won't have automatic rollback of the overall business process, but from a more technical perspective it can have a serious performance impact on the Oracle server.

    If you consider the way data is written to the disks of the server, Oracle does this using DBWR (DataBase WRiter) processes, and tries to optimise the work so that recently used data is kept in cached memory for faster access, older data is written out (as it's less likely to be needed any time soon), and data is blocked together for larger writes in one go to improve overall read/write performance on the physical hardware.

    By default the database starts up with a number of DBWR processes (e.g. 2 processes), which is often based on the hardware spec (number of CPUs etc.). The Oracle RDBMS can then dish out the writes it wants to these processes to distribute the workload.

    However, a lot of this is done when commits are issued, so if you start issuing lots of frequent commits in your code, the database looks at this "workload" and can automatically determine that it needs more DBWR processes, so it spawns new ones on the server to handle the additional requests. These processes take additional server memory, and can potentially end up queuing to write their data to the disks because they're waiting on the other processes to finish what they're doing (it's a bit of a case of "Oracle's done what it can to distribute the workload, but if the hardware can't handle it then it's out of Oracle's control"). What's worse is that these additional processes, once spawned, don't automatically go away again when things calm down; they stay there taking up server memory, typically doing nothing, unless another piece of code with "bulk row by row commits" comes along. The only time they disappear is if the database is shut down and restarted, when the default number will be started up again instead.

    So, committing every row... definitely not a good thing, without very very good reason.

    Also:

    https://docs.oracle.com/cd/E11882_01/server.112/e40402/bgprocesses.htm#REFRN104

  • User_H3J7U
    User_H3J7U Member Posts: 86 Blue Ribbon

    However, a lot of this is done when commits are issued, so if you start issuing lots of frequent commits in your code, the database looks at this "workload" and can automatically determine that it needs more DBWR processes, so it spawns new ones on the server to handle the additional requests.

    LGWR writes the redo log buffer to disk on commit [immediate]. DBWR is not affected by commit.

  • BluShadow
    BluShadow Member, Moderator Posts: 40,930 Red Diamond

    Are you sure? So when I start committing lots of individual transactions, you're saying that the database doesn't do anything at all to try and ensure that data gets written to the disk i.e. there's no increase in disk writing activity requirements? That's not what I was taught at Oracle University when I went on my DBA courses.

  • Jonathan Lewis
    Jonathan Lewis Member Posts: 9,587 Gold Crown

    @BluShadow

    @User_H3J7U is correct, a commit is of no interest to the DBWR. It's only the redo log that has to be written to disc in a timely fashion; Oracle tries to avoid writing data blocks for as long as it can and, roughly speaking, there are various time-outs and load conditions that trigger DBWR to write "non-transactional" batches of blocks.

    Regards

    Jonathan Lewis

  • BluShadow
    BluShadow Member, Moderator Posts: 40,930 Red Diamond

    I stand corrected, I must have been mixing up DBWR with the log writer processes.

    Regardless, it's still the case that multiple/frequent commits cause more log writer processes to be spawned, and they remain until the database is restarted, so it's still a problem for server resources and for conflicting process writes to disk.

    😉

  • Jonathan Lewis
    Jonathan Lewis Member Posts: 9,587 Gold Crown

    @BluShadow

    Recent versions of Oracle have the ability to dynamically switch between a single writer (lgwr) and multiple slave processes (lgNN), but the number of slaves is dictated at startup and Oracle will tend to switch to multiple slaves temporarily, only while the level of concurrent commits (log file sync messages) is high; a single process doing lots of single-row commits is unlikely to cause a switch to slaves.

    It is true, though, that single row commits will result in more undo and more redo being generated, as well as calls to get and release various latches and mutexes; so apart from the extra volume of I/O there's a significant increase in CPU, with associated concurrency threats to other sessions.

    Regards

    Jonathan Lewis

  • francy77
    francy77 Member Posts: 258 Bronze Badge

    Thanks for your explanation. Indeed, we agreed to commit every row in order to allow a client (a person who uses this system to do their work) to proceed without waiting for the whole process to finish; we have tried with 250K rows and it takes about 15 minutes to complete. I hope the DBA will restart the database, because I don't know whether extra DBWR processes were created for that.
