This discussion is archived
10 Replies Latest reply: Apr 3, 2013 1:37 PM by mattknowles RSS

How to claim quantities in a join?

mattknowles Newbie
Currently Being Moderated
I'm not even sure how to search for this, but hopefully someone recognizes the problem and can point me in the right direction.

I have two tables that both have an ID and an AMOUNT. Table1 has (1, 15) and (2, 15). Table2 has (A, 10) and (B, 20). I'm looking for a select statement that will produce the following output:
1, A, 10
1, B, 5
2, B, 15
In essence, I'm trying to assign all 10 of A to 1, and then another 5 of B to 1, to account for all 15 of 1. Then I'm assigning the remaining 15 from B to 2 to account for all 15 of 2.

Thanks!
  • 1. Re: How to claim quantities in a join?
    onedbguru Pro
    Currently Being Moderated
    what have you tried? This is not a "do my homework/job for me" forum. Some may show their SQL prowess by giving you the answer. Until you start testing for yourself, you will never learn.
  • 2. Re: How to claim quantities in a join?
    Frank Kulash Guru
    Currently Being Moderated
    Hi,
    mattknowles wrote:
    I'm not even sure how to search for this, but hopefully someone recognizes the problem and can point me in the right direction.
    I'm not sure how to search for it, either. There ought to be some nice, pity name for this kind of thing.
    I have two tables that both have an ID and an AMOUNT. Table1 has (1, 15) and (2, 15). Table2 has (A, 10) and (B, 20).
    Whenever you have a problem, please post CREATE TABLE and INSERT statements for your sample data. For example:
    CREATE TABLE table_1
    (   id_1          NUMBER     PRIMARY KEY     -- ID is not a very good column name
    ,   amount          NUMBER
    );
    
    INSERT INTO table_1 (id_1, amount) VALUES (1, 15);
    INSERT INTO table_1 (id_1, amount) VALUES (2, 15);
    
    
    CREATE TABLE table_2
    (   id_2          VARCHAR2 (10)     PRIMARY KEY     -- ID is not a very good column name
    ,   amount          NUMBER
    );
    
    INSERT INTO table_2 (id_2, amount) VALUES ('A', 10);
    INSERT INTO table_2 (id_2, amount) VALUES ('B', 20);
    This will allow the people who want to help you to re-create the problem and test their ideas.
    I'm looking for a select statement that will produce the following output:
    1, A, 10
    1, B, 5
    2, B, 15
    In essence, I'm trying to assign all 10 of A to 1, and then another 5 of B to 1, to account for all 15 of 1. Then I'm assigning the remaining 15 from B to 2 to account for all 15 of 2.
    So it's as if one of the tables represents amounts taken it from different sources in some order, and the other table represents amounts given out in order, and you want to show which sources contributed how much to which destinations. Accountants must have a name for this kind of thing.

    Here's one way to do it:
    WITH     t1     AS
    (
         SELECT  id_1, amount
         ,     SUM (amount) OVER (ORDER BY  id_1) - amount     AS range_start
         ,     SUM (amount) OVER (ORDER BY  id_1)             AS range_end
         FROM     table_1
    --     WHERE     ...     -- If you need any filtering, put it here
    )
    ,     t2     AS
    (
         SELECT  id_2, amount
         ,     SUM (amount) OVER (ORDER BY  id_2) - amount     AS range_start
         ,     SUM (amount) OVER (ORDER BY  id_2)             AS range_end
         FROM     table_2
    --     WHERE     ...     -- If you need any filtering, put it here
    )
    SELECT    t1.id_1
    ,       t2.id_2
    ,       LEAST    (t1.range_end,   t2.range_end)
         - GREATEST (t1.range_start, t2.range_start)     AS dispersal_amount
    FROM       t1
    JOIN       t2  ON    t1.range_start  <= t2.range_end
               AND   t1.range_end    >= t2.range_start
    ORDER BY  t1.range_start
    ,            t2.range_start
    ;
    Output:
    `     ID_1 ID_2       DISPERSAL_AMOUNT
    ---------- ---------- ----------------
             1 A                        10
             1 B                         5
             2 B                        15
    In this example, the total amount in each table is 30. Is it possible for the total amount in one table to be higher than the total amount in the other table? Is so, what results would you want?
  • 3. Re: How to claim quantities in a join?
    mattknowles Newbie
    Currently Being Moderated
    Frank Kulash wrote:Whenever you have a problem, please post CREATE TABLE and INSERT statements for your sample data... This will allow the people who want to help you to re-create the problem and test their ideas.
    I will make sure I do that in the future. I very much want to make it easier for the people who are inclined to offer help.

    Thank you very much for sharing your expertise on this subject. My co-worker and I were trying to brainstorm the problem, and we were both trying to see how to do it using RANGE BETWEEN. It looks like the OVER (ORDER BY id_1) does an implicit RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. That surprised me for some reason, but reading up on Analytic Functions, I see that "If you omit the windowing_clause entirely, then the default is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW", so I guess I shouldn't have been surprised.

    I also puzzled over the WHERE clause for a few minutes, trying to fully understand what was going on, and it made a lot more sense to me when I saw it as:
    FROM       t1
    JOIN       t2  ON    t1.range_start <= t2.range_end
               AND   t2.range_start <= t1.range_end
    Again, thank you for sharing your experience with the community. Your efforts are sincerely appreciated.
  • 4. Re: How to claim quantities in a join?
    Frank Kulash Guru
    Currently Being Moderated
    Hi,
    mattknowles wrote:
    ... I also puzzled over the WHERE clause for a few minutes, trying to fully understand what was going on,
    Sorry, I dodn't have a lot of time then to give a detailed explanation.
    Now I'm puzzled. I didn't use a WHERE clause. Were you confused precisely because there was no WHERE clause? Do you mean the 2 commented-out WHERE clauses?
    and it made a lot more sense to me when I saw it as:
    FROM       t1
    JOIN       t2  ON    t1.range_start <= t2.range_end
               AND   t2.range_start <= t1.range_end
    I found this un-intuitive at first.
    Look at it this way: Two ranges DO NOT overlap if (and only if) one of them ends before the other one begins. Therefore, two ranges DO overlap if (and only if) each of them begins before the other one ends.
  • 5. Re: How to claim quantities in a join?
    mattknowles Newbie
    Currently Being Moderated
    Frank Kulash wrote:
    Hi,
    mattknowles wrote:
    ... I also puzzled over the WHERE clause for a few minutes, trying to fully understand what was going on,
    Sorry, I dodn't have a lot of time then to give a detailed explanation.
    Now I'm puzzled. I didn't use a WHERE clause. Were you confused precisely because there was no WHERE clause? Do you mean the 2 commented-out WHERE clauses?
    No, I was just speaking inaccurately. I puzzled over the ON clause in the JOIN :)

    Like I said, it made more sense when I saw it as rows where each side's RANGE_START is prior to the other side's RANGE_END. As usual, your explanation was very helpful, and definitely got me pointed in the right direction. I did a little follow-up research to make sure I thoroughly understood what was going on, and am very happy with what I have now. I'm not the type to just grab a code sample off the 'net and use it without making sure I understand why it works.

    Thanks again :)
  • 6. Re: How to claim quantities in a join?
    mattknowles Newbie
    Currently Being Moderated
    Frank Kulash wrote:
    In this example, the total amount in each table is 30. Is it possible for the total amount in one table to be higher than the total amount in the other table? Is so, what results would you want?
    Here's the code I'm using for my tests:
    drop table table_1;
    create table table_1 (
         table_1_id number(14),
         amount number(14, 2)
    );
    drop table table_2;
    create table table_2 (
         table_2_id number(14),
         amount number(14, 2)
    );
    
    insert into table_1 values (1, 100);
    insert into table_1 values (2, 100);
    insert into table_1 values (3, 100);
    commit;
    
    insert into table_2 values (1, 40);
    insert into table_2 values (2, 70);
    insert into table_2 values (3, 150);
    -- Change the following insert to 40 to test the green path.
    insert into table_2 values (4, 60);
    commit;
    
    with t1 as (
              select table_1_id,
                        amount,
                        sum(amount) over (order by table_1_id) - amount as range_start,
                        sum(amount) over (order by table_1_id) as range_end,
                        sum(amount) over () as total
                   from table_1
         ),
         t2 as (
              select table_2_id,
                        amount,
                        sum(amount) over (order by table_2_id) - amount as range_start,
                        sum(amount) over (order by table_2_id) as range_end,
                        sum(amount) over () as total
                   from table_2
         )
    select t1.table_1_id,
              t1.amount,
              t1.range_start,
              t1.range_end,
              t1.total,
              t2.table_2_id,
              t2.amount,
              t2.range_start,
              t2.range_end,
              t2.total,
              least(t1.range_end, t2.range_end)
                   - greatest(t1.range_start, t2.range_start)
                   as dispersal_amt
         from t1
              inner join t2
                   on t1.range_start <= t2.range_end
                   and t2.range_start <= t1.range_end
         where t1.total = t2.total
         order by t1.table_1_id,
              t2.table_2_id
    ;
    When I first implemented this in my procedure, I was doing an explicit select from both tables to compare the amounts and raising an exception if they were out of balance, and then I would use this select statement to populate the dispersal table. It occurred to me that it might be better to try to do this all in one statement, so I added a TOTAL for each table, so that I could reference that in the where clause. Now, I can simply add a statement after my main insert into the dispersal table to check if (sql%rowcount = 0) and raise an exception at that point. It is also an error condition if neither table has any rows.

    If you're still paying attention, I'd be curious to know your thoughts on this.
  • 7. Re: How to claim quantities in a join?
    Frank Kulash Guru
    Currently Being Moderated
    Hi,
    mattknowles wrote:
    ... When I first implemented this in my procedure, I was doing an explicit select from both tables to compare the amounts and raising an exception if they were out of balance, and then I would use this select statement to populate the dispersal table. It occurred to me that it might be better to try to do this all in one statement, so I added a TOTAL for each table, so that I could reference that in the where clause. ...
    I agree; you don't need a separate query to get the grand totals; it's easy enough, and more efficient, to calculate the grand totals in this query.

    There's a flaw in the query I posted earlier. If, by chance, in the middle of the results, t1.range_end = t2.range_end, then you will get output rows with dispersal_amt=0. That is, if you add these to the "green path" sample data:
    insert into table_1 values (4, 5);
    insert into table_2 values (5, 5);
    then 3 rows will be added to the result set, 2 rows with dispersal_amt=0, and 1 row with dispersal_amt=5. I assume you only wnat the 1 row, with dispersal_amt=5. To exclude the 2 unwanted rows, add another condition to the main WHERE clause, like this:
    with t1 as (
              select table_1_id,
                        amount,
                        sum(amount) over (order by table_1_id) - amount as range_start,
                        sum(amount) over (order by table_1_id) as range_end,
                        sum(amount) over () as total
                   from table_1
         ),
         t2 as (
              select table_2_id,
                        amount,
                        sum(amount) over (order by table_2_id) - amount as range_start,
                        sum(amount) over (order by table_2_id) as range_end,
                        sum(amount) over () as total
                   from table_2
         )
    select t1.table_1_id,
              t1.amount,
              t1.range_start,
              t1.range_end,
              t1.total,
              t2.table_2_id,
              t2.amount,
              t2.range_start,
              t2.range_end,
              t2.total,
              least(t1.range_end, t2.range_end)
                   - greatest(t1.range_start, t2.range_start)
                   as dispersal_amt
         from t1
              inner join t2
                   on t1.range_start <= t2.range_end
                   and t2.range_start <= t1.range_end
         where t1.total = t2.total
         AND       least(t1.range_end,   t2.range_end)     -- *****  NEW  *****
             != greatest(t1.range_start, t2.range_start)     -- *****  NEW  *****
         order by t1.table_1_id,
              t2.table_2_id
    ;
  • 8. Re: How to claim quantities in a join?
    mattknowles Newbie
    Currently Being Moderated
    Frank Kulash wrote:
    There's a flaw in the query I posted earlier. If, by chance, in the middle of the results, t1.range_end = t2.range_end, then you will get output rows with dispersal_amt=0.
    Thanks for pointing this out. I'd like to think we'd have caught that in testing :)

    It brings up an interesting question for me, though, and this is something I've wondered about quite a bit...

    I probably would have solved the problem by doing this:
    with t1 as (
              select table_1_id,
                        sum(amount) over (order by table_1_id) - amount as range_start,
                        sum(amount) over (order by table_1_id) as range_end,
                        sum(amount) over () as total
                   from table_1
         ),
         t2 as (
              select table_2_id,
                        sum(amount) over (order by table_2_id) - amount as range_start,
                        sum(amount) over (order by table_2_id) as range_end,
                        sum(amount) over () as total
                   from table_2
         ),
         dispersals as (
              select t1.table_1_id,
                        t2.table_2_id,
                        least(t1.range_end, t2.range_end)
                             - greatest(t1.range_start, t2.range_start)
                             as dispersal_amt
                   from t1
                        inner join t2
                             on t1.range_start <= t2.range_end
                             and t2.range_start <= t1.range_end
                   where t1.total = t2.total
         )
    select d.table_1_id,
              d.table_2_id,
              d.dispersal_amt
         from dispersals d
         where d.dispersal_amt <> 0
         order by d.table_1_id,
              d.table_2_id
    ;
    My intuitive sense is that the LEAST and GREATEST functions in the WHERE clause would be recalculated, but it also occurs to me that Oracle might be smart enough to realize they're the exact same function calls as those in the SELECT list.

    Do you have a strong preference? Is there a section of the documentation you would recommend that I could read up on to learn more about this in general?
  • 9. Re: How to claim quantities in a join?
    Frank Kulash Guru
    Currently Being Moderated
    Hi,

    The fix I posted in my last message is unnecessarily complicated.
    We can exclude the rows with dispersal_amt=0 by simply changing <= to < in the join condtions.
    That is, the main query can end like this:
    ...          inner join t2
                   on t1.range_start  < t2.range_end     -- ***** NOT <=  *****
                   and t2.range_start < t1.range_end     -- ***** NOT <=  *****
         where t1.total = t2.total
                    --  ***** Nothing more is needed here  *****
         order by t1.table_1_id,
              t2.table_2_id
    ;
    But the point you raised is very interesting.
    mattknowles wrote:
    It brings up an interesting question for me, though, and this is something I've wondered about quite a bit...

    I probably would have solved the problem by doing this:
    with t1 as (
              select table_1_id,
                        sum(amount) over (order by table_1_id) - amount as range_start,
                        sum(amount) over (order by table_1_id) as range_end,
                        sum(amount) over () as total
                   from table_1
         ),
         t2 as (
              select table_2_id,
                        sum(amount) over (order by table_2_id) - amount as range_start,
                        sum(amount) over (order by table_2_id) as range_end,
                        sum(amount) over () as total
                   from table_2
         ),
         dispersals as (
              select t1.table_1_id,
                        t2.table_2_id,
                        least(t1.range_end, t2.range_end)
                             - greatest(t1.range_start, t2.range_start)
                             as dispersal_amt
                   from t1
                        inner join t2
                             on t1.range_start <= t2.range_end
                             and t2.range_start <= t1.range_end
                   where t1.total = t2.total
         )
    select d.table_1_id,
              d.table_2_id,
              d.dispersal_amt
         from dispersals d
         where d.dispersal_amt <> 0
         order by d.table_1_id,
              d.table_2_id
    ;
    I thought about doing it that way, also.
    My intuitive sense is that the LEAST and GREATEST functions in the WHERE clause would be recalculated, but it also occurs to me that Oracle might be smart enough to realize they're the exact same function calls as those in the SELECT list.
    I've seen clear evidence that Oracle is smart like that in other areas. Whether it is in this particular case or not, I wasn't sure. To try to find out, I wrote my own version of the LEAST function:
    CREATE OR REPLACE FUNCTION my_least
    (  in_num_1  IN  NUMBER
    ,  in_num_2  IN  NUMBER
    )
    RETURN  NUMBER
    DETERMINISTIC
    IS
    BEGIN
        dbms_output.put_line (  in_num_1
                        || ' = in_num_1, '
                    || in_num_2
                    || ' = in_num_2 entering my_least'
                    );
        RETURN  LEAST (in_num_1, in_num_2);
    END  my_least;
    /
    SHOW ERRORS
    
    SET  SERVEROUTPUT  ON
    I substituted MY_LEAST for LEAST in the query (in both places, of course) and saw that it is only calling my_least once for each pari of arguments. The output was:
    305 = in_num_1, 40 = in_num_2 entering my_least
    300 = in_num_1, 40 = in_num_2 entering my_least
    200 = in_num_1, 40 = in_num_2 entering my_least
    100 = in_num_1, 40 = in_num_2 entering my_least
    100 = in_num_1, 40 = in_num_2 entering my_least
    305 = in_num_1, 110 = in_num_2 entering my_least
    300 = in_num_1, 110 = in_num_2 entering my_least
    200 = in_num_1, 110 = in_num_2 entering my_least
    ...
    Of course, this isn't conclusive proof that it will behave the same way with the built-in LEAST function. (Though, if it did behave differently, I would suppose Oracle would be even smarter concerning its own built-in functions.) Also, the sample size mignt make a difference. That is, Oracle might be able to cache a fixed number of results, 1000 say. Since I was calling it wil fewer than 1000 distinct sets of arguments, it was able to get duplicte results from the cache, but if I needed to call it more than 1000 times, then perhaps it would have started calling the function twice with the same arguments.
    Do you have a strong preference? Is there a section of the documentation you would recommend that I could read up on to learn more about this in general?
    Sorry, I don't know much about that sort of thing, and I don't recall seeing it documented.

    My personal feeling is that thinking about what must or might be more efficient is fine, but there's no substitute for actaully trying things different ways, and measuring the results using full-sized tables. This isn't always trivial to do, but if performance is important enough, it's worth doing.
    For a lot of what I do, performance is not an issue. How much time it takes to develop, and how easy it is to debug and maintain are sometimes more important.
  • 10. Re: How to claim quantities in a join?
    mattknowles Newbie
    Currently Being Moderated
    Frank Kulash wrote:
    The fix I posted in my last message is unnecessarily complicated.
    We can exclude the rows with dispersal_amt=0 by simply changing <= to < in the join condtions.
    I actually considered trying that, but was busy with other things and didn't take the time to test it.
    Also, the sample size mignt make a difference.
    In our case, we're applying payments to one or more accounts in real-time, so we're not likely to have more than a few records in either set, and we'll very frequently only have one of each.

    Thanks again for sharing your expertise. It is much appreciated.

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points