Forum Stats

  • 3,768,167 Users
  • 2,252,755 Discussions
  • 7,874,480 Comments

Discussions

getting 2nd recent batch

Sandy0594
Sandy0594 Member Posts: 680 Bronze Badge
edited Oct 13, 2021 3:57PM in SQL & PL/SQL

Hi All,

I have a table with customer order information, everytime new data comes we assign a batch_id to it. I have a requirement where if a customer XXX with ship_to YYY has the latest order then I have to fetch the 2nd recent batch_id of that particular customer/ship-to combination.

DB Version: 12.1.0.2.0

CREATE TABLE xxc_temp(batch_id number, cust_num number,ship_to varchar2(20),release_number number, cust_po_num varchar2(30),creation_date date,org_id number,filename varchar2(30),qty number);
insert into xxc_temp values(1001, 124,'D1_124',1020,'F467A_20211013', TO_DATE('10/11/2021 2:00:03 AM', 'MM/DD/YYYY HH:MI:SS AM'),82,'XX202110110203_82.dat',10);
insert into xxc_temp values(1002, 123,'D1_123',1004,'F345X_20211012', TO_DATE('10/12/2021 2:00:03 AM', 'MM/DD/YYYY HH:MI:SS AM'),82,'XX202110120203_82.dat',6);
insert into xxc_temp values(1002, 123,'D2_123',1004,'F785X_20211013', TO_DATE('10/12/2021 2:00:04 AM', 'MM/DD/YYYY HH:MI:SS AM'),82,'XX202110120203_82.dat',7);
insert into xxc_temp values(1003, 123,'D2_123',1005,'F785X_20211012', TO_DATE('10/12/2021 4:30:29 AM', 'MM/DD/YYYY HH:MI:SS AM'),82,'XX202110120430_82.dat',5);
insert into xxc_temp values(1003, 123,'D1_123',1005,'F345X_20211013', TO_DATE('10/12/2021 4:30:29 AM', 'MM/DD/YYYY HH:MI:SS AM'),82,'XX202110120430_82.dat',12);
insert into xxc_temp values(1004, 124,'D1_124',1021,'F467A_20211013', TO_DATE('10/13/2021 4:30:29 AM', 'MM/DD/YYYY HH:MI:SS AM'),82,'XX202110130430_82.dat',5);
insert into xxc_temp values(1005, 123,'D1_123',1006,'F345X_20211012', TO_DATE('10/14/2021 2:00:03 AM', 'MM/DD/YYYY HH:MI:SS AM'),82,'XX202110140200_82.dat',8);
insert into xxc_temp values(1005, 123,'D1_123',1006,'F345X_20211013', TO_DATE('10/14/2021 2:00:04 AM', 'MM/DD/YYYY HH:MI:SS AM'),82,'XX202110140200_82.dat',9);
insert into xxc_temp values(1006, 800,'D1_800',2980,'F467G_20211019', TO_DATE('10/14/2021 3:00:04 AM', 'MM/DD/YYYY HH:MI:SS AM'),2890,'XX202110140300_2890.dat',5);
insert into xxc_temp values(1007, 800,'D2_800',1800,'F487G_20211019', TO_DATE('10/14/2021 3:30:04 AM', 'MM/DD/YYYY HH:MI:SS AM'),2890,'XX202110140330_2890.dat',15);
insert into xxc_temp values(1008, 124,'D1_124',1022,'F467A_20211013', TO_DATE('10/15/2021 2:30:04 AM', 'MM/DD/YYYY HH:MI:SS AM'),82,'XX202110150230_82.dat',7);
commit;

If you look at the above data, the most recent batch for Customer '124' with ship_to 'D1_124' is 1008. So I am basically looking to get 2nd recent batch_id i.e., 1004. I tried below query but I get multiple batch_id's rather than one.

SELECT *
    FROM (SELECT batch_id
          --,DENSE_RANK () OVER (ORDER BY release_number DESC) row_num
          ,DENSE_RANK () OVER (PARTITION BY cust_num,ship_to ORDER BY batch_id DESC) row_num
       FROM xxc_temp
       where org_id = 82)
    WHERE 1=1--row_num = 2
    order by row_num;


Appreciate your suggestions on this!

Tagged:

Best Answers

  • mathguy
    mathguy Member Posts: 10,151 Blue Diamond
    Accepted Answer

    In the data you provided us there can also be "ties" where the ORG_ID, the CUST_NUM, the SHIP_TO and the CREATION_DATE are all equal - let alone cases where just the CREATION_DATE is the same. Or was that a mistake?

    If that is possible, what is the required handling of "ties"? Suppose for your ORG_ID there are exactly two rows with the latest CREATION_DATE, and they both have the same CUST_NUM and the same SHIP_TO. (Assuming that is possible; it is present in your original test data, anyway.) Is one of the two rows the "latest" and the other "second latest", or do you need to go back to a different "second latest" date? And if so, what if the "second latest" date also has ties (with the same customer and same SHIP_TO)?

    Different question - what if you find that the "latest" row is for a combination of customer and SHIP_TO that doesn't appear anywhere else in the table? Meaning, the "latest" row is also the only (first!) record for that combination of customer and SHIP_TO? What do you need to return then?

    Sandy0594
  • mathguy
    mathguy Member Posts: 10,151 Blue Diamond
    Accepted Answer

    I think the query below covers all the cases you described. The outer join at the end is needed for customer - ship_to combinations that don't have a "second latest" record (that appeared on the most recent date for the first time).

    Perhaps the code can be simplified a little; I wrote each step as a separate subquery in the WITH clause to make it easier to follow the logic. Please test it thoroughly, with as many special cases as you can think of, and see if performance is OK (perhaps it can be improved - for now make sure it's at least correct, regardless of speed).

    with
      prep1 as (
        select t.*, max(creation_date) over () as max_date
         from   xxc_temp t
         where  org_id = 82
      )
    , latest (cust_num, ship_to) as (
        select cust_num, ship_to
        from   prep1
        where  creation_date = max_date
      )
    , prep2 as (
        select t.*,
               row_number() over (partition by cust_num, ship_to
                                  order     by creation_date desc) as rn
        from   xxc_temp t
        where  org_id = 82
          and  (cust_num, ship_to) in (select cust_num, ship_to from latest)
      )
    , second_to_latest as (
        select batch_id, cust_num, ship_to, release_number, cust_po_num,
               creation_date, org_id, filename, qty
        from   prep2
        where  rn = 2
      )
    select batch_id, cust_num, ship_to, release_number, cust_po_num,
           creation_date, org_id, filename, qty
    from   latest left outer join second_to_latest using (cust_num, ship_to)
    ;
    

Answers

  • mathguy
    mathguy Member Posts: 10,151 Blue Diamond

    Several things are unclear in your question.

    Your table has a column ORG_ID, and it seems to play a role in your problem, since in the query you filter for ORG_ID = 82;but in the problem description, you didn't mention anything about that. So, please clarify: do you need to start from the latest order, or do you need to start from the latest row for a specific order, given in the query as an input variable?

    Also - since you never mentioned ORG_ID in your problem statement: suppose you find the "latest" customer number and SHIP_TO.Then you search for "second latest" for that combination... second latest regardless of ORG_ID, or only second-latest for the same ORG_ID as the latest one? (Or do you first filter by ORG_ID so that only one ORG_ID is considered for the rest of the query anyway?)

    Then: there may be ties between rows with the same ORG_ID, CUST_NUMand SHIP_TO. (For example: the third and the fourth row in your sample data have the same values in all three columns). So, then, what do you mean by "latest" and "second latest"? There is ambiguity. What if there are two rows tied for latest? How do you define "second latest" then? Or, what if there is a unique "latest" row, but two rows tied for "second latest"? You seem to want to address that, since you are using dense_rank - but then you are talking about a single row in the output. How do you choose which one, when there is more than one "second latest" row?

    Please clarify. I predict that we will see between three and five solutions even before you answer, each solution making a different set of assumptions (and not stating what assumptions were made).

  • Mustafa_KALAYCI
    Mustafa_KALAYCI Member Posts: 3,368 Bronze Crown

    based on your example, maybe like this:

    with main_data as (
        select x.*, max(ship_to) keep (dense_Rank first order by batch_id desc) over() latest_ship_to
        from xxc_temp x
      ),
      latest_batches as (
        select m.*, rownum as batch_number
        from  main_data m
        where latest_ship_to = ship_to
        order by creation_date desc
      )
    select * 
    from   latest_batches
    where batch_number = 2
    
    
    


  • Mustafa_KALAYCI
    Mustafa_KALAYCI Member Posts: 3,368 Bronze Crown

    Correct, I was thinking about using first and last analytic function. No need order by.

  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 41,205 Red Diamond

    Hi, @Sandy0594

    Thanks for posting the CREATE TABLE and INSERT statements; that helps a lot! Don't forget to post the exact results you want from the given sample data.

    If you look at the above data, the most recent batch for Customer '124' with ship_to 'D1_124' is 1008. So I am basically looking to get 2nd recent batch_id i.e., 1004. I tried below query but I get multiple batch_id's rather than one.

    If you change the main WHERE clause from

     WHERE 1=1--row_num = 2
    

    to

     WHERE row_num = 2
    

    then it looks like you get one batch_id for each distinct combination of customer and ship_to. If you're only interested in a given customer and/or ship_to, then filter those values in the sub-query, the same way you filtered for org_id=82.

    Since the combination (cust_num, ship_to, batch_id) is not unique, this may result in getting multiples rows in the output, even though there is only one batch_id. To get only one row, use SELECT DISTINCT.

    I find WITH clauses easier to read, debug and maintain than in-line views, but you should use whatever you (and the people with whom you work) like best.

    If the query includes  WHERE row_num = 2  , then there's no point on saying  ORDER BY row_num  . If the query only produces (at most) one row, then there's no point in having an ORDER BY clause at all.


    Putting all this together, you may want something like the following:

    WITH  got_row_num  AS
    (
    	SELECT  batch_id
            ,	DENSE_RANK () OVER ( -- PARTITION BY cust_num -- PARTITION BY not needed if filtering
    			   	     --,	     ship_to
    				     ORDER BY 	     batch_id  DESC
    				   ) AS row_num
        	FROM 	xxc_temp
        	WHERE 	org_id	  = 82
    	AND	cust_num  = 124  -- don't use quotes around NUMBERs
    	AND	shiP_TO	  = 'D1_124'
    )
    SELECT DISTINCT batch_id
    FROM	  	 got_row_num
    WHERE	  	 row_num = 2
    ;
    


    Sandy0594
  • Sandy0594
    Sandy0594 Member Posts: 680 Bronze Badge

    org_id is important but I do know at runtime which org I am dealing with. Anytime I only deal with one org either 82 (or) 2890.


    The idea why I need 2nd latest batch is - basically one of our customer shares their forecast data with us. We create sales orders based on their forecast, but the caveat is they can change the forecast for a particular order till the moment we ship their order. So based on this in above example for instance our customer sent a forecast for PO# F467A_20211013 on 10/13 with a qty of 5 but on 10/15 they want a qty of 7.

    Yes there could be ties like in the below example where on 10/14 for cust 123 we received two requests in same file but for different ship-to's which were processed earlier in different batches. So here the 2nd latest one's would be 1001,1002. Thanks for bringing this up I didn't think of this scenario earlier.

    Hope my explanation isn't confusing.

  • mathguy
    mathguy Member Posts: 10,151 Blue Diamond
    Accepted Answer

    In the data you provided us there can also be "ties" where the ORG_ID, the CUST_NUM, the SHIP_TO and the CREATION_DATE are all equal - let alone cases where just the CREATION_DATE is the same. Or was that a mistake?

    If that is possible, what is the required handling of "ties"? Suppose for your ORG_ID there are exactly two rows with the latest CREATION_DATE, and they both have the same CUST_NUM and the same SHIP_TO. (Assuming that is possible; it is present in your original test data, anyway.) Is one of the two rows the "latest" and the other "second latest", or do you need to go back to a different "second latest" date? And if so, what if the "second latest" date also has ties (with the same customer and same SHIP_TO)?

    Different question - what if you find that the "latest" row is for a combination of customer and SHIP_TO that doesn't appear anywhere else in the table? Meaning, the "latest" row is also the only (first!) record for that combination of customer and SHIP_TO? What do you need to return then?

    Sandy0594
  • Sandy0594
    Sandy0594 Member Posts: 680 Bronze Badge
    edited Oct 13, 2021 7:13PM

    Oh I see what you mean. Those aren't ties because if you see the Cust_po_number is different. I am assuming you are referring to batch_id 1005. In cases where everything is same(Cust_po, release_number, cust_num, ship-to) and there are two records for a given batch then we add-up the quantity.

    This table basically holds the forecast data which means it's a direct reflection of the files they share with us. Other than adding batch_id to this table everything is from the customer. The adding up of quantity and all that is at a later stage when we create Sales orders.

    For your last question, if there's no previous entry for that combination it basically means we need to create a new sales order(as there's no existing sales order present with this forecast data). The batch_id can be returned as 0 in this case.

  • mathguy
    mathguy Member Posts: 10,151 Blue Diamond
    Accepted Answer

    I think the query below covers all the cases you described. The outer join at the end is needed for customer - ship_to combinations that don't have a "second latest" record (that appeared on the most recent date for the first time).

    Perhaps the code can be simplified a little; I wrote each step as a separate subquery in the WITH clause to make it easier to follow the logic. Please test it thoroughly, with as many special cases as you can think of, and see if performance is OK (perhaps it can be improved - for now make sure it's at least correct, regardless of speed).

    with
      prep1 as (
        select t.*, max(creation_date) over () as max_date
         from   xxc_temp t
         where  org_id = 82
      )
    , latest (cust_num, ship_to) as (
        select cust_num, ship_to
        from   prep1
        where  creation_date = max_date
      )
    , prep2 as (
        select t.*,
               row_number() over (partition by cust_num, ship_to
                                  order     by creation_date desc) as rn
        from   xxc_temp t
        where  org_id = 82
          and  (cust_num, ship_to) in (select cust_num, ship_to from latest)
      )
    , second_to_latest as (
        select batch_id, cust_num, ship_to, release_number, cust_po_num,
               creation_date, org_id, filename, qty
        from   prep2
        where  rn = 2
      )
    select batch_id, cust_num, ship_to, release_number, cust_po_num,
           creation_date, org_id, filename, qty
    from   latest left outer join second_to_latest using (cust_num, ship_to)
    ;