Discussions


User_AC3QE Member Posts: 34 Red Ribbon

Hi,

When rows start with the same sequence, I want to display only the item with the most characters.

CREATE TABLE EXPORT_TABLE
  (DT_ATUALIZACAO DATE NOT NULL,
   DADOS VARCHAR2(4000 BYTE) NOT NULL
  );

-- cleanup, to run when finished:
DROP TABLE EXPORT_TABLE;

Insert into EXPORT_TABLE (DT_ATUALIZACAO,DADOS) values (to_date('16/04/21 18:00:00','DD/MM/RR HH24:MI:SS'),'10990350,10990352,10856224,10990348');
Insert into EXPORT_TABLE (DT_ATUALIZACAO,DADOS) values (to_date('16/04/21 18:00:00','DD/MM/RR HH24:MI:SS'),'10990350,10990352,10856224,10990348');
Insert into EXPORT_TABLE (DT_ATUALIZACAO,DADOS) values (to_date('19/04/21 18:00:00','DD/MM/RR HH24:MI:SS'),'10990350,10990352,10856224,10990348');
Insert into EXPORT_TABLE (DT_ATUALIZACAO,DADOS) values (to_date('12/04/21 18:00:00','DD/MM/RR HH24:MI:SS'),'10990063,10990235,10990065');
Insert into EXPORT_TABLE (DT_ATUALIZACAO,DADOS) values (to_date('19/04/21 18:00:00','DD/MM/RR HH24:MI:SS'),'10990249,10990250');
Insert into EXPORT_TABLE (DT_ATUALIZACAO,DADOS) values (to_date('25/04/21 18:00:00','DD/MM/RR HH24:MI:SS'),'10990249,10990250,10990251');
Insert into EXPORT_TABLE (DT_ATUALIZACAO,DADOS) values (to_date('25/04/21 18:00:00','DD/MM/RR HH24:MI:SS'),'10990249,10990250,10990251');
Insert into EXPORT_TABLE (DT_ATUALIZACAO,DADOS) values (to_date('16/04/21 18:00:00','DD/MM/RR HH24:MI:SS'),'10990254,10990256');
Insert into EXPORT_TABLE (DT_ATUALIZACAO,DADOS) values (to_date('14/04/21 18:00:00','DD/MM/RR HH24:MI:SS'),'10990063,10990235,10990065');

SELECT * FROM EXPORT_TABLE;

RESULT

Lines 5 and 6 start with the same value, 10990249. Bring back only the row with the greater number of characters.


Best Answer

  • Frank Kulash Member, Moderator Posts: 41,042 Red Diamond
    Accepted Answer

    Hi, @User_AC3QE

    You can use a Top-N Query like this:

    WITH  got_rn  AS
    (
        SELECT  dt_atualizacao, dados
        ,       ROW_NUMBER () OVER ( PARTITION BY SUBSTR ( dados
                                                         , 1
                                                         , INSTR (dados || ',', ',')
                                                         )
                                     ORDER BY     LENGTH (dados) DESC
                                   ) AS rn
        FROM    export_table
    )
    SELECT   dt_atualizacao, dados
    FROM     got_rn
    WHERE    rn = 1
    ORDER BY dados, dt_atualizacao -- or whatever you want
    ;
    

    For each group of rows where the value of dados up to the first comma is the same, only the row with the longest dados is displayed. You didn't specify what you want when there's a tie for the longest dados. As posted above, one of the rows with the longest dados is picked arbitrarily. If you want something different, you can add tie-breaking expressions to the analytic ORDER BY clause and/or use RANK instead of ROW_NUMBER.
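    As one illustration of the RANK alternative mentioned above, this sketch (using the same `export_table` and partition key as the query above) returns every row that ties for the longest `dados` in its group, rather than picking one arbitrarily:

    ```sql
    -- Sketch: RANK gives all tied rows the same rank, so every row tied
    -- for the longest dados in its group passes the rn = 1 filter.
    WITH  got_rn  AS
    (
        SELECT  dt_atualizacao, dados
        ,       RANK () OVER ( PARTITION BY SUBSTR ( dados
                                                   , 1
                                                   , INSTR (dados || ',', ',')
                                                   )
                               ORDER BY     LENGTH (dados) DESC
                             ) AS rn
        FROM    export_table
    )
    SELECT   dt_atualizacao, dados
    FROM     got_rn
    WHERE    rn = 1
    ORDER BY dados, dt_atualizacao
    ;
    ```

    With the sample data, the two identical rows dated 25/04/21 would both be returned, since RANK cannot distinguish them.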

Answers

  • Frank Kulash Member, Moderator Posts: 41,042 Red Diamond

    Hi, @User_AC3QE

    Thanks for posting the CREATE TABLE and INSERT statements. Don't forget to post the full, exact results you want from the given data.

    line 5 and line 6 starts with the same character 10990249
    

    Don't lines 5, 6 and 7 all start with '10990249'? What do you want when there's a tie for the longest dados?


  • User_AC3QE Member Posts: 34 Red Ribbon


    Sorry: in case of a tie, it should be the most recent date.

    I tried to use RANK, without success.

  • mathguy Member Posts: 10,071 Blue Diamond

    I am confused about your problem requirement. Some of your wording suggests one interpretation, but your explanation goes a different way.


    Suppose you have the following strings in two different rows (never mind that my numbers are smaller, just for simplicity):

    123,45

    123,6789


    Do you need both of these to be returned? Or only one of them?

    From the way you described it, they both "start with the same sequence", so you would return only the second string and reject the first, because the second is longer.

    But from some of your wording (and examples), it seems that you should reject one row only if ALL OF IT is at the beginning of another one. So, in the example above, you would keep both.

    Only if there were a third string

    123,45,83000

    you would reject 123,45 because this third row begins with the entire sequence 123,45 (not just the first number, but ALL of them).


    Please clarify. I don't even know which of these interpretations Mr. Kulash's answer uses; I assume he thought it was obvious.
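    If the second interpretation is the intended one, a possible sketch (an assumption on my part, not what either answer above implements) is to reject a row only when some other row continues its entire sequence:

    ```sql
    -- Sketch of the "full prefix" interpretation: keep a row unless its
    -- ENTIRE dados is the leading part of some other, longer row's dados.
    SELECT  dt_atualizacao, dados
    FROM    export_table  t
    WHERE   NOT EXISTS
            ( SELECT  1
              FROM    export_table  x
              WHERE   x.dados LIKE t.dados || ',%'  -- x continues all of t's sequence
            )
    ;
    ```

    Note that this sketch does not de-duplicate rows whose dados values are identical; that would need a separate rule.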

  • Frank Kulash Member, Moderator Posts: 41,042 Red Diamond

    Hi, @User_AC3QE

    in case of a tie should be the most recent date

    Okay, then add another item at the end of the analytic ORDER BY clause. Does "most recent" mean latest, or latest before some point (like now)?

    What if there is still a tie (that is, two or more rows with equally long dados and the same most recent date)?

    I tried to use the rank without success

    Post your code. I can't say what you did wrong when I don't know what you did. You still need to post the complete results, too.
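    Putting the thread's pieces together, the tie-break Frank describes would look roughly like this (assuming "most recent" simply means the latest dt_atualizacao):

    ```sql
    -- Same Top-N query as the accepted answer, with dt_atualizacao DESC
    -- added to the analytic ORDER BY: among rows whose dados is equally
    -- long, the row with the most recent date gets rn = 1.
    WITH  got_rn  AS
    (
        SELECT  dt_atualizacao, dados
        ,       ROW_NUMBER () OVER ( PARTITION BY SUBSTR ( dados
                                                         , 1
                                                         , INSTR (dados || ',', ',')
                                                         )
                                     ORDER BY     LENGTH (dados)  DESC
                                     ,            dt_atualizacao  DESC
                                   ) AS rn
        FROM    export_table
    )
    SELECT   dt_atualizacao, dados
    FROM     got_rn
    WHERE    rn = 1
    ORDER BY dados, dt_atualizacao
    ;
    ```

    ROW_NUMBER still picks arbitrarily between rows that tie on both length and date (such as the two identical 25/04/21 rows in the sample data).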