7 Replies Latest reply: Jul 29, 2011 2:23 PM by 607171 RSS

    Sys_connect_by_path

    607171
      Hi All,

      I am on Oracle 10g. I am using sys_connect_by_path to concatenate unique values , however, it seems to append semi-colons when no values are found.
      Can you pls point out how I can remove extra appended semicolons. I tried a replace, but that did not seem to work.

      -
       SELECT 
                (TRIM(';' from sys_connect_by_path(ActualFGMaterialNum,';')))ActualFGMaterialNum
           FROM (
                  SELECT NPIProjectID, 
                         CASE  
                            WHEN SYS_CONNECT_BY_PATH(PRIOR ActualFGMaterialNum,';') || ',' LIKE '%' || ActualFGMaterialNum || '%' THEN NULL
                            ELSE ActualFGMaterialNum
                         END ActualFGMaterialNum,
                         rnum,
                         tot
                  FROM (
                        SELECT npip.id AS NPIProjectID,
                        nm.ActualFGMaterialNum AS ActualFGMaterialNum, 
                        row_number() OVER (PARTITION BY npip.id ORDER BY nm.id) rnum,
                        COUNT(*) OVER (PARTITION BY npip.id) tot
                        FROM ADI.NPIProjects npip LEFT OUTER JOIN ADI.FormFactorInformation ffi ON ffi.NPIProjectId = npip.Id 
                        LEFT OUTER JOIN ADI.TestGrades tg ON tg.FormFactorInformationId = ffi.Id
                        LEFT OUTER JOIN ADI.NewModels nm ON nm.TestGradesID = tg.Id
                        )
                        START WITH rnum =1 CONNECT BY PRIOR rnum = rnum-1 AND PRIOR NPIProjectID = NPIProjectID
                     )
                      WHERE rnum=tot START WITH rnum =1 CONNECT BY PRIOR rnum = rnum-1 AND PRIOR NPIProjectID = NPIProjectID; -
      The results are returned as follows:

      ActualFGMaterialNum

      AD7862ARSZ-2REEL;AD7862ARSZ-3REEL

      AD9742ACPZRL7;;;ADUM1410ARWZ-RL

      ADSP-BF533SKSTZ-5V;;ADSP-BF533SKBC750X;;ADSP-BF533SKBCZ600;;ADSP-BF533SKBC-6V


      If you notice above, it appends extra ; after each string. How can we eliminate the null or how can we remove the extra semi-colons (;)?

      Thanks in advance,
      AD.
        • 1. Re: Sys_connect_by_path
          Etbin
          Instead of having to eliminate duplicates from concatenations I'd rather try to get rid of them before concatenating.

          Regards

          Etbin
          • 2. Re: Sys_connect_by_path
            Frank Kulash
            Hi,

            Whenever you have a question, please post a little sample data (CREATE TABLE and INSERT statements) and the results you want from that data. Explain how you get those results from that data.
            If you can use commonly available tables, like those in the scott schema, to show your problem, then you don't need to post any sample data; just the results explanation.

            You could use a WHERE clause to remove rows where ActualFGMaterialNum is NULL. Since the WHERE clause is applied after CONNECT BY is finished (and SYS_CONNECT_BY_PATH is formed), that probably means another sub-query. To make sure that rnum is consecutive, you'll have to be sure to compute rnum in the same sub-query where the NULLs are exclued, or at least not earlier than that.

            A less efficient approach would be to replace multiple consecutive ';'s in the final output with a single ';'. In Oracle 10 (and up), that's fairly simple:
            REGEXP_REPLACE ( txt
                        , ';+'
                        , ';'
                        )
            returns a copy of txt with all groups of consecutive ';'s (no matter how many there are) replaced by one ';'.

            Do you really need to do two nested CONNECT BY queries? Whatever you're doing, I'll bet there's a simpler, more efficient way to do it.
            • 3. Re: Sys_connect_by_path
              607171
              Hi Frank,
              Thank you for your note.
              I am new to using sys_connect_by_path and have researched through a few posts on this forum to get a basic understanding.
              The approach I used here is what I has picked up from one of the posts here which was trying to do a similar thing.

              To your point, there may be a better way to do it, but I haven't been able to figure it out.
              If you have any thoughts on how to improve it, I would be more than happy to learn.

              The solution I am trying to come up with is to basically take multiple rows and concatenate their column values to show as a single string separated by semi colon. In the process I want to avoid duplicates.

              Thanks,
              AD
              • 4. Re: Sys_connect_by_path
                Frank Kulash
                Hi,
                user604168 wrote:
                ... The solution I am trying to come up with is to basically take multiple rows and concatenate their column values to show as a single string separated by semi colon. In the process I want to avoid duplicates.
                A lot of people do that with just one CONNECT BY query; I don't know why you need two CONNECT BY queries, one nested inside the other. It depends on your data, the results you want from that data, and why you want those results (so that you'll get a solution that not only works on the sample data you post, but on any data you'll get in the future). That's why I asked you to post a little sample data (CREATE TABLE and INSERT statements), the results you want from that data, and an explanation of why you want those results from that data.
                If you have any thoughts on how to improve it, I would be more than happy to learn.
                What you posted may be a great solution for your problem.
                There may be another way, simpler and more efficient, that gets the results you want.
                As long as I don't know where you're starting from, and where you want to go, I can't give you very good directions.
                • 5. Re: Sys_connect_by_path
                  607171
                  Hi Frank,

                  Below are the details of create table / insert table
                   
                  create table RepTest1 
                  (ID NUMBER NOT NULL,
                  PRIMARY KEY (ID));
                  
                  create table RepTest2
                  (FID NUMBER NOT NULL,
                   ID NUMBER,
                   PRIMARY KEY (FID),
                   CONSTRAINT fk_ID FOREIGN KEY (ID) REFERENCES RepTest1(ID));
                  
                  create table RepTest3
                  (TID NUMBER NOT NULL,
                   PGRD VARCHAR2(3) NULL,
                   SPSU VARCHAR2(4) NULL,
                   FID NUMBER,
                   PRIMARY KEY (TID),
                   CONSTRAINT fk_fid FOREIGN KEY (FID) REFERENCES RepTest2(FID));
                  INSERT INTO RepTest1 values (8229);
                  INSERT INTO RepTest1 values (6614);
                  
                  INSERT INTO RepTest2 values (8249, 8229);
                  INSERT INTO RepTest2 values (6634, 6614);
                  
                  INSERT INTO RepTest3 values (8519, 'K', '3A', 8249);
                  INSERT INTO RepTest3 values (8520, 'K', '4A', 8249);
                  INSERT INTO RepTest3 values (8521, 'B', '4A', 8249);
                  INSERT INTO RepTest3 values (8522, 'B', '3A', 8249);
                  INSERT INTO RepTest3 values (8523, 'K', '2A', 8249);
                  INSERT INTO RepTest3 values (8525, 'K', '4A', 8249);
                  INSERT INTO RepTest3 values (8526, 'B', '4A', 8249);
                  INSERT INTO RepTest3 values (8977, 'K', '3A1', 8249);
                  INSERT INTO RepTest3 values (8417, 'K', '2A', 8249);
                  INSERT INTO RepTest3 values (8524, 'K', '3A', 8249);
                  INSERT INTO RepTest3 values (7623, 'A', '00', 6634);
                  INSERT INTO RepTest3 values (7624, '', '01', 6634);
                  INSERT INTO RepTest3 values (7625, 'A', '02', 6634); 
                  Based on the above data, I want to concatenate all the RepTest3 columns which have the same FID for each unique ID in RepTest1. Here I also want to eliminate any duplicate values while concatenating .

                  When I execute the below query:
                  SELECT ID, 
                           (trim(';' from (sys_connect_by_path(SPSU,';')))) SPSU,
                           (trim(';' from (sys_connect_by_path(PGRD,';')))) PGRD   
                       FROM (
                              SELECT ID, 
                                     CASE  
                                        WHEN SYS_CONNECT_BY_PATH(PRIOR SPSU,';') || ',' LIKE '%' || SPSU || '%' THEN NULL
                                        ELSE SPSU
                                     END SPSU,
                                     CASE  
                                        WHEN SYS_CONNECT_BY_PATH(PRIOR PGRD,';') || ',' LIKE '%' || PGRD || '%' THEN NULL
                                        ELSE PGRD
                                     END PGRD,
                                     rnum,
                                     tot
                              FROM (
                                    select rt1.ID AS ID , rt3.spsu AS spsu, rt3.pgrd AS pgrd,
                                          row_number() OVER (PARTITION BY rt1.id ORDER BY rt3.tid) rnum,
                                          COUNT(*) OVER (PARTITION BY rt1.id) tot
                                    FROM ADI.RepTest1 rt1 LEFT OUTER JOIN ADI.RepTest2 rt2 ON rt2.ID = rt1.Id 
                                    LEFT OUTER JOIN ADI.RepTest3 rt3 ON rt3.FID = rt2.FID
                                   )                
                              START WITH rnum=1 CONNECT BY PRIOR rnum = rnum-1 AND PRIOR ID = ID    
                              )     
                               WHERE rnum=tot START WITH rnum =1 CONNECT BY PRIOR rnum = rnum-1 AND PRIOR ID = ID;
                  The results are as follows:
                  _ID_        _SPSU_                      _PGRD_
                  6614            00;01;02                         A
                  8229            2A;3A;4A;;;;;;;3A1         K;;;B 
                  It returns the unique values removing the duplicates, but somehow introduces extra semicolons even when there is no null data.
                  When I use Regexp_replace on the final value, it removes the extra semicolons, but I am not sure why it is adding the extra semicolons in the first place.

                  Pls let me know if I should provide any further detail.


                  Thanks,
                  AD.
                  • 6. Re: Sys_connect_by_path
                    Frank Kulash
                    Hi,

                    I wouldn't use SYS_CONNECT_BY_PATH for this. I would use the user-defined aggregate fucntion STRAGG, which you can copy from
                    http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:2196162600402

                    It's about 60 lines of PL/SQL code, but once you've installed it, you can use it anywhere, on any table.
                    Look how simple your query is using STRAGG!
                    SELECT       r1.id
                    ,       REPLACE (STRAGG (DISTINCT r3.spsu), ',', ';')     AS spsu
                    ,       REPLACE (STRAGG (DISTINCT r3.pgrd), ',', ';')     AS pgrd
                    FROM              reptest1  r1
                    LEFT OUTER JOIN  reptest2  r2  ON  r1.id   = r2.id
                    LEFT OUTER JOIN      reptest3  r3  ON  r2.fid  = r3.fid
                    GROUP BY  r1.id
                    ORDER BY  r1.id
                    ;
                    Output:
                    `       ID SPSU                 PGRD
                    ---------- -------------------- --------------------
                          6614 00;01;02             A
                          8229 2A;3A;3A1;4A         B;K
                    Since STRAGG is set up once for ever, you don't get any choice about the delimiter. I installed it with ',' as the delimiter, but you could code it so ';' was the delimiter. As you can see, it's easy enough to use REPLACE if you don't want to use the standard delimiter.

                    If you wanted to use SYS_CONNECT_BY_PATH, I would unpivot the data so that there was one item (a single pgrd or spsu value) on a row, then use SYS_CONNECT_BY_PATH to get the delimited list.
                    WITH    cntr            AS
                    (
                         SELECT     'PGRD'     AS column_name     FROM dual     UNION ALL
                         SELECT     'SPSU'                  FROM dual
                    )
                    ,     unpivoted_data      AS
                    (
                         SELECT DISTINCT  r1.id
                         ,                 c.column_name
                         ,           CASE  c.column_name
                                          WHEN  'PGRD'  THEN  r3.pgrd
                                          WHEN  'SPSU'  THEN  r3.spsu
                                    END                       AS txt
                         ,           DENSE_RANK () OVER ( PARTITION BY  r1.id
                                                           ,              c.column_name
                                                   ORDER BY         CASE  c.column_name
                                                                                WHEN  'PGRD'  THEN  r3.pgrd
                                                                       WHEN  'SPSU'  THEN  r3.spsu
                                                            END
                                                 )        AS r_num
                         FROM     reptest1  r1
                         JOIN     reptest2  r2  ON   r1.id     = r2.id
                         JOIN     reptest3  r3  ON   r2.fid     = r3.fid
                         CROSS JOIN  cntr  c
                    )
                    ,     got_list_txt     AS
                    (
                         SELECT     id
                         ,     column_name
                         ,     TRIM (BOTH ';' FROM  SYS_CONNECT_BY_PATH (txt, ';'))     AS list_txt
                         FROM     unpivoted_data
                         WHERE     CONNECT_BY_ISLEAF     = 1
                         START WITH     r_num          = 1
                         CONNECT BY     r_num          = PRIOR r_num + 1
                              AND     id          = PRIOR id
                              AND     column_name     = PRIOR column_name
                    )
                    SELECT       rt1.id
                    ,       MIN (CASE WHEN column_name = 'SPSU' THEN lt.list_txt END)     AS spsu
                    ,       MIN (CASE WHEN column_name = 'PGRD' THEN lt.list_txt END)     AS pgrd
                    FROM                 reptest1      rt1
                    LEFT OUTER JOIN  got_list_txt  lt     ON  rt1.id  = lt.id
                    GROUP BY  rt1.id
                    ORDER BY  rt1.id
                    ;
                    This is more lines of code than what you had, but I think it will be faster because it is only doing one CONNECT BY query. This way works well if you need the output in a particular order.

                    Either way scales nicely if you need to add more delimited columns like spsu and pgrd.

                    The extra ';'s in your original query were caused because you weren't really excluding duplicates. Whether a value was a duplicate or not, it was causing an entry in the list, but code like this:
                    ...                CASE  
                                          WHEN SYS_CONNECT_BY_PATH(PRIOR SPSU,';') || ',' LIKE '%' || SPSU || '%' THEN NULL
                                          ELSE SPSU
                                       END SPSU,
                    was checking for duplicates, and adding NULL (plus a ';' delimiter) to the list when the item was a duplicate. This is pretty inefficient compared to built-in techniques like DISTINCT, expecially since this involves both SYS_CONNECT_BY_PATH and LIKE, which are fairly expensive SQL tools.

                    Good job on the sample data! It looks like you changed the values to short, easy-to-recognize strings, such as '2A' and 'K'. Thanks! That's very helpful, and makes it easier to be sure the answer is correct.
                    • 7. Re: Sys_connect_by_path
                      607171
                      Hi Frank,
                      Thanks for your response. I will try out the options you suggested to see how it works out for me.


                      -AD