1 2 Previous Next 21 Replies Latest reply on Jun 17, 2004 7:38 AM by 419360 Go to original post
      • 15. Re: Alternative to INNER JOIN
        419360
        I have a feeling that the approach is not right, and
        this explains the long and slow queries.

        The SQL solution for such cases is the INTERSECT
        operator. I illustrate its behaviour, which seems to
        match perfectly your INTERSECT values manipulation.

        Using:
        create table t1(
        id number,
        nm varchar2(50);
        
        create table t1(
        id number,
        nm varchar2(50);
        
        SQL> select * from t1 order by 1;
        
        ID NM
        ------ -------
        1 a
        2 b
        33
        44 m
        
        SQL> select * from t2 order by 1;
        
        ID NM
        ------ ------
        2 b
        3 c
        33
        44
        We have:
        select * from t1
        intersect
        select * from t2;
        
        ID NM
        ----- -------
        2 b
        33
        I think that you should try this approach too, and
        compare the resulted query and its performance with
        other solutions.
        I disagree with You, INTERSECT is not for this kind of problem, I have nthg to INTERSECT with. My main problem is that INNER JOIN produces to mutch rows, anyway my solution with PARTITION BY and limiting as much as possible before INNER JOIN works ok, now it takes about few seconds to complete for more than 64000 rows in ADRESY (48000 rows have duplicates in IDPOWIATU, TYPADRESU and so on) and 60000 rows from ADRESY in OBIEKTYTRANS.

        Thanks anyway
        Rafa³ Grzybowski
        • 16. Re: Alternative to INNER JOIN
          421128
          Sorry, I was wrong: the information that I had was quite confusing.

          But I want to spend a little more time with this, because this could help you in the future.

          When you say that a query containing in the SELECT clause a sub-query with "MIN (idadresu) OVER(PARTITION BY...)", like yours, produces the same result as your query with many ANDs, this mean that the table is not normalized (having columns which are dependent on another column which is not the full primary key).

          It seems to me that you have two entities in your ADRESY table.

          1. An Address entity, with (idadresu) as key and (idpowiatu, typadresu, nazwawlasna, idkraju, idwojews, idpowiatus, idgminys, idmiejsc, idulicy, nrdomu, nrlokalu, kodpocztowy, poczta) as attributes.

          2. A relationship entity (i.e. representing a M-to-N relation relationship with attributes), linking on "object" entity using the "idobiektu" FK with the Address entity, and containing link-specific attributes (status, datawer, etc).

          If my guess is correct, then you should normalize the table.

          You were right that this is NOT a place for INTERSECT, but I think that it is also NOT NOT a place to use analytical functions with PARTITION, because a GROUP BY with HAVING has the same effect as the 3 function calls, and is computed only once. But the GROUP BY is not necessary either.

          Your query seems to be equivalent with the following simpler and faster query (from which you can eliminate the DISTINCTs after normalizing the table):
          SELECT adra.idobiektu,
                 adra.status,
                 adrb.idobiektu AS znalezioneid
            FROM (SELECT DISTINCT 
                         idobiektu,
                         idadresu,
                         status
                    FROM adresy
                   WHERE status IN (2, 5)
                     AND idobiektu IN (SELECT idobiektu 
                                         FROM obiektytrans@lnkmatraakt)) adra 
           INNER JOIN
                   (SELECT DISTINCT
                         idobiektu,
                         idadresu
                    FROM adresy
                   WHERE status = 0) adrb 
            ON adra.idadresu = adrb.idadresu;
          • 17. Re: Alternative to INNER JOIN
            John Spencer
            Rafal:

            Actually, you do have something to intersect. The two queries you are joining together. Each represents a set of rows that you are interested in. Your first in-line view selects records from adresy where status is 2 or 5 and idobiektu appears in another table obiektytrans. Your second in-line view select all the rows in adresy that have a status of 0. The join between the two in-line views says find the rows where the joined fields are equal.

            The INTERSECT operator does exactly the same thing. It returns the rows where each of the columns in the two queries are identical. Basically, give me the rows that are common to both queries, which seems to be what you want to see.

            John
            • 18. Re: Alternative to INNER JOIN
              419360
              I'll try to explain my solution step by step.
              I think that I haven't clearly describe my
              problem, maybe my solutions will help me
              to clearify the problem. IDADRESU is PK for 
              ADRESY table. IDADRESU is non-unique FK 
              to another table. Let's assume that
              ADRESY table contains rows like this, 
              for simplicity we take only IDPOWIATU
              and TYPADRESU as common columns.
              
              IDOBIEKTU IDADRESU IDPOWIATU TYPADRESU STATUS
              ---------------------------------------------
              ---------------------------------------------
              1         1        1         1         1
              2         2        1         2         1
              3         3        1         2         2
              4         4        3         1         0
              5         5        3         1         1
              6         6        2         2         1
              7         7        2         2         5
              8         8        2         2         0
              9         9        5         1         0
              10        10       5         1         1
              6         11       5         1         2
              12        12       5         1         3
              7         13       5         1         4
              14        14       5         1         5
              
              
              If we divide this rows into groups having
              the same IDPOWIATU and TYPADRESU we will
              get sthg like this:
              
              IDOBIEKTU IDADRESU IDPOWIATU TYPADRESU STATUS
              ---------------------------------------------
              ---------------------------------------------
              1         1        1         1         1
              ---------------------------------------------
              2         2        1         2         1
              3         3        1         2         2
              ---------------------------------------------
              4         4        3         1         0
              5         5        3         1         1
              ---------------------------------------------
              6         6        2         2         1
              7         7        2         2         5
              8         8        2         2         0
              ---------------------------------------------
              9         9        5         1         0
              10        10       5         1         1
              6         11       5         1         2
              12        12       5         1         3
              7         13       5         1         4
              14        14       5         1         5
              
              Now, we are interested only in groups
              that have at least one row with STATUS = 0
              and at least one row with STATUS IN (2,5).
              To do this I used analytical functions
              that counts for each group number of such
              rows respectively status0 and status25 with
              this query:
              
              SELECT 
                idobiektu, 
                status,
                MIN (idadresu) OVER (PARTITION BY idpowiatu, typadresu, nazwawlasna, idkraju, idwojews, idpowiatus, idgminys, idmiejsc, idulicy, nrdomu, nrlokalu, kodpocztowy, poczta) klasa,
                SUM (DECODE (status, 2, 1, 5, 1, 0)) OVER (PARTITION BY idpowiatu, typadresu, nazwawlasna, idkraju, idwojews, idpowiatus, idgminys, idmiejsc, idulicy, nrdomu, nrlokalu, kodpocztowy, poczta) status25,
                SUM (DECODE (status, 0, 1, 0)) OVER (PARTITION BY idpowiatu, typadresu, nazwawlasna, idkraju, idwojews, idpowiatus, idgminys, idmiejsc, idulicy, nrdomu, nrlokalu, kodpocztowy, poczta) status0
              FROM adresy;
              
              MIN(IDADRESU) ... KLASA is only for replacing expensive IDPOWIATU, TYPADRESU columns and
              help me distinguish between diffrent groups. IDADRESU is PK so MIN(IDADRESU) is also unique.
              Maybe there is another wayt to number this groups, I did this way.
              
              This query returns:
              
              
              IDOBIEKTU STATUS KLASA STATUS25 STATUS0
              ---------------------------------------
              ---------------------------------------
              1         1      1     0        0 
              ---------------------------------------
              2         1      2     1        0 
              3         2      2     1        0
              ---------------------------------------
              4         0      4     0        1
              5         1      4     0        1
              ---------------------------------------
              6         1      6     1        1 
              7         5      6     1        1
              8         0      6     1        1
              ---------------------------------------
              9         0      9     2        1
              10        1      9     2        1
              6         2      9     2        1
              12        3      9     2        1
              7         4      9     2        1
              14        5      9     2        1
              
              As You can see all groups with status0 <> 0 and status25 <> 0 are in my in interest.
              Also only rows with status IN (0, 2, 5) are OK
              So here comes query:
              
              SELECT 
                idobiektu, 
                status, 
                klasa
              FROM (SELECT 
                      idobiektu, 
                      status,
                      MIN (idadresu) OVER (PARTITION BY idpowiatu, typadresu, nazwawlasna, idkraju, idwojews, idpowiatus, idgminys, idmiejsc, idulicy, nrdomu, nrlokalu, kodpocztowy, poczta) klasa,
                      SUM (DECODE (status, 2, 1, 5, 1, 0)) OVER (PARTITION BY idpowiatu, typadresu, nazwawlasna, idkraju, idwojews, idpowiatus, idgminys, idmiejsc, idulicy, nrdomu, nrlokalu, kodpocztowy, poczta) status25,
                      SUM (DECODE (status, 0, 1, 0)) OVER (PARTITION BY idpowiatu, typadresu, nazwawlasna, idkraju, idwojews, idpowiatus, idgminys, idmiejsc, idulicy, nrdomu, nrlokalu, kodpocztowy, poczta) status0
                    FROM 
                      adresy
                   )
              WHERE 
                status0 <> 0 AND 
                status25 <> 0 AND
                status IN (0, 2, 5)
              
              It gives:
              
              IDOBIEKTU STATUS KLASA
              ----------------------
              ----------------------
              7         5      6
              8         0      6
              ----------------------
              9         0      9
              6         2      9
              14        5      9
              
              After this I have simplified KLASA to test common columns
              and now I can INNER JOIN within each group. The result is:
              
              IDOBIEKTU STATUS KLASA IDOBIEKTU AS ZNALEZIONEID
              ----------------------
              ----------------------
              7         5      6     8
              ----------------------
              6         2      9     9
              14        5      9     9
              
              
              The reason why I cannot do this another way I think is that there is a lot of rows 
              with status IN (0), and a lot of rows with status IN (2,5) so querying for this 
              conditions and then INNER JOIN involves a great cartesian product - bad solution.
              I MUST minimize as much as possible before INNER JOIN
              
              The final solution looks like:
              
              SELECT
                adra.idobiektu,
                adra.status,
                adrb.idobiektu AS znalezioneid
              FROM
              (
                SELECT
                  idobiektu,
                  status,
                  klasa
                FROM
                  (
                    SELECT idobiektu, status,
                               MIN (idadresu) OVER (PARTITION BY idpowiatu, typadresu, nazwawlasna, idkraju, idwojews, idpowiatus, idgminys, idmiejsc, idulicy, nrdomu, nrlokalu, kodpocztowy, poczta) klasa,
                               SUM (DECODE (status, 2, 1, 5, 1, 0)) OVER (PARTITION BY idpowiatu, typadresu, nazwawlasna, idkraju, idwojews, idpowiatus, idgminys, idmiejsc, idulicy, nrdomu, nrlokalu, kodpocztowy, poczta) status25,
                               SUM (DECODE (status, 0, 1, 0)) OVER (PARTITION BY idpowiatu, typadresu, nazwawlasna, idkraju, idwojews, idpowiatus, idgminys, idmiejsc, idulicy, nrdomu, nrlokalu, kodpocztowy, poczta) status0
                    FROM adresy
                  )
                WHERE
                  status0 <> 0 AND 
                  status25 <> 0 AND
                  status IN (2, 5) AND
                  idobiektu IN (SELECT idobiektu FROM obiektytrans@lnkmatraakt)
              ) adra
              INNER JOIN
              (
                SELECT 
                  idobiektu, 
                  klasa
                FROM (SELECT idobiektu, status,
                               MIN (idadresu) OVER (PARTITION BY idpowiatu, typadresu, nazwawlasna, idkraju, idwojews, idpowiatus, idgminys, idmiejsc, idulicy, nrdomu, nrlokalu, kodpocztowy, poczta)
                                                                                      klasa,
                               SUM (DECODE (status, 2, 1, 5, 1, 0)) OVER (PARTITION BY idpowiatu, typadresu, nazwawlasna, idkraju, idwojews, idpowiatus, idgminys, idmiejsc, idulicy, nrdomu, nrlokalu, kodpocztowy, poczta)
                                                                                   status25,
                               SUM (DECODE (status, 0, 1, 0)) OVER (PARTITION BY idpowiatu, typadresu, nazwawlasna, idkraju, idwojews, idpowiatus, idgminys, idmiejsc, idulicy, nrdomu, nrlokalu, kodpocztowy, poczta)
                                                                                    status0
                          FROM adresy)
                 WHERE 
                   status0 <> 0 AND 
                   status25 <> 0 AND 
                   status = 0
              ) adrb
              USING (klasa);
              • 19. Re: Alternative to INNER JOIN
                421128
                Thanks, now it is much clearer. My speech about "normalization" was wrong, but the table seems still to contain more than one logical entity.

                You were right, a certain way of using analytical functions seems to be the only elegant/efficient solution in this case.

                Understanding this, I would write your query with minor improvements as:
                WITH AdrGrp 
                  AS (SELECT idobiektu,
                            status, 
                             klasa
                        FROM (SELECT idobiektu,
                                     status, 
                                     MIN(idadresu) OVER
                                         (PARTITION BY dpowiatu, typadresu, nazwawlasna, idkraju, idwojews,
                                                    idpowiatus, idgminys, idmiejsc, idulicy, nrdomu, 
                                                    nrlokalu, kodpocztowy, poczta) AS klasa,
                                     SUM (DECODE (status, 2, 1, 5, 1, 0)) OVER
                                         (PARTITION BY dpowiatu, typadresu, nazwawlasna, idkraju, idwojews,
                                                    idpowiatus, idgminys, idmiejsc, idulicy, nrdomu, 
                                                    nrlokalu, kodpocztowy, poczta) AS status25,
                                      SUM (DECODE (status, 0, 1, 0)) OVER 
                                         (PARTITION BY dpowiatu, typadresu, nazwawlasna, idkraju, idwojews,
                                                    idpowiatus, idgminys, idmiejsc, idulicy, nrdomu, 
                                                    nrlokalu, kodpocztowy, poczta) AS status0
                                FROM adresy
                               WHERE status IN (0, 2, 5))
                       WHERE status25 > 0
                         AND status0 > 0) 
                SELECT adra.idobiektu,
                       adra.status,
                       adrb.idobiektu AS znalezioneid
                  FROM (SELECT idobiektu,
                               klasa
                          FROM AdrGrp
                         WHERE status IN (2, 5)
                           AND idobiektu IN (SELECT idobiektu FROM obiektytrans@lnkmatraakt)) adra,
                       (SELECT idobiektu,
                               klasa
                          FROM AdrGrp
                         WHERE status = 0) adrb
                 WHERE adra.klasa = adrb.klasa
                • 20. Re: Alternative to INNER JOIN
                  419360
                  Thanks, I forgot about status IN (0,2,5) in the most internal queries with PARTITION BY.
                  
                  What's the difference between WITH query ...
                  and having two the same subqueries ? Does Oracle 
                  execute the same queries twice ??? Or maybe it's just about cleaner solution and easier maintaince and modifiactions to this duplicated subquery (no need to modify twice, less errors) ?
                  
                  Thanks All of You for beeing helpfull
                  
                  Best Regards
                     Rafa³ Grzybowski
                  • 21. Re: Alternative to INNER JOIN
                    419360
                    Thanks, I forgot about status IN (0,2,5) in the most
                    internal queries with PARTITION BY.
                    
                    What's the difference between WITH query ...
                    and having two the same subqueries ? Does Oracle 
                    execute the same queries twice ??? Or maybe it's just
                    about cleaner solution and easier maintaince and
                    modifiactions to this duplicated subquery (no need to
                    modify twice, less errors) ?
                    
                    Thanks All of You for beeing helpfull
                    
                    Best Regards
                    Rafa³ Grzybowski
                    Ok, I've read about WITH and find it very helpfull.
                    I've also tested and Your solution is faster than mine,
                    thanks.

                    Best regards
                    Rafa³ Grzybowski
                    1 2 Previous Next