8 Replies Latest reply on Oct 10, 2019 6:49 AM by mathguy

    Find missing data from composite table

    drichard0158

      I have a table that contains all possible data requirements for an object (Table A). Based on the object classification, only certain attributes are required (Table B). What I would like to do is select the object name and those required attributes that are not populated. PL/SQL is not my forte, but if that's the only way I will give it a try. Here's an example:

       

      TABLE A

      OBJECTNAMEOBJECTCLASSATTRIBUTE1ATTRIBUTE2ATTRIBUTE3ATTRIBUTE4ATTRIBUTE5
      ABCDEFCLASS11234567891109
      GHIJKCLASS21234567891011
      LMNOPCLASS112345678

       

      TABLE B

      CLASSIFICATIONATTRIBUTE
      CLASS1ATTRIBUTE1
      CLASS1ATTRIBUTE2
      CLASS1ATTRIBUTE3
      CLASS1ATTRIBUTE4
      CLASS2ATTRIBUTE1
      CLASS2ATTRIBUTE2
      CLASS2ATTRIBUTE3
      CLASS2ATTRIBUTE4
      CLASS2ATTRIBUTE5

       

      Results would be:

      OBJECTNAMEMISSINGATTRIBUTE
      ABCDEFATTRIBUTE3
      GHIJKATTRIBUTE4
      GHIJKATTRIBUTE5
      LMNOPATTRIBUTE1
      LMNOP
      ATTRIBUTE4

       

        • 1. Re: Find missing data from composite table
          Frank Kulash

          Hi,

          drichard0158 wrote:

           

          I have a table that contains all possible data requirements for an object (Table A). Based on the object classification, only certain attributes are required (Table B). What I would like to do is select the object name and those required attributes that are not populated. PL/SQL is not my forte, but if that's the only way I will give it a try. Here's an example:

           

          TABLE A

          OBJECTNAME
          OBJECTCLASS
          ATTRIBUTE1
          ATTRIBUTE2
          ATTRIBUTE3
          ATTRIBUTE4
          ATTRIBUTE5
          ABCDEFCLASS11234567891109
          GHIJKCLASS21234567891011
          LMNOPCLASS112345678

           

          TABLE B

          CLASSIFICATION
          ATTRIBUTE
          CLASS1ATTRIBUTE1
          CLASS1ATTRIBUTE2
          CLASS1ATTRIBUTE3
          CLASS1ATTRIBUTE4
          CLASS2ATTRIBUTE1
          CLASS2ATTRIBUTE2
          CLASS2ATTRIBUTE3
          CLASS2ATTRIBUTE4
          CLASS2ATTRIBUTE5

           

          Results would be:

          OBJECTNAME
          MISSINGATTRIBUTE
          ABCDEFATTRIBUTE3
          GHIJKATTRIBUTE4
          GHIJKATTRIBUTE5
          LMNOPATTRIBUTE1
          LMNOP
          ATTRIBUTE4

           

          Whenever you have a question, please post a little sample data (CREATE TABLE and INSERT statements, relevant columns only) for all the tables involved, and the exact results you want from that data, so that the people who want to help you can re-create the problem and test their ideas.

          Explain, using specific examples, how you get those results from that data.

          Always say what version of Oracle you're using (e.g. 12.2.0.2.0).

          See the forum FAQ: Re: 2. How do I ask a question on the forums?

           

          You don't need no stinking' PL/SQL for this.

           

          You can find the missing items by doing either an outer join or a MINUS operation.  (I suspect MINUS will be simpler in this problem.)  Unpivot table A to make a "table" with one row per attribute

          • 2. Re: Find missing data from composite table
            Frank Kulash

            Hi,

             

            Here's how you can use MINUS:

            SELECT    a.objectname

            ,         b.attribute

            FROM      a

            JOIN      b  ON  b.classification  = a.objectclass

                MINUS

            SELECT    objectname

            ,         attribute

            FROM      a

            UNPIVOT   (    val

                      FOR  attribute IN (attribute1, attribute2, attribute3, attribute4, attribute5)

                      )

                --

            ORDER BY  objectname, attribute  -- or whatever you want

            ;

            Of course, I can't test it without the tables.

            [EDIT: Now that Solomon has posted some usable sample data (in reply #3, below) I can test it.  The query above works fine.]

             

            If a.objectname is not unique, then the query above will have to be modified to include a unique key.

            • 3. Re: Find missing data from composite table
              Solomon Yakobson

              Next time please provide CREATE TABLE & INSTERT data statements.

               

              SQL> WITH A(OBJECTNAME,OBJECTCLASS,ATTRIBUTE1,ATTRIBUTE2,ATTRIBUTE3,ATTRIBUTE4,ATTRIBUTE5)

                2    AS (

                3        SELECT 'ABCDEF','CLASS1','1234','5678',null,'91109',null from dual union all

                4        SELECT 'GHIJK','CLASS2','1234','5678','91011',null,null from dual union all

                5        SELECT 'LMNOP','CLASS1',null,'1234','5678',null,null from dual

                6       ),

                7       B(CLASSIFICATION,ATTRIBUTE)

                8    AS (

                9        SELECT 'CLASS1','ATTRIBUTE1' from dual union all

              10        SELECT 'CLASS1','ATTRIBUTE2' from dual union all

              11        SELECT 'CLASS1','ATTRIBUTE3' from dual union all

              12        SELECT 'CLASS1','ATTRIBUTE4' from dual union all

              13        SELECT 'CLASS2','ATTRIBUTE1' from dual union all

              14        SELECT 'CLASS2','ATTRIBUTE2' from dual union all

              15        SELECT 'CLASS2','ATTRIBUTE3' from dual union all

              16        SELECT 'CLASS2','ATTRIBUTE4' from dual union all

              17        SELECT 'CLASS2','ATTRIBUTE5' from dual

              18       )

              19  SELECT  A.OBJECTNAME,

              20          B.ATTRIBUTE MISSINGATTRIBUTE

              21    FROM  A,

              22          B

              23    WHERE B.CLASSIFICATION = A.OBJECTCLASS

              24      AND (

              25              (

              26                   B.ATTRIBUTE = 'ATTRIBUTE1'

              27               AND

              28                   A.ATTRIBUTE1 IS NULL

              29              )

              30           OR

              31              (

              32                   B.ATTRIBUTE = 'ATTRIBUTE2'

              33               AND

              34                   A.ATTRIBUTE2 IS NULL

              35              )

              36           OR

              37              (

              38                   B.ATTRIBUTE = 'ATTRIBUTE3'

              39               AND

              40                   A.ATTRIBUTE3 IS NULL

              41              )

              42           OR

              43              (

              44                   B.ATTRIBUTE = 'ATTRIBUTE4'

              45               AND

              46                   A.ATTRIBUTE4 IS NULL

              47              )

              48           OR

              49              (

              50                   B.ATTRIBUTE = 'ATTRIBUTE5'

              51               AND

              52                   A.ATTRIBUTE5 IS NULL

              53              )

              54          )

              55    ORDER BY A.OBJECTNAME,

              56             B.ATTRIBUTE

              57  /

               

              OBJECT MISSINGATT

              ------ ----------

              ABCDEF ATTRIBUTE3

              GHIJK  ATTRIBUTE4

              GHIJK  ATTRIBUTE5

              LMNOP  ATTRIBUTE1

              LMNOP  ATTRIBUTE4

               

              SQL>

               

              SY.

              • 4. Re: Find missing data from composite table
                Stew Ashton

                [UPDATE: as Frank pointed out, I misunderstood the logic in his reply 2. I am striking out the mistaken text and replacing it.]

                 

                I like Frank's UNPIVOT idea, but since you want the data that has NULL values you have to say INCLUDE NULLS. I like Frank's UNPIVOT idea, but I'm going to use it the other way around. I'll use UNPIVOT INCLUDE NULLS on table A, then keep only the rows with NULL attributes that should be there according to table B.

                 

                I'm borrowing Solomon's WITH clauses with thanks!

                Please note that the solution itself starts on line 19.

                 

                WITH A(OBJECTNAME,OBJECTCLASS,ATTRIBUTE1,ATTRIBUTE2,ATTRIBUTE3,ATTRIBUTE4,ATTRIBUTE5)
                   AS (
                       SELECT 'ABCDEF','CLASS1','1234','5678',null,'91109',null from dual union all
                       SELECT 'GHIJK','CLASS2','1234','5678','91011',null,null from dual union all
                       SELECT 'LMNOP','CLASS1',null,'1234','5678',null,null from dual
                      ),
                      B(CLASSIFICATION,ATTRIBUTE)
                   AS (
                       SELECT 'CLASS1','ATTRIBUTE1' from dual union all
                      SELECT 'CLASS1','ATTRIBUTE2' from dual union all
                      SELECT 'CLASS1','ATTRIBUTE3' from dual union all
                      SELECT 'CLASS1','ATTRIBUTE4' from dual union all
                      SELECT 'CLASS2','ATTRIBUTE1' from dual union all
                      SELECT 'CLASS2','ATTRIBUTE2' from dual union all
                      SELECT 'CLASS2','ATTRIBUTE3' from dual union all
                      SELECT 'CLASS2','ATTRIBUTE4' from dual union all
                      SELECT 'CLASS2','ATTRIBUTE5' from dual
                     )
                select OBJECTNAME, OBJECTCLASS, ATTR
                from a
                unpivot include nulls (
                  attribute for attr in(ATTRIBUTE1, ATTRIBUTE2, ATTRIBUTE3, ATTRIBUTE4, ATTRIBUTE5)
                )
                where attribute is null
                and (OBJECTCLASS, ATTR) in (select * from b)
                order by 1,3;
                
                OBJECT OBJECT ATTR      
                ------ ------ ----------
                ABCDEF CLASS1 ATTRIBUTE3
                GHIJK  CLASS2 ATTRIBUTE4
                GHIJK  CLASS2 ATTRIBUTE5
                LMNOP  CLASS1 ATTRIBUTE1
                LMNOP  CLASS1 ATTRIBUTE4
                

                 

                Best regards,

                Stew Ashton

                • 5. Re: Find missing data from composite table
                  Frank Kulash

                  Hi, Stew,

                   

                  Thanks for your kind words!

                  Stew Ashton wrote:

                   

                  I like Frank's UNPIVOT idea, but since you want the data that has NULL values you have to say INCLUDE NULLS. ...

                  I think you meant to say "In the query below you have to say INCLUDE NULLS".  I don't believe you meant there was an error in reply #2.

                   

                  The whole point of MINUS is that we want the unpivoted data to include only the columns that have values, and not NULL.

                  The query in reply #2 works as posted; saying "INCLUDE NULLS" in that query would cause it to produce 0 rows.

                  • 6. Re: Find missing data from composite table
                    Frank Kulash

                    Hi,

                     

                    In case you're wondering how an outer join would work:

                    WITH    unpivoted_a    AS

                    (

                        SELECT  objectname, attribute

                        FROM    a

                        UNPIVOT   (    val

                                  FOR  attribute IN (attribute1, attribute2, attribute3, attribute4, attribute5)

                                  )

                    )

                    SELECT    a.objectname

                    ,         b.attribute

                    FROM      a 

                    JOIN      b               ON   b.classification  = a.objectclass

                    LEFT JOIN unpivoted_a  u  ON   u.objectname  = a.objectname

                                              AND  u.attribute   = b.attribute

                    WHERE     u.attribute  IS NULL

                    ORDER BY  a.objectname, b.attribute  -- or whatever you want

                    ;

                    I'm not saying this is the best solution.  In fact, it might be the worst solution posted so far.

                     

                    By the way, UNPIVOT assumes that all 5 attribute columns have the same data type (or very similar types, e.g. DATEs and TIMESTAMPs).

                    • 7. Re: Find missing data from composite table
                      Stew Ashton

                      Hi Frank,

                       

                      Absolutely right. I was in a rush and mixed up your idea with mine. You were getting everything required, minus what was not missing. I was getting what was missing and that was also required.

                       

                      Sorry about that...

                       

                      Regards, Stew

                      • 8. Re: Find missing data from composite table
                        mathguy

                        The UNPIVOT solutions must be modified in the (common) case that the attribute columns in the real-life equivalent of Table A may be of different data types. (Obviously, it is not possible to unpivot columns of different data types.) This can be fixed relatively easily in this problem - for example by pre-processing the attribute columns, wrapping each one within NVL2(col_name, 0, null) or similar. By comparison, Solomon's join solution does not suffer from this defect.

                         

                        Then: I built a table A with 600,000 rows, and kept table B as is, to do some testing. I can share the table creation code if anyone is interested. I created table A so that about 1% of rows have missing attributes (the exact number turned out to be 5,811 rows in the output; an input OBJECTNAME may miss more than one attribute, so probably the fraction of objects with missing attributes is closer to 0.7% in my example).

                         

                        It turns out that Mr. Kulash's version of UNPIVOT approach is far and away the worst - it takes about 23 seconds to complete on my test data. By comparison, Solomon's solution takes 0.41 seconds (50 times faster), and your (Stew's) UNPIVOT solution only takes 0.16 seconds - 2.5 times faster than Solomon's and two orders of magnitude faster than the Correct Answer.

                         

                        My first instinct was just the opposite of yours. The data in Table A is already "grouped" into rows. Instead of unpivoting it, I would rather pivot table B, to speed up the join in Solomon's solution. Only at the end, I still need to unpivot the results, to get the answer in the format requested by the OP (although I don't know why that format is needed). If that happens on a relatively small number of final rows, compared to the inputs in table A, this may prove to be faster. I have a purely speculative and subjective suspicion that reducing the number of rows in computations, even at the cost of a comparable increase in the complexity of processing each row, will often result in better execution (even if not by a large margin). I am not sure how relevant this is, but in my test case, the solution I was able to come up with runs in 0.11 seconds; a reduction of about 30% in execution time, not much to get excited about, but it does seem to support my suspicion. Still to be tested: what happens if there are more classes, and/or if there are more attributes (say, 30 possible attributes, not just five).

                         

                        The query looks like this:

                         

                        with

                          bb (classification, a1, a2, a3, a4, a5) as (

                            select classification, a1, a2, a3, a4, a5

                            from   b

                            pivot  (max(0) for attribute in ('ATTRIBUTE1' as a1, 'ATTRIBUTE2' as a2, 'ATTRIBUTE3' as a3, 'ATTRIBUTE4' as a4, 'ATTRIBUTE5' as a5))

                          )

                        , j (objectname, a1, a2, a3, a4, a5) as (

                            select a.objectname, case when a.attribute1 is null and bb.a1 is not null then 1 end,  -- values 1, 2, ...  used for ordering below

                                                 case when a.attribute2 is null and bb.a2 is not null then 2 end,

                                                 case when a.attribute3 is null and bb.a3 is not null then 3 end,

                                                 case when a.attribute4 is null and bb.a4 is not null then 4 end,

                                                 case when a.attribute5 is null and bb.a5 is not null then 5 end

                            from   a inner join bb

                                     on  a.objectclass = bb.classification

                                     and ( a.attribute1 is null and bb.a1 is not null or

                                           a.attribute2 is null and bb.a2 is not null or

                                           a.attribute3 is null and bb.a3 is not null or

                                           a.attribute4 is null and bb.a4 is not null or

                                           a.attribute5 is null and bb.a5 is not null    )

                          )

                        select  objectname, col as missing_attribute

                        from    j

                        unpivot (val for col in (a1 as 'ATTRIBUTE1', a2 as 'ATTRIBUTE2', a3 as 'ATTRIBUTE3', a4 as 'ATTRIBUTE4', a5 as 'ATTRIBUTE5'))

                        order   by objectname, val                           -- correct ordering is by val or similar; 'ATTRIBUTE11' < 'ATTRIBUTE2' !

                        ;