1 Reply Latest reply: Jul 24, 2013 5:27 PM by Frank Kulash RSS

    Does Outer join works and why so fast?

    Thomas Morgan

      I have a table called Subscription_Features with at least following columns:

       

      USER_ID           NUMBER

      FEATURE_ID    NUMBER

       

      In this table I can represent sets of users that have a given feature associated with them.  I need to findout the set of users that have feature X but do not have feature Y.

       

      One way to do this is to use following query:

       

          SELECT sf1.User_Id

          FROM   Subscription_Features sf1

          WHERE  sf1.Feature_Id       = X

          AND        sf1.User_Id        not in (select sf2.user_id from  Subscription_Features sf2 where sf2.Feature_id  = Y);

       

      However, when I have two features that each have around 650,000 records, this query takes a long time.

       

      So, I found that following appears to work and it only takes a couple of seconds:

       

          SELECT sf1.User_Id

          FROM   Subscription_Features sf1  LEFT JOIN Subscription_Features sf2 ON sf1.User_Id = sf2.User_Id AND sf2.Feature_id  = Y

          WHERE  sf1.Feature_Id       = X

          AND    sf2.Subscription_Id  is null;

       

      Two questions:

       

        1) Is the second query correct.

       

        2) If answer to 1 is Yes, does anybody have an idea why it works so fast?

       

      Thanks,

       

      Thomas

        • 1. Re: Does Outer join works and why so fast?
          Frank Kulash

          Hi, Thomas,

           

          Both queries you posted will give you the correct user_ids.

          If the combination (user_id, feature_id) is not unique, then the outer join may give you more duplicate rows.  I doubt that's a problem for you, since the first approach may produce duplicates also, just not so many of them.

           

          If you're really interested in why the outer join is faster, generate and post the epxlain plan results, as mentioned in the forum FAQ: https://forums.oracle.com/message/9362003

          I suspect that user_id is indexed, and is failry selective; that is, out of, say, 3,000,000 rows in the table, you might have 1,000,000 distinct values of user_id.  If that's the case, then every time you find a row with an X, you only have to join an average of 3 other rows.  If you do an IN sub-query, You're saying it will produce a list of about 650,000 user_ids, and that will be unindexed, so searching for a given id in that list will be much slower than 3 indexed fetches.