Skip navigation

Bert Scalzo's Blog

June 2016 Previous month Next month

In Part 1 of this topic I proposed that while SQL is not an overly complex language in theory the SELECT command syntax permits numerous equivalent ways to write queries. Therefore the SQL language really is not as simple as we generally believe.

 

Here’s the simple query I utilized as the starting point for discussion:

 

select  a.LASTNAME, a.FIRSTNAME, b.RENTALDATE,

           d.COPYFORMAT, sum(b.TOTALCHARGE) REVENUE

from employee a, movierental b, rentalitem c, moviecopy d

where a.EMPLOYEEID = b.EMPLOYEEID

   and b.RENTALID = c.RENTALID

   and c.MOVIECOPYID = d.MOVIECOPYID

   and a.LASTNAME = 'Smith'

   and b.CUSTOMERID IN (

              select customerid

              from customer e

              where e.address like 'A%' or

                        e.address like 'F%' or

                        e.address like 'T%' or

                        e.address like 'Z%'

              )

group by a.LASTNAME, a.FIRSTNAME, b.RENTALDATE, d.COPYFORMAT

 

In Part 1 I showed four additional distinct yet equivalent ways to code the above culminating in the ideal solution (i.e. using all the latest and greatest syntax features) shown here:

 

with f as ( select customerid

                from customer e

                 where e.address like 'A%' or

                          e.address like 'F%' or

                          e.address like 'T%' or

                          e.address like 'Z%'

             )

select  a.LASTNAME, a.FIRSTNAME, b.RENTALDATE,

           d.COPYFORMAT, sum(b.TOTALCHARGE) REVENUE

from employee a JOIN movierental b using(EMPLOYEEID)

                          JOIN rentalitem c using (RENTALID)

                          JOIN moviecopy d using(MOVIECOPYID),

                          f     

where a.LASTNAME = 'Smith'

   and b.CUSTOMERID = f.CUSTOMERID             

group by a.LASTNAME, a.FIRSTNAME, b.RENTALDATE, d.COPYFORMAT

 

The key point is that the SELECT syntax supports multiple ways to accomplish the same results. Add in that SQL is a set oriented language and that various Oracle versions’ optimizers work differently, and the true number of combinations is even greater. Thus the first way one codes a SELECT might capture the correct solution concept (effectiveness) but might require additional thought for what will yield the fastest speed (efficiency).

 

For example using Oracle 11g R2 with current statistics gathered on both the tables and indexes the explain for query styles #1, #2, #4 and #5 is:

 

img5.png

 

Whereas the explain for query style #3 is almost the same – with just the addition of an extra sort unique added by the EXISTS clause (and highlighted by a red arrow):

 

img6.png

 

For the test data loaded the run times are identical. So even though query style #3 has an extra step, all five coding styles essentially result in the exact same performance in this case. But that might not always be true. It may well happen that the coding style best leveraging newer features could result in a more efficient explain plan and better run time. You’ll need to verify the results for yourself. As President Reagan used to say “Trust but verify”, in this case meaning trust the Oracle optimizer to do its job but verify that we provide it the best query code to optimize.

 

Of course when you don’t have current statistics collected, all five query styles result in a highly inefficient explain plan as shown below (note all the red highlighted FULL TABLE SCANS). So always make sure you have current statistics gathered for tables and indexes.

 

img7.png

SQL is not an overly complex language in theory. The data manipulation language or DML consist of four primary commands: INSERT, UPDATE, DELETE and SELECT. Newer versions of Oracle also offer the MERGE command, which is an “upsert” – meaning it tries an update followed by an insert as a single command. Since the SELECT command is the one we use most since queries are the majority of database operations, let’s examine statement complexity in real life to see if the SQL language really is simple.

 

Let’s use the following data model for a movie rental system as the basis for our SQL construction examples that follow.

 

img3.png

 

What we want is a basic report that yields the rental date, total revenue per movie format made by employees with last name of  “Smith” for customers whose address starts with the letter A, F, T or Z. The results would match the following.

 

img4.png

 

So here’s the way a typical person might write the query:

 

select  a.LASTNAME, a.FIRSTNAME, b.RENTALDATE,

           d.COPYFORMAT, sum(b.TOTALCHARGE) REVENUE

from employee a, movierental b, rentalitem c, moviecopy d

where a.EMPLOYEEID = b.EMPLOYEEID

   and b.RENTALID = c.RENTALID

   and c.MOVIECOPYID = d.MOVIECOPYID

   and a.LASTNAME = 'Smith'

   and b.CUSTOMERID IN (

              select customerid

              from customer e

              where e.address like 'A%' or

                        e.address like 'F%' or

                        e.address like 'T%' or

                        e.address like 'Z%'

              )

group by a.LASTNAME, a.FIRSTNAME, b.RENTALDATE, d.COPYFORMAT

 

What are all of the “main stream” or normally expectably ways that one can rewrite this query and yet retain the exact same results? A tool like Quest’s SQL Optimizer might return hundreds or even thousands of rewrites in an effort to find the most efficient code. But what are the primary rewrites a human might be expected to write?

First, we know that using non-ANSI syntax for JOIN’s is to generally be avoided. So we can change the SELECT as follows.

 

select  a.LASTNAME, a.FIRSTNAME, b.RENTALDATE,

           d.COPYFORMAT, sum(b.TOTALCHARGE) REVENUE

from employee a JOIN movierental b using(EMPLOYEEID)

                          JOIN rentalitem c using (RENTALID)

                          JOIN moviecopy d using(MOVIECOPYID)

where a.LASTNAME = 'Smith'

   and b.CUSTOMERID IN (

              select customerid

              from customer e

              where e.address like 'A%' or

                        e.address like 'F%' or

                        e.address like 'T%' or

                        e.address like 'Z%'

              )

group by a.LASTNAME, a.FIRSTNAME, b.RENTALDATE, d.COPYFORMAT

 

That makes the WHERE clause much easier to read since the JOIN code is now in the FROM clause with the tables. This exposes that we have an IN clause – which sometimes might not be the best way to code it. So we’ll next switch to using an EXISTS clause. Yes – some newer versions of the Oracle optimizer might automatically make this switch internally for us. But when in doubt, it’s always better to code it the proper or most efficient way for all cases. So now we can change the SELECT as follows.

 

select  a.LASTNAME, a.FIRSTNAME, b.RENTALDATE,

                                 d.COPYFORMAT, sum(b.TOTALCHARGE) REVENUE

from employee a JOIN movierental b using(EMPLOYEEID)

                          JOIN rentalitem c using (RENTALID)

                          JOIN moviecopy d using(MOVIECOPYID)

where a.LASTNAME = 'Smith'

   and EXISTS (

              select 1

              from customer e

              where e.CUSTOMERID = b.CUSTOMERID

                and (e.address like 'A%' or

                        e.address like 'F%' or

                        e.address like 'T%' or

                        e.address like 'Z%'

                      ) 

              )

group by a.LASTNAME, a.FIRSTNAME, b.RENTALDATE, d.COPYFORMAT

 

Looking better – but we could also code that correlated sub-select as an “inline view”. So yet once again we can change the SELECT as follows.

 

select  a.LASTNAME, a.FIRSTNAME, b.RENTALDATE,

           d.COPYFORMAT, sum(b.TOTALCHARGE) REVENUE

from employee a JOIN movierental b using(EMPLOYEEID)

                          JOIN rentalitem c using (RENTALID)

                          JOIN moviecopy d using(MOVIECOPYID),

        ( select customerid

              from customer e

              where e.address like 'A%' or

                        e.address like 'F%' or

                        e.address like 'T%' or

                        e.address like 'Z%'

        ) e

where a.LASTNAME = 'Smith'

   and b.CUSTOMERID = e.CUSTOMERID             

group by a.LASTNAME, a.FIRSTNAME, b.RENTALDATE, d.COPYFORMAT

 

But we’re still not quite done yet. SQL also offers the very useful WITH clause that can often be very efficient in some cases. So finally we can change the SELECT as follows.

 

with f as ( select customerid

                from customer e

                 where e.address like 'A%' or

                          e.address like 'F%' or

                          e.address like 'T%' or

                          e.address like 'Z%'

             )

select  a.LASTNAME, a.FIRSTNAME, b.RENTALDATE,

           d.COPYFORMAT, sum(b.TOTALCHARGE) REVENUE

from employee a JOIN movierental b using(EMPLOYEEID)

                          JOIN rentalitem c using (RENTALID)

                          JOIN moviecopy d using(MOVIECOPYID),

                          f     

where a.LASTNAME = 'Smith'

   and b.CUSTOMERID = f.CUSTOMERID             

group by a.LASTNAME, a.FIRSTNAME, b.RENTALDATE, d.COPYFORMAT

 

Wow – that’s five major ways to rewrite this query. Since ANSI joins are the recommended way to go, the question remains which of these four coding styles is the more correct? The academia reply is using the WITH clause, as it leverages all the latest and greatest SQL language constructs available. But the correlated sub-query and inline view solutions are very typical of what someone might write – and there is seemingly nothing wrong with those approaches either. So I would not recommend a forced rewrite as long as the code is like one of the last three examples.

 

So which one yields the best performance? We’ll leave that discussion to next time.

There are two very distinct and unique types of bitmap indexes within Oracle: bitmap indexes and bitmap join indexes. Let’s examine the fundamental difference between them.

 

Let’s assume we’re indexing a star schema designed set of tables. To keep it simple let’s suppose the design has one fact table and two dimensions as show by the physical data model here in Figure #1.

 

img1.png

 

If we implement the above table design using traditional bitmap indexes then the fact table DDL would look as follows:

 

-- Create indexes for table FACT using traditional bitmap indexes

CREATE UNIQUE INDEX FACT_PK ON FACT ( FACT_DATE, DIM1_FK, DIM2_FK );

CREATE BITMAP INDEX DIM1_FK ON FACT ( DIM1_FK );

CREATE BITMAP INDEX DIM2_FK ON FACT ( DIM2_FK );

 

The FACT_PK index would most likely only be active (i.e. valid or online) during batch load operations to maintain the data fidelity. Then during predominately ad-hoc query only times the two bitmap indexes would be used in order to affect the Oracle “star transformation” explain plan. As for the FACT_DATE column, the table would most likely be partitioned by that value – therefore an index might or might not be advantageous (i.e. depends upon your data, its distribution, its skew and other factors).

 

If now we instead implement the same table design using the specialized bitmap join index then the fact table DDL would look as follows:

 

-- Create indexes for table FACT using specialized bitmap join index

CREATE UNIQUE INDEX FACT_PK ON FACT ( FACT_DATE, DIM1_FK, DIM2_FK );

CREATE BITMAP INDEX FACT_BJ1 ON FACT ( DIM1_FK )

FROM FACT, DIM1

WHERE FACT.DIM1_FK = DIM1.ATTRIBUTE1;

CREATE BITMAP INDEX FACT_BJ2 ON FACT ( DIM2_FK )

FROM FACT, DIM2

WHERE FACT.DIM2_FK = DIM2.ATTRIBUTE1;

 

In fact you might even decide that a single bitmap join index can handle all the queries, in which case the DDL would now look like this:

 

-- Create indexes for table FACT using specialized bitmap join index

CREATE UNIQUE INDEX FACT_PK ON FACT ( FACT_DATE, DIM1_FK, DIM2_FK );

CREATE BITMAP INDEX FACT_BJ ON FACT ( DIM1_FK, DIM2_FK )

FROM FACT, DIM1, DIM2

WHERE FACT.DIM1_FK = DIM1.ATTRIBUTE1

  AND FACT.DIM2_FK = DIM2.ATTRIBUTE1;

 

Let’s examine this last case to see just what a bitmap join index is and how it works. Back when data warehouses we first becoming popular there was a database vendor and platform called “Red Brick” founded by Ralph Kimball of data warehousing fame. A critical concept in many of these specialized data warehouse database engines was the concept of a “star index” – i.e. an index on the fact table which contains the row id’s for both the fact table rows and the dimension table rows. That’s exactly what we constructed in the last example using the bitmap join index. Here’s a simple diagram that displays what such a “star index” looks like – which is what a “bitmap join” including all dimensions implements.

 

img2.jpg