Suppose we have data as in the table TEST_DATA below (ID is the primary key, all columns are of the NUMBER data type, and blank spaces signify NULL):
        ID       COL1       COL2       COL3       COL4       COL5
---------- ---------- ---------- ---------- ---------- ----------
         1        100        200        300        400        500
         2        300        100        100        300        300
         3                   300        100                   400
         4
         5                   300                              300
We want to write a query that returns the ID and the highest non-NULL value in each row (returning NULL only if all column values, COL1 through COL5, are NULL, as for ID = 4).
A factored subquery to replicate the table is provided below, and statements to create a much bigger table for testing are provided further down in the message.
One may ask why the data is in this structure to begin with. (It would be better, perhaps, if the table had just three columns: ID, COL_NAME, and VALUE.) For the sake of this exercise, let's accept that the base table simply is in this structure.
If no NULLs were present, we could simply use GREATEST(COL1, ..., COL5). Alas, GREATEST and LEAST return NULL if even one argument is NULL, and that is not what our problem requires.
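To illustrate the problem (a quick check, not part of any solution):

```sql
select greatest(100, null, 500) as g
from   dual
;
-- returns NULL, even though two of the three arguments are non-NULL
```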
MIN and MAX work the way we want (they ignore NULLs), but they are aggregate functions: they work down a column, not across a row. It is easy to write a query that unpivots the base table, then groups by ID and takes the MAX. But this has two disadvantages.
One is that, in the original arrangement, each set of five values was in effect already "grouped" by ID (by virtue of the values all being in one row). This is something we could use to our advantage - for example, if a function like GREATEST IGNORE NULLS (COL1, ..., COL5) existed. Unfortunately, it doesn't.
The second disadvantage, demonstrated recently in another thread by Solomon Yakobson, is that UNPIVOT, unfortunately, does a UNION ALL behind the scenes - in this case, the table is read five times. (So UNPIVOT does the same work as "manual unpivoting", which is just SELECT ID, COL1 FROM TEST_DATA UNION ALL SELECT ID, COL2 FROM TEST_DATA UNION ALL ... etc.)
with
  test_data ( id, col1, col2, col3, col4, col5 ) as (
    select 1, 100 , 200 , 300 , 400 , 500  from dual union all
    select 2, 300 , 100 , 100 , 300 , 300  from dual union all
    select 3, null, 300 , 100 , null, 400  from dual union all
    select 4, null, null, null, null, null from dual union all
    select 5, null, 300 , null, null, 300  from dual
  )
select   id, max(val) as high_val
from     test_data
unpivot include nulls ( val for source in (col1, col2, col3, col4, col5) )
group by id
order by id
;
        ID   HIGH_VAL
---------- ----------
         1        500
         2        300
         3        400
         4
         5        300
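For reference, the "manual unpivoting" version mentioned earlier, written out in full - it produces the same output, and likewise reads the table five times:

```sql
select   id, max(val) as high_val
from     ( select id, col1 as val from test_data union all
           select id, col2        from test_data union all
           select id, col3        from test_data union all
           select id, col4        from test_data union all
           select id, col5        from test_data
         )
group by id
order by id
;
```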
So, in order to avoid un-grouping only to group again just so we can use MAX, we may instead work a little harder on the original idea: use GREATEST, but find a way to handle NULLs. Here is one way:
select id,
greatest ( coalesce(col1, col2, col3, col4, col5),
coalesce(col2, col3, col4, col5, col1),
coalesce(col3, col4, col5, col1, col2),
coalesce(col4, col5, col1, col2, col3),
coalesce(col5, col1, col2, col3, col4)
) as high_val
from test_data
;
This works because each COALESCE returns some non-NULL column value (or NULL when all five are NULL), and whichever column holds the row maximum comes first in one of the five rotations; so the GREATEST of the five rotations is exactly the highest non-NULL value. There are other ways to write this, but they are all ugly (and become uglier as the number of columns grows). However, testing - described later - shows that this approach is efficient.
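One of those other ways, for illustration: replace NULL with a sentinel smaller than any real value, then map the sentinel back to NULL. Note the assumption that -1e125 never occurs in the data - that is a property of this particular data set, not something the problem guarantees:

```sql
select id,
       nullif ( greatest ( nvl(col1, -1e125), nvl(col2, -1e125),
                           nvl(col3, -1e125), nvl(col4, -1e125),
                           nvl(col5, -1e125) ),
                -1e125 ) as high_val
from   test_data
;
-- if all five columns are NULL, GREATEST returns the sentinel,
-- and NULLIF maps it back to NULL, as required for ID = 4
```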
There are various ways to write solutions with correlated subqueries, but they don't seem efficient. In Oracle 12 and above, we can use the LATERAL clause (see below). I was hoping this might be more efficient, but I couldn't figure out how to use it efficiently (or perhaps there is no "good" way to use it). Here is what I came up with.
select   t.id, s.high_val
from     test_data t,
         lateral
         ( select max(y.val) as high_val
           from   ( select t.id as id
                    from   dual
                  ) x,
                  lateral
                  ( select id, val
                    from   test_data
                    unpivot ( val for source in (col1, col2, col3, col4, col5) )
                    where  id = x.id
                  ) y
         ) s
order by id
;
(There is also a more concise way to write this, with only one LATERAL clause; but that version, in effect, correlates two levels deep. As Solomon Yakobson pointed out in another thread today, that only works in some versions of Oracle, and only because Oracle hasn't gotten around to preventing it yet. And performance testing shows no improvement; the query is shorter, but the execution time is about the same.)
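For completeness, a sketch of that more concise version - the reference to t.id reaches through the implicit view created by UNPIVOT, which is the two-levels-deep correlation just described, so it may be rejected outright in some Oracle versions:

```sql
select   t.id, s.high_val
from     test_data t,
         lateral
         ( select max(val) as high_val
           from   test_data
           unpivot ( val for source in (col1, col2, col3, col4, col5) )
           where  id = t.id
         ) s
order by id
;
```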
My question in this thread is: Are there better ways than the ones already discussed, to take advantage of the way the values are already presented in rows?
Statements to create a bigger TEST_DATA table (NOTE: I didn't bother to add NULLs, since that is unlikely to affect performance much, as long as I don't change the solutions):
create table test_data ( id number, col1 number, col2 number, col3 number, col4 number, col5 number );
insert into test_data
select level, dbms_random.value(), dbms_random.value(), dbms_random.value(), dbms_random.value(), dbms_random.value()
from dual
connect by level <= 1000000
;
commit;
On this data, the solution with GREATEST, rigged to handle NULLs, runs in a little over 3 seconds on my laptop. The UNPIVOT and MAX solution takes about 8 seconds, and the LATERAL solution about 7 seconds (with some variation from one test to the next; I would say there isn't much of a difference between UNPIVOT/MAX and LATERAL).
Cheers, - mathguy