Discussions
Categories
- 196.9K All Categories
- 2.2K Data
- 239 Big Data Appliance
- 1.9K Data Science
- 450.3K Databases
- 221.7K General Database Discussions
- 3.8K Java and JavaScript in the Database
- 31 Multilingual Engine
- 550 MySQL Community Space
- 478 NoSQL Database
- 7.9K Oracle Database Express Edition (XE)
- 3K ORDS, SODA & JSON in the Database
- 545 SQLcl
- 4K SQL Developer Data Modeler
- 187K SQL & PL/SQL
- 21.3K SQL Developer
- 295.9K Development
- 17 Developer Projects
- 138 Programming Languages
- 292.6K Development Tools
- 107 DevOps
- 3.1K QA/Testing
- 646K Java
- 28 Java Learning Subscription
- 37K Database Connectivity
- 155 Java Community Process
- 105 Java 25
- 22.1K Java APIs
- 138.1K Java Development Tools
- 165.3K Java EE (Java Enterprise Edition)
- 18 Java Essentials
- 160 Java 8 Questions
- 86K Java Programming
- 80 Java Puzzle Ball
- 65.1K New To Java
- 1.7K Training / Learning / Certification
- 13.8K Java HotSpot Virtual Machine
- 94.3K Java SE
- 13.8K Java Security
- 204 Java User Groups
- 24 JavaScript - Nashorn
- Programs
- 440 LiveLabs
- 38 Workshops
- 10.2K Software
- 6.7K Berkeley DB Family
- 3.5K JHeadstart
- 5.7K Other Languages
- 2.3K Chinese
- 171 Deutsche Oracle Community
- 1.1K Español
- 1.9K Japanese
- 232 Portuguese
Best practice query

Hi,
For a while I'm questioning myself about a best practice query for performance.
I've 2 tables.
EMPLOYEES and DEPARTMENTS
The EMPLOYEES table can contains millions of records.
Which query is better for performance ?
Query 1
Select t1.emp_id, t1.emp_name, t2.dept_name from EMPLOYEES t1, DEPARTMENTS t2 where t1.dept_id = t2.dept_id order by t1.emp_name
Query 2
Select emp_id, emp_name, (select mypackage.get_dept_name(dept_id) from dual) as dept_name from EMPLOYEES order by emp_name
Answers
-
Query 1b
Use ANSI joins
Select t1.emp_id, t1.emp_name, t2.dept_name from EMPLOYEES t1 Left join DEPARTMENTS t2. On t1.dept_id = t2.dept_id order by t1.emp_name
In Query 2, the scalar sub query is ran for each returning row of
EMPLOYEES
. -
To be equivalent, emp.department_id must be fk not null. And depends on function.
-
Calling a pl/sql function to return a string that could simply be a column (or concatenation of columns) from a table is introducing an overhead and a possible threat. It's not necessary.
Assuming you have the standard referential integrity between employees and departments (i.e. unique id for departments, foreign key on employees to departments) then your inline scalar subquery could be:
(select dep.dept_name from departments dep where dep.dept_id = emp.dept_id)
It's worth noting that the return from the function would be varchar2(4000) which might affect the output of the final query, or if you're pulling the data across a network might (depending on the client tool) waste a lot of memory if (for example) the client code allocated memory based on the maximum possible return length).
If employees allows for employees not yet assigned to a department (i.e. null dept_id) this would report NULL in the final query, whereas the join would lose the employee unless you changed the join to an outer join.
You are presumably hoping to benefit from scalar subquery caching with the scalar subquery approach - or possibly result caching, or pl/sql function caching. Two things to remember if you depend on such things - how long does the cache persist, what happens if you get a cache collision. The more different departments you have (and millions of employees may mean thousands of departments) the more likely you are to get a hash collision on the cache and end up with randomly variable performance.
In your example the simply join is likely to be a hash join using the departments table as the build table with optimal workarea usage - which means an in-memory hash lookup by each employee, and you can't really get much faster than that.
With an example like this "millions of employees, no filtering" your performance problems are more likely to be related to transporting the volume of data to the client - so setting a large fetch arraysize is likely to be far more important than fiddling with such simple SQL. And while there is a way to write the query so that it can perform a little faster, there are two other aspects of coding to consider: how fragile is the code, and what are the chances that the next person who has to modify the code will not understand the tweaks you've used and do something that makes things worse.
Here's a very old example of the performance instability that can appear with scalar subquery caching - in this case it's a filter subquery, but the principle is the same: https://jonathanlewis.wordpress.com/2006/11/06/filter-subqueries/
Regards
Jonathan Lewis
-
Thank you all for your replies & tips.
So better go for Query 1 or Query 1b.
Is there any difference in term of performance between Query 1 & Query 1b ?
-
Bear in mind that if employees.dept_id is allowed to be null then the two queries will return different results. In that case the outer join equivalent using Oracle's traditional syntax would be:
Select t1.emp_id, t1.emp_name, t2.dept_name from EMPLOYEES t1, DEPARTMENTS t2 where t1.dept_id = t2.dept_id(+) order by t1.emp_name
(which you could view as "we add an empty row to t2/departments to give employees without a department something to join to")
Generally it is now seen strategically as a good thing to use the "ANSI" syntax rather than traditional Oracle syntax. There are a few oddities (whichevet you choose) that mean in some boundary cases you get different plans as you switch. In this case there would be no performance difference.
Regards
Jonathan Lewis
-
Thank you Jonathan.
Regards