Pipelined Stored Procedure optimization

This has probably been beaten to death, but my searches yield many different answers.
I'm wondering if someone can point me to a decent resource on accomplishing the goals below:
1. Returning a large result set (5K to 2.5M records) from a complex query whose full result set is around 11M records
2. Applying row-level security (preferably from a stored procedure) to a highly sensitive set of data, with the best possible performance and security
Questions:
1. Do I need a Table Type, Row Type, and Function or Package to return this efficiently?
2. I have found resources online suggesting BULK COLLECT (said to be a performance issue in a shared environment), REF CURSOR, and SYS_REFCURSOR. Which way should I go?
3. Should I use a PARALLEL hint, or any other hints?
Any links or syntax help would be appreciated.
Thanks,
Answers
-
Do you already have a pipelined function, as your question title implies, or not?
1) If you want to go pipelined then you need a table type based on a row type and a function; it doesn't work any other way (a minimal sketch follows at the end of this reply). You don't need a package, but you should be putting all your functions/procedures in a package as a matter of course.
2) Bulk Collect and ref cursors do fundamentally different things so aren't meaningfully comparable.
Bulk Collect is an efficient way of fetching data from a select into an array inside PL/SQL.
ref cursors mainly exist to give external programs a way to interact with a select statement in the DB.
3) Depends on whether your server can handle it and whether it'd do anything useful for the query - and we know nothing about the query.
If you want more detailed suggestions you really need to give us a lot more details about what you are trying to do. The actual query would be a good idea.
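For illustration, here is a minimal sketch of the pieces a pipelined function needs; every type, package, table, and column name below is a placeholder, not something from your actual schema:
CREATE OR REPLACE TYPE t_result_row AS OBJECT (
  user_name  VARCHAR2(30),
  metric_val NUMBER
);
/
CREATE OR REPLACE TYPE t_result_tab AS TABLE OF t_result_row;
/
CREATE OR REPLACE PACKAGE result_pkg AS
  FUNCTION get_rows (p_user IN VARCHAR2) RETURN t_result_tab PIPELINED;
END result_pkg;
/
CREATE OR REPLACE PACKAGE BODY result_pkg AS
  FUNCTION get_rows (p_user IN VARCHAR2) RETURN t_result_tab PIPELINED IS
  BEGIN
    -- Each row is piped back to the caller as soon as it is produced
    FOR r IN (SELECT user_name, metric_val
                FROM some_big_table
               WHERE user_name = p_user)
    LOOP
      PIPE ROW (t_result_row(r.user_name, r.metric_val));
    END LOOP;
    RETURN;
  END get_rows;
END result_pkg;
/
-- Callers then query it like a table:
-- SELECT * FROM TABLE(result_pkg.get_rows('SOME_USER'));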
-
brilliant wrote: 2. Applying row-level security (preferably from a stored procedure) to a highly sensitive set of data, with the best possible performance and security
For this part you should not build your own: use VPD: https://docs.oracle.com/database/121/DBSEG/vpd.htm#DBSEG007
Why reinvent the wheel?
-
Just to be 100% clear, taking a query and sticking it in a pipelined function is not going to make it execute and return rows faster. It will probably be slower. Pipelined functions are for when you have to produce the rows using PL/SQL; they mean that instead of generating the full data set and then transferring it, you generate it chunk by chunk.
It is extremely unusual to require that any query return that sort of volume of data. Are you missing an aggregate? Do you only want the first few results? Think of your end users: they are never going to read all that information.
If your bottleneck is in transferring 2.5 million rows from DB server to the client over a network then parallelism will only hurt you.
-
All great suggestions and pointers. Here's my problem:
1. I have an external system being used for analytics/data visualizations.
2. The data is highly sensitive, therefore I have a live connection between the external system and the Oracle DB.
3. Given its sensitivity, having a stored procedure and forcing an input parameter of the user ID ensures secured access, versus views or materialized views.
4. The procedure can be wrapped into a package/function, utilizing a table type based on a row type.
5. There are 11M records in the SELECT statement's result set.
6. After applying row-level security, user fetches can vary from a few thousand to a few million records, but never the entire 11M-record set.
7. The external system computes the ratios and other aggregations.
8. The data is as compressed and aggregated as possible, with just the right set of fields to support useful analysis.
Given the above parameters of my scenario, I am trying to implement a stored procedure with a pipelined approach to return results to the external system. If I wait for 2M records to accumulate in the DB and then send them across via BULK COLLECT, this could be a drain on server resources and could cause network delays, etc.
What's the best strategy to accomplish this with reasonable performance in the DB? Views and materialized views slowed to a crawl.
Procedures being compiled code, we have leverage to control the execution plan and optimize along with indexes, etc.
Hope this gives a better picture. I could paste the syntax, but this is more of a performance optimization / best practices question than a syntax problem.
Thoughts?
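For context, the kind of BULK COLLECT loop I was referring to looks roughly like this (table and column names are placeholders); fetching with a LIMIT would at least keep memory use bounded instead of accumulating millions of rows at once:
DECLARE
  TYPE t_rows IS TABLE OF some_big_table%ROWTYPE;
  l_rows t_rows;
  CURSOR c IS
    SELECT * FROM some_big_table WHERE user_name = 'SOME_USER';
BEGIN
  OPEN c;
  LOOP
    -- Fetch in batches of 1000 rows rather than the whole result set
    FETCH c BULK COLLECT INTO l_rows LIMIT 1000;
    EXIT WHEN l_rows.COUNT = 0;
    -- process / forward this batch here
  END LOOP;
  CLOSE c;
END;
/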
-
Just to be clear -- does your pipelined function do anything other than apply your "row-level" security? That is, does it transform your input query results, or just filter them?
Gerard
-
It sounds to me like you're doing something like this:
Call asks for everything for USER A
Procedure selects massive result set (11 million rows)
Procedure removes everything from the massive result set that doesn't belong to USER A (discards up to 10.9 million rows)
Procedure gives the remaining rows to the caller
Caller then does some grouping on the result set and probably returns barely anything to the end user
1) Move your filter to act against the data as soon as you can:
select my_cols from big_view where user_id = :my_user;
2) Let the client fetch from that query; use a ref cursor (a fuller sketch follows after this list):
open returning_ref_cursor for select my_cols from big_view where user_id = p_user;
3) Change the SQL so it does the whole thing (ratios and aggregations are simple enough to be done in the database and will be much quicker than sending a ton of data across the network). It's usually possible to create fast refresh MViews that store this sort of data and can be updated very quickly when changes are made to the source data.
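A fuller sketch of point 2, with all names as placeholders (adjust the column list and filter column to your schema):
CREATE OR REPLACE PROCEDURE get_user_data (
  p_user   IN  VARCHAR2,
  p_result OUT SYS_REFCURSOR
) AS
BEGIN
  -- The filter is applied inside the query itself, so only this user's
  -- rows are ever produced; the client fetches straight from the cursor.
  OPEN p_result FOR
    SELECT my_cols
      FROM big_view
     WHERE user_id = p_user;
END get_user_data;
/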
The dbms_rls way would be to:
Add a policy to the views that the caller can use, filtering on user_id (or whatever your column is called); a sketch follows below.
Allow the caller to directly query the view.
Again, aggregation can be done inside the view to really reduce the overhead everywhere.
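Roughly, that policy could be set up like this (schema, object, and function names are placeholders, and the predicate assumes the filtering column is called user_id):
-- Policy function: returns the predicate Oracle appends to matching queries
CREATE OR REPLACE FUNCTION user_filter_policy (
  p_schema IN VARCHAR2,
  p_object IN VARCHAR2
) RETURN VARCHAR2 AS
BEGIN
  RETURN 'user_id = SYS_CONTEXT(''USERENV'', ''SESSION_USER'')';
END user_filter_policy;
/
BEGIN
  DBMS_RLS.ADD_POLICY(
    object_schema   => 'APP_OWNER',
    object_name     => 'BIG_VIEW',
    policy_name     => 'USER_ROW_FILTER',
    function_schema => 'APP_OWNER',
    policy_function => 'USER_FILTER_POLICY',
    statement_types => 'SELECT');
END;
/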
-
Hi,
Thanks for your suggestions. The dbms_rls methodology, while great, is not practical, as the external system connects to the database using a system account with expanded privileges. This is a limitation of the external system, in order to provide a smooth user experience rather than prompting the user for their account and password for every visual that leverages it.
The user in my case is a column in an entitlements table in the database that looks like:
user, reducing_field_name
user1, value1
user1, value2
user2, value1
and so on.
As for aggregations, they cannot be done in the database, as there are countless ways to interact with the visuals for analysis, especially with weighted calculations. That amounts to building a cube/universe, which is not efficient: the moment a field is added, everything has to be recomputed.
Very limiting conditions, whereby magic is expected.
Have there been such scenarios in the past, and how were they dealt with? I assume this is a common problem with new-age analytical tools.
-
It sounds like this system has some very severe flaws. It should not be connecting as a system account. Prompting the user for an account and password at every step would be just weird; that doesn't happen in any application I've ever used...
Anyway, it looks like it should be very easy to push the filter to the main query so you don't have to read the rows that aren't needed. Have you modified your process so that this is done? Is this now fast enough? At some point you're going to need to look at where the time is really going if you want to do anything about it.
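For example, assuming the entitlements table you described (column names are guesses from your description), the pushed-down filter could be a simple semi-join:
SELECT v.my_cols
  FROM big_view v
 WHERE EXISTS (
         SELECT 1
           FROM entitlements e
          WHERE e.user_name           = :p_user
            AND e.reducing_field_name = v.reducing_field_name);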
"new age analytical tools" usually support some ETL process so you can take the data from your database and store it again but with the application in memory. You would write this so that only necessary data is sent to the application (i.e. what's changed).
Really good analytical tools will do the heavy lifting in the database.
-
brilliant wrote: Have there been such scenarios in the past, and how were they dealt with? I assume this is a common problem with new-age analytical tools.
Of course. How were they dealt with? By creating a data warehouse for analysis purposes, with both the application database and the data warehouse database using VPD. An ETL process between the two separate databases made sure the data warehouse was kept in sync as much as possible, depending on the load on the main system (which had absolute top priority since human lives literally depended on it, so we monitored activity to only sync during quiet hours).
And I'm talking really sensitive data here: coworkers sitting almost next to each other were not allowed by law to see each other's interactions with the application/database/external input. So rethink your approach and use the standard tools that are available: rolling your own will never perform or scale as well.