
SQL & PL/SQL


Pipelined Stored Procedure optimization

brilliant — May 8 2019, edited May 10 2019

This has probably been beaten to death, but my search yields many different answers.

Wondering if someone can point me to a decent resource on accomplishing the goals below:

1. Returning a large result set (5k to 2.5M records) from a complex query whose full result set is around 11M records.

2. Applying Row Level Security (preferably from a stored procedure) to a highly sensitive set of data, with the best performance and security.

Questions:

1. Do I need a Table Type, Row Type, and Function or Package to return this efficiently?

2. I have found resources online mentioning BULK COLLECT (said to be a performance issue in a shared environment), REF CURSOR, and SYS_REFCURSOR. Which way should I go?

3. Should I use the PARALLEL hint, or any others?

Any links or syntax help is appreciated

Thanks,

Comments

KayK

Maybe you'll get more answers here.

Cookiemonster76

Do you already have a pipelined function, as your question title implies, or not?

1) If you want to do pipelined, then you need a table type based on a row type, and a function - it doesn't work any other way. You don't need a package, but you should be putting all your functions/procedures in a package as a matter of course.
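For reference, a minimal sketch of those pieces (all type, table, and column names below are made up for illustration; on 12c and later the row/table types could instead be declared as PL/SQL records in the package spec):

-- Schema-level row type and collection type (hypothetical names and columns)
CREATE TYPE emp_row_t AS OBJECT (emp_id NUMBER, emp_name VARCHAR2(100));
/
CREATE TYPE emp_tab_t AS TABLE OF emp_row_t;
/
CREATE OR REPLACE PACKAGE emp_pipe_pkg AS
  FUNCTION get_emps (p_user_id IN NUMBER) RETURN emp_tab_t PIPELINED;
END emp_pipe_pkg;
/
CREATE OR REPLACE PACKAGE BODY emp_pipe_pkg AS
  FUNCTION get_emps (p_user_id IN NUMBER) RETURN emp_tab_t PIPELINED IS
  BEGIN
    FOR r IN (SELECT emp_id, emp_name FROM employees WHERE owner_id = p_user_id) LOOP
      PIPE ROW (emp_row_t(r.emp_id, r.emp_name));  -- stream each row to the caller as it is produced
    END LOOP;
    RETURN;
  END get_emps;
END emp_pipe_pkg;
/
-- Consumed from SQL, e.g.: SELECT * FROM TABLE(emp_pipe_pkg.get_emps(:user_id));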

2) Bulk Collect and ref cursors do fundamentally different things so aren't meaningfully comparable.

Bulk Collect is an efficient way of fetching data from a select into an array inside PL/SQL.

ref cursors mainly exist to give external programs a way to interact with a select statement in the DB.
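To make the distinction concrete, a rough sketch (table and column names are made up):

-- BULK COLLECT: pull rows from a query into a PL/SQL collection, inside the database
DECLARE
  TYPE t_ids IS TABLE OF NUMBER;
  l_ids t_ids;
BEGIN
  SELECT order_id BULK COLLECT INTO l_ids
  FROM   orders
  WHERE  status = 'OPEN';
  DBMS_OUTPUT.PUT_LINE(l_ids.COUNT || ' rows fetched into PL/SQL');
END;
/
-- Ref cursor: hand an open query back to an external caller, which then fetches the rows itself
CREATE OR REPLACE FUNCTION open_orders RETURN SYS_REFCURSOR IS
  l_rc SYS_REFCURSOR;
BEGIN
  OPEN l_rc FOR SELECT order_id, status FROM orders WHERE status = 'OPEN';
  RETURN l_rc;
END;
/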

3) Depends on whether your server can handle it and whether it'd do anything useful for the query - and we know nothing about the query.

If you want more detailed suggestions you really need to give us a lot more details about what you are trying to do. The actual query would be a good idea.

[Deleted User]

brilliant wrote:

2. Applying Row Level Security (preferably from a stored procedure) to a highly sensitive set of data, with the best performance and security.

For this part you should not build your own: use VPD: https://docs.oracle.com/database/121/DBSEG/vpd.htm#DBSEG007

Why reinvent the wheel?

AndrewSayer

brilliant wrote:

This has probably been beaten to death, but my search yields many different answers.

Wondering if someone can point me to a decent resource on accomplishing the goals below:

1. Returning a large result set (5k to 2.5M records) from a complex query whose full result set is around 11M records.

2. Applying Row Level Security (preferably from a stored procedure) to a highly sensitive set of data, with the best performance and security.

Questions:

1. Do I need a Table Type, Row Type, and Function or Package to return this efficiently?

2. I have found resources online mentioning BULK COLLECT (said to be a performance issue in a shared environment), REF CURSOR, and SYS_REFCURSOR. Which way should I go?

3. Should I use the PARALLEL hint, or any others?

Any links or syntax help is appreciated

Thanks,

Just to be 100% clear, taking a query and sticking it in a pipelined function is not going to make it execute and return rows faster. It will probably be slower. Pipelined functions are for when you have to produce the rows using PL/SQL; they mean that instead of generating the full data set and then transferring it, you generate it chunk by chunk.

It is extremely unusual to require a query to return that sort of volume of data; are you missing an aggregate? Do you only want the first few results? Think of your end users: they are never going to read all that information.

If your bottleneck is in transferring 2.5 million rows from DB server to the client over a network then parallelism will only hurt you.

brilliant

All great suggestions and pointers. Here's my problem:

1. I have an external system being used for Analytics/Data Visualizations

2. The data is super sensitive, therefore I have a live connection between the external system and the Oracle DB

3. Given its sensitivity, having a stored procedure and forcing an input parameter of the user id ensures secured access, versus Views or Materialized Views

4. The procedure can be wrapped into a package or function, utilizing a Table Type based on a Row Type

5. There are 11M records in the SELECT statement outcome

6. After applying Row Level Security, user fetches can vary between a few thousand and a few million records, but never the entire recordset of 11M records

7. The external system computes the ratios and other aggregations.

8. The data is as compressed and aggregated as possible, with just the right amount of fields to support useful analysis

Given the above parameters of my scenario, I am trying to implement a stored procedure via a pipelined approach to return results to the external system. If I wait for 2M records to accumulate in the DB and then send them across via BULK COLLECT, this could be a drain on server resources and cause possible network delays, etc.

What's the best strategy to accomplish this with reasonable performance in the DB? Views and materialized views slowed to a crawl.

Procedures being compiled code, we have leverage to control the execution plan and optimize, along with indexes, etc.

Hope this gives a better picture. I could paste the syntax, but this is more of a performance optimization / best practice question than a syntactic problem.

Thoughts?

gaverill

Just to be clear -- does your pipelined function do anything other than apply your "row-level" security? That is, does it transform your input query results, or just filter them?

Gerard

AndrewSayer

brilliant wrote:

All great suggestions and pointers. Here's my problem:

1. I have an external system being used for Analytics/Data Visualizations

2. The data is super sensitive, therefore I have a live connection between the external system and the Oracle DB

3. Given its sensitivity, having a stored procedure and forcing an input parameter of the user id ensures secured access, versus Views or Materialized Views

4. The procedure can be wrapped into a package or function, utilizing a Table Type based on a Row Type

5. There are 11M records in the SELECT statement outcome

6. After applying Row Level Security, user fetches can vary between a few thousand and a few million records, but never the entire recordset of 11M records

7. The external system computes the ratios and other aggregations.

8. The data is as compressed and aggregated as possible, with just the right amount of fields to support useful analysis

Given the above parameters of my scenario, I am trying to implement a stored procedure via a pipelined approach to return results to the external system. If I wait for 2M records to accumulate in the DB and then send them across via BULK COLLECT, this could be a drain on server resources and cause possible network delays, etc.

What's the best strategy to accomplish this with reasonable performance in the DB? Views and materialized views slowed to a crawl.

Procedures being compiled code, we have leverage to control the execution plan and optimize, along with indexes, etc.

Hope this gives a better picture. I could paste the syntax, but this is more of a performance optimization / best practice question than a syntactic problem.

Thoughts?

It sounds to me like you're doing something like this:

Call asks for everything for USER A

Procedure selects massive result set (11 million rows)

Procedure removes everything from the massive result set that doesn't belong to USER A (discards up to 10.9 million rows)

Procedure gives the remaining rows to the caller

Caller then does some grouping on the result set and probably returns barely anything to the end user

1) Move your filter to act against the data as soon as you can

select my_cols from big_view where user=my_user;

2) Let the client fetch from that query, use a ref cursor

open returning_ref_cursor for select my_cols from big_view where user=my_user;
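For instance, a minimal procedure along those lines might look like this (procedure, view, and column names are placeholders):

CREATE OR REPLACE PROCEDURE get_user_rows (
  p_user_id IN  VARCHAR2,
  p_result  OUT SYS_REFCURSOR
) AS
BEGIN
  OPEN p_result FOR
    SELECT my_cols
    FROM   big_view
    WHERE  user_id = p_user_id;  -- filter applied inside the query itself
END get_user_rows;
/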

3) Change the SQL so it does the whole thing (ratios and aggregations are simple enough to be done in the database and will be much quicker than sending a ton of data across the network). It's usually possible to create fast refresh MViews that store this sort of data and can be updated very quickly when changes are made to the source data.

The dbms_rls way would be to:

Add a policy to the views that the caller can use that filters on user_id (or whatever your column is called)

Allow the caller to directly query the view.

Again, aggregation can be done inside the view to really reduce the overhead everywhere.
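For illustration, a bare-bones version of that policy setup might look like this (schema, view, and column names are placeholders; the predicate here assumes the caller's id is exposed via the CLIENT_IDENTIFIER):

-- Policy function: returns the predicate Oracle silently appends to queries on the protected object
CREATE OR REPLACE FUNCTION user_row_filter (
  p_schema IN VARCHAR2,
  p_object IN VARCHAR2
) RETURN VARCHAR2 IS
BEGIN
  RETURN 'user_id = SYS_CONTEXT(''USERENV'', ''CLIENT_IDENTIFIER'')';
END user_row_filter;
/
BEGIN
  DBMS_RLS.ADD_POLICY(
    object_schema   => 'APP_OWNER',
    object_name     => 'BIG_VIEW',
    policy_name     => 'BIG_VIEW_USER_FILTER',
    function_schema => 'APP_OWNER',
    policy_function => 'USER_ROW_FILTER',
    statement_types => 'SELECT');
END;
/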

brilliant

Hi,

Thanks for your suggestions. The dbms_rls methodology, while great, is not practical as the external system connects to the database using a system account with expanded privileges. This is a limitation of the external system, in order to provide a smooth user experience versus prompting the user for their account and password for every visual that leverages it.

The user in my case is a column in an entitlements table in the database, which has the traditional shape:

user, reducing_field_name

user1, value1

user1, value2

user2, value1

and so on.

As for aggregations, they cannot be done in the database, as there are countless ways to interact with the visuals for analysis, especially with weighted calculations. That amounts to building a cube/universe, which is not efficient: the moment a field is added, everything has to be recomputed.

Very limiting conditions whereby magic is expected.

Have there been such scenarios in the past, and how were they dealt with? I assume this is a common problem with new-age analytical tools.

AndrewSayer

brilliant wrote:

Hi,

Thanks for your suggestions. The dbms_rls methodology, while great, is not practical as the external system connects to the database using a system account with expanded privileges. This is a limitation of the external system, in order to provide a smooth user experience versus prompting the user for their account and password for every visual that leverages it.

The user in my case is a column in an entitlements table in the database, which has the traditional shape:

user, reducing_field_name

user1, value1

user1, value2

user2, value1

and so on.

As for aggregations, they cannot be done in the database, as there are countless ways to interact with the visuals for analysis, especially with weighted calculations. That amounts to building a cube/universe, which is not efficient: the moment a field is added, everything has to be recomputed.

Very limiting conditions whereby magic is expected.

Have there been such scenarios in the past, and how were they dealt with? I assume this is a common problem with new-age analytical tools.

It sounds like this system has some very severe flaws. It should not be connecting as SYSTEM. Prompting the user for an account and password on every step is just weird - that doesn't happen in any application I've ever used...

Anyway, it looks like it should be very easy to push the filter to the main query so you don't have to read the rows that aren't needed. Have you modified your process so that this is done? Is this now fast enough? At some point you're going to need to look at where the time is really going if you want to do anything about it.

"new age analytical tools" usually support some ETL process so you can take the data from your database and store it again but with the application in memory. You would write this so that only necessary data is sent to the application (i.e. what's changed).

Really good analytical tools will do the heavy lifting in the database.

[Deleted User]

brilliant wrote:

Have there been such scenarios in the past, and how were they dealt with? I assume this is a common problem with new-age analytical tools.

Of course. How were they dealt with? By creating a data warehouse for analysis purposes, with both the application database and the data warehouse database using VPD. An ETL between the two separate databases made sure the data warehouse was kept in sync as much as possible, depending on the load on the main system (which had absolute top priority, since human lives literally depended on it - so we monitored activity to only sync during quiet hours).

And I'm talking really sensitive data here: coworkers sitting almost next to each other were not allowed by law to see each other's interactions with the application/database/external input. So rethink your approach, and use the standard tools that are available: rolling your own will never perform or scale as well.

brilliant

Agreed. There is a warehouse I am fetching the information from; however, the view and materialized view are currently extremely slow. The view gives a 60-second response per click, while the materialized view gives a 12-second response per click. I find it hard to understand why a stored procedure is against standard practice, whereby a parameter is required to fetch any data, thereby ensuring security. I see that VPD allows for the same, but given the new-age tools, their limitations bring us down to traditional methods of securing and analyzing data.

Might you have a template of how this is accomplished using a procedure within a package, using functions, a table type and a row type? I don't know what the performance will be, but it's worth a try. Getting it close to 4-5 seconds or less per click is my target for the user experience.

Here's another thought: how about I go with your VPD suggestion - does that mean I can still call it through a procedure that accepts the user id as a parameter (since the connection is with a system account), and the procedure calls the materialized view using the userid that was passed as a parameter, sort of like impersonating a user session? Thereby improving my fetch time?

AndrewSayer

brilliant wrote:

Agreed. There is a warehouse I am fetching the information from; however, the view and materialized view are currently extremely slow. The view gives a 60-second response per click, while the materialized view gives a 12-second response per click. I find it hard to understand why a stored procedure is against standard practice, whereby a parameter is required to fetch any data, thereby ensuring security. I see that VPD allows for the same, but given the new-age tools, their limitations bring us down to traditional methods of securing and analyzing data.

Might you have a template of how this is accomplished using a procedure within a package, using functions, a table type and a row type? I don't know what the performance will be, but it's worth a try. Getting it close to 4-5 seconds or less per click is my target for the user experience.

Here's another thought: how about I go with your VPD suggestion - does that mean I can still call it through a procedure that accepts the user id as a parameter (since the connection is with a system account), and the procedure calls the materialized view using the userid that was passed as a parameter, sort of like impersonating a user session? Thereby improving my fetch time?

Take a step back.

Trace where the time is currently going for your 12-second response. Use extended SQL trace and tkprof.
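One common way to capture that, assuming you can trace your own session and reach the server's trace directory:

-- Enable extended SQL trace (event 10046 with waits and binds) for the current session
EXEC DBMS_MONITOR.SESSION_TRACE_ENABLE(waits => TRUE, binds => TRUE);
-- ... run the slow query / procedure call here ...
EXEC DBMS_MONITOR.SESSION_TRACE_DISABLE;
-- Then format the raw trace file on the server with tkprof, e.g.:
-- tkprof <tracefile>.trc report.txt sys=no sort=exeela,fchela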

Dom Brooks

The super-sensitive nature of the data and the current approach to applying security do not sit well with each other.

The chance of accidentally exposing the wrong data is currently very high.

The data /underlying tables should be protected via VPD at least.

Accessing the data via the system should cause an application context to be set which gives access to the appropriate data via the usual VPD methods.

Accessing the data without this context being set should cause no data to be returned.

Then whether the data is accessed directly via views or stored procs is neither here nor there.

If you want to set the context by calling a proc, that's up to you / your architecture.

L. Fernigrini

That's how I usually do it: call an SP that sets the context after checking whatever needs to be checked (sometimes even the time of day; some data should not be accessed outside working hours).

Once the context is set, just run your query on VIEWs that filter data by reading the context...

Then unset the context for the session.

brilliant

Might you have an example of this implementation? This is new to me. Or a pointer to resources on how to build it?

L. Fernigrini

I do not have examples right now, but I can explain the idea.

Contexts are like "variables" that can be accessed by an Oracle session. Here is some info on how to create and set a context:

https://docs.oracle.com/en/database/oracle/oracle-database/18/sqlrf/CREATE-CONTEXT.html#GUID-FDF62812-A884-479C-9C1B-5BD…

https://docs.oracle.com/en/database/oracle/oracle-database/18/sqlrf/SYS_CONTEXT.html#GUID-B9934A5D-D97B-4E51-B01B-80C76A…

The approach I would use is something like this:

Scenario: You need to query table Invoices, showing all the information from user A. You are user X.

1) Create a Context (let's say it's called "InvoiceQuery") and a procedure in a package (let's call it PKG_SET_CONTEXT.PR_SET_INVOICEUSER) to set values in it. The procedure should accept at least 2 values:

     a) Desired user to be queried

     b) User running the query

And it should have all the logic to confirm that the user running the query can view invoices from the user to be queried... If that logic is fulfilled, then the parameter "UserID" of the context "InvoiceQuery" is set to 'A' (a sketch of this package appears after the full example below).

2) Create a view like this:

CREATE OR REPLACE VIEW vw_InvoicesByUser
AS
SELECT i.InvoiceDate, i.Amount, i.UserID, i. .... ... ... ..
FROM Invoices i
WHERE i.UserID = SYS_CONTEXT ('InvoiceQuery', 'UserID');

Then allow the app user to query the view and also call the SP that sets the context:

GRANT SELECT ON vw_InvoicesByUser TO AppUser;
GRANT EXECUTE ON PKG_SET_CONTEXT TO AppUser;

Then you need to do something like

EXEC PKG_SET_CONTEXT.PR_SET_INVOICEUSER ('A', 'X');
SELECT * FROM Finance.vw_InvoicesByUser;
EXEC PKG_SET_CONTEXT.PR_SET_INVOICEUSER (NULL, 'X'); -- To reset the context
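As a rough sketch of what step 1 might look like (the entitlement table and its columns here are made up; the real check would depend on your own data model):

-- The context can only be written by procedures in the package named in its USING clause
CREATE OR REPLACE CONTEXT InvoiceQuery USING PKG_SET_CONTEXT;

CREATE OR REPLACE PACKAGE PKG_SET_CONTEXT AS
  PROCEDURE PR_SET_INVOICEUSER (p_queried_user IN VARCHAR2, p_calling_user IN VARCHAR2);
END PKG_SET_CONTEXT;
/
CREATE OR REPLACE PACKAGE BODY PKG_SET_CONTEXT AS
  PROCEDURE PR_SET_INVOICEUSER (p_queried_user IN VARCHAR2, p_calling_user IN VARCHAR2) IS
    l_allowed NUMBER;
  BEGIN
    -- Hypothetical entitlement check: may p_calling_user see p_queried_user's invoices?
    SELECT COUNT(*)
    INTO   l_allowed
    FROM   UserEntitlements
    WHERE  viewer_id = p_calling_user
    AND    viewed_id = p_queried_user;

    IF p_queried_user IS NULL OR l_allowed > 0 THEN
      DBMS_SESSION.SET_CONTEXT('InvoiceQuery', 'UserID', p_queried_user);
    ELSE
      RAISE_APPLICATION_ERROR(-20001, 'Not entitled to query this user''s invoices');
    END IF;
  END PR_SET_INVOICEUSER;
END PKG_SET_CONTEXT;
/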

brilliant

This is a great reference, thanks, and it looks pretty secure!

Does this mean I still need VPD for defining what A can see, given that X does not have access to SELECT from the invoices table? I am guessing yes, since that's where I would define which values A sees in the invoices table based on, say, Region or Category or whatever else.

Thoughts?

L. Fernigrini

Actually, this is a "cheap" way to create something similar to VPD.

The user that connects to the DB to query the invoice data (AppUser in my example) should only have access to the package that allows setting the context values, and to the "filtered" views.

The tables required to manage your users and how they access data must be unreachable to him.

Let's say you want to see invoices based on the regions assigned to a particular user, then you will need a table like this:

TABLE UsersByRegions ( UserID, RegionID, ISEnabled)

Then your view would be

CREATE OR REPLACE VIEW vw_InvoicesByUserRegion
AS
SELECT i.InvoiceDate, i.Amount, i.RegionID, i. .... ... ... ..
FROM Invoices i
JOIN UsersByRegions ur ON ur.RegionID = i.RegionID
WHERE ur.UserID = SYS_CONTEXT ('InvoiceQuery', 'UserID');

But again, your "AppUser" must have no access to the Users, Invoices, or UsersByRegions tables; it should just see the appropriate views, and they would take care of the filtering...

brilliant

But it sounds like the view needs to be called from a procedure, whereby the userID for switching the context needs to be sent across, so that the system user is able to switch contexts between users.

Questions:

1. What's the performance impact of doing the context switching on tables with millions of records, with a concurrency of say just 10 users?

2. What's the performance impact of context switching and resetting, since I believe it will need to be done?

Thanks,

L. Fernigrini

I really do not fully understand what you are saying/asking; I believe you are mixing two different things associated with the word "context".

1) Views are not in a procedure.

2) You just SELECT from a VIEW that filters the information. The filter is not a hardcoded value, the value is obtained by calling the SYS_CONTEXT function.

3) In order to set a value into a context, you need to execute a stored procedure that does that. When you create a context, you must specify which package contains (or will contain) the procedure(s) that can modify that particular context.

4) You can control which Oracle user can execute that package.

This CONTEXT feature is not to be confused with context switching, which usually means switching from the SQL engine to the PL/SQL engine.

When you query the view, it is a SQL statement that will execute the SYS_CONTEXT function only once and then use that value to execute the query, no matter how many rows exist in the table. The performance will be the same as using a bind variable inside a PL/SQL block or using a hardcoded value.

Imagine the call to SYS_CONTEXT() as a variable, a placeholder. The benefit is that in views there are no variables, but you can use contexts to mimic the behavior.

[Deleted User]

brilliant wrote:

This is a great reference, thanks, and it looks pretty secure!

Not as secure as VPD...

Reread Dom's reply #13...

Might you have a template of how this is accomplished using a procedure within a package, using functions, a table type and a row type?

No. We used VPD, so we did not write any packages or functions, that wasn't necessary.

I don't know what the performance will be, but it's worth a try. Getting it close to 4-5 seconds or less per click is my target for the user experience.

Well, writing your own stuff won't be as fast, that's a given. Also be very careful with performance "goals per click", since that means you are involving a lot more than just the database: networks, clients, firewalls, routers, etc. all come into play when you mention "performance per click". It also really depends on what happens when the end user "clicks" (whatever that means): do you need 20 select statements to satisfy whatever the user wants, or does it take only 1? Is there some transport involved? Does it mean a download of data? Does it mean firing up extra .exe's, loading other dll's?

Like Andrew says (Reply #12): Take a step back. Figure out first where time is spent.

If this data is really that sensitive, then they can't complain about spending money on protecting it, if money is an objection against VPD (otherwise they might as well just throw it on WikiLeaks right away...). So start reading this: https://docs.oracle.com/en/database/oracle/oracle-database/12.2/dbseg/index.html and use existing tools.

Dom Brooks

VPD is a standard part of the Enterprise Edition license.

brilliant

I think VPD is the route I will be taking, but I have to call the query via a procedure that calls the view carrying the VPD policy and context, since the external system connects to Oracle using the system account. Calling the procedure gives me leverage to pass the logged-on user's id to the procedure, which then calls the view. I can't think of any other way to have ironclad security. The user experience degrades if I use the userid of the logged-on user to call the view directly, as it prompts for a password on every screen (annoying, but true of the external system).
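If it helps, the shape being described might look roughly like this (names are placeholders; the VPD policy would read the identifier set here via SYS_CONTEXT('USERENV', 'CLIENT_IDENTIFIER')):

CREATE OR REPLACE PROCEDURE get_secured_data (
  p_app_user IN  VARCHAR2,
  p_result   OUT SYS_REFCURSOR
) AS
BEGIN
  -- Tag the session with the application user so the VPD policy can filter on it
  DBMS_SESSION.SET_IDENTIFIER(p_app_user);
  OPEN p_result FOR SELECT * FROM secured_mview;  -- the policy appends the row filter automatically
END get_secured_data;
/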

L. Fernigrini

If you can afford (or already have) VPD, that would be the first choice, no doubt about it.

I have used the context approach in some particular scenarios, dealing with specific places where filtering was required and not directly handled by the application. In those apps, all normal transactional activity was done through a PL/SQL API (calling procedures to do both DML operations and querying, so the security validation was done directly in those SPs), but for some end-user reporting our customer used a third-party reporting tool that required access to the data.

Obviously we did not grant direct access to the tables; we just produced a set of views that dynamically filtered sensitive data using the context.

Hope this helps!!!

