
Pipelined Stored Procedure optimization

brilliant Member Posts: 14
edited May 10, 2019 10:24AM in SQL & PL/SQL

This has probably been beaten to death, but my search yields many answers.

Wondering if someone can point me to a decent resource on accomplishing the goals below:

1. Returning a large result set (5k to 2.5M records) from a complex query whose full result set is around 11M records

2. Applying Row Level Security (preferably from a stored procedure) for the best performance and security on a highly sensitive set of data

Questions:

1. Do I need a Table Type, Row Type, and Function or Package to return this efficiently?

2. I have found resources online suggesting BULK COLLECT (a possible performance issue in a shared environment), REF CURSOR, and SYS_REFCURSOR. Which way should I go?

3. Should I use a PARALLEL hint, or any others?

Any links or syntax help is appreciated

Thanks,


Answers

  • KayK Member Posts: 1,713 Bronze Crown
    edited May 8, 2019 4:07AM

    Maybe you will get more answers here.

  • Cookiemonster76 Member Posts: 3,410
    edited May 8, 2019 6:40AM

    Do you already have a pipelined function, as your question title implies or not?

    1) If you want to do pipelined then you need a table type based on a row type and a function - it doesn't work any other way. You don't need a package, but you should be putting all your functions/procedures in a package as a matter of course anyway (a minimal sketch of these pieces is at the end of this post).

    2) Bulk Collect and ref cursors do fundamentally different things so aren't meaningfully comparable.

    Bulk Collect is an efficient way of fetching data from a select into an array inside PL/SQL.

    ref cursors mainly exist to give external programs a way to interact with a select statement in the DB.

    3) Depends on whether your server can handle it and whether it'd do anything useful for the query - and we know nothing about the query.

    If you want more detailed suggestions you really need to give us a lot more details about what you are trying to do. The actual query would be a good idea.
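
    For point 1, the moving parts look roughly like this - all names here are invented for illustration, and the cursor query is just a stand-in for your real one:

    create or replace type t_report_row as object (
      user_id    number,
      some_field varchar2(100),
      amount     number
    );
    /

    create or replace type t_report_tab as table of t_report_row;
    /

    create or replace package report_pkg as
      function get_report (p_user_id in number) return t_report_tab pipelined;
    end report_pkg;
    /

    create or replace package body report_pkg as
      function get_report (p_user_id in number) return t_report_tab pipelined is
      begin
        -- rows are piped back as they are fetched instead of being built up in memory
        for r in (select user_id, some_field, amount
                  from   big_view            -- stand-in for the complex query
                  where  user_id = p_user_id)
        loop
          pipe row (t_report_row(r.user_id, r.some_field, r.amount));
        end loop;
        return;
      end get_report;
    end report_pkg;
    /

    -- callers then treat it like a table:
    -- select * from table(report_pkg.get_report(:user_id));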

  • SBJ Member Posts: 1,751 Silver Trophy
    edited May 8, 2019 7:25AM
    brilliant wrote: 2. Applying Row Level Security (preferably from a stored procedure) for the best performance and security on a highly sensitive set of data

    For this part you should not build your own: use VPD: https://docs.oracle.com/database/121/DBSEG/vpd.htm#DBSEG007

    Why reinvent the wheel?
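
    A very rough sketch of the moving parts - the schema, object and context names are all invented here, and you would also need to create and populate the application context at session start:

    -- policy function: returns the predicate Oracle silently appends to queries
    create or replace function user_policy_fn (
      p_schema in varchar2,
      p_object in varchar2
    ) return varchar2 is
    begin
      return 'user_id = sys_context(''app_ctx'', ''user_id'')';
    end user_policy_fn;
    /

    begin
      dbms_rls.add_policy(
        object_schema   => 'APP',
        object_name     => 'SENSITIVE_VIEW',
        policy_name     => 'user_row_filter',
        function_schema => 'APP',
        policy_function => 'USER_POLICY_FN',
        statement_types => 'SELECT');
    end;
    /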

  • AndrewSayer Member Posts: 13,007 Gold Crown
    edited May 8, 2019 7:38AM

    Just to be 100% clear, taking a query and sticking it in a pipelined function is not going to make it execute and return rows faster. It will probably be slower. Pipelined functions are for when you have to produce the rows using PL/SQL; they mean that instead of generating the full data set and then transferring it, you generate and return it chunk by chunk.

    It is extremely unusual to require any query to return that sort of volume of data. Are you missing an aggregate? Do you only want the first few results? Think of your end users: they are never going to read all that information.

    If your bottleneck is in transferring 2.5 million rows from the DB server to the client over a network, then parallelism will only hurt you.

  • brilliant Member Posts: 14
    edited May 8, 2019 4:09PM

    All great suggestions and pointers. Here's my problem:

    1. I have an external system being used for Analytics/Data Visualizations

    2. The data is super sensitive, therefore I have a live connection between the external system and the Oracle DB

    3. Given its sensitivity, having a stored procedure and forcing an input parameter of the user ID ensures secured access, versus views or materialized views

    4. The procedure can be wrapped into a package/function, utilizing a table type based on a row type

    5. There are 11M records in the SELECT statement's result set

    6. After applying Row Level Security, user fetches can vary from a few thousand to a few million records, but never the entire record set of 11M records

    7. The external system computes the ratios and other aggregations

    8. The data is as compressed and aggregated as possible, with just the right number of fields to support useful analysis

    Given the above parameters of my scenario, I am trying to implement a stored procedure via a pipelined approach to return results to the external system. If I wait for 2M records to accumulate in the DB and then send them across via BULK COLLECT, this could be a drain on server resources and cause possible network delays, etc.

    What's the best strategy to accomplish this with reasonable performance in the DB? Views and materialized views slowed to a crawl.

    Since procedures are compiled code, we have leverage to control the execution plan and optimize, along with indexes, etc.

    Hope this gives a better picture. I could paste the syntax, but this is more a question of performance-optimization best practices and standards than a syntax problem.

    Thoughts?

  • gaverill Member Posts: 390 Silver Badge
    edited May 8, 2019 5:14PM

    Just to be clear -- does your pipelined function do anything other than apply your "row-level" security? That is, does it transform your input query results, or just filter them?

    Gerard

  • AndrewSayer Member Posts: 13,007 Gold Crown
    edited May 8, 2019 5:36PM

    It sounds to me like you're doing something like this:

    Call asks for everything for USER A

    Procedure selects massive result set (11 million rows)

    Procedure removes everything from the massive result set that doesn't belong to USER A (discards up to 10.9 million rows)

    Procedure gives the remaining rows to the caller

    Caller then does some grouping on the result set and probably returns barely anything to the end user

    1) Move your filter to act against the data as soon as you can

    select my_cols from big_view where user_id = my_user;

    2) Let the client fetch from that query, use a ref cursor (a fuller sketch follows this list)

    open returning_ref_cursor for select my_cols from big_view where user_id = my_user;

    3) Change the SQL so it does the whole thing (ratios and aggregations are simple enough to be done in the database and will be much quicker than sending a ton of data across the network). It's usually possible to create fast refresh MViews that store this sort of data and can be updated very quickly when changes are made to the source data (a rough sketch is at the end of this post).
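
    For point 2, a minimal sketch of the procedure (object and column names are just borrowed from the pseudo-query above):

    create or replace procedure get_user_data (
      p_user_id in  number,
      p_results out sys_refcursor
    ) is
    begin
      -- the filter is applied inside the query itself; nothing is buffered in PL/SQL first
      open p_results for
        select my_cols
        from   big_view
        where  user_id = p_user_id;
    end get_user_data;
    /

    The client just fetches from p_results in whatever batch size it likes, so rows stream back as they are produced instead of being collected first.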

    The dbms_rls way would be to:

    Add a policy to the views that the caller can use that filters on user_id (or whatever your column is called)

    Allow the caller to directly query the view.

    Again, aggregation can be done inside the view to really reduce the overhead everywhere.
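
    On the fast refresh MView idea from point 3, a rough sketch - the table, column and MView names are invented, and the extra count columns are what make the sum fast refreshable:

    create materialized view log on sales_fact
      with rowid, sequence (user_id, region, amount)
      including new values;

    create materialized view mv_sales_by_user
      refresh fast on commit
    as
    select user_id,
           region,
           sum(amount)   as total_amount,
           count(amount) as cnt_amount,   -- needed for fast refresh of the sum
           count(*)      as cnt_rows
    from   sales_fact
    group  by user_id, region;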

  • brilliant Member Posts: 14
    edited May 8, 2019 7:28PM

    Hi,

    Thanks for your suggestions. The dbms_rls methodology, while great, is not practical, as the external system connects to the database using a system account with expanded privileges. This is a limitation of the external system, in order to provide a smooth user experience rather than prompting the user for their account and password for every visual that leverages it.

    The user in my case is a column in the database, in an entitlements table with the traditional layout:

    user     reducing_field_name
    user1    value1
    user1    value2
    user2    value1

    and so on.

    As for aggregations: they cannot be done in the database, as there can be countless ways to interact with the visuals for analysis, especially with weighted calculations. That amounts to building a cube/universe, which is not efficient: the moment a field is added, everything can need recomputing.

    Very limiting conditions whereby magic is expected.

    Have there been such scenarios in the past, and how were they dealt with? I'm assuming this is a common problem with new-age analytical tools.

  • AndrewSayer Member Posts: 13,007 Gold Crown
    edited May 8, 2019 7:49PM

    It sounds like this system has some very severe flaws. It should not be connecting as system. Prompting the user for an account and password on every step is just weird; that doesn't happen in any application I've ever used...

    Anyway, it looks like it should be very easy to push the filter to the main query so you don't have to read the rows that aren't needed. Have you modified your process so that this is done? Is this now fast enough? At some point you're going to need to look at where the time is really going if you want to do anything about it.
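
    For example - guessing at the table and column names from your entitlements description - the filter can ride along as a join so only the entitled rows are ever read:

    select b.*                        -- or just the columns the tool actually needs
    from   big_view     b
    join   entitlements e
      on   e.reducing_field_name = b.reducing_field_name
    where  e.user_name = :p_user;     -- "user" is reserved, so assuming the column is really named something like user_name

    That way Oracle can use the entitlements table to drive the access path instead of fetching 11M rows and discarding most of them afterwards.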

    "new age analytical tools" usually support some ETL process so you can take the data from your database and store it again but with the application in memory. You would write this so that only necessary data is sent to the application (i.e. what's changed).

    Really good analytical tools will do the heavy lifting in the database.

  • SBJ Member Posts: 1,751 Silver Trophy
    edited May 9, 2019 3:57AM
    brilliant wrote: Have there been such scenarios in the past, and how were they dealt with? I'm assuming this is a common problem with new-age analytical tools.

    Of course. How were they dealt with? By creating a data warehouse for analysis purposes, with both the application database and the data warehouse database using VPD. An ETL process between the two separate databases made sure the data warehouse was kept in sync as much as possible, depending on the load on the main system (which had absolute top priority, since human lives literally depended on it, so we monitored activity to only sync during quiet hours).

    And I'm talking really sensitive data here: co-workers sitting almost next to each other were not allowed by law to see each other's interactions with the application/database/external input. So rethink your approach and use the standard tools that are available: rolling your own will never perform or scale as well.