Categories
- All Categories
- 75 Oracle Analytics News
- 7 Oracle Analytics Videos
- 14K Oracle Analytics Forums
- 5.2K Oracle Analytics Idea Labs
- Oracle Analytics User Groups
- 40 Oracle Analytics Trainings
- 59 Oracle Analytics Data Visualizations
- 2 Oracle Analytics Data Visualizations Challenge
- 3 Oracle Analytics Career
- 4 Oracle Analytics Industry
- Find Partners
- For Partners
Recommended frequency to extract the active primary keys/application data
We are currently using BICC to extract FSCM, CRM, etc. data to build a data warehouse for analytics (one time full load followed by incremental loads). Noticed that BICC Application Data Extracts only captures updated and new records, but doesn't capture deletes.
We figured that Active Primary key extracts, will provide the list of all primary keys in a PVO. Using this, we are performing soft deletes in our warehouse.
We would like to understand the following.
- What is the recommended frequency to extract the active primary keys? We see that BICC (Job Schedule) by default offers a minimum of 1 hour. But, we were planning to use SOAP APIs to schedule the jobs more frequently. Is it okay to extract Active primary keys every 15 mins/every hour? What's the advisable frequency here?
- Similarly, what is the recommended frequency to extract the application data? We were planning to use SOAP APIs to schedule the jobs more frequently. Is it okay to extract Application Data every 15 mins? What's the advisable frequency here?
Answers
-
Hi Harishma,
In my humble opinion there can be no "technical" answer to this question. Solution designers use replication technologies like "golden gate" etc to get transactions out of the logs before it is even written to disk for for use cases that require "real time" databases for analytics and reporting.
But if you schedule extract loads then your question is really for your users who need this information to gain insights, make decisions, take actions. What information do they need to operate?
It is similar, it that it is a question about tolerance, to the business continuity questions you have asked them when gathering your supplemental requirements like Recovery Point/Time Objective. So the question is "how much data inaccuracy, how many transactions exist that are in fact now deleted, can you tolerate, before it impacts the operation of your business process?" which is perhaps the same as the question you have asked for inserts/updates "how out of date can the data be before it impacts the operation of your business process". For example, what is your appetite to risk on making a bad decision based on old data.
The problem with regular extracts during the working day as at the same time as users viewing information from a business user perspective is that if in your data load you have downstream post load process that builds aggregates and calculations across multiple rows and entity based on the transaction data you just loaded then your users will get temporary inconsistent data if they are using information while a data load is "in progress". For example, "sales in Jan in UK" if the sum of the transactions in the detail table is 100 then the sum in the aggregate table used by the summary report must always also be 100 otherwise your users will not "trust" your information if they have "alternative facts". If your dw is oracle then materialized views can be a good solution for this problem since the can be setup to automatic refresh when the child data is changed.
So if this is not tolerable to users you can only load data "out of hours" or you must deny access during the data loads or must design your data model as an operational data store where somehow each entity is independent for example, you load parents before children and you have no calculations or aggregates "across" rows/entity.
0 -
Hi @Nathan CCC, thanks for your response. My question is more related to 'why is it that BICC has a minimum frequency of 1 hour and not less?'. We are trying to understand the reason for it to be that way, if there are implications of running BICC jobs more frequently. Like will it load the fusion db (just an example), etc?
0