Automatically Generated Focused Aggregations for Essbase Block Storage Option (BSO) [ARTICLE]


    Dmitry Kryuk’s article presents a process for automatically generating focused aggregations for Essbase and Hyperion Planning in order to run efficient focused calculations, allow users to work in a familiar manner, and keep budgets under control.


    by Dmitry Kryuk

     

    Introduction

    In this article we will review the process of automatically generating focused aggregations for Essbase and Hyperion Planning.

    It is well known that Hyperion Planning provides the ability to design focused aggregations and run them on save. This creates a real-time view of aggregated data. It also solves many issues related to scheduled calculations, such as long maintenance windows and the need to wait hours until the next calculation to see updated aggregated data.


    In reality, however, the transition from scheduled calculations to focused business rules usually has many roadblocks. You need to use Smart View and Planning, and redesign your calculations. Most importantly, you need to change your business processes. What if an organization is already using Essbase extensively, and has thousands of Excel templates? You'll need to convince users to switch from an Excel add-in to Smart View, convert their templates to pre-defined Planning forms, and limit their ability to report and submit data. This can be an uphill battle with many casualties and slim chances of winning.


    Let's imagine for a moment that you are lucky enough to be on a new project that implements Planning, and that you design your focused Business Rules (BRs) by the book.


    It is simple to develop focused aggregations for 3 or 4 dimensions, but aggregations become more complex for a larger number of dimensions. True, it is the same algorithm regardless of the number of dimensions; however, if you write your focused calculation for, say, 6 dimensions, the script itself becomes large, and you can make an error going through all aggregation combinations, especially when you have multiple dimensions in rows/columns. Workarounds--like using user variables for rows/columns--require additional maintenance and development effort, and require users to define those variables. Further, given that requirements and form layouts change multiple times throughout the life cycle of a standard project, redesigning and testing focused aggregations could consume a significant portion of the budget.


    Finally, do focused Business Rules really give us the best possible performance? Can we further optimize aggregations? When we have one or more dimensions in the rows, it is unlikely that all rows are being changed every time the form is submitted. Sometimes forms have hundreds or thousands of rows simply because they inherited their design from Excel add-in templates. If only 1% of rows has actually changed, and if we aggregate for 100% of the rows, the aggregation is not really focused or efficient, is it?


    So we have three issues on our hands:

    1. Users are unwilling to use Planning forms or to switch from Excel add-ins to Smart View.
    2. The implementation budget for Planning and focused BRs is much higher (due to development costs, but also because of licensing fees and infrastructure footprint).
    3. Even if we do implement focused BRs, they do not guarantee optimal performance if users keep the structure of the old Excel templates.


    So, can we run efficient focused calculations, let users work the way they are used to, and reduce the budget using any alternatives? This is what we will explore in this paper.


    On Efficiency

    One Dimension in Rows: Focused Aggregation Efficiency

    An example of a sub-optimal approach to focused aggregation is to use a variable to specify a member whose descendants are displayed in the form. The descendants and ancestors of that member are then aggregated in the calculation.


    Let's consider the outline below, and use a member called Dim1_11 as a value for the user variable.

    [Figure: Dim1 dimension outline]

    The descendants of Dim1_11 (members Dim1_111 and Dim1_112) are displayed in the form as rows. For simplicity, assume Dim2 has a similar hierarchy structure, and its level-0 members are displayed in a page drop-down, with member Dim2_112 selected.

    Also assume we have data in four level-0 combinations, and that we've already aggregated the data once. Now we'll update the Dim1_111->Dim2_112 combination.


    The matrix below shows the updated cell, and numbers show the order of calculation.

    [Figure: matrix showing the updated cell and the order of calculation]


    The following calculation would be used to aggregate the web form. In practice, the specific member names would be substituted with run-time prompts and variables:

    /*---Script BR001---*/

    SET MSG DETAIL;

    SET UPDATECALC OFF;

    SET CLEARUPDATESTATUS OFF;

    SET EMPTYMEMBERSETS ON;

    SET CACHE DEFAULT;


    /*-----Part1---------*/

    FIX("DIM2_112")

    @IDESCENDANTS("DIM1_11");

    @ANCESTORS("DIM1_11");

    ENDFIX

    /*-----Part2---------*/

    FIX(@IDESCENDANTS("DIM1_11"), @ANCESTORS("DIM1_11"))

    @ANCESTORS("DIM2_112");

    ENDFIX

     

     

    If only one member were updated on the form (say, Dim1_111), we would need to calculate only the 15 blocks impacted by the change:

    (3 ancestors of Dim1_111 + Dim1_111 itself) x (3 ancestors of Dim2_112 + Dim2_112 itself) - the changed level-0 combination = 4 x 4 - 1 = 15.


    What happened in practice? Below is the output from the calculation.

    The output from the first FIX:


    Output from Script BR001 - Part1

    Calculating [ Dim1(Dim1_111,Dim1_112,Dim1_11,Dim1_1,Dim1)] with fixed members [Dim2(Dim2_112)].

    Executing Block - [Dim1_11], [Dim2_112], [Working], [BUDGET], [FY14].

    Executing Block - [Dim1_1], [Dim2_112], [Working], [BUDGET], [FY14].

    Executing Block - [Dim1], [Dim2_112], [Working], [BUDGET], [FY14].


    Total Block Created: [0.0000e+00] Blocks

    Sparse Calculations: [3.0000e+00] Writes and [8.0000e+00] Reads

    Dense Calculations: [0.0000e+00] Writes and [0.0000e+00] Reads

    Sparse Calculations: [3.9000e+01] Cells

    Dense Calculations: [0.0000e+00] Cells.

     

    One thing to notice is the number of reads. While executing a block, Essbase reads all of its child blocks plus the block currently being calculated. So, to calculate:


    Dim1_11: Essbase reads blocks Dim1_111, Dim1_112, and Dim1_11 itself.

    Dim1_1: it reads blocks Dim1_11 and Dim1_1.

    Dim1: it reads blocks Dim1_11, Dim1_12, and Dim1.

    Hence we get the total of 8 block reads that appears in the output.


    The output from the second FIX:


    Output from Script BR001 - Part2

    Calculating [ Dim2(Dim2_11,Dim2_1,Dim2)] with fixed members [Dim1(Dim1_111, Dim1_112, Dim1_11, Dim1_1, Dim1)].

    Executing Block - [Dim1_111], [Dim2_11], [Working], [BUDGET], [FY14].

    Executing Block - [Dim1_11], [Dim2_11], [Working], [BUDGET], [FY14].

    Executing Block - [Dim1_1], [Dim2_11], [Working], [BUDGET], [FY14].

    Executing Block - [Dim1], [Dim2_11], [Working], [BUDGET], [FY14].

    Executing Block - [Dim1_111], [Dim2_1], [Working], [BUDGET], [FY14].

    Executing Block - [Dim1_11], [Dim2_1], [Working], [BUDGET], [FY14].

    Executing Block - [Dim1_1], [Dim2_1], [Working], [BUDGET], [FY14].

    Executing Block - [Dim1], [Dim2_1], [Working], [BUDGET], [FY14].

    Executing Block - [Dim1_111], [Dim2], [Working], [BUDGET], [FY14].

    Executing Block - [Dim1_11], [Dim2], [Working], [BUDGET], [FY14].

    Executing Block - [Dim1_1], [Dim2], [Working], [BUDGET], [FY14].

    Executing Block - [Dim1], [Dim2], [Working], [BUDGET], [FY14].


    Total Block Created: [0.0000e+00] Blocks

    Sparse Calculations: [1.5000e+01] Writes and [3.5000e+01] Reads

    Dense Calculations: [0.0000e+00] Writes and [0.0000e+00] Reads

    Sparse Calculations: [1.5600e+02] Cells

    Dense Calculations: [0.0000e+00] Cells.

     

     

     

    Total Block Created = 0, since the database was pre-aggregated; the changed cell already had a value before it was changed.


    But instead of 15 blocks we calculated 18. This is because we have 2 rows on the form, and our calculation is designed to calculate all rows. Hence the efficiency of our calculation is 83%. The metric for efficiency is simply the optimal number of blocks that need calculation divided by the number of blocks actually calculated. The best efficiency is therefore 100%, and as a calculation gets worse the metric asymptotically approaches 0. Losing 17% of efficiency may not seem like a big deal, but we'll see later that lost efficiency grows quickly.
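    Expressed as a worked equation, with the numbers from this example:

    Efficiency = (optimal number of blocks) / (blocks actually calculated) = 15 / 18 ≈ 83%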



    One Dimension in Rows: Native Agg Efficiency

    Let's compare the BR001 result to a calc script that uses the native AGG function:

    /*---Script BR001b---*/

    SET MSG DETAIL;

    SET UPDATECALC OFF;

    SET CLEARUPDATESTATUS OFF;

    SET EMPTYMEMBERSETS ON;

    SET CACHE DEFAULT;

    Agg("Dim1", "Dim2");

     


    And the output:


    Output from Script BR001b

    Multiple bitmap mode calculator cache memory usage has a limit of [16666] bitmaps..

    Aggregating [ Dim1(All members) Dim2(All members)].


    Executing Block - [Dim1_11], [Dim2_111], [Working], [BUDGET], [FY14].

    Executing Block - [Dim1_1], [Dim2_111], [Working], [BUDGET], [FY14].

    Executing Block - [Dim1], [Dim2_111], [Working], [BUDGET], [FY14].

    Executing Block - [Dim1_11], [Dim2_112], [Working], [BUDGET], [FY14].

    Executing Block - [Dim1_1], [Dim2_112], [Working], [BUDGET], [FY14].

    Executing Block - [Dim1], [Dim2_112], [Working], [BUDGET], [FY14].

    Executing Block - [Dim1_111], [Dim2_11], [Working], [BUDGET], [FY14].

    Executing Block - [Dim1_112], [Dim2_11], [Working], [BUDGET], [FY14].

    Executing Block - [Dim1_11], [Dim2_11], [Working], [BUDGET], [FY14].

    Executing Block - [Dim1_1], [Dim2_11], [Working], [BUDGET], [FY14].

    Executing Block - [Dim1], [Dim2_11], [Working], [BUDGET], [FY14].

    Executing Block - [Dim1_111], [Dim2_1], [Working], [BUDGET], [FY14].

    Executing Block - [Dim1_112], [Dim2_1], [Working], [BUDGET], [FY14].

    Executing Block - [Dim1_11], [Dim2_1], [Working], [BUDGET], [FY14].

    Executing Block - [Dim1_1], [Dim2_1], [Working], [BUDGET], [FY14].

    Executing Block - [Dim1], [Dim2_1], [Working], [BUDGET], [FY14].

    Executing Block - [Dim1_111], [Dim2], [Working], [BUDGET], [FY14].

    Executing Block - [Dim1_112], [Dim2], [Working], [BUDGET], [FY14].

    Executing Block - [Dim1_11], [Dim2], [Working], [BUDGET], [FY14].

    Executing Block - [Dim1_1], [Dim2], [Working], [BUDGET], [FY14].

    Executing Block - [Dim1], [Dim2], [Working], [BUDGET], [FY14].


    Total Block Created: [0.0000e+00] Blocks

    Sparse Calculations: [2.1000e+01] Writes and [4.9000e+01] Reads

    Dense Calculations: [0.0000e+00] Writes and [0.0000e+00] Reads

    Sparse Calculations: [2.7300e+02] Cells

    Dense Calculations: [0.0000e+00] Cells.

     

     

    [Figure: calculation order matrix for the native AGG (BR001b)]



    Nothing here is terribly surprising--complete aggregation across two dimensions has written and read more blocks than the focused aggregation. Now consider this fact: our database contained data in 4 level-0 cells (marked in grey and red). So, by running the focused aggregation we calculated 50% of the existing data set; when we used the native AGG, we calculated the complete data set.


    What happens if we clear the database and input data into a single cell--the one that is the focus of our focused aggregation (Dim1_111->Dim2_112)?


    Let's rerun both BR001 and BR001b calculations:


    Read/Writes for BR001:

    Sparse Calculations: [3.0000e+00] Writes and [8.0000e+00] Reads

    Sparse Calculations: [1.2000e+01] Writes and [2.8000e+01] Reads


    Read/Writes for BR001b:

    Sparse Calculations: [1.5000e+01] Writes and [3.0000e+01] Reads

     

    The order of calculated cells is identical for both:


    [Figure: calculation order after clearing the database and loading a single cell, identical for BR001 and BR001b]

     


    Well, in this case the complete agg is more efficient than the focused agg, since it had to read only 30 blocks instead of 36. But we are not comparing apples to apples here, and the reason is this line in the log:

     

     

    Calculator Cache With Multiple Bitmaps For: [Currency].

     

     

    In our focused aggregation, we notice this in the output:


     

    Calculator Cache: [Disabled].

     

     

    Why was the calculator cache disabled for the focused aggregation? Because by default it is enabled only when at least one full sparse dimension is calculated. To override that default we can use a SET CACHE ALL; statement. To get consistent results, we either need to SET CACHE ALL for the focused calculations, or SET CACHE OFF for the native AGG. If we disable the calc cache for the native AGG, we get the same:


    Sparse Calculations: [1.5000e+01] Writes and [3.6000e+01] Reads

     

    In further examples, we disable calculator cache for consistency.


    Aggregations Based on Transaction Log


    Let's now assume we don't need any front-end application to define the structure of our focused calculation. Instead, users continue to use their Excel add-in, and there are no Planning forms. But we have enabled transaction logging, and we are going to generate focused aggregations based on the transactions.


    Calculating for Every Cell (Bad Idea)

    Assume we can identify the specific cell that has changed. To create a focused aggregation for that cell in a one-dimensional database, we would simply calculate all ancestors of the given member. Then we would move to the next cell and repeat, and so on.


    But wait: those cells may have common ancestors. In the best case they share only one ancestor, the top node of the dimension. In the worst case, if two members are siblings, all of their ancestors are the same, and we would calculate them redundantly.


    Consider also that changed level-0 cells usually come in batches with similar dimensionality, whether from Planning forms, Smart View, or other sources. If a batch contains 1000 cells, running an individual focused calculation for each one results in very poor overall performance because of redundant calculations of upper levels.


    To illustrate this point, let's use the outline and Smart View ad hoc template below:

    Let's use this data form:


    What does the calculation order look like if we just run a focused aggregation for every changed cell using the @ANCESTORS function?

     

    /*---Script BR004b---*/

    //ESS_LOCALE English_UnitedStates.Latin1@Binary


    SET MSG DETAIL;

    SET UPDATECALC OFF;

    SET CLEARUPDATESTATUS OFF;

    SET EMPTYMEMBERSETS ON;

    SET CACHE DEFAULT;


    FIX("Working","BUDGET","LOC","FY14")

    /*-----Part1---------*/

    FIX("Dim2_111","Dim2_122")

    @ANCESTORS("DIM1_111");

    @ANCESTORS("DIM1_121");

    ENDFIX

    /*-----Part2---------*/

    FIX(@IANCESTORS("DIM1_111"),@IANCESTORS("DIM1_121"))

    @ANCESTORS("Dim2_111");

    @ANCESTORS("Dim2_122");

    ENDFIX

    ENDFIX

     

    Now we have the expected results:


    But we also have an undesirable calculation order:


    Some cells have two order numbers, meaning they are calculated twice.


    16 blocks are calculated twice, on top of the minimum 32 that actually need calculation. Hence, our efficiency is only 32/48 = 66%.

    The output:


    Total Block Created: [8.0000e+00] Blocks

    Sparse Calculations: [1.2000e+01] Writes and [3.6000e+01] Reads

    Total Block Created: [2.4000e+01] Blocks

    Sparse Calculations: [3.6000e+01] Writes and [1.0800e+02] Reads

     

    So what's the problem, you ask? Just remove the redundancy by composing the focused aggregation for the complete batch (the changed set).


    Okay, let's give it a try.



    Calculating for Shared Ancestors

    The idea is to construct a list of ancestors, similar to what the @IALLANCESTORS function returns, but for a set of members instead of a single member. In the diagram below, changed level-0 cells are represented by red circles, and green pluses represent their ancestors that need calculation.


    But wait a second--isn't there already a function (@ILANCESTORS) that does exactly that? From the technical reference:


    @ILANCESTORS

    Returns the members of the specified member list and either all ancestors of the members or all ancestors up to the specified generation or level.

    You can use the @ILANCESTORS function as a parameter of another function, where the function requires a list of members. The result does not contain duplicate members.

    Example:

    @ILANCESTORS(@LIST("100-10","200-20"))


    Do we need to reinvent the wheel, then, or can @ILANCESTORS do the job?

    Let's try the following script:


    /*---Script BR004a---*/


    SET MSG DETAIL;

    SET UPDATECALC OFF;

    SET CLEARUPDATESTATUS OFF;

    SET EMPTYMEMBERSETS ON;

    SET CACHE DEFAULT;



    FIX("Working","BUDGET","LOC","FY14")


    /*-----Part1---------*/

    FIX("Dim2_111","Dim2_122")

    @ILANCESTORS(@LIST("DIM1_111","Dim1_121"));

    ENDFIX

    /*-----Part2---------*/

    FIX(@ILANCESTORS(@LIST("DIM1_111","Dim1_121")))

    @ILANCESTORS(@LIST("Dim2_111","Dim2_122"));

    ENDFIX


    ENDFIX

     

    And results:


    Hmm... not exactly what we expected. The reason is evident from the blocks' execution order; the numbers below show the order of execution:


    You can see from the order of execution that the @ILANCESTORS function generates the ancestors of the first member in the list (Dim1_11, Dim1_1, Dim1), and then the ancestors of the second member (Dim1_12, Dim1_1, Dim1). But remember the statement from the tech ref: "The result does not contain duplicate members"? That is why Dim1_1 and Dim1 are removed from the set of ancestors of the second member, and we end up with the calculation order Dim1_11, Dim1_1, Dim1, Dim1_12--which is clearly not what we want, since Dim1_12 is calculated after its own ancestors. The second part of the calculation, for Dim2, behaves the same way: the calculation order is Dim2_11, Dim2_1, Dim2, Dim2_12.


    The peculiar fact is that the order of the @ILANCESTORS-generated members inside the FIX is different: Dim1_111, Dim1_11, Dim1_121, Dim1_12, Dim1_1, Dim1.


    In terms of efficiency, though, we have written 32 blocks and read 96 blocks, so we get 100% efficiency (if we take only writes into consideration). Here's the output:


    Total Block Created: [8.0000e+00] Blocks

    Sparse Calculations: [8.0000e+00] Writes and [2.4000e+01] Reads

    Total Block Created: [2.4000e+01] Blocks

    Sparse Calculations: [2.4000e+01] Writes and [7.2000e+01] Reads

     


    Reinventing The Wheel

    We saw that we cannot use the @ILANCESTORS function, even though in terms of efficiency it is as good as it gets. So how about creating our own function that returns the sets of ancestors properly ordered?


    To accomplish this, we will use the Essbase Java API and parse the transaction log. When transaction logging is enabled, Essbase registers all changes to the data. So even if a user has a huge Excel add-in template with 600,000 cells, of which only a few hundred have changed, the transaction log will contain only the changed cells.


    In the screenshot below, you can see that the log contains a calculation script followed by data submitted from a Smart View ad hoc analysis.



    Members Classification

    From this chunk of the log we need to get two things:

    1. The dimensions and members that need to be aggregated and will be used to construct the focused aggregation.
    2. The dimensions that do not need to be aggregated and are used in the POV section of the calc script.


    The log does not label this information, but we can distinguish a data point from a member name.

    We can also see where one transaction starts and where it ends. The rest we can do by connecting to the Essbase database and getting additional information about each member. We can find out:

    • The dimension to which the member belongs
    • Its parent
    • Its ancestors up to the top node
    • The generation of each ancestor
    • Its storage type


    This is enough to:

    • Categorize the members that show up in transactions by dimension
    • Get the distinct set of their ancestors
    • Order them by generation
    • Based on the storage property of each ancestor, decide whether the dimension goes into the POV or participates in the focused aggregation
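    To get at those properties, the program signs on through the Essbase Java API and opens the cube outline. Below is a minimal connection sketch under the assumption of a standard JAPI setup; the provider URL, credentials, and member name are placeholders, while the lookup calls (findMember, getDimensionName, getParentMemberName, getGenerationNumber, getLevelNumber) are the same ones used in the parsing code that follows.

    import com.essbase.api.base.EssException;
    import com.essbase.api.datasource.IEssOlapServer;
    import com.essbase.api.metadata.IEssCubeOutline;
    import com.essbase.api.metadata.IEssMember;
    import com.essbase.api.session.IEssbase;

    public class OutlineLookupExample {
        public static void main(String[] args) throws EssException {
            // Placeholders: replace with a real provider URL, server name, and credentials.
            String provider = "http://epm-host:9000/aps/JAPI";
            IEssbase ess = IEssbase.Home.create(IEssbase.JAPI_VERSION);
            IEssOlapServer srv = ess.signOn("admin", "password", false, null,
                    provider, "essbase-server");

            // Open the outline of the cube whose transaction log we are parsing.
            IEssCubeOutline otl = srv.getApplication("TST_EA").getCube("EADO_B8")
                    .openOutline();

            // The same lookups used by the classification code below.
            IEssMember mbr = otl.findMember("Dim1_111");
            System.out.println(mbr.getDimensionName()
                    + " / parent: " + mbr.getParentMemberName()
                    + " / gen: " + mbr.getGenerationNumber()
                    + " / level: " + mbr.getLevelNumber());

            otl.close();
            ess.signOff();
        }
    }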


    Let's take a look at the code snippet that gets the ancestors of the members present in a single line of the transaction log and assigns them to different LinkedHashMaps.


    //  DOCUMENT

    //  Splitting the line with space, except when surrounded with double quotes.

      String[] row = line.split(transactionDelim

              +"(?=([^\"]*\"[^\"]*\")*[^\"]*$)", -1);

      for (int temp = 0; temp < row.length; temp++) {

    //    DOCUMENT

    //    First check is if there's a quote in the element.

    //    Members are surrounded with double quotes.

        if(row[temp].contains("\"")){

          mbrCounter++;

          currMbr=row[temp].replace("\"", "").trim();

    //      DOCUMENT

    //      Check if member was already parsed.

          if(allMbrs.get(currMbr)==null){

            stopAncestorsSearch=0;

            try {

            mbr = otl.findMember(currMbr);

            currMbrDimension=mbr.getDimensionName();

            currMbrParent=mbr.getParentMemberName();

            currMbrGeneration=-1*mbr.getGenerationNumber();

            currMbrLevel=mbr.getLevelNumber();

    //        DOCUMENT

    //        Check if dimension was added to dimMap map.

    //        If not - add to the map.

            if (!dimMap.containsKey(currMbrDimension)) {

              dimMap.put(currMbrDimension,

                  new LinkedHashMap<String, Double>());

              dimProperties.put(currMbrDimension, "");

            }

    //        DOCUMENT

    //        Check if the member's parent is already associated with the dimension.

    //        If it is, skip the ancestor scan in the next step.

            if (dimMap.get(currMbrDimension).containsKey(currMbrParent)) {

              stopAncestorsSearch=1;

            }else{

              dimMap.get(currMbrDimension)

                      .put(currMbr, currMbrGeneration);

              allMbrs.put(currMbr, currMbrLevel);

            }

    //        DOCUMENT

    //        get all the parents of the current member.

            while((currMbrGeneration<-1) && stopAncestorsSearch==0){

              mbr = otl.findMember(currMbrParent);

              currMbrGeneration=-1*mbr.getGenerationNumber();

              currMbrLevel=mbr.getLevelNumber();

              if(!(mbr.getShareOption().toString().contains("Label only")

                  ||mbr.getShareOption().toString()

                      .contains("Dynamic calc"))

                  ||dimMap.get(currMbrDimension).containsKey(currMbrParent)

                  ){

               

                    dimMap.get(currMbrDimension)

                    .put(currMbrParent, currMbrGeneration);

                    dimProperties.put(currMbrDimension,

                      "Has Parents: "+currMbrParent+";");

                    allMbrs.put(currMbrParent, currMbrLevel);

                    currMbrParent=mbr.getParentMemberName();

              }else{

                stopAncestorsSearch=1;

              }


            }

           

            } catch (EssException x) {

              System.out.println("WARNING: "+currMbr+" : " + x.getMessage());


            }

          }

        }

      }

     

    Below are results of member/dimension classification:


    Version: {Final=-2.0}
    Scenario: {Actual=-2.0}
    View: {YTD=-2.0, Periodic=-2.0}
    Year: {FY14=-2.0, FY13=-2.0, FY15=-2.0}
    Intercompany Partner: {[ICP None]=-3.0, [ICP Top]=-2.0,
    Intercompany Partner=-1.0}
    HFM Value: {[Parent]=-3.0, USD=-3.0, [Proportion]=-2.0, HFM Value=-1.0}
    Entity: {E_401=-7.0, IrelandCombined=-6.0, ABCD_EMEA=-5.0, ABCD_APAC=-5.0,
    EMEA=-4.0, APAC=-4.0, International=-3.0, ABCD=-2.0, Entity=-1.0}
    PCC: {P_411=-8.0, Benelux_EMEA_PCC=-7.0, P_428=-7.0, P_613=-7.0,
    North_EMEA=-6.0, UK_I_EMEA=-6.0, ANZ=-6.0, EMEA_PCC=-5.0,
    APAC_PCC=-5.0, International_PCC=-4.0, ABCD_PCC=-3.0, AllPCCs=-2.0, PCC=-1.0}
    BusinessUnit: {B_2205=-7.0}
    CostCenter: {C_3020=-8.0, ConsultInt=-7.0, Consult=-6.0, ProfSrv=-5.0,
    ProfSuptSrv=-4.0, COSCostCenter=-3.0, AllCostCenters=-2.0, CostCenter=-1.0}
    Account: {600001=-12.0, 600004=-12.0, 600424=-12.0, 602010=-12.0, 602011=-12.0,
    602001=-12.0, 602023=-12.0, 602040=-12.0, 601020=-12.0,
    601022=-12.0, 600006=-12.0, 600426=-12.0, 601021=-12.0, 601024=-12.0,
    600420=-12.0, 601000=-12.0, 600430=-11.0, 600440=-11.0, 603000=-11.0,
    642000=-11.0, 642003=-11.0, 642004=-11.0, 650001=-10.0, 650002=-10.0,
    650005=-10.0, 650006=-10.0, 651000=-10.0, 651002=-10.0,ConsFuncNonGAAP=-10.0,
    ConsFunc=-10.0, 641001=-10.0, 650000=-10.0, 650003=-10.0, 650004=-10.0,
    832001=-10.0, 640001=-10.0, 810001=-9.0, 960106=-9.0, 811000=-9.0, 830002=-9.0,
    812000=-9.0, 812003=-9.0, 820000=-8.0, 820002=-8.0, 910002=-7.0, 910007=-7.0,
    910008=-7.0, 690000=-6.0, 690020=-6.0, PERFTE=-5.0, MRATE=-5.0,
    ValidateNonGAAP=-4.0, ACCRUALRATES=-4.0}
    dimProperties: {Version=, Scenario=, View=, Year=,
    Intercompany Partner=Has Parents: Intercompany Partner;,
    HFM Value=Has Parents: HFM Value;,Entity=Has Parents: Entity;,
    PCC=Has Parents: PCC;, BusinessUnit=, CostCenter=Has Parents: CostCenter;,
    Account=Has Parents: ACCRUALRATES;}
    ================================
    AGGREGATED DIMENSIONS:
    {1=Intercompany Partner, 2=HFM Value, 3=Entity, 4=PCC, 5=CostCenter, 6=Account}
    POV: {1=Version, 2=Scenario, 3=View, 4=Year, 5=BusinessUnit}
    
    
    
    
    
    
    
    
    

     

     

    These results are based on a different database with real data volumes and with large transactions coming through ad hoc reporting.

    You can see that there are 6 dimensions that require aggregation; the rest are POV dimensions. All ancestors of the members from the POV dimensions are dynamic calc or label-only members. Let's take a look at one of the aggregated dimensions--PCC, for example: its members are ordered by generation. A dimension like Year, on the other hand, has members only at generation 2, meaning it does not need to be aggregated. (While parsing members we should also take shared members into consideration, but we will not get into that here.)
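    As a side note, the "order by generation" step is straightforward once generations are stored as negative numbers, as in the output above: sorting ascending by the stored value puts the deepest members first, which is the bottom-up order the generated script needs. Below is a minimal sketch of that step (an illustration, not the framework's actual code); the member names come from the PCC classification above.

    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.stream.Collectors;

    public class GenerationOrdering {
        // Returns the members of one aggregated dimension ordered bottom-up.
        // Generations are stored negated, so ascending order of the stored
        // value means deepest-first (children before their ancestors).
        static List<String> orderedMembers(Map<String, Double> dimMembers) {
            return dimMembers.entrySet().stream()
                    .sorted(Map.Entry.comparingByValue())
                    .map(Map.Entry::getKey)
                    .collect(Collectors.toList());
        }

        public static void main(String[] args) {
            Map<String, Double> pcc = new LinkedHashMap<>();
            pcc.put("AllPCCs", -2.0);
            pcc.put("P_411", -8.0);
            pcc.put("EMEA_PCC", -5.0);
            // Prints [P_411, EMEA_PCC, AllPCCs]: children before their ancestors.
            System.out.println(orderedMembers(pcc));
        }
    }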


    Constructing the Aggregation

    Once we have the members' classification, we can start constructing the calculation. But how can that be done automatically? The algorithm is fairly simple: work out the focused aggregation for three dimensions, then extend it to any number of dimensions.


    Let's write pseudo-code for three dimensions, where @GETCHANGEDSET is a pseudo-function that returns all changed members of one dimension in the proper order.

     

    FIX (@GETCHANGEDSET(Dim1))

      FIX (@GETCHANGEDSET(Dim2))

        @GETCHANGEDSET(Dim3);

      ENDFIX

      FIX (@GETCHANGEDSET(Dim3))

        @GETCHANGEDSET(Dim2);

      ENDFIX

    ENDFIX

    FIX (@GETCHANGEDSET(Dim2))

      FIX (@GETCHANGEDSET(Dim3))

        @GETCHANGEDSET(Dim1);

      ENDFIX

    ENDFIX

     

    You can see that we put each dimension forward for aggregation, while the rest of the dimensions become fixed dimensions.


    FIX        FIX        AGG

    Dim1     Dim2     Dim3

    Dim1     Dim3     Dim2

    Dim3     Dim2     Dim1


    For four dimensions, the aggregation plan would look like this:

    FIX        FIX        FIX        AGG

    Dim1     Dim2     Dim3    Dim4

    Dim1     Dim2     Dim4    Dim3

    Dim1     Dim4    Dim3    Dim2

    Dim4    Dim3      Dim2    Dim1


    And for N dimensions:

    FIX        FIX        FIX    AGG

    Dim1     Dim2     ...      DimN

    Dim1     Dim2     ...      DimN-1

    Dim1     Dim2     ...      DimN-2

                                ...

    DimN    DimN-1  ...    Dim1


    Now let's transform the code into a recursive form. {REDUCED FORM FOR 2 DIMENSIONS} stands for the reduction of our problem from three dimensions to two dimensions.


    FIX (@GETCHANGEDSET(Dim1))

      {REDUCED FORM FOR 2 DIMENSIONS}

    ENDFIX

    FIX (@GETCHANGEDSET(Dim2),@GETCHANGEDSET(Dim3))

      @GETCHANGEDSET(Dim1);

    ENDFIX

     

    Once we can reduce the problem this way, we can solve it for any number of dimensions. For N dimensions it would look like this:


    Reduced Form 1

    FIX (@GETCHANGEDSET(Dim1))

      {REDUCED FORM FOR N-1 DIMENSIONS}

    ENDFIX

    FIX (@GETCHANGEDSET(DimN),@GETCHANGEDSET(DimN-1)...@GETCHANGEDSET(Dim2))

      @GETCHANGEDSET(Dim1);

    ENDFIX

     

     

    In fact, we can optimize the algorithm even further if we don't allow updates of upper-level members, by limiting the first FIX to level-0 members only:


    Reduced Form 2

    FIX (@GETCHANGEDSET(Dim1,0))

      {REDUCED FORM FOR N-1 DIMENSIONS}

    ENDFIX

    FIX (@GETCHANGEDSET(DimN),@GETCHANGEDSET(DimN-1)...@GETCHANGEDSET(Dim2))

      @GETCHANGEDSET(Dim1);

    ENDFIX
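    To make the recursion concrete, here is a minimal sketch of a generator for Reduced Form 1 (illustrative only; the actual implementation is part of the framework described later). It assumes a map, changedSets, that holds for each aggregated dimension the properly ordered changed members and their ancestors--in other words, the output of @GETCHANGEDSET.

    import java.util.List;
    import java.util.Map;
    import java.util.stream.Collectors;

    public class FocusedAggBuilder {

        // Builds the calc-script body for Reduced Form 1:
        // FIX on dims[0], recurse on the remaining dimensions,
        // then FIX on all remaining dimensions and aggregate dims[0].
        static String build(List<String> dims, Map<String, List<String>> changedSets) {
            String first = dims.get(0);
            if (dims.size() == 1) {
                return calcStatements(changedSets.get(first));
            }
            List<String> rest = dims.subList(1, dims.size());
            StringBuilder sb = new StringBuilder();
            sb.append("FIX(").append(memberList(changedSets.get(first))).append(")\n");
            sb.append(build(rest, changedSets));
            sb.append("ENDFIX\n");
            // Fix on every other dimension's changed set and aggregate the first one.
            String restMembers = rest.stream()
                    .map(d -> memberList(changedSets.get(d)))
                    .collect(Collectors.joining(","));
            sb.append("FIX(").append(restMembers).append(")\n");
            sb.append(calcStatements(changedSets.get(first)));
            sb.append("ENDFIX\n");
            return sb.toString();
        }

        // "Mbr1","Mbr2",... for a FIX statement.
        static String memberList(List<String> members) {
            return members.stream().map(m -> "\"" + m + "\"")
                    .collect(Collectors.joining(","));
        }

        // One calc statement per member; the caller supplies them bottom-up.
        static String calcStatements(List<String> members) {
            StringBuilder sb = new StringBuilder();
            for (String m : members) sb.append("\"").append(m).append("\";\n");
            return sb.toString();
        }

        public static void main(String[] args) {
            Map<String, List<String>> changedSets = Map.of(
                    "Dim1", List.of("Dim1_111", "Dim1_11", "Dim1_1", "Dim1"),
                    "Dim2", List.of("Dim2_112", "Dim2_11", "Dim2_1", "Dim2"));
            System.out.println(build(List.of("Dim1", "Dim2"), changedSets));
        }
    }

    With three dimensions this expands into exactly the nested FIX/ENDFIX structure of the pseudo-code above; the level-0 restriction of Reduced Form 2 only changes what goes into the outermost FIX.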

     

     

    Let's come back to our transaction and see what focused aggregation would look like.



    Notice that this calculation is:

    • Generated automatically for the specific data set that has changed, so it does not matter how many dimensions we have in the rows or columns.
    • More efficient, in terms of written blocks, than focused aggregations in BRs based on RTPs provided from the form, since it does not contain extra members that have not changed.
    • Maintenance-free: if you implement this method, you don't need to write a single line of code to create focused aggregations for all existing forms, or to change them when users want to change the forms.

     

    Monitoring Transaction Log And Triggering Options

    Initial Environment Setup

    So far we've presented the general concept of parsing the transaction log and getting the ancestors of the members that have changed, as well as the algorithm to generate the focused aggregation. What we also need is a mechanism to detect changes in the transaction log and then trigger generation and execution of the aggregation script. Let's go through the steps needed to implement this kind of solution in a production environment.


    First of all, we need to enable transaction logging on our database. This is done with the TRANSACTIONLOGLOCATION setting in the Essbase configuration file. In my lab environment I have:


    TRANSACTIONLOGLOCATION TST_EA EADO_B8 /home/oracle/windowsshare NATIVE ENABLE

    TRANSACTIONLOGLOCATION TST_FA FA1 /home/oracle/windowsshare NATIVE ENABLE

     

    That is, there are two databases for which transaction logging is enabled.


    /home/oracle/windowsshare is the target directory into which the log is written. I have a virtual Linux box for the Essbase server and Windows on my host machine. In this example I shared a directory on my Windows machine and mounted it on the Essbase server. This is just one of many possible configurations; I don't have to write the transaction log onto the Windows box, but my Java scripting framework runs on the host, so this is simply a matter of convenience and of dependencies on other projects. I decided to show this use case since it reflects the complexity often present in real production environments.


    The mechanism for detecting changes in the transaction log is the Java WatchService API, which is part of the java.nio.file package. That service cannot detect changes on remote servers: if I tried to run a Java program that detects changes on the Windows box while the Essbase transaction log was on a separate Linux machine, with the Linux directory mounted as a shared drive on Windows, the WatchService API would not be able to detect changes in the log.
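    For reference, the basic watch-and-react loop looks roughly like this (a minimal sketch using the standard java.nio.file API; the directory path is a placeholder based on the lab setup):

    import java.nio.file.FileSystems;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardWatchEventKinds;
    import java.nio.file.WatchEvent;
    import java.nio.file.WatchKey;
    import java.nio.file.WatchService;

    public class TransLogWatcher {
        public static void main(String[] args) throws Exception {
            // The directory that holds the transaction log (placeholder path).
            Path dir = Paths.get("D:/AutomationsV2/TetsSets/KscopeFA/TRANSLOG/TST_EA/EADO_B8");

            WatchService watcher = FileSystems.getDefault().newWatchService();
            // WatchService only sees local file-system events, which is why
            // the watcher must run on the machine where the log is written.
            dir.register(watcher, StandardWatchEventKinds.ENTRY_MODIFY);

            while (true) {
                WatchKey key = watcher.take();           // blocks until an event arrives
                for (WatchEvent<?> event : key.pollEvents()) {
                    Path changed = (Path) event.context();
                    if (changed != null && changed.toString().endsWith(".lfl")) {
                        // A new transaction was written: parse it and generate
                        // the focused aggregation (see processEvents() below).
                        System.out.println("Transaction log changed: " + changed);
                    }
                }
                if (!key.reset()) break;                 // directory no longer accessible
            }
        }
    }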


    At a high level, the diagram below describes my lab environment:



    As you can see, the following things happen after the user submits data:

    1. Essbase writes the transaction into the log on the Windows box.
    2. The WatchService API detects the change in the file.
    3. The Java program parses the new transaction, generates the focused aggregation, and executes it on the Essbase server.


    Configuring The Monitor

    To orchestrate this process, I use the framework discussed in Automated Essbase Optimization, Testing, Monitoring, Resource Planning. That framework allows me to automatically generate sequences of MAXL scripts, gather stats, and use other tools. In the context of automatic optimization, its mining modules generate multiple test cases and execute them. Here we may need to generate and run multiple automatic aggregations, so in this context the "test cases" become separate focused aggregations.


    As in the case of the different mining modules, I have an XML definition file that looks like this:


    <sequence>
      <SET>
        <APPNAME><![CDATA[TST_EA]]></APPNAME>
        <CUBENAME><![CDATA[EADO_B8]]></CUBENAME>
        <LOGFILE><![CDATA[
      D:\AutomationsV2\TetsSets\KscopeFA\
      TRANSLOG\TST_EA\EADO_B8\log0000000001.lvl]]></LOGFILE>
        <WATCHFILE><![CDATA[
      D:\AutomationsV2\TetsSets\KscopeFA\
      TRANSLOG\TST_EA\EADO_B8\log0000000001.lfl]]></WATCHFILE>
        <DELIM><![CDATA[ ]]></DELIM>
        <LASTPARSEDROWFILE><![CDATA[D:\\AutomationsV2\\
      TetsSets\\KscopeFA\\LASTPARSEDROW_EADO_B8.txt]]>
      </LASTPARSEDROWFILE>
        <SETFILE><![CDATA[
      D:\\AutomationsV2\\TetsSets\\KscopeFA\\SetFA01.xml
      ]]></SETFILE>
        <FASCRIPT><![CDATA[]]></FASCRIPT>
        <STARTWITHUPPERLEVELALLOWED><![CDATA[1]]></STARTWITHUPPERLEVELALLOWED>
        <!-- <OPERATIONMODE><![CDATA[1 - Automatic, 2 - Scheduled]]>
      </OPERATIONMODE> -->
        <OPERATIONMODE><![CDATA[1]]></OPERATIONMODE>
        <!-- <SCHEDULEDELAY><![CDATA[Number of seconds]]></SCHEDULEDELAY> -->
        <SCHEDULEDELAY><![CDATA[60]]></SCHEDULEDELAY>
        <!-- <NUMBEROFTRANSACTIONS><![CDATA[
      Number of transactions to process at once. 0
        - merge all transactions.]]></NUMBEROFTRANSACTIONS> -->
        <NUMBEROFTRANSACTIONS><![CDATA[1]]></NUMBEROFTRANSACTIONS>
      </SET>
    </sequence>
    
    
    
    
    
    
    
    

     

     

    Parameters in this XML:

    • APPNAME, CUBENAME: We need to connect to the Essbase application to find member ancestors, generations, and the other properties discussed in the Members Classification section.
    • LOGFILE: The actual transaction log file we are going to parse.
    • WATCHFILE: The file being watched. When Essbase writes into the transaction log, it updates two files, .lfl and .lvl; changes in the .lfl file are detected more quickly.
    • LASTPARSEDROWFILE: The last-parsed-row file. When the Java program parses a new transaction, this file tells it where to start by registering the last row parsed in the transaction log (a minimal resume sketch follows this list). It looks something like this:

    3533 2015/08/12 12:57:50

    3547 2015/08/12 12:57:51

    3603 2015/08/12 12:57:53

    3617 2015/08/12 12:59:25

    3631 2015/08/12 12:59:26

    3687 2015/08/12 12:59:29
    The first integer is the last row of the transaction log that was parsed. When the WatchService API detects a change in the log, the program goes to the last parsed row and parses the next transaction.

    • SETFILE: An XML file that defines the structure of the calc script and allows automatic execution of multiple transactions in scheduled mode. For more details, see Automated Essbase Optimization, Testing, Monitoring, Resource Planning.
    • STARTWITHUPPERLEVELALLOWED: Defines whether we allow updates of upper-level members. If we do, the algorithm uses Reduced Form 1 (see "Constructing the Aggregation," above); otherwise, we can optimize the calculation by using Reduced Form 2.
    • OPERATIONMODE and SCHEDULEDELAY: See the next section.
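    Here is a minimal sketch of the resume logic mentioned for LASTPARSEDROWFILE (an illustration only, assuming the log fits comfortably in memory; the real framework works from the registered row positions): read the last recorded row number, skip that many rows of the log, and hand the remainder to the parser.

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.List;

    public class ResumeFromLastParsedRow {

        // Reads the last "<row> <timestamp>" entry and returns the row number,
        // or 0 if nothing has been parsed yet.
        static int lastParsedRow(Path lastParsedRowFile) throws IOException {
            List<String> lines = Files.readAllLines(lastParsedRowFile);
            if (lines.isEmpty()) return 0;
            String last = lines.get(lines.size() - 1).trim();
            return Integer.parseInt(last.split("\\s+")[0]);
        }

        public static void main(String[] args) throws IOException {
            // Placeholder paths; in the framework they come from the XML definition.
            Path logFile = Paths.get("TRANSLOG/TST_EA/EADO_B8/log0000000001.lvl");
            Path pointer = Paths.get("LASTPARSEDROW_EADO_B8.txt");

            int skip = lastParsedRow(pointer);
            List<String> all = Files.readAllLines(logFile);
            // Everything after the last parsed row is the new transaction(s).
            List<String> newRows = all.subList(Math.min(skip, all.size()), all.size());
            System.out.println(newRows.size() + " new rows to parse");
        }
    }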


    Modes of Operation

    Automatic Mode

    So far, we've mentioned only one mode of operation: triggering the aggregation immediately after new data is submitted into the Essbase cube. As mentioned, this is done with the WatchService API. In some situations we would prefer to run in scheduled mode, where the transaction log is parsed every N seconds and new transactions are executed. Those situations include:

    • Large transaction logs
    • Multiple concurrent users
    • Extremely large and distributed data updates

    To define the operation mode, we use the OPERATIONMODE parameter: 1 for automatic mode, 2 for scheduled mode.

    Scheduled/Delayed Mode

    Running focused aggregations in scheduled mode is actually easier than running them automatically on each update. In automatic mode, once a change is detected the transaction log is analyzed, starting from the last parsed row and continuing to the end of the file. We need similar behavior in scheduled mode.


    This is easily accomplished with any scheduler: we just need to call the same method every N seconds. But it is more efficient to start parsing only if WatchService has identified a change. The parameter that controls the delay interval is SCHEDULEDELAY in the XML definition file.


    The diagram below represents the process:

    1. Essbase writes the transaction into the log on the Windows box.
    2. The WatchService API detects the change in the file.
    3. The Java program checks whether the delay interval has passed since the last parsing. If it has, it goes to the next step; otherwise, the program sleeps for the remainder of the interval.
    4. The Java program parses the new transaction, generates the focused aggregation, and executes it on the Essbase server.


    Trigger With CDF

    Triggering the focused aggregation can also be done with a CDF, but unlike in the previous cases our JAR file needs to be on the Essbase server. If our log is on a different server we cannot use WatchService, but we do not get much benefit from it here anyway; in the worst case we would trigger the CDF when no data was updated.

    Here are the steps in the scenario where the user submits a Planning form and a BR is triggered on Save. The BR contains a CDF that runs the parsing and aggregation generation:

    1. Essbase writes the transaction into the log on the Windows box.
    2. The BR is triggered on Save, which runs the CDF.
    3. The Java program parses the new transaction, generates the focused aggregation, and executes it on the Essbase server.


    WatchService Code Example

    An example of how WatchService can be used is given in the standard Java tutorials (the WatchDir example). Below is a code snippet from a modified processEvents() method, which takes care of the different operation modes.


    //DOCUMENT

    //Check if the WatchService detected modification in the right file.

    if(String.valueOf(child).contains(tuner.transFile.get("WATCHFILE"))){

    //DOCUMENT

    //If operation mode is automatic (OPERATIONMODE=1)

    //then execute parseTransFromPointToEOF immediately.

    //In scheduled mode (OPERATIONMODE=2) first calculate the timeSinceLastParse

    //value. That is how long it has been since the last transaction was parsed,

    //minus the delay interval.

    if(Integer.parseInt(tuner.transFile.get("OPERATIONMODE"))==2){

      long timeSinceLastParse=System.currentTimeMillis()-tuner.

                lastTransactionParsing-

                Integer.parseInt(tuner.transFile.get("SCHEDULEDELAY"))*1000;

    //DOCUMENT

    //If the delay interval since the last transaction has NOT yet passed,

    //sleep for the remainder of the interval,

    //then run parseTransFromPointToEOF.

      if(timeSinceLastParse<0){

        try {

          Thread.sleep(timeSinceLastParse*-1);

        } catch (InterruptedException e) {

          // TODO Auto-generated catch block

          e.printStackTrace();

        }

      }

      tuner.parseTransFromPointToEOF();

    }else{

      tuner.parseTransFromPointToEOF();

      }

    }

     

     

    A few other related things to keep in mind:

    • When a new focused aggregation is generated and executed, that calculation is also registered in the transaction log, so for every data submission you'll have two transactions: the original data submission and the focused aggregation.

    [Figure: transaction log showing the data submission followed by the generated aggregation]

     

     

    • Without going into too much detail, the WatchService API captures two update events: a content update and a metadata update (e.g., a modified date). This can lead to unnecessary parsing of the transaction log; to avoid it, we want to perform transaction parsing and aggregation only when the content is updated.
    • Data input transactions have trailing empty lines, which calculation transactions do not have.


    Performance Stats

     

    In our test database there are 11 dimensions. Below is a table with performance statistics for several transactions, differing in the number of cells updated and members parsed.


    The second row shows a transaction that contained 3912 changed cells. It had 326 rows with 3586 members across all dimensions. It took 25 sec to parse and classify all members and to create the focused aggregation, and 14 sec to execute the aggregation.


    In the other examples of large transactions, parsing and calculation together took around 9 ms per changed cell. In a more typical example of a template with 52 rows and 12 columns, the whole cycle took 5.5 sec.


    For a small transaction of 1 row and 12 columns, the parsing overhead was 0.4 sec, and the total time was 0.7 sec. Compare this to the 40 minutes it takes to do a complete aggregation of this database. Obviously, the results depend on the database structure and on how the transaction data is distributed across dimensions.

     

     

     

    Rows     Members   Cells     Parsing (sec)   Calc (sec)   Total (sec)   Time Per Cell (ms)

    36446    400950    437352    318             64           382           0.873438

    326      3586      3912      25              14           39            9.969325

    101      1111      1212      5               5            10            8.250825

    52       566       624       3               2.5          5.5           8.814103

    1        11        12        0.4             0.29         0.69          57.5

     

    About The Author


    Dmitry Kryuk is an independent Hyperion consultant and Solutions Architect. He has helped clients including Google, Salesforce, Gilead, Amgen, Dolby and others to design and build their enterprise solutions. His work focuses mainly on Essbase, Planning, ODI, DRM, Oracle DB and custom-built solutions like the one discussed in this article.  He codes in Python, Jython, Java, and PL/SQL.  For more insight from Dmitry please visit his blog: hookmax.blogspot.com

     


    Note: This article represents the expertise, findings, and opinion of the author.  It has been published by Oracle in this space as part of a larger effort to encourage the exchange of such information within this Community, and to promote evaluation and commentary by peers. This article has not been reviewed by the relevant Oracle product team for compliance with Oracle's standards and practices, and its publication should not be interpreted as an endorsement by Oracle of the statements expressed therein.