Oracle Analytics Cloud and Server

The OBIEE Equivalent of Countif or FIlter on Duplicates

Received Response
1312
Views
18
Comments

In my company we are having an issue where people open a requisition as one Position Description and then they change it after the fact.... creating a duplicate row in our hire table.  I know it's not actually a duplicate because the Position Description is different...and I understand why these records come into my query...however I need to identify only the duplicates.  Can some one give me a little guidance to get the below results?

I have data that looks like this:

Current Query:

OBIEE Sample.jpg

I would like to create a query that only returns the rows with conflicting Position Descriptions.....i..e. the duplicates:

New Query Expected Result:

Expected Result.jpg

Any help would be much appreciated.  Thanks in advance.

Note: People are often hired more than once in a given period.  So in order to get the duplicates it may be necessary to take both Hire Date and Employee ID into account.... i.e. If an employee is rehired they will have the same Employee ID.  Thus if Employee ID #1 is hired again on say 6/1/2017 then this ID will occur in the table again.

Answers

  • Robert Angel
    Robert Angel ✭✭✭✭✭

    A simple way in answers would be to put a min function on Position Description, and a max function on Position Description - 2 columns and then a third column with a case statement saying if column 1 <>  column 2  then 'Duplicate' else 'Ok' - stick a filter on that and job done.

  • Try to add a filter on COUNT("Fact"."Employee ID" By "Fact"."Employee ID" , "Fact"."Hire Date" ) > 1

  • I FEEL LIKE THIS ADDITIONAL INFORMATION MAY BE IMPORTANT:  Sometimes there can be 3 rows for one Unique Employee ID / Hire Date

    Robert thanks for your reply!  I tried your method:

    I was able to get MIN("Position"."Position Description") to work correctly. 

    However when I try to use: MAX("Position"."Position Description")-2

    I get an error.  Guessing it’s because the column is a text field.  Should I use Cast to try to force a particular datatype?

    Case Statement looks like this currently:

    CASE WHEN MIN("Position"."Position Description") <> MAX("Position"."Position Description") THEN 'Duplicate' ELSE 'Not a Duplicate' END

    Hopefully this will help.

  • Robert Angel
    Robert Angel ✭✭✭✭✭

    Sorry, the -2 is just to indicate it is the second column, don't minus two!

    And you still need the other columns that are unique in the answers report.

  • Martin thanks for your reply as well! I tried your method:

    I put this in a calculated column to try help me understand the logic…

    COUNT("Employee Attributes"."Employee Number" By "Employee Attributes"."Employee Number", "Workforce Profile Event Fact Details"."Last Hire Date")

    Also I don’t quite understand what the by and the comma in the COUNT() function are doing.  Can you help explain the logic please?

    I can see that COUNT() Calculates the number or rows having non-null values for the expression.  Just not sure about the expression   Could you please explain

    Anyways when I implemented the helper column and exported the data I got a lot of values that were well above 1.  For example I had numbers like 5, 8, 11, 12 etc.... When I exported the data and concatenated the Employee ID and Last Hire Date Columns the numbers greater than 1 were not consistent with duplicates.  I found that sometimes these numbers like 8 and 11 were not duplicates. However I did notice that when I filtered on values of 1 that none of these values were ever a duplicate.

    Hope this helps.  Thanks again for taking the time to respond.

  • Unfortunately, after I've changed the logic to just Max() and Min() as you suggested the Position Descriptions all come back the same.  i.e. the case statement evaluates to 'Not a Duplicate' in every single row of data.

    Current Case Statement:

    CASE WHEN MIN("Position"."Position Description") <> MAX("Position"."Position Description") THEN 'Duplicate' ELSE 'Not a Duplicate' END

    Thanks again for the help!

  • Robert Angel
    Robert Angel ✭✭✭✭✭

    Make sure you have server complex aggregate on this, a lot of the time the result of case statements fails because OBIEE tries to evaluate AFTER aggregation

  • Robert Angel
    Robert Angel ✭✭✭✭✭

    Btw, the case statement was your addition, I actually meant to put this in columns and then reference one column in the condition of the other column, and make sure you include your fields that must be unique as a combination.

    p.s. Sounds like you need a surrogate key on the data to reflect the slowing changing data behaviour...

  • I wrongly assumed the current_query showed detail records, hence your results are not correct.

    The 'By'  is used to group by items, in this case Employee ID and HireDate. So it counts all the records in the result set that have the same employee Number and Last hire date

    If Position Description is always different among duplicates you may also try:

    COUNT(DISTINCT "Position"."Position Description" By "Employee Attributes"."Employee Number", "Workforce Profile Event Fact Details"."Last Hire Date") >1

    That will count the number of distinct Position descriptions by Employee Number and Last Hire Date. If it's higher than 1 it means another Position Descripion was entered on the same day.

  • "Make sure you have server complex aggregate on this, a lot of the time the result of case statements fails because OBIEE tries to evaluate AFTER aggregation"

    How do I toggle this setting on?

    "Btw, the case statement was your addition, I actually meant to put this in columns and then reference one column in the condition of the other column, and make sure you include your fields that must be unique as a combination."

    I made up two additional calculation columns like you suggested.  When I use the Answers UI query builder it toggles them back to the underlying calculation in the third field

    "p.s. Sounds like you need a surrogate key on the data to reflect the slowing changing data behaviour..."

    I only have access to the presentation layer.  I know that the actual database keys exist behind the scenes....I just can't access them with my security settings.  Not sure I can create a surrogate key because I don't have a unique field.

  • “COUNT(DISTINCT "Position"."Position Description" By "Employee Attributes"."Employee Number", "Workforce Profile Event Fact Details"."Last Hire Date")”

    Unfortunately, the above doesn’t work. It returns inconsistent results as well. It looks like it’s closer.  The numbers returned in the columns are only 1, 2 or 3.  This returns about 1300 > 1.  However when I look at the results in Excel using a countif of the concatenated(column it only returns 400.  Maybe Robert is correct about the aggregation issue....  but if this is true it looks to be happening before aggregation not after???  RIght?

    COUNT(CONCAT("Employee Attributes"."Employee Number", CAST("Workforce Profile Event Fact Details"."Last Hire Date” AS CHAR)))

    I thought the above may work but I get some kind of strange error:

    OBI_Error.jpg

  • asim cholas
    asim cholas ✭✭✭✭✭

    Hello,

    Martins first answer will work

    COUNT("Fact"."Employee ID" By "Fact"."Employee ID" , "Fact"."Hire Date" ) > 1

    add a column for the above formula and keep a filter for the same as >1 and delete the column from criteria

  • Thomas Dodds
    Thomas Dodds ✭✭✭✭✭

    Use the audit insert date to the table ... if you don't have one there's your issue - it's a deficient physical model in the database.  Easy then to rank on the order of insert dates and filter for rank != 1 ...

    This also gives you the benefit of:

    1. last row in (filter on a rank=1 where rank is over insert date ordered descending)
    2. first row in (filter on a rank=1 where rank is over insert date ordered ascending)
    3. average number of re-inserts (average of rank column)
  • Asim,

    Unfortunately this does not work:

    “COUNT(DISTINCT "Position"."Position Description" By "Employee Attributes"."Employee Number", "Workforce Profile Event Fact Details"."Last Hire Date")”>1

    Unfortunately, the above doesn’t work. It returns inconsistent results as well. It looks like it’s closer.  The numbers returned in the columns are only 1, 2 or 3.  This returns about 1300 > 1.  However when I look at the results in Excel using a countif of the concatenated(column it only returns 400.  Maybe Robert is correct about the aggregation issue....  but if this is true it looks to be happening before aggregation not after???  RIght?

    Not sure why but I can definitely confirm that there are no EE ID & Last Hire Date combinations that occur 8 or 11 times in the subset of data.  I think the Count logic is being applied before my criteria filters which makes the results inaccurate.

  • asim cholas
    asim cholas ✭✭✭✭✭

    Please provide create statement(tables) and insert script(for few sample records), so i can check it in my environment.

  • Robert Angel
    Robert Angel ✭✭✭✭✭

    Hi again,

    see => https://gerardnico.com/wiki/dat/obiee/obis/measure_nested_aggregate

    In particular -> Complex Nested Aggregate : Aggregate of a conditional aggregate

    Thoughts; -

    Do you have any access to the ETL to add a new measure to achieve what you need?

    Have you tried including ALL columns involved and adding an additional filter column to do count of values when match up to all of the columns present apart from the column having the duplicates; sure there was a better way to express that! - my reason for ALL columns is that this will guarantee the integrity of what is returned and prevent any pre-client aggregation that could throw your result out.

    Whenever I see case and sum logic in OBIEE my heart sinks and if I can then I push the logic back to the repository or the ETL, I have seen this work and then after a save and reload not work, when everything should be the same.

  • Thanks for the tip.  When I use Date in the query it eliminates the duplicates.  How simple!  I hope I haven't wasted to much of your time! 

    THANK YOU EVERYONE, PROBLEM SOLVED!!!

  • Robert & Asim,

    I don't have access to the ETL and can't provide the script to create tables....  However, by simply adding in the Date Dimension it fixed the duplicate issue.

    Thank you for taking the time to review my inquiry.  I appreciate the assistance.