Oracle Analytics Cloud and Server

Welcome to the Oracle Analytics Community: Please complete your User Profile and upload your Profile Picture

The OBIEE Equivalent of Countif or FIlter on Duplicates

Received Response
1382
Views
18
Comments
mrmmickle1
mrmmickle1 Rank 4 - Community Specialist

In my company we are having an issue where people open a requisition as one Position Description and then they change it after the fact.... creating a duplicate row in our hire table.  I know it's not actually a duplicate because the Position Description is different...and I understand why these records come into my query...however I need to identify only the duplicates.  Can some one give me a little guidance to get the below results?

I have data that looks like this:

Current Query:

OBIEE Sample.jpg

I would like to create a query that only returns the rows with conflicting Position Descriptions.....i..e. the duplicates:

New Query Expected Result:

Expected Result.jpg

Any help would be much appreciated.  Thanks in advance.

Note: People are often hired more than once in a given period.  So in order to get the duplicates it may be necessary to take both Hire Date and Employee ID into account.... i.e. If an employee is rehired they will have the same Employee ID.  Thus if Employee ID #1 is hired again on say 6/1/2017 then this ID will occur in the table again.

«1

Answers

  • Robert Angel
    Robert Angel Rank 8 - Analytics Strategist

    A simple way in answers would be to put a min function on Position Description, and a max function on Position Description - 2 columns and then a third column with a case statement saying if column 1 <>  column 2  then 'Duplicate' else 'Ok' - stick a filter on that and job done.

  • Martin van Donselaar
    Martin van Donselaar Rank 6 - Analytics Lead

    Try to add a filter on COUNT("Fact"."Employee ID" By "Fact"."Employee ID" , "Fact"."Hire Date" ) > 1

  • mrmmickle1
    mrmmickle1 Rank 4 - Community Specialist

    I FEEL LIKE THIS ADDITIONAL INFORMATION MAY BE IMPORTANT:  Sometimes there can be 3 rows for one Unique Employee ID / Hire Date

    Robert thanks for your reply!  I tried your method:

    I was able to get MIN("Position"."Position Description") to work correctly. 

    However when I try to use: MAX("Position"."Position Description")-2

    I get an error.  Guessing it’s because the column is a text field.  Should I use Cast to try to force a particular datatype?

    Case Statement looks like this currently:

    CASE WHEN MIN("Position"."Position Description") <> MAX("Position"."Position Description") THEN 'Duplicate' ELSE 'Not a Duplicate' END

    Hopefully this will help.

  • Robert Angel
    Robert Angel Rank 8 - Analytics Strategist

    Sorry, the -2 is just to indicate it is the second column, don't minus two!

    And you still need the other columns that are unique in the answers report.

  • mrmmickle1
    mrmmickle1 Rank 4 - Community Specialist

    Martin thanks for your reply as well! I tried your method:

    I put this in a calculated column to try help me understand the logic…

    COUNT("Employee Attributes"."Employee Number" By "Employee Attributes"."Employee Number", "Workforce Profile Event Fact Details"."Last Hire Date")

    Also I don’t quite understand what the by and the comma in the COUNT() function are doing.  Can you help explain the logic please?

    I can see that COUNT() Calculates the number or rows having non-null values for the expression.  Just not sure about the expression   Could you please explain

    Anyways when I implemented the helper column and exported the data I got a lot of values that were well above 1.  For example I had numbers like 5, 8, 11, 12 etc.... When I exported the data and concatenated the Employee ID and Last Hire Date Columns the numbers greater than 1 were not consistent with duplicates.  I found that sometimes these numbers like 8 and 11 were not duplicates. However I did notice that when I filtered on values of 1 that none of these values were ever a duplicate.

    Hope this helps.  Thanks again for taking the time to respond.

  • mrmmickle1
    mrmmickle1 Rank 4 - Community Specialist

    Unfortunately, after I've changed the logic to just Max() and Min() as you suggested the Position Descriptions all come back the same.  i.e. the case statement evaluates to 'Not a Duplicate' in every single row of data.

    Current Case Statement:

    CASE WHEN MIN("Position"."Position Description") <> MAX("Position"."Position Description") THEN 'Duplicate' ELSE 'Not a Duplicate' END

    Thanks again for the help!

  • Robert Angel
    Robert Angel Rank 8 - Analytics Strategist

    Make sure you have server complex aggregate on this, a lot of the time the result of case statements fails because OBIEE tries to evaluate AFTER aggregation

  • Robert Angel
    Robert Angel Rank 8 - Analytics Strategist

    Btw, the case statement was your addition, I actually meant to put this in columns and then reference one column in the condition of the other column, and make sure you include your fields that must be unique as a combination.

    p.s. Sounds like you need a surrogate key on the data to reflect the slowing changing data behaviour...

  • Martin van Donselaar
    Martin van Donselaar Rank 6 - Analytics Lead

    I wrongly assumed the current_query showed detail records, hence your results are not correct.

    The 'By'  is used to group by items, in this case Employee ID and HireDate. So it counts all the records in the result set that have the same employee Number and Last hire date

    If Position Description is always different among duplicates you may also try:

    COUNT(DISTINCT "Position"."Position Description" By "Employee Attributes"."Employee Number", "Workforce Profile Event Fact Details"."Last Hire Date") >1

    That will count the number of distinct Position descriptions by Employee Number and Last Hire Date. If it's higher than 1 it means another Position Descripion was entered on the same day.

  • mrmmickle1
    mrmmickle1 Rank 4 - Community Specialist

    "Make sure you have server complex aggregate on this, a lot of the time the result of case statements fails because OBIEE tries to evaluate AFTER aggregation"

    How do I toggle this setting on?

    "Btw, the case statement was your addition, I actually meant to put this in columns and then reference one column in the condition of the other column, and make sure you include your fields that must be unique as a combination."

    I made up two additional calculation columns like you suggested.  When I use the Answers UI query builder it toggles them back to the underlying calculation in the third field

    "p.s. Sounds like you need a surrogate key on the data to reflect the slowing changing data behaviour..."

    I only have access to the presentation layer.  I know that the actual database keys exist behind the scenes....I just can't access them with my security settings.  Not sure I can create a surrogate key because I don't have a unique field.