Categories
- All Categories
- 89 Oracle Analytics News
- 7 Oracle Analytics Videos
- 14.2K Oracle Analytics Forums
- 5.3K Oracle Analytics Idea Labs
- Oracle Analytics User Groups
- 53 Oracle Analytics Trainings
- 59 Oracle Analytics Data Visualizations Gallery
- 2 Oracle Analytics Data Visualizations Challenge
- 4 Oracle Analytics Career
- 4 Oracle Analytics Industry
- Find Partners
- For Partners
The OBIEE Equivalent of Countif or FIlter on Duplicates
In my company we are having an issue where people open a requisition as one Position Description and then they change it after the fact.... creating a duplicate row in our hire table. I know it's not actually a duplicate because the Position Description is different...and I understand why these records come into my query...however I need to identify only the duplicates. Can some one give me a little guidance to get the below results?
I have data that looks like this:
Current Query:
I would like to create a query that only returns the rows with conflicting Position Descriptions.....i..e. the duplicates:
New Query Expected Result:
Any help would be much appreciated. Thanks in advance.
Note: People are often hired more than once in a given period. So in order to get the duplicates it may be necessary to take both Hire Date and Employee ID into account.... i.e. If an employee is rehired they will have the same Employee ID. Thus if Employee ID #1 is hired again on say 6/1/2017 then this ID will occur in the table again.
Answers
-
A simple way in answers would be to put a min function on Position Description, and a max function on Position Description - 2 columns and then a third column with a case statement saying if column 1 <> column 2 then 'Duplicate' else 'Ok' - stick a filter on that and job done.
0 -
Try to add a filter on COUNT("Fact"."Employee ID" By "Fact"."Employee ID" , "Fact"."Hire Date" ) > 1
0 -
I FEEL LIKE THIS ADDITIONAL INFORMATION MAY BE IMPORTANT: Sometimes there can be 3 rows for one Unique Employee ID / Hire Date
Robert thanks for your reply! I tried your method:
I was able to get MIN("Position"."Position Description") to work correctly.
However when I try to use: MAX("Position"."Position Description")-2
I get an error. Guessing it’s because the column is a text field. Should I use Cast to try to force a particular datatype?
Case Statement looks like this currently:
CASE WHEN MIN("Position"."Position Description") <> MAX("Position"."Position Description") THEN 'Duplicate' ELSE 'Not a Duplicate' END
Hopefully this will help.
0 -
Sorry, the -2 is just to indicate it is the second column, don't minus two!
And you still need the other columns that are unique in the answers report.
0 -
Martin thanks for your reply as well! I tried your method:
I put this in a calculated column to try help me understand the logic…
COUNT("Employee Attributes"."Employee Number" By "Employee Attributes"."Employee Number", "Workforce Profile Event Fact Details"."Last Hire Date")
Also I don’t quite understand what the by and the comma in the COUNT() function are doing. Can you help explain the logic please?
I can see that COUNT() Calculates the number or rows having non-null values for the expression. Just not sure about the expression Could you please explain
Anyways when I implemented the helper column and exported the data I got a lot of values that were well above 1. For example I had numbers like 5, 8, 11, 12 etc.... When I exported the data and concatenated the Employee ID and Last Hire Date Columns the numbers greater than 1 were not consistent with duplicates. I found that sometimes these numbers like 8 and 11 were not duplicates. However I did notice that when I filtered on values of 1 that none of these values were ever a duplicate.
Hope this helps. Thanks again for taking the time to respond.
0 -
Unfortunately, after I've changed the logic to just Max() and Min() as you suggested the Position Descriptions all come back the same. i.e. the case statement evaluates to 'Not a Duplicate' in every single row of data.
Current Case Statement:
CASE WHEN MIN("Position"."Position Description") <> MAX("Position"."Position Description") THEN 'Duplicate' ELSE 'Not a Duplicate' END
Thanks again for the help!
0 -
Make sure you have server complex aggregate on this, a lot of the time the result of case statements fails because OBIEE tries to evaluate AFTER aggregation
0 -
Btw, the case statement was your addition, I actually meant to put this in columns and then reference one column in the condition of the other column, and make sure you include your fields that must be unique as a combination.
p.s. Sounds like you need a surrogate key on the data to reflect the slowing changing data behaviour...
0 -
I wrongly assumed the current_query showed detail records, hence your results are not correct.
The 'By' is used to group by items, in this case Employee ID and HireDate. So it counts all the records in the result set that have the same employee Number and Last hire date
If Position Description is always different among duplicates you may also try:
COUNT(DISTINCT "Position"."Position Description" By "Employee Attributes"."Employee Number", "Workforce Profile Event Fact Details"."Last Hire Date") >1
That will count the number of distinct Position descriptions by Employee Number and Last Hire Date. If it's higher than 1 it means another Position Descripion was entered on the same day.
0 -
"Make sure you have server complex aggregate on this, a lot of the time the result of case statements fails because OBIEE tries to evaluate AFTER aggregation"
How do I toggle this setting on?
"Btw, the case statement was your addition, I actually meant to put this in columns and then reference one column in the condition of the other column, and make sure you include your fields that must be unique as a combination."
I made up two additional calculation columns like you suggested. When I use the Answers UI query builder it toggles them back to the underlying calculation in the third field
"p.s. Sounds like you need a surrogate key on the data to reflect the slowing changing data behaviour..."
I only have access to the presentation layer. I know that the actual database keys exist behind the scenes....I just can't access them with my security settings. Not sure I can create a surrogate key because I don't have a unique field.
0