When I query against the w_employee_d table to get a list of all employees in the company, Names of Company X's applicants appear instead of Company X's employees . w_employee_ds (staging table) is the extraction target for tasks SDE_PSFT_EmployeeDimension_Applicant, SDE_PSFT_EmployeeDimension, SDE_PSFT_EmployeeDimension_FIN, SDE_EmployeeDimension_NonEmployee_Collector and SDE_PSFT_EmployeeDimension_SalesRep. The Integration_ID is where i'm suspecting the issue because for example - Mason is the Director of Accounting - he has an Integration_ID = 9854 which is shared with applicant Stacey who also has Integration_ID = 9854 , and so now - when i query for Director of Accounting, Stacey's name appears and not Mason. Please Advise how i can reverse and correct this. thanks
Have you confirmed the following:
1. Are you 100% sure this is not a PSoft side DATA issue. In general, the Integration ID for any OBIA table is unique and reflects a natural key in the source system. In Psoft, when you query for 9854, do you see two people or one? If you see the SAME ID in PSoft exist for two people concurrently, this usually indicates a ERP side anomaly. For HR Analytics, its common to have various "types" of employees/applicants/contingent workers, etc. please confirm that the problem is NOT on the ERP side before you fix the ETLs.
2. Have you confirmed that there is no Patch etc for the version of OBIA you are using? Pls mark helpful/correct if this helps
W_EMPLOYEE_D would have an unique index to prevent the duplicates W_EMPLOYEE_D_U1. So There might be a case to have same Integration_ID in the system if the warehouse is populated from multiple source systems.
To check if that its loading from multiple source systems check the below query. If you get multiple datasource_num_id's then the system is loading from multiple source systems.
select distinct integration_id, datasource_num_id from w_employee_d;
Let me know if this helped.
It seems the system is loading from multiple data sources yes, I ran the sql and I got 2 distinct datasource_num_ids - 11 and 12. example of TOAD grid below
0071 ...................... 11
0071 ...................... 12
0120 ...................... 11
0120 ...................... 12
what would be the way forward in sucjh a case. I will definitely assign helpful and correct remarks to your responses.
Edited by: 933130 on Feb 5, 2013 12:47 PM
Edited by: 933130 on Feb 5, 2013 12:48 PM
Edited by: 933130 on Feb 5, 2013 12:49 PM
If you have multiple datasources, you should still only have a UNIQUE index in the Dimension on INTEGRATION ID concatenated with DATASOURCENUM ID. The index should not break. With regard to your original question..how is this a problem? You are basically saying you have 2 PSoft sources and employees with the same ID across 2 systems, which is always a possibility. What exactly is the issue/question?
So you will be having 2 different "Director of Accounting" if you are loading from 2 different systems. To help with this scenario you have to find how the 2 Directors of Accounting differ. Lets say if the ORG is different then for you report you have to push ORG filters to all your queries. So that they would not cross each others data.
Let me know if this helped.
Ok sorry for any confusion- here is the problem... ther is one Director of Accounting, and he is a full time employee and flagged correctly in the ps_employees (source) table. When data from this table is extracted to the w_employee_d table, he has an integration ID that is similar to the integration ID of a person who applied for a job with my company. That person is not an employee and didnt get a job here but a record of that job applicant is being shown in Answers and the Dashboards in place of the correct employee(Director of Accounting). I was querying the table to see a probable reason why this applicant's name is being shown as the Director of Accounting and the common relationship between the two was a similar integration id. Im just trying to find why this applicants name is being represented instead of the real employee.
Ok...here are some things to consider:
1. In HR Analytics, there is usually a "Employee" Flag in the Fact table that indicates if a person is an EMPLOYEE versus APPLICANT. So, I am not sure how you are building you answers report but you may want to use the proper FACT table measure that uses this FLAG. This flag is used to differentiate EMPLOYEES from APPLICANTS. You may not be building the reports using this measure. Note that this is on the FACT table, not on the Employee DImension. Either way, the APPLICANT and EMPLOYEE should both have DIFFERENT values for this flag as one is clearly not an Employee yet.
2. Secondly, are you TRYING to show data from both sources in the same report? If so, then you will always have the possibility of the same EMPLOYEE ID existing in 2 systems. If you are only trying to show a report for one system, you can edit the RPD or Report to use DATASOURCENUMID as a filter.
Hope this helps
So in such a case you can have a fix at the etl level. Check if you ever need the applicant information in the warehouse system. If not then filter the applicants from in the ETL that would be the best option. If not then for the functional area/Subject area you are working on have the content filter in the BMM to filter out applicants.
Let me know if this helps.
That was my initial thought honestly - i have the task SDE_EmployeeDimension_Applicant loading into w_employee_ds and I was thinkin of making this task inactive and running a full load. Would this be the way you would filter? I just want to make sure before i do it.
That would be a question to the Business. If they ever report on Applicants. If they say they would not then this would be the best approach. If they say yes then content filter in the RPD would help.
Let me know if this helps.
Ahsan to get back to what you mentioned earlier,
The fact table W_WRKFC_EVT_MONTH_F is the one joined with W_EMPLOYEE_D. I did a simple query SELECT DISTINCT DATASOURCE_NUM_ID FROM OBAW.W_WRKFC_EVT_MONTH_F and only one datasource_id resulted and this is the same datasource_id that is associated with applicants in the dimension table. The datasource_id related to employees is not present in the Fact table. W_EMPLOYEE_D.ROW_WID=Fact_W_WRKFC_EVT_MONTH_F.EMPLOYEE_WID but when i query row_wid vs employee_wid in the Fact table, only those of applicant appear in the fact table. Would populating the fact with the other employee data missing probably fix this problem, if so how can i go about this? thank you much for your help, your navigation have helped so much!
You need to check WHY the Employee records for that ID dont make it to the W_WRKFC_EVT_MONTH_F table. You need to trace backwards...star w the W_WRKFC_EVT_F table (the more granular one) and see if the data is there. The MONTHLY is basically a MONTHLY snapshot so check if that record made it to the base fact. If it isnt there, keep tracing back..you will see various data flows for Applicants, Employees, Events..etc. So you need to see where the EMPLOYEE data is not getting passed through and correct that at the ETL level. I assume there is either something to be fixed at the ETL level or the Data has anomalies that cause it to not show properly. For example, can you confirm that the EMPLOYEE has a proper HIRE DATE, etc in the source system?
If helpful, mark it..