Forum Stats

  • 3,784,135 Users
  • 2,254,897 Discussions
  • 7,880,705 Comments

Discussions

Aggregate data with null and duplicated values

Carl C
Carl C Member Posts: 72
edited Oct 4, 2018 8:49AM in SQL & PL/SQL

Hi,

I've this issue, I hope you can help.

I've this dataset, with 3 specifiques cases :

Rule :

1 person = 1 birthday

Cases

Some people have same name with same ID but they've diff birth date  (Case ID 1000)

Some people do not have information about birthday (Case ID 2000)

Most people have one birthday but also have some records where birthday is null (CAse ID 1000, 2000,3000,4000)

and finally

Some people are twins, they have differents name, and surnames but same birthday (Case ID 4000) which also include some records with null values

    

ID Name SurnameBirthdayComments
1000AAABBB19000101
1000AAABBBNullDiff brthday
1000AAABBB19201006
2000HHHGGGnullBrthday Not available
3000MMMRRRnull
3000MMMRRR19620723
4000TTTJJJnull
4000TTTJJJ20010801Twins
4000SSSYYY20010801Twins

I need to get this result

   

ID Name SurnameBirthday
1000AAABBB19000101
1000AAABBB19201006
2000HHHGGGnull
3000MMMRRR19620723
4000TTTJJJ20010801
4000SSSYYY20010801

Sum up

To get each people id, name, surname and birthday,

eliminating duplicates,

Attached file, script to create  dataset.

I almost forgot, Oracle version 11g2.

Thank you if you can help

Tagged:
Carl C

Best Answer

  • mathguy
    mathguy Member Posts: 10,229 Blue Diamond
    edited Oct 3, 2018 11:32AM Accepted Answer

    Here's one way - using NVL to replace null birtdates with the max birthday for the same id, name and surname, and then de-duplicating with SELECT DISTINCT. I left out your Comment column since it is not needed for the query anyway. It is not clear what the data type of your dates is (string? number? SHOULD be date!) but the query works regardless. The WITH clause is just to simulate the inputs in the query itself; when you use it, make sure to remove the WITH clause, and change the SELECT and FROM clauses to use your actual table and column names.

    with  sample_data(id, name, surname, birthday) as (    select 1000, 'AAA', 'BBB', 19000101 from dual union all       select 1000, 'AAA', 'BBB', null     from dual union all       select 1000, 'AAA', 'BBB', 19201006 from dual union all       select 2000, 'HHH', 'GGG', null     from dual union all       select 3000, 'MMM', 'RRR', null     from dual union all       select 3000, 'MMM', 'RRR', 19620723 from dual union all       select 4000, 'TTT', 'JJJ', null     from dual union all       select 4000, 'TTT', 'JJJ', 20010801 from dual union all       select 4000, 'SSS', 'YYY', 20010801 from dual  )select   distinct id, name, surname,         nvl(birthday, max(birthday) over (partition by id, name, surname)) as birthdayfrom     sample_dataorder by id, name, surname, birthday -- if needed;     ID NAME     SURNAME    BIRTHDAY------- -------- -------- ----------   1000 AAA      BBB        19000101   1000 AAA      BBB        19201006   2000 HHH      GGG                  3000 MMM      RRR        19620723   4000 SSS      YYY        20010801   4000 TTT      JJJ        20010801

Answers

  • mathguy
    mathguy Member Posts: 10,229 Blue Diamond
    edited Oct 3, 2018 11:32AM Accepted Answer

    Here's one way - using NVL to replace null birtdates with the max birthday for the same id, name and surname, and then de-duplicating with SELECT DISTINCT. I left out your Comment column since it is not needed for the query anyway. It is not clear what the data type of your dates is (string? number? SHOULD be date!) but the query works regardless. The WITH clause is just to simulate the inputs in the query itself; when you use it, make sure to remove the WITH clause, and change the SELECT and FROM clauses to use your actual table and column names.

    with  sample_data(id, name, surname, birthday) as (    select 1000, 'AAA', 'BBB', 19000101 from dual union all       select 1000, 'AAA', 'BBB', null     from dual union all       select 1000, 'AAA', 'BBB', 19201006 from dual union all       select 2000, 'HHH', 'GGG', null     from dual union all       select 3000, 'MMM', 'RRR', null     from dual union all       select 3000, 'MMM', 'RRR', 19620723 from dual union all       select 4000, 'TTT', 'JJJ', null     from dual union all       select 4000, 'TTT', 'JJJ', 20010801 from dual union all       select 4000, 'SSS', 'YYY', 20010801 from dual  )select   distinct id, name, surname,         nvl(birthday, max(birthday) over (partition by id, name, surname)) as birthdayfrom     sample_dataorder by id, name, surname, birthday -- if needed;     ID NAME     SURNAME    BIRTHDAY------- -------- -------- ----------   1000 AAA      BBB        19000101   1000 AAA      BBB        19201006   2000 HHH      GGG                  3000 MMM      RRR        19620723   4000 SSS      YYY        20010801   4000 TTT      JJJ        20010801
  • Paulzip
    Paulzip Member Posts: 8,545 Blue Diamond
    edited Oct 3, 2018 11:33AM

    The enclosed data isn't correct, as it is including nulls as 'null'. But anyway, here's one way...

    select distinct id, name, surname, coalesce(birthday, max(birthday) over (partition by id, name, surname)) as birthday

    from   test

    order by id, name, surname, birthday

    Carl CCarl C
  • Paulzip
    Paulzip Member Posts: 8,545 Blue Diamond
    edited Oct 3, 2018 11:39AM

    Just noticed your post Mathguy, it seems we think alike, although mine should be slightly more efficient as the max(..) over.. will only be evaluated if birthday is null with mine, whereas yours will evaluate that regardless.

    Carl CCarl C
  • Carl C
    Carl C Member Posts: 72
    edited Oct 4, 2018 8:49AM

    @mathguy

    Thank you for your help.

    Very appreciated