6 Replies Latest reply: Nov 26, 2010 5:52 AM by naga shankar RSS

    sum column B with distinct column A?

    763900
      Hello,

      Would anybody know if this is possible at all? Preferably within the function without further grouping or nesting as this is a much simpified explanation of an issue I am having with a far larger and more esoteric query...

      If I have a table with ID codes in column A, which are often repeated over several rows, and USD amounts in column A.

      eg.

      ID¦USD
      1¦1,000
      2¦5,000
      3¦2,000
      3¦2,000
      3¦2,000
      3¦2,000
      4¦5,000
      4¦5,000
      5¦20,000
      5¦20,000
      5¦20,000

      I would like to sum the USD amount in column B, but only once per unique ID in column A, ie (1000+5000+2000+5000+20000 = 33000)

      I tried the following syntax to no avail;
      sum(USD) over(distinct ID) ob
      I am assuming that the following would count each distinct USD amount, this is not what I am looking for;
      sum(distinct USD) ob
      FYI here is the query shell that I am trying to use this in, the problem I have is that the ID codes in the query results i've called 'table' duplicate (correctly) IDs on occassion, I only want my counts and sums to count each ID once though. I can easily count my distinct IDs but I am not sure how get it to sum the values in the other columns based on this.
      select  geo_segment_name 
            , bus_grp_typ
            , agg_grp_sctr
            , sum(original_balance_usdm) ob                   --need this for distict class_ids
            , sum(current_balance_qport_usdm) cb         --need this for distict class_ids
            , count(distinct deal_id) deal_count
            , count(distinct class_id) class_count 
      from (table)
      group by geo_segment_name, bus_grp_typ, agg_grp_sctr
      Could anyone please advise?

      In your debt!

      Thanks

      Jon
        • 1. Re: sum column B with distinct column A?
          Frank Kulash
          Hi, Jon,
          berlinhammer wrote:
          Hello,

          Would anybody know if this is possible at all? Preferably within the function without further grouping or nesting as this is a much simpified explanation of an issue I am having with a far larger and more esoteric query...
          A nested GROUP BY (in a sub-query) might be the simplest and most efficient solution. Don't rule it out.

          Analytic functions are applied after GROUP BY, so if the usd you are displaying is the result of some aggregate function, you can use it as the argument to an analytic function, like this:
          SUM (MAX (usd)) OVER ()   AS total_usd
          When MAX is the aggregate fucntion. (It could be any function, including SUM).

          This assumes that all rows in the reuslt set that have the same id also have the same usd. If they don't, how to you decide which value of usd to use?

          In your preal problem, you'll proably need to partition by some, but not all, of your GROUP BY columns. (The actual problem is almost certainly more clear to you that it is to be me; it could hardly be less.)
          I tried the following syntax to no avail;
          sum(USD) over(distinct ID) ob
          No, that's a syntax error. The first thing in the analytic clause (right after the keyword OVER) must be either "PARTITION BY" or "GROUP BY".
          I am assuming that the following would count each distinct USD amount, this is not what I am looking for;
          sum(distinct USD) ob
          You're absolutely correct; it would count only distinct values, and that's not what you want, because two distinct ids may coincidentally have the same usd value.
          FYI here is the query shell that I am trying to use this in, the problem I have is that the ID codes in the query results i've called 'table' duplicate (correctly) IDs on occassion, I only want my counts and sums to count each ID once though. I can easily count my distinct IDs but I am not sure how get it to sum the values in the other columns based on this.
          select  geo_segment_name 
          , bus_grp_typ
          , agg_grp_sctr
          , sum(original_balance_usdm) ob                   --need this for distict class_ids
          , sum(current_balance_qport_usdm) cb         --need this for distict class_ids
          , count(distinct deal_id) deal_count
          , count(distinct class_id) class_count 
          from (table)
          group by geo_segment_name, bus_grp_typ, agg_grp_sctr
          Could anyone please advise?
          Whenever you have a problem, post a little sample data (CREATE TABLE and INSERT statements) and the results you want from that data.
          If people can't reproduce the problem, they are likely to mis-understand it, and they can't possibly test their ideas.
          I know it can be a lot of work, but you'll get better answers sooner if you provide some useable sample data, and post the exact results you want from that data.
          • 2. Re: sum column B with distinct column A?
            Sayan Malakshinov.
            If Oracle >=10g:
            with t(ID,USD) as (
                select 1,1.000 from dual union all
                select 2,5.000 from dual union all
                select 3,2.000 from dual union all
                select 3,2.000 from dual union all
                select 3,2.000 from dual union all
                select 3,2.000 from dual union all
                select 4,5.000 from dual union all
                select 4,5.000 from dual union all
                select 5,20.000 from dual union all
                select 5,20.000 from dual union all
                select 5,20.000 from dual
            )
            select 
                s
            from t
            group by id
            model
                 return updated rows
                 dimension by (id)
                 measures (max(usd) s)
                 rules(s[1]=sum(s)[any])
            Regards,
            Sayan M.
            • 3. Re: sum column B with distinct column A?
              763900
              Hi Frank

              Thanks a lot for replying.

              Apologies for the lack of 'sample' data but I am afraid I do not have much in the way of access rights on my system and have never used the INSERT or CREATE statements in my life. I just don't know how to do it I'm afraid I only know how to query from tables that already exist.

              You're points are very helpful but I'm not sure I completely understand what I should do now.

              I'll try and adapt the simple example I left a little to hopefully simulate the situation I have

              table

              ID¦USD¦Country
              1¦1,000¦Italy
              2¦5,000¦Spain
              3¦2,000¦UK
              3¦2,000¦UK
              3¦2,000¦UK
              3¦2,000¦UK
              4¦5,000¦Germany
              4¦5,000¦Germany
              5¦20,000¦UK
              5¦20,000¦UK
              5¦20,000¦UK

              let's say I want the following output:

              COUNTRY¦ID_CNT¦SUM_USD
              Italy¦1¦1,000
              Spain¦1¦5,000
              UK¦2¦22,000
              Germany¦1¦5,000
              select country, count(distinct id) id_cnt, (sum usd for distinct id syntax??) sum_usd
              from table
              group by country
              How could I alter the code above to get those results?

              I'll try and give your suggestions some more reads but I'm not I'm following you completely to be honest, it is late in the day though.....

              Thanks Frank,

              Jon
              • 4. Re: sum column B with distinct column A?
                Frank Kulash
                Hi, Jon,
                berlinhammer wrote:
                Hi Frank

                Thanks a lot for replying.

                Apologies for the lack of 'sample' data but I am afraid I do not have much in the way of access rights on my system and have never used the INSERT or CREATE statements in my life. I just don't know how to do it I'm afraid I only know how to query from tables that already exist.
                You really ought to have the ability to create small test tables. It can be very useful, when developing and testing any query, to make a test table showing exactly the kinds of problems you need to solve. Convince your employer (or whomever) that you need this ability. It can be in a very restricted schema on a purely development database.
                If you can't get the necessary privileges in the database where you work, download your own copy of Oracle Express Editiion on your own workstation.

                Even if all you can do is select from existing tables, select some made-up sample data from the existing dual table, like this:
                WITH     sample_data     AS
                (
                     SELECT     1 AS id,      1000 AS usd,     'Italy' AS country     FROM dual     UNION ALL
                     SELECT     2,           5000,          'Spain'               FROM dual     UNION ALL
                     SELECT     3,           2000,          'UK'               FROM dual     UNION ALL
                     SELECT     3,           2000,          'UK'               FROM dual     UNION ALL
                     SELECT     3,           2000,          'UK'               FROM dual     UNION ALL
                     SELECT     3,           2000,          'UK'               FROM dual     UNION ALL
                     SELECT     4,           5000,          'Germany'          FROM dual     UNION ALL
                     SELECT     4,           5000,          'Germany'          FROM dual     UNION ALL
                     SELECT     5,          20000,          'UK'               FROM dual     UNION ALL
                     SELECT     5,          20000,          'UK'               FROM dual     UNION ALL
                     SELECT     5,          20000,          'UK'               FROM dual
                )
                SELECT  *
                FROM    sanple_data;
                Whenever you have a problem, posting a WITH clause like the one above is almost as good as posting CREATE TABLE and INSERT statements

                To get the results you want from this sample data, without any additional sub-queries:
                SELECT DISTINCT
                       MIN (country)                              AS country
                ,       COUNT (*)       OVER (PARTITION BY MIN (country))     AS id_cnt
                ,       SUM (MIN (usd)) OVER (PARTITION BY MIN (country))     AS usd
                FROM       sample_data
                GROUP BY  id
                ;
                If you don't understand it right away, or if you have trouble adapting this to a more complicated query, I don't blame you. It's a clever way of getting a "GROUP BY id" query to look like a "GROUP BY country" query where only one country is associated to any id. (When talking about code, "clever" is not a compliment.)

                The basic problem here is that you have duplicate rows. Whenever two rows have the same id, they always have the same usd and country. You want to ignore all but one copy of each, which you can easily do using SELECT BY DISTINCT :
                WITH     no_duplicates     AS
                (
                     SELECT DISTINCT       id, usd, country
                     FROM            sample_data
                )
                SELECT       country
                ,       COUNT (*)     AS id_cnt
                ,       SUM (usd)     AS total_usd
                FROM       no_duplicates
                GROUP BY  country
                ;
                It's a very bad table design that has completely identical rows like that. It's such a bad design, that I suspect you don't really have a table like that. I think what you posted must be a view, or the result set of some query (which is exactly what a view is) that intentionally replicates some of the data on multiple rows.
                If so, the best solution might invlove dealing with the base tables directly. The view may be perfect for some jobs, but a hindrance for this one.

                Given that you do have to work with what you posted, you can remove the duplicates like I did above, or, depending on your actual data, you might want to GROUP BY id and country first (in a sub-query), and then (in the main query) GROUP BY country alone, like this:
                WITH     id_summary     AS
                (
                     SELECT       id, country
                     ,       MIN (usd)     AS usd
                     FROM       sample_data
                     GROUP BY  id, country
                )
                SELECT       country
                ,       COUNT (*)     AS id_cnt
                ,       SUM (usd)     AS total_usd
                FROM       id_summary
                GROUP BY  country
                ;
                Edited by: Frank Kulash on Nov 25, 2010 2:15 PM
                • 5. Re: sum column B with distinct column A?
                  763900
                  Frank that is a fantastic post thank you.

                  You are right it is a far simplified example from a much more complicated query. I only included the fields that seemed relevant, the actual query I'm trying to analyse creates about 60 rows and the duplicates are important because of information in the other fields, they just are a pain when I want to do my sums! FYI I work for a ratings agency and sometimes more than one type of rating (eg international or national) gets assigned to each financial product(class), these are what cause the duplicates.

                  I had no idea about that WITH statement that will be very helpful henceforth, so thanks for pointing that out. You're right about the rights too I'll have a word with my boss.

                  As for my immediate problem I think I like the look of you select distinct idea. I'll get cracking with it and let you know how I get on!

                  Cheers,

                  Jon
                  • 6. Re: sum column B with distinct column A?
                    naga shankar
                    HI ,

                    select rownum , sum(m_value) over (order by m_value ) sum
                    from ( select distinct max(value) over (partition by id order by rownum) m_value
                    from (select id,value from t order by id ))

                    i guess this works for you . if like to tune this query , add the sum function to the max function directly .
                    results willl be like

                    ROWNUM     SUM
                    1     1000
                    2     3000
                    3     6000
                    4     10000
                    5     15000

                    i like to write complex queries :) . hope this may help

                    As my guru Thomas Kyte said " Analytics rock and roll"