6 Replies Latest reply: Jan 30, 2013 12:07 PM by 942572 RSS

    Update columns with subquery

    942572
      Hello,

      I am using Oracle 11.2, and need update columns with sub query.

      Create table tb_base(id number(5), score number(5), YR_MO number(6), total_score_season number(5), season_start_yr_mo number(6));

      Create table tb_season(season_id number(5), start_yr_mo number(6), end_yr_mo number(6));

      insert into tb_base (1, 30, 201005, 0, 0);
      insert into tb_base (1, 12, 201008, 0, 0);
      insert into tb_base (1, 43, 201105, 0, 0);
      insert into tb_base (1, 10, 201107, 0, 0);
      insert into tb_base (2, 15, 201006, 0, 0);

      insert into tb_season (1, 201005, 201009);
      insert into tb_season (2, 201104, 201108);

      I would like to updat the column total_score_season and season_start_yr_mo with one UPDATE statement as following:
      tb_base 1, 30, 201005, 30, 201005
      1, 12, 201008, 42, 201005
      1, 43, 201105, 43, 201104
      1, 10, 201107, 53, 201104
      2, 15, 201006, 15, 201005
        • 1. Re: Update columns with subquery
          Frank Kulash
          Hi,

          If it's not obvious how to do an UPDATE, then maybe UPDATE is the wrong tool for the job. Try MERGE instead:
          MERGE INTO  tb_base     dst
          USING (
                    SELECT  b.id, b.score, b.yr_mo
                 ,       SUM (b.score) OVER ( PARTITION BY  b.id
                                             ,          s.start_yr_mo
                                             ORDER BY          b.yr_mo
                                   ) AS running_score_season
                 ,       s.start_yr_mo
                 FROM       tb_base     b
                 JOIN       tb_season     s  ON   b.yr_mo  BETWEEN  s.start_yr_mo
                                                AND       s.end_yr_mo
                )               src 
          ON    (    dst.id     = src.id
                AND  dst.yr_mo     = src.yr_mo
                )
          WHEN MATCHED THEN UPDATE
          SET   dst.total_score_season     = src.running_score_season
          ,     dst.season_start_yr_mo     = src.start_yr_mo
          ;
          "Total_Score_Season" implies the total score at the end of the season. You're uisng that column name for a cumulative score up to some point, perhaps in mid-season. Maybe some other name, such as <b>running</b>scoreseason, or season_score_<b>so_far</b> would be clearer.

          No matter what you call it, a cumulative total calls for the analytic SUM function. While you can use analytic functions in the corellated sub-queries used in UPDATE statements, they are extremely inefficient, because they have to be re-computed for every row, so the fact that you want a cumulative total suggests you want MERGE rather than UPDATE.
          • 2. Re: Update columns with subquery
            942572
            Hi Frank,

            What if the table tb_season change to the following:

            create table tb_season(id number(5), season_day date, start_yr_mo number(6), end_yr_mo number(6));

            insert into tb_season values( 1, to_date('05012010', 'ddmmyyyy'), 201005, 201009);
            insert into tb_season values( 2, to_date('05022010', 'ddmmyyyy'), 201005, 201009);
            insert into tb_season values( 3, to_date('05032010', 'ddmmyyyy'), 201005, 201009);
            insert into tb_season values( 4, to_date('06012010', 'ddmmyyyy'), 201005, 201009);
            insert into tb_season values( 5, to_date('07012010','ddmmyyyy'), 201005, 201009);

            Thanks for your help!
            • 3. Re: Update columns with subquery
              Frank Kulash
              Hi,
              939569 wrote:
              Hi Frank,

              What if the table tb_season change to the following:
              What are the results you want from the new data?
              create table tb_season(id number(5), season_day date, start_yr_mo varchar2(6), end_yr_mo varchar2(6));

              insert into tb_season values( 1, to_date('05012010', 'ddmmyyyy'), '201005', '201009');
              insert into tb_season values( 2, to_date('05022010', 'ddmmyyyy'), '201005', '201009');
              insert into tb_season values( 3, to_date('05032010', 'ddmmyyyy'), '201005', '201009');
              insert into tb_season values( 4, to_date('06012010', 'ddmmyyyy'), '201005', '201009');
              insert into tb_season values( 5, to_date('07012010','ddmmyyyy'), '201005', '201009');
              Is start_yr_mo really supposed to be the same on all rows? Is end_yr_mo always the same, too?
              What role does the new season_day column play in this problem?

              Points in time (such as start_yr_mo and end_yr_mo) should be stored in DATE columns. In this particular problem, you're not causing yourself extra trouble by using NUMBERs or VARCHAR2s, but you could have problems in other queries involving the same data, or if your requirements change.

              Given that you do want to use NUMBERs or VARCHAR2s, why not pick one or the other, and use that data type consistently? That is why do you want to have NUMBERs like 201005 in tb_base, and at the same time have VARCHAR2s like '201005' in tb_season?

              If you really must use different data types, then use TO_CHAR to convert the NUMBER to a VARCHAR2 when you need to compare them. Don't use TO_NUMBER, because you'll get run-time errors when you have invalid data.
              • 4. Re: Update columns with subquery
                942572
                I would like to have the same result as before. Sorry, I made a mistake on the data type. It is shall remain as NUMBER(6). Currently, I am not sure how the sum function works here?

                Create table tb_base(id number(5), score number(5), YR_MO number(6), total_score_season number(5), season_start_yr_mo number(6));

                Create table tb_season(season_day date, start_yr_mo number(6), end_yr_mo number(6));

                insert into tb_base values (1, 30, 201005, 0, 0);
                insert into tb_base values (1, 12, 201008, 0, 0);
                insert into tb_base valuse (1, 43, 201105, 0, 0);
                insert into tb_base valuse (1, 10, 201107, 0, 0);
                insert into tb_base values (2, 15, 201006, 0, 0);

                insert into tb_season(to_date('20100501', 'yyyymmdd'), 201005, 201009);
                insert into tb_season(to_date('20100502', 'yyyymmdd'), 201005, 201009);
                insert into tb_season(to_date('20100503', 'yyyymmdd'),
                201005, 201009);
                insert into tb_season(to_date('20100801', 'yyyymmdd'),
                201005, 201009);
                insert into tb_season(to_date('20110501', 'yyyymmdd'),
                201104, 201110);
                insert into tb_season(to_date('20110502', 'yyyymmdd'), 201104, 201110);
                insert into tb_season(to_date('20110611', 'yyyymmdd'), 201104, 201110);
                insert into tb_season(to_date('20110612', 'yyyymmdd'), 201104, 201110);
                insert into tb_season(to_date('20110710', 'yyyymmdd'), 201104, 201110);

                I would like to have the season start time and latest total score for this season updated for tb_base. The season start time and which month belong to this season are based on the information in tb_season. The result will be as following:

                tb_base
                1, 30, 201005, 30, 201005
                1, 12, 201008, 42, 201005
                1, 43, 201105, 43, 201104
                1, 10, 201107, 53, 201104
                2, 15, 201006, 15, 201005

                Note. The start_yr_mo and end_yr_mo will be the same for the same season.
                • 5. Re: Update columns with subquery
                  Frank Kulash
                  Hi,
                  939569 wrote:
                  ... Currently, I am not sure how the sum function works here?
                  It's a cumulative, or running, total.
                  On any given row,
                  ...       ,       SUM (b.score) OVER ( PARTITION BY  b.id
                                                     ,          s.start_yr_mo
                                                     ORDER BY          b.yr_mo
                                           ) AS running_score_season
                  is the sum of the score column on that row and all other rows that have the same b.id and s.start_yr_mo as the current row (that's what PARTITION BY is doing), and that have b.yr_mo less than or equal to b.yr_mo on the current row (that's what ORDER BY is doing).
                  ...
                  Create table tb_season(season_day date, start_yr_mo number(6), end_yr_mo number(6));
                  ...
                  insert into tb_season(to_date('20100501', 'yyyymmdd'), 201005, 201009);
                  insert into tb_season(to_date('20100502', 'yyyymmdd'), 201005, 201009); ...
                  I'll try to post code that really works, and I'll test my code before I post it. Will you do the same?
                  The INSERT statements are all missing the VALUES keyword.
                  I would like to have the season start time and latest total score for this season updated for tb_base. The season start time and which month belong to this season are based on the information in tb_season.
                  It looks like tb_season is not normalized. That is, the fact that the 2010 season started ran from May through September is repeated on every row that concerns that season. A better table design would be to have a separate table (let's call it s1 for now) with one row per season, having columns for start_yr_mo, end_yr_mo and (maybe) a separate primary key. Season_day would go in a separate table, containing season_day and the primary key of s1.

                  If you had a table like s1, you would use it in place of the existing tb_season in the MERGE4 statement I posted yesterday.
                  Given that you don't have such a table, we can generate a result set that looks like that table on the fly. That's what the sub-query unique_season below is:
                  MERGE INTO  tb_base     dst
                  USING (
                         WITH     unique_season     AS
                         (
                            SELECT DISTINCT
                                    start_yr_mo
                            ,     end_yr_mo
                            FROM    tb_season
                         )
                            SELECT  b.id, b.score, b.yr_mo
                         ,       SUM (b.score) OVER ( PARTITION BY  b.id
                                                     ,          s.start_yr_mo
                                                     ORDER BY          b.yr_mo
                                           ) AS running_score_season
                         ,       s.start_yr_mo
                         FROM       tb_base     b
                         JOIN       unique_season     s  ON   b.yr_mo  BETWEEN  s.start_yr_mo
                                                        AND       s.end_yr_mo
                        )               src 
                  ON    (    dst.id     = src.id
                        AND  dst.yr_mo     = src.yr_mo
                        )
                  WHEN MATCHED THEN UPDATE
                  SET   dst.total_score_season     = src.running_score_season
                  ,     dst.season_start_yr_mo     = src.start_yr_mo
                  ;
                  
                  SELECT       *
                  FROM       tb_base
                  ORDER BY  id
                  ,            yr_mo
                  ;
                  Note that this is exactly what I posted yesterday, except for the sub-query unique_season, and the main USING ... SELECT clause now references unique_season instead of tb_season.

                  The results are what you requested:
                  `                TOTAL_ SEASON_
                                   SCORE_  START_
                  ID SCORE   YR_MO SEASON   YR_MO
                  -- ----- ------- ------ -------
                   1    30  201005     30  201005
                   1    12  201008     42  201005
                   1    43  201105     43  201104
                   1    10  201107     53  201104
                   2    15  201006     15  201005
                  • 6. Re: Update columns with subquery
                    942572
                    It is working, many thanks!