4 Replies Latest reply: Jan 11, 2013 9:29 AM by Frank Kulash RSS

    Padding numbers within a string

    633014
      Hi Folks,

      I have some strings which are all a mix of letters and numbers - no particular rules, but no special characters.
      For ordering purposes, I would like to be able to lpad each number string to 5 digits (in reality there will never be more than 3 digits in a row)

      For example:

      XX01 -> XX00001
      XX100 -> XX00100
      XX1YY10ZZ1 ->XX00001YY00010ZZ00001

      Any thoughts on how this can be achieved?

      I am currently trying to get something working with REGEXP_REPLACE but cant quite get it working:

      select pc.pcode,REGEXP_REPLACE(pc.pcode, '([[:digit:]]+)',lpad('\1',5,0))
      from test_tab pc
      order by 2
      /

      The actual/desired results are as follows:

      ORIG DESIRED RESULT ACTUAL RESULT
      === =========== ==========
      XX01 XX00001 XX00001 (correct)
      XX10 XX00010 XX00010 (correct)
      XX100 XX00100 XX000100 (incorrect)
      XX1YY10ZZ1 XX00001YY00010ZZ00001 XX0001YY00010ZZ0001 (incorrect)

      Thanks for any input you may have!

      Matt

      Edited by: Matthew Morgan on Jan 10, 2013 7:59 PM

      Edited by: Matthew Morgan on Jan 10, 2013 7:59 PM

      Edited by: Matthew Morgan on Jan 10, 2013 8:00 PM
        • 1. Re: Padding numbers within a string
          Manik
          I am not an expert and still learning about regular expressions.. But just tried your problem with the following query.. (seems messy but may just solve the purpose) .. I know its not efficient way of writing.. I am sure experts/gurus around here will help you.. But just have a look :)
          WITH str AS
                  (SELECT 'XX1YY10ZZ1' str FROM DUAL
                   UNION ALL
                   SELECT 'XX1YY10ZZ71' str FROM DUAL
                   union all
                   SELECT 'XX1YY100ZZ7' str FROM DUAL),
               t AS
                  (SELECT REGEXP_REPLACE (str, '([[:digit:]]+)', ',\1,') str FROM str),
               t1 AS
                  (    SELECT LEVEL rn,
                              str strid,
                              REGEXP_SUBSTR (str,
                                             '[^,]+',
                                             1,
                                             LEVEL)
                                 dc
                         FROM t
                   CONNECT BY     REGEXP_SUBSTR (str,
                                                 '[^,]+',
                                                 1,
                                                 LEVEL)
                                     IS NOT NULL
                              AND PRIOR str = str
                              AND PRIOR SYS_GUID () IS NOT NULL)
            SELECT replace(strid,',') str,listagg (
                      CASE
                         WHEN SUBSTR (dc, 1, 1) BETWEEN '0' AND '9' THEN LPAD (dc, 5, 0)
                         ELSE dc
                      END,
                      '')
                   WITHIN GROUP (ORDER BY rn)
                      finstr
              FROM t1
          GROUP BY strid;
          output:
          STR                       FINSTR
          ----------------------------------------------------------
          XX1YY10ZZ1     XX00001YY00010ZZ00001
          XX1YY10ZZ71     XX00001YY00010ZZ00071
          XX1YY100ZZ7     XX00001YY00100ZZ00007
          Edited by: simplified a bit

          Edited by: Added Multiple rows
          • 2. Re: Padding numbers within a string
            Frank Kulash
            Hi, Matt

            Here's one way:
            SELECT       str
            ,       REGEXP_REPLACE ( REGEXP_REPLACE ( str
                                               , '(\d+)'
                                       , '0000\1'
                                       )
                               , '\d+(\d{5})'
                               , '\1'
                               )     AS padded_str
            FROM       str
            ORDER BY  padded_str
            ;
            The inner REGEXP_REPLACE adds 4 spaces before each sub-string of digits. This guarantees that all numbers are at least 5 digits.
            The outer REGEXP_REPLACE removes digits from the beginning of each number, so that each number is now exactly 5 digits.
            • 3. Re: Padding numbers within a string
              chris227
              A more cautious version of frank solution could be to replace only the kind of digits added previously
              SELECT       str
              ,       REGEXP_REPLACE ( REGEXP_REPLACE ( str
                                                 , '(\d+)'
                                         , '0000\1'
                                         )
                                 , '0+(\d{5})' --changed \d to 0
                                 , '\1'
                                 )     AS padded_str
              FROM       str
              ORDER BY  padded_str
              • 4. Re: Padding numbers within a string
                Frank Kulash
                Hi, Matt,
                Matthew Morgan wrote:
                ... I am currently trying to get something working with REGEXP_REPLACE but cant quite get it working:

                select pc.pcode,REGEXP_REPLACE(pc.pcode, '([[:digit:]]+)',lpad('\1',5,0))
                from test_tab pc
                order by 2
                /
                In case you're wondering why that didn't work:
                The arguments to REGEXP_REPLACE can be any kind of expressions of the right datatype. In particular, the 3rd argument is supposed to be a string, so it's perfectly okay to use a function that returns a string, such as
                LPAD ('\1', 5, '0')
                in that place. (The 3rd argument to LPAD is supposed to be string. You passed a NUMBER instead, but Oracle implicitly converted that to a string for you, and no harm was done this time.)
                When you pass an expression like that as an argument to REGEXP_REPLACE, the expression is evaluated first, and then it's results are passed back to REGEXP_REPLACE. What does
                LPAD ('\1', 5, '0')
                evaluated to? It's the 5-character string:
                000\1
                '\1' has no special meaning to LPAD; it's just a 2-character string. '\1' has special meaning in REGEXP_REPLACE, but this is not being done in the context of REGEXP_REPLACE, in fact, this has to be done before REGEXP_REPLACE can do its job. So, regardless of what's in pcode, what you wrote is equvalent to
                select    pc.pcode
                ,       REGEXP_REPLACE ( pc.pcode
                                   , '([[:digit:]]+)'
                                   , '000\1'
                                   )
                from      test_tab pc
                order by  2
                which adds exactly 3 0's before every number, regardless of how many digits are in that number. So when you happened to have 2-digit numbers (as in 'XX01') it happened to return the results you wanted.
                chris227 wrote:
                A more cautious version of frank solution could be to replace only the kind of digits added previously
                SELECT       str
                ,       REGEXP_REPLACE ( REGEXP_REPLACE ( str
                                                   , '(\d+)'
                                           , '0000\1'
                                           )
                                   , '0+(\d{5})' --changed \d to 0
                                   , '\1'
                                   )     AS padded_str
                FROM       str
                ORDER BY  padded_str
                Good idea!

                Edited by: Frank Kulash on Jan 11, 2013 10:28 AM