This discussion is archived
4 Replies Latest reply: Jan 11, 2013 7:29 AM by Frank Kulash RSS

Padding numbers within a string

633014 Newbie
Currently Being Moderated
Hi Folks,

I have some strings which are all a mix of letters and numbers - no particular rules, but no special characters.
For ordering purposes, I would like to be able to lpad each number string to 5 digits (in reality there will never be more than 3 digits in a row)

For example:

XX01 -> XX00001
XX100 -> XX00100
XX1YY10ZZ1 ->XX00001YY00010ZZ00001

Any thoughts on how this can be achieved?

I am currently trying to get something working with REGEXP_REPLACE but cant quite get it working:

select pc.pcode,REGEXP_REPLACE(pc.pcode, '([[:digit:]]+)',lpad('\1',5,0))
from test_tab pc
order by 2
/

The actual/desired results are as follows:

ORIG DESIRED RESULT ACTUAL RESULT
=== =========== ==========
XX01 XX00001 XX00001 (correct)
XX10 XX00010 XX00010 (correct)
XX100 XX00100 XX000100 (incorrect)
XX1YY10ZZ1 XX00001YY00010ZZ00001 XX0001YY00010ZZ0001 (incorrect)

Thanks for any input you may have!

Matt

Edited by: Matthew Morgan on Jan 10, 2013 7:59 PM

Edited by: Matthew Morgan on Jan 10, 2013 7:59 PM

Edited by: Matthew Morgan on Jan 10, 2013 8:00 PM
  • 1. Re: Padding numbers within a string
    Manik Expert
    Currently Being Moderated
    I am not an expert and still learning about regular expressions.. But just tried your problem with the following query.. (seems messy but may just solve the purpose) .. I know its not efficient way of writing.. I am sure experts/gurus around here will help you.. But just have a look :)
    WITH str AS
            (SELECT 'XX1YY10ZZ1' str FROM DUAL
             UNION ALL
             SELECT 'XX1YY10ZZ71' str FROM DUAL
             union all
             SELECT 'XX1YY100ZZ7' str FROM DUAL),
         t AS
            (SELECT REGEXP_REPLACE (str, '([[:digit:]]+)', ',\1,') str FROM str),
         t1 AS
            (    SELECT LEVEL rn,
                        str strid,
                        REGEXP_SUBSTR (str,
                                       '[^,]+',
                                       1,
                                       LEVEL)
                           dc
                   FROM t
             CONNECT BY     REGEXP_SUBSTR (str,
                                           '[^,]+',
                                           1,
                                           LEVEL)
                               IS NOT NULL
                        AND PRIOR str = str
                        AND PRIOR SYS_GUID () IS NOT NULL)
      SELECT replace(strid,',') str,listagg (
                CASE
                   WHEN SUBSTR (dc, 1, 1) BETWEEN '0' AND '9' THEN LPAD (dc, 5, 0)
                   ELSE dc
                END,
                '')
             WITHIN GROUP (ORDER BY rn)
                finstr
        FROM t1
    GROUP BY strid;
    output:
    STR                       FINSTR
    ----------------------------------------------------------
    XX1YY10ZZ1     XX00001YY00010ZZ00001
    XX1YY10ZZ71     XX00001YY00010ZZ00071
    XX1YY100ZZ7     XX00001YY00100ZZ00007
    Edited by: simplified a bit

    Edited by: Added Multiple rows
  • 2. Re: Padding numbers within a string
    Frank Kulash Guru
    Currently Being Moderated
    Hi, Matt

    Here's one way:
    SELECT       str
    ,       REGEXP_REPLACE ( REGEXP_REPLACE ( str
                                       , '(\d+)'
                               , '0000\1'
                               )
                       , '\d+(\d{5})'
                       , '\1'
                       )     AS padded_str
    FROM       str
    ORDER BY  padded_str
    ;
    The inner REGEXP_REPLACE adds 4 spaces before each sub-string of digits. This guarantees that all numbers are at least 5 digits.
    The outer REGEXP_REPLACE removes digits from the beginning of each number, so that each number is now exactly 5 digits.
  • 3. Re: Padding numbers within a string
    chris227 Guru
    Currently Being Moderated
    A more cautious version of frank solution could be to replace only the kind of digits added previously
    SELECT       str
    ,       REGEXP_REPLACE ( REGEXP_REPLACE ( str
                                       , '(\d+)'
                               , '0000\1'
                               )
                       , '0+(\d{5})' --changed \d to 0
                       , '\1'
                       )     AS padded_str
    FROM       str
    ORDER BY  padded_str
  • 4. Re: Padding numbers within a string
    Frank Kulash Guru
    Currently Being Moderated
    Hi, Matt,
    Matthew Morgan wrote:
    ... I am currently trying to get something working with REGEXP_REPLACE but cant quite get it working:

    select pc.pcode,REGEXP_REPLACE(pc.pcode, '([[:digit:]]+)',lpad('\1',5,0))
    from test_tab pc
    order by 2
    /
    In case you're wondering why that didn't work:
    The arguments to REGEXP_REPLACE can be any kind of expressions of the right datatype. In particular, the 3rd argument is supposed to be a string, so it's perfectly okay to use a function that returns a string, such as
    LPAD ('\1', 5, '0')
    in that place. (The 3rd argument to LPAD is supposed to be string. You passed a NUMBER instead, but Oracle implicitly converted that to a string for you, and no harm was done this time.)
    When you pass an expression like that as an argument to REGEXP_REPLACE, the expression is evaluated first, and then it's results are passed back to REGEXP_REPLACE. What does
    LPAD ('\1', 5, '0')
    evaluated to? It's the 5-character string:
    000\1
    '\1' has no special meaning to LPAD; it's just a 2-character string. '\1' has special meaning in REGEXP_REPLACE, but this is not being done in the context of REGEXP_REPLACE, in fact, this has to be done before REGEXP_REPLACE can do its job. So, regardless of what's in pcode, what you wrote is equvalent to
    select    pc.pcode
    ,       REGEXP_REPLACE ( pc.pcode
                       , '([[:digit:]]+)'
                       , '000\1'
                       )
    from      test_tab pc
    order by  2
    which adds exactly 3 0's before every number, regardless of how many digits are in that number. So when you happened to have 2-digit numbers (as in 'XX01') it happened to return the results you wanted.
    chris227 wrote:
    A more cautious version of frank solution could be to replace only the kind of digits added previously
    SELECT       str
    ,       REGEXP_REPLACE ( REGEXP_REPLACE ( str
                                       , '(\d+)'
                               , '0000\1'
                               )
                       , '0+(\d{5})' --changed \d to 0
                       , '\1'
                       )     AS padded_str
    FROM       str
    ORDER BY  padded_str
    Good idea!

    Edited by: Frank Kulash on Jan 11, 2013 10:28 AM

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points