Skip to Main Content

SQL & PL/SQL

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

Interested in getting your voice heard by members of the Developer Marketing team at Oracle? Check out this post for AppDev or this post for AI focus group information.

SQL Substring from specific caracter

user7375700Apr 25 2016 — edited Apr 25 2016

Hi everybody,

i have a problem with sql and can't resolve.

how can i get ID_2 from ID_1 by sql ?

  

ID_1ID_2
S-0000125463-ALL0000125463
SS-0000125464A-1-SEGA0000125464A-1
KBC-000025634B-2-BALL000025634B-2

separator character is always "-" but I can also have a "-" in the second part before the last "-".


thanks for your help.

This post has been answered by Paulzip on Apr 25 2016
Jump to Answer

Comments

Frank Kulash Apr 25 2016 — edited on Apr 25 2016

Hi,

user7375700 wrote:

Hi everybody,

i have a problem with sql and can't resolve.

how can i get ID_2 from ID_1 by sql ?

ID_1 ID_2
S-0000125463-ALL 0000125463
SS-0000125464A-1-SEGA 0000125464A-1
KBC-000025634B-2-BALL 000025634B-2

separator character is always "-" but I can also have a "-" in the second part before the last "-".


thanks for your help.

Use INSTR to find the position of the first hyphen, and also of the last hyphen.  Then use SUBSTR to get all the characters in between those 2 points, like this:

WITH    got_pos    AS

(

    SELECT   id_1

    ,        INSTR (id_1, '-')    + 1  AS start_pos

    ,        INSTR (id_1, '-', -1)     AS end_pos

    FROM     table_x

)

SELECT    id_1

,         SUBSTR ( id_1

                 , start_pos

                 , end_pos - start_pos

                 )   AS id_2

FROM      got_pos

;

If you'd care to post CREATE TABLE and INSERT statements for the sample data, then I could rest it.

What if id_1 only has 1 hyphen, or none at all?

You could also use REGEXP_SUBSTR or REGEXP_REPLACE, but it will be less efficient, even if it doesn't  as much code.

Chris Hunt Apr 25 2016

One way to do it:

with my_data (id_1) as

(select 'S-0000125463-ALL' from dual union  all

select 'SS-0000125464A-1-SEGA' from dual union  all

select 'KBC-000025634B-2-BALL' from dual)

-- end of test data --

select id_1,

      substr(id_1a,1,instr(id_1a,'-',-1)-1)         -- Remove the part including and after the last hyphen

from (select id_1,

             substr(id_1,instr(id_1,'-')+1) id_1a   -- Remove the part up to and including the first hyphen

      from   my_data)

mathguy Apr 25 2016 — edited on Apr 25 2016

The OP's sample data already has values with no hyphens, in which case this solution won't work... Not important for the first hyphen, but instr(...) will return 0 for the second hyphen if there is no second hyphen, and that will cause a problem. (No first hyphen is OK because the first instr will return 0 then, which is OK.)

The really interesting question is what happens when there is only one hyphen. (No such value provided in the "sample data" by the OP.) Is the hyphen the "first" hyphen or the "last" hyphen in that case? Or is it that if there are fewer than two hyphens, the output should be NULL?

Cheers,   mathguy-ro

Rafiq D Apr 25 2016 — edited on Apr 25 2016

Hi User7375700,

For the specific data given, this query will work for you.

select id_1, REGEXP_SUBSTR(id_1, '\d(.+)\d') id_2 from test1;

This may not work for all cases but for now it will help in your dataset

Frank Kulash Apr 25 2016 — edited on Apr 25 2016

Hi,

mathguy-ro wrote:

The OP's sample data already has values with no hyphens, in which case this solution won't work... Not important for the first hyphen, but instr(...) will return 0 for the second hyphen if there is no second hyphen, and that will cause a problem. (No first hyphen is OK because the first instr will return 0 then, which is OK.)

Cheers,   mathguy-ro

Depends on what you mean by "won't work".  The query in reply #1 returns NULL when id_1 contains fewer than 2 hyphens, which may be what OP wants.

Whether the sample data has 3 rows (all with at least 2 hyphens) or 5 (some without hyphens) doesn't change that.  The message is ambiguous, and illustrates why it's so important to post CREATE TABLE and INSERT statements for the sample data, or a WITH clause, like Chris did.

mathguy Apr 25 2016 — edited on Apr 25 2016

You are right; when I first wrote that, I had something in mind like ABD-2201002 and thought it should return the second part. Later I realized the OP did not say that (he didn't cover the case of a single hyphen), I re-read your message and I saw you asked specifically about that case, I realized I needed to clarify my reply to you, I added the bottom part, but neglected to "retract" the first part. Waiting for the OP to clarify... 

Cheers,   mathguy-ro

Rafiq D Apr 25 2016

here is a more general solution that I think will also work fine:

CREATE TABLE test1(

      id_1  VARCHAR2(60)

    , id_2  VARCHAR2(60)

);

INSERT INTO test1(id_1)

VALUES('S-0000125463-ALL');

INSERT INTO test1(id_1)

VALUES('SS-0000125464A-1-SEGA');

INSERT INTO test1(id_1)

VALUES('KBC-000025634B-2-BALL');

select id_1, case

                when regexp_like(id_1,'\-\d{1}\-') then regexp_substr(id_1,'(.*\-)(.*\-\d+)(\-\w*$)',1,1,'i',2)

                when regexp_like(id_1,'(.*\-)(\d+)(\-\w*$)') then regexp_substr(id_1,'(.*\-)(\d+)(\-\w*$)',1,1,'i',2)

            end

    as id_2

from test1;

now you can do an update like so

update test1 ot set id_2 = (

select case

                when regexp_like(id_1,'\-\d{1}\-') then regexp_substr(id_1,'(.*\-)(.*\-\d+)(\-\w*$)',1,1,'i',2)

                when regexp_like(id_1,'(.*\-)(\d+)(\-\w*$)') then regexp_substr(id_1,'(.*\-)(\d+)(\-\w*$)',1,1,'i',2)

            end

from test1 it where ot.id_1 = it.id_1);

SELECT * FROM test1;

Vysakh Suresh - 3035408 Apr 25 2016 — edited on Apr 25 2016

Dear Frank,

Then a case statement will do the job for us ritee..?

WITH got_pos AS  (

   SELECT   id_1, INSTR (id_1, '-')+1  AS start_pos,

            INSTR (id_1, '-', -1)     AS end_pos

            FROM temp)

SELECT id_1,

               CASE WHEN REGEXP_COUNT(id_1,'-')=1 THEN SUBSTR ( id_1, start_pos)

                          ELSE SUBSTR ( id_1, start_pos, end_pos - start_pos)

               END AS id_2

FROM got_pos;

Paulzip Apr 25 2016 — edited on Apr 25 2016
Answer

Here's my attempt.  Unlike most of the solutions, I handle single hyphen and no hyphen situations...

with my_data(ID_1) as (

  select 'S-0000125463-ALL' from dual union all

  select 'SS-0000125464A-1-SEGA' from dual union all

  select 'KBC-000025634B-2-BALL' from dual union all

  select 'SSX-0000125464B' from dual union all

  select '0000125464B-FFG' from dual union all

  select '0123456789' from dual union all

  select '1' from dual union all

  select null from dual

)

select ID_1, substr(id_1, first_hyph + 1, case when last_hyph <= first_hyph then length(id_1) else last_hyph - 1 end - first_hyph) ID_2

from (select id_1, instr(id_1,'-') first_hyph, instr(id_1,'-', -1) last_hyph

      from my_data) p

ID_1ID_2
S-0000125463-ALL0000125463
SS-0000125464A-1-SEGA0000125464A-1
KBC-000025634B-2-BALL000025634B-2
SSX-0000125464B0000125464B
0000125464B-FFGFFG
01234567890123456789
11
--

8 rows selected.

Marked as Answer by user7375700 · Sep 27 2020
Frank Kulash Apr 25 2016 — edited on Apr 25 2016

Hi,

Vysakh Suresh - 3035408 wrote:

Dear Frank,

Then a case statement will do the job for us ritee..?

got_pos AS  (

   SELECT   id_1, INSTR (id_1, '-')+1  AS start_pos,

            INSTR (id_1, '-', -1)     AS end_pos

            FROM temp)

SELECT id_1,

               CASE WHEN REGEXP_COUNT(id_1,'-')=1 THEN SUBSTR ( id_1, start_pos)

                          ELSE SUBSTR ( id_1, start_pos, end_pos - start_pos)

               END AS id_2

FROM got_pos;

What exactly are you trying to do?  OP never said, and you didn't either.  When I don't know where you're trying to go, then I can't give you very good directions.

Are you missing something (perhaps the keyword WITH) at the very beginning of that query?

Don't use regular expressions unless you really need to.   If you need to distinguish letters from digits (as in reply #4), then regular expressions might be a good idea.  If you're just looking for hyphens, then there are faster ways, perhaps just as easy.  For example, if you want to see if there are fewer than 2 hyphens then

start_pos <= end_pos

is TRUE if (and only if) id_1 contains 2 or more.

End pos will be 0 if there are no hyphens, and end_pos will be greater than 0 (and equal to start_pos - 1) when there is exactly 1 hyphen.

So, if you want

  • everything between the first and last hyphens (when there are 2 or more), or
  • everything after the hyphen (when there is exactly 1), or
  • the string itself (when it doesn't contain a hyphen)

then you could use a CASE expression (which is different than a CASE statement) like this:

WITH    got_pos     AS

(

    SELECT   id_1

    ,         INSTR (id_1, '-')     +1  AS start_pos

    ,         INSTR (id_1, '-', -1)     AS end_pos

    FROM      table_x

)

SELECT  id_1

,       CASE

            WHEN  start_pos  > end_pos

            THEN  SUBSTR (id_1, start_pos)

            ELSE  SUBSTR (id_1, start_pos, end_pos - start_pos)

        END     AS id_2

FROM    got_pos;

Frank Kulash Apr 25 2016 — edited on Apr 25 2016

Hi,

Paulzip wrote:

Here's my attempt.  Unlike most of the solutions, I handle single hyphen and no hyphen situations...

...

What do you mean?  Don't all of the solutions posted so far handle those situations? 

Paulzip Apr 25 2016

Frank Kulash wrote:

What do you mean?  Don't all of the solutions posted so far handle those situations?

If you think OP would rather have nulls for both those situations (I don't), then perhaps.

Rafiq D Apr 25 2016

The OP said

separator character is always "-" but I can also have a "-" in the second part before the last "-". 

which means the string is always in three parts separated by '-' except that the second part can also include a hyphen in which case the text before the hyphen is alpha and that following the hyphen, numeric. So our solutions were specific to his data structure. To accurately extract a pattern from text, the text must be in specific structure that is recognizable by regexp or any other text recognizing fns. Expanding the scope of the solution is good but it may not be the solution. For example, 223-aajh-009987u-9-88uujh-00009878 is obviously out of scope in this situation.

Paulzip Apr 25 2016

Rafiq D wrote:

The OP said

separator character is always "-" but I can also have a "-" in the second part before the last "-".

which means the string is always in three parts separated by '-' except that the second part can also include a hyphen in which case the text before the hyphen is alpha and that following the hyphen, numeric. So our solutions were specific to his data structure. To accurately extract a pattern from text, the text must be in specific structure that is recognizable by regexp or any other text recognizing fns. Expanding the scope of the solution is good but it may not be the solution. For example, 223-aajh-009987u-9-88uujh-00009878 is obviously out of scope in this situation.

No it doesn't mean there is always three parts, you cannot say that for certain without further clarification from the OP! There is ambiguity.  You could be correct, but you might not be.

If I said to you, my CSV files use a comma separator, does that mean records with single fields should be ignored? NO!

I could read his statement as :

If there is something to separate, then the separator character is always "-", but in those cases I can also have a "-" in the second part before the last "-"

Rafiq D Apr 25 2016 — edited on Apr 25 2016

Hi Paul,

There is no ambiguity in the OP's statement. It's clear and his example data even make clearer. There's always three parts. user...can you mark one of the answers as correct to close this thread? At least some of the solutions work.

Paulzip Apr 25 2016 — edited on Apr 25 2016

He has given you 3 records as an example and so you are making all of your assumptions on his data based on that?!!  I don't know where you work, but in my job that could prove disastrous.  He could have 15 million records in his dataset with combinations you or he hasn't thought of.  Like I said, you COULD be correct, but you MIGHT NOT be.

If I had a penny for every time I've seen an OP have to elaborate further on this forum, because answers only fit the simple example he's given and not his actual problem, I'd buy a bottle of champagne to celebrate.

Rafiq D Apr 25 2016

Hi Paul, no! I am not making any assumptions at all...and no, he didn't only give examples, he made a clear statement regarding the structure of the data. Yes, there may be data in some other structure and I am not disputing that but our answers are specific to his requirements as stated and exemplified. For that, we cannot be wrong. Read the statement together with the example and tell us your interpretation.

Paulzip Apr 25 2016 — edited on Apr 25 2016

Hi Rafiq, regardless of your opinion, my solution is both safer and much faster than yours.  Why safer?  Because if OP did something like updated data based on solutions, and you are correct - my solution STILL gives the correct answer.  If I'm correct, data doesn't get nulled and information isn't lost.  Nulls can be intractable.... I've fixed mangled data where there is data left after something like that, nulls can be a different kettle of fish and a nightmare. 


Seen it, had it, done it, bought the T-Shirt.

mathguy Apr 25 2016 — edited on Apr 25 2016

@Frank   -   lol, only now after the back-and-forth between other posters did I realize what you meant by "whether the sample data has 3 rows or 5" - I didn't get it even after you explained it. We definitely need "create table" statements... 

1 - 19
Locked Post
New comments cannot be posted to this locked post.

Post Details

Locked on May 23 2016
Added on Apr 25 2016
19 comments
4,861 views