SQL Substring from specific caracter

Frank Kulash

Hi,

user7375700 wrote:

Hi everybody,

i have a problem with sql and can't resolve.

how can i get ID_2 from ID_1 by sql ?

ID_1	ID_2
S-0000125463-ALL	0000125463
SS-0000125464A-1-SEGA	0000125464A-1
KBC-000025634B-2-BALL	000025634B-2

separator character is always "-" but I can also have a "-" in the second part before the last "-".

thanks for your help.

Use INSTR to find the position of the first hyphen, and also of the last hyphen. Then use SUBSTR to get all the characters in between those 2 points, like this:

WITH got_pos AS

(

SELECT id_1

, INSTR (id_1, '-') + 1 AS start_pos

, INSTR (id_1, '-', -1) AS end_pos

FROM table_x

)

SELECT id_1

, SUBSTR ( id_1

, start_pos

, end_pos - start_pos

) AS id_2

FROM got_pos

;

If you'd care to post CREATE TABLE and INSERT statements for the sample data, then I could rest it.

What if id_1 only has 1 hyphen, or none at all?

You could also use REGEXP_SUBSTR or REGEXP_REPLACE, but it will be less efficient, even if it doesn't as much code.

Chris Hunt

One way to do it:

with my_data (id_1) as
(select 'S-0000125463-ALL' from dual union  all
select 'SS-0000125464A-1-SEGA' from dual union  all
select 'KBC-000025634B-2-BALL' from dual)
-- end of test data --
select id_1,
      substr(id_1a,1,instr(id_1a,'-',-1)-1)         -- Remove the part including and after the last hyphen
from (select id_1,
             substr(id_1,instr(id_1,'-')+1) id_1a   -- Remove the part up to and including the first hyphen
      from   my_data)

mathguy

The OP's sample data already has values with no hyphens, in which case this solution won't work... Not important for the first hyphen, but instr(...) will return 0 for the second hyphen if there is no second hyphen, and that will cause a problem. (No first hyphen is OK because the first instr will return 0 then, which is OK.)

The really interesting question is what happens when there is only one hyphen. (No such value provided in the "sample data" by the OP.) Is the hyphen the "first" hyphen or the "last" hyphen in that case? Or is it that if there are fewer than two hyphens, the output should be NULL?

Cheers, mathguy-ro

Rafiq D

Hi User7375700,

For the specific data given, this query will work for you.

select id_1, REGEXP_SUBSTR(id_1, '\d(.+)\d') id_2 from test1;

This may not work for all cases but for now it will help in your dataset

Frank Kulash

Hi,

mathguy-ro wrote:

The OP's sample data already has values with no hyphens, in which case this solution won't work... Not important for the first hyphen, but instr(...) will return 0 for the second hyphen if there is no second hyphen, and that will cause a problem. (No first hyphen is OK because the first instr will return 0 then, which is OK.)

Cheers,   mathguy-ro

Depends on what you mean by "won't work". The query in reply #1 returns NULL when id_1 contains fewer than 2 hyphens, which may be what OP wants.

Whether the sample data has 3 rows (all with at least 2 hyphens) or 5 (some without hyphens) doesn't change that. The message is ambiguous, and illustrates why it's so important to post CREATE TABLE and INSERT statements for the sample data, or a WITH clause, like Chris did.

mathguy

You are right; when I first wrote that, I had something in mind like ABD-2201002 and thought it should return the second part. Later I realized the OP did not say that (he didn't cover the case of a single hyphen), I re-read your message and I saw you asked specifically about that case, I realized I needed to clarify my reply to you, I added the bottom part, but neglected to "retract" the first part. Waiting for the OP to clarify...

Cheers, mathguy-ro

Rafiq D

here is a more general solution that I think will also work fine:

CREATE TABLE test1(
      id_1  VARCHAR2(60)
    , id_2  VARCHAR2(60)
);

INSERT INTO test1(id_1)
VALUES('S-0000125463-ALL');

INSERT INTO test1(id_1)
VALUES('SS-0000125464A-1-SEGA');

INSERT INTO test1(id_1)
VALUES('KBC-000025634B-2-BALL');

select id_1, case
                when regexp_like(id_1,'\-\d{1}\-') then regexp_substr(id_1,'(.*\-)(.*\-\d+)(\-\w*$)',1,1,'i',2)
                when regexp_like(id_1,'(.*\-)(\d+)(\-\w*$)') then regexp_substr(id_1,'(.*\-)(\d+)(\-\w*$)',1,1,'i',2)
            end
    as id_2
from test1;

now you can do an update like so

update test1 ot set id_2 = (
select case
                when regexp_like(id_1,'\-\d{1}\-') then regexp_substr(id_1,'(.*\-)(.*\-\d+)(\-\w*$)',1,1,'i',2)
                when regexp_like(id_1,'(.*\-)(\d+)(\-\w*$)') then regexp_substr(id_1,'(.*\-)(\d+)(\-\w*$)',1,1,'i',2)
            end
from test1 it where ot.id_1 = it.id_1);

SELECT * FROM test1;

Vysakh Suresh - 3035408

Dear Frank,

Then a case statement will do the job for us ritee..?

WITH got_pos AS (

SELECT id_1, INSTR (id_1, '-')+1 AS start_pos,

INSTR (id_1, '-', -1) AS end_pos

FROM temp)

SELECT id_1,

CASE WHEN REGEXP_COUNT(id_1,'-')=1 THEN SUBSTR ( id_1, start_pos)

ELSE SUBSTR ( id_1, start_pos, end_pos - start_pos)

END AS id_2

FROM got_pos;

Paulzip

Answer

Here's my attempt. Unlike most of the solutions, I handle single hyphen and no hyphen situations...

with my_data(ID_1) as (

select 'S-0000125463-ALL' from dual union all

select 'SS-0000125464A-1-SEGA' from dual union all

select 'KBC-000025634B-2-BALL' from dual union all

select 'SSX-0000125464B' from dual union all

select '0000125464B-FFG' from dual union all

select '0123456789' from dual union all

select '1' from dual union all

select null from dual

)

select ID_1, substr(id_1, first_hyph + 1, case when last_hyph <= first_hyph then length(id_1) else last_hyph - 1 end - first_hyph) ID_2

from (select id_1, instr(id_1,'-') first_hyph, instr(id_1,'-', -1) last_hyph

from my_data) p

ID_1	ID_2
S-0000125463-ALL	0000125463
SS-0000125464A-1-SEGA	0000125464A-1
KBC-000025634B-2-BALL	000025634B-2
SSX-0000125464B	0000125464B
0000125464B-FFG	FFG
0123456789	0123456789
1	1
-	-

8 rows selected.

Marked as Answer by user7375700 · Sep 27 2020

Frank Kulash

Hi,

Vysakh Suresh - 3035408 wrote:

Dear Frank,

Then a case statement will do the job for us ritee..?

got_pos AS  (
   SELECT   id_1, INSTR (id_1, '-')+1  AS start_pos,
            INSTR (id_1, '-', -1)     AS end_pos
            FROM temp)
SELECT id_1,
               CASE WHEN REGEXP_COUNT(id_1,'-')=1 THEN SUBSTR ( id_1, start_pos)
                          ELSE SUBSTR ( id_1, start_pos, end_pos - start_pos)
               END AS id_2
FROM got_pos;

What exactly are you trying to do? OP never said, and you didn't either. When I don't know where you're trying to go, then I can't give you very good directions.

Are you missing something (perhaps the keyword WITH) at the very beginning of that query?

Don't use regular expressions unless you really need to. If you need to distinguish letters from digits (as in reply #4), then regular expressions might be a good idea. If you're just looking for hyphens, then there are faster ways, perhaps just as easy. For example, if you want to see if there are fewer than 2 hyphens then

start_pos <= end_pos

is TRUE if (and only if) id_1 contains 2 or more.

End pos will be 0 if there are no hyphens, and end_pos will be greater than 0 (and equal to start_pos - 1) when there is exactly 1 hyphen.

So, if you want

everything between the first and last hyphens (when there are 2 or more), or
everything after the hyphen (when there is exactly 1), or
the string itself (when it doesn't contain a hyphen)

then you could use a CASE expression (which is different than a CASE statement) like this:

WITH got_pos AS

(

SELECT id_1

, INSTR (id_1, '-') +1 AS start_pos

, INSTR (id_1, '-', -1) AS end_pos

FROM table_x

)

SELECT id_1

, CASE

WHEN start_pos > end_pos

THEN SUBSTR (id_1, start_pos)

ELSE SUBSTR (id_1, start_pos, end_pos - start_pos)

END AS id_2

FROM got_pos;

1 person found this helpful

Frank Kulash

Hi,

Paulzip wrote:

Here's my attempt. Unlike most of the solutions, I handle single hyphen and no hyphen situations...

...

What do you mean? Don't all of the solutions posted so far handle those situations?

Paulzip

Frank Kulash wrote:

What do you mean? Don't all of the solutions posted so far handle those situations?

If you think OP would rather have nulls for both those situations (I don't), then perhaps.

Rafiq D

The OP said

separator character is always "-" but I can also have a "-" in the second part before the last "-".

which means the string is always in three parts separated by '-' except that the second part can also include a hyphen in which case the text before the hyphen is alpha and that following the hyphen, numeric. So our solutions were specific to his data structure. To accurately extract a pattern from text, the text must be in specific structure that is recognizable by regexp or any other text recognizing fns. Expanding the scope of the solution is good but it may not be the solution. For example, 223-aajh-009987u-9-88uujh-00009878 is obviously out of scope in this situation.

Paulzip

Rafiq D wrote:

The OP said

separator character is always "-" but I can also have a "-" in the second part before the last "-".

which means the string is always in three parts separated by '-' except that the second part can also include a hyphen in which case the text before the hyphen is alpha and that following the hyphen, numeric. So our solutions were specific to his data structure. To accurately extract a pattern from text, the text must be in specific structure that is recognizable by regexp or any other text recognizing fns. Expanding the scope of the solution is good but it may not be the solution. For example, 223-aajh-009987u-9-88uujh-00009878 is obviously out of scope in this situation.

No it doesn't mean there is always three parts, you cannot say that for certain without further clarification from the OP! There is ambiguity. You could be correct, but you might not be.

If I said to you, my CSV files use a comma separator, does that mean records with single fields should be ignored? NO!

I could read his statement as :

If there is something to separate, then the separator character is always "-", but in those cases I can also have a "-" in the second part before the last "-"

Rafiq D

Hi Paul,

There is no ambiguity in the OP's statement. It's clear and his example data even make clearer. There's always three parts. user...can you mark one of the answers as correct to close this thread? At least some of the solutions work.

Paulzip

He has given you 3 records as an example and so you are making all of your assumptions on his data based on that?!! I don't know where you work, but in my job that could prove disastrous. He could have 15 million records in his dataset with combinations you or he hasn't thought of. Like I said, you COULD be correct, but you MIGHT NOT be.

If I had a penny for every time I've seen an OP have to elaborate further on this forum, because answers only fit the simple example he's given and not his actual problem, I'd buy a bottle of champagne to celebrate.

Rafiq D

Hi Paul, no! I am not making any assumptions at all...and no, he didn't only give examples, he made a clear statement regarding the structure of the data. Yes, there may be data in some other structure and I am not disputing that but our answers are specific to his requirements as stated and exemplified. For that, we cannot be wrong. Read the statement together with the example and tell us your interpretation.

Paulzip

Hi Rafiq, regardless of your opinion, my solution is both safer and much faster than yours. Why safer? Because if OP did something like updated data based on solutions, and you are correct - my solution STILL gives the correct answer. If I'm correct, data doesn't get nulled and information isn't lost. Nulls can be intractable.... I've fixed mangled data where there is data left after something like that, nulls can be a different kettle of fish and a nightmare.

Seen it, had it, done it, bought the T-Shirt.

1 person found this helpful

mathguy

@Frank - lol, only now after the back-and-forth between other posters did I realize what you meant by "whether the sample data has 3 rows or 5" - I didn't get it even after you explained it. We definitely need "create table" statements...

SQL & PL/SQL

SQL Substring from specific caracter

Comments

Post Details