This discussion is archived
1 2 Previous Next 16 Replies Latest reply: May 16, 2013 5:24 PM by davidp 2 RSS

Less tedious code for creating individual variables from strings?

652774 Newbie
Currently Being Moderated
Hi, we are trying to convert code stored as a long string of 0's and 1's in one variable into separate variables. Currently we have:

var_string
0000
0100
null
1111

create table newtable as select
case when var_string is null then '0' else substr(var_string,1,1) end as var1,
case when var_string null then '0' else substr(var_string,2,1) end as var2,
case when var_string is null then '0' else substr(var_string,3,1) end as var3,
case when var_string is null then '0' else substr(var_string,4,1) end as var4
from oldtable;

Could someone think of a less tedious way to do this? We have several hundred of these and it is tedious to do it this way.

-Mary
  • 1. Re: Less tedious code for creating individual variables from strings?
    rp0428 Guru
    Currently Being Moderated
    >
    Hi, we are trying to convert code stored as a long string of 0's and 1's in one variable into separate variables. Currently we have:

    var_string
    0000
    0100
    null
    1111

    create table newtable as select
    case when var_string is null then '0' else substr(var_string,1,1) end as var1,
    case when var_string null then '0' else substr(var_string,2,1) end as var2,
    case when var_string is null then '0' else substr(var_string,3,1) end as var3,
    case when var_string is null then '0' else substr(var_string,4,1) end as var4
    from oldtable;

    Could someone think of a less tedious way to do this? We have several hundred of these and it is tedious to do it this way.
    >
    No.

    You need to pick each character off separately.

    The table should be created manually using standard DDL. Then you can just insert the data.

    Can you provide more information about the actual problem you are dealing with?

    How did the data wind up that way to begin with? is it really binary data where each digit is ONLY a one or a zero?

    Why would you want to store each digit in a separate column? How is the data going to be used AFTER it is stored in the table?
  • 2. Re: Less tedious code for creating individual variables from strings?
    Frank Kulash Guru
    Currently Being Moderated
    Hi, Mary,
    ml23 wrote:
    Hi, we are trying to convert code stored as a long string of 0's and 1's in one variable into separate variables. Currently we have:

    var_string
    0000
    0100
    null
    1111

    create table newtable as select
    case when var_string is null then '0' else substr(var_string,1,1) end as var1,
    case when var_string null then '0' else substr(var_string,2,1) end as var2,
    case when var_string is null then '0' else substr(var_string,3,1) end as var3,
    case when var_string is null then '0' else substr(var_string,4,1) end as var4
    from oldtable;

    Could someone think of a less tedious way to do this? We have several hundred of these and it is tedious to do it this way.
    Do you mean you have hundreds of columns in newtable?
    If so, I don't see any way around having hundreds of expressions.
    They can be a little shorter if you use NVL instead of CASE, like this:
     CREATE TABLE     newtable
    AS
    SELECT     NVL (SUBSTR (var_string, 1, 1), '0')     AS var1
    ,     NVL (SUBSTR (var_string, 2, 1), '0')     AS var2
    ,     NVL (SUBSTR (var_string, 3, 1), '0')     AS var3
    ,     NVL (SUBSTR (var_string, 4, 1), '0')     AS var4
    FROM     oldtable;
    This assumes that, if var_string is not NULL, but is shorter than expected, you want a '0' in the missing column.

    You could also do the functions (NVL, SUBSTR) just once, by unpivoting the data and then pivoting it back. Exactly how to do that depends on your Oracle version.


    Post CREATE TABLE and INSERT statements for a little sample data, and the results you want from that data (that is, the contents of newtable when everything is finsihed). If you really have 400 columns (and strings up to 400 characters long), just say so, but continue to post sample data that only has 4 columns. Include some 1-, 2- or 3-character var_strings in your sample data, if that's a situation that can occur in your real table.
    Post sample data and results like this whenever you have a question, and always say which version of Oracle you're using (e.g. 11.2.0.2.0).
    See the forum FAQ {message:id=9360002}
  • 3. Re: Less tedious code for creating individual variables from strings?
    652774 Newbie
    Currently Being Moderated
    It came from a SAS dataset; but they can use arrays in SAS and more easily convert this into variables. I don't see that there are arrays, or lists in Oracle.

    It is for a healthcare analysis, each 0/1 is a flag indicating a condition that the patient has; so we could convert it into variable names like each patient history of disease, i.e, had breast cancer history, had heart disease history, and so on, so I hope to eventually get into more descriptive variable names, but it is the having to do the case... on each variable that is tedious. We will use it to cluster different kinds of patients together. Will give the nvl a try.
  • 4. Re: Less tedious code for creating individual variables from strings?
    652774 Newbie
    Currently Being Moderated
    Here's a test table:

    create table test
    (bitstring varchar2(10));
    commit;

    insert into test(bitstring) values ('0000');
    insert into test(bitstring) values ('1000');
    insert into test(bitstring) values ('0111');
    insert into test(bitstring) values (null);
    commit;
  • 5. Re: Less tedious code for creating individual variables from strings?
    rp0428 Guru
    Currently Being Moderated
    >
    It came from a SAS dataset; but they can use arrays in SAS and more easily convert this into variables. I don't see that there are arrays, or lists in Oracle.

    It is for a healthcare analysis, each 0/1 is a flag indicating a condition that the patient has; so we could convert it into variable names like each patient history of disease, i.e, had breast cancer history, had heart disease history, and so on, so I hope to eventually get into more descriptive variable names, but it is the having to do the case... on each variable that is tedious. We will use it to cluster different kinds of patients together
    >
    But you most important question is the one I ask that you didn't answer:
    >
    How is the data going to be used AFTER it is stored in the table?
    >
    The architecture should be driven by how the data is going to be used and how often it will be changed or updated.

    A simple vertical table of attributes can support what you suggest:
    CREATE TABLE all_conditions (
      condition_id number,
      description varchar2(50)
    );
    
    CREATE TABLE patient_conditions (
      patient_id number,
      condition_id number
    )
    For each of your conditions (var1, var2, etc) you would create a separate row in the PATIENT_CONDITIONS table for that patient. That PIVOTS/UNPIVOTS your long VARCHAR2 containing '0010000100. . .' into a single row for each position that contains a '1'.

    Then you can query the 'patient_conditions' table to find all patients that have a particular condition or combination of conditions.

    First define how the data will be used and base the architecture on that.

    Post an example query showing how the data will actually be used once you have it stored.
  • 6. Re: Less tedious code for creating individual variables from strings?
    652774 Newbie
    Currently Being Moderated
    I did reply to the after; we are using it to cluster patients together; i.e group them into categories.

    The nvl does look promising

    create table test
    (bitstring varchar2(10));
    commit;

    insert into test(bitstring) values ('0000');
    insert into test(bitstring) values ('1000');
    insert into test(bitstring) values ('0111');
    insert into test(bitstring) values (null);
    commit;

    CREATE TABLE     test2
    AS
    SELECT     NVL (SUBSTR (bitstring, 1, 1), '0')     AS var1
    ,     NVL (SUBSTR (bitstring, 2, 1), '0')     AS var2
    ,     NVL (SUBSTR (bitstring, 3, 1), '0')     AS var3
    ,     NVL (SUBSTR (bitstring, 4, 1), '0')     AS var4
    FROM     test;
  • 7. Re: Less tedious code for creating individual variables from strings?
    652774 Newbie
    Currently Being Moderated
    Yes, that makes sense to flip it long; I think we will need it wide for analysis, such as cluster analysis in Oracle ODM or Oracle R Enterprise would need variables wide.
  • 8. Re: Less tedious code for creating individual variables from strings?
    rp0428 Guru
    Currently Being Moderated
    >
    Yes, that makes sense to flip it long; I think we will need it wide for analysis, such as cluster analysis in Oracle ODM or Oracle R Enterprise would need variables wide.
    >
    Ok - 'wide versus long' is part of it. But you STILL need to know HOW the data will actually be used to know what the implementation options might be.

    Because with WIDE, using your approach each column has to have a name that is meaningful or the data can't be accessed at all. But the same data (same meaning your long VARCHAR2) could be stored in one column of type RAW or BLOB. Then instead of doing your SUBSTR when you put the data IN the table you do it when you query the data OUT of the table.

    For a extreme example of the power of using actual bitmapped data see my replies in these two threads.

    This thread shows an example that actually uses just ONE column of type NUMBER and can handle up to about 125 different attributes (you said you had hundreds).
    Re: Can Oracle convert literals to bind variables?

    This thread shows a 'star schema' example of using an individual column for each attribute (like you proposed) and shows how BITMAP indexes on those attributes can be used effectively to query the data.
    Re: How to design a fact table to keep track of active dimensions?
    create table star_fact (
        fact_key varchar2(30) DEFAULT 'N/A' not null,
        age      varchar2(30) DEFAULT 'N/A' not null,
        beer    varchar2(30) DEFAULT 'N/A' not null,
        marital_status varchar2(30) DEFAULT 'N/A' not null,
        softdrink varchar2(30) DEFAULT 'N/A' not null,
        state    varchar2(30) DEFAULT 'N/A' not null,
        summer_sport varchar2(30) DEFAULT 'N/A' not null,
        constraint star_fact_pk PRIMARY KEY (fact_key)
    );
    
    SQL> select rowid from star_fact where
      2   (state = 'CA') or (state = 'CO')
      3  and (age = 'young') and (marital_status = 'divorced')
      4  and (((summer_sport = 'baseball') and (softdrink = 'pepsi'))
      5  or ((summer_sport = 'golf') and (beer = 'coors')));
    Edited by: rp0428 on May 15, 2013 3:48 PM to fix second link.
  • 9. Re: Less tedious code for creating individual variables from strings?
    652774 Newbie
    Currently Being Moderated
    I can't get to the second thread. On the first thread, yes, really it would be nice to refer to the element in the string like this:

    Element_Attr_Id_Table_In(i)

    But I would want to do that in a SQL script, not in PL/SQL, and don't know how to do that without the substr function. Like in Python, one can have one dictionary value that is actually a list and you can refer to the elements within the list by its subscript. But this code referenced looks too hard to understand; I'm a statistical programmer and we are trying to move a lot of the analysis we used to do in SAS into Oracle ODM and Oracle R Enterprise, which our shop has. But I'm assuming giving you R code or ODM would not help you any as you are unlikely to know these, only that I need the data wide (but it could be subscripts, such as a[1], a[2], and so forth. )
  • 10. Re: Less tedious code for creating individual variables from strings?
    Frank Kulash Guru
    Currently Being Moderated
    Hi,
    ml23 wrote:
    I can't get to the second thread. On the first thread, yes, really it would be nice to refer to the element in the string like this:

    Element_Attr_Id_Table_In(i)

    But I would want to do that in a SQL script, not in PL/SQL,
    Why? If you don't explain your reasons, you're likely to get a solution that does what you said, but is even worse at meeting the requirements,
    and don't know how to do that without the substr function. Like in Python, one can have one dictionary value that is actually a list and you can refer to the elements within the list by its subscript.
    In Oracle, SUBSTR is the best way. There are alternatives, but they're less efficent and harder to code than SUBSTR.

    Again, you could get by with calling SUBSTR just once, regarless of how many columns you have in newtable, by unpivoting and pivoting. If you'd like help, post your Oracle version (e.g., 11.2.0.2.0).

    In your sample data, all the strings were 4 characters long. Does that mean, in your real data, all the strings will be the same length (unless they're NULL, of course)?
  • 11. Re: Less tedious code for creating individual variables from strings?
    rp0428 Guru
    Currently Being Moderated
    >
    I can't get to the second thread.
    >
    I fixed the link. Try it now.
    >
    On the first thread, yes, really it would be nice to refer to the element in the string like this:

    Element_Attr_Id_Table_In(i)

    But I would want to do that in a SQL script, not in PL/SQL, and don't know how to do that without the substr function.
    >
    Using just ONE column (e.g. VARCHAR2) you would either need to use the SUBSTR in the query itself or call a PL/SQL function that does it for you.

    You can use a NESTED table to store the data in one column but you don't want to do that. A nested table is just stored as another regular table anyway so you might as well do that yourself.
    Element_Attr_Id_Table_In(i)
    That is basically how you would reference the 'lookup' (child) table. The 'i' would be the 'CONDITION_ID' I used earlier.
    >
    Like in Python, one can have one dictionary value that is actually a list and you can refer to the elements within the list by its subscript. But this code referenced looks too hard to understand; I'm a statistical programmer and we are trying to move a lot of the analysis we used to do in SAS into Oracle ODM and Oracle R Enterprise, which our shop has. But I'm assuming giving you R code or ODM would not help you any as you are unlikely to know these, only that I need the data wide (but it could be subscripts, such as a[1], a[2], and so forth. )
    >
    Correct - that code would not help me at all.

    As already stated your options depend heavily on how often the data is changed or altered. BITMAP indexes are NOT appropriate for tables that have even a moderate amount of DDL on them. The are most effective for data warehouses where the indexes can be dropped before the DML/UPDATEs are done and then rebuilt afterward.

    If data is only updated in batch mode you also may be able to create some 'rolllup' or report-ready tables that already stored information about particulars combinations of attributes that might be useful.

    For example, consider a given subset of attributes (using my simple example above) such as:
     and (age = 'young') and (marital_status = 'divorced')
     and (((summer_sport = 'baseball') and (softdrink = 'pepsi'))
    Then a new 'contrived' attribute named 'young_divorced_baseball_pepsi_nuts" could be created that already represents that particular attribute combination.

    Then you could use that 'new' attribute name in a query instead of having to reproduce the original complex condition.

    Oracle also has the UTL_NLA (non-linear algebra I believe) package that supports some vector and matrix operations.
    http://docs.oracle.com/cd/B28359_01/appdev.111/b28419/u_nla.htm
    {quote}
    212 UTL_NLA

    The UTL_NLA package exposes a subset of the BLAS and LAPACK (Version 3.0) operations on vectors and matrices represented as VARRAYs.
    I include it only for reference; not because I think it is applicable to your use case.
    {quote}
    ---------------------------------------
    Updated May 15, 2013 4:33 PM to include simple example

    Here is some simple sample code to illustrate the WIDE method using a function that does the SUBSTR:
    {code}
    create table patient_attributes (patient_id number, attributes varchar2(100))

    insert into patient_attributes
    with data as (select 1, '0000' myVar from dual
    union all
    select 2, '0100' from dual
    union all
    select 3, '0001' from dual
    union all
    select 4, '1111' from dual)
    select * from data

    CREATE OR REPLACE FUNCTION get_attribute_value(attributes varchar2, attribute_id number)
    RETURN VARCHAR2 IS
    BEGIN
    return substr(attributes, attribute_id, 1);
    END;

    select patient_id from patient_attributes
    where get_attribute_value(attributes, 2) = '1'

    PATIENT_ID
    2
    4

    select patient_id from patient_attributes
    where get_attribute_value(attributes, 2) = '1'
    or get_attribute_value(attributes, 4) = '1'

    PATIENT_ID
    2
    3
    4
    {code}
  • 12. Re: Less tedious code for creating individual variables from strings?
    davidp 2 Pro
    Currently Being Moderated
    Could someone think of a less tedious way to do this? We have several hundred of these and it is tedious to do it this way.
    A bit less tedious, if you have oracle 11g (11.1 or 11.2), which is why people asked you several times what version you have:
    with colgen as (select level pos from dual connect by level <= 5)
    select * from (select bitstring, pos, NVL(substr(bitstring, pos, 1),'0') itm from colgen, test)
    pivot ( max(itm) for pos in (1 as col1, 2 as col2, 3 as col3, 4 as col4, 5 as col5))
    
    BITSTRING  COL1 COL2 COL3 COL4 COL5
    ---------- ---- ---- ---- ---- ----
               0    0    0    0    0
    1000       1    0    0    0    0
    0000       0    0    0    0    0
    0111       0    1    1    1    0
    Another option is to use PL/SQL to generate the text to define a view that presents each field with an appropriate name, using either PIVOT or your original 'tedious' SQL. It's not tedious if you don't ever have to type it.
    With function based indexes, you can actually create a function based bitmap index on expressions such as NVL(substr(var_string,4,1),'0') and then use the view to retrieve the rows by queries on those field names, without ever actually storing a table with each field! E.g. If the view has NVL(substr(var_string,4,1),'0') as col4 and you have a bitmap index on NVL(substr(var_string,4,1),'0') of the original table and similarly for the 5th character as col5, you can query when col4='1' and col5=0 and it will use the bitmap indexes to calculate the answer on the original table!
  • 13. Re: Less tedious code for creating individual variables from strings?
    652774 Newbie
    Currently Being Moderated
    Thanks, these last two approaches, using a function or pivot; look very promising. Will work on it this morning.
  • 14. Re: Less tedious code for creating individual variables from strings?
    652774 Newbie
    Currently Being Moderated
    I'm trying out this this morning:

    create or replace function bit(mask,position)
    return varchar2 is
    begin
    return nvl(substr(mask,position,1),0)
    end



    And then the selects call the function

    create table newtable as

    select
    bit(longstring,1) as var1
    bit(longstring,2) as var2


    Will let you know how it goes. I am on Oracle 11.2.

    -Mary
1 2 Previous Next

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points