1 2 Previous Next 16 Replies Latest reply: May 16, 2013 7:24 PM by davidp 2 RSS

    Less tedious code for creating individual variables from strings?

    652774
      Hi, we are trying to convert code stored as a long string of 0's and 1's in one variable into separate variables. Currently we have:

      var_string
      0000
      0100
      null
      1111

      create table newtable as select
      case when var_string is null then '0' else substr(var_string,1,1) end as var1,
      case when var_string null then '0' else substr(var_string,2,1) end as var2,
      case when var_string is null then '0' else substr(var_string,3,1) end as var3,
      case when var_string is null then '0' else substr(var_string,4,1) end as var4
      from oldtable;

      Could someone think of a less tedious way to do this? We have several hundred of these and it is tedious to do it this way.

      -Mary
        • 1. Re: Less tedious code for creating individual variables from strings?
          rp0428
          >
          Hi, we are trying to convert code stored as a long string of 0's and 1's in one variable into separate variables. Currently we have:

          var_string
          0000
          0100
          null
          1111

          create table newtable as select
          case when var_string is null then '0' else substr(var_string,1,1) end as var1,
          case when var_string null then '0' else substr(var_string,2,1) end as var2,
          case when var_string is null then '0' else substr(var_string,3,1) end as var3,
          case when var_string is null then '0' else substr(var_string,4,1) end as var4
          from oldtable;

          Could someone think of a less tedious way to do this? We have several hundred of these and it is tedious to do it this way.
          >
          No.

          You need to pick each character off separately.

          The table should be created manually using standard DDL. Then you can just insert the data.

          Can you provide more information about the actual problem you are dealing with?

          How did the data wind up that way to begin with? is it really binary data where each digit is ONLY a one or a zero?

          Why would you want to store each digit in a separate column? How is the data going to be used AFTER it is stored in the table?
          • 2. Re: Less tedious code for creating individual variables from strings?
            Frank Kulash
            Hi, Mary,
            ml23 wrote:
            Hi, we are trying to convert code stored as a long string of 0's and 1's in one variable into separate variables. Currently we have:

            var_string
            0000
            0100
            null
            1111

            create table newtable as select
            case when var_string is null then '0' else substr(var_string,1,1) end as var1,
            case when var_string null then '0' else substr(var_string,2,1) end as var2,
            case when var_string is null then '0' else substr(var_string,3,1) end as var3,
            case when var_string is null then '0' else substr(var_string,4,1) end as var4
            from oldtable;

            Could someone think of a less tedious way to do this? We have several hundred of these and it is tedious to do it this way.
            Do you mean you have hundreds of columns in newtable?
            If so, I don't see any way around having hundreds of expressions.
            They can be a little shorter if you use NVL instead of CASE, like this:
             CREATE TABLE     newtable
            AS
            SELECT     NVL (SUBSTR (var_string, 1, 1), '0')     AS var1
            ,     NVL (SUBSTR (var_string, 2, 1), '0')     AS var2
            ,     NVL (SUBSTR (var_string, 3, 1), '0')     AS var3
            ,     NVL (SUBSTR (var_string, 4, 1), '0')     AS var4
            FROM     oldtable;
            This assumes that, if var_string is not NULL, but is shorter than expected, you want a '0' in the missing column.

            You could also do the functions (NVL, SUBSTR) just once, by unpivoting the data and then pivoting it back. Exactly how to do that depends on your Oracle version.


            Post CREATE TABLE and INSERT statements for a little sample data, and the results you want from that data (that is, the contents of newtable when everything is finsihed). If you really have 400 columns (and strings up to 400 characters long), just say so, but continue to post sample data that only has 4 columns. Include some 1-, 2- or 3-character var_strings in your sample data, if that's a situation that can occur in your real table.
            Post sample data and results like this whenever you have a question, and always say which version of Oracle you're using (e.g. 11.2.0.2.0).
            See the forum FAQ {message:id=9360002}
            • 3. Re: Less tedious code for creating individual variables from strings?
              652774
              It came from a SAS dataset; but they can use arrays in SAS and more easily convert this into variables. I don't see that there are arrays, or lists in Oracle.

              It is for a healthcare analysis, each 0/1 is a flag indicating a condition that the patient has; so we could convert it into variable names like each patient history of disease, i.e, had breast cancer history, had heart disease history, and so on, so I hope to eventually get into more descriptive variable names, but it is the having to do the case... on each variable that is tedious. We will use it to cluster different kinds of patients together. Will give the nvl a try.
              • 4. Re: Less tedious code for creating individual variables from strings?
                652774
                Here's a test table:

                create table test
                (bitstring varchar2(10));
                commit;

                insert into test(bitstring) values ('0000');
                insert into test(bitstring) values ('1000');
                insert into test(bitstring) values ('0111');
                insert into test(bitstring) values (null);
                commit;
                • 5. Re: Less tedious code for creating individual variables from strings?
                  rp0428
                  >
                  It came from a SAS dataset; but they can use arrays in SAS and more easily convert this into variables. I don't see that there are arrays, or lists in Oracle.

                  It is for a healthcare analysis, each 0/1 is a flag indicating a condition that the patient has; so we could convert it into variable names like each patient history of disease, i.e, had breast cancer history, had heart disease history, and so on, so I hope to eventually get into more descriptive variable names, but it is the having to do the case... on each variable that is tedious. We will use it to cluster different kinds of patients together
                  >
                  But you most important question is the one I ask that you didn't answer:
                  >
                  How is the data going to be used AFTER it is stored in the table?
                  >
                  The architecture should be driven by how the data is going to be used and how often it will be changed or updated.

                  A simple vertical table of attributes can support what you suggest:
                  CREATE TABLE all_conditions (
                    condition_id number,
                    description varchar2(50)
                  );
                  
                  CREATE TABLE patient_conditions (
                    patient_id number,
                    condition_id number
                  )
                  For each of your conditions (var1, var2, etc) you would create a separate row in the PATIENT_CONDITIONS table for that patient. That PIVOTS/UNPIVOTS your long VARCHAR2 containing '0010000100. . .' into a single row for each position that contains a '1'.

                  Then you can query the 'patient_conditions' table to find all patients that have a particular condition or combination of conditions.

                  First define how the data will be used and base the architecture on that.

                  Post an example query showing how the data will actually be used once you have it stored.
                  • 6. Re: Less tedious code for creating individual variables from strings?
                    652774
                    I did reply to the after; we are using it to cluster patients together; i.e group them into categories.

                    The nvl does look promising

                    create table test
                    (bitstring varchar2(10));
                    commit;

                    insert into test(bitstring) values ('0000');
                    insert into test(bitstring) values ('1000');
                    insert into test(bitstring) values ('0111');
                    insert into test(bitstring) values (null);
                    commit;

                    CREATE TABLE     test2
                    AS
                    SELECT     NVL (SUBSTR (bitstring, 1, 1), '0')     AS var1
                    ,     NVL (SUBSTR (bitstring, 2, 1), '0')     AS var2
                    ,     NVL (SUBSTR (bitstring, 3, 1), '0')     AS var3
                    ,     NVL (SUBSTR (bitstring, 4, 1), '0')     AS var4
                    FROM     test;
                    • 7. Re: Less tedious code for creating individual variables from strings?
                      652774
                      Yes, that makes sense to flip it long; I think we will need it wide for analysis, such as cluster analysis in Oracle ODM or Oracle R Enterprise would need variables wide.
                      • 8. Re: Less tedious code for creating individual variables from strings?
                        rp0428
                        >
                        Yes, that makes sense to flip it long; I think we will need it wide for analysis, such as cluster analysis in Oracle ODM or Oracle R Enterprise would need variables wide.
                        >
                        Ok - 'wide versus long' is part of it. But you STILL need to know HOW the data will actually be used to know what the implementation options might be.

                        Because with WIDE, using your approach each column has to have a name that is meaningful or the data can't be accessed at all. But the same data (same meaning your long VARCHAR2) could be stored in one column of type RAW or BLOB. Then instead of doing your SUBSTR when you put the data IN the table you do it when you query the data OUT of the table.

                        For a extreme example of the power of using actual bitmapped data see my replies in these two threads.

                        This thread shows an example that actually uses just ONE column of type NUMBER and can handle up to about 125 different attributes (you said you had hundreds).
                        Re: Can Oracle convert literals to bind variables?

                        This thread shows a 'star schema' example of using an individual column for each attribute (like you proposed) and shows how BITMAP indexes on those attributes can be used effectively to query the data.
                        Re: How to design a fact table to keep track of active dimensions?
                        create table star_fact (
                            fact_key varchar2(30) DEFAULT 'N/A' not null,
                            age      varchar2(30) DEFAULT 'N/A' not null,
                            beer    varchar2(30) DEFAULT 'N/A' not null,
                            marital_status varchar2(30) DEFAULT 'N/A' not null,
                            softdrink varchar2(30) DEFAULT 'N/A' not null,
                            state    varchar2(30) DEFAULT 'N/A' not null,
                            summer_sport varchar2(30) DEFAULT 'N/A' not null,
                            constraint star_fact_pk PRIMARY KEY (fact_key)
                        );
                        
                        SQL> select rowid from star_fact where
                          2   (state = 'CA') or (state = 'CO')
                          3  and (age = 'young') and (marital_status = 'divorced')
                          4  and (((summer_sport = 'baseball') and (softdrink = 'pepsi'))
                          5  or ((summer_sport = 'golf') and (beer = 'coors')));
                        Edited by: rp0428 on May 15, 2013 3:48 PM to fix second link.
                        • 9. Re: Less tedious code for creating individual variables from strings?
                          652774
                          I can't get to the second thread. On the first thread, yes, really it would be nice to refer to the element in the string like this:

                          Element_Attr_Id_Table_In(i)

                          But I would want to do that in a SQL script, not in PL/SQL, and don't know how to do that without the substr function. Like in Python, one can have one dictionary value that is actually a list and you can refer to the elements within the list by its subscript. But this code referenced looks too hard to understand; I'm a statistical programmer and we are trying to move a lot of the analysis we used to do in SAS into Oracle ODM and Oracle R Enterprise, which our shop has. But I'm assuming giving you R code or ODM would not help you any as you are unlikely to know these, only that I need the data wide (but it could be subscripts, such as a[1], a[2], and so forth. )
                          • 10. Re: Less tedious code for creating individual variables from strings?
                            Frank Kulash
                            Hi,
                            ml23 wrote:
                            I can't get to the second thread. On the first thread, yes, really it would be nice to refer to the element in the string like this:

                            Element_Attr_Id_Table_In(i)

                            But I would want to do that in a SQL script, not in PL/SQL,
                            Why? If you don't explain your reasons, you're likely to get a solution that does what you said, but is even worse at meeting the requirements,
                            and don't know how to do that without the substr function. Like in Python, one can have one dictionary value that is actually a list and you can refer to the elements within the list by its subscript.
                            In Oracle, SUBSTR is the best way. There are alternatives, but they're less efficent and harder to code than SUBSTR.

                            Again, you could get by with calling SUBSTR just once, regarless of how many columns you have in newtable, by unpivoting and pivoting. If you'd like help, post your Oracle version (e.g., 11.2.0.2.0).

                            In your sample data, all the strings were 4 characters long. Does that mean, in your real data, all the strings will be the same length (unless they're NULL, of course)?
                            • 11. Re: Less tedious code for creating individual variables from strings?
                              rp0428
                              >
                              I can't get to the second thread.
                              >
                              I fixed the link. Try it now.
                              >
                              On the first thread, yes, really it would be nice to refer to the element in the string like this:

                              Element_Attr_Id_Table_In(i)

                              But I would want to do that in a SQL script, not in PL/SQL, and don't know how to do that without the substr function.
                              >
                              Using just ONE column (e.g. VARCHAR2) you would either need to use the SUBSTR in the query itself or call a PL/SQL function that does it for you.

                              You can use a NESTED table to store the data in one column but you don't want to do that. A nested table is just stored as another regular table anyway so you might as well do that yourself.
                              Element_Attr_Id_Table_In(i)
                              That is basically how you would reference the 'lookup' (child) table. The 'i' would be the 'CONDITION_ID' I used earlier.
                              >
                              Like in Python, one can have one dictionary value that is actually a list and you can refer to the elements within the list by its subscript. But this code referenced looks too hard to understand; I'm a statistical programmer and we are trying to move a lot of the analysis we used to do in SAS into Oracle ODM and Oracle R Enterprise, which our shop has. But I'm assuming giving you R code or ODM would not help you any as you are unlikely to know these, only that I need the data wide (but it could be subscripts, such as a[1], a[2], and so forth. )
                              >
                              Correct - that code would not help me at all.

                              As already stated your options depend heavily on how often the data is changed or altered. BITMAP indexes are NOT appropriate for tables that have even a moderate amount of DDL on them. The are most effective for data warehouses where the indexes can be dropped before the DML/UPDATEs are done and then rebuilt afterward.

                              If data is only updated in batch mode you also may be able to create some 'rolllup' or report-ready tables that already stored information about particulars combinations of attributes that might be useful.

                              For example, consider a given subset of attributes (using my simple example above) such as:
                               and (age = 'young') and (marital_status = 'divorced')
                               and (((summer_sport = 'baseball') and (softdrink = 'pepsi'))
                              Then a new 'contrived' attribute named 'young_divorced_baseball_pepsi_nuts" could be created that already represents that particular attribute combination.

                              Then you could use that 'new' attribute name in a query instead of having to reproduce the original complex condition.

                              Oracle also has the UTL_NLA (non-linear algebra I believe) package that supports some vector and matrix operations.
                              http://docs.oracle.com/cd/B28359_01/appdev.111/b28419/u_nla.htm
                              {quote}
                              212 UTL_NLA

                              The UTL_NLA package exposes a subset of the BLAS and LAPACK (Version 3.0) operations on vectors and matrices represented as VARRAYs.
                              I include it only for reference; not because I think it is applicable to your use case.
                              {quote}
                              ---------------------------------------
                              Updated May 15, 2013 4:33 PM to include simple example

                              Here is some simple sample code to illustrate the WIDE method using a function that does the SUBSTR:
                              {code}
                              create table patient_attributes (patient_id number, attributes varchar2(100))

                              insert into patient_attributes
                              with data as (select 1, '0000' myVar from dual
                              union all
                              select 2, '0100' from dual
                              union all
                              select 3, '0001' from dual
                              union all
                              select 4, '1111' from dual)
                              select * from data

                              CREATE OR REPLACE FUNCTION get_attribute_value(attributes varchar2, attribute_id number)
                              RETURN VARCHAR2 IS
                              BEGIN
                              return substr(attributes, attribute_id, 1);
                              END;

                              select patient_id from patient_attributes
                              where get_attribute_value(attributes, 2) = '1'

                              PATIENT_ID
                              2
                              4

                              select patient_id from patient_attributes
                              where get_attribute_value(attributes, 2) = '1'
                              or get_attribute_value(attributes, 4) = '1'

                              PATIENT_ID
                              2
                              3
                              4
                              {code}
                              • 12. Re: Less tedious code for creating individual variables from strings?
                                davidp 2
                                Could someone think of a less tedious way to do this? We have several hundred of these and it is tedious to do it this way.
                                A bit less tedious, if you have oracle 11g (11.1 or 11.2), which is why people asked you several times what version you have:
                                with colgen as (select level pos from dual connect by level <= 5)
                                select * from (select bitstring, pos, NVL(substr(bitstring, pos, 1),'0') itm from colgen, test)
                                pivot ( max(itm) for pos in (1 as col1, 2 as col2, 3 as col3, 4 as col4, 5 as col5))
                                
                                BITSTRING  COL1 COL2 COL3 COL4 COL5
                                ---------- ---- ---- ---- ---- ----
                                           0    0    0    0    0
                                1000       1    0    0    0    0
                                0000       0    0    0    0    0
                                0111       0    1    1    1    0
                                Another option is to use PL/SQL to generate the text to define a view that presents each field with an appropriate name, using either PIVOT or your original 'tedious' SQL. It's not tedious if you don't ever have to type it.
                                With function based indexes, you can actually create a function based bitmap index on expressions such as NVL(substr(var_string,4,1),'0') and then use the view to retrieve the rows by queries on those field names, without ever actually storing a table with each field! E.g. If the view has NVL(substr(var_string,4,1),'0') as col4 and you have a bitmap index on NVL(substr(var_string,4,1),'0') of the original table and similarly for the 5th character as col5, you can query when col4='1' and col5=0 and it will use the bitmap indexes to calculate the answer on the original table!
                                • 13. Re: Less tedious code for creating individual variables from strings?
                                  652774
                                  Thanks, these last two approaches, using a function or pivot; look very promising. Will work on it this morning.
                                  • 14. Re: Less tedious code for creating individual variables from strings?
                                    652774
                                    I'm trying out this this morning:

                                    create or replace function bit(mask,position)
                                    return varchar2 is
                                    begin
                                    return nvl(substr(mask,position,1),0)
                                    end



                                    And then the selects call the function

                                    create table newtable as

                                    select
                                    bit(longstring,1) as var1
                                    bit(longstring,2) as var2


                                    Will let you know how it goes. I am on Oracle 11.2.

                                    -Mary
                                    1 2 Previous Next