22 Replies - Latest reply on Sep 27, 2018 10:32 PM by sect55

    Trouble importing a delimited multiple line text file

    sect55

  I exported a Microsoft Access table to a delimited text file that generates multiple lines per record, since it has several LONG data type fields. How can I combine the multiple lines back into one row per record? I do not know when a record is finished, since it is not a fixed number of lines and the continuation lines have no character to distinguish them as continuations. The delimiter character is a pipe ("|"). I attached the file. The declaration of the table is as follows:

       

      CREATE TABLE HIST_SUBC
      (
        TAA_Key_Number       VARCHAR2(4000),
        TAA_Key_Suffix       VARCHAR2(4000),
        Submission           VARCHAR2(4000),
        Received             VARCHAR2(4000),
        Sent_to_Groups       VARCHAR2(4000),
        Due_from_Groups      VARCHAR2(4000),
        Due_Date             VARCHAR2(4000),
        To_grp01             VARCHAR2(4000),
        To_grp02             VARCHAR2(4000),
        To_grp03             VARCHAR2(4000),
        To_grp04             VARCHAR2(4000),
        To_grp05             VARCHAR2(4000),
        To_grp06             VARCHAR2(4000),
        To_grp07             VARCHAR2(4000),
        To_grp08             VARCHAR2(4000),
        To_grp09             VARCHAR2(4000),
        To_grp10             VARCHAR2(4000),
        To_grp11             VARCHAR2(4000),
        To_grp12             VARCHAR2(4000),
        To_grp13             VARCHAR2(4000),
        To_grp14             VARCHAR2(4000),
        To_grp15             VARCHAR2(4000),
        To_grp16             VARCHAR2(4000),
        From_grp01           VARCHAR2(4000),
        From_grp02           VARCHAR2(4000),
        From_grp03           VARCHAR2(4000),
        From_grp04           VARCHAR2(4000),
        From_grp05           VARCHAR2(4000),
        From_grp06           VARCHAR2(4000),
        From_grp07           VARCHAR2(4000),
        From_grp08           VARCHAR2(4000),
        From_grp09           VARCHAR2(4000),
        From_grp10           VARCHAR2(4000),
        From_grp11           VARCHAR2(4000),
        From_grp12           VARCHAR2(4000),
        From_grp13           VARCHAR2(4000),
        From_grp14           VARCHAR2(4000),
        From_grp15           VARCHAR2(4000),
        From_grp16           VARCHAR2(4000),
        Job_status           VARCHAR2(4000),
        Letter_Date          VARCHAR2(4000),
        SpecialInstruction   CLOB,
        Rider                CLOB,
        Rider_Completed      VARCHAR2(4000),
        Rider_Super_Signed   VARCHAR2(4000),
        Rider_Mgt_Signed     VARCHAR2(4000),
        Tenant_Letter_Faxed  VARCHAR2(4000),
        Tenant_Letter_Sent   VARCHAR2(4000),
        Tenant_Response      VARCHAR2(4000),
        Remarks              VARCHAR2(4000),
        Tag1                 VARCHAR2(4000),
        Tag2                 NUMBER,
        Tag3                 TIMESTAMP,
        Tag4                 NUMBER,
        Tag5                 VARCHAR2(4000),
        Rider_Text           CLOB,
        Remarks_Tag          VARCHAR2(4000),
        Tenant_Due_Date      VARCHAR2(4000),
        Comments_grp01       VARCHAR2(4000),
        Comments_grp02       VARCHAR2(4000),
        Comments_grp03       VARCHAR2(4000),
        Comments_grp04       VARCHAR2(4000),
        Comments_grp05       VARCHAR2(4000),
        Comments_grp06       VARCHAR2(4000),
        Comments_grp07       VARCHAR2(4000),
        Comments_grp08       VARCHAR2(4000),
        Comments_grp09       VARCHAR2(4000),
        Comments_grp10       VARCHAR2(4000),
        Comments_grp11       VARCHAR2(4000),
        Comments_grp12       VARCHAR2(4000),
        Comments_grp13       VARCHAR2(4000),
        Comments_grp14       VARCHAR2(4000),
        Comments_grp15       VARCHAR2(4000),
        Comments_grp16       VARCHAR2(4000),
        Grp01                VARCHAR2(4000),
        Grp02                VARCHAR2(4000),
        Grp03                VARCHAR2(4000),
        Grp04                VARCHAR2(4000),
        Grp05                VARCHAR2(4000),
        Grp06                VARCHAR2(4000),
        Grp07                VARCHAR2(4000),
        Grp08                VARCHAR2(4000),
        Grp09                VARCHAR2(4000),
        Grp10                VARCHAR2(4000),
        Grp11                VARCHAR2(4000),
        Grp12                VARCHAR2(4000),
        Grp13                VARCHAR2(4000),
        Grp14                VARCHAR2(4000),
        Grp15                VARCHAR2(4000),
        Grp16                VARCHAR2(4000),
        RR_Remarks           CLOB,
        Rider_Text_Copy      CLOB,
        Response_Sub         NUMBER,
        Received_Letter_Date VARCHAR2(4000),
        Rider_Requirements   CLOB,
        CopyTo               VARCHAR2(4000),
        Change_ID            VARCHAR2(4000),
        Change_Date          VARCHAR2(4000),
        Change_Time          VARCHAR2(4000)
      );

       

      I am using SQL Developer 18.2 / Oracle 11g

       

      Robert

        • 1. Re: Trouble importing a delimited multiple line text file

          If you don't know, and can't determine, when a row is finished then Oracle won't be able to either.

           

          Also - this question appears to have NOTHING to do with Sql Developer which is what this forum is for.

           

          Unless you can tie it in to sql developer somehow please move the thread to the sql and pl/sql forum.

          SQL & PL/SQL

          • 2. Re: Trouble importing a delimited multiple line text file
            sect55

            I put in sql developer because I was using the Import Data option when you right click on tables.

            Also, since I was using delimiters, I was hoping there was a way SQL Developer could include all lines until the next delimiter.

            • 3. Re: Trouble importing a delimited multiple line text file

              I put in sql developer because I was using the Import Data option when you right click on tables.

              Also, since I was using delimiters, I was hoping there was a way SQL Developer could include all lines until the next delimiter.

              It DOES include all data until the next delimiter.

               

              But if the data itself contains the delimiter character, then that character indicates NO MORE DATA for the row, so the data that follows is assigned to the next row.

               

              Delimited formats will NOT work properly if the data includes the characters used for delimiting the rows. And if the data includes the character used for separating the fields then the field data value needs to be enclosed in double quotes.

              • 4. Re: Trouble importing a delimited multiple line text file
                Mike Kutz

                I ran across an OpenSource project that "stopped testing on Oracle" because it didn't follow a specific RFC.

                As I understood the discussion, the RFC stated that you can include the row terminator within the field data as long as it is enclosed within double quotes.

                All other database vendors seem to have some way to allow such data to be imported.  That particular project used some code to convert the data to something that Oracle could handle.

                 

                (I wish I kept a link to everything).

                 

                For the OP, try the following

                • Export the data as XML.  INSERT .... SELECT ... FROM XMLTABLE( );
                • create a database link within the Oracle database and use INSERT...SELECT
                • create a connection to the Access database within SQL*Developer and use the Data Migration Wizard.

                 

                MK

                • 5. Re: Trouble importing a delimited multiple line text file

                  I ran across an OpenSource project that "stopped testing on Oracle" because it didn't follow a specific RFC.

                  It isn't just an Oracle issue. There are MANY implementers (even big ones like Informatica) that don't allow embedded row delimiters. They can't import the same data they just exported, and they can't even check/log that the exported data is using embedded delimiters.

                  As I understood the discussion, the RFC stated that you can include the row terminator within the field data as long as it is enclosed within double quotes.

                  Perhaps you are referring to RFC 4180 and this section?

                  6. Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes. For example:

                   

                    "aaa","b CRLF

                    bb","ccc" CRLF

                    zzz,yyy,xxx

                  Unfortunately that assumes THREE THINGS and two of them generally are NOT done:

                   

                  1. line breaks are actually used to delimit the rows - and as the RFC states some implementations use other characters

                   

                  2. the implementer parses by fields rather than by rows.

                   

                  3. the data producer actually checks the data for embedded delimiters and handles them properly.

                   

                  Items 2 and 3 above are what result in problems.
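
                  Point 2 is easy to demonstrate. A minimal sketch in Python (used here only as a neutral illustration): its csv module parses field-by-field, so the RFC 4180 example survives, while a naive line-based split does not:

                  ```python
                  import csv
                  import io

                  # The RFC 4180 example: the second field of the first record
                  # contains an embedded CRLF, protected by double quotes.
                  raw = '"aaa","b\r\nbb","ccc"\r\nzzz,yyy,xxx\r\n'

                  # A field-by-field parser treats the quoted CRLF as data,
                  # yielding two records.
                  rows = list(csv.reader(io.StringIO(raw, newline='')))

                  # A line-based split sees three "records" instead of two.
                  naive = raw.strip().split('\r\n')
                  ```

                  The csv reader is doing exactly what item 2 asks for: tokenizing by fields, and treating a line break as a record terminator only when it occurs outside quotes.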

                  All other database vendors seem to have some way to allow such data to be imported.

                   

                  Can't agree with that statement at all. I've worked with ALL of the major ones: Burroughs, IBM, Sperry, Oracle to name four of them. None of them do it correctly as I can EASILY produce data that once they export it in delimited format they choke when they try to import the file they just produced.

                   

                  Those four I mentioned don't even use the same standard among themselves. So some data that works on one can fail on a different one of the four.

                   

                  The SIMPLEST SOLUTION, which virtually no one implements, is to just define/allow an 'escape sequence' (a single escape character, or even a string of characters) and follow one basic rule when producing/consuming the file:

                   

                  1. a single occurrence of the escape sequence means 'take the next character as data, no matter what it is'

                   

                  Thus the escape sequence can be included as data simply by using two of them in a row.

                   

                  Of course the producer MUST examine/analyze every single character added to the file in order to know if it needs to be 'escaped' or not. Most vendors don't do that at all.

                   

                  1. it is tedious

                  2. it affects performance

                  3. they think it isn't needed since their data is 'clean'

                   

                  The performance issue is that you can't insert an escape sequence 'after the fact' without moving all of the following data. So if you are constructing a 'record' you have to construct the entire thing before you know if an escape sequence is needed.

                   

                  The programmatic way to address this is to use a byte buffer that has an escape sequence at the start. Then you add the 'data' after it and construct the row. Once the row is complete you will know if the escape was needed. If it was you write out the entire buffer including the escape sequence. If an escape wasn't needed you just skip the sequence bytes and write out the remaining buffer bytes.

                   

                  And when constructing each 'field' in the row you use a separate buffer in a similar manner. You can also use a separate buffer (of maximum length) for each field and an 'embedded' indicator for each field. Then you can write out the record by writing out the fields in order. If any field has its 'embedded' indicator set then you write out the escape sequence also - otherwise you don't.

                   

                  The vast majority of parsers I have seen/used try to parse the records first and then tokenize them into fields. That is a primary reason the double quote implementations fail if the data contains embedded record delimiters.

                   

                  Using the simple escape mechanism always works no matter what delimiters are used and no matter what the file format really is since it is 'stream based'. But almost no one implements that method.
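
                  A stream-based escape codec like the one described is only a few lines. A hypothetical sketch (the choice of backslash as the escape character, and the function names, are mine):

                  ```python
                  ESC = '\\'   # escape character (an assumption; any rare character works)
                  FS = '|'     # field separator
                  RS = '\n'    # record terminator

                  def encode_record(fields):
                      """Escape every ESC, FS, and RS occurring inside field data,
                      then join the fields and terminate the record."""
                      escaped = (''.join(ESC + ch if ch in (ESC, FS, RS) else ch
                                         for ch in f) for f in fields)
                      return FS.join(escaped) + RS

                  def decode_stream(text):
                      """Character-by-character parse: after ESC, the next
                      character is always data, no matter what it is."""
                      records, fields, buf = [], [], []
                      i = 0
                      while i < len(text):
                          ch = text[i]
                          if ch == ESC:
                              i += 1
                              buf.append(text[i])      # escaped: take literally
                          elif ch == FS:
                              fields.append(''.join(buf)); buf = []
                          elif ch == RS:
                              fields.append(''.join(buf)); buf = []
                              records.append(fields); fields = []
                          else:
                              buf.append(ch)
                          i += 1
                      return records
                  ```

                  Round-tripping data containing pipes, newlines, and the escape character itself now works, at the cost of the character-by-character scan described above.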

                  • 6. Re: Trouble importing a delimited multiple line text file
                    jflack

                    First of all, the problem is NOT field delimiters.  The main reason for using a pipe delimiter instead of comma separated values with optional quotes is that the data might include commas and quotes but probably doesn't include pipe characters.

                     

                    The problem is RECORD delimiters - SQL Developer is trying to read a record, and it is using CRLF (Windows), LF (Unix/Linux and Mac), or CR to find the end of a record.  But there are end-of-record characters embedded in the data. Unfortunately, there is no way to tell SQL Developer: if you see a CRLF before you see the pipe ending the field, this is DATA, not an end of record.

                     

                    Fortunately, when you specify the delimiters for a delimited file, you can also specify the line terminator - look for the Line Terminator field in the Import Data wizard.

                    This may take some editing of your data file to use.  What you need to do is add a line terminator of some sort to the end of each full record.  I use Notepad++, which can do search and replace with "\n" to match LF and "\r" to match CR.  If you can depend on a pipe character ending the last field in a record always being followed by an LF (or a CRLF for Windows), then this should be easy.  In Notepad++, I'd replace:

                    |\n

                    with

                    |{EOL}

                    Make sure that you get {EOL} at the end of the file too.  If you can't do this with search and replace, you might need to do it manually - or get the program that produced the file to do it. Then I'd specify that {EOL} is the Line Terminator.

                    That should do the job. By the way, I DO hope you will change those dates and number fields to DATE and NUMBER fields instead of VARCHAR2(4000).
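
                    For what it's worth, the same substitution can be scripted rather than done interactively in Notepad++. A sketch under the same assumption (a record-ending pipe is always immediately followed by a line break; {EOL} is the marker you then declare as the Line Terminator):

                    ```python
                    def add_eol_markers(text, marker='{EOL}'):
                        """Insert an unambiguous record terminator after each
                        record-ending pipe.  Assumes a record always ends with
                        '|' followed by a line break; line breaks NOT preceded
                        by '|' are left alone as data."""
                        text = text.replace('|\r\n', '|' + marker)  # Windows endings
                        text = text.replace('|\n', '|' + marker)    # Unix endings
                        # make sure the very last record is terminated too
                        if not text.endswith(marker):
                            text += marker
                        return text
                    ```

                    The embedded line breaks survive inside the data, and once {EOL} is set as the Line Terminator they are no longer mistaken for record boundaries.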

                    1 person found this helpful
                    • 7. Re: Trouble importing a delimited multiple line text file
                      sect55

                      The problem is that I cannot depend on a pipe character ending the last field in a record always being followed by an LF (or a CRLF for Windows). I tried to use Notepad++, but the \n and the \r appear within the records. How can I add a pipe character to the end?

                       

                      As an alternative, I exported the data as XML. There doesn't seem to be a way in SQL Developer to import from an XML file.  Is there an easy way to import the data from an XML file into the same table using a different tool like SQL*Loader or another Oracle tool? (Should I post this separately in the SQL and PL/SQL forum?)

                       

                      Robert

                      • 8. Re: Trouble importing a delimited multiple line text file
                        jflack

                        One way that you might be able to do this is with SQL*Loader, IF you can export each of those big text fields to a separate file with a unique filename.  Say the input looks like:

                        CREATE TABLE my_data (

                        my_id INTEGER,

                        my_text1 VARCHAR2(4000),

                        my_text2 VARCHAR2(4000)

                        )

                        If you could export so that:

                        my_text1 from the record with my_id = 1 goes into a file named: my_text1_1.txt

                        my_text2 from the record with my_id = 1 goes into a file named: my_text2_1.txt

                        my_text1 from the record with my_id = 2 goes into a file named: my_text1_2.txt

                        my_text2 from the record with my_id = 2 goes into a file named: my_text2_2.txt

                        Then write a file, my_data.csv containing:

                        my_id, my_text1, my_text2

                        1, "my_text1_1.txt", "my_text2_1.txt"

                        2, "my_text1_2.txt", "my_text2_2.txt"

                         

                        You can write a SQL*Loader control file that can use the filenames to load data from files - this is called a "LOBFILE". See the SQL*Loader manual:

                        https://docs.oracle.com/en/database/oracle/oracle-database/12.2/sutil/loading-objects-using-oracle-sql-loader.html#GUID-…

                        for details.

                         

                        Or you might be able to load the XML file into a single row, single XMLTYPE column, then use Oracle's built in XML parsing capability.
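
                        Producing those secondary files and the manifest by hand would be tedious, so a script can build them. A sketch (my_id/my_text1/my_text2 are the names from the hypothetical table above; it builds the layout in memory, after which each entry would be written to disk):

                        ```python
                        def build_lobfile_layout(rows):
                            """For each row, route the long text columns to their
                            own secondary files and build the CSV manifest that
                            SQL*Loader will read, referencing them as LOBFILEs."""
                            files = {}     # filename -> file contents
                            manifest = []  # lines of my_data.csv
                            for row in rows:
                                names = []
                                for col in ('my_text1', 'my_text2'):
                                    fname = f"{col}_{row['my_id']}.txt"
                                    files[fname] = row[col]
                                    names.append(fname)
                                manifest.append('{},"{}","{}"'.format(row['my_id'], *names))
                            return files, manifest
                        ```

                        Because the long values live in their own files, any embedded line breaks never touch the delimited manifest at all.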

                        1 person found this helpful
                        • 9. Re: Trouble importing a delimited multiple line text file
                          sect55

                          Thanks for your reply.

                           

                          There are too many rows to use the 1st option. It would take significant time to create a file for each column and row (there are over 6,000 rows and four columns).

                           

                          I will look into the 2nd option and/or use the DBMS_XMLStore package. I need to investigate more.

                           

                          Also, I am seeing if the APEX > SQL Workshop > Utilities > Data Upload > XML utility will work too. (I created a discussion https://community.oracle.com/thread/4174341 in the APEX Discussion Forum as well.)

                           

                          Robert

                          • 10. Re: Trouble importing a delimited multiple line text file

                            The problem is that I cannot depend on a pipe character ending the last field in a record always being followed by an LF (or a CRLF for Windows). I tried to use Notepad++, but the \n and the \r appear within the records. How can I add a pipe character to the end?

                            You position the cursor at the end and add the pipe character.

                             

                            Not sure I understand your question.

                             

                            As I tried to explain before Sql Dev, and many other tools use this process to parse:

                             

                            1. read a line

                            2. parse the line into fields

                             

                            For step #1 they search for whatever has been specified as the record delimiter. So you can NOT have the record delimiter characters as part of the data.
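
                            One thing that two-step process does leave open: since HIST_SUBC has a fixed number of columns, each logical record contains a fixed number of pipes. If the data itself never contains a pipe (the reason a pipe delimiter was chosen in the first place - an assumption, not a guarantee), the physical lines can be glued back into records by counting separators. A sketch:

                            ```python
                            def reassemble(lines, n_fields, sep='|'):
                                """Glue physical lines back into logical records by
                                counting separators: a record is complete once it
                                contains exactly n_fields - 1 pipes.  Assumes the
                                field data never contains a pipe."""
                                records, buf, count = [], [], 0
                                for line in lines:
                                    buf.append(line)
                                    count += line.count(sep)
                                    if count == n_fields - 1:
                                        # rejoin with \n so embedded breaks survive as data
                                        records.append('\n'.join(buf))
                                        buf, count = [], 0
                                    elif count > n_fields - 1:
                                        raise ValueError('separator found inside field data?')
                                return records
                            ```

                            The reassembled records could then be written out with an unambiguous terminator and imported normally.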

                            • 11. Re: Trouble importing a delimited multiple line text file
                              sect55

                              When I exported from Access, it seems the column has the record delimiter characters as part of the data, because when I open the file in Notepad++ it looks like this:

                               

                               

                               

                              (screenshot attached: atams_notpadplus_eample.jpg)

                              So I cannot add the delimiter at the end of each line easily.

                               

                              Robert

                              • 12. Re: Trouble importing a delimited multiple line text file

                                You have to find what you consider to be the end of the record and then do it.

                                 

                                No one suggested it would be easy.

                                 

                                That is exactly the issue with embedded delimiters - there is no easy way to know which delimiter is REALLY the end of a record.

                                 

                                The ONLY accurate method is to use an 'escape' mechanism of some sort where an escape character(s) is used to indicate that the following character is data no matter what it is.

                                 

                                Files that use that method then need to parse character-by-character rather than use the record-by-record method many parsers use.

                                1 person found this helpful
                                • 13. Re: Trouble importing a delimited multiple line text file
                                  Mike Kutz

                                  sect55 wrote:

                                   

                                  Thanks for your reply.

                                   

                                  There are too many rows to use the 1st option. It would take significant time to create a file for each column and row (there are over 6,000 rows and four columns).

                                   

                                  I will look into the 2nd option and/or use the DBMS_XMLStore package. I need to investigate more.

                                   

                                  Also, I am seeing if the APEX > SQL Workshop > Utilities > Data Upload > XML utility will work too. (I created a discussion https://community.oracle.com/thread/4174341 in the APEX Discussion Forum as well.)

                                   

                                  Robert

                                  You should be able to import XML data via External Tables.  https://oracle-base.com/articles/12c/exteral-tables-and-xmltag-to-load-xml-documents-12cr2

                                   

                                  Also, I still believe the External Tables parsing code should be made available for LOBs.   That way, BLOB/CLOB data that has been uploaded to the database (e.g. via APEX) can be processed with the same power as an External Table.

                                   

                                  parse CLOBS/BLOBS using "external table" capability

                                   

                                  MK

                                  • 14. Re: Trouble importing a delimited multiple line text file
                                    jflack

                                    Wait a minute - I was re-reading the posts, and I had missed something important - the fact that you are migrating data from a Microsoft Access database to an Oracle database.  SQL Developer has a migration workbench built into it that can actually copy data directly from the Access database to the Oracle database.

                                     

                                    This page - https://www.oracle.com/technetwork/database/migration/access-084991.html starts a step by step tutorial on how to do it.  It also has a link to a video.

                                    And if you do a web search on "import Microsoft Access to Oracle" you should find some more information.

                                    1 person found this helpful