2 Replies Latest reply on Dec 9, 2017 2:24 PM by EdStevens

    loading twitter data into oracle

    RocioPerezNunez-Oracle

      Hello all,

       

      I have some problems loading a twitter data from a csv, with some multiline records. I have 4 columns id_tweet,text,created_at, username .

      When text is enclosed by “  the register is multiline or has comas inside, I show you 3 examples, 1st enclosed by “ with comas inside the text, 2nd enclosed by “ and multiline, 3rd no enclosed by coma no multiline:

       

      793418980342063104,"@realDonaldTrump I am also a $ donor, i can not give much in the way of $ but i do.(poor) My advice is more valuable if you have used it.",2016-11-01T11:46:40.000Z, ReynardGosso

      793418981415583745,"@FranklinFoer                                                                                                                                                                                                                                                                                                                                                                                   

      Thanks @realDonaldTrump you’re doing a great Job helping to elect @HillaryClinton !                                                                                                                                                                                                                                                                                                                                                                                        

      https://t.co/q9qO0uKZxA",2016-11-01T11:46:40.000Z,759714473556840448,false,wabroussard1,,,0,-1,1,0                                                                                                                                                                                                                             

      793418985458929664,This is the 5 year old Nazi German-American @realDonaldTrump that wants 2 be president. No  Americans elect adults. https://t.co/IFdQkdDyJN,2016-11-01T11:46:41.000Z, AnthonyNnani      

       

      I´m using sqlldr but when I use

       

      load data

      infile 'tweetv1.csv' "str '\r\n'"

      append continueif last != ','

      into table TWEETS

      fields terminated by ','

      OPTIONALLY ENCLOSED BY '"' AND '"'

      trailing nullcols

                 ( id_TWEET CHAR(4000),

                   text CHAR(4000),

                   created_at CHAR(4000),

                   userid CHAR(4000)

                 )

       

      It doesn´t recognize when it’s a new logical record.

       

      Any advice on how to do it?

       

      Thanks a lot,

      Rocio

        • 1. Re: loading twitter data into oracle
          Barbara Boehmer

          You need to have something in your data that can be used to either identify when a record begins or when a record ends.  You have used

           

          continueif last != ','

           

          to tell it that the record is continued on the next line if the last character on the current line is not a comma, but there are no commas at the ends of your records.  In the following example, I have instead used

           

          continueif next preserve(1) != '7'

           

          to tell it that the record is continued on the next line if the next character on the next line is not a 7 and, if it is a 7, then that begins the next record and to preserve that 7 as part of the first field.  This, obviously will only work if every id_tweet begins with a 7.  If not, then you will need to obtain your data in a different format, in order to ensure being able to load all records, finding some way to either add an end of record character or beginning of record character.

           

           


          SCOTT@orcl_12.1.0.2.0> host type tweetv1.csv
          793418980342063104,"@realDonaldTrump I am also a $ donor, i can not give much in the way of $ but i do.(poor) My advice is more valuable if you have used it.",2016-11-01T11:46:40.000Z, ReynardGosso
          793418981415583745,"@FranklinFoer
          Thanks @realDonaldTrump youΓÇÖre doing a great Job helping to elect @HillaryClinton !
          ",2016-11-01T11:46:40.000Z,759714473556840448,false,wabroussard1,,,0,-1,1,0
          793418985458929664,This is the 5 year old Nazi German-American @realDonaldTrump that wants 2 be president. No Americans elect adults.,2016-11-01T11:46:41.000Z, AnthonyNnani

           

          SCOTT@orcl_12.1.0.2.0> host type test.ctl
          load data
          infile 'tweetv1.csv' "str '\r\n'"
          append continueif next preserve(1) != '7'
          into table TWEETS
          fields terminated by ','
          OPTIONALLY ENCLOSED BY '"' AND '"'
          trailing nullcols
          ( id_TWEET    CHAR(4000)
          , text        CHAR(4000)
          , created_at  CHAR(4000)
          , userid      CHAR(4000) )

           

          SCOTT@orcl_12.1.0.2.0> create table tweets
            2    (id_tweet    varchar2(4000),
            3     text        varchar2(4000),
            4     created_at  varchar2(4000),
            5     userid      varchar2(4000))
            6  /

          Table created.

           

          SCOTT@orcl_12.1.0.2.0> host sqlldr scott/tiger control=test.ctl log=test.log

          SQL*Loader: Release 12.1.0.2.0 - Production on Fri Dec 8 20:38:26 2017

          Copyright (c) 1982, 2014, Oracle and/or its affiliates.  All rights reserved.

          Path used:      Conventional
          Commit point reached - logical record count 2
          Commit point reached - logical record count 3

          Table TWEETS:
            3 Rows successfully loaded.

          Check the log file:
            test.log
          for more information about the load.

           

          SCOTT@orcl_12.1.0.2.0> column id_tweet    format a18
          SCOTT@orcl_12.1.0.2.0> column text        format a60 word_wrapped
          SCOTT@orcl_12.1.0.2.0> column created_at  format a24
          SCOTT@orcl_12.1.0.2.0> column userid      format a13 word_wrapped
          SCOTT@orcl_12.1.0.2.0> select * from tweets
            2  /

          ID_TWEET           TEXT                                                         CREATED_AT               USERID
          ------------------ ------------------------------------------------------------ ------------------------ -------------
          793418980342063104 @realDonaldTrump I am also a $ donor, i can not give much in 2016-11-01T11:46:40.000Z ReynardGosso
                             the way of $ but i do.(poor) My advice is more valuable if
                             you have used it.

          793418981415583745 @FranklinFoerThanks @realDonaldTrump youΓÇÖre doing a great  2016-11-01T11:46:40.000Z 7597144735568
                             Job helping to elect @HillaryClinton !                                                40448

          793418985458929664 This is the 5 year old Nazi German-American @realDonaldTrump 2016-11-01T11:46:41.000Z AnthonyNnani
                             that wants 2 be president. No Americans elect adults.


          3 rows selected.