Forum Stats

  • 3,837,472 Users
  • 2,262,260 Discussions
  • 7,900,290 Comments

Discussions

Compare table data with data in csv file

1038060
1038060 Member Posts: 4
edited Sep 4, 2013 12:57PM in General Database Discussions

Hi, I have a database table with ~32 milj rows. I want to compare the data from a csv file(~ 1500000 rows) with the data in a specific database column. And as a result I want to see what rows exists in both the csv file and db and what records in csv file are unique. The text I'm searching for is VARCHAR2. I am not allowed to export the database table so that's not an alternative. This is also a production environment so one need to be careful not disturbing the production.

Rgds J

Tagged:
1038060

Answers

  • sb92075
    sb92075 Member Posts: 42,196 Blue Diamond
    2c0eaa24-46c9-4ff7-9f01-564f37567899 wrote:
    
    Hi, I have a database table with ~32 milj rows. I want to compare the data from a csv file(~ 1500000 rows) with the data in a specific database column. And as a result I want to see what rows exists in both the csv file and db and what records in csv file are unique. The text I'm searching for is VARCHAR2. I am not allowed to export the database table so that's not an alternative. This is also a production environment so one need to be careful not disturbing the production.
    
    Rgds J
    

    How do I ask a question on the forums?

    https://forums.oracle.com/message/9362002#9362002

    Does CVS fille reside on DB Server system now?

  • DK2010
    DK2010 Member Posts: 1,542 Silver Trophy

    Hi,

    You can use External Table Option, use the select Query between your database Table(permanent table) and External Table.. find out the Result

  • 1038060
    1038060 Member Posts: 4

    No, the csv file reside on another server outside the production environment.

  • sb92075
    sb92075 Member Posts: 42,196 Blue Diamond
    2c0eaa24-46c9-4ff7-9f01-564f37567899 wrote:
    
    No, the csv file reside on another server outside the production environment.
    

    does any Oracle DB reside on this system or can you install a sandbox DB on this system?

  • 1038060
    1038060 Member Posts: 4

    There is a DB on the same system as where the csv file are.

  • sb92075
    sb92075 Member Posts: 42,196 Blue Diamond
    2c0eaa24-46c9-4ff7-9f01-564f37567899 wrote:
    
    There is a DB on the same system as where the csv file are.
    

    Then use EXTERNAL TABLE to access the data in the CSV file then use DBLINK to compare against table in Production DB.

    1038060
  • 1038060
    1038060 Member Posts: 4

    Thanks for your quick replies. I will try to figure out how to use your suggested solution. Sounds very good. Are there any risk for any performance problems with this solution? I mean, there are about 1,5 milj records to be compared with 32 milj,

    Thanks again!

  • sb92075
    sb92075 Member Posts: 42,196 Blue Diamond
    2c0eaa24-46c9-4ff7-9f01-564f37567899 wrote:
    
    Thanks for your quick replies. I will try to figure out how to use your suggested solution. Sounds very good. Are there any risk for any performance problems with this solution? I mean, there are about 1,5 milj records to be compared with 32 milj,
    Thanks again! 
    

    any & every SQL statement consume resources; some more than others.

    You can't get something for nothing.

    If the answers are to be produced, then CPU cycles & I/O operations must occur.

    The alternative is to not run any SQL to absolutely minimize the impact n the DB; but a 100% idle DB is quite useless.

  • jgarry
    jgarry Member Posts: 13,844 Gold Crown

    The big risk with a dblink solution is that it may suck all the data from the production server to your server before applying any filters.  This could mean your answers may be very slow.  You may get better performance filtering on the production server, but as has been said, that consumes resources too - though it could be less than sucking all the data over the network.  Google driving_site hint.

    Personally, I'd push to do the export, as doing it over the network moves the same number of bits as you'd move over the dblink with a query, but you'd only have to do it once, and then bang on the data locally until you get what you want.  But I can understand there's more to databases than bits.

    I'm actually doing a "not in" compare right now between two tables on my server, hogging an entire cpu, no idea when it will be done (though I can guess a couple hours).  It's the only way to answer the question of whether some missing data is a major app problem that no one noticed for years, or just some rare consequence of VM clients.  Well, I could do it in PL/SQL or another 3GL, but that would be slower.

This discussion has been closed.