Forum Stats

  • 3,727,354 Users
  • 2,245,375 Discussions
  • 7,852,756 Comments

Discussions

Reading large table using cx_Oracle

User619492-OC
User619492-OC Member Posts: 4
edited July 2020 in Python

Hi Experts,

I have requirement to export large table (100M) to CSV. I am trying to do this using cx_Oracle which is working fine(tested 10% of data). I have few questions related how it consume server resources when using SessionPool.

My code:

pool  = cx_Oracle.SessionPool(db_config.user, db_config.pw, db_config.dsn,min = 5, max = 6, increment = 1)

file = open("file.csv", "w")

output = csv.writer(file, delimiter=',', lineterminator="\n")

with pool.acquire() as con:

    cur = con.cursor()

    cur.arraysize=2000

    tmp = cur.execute('select * from 100MRowsTable')

    rows = cur.fetchmany()

My questions are below:

1. For each 2000 records(arraysize) does it establish a connection through pool ?

2. Does it establish 5 connections(initially) and read in parallel each connection 2000 records at a time ? then it reuse same connections to fetch next set of records ?

3. if I want to improve performance can I use threading=true and pass chunk of  100MRowsTable for each thread to speed up extraction process? if yes each thread need to create a separate sessionpool or it has to use same session pool ?

Appreciate your help!! Thanks!!!

User619492-OC

Best Answer

  • Anthony Tuininga-Oracle
    Anthony Tuininga-Oracle Member Posts: 37
    edited July 2020 Accepted Answer

    If this is a simple utility, there is no need to create a pool at all. Instead, you can do the following:

    conn = cx_Oracle.connect(db_config.user, db_config.pw, db_config.dsn)

    cur = conn.cursor()

    cur.arraysize = 2000

    for row in cur.execute("select * from 100MRowsTable"):

       output.writerow(row)

    The code you provided is incomplete so I can't really answer all of your questions. The pool creation in your code does indeed create 5 initial connections. The with block will acquire the connection and at the end of the block close the connection (which will return it to the pool). If you have a way to select blocks of records, you could indeed use threads. These threads could use the same pool so long as you use threaded=True when you create the pool. You can also create a standalone connection for each thread which will prevent the need for enabling threaded mode.

    Hope that helps!

    User619492-OCUser619492-OC

Answers

  • Anthony Tuininga-Oracle
    Anthony Tuininga-Oracle Member Posts: 37
    edited July 2020 Accepted Answer

    If this is a simple utility, there is no need to create a pool at all. Instead, you can do the following:

    conn = cx_Oracle.connect(db_config.user, db_config.pw, db_config.dsn)

    cur = conn.cursor()

    cur.arraysize = 2000

    for row in cur.execute("select * from 100MRowsTable"):

       output.writerow(row)

    The code you provided is incomplete so I can't really answer all of your questions. The pool creation in your code does indeed create 5 initial connections. The with block will acquire the connection and at the end of the block close the connection (which will return it to the pool). If you have a way to select blocks of records, you could indeed use threads. These threads could use the same pool so long as you use threaded=True when you create the pool. You can also create a standalone connection for each thread which will prevent the need for enabling threaded mode.

    Hope that helps!

    User619492-OCUser619492-OC
  • Christopher Jones-Oracle
    Christopher Jones-Oracle Member Posts: 1,628 Employee
    edited July 2020

    For the record there are some tuning tips in the manual Tuning cx_Oracle.

    User619492-OCUser619492-OC
  • User619492-OC
    User619492-OC Member Posts: 4
    edited July 2020

    Thank you Anthony!!

    I tried with bellow approach already which is taking longer time to export into files then switched my program to use SessionPool.

    conn = cx_Oracle.connect(db_config.user, db_config.pw, db_config.dsn)

    cur = conn.cursor()

    cur.arraysize = 2000

    for row in cur.execute("select * from 100MRowsTable"):

       output.writerow(row)

    with SessionPool my program is executing fast but before I put my code into production environment I just want to understand how sessionpool works.

    my fetch size is 2000 so when ever it fetch it uses one of the 5 connections and fetch is parallel with 5 threads ?

    can you please clarify on this ?

  • Anthony Tuininga-Oracle
    Anthony Tuininga-Oracle Member Posts: 37
    edited July 2020

    You're welcome. I can't, however, answer your question as you're not showing all of the relevant code. The snippet you showed creates a pool, acquires a connection from that pool, fetches 2000 rows and then does nothing with them! I'll need more information before I can answer your question.

  • User619492-OC
    User619492-OC Member Posts: 4
    edited July 2020

    Here is my complete code, pls help me now with my questions:

    pool  = cx_Oracle.SessionPool(db_config.user, db_config.pw, db_config.dsn,min = 5, max = 6, increment = 1)

    file = open("file7.csv", "w")

    output = csv.writer(file, delimiter=',', lineterminator="\n")

    with pool.acquire() as con:

        cur = con.cursor()

        cur.arraysize=25000

        tmp = cur.execute('select * from table')

        columns = [i[0] for i in cur.description]

        output.writerow(columns)

        while True:

            rows = cur.fetchmany()

            if not rows:

                print('Before Break')

                break

            output.writerows(rows)

    file.close()

    print("File successfully exported")

  • Anthony Tuininga-Oracle
    Anthony Tuininga-Oracle Member Posts: 37
    edited July 2020

    In your code you are only acquiring one connection from the pool. No threads are involved so it is essentially the same as the code I gave you -- except that the rows are being fetched and written in batches instead of one row at at a time. If you want to fetch in parallel you'll need a lot more code to manage that!

  • Christopher Jones-Oracle
    Christopher Jones-Oracle Member Posts: 1,628 Employee
    edited July 2020

    You might want to look at https://github.com/oracle/python-cx_Oracle/blob/master/samples/ConnectionPool.py which starts a connection pool and runs a couple of threads that execute different SQL statements.

Sign In or Register to comment.