
Arraysize over 5000

limor Member Posts: 38
edited Sep 20, 2016 9:35AM in Python

Hi,

I have created a web server with Python and cx_Oracle.

But now I have a problem fetching the data. Sometimes my tiles have more than 5,000 output rows. The cx_Oracle documentation says the default value for arraysize is 50. If I increase it to 10,000 (cursor.arraysize = 10000) everything works, but a query returning fewer than 5,000 rows takes milliseconds while one returning more than 5,000 rows takes several seconds. You can see the results in the picture:

[Image: overview.JPG]
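For illustration, this is roughly what my code does (a minimal sketch; the connection string and the query here are placeholders, not my real ones):

    import cx_Oracle

    connection = cx_Oracle.Connection("user/password@host:1521/service")
    cursor = connection.cursor()
    cursor.arraysize = 10000  # default is 50
    cursor.execute("SELECT object_name FROM all_objects")
    rows = cursor.fetchall()  # milliseconds below ~5000 rows, seconds above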

Furthermore, the Oracle documentation says:

SET ARRAY[SIZE] {15 | n}

Sets the number of rows, called a batch, that SQL*Plus will fetch from the database at one time. Valid values are 1 to 5000. A large value increases the efficiency of queries and subqueries that fetch many rows, but requires more memory.

Does anyone have experience with SQL queries returning more than 5,000 rows using web frameworks (web.py, ...)?

So I think the results relate to this: over 5000 is not a valid value, so the execution time increases sharply... Is this right?

Is there a way to avoid this?

Best Answer

  • limor Member Posts: 38
    edited Sep 20, 2016 9:34AM Accepted Answer

    I found out that the solution is to select the SDO_ORDINATES array instead of the whole SDO_GEOMETRY object.

    cursor.execute(
        """SELECT a.id, c.geometry.sdo_ordinates, d.Classname
           FROM building a, THEMATIC_SURFACE b, SURFACE_GEOMETRY c, OBJECTCLASS d
           WHERE a.grid_id_500 = 516
             AND a.id = b.BUILDING_ID
             AND b.LOD2_MULTI_SURFACE_ID = c.ROOT_ID
             AND c.GEOMETRY IS NOT NULL
             AND b.OBJECTCLASS_ID = d.ID""")

    It seems that there is quite a bit of data on the SDO_GEOMETRY object that I didn't require but was dragging down to the client anyway.

    So all in all my web service needs only 0.7 seconds instead of 11 seconds...


Answers

  • Gaz in Oz Member Posts: 3,785 Bronze Crown
    edited Aug 30, 2016 1:38AM

    Hi limor,

    Can you explain exactly what your table of results is showing, as I am a bit confused as to whether it is showing a fixed arraysize of 10000 and a varying recordset size, or what?

    I am not sure what relevance you think ARRAYSIZE in SQL*Plus has to your Python program issue, although it is worth noting, from:

    https://docs.oracle.com/cd/E11882_01/server.112/e16604/ch_eight.htm#i1037682

    SET ARRAYSIZE

    Sets the number of rows that SQL*Plus will fetch from the database at one time. Valid values are 1 to 5000. The effectiveness of setting ARRAYSIZE depends on how well Oracle Database fills network packets and your network latency and throughput. In recent versions of SQL*Plus and Oracle Database, ARRAYSIZE may have little effect. Overlarge sizes can easily take more SQL*Plus memory which may decrease overall performance.

    Tuning arraysizes for programs is an iterative process, in that you need to find the most acceptable size for your program to be able to process rows as efficiently as possible. Try varying sizes.

    Here's a bit of trivia:

    SQL*Plus ARRAYSIZE default is 15 as you have mentioned. As can be seen from the Oracle documentation, changing it can have adverse effects.

    PL/SQL implicit cursor arraysize defaults to 100 in 10g and above. (BULK COLLECT with the LIMIT clause can be used to specify a different size.)

    cx_Oracle cursor.arraysize defaults to 50.

    Your post would suggest that setting cursor.arraysize over 5000 isn't helping your program, so set it to something less.

    Is there a way to avoid this?

    ...and what is it you are actually trying to avoid?

    Increasing the array size to hold more records to process will increase memory usage, as more records can be held in the array. This may or may not help performance.

    Cheers,

    Gaz.

  • limor Member Posts: 38
    edited Aug 30, 2016 8:08AM

    Yes, the image shows a fixed arraysize (10,000). What I want to show is the time difference. If I do a request with more than 5,000 output rows (e.g. 500m_Tile_C) it takes seconds (10.1 s), and if I do a query with fewer than 5,000 output rows (500m_Tile_E) it only takes milliseconds (0.326 s). So there is a really big time difference centered around 5,000 output rows.

    Yes, at the beginning I thought this was related to the arraysize limit described in the Oracle documentation. But that only applies to SQL*Plus, so that is not my problem.

    I need the as-yet-unreleased version of cx_Oracle and web.py for my web service. The problem is that the default setting in cx_Oracle is arraysize = 50, but I have more than 50 rows, so I have to increase the arraysize; otherwise I get an internal server error. I wrote to Anthony Tuininga, who works on cx_Oracle, and he said this is a known issue at the moment. The only way to work around it is to increase the arraysize.

    Nevertheless, this works fine for a query with fewer than 5,000 rows. But for queries with more than 5,000 rows I need, as you can see in the image, much more time.

    I don't know why this is happening.

  • Gaz in Oz Member Posts: 3,785 Bronze Crown
    edited Aug 30, 2016 10:32AM

    OK, so with an arraysize set to 10,000 and over 5,000 records in the array, "things" slow down. How about an arraysize of 100, 200, 500, 1000, 2000, or 5000 operating on the record set of over 5,000 records?

    Sounds like your solution, if no arraysize larger than 50 works, is to wait for the release of web.py and the new cx_Oracle.

  • limor Member Posts: 38
    edited Aug 30, 2016 10:32AM

    I can't: every time I set arraysize lower than the number of output rows of my query, I get the Internal Server Error:

    <class 'cx_Oracle.DatabaseError'> at /grid

    ORA-21500: internal error code, arguments: [%s], [%s], [%s], [%s], [%s], [%s], [%s], [%s]

  • Gaz in Oz Member Posts: 3,785 Bronze Crown
    edited Aug 30, 2016 10:53AM
    SQL> oerr ORA-21500
    21500, 00000, "internal error code, arguments: [%s], [%s], [%s], [%s], [%s], [%s], [%s], [%s]"
    // *Cause:  This is the generic error number for the OCI
    //          environment (client-side) internal errors.  This indicates
    //          that the OCI environment has encountered an exceptional
    //          condition.
    // *Action: Report as a bug - the first argument is the internal error number.

    Could it be that you are mishandling the record set when it is larger than the array size? To be honest, it could be anything. You seem to be looking for a workaround for this generic error, when it sounds like you've been told a fix will be in the next release...

  • limor Member Posts: 38
    edited Aug 30, 2016 11:08AM

    So I got this information from the developer: "The error you got is an internal error. There is a bug that is causing this to occur. The only workaround at the moment is to adjust the array size or reduce the number of rows that you are querying at one time." It seems this is a bug which will be solved in the future, but sadly I do not know when, and I need to finish this project in the near future.

  • Gaz in Oz Member Posts: 3,785 Bronze Crown
    edited Aug 30, 2016 11:47AM

    Right, so you want a workaround for some Python code that uses arrays in some way for some sort of SQL query... You've tried an arraysize of 50, which fails with ORA-21500 because the record set is larger than that; it also fails with any arraysize smaller than the record set size. So the proposed workaround you've tried is to set an arraysize larger than the record set size.

    Based on that, set your arraysize to, say, 5000, and make your query return at most 5000 rows every time it is called. In effect, manage your array filling by making your query always select fewer rows than the array size, and set your arraysize to a value you know is performant. You could even set it to 2312, as selecting 2311 rows was very fast.

    Manage the record set size fetched into the array by limiting the record set size returned by the SQL query, so that it wholly fits within the array, to avoid ORA-21500.
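    For illustration, something along these lines (an untested sketch against a made-up table T(ID, VAL), using classic 11g ROWNUM pagination; this is my suggestion, not code from your program):

    batch_size = 5000
    cursor.arraysize = batch_size
    lower = 0
    result = []
    while True:
        # inner query is ordered, middle query caps the upper bound,
        # outer query trims rows already fetched in earlier batches
        cursor.execute("""
            SELECT id, val FROM (
                SELECT x.*, ROWNUM rn
                FROM (SELECT id, val FROM t ORDER BY id) x
                WHERE ROWNUM <= :upper
            ) WHERE rn > :lower""", lower=lower, upper=lower + batch_size)
        rows = cursor.fetchall()
        if not rows:
            break
        result.extend(rows)
        lower += batch_size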

    Can you post a simple working example showing the error?

    ...and how about some version information:

    cx_Oracle

    python

    database (SELECT * FROM v$version)

  • limor Member Posts: 38
    edited Aug 30, 2016 2:40PM

    In the code below you can see my web service. How can I manage my arraysize as you described, by making my query always select fewer rows than the array size?

    I am using Oracle 11g R2 with the Locator extension, Python 2.7, and the unreleased version of cx_Oracle ( https://bitbucket.org/anthony_tuininga/cx_oracle ).

    import cx_Oracle
    import json
    import web

    urls = (
        "/", "index",
        "/grid", "grid",
    )

    app = web.application(urls, globals())
    web.config.debug = True

    connection = cx_Oracle.Connection("TEST_3D/[email protected]:1521/sdetest")
    typeObj = connection.gettype("MDSYS.SDO_GEOMETRY")

    class grid:
        def GET(self):
            web.header('Access-Control-Allow-Origin', '*')
            web.header('Access-Control-Allow-Credentials', 'true')
            web.header('Content-Type', 'application/json')
            cursor = connection.cursor()
            cursor.arraysize = 5000  # default = 50
            cursor.execute(
                """SELECT a.id, c.geometry, d.Classname
                   FROM building a, THEMATIC_SURFACE b, SURFACE_GEOMETRY c, OBJECTCLASS d
                   WHERE a.grid_id_500 = 516
                     AND a.id = b.BUILDING_ID
                     AND b.LOD2_MULTI_SURFACE_ID = c.ROOT_ID
                     AND c.GEOMETRY IS NOT NULL
                     AND b.OBJECTCLASS_ID = d.ID""")
            result = []
            for id, geometry, classname in cursor:
                mo = geometry.SDO_ORDINATES.aslist()
                result.append({
                    "building_nr": id,
                    "geometry": {
                        "type": "polygon",
                        "coordinates": zip(*[iter(mo)] * 3),
                    },
                    "polygon_typ": classname,
                })
            return json.dumps(result)

    if __name__ == "__main__":
        app.run(web.profiler)

  • Gaz in Oz Member Posts: 3,785 Bronze Crown
    edited Aug 31, 2016 12:12AM

    One way would be to use either

    Cursor.fetchmany(<num_rows>)

    Cursor Object — cx_Oracle 5.2.1 documentation

    ...or perhaps Cursor.fetchraw()

    Cursor Object — cx_Oracle 5.2.1 documentation

    to fetch the rows and take control of the record set that way.

    This document may also be of help

    Mastering Oracle+Python, Part 1: Querying Best Practices

    So your code would change to process batches of <num_rows> in a loop (where <num_rows> would be 5000 or less, as > 5000 seems to be the issue).
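    Something like this untested sketch (where sql stands for the SELECT from your GET() method):

    cursor.arraysize = 5000
    cursor.execute(sql)
    result = []
    while True:
        rows = cursor.fetchmany(5000)  # at most 5000 rows per call
        if not rows:
            break                      # fetchmany() returns [] when exhausted
        for id, geometry, classname in rows:
            mo = geometry.SDO_ORDINATES.aslist()
            result.append({"building_nr": id,
                           "geometry": {"type": "polygon",
                                        "coordinates": zip(*[iter(mo)] * 3)},
                           "polygon_typ": classname})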

    ...and you mention 2 tables. It would be nice if you posted "CREATE TABLE" and "INSERT INTO" scripts, as without those it is impossible to replicate your error(s).

    ...and Python 2.7 what, exactly: 2.7.0, 2.7.11?

    python -V

    Have you read this Oracle article?

    Integrating Oracle Spatial with Google Earth

    Very interesting, and it might also be useful to you.

  • limor Member Posts: 38
    edited Aug 31, 2016 8:48AM

    Thanks for the information.

    I am using in detail:

    -  Oracle 11.2.0.3.0 64-bit version

    -  Python 2.7.10

    I tried the following:

    1.) fetchmany(size)

    cursor.arraysize = 2000
    cursor = cursor.fetchmany(2000)

    -> It is working; I get 2000 rows for a tile with more than 2000 rows.

    2.) fetchraw()

    cursor = cursor.fetchraw()

    -> It is not working; I get the error message:

    <type 'exceptions.TypeError'> at /grid

    'int' object is not iterable

    3.) ROWNUM

    Add "AND ROWNUM <= 10" at the end of the SQL query.

    -> I get only the first 10 rows.

    -> This is working.

    Nevertheless, I have the following situation:

    Inside my database are hundreds of tiles. Some tiles produce fewer than 5,000 output rows, and a few tiles produce more than 5,000.

    With solutions like fetchmany or ROWNUM I can limit the number of output rows, and this works. But in the end I need all rows, not only the first 2,000.

    So maybe I have to create a batch process: every time a tile has more than 5,000 rows, I query two times, for example. You know what I mean?

    I did some tests on my web service, and it seems to me that the problem is in creating the list of coordinates and iterating over it.

    For example:

    1.) If I only execute the query (cursor.execute(sql)) and print the result, the browser needs around 0.8 s.

    2.) If I do the for-loop this way:

    for id, geometry, classname in cursor:
        result.append({"building_nr": id,
                       "geometry": {"type": "polygon", "coordinates": geometry},
                       "polygon_typ": classname})
    return result

    I need 0.2 seconds.

    3.) But if I create a list and iterate over it:

    for id, geometry, classname in cursor:
        mo = geometry.SDO_ORDINATES.aslist()
        result.append({"building_nr": id,
                       "geometry": {"type": "polygon",
                                    "coordinates": zip(*[iter(mo)] * 3)},
                       "polygon_typ": classname})
    return json.dumps(result)

    I need 11 seconds!!! What is wrong? Is this time difference normal? Sadly I need the coordinates, so I have to use the last version... I think most of the time goes into creating the list. Is there a workaround?
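    To be clear about what that zip line does (a tiny sketch with made-up sample values):

    # zip(*[iter(mo)] * 3) groups the flat ordinate list into (x, y, z) triples
    mo = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    it = iter(mo)
    print(zip(it, it, it))  # [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)] on Python 2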

This discussion has been closed.