5 Replies. Latest reply: May 23, 2007 8:10 PM by Simon Greener

    Oracle Spatial Performance with 10-20,000 users

    Pat
      Does anyone have any experience of Oracle Spatial being used with, say, 20,000 concurrent users? I am not interested in MapViewer response time, but let's say there is:
      - an app using 800 different tables each having an sdo_geometry column
      - the app is configured with different tables visible on different view scales
      - let's say an average of 40-50 tables is visible at any given time
      - some tables will have only a few records, while others can hold millions.
      - there is no client side caching
      - clients can zoom in/out and pan.

      Answers I am interested in:
      - What sort of server would be required
      - How can Oracle serve all that data? (Each refresh renders the map and retrieves the data over the wire, as there is no client-side caching.)
      - What sort of network infrastructure would be required.
      - Can clients connect to different servers and hence use load balancing or does Oracle have an automatic mechanism for that?

      Thanks in advance,

      Patrick
        • 1. Re: Oracle Spatial Performance with 10-20,000 users
          Luiz Soares
          Hi,

          My application uses 500 different tables, each having an sdo_geometry column. 50 tables have millions of records; we use Autodesk MapGuide 6.5 and have just 40 concurrent users.

          We are facing some performance problems, but just with our web server (Tomcat) and with MapGuide. Oracle Spatial is always stable.

          Best regards,

          Luiz
          • 2. Re: Oracle Spatial Performance with 10-20,000 users
            23154
            Spatial is very resource intensive, in both CPU and disk (lots of random I/O), especially with large tables (> 1M records) and especially when the queries are likely to return a lot of data (i.e. are not very selective).

            It seems that spatial queries use the spatial index to do first pass checks relatively quickly, but then retrieve the data records one at a time via rowid, which can be expensive (forces random I/O) if lots of rows must be returned. The exact behavior probably depends on what functions are being used.

            My app has lots of data but few users; performance is a major issue, but for the most part it is usable for our purposes.

            I'd suggest doing some tests and determining how your app behaves under load.

            You can probably go to RAC to distribute CPU loads, but random I/O is a bit harder to deal with if it is a bottleneck.

            --Peter
            • 3. Re: Oracle Spatial Performance with 10-20,000 users
              510467
              re: -1

              You might want to check out the Pro Oracle Spatial book by Ravi Kothuri and Albert Godfrind.

              Chapter 14 has a section on reorganizing data to minimize the random I/O that seems to be affecting your performance.

              Hope this helps, and kind regards,

              Dan
              • 4. Re: Oracle Spatial Performance with 10-20,000 users
                33450
                Does anyone have any experience when Oracle Spatial is used with say 20,000 concurrent users. I am not interested in MapViewer response time, but let's say there is:
                - there is no client side caching
                There is your problem. Efficient systems cache at multiple levels, and are fast.

                For our new Java Web Start viewer/editor we cache at the db, middle-tier, and client. For items that are read-only (reference data such as streets, buildings, etc.) the cache is persistent. For editable items, the cache is refreshed only when required. Data is only pulled when needed, at the resolution required to meet the screen detail.

                The effect is that we get about 10-20 frames per second, not one screen every few seconds. Scalability is simple, as the client takes almost all of the load (more like Google Earth). Oracle Spatial, even with Workspace Manager, is not the bottleneck in the system. That would be the firewalls along the network path (latency), but even that is very manageable with asynchronous systems.
                Answers I am interested in:
                - What sort of server would be required
                Hard to say; we run a few hundred users on one four-core Opteron Sun box, and the database rarely goes above idle.
                - How can Oracle serve all that data? (Each refresh renders the map and retrieves the data over the wire, as there is no client-side caching.)
                Again, unless there is some reason the client cannot cache - this is your problem.
                - What sort of network infrastructure would be required.
                Again, that depends. We get away with very little bandwidth, since we strive to only send data once. But then, we cache heavily.
                - Can clients connect to different servers and hence use load balancing or does Oracle have an automatic mechanism for that?
                The way we do it, yes. In fact, we use a load balancer in front of the app servers so that everything has the same URL; as far as the app is concerned, it is always the same server.

                Bryan
                • 5. Re: Oracle Spatial Performance with 10-20,000 users
                  Simon Greener
                  Patrick, et al.

                  There are lots of things one can do to improve performance in mapping environments, because a lot of the visualisation is based on "background" or read-only data. Here are some "tips":

                  1. Spatially sort read-only data.

                  This tip makes sure that data that is close together in space is also next to each other on disk! Dan gave a good suggestion when he referenced Chapter 14, "Reorganize the Table Data to Minimize I/O", pp 580-582, Pro Oracle Spatial. But just as easily one can CREATE TABLE AS SELECT ... WHERE SDO_FILTER(...) = 'TRUE', where the filtering object is an optimized rectangle across the whole of the dataset. (This is quite quick on 10g and above but much slower on earlier releases.)

                  When implementing this, make sure that the new table's blocks are next to each other in the tablespace. (Consider tablespace defragmentation beforehand.) Also, if the data is READ ONLY, set PCTFREE to 0 in order to pack the data into as few blocks as possible.
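
                  A minimal sketch of that CTAS approach, assuming a read-only table ROADS with geometry column GEOM that already has a spatial index (all names, the SRID and the extent coordinates are illustrative):

                  CREATE TABLE roads_sorted PCTFREE 0 AS
                  SELECT r.*
                    FROM roads r
                   WHERE SDO_FILTER(
                           r.geom,
                           SDO_GEOMETRY(2003, NULL, NULL,              -- use the layer's SRID in place of NULL
                                        SDO_ELEM_INFO_ARRAY(1,1003,3), -- optimized rectangle
                                        SDO_ORDINATE_ARRAY(0,0, 1000000,1000000))
                         ) = 'TRUE';
                  -- Rows are returned in spatial-index order, so features that are
                  -- close in space end up close together on disk.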

                  2. Generalise data

                  Rendering spatial data can be expensive where the data is geometrically detailed (many vertices), especially where the data is being visualised at smaller scales than it was captured at. So, if your "zoom thresholds" allow 1:10,000 data to be used at 1:100,000, you are going to have problems. Consider pre-generalising the data (see sdo_util.simplify) before deployment. You can add multiple columns to your base table to hold this data. Be careful with polygon data, because generalising polygons that share boundaries will create gaps and overlaps as the data is generalised further. Often it is better to export the data to a GIS which can maintain the boundary relationships when generalising (say via topological relationships).
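
                  A hedged sketch of pre-generalising into an extra column (the names and the 50-unit threshold are illustrative):

                  -- Add a column to hold a generalised copy for small-scale display.
                  ALTER TABLE roads ADD (geom_100k SDO_GEOMETRY);

                  -- SDO_UTIL.SIMPLIFY applies Douglas-Peucker simplification; the
                  -- threshold is in the units of the data (here assumed to be metres).
                  UPDATE roads r
                     SET r.geom_100k = SDO_UTIL.SIMPLIFY(r.geom, 50);
                  COMMIT;

                  Remember that the new column needs its own USER_SDO_GEOM_METADATA entry and spatial index if it is to be queried in its own right.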

                  Oracle's MapViewer has excellent on-the-fly generalisation but here one needs to be careful. Application tier caching (cf Bryan's comments) can help here a lot.

                  3. Don't draw data that is sub-pixel.

                  As one zooms out, objects become smaller and smaller until the whole object can be drawn within a single pixel. If you have control over your map visualisation application, you might want to consider setting the SDO_FILTER "min_resolution" parameter dynamically so that its value is the same as the number of meters per pixel (eg min_resolution=10). If this is set, Oracle Spatial will only include spatial objects in the returned search set if one side of a geometry's MBR is greater than or equal to this value. Thus any geometries smaller than a pixel will not be returned. Very useful for large-scale data being drawn at small scales and for which no selection (eg identify) is required. With Oracle MapViewer this behaviour can be set via the generalized_pixels parameter.
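
                  For example, a map refresh at roughly 10 m/pixel might issue the following (the bind variable and table name are illustrative):

                  SELECT b.id, b.geom
                    FROM buildings b
                   WHERE SDO_FILTER(b.geom, :map_window,
                                    'min_resolution=10') = 'TRUE';
                  -- Geometries whose MBR has no side >= 10 units are not returned,
                  -- i.e. anything that would paint less than a pixel is never fetched.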

                  4. SDO_TOLERANCE, Clean Data

                  If you are querying data other than via MBR (eg find all land parcels that touch each other) then make sure that your sdo_tolerance values are appropriate. I have seen sites where data captured to 1cm had an sdo_tolerance value set to a millionth of a meter!

                  A corollary to this: make sure that all your data passes validation at the chosen sdo_tolerance value before deploying to visualisation. Run sdo_geom.validate_geometry()/validate_layer()...
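
                  For example (the table name and the 0.005 tolerance are illustrative):

                  -- Report rows that fail validation, with the Oracle error context.
                  SELECT p.id,
                         SDO_GEOM.VALIDATE_GEOMETRY_WITH_CONTEXT(p.geom, 0.005) AS status
                    FROM parcels p
                   WHERE SDO_GEOM.VALIDATE_GEOMETRY_WITH_CONTEXT(p.geom, 0.005) <> 'TRUE';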

                  5. Rtree Spatial Indexing

                  At 10g and above lots of great work went into the RTree indexing. So, make sure you are using RTrees and not QuadTrees. Also, many GIS applications create sub-optimal RTrees by not using the additional parameters available at 10g and above.

                  5.1 If your table/column sdo_geometry data contains only points, lines or polygons then let the RTree indexer know (via layer_gtype) as it can implement certain optimizations based on this knowledge.
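
                  For example, for a point-only layer (index and table names hypothetical):

                  CREATE INDEX poi_geom_sidx ON poi(geom)
                    INDEXTYPE IS MDSYS.SPATIAL_INDEX
                    PARAMETERS('layer_gtype=POINT');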

                  5.2 With 10g you can set the RTree's spatial index data block use via sdo_pct_free. Consider setting this parameter to 0 if the table/column sdo_geometry data is read only.

                  5.3 If a table/column is in high demand (eg it is the most commonly used table in all visualisations) you can consider loading (a part of) the RTree index into memory. Now, with the RTree indexing, the sdo_non_leaf_tbl=true parameter will split the RTree index into its leaf (contains actual rowid reference) and non-leaf (the tree built on the leaves) components. Most RTrees are built without this, so only the MDRT*** secondary tables are built. But if sdo_non_leaf_tbl is set to true you will see the creation of an additional MDNT*** secondary table (for the non-leaf part of the rtree index). Now, if appropriate, the non-leaf table can be loaded into memory via the following:

                  ALTER TABLE MDNT*** STORAGE (BUFFER_POOL KEEP);

                  This is NOT a general panacea for all performance problems. One should investigate other options before embarking on this (cf Tom Kyte's books, such as Expert Oracle Database Architecture: 9i and 10g Programming Techniques and Solutions).
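
                  For reference, a sketch of creating the split index (names hypothetical; note too that a KEEP buffer pool must actually be configured, via db_keep_cache_size, before the ALTER TABLE above buys you anything):

                  CREATE INDEX roads_geom_sidx ON roads(geom)
                    INDEXTYPE IS MDSYS.SPATIAL_INDEX
                    PARAMETERS('sdo_non_leaf_tbl=true');
                  -- Creates both an MDRT... (leaf) and an MDNT... (non-leaf)
                  -- secondary table in the index owner's schema.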

                  5.4 Don't forget to check your spatial index data quality regularly. Because many sites use GIS packages' GUI tools to create tables, load data and index them, there is a real tendency not to check what they have done or to monitor the objects regularly. Check the SDO_RTREE_QUALITY column in USER_SDO_INDEX_METADATA and look for indexes with an SDO_RTREE_QUALITY value > 2; if so, consider rebuilding or recreating the index.
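
                  A quick check, straight from the data dictionary:

                  -- Find spatial indexes whose RTree quality has degraded.
                  SELECT sdo_index_name, sdo_rtree_quality
                    FROM user_sdo_index_metadata
                   WHERE sdo_rtree_quality > 2;

                  -- Then rebuild the offender (index name hypothetical).
                  ALTER INDEX roads_geom_sidx REBUILD;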

                  6. The rendering engine.

                  Whatever rendering engine one uses, make sure you try to understand fully what it can and cannot do. AutoDesk's MapGuide is an excellent product, but I have seen it simply cache table/column data and never dynamically access it. Also, I have been at one site which was running both Deegree and MapViewer, and MapViewer was so fast in comparison to Deegree that I was called in to find out why. I discovered that Deegree was using SDO_RELATE(... ANYINTERACT ...) for all MBR queries while MapViewer was using SDO_FILTER. Just this difference was causing some queries to run at less than 10% of the speed of MapViewer!
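
                  To make the difference concrete, the two query shapes look like this (names hypothetical):

                  -- Primary filter only: a fast index-based MBR test; all a map draw needs.
                  SELECT s.geom
                    FROM streets s
                   WHERE SDO_FILTER(s.geom, :map_window) = 'TRUE';

                  -- Primary plus secondary filter: an exact topological test on every
                  -- candidate; far more expensive, and unnecessary for simple rendering.
                  SELECT s.geom
                    FROM streets s
                   WHERE SDO_RELATE(s.geom, :map_window, 'mask=ANYINTERACT') = 'TRUE';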

                  7. Consider "denormalising" data

                  There is an old adage in databases: "normalise for edit, denormalise for performance". When we load spatial data we often get it from suppliers in a fairly flat or normalised form. In concert with spatial sorting, consider denormalising the data via aggregations based on a rendering attribute and some sort of spatial unit. For example, if you have 1 million points stored as single points in SDO_GEOMETRY.SDO_POINT which you want to render by a single attribute containing 20 values, consider aggregating the data using this attribute AND some sort of spatial BUCKET or BIN. So, consider using SDO_AGGR_UNION coupled with Spatial Analysis and Mining package functions to GROUP the data BY <<column_name>> and a set of spatial extents.
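
                  As a simplified sketch of the idea, here is a grid-bin aggregation done with plain arithmetic rather than the SDO_SAM binning functions (the names, the 1000-unit cell size and the 0.005 tolerance are all illustrative):

                  CREATE TABLE poi_agg AS
                  SELECT p.attr,
                         FLOOR(p.geom.sdo_point.x / 1000) AS bin_x,
                         FLOOR(p.geom.sdo_point.y / 1000) AS bin_y,
                         SDO_AGGR_UNION(MDSYS.SDOAGGRTYPE(p.geom, 0.005)) AS geom
                    FROM poi p
                   GROUP BY p.attr,
                            FLOOR(p.geom.sdo_point.x / 1000),
                            FLOOR(p.geom.sdo_point.y / 1000);
                  -- One aggregated geometry per attribute value per grid cell,
                  -- instead of a million individual points.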

                  8. Tablespace use

                  Talk to your DBA in order to find out how the Oracle database's physical and logical storage is organised. Is a SAN being used, or SAME-arranged disk arrays? Knowing this, you can organise your spatial data and indexes using more effective and efficient methods that will ensure greater scalability.

                  9. Network fetch

                  If your rendering engine (app server) and database are on separate machines, you need to investigate what sort of fetch sizes are being used when returning data from queries to the middle tier. Fetch sizes for attribute-only rows and rows containing spatial data can be, and normally are, radically different. Accepting the default settings for these sizes could be killing you (as could the sort_area_size of the Oracle session the application server has created on the database). For example, I have been informed that MapInfo Pro uses a fixed value of 25 records per fetch when communicating with Oracle. I have done some testing to show that this value can be too small for certain types of spatial data. SQL Developer's GeoRaptor uses 100, which is generally better (and one can modify this). Most programmers accept the defaults for network properties when programming in ADO/ODBC/OLEDB/JDBC: just be careful as to what is being set here. (This is one of the great strengths of ArcSDE: its TCP/IP network transport is well written, tuneable and very efficient.)

                  10. Physical Format

                  Finally, while Oracle's excellent MapViewer requires its spatial data to be in Oracle, other commercial rendering engines do not. So, consider using alternate physical file formats that are more optimal for your rendering engine. For example, Google Earth Enterprise "compiles" all the source data into an optimal format which the server then serves to Google Earth Enterprise clients. Similarly, a shapefile on disk local to the application server (with spatial indexing) may be faster than storing the data back in Oracle on a database server that is being shared with other business databases (eg Oracle Financials). If you don't like this approach and want to use Oracle only, consider a dedicated Oracle XE on the application server for the data that is read only and used in most of your generated maps, eg contour or drainage data.

                  Just some things to think about.

                  regards
                  Simon