Oracle Performance on WAN

651748
651748 Member Posts: 7
edited Sep 9, 2008 6:32AM in General Database Discussions
Hi Everyone,

Good day!

I previously posted a similar question on this topic, asking for troubleshooting methods to improve the poor performance of our Oracle deployment across two sites. Unfortunately, we have not solved the case yet, and our in-house Oracle DBA is close to giving up on the situation.

This time, I would like to seek your opinions on whether the setup below is "acceptable" or a workable solution. If not, I'd appreciate opinions from Oracle experts and professionals on what direction we should take to achieve near-LAN performance on a WAN link. If anyone here has a similar setup, kindly post your current implementation, too.

Here's our basic setup:
=================================
Site 1:
Oracle Database Server
App Server
Clients

Site 2:
Oracle App Server
Clients

Both sites are linked via a 1 Mbps MPLS line (on Cisco routers).

Clients on Site 1 have good transaction performance but clients on Site 2 have very bad performance (e.g. a single transaction takes 20 minutes to complete).
=================================

From the articles and forum threads I've read, a Citrix solution or a Cisco WAAS solution seems to be a good option for achieving near-LAN performance with this type of setup.

The problem is that management is keen on using the current infrastructure and making the above setup work without any additional purchases (whether hardware or software). Your input will help us convince management that the above setup really does require purchased add-ons, if that is the case.

Thank you very much. Any input will be of great help.

Answers

  • Satish Kandi
    Satish Kandi Member Posts: 9,627
    In your earlier thread (2669433), Charles had asked you to post some results and you had forwarded those to your DBA. There have been no updates on that thread since then.

    Could you post the outcome of the trace that was carried out?
  • JustinCave
    JustinCave Member Posts: 30,293 Gold Crown
    Have you analyzed why transactions at site 2 take 20 minutes? Unless a transaction involves moving GB of data around, that seems exceptionally high.

    How long does it take for a client on site 2 to ping the database?
    How many users are active on site 2? How much data does each one need to send & retrieve in the course of a single transaction?
    Is anything other than your application using the 1 Mbps link?

    It is reasonably common to find that applications have been designed to be overly "chatty", sending too many small packets between the app server and the database. This generally isn't crippling on a LAN because the transit time is relatively quick, but it can kill WAN performance where ping times are an order of magnitude greater. Luckily, though, fixing chatty applications tends to be easier than fixing lots of other sorts of performance problems. Sometimes, it is as simple as changing the fetch size (the number of rows the application server fetches from the database in a single round-trip), doing array binds if multiple rows are going to be inserted, and moving data-intensive logic into stored procedures where data doesn't have to travel at all. There may also be benefits from tuning network settings to bump up the SDU.

    Justin
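
To make the round-trip argument concrete, here is a minimal sketch of a rough cost model in Python. It assumes every fetch costs one full round trip and ignores protocol overhead. The function name, the row count, the row size and the 1 ms / 50 ms latencies are illustrative assumptions, not measurements from this thread.

```python
import math

def estimated_seconds(rows, fetch_size, rtt_ms, row_bytes=200, link_bps=1_000_000):
    """Rough model: one round trip per fetch, plus time to push the bytes through the link."""
    round_trips = math.ceil(rows / fetch_size)
    latency_s = round_trips * rtt_ms / 1000.0
    transfer_s = rows * row_bytes * 8 / link_bps
    return latency_s + transfer_s

# Illustrative numbers only: 10,000 rows, 1 ms (LAN-like) vs 50 ms (WAN-like) round trip.
# The link speed is kept the same in both cases to isolate the effect of latency.
for fetch_size in (1, 10, 100):
    lan = estimated_seconds(10_000, fetch_size, rtt_ms=1)
    wan = estimated_seconds(10_000, fetch_size, rtt_ms=50)
    print(f"fetch size {fetch_size:>3}: LAN-like ~{lan:.1f} s, WAN-like ~{wan:.1f} s")
```

In this model the transfer time is the same in every case, so the gap between the two columns comes entirely from the number of round trips, which is what a larger fetch size reduces.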
  • 651748
    651748 Member Posts: 7
    edited Sep 9, 2008 5:43AM
    Hi Satish,

    Unfortunately, our DBA is currently at the client's site so I cannot get the details from him. But I will post it here once I get a hold of him.

    [Edit: I got a hold of him and he said he lost the trace file already. But I asked him to do another trace 10046 for this purpose. I'll post it here when done.]

    While doing the testing, we did capture the packets from the client machine using Wireshark. Will the packet capture file help?

    Thanks.

    Edited by: user648745 on Sep 9, 2008 2:41 AM
  • Billy Verreynne
    Billy Verreynne Member Posts: 28,280 Red Diamond
    Clients on Site 1 have good transaction performance but clients on Site 2 have very bad performance (e.g. a single transaction takes 20 minutes to complete).
    So? This is describing the symptoms. I do not see any problem description here.

    Why is site 2 slow? Is it network latency? Is it due to a significant amount of packet drops and retransmits? Perhaps it is because of the QoS used for the 1521/tcp traffic that runs in a lower DiffServ class on the router, where other traffic gets bumped into a higher class? Maybe it is sun spots? Or solar flares?

    It can be literally anything. And not just networking. So why is string 2 shorter than string 1? No idea.

    What are the wait states like for the Oracle server sessions serving site 2? Has a network sniff and analysis been done at site 2? Have the router MIBs and router config for the site been checked for anomalies?

    Problems cannot be solved on identified symptoms only. The #1 rule in software engineering is to identify the actual problem/requirement. I suggest that you do exactly that.
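
One thing a network sniff at site 2 can show is how many request/response turns a single transaction makes. The following is a rough Python sketch, assuming the capture has been exported to (source IP, TCP payload length) pairs in capture order. The function name, the IP addresses and the 50 ms figure are hypothetical.

```python
def count_turns(packets, client_ip):
    """Count request/response turns: each switch from client payload to server payload.

    packets: iterable of (src_ip, payload_len) tuples in capture order.
    """
    turns = 0
    last_from_client = None
    for src_ip, payload_len in packets:
        if payload_len == 0:
            continue  # skip pure ACKs, which carry no application data
        from_client = (src_ip == client_ip)
        if last_from_client is True and not from_client:
            turns += 1  # the server answered a client request: one round trip
        last_from_client = from_client
    return turns

# Hypothetical capture rows; real ones could come from a Wireshark export.
sample = [
    ("10.0.2.15", 120), ("10.0.1.5", 0), ("10.0.1.5", 300),
    ("10.0.2.15", 80), ("10.0.1.5", 150), ("10.0.1.5", 150),
    ("10.0.2.15", 0),
]
turns = count_turns(sample, client_ip="10.0.2.15")
print(turns, "turns; at an assumed 50 ms round trip that is about", turns * 50, "ms of waiting")
```

A very large turn count for one transaction would point at the application's conversation pattern rather than at the link itself, though it does not rule out drops or retransmits.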
  • 651748
    651748 Member Posts: 7
    edited Sep 9, 2008 3:15AM
    Hi Justin,

    The transaction involves adding and fetching records only and does not involve movement of GB data.

    Ping from the client to the database takes 58 ms on average (using 32 bytes). Is this an acceptable ping time?

    During the testing, we have stopped all network activities utilizing the pipe. The database transaction between the client and the database alone used an average of 300 kbps on the MPLS link (PRTG report).

    A "show interface" entry in Site 1's router has this line: "Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 718". The number of drops was from the transaction testing alone (i.e. we did a "clear counters" before the test).

    As I've said in the previous post, we've done a packet capture on the client machine. It's just that I don't know what to look for and analyze in the trace file.

    Edited by: user648745 on Sep 9, 2008 12:11 AM
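
As a back-of-the-envelope check, the figures reported above can be combined in a few lines of Python. This is only a sketch under two assumptions that the thread does not confirm: that the 300 kbps average held for the whole 20-minute transaction, and that 1 kbps means 1,000 bits per second.

```python
rtt_s = 0.058            # average ping reported above (32-byte ICMP)
duration_s = 20 * 60     # the 20-minute transaction
avg_bps = 300_000        # average link usage from the PRTG report

# Upper bound on sequential round trips if the whole time were spent waiting on the network.
max_round_trips = duration_s / rtt_s
# Total data implied by the average rate, in megabytes.
total_mb = avg_bps * duration_s / 8 / 1_000_000

print(f"at most ~{max_round_trips:,.0f} sequential round trips")
print(f"~{total_mb:.0f} MB moved over the link")
```

Under those assumptions this works out to roughly 20,700 round trips at most and about 45 MB of traffic, which is a lot for a transaction that only adds and fetches records. Comparing these bounds with the packet capture would help show whether latency, volume, or something else dominates.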
  • 652656
    652656 Member Posts: 122
    edited Sep 9, 2008 3:46AM
    Hi,

    You can check the speed of the two networks by downloading a 10 MB file to client machine 1, then to client machine 2 and client machine 3, from both networks. Use FTP for this, not other copy tools. I suggest this because sometimes the client machines themselves can be the culprit: the network adapter on one client machine may be fast while another is slow.

    Regards

    Kaunain
  • 651748
    651748 Member Posts: 7
    Hi Kaunain,

    We've tried this as well and we saw no problems in transferring files (either through FTP copy or network share copy) between different machines.

    We've also suspected that maybe it's the client machine itself so we have tried using different test machines (different database machine on Site 1 and different application and client machines on Site 2; same Oracle installation and configuration). Still, the same slow performance existed.
  • Ganadeva
    Ganadeva Member Posts: 118
    What is the OS?

    Are the network settings optimum at the OS level?
  • 651748
    651748 Member Posts: 7
    Hi Ganadeva,

    The database server is running on AIX v5R3. The app server is running Windows 2003. The client machine is running Windows XP.

    Can you explain what you mean by optimum network settings at the OS level?

    Thanks.
  • 652656
    652656 Member Posts: 122
    Hi,

    Can you check this: TCP/IP properties --> Internet Protocol (TCP/IP) --> Configure --> Advanced --> Speed & Duplex.

    Set it to 100 Mbps full duplex, and also try Auto, on both the application server machine and the clients.

    Regards

    Kaunain
  • Ganadeva
    Ganadeva Member Posts: 118
    edited Sep 9, 2008 5:24AM
    We had faced a situation on an AIX DB server where the network card was set for 100 Mbps transmission although it had a capacity of 1 Gbps. The oversight was detected while troubleshooting slow database access: the sysadmin had intentionally set the card to a lower value for some downtime testing and had forgotten to switch it back.

    There were a few other OS-level network settings that were at default values and needed some configuration. There should be some option within the 'smit' tool for the network. Either check the man pages for the respective sub-menu operations or get confirmation from an experienced AIX admin.
  • Billy Verreynne
    Billy Verreynne Member Posts: 28,280 Red Diamond
    We've tried this as well and we saw no problems in transferring files (either through FTP copy or network share copy) between different machines.
    Each test like this only tests a specific component of the connectivity and path between client and server. An FTP or network copy cannot be used as an indication of the performance and throughput of another client application.

    The router can implement DiffServ (more and more common these days) that will make different network applications behave differently. Firewalls can implement different rules, and one rule set can be slower than another. Sizes of the packets sent can have a huge effect on performance (and CPU).

    If a client is a typical web client, it will make use of stateless session connections to the server. In such a case, there can be a big difference between using a shared server versus a dedicated server.

    The service dealing with that network service (accepting connections) can make use of reverse DNS lookup. This again can be a large overhead.

    It could be a routing issue. It could be a bad cable. It could be a bad hub or switch.

    Random tests like running pings (ICMP is a very different protocol from TCP), copying data, and so on are not going to assist in isolating the actual performance issue.

    You need to identify the actual network topology with components between client and server. Identify all software layers between client and server. Then draw up a test plan to isolate each of these and test each one in turn. Or else it is like dealing with a needle in a haystack.
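
As one example of an isolated test in such a plan, TCP connection setup time can be measured on the actual service port instead of relying on ICMP ping. This is a minimal Python sketch using only the standard library. The function name and the host name in the usage comment are hypothetical, and it times only the TCP handshake, not Oracle Net traffic.

```python
import socket
import time

def tcp_connect_ms(host, port, samples=5, timeout=5.0):
    """Return a list of TCP connect times in milliseconds (one handshake per sample)."""
    results = []
    for _ in range(samples):
        start = time.perf_counter()
        # create_connection returns once the three-way handshake has completed
        with socket.create_connection((host, port), timeout=timeout):
            results.append((time.perf_counter() - start) * 1000.0)
    return results

# Example use (hypothetical host name; 1521/tcp is the port mentioned earlier in the thread):
# times = tcp_connect_ms("db.site1.example", 1521)
# print(f"min {min(times):.1f} ms, max {max(times):.1f} ms")
```

Running the same function from a site 1 client and a site 2 client tests one layer only, so it would be one row in the test plan rather than a verdict on the whole path.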