3 Replies. Latest reply: Jun 27, 2013 3:31 PM by marksmithusa

Exadata performance issue after Power Cycle Maintenance. Seems to be a Load Balance Issue

f4aac746-fd2c-4eec-a714-c40e5bf10112 Newbie

Hello All,

Good Day,

Last Sunday we performed an EPM (Emergency Preventive Maintenance) activity on our Exadata Half Rack. This machine runs only our Production instance. All the services were brought up successfully. However, since Monday, the first day after the maintenance, our loads have been taking significantly longer (almost double the previous time). We first assumed this was related to the storage indexes: we found that after a complete power cycle, Exadata performance can be slow for the first couple of days and picks up once the storage indexes are rebuilt. However, we also suspect a load balancing problem, because most of the requests are going to node 2 of the 4 nodes, and we observed this again yesterday. We have SCAN IPs enabled, and all the listener services are up and running. We would appreciate it if anyone could shed some light on this, as it is critical for us: this is our Production database.
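
One way to put a number on the "most requests go to node 2" observation is to count user sessions per instance and compare each node's share with an even split. The following is only a rough sketch: the query against `gv$session` is shown as a string and is not executed here, the function names and the 1.5x tolerance are hypothetical, and the counts are made up for illustration.

```python
# Query one could run as a DBA to get session counts per instance.
# It is only shown here for reference; this sketch does not connect to a database.
SESSIONS_PER_INSTANCE_SQL = """
SELECT inst_id, COUNT(*) AS sessions
FROM gv$session
WHERE type = 'USER'
GROUP BY inst_id
ORDER BY inst_id
"""


def session_shares(counts):
    """Return each instance's fraction of all sessions."""
    total = sum(counts.values())
    if total == 0:
        return {inst: 0.0 for inst in counts}
    return {inst: n / total for inst, n in counts.items()}


def skewed_instances(counts, tolerance=1.5):
    """Return instances whose share exceeds `tolerance` times an even split.

    The tolerance of 1.5 is an arbitrary, hypothetical threshold.
    """
    if not counts:
        return []
    even_share = 1.0 / len(counts)
    shares = session_shares(counts)
    return sorted(inst for inst, share in shares.items()
                  if share > tolerance * even_share)


# Made-up session counts for a 4-node cluster (instance id -> sessions).
example_counts = {1: 40, 2: 220, 3: 35, 4: 45}
print(skewed_instances(example_counts))
```

A skewed session count alone does not prove that load balancing is broken (long-running batch connections can pin work to one node), but it gives a concrete figure to post alongside the listener status.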

 

 

Kindest Regards,

Vikram.

  • 1. Re: Exadata performance issue after Power Cycle Maintenance. Seems to be a Load Balance Issue
    tychos Expert

    Hi Vikram,

    Can you run the following as root and post the output?
    dcli -l root -g dbs_group 'uptime'
    crsctl status resource -t -w "TYPE = ora.database.type"

    As the oracle user, connect to the different instances and compare the workload before and after the stop/start using
    awrddrpt.sql (in $ORACLE_HOME/rdbms/admin).
    Please post your findings.
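
To compare the `uptime` results across database nodes without reading them by eye, the output could be parsed as sketched below. This assumes that dcli prefixes each line with the host name and a colon and that `uptime` prints a `load average:` field with dot decimals; the host names and numbers in the sample are invented, and the function names are hypothetical.

```python
import re

# Matches lines such as:
#   db01: 15:31:02 up 3 days,  2:14,  1 user,  load average: 1.10, 1.25, 1.30
LOAD_RE = re.compile(
    r"^(?P<host>[^:\s]+):.*load average:\s*"
    r"(?P<l1>[\d.]+),\s*(?P<l5>[\d.]+),\s*(?P<l15>[\d.]+)"
)


def parse_dcli_uptime(text):
    """Map host name -> (1 min, 5 min, 15 min) load averages."""
    loads = {}
    for line in text.splitlines():
        match = LOAD_RE.match(line.strip())
        if match:
            loads[match.group("host")] = (
                float(match.group("l1")),
                float(match.group("l5")),
                float(match.group("l15")),
            )
    return loads


def busiest_host(loads):
    """Return the host with the highest 15-minute load average."""
    return max(loads, key=lambda host: loads[host][2])


# Invented sample output for illustration only.
sample = """\
db01: 15:31:02 up 3 days,  2:14,  1 user,  load average: 1.10, 1.25, 1.30
db02: 15:31:02 up 3 days,  2:13,  2 users,  load average: 9.80, 9.40, 9.10
db03: 15:31:03 up 3 days,  2:14,  1 user,  load average: 0.90, 1.00, 1.05
db04: 15:31:03 up 3 days,  2:12,  1 user,  load average: 1.20, 1.10, 1.15
"""

loads = parse_dcli_uptime(sample)
print(busiest_host(loads), loads[busiest_host(loads)])
```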

    Regards,
    Tycho

  • 2. Re: Exadata performance issue after Power Cycle Maintenance. Seems to be a Load Balance Issue
    f4aac746-fd2c-4eec-a714-c40e5bf10112 Newbie

    Hello Tychos,

    Apologies for the delay. After the EPM we received flash disk alerts for about 7 flash disks; these were resolved. Then, last Wednesday, two of the flash disks on one of the cell nodes raised alerts again. We re-enabled the cells and power cycled just that cell node, after which performance returned to normal.

    From this experience, we follow these practices to avoid a drop in Exadata performance after a power cycle or an Emergency Preventive Maintenance activity:

    1. Do not run multiple loads at the same time immediately after the power cycle. While Exadata is rebuilding the storage indexes, there is a lot of load on the flash disks, and this may lead to flash disk failures.

    2. Run a smoke test immediately after the power cycle to get an idea of how the actual load will perform.

    3. Exadata may perform slowly for the first couple of days after a power cycle, but performance gradually recovers once the storage indexes are completely rebuilt.
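
The smoke test in point 2 is most useful when its run time is compared with a known baseline. A minimal sketch of that comparison follows; the function names, the 1.5x threshold, and the timings in the example are all hypothetical, and `run` stands for whatever callable launches your own smoke-test load.

```python
import time


def timed(run):
    """Run a callable and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    run()
    return time.perf_counter() - start


def slowdown_ratio(baseline_seconds, current_seconds):
    """Return how many times slower the current run is than the baseline."""
    if baseline_seconds <= 0:
        raise ValueError("baseline_seconds must be positive")
    return current_seconds / baseline_seconds


def is_degraded(baseline_seconds, current_seconds, threshold=1.5):
    """Flag a run as degraded when it is at least `threshold` times slower.

    The default threshold of 1.5 is an arbitrary choice for illustration.
    """
    return slowdown_ratio(baseline_seconds, current_seconds) >= threshold


# Made-up timings: a load that took 10 minutes before and 20 minutes after.
print(slowdown_ratio(600, 1200), is_degraded(600, 1200))
```

Keeping the baseline from a run taken before the maintenance window makes it easier to tell a temporary post-restart slowdown from a lasting one.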

     

    Kind Regards,

    Vikram.

  • 3. Re: Exadata performance issue after Power Cycle Maintenance. Seems to be a Load Balance Issue
    marksmithusa Journeyer

    Is it a permanent performance loss, or does it last for a relatively short time (24-48 hours)?

    As you said, when you power cycle a storage cell, I believe the storage indexes can/will get rebuilt (I'm not sure which, as it doesn't seem consistent), and that takes some time. For queries that use offloading to the cells, we've seen a significant drop in performance the following morning, but it goes back to normal the morning after that.

    This has not always happened, and it hasn't even been 'standard' when we've bounced all the cells: two cells might be fine while a third performs badly. I'm not sure how it works that out, but I do tend to warn my user population that some of the bigger queries might see a temporary degradation in performance for 24 hours after a power cycle.
