Last Sunday we performed an EPM (Emergency Preventive Maintenance) activity on our Exadata Half Rack. This machine runs only our Production instance. All services came back up successfully. However, since Monday, the first day after the maintenance, our loads have been taking significantly longer (almost double the previous time). We first suspected the storage indexes, having read that after a complete power cycle Exadata performance can be slow for the first couple of days and then recovers once the storage indexes are rebuilt. However, we also suspect load balancing, because most of the requests are going to node 2 of the 4 nodes, which we observed again yesterday. SCAN is enabled and all the listener services are up and running. We would appreciate it if anyone could shed some light on this, as it is very critical for us: this is our Production database.
Can you run the following as root and post the output?
dcli -l root -g dbs_group 'uptime'
crsctl status resource -t -w "TYPE = ora.database.type"
As oracle, connect to the different instances and compare the workload before and after the stop/start with:
Please post your findings.
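To make the uptime output easier to compare across nodes, a small filter can pull out the 1-minute load average per host and sort it. This is only a sketch: the function name load_by_node is made up for illustration, and it assumes dcli prefixes each output line with "hostname:" and that uptime prints "load average: a, b, c".

```shell
# Hypothetical helper (not part of dcli): reads `dcli ... 'uptime'` output
# on stdin and prints "<host> <1-minute load>", highest load first.
# Assumes each line looks like "host: ... load average: 1.20, 1.10, 0.90".
load_by_node() {
    awk '{
        host = $1
        sub(/:$/, "", host)                       # drop the trailing colon dcli adds
        n = split($0, parts, /load averages?: /)  # text after "load average: "
        if (n < 2) next                           # skip lines without load figures
        split(parts[2], loads, /, */)             # 1-, 5- and 15-minute values
        print host, loads[1]
    }' | sort -k2,2nr
}

# Example use on a database node:
#   dcli -l root -g dbs_group 'uptime' | load_by_node
```

If one node sits well above the others, as node 2 seems to here, that would support the load balancing suspicion rather than the storage index one.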
Apologies for the delay. After the EPM we received flash disk alerts for almost 7 flash disks, which were subsequently resolved. Then, last Wednesday, two of the flash disks on one of the cell nodes raised alerts again. We re-enabled the cells and power cycled just that particular cell node, after which the performance was back to normal.
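For anyone wanting to spot this kind of flash disk problem across all cells at once, something along these lines might help. It is a rough sketch under assumptions: the group file name cell_group, the function name flash_disks_not_normal, and the exact status strings are not taken from our system output, so please check them against what your cells actually report.

```shell
# Hypothetical filter: reads "cell: name diskType status" lines on stdin
# and prints only flash disks whose status is anything other than "normal".
flash_disks_not_normal() {
    awk '$3 == "FlashDisk" && $4 != "normal" {
        sub(/:$/, "", $1)   # drop the trailing colon dcli adds to the cell name
        print
    }'
}

# Example use (cell_group is an assumed dcli group file listing the cells):
#   dcli -l root -g cell_group \
#     "cellcli -e list physicaldisk attributes name,diskType,status" \
#     | flash_disks_not_normal
```

No output would mean every flash disk reported as normal at that moment.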
From this experience, we found that the following practices help avoid a drop in Exadata performance after a power cycle or an Emergency Preventive Maintenance activity.
1. Do not run multiple loads at the same time immediately after the power cycle. While Exadata is rebuilding the storage indexes there is a lot of load on the flash disks, and this may lead to flash disk failures.
2. Run a smoke test immediately after the power cycle to get an idea of how the actual load will perform.
3. Exadata might be slow for the first couple of days after a power cycle, but performance gradually picks up once the storage indexes are completely rebuilt.
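The smoke test in point 2 can be as simple as timing one representative job and comparing it with how long it took before the maintenance. Below is a minimal sketch of that idea; the function name smoke_check and the 1.5x threshold are arbitrary choices for illustration, not anything Exadata provides.

```shell
# Hypothetical helper: time a command and compare it against a known baseline.
# Usage: smoke_check <baseline_seconds> <command> [args...]
smoke_check() {
    baseline=$1
    shift
    start=$(date +%s)
    "$@" >/dev/null 2>&1          # run the representative job, discard its output
    end=$(date +%s)
    elapsed=$(( end - start ))
    # Flag the run if it took more than 1.5x the baseline (arbitrary threshold).
    if [ $(( elapsed * 2 )) -gt $(( baseline * 3 )) ]; then
        echo "SLOW elapsed=${elapsed}s baseline=${baseline}s"
        return 1
    fi
    echo "OK elapsed=${elapsed}s baseline=${baseline}s"
}

# Example use (run_sample_load.sh is a placeholder for your own job):
#   smoke_check 120 ./run_sample_load.sh
```

A SLOW result right after the power cycle would be the cue to hold back the heavier loads for a while.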
Is it a permanent performance loss - or does it last for a relatively short time (24-48 hours)?
As you said, when you power cycle a storage cell, I believe the storage indexes can/will (not sure, it doesn't seem consistent) get rebuilt, and that does take some time. For queries which make use of offloading to the cells, we've seen a significant drop in performance the following morning, but it goes back to normal the morning after that.
This has not always happened and hasn't even been 'standard' when we've bounced all the cells: two cells might be fine, but one cell sucks. I'm not sure of the way it figures it out, but I do tend to warn my user population that some of the bigger queries might see a temporary degradation in their performance for 24 hours after a power cycle.