Every DBA has at some point asked: “What the heck happened to my database?” In this blog, we’ll review some of the more common scenarios where some little detail slips through the proverbial crack and that often results in some kind of database anomaly which momentarily raises tensions and blood pressures.
Supposedly nothing is new or has changed, yet without any obvious reason, database performance has suddenly noticeably slowed. Some possible causes and effects to look for:
This last issue has one more wrinkle to be aware of, virtualization. Now the DBA might not be able to as easily identify resource issues due to the shared nature of the resources adding an additional layer of complexity. Moreover, the virtualization infrastructure might further complicate proper forensic analysis. Assume that your database server virtual machine (VM) can be dynamically relocated to overcome some resource imbalances. Now imagine that a performance problem occurs this month, but that the dynamic VM relocation masks the impact of the issue. Thus, when the problem becomes perceptible the first question might well be when did it really first occur? It may have been a problem for weeks or months.
Now let’s move that same database server to the cloud. Not only do you have the same virtualization related concerns about problems potentially being masked, but now there could be questions about whether we have purchased the correctly sized cloud components from the ever-growing catalog of deployment options available. Moreover, are we on shared, elastic disk storage when we should have maybe more properly chosen dedicated SSD or NVMe storage? What about network latency, are we split across regions or availability zones for an application whose nature requires something different? One could argue that one of the most useful artificial intelligence or machine learning applications the market needs is a cloud monitor, governor and advice tool. There simply aren’t enough knowledgeable experts on proper cloud provisioning. Most people are essentially stumbling their way through given that it’s so easy to scale up when wrong – it just costs more money.
Don’t get me wrong, I love the cloud. I am just smart enough to know that I don’t know all the answers and there are far fewer people who do at the current time than we really need. Not even the big cloud providers can hire enough people fast enough and train them to fill this gap. Therefore, for the next few years we are probably in for some interesting times. I think Robert Kennedy’s quote most apropos:
Like it or not, we live in interesting times. They are times of danger and uncertainty, but they are also the most creative of any time in the history of mankind. And everyone here will ultimately be judged - will ultimately judge himself - on the effort he has contributed to building a new world society and the extent to which his ideals and goals have shaped that effort.
It’s not uncommon to observe rampant over reliance of database application code on TEMP tables, most often as a SQL language programming crutch. I have a theory as to why. Programmers are chiefly taught record oriented or “record at a time” processing logic as for accessing data. Some lucky students might experience some exposure to a set oriented language such as LIS, but usually insufficient to adequately prepare them for SQL’s set based foundation. As such they often muddle along as best they can – which most often means liberally using TEMP tables. Now while there are legitimate cases where a TEMP table makes sense and is required, it should be the exception rather than the rule. If nothing else, it should at least be the choice of last resort after attempting to accomplish the task within a Single SQL statement.
I also believe that the way programmers think is more about “how” vs. “what”. You’ve most likely heard the theory about “right-brained” vs “left-brained. I content that “how” is more analytical and methodical thinking and thus left-brained. Therefore “what” is more creative or artistic, and therefore right-brained. While this observation is more opinion than fact, evidence seems to fit such a conclusion. For now let’s accept that premise and see how it might compound this problem of over reliance on TEMP tables.
The business person might state a problem like this, what percentage of our employees are making 20% more than their department average in our largest 10% of departments? But the programmer tends to dissect that into several smaller problems:
There’s nothing wrong with dissecting a problem per se. Remember we all learned about the advantages of the “divide and conquer” approach to problem solving and coding. The main problem is that a “record at a time” attitude will quite often result in multiple SQL statements with intermediate results stored in TEMP tables to be passed on to the following SQL statement. You might argue “So what as the correct result is returned and that’s what matters?” However by subdividing the original problem into multiple smaller ones and thus multiple SQL statements, the database SQL query optimizer can only formulate execution plans for those smaller SQL statements rather than potentially finding a superior approach when the whole problem is presented as a single SQL statement. Moreover creating TEMP tables will consume disk space which when later dropped could end up fragmenting your storage space. Plus you may need to create indexes on your TEMP tables to duplicate what may already exist on the table, so once again it could exasperate your storage issues.
I used a simple data model with just two tables as shown here.
The single SQL statement query to obtain the answer that I came up with for SQL Server was as follows (there could of course be other answers which might be superior – so please comment your ideas):
with num_emps as (select count(eno) as xcnt from emp),
avg_dsal as (select dno, count(eno) as xcnt, avg(salary) as xavg
from emp group by dno),
dep_data as (select count(*) as xcnt
from emp join avg_dsal on emp.dno=avg_dsal.dno
and avg_dsal.xcnt in (select top 10 percent
count(*) as xcnt from dept))
select dep_data.xcnt, num_emps.xcnt, CONVERT(VARCHAR(50),
cast(dep_data.xcnt as decimal)/
cast(num_emps.xcnt as decimal)*100)+' %' AS [%]
from dep_data, num_emps
The result for my sample data which I calculated by hand to make sure was 100% correct was:
While writing a query such as this might take more time, I suggest that it’s well worth the time for all the reasons already stated. I also think from a maintainability and readability standpoint it’s also superior. I often have to ask people who create TEMP tables lots of questions when I have to work on their code.
When is data less valuable? How do you manage data as it becomes less valuable?
In the good old days the answer to these questions was easy; it was simply based on what the company was willing to pay for relatively expensive onsite disk space. So while a company might prefer to keep data around as long as possible, the realty was always more of what is the bare minimum the company could afford to keep and was willing to pay for space to house that data. Back then this was not a major tradeoff because we did not have data scientists or data analytics plus business intelligence was in its relative infancy. Below is a simple diagram of a data life cycle management schema that smart DBAs might have implemented. That was then.
Nowadays the businesses see data as a leading differentiator and worth its weight in gold. Hence they want to keep as much data accessible online as possible, within reasonable cost. With much lower disk costs these days plus tiered cloud storage options, the amount of data practically possible to maintain online is arguably near unlimited. Of course that’s not a very legitimate answer since we also need to maintain certain performance service level agreements (SLAs). But today a DBA might have to maintain four to ten times as much historical data to meet the new data driven business requirements, including regulatory and compliance needs. Undoubtedly the data life cycle management picture above is not the optimal solution. For example on Amazon Cloud you might implement something along these lines.
For purpose of discussion let's assume that you have Oracle 11g or 12c RAC cluster deployed on premise, and now you've been directed to see about moving it to the cloud. Of course there is always the Oracle cloud, but let's further assume that your company has more than just Oracle databases and has therefore chosen a generic cloud provider such as Amazon AWS, Microsoft Azure or Google Cloud. These cloud providers generally do not offer network multicast, virtual IP nor database specific QOS control. Furthermore they also do not offer the true shared storage required by RAC and historically provided on premise via a SAN or NAS. So what are you to do?
Please note that I am not promoting any one cloud provider over another. I don't want to start any heated debates or discussions. I am just saying what do you do when management says move our Oracle databases which include some RAC clusters and they mandate it must be to one of the other cloud providers where they have already standardized. As I see it you have four choices (with some better than others, and some not really choices at all).
First, you might be able to educate management and thus push back on the decision to move Oracle databases to other cloud providers due to RAC infrastructure requirements. Only you know how much clout or political capital you have to spend on such an endeavor. My experience has been that more and more Oracle DBAs find that the tail cannot wag the dog anymore. As many hesitant DBAs learned in the early days of database virtualization that resistance is futile. Thus this option may not really be feasible.
Second, you might suggest that since the cloud VM size can be created and upgraded as needed that you can consolidate the RAC cluster into a single VM whose size equals the total of all the current nodes. However while that might solve the scalability issue it nonetheless loses the most critical aspect of RAC - High Availability (HA). So while technically this might sound good at first, it sacrifices HA which should not be the tradeoff you should readily make.
Third, you might consider using Oracle Data Guard to provide HA via a physical and/or logical standby database run in maximum protection mode. However there are scenarios where manual tasks must be replicated when using a physical standby that does not exist with RAC. And to quote Ask Tom about logical standby "There are more moving pieces which require more setup, care, and feeding". Also your company may not be licensed for nor have skills with Data Guard. So in effect you'd introduce two variables to this cloud migration: the cloud and Data Guard. That too may not be a viable choice.
Fourth and final you could search for a way to deploy Oracle RAC in non-Oracle cloud. Searching the web you should encounter a company called FlashGrid who offers a interesting and novel solution. You also will find references to FlashGrid on Amazon's and Azure's market places. Of course many skeptical DBAs might ask “Does Oracle support such a solution?” You can easily find a few blog posts against the idea. While they present reasonable technical points for consideration, they also admit to not yet having tried to run Oracle RAC on AWS or Azure. You might remember 10-12 years ago when the same kinds of questions were asked about running Oracle on VMware. At that time too there were blog posts against it and even some question as to the level of Oracle support. However today we all run Oracle on hypervisors without much concern – and Oracle fully supports it.
So my advice is to treat this like any new car you might buy. Take this or any other promising technology for a test drive to form your own opinion. I have run Oracle RAC on both AWS and Azure - and so far no issues. You can create a 2 or 3 node Oracle RAC cluster via FlashGrid's simple provisioning wizards which takes about 10 minutes to define what you want and then an additional 90-120 minutes to fully deploy a fully functioning environment. So for the mere cost of a few hours cloud time you can see and judge for yourself whether this new technology has merit for your database needs. Take it for a spin!
If you ever need for a simple, light weight and absolutely free data migration tool then you should try Data Puppy Lite. It offers the ability to migrate a schema at a time from a source to target database, and it supports the following databases: Oracle, MySQL, SQL Server and PostgreSQL. Data Puppy Lite automatically handles all the required, complex data type conversions as you got from one database to another. As a Windows client tool it's not well suited for moving very large amounts of data. But when all you really need is something quick and easy it fits the bill.
Microsoft has been extremely innovative the past few years – especially with regard to their SQL Server relational database both on premise and in the cloud. Although I am a Linux bigot and love the fact that Microsoft now offers SQL Server on Linux, that’s not the most interesting new development. I also like their new pricing vs. feature structure – that all editions now offer almost all features, the only limiting factor being maximum CPU, memory and database size. That makes database upgrades painless with zero application changes required. But arguably the most interesting and disruptive new feature is the “stretch database”, which basically allows your on premise database to store a portion of your data as normal and the rest in the cloud – with total transparency to the application and end users. Some users adopt this feature to slowly migrate to the cloud while others to leverage the cloud’s infinite scalability in order to keep colder data online for longer period of time as required by the business. Here is a basic diagram depicting this concept and feature. Note that SQL Server’s query optimizer handles this division of data totally transparently. Thus a big table logically looks like it’s all local when in fact part of it is on premise and part of it is in the cloud as shown here.
But as interesting as this capability is, it has some key limitations. First is requires SQL Server 2016 or greater and it can only stretch into Microsoft Azure SQL database. So it is limited to both one database and one cloud provider. Furthermore there are several key data type, constraint and DML operations limits which further complicate this solution. Moreover the costs for leveraging this capability are more costly than many find acceptable since one must pay several fees – basic cloud storage, database stretch units (DSUs) which are the computing power to process stretch processing and outbound data transfers.
What many people would prefer is this exact same capability for any on premise database stretched into any cloud provider with the target database being any database (i.e. not require same database on both sides) and without data type or other limits. Such a solution does exist! Cirro’s federated query engine provides the capability to create a “partition view” tables regardless of their database platform. You can stretch for example Oracle on premise to PostgreSQL on Amazon RDS. In fact you can stretch tables for any number of tables either on premise or in the cloud and across any number of database platforms. So the figure shown below highlights only the most basic scenario.
Below is the Cirro partition view definition required to stretch my two largest, on premise Oracle tables to PostgreSQL on Amazon RDS. I keep roughly 10% of the hottest data on premise and 90% of the colder data in the cloud.
create partition view stretch_table_1 as
select * from ORACLE.ORCL.stretch.table_1a partition by range on c1 (minvalue, 1000)
select * from POSTGRESQL.stretch.stretch.table_1b partition by range on c1 (1000,maxvalue);
create partition view stretch_table_2 as
select * from ORACLE.ORCL.stretch.table_2a partition by range on c1 (minvalue, 1000)
select * from POSTGRESQL.stretch.stretch.table_2b partition by range on c1 (1000,maxvalue);
Then from the end users’ perspective they simply query the Cirro view and Cirro automagically handles optimizing the query and returning the results. Note in this example below the user has two tables with some portions on premise and in the cloud, and performs SQL join operations without issue. Now that’s what I call truly stretching databases and tables. And only Cirro currently offers this.
select * from stretch_table_1 t1 join stretch_table_2 t2 on (t1.c1 = t2.c1)
where t1.c1 between 500 and 2500 and t2.c2 between 4 and 6;
Imagine that you’re the lead DBA managing a high profile, mission critical production database whose application processes $20 million per hour. The database schema design is complex with approximately 1,000 tables and 1,500 indexes. Moreover the application code and its SQL statements were written by many different developers over the years with varying SQL programming skills. In short it’s your typical DBA job description and the fundamental situation that you find yourself in.
Now imagine that the database application experiences a noticeable slowdown and that key business management is now asking (or possibly even screaming) to fix it and fix it now. So what’s the current recommended performance resolution approach for the DBA to follow?
There’s a common best practice often adhered to, often silently without even realizing that it’s being followed, do not throw hardware at a performance problem. In fact it’s sometimes true that throwing more hardware at a problem can make things worse, but it’s not a given. Now 20 or even 10 years ago this was sound advice because hardware was so very expensive especially as compared to consulting person hours to find and fix the issue. Furthermore the application often could not be down to permit hardware upgrades. So historically DBAs have tried to research and locate the underlying problem and then fix it. However this approach could sometimes yield little to no improvement at a high person hour cost and an even high opportunity cost of business opportunities missed due to a slow system. But it was basically our only choice.
However today things are quite different: hardware is now cheap, amazing new hardware technologies exist, robust hypervisors provide amazing virtualization options and now we have the cloud where as an option where dynamic scale up or out is simply a check box and increased hourly cost. Therefore todays DBAs has many viable options to consider. Remember that the business generally does not demand an academically ideal solution, but an expedient one that minimizes downtime and lost business costs. So in some cases trying options that historically were bad advice such as throwing hardware at the problem or simple dynamic scale up and out might be a reasonable alternative. But DBAs will need to throw away baggage of past best practices which may no longer apply.
Let’s look at a couple simple examples. While these examples may not be the best possible advice, the idea is to demonstrate some contemporary alternatives that might have seemed unreasonable or to be avoided from past experiences.
Do you note a pattern? Do these options make you uncomfortable? Don’t feel bad. DBAs simply need to evolve their performance diagnostic and optimization tool belt to include some new options which have not existed before or were considered historically bad practices. From a business perspective where all they generally care about is fix it fast, these new options might be a reasonable or even best solution.
I've teamed up with two world class Oracle ACE Directors to create a new book - Oracle Database 12c Release 2 Testing Tools and Techniques for Performance and Scalability.
This book will cover every tool and technique DBAs should be aware of for performance and scalability testing of their Oracle 12c databases whether on premise or in the cloud.
Quote: Do not put your faith in what statistics say until you have carefully considered what they do not say. ~William W. Watt.
I started this blog with that quote because I find that database execution plans can sometimes behave very much like statistics – you need to question what they really say. For example many DBA’s and SQL developers live by the rule “the lower the cost of an execution plan the better that plan must be”. That sounds quite reasonable, but is it really true?
Let’s take a very simple ANSI SQL statement, a select that joins nine tables with no restrictions, no aggregate functions, no group by, and no order by. It’s so simple that there’s really no obvious SQL modification that might improve performance.
INNER JOIN BIG_SCHEMA1.TABLE_2 ON (TABLE_1.COL1 = TABLE_2.COL5)
INNER JOIN BIG_SCHEMA1.TABLE_3 ON (TABLE_2.COL1 = TABLE_3.COL5)
INNER JOIN BIG_SCHEMA1.TABLE_4 ON (TABLE_3.COL1 = TABLE_4.COL5)
INNER JOIN BIG_SCHEMA1.TABLE_5 ON (TABLE_4.COL1 = TABLE_5.COL5)
INNER JOIN BIG_SCHEMA1.TABLE_6 ON (TABLE_5.COL1 = TABLE_6.COL5)
INNER JOIN BIG_SCHEMA1.TABLE_7 ON (TABLE_6.COL1 = TABLE_7.COL5)
INNER JOIN BIG_SCHEMA1.TABLE_8 ON (TABLE_7.COL1 = TABLE_8.COL5)
INNER JOIN BIG_SCHEMA1.TABLE_9 ON (TABLE_8.COL1 = TABLE_9.COL5)
If I load this SQL into tuning session within IDERA’s DB Optimizer product and ask it to provide any SQL rewrites it can that return the same rows of data, it finds 13 rewrites in just a few seconds that are good candidates as improvements. By default it displays the execution plan cost. If however I instruct the tool to also run each rewrite, then I also see the elapsed time as well as many other run time statistics such as physical IO, logical IO, etc. as shown here in Figure 1.
Figure 1: IDERA DB Optimizer Finds SQL Rewrites
But let’s examine the results to see if our premise holds up – that the lowest cost is always best. The original SQL had a cost of 27 and ran in .01 seconds. The lowest cost was row #2 at 25, but that SQL rewrite ran .02 seconds, or double the time. Now look at the last two rows with a cost of 46 and 803. Their execution time shows up as the lowest even though these execution plan costs are the highest.
The point is “Do your homework!” Don’t just look at execution plan costs alone. Make sure that what you care about most, such as execution time or physical IO are also low. Don't let the statistics conundrum lead you astray.
Since the dawn of database technology the Achilles heel or key limiting factor has been disk IO. In order to address this issue traditional spinning disk drive vendors historically offered 10K and even 15K RPM models. But more speed was needed. Then came Solid State Disks or SSDs, where depending upon the underlying technology (SLC vs. MLC vs. TLC) the speed increase vs. spinning disk was from between 10X to 100X. However once again more speed was needed. Then hardware vendors offered SSD technology connected directly to the PCI express (PCIe) bus which offers yet another 2X to 4X improvement. Finally we now have the nonvolatile memory express (NVMe) host controller interface specification which from the ground up was designed for SSD technology. This technology offers yet another 2X to 4X speedup. Thus we now have IO technology that’s at least 40X faster than spinning disks. Nonetheless memory is still 100X faster than that – so in-memory database technology offers DBAs the currently fastest possible method for accessing data from within the database. But just how good is it really?
For testing I used a star schema design data warehouse. I kept it small so that I could guarantee that the entire database would fit within the memory I had available. My dimension tables ranged from 5,000 to 180,000 rows. My fact table was 34 million rows – tiny by todays data warehousing norms, but sufficient for testing purposes. The query was a typical multi dimension to fact join with inline views, subqueries, complex filters and group operations. I tested three scenarios:
In addition I performed these three test cases on both Oracle 12c R1 and 12c R2 since the latter includes significant improvements to in-memory. The results are shown below in Figure 1.
Figure 1: Performance Comparison Results
The results are quite astounding on multiple levels. Of course we see the major (6X in this case) benefit of using star transformation execution which relies upon bitmap instead of b-tree indexes. Now look at the in-memory results – we see yet another major reduction of 12X faster than the star transformation and an astounding 69X vs. the traditional design. But there’s still more – look now at Figure 2 comparing in-memory improvements for Oracle 12c R2.
Figure 2: Oracle 12c R2 in-memory Improvement
As you can see Oracle 12c R2 has delivered about 4.5X better performance than 12c R1. That translates into 54X faster than the star transformation and a flabbergasting 309X vs. the traditional design. Oh and by the way the in-memory stores data in columnar format so I also saw a 3.85X compression factor so that what took 1 GB just for the row format not including indexes took just 25 MB of memory. So even on this meager setup with just 4GB of the Oracle SGA allocated for in-memory I could easily handle over a billion row table. So Oracle in-memory is not just fast, it’s highly efficient as well.
There’s no fighting progress. Decades ago database administrators managed and controlled everything – the OS, network, storage and database. Times have changed and DBAs have had to evolve (i.e. accept) well established principles of economics, namely specialization. Thus we have network administrators, storage administrators, virtualization administrators and database administrators. While it’s rarely very comforting to “give-up control”, DBAs have done so – even if begrudgingly. So now we have “the cloud”.
Once more things are evolving and DBAs have to again embrace a new way of doing things. And as with each and every evolutionary step, fighting the inevitable is a losing battle. The planets have aligned, the gods have spoken, the train has left the station, or whatever saying you prefer – we all will be deploying more and more cloud databases going forward. So just go with the flow. Better yet, embrace and “ride the wave”. So unless you’re really close to retirement and basically don’t care – you will need to “have your head in the clouds”.
But just as with every other evolutionary step where DBAs were worried about job security – the cloud does not eliminate the need for database administrators. It merely requires them to focus on other key aspects of managing a database. So while something critical like backup and recovery may be simply a questionnaire during cloud database instantiation, the DBA still has to know what choices to make and why. So in short, DBAs will be required to focus more on what vs. how. Moreover since everything in the cloud has a monthly cost – DBA’s will need to pay strict attention to capacity planning for storage and all other chargeable resources (e.g. CPU) in order to better assist management with controlling cloud costs. And as we all know “money talks”. So the DBA is just as important as ever.
I have written a book about Optimizing Oracle on VMware, plus have published papers and given presentations regarding the same. These basically all espouse essentially the same fundamental belief – that database virtualization is here to stay, and is rapidly moving towards mainstream (if not there already).
But a couple legitimate questions or concerns are always brought up (and they are good questions). I’m going to address the three that I hear most often, because I believe having the answers to just these will knock down 80+% of all the roadblocks to virtualizing your databases.
With these two issues put to rest, hopefully you and your management will more quickly embrace what’s inevitable – virtualization of database servers. Because none of us like swimming upstream.
I thought I’d write a quick blog this time and ask/think about what’s your favorite version of Oracle. Of course the proper answer is probably whatever version is mainstream right now – so maybe 11g R2. But what if you could enter Mr. Peabody’s “Way Back” time machine and once again live in any time you so desired. Then what version of Oracle would that be?
Now don’t laugh but one thought that came to mind was Oracle 6.0.36. It ran on DOS (with extended memory support), offered most of the essential database core features – and offered this cool option called PL/SQL. Just kidding – Oracle 6.0.36 was simply on my list since it was the first version one could run at home for learning purposes that did not require going out and buying a SPARC 20 workstation.
A genuine contender though would have to be Oracle 8.1.7. This version had many of the things most people routinely needed and yet was not over-bloated. What I mean is that it took just a little space to install and a little memory for the SGA and that’s it – so it fit real nice on a notebook, even an average one. So once again the winning hand was that ability to run the full blown Oracle database on minimal resources for both learning and experimentation.
I surely don’t mean any disrespect to either Oracle 11g or 12c – but it’s hard to forget such a memorable version as 8.1.7. Nonetheless these days we really must standardize 11g or 12c in order to have current technological minimums and amazing, must-have new features like multi-tenant (pluggable databases), in-memory column store tables, advanced compression and numerous other remarkable new features.
Finally special thanks to Oracle for offering the Oracle Express database. For many these smaller kin to the full database often offer an excellent minimal footprint, acceptable memory use and in most cases fully automatic installation and basic maintenance.
I’ll be speaking at the annual IOUG Collaborate Conference in Las Vegas April 3rd and 5th. To quote their website:
The Independent Oracle Users Group (IOUG), the Oracle Applications Users Group (OAUG) and Quest International Users Group (Quest) present COLLABORATE 17: Technology and Applications Forum for the Oracle Community. As an educational conference, COLLABORATE 17 helps users of the full family of Oracle business applications and database software gain greater value from their Oracle investments.
Created by and for customers, COLLABORATE 17 offers an expert blend of customer-to-customer interaction and insights from technology visionaries and Oracle strategists. Expand your network of contacts by interacting with Oracle customers, solutions providers, consultants, developers and representatives from Oracle Corporation at COLLABORATE 17.
Here are my presentation topics, dates and times: