
Bert Scalzo's Blog


There’s no fighting progress. Decades ago database administrators managed and controlled everything – the OS, network, storage and database. Times have changed and DBAs have had to evolve (i.e. accept) well established principles of economics, namely specialization. Thus we have network administrators, storage administrators, virtualization administrators and database administrators. While it’s rarely very comforting to “give-up control”, DBAs have done so – even if begrudgingly. So now we have “the cloud”.


Once more things are evolving and DBAs have to again embrace a new way of doing things. And as with each and every evolutionary step, fighting the inevitable is a losing battle. The planets have aligned, the gods have spoken, the train has left the station, or whatever saying you prefer – we all will be deploying more and more cloud databases going forward. So just go with the flow. Better yet, embrace and “ride the wave”. So unless you’re really close to retirement and basically don’t care – you will need to “have your head in the clouds”.


But just as with every other evolutionary step where DBAs were worried about job security – the cloud does not eliminate the need for database administrators. It merely requires them to focus on other key aspects of managing a database. So while something critical like backup and recovery may be reduced to a questionnaire during cloud database instantiation, the DBA still has to know what choices to make and why. In short, DBAs will be required to focus more on what vs. how. Moreover, since everything in the cloud has a monthly cost, DBAs will need to pay strict attention to capacity planning for storage and all other chargeable resources (e.g. CPU) in order to better assist management with controlling cloud costs. And as we all know, “money talks”. So the DBA is just as important as ever.

I have written a book about Optimizing Oracle on VMware, plus have published papers and given presentations regarding the same. These basically all espouse essentially the same fundamental belief – that database virtualization is here to stay, and is rapidly moving towards mainstream (if not there already).


But a couple legitimate questions or concerns are always brought up (and they are good questions). I’m going to address the three that I hear most often, because I believe having the answers to just these will knock down 80+% of all the roadblocks to virtualizing your databases.


  1. Oracle does not support VMware. Not true. Roughly speaking, a little over a year ago Oracle's position was not to support any VM technology. Then, once they had their own VM solution (OVM), they supported that, but not really others. Shortly after that, they opened up and now support non-OVM solutions such as VMware. But there is one very simple and reasonable caveat: Oracle support retains the right to ask the customer during troubleshooting efforts to re-host to non-VM at some point in order to eliminate that possible point of contention. That’s more than fair. Plus, with the Oracle acquisitions of Sun (who already had several VM technology offerings, including VirtualBox and Solaris Containers) and Virtual Iron, Oracle has truly become a powerhouse in the VM arena (pun intended). So it’s in their best interest to support and propagate virtualization of everything – including databases.
  2. VM overhead is too debilitating for databases. Not true. Servers and memory are dirt cheap these days. And while storage costs have come down, the larger size of today's databases often eliminates those savings. So today’s database servers have plenty of bandwidth to host more than one thing. And for those DBAs overly concerned about the roughly 10-15% overhead, CPUs and memory are so cheap that factoring this cost into your cumulative or aggregate server needs is really not that big an issue. Often the direct power savings as well as the indirect power savings (e.g. lower cooling needs) can more than compensate for the minor increase in sizing calculations.
  3. Databases are too special or critical to virtualize. OK – but DBAs freely accepted virtualized storage for databases almost a decade ago. Gone are the days of going into the server room and pointing to your database’s disks. And since IO is the “Achilles’ heel”, or weakest link, in terms of performance for any database, we’ve already virtualized the most important performance aspect. Thus it seems a little odd to debate or argue about virtualizing the remaining aspects, which generally impact database performance much less than the IO. So we’re just taking the next logical step in a progression we started a long time ago.

With these three issues put to rest, hopefully you and your management will more quickly embrace what’s inevitable – virtualization of database servers. Because none of us likes swimming upstream.

I thought I’d write a quick blog this time and ask what your favorite version of Oracle is. Of course the proper answer is probably whatever version is mainstream right now – so maybe 11g R2. But what if you could enter Mr. Peabody’s “Way Back” time machine and once again live in any time you so desired? Then what version of Oracle would that be?


Now don’t laugh but one thought that came to mind was Oracle 6.0.36. It ran on DOS (with extended memory support), offered most of the essential database core features – and offered this cool option called PL/SQL. Just kidding – Oracle 6.0.36 was simply on my list since it was the first version one could run at home for learning purposes that did not require going out and buying a SPARC 20 workstation.


A genuine contender though would have to be Oracle 8.1.7. This version had many of the things most people routinely needed and yet was not over-bloated. What I mean is that it took just a little space to install and a little memory for the SGA – so it fit really nicely on a notebook, even an average one. So once again the winning hand was the ability to run the full-blown Oracle database on minimal resources for both learning and experimentation.


I surely don’t mean any disrespect to either Oracle 11g or 12c – but it’s hard to forget such a memorable version as 8.1.7. Nonetheless, these days we really must standardize on 11g or 12c in order to have current technological minimums and amazing, must-have new features like multi-tenant (pluggable databases), in-memory column store tables, advanced compression and numerous other remarkable capabilities.


Finally, special thanks to Oracle for offering the Oracle Express database. For many, this smaller kin to the full database offers an excellent minimal footprint, acceptable memory use and, in most cases, fully automatic installation and basic maintenance.

I’ll be speaking at the annual IOUG Collaborate Conference in Las Vegas April 3rd and 5th. To quote their website:


The Independent Oracle Users Group (IOUG), the Oracle Applications Users Group (OAUG) and Quest International Users Group (Quest) present COLLABORATE 17: Technology and Applications Forum for the Oracle Community. As an educational conference, COLLABORATE 17 helps users of the full family of Oracle business applications and database software gain greater value from their Oracle investments.


Created by and for customers, COLLABORATE 17 offers an expert blend of customer-to-customer interaction and insights from technology visionaries and Oracle strategists. Expand your network of contacts by interacting with Oracle customers, solutions providers, consultants, developers and representatives from Oracle Corporation at COLLABORATE 17.


Here are my presentation topics, dates and times:


  • Oracle Utilities as Vital as Ever
    • Apr 3, 2017
    • 12:00 PM–12:30 PM
    • Palm D
    • Session ID: 762
    • Why are books like "Advanced Oracle Utilities: The Definitive Reference" imperative for today's DBAs and senior developers? The answer: Oracle has become so robust and complex that many new features are implemented not as SQL commands, but as either PL/SQL packages and/or standalone utilities. Thus to best leverage all that Oracle has to offer, one must know all those non-SQL command interfaces for those new features. This presentation will demonstrate a few critical examples of such features and their usage. The goal will be to inspire attendees to realize that SQL alone is no longer sufficient - that there are many other commands and/or PL/SQL packages to master.


  • Productivity Enhancing SQL*Plus Scripting Techniques
    • Apr 5, 2017
    • 12:00 PM–12:30 PM
    • Palm D
    • Session ID: 759
    • No matter what other Oracle or 3rd party database tools Oracle DBAs and developers might embrace, SQL*Plus remains a mainstay. However, many don't fully realize its true efficiency potential by not mastering advanced techniques such as Dynamic SQL Scripting (i.e. scripts to generate and run scripts). Oracle ACE Bert Scalzo will demonstrate such techniques and provide example scripts to increase any SQL*Plus user's productivity. Furthermore, some useful, advanced Unix/Linux shell scripting techniques will be highlighted as well.


  • Flash Based Methods to Speed Up DW & BI Oracle Databases
    • Apr 5, 2017
    • 11:00 AM–12:00 PM
    • Jasmine D
    • Session ID: 761
    • DW & BI databases are growing ever bigger while analytical users continue demanding fast performance. While Exadata may be a great solution, not everyone can afford such a capital investment. Oracle ACE Bert Scalzo will highlight simple yet effective methods to utilize flash-based storage technology to significantly improve performance with the least cost, effort and interruption. In some cases it's possible to get near Exadata-like performance without as many zeros added to the cost.

I’ll be speaking at the annual Hotsos Symposium in Dallas TX February 27th and 28th. To quote their website:


Hotsos is the best conference in the Americas devoted to Oracle system performance. The combination of topic focus, small audience size, and large speaker list make the Hotsos Conference an unsurpassed networking opportunity.

The Hotsos Symposium reaches for a new technical high with presentations focused on techniques, experimental results, and field-tested scripts that attendees can take home and apply to real-world tasks. The speakers routinely offer an excellent balance between database principles and problem-solving techniques. The speakers also show you how to determine the difference between reliable information and bad advice around Oracle performance optimization.


Here are my presentation topics, dates and times:


  • Optimizing Data Warehouse AdHoc Queries against Star Schemas
    • Monday, February 27, 2017
    • 1:00pm to 2:00pm CST
    • Hall B & Virtual Hall B
    • Attendees will learn optimal techniques for designing, monitoring and tuning "Star Schema" Data Warehouses in Oracle 11g and 12c. While there are numerous books and papers on Data Warehousing with Oracle, they generally provide a 50,000-foot overview focusing on hardware and software architectures -- with some database design. This presentation provides the ground-level, detailed recipe for successfully querying tables whose sizes exceed a billion rows. Issues covered will include table and index designs, partitioning options, statistics and histograms, Oracle initialization parameters and star transformation explain plans. Attendees should be DBAs familiar with "Star Schema" database designs and have experience with Oracle 11g, and some exposure to Oracle 12c.


  • Successful Dimensional Modeling of Very Large Data Warehouses
    • Tuesday, February 28, 2017
    • 1:00pm to 2:00pm CST
    • Hall B & Virtual Hall B
    • Attendees will learn successful techniques for dimensional modeling of very large Data Warehouses using traditional Entity Relationship diagramming tools. While there are numerous new modeling conventions and tools, Entity Relationship modeling tools have proven best at real-world database design and implementation. This presentation provides the ground-level, detailed recipe for optimal dimensional modeling of tables whose sizes exceed 500 million rows. Issues covered will include "Star Schemas", fact and dimension tables, plus aggregate table design and population. Attendees should be DBAs or senior developers familiar with Oracle database design and any form of data modeling.

Often people write to ask why some database tool such as SQL Developer, DBArtisan or Toad seems generally sluggish when working against Oracle 10g, 11g, or 12c databases, or why some screens appear to lock up or take a very long time to refresh. They are often especially frustrated since this problem did not occur with Oracle versions prior to 10g. So what is the database tool doing that’s wrong (i.e. is it a bug)?


The short answer is that this is an Oracle DBA best practices issue - and not a database tool problem per se. The database tool just happens to expose the symptoms since it makes such frequent access to data dictionary views and tables. The underlying problem generally affects all database applications to a varying degree – it’s just exposed most prominently via the database tool.


Oracle versions prior to 10g offered two query optimizers: cost and rule – with rule based being the default and fallback, plus also quite importantly internally used for most access to the data dictionary. While 10g and beyond still offer both optimizers, all access to the data dictionary is via the cost based optimizer, hence the data dictionary needs statistics gathered in order to function efficiently.


The common misconception (and source of the problem) is that people think Oracle 10g does this “auto-magically” as a background task (i.e. GATHER_STATS_JOB). However, the database does not – well, not reliably anyway. As a result, 90+ percent of the time this is the root cause when database tool users experience sluggish performance with an Oracle 10g, 11g, or 12c database. Most times it takes a “heated debate” between the database tool users, their DBAs, and me to convince them to try the quick and very simple fix – which takes just two minutes to perform and then successfully test.


Here’s the Oracle 10g documentation which most times finally gets them to try our fix:

Oracle® Database Performance Tuning Guide
10g Release 2 (10.2)

Chapter: 14 Managing Optimizer Statistics

  1. When to Use Manual Statistics

Automatic statistics gathering should be sufficient for most database objects which are being modified at a moderate speed. However, there are cases where automatic statistics gathering may not be adequate…

Another area in which statistics need to be manually gathered are the system statistics. These statistics are not automatically gathered. See "System Statistics" for more information.

Statistics on fixed objects, such as the dynamic performance tables, need to be manually collected using GATHER_FIXED_OBJECTS_STATS procedure.


So here’s the recommended fix (performed via a DBA privileged account):

  • exec dbms_stats.gather_dictionary_stats;
  • exec dbms_stats.gather_fixed_objects_stats;


These two commands should probably be run about once per week for best results. Of the two commands, the first is far more critical as it handles the “ALL/DBA/USER_*” views and V$ tables – which are accessed constantly from within the database tool.

The second command is for the X$ tables, which are primarily referenced only by some advanced DBA features.
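If you are unsure whether either has ever been run, the optimizer statistics operation history can tell you. Here is a quick check, a sketch that assumes a DBA-privileged account (the view and its columns are standard in 10g and later):

```sql
-- When were dictionary and fixed-object stats last gathered?
SELECT operation, end_time
  FROM dba_optstat_operations
 WHERE operation IN ('gather_dictionary_stats',
                     'gather_fixed_objects_stats')
 ORDER BY end_time DESC;
```

No rows returned for either operation is a strong hint that the fix above has never been applied.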


Please consider having your DBA run these calls either as a scheduled PL/SQL job or via an “on instance start-up” database-level trigger.
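A minimal sketch of the start-up trigger approach follows (run as a DBA-privileged account; the trigger name is illustrative):

```sql
-- Refresh dictionary and fixed-object stats every time the instance starts
CREATE OR REPLACE TRIGGER dict_stats_on_startup
AFTER STARTUP ON DATABASE
BEGIN
  DBMS_STATS.GATHER_DICTIONARY_STATS;
  DBMS_STATS.GATHER_FIXED_OBJECTS_STATS;
END;
/
```

If instance restarts are rare in your shop, the scheduled-job route keeps the stats fresher; the trigger route simply guarantees they are never missing entirely.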


Now to give you an example of how much difference this process can make – we've clocked screens in the database tool that run quickly on pre-10g, but take very long on a default install of 10g (sometimes as long as 45 minutes). Once the proper data dictionary statistics are collected, the database tool performance equals or exceeds the pre-10g timings.

For far too long, database administrators and developers have focused on explain plans for tuning SQL statements that perform poorly. But with Oracle 11g and 12c, explain plans can quite possibly become a thing of the past. Furthermore, 12c can perform the entire tuning process automatically – with minimal time spent during the initial SQL development and maybe 1/10th the effort spent addressing any problematic SQL that somehow does find its way into production.


Does this all sound too good to be true? Then read on …


Let’s start by defining what an explain plan is – at least conceptually. The explain plan is the “approximate algorithm” that the Oracle database engine will attempt to “execute” in order to perform the requested database operation. Since SQL is a non-procedural, set-oriented language, the explain plan is the only way to “guesstimate” what the SQL code will actually do, because the code itself only expresses what is to be done – and not how. The explain plan exposes both the algorithm and indexes (if any) that the database engine would utilize during actual execution.
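For reference, the typical way to inspect an explain plan from SQL*Plus is via the DBMS_XPLAN package (the table and predicate below are illustrative):

```sql
-- Ask Oracle for its intended algorithm, without executing the query
EXPLAIN PLAN FOR
  SELECT ename FROM emp WHERE deptno = 10;

-- Display the plan just generated
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

The output shows the join and access methods (full scan, index range scan, etc.) plus any indexes chosen – the very details the rest of this discussion is about.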


Over the years (and major database versions), Oracle has tried to make SQL optimization more stable and reliable than simple explain plans. An excerpt from the book Oracle 11g New Features delineates this progression nicely – which has been as follows:


  • Explain plans for Oracle versions <= 7.X
  • Plan stability for Oracle versions 8i and 9i
  • SQL Profiles for Oracle version 10g
  • SQL Plan Management for Oracle version 11g and 12c
To see why explain plans fall short, consider a vacation analogy: planning a road trip by visiting the local AAA office for a TripTik routing. The cons of that approach include:


  • I have to wait in line for my turn to be helped
  • The office may not have current information
  • The person helping me might be misinformed
  • I could say scenic route whereas the family simply wants to get there quickly
  • Things can change between getting the TripTik and starting/ending my trip
  • I must possess both an automobile and a valid driver’s license to make the trip
  • I’m purposefully eliminating far better methods or mode choices (e.g. flying)


Believe it or not, the exact same cons can often exist for explain plans – as follows:


  • During execution phase, the explain plan must be derived
  • Step #1 requires some resources, so there could be a delay
  • The statistics and/or histograms could be out of date (i.e. stale)
  • The SQL may be so malformed that a quickly derived explain plan is suboptimal
  • SQL code might contain hints that are suboptimal and yet take precedence
  • The fundamental performance conditions can change (e.g. new indexes, removal of indexes, partitioning of object, partitioning of indexes, “init.ora” changes that might affect SQL execution, addition and/or subtraction of server resources, etc.)
  • Users must know how to obtain and read explain plans, and how to make SQL rewrites and/or database structure changes in order to improve the performance
Fortunately, Oracle 11g’s Automatic SQL Plan Management can often yield great results for far less (if not minimal) effort. So let’s examine how the newer versions of Oracle have addressed these concerns.


Oracle 10g adds the new SQL Tuning Advisor – a utility to “Analyze individual SQL statements, and recommend SQL profiles, statistics, indexes, and restructured SQL to improve overall SQL execution performance.” Using my vacation analogy, the SQL Tuning Advisor is more like going to the American Airlines web site and booking airfare for the same trip. I’ll let the airline, pilot, navigator, tower staff, and air traffic controllers worry about all the details – I just have to say get me from point A to B for whatever price and travel time I am willing to bear. Likewise, we can now manually inform Oracle to worry about the algorithmic details churning inside – then we’ll just let the database know how much effort (i.e. time) to expend in order to meet our general SQL performance needs.


Thus Oracle 10g requires the DBA to remain engaged to generate, review and implement the recommendations, as shown below in Figure 1 (which is a slide from the Oracle 11g new features training class). Returning to the vacation analogy via an airline, I still have to manually input the origin, destination, dates, seat class and number of stops that I’m willing to endure. I can then choose from among the flight recommendations, but until I press the button for “purchase the ticket” – nothing has actually happened yet. It’s all just “what-if” type analysis whose impact won’t be felt unless formally chosen.



Figure 1: Manual SQL Tuning in Oracle 10g


So for 10g, we’d look at OEM’s Advisor Central – specifically the SQL Tuning Advisor – looking for any statements with excessive database time or IO. The AWR repository will automatically contain all the SQL tuning candidates, broken down by their relative percentage contribution to the cumulative total, as shown below in Figure 2.



Figure 2: Top SQL by IO Tuning Candidates


The SQL Tuning Advisor will offer suggestions to improve performance – as well as an estimate of the percentage improvement, as shown below in Figure 3. Note too that it suggests more than just alternate explain plans (which is essentially just an internal SQL rewrite), but it also offers other critical advice – including database structural changes.



Figure 3: Tuning Advisor Recommendations


Oracle 11g adds SQL Plan Management, which can essentially automate the 10g scenario in whole, as shown below in Figure 4 (which is another slide from the Oracle 11g new features training class). Returning once again to my vacation analogy, this would be like setting some American Airlines web site booking preferences – and automatically having the airline purchase those tickets that best meet all my needs (and all without any involvement from me). Yes – I’m giving up some control, but in the real world we call such scenarios excellent customer service. So why should the database and/or DBA offer anything less? And for those DBAs who don’t want to relinquish such control, or who worry about the relative overhead involved, the last step can be either manual or automatic.



Figure 4: Automatic SQL Tuning in Oracle 11g and 12c


Of course in Oracle 12c this only gets better. Instead of just preserving repeatedly executed SQL statements as candidates, it now records and tests all SQL statements. I for one cannot wait to see what 12c R2 offers to improve this.
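For those wanting to experiment with SQL Plan Management, capture can be switched on and the results inspected with standard commands (a sketch using a DBA-privileged account; the parameter and view are standard, the session scope is a choice):

```sql
-- Automatically capture plan baselines for repeatedly executed SQL
ALTER SYSTEM SET optimizer_capture_sql_plan_baselines = TRUE;

-- Review the captured baselines and their status
SELECT sql_handle, plan_name, enabled, accepted
  FROM dba_sql_plan_baselines;
```

Only plans marked ACCEPTED are eligible for use, which is precisely the stability guarantee that plain explain plans never offered.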

You’ll have to forgive me for spoofing the famous movie line “We don’t need no stinking badges” (Treasure of the Sierra Madre, 1948 and Blazing Saddles, 1974), it just seemed quite apropos.


I was discussing data modeling needs recently with a potential customer – and like many shops I visit, they saw no need for data modeling. They felt that their DBAs, data administrators and application architects knew everything that was home grown, and that the rest was third-party applications over which they had no control and thus no need for a data model. That’s a very common belief these days, it seems.


So my reply was to try to enlighten them about why data modeling makes sense no matter what – and especially for the scenario they described. I thought it might be worthwhile to share those points with the general public in the hopes of convincing a few more people.


Five common misconceptions that data modeling is unnecessary:


1. Our DBAs, data administrators and application architects have our databases under control.


That’s good news – you’re doing better than many shops. But what would happen if those guys and gals happened to play the Mega Millions lottery in a pool and win $360 million? What’s going to happen when they come in Monday and quit? That’s one heck of a lot of knowledge, wisdom and experience to try to backfill – and quickly. Wouldn’t having a data model make that process a little easier? OK – so the likelihood of that happening is low. What if they all get new jobs or retire around the same time? These things happen – so like the Boy Scouts say, “Be Prepared”.


2. We only use 3rd party applications like SAP or PeopleSoft, so to us the database is a black box.


That’s perfectly true – often you cannot do anything about the design of such products. But knowing the design has several advantages. Ever been on a tech support call and wished you had a little more knowledge, either to communicate your issue better or to firmly dismiss a cause they claim is the problem when you know it is not? How about when you get a new version of the 3rd party application – wouldn’t knowing about some of the structural changes be useful in case the upgrade of the data runs into issues? And what if the business makes a requirements change, like having all addresses fully qualified and verified against a validity database? Wouldn’t having some idea of all the areas impacted be helpful? I may not be able to effect changes to the US highway system, but it sure helps to have a map for long drives outside my comfort zone.


3. We only use home grown applications, so we already know everything about all the databases.


Actually, this has already been addressed by the prior two points (just exchange home grown for 3rd party application). All the same items hold true: you need to protect your company from unforeseen knowledge loss, and it often helps to have a “road map” for impact analysis no matter who built the application. In many cases the justification for data models is even higher when the applications are developed in house. Because if you built the highways, you had better be able to navigate them – and a picture almost always helps.


4. We contract out application development, so no need to worry about such technical issues.


OK – so when you buy a new home, you’re comfortable with just telling the builder to make something about 2600 square feet? That you don’t need to see a floor plan that meets your requirements, and that the contractors building it don’t need any blueprints to work from? We all know that’s not true – especially because a home is such a large investment of the most valuable kind of money – our own. Well, application development is no less specialized and critical, and it consumes lots of precious corporate budget. What’s more, 80% or more of the IT project failures I’ve seen could have been averted, or at least lessened, by having a data model up front. I’ve even been an expert witness in court cases on this issue. The data model is nothing more than a collection of the business requirements. If an application fails and there was no such document defining the business requirements – then whose fault is it? The best companies I’ve visited not only do data models for their own projects (regardless of who builds them), they even require that 3rd party vendors provide data models as part of the RFP process.


5. Data modeling is quite outdated and has been replaced by newer methodologies such as UML.


That could well be true. If your application developers have extensive UML models that fully cover the underlying “persistent” data structures, you may not need to do data models as well. But application developers tend to be programmers, and programmers tend to think (and thus model) in terms of data-in-motion needs – which is what their programs embody. So having just UML models may in fact provide a false sense of security that you’re fully covered. If the UML models do not cover persistent data storage, then you may well need a data model to complement your UML models. It won’t hurt to double check to be safe.

Oracle’s Parallel Query Option (PQO) is a fantastic tool. But much like any good tool, it can very easily be used in the wrong situation or simply abused. In those cases, PQO can actually make database performance much worse. Let’s examine some common misuses or misperceptions regarding optimal and efficient PQO usage.


  1. PQO makes sense universally on any SMP or multi-processor database server.


Right – and I have some wonderful “magic beans” to sell to people who blindly believe that theory. Oracle has to spawn additional shared server processes (or reuse those still lying around from recently completed PQO operations). So the cost of spawning additional processes and their footprints must be accounted for in the grand scheme of things. Plus, Oracle will need a “coordinator” process to coalesce all the results – and that too has a small cost. So calculate all this into your assumptions before using PQO. But for those of us less scientific, think of PQO as being like a grocery store checkout line. If you only have a couple items in your grocery cart, will forcibly spreading them across multiple checkout lanes get you out the door any sooner? And even if you do have an overly full cart, will spreading your items across checkout lanes that are already busy be any better? There are obvious times (i.e. rules) that are better than others. We simply need to apply the same clear thinking when using PQO.


  2. When I say parallel N, Oracle spawns exactly and only N server processes.


OK – I’m 100% guilty of doing this on occasion. In fact, I might even do this more often than not. But I am cognizant of the above issues and have planned accordingly – and still have a reasonable PQO scenario where IO is definitely not my chief bottleneck. Thus in those clear cases I will extend beyond the actual CPU count in my PQO settings. I’ve even gone two to four times higher in the right circumstances. And in those cases it has made a clear and measurable difference. But be careful of when it does not make sense. For an easy test to verify this, simply set PQO to degree four on your dual-core desktop or laptop and run several concurrent queries. It will not only run slower, but painfully so.


  3. I have a big multi-processor server, so full-table scans are always the way to go.


As silly as this may sound at first, I’ve seen this in many data warehousing scenarios. I think it happens because somewhere someone wrote that “parallel full-table scans are efficient”. But that author meant in clearly reasonable scenarios – not universally. Yet this remains one of the top data warehousing tuning mistakes I see. Think about a DW query – it may scan millions of rows in tables with many billions of rows. Why force a full-table scan when there is at least a ten-to-one, if not a one-hundred-to-one, reduction factor? That’s exactly what indexes are for – and index scans can be done in parallel. I’ve seen improvements as large as 1000% from undoing this simple PQO misuse.
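As a sketch of the alternative being argued for, a parallel index-driven plan can be requested via standard optimizer hints (the table, index name and degree below are illustrative, not from any real schema):

```sql
-- Prefer a parallel index range scan over a parallel full-table scan
SELECT /*+ index(f sales_date_ix) parallel_index(f, sales_date_ix, 8) */
       SUM(f.amount)
  FROM sales f
 WHERE f.sale_date BETWEEN DATE '2016-01-01' AND DATE '2016-03-31';
```

Always verify with the explain plan that the hint was honored – a hint is a request, not a command.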

Bert Scalzo

Oracle Dynamic Scripting

Posted by Bert Scalzo Jan 15, 2017

Ever wanted to perform some operation against all the objects in a schema? For example, a one-time management task to turn off parallel for all tables and indexes. Another example might be an ongoing task to disable and later enable triggers and referential integrity to speed up a recurring data load job. These types of tasks are not uncommon; however, many people don’t know how to leverage dynamic SQL scripting to make such jobs trivially easy. So I’ll show dynamic SQL scripts for these two specific examples – which will provide you both a working knowledge and a template for your own use.


Example #1: Turn off parallel for all tables and indexes for a named schema


-- @NOPAR.SQL  schema_name
-- Turns off parallel
-- for all tables and indexes
-- for the named schema

set echo off
set verify off
set feedback off

set linesize 256
set pagesize 0

set term off
spool %TEMP%\nopar.tmp

select 'alter table '||owner||'.'||table_name||' noparallel;'
  from all_tables
  where owner = upper('&1')
  order by table_name;

select 'alter index '||owner||'.'||index_name||' noparallel;'
  from all_indexes
  where owner = upper('&1')
  order by index_name;

spool off
set term on




A key aspect to understand is that as this SQL script runs we want all of its output to be the new SQL script we’re creating to run. So we don’t want any extraneous output such as echoing of the commands being run, verification of parameter values (i.e. old vs. new), or feedback about the row counts processed. Hence the first three SET commands address this. We also don’t know how long an output text line will be, so we need to set the line size accordingly. Finally we don’t want select headers or page breaks, and the easiest way to eliminate both is to set the page size to zero. So now our generated script output is nothing more than the SQL commands themselves.


Note that I’m spooling the output to my Windows temporary directory as defined by the “%TEMP%” environment variable. I also don’t want to see the select command output whiz by – so I set terminal output to off. Then all I do is run some common select commands against Oracle data dictionary views such as ALL_TABLES – but notice that I’m actually using text literals and concatenation operators with table columns to construct SQL commands rather than just typical select output. Once such output has completed I simply set the terminal output back on and execute (i.e. @) the SQL script this script just created. That’s all there is to it.
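Putting it together, a SQL*Plus session using this script might look like the following (SCOTT is an illustrative schema name):

```sql
-- Generate the ALTER statements for the SCOTT schema ...
@nopar.sql SCOTT
-- ... then execute the script just generated
@%TEMP%\nopar.tmp
```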


Example #2: Disable and enable triggers and referential integrity (RI) for a named schema


-- @SETALL.SQL  schema_name  OFF|ON
-- Turns off (or back on) triggers and referential integrity (RI)
-- for all tables
-- for the named schema

set echo off
set verify off
set feedback off

set linesize 256
set pagesize 0

set term off
spool %TEMP%\setall.tmp

select 'alter table '||owner||'.'||table_name||decode(upper('&2'),'OFF',' disable constraint ',' enable constraint ')||constraint_name||';'
  from all_constraints
  where owner = upper('&1')
  and constraint_type = 'R'
  and status = decode(upper('&2'),'OFF','ENABLED','DISABLED');

select 'alter trigger '||owner||'.'||trigger_name||decode(upper('&2'),'OFF',' disable',' enable')||';'
  from all_triggers
  where table_owner = upper('&1')
  and status = decode(upper('&2'),'OFF','ENABLED','DISABLED');

spool off
set term on




This script is just a little more complicated, but it uses essentially the same technique. In fact all the SET commands are the same. The SPOOL command only differs in the name of the temporary file. Then of course the select commands that build the SQL are different. Note that I added a second input parameter that allows me to use this one script to both turn off and turn back on the items of interest via the DECODE function. So I only need this one script for both tasks – turning off vs. on.
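A recurring data load job might then be wrapped something like this (SCOTT and data_load.sql are illustrative names):

```sql
-- Disable triggers and RI, run the load, then re-enable
@setall.sql SCOTT OFF
@%TEMP%\setall.tmp
@data_load.sql
@setall.sql SCOTT ON
@%TEMP%\setall.tmp
```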


There’s literally no limit to how creative you can be with such dynamic SQL scripts. For example, a long time ago, before Oracle provided the metadata package to regurgitate the DDL needed to create an object, I had a complex dynamic SQL script to reverse engineer all of my database objects. The key takeaway here is that this technique should be on any DBA’s or senior database developer’s skills list.

It’s not uncommon to discover databases that have simply evolved over time rather than having been built according to some overall design or “master plan”. And there seems to be no shortage of answers as to why this occurs. Sometimes DBA’s are just far too busy with a myriad of other tasks. Sometimes developers are under the gun for deliverables so quickly that there’s just no time. Or my favorite: the project started as just a couple of new tables and then ballooned into a monster. Whatever the reason, the situation exists far too often in the real world of information systems.


Let’s put this into perspective. Would you purchase a brand new house where the builder simply showed you a high-level “floor plan” and then said “trust us – we’ll just hammer together the lumber and stuff until we’re done, don’t worry”? I’m not too sure very many of us would so nonchalantly encumber ourselves with a 30-year loan on such vague promises. We automatically expect the home builder to use that rough diagram as the precursor to an architectural blueprint that meets local codes, standards and ordinances. We expect the city zoning commission or code enforcement division to sign off on key items during the construction. We expect to spend the week before taking ownership inspecting the home and marking things to be fixed before the closing. In short, when it’s our money and/or financial future – we expect things to simply work right.


The same is true in many other disciplines that we utilize every day. We get on elevators knowing that they were well designed and recently inspected. We readily fly on airplanes knowing that the FAA and other government agencies are making sure it’s safe and well regulated. We undergo surgery at hospitals knowing that doctors and administrators have jumped through major hoops to guarantee a reasonable level of safety. Yet we still build computerized systems with all the skill of a MASH surgeon – just get it stable enough to pass onto the next guy. No wonder then that both database administrators and application developers are quickly catching up to lawyers and politicians in terms of likeability.


So let’s start this week by just looking at the two most critical database success factors: effectiveness (doing the right thing) and efficiency (doing it expediently).


How can anything be effective if it’s just cobbled together haphazardly over time? I’m not saying that everyone must do a logical data model and transform that into a physical data model. That may not be most people’s “cup of tea”. But slowing down a wee bit to think through the overall physical design should yield a more consistent business solution. That should translate into higher effectiveness.


The same can be said for efficiency. Not simply adding tables and columns seemingly at random on an as-needed basis can easily result in a more fluid design. It’s also much easier to code against such a design, which can mean lower development costs (i.e. it’s less complicated to work with). Furthermore, this generally translates into lower ongoing maintenance costs as well – and often far fewer unplanned crises too.


Find all this “pie in the sky” thinking to be too esoteric or too academic? Well, I’ve been an expert witness in several trials where companies have sued consulting firms that built substandard applications and databases. In every single case, the plaintiff won substantial judgments based solely on poor database design. So far my testimony has directly resulted in seven-figure financial awards. That’s millions of dollars per case that could easily have been saved by slowing down and doing more planning or design. So as the old song goes, “Look before you leap, still waters run deep” …

There are those who think that data modeling has become very passé these days. The belief is that because data modeling theory is over thirty years old, and because some data modeling tools have been around 10-20 years, data modeling is somehow no longer relevant. Nothing could be further from the truth. In fact data modeling may now be more necessary than ever before.


While there are other modeling techniques and notations, such as BPM (Business Process Modeling) and UML (Unified Modeling Language), the need for accurately capturing the business data requirements and transforming that into a reliable database structural design is as paramount as ever. The key differentiator is that data modeling is the only technique and notation that focuses on the “data at rest”. All the others tend to focus more on “data in motion”. Another way to say this is that data modeling concentrates on issues that lead to a solid database design, while all the others tend to focus more on issues that will result in better application design or the things useful to programmers – such as data structures, objects, classes, methods, application code generation, etc.


I’ve personally been involved in several trials as an expert witness where plaintiffs were suing defendants for serious financial remuneration because custom database applications had performance and/or data accuracy problems. In every single case there was a failure to data model the business requirements. Thus data effectiveness suffered. Moreover, ad-hoc database design, or database design using more programmatically oriented techniques and tools, often resulted in an inefficient database design. And no amount of coding trickery could overcome the resulting bad database design. So in every case, the plaintiff won.


The other reason data modeling has seen a measurable resurgence is the data warehousing phenomenon. With cheap storage these days, most companies can afford to, and benefit from, retaining historical aggregate and/or summary data for making significant strategic decisions. With the accumulation of numerous source legacy OLTP (Online Transaction Processing) systems, there are two key ways to approach populating a data warehouse: directly from source to warehouse (Figure 1) or through an intermediary database often referred to as an ODS or Operational Data Store (Figure 2).





There is already sufficient debate as to which approach is superior, and I won’t address that here. However, regardless of which approach one decides to utilize, the database design (i.e. the data at rest) is paramount – because in a data warehouse the “data itself”, and the business information it contains, is the most relevant and valuable asset. Typical data warehouse queries and reports issued via BI (Business Intelligence) tools process that asset to yield key strategic decision-making results.


The other key area where data modeling often supports the whole data warehousing and business intelligence effort is through the mapping of legacy data fields to their DW and BI counterparts. This meta-data mapping of how front-line business data maps to the data warehouse helps with the design of both the queries and/or reports, as well as with the ETL (Extract, Transform and Load) programming efforts. Without such a mapping, as the OLTP legacy systems evolve there would be no automatic tie to the dependent data warehousing information. Hence one would have to almost totally re-engineer, rather than simply follow the OLTP source data ramifications and ripples downstream to the DW and BI end points.


For those not involved with data warehousing projects, perhaps doing just the more traditional OLTP type of systems development, data modeling is still no less important. But often people get so caught up in novel paradigms such as “extreme programming”, “agile software development” or “scrum” that they compromise or even entirely skip data modeling. The problem is that these new approaches don’t always spell out exactly how data modeling should be incorporated, so people often skip doing it. My belief is that no matter what latest-and-greatest approach one utilizes, data modeling should be integrated into your development process wherever it makes sense. Figure 3 shows how both conceptual and physical data modeling should fit into an overall database design process – whether it’s for a totally new system, or one that’s simply being updated or even totally re-engineered.




There’s one final reason why data modeling has been getting more attention these days. In many cases organizations are finally requiring data models as a “sign-off” deliverable as part of the development process. I attribute this to organizations attempting to adhere to the Software Engineering Institute’s (SEI’s) Capability Maturity Model (CMM) and Capability Maturity Model Integration (CMMI) concepts. The idea here is quite simple: in order to mature one’s development process, regardless of technique, you need to mature both the processes and the tools used to achieve the desired better end result. For example, the proper and effective use of project management tools is often quoted as the single best way to escalate from Level 1 (Chaos) to Level 2 (Repeatable). The central idea that both processes and tools can lead to maturity has helped to reinvigorate many people’s interest in data modeling.


Now comes the hard part: what data modeling tool should one use? That might seem like a tough or loaded question. There are numerous data modeling tools out there, plus there are many enterprise modeling suites that contain data modeling capabilities. Rather than advocate any particular tool, I’m going to outline some basic guidelines of things to avoid. My belief is that any tool that meets some standard and minimal requirements will enable one to produce effective and efficient data models – and hence the resulting databases.


  1. Avoid drawing tools that aspire to be data modeling tools. A good data modeling tool supports defining tons of meta-data that has business relevance. Think of the diagram as just the tip of the iceberg – 90% of the berg’s mass is what you don’t see, because it’s underwater. The same is true for data modeling. If you concentrate on just the picture, you’ll probably compromise the effectiveness of the resulting database.
  2. Choose a tool that fits your needs. Often people will purchase a killer modeling tool that offers everything imaginable. But if all you need or will use is the data modeling portion, then why pay for more? The key concern here is that the more any tool does besides data modeling, the greater the chance that its data modeling capabilities were compromised in order to do everything else. Sometimes more is not better.
  3. Verify that the data modeling tool you choose generates solid DDL, especially for your target database platform. This is another case where more might not be better. It is better to have a tool that generates 100% accurate CREATE or ALTER scripts for the few databases important to you than one that does them all at a lesser level. But be very careful: many tools, even those that focus on a few databases, can often generate less than optimal DDL. You have to know what to look for – so engage your DBA in making the decision just to be safe.
  4. Verify that your data modeling tool provides robust model consistency and accuracy checking reports and/or utilities. As data models grow (and they will), it can be quite overwhelming to have to manually check everything. And you cannot expect the poor DBA to sanity check the thousands or tens of thousands of DDL lines a data modeling tool can quickly generate. Remember, above we talked about both effectiveness and efficiency. Effectiveness is mostly on your shoulders, but efficiency can be aided by good data modeling checking utilities.


In closing, data modeling has come a long way since its inception. And even though the heydays of CASE and software engineering have passed with the 90’s – the need for and usefulness of data models has not subsided. And data modeling can assist with any effort regardless of development methodology or paradigm. So don’t pass up on data modeling these days just because it’s a mature technique. You might well be very sorry if you do.


Data Warehousing DBA’s usually work with huge tables, often possessing hundreds of millions or even billions of rows. And occasionally, these DBA’s might need to perform maintenance that requires either recreating the entire huge table or some of its partitions. For example, the DBA might need to drop an unnecessary column or simply desire to de-fragment that table or some of its partitions (i.e. reduce the number of extents allocated).


In either case, data warehousing tables, and even just their partitions, are often simply too big for unloading and reloading. For example, exporting and then importing a 700 million row table may require 2 to 3 days run-time. Whereas a “CREATE TABLE AS SELECT” (a.k.a. CTAS) would take just a few hours -- assuming that sufficient space is available.

A run-time efficient version of a CTAS command to copy a huge table is as follows:
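In outline it looks something like this (a sketch – the table names and the degree of parallelism are illustrative):

```sql
-- NOLOGGING skips most redo generation, while PARALLEL engages
-- multiple slave processes for both the read and the write
CREATE TABLE new_table NOLOGGING PARALLEL (DEGREE 8) AS
  SELECT /*+ FULL(t) PARALLEL(t, 8) */ *
    FROM old_table t;
```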










Thus, CTAS becomes the data warehousing DBA’s greatest ally. It permits the DBA to easily and quickly copy massive amounts of data. However, CTAS can also be a DBA’s worst enemy if they’re not careful. While the CTAS example above is run-time efficient for the copy operation itself, what about its effect on queries against the copied table?


Our test environment was a 700 million-row table on a 64-bit, 16-processor UNIX box with an EMC RAID-1 disk array. The CTAS command copied the table in less than one hour. However, queries against the copied table ran nearly four times as long as those same queries issued against the original table. What happened?


The problem is that the CTAS has negated the original table’s proximity of data. More simply stated, the CTAS has more uniformly spread related rows of data across many more blocks and files. This can be a very elusive problem to find. Look again at the CTAS example. Unfortunately, it looks harmless enough. It’s tough to see that it’s the source of the problem.


Most data-warehousing tables have their data loaded at regular intervals (e.g. nightly). As a result, related rows of data become naturally located in close proximity, often within the same or closely related data blocks inside a relatively few data files. So queries retrieving those densely packed rows perform a relatively minimum amount of physical IO.


The query to test a table’s data proximity – i.e. the average number of related rows per data block – groups by the data block number embedded in each ROWID, as follows:


SELECT AVG(COUNT(*))
  FROM &table_name
  WHERE &where_condition_for_returning_related_rows
  GROUP BY dbms_rowid.rowid_block_number(rowid);

The average row density for our original table was 90. In other words, each data block held approximately 90 related rows. However the density for our new table was only 24, meaning that each block held only 24 related rows. Hence, all the related data rows are spread across nearly four times as many data blocks. Since Oracle does its physical IO at the block level, we have nearly four times as much physical IO occurring. That’s why our queries were running four times as long.


The corrected CTAS locates the related rows together and is essentially as follows – the GROUP BY forces the related rows to be written together (table names shown as substitution variables):


CREATE TABLE &new_table_name NOLOGGING PARALLEL AS
  SELECT &all_the_column_names_in_the_table
    FROM &old_table_name
    GROUP BY &all_the_column_names_in_the_table;


The average row density for this newest table was 104 – even better than the original table’s density of 90. Note that the CTAS uses the GROUP BY clause because the ORDER BY clause is illegal in a CTAS. Although the table copy run-time increased to nearly four hours (acceptable as a one-time cost to copy the data versus 2-3 days for an export/import), queries against this newest table ran within 5-10 minutes.


As you can see, data warehousing DBA’s have some very challenging issues to deal with. And often the problems and their solutions are not readily apparent. But that’s the real fun in working with terabytes of data …

Oracle technologists, database architects, database administrators, and senior database application developers are all very curious people by nature – even more curious than most. So naturally database-oriented issues evoke debates similar to “tastes great, less filling” from the famous beer commercial. You might be surprised by the scope and passion of such database debates. One area getting lots of discussion of late is database virtualization – where many seem opposed just for the sake of opposition. I‘ve written a few articles on why fighting against virtualizing databases is a losing proposition.



I’ve also written several articles on how to approach testing database virtualization on notebooks and preparing for actual deployments.



Yet the skeptics remain. So be it. Hence I’m going to focus this article on comparing some common virtualization solutions Oracle technologists can use to test drive virtualized databases on their workstations – namely VMware Workstation ($189) vs. Oracle Virtual Box (free for personal use, $50 for commercial use). For testing these tools are great; for real deployments you would of course choose their industrial-strength offerings: VMware vSphere and Oracle VM, respectively. But it is my hope that comparing the workstation versions, and doing so on Windows vs. Linux as the host, will yield some useful information for people curious about virtualization and databases.

For testing I’ll be using Quest’s Benchmark Factory version 6.7 (which I’ll be referring to as BMF for short) to run the industry standard TPC-C benchmark. The databases will be fairly stock installs of Oracle 11g R2 with a few initialization parameter modifications, but otherwise pretty much the default or “out of the box”. The Linux and Windows hosts and VM’s too are pretty much stock, with just a file system modification to improve database IO performance:


  • For Windows:
    •   HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\FileSystem\NtfsDisableLastAccessUpdate = 1
  • For Linux:
    • Edit /etc/fstab file and add the “NOATIME” attribute
    • Example: /dev/sda2        / ext3 defaults,noatime    1 1


The following hardware (two identical boxes) and software was used for all reported testing:


  • Windows Host
    • AMD Phenom II X4 955, 3.2 GHz CPU, Quad Core
    • 16 GB RAM DDR-3 RAM
    • RAID-0 (striped), Dual 7200 RPM SATA-II Disks
    • Windows Server 2003 Enterprise R2 64-bit with SP2
    • VMware Workstation for Windows 8.02 64-bit
    • Oracle Virtual Box 4.12 for Windows 64-bit
  • Linux Host
    • AMD Phenom II X4 955, 3.2 GHz CPU, Quad Core
    • 16 GB RAM DDR-3 RAM
    • RAID-0 (striped), Dual 7200 RPM SATA-II Disks
    • Ubuntu Desktop Linux 10.04 (lucid) 64-bit
    • VMware Workstation for Linux 8.02 64-bit
    • Oracle Virtual Box 4.12 for Linux 64-bit


The following virtual machines were tested on both hosts and under each virtualization platform:


  • Windows VM
    • 2 CPU with 2 threads per CPU = 4 CPU’s
    • 8 GB RAM
    • Windows Server 2003 Enterprise R2 64-bit with SP2
    • Oracle 11g R2 64-bit
  • Linux VM
    • 2 CPU with 2 threads per CPU = 4 CPU’s
    • 8 GB RAM
    • Redhat Enterprise Linux 5.4 64-bit
    • Oracle 11g R2 64-bit


Due to technical difficulties – not being able to open the Windows VM’s .VMDK file under Virtual Box because of Windows issues with storage driver property settings (i.e. SCSI vs. IDE), and not being able to configure Linux to run two virtual machine platforms that both require hardware virtualization support at the same time – the matrix of test scenarios was reduced to that shown in the following table:




[Table: test scenario matrix – rows for Host = Windows and Host = Linux, columns for Virtualization Platform (VMware Workstation and Virtual Box), indicating which host/platform/guest combinations could actually be run]
Figure 1 below displays the BMF TPC-C benchmark results for transactions per second, often simply abbreviated as TPS. Unfortunately, although commonly favored, TPS is a generally misunderstood and largely misinterpreted metric.



Figure 1: Transactions / Second


Figure 1 clearly demonstrates two key findings. First, the various combinations of host operating system, virtualization platform, and VM operating system do not make a substantial difference. In other words, the transactions per second results are essentially the same across all the various combinations – meaning that no one solution is better or worse than the others. The second and more important result is that the TPS rate rises with the concurrent user load. So at 120 concurrent users the score of 6.3 TPS is triple the 2.1 TPS at 40 concurrent users. Of course that’s the expected result – i.e. the more concurrent users, the higher the TPS rate, until some threshold is reached that breaks the trend. Since the Figure 1 results have not yet reached that threshold, we could deduce that we can keep growing the concurrent user load until the inflection point is finally reached. That, however, is where we’d make the common and devastating mistake – and why TPS by itself is so dangerous.


Figure 2 below displays the BMF TPC-C benchmark results for average response time (in milliseconds) – the one metric that all business end-users know quite well and often judge acceptable database application performance by, often even stating service level agreements or SLA’s in terms of average response time maximums. Unfortunately, although it is the best understood and simplest metric to rate, average response time is quite often overlooked by technical people doing benchmarks in favor of TPS and corresponding calculable physical IO rates such as IOPS. In fact these same technical people will often lower or remove the benchmark’s “keying & thinking” time in order to inflate those numbers. In essence they are more interested in the engine’s tachometer (i.e. RPM’s) than the speedometer (i.e. MPH).



Figure 2: Average Response Time (in milliseconds)


Figure 2 also clearly demonstrates two key findings – so key, in fact, that they are the crux of the actual knowable results from this benchmark testing. First, the various host operating systems, virtualization platforms, and VM operating systems do vary in terms of performance. The various combinations are not in fact all equal. Some choices, such as Run-4 (Linux host, VMware and Linux guest), perform markedly worse across the board. Other choices, such as Run-2 (Windows host, VMware and Windows guest), perform somewhat inconsistently – i.e. worse at lower concurrent user loads, but gradually improving as load increases. The second and more critical finding is that, assuming a common two-second response time SLA requirement, we have quite a ways to increase the user load before we’re likely to cross that threshold. Note that TPS gives us no clue as to where we are in relation to any limits, just that the numbers are getting better until they don’t. Average response time clearly indicates where we are in relation to that limit. In fact, taking Run-1 (Windows host, VMware and Linux guest) and increasing the concurrent user load until the 2-second limit on average response time was reached took over 1,000 concurrent users. So MPH do in fact tell you when you’ll arrive at your destination better than RPM’s.


In closing, let me clearly state that I do not intend these results to pick any winners or losers. These benchmark results merely show clear evidence for me personally to draw my own conclusions in some very specific testing scenarios. There are no universal truths indicated here regarding those choices. Hence I am not trying to suggest which host OS, virtualization platform or guest OS is best. Rather, to me they are all just fine and hence they are all on my tool belt. What I do suggest is that virtualization is here to stay; it works and works well, so quit swimming against the current and embrace it – and when doing so, proper benchmarks and the correct interpretation of their results can lead to very successful virtualized database deployments.

In the good old days most DBA’s could walk into their computing center and point to their database server and its disks. Furthermore, many DBA’s could also point to specific disks and state which database objects they housed. In fact database object placement and separation was a key tuning consideration back in the day. Oh my how times have changed.


At first all database servers utilized direct attached storage devices (i.e. DASD). In many cases the database server had limits as to the number of DASD it could support. Thus a big database often required a big server. So there was a direct one-to-one correlation between database size and server size.


Then along came Storage Area Networks (i.e. SAN). Think of a SAN as a storage server designed to house many disks and connected back to the database server via expensive, high speed fiber channels. Smaller databases tended to remain on DASD, while larger ones often got a dedicated SAN. Then as SAN technology improved, small to medium databases started sharing a SAN.


Not too long after the SAN came Network Attached Storage (i.e. NAS). Think of a NAS as a storage server designed to house many disks and connected back to the database server via inexpensive and common Ethernet. At first, Ethernet speeds of 100Mbit restricted NAS usage to primarily smaller and less critical databases. However, as Ethernet speeds reached 1Gbit and now 10Gbit, NAS has become a viable choice for most database storage needs.


Finally both SAN and NAS evolved, adding some new and interesting wrinkles such as Fiber Channel over Ethernet (i.e. FCoE) and Internet Small Computer System Interface (i.e. iSCSI). But essentially SAN and NAS device architectures have remained the same. Just the connection methods were modernized. So the basic architectural diagrams did not really change.


In either of these storage scenarios (i.e. SAN and NAS), the DBA generally lost all touch with database-object-to-disk mappings. At first, when the SAN or NAS device generally served but one database, some DBA’s may have performed or assisted with the Logical Unit (i.e. LUN) design. But as these storage devices became shared, the DBA was often simply assigned a black box of space matching the DBA-specified size requirements.


So let’s assume that the DBA requested 80GB of disk space for the database and the storage administrator created four 20GB LUN’s for that request. There are many reasons why the storage administrator might have created four LUN’s rather than one. It might be that the LUN’s are on different SAN and/or NAS devices. It’s also quite possible that the storage administrator simply wanted to provide more than one LUN so that the DBA could perform some logical separation of database objects.



Figure 1: Database Storage Design


Thus the DBA might design the database storage as shown above in Figure 1. Basically the DBA thinks he has four disk drives (i.e. the four LUN’s) and lays things out accordingly. But he doesn’t know where the spindles are or how many he may be striped across, due to this basic storage abstraction. And as Figure 1 shows, the DBA may introduce further abstractions by using a Logical Volume Manager (i.e. LVM) or Oracle’s Automatic Storage Management (i.e. ASM). Furthermore, the use of database logical containers and constructs such as tablespaces and partitions adds yet another level of abstraction. But it now gets even worse, as shown below in Figure 2.



Figure 2: Virtualized Storage Architecture


With storage virtualization, the storage administrator can now manage space across storage devices in a manner much like an LVM. Thus the physical LUN’s are aggregated into a centralized and shared “storage pool”. Now he can create and assign logical LUN’s for the DBA’s needs. Of course that means yet another level of abstraction, and thus further removal from knowing where your disks are.


Thus we need to realize and acknowledge two key points. First, we essentially virtualized (i.e. abstracted) storage a very long time ago with the concept of the LUN. And second, the virtualization craze is simply adding yet another layer of abstraction or removal from your disks. You now have logical LUN’s.


So forget about ever really knowing where your disks are and how your database objects are laid out across them. You’ll need a time machine if you really must know. I’d say just forget it – there are bigger fish to fry.