
I’ll be speaking at the annual Hotsos Symposium in Dallas TX February 27th and 28th. To quote their website:

 

Hotsos is the best conference in the Americas devoted to Oracle system performance. The combination of topic focus, small audience size, and large speaker list make the Hotsos Conference an unsurpassed networking opportunity.

The Hotsos Symposium reaches for a new technical high with presentations focused on techniques, experimental results, and field-tested scripts that attendees can take home and apply to real-world tasks. The speakers routinely offer an excellent balance between database principles and problem-solving techniques. The speakers also show you how to determine the difference between reliable information and bad advice around Oracle performance optimization.

 

Here are my presentation topics, dates and times:

 

  • Optimizing Data Warehouse AdHoc Queries against Star Schemas
    • Monday, February 27, 2017
    • 1:00pm to 2:00pm CST
    • Hall B & Virtual Hall B
    • Attendees will learn optimal techniques for designing, monitoring and tuning "Star Schema" Data Warehouses in Oracle 11g and 12c. While there are numerous books and papers on Data Warehousing with Oracle, they generally provide a 50,000-foot overview focusing on hardware and software architectures -- with some database design. This presentation provides the ground-level, detailed recipe for successfully querying tables whose sizes exceed a billion rows. Issues covered will include table and index designs, partitioning options, statistics and histograms, Oracle initialization parameters and star transformation explain plans. Attendees should be DBAs familiar with "Star Schema" database designs and have experience with Oracle 11g and some exposure to Oracle 12c.

 

  • Successful Dimensional Modeling of Very Large Data Warehouses
    • Tuesday, February 28, 2017
    • 1:00pm to 2:00pm CST
    • Hall B & Virtual Hall B
    • Attendees will learn successful techniques for dimensional modeling of very large Data Warehouses using traditional Entity Relationship diagramming tools. While there are numerous new modeling conventions and tools, Entity Relationship modeling tools have proven best at real-world database design and implementation. This presentation provides the ground-level, detailed recipe for the optimal dimensional modeling techniques of tables whose sizes exceed 500 million rows. Issues covered will include "Star Schemas", fact and dimension tables, plus aggregate table design and population. Attendees should be DBAs or senior developers familiar with Oracle database design and any form of data modeling.

Often people write to ask why a database tool such as SQL Developer, DBArtisan or Toad seems generally sluggish when working against Oracle 10g, 11g, or 12c databases, or why some screens appear to lock up or take nearly as long to refresh as the refresh cycle itself. They are often especially frustrated because this problem did not occur with Oracle versions prior to 10g. So what is the database tool doing wrong (i.e. is it a bug)?

 

The short answer is that this is an Oracle DBA best practices issue - and not a database tool problem per se. The database tool just happens to expose the symptoms since it makes such frequent access to data dictionary views and tables. The underlying problem generally affects all database applications to varying degrees – it’s just exposed most prominently via the database tool.

 

Oracle versions prior to 10g offered two query optimizers: cost based and rule based – with rule based being the default and the fallback, and, quite importantly, also used internally for most access to the data dictionary. While 10g and beyond still offer both optimizers, all access to the data dictionary is now via the cost based optimizer, hence the data dictionary needs statistics gathered in order to function efficiently.

 

The common misconception (and source of the problem) is that people think Oracle 10g does this “auto-magically” as a background task (i.e. GATHER_STATS_JOB). However, the database does not – or at least not reliably. As a result, in 90+ percent of the cases where database tool users experience sluggish performance against an Oracle 10g, 11g, or 12c database, this is the root cause. Most times it takes a “heated debate” between the database tool users, their DBAs, and myself to convince them to try the quick and very simple fix – which takes just two minutes to perform and then successfully test.
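One quick way to confirm whether this applies to a given database is to check when the SYS-owned dictionary tables were last analyzed; a minimal sketch (run from a DBA-privileged account):

select count(*)             total_tables,
       count(last_analyzed) analyzed_tables,
       max(last_analyzed)   most_recent
  from dba_tables
  where owner = 'SYS';

If ANALYZED_TABLES is zero, or MOST_RECENT is many months old, the dictionary statistics are missing or stale.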

 

Here’s the Oracle 10g documentation which most times finally gets them to try our fix:

Oracle® Database Performance Tuning Guide
10g Release 2 (10.2)

Chapter 14: Managing Optimizer Statistics

14.2.3.1 When to Use Manual Statistics

Automatic statistics gathering should be sufficient for most database objects which are being modified at a moderate speed. However, there are cases where automatic statistics gathering may not be adequate…

Another area in which statistics need to be manually gathered are the system statistics. These statistics are not automatically gathered. See "System Statistics" for more information.

Statistics on fixed objects, such as the dynamic performance tables, need to be manually collected using GATHER_FIXED_OBJECTS_STATS procedure.

 

So here’s the recommended fix (performed via a DBA privileged account):

  • exec dbms_stats.gather_dictionary_stats;
  • exec dbms_stats.gather_fixed_objects_stats;

 

These two commands should probably be run about once per week for best results. Of the two commands, the first is far more critical as it handles the “ALL/DBA/USER_*” views and V$ tables – which are accessed constantly from within the database tool.

The second command is for the X$ tables, which are primarily referenced only by some advanced DBA features.

 

Please consider having your DBA either schedule these calls as a recurring PL/SQL job or implement them via an “on instance start-up” database-level trigger.
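For instance, a minimal sketch of such a weekly job using DBMS_SCHEDULER (the job name and schedule shown are just examples) might look like this:

BEGIN
  DBMS_SCHEDULER.CREATE_JOB(
    job_name        => 'WEEKLY_DICTIONARY_STATS',        -- example job name
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'BEGIN
                          DBMS_STATS.GATHER_DICTIONARY_STATS;
                          DBMS_STATS.GATHER_FIXED_OBJECTS_STATS;
                        END;',
    start_date      => SYSTIMESTAMP,
    repeat_interval => 'FREQ=WEEKLY; BYDAY=SUN; BYHOUR=3',  -- Sundays at 3am
    enabled         => TRUE,
    comments        => 'Weekly dictionary and fixed object statistics');
END;
/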

 

Now to give you an example of how much difference this process can make – we've clocked screens in the database tool that run quickly on pre-10g, but take very long on a default install of 10g (sometimes as long as 45 minutes). Once the proper data dictionary statistics are collected, the database tool performance equals or exceeds the pre-10g timings.

For far too long, database administrators and developers have focused on explain plans for tuning SQL statements that perform poorly. But with Oracle 11g and 12c, explain plans can quite possibly become a thing of the past. Furthermore, 12c can perform the entire tuning process automatically – with minimal time spent during the initial SQL development and maybe 1/10th the effort spent addressing any problematic SQL that somehow does find its way into production.

 

Does this all sound too good to be true? Then read on …

 

Let’s start by defining what an explain plan is – at least conceptually. The explain plan is the “approximate algorithm” that the Oracle database engine will attempt to “execute” in order to perform the requested database operation. Since SQL is a non-procedural, set oriented language – the explain plan is the only way to “guestimate” what the SQL code will actually do, because the code itself only expresses what is to be done – and not how. The explain plan exposes both the algorithm and indexes (if any) that the database engine would utilize during actual execution.
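For reference, the classic way to expose that “approximate algorithm” is the EXPLAIN PLAN command paired with DBMS_XPLAN (the table and predicate below are purely illustrative):

EXPLAIN PLAN FOR
  SELECT *
    FROM employees
   WHERE department_id = 50;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);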

 

Over the years (and major database versions), Oracle has tried to make SQL optimization more stable and reliable than simple explain plans. An excerpt from the book Oracle 11g New Features delineates this progression nicely – which has been as follows:

 

  • Explain plans for Oracle versions <= 7.X
  • Plan stability for Oracle versions 8i and 9i
  • SQL Profiles for Oracle version 10g
  • SQL Plan Management for Oracle version 11g and 12c

To see why explain plans alone fall short, consider a vacation analogy: planning a road trip by going to the local AAA office to have a TripTik route map prepared. That approach has some familiar drawbacks:

  • I have to wait in line for my turn to be helped
  • The office may not have current information
  • The person helping me might be misinformed
  • I could ask for the scenic route whereas the family simply wants to get there quickly
  • Things can change between getting the TripTik and starting/ending my trip
  • I must possess both an automobile and a valid driver’s license to make the trip
  • I’m purposefully eliminating far better methods or mode choices (e.g. flying)

 

Believe it or not, the exact same cons can often exist for explain plans – as follows:

 

  • During the execution phase, the explain plan must be derived
  • Step #1 requires some resources, so there could be a delay
  • The statistics and/or histograms could be out of date (i.e. stale)
  • The SQL may be so malformed that fast explain plan derivation could be suboptimal
  • SQL code might contain hints that are suboptimal and yet take precedence
  • The fundamental performance conditions can change (e.g. new indexes, removal of indexes, partitioning of objects, partitioning of indexes, “init.ora” changes that might affect SQL execution, addition and/or subtraction of server resources, etc.)
  • The user must know how to obtain and read explain plans, and how to make SQL rewrites and/or database structure changes in order to improve the performance
  • Oracle 11g’s Automatic SQL Plan Management can often yield great results for far less (if not minimal) effort

 

So let’s examine how the newer versions of Oracle have totally eliminated such concerns.

 

Oracle 10g adds the new SQL Tuning Advisor – a utility to “Analyze individual SQL statements, and recommend SQL profiles, statistics, indexes, and restructured SQL to improve overall SQL execution performance.” Using my vacation analogy, the SQL Advisor is more like going to the American Airlines web site and booking airfare for the same trip. I’ll let the airline, pilot, navigator, tower staff, and air traffic controllers worry about all the details – I just have to say get me from point A to B for whatever price and travel time I am willing to bear. Likewise, we can now manually inform Oracle to worry about the algorithmic details churning inside – then we’ll just let the database know how much effort (i.e. time) to expend in order to meet our general SQL performance needs.

 

Thus Oracle 10g requires the DBA to remain engaged to generate, review and implement the recommendations, as shown below in Figure 1 (which is a slide from the Oracle 11g new features training class). Returning to the vacation analogy via an airline, I still have to manually input the origin, destination, dates, seat class and number of stops that I’m willing to endure. I can then choose from among the flight recommendations, but until I press the “purchase the ticket” button – nothing has actually happened yet. It’s all just “what-if” type analysis whose impact won’t be felt unless formally chosen.

 

x1.png

Figure 1: Manual SQL Tuning in Oracle 10g

 

So for 10g, we’d look at OEM’s Advisor Central, and specifically the SQL Tuning Advisor, looking for any statements with excessive database time or IO. The AWR repository will automatically contain all the SQL tuning candidates, broken down by their relative percentage contribution to the cumulative total, as shown below in Figure 2.

 

x2.png

Figure 2: Top SQL by IO Tuning Candidates

 

The SQL Tuning Advisor will offer suggestions to improve performance – as well as an estimate of the percentage improvement, as shown below in Figure 3. Note too that it suggests more than just alternate explain plans (which is essentially just an internal SQL rewrite), but it also offers other critical advice – including database structural changes.

 

x3.png

Figure 3: Tuning Advisor Recommendations
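For those who prefer a command line over OEM, roughly the same advisor run can be driven through the DBMS_SQLTUNE package; a minimal sketch, where the SQL_ID and task name are hypothetical placeholders:

DECLARE
  l_task VARCHAR2(64);
BEGIN
  -- 'fxy3a1qhzy2k9' would be one of the Top SQL by IO candidates
  l_task := DBMS_SQLTUNE.CREATE_TUNING_TASK(
              sql_id     => 'fxy3a1qhzy2k9',
              time_limit => 300,
              task_name  => 'tune_top_io_sql');
  DBMS_SQLTUNE.EXECUTE_TUNING_TASK(task_name => l_task);
END;
/

-- Review the advisor's findings and recommendations
SELECT DBMS_SQLTUNE.REPORT_TUNING_TASK('tune_top_io_sql') FROM dual;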

 

Oracle 11g adds SQL Plan Management, which can essentially automate the 10g scenario in whole, as shown below in Figure 4 (which is another slide from the Oracle 11g new features training class). Returning once again to my vacation analogy, this would be like setting some American Airlines web site booking preferences – and automatically having the airline purchase whichever tickets best meet all my needs (all without any involvement from me). Yes – I’m giving up some control, but in the real world we call such scenarios excellent customer service. So why should the database and/or DBA offer anything less? And for those DBAs who don’t want to relinquish such control, or who worry about the relative overhead involved, the last step can be either manual or automatic.

 

x4.png

Figure 4: Automatic SQL Tuning in Oracle 11g and 12c
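Under the covers, SQL Plan Management is controlled by two initialization parameters plus the DBMS_SPM package; a minimal sketch of the relevant knobs (the evolve call verifies and accepts any better, not-yet-accepted plans):

-- Capture plan baselines for repeatedly executed SQL (default is FALSE)
ALTER SYSTEM SET optimizer_capture_sql_plan_baselines = TRUE;

-- Use accepted plan baselines at execution time (default is TRUE)
ALTER SYSTEM SET optimizer_use_sql_plan_baselines = TRUE;

-- Review what has been captured
SELECT sql_handle, plan_name, enabled, accepted
  FROM dba_sql_plan_baselines;

-- Verify and accept any better unaccepted plans; l_report holds the evolve report
DECLARE
  l_report CLOB;
BEGIN
  l_report := DBMS_SPM.EVOLVE_SQL_PLAN_BASELINE;
END;
/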

 

Of course in Oracle 12c this only gets better. Instead of just preserving repeatedly executed SQL statements as candidates, it now records and tests all SQL statements. I for one cannot wait to see what 12c R2 offers to improve this.

You’ll have to forgive me for spoofing the famous movie line “We don’t need no stinking badges” (Treasure of the Sierra Madre, 1948 and Blazing Saddles, 1974), it just seemed quite apropos.

 

I was discussing data modeling needs recently with a potential customer – and like many shops I visit, they saw no need for data modeling. They felt that their DBAs, data administrators and application architects knew everything about what was home grown – and that the rest was third-party applications over which they had no control, and thus no need for a data model. That’s a very common belief these days, it seems.

 

So my reply was to try to enlighten them about why data modeling makes sense no matter what – and especially for the scenario they described. I thought it might be worthwhile to share those points with the general public in the hopes of convincing a few more people.

 

Five common reasons people mistakenly believe data modeling is unnecessary:

 

1. Our DBAs, data administrators and application architects have our databases under control.

 

That’s good news – you’re doing better than many shops. But what would happen if those guys and gals happen to play the Mega Millions lottery in a pool, and win $360 million? What’s going to happen when they come in Monday and quit? That’s one heck of a lot of knowledge, wisdom and experience to try to backfill – and quickly. Wouldn’t having a data model make that process a little easier? OK – so the likelihood of that happening is low. What if they all get new jobs or retire around the same time? These things happen – so like the Boy Scouts say, “Be Prepared”.

 

2. We only use 3rd party applications like SAP or PeopleSoft, so to us the database is a black box.

 

That’s perfectly true – often you cannot do anything about the design of such products. But knowing the design has several advantages. Ever been on a tech support call and wished you had a little more knowledge, either to communicate your issue better or to firmly dismiss a poor explanation they claim is the problem when you know it is not? How about when you get a new version of the 3rd party application – wouldn’t knowing about some of the structural changes be useful in case the upgrade of the data runs into issues? And what if the business makes a requirements change, like having all addresses fully qualified and verified against a validity database? Wouldn’t having some idea of all the areas impacted be helpful? I may not be able to effect changes to the US highway system, but it sure helps to have a map for long drives outside my comfort zone.

 

3. We only use home grown applications, so we already know everything about all the databases.

 

Actually, this has already been addressed by the prior two points (just exchange home grown for 3rd party application). All the same items hold true: you need to protect your company from unforeseen knowledge loss, and it often helps to have a “road map” for impact analysis no matter who built the application. In many cases the justification for data models is even higher when the applications are developed in house. Because if you built the highways, you had better be able to navigate them – and a picture almost always helps.

 

4. We contract out application development, so no need to worry about such technical issues.

 

OK – so when you buy a new home, are you comfortable with just telling the builder to make something about 2,600 square feet? That you don’t need to see a floor plan that meets your requirements, and that the contractors building it don’t need any blueprints to work from? We all know that’s not true – especially because a home is such a large investment of the most valuable kind of money – our own. Well, application development is no less specialized or critical, and it costs lots of precious corporate budget. What’s more, 80% or more of the IT project failures I’ve seen could have been averted or at least lessened by having a data model up front. I’ve even been an expert witness in court cases on this issue. The data model is nothing more than a collection of the business requirements. If an application fails and there was no such document defining the business requirements – then whose fault is it? The best companies I’ve visited not only do data models for their own projects (regardless of who builds them), they even require that 3rd party vendors provide data models as part of the RFP process.

 

5. Data modeling is quite outdated and has been replaced by newer methodologies such as UML.

 

That could well be true. If your application developers have extensive UML models that fully cover the underlying “persistent” data structures, you may not need to do data models as well. But application developers tend to be programmers, and programmers tend to think (and thus model) in terms of what the data-in-motion needs are – which is what their programs embody. So having just UML models may in fact provide a false sense of security that you’re fully covered. If the UML models do not cover persistent data storage, then you may well need a data model to complement your UML models. It won’t hurt to double check to be safe.

Oracle’s Parallel Query Option (PQO) is a fantastic tool. But much like any good tool, it can very easily be used in the wrong situation or simply abused. In those cases, PQO can actually make database performance much worse. Let’s examine some common misuses or misperceptions regarding optimal and efficient PQO usage.

 

  1. PQO makes sense universally on any SMP or multi-processor database server.

 

Right – and I have some wonderful “magic beans” to sell to people who blindly believe that theory. Oracle has to spawn additional parallel server processes (or reuse those still lying around from recently completed PQO operations). So the cost of spawning additional processes and their footprints must be accounted for in the grand scheme of things. Plus, Oracle will need a “coordinator” process to coalesce all the results – and that too has a small cost. So calculate all this into your assumptions before using PQO. But for those of us less scientific, think of PQO as being like a grocery store checkout line. If you only have a couple items in your grocery cart, will forcibly spreading them across multiple checkout lanes get you out the door any sooner? And even if you do have an overly full cart, will spreading your items across checkout lanes that are already busy be any better? There are obvious times (i.e. rules) when PQO works better than others. We simply need to apply the same clear thinking when using PQO.

 

  2. When I say parallel N, Oracle spawns exactly and only N server processes.

 

OK – I’m 100% guilty of assuming this on occasion. In fact, I might even do so more often than not. But I am cognizant of the above issues and have planned accordingly – and still have a reasonable PQO scenario where IO is definitely not my chief bottleneck. Thus in those clear cases I will extend beyond the actual CPU count in my PQO settings. I’ve even gone two to four times higher in the right circumstances. And in those cases it has made a clear and measurable difference. But be careful of when it does not make sense. For an easy test to verify this, simply set PQO to degree four on your dual core desktop or laptop and run several concurrent queries. It will not only run slower, but painfully so.
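To try that test yourself, a quick way to force a given degree of parallelism is the PARALLEL hint or the table-level attribute; a minimal sketch, where BIG_SALES is a hypothetical table:

-- Force degree 4 on a single test query
SELECT /*+ PARALLEL(s, 4) */ COUNT(*)
  FROM big_sales s;

-- Or set the attribute on the table and let the optimizer pick it up
ALTER TABLE big_sales PARALLEL 4;

Run a few of these concurrently on a dual core machine and compare the timings against NOPARALLEL.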

 

  3. I have a big multi-processor server, so full-table scans are always the way to go.

 

As silly as this may sound at first, I’ve seen this in many data warehousing scenarios. I think it happens because somewhere someone wrote that “parallel full-table scans are efficient”. But that author meant in clearly reasonable scenarios – not universally. Yet this remains one of the top data warehousing tuning mistakes I see. Think about a DW query – it may scan millions of rows in tables with many billions of rows. Why force a full-table scan when there is at least a ten-to-one, if not a one-hundred-to-one, reduction factor? That’s exactly what indexes are for – and index scans can be done in parallel. I’ve seen improvements as large as 1000% from undoing this simple PQO misuse.

Bert Scalzo

Oracle Dynamic Scripting

Posted by Bert Scalzo Jan 15, 2017

Ever wanted to perform some operation against all the objects in a schema? One example is a one-time management task to turn off parallel for all tables and indexes. Another might be an ongoing task to disable and later enable triggers and referential integrity to speed up a recurring data load job. These types of tasks are not uncommon; however, many people don’t know how to leverage dynamic SQL scripting to make such jobs trivially easy. So I’ll show dynamic SQL scripts for these two specific examples – which will provide you both a working knowledge and a template for your own use.

 

Example #1: Turn off parallel for all tables and indexes for a named schema

 

-- @NOPAR.SQL  schema_name
-- Turns off parallel
-- for all tables and indexes
-- for the named schema

set echo off
set verify off
set feedback off

set linesize 256
set pagesize 0

set term off
spool %TEMP%\nopar.tmp

select 'alter table '||owner||'.'||table_name||' noparallel;'
  from all_tables
  where owner = upper('&1')
  order by table_name;

select 'alter index '||owner||'.'||index_name||' noparallel;'
  from all_indexes
  where owner = upper('&1')
  order by index_name;

spool off
set term on

@%TEMP%\nopar.tmp

 

A key aspect to understand is that as this SQL script runs, we want all of its output to be the new SQL script we’re creating to run. So we don’t want any extraneous output such as echoing the commands being run, verification of parameter values (i.e. old vs. new), or feedback about the row counts processed. Hence the first three SET commands address this. We also don’t know how long an output text line will be, so we need to set the line size accordingly. Finally, we don’t want select headers or page breaks, and the easiest way to suppress both is to set the page size to zero. So now our generated script output is nothing more than the SQL commands.

 

Note that I’m spooling the output to my Windows temporary directory as defined by the “%TEMP%” environment variable, plus I don’t want to see the select command output whiz by – so I set terminal output to off. Then all I do is run some ordinary select commands against Oracle data dictionary views such as ALL_TABLES – but notice that I’m actually using text literals and concatenation operations with table columns to construct SQL commands rather than just typical select output. Once such output has been completed, I simply set the terminal output back on and execute (i.e. @) the SQL script this script just created. That’s all there is to it.

 

Example #2: Disable and enable triggers and referential integrity (RI) for a named schema

 

-- @SETALL.SQL  schema_name  OFF|ON
-- Turns off (or back on) triggers and referential integrity (RI)
-- for all tables
-- for the named schema

set echo off
set verify off
set feedback off

set linesize 256
set pagesize 0

set term off
spool %TEMP%\setall.tmp

select 'alter table '||owner||'.'||table_name||decode(upper('&2'),'OFF',' disable constraint ',' enable constraint ')||constraint_name||';'
  from all_constraints
  where owner = upper('&1')
  and constraint_type = 'R'
  and status = decode(upper('&2'),'OFF','ENABLED','DISABLED');

select 'alter trigger '||owner||'.'||trigger_name||decode(upper('&2'),'OFF',' disable',' enable')||';'
  from all_triggers
  where table_owner = upper('&1')
  and status = decode(upper('&2'),'OFF','ENABLED','DISABLED');

spool off
set term on

@%TEMP%\setall.tmp

 

This script is just a little more complicated, but essentially uses the exact same technique. In fact, all the SET commands are the same. The SPOOL command only differs in the name of the temporary file. Then of course the select commands that build the SQL are different. Note that I added a second input parameter that allows me to use this one script to both turn the items of interest off and back on via the DECODE function. So I only need this one script for both tasks – turning off vs. turning on.
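For example, a recurring load job (using a hypothetical SCOTT schema) would simply wrap the load like this:

SQL> @setall.sql SCOTT OFF
-- ... run the recurring data load here ...
SQL> @setall.sql SCOTT ON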

 

There’s literally no limit to how creative you can be with such dynamic SQL scripts. For example, a long time ago, before Oracle provided the metadata package to regurgitate the DDL that creates an object, I had a complex dynamic SQL script to reverse engineer all of my database objects. The key takeaway here is that this technique should be on any DBA’s or senior database developer’s skills list.
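Today that reverse-engineering idea can be combined with the same scripting technique by calling DBMS_METADATA; a minimal sketch (the spool file name is just a placeholder, and SET LONG is needed because GET_DDL returns a CLOB):

set long 100000
set pagesize 0
set linesize 256
set term off

spool %TEMP%\ddl.sql

select dbms_metadata.get_ddl('TABLE', table_name, owner)
  from all_tables
  where owner = upper('&1')
  order by table_name;

spool off
set term on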

It’s not uncommon to discover databases that have simply evolved over time rather than having been built according to some overall design or “master plan”. And there seems to be no shortage of answers as to why this occurs. Sometimes DBAs are just far too busy with a myriad of other tasks. Sometimes developers are under the gun for deliverables so quickly that there’s just not time. Or my favorite: the project started as just a couple new tables and then ballooned into a monster. Whatever the reason, the situation exists far too often in the real world of information systems.

 

Let’s put this into perspective. Would you purchase a brand new house where the builder simply showed you a high-level “floor plan” and then said “trust us – we’ll just hammer together the lumber and stuff until we’re done, don’t worry”? Not many of us would so nonchalantly encumber ourselves with a 30-year loan on such vague promises. We automatically expect the home builder to use that rough diagram as the precursor to an architectural blueprint that meets local codes, standards and ordinances. We expect the city zoning commission or code enforcement division to sign off on key items during the construction. We expect to spend the week before taking ownership inspecting the home and marking things to be fixed before the closing. In short, when it’s our money and/or financial future – we expect things to simply work right.

 

The same is true in many other disciplines that we utilize every day. We get on elevators knowing that they were well designed and recently inspected. We readily fly on airplanes knowing that the FAA and other government agencies are making sure it’s safe and well regulated. We undergo surgery at hospitals knowing that doctors and administrators have jumped through major hoops to guarantee a reasonable level of safety. Yet we still build computerized systems with all the skill of a MASH surgeon – just get it stable enough to pass onto the next guy. No wonder then that both database administrators and application developers are quickly catching up to lawyers and politicians in terms of likeability.

 

So let’s start this week by just looking at the two most critical database success factors: effectiveness (doing the right thing) and efficiency (doing it expediently).

 

How can anything be effective if it’s just cobbled together haphazardly over time? I’m not saying that everyone must do a logical data model and transform that into a physical data model. That may not be most people’s “cup of tea”. But slowing down a wee bit to think of the overall physical design should yield a more consistent business solution. That should translate into higher effectiveness.

 

The same can be said for efficiency. Not simply and seemingly randomly adding tables and columns on an as-needed basis can easily result in a more fluid design. It’s much easier, too, to code against such a design, which can also mean lower development costs (i.e. it’s less complicated to work with). Furthermore, this generally translates into lower ongoing maintenance costs as well – and often far fewer unplanned crises too.

 

Find all this “pie in the sky” thinking to be too esoteric or too academic? Well, I’ve been an expert witness in several trials where companies have sued consulting firms that built substandard applications and databases. In every single case, the plaintiff won substantial judgments based solely on poor database design. So far my testimony has directly resulted in seven-figure financial awards. That’s millions of dollars per case that could easily have been saved by slowing down and doing more planning or design. So as the old song goes, “Look before you leap, still waters run deep” …

There are those who think that data modeling has become very passé these days. The belief is that because data modeling theory is over thirty years old, and because some data modeling tools have been around 10-20 years, data modeling is somehow no longer relevant. Nothing could be further from the truth. In fact, data modeling may now be more necessary than ever before.

 

While there are other modeling techniques and notations, such as BPM (Business Process Modeling) and UML (Unified Modeling Language), the need for accurately capturing the business data requirements and transforming that into a reliable database structural design is as paramount as ever. The key differentiator is that data modeling is the only technique and notation that focuses on the “data at rest”. All the others tend to focus more on “data in motion”. Another way to say this is that data modeling concentrates on issues that lead to a solid database design, while all the others tend to focus more on issues that will result in better application design or the things useful to programmers – such as data structures, objects, classes, methods, application code generation, etc.

 

I’ve personally been involved in several trials as an expert witness where plaintiffs were suing defendants for serious financial remuneration because custom database applications had performance and/or data accuracy problems. In every single case there was a failure to data model the business requirements. Thus the data effectiveness suffered. Moreover, ad-hoc database design, or database design using more programmatically oriented techniques and tools, often resulted in inefficient database design. And no amount of coding trickery could overcome the resulting bad database design. So in every case, the plaintiff won.

 

The other reason data modeling has seen a measurable resurgence is the data warehousing phenomenon. With cheap storage these days, most companies can afford to and benefit from retaining historical aggregate and/or summary data for making significant strategic decisions. With the accumulation of numerous source legacy OLTP (Online Transaction Processing) systems, there are two key ways to approach populating a data warehouse: directly from source to warehouse (Figure 1) or through an intermediary database often referred to as an ODS or Operational Data Store (Figure 2).

 

x1.png

x2.png

 

There is already sufficient debate as to which approach is superior. I won’t address that here. However regardless of which approach one decides to utilize, the database design (i.e. the data at rest) is paramount. Because in a data warehouse the “data itself” and the business information it contains is the most relevant and valuable asset. Typical data warehouse queries and reports issued via BI (Business Intelligence) tools process that asset to yield key strategic decision making results.

 

The other key area where data modeling often supports the whole data warehousing and business intelligence effort is through the mapping of legacy data fields to their DW and BI counterparts. This meta-data mapping about how front-line business data maps to the data warehouse helps with the design of both the queries and/or reports, as well as with the ETL (Extract, Transform and Load) programming efforts. Without such a mapping, as the OLTP legacy systems evolve there would be no automatic tie to the dependent data warehousing information. Hence one would have to almost totally re-engineer rather than simply follow the OLTP source data ramifications and ripples downstream to the DW and BI end points.

 

For those not involved with data warehousing projects, possibly just performing the more traditional OLTP type systems development, data modeling is still no less important. But often people get so caught up in novel paradigms such as “extreme programming”, “agile software development” or “scrum” that they compromise or even possibly entirely skip data modeling. The problem is that these new approaches don’t always spell out exactly how data modeling should be incorporated, thus people often skip doing it. My belief is that no matter what latest and greatest approach one utilizes, data modeling should be integrated into your development process wherever it makes sense.  Figure 3 shows how both conceptual and physical data modeling should fit into an overall database design process - whether it’s for a totally new system, or one that’s simply being updated or even totally re-engineered.

 

x3.png

 

There’s one final reason why data modeling has been getting more attention these days. In many cases organizations are finally requiring data models as a “sign-off” deliverable as part of the development process. I attribute this to organizations attempting to adhere to the Software Engineering Institute’s (SEI’s) Capability Maturity Model (CMM) and Capability Maturity Model Integration (CMMI) concepts. The idea here is quite simple: in order to mature one’s development process regardless of technique, you need to mature in terms of both the processes and tools used to achieve the desired better end result. For example, the proper and effective use of project management tools is often quoted as the single best way to escalate from Level 1 (Chaos) to Level 2 (Repeatable). The central idea that both processes and tools can lead to maturity has helped to reinvigorate many people’s interest in data modeling.

 

Now comes the hard part, what data modeling tool should one use? That might seem like a tough or loaded question. There are numerous data modeling tools out there. Plus there are many enterprise modeling suites that contain data modeling capabilities. Rather than advise any particular tool, I’m going to outline some basic guidelines of things to avoid. My belief is that any tool that meets some standard and minimal requirements will enable one to produce effective and efficient data models – and hence the resulting databases.

 

  1. Avoid drawing tools that aspire to be data modeling tools. A good data modeling tool supports defining tons of meta-data that has business relevance. Think of the diagram as just the tip of the iceberg – 90% of the berg’s mass is what you don’t see because it’s underwater. The same is true for data modeling. If you concentrate on just the picture, you’ll probably compromise the effectiveness of the resulting database.
  2. Choose a tool that fits your needs. Often people will purchase a killer modeling tool that offers everything imaginable. But if all you need or will use is the data modeling portion, then why pay for more? The key concern here is that the more any tool does besides data modeling, the greater the chance that its data modeling capabilities have been compromised in order to do everything else. Sometimes more is not better.
  3. Verify that the data modeling tool you choose generates solid DDL, especially for the target database platform. This is another case where more might not be better. It is better to have a tool that generates 100% accurate CREATE or ALTER scripts for the few databases important to you than one that does them all at a lesser level. But be very careful: the DDL generated by many tools, even those that focus on a few databases, can often be less than optimal. You have to know what to look for – so engage your DBA in making the decision just to be safe.
  4. Verify that your data modeling tool provides robust model consistency and accuracy checking reports and/or utilities. As data models grow (and they will), it can be quite overwhelming to have to manually check everything. And you cannot expect the poor DBA to sanity check the thousands or tens of thousands of DDL lines a data modeling tool can quickly generate. Remember, above we talked about both effectiveness and efficiency. Effectiveness is mostly on your shoulders, but efficiency can be aided by good data modeling checking utilities.

 

In closing, data modeling has come a long way since its inception. And even though the heydays of CASE and software engineering have passed with the 90’s – the need for and usefulness of data models has not subsided. And data modeling can assist with any effort regardless of development methodology or paradigm. So don’t pass up on data modeling these days just because it’s a mature technique. You might well be very sorry if you do.

 

Data Warehousing DBA’s usually work with huge tables, often possessing hundreds of millions or even billions of rows. And occasionally, these DBA’s might need to perform maintenance that requires either recreating the entire huge table or some of its partitions. For example, the DBA might need to drop an unnecessary column or simply desire to de-fragment that table or some of its partitions (i.e. reduce the number of extents allocated).

 

In either case, data warehousing tables, and even just their partitions, are often simply too big for unloading and reloading. For example, exporting and then importing a 700 million row table may require 2 to 3 days run-time. Whereas a “CREATE TABLE AS SELECT” (a.k.a. CTAS) would take just a few hours -- assuming that sufficient space is available.

A run-time efficient version of a CTAS command to copy a huge table is as follows:

 

CREATE TABLE COPY_OF_ORDER
STORAGE (INITIAL 100M NEXT 100M PCTINCREASE 0)
PCTFREE 5 PCTUSED 90
NOLOGGING PARALLEL (DEGREE 16)
AS
SELECT *
FROM ORDER;

 

Thus, CTAS becomes the data warehousing DBA’s greatest ally. It permits the DBA to easily and quickly copy massive amounts of data. However, CTAS can also be a DBA’s worst enemy if they’re not careful. While the CTAS example above is run-time efficient for the copy operation itself, what about its effect on queries against the copied table?

 

Our test environment was a 700 million-row table on a 64-bit, 16-processor UNIX box with an EMC RAID-1 disk array. The CTAS command copied the table in less than one hour. However, queries against the copied table ran nearly four times as long as those same queries issued against the original table. What happened?

 

The problem is that the CTAS has negated the original table’s proximity of data. More simply stated, the CTAS has more uniformly spread related rows of data across many more blocks and files. This can be a very elusive problem to find. Look again at the CTAS example. Unfortunately, it looks harmless enough. It’s tough to see that it’s the source of the problem.

 

Most data-warehousing tables have their data loaded at regular intervals (e.g. nightly). As a result, related rows of data become naturally located in close proximity, often within the same or closely related data blocks inside a relatively few data files. So queries retrieving those densely packed rows perform a relatively minimum amount of physical IO.

 

The query to test a table’s data proximity is as follows:

 

SELECT
    COUNT(
        DISTINCT DBMS_ROWID.ROWID_RELATIVE_FNO(rowid)||'-'||
                 DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid)
    ) "FILES-BLOCKS",
    AVG(
        COUNT(DISTINCT DBMS_ROWID.ROWID_ROW_NUMBER(rowid))
    ) "AVERAGE ROW DENSITY"
FROM &table_name
WHERE &where_condition_for_returning_related_rows
GROUP BY
    DBMS_ROWID.ROWID_RELATIVE_FNO(rowid)||'-'||
    DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid);

 

The average row density for our original table was 90. In other words, each data block held approximately 90 related rows. However the density for our new table was only 24, meaning that each block held only 24 related rows. Hence, all the related data rows are spread across nearly four times as many data blocks. Since Oracle does its physical IO at the block level, we have nearly four times as much physical IO occurring. That’s why our queries were running four times as long.

 

The corrected CTAS locates the related rows together and is as follows:

 

CREATE TABLE ORDERED_COPY_OF_ORDER
STORAGE (INITIAL 100M NEXT 100M PCTINCREASE 0)
PCTFREE 5 PCTUSED 90
NOLOGGING PARALLEL (DEGREE 16)
AS
SELECT *
FROM ORDER
GROUP BY &all_the_column_names_in_the_table;

 

The average row density for this newest table was 104 – even better than the original table’s density of 90. Note that the CTAS uses the GROUP BY clause, because the ORDER BY clause is illegal in a CTAS. Although the table copy run-time increased to nearly four hours (which was acceptable as a one-time cost to copy the data, versus 2-3 days for doing export/import), queries against this newest table ran within 5-10 minutes.

 

As you can see, data warehousing DBA’s have some very challenging issues to deal with. And often the problems and their solutions are not readily apparent. But that’s the real fun in working with terabytes of data …

Oracle technologists, database architects, database administrators, and senior database application developers are all very curious people by nature – even more curious than most. So naturally database oriented issues evoke debates similar to “tastes great, less filling” from the famous beer commercial. You might be surprised by the scope and passion of such database debates. One area getting lots of discussion recently is database virtualization – where many seem opposed just for the sake of opposition. I’ve written a few articles on why fighting against virtualizing databases is a losing proposition.

 

 

I’ve also written several articles on how to approach successful testing of database virtualization on notebooks and preparation for actual deployments.

 

 

Yet the skeptics remain. So be it. Hence I’m going to focus this article on comparing some common virtualization solutions Oracle technologists can use to test drive virtualized databases on their workstations – namely VMware Workstation ($189) vs. Oracle Virtual Box (free for personal use, $50 for commercial use). For testing these tools are great; for real deployments you would of course choose their industrial strength offerings: VMware vSphere and Oracle VM, respectively. But it is my hope that comparing the workstation versions, plus doing so on Windows vs. Linux as the host, will yield some useful information for people curious about virtualization and databases.

For testing I’ll be using Quest’s Benchmark Factory version 6.7 (which I’ll be referring to as BMF for short) to run the industry standard TPC-C benchmark. The databases will be fairly stock installs of Oracle 11g R2 with a few initialization parameter modifications, but otherwise pretty much the default or “out of the box”. The Linux and Windows hosts and VM’s too are pretty much stock, with just a file system modification to improve database IO performance:

 

  • For Windows:
    •   HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\FileSystem\NtfsDisableLastAccessUpdate = 1
  • For Linux:
    • Edit /etc/fstab file and add the “NOATIME” attribute
    • Example: /dev/sda2        / ext3 defaults,noatime    1 1

 

The following hardware (two identical boxes) and software was used for all reported testing:

 

  • Windows Host
    • AMD Phenom II X4 955, 3.2 GHz CPU, Quad Core
    • 16 GB RAM DDR-3 RAM
    • RAID-0 (striped), Dual 7200 RPM SATA-II Disks
    • Windows Server 2003 Enterprise R2 64-bit with SP2
    • VMware Workstation for Windows 8.02 64-bit
    • Oracle Virtual Box 4.12 for Windows 64-bit
  • Linux Host
    • AMD Phenom II X4 955, 3.2 GHz CPU, Quad Core
    • 16 GB RAM DDR-3 RAM
    • RAID-0 (striped), Dual 7200 RPM SATA-II Disks
    • Ubuntu Desktop Linux 10.04 (lucid) 64-bit
    • VMware Workstation for Linux 8.02 64-bit
    • Oracle Virtual Box 4.12 for Linux 64-bit

 

The following virtual machines were tested on both hosts and under each virtualization platform:

 

  • Windows VM
    • 2 CPU with 2 threads per CPU = 4 CPU’s
    • 8 GB RAM
    • Windows Server 2003 Enterprise R2 64-bit with SP2
    • Oracle 11g R2 (11.2.0.2) 64-bit
  • Linux VM
    • 2 CPU with 2 threads per CPU = 4 CPU’s
    • 8 GB RAM
    • Redhat Enterprise Linux 5.4 64-bit
    • Oracle 11g R2 (11.2.0.2) 64-bit

 

Due to technical difficulties – such as not being able to open the Windows VM’s .VMDK file under Virtual Box due to Windows issues with storage driver properties settings (i.e. SCSI vs. IDE), and not being able to configure Linux to run two virtual machine platforms both requiring hardware virtualization support at the same time – the matrix of test scenarios was reduced as shown in the following table:

 

 

Host              VM OS       VMware      Virtual Box
--------------    --------    --------    -----------
Windows           Linux       Run-1       Run-3
Windows           Windows     Run-2       X
Linux             Linux       Run-4       X
Linux             Windows     Run-5       X

 

Figure 1 below displays the BMF TPC-C benchmark results for transactions per second, often simply abbreviated and referred to as TPS. Unfortunately although commonly favored, TPS is a generally misunderstood and largely misinterpreted metric.

 

bbb1.png

Figure 1: Transactions / Second

 

Figure 1 clearly demonstrates two very key findings. First, the various combinations of host operating system, virtualization platform, and VM operating system do not make a substantial difference. In other words, the transactions per second results are essentially the same across all the various combinations – meaning that no one solution is better or worse than the others. The second and more important result is that the transactions per second or TPS rate rises with the concurrent user load. So at 120 concurrent users the score of 6.3 TPS is triple the 2.1 TPS at 40 concurrent users. Of course that’s the expected result – i.e. the more concurrent users, the higher the TPS rate, until some threshold is reached that breaks the trend. Since the Figure 1 results have not yet reached that threshold, we could deduce that we can keep growing the concurrent user load even higher until the inflection point is finally reached. That, however, is where we’d make the common and devastating mistake – and why TPS by itself is so dangerous.

 

Figure 2 below displays the BMF TPC-C benchmark results for average response time (in milliseconds), the one metric that all business end-users know quite well and often use to judge acceptable database application performance – often even stating service level agreements or SLA’s in terms of average response time maximums. Unfortunately, although it is the best understood and simplest metric to rate, average response time is quite often overlooked by technical people doing benchmarks in favor of TPS and corresponding calculable physical IO rates such as IOPS. In fact, these same technical people will often lower or remove the benchmark’s “keying & thinking” time in order to inflate those numbers. In essence they are more interested in the engine’s tachometer (i.e. RPM’s) than the speedometer (i.e. MPH).

 

bbb2.png

Figure 2: Average Response Time (in milliseconds)

 

Figure 2 also clearly demonstrates two very key findings – so key in fact that they are the crux of the actual knowable results from this benchmark testing. First, the various host operating systems, virtualization platforms, and VM operating systems do vary in terms of performance. The various combinations are not in fact all equal. Some choices, such as Run-4 (Linux host, VMware and Linux guest), perform markedly worse across the board. Other choices, such as Run-2 (Windows host, VMware and Windows guest), perform somewhat inconsistently – i.e. worse at lower concurrent user loads, but gradually improving as load increases. The second and more critical finding is that, assuming a common two second response time SLA requirement, we have quite a ways to increase the user load before we’re likely to cross that threshold. Note that TPS gives us no clue as to where we are in relation to any limits, just that the numbers are getting better until they don’t. Average response time clearly indicates where we are in relation to that limit. In fact, taking Run-1 (Windows host, VMware and Linux guest) and increasing the concurrent user load until that 2 second limit on average response time was reached took over 1,000 concurrent users. So MPH does in fact tell you when you’ll arrive at your destination more so than RPM’s.

 

In closing, let me clearly state that I do not intend these results to suggest or pick any winners or losers. These benchmark results merely show clear evidence for me personally to make my own conclusions in some very specific testing scenarios. There are no universal truths indicated here regarding those choices. Hence I am not trying to suggest which host OS, virtualization platform or guest OS is best. Rather, to me they are all just fine and hence they are all on my tool belt. What I do suggest is that virtualization is here to stay, it works and works well, so quit swimming against the current and embrace it – and when doing so, proper benchmarks and their correct interpretation can lead to very successful virtualized database deployments.

In the good old days most DBA’s could walk into their computing center and point to their database server and its disks. Furthermore, many DBA’s could also point to specific disks and state which database objects they housed. In fact database object placement and separation was a key tuning consideration back in the day. Oh my how times have changed.

 

At first all database servers utilized direct attached storage devices (i.e. DASD). In many cases the database server had limits as to the number of DASD it could support. Thus a big database often required a big server. So there was a direct one-to-one correlation between database size and server size.

 

Then along came Storage Area Networks (i.e. SAN). Think of a SAN as a storage server designed to house many disks and connected back to the database server via expensive, high speed fiber channels. Smaller databases tended to remain on DASD, while larger ones often got a dedicated SAN. Then as SAN technology improved, small to medium databases started sharing a SAN.

 

Not too long after the SAN along came the Network Attached Storage (i.e. NAS). Think of a NAS as a storage server designed to house many disks and connected back to the database server via inexpensive and common Ethernet. At first Ethernet speeds of 100Mbit restricted NAS usage to primarily smaller and less critical databases. However as Ethernet speeds reached 1Gbit and now 10Gbit, NAS has become a viable choice for most databases storage needs.

 

Finally both SAN and NAS evolved, adding some new and interesting wrinkles such as Fiber Channel over Ethernet (i.e. FCoE) and Internet Small Computer System Interface (i.e. iSCSI). But essentially SAN and NAS device architectures have remained the same. Just the connection methods were modernized. So the basic architectural diagrams did not really change.

 

In either of these storage scenarios (i.e. SAN and NAS), the DBA generally lost all touch with database object to disk mappings. At first when the SAN or NAS device was generally for but one database, some DBA’s may have performed or assisted with the Logical Unit (i.e. LUN) design. But as these storage devices became shared, often the DBA was simply assigned a black box of space for the DBA specified size requirements.

 

So let’s assume that the DBA requested 80GB of disk space for the database and the storage administrator created four 20GB LUN’s for that request. There are many reasons why the storage administrator might have created four LUN’s rather than one. It might be that the LUN’s are on different SAN and/or NAS devices. It’s also quite possible that the storage administrator simply wanted to provide more than one LUN so that the DBA could perform some logical separation of database objects.

 

aaa1.png

Figure 1: Database Storage Design

 

Thus the DBA might design the database storage as shown above in Figure 1. Basically the DBA thinks he has four disk drives (i.e. the four LUN’s) and lays things out accordingly. But he doesn’t know where the spindles are or how many he may be striped across, due to this basic storage abstraction. And as Figure 1 shows, the DBA may introduce further abstractions by using a Logical Volume Manager (i.e. LVM) or Oracle’s Automatic Storage Management (i.e. ASM). Furthermore, the use of database logical containers and constructs such as tablespaces and partitions adds yet another level of abstraction. But it now gets even worse, as shown below in Figure 2.

 

aaa2.png

Figure 2: Virtualized Storage Architecture

 

With storage virtualization, the storage administrator can now manage space across storage devices in a manner much like a LVM. Thus the physical LUN’s are aggregated into a centralized and shared “Storage Pool”. Now he can create and assign logical LUN’s for the DBA’s need. Of course that means yet another level of abstraction and thus removal from knowing where your disks are.

 

Thus we need to realize and acknowledge two key points. First, we essentially virtualized (i.e. abstracted) storage a very long time ago with the concept of the LUN. And second, the virtualization craze is simply adding yet another layer of abstraction or removal from your disks. You now have logical LUN’s.

 

So forget about ever really knowing where your disks are and how your database objects are laid out across them. You’ll need a time machine if you really must know. I’d say just forget it – there are bigger fish to fry.

Bert Scalzo

Flashback to the Rescue

Posted by Bert Scalzo Dec 2, 2016

Face it, we’ve all been there – that moment when we’ve done something a wee bit too fast and then realize we probably shouldn’t have done it. Then we often say something like “stupid database” or “stupid Toad” or “stupid whatever”. But there are ways to recover without having to go hat in hand to the DBA to beg forgiveness and obtain a recovery. More recent versions of Oracle offer several wonderful “Flashback” options that can often save the day. I’ve included a summary of all those options with an excerpt from my book “Advanced Oracle Utilities: The Definitive Reference”.

 

DBMS_FLASHBACK

 

At the beginning of this chapter, various use case scenarios for data pump export and import were mentioned – namely, that a DBA might need the ability to restore collections of tables to some pristine state at regular intervals. In the prior section we examined doing restructures where a backup or before image might be handy in case of problems mid-restructuring. The truth is that there are many cases where the DBA would like the ability to restore a logical portion of the database back in time or to a prior SCN. But the problem is that physical backups and restores are very complex – and operate more at the physical database level. And few applications are generally worthy of their own backup and recovery strategy using the physical tools to build logical or application-based restore points.

 

But a long time ago Oracle introduced a great concept in the SQL and PL/SQL languages called the SAVEPOINT. This was the ability to issue a database state bookmark within your application code such that you could roll back to an application-based logical point in time. It was a useful technique, but never really saw extensive usage. It was a good concept nonetheless – if only it had extended to database objects and/or even the database level itself. Well, now it does – we have Oracle’s flashback technology.
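As a quick refresher, the SAVEPOINT mechanism looks like this (the table names are hypothetical):

UPDATE orders SET status = 'SHIPPED' WHERE order_id = 1001;

SAVEPOINT before_cleanup;

DELETE FROM order_staging;

-- Only the cleanup went wrong, so undo just that part and keep the update
ROLLBACK TO SAVEPOINT before_cleanup;

COMMIT;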

 

Oracle flashback technology essentially lets you create SAVEPOINT-like bookmarks to restore to, for either objects or the entire database. In some respects it’s a great short-term point-in-time recovery technique – rather than going to a full blown backup and restore. Plus its usage has been made so integrated, seamless and easy that it’s sure to see heavy usage as time goes on. It’s truly a key new must-have tool for the DBA’s tool belt.

 

There are six flashback technologies (in chronological order of their appearance) whose topics we’ll examine in more detail:

 

  • Flashback Queries (9i)
  • Flashback Drop (10g)
  • Flashback Tables (10g)
  • Flashback Database (10g)
  • Flashback Transaction (11g)
  • Flashback Archives (11g – i.e. Oracle Total Recall)

 

Furthermore, unlike the other features discussed in this chapter, you’ll want to learn how to utilize these various flashback capabilities via OEM, SQL commands and the PL/SQL API.

 

Note – Part of the reason for covering all the flashback technologies here, including a recap of older ones, is to hopefully lead the reader along the historical path of flashback technology development – and thus to perceive that each step was built on the foundations of those prior.

 

Flashback Queries

 

Oracle 9i introduced the concept of the flashback query. I like to call this the “Back to the Future” or time machine type query – where Oracle lets us make some queries in the present as if from a database state in the not too distant past.

 

The black magic that makes this possible is the combination of UNDO tablespaces and automatic UNDO management – Oracle now treats those UNDO blocks as first-rate data, retained according to the UNDO_RETENTION parameter. Using these, Oracle does its best to keep UNDO data available (though it is not guaranteed). You can even force that availability via the UNDO tablespace RETENTION GUARANTEE option.
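For reference, here is a minimal sketch of the knobs involved (the tablespace name UNDOTBS1 is just the common default and may differ on your system):

-- Keep roughly one hour of committed UNDO available for flashback queries
ALTER SYSTEM SET undo_retention = 3600 SCOPE = BOTH;

-- Optionally guarantee that retention (DML may fail if UNDO space runs out)
ALTER TABLESPACE undotbs1 RETENTION GUARANTEE;

-- Verify the UNDO tablespace retention setting
SELECT tablespace_name, retention
  FROM dba_tablespaces
 WHERE contents = 'UNDO';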

 

In its simplest form, we merely add an AS OF clause to the SELECT statement to request the current execution be performed as if it were run at some prior specified time or system change number (i.e. thus looking backwards in time).

 

SQL> select DBMS_FLASHBACK.GET_SYSTEM_CHANGE_NUMBER from dual;

 

GET_SYSTEM_CHANGE_NUMBER

------------------------

                  353026

 

SQL> update movies.customer set zip='99999';

 

62 rows updated.

 

SQL> commit;

 

Commit complete.

 

SQL> select firstname, lastname, zip from movies.customer as of scn 353026 where rownum < 5;

 

FIRSTNAME            LASTNAME                       ZIP

-------------------- ------------------------------ -----

Flozell              Adams                          75063

Troy                 Aikman                         75063

Larry                Allen                          75063

Eric                 Bjornson                       75063

 

We also can place the entire Oracle session into a “time tunnel” or “time warp”, so that we don’t have to add anything at all to the SELECT command to see such historical data. In that case we simply enable and disable the flashback effect, as shown here, via the procedures in the DBMS_FLASHBACK PL/SQL package.

 

SQL> select firstname, lastname, zip from movies.customer where rownum < 5;

 

FIRSTNAME            LASTNAME                       ZIP

-------------------- ------------------------------ -----

Flozell              Adams                          99999

Troy                 Aikman                         99999

Larry                Allen                          99999

Eric                 Bjornson                       99999

 

SQL>

SQL> execute DBMS_FLASHBACK.ENABLE_AT_SYSTEM_CHANGE_NUMBER(353026)

 

PL/SQL procedure successfully completed.

 

SQL>

SQL> select firstname, lastname, zip from movies.customer where rownum < 5;

 

FIRSTNAME            LASTNAME                       ZIP

-------------------- ------------------------------ -----

Flozell              Adams                          75063

Troy                 Aikman                         75063

Larry                Allen                          75063

Eric                 Bjornson                       75063

 

SQL> execute DBMS_FLASHBACK.DISABLE

 

PL/SQL procedure successfully completed.

 

Flashback Drop

 

Oracle 10g introduced the flashback drop concept – which is built off the new recycle bin in the database (think of it as much like the Windows recycle bin). Now when you drop an object, it’s moved first to the recycle bin (assuming that the recycle bin has been enabled via the RECYCLEBIN=ON init.ora parameter). The recycle bin simply retains the dropped objects under a new unique name.
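If you’re unsure whether the recycle bin is turned on, here is a quick sketch of how to check and enable it (a session-level change takes effect immediately; the system-wide change applies to future sessions):

-- Check the current setting
SHOW PARAMETER recyclebin

-- Enable for the current session
ALTER SESSION SET recyclebin = ON;

-- Enable system-wide for future sessions
ALTER SYSTEM SET recyclebin = ON DEFERRED;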

 

SQL> drop table junk;

 

Table dropped.

 

SQL> select object_name, original_name, type from recyclebin;

 

OBJECT_NAME                    ORIGINAL_NAME                    TYPE

------------------------------ -------------------------------- ---------------

BIN$oP2i2G1STvita2AHhmFdVw==$0 JUNK                             TABLE

 

Then when we want to un-drop a table, the process is very easy – we simply use the FLASHBACK TABLE command to restore the table as it was before the drop. When the table flashback is invoked, not only are the table and its data brought back – but so are any dependent objects and grants. It’s that simple.

 

SQL> select * from junk;

 

        C1         C2

---------- ----------

         1          2

 

SQL> drop table junk;

 

Table dropped.

 

SQL> select * from junk;

select * from junk

              *

ERROR at line 1:

ORA-00942: table or view does not exist

 

SQL> flashback table junk to before drop;

 

Flashback complete.

 

SQL> select * from junk;

 

        C1         C2

---------- ----------

         1          2

 

For those who prefer a graphical interface, OEM has a rather easy screen for doing object level complete recoveries (i.e. un-drop).

 

xyz1.png

Figure 4: OEM Object Level Recovery Screen

 

Prior to the flashback drop capability, the best method for doing object level recoveries was the logical backup method discussed earlier in this chapter (i.e. export/import). But handling referential integrity and dependencies was a problematic manual effort requiring significant investment to get 100% right.

 

Flashback Tables

 

The next logical step in the flashback progression is to do more than undo a simple table drop – it permits a table to return to a prior state, as if making a flashback query permanent. Once again we have a key new requirement for this flashback feature to work: the table must have row movement enabled (a feature that permits Oracle to change the ROWID of a row – ROWIDs are otherwise normally immutable). Now we can use the prior section’s flashback syntax to return a table to some prior specified time or system change number, as shown in the example a little further below.
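One quick aside before the example: for a table that already exists without it, row movement can be switched on after the fact. A minimal sketch (using the earlier movies.customer table purely as an illustration):

-- Enable row movement so FLASHBACK TABLE may relocate rows (i.e. change ROWIDs)
ALTER TABLE movies.customer ENABLE ROW MOVEMENT;

-- Verify the setting
SELECT owner, table_name, row_movement
  FROM dba_tables
 WHERE owner = 'MOVIES' AND table_name = 'CUSTOMER';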

 

SQL> create table junk (c1 int, c2 int) enable row movement;

 

Table created.

 

SQL> insert into junk values (1,2);

 

1 row created.

 

SQL> commit;

 

Commit complete.

 

SQL> select DBMS_FLASHBACK.GET_SYSTEM_CHANGE_NUMBER from dual;

 

GET_SYSTEM_CHANGE_NUMBER

------------------------

                  362096

 

SQL> insert into junk values (3,4);

 

1 row created.

 

SQL> commit;

 

Commit complete.

 

SQL> flashback table junk to scn 362096;

 

Flashback complete.

 

SQL> select * from junk;

 

        C1         C2

---------- ----------

         1          2

 

Once again, for those who prefer a graphical interface, OEM has a rather easy screen for doing object level point-in-time table recoveries (i.e. flashbacks).

 

xyz2.png

Figure 5: OEM Table Screen

 

xyz3.png

Figure 6: OEM Table Flashback Screen

 

Flashback Database

 

Continuing with our flashback technology progression, the next big step is to permit one to flash back an entire database. In essence we now want to perform a point-in-time recovery at the database level – all very simply and without actually doing anything in RMAN. Once again we have key new requirements for this flashback feature to work: the database must be running in ARCHIVELOG mode with flashback logging enabled (i.e. a fast recovery area configured and ALTER DATABASE FLASHBACK ON issued). Plus, we can only flash back as far as the flashback logs retained on disk allow – any further back in time requires a traditional point-in-time recovery via RMAN. But for many cases even that short time span may suffice as the first-level, basic point-in-time recovery strategy. The next example may look quite a bit like the last one for flashback table; we now simply take the database to a mounted state and issue the FLASHBACK DATABASE command as shown in the example below. All that’s changed is the scope of the operation.
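For completeness, here is a sketch of the one-time setup (run as SYSDBA; it assumes a fast recovery area – db_recovery_file_dest and db_recovery_file_dest_size – has already been configured):

SHUTDOWN IMMEDIATE
STARTUP MOUNT

ALTER DATABASE ARCHIVELOG;                              -- if not already archiving
ALTER SYSTEM SET db_flashback_retention_target = 1440;  -- minutes of flashback history to aim for
ALTER DATABASE FLASHBACK ON;                            -- start generating flashback logs

ALTER DATABASE OPEN;

-- Verify
SELECT log_mode, flashback_on FROM v$database;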

 

SQL> create table junk (c1 int, c2 int) enable row movement;

 

Table created.

 

SQL> insert into junk values (1,2);

 

1 row created.

 

SQL> commit;

 

Commit complete.

 

SQL> select DBMS_FLASHBACK.GET_SYSTEM_CHANGE_NUMBER from dual;

 

GET_SYSTEM_CHANGE_NUMBER

------------------------

                  365991

 

SQL> insert into junk values (3,4);

 

1 row created.

 

SQL> commit;

 

Commit complete.

 

SQL> connect sys/mgr as sysdba

 

Connected.

 

SQL> shutdown immediate;

Database closed.

Database dismounted.

ORACLE instance shut down.

SQL> startup mount;

ORACLE instance started.

 

Total System Global Area  627732480 bytes

Fixed Size                  1334996 bytes

Variable Size             398459180 bytes

Database Buffers          222298112 bytes

Redo Buffers                5640192 bytes

Database mounted.

SQL> flashback database to scn 365991;

 

Flashback complete.

 

SQL> alter database open resetlogs;

 

Database altered.

 

SQL> connect bert/bert

 

Connected.

 

SQL> select * from junk;

 

        C1         C2

---------- ----------

         1          2

 

We’ve now wrapped up the flashback technology historical progression as of Oracle 10g Release 2 – so it’s on to even bigger and better flashback capabilities with Oracle 11g. Whereas the progression and granularities up to this point were fairly obvious, we’re now moving into less obvious but critically useful flashback techniques – hopefully with enough background and understanding to appreciate just how we got here.

 

Flashback Transaction

 

The DBMS_FLASHBACK.TRANSACTION_BACKOUT procedure rolls back a transaction and all its dependent transactions. As with all the other flashback technologies discussed so far, the transaction backout operation uses UNDO to create and execute the compensating (i.e. opposite) transactions that return the affected data to its original state. So in some respects the granularity of this flashback operation sits between our last two cases: table and database. We’re now flashing back a subset of the database that represents some logical collection of tables and queries. It essentially implements the previously mentioned SAVEPOINT concept in the database rather than in the application code. In fact, flashback transactions mostly eliminate the need for the next section on redo log file mining – as it’s now done transparently and more easily, as shown in the example below.
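One setup note before the example: transaction backout relies on supplemental logging being enabled (including primary key logging), and the executing user needs suitable privileges. A sketch of that one-time preparation (run as SYSDBA; the grantee BERT simply matches the example that follows):

-- Enable minimal and primary key supplemental logging
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (PRIMARY KEY) COLUMNS;

-- Allow the user to run backouts and to query other sessions' transactions
GRANT EXECUTE ON DBMS_FLASHBACK TO bert;
GRANT SELECT ANY TRANSACTION TO bert;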

 

First, we query the FLASHBACK_TRANSACTION_QUERY view to see what transactions exist for whatever objects and/or users we think may have done something that needs to be undone. This view can return a lot of information in even a mildly busy database, so filtering is highly recommended. Here I’ve asked to see what transactions have occurred in the past day by logon user BERT against tables owned by BERT. Note that this view also hands us the UNDO SQL code.

 

SQL> select xid, start_scn, operation, table_name, undo_sql from flashback_transaction_query where start_timestamp>=sysdate-1 and username='BERT' and table_owner='BERT';

XID               START_SCN OPERATION    TABLE_NAME

---------------- ---------- ------------ ------------

UNDO_SQL

--------------------------------------------------------------

0200030052030000     475697 DELETE       JUNK

insert into "BERT"."JUNK"("C1","C2") values ('5','6');

 

0200030052030000     475697 DELETE       JUNK

insert into "BERT"."JUNK"("C1","C2") values ('3','4');

 

0200030052030000     475697 INSERT       JUNK

delete from "BERT"."JUNK" where ROWID = 'AAAD94AAAAAAChOAAD';

 

0200030052030000     475697 INSERT       JUNK

delete from "BERT"."JUNK" where ROWID = 'AAAD94AAAAAAChOAAC';

 

0200030052030000     475697 INSERT       JUNK

delete from "BERT"."JUNK" where ROWID = 'AAAD94AAAAAAChOAAB';

 

0200030052030000     475697 INSERT       JUNK

delete from "BERT"."JUNK" where ROWID = 'AAAD94AAAAAAChOAAA';

 

If I now want to simply undo the two delete commands (whose undo action is to re-insert the deleted data), here’s the PL/SQL code for doing that.

 

SQL> select * from bert.junk;

 

        C1         C2

---------- ----------

         1          2

         7          8

 

SQL> declare

   trans_arr XID_ARRAY;

begin

   trans_arr := xid_array('0200030052030000','0200030052030000');

   dbms_flashback.transaction_backout (

        numtxns         => 1,

        xids            => trans_arr,

        options         => dbms_flashback.cascade

   );

end;

/

 

SQL> select * from bert.junk;

 

        C1         C2

---------- ----------

         1          2

         3          4

         5          6

         7          8

 

Once again, for those who prefer a graphical interface, OEM has a rather easy screen for doing point-in-time transaction recoveries (i.e. flashbacks).

 

xyz4.png

Figure 7: OEM Table Screen

 

xyz5.png

Figure 8: OEM Transaction Flashback Screen

 

Flashback Archives

 

The final piece of the puzzle in the flashback progression is the new Oracle 11g flashback archive. You define an area that automatically tracks and archives transactional data changes to specified database objects. These flashback archives become user-named, managed persistence of UNDO at the specified object level. So when you need to do a SELECT with an AS OF, you can rely on the object’s history being in your chosen container for the specified duration, competing for space only with the objects you so choose. Thus it’s merely a named area that supports all the prior flashback features we’ve just examined.

 

SQL> create tablespace flash_archive

datafile 'c:\oracle\oradata\ordb1\flash_archive.dbf' size 50M;

 

 

Tablespace created.

 

SQL> create flashback archive default flash_archive tablespace flash_archive retention 30 day;

 

Flashback archive created.

 

SQL> create table bert.junk (c1 int, c2 int) flashback archive flash_archive;

 

Table created.
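From here on, changes to BERT.JUNK are tracked in the archive for 30 days, and the history can be pulled back with an ordinary AS OF query. A brief usage sketch (the timestamp is purely illustrative):

-- Query the table as it looked an hour ago, served from the flashback archive
SELECT * FROM bert.junk AS OF TIMESTAMP (SYSTIMESTAMP - INTERVAL '1' HOUR);

-- Dictionary views show the archives and the tables they track
SELECT flashback_archive_name, retention_in_days, status FROM dba_flashback_archive;
SELECT owner_name, table_name, flashback_archive_name FROM dba_flashback_archive_tables;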

Probably one of the most often discussed and hotly debated topics in both data modeling and database design is normalization. Many database designers, architects, DBA’s and senior developers hold differing positions and/or beliefs on the topic. Quite unfortunately, they often are not communicating effectively due to fundamental differences in terminology and understanding. The simple solution is often to take five minutes to review normalization basics – and thus get everyone on the same page. That’s what this blog entry strives to do.

 

However before you proceed, always ask yourself what kind of database and application you’re building – because many important design issues vary widely depending on the target system’s nature, as shown in the chart below.

 

 

                  OLTP                 ODS                    OLAP             DM/DW
Business Focus    Operational          Operational Tactical   Tactical         Tactical Strategic
End User Tools    Client Server, Web   Client Server, Web     Client Server    Client Server, Web
DB Technology     Relational           Relational             Cubic            Relational
Trans Count       Large                Medium                 Small            Small
Trans Size        Small                Medium                 Medium           Large
Trans Time        Short                Medium                 Long             Long
Size in Gigs      10 – 200             50 – 400               50 – 400         400 – 4000
Normalization     3NF                  3NF                    N/A              0NF
Data Modeling     Traditional ER       Traditional ER         N/A              Dimensional

 

For the purposes of this discussion, let’s assume that you’re working on a traditional OLTP database and application. Thus you’re hoping to achieve third normal form. Let’s review what that means – specifically, without using fancy mathematical definitions and/or terms that only PhD’s might understand. Let’s make normalization both easy and fun.

 

Remember the old “Perry Mason” television show? (maybe I should have said Matlock so as not to date myself) When a witness takes the stand – the clerk asks “Do you swear to tell the truth, the whole truth, and nothing but the truth – so help you God?” Well, we can utilize that simple little jingle to both define and remember normalization. Our catch phrase will be “… the key, the whole key, and nothing but the key – so help me Codd”. I’ll assume everyone knows who Dr. Codd was – the father of modern relational theory.

 

First Normal Form

 

There are four fundamental requirements for an entity or table to meet first normal form – all of which must hold true:

 

  • The entity or table must have a defined and active primary key
  • The primary key is the minimal set of columns that uniquely identify a record
  • Each attribute or column in the table contains a single value (i.e. atomicity)
  • Eliminate repeating groups (i.e. no arrays)

 

It’s these last two bullet items where we’ll focus (i.e. for now, we’ll just agree that our tables should have primary keys as part of good relational design). We’re looking for single atomic values that depend upon the key – hence the first item in our little jingle.

 

So here’s an entity to review for adherence to first normal form:

 

x1.jpg

So how does CUSTOMER measure up? There is a primary key, and it’s a single numerical column – can’t get much shorter than that. And all of the columns have simple data types that appear to contain single values (i.e. no complex data types like varrays, nested tables or XML Types – where you’d have to ask some additional questions). So at first glance you might accept CUSTOMER as being in 1NF – but you very well could be wrong.

 

I actually testified as an expert witness in a court case on this first issue. See the LONG field – what is it being used for, and how? After talking to the developers and looking at some code, suppose we find out that all the customer’s orders are collected here as order numbers separated by semi-colons. That’s an array – a collection of values. That means we’re not in 1NF. And more importantly – we have a real world performance issue.

 

How would you write the query to join CUSTOMER and ORDER? Since SQL cannot handle this, you’d have to process the join programmatically. So instead of the server doing the work on potentially large tables, the client code must process the join logic. Thus you have two bottlenecks – the client CPU and the network traffic required to send all the records to the client for examination. Ouch!

 

OK – that example is a bit extreme. The real problem is with ADDRESS_LINE. In this case we have modeled how an address looks on an envelope rather than modeling its constituent parts. In the case of an address that’s probably reasonable – but it sets a dangerous precedent. Look at the example below:

 

x2.jpg

 

The ORDER entity contains a bunch of dates. The idea is to track each step of processing the order from start to finish. So every time we find a new “work in progress” step that we want to track, we need to alter the ORDER table and add an optional column. Yuck.

 

A more elegant solution that removes this hidden repeating group is to create a separate entity for work order tracking as shown here:

 

x3.jpg

 

Now we can add new “work in progress” steps by merely adding a row of data to the ORDER_STAGES table (see the DDL sketch below). In fact, we can even build a maintenance screen to support this, so that end-users can easily make such business oriented modifications without sending the request back to information systems. In other words, the resulting system is more flexible.
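A possible DDL rendering of that redesign (the names, types and the ORDERS parent table are hypothetical, inferred from the figure):

-- Hypothetical parent table, just to make the sketch self-contained
CREATE TABLE orders (
    order_id  NUMBER  PRIMARY KEY
);

-- One row per work-in-progress step, instead of one optional date column per step
CREATE TABLE order_stages (
    order_id    NUMBER        NOT NULL,
    stage_name  VARCHAR2(30)  NOT NULL,   -- e.g. 'PICKED', 'PACKED', 'SHIPPED'
    stage_date  DATE          NOT NULL,
    CONSTRAINT order_stages_pk PRIMARY KEY (order_id, stage_name),
    CONSTRAINT order_stages_fk FOREIGN KEY (order_id) REFERENCES orders (order_id)
);

-- Adding a new tracking step is now just an INSERT, not an ALTER TABLE
INSERT INTO order_stages VALUES (1001, 'SHIPPED', SYSDATE);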

 

Second Normal Form

 

There are two fundamental requirements for an entity or table to meet second normal form – both of which must hold true:

 

  • The entity or table must be in first normal form
  • All attributes or columns must depend upon the whole key

 

It’s the last bullet item where we’ll focus. If the primary key is constructed upon a single attribute or column, then we are already in 2NF. But if the primary key is composed of multiple attributes or columns (i.e. a concatenated key), then we are looking for all the attributes or columns to depend upon the whole key – hence the second item in our little jingle.

 

So here’s an entity to review for adherence to second normal form:

 

x4.jpg

 

The test is really simple – does each attribute or column depend upon the whole key? First, does PROD_COST depend on both PROD_NAME and VENDOR_NAME? It’s arguable that this holds. But second, does FACTORY_LOCATION depend upon anything other than VENDOR_NAME? Since the answer is most likely no, we separate this information into its own entity as shown here:

 

x5.jpg

 

The only major problem that I see more often than not is where people break a cardinal relational database design rule, and have a single attribute or column contain multiple pieces of information. For example, they might design an attribute or column called PRODUCT_KEY which is a VARCHAR2(20) column where the first 10 characters are the product’s name and the second 10 are the vendor’s name. Thus in effect they have a concatenated key that violates 2NF. My point is that you cannot effectively normalize when attributes or columns don’t represent a single concept – so please don’t do it.

 

Third Normal Form

 

There are two fundamental requirements for an entity or table to meet third normal form – both of which must hold true:

 

  • The entity or table must be in second normal form (and thus also in 1NF)
  • All non-key attributes or columns must depend on nothing but the key

 

It’s the last bullet item where we’ll focus: does every non-key attribute or column depend upon nothing but the key? Hence the third item in our little jingle.

 

So here’s an entity to review for adherence to third normal form:

 

x6.jpg

 

Again the test is quite simple – does each non-key attribute or column depend only on PROD_NUMBER? The last two attributes or columns very clearly depend on just PROD_CATEGORY, so the entity violates 3NF. Once again the solution is to simply separate this related information into its own entity, as shown here and in the DDL sketch that follows:

 

x7.jpg
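A possible DDL rendering of that 3NF split (column names and types are hypothetical, following the figure):

-- Category-dependent columns move into their own table keyed by the category
CREATE TABLE product_category (
    prod_category  VARCHAR2(20)   PRIMARY KEY,
    category_desc  VARCHAR2(100),
    category_mgr   VARCHAR2(30)
);

-- PRODUCT now carries only attributes that depend on PROD_NUMBER alone
CREATE TABLE product (
    prod_number    NUMBER         PRIMARY KEY,
    prod_name      VARCHAR2(50),
    prod_category  VARCHAR2(20)   REFERENCES product_category (prod_category)
);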

 

Higher Normal Forms

 

Most shops will be well served if they can regularly identify and correct for up to 3NF. Whether to normalize further really depends on whom you ask. The database designers and data architects might like to normalize further, but then they don’t have to write the more complex code needed to manage such highly structured data. There is a clear trade-off in code readability and maintainability when you go much higher than 3NF, and it’s my experience that most developers’ skills and patience rapidly erode beyond 3NF. We live in a world of doing more, faster, and with fewer resources. So let’s not design a perfect database doomed to fail based upon academic or theoretical practice that sacrifices coding efficiency to save cheap disk space. Let’s choose a happy medium that’s effective yet allows us to remain efficient in the implementation. And you heard that from a person with a PhD – so I guess 20 years in the trenches has forced me to practice what I preach.

While there are many different graphical user interface tools for running SQL statements, sometimes all you really want and need is a smaller, focused and thus simpler tool for modest tasks – such as opening and running a basic SQL script. For those specific, lesser needs I often utilize the popular freeware editor Notepad++. However it requires manual tweaking in order to have Notepad++ execute SQL statements via Oracle’s SQL*Plus. In this blog I’ll show you how. The goal will be to have Notepad++ look, feel and behave as shown in the Figure 1 below – where the top half of the screen displays the SQL statement, the bottom half of the screen shows the command’s output, and there is a key mapped to execute the SQL statement.

 

zzz1.png

Figure 1: Notepad++ Example

 

Here are the steps:

  1. Launch Notepad++
  2. Main menu -> Plugins -> Plugin Manager -> Show Plugin Manager
  3. Available Tab, Find and check NppExec plugin (see Figure 2 below)
  4. Press Install button to download & install plugin – restarts Notepad++
  5. Open a SQL script
  6. Press F6 key (NppExec’s default keyboard mapping for “Execute Statement”)
  7. Enter the following macro script into the Execute pop-up (see Figure 3 below)

set ORA_USER=bert

set ORA_PASS=bert1234

set ORA_SID=ORCL

npp_save

cmd /c copy /y "$(CURRENT_DIRECTORY)\$(FILE_NAME)" "$(SYS.TEMP)\$(FILE_NAME)" >nul 2>&1

cmd /c echo. >> "$(SYS.TEMP)\$(FILE_NAME)"

cmd /c echo exit >> "$(SYS.TEMP)\$(FILE_NAME)"

sqlplus -l $(ORA_USER)/$(ORA_PASS)@$(ORA_SID) @"$(SYS.TEMP)\$(FILE_NAME)"

 

  8. Change the first three variables to your Oracle username, password and database SID
  9. Press the OK button

 

zzz2.png

Figure 2: Notepad++ Plugin Manager – Enable NppExec

 

zzz3.png

Figure 3: Save your SQL*Plus execute macro
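As a quick sanity check of the wiring, open a trivial script such as the following (contents are purely illustrative) and press F6:

-- test.sql: verifies the NppExec / SQL*Plus hookup
SET PAGESIZE 50
SET LINESIZE 120
SELECT USER, SYSDATE FROM dual;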

Servers everywhere are being virtualized. Yet many DBA’s are hesitant to embrace virtualized database servers. I’ve been mystified by this anomaly, so I’ve asked those opposed for their rationale. While there are numerous arguments against it, basically two pervasive themes surface from among all the replies.

 

First and foremost, DBA’s almost universally claim that their databases are “mission critical” and cannot suffer any performance hit that virtualization would necessarily impose. I hypothesize that these people must either consider shared resources as negative or have read that virtualization overhead can be from 5% to 15% - and they cannot suffer that loss.

 

However those very same DBA’s quickly allowed the single most important database performance factor – disk I/O – to become shared well over a decade ago. We all quickly embraced Storage Area Network (SAN) disk arrays in order to get large pools of storage. Yet very few of those SAN’s were dedicated to a single database, or even to a single DBA’s multiple databases. SAN’s were generally shared resources, often without the DBA being fully aware of who was sharing their spindles. We simply asked for “black box” amounts of space that were assigned for our use as LUN’s.

 

Today we’re simply permitting the three remaining key components (CPU, memory and networking) to be shared like our storage. If we so brazenly accepted it for Disk IO back then, how can we now say that the much less important database performance factors cannot be shared? I believe it’s just resistance to change.

 

As for the virtualization overhead, it’s a non-factor. If we were simply going to virtualize the database server and place it back on the same physical server, then sure – there would be a slight performance reduction. However DBA’s generally order excess capacity for growth, so most servers are idle more than 50% of the time overall. And most virtualization efforts replace smaller servers with much larger shared ones. So losing my four CPU and 16GB RAM physical server and then being allocated the same or more resources from a much larger shared server should be a non-issue. As long as resources are not over-allocated on the physical virtualization hosts, the negative performance impact should range from minimal to non-existent. Thus if four quad-CPU, 16GB database servers were re-hosted to a virtualization host with 32 CPU’s and 128GB of memory – the performance could actually be better (or at worst about the same).

 

The second pervasive but veiled theme is one regarding “loss of control”. You’re not going to like this observation nor be happy with me for making it. But in the good old days the DBA was a god. We often had unfettered access to our hardware platform. It was not uncommon to have “root” access. We often performed numerous complex design and management tasks, including hardware platform research and ordering, operating system configuration and tuning, storage design and allocation, capacity monitoring and projections, and so on. Thus the DBA knowledge and responsibilities were Herculean – and we loved it that way.

 

But in a virtualized world, now the DBA simply treats everything as a “black box” that someone else both provides and manages. We cannot venture into the server room anymore and knowingly point to our static resources such as disks. Nor can we really know exactly where our virtual machine is being hosted, because it can move – sometimes dynamically. Plus we have to ask someone else for things we used to do for ourselves. It’s a bit unnerving for those who remember the good old days.

 

Yes – there are some very valid performance issues that must be addressed when you virtualize your database, and those cannot be left to the defaults or chance. But most people seem to object more in the abstract. You’re not going to stop virtualization – so you might as well learn to embrace and even like it.