This discussion is archived
13 Replies Latest reply: Nov 28, 2012 9:56 AM by Osama_Mustafa RSS

Masking on Oracle

976391 Newbie
Currently Being Moderated
Hi Everyone,

How we can mask a column value present in the production which is very sesitive like bank a/c no,Credit card nos etc while replicating to the test/development envirnoments in PL/SQL.Things that need to be considered is referential integrity,data types,length of the column and performace while execuing the code.Any help or pointer will be very helpful.Thanks !

Regards,
Sandeep
  • 1. Re: Masking on Oracle
    Ora Pro
    Currently Being Moderated
    VPD column masking.

    http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1855489000346413353
  • 2. Re: Masking on Oracle
    Osama_Mustafa Oracle ACE
    Currently Being Moderated
    Google it ,
    http://www.oracle-base.com/articles/10g/database-security-enhancements-10g.php#column_masking
  • 3. Re: Masking on Oracle
    rp0428 Guru
    Currently Being Moderated
    >
    How we can mask a column value present in the production which is very sesitive like bank a/c no,Credit card nos etc while replicating to the test/development envirnoments in PL/SQL.Things that need to be considered is referential integrity,data types,length of the column and performace while execuing the code
    >
    You shouldn't need to replicate production data to test/dev you should use test/dev data that conforms to the proper business rules.
    Production might have 100,000 vendors. Why do you need or even want 100,000 vendors in test/dev? The only system other than production that should have production data is a standby database or a secure database used for volume and performance testing.

    For test/dev purposes you should create a database that has the minimum amount of data needed to represent all of the business rules for the application. Typically the lookup tables would have the same content as production because they contain unique values but the actual data tables (e.g. customer, customer_address) should just contain a minimum of data.

    If your companies policy allows you to replicate production data to test/dev environments they should change the policy. You will likely never be able to implement the controls needed to satisfy a Sarbanes/Oxley or HIPAA audit.
  • 4. Re: Masking on Oracle
    Justin Cave Oracle ACE
    Currently Being Moderated
    I totally disagree-- there is great value in replicating and masking production data to the lower environments. And doing so is perfectly compatible with Sarbanes-Oxley and HIPAA requirements (obviously, the masking of data is a requirement). That's why Oracle has things like the Data Masking Pack.

    If production has 100,000 vendors, you want dev and test to have 100,000 vendors because that's the only way that you are going to ensure that you've got a similar data volume to prod and that you'll have similar performance to prod. I've seen way too many performance issues created when the dev and test databases have a tiny fraction of the prod database so reports and queries that are quick in the lower environments grind the production server to a halt. And then it becomes a major challenge to tune these queries both because the lower environments use a different plan (which can be remedied by regularly copying statistics from the production environment) and because plans that are more efficient in lower environments aren't necessarily more efficient in production. So you end up with multiple cycles of developers tuning a query in a dev database only to find that the tuned query ends up performing much worse in prod (or no better or a tiny bit better but nowhere near the improvement you saw in dev).

    And a production OLTP database invariably contains lots of dirty data. It has data from years ago before a particular business rule was in place. It has data that was allowed to violate a business rule because of an application bug that was fixed 6 months ago. It has data that was allowed to violate a business rule because of an application bug that hasn't been identified yet. Having all this dirty data in the lower environments is hugely beneficial when you're trying to do things like write data migration scripts to handle changes to the data model. Otherwise, you inevitably find that the scripts that you tested in dev & test error out on deployment day because there is some data condition that they don't account for (which may or may not be thought to be possible) meaning that the entire build gets pushed or your developers are scrambling to rewrite migration scripts with little or no testing.

    Justin
  • 5. Re: Masking on Oracle
    Osama_Mustafa Oracle ACE
    Currently Being Moderated
    Lets face it , using production data in test/Dev Database its more effective and efficient . first developers will be able to test on real data and make them solve bugs , if there's no data real data will be available for developers they will generate meaningless information that only lead to nothing .

    PROD Data help with solving code problem and performance that leading to this data . but as we all know as DBA the PROD data Contain Sensitive information for Customer and maybe some company prevent to use this data in test/Dev database cause the customer information such as Credit card , bank account , email address . but all this issues could be solve by remove the Sensitive data from the test/dev database , also oracle provide solutions like column masking, row masking to hide this data . and use anonymous one . another solution for hiding sensitive data by use another format for credit card , account number by writing Procedure or trigger to do that .

    so it could be used but carefully .
  • 6. Re: Masking on Oracle
    rp0428 Guru
    Currently Being Moderated
    >
    I totally disagree
    >
    And so do I - with virtually everything you said. In this paragraph you actually refute your own argument.
    >
    If production has 100,000 vendors, you want dev and test to have 100,000 vendors because that's the only way that you are going to ensure that you've got a similar data volume to prod and that you'll have similar performance to prod. I've seen way too many performance issues created when the dev and test databases have a tiny fraction of the prod database so reports and queries that are quick in the lower environments grind the production server to a halt. And then it becomes a major challenge to tune these queries both because the lower environments use a different plan (which can be remedied by regularly copying statistics from the production environment) and because plans that are more efficient in lower environments aren't necessarily more efficient in production. So you end up with multiple cycles of developers tuning a query in a dev database only to find that the tuned query ends up performing much worse in prod (or no better or a tiny bit better but nowhere near the improvement you saw in dev).
    >
    DEV is not meant for volume or performance testing - it is for development. Perhaps you missed my comment about that
    >
    The only system other than production that should have production data is a standby database or a secure database used for volume and performance testing.
    >
    Certainly you need an environment for volume/performance testing but that environment should certainly not be DEV - that would be a MAJOR mistake.

    As your own argument says you can easily copy production stats to DEV to check execution plans and there are no security issues or cleansing needed when stats are copied.

    And most of what you said in this paragraph is the type of comment I expect from a rookie not from an experienced professional like yourself. The assumptions you make here are way off the mark and relying on them is what actually causes more future bugs that it does to prevent them.
    >
    And a production OLTP database invariably contains lots of dirty data. It has data from years ago before a particular business rule was in place. It has data that was allowed to violate a business rule because of an application bug that was fixed 6 months ago. It has data that was allowed to violate a business rule because of an application bug that hasn't been identified yet. Having all this dirty data in the lower environments is hugely beneficial when you're trying to do things like write data migration scripts to handle changes to the data model. Otherwise, you inevitably find that the scripts that you tested in dev & test error out on deployment day because there is some data condition that they don't account for (which may or may not be thought to be possible) meaning that the entire build gets pushed or your developers are scrambling to rewrite migration scripts with little or no testing.
    >
    Absolutely, positively, totally WRONG. Almost everything you said here is stated as FACT when it is almost always nothing but CONJECTURE or WISHFUL THINKING. And that is EXACTLY a major reason you DO NOT want to use production data for DEV or test (again, except for volume/performance tests).

    Production data seldom, if ever, represents ALL of the business rules an application should be enforcing. And any dirty data in the prod system will NEVER represent all of the 'dirty data' scenarios that need to be developed for and mitigated against.

    So any development that solely used prod data would be woefully incomplete and will not cover most of the corner cases that can cause havoc in a system. Use of prod data also makes it impossible to perform regression testing since the data will change over time.

    A developer has absolutely no control over any aspect of the production data. When you use prod data it is strictly 'take it or leave it'. And that is a bad proposition on several fronts. For dev/test you need to know with 100% certainty exactly what data you are using, what business rules it obeys, or doesn't obey. Most importantly you need to have data that represents all of the corner cases and tests every single business rule possible and the combinations of those rules.

    Even things as simple as an address column defined, perhaps, as VARCHAR2(30) may not have ANY data that is actually 30 bytes long and fills the column. Do you have columns defined as NUMBER? Then you need test data for that column that includes both positive and negative numbers and includes both the largest and smallest values that a NUMBER column can hold. I've never worked on a production system that had production data with 38 significant digits. There are many, many similar data scenarios that are unlikely to be covered by a given set of production data.

    1. Define the requirements and the business rules
    2. Create test data that matches the documented business rules.
    3. Create tests that validate that the business rules are being met.
    4. Design/write/test code that implements the business rules.

    Turn me loose in your dev environment where you use production data and I will break your system before you can even reboot your machine.
  • 7. Re: Masking on Oracle
    rp0428 Guru
    Currently Being Moderated
    >
    Lets face it , using production data in test/Dev Database its more effective and efficient .
    >
    More effective? Absolutely not. See my arguments in my reply to Justin. More efficient? Not sure what you mean. I certainly agree that it is simpler to just copy production data rather than do the hard work of creating data that actually covers all possible corner cases.

    It is reasonable to use a subset of production data as a starting point for creating your test database. For dev the subset should be a minimum volume for each business rule that needs to be covered. No one needs 100,000 vendors to validate that their code can create, update or delete vendors properly. That would be overkill that is totally unnecessary.
    >
    first developers will be able to test on real data and make them solve bugs , if there's no data real data will be available for developers they will generate meaningless information that only lead to nothing .
    >
    Developers don't need to test on real data. Has any piece of code you've written ever known, or cared if an address field defined as varchar2(25) contains '123 Elm St.' or instead contains 'abcdefghijklmnopqrstuvwxy'. The second value may not look like an address to a human but to a piece of code it is just as valid as the first example. Better, it is 25 bytes long (single-byte character set). Try to suck that value into a procedure variable defined as VARCHAR2(20) instead of the more proper 'myTable.myColumn%type' and your code will break. The first example will work just fine, pass your test and you will never know there is a serious potential problem just waiting to happen.

    Data issues like those can hide for weeks, months or years and then as soon as your put a value longer than 20 bytes into your table (perhaps a nightly batch load) your production code will break and might cause a whole lot of issues due to failed data loads or production down time.

    All because you tested with production data and none of that data covered even this simple corner case.

    THAT is the thrust of my objection to using production data. Production data, seldom, if ever, covers even the simple corner cases like the one I just presented. When I see orgs using production data for dev and unit test I pretty much know for certain that there are 'bugs' like these just waiting to happen because they are not doing the proper testing.

    Copy PROD data to DEV? Sure it's easy and convenient. But it deludes you into thinking you have a robust set of test data when you don't.

    I've got a feeling we're going to have to 'agree to disagree' on this topic.
  • 8. Re: Masking on Oracle
    jgarry Guru
    Currently Being Moderated
    You don't have to agree to disagree, since you and Justin are both right. In a perfect world, we'd have enough resources to cover all corner cases, test performance on real data, have proper masking and fake usable credit card numbers etc. As it is, we have to deal with site-dependent resource scarcity, with PHB's making technical decisions. We must pick our battles using our wits. Sites with customized OTS ERP's are going to have different requirements than totally bespoke sites, and most organizations of any size will have varying requirements across departments and over time. I've just spent an inordinate amount of time making mods all over the place from a slight change in the definition of what an "open job" is in a manufacturing system, that was all written with "rapid application development" tools. So the data model has to handle both the old and new definitions, take that and shove it in your normalization.

    I've seen the data issues discovered after decades, too. Sometimes it takes the requirement to make a change to force people to examine it.
  • 9. Re: Masking on Oracle
    rp0428 Guru
    Currently Being Moderated
    >
    Sometimes it takes the requirement to make a change to force people to examine it.
    >
    That's one way. I've found that slipping in some very simple test data that brings the app to it's knees is very effective at demonstrating the problem. And for reports you can get their attention just by maxing out the length (width) of every field on the report. It's almost cruel to do that and then watch the reaction of the business folk when neighboring column data overlays each other and makes the report unreadable.

    But then we all have to find comic relief somewhere. ;)
  • 10. Re: Masking on Oracle
    Osama_Mustafa Oracle ACE
    Currently Being Moderated
    I've got a feeling we're going to have to 'agree to disagree' on this topic.
    Nice discussion , include to that it's another learning day ;)
  • 11. Re: Masking on Oracle
    Osama_Mustafa Oracle ACE
    Currently Being Moderated
    after reading lot of online security article about that, i found most them recommend to avoid this but if you have to use it and no other available solutions there's condition to do that . i will share some nice article and reports that discuss this and they include number of compromises companies regarding to this issue.

    the below links discuss what we are talking about :
    http://blog.securityactive.co.uk/2010/02/01/do-the-right-think-when-testing-with-production-data/

    Reports :
    http://www.maravis.com/data-security-in-development-and-testing/



    Regards
    Osama
  • 12. Re: Masking on Oracle
    rp0428 Guru
    Currently Being Moderated
    >
    after reading lot of online security article about that, i found most them recommend to avoid this but if you have to use it and no other available solutions there's condition to do that . i will share some nice article and reports that discuss this and they include number of compromises companies regarding to this issue.
    >
    I agree that using prod data can be useful and efficient if it is well-managed and used as a starting point for creating the proper test data. It's just that prod data alone isn't enough because it won't cover all of the boundary cases that need to be checked.

    My comments weren't intended to discourage discussion or comment or suggest that there is a 'right' or 'wrong' answer or a 'one size fits all solution'. The only time I try to do that is when off-topic conversations start to detract from the question an OP originally ask.
  • 13. Re: Masking on Oracle
    Osama_Mustafa Oracle ACE
    Currently Being Moderated
    My comments weren't intended to discourage discussion or comment or suggest that there is a 'right' or 'wrong' answer or a 'one size fits all solution'. The only time I try to do that is when off-topic conversations start to detract from the question an OP originally ask.
    No I didn't mean that, We all here to learn something new and i like to discuss oracle topic with Guru People . you didn't discourage any comments instead of that you open amazing Case study to discuss. we just sharing our opinions .

    I should thank you

    BR
    Osama

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points