Forum Stats

  • 3,783,353 Users
  • 2,254,762 Discussions
  • 7,880,372 Comments

Discussions

Would it serve any benefit to Index a Nullable Column?

User_OMEF8
User_OMEF8 Member Posts: 126 Blue Ribbon
edited Apr 2, 2014 1:19PM in General Database Discussions

Would it serve any benefit to Index a Nullable Column?

I hope this topic is in the correct topic thread.  I apologize if it is not.  Additional information...

Oracle 11.2.0.2.0

I have a table with roughly ~900,000 records (at this time) and will continue to grow over time probably at the rate of ~350,000 records in addition per year.  We have a column in the table called "Comments" and this field is used by our clerks to allow them to enter in their "comments" on the record that they're working on.  Hence, I would highly doubt they will ever remember what their actual comment is when they search, so the equal (=) operator would most likely not ever be used when they perform a search.  In most cases (probably majority, if not all), they will do a search with a clause of "where comments is null" or "where comment is like 'blah%'".  At this time, we have approximately 40% of the records with a null value for Comments and I would suspect that it will probably continue to hover around the ~40% mark going forward.  Because as new records are added and as records are worked on, they will probably just offset each other.

That being said, does it make sense to still Index this column?  Any help/advice is greatly appreciated!  Thanks everyone.

Tagged:
«1

Answers

  • Brian Bontrager
    Brian Bontrager Member Posts: 767
    edited Apr 1, 2014 12:49PM
    Would it serve any benefit to Index a Nullable Column?
    
    

    It can. the nulls are not indexed themselves, but the non-null values can be very useful in an index.

    .  We have a column in the table called "Comments"
    
    

    This case is not one of them. A regular index won't make any sense for a comment column unless users query by the exact original (full) comment text.

    You may want to read up on Oracle Text, which is geared to full-text search indexing, and see whether it fits your need (or is too big of a hammer).

    Brian Bontrager
  •  In most cases (probably majority, if not all), they will do a search with a clause of "where comments is null" or "where comment is like 'blah%'".

    Don't create ANY index unless it is designed to solve a PROBLEM.

    If you don't know how, or if, you users will even query that table then you don't yet have a problem to solve.

    1. identify an actual problem or issue.

    2. identify potential solutions for the problem/issue

    3. select one or two 'solutions' for actual testing

    4. select the 'best' solution based on the test results and other considerations (cost, resource use, etc)

    You need to gather more information about:

    1. is there even a need to search the 'comments' column - or can the row be found more easily some other way (e.g. customer name)?

    2. what info needs to be searched for?

    3. how often do those searches need to be performed?

    4. how many users will be conducting those searches?

    5. how many results might a search return? 1, 2, a million?

  • jgarry
    jgarry Member Posts: 13,842

    You also might want to research the ora-1450 error.  Some other things to think about with larger data types might give other ideas for your case 12c Indexing Extended Data Types Part I (A Big Hurt) | Richard Foote's Oracle Blog

    I second what rp and Brian say.  I'll add, I work on some very old stuff that lets people use a form to query any column, basically translating directly to the sql equivalent like '%blah%', and current (that's a joke) hardware is powerful enough to do the full scan.  I use it when poking about trying to find what's wrong with things, the limit of my patience is about 30M rows.  But that's just me.

    jgarry
  • User_OMEF8
    User_OMEF8 Member Posts: 126 Blue Ribbon
    Brian Bontrager wrote:
    
    
    Would it serve any benefit to Index a Nullable Column?
    
    It can. the nulls are not indexed themselves, but the non-null values can be very useful in an index.
    
    .  We have a column in the table called "Comments"
    
    This case is not one of them. A regular index won't make any sense for a comment column unless users query by the exact original (full) comment text.
    You may want to read up on Oracle Text, which is geared to full-text search indexing, and see whether it fits your need (or is too big of a hammer).
    

    Thanks for the lead.  I will check out Oracle Text information.

    User_OMEF8
  • Franck Pachot
    Franck Pachot Member Posts: 912 Bronze Trophy

    Hi,

    The null is indexed if there is another non null column in the index. But anyway, if "where comments is null" returns half a million rows you don't want to access by index.

    Or maybe, if they search only for recent records, combining the user, the date, and the comment may be efficient.

    About "where comment is like 'blah%'" maybe that is selective enough. But you will probably use a function to make it case insensitive, and maybe to index only first characters.

    Oracle Text is the efficient solution for that.

    Regards,

    Franck.

    Franck Pachot
  • User_OMEF8
    User_OMEF8 Member Posts: 126 Blue Ribbon
    rp0428 wrote:
    
    You need to gather more information about:
    1. is there even a need to search the 'comments' column - or can the row be found more easily some other way (e.g. customer name)?
    
    

    Yes.  If the record does not have a comment, then the record has not yet been worked on.  Once a clerk is working on a record he/she will add a comment and move to the next record with a NULL comment.

    rp0428 wrote:
    
    
    You need to gather more information about:
    2. what info needs to be searched for?
    
    

    Mainly what document needs to be worked on.  There are many folks that perform the same job function.

    rp0428 wrote:
     
    You need to gather more information about:
    3. how often do those searches need to be performed?
    
    

    Every day and through out the day.  The clerks process these documents through our system.  If it does not have a comment, then it needs to be worked on.  If it does have a comment, then more than likely it has already been completed.  There are times when a user has to go back to the document to rework it because they might be waiting on additional information from someone to be able to complete the transaction.

    rp0428 wrote:
     
    You need to gather more information about:
    4. how many users will be conducting those searches?
    
    

    Any where between 4 - 10, all day every day, minus weekends.

    rp0428 wrote:
     
    You need to gather more information about:
    5. how many results might a search return? 1, 2, a million?
    

    Currently, there are approximately 900,000 records with an yearly increment of about 350,000 additional per year.  Approximately 40% of the data is currently NULL in the comments field and will continue to be around 40% going forward, since new documents are added daily and documents are worked on consistently.

  • User_OMEF8
    User_OMEF8 Member Posts: 126 Blue Ribbon
    jgarry wrote:
    
    You also might want to research the ora-1450 error.  Some other things to think about with larger data types might give other ideas for your case 12c Indexing Extended Data Types Part I (A Big Hurt) | Richard Foote's Oracle Blog
    I second what rp and Brian say.  I'll add, I work on some very old stuff that lets people use a form to query any column, basically translating directly to the sql equivalent like '%blah%', and current (that's a joke) hardware is powerful enough to do the full scan.  I use it when poking about trying to find what's wrong with things, the limit of my patience is about 30M rows.  But that's just me.
    

    I am not familiar with the 1450 error, but will look into that some more.  Thanks for the article as well.  I will brush up on that information too.

  • User_OMEF8
    User_OMEF8 Member Posts: 126 Blue Ribbon
    Franck Pachot wrote:
    
    Hi,
    The null is indexed if there is another non null column in the index. But anyway, if "where comments is null" returns half a million rows you don't want to access by index.
    Or maybe, if they search only for recent records, combining the user, the date, and the comment may be efficient.
    About "where comment is like 'blah%'" maybe that is selective enough. But you will probably use a function to make it case insensitive, and maybe to index only first characters.
    Oracle Text is the efficient solution for that.
    Regards,
    Franck.
    

    i have read some other articles that talk about including a non-null column in the index with a null column (comments in my case).  also another article that described using the NVL function and creating a function-based index.  something like:

    create function-based index on nvl(comments, -1)  ...  then my query can be something like  ...  select * from table1 where nvl(comments,-1) = -1;

                          Oracle DBA tips: TIP #8 : Index for null columns

    Also, read this article Index your NULL table columns for fast SQL

    and it talks about creating an index like this ...

    create index comments_idx on table1 (comments asc, 1)

  • Yes.  If the record does not have a comment, then the record has not yet been worked on.  Once a clerk is working on a record he/she will add a comment and move to the next record with a NULL comment.

    IMHO this is where things start to get funky.

    1. In Oracle there is no such thing as 'next record' - there is no inherent order to rows in a table anymore than there is an order to tennis balls in a barrel. 'next tennis ball' has no meaning other than 'any ball will do'.

    2. a comment column is typically an ATTRIBUTE of the data; not a flag that indicates the STATUS of the row. Most data models would use a STATUS column for that purpose.

    3. It also seems unlikely to me that a clerk should just spend there time on 'any row with no comment'. Surely some rows must be more inportant than others. There need to be attributes that a query can use to PRIORITIZE the rows to be worked on. The lack of a comment may be one indicator but I find it hard to believe it is the most important one.

    If it does have a comment, then more than likely it has already been completed

    More than likely? Sounds like a pretty lax business rule to me. Sees #3 above.

    As for how many users/searches

    Any where between 4 - 10, all day every day, minus weekends.

    Sorry - that isn't a spec.  There is a HUGE difference between 4 - 10 users doing 4 seaches a day each and doing a thousand searches a day each. The requirement spec needs to specify an actual volume of queries per minute/hour that need to be handled and the size of the result set for those queries. Does a query return just one row? Or thousands?

     Approximately 40% of the data is currently NULL in the comments field and will continue to be around 40% going forward,

    Well that is certainly odd. You don't expect to make ANY progress at all?

    That is even more indication that a user can't just work on the next row that doesn't have a comment. There has to be some priority here.

    What is it you aren't telling us if management accepts that 40% of the data will always be NULL if that 'no comment' means the work has not been completed?

  • User_OMEF8
    User_OMEF8 Member Posts: 126 Blue Ribbon
    rp0428 wrote:
    
    Yes.  If the record does not have a comment, then the record has not yet been worked on.  Once a clerk is working on a record he/she will add a comment and move to the next record with a NULL comment.
    IMHO this is where things start to get funky.
    
    1. In Oracle there is no such thing as 'next record' - there is no inherent order to rows in a table anymore than there is an order to tennis balls in a barrel. 'next tennis ball' has no meaning other than 'any ball will do'.
    
    2. a comment column is typically an ATTRIBUTE of the data; not a flag that indicates the STATUS of the row. Most data models would use a STATUS column for that purpose.
    
    3. It also seems unlikely to me that a clerk should just spend there time on 'any row with no comment'. Surely some rows must be more inportant than others. There need to be attributes that a query can use to PRIORITIZE the rows to be worked on. The lack of a comment may be one indicator but I find it hard to believe it is the most important one.
    

    1-2.  I understand that there's no "next" button to easily go to the next document that needs to be worked.  Comments is an attribute of the document/record that they are working on.  Status - yes, would make sense.  However, it is the responsibility of the user to change the Status each time.  At this time, there is no workflow process to automatically update the status when it is at the beginning of the stage to its final stage.  Unfortunately, if the user "forgets" to change the Status, then we are still in the same situation, but instead of searching for Comments, you search for a certain Status.

    3.  I am as much surprised with this business process as you are.  Generally, this is the broken process.  There are other searches that they will do for specific documents, but the majority of the time it will be when comments is null.

    rp0428 wrote: 
    As for how many users/searches
    Any where between 4 - 10, all day every day, minus weekends.
    Sorry - that isn't a spec.  There is a HUGE difference between 4 - 10 users doing 4 seaches a day each and doing a thousand searches a day each. The requirement spec needs to specify an actual volume of queries per minute/hour that need to be handled and the size of the result set for those queries. Does a query return just one row? Or thousands?
    

    I do not disagree that 4-10 can be a big difference in numbers per day/week/month/year.  However, they have 4-10 staff members that have the same job responsibility.  I am not in charge of that group, so if they get asked to do some thing other than what they are supposed to do (process these documents), then they do the other thing.  At any given time, you will have at least 4 and at most 10.  Some days it may be 4, other days it may be 10.  You also have folks that take vacation or may get sick, so I cannot provide you with a solid number because it may tend to change.

    Some times, they will use a date range, other times they may use a vendor name.  However, when they search for their vendor names, they perform their searches such as where vendorname like 'a%' (unfortunately).

    rp0428 wrote:
    Approximately 40% of the data is currently NULL in the comments field and will continue to be around 40% going forward,
    Well that is certainly odd. You don't expect to make ANY progress at all?
    
    That is even more indication that a user can't just work on the next row that doesn't have a comment. There has to be some priority here.
    
    What is it you aren't telling us if management accepts that 40% of the data will always be NULL if that 'no comment' means the work has not been completed?
    

    There will be progress made, but with the amount of documents that are added daily with the amount of documents that they process.  I believe that it will offset each other and hence the amount of null comments will still be around the 40% range.  It serves me no benefit to hold information from this topic especially when I am seeking some help and assistance from you all.  I am not looking for a 100% solution, but maybe just some brainstorming ideas that I have not come up with yet.  They have been resistant to changes to their process and unfortunately that goes above me.  If they want it this way, regardless of how much I disagree with the broken process, then I don't have much of a choice.  I do have intentions of building a workflow process for them in due time.

    Again, I do appreciate all of your help and suggestions.  It greatly helps from others perspectives.

This discussion has been closed.