8 Replies Latest reply on Mar 3, 2017 7:11 PM by GregV

    Table definition - Data Type - Units for VarChar2?

    Blue Bird

      Hello,

       

      I have a dilemma about how to properly define/design table fields. When I define a new table in SQL Developer and select VARCHAR2 as the Data Type, I get the two properties "Size" and "Units". Units defaults to <Not Specified>; the other two options are BYTE and CHAR.

      If, for example, I select Size 5, this allows me to input up to 5 ASCII characters in all three cases (<Not Specified>, BYTE and CHAR), e.g. "asdfg".

      If I try to input "€asdf", the € sign takes more than one byte, so the value no longer fits in case 1 (<Not Specified>) and case 2 (BYTE), and SQL Developer does not let me Commit the changes (with F11). But it goes through in case 3 (CHAR).
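A minimal sketch of what I'm seeing (assuming an AL32UTF8 database character set, where € is encoded as 3 bytes; table names are made up for illustration):

```sql
-- Hypothetical tables showing BYTE vs CHAR length semantics.
CREATE TABLE t_byte (val VARCHAR2(5 BYTE));
CREATE TABLE t_char (val VARCHAR2(5 CHAR));

INSERT INTO t_byte VALUES ('asdfg');  -- OK: 5 characters, 5 bytes
INSERT INTO t_char VALUES ('asdfg');  -- OK

INSERT INTO t_byte VALUES ('€asdf');  -- fails: 5 characters but 7 bytes in UTF-8
INSERT INTO t_char VALUES ('€asdf');  -- OK: 5 characters, regardless of byte length
```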

       

      My questions are:

      1. In Oracle sample apps I noticed that BYTE is always used for the Units option (as far as I checked). Is this selected because the English language doesn't need/use international characters, so table fields take 50% less space, or is there some other reason to use this option? For example, if you regularly use characters outside the plain ASCII range (2 bytes in size), in the worst case you could input only half the specified number of characters.

      2. Is the <Not Specified> option the same as BYTE? It looks like it to me.

      3. Which option (as good practice) among these three do you use when designing database tables for apps that use accented characters (e.g. European code pages: €, č, š, ž, ...)?

       

      All answers and thoughts are welcome.

       

      BB

        • 1. Re: Table definition - Data Type - Units for VarChar2?
          thatJeffSmith-Oracle

          If you know you want to fit 2 characters, NO MATTER WHAT - use the character definition. Some characters will require as many as 4 bytes per character. I've seen big issues where folks try to import data into a fresh db and it fails b/c the table definitions don't account for the 'wider' data from the source.

           

          If you know you only ever want 2 bytes, use the BYTE definition.

           

          I imagine most folks would fall into the first bucket. Otherwise, refer to your data model and business owners for the app, they should be able to tell you what you're going to need.
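          As a hedged sketch of the two definitions (table names are hypothetical):

```sql
-- Character semantics: always fits 2 characters,
-- however many bytes each character takes to encode.
CREATE TABLE demo_char (code VARCHAR2(2 CHAR));

-- Byte semantics: fits at most 2 bytes, which can be fewer than
-- 2 characters in a multi-byte character set.
CREATE TABLE demo_byte (code VARCHAR2(2 BYTE));
```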

          1 person found this helpful
          • 2. Re: Table definition - Data Type - Units for VarChar2?
            Blue Bird

            Thanks for your opinion, Jeff. Yes, this could actually be a big dilemma, especially if you later migrate your data into another database table with the "same" size.

            You mention 2 bytes and 4 bytes per character. Can VARCHAR2 use 4 bytes per character? What about NVARCHAR2? I was thinking that if I select BYTE Units, each character would use 1 byte. Can this change to 2 or even 4 bytes per character, depending on other settings / code pages? I know this could be a deep topic, so please explain it in a simple way if you can.

             

            So given that my input always includes accented characters, it would be smart to always use CHAR Units. If I export such data into plain ASCII files as input for some other system, could that cause trouble on their side, even though they import only a "plain" ASCII file?

             

            BB

            • 3. Re: Table definition - Data Type - Units for VarChar2?
              thatJeffSmith-Oracle

              How many bytes you get per character depends on your database character set.

               

              For NVARCHAR2 and NCLOB, it depends on your national character set.

               

              I think it's more common to have a multi-byte character set and get 2 bytes per char for VARCHAR2() columns.
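              One way to see this for a given string (a sketch; LENGTH counts characters, LENGTHB counts bytes, and the byte count depends on the database character set):

```sql
-- In an AL32UTF8 database this returns char_len = 5 and byte_len = 7,
-- since € is encoded as 3 bytes in UTF-8.
SELECT LENGTH('€asdf')  AS char_len,
       LENGTHB('€asdf') AS byte_len
FROM dual;
```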

              • 4. Re: Table definition - Data Type - Units for VarChar2?
                Blue Bird

                I know and understand that this is a wide topic. That's why I'd rather ask for a quick expert opinion, so I don't go in the wrong direction. I will read more about this when I have some spare time, and until then I will stick with CHAR Units. Thank you again for your quick responses.

                 

                BB

                • 5. Re: Table definition - Data Type - Units for VarChar2?
                  GregV

                  Hi,

                   

                  I always go for the CHAR unit in all my VARCHAR2 columns. I find it clearer to think in terms of number of characters rather than number of bytes. Some character sets require 2 or more bytes for some of their characters, so it makes more sense to choose the CHAR unit.

                  NVARCHAR2 should not be used; an Oracle tech told me this was created in olden times to save some space when storing characters. This is no longer an issue, so don't use that datatype.

                  • 6. Re: Table definition - Data Type - Units for VarChar2?

                    I know and understand that this is a wide topic. That's why I'd rather ask for a quick expert opinion, so I don't go in the wrong direction. I will read more about this when I have some spare time, and until then I will stick with CHAR Units. Thank you again for your quick responses.

                     

                    That same Oracle doc explains further what Jeff was describing. Read this ENTIRE SECTION and it will answer ALL of the questions you ask.

                    https://docs.oracle.com/database/121/NLSPG/ch2charset.htm#GUID-B3C74175-7F77-49A6-9CC9-1F6F70F3A69D

                    I will read more about this when I have some spare time, and until then I will stick with CHAR Units.

                    Actually the BEST PRACTICE is that you should NOT explicitly specify BYTE or CHAR unless you INTENTIONALLY want/need to override the default provided by the NLS_LENGTH_SEMANTICS setting, which defaults to BYTE.

                    https://docs.oracle.com/cd/E24693_01/server.11203/e24448/initparams149.htm

                     

                    When you create a database that uses a multi-byte character set that parameter should be specified as CHAR in the initialization file.

                     

                    Then all created objects (except those created by SYS) would use CHAR semantics and you would ONLY specify BYTE if you needed to override that.
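                    As a sketch of that approach (shown at session level here; the table name is hypothetical):

```sql
-- Set the default length semantics for this session to CHAR.
ALTER SESSION SET NLS_LENGTH_SEMANTICS = CHAR;

-- No BYTE/CHAR qualifier: the column inherits CHAR semantics
-- from NLS_LENGTH_SEMANTICS.
CREATE TABLE t_default (val VARCHAR2(5));
```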

                    1. In Oracle sample apps I noticed that BYTE is always used for the Units option (as far as I checked). Is this selected because the English language doesn't need/use international characters, so table fields take 50% less space, or is there some other reason to use this option? For example, if you regularly use characters outside the plain ASCII range (2 bytes in size), in the worst case you could input only half the specified number of characters.

                    Only the app creator (Oracle) can provide a definitive answer. And just because YOUR DB uses English doesn't mean that's the case for other databases around the world.

                     

                    But as just discussed above, those sample apps explicitly specify BYTE, and that guarantees they use BYTE regardless of the database character set, even if the NLS parameter is set to CHAR.

                     

                    As Jeff explained, certain combinations of database character set, BYTE semantics and data just won't 'fit' properly and will cause problems. That applies to the sample data Oracle provides as well.

                    2. Is the <Not Specified> option the same as BYTE? It looks like it to me.

                    No - and that doc tells you why.

                    If a column, user-defined type attribute or PL/SQL variable definition contains neither the BYTE nor the CHAR qualifier, the length semantics associated with the column, attribute, or variable is determined by the value of the session parameter NLS_LENGTH_SEMANTICS.

                    So 'Not Specified' means just that - do NOT provide a value. The length semantics used will then be determined by the NLS_LENGTH_SEMANTICS setting provided by the database or, as the doc quote says, by the session level value if you override the DB setting for your session.
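                    To check which value is actually in effect (a sketch using the standard NLS views):

```sql
-- Database-wide default, as set at database creation / in the init file.
SELECT value FROM nls_database_parameters
WHERE parameter = 'NLS_LENGTH_SEMANTICS';

-- Value in effect for the current session (may override the DB default).
SELECT value FROM nls_session_parameters
WHERE parameter = 'NLS_LENGTH_SEMANTICS';
```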

                    3. Which option (as good practice) among these three do you use when designing database tables for apps that use accented characters (e.g. European code pages: €, č, š, ž, ...)?

                    The BEST PRACTICE option is to NOT specify the value, so that the database setting for NLS_LENGTH_SEMANTICS will be used.

                     

                    If the DB character set is a multi-byte character set (e.g. a unicode set) that parameter should be set to CHAR.

                     

                    As a developer you should RARELY, if ever, need to explicitly specify BYTE or CHAR for the objects you create.

                    • 8. Re: Table definition - Data Type - Units for VarChar2?
                      GregV

                      The problem with you, rp0428, is that you always WANT to be right in the end, and always WANT to have the last word. You take each reply bit by bit and give a lengthy reply, always making it look like a debate. I could do the same, but that would lead to endless back-and-forth replies.

                      So you're right about one point: I should start a topic in the general discussion, and I apologize to Blue Bird that we've drifted from his/her original question.