Skip to Main Content

SQL & PL/SQL

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

Interested in getting your voice heard by members of the Developer Marketing team at Oracle? Check out this post for AppDev or this post for AI focus group information.

Varchar2 lengths

John_KFeb 13 2018 — edited Feb 14 2018

This is more of a discussion out of interest rather than an issue I am having... I see many developers when creating tables use base 2 lengths for varchar2 columns. I've always been a believer in the right size for the job, however I'm interested into where this has come from. I.e. a column is needed to store a descriptive piece of text. About 30 characters. However - they will instead use Varchar2(32). Need something a bit longer - it of course is a varchar2(128) or similar!

Is there any logic in this at all? I presume no storage gain, as the strings are variable length. Or is it just developers like to think in base 2?

Comments

GregV

Hi,

Interesting. I've not especially noticed that in all the models I've encountered. Using oversized varchar2 is something I've seen more. I don't know if there's any reason for using multiples of 2 sizes for varchar2.

SeánMacGC

No harm in that I'd say

And all computers are happier dealing with radix 2

[Deleted User]

Hmmm. Never really thought about it, but now that you mention it I have a tendency to do that too, if I am unsure of what will go into that varchar2. Of course if I know that something is always 17 characters, and will never exceed that due to whatever (business) requirement, I'll use varchar2(17), but otherwise it would indeed usually be 32, 64, 128, 256.... Not sure why, but I suspect a habit of thinking in bytes from my old TurboPascal days.

Billy Verreynne

Likely a habit of C/Pascal programmer types that are used to align structures and variables on word boundaries, manually, for optimal performance. Though for many years now the compiler takes care of that for you as part of compiler setting and/or compiler flags - no need to do this manually.

Doubt that this is an issue in PL/SQL - scripting (p-code) engines typically have their own memory abstraction layer, which would for example handle issues such a word boundary alignment internally.

Billy Verreynne

Another reason could be to limit memory fragmentation - the holes in memory left by freeing base 2 size variables, are more easily reused by new base 2 size variables, than odd/variant size variables.

Though the last time I saw this used was for table storage (initial, next) clauses in Oracle 7, where tablespace fragmentation was a real issue. Does not make much sense in a PL/SQL context - but then who knows how deep and far back entrenched behaviours run... ;-)

mathguy

This must go back a very long time, to the days when programmers had to allocate memory dynamically in their code for arrays of variable length (not just for arrays of characters, but for any kind of array).

On the one hand, you could make all your dynamic (variable length) arrays very long, so you would never run out of room (essentially making them static) - but that wastes memory. On the other hand, you could also allocate more memory each time you need to add an element to an array, but that takes significant execution time - especially if you need to move a lot of data around. (Array elements must be contiguous in memory, so that pointer arithmetic can be used.)

A good compromise (optimal, under some reasonable assumptions) is to start with a relatively small size. Then, whenever you need to allocate more memory (as the array keeps growing as you add data), DOUBLE the size of the array. And as the array shrinks (if it does, in whatever application you use it), only release memory whenever the used portion of the array is HALF (or less) of the previous size.

So this working in powers of 2 was actually solving an optimization problem in the way back days. Why anyone keeps doing it these days ...  I have no clue; the historical reason no longer applies.

[Deleted User]

mathguy wrote:

Why anyone keeps doing it these days ... I have no clue; the historical reason no longer applies.

In my case the most likely reason is habit, without a good reason to change that habit

mathguy

I guess the original question was, "why do such habits get formed in the first place" rather than "why don't people change their habits." Can you recall, when you first started doing it, why that was? I assume you saw your teachers (or smart people on discussion boards like this one) do it, and since they otherwise had very good reasons for what they did, you simply adopted what you saw. Perhaps this is not your case specifically, but I can see how it may apply to many. (It would likely apply to me, if I was doing it.)

[Deleted User]

Oof. Why that was I am not sure, such a long time ago. It was in the (very) late eighties, early nineties. One of my personal pet projects when I was a student was writing my own OS (once I had learned enough of the damn things, just to see if I could - turns out I wasn't very good at it ), which requires quite a bit of memory management. And in those days the biggest you could get was 64Kb, so you had to squeeze everything as much as possible, but if I remember correctly it still had to be a power of 2 of bytes. After failing miserably at an OS I tried my hand at writing my own GIS, which was tricky when you had to position every pixel on screen with not enough memory, so that probably reinforced the habit. After that I probably got a bit smarter and left such things to the pro's , but the habits stuck. And it taught me a lot about how computers work, which still benefits me today actually. So the efforts haven't been wasted imho (at the very least I had fun on a bunch of long winter nights).

Mike Kutz

Why create VARCHAR2() with lengths 2n?

'Cause all the l33t k00l k1|()s are doing it. 

unknown-7404

I have a tendency to do that too, if I am unsure of what will go into that varchar2.

Which raises the question: WHY are you creating tables if you haven't properly defined the data model?

Seems to only make sense if you are prototyping something new or just 'sandboxing'.

Doesn't make sense if you are working in an org actually creating an app that needs tables to store user data. Somebody had BETTER be sure 'of what will go into that' column and how it will be used.

Of course if I know that something is always 17 characters, and will never exceed that due to whatever (business) requirement, I'll use varchar2(17), but otherwise it would indeed usually be 32, 64, 128, 256.... Not sure why, but I suspect a habit of thinking in bytes from my old TurboPascal days.

Thinking in 'bytes'? But you said 'always 17 characters'.

And 'characters' are NOT necessarily 'bytes'. So the length in bytes for a string of 17 characters depends on the character set you are using and, if it is multi-byte, exactly which characters those 17 are.

In UTF32 those 17 characters could take 68 bytes. And if you declare a column as VARCHAR2(17) and the length semantics are BYTE you could only store FOUR of those characters -  a value that really had 17 characters wouldn't fit.

unknown-7404

I guess the original question was, "why do such habits get formed in the first place" rather than "why don't people change their habits."

Hmmm - wasn't the 'original question' about tables and varchar2 columns?

I see many developers when creating tables use base 2 lengths for varchar2 columns

OP never used the word 'memory' or 'code' at all.

1. Oracle stores varchar2 data in tables with a length prefix.

2. The number of bytes of data (if power of 2) wouldn't take that prefix into account

3. bytes and characters are NOT the same thing if multi-byte character sets are being used

This must go back a very long time, to the days when programmers had to allocate memory dynamically in their code for arrays of variable length (not just for arrays of characters, but for any kind of array).

OP didn't mention arrays either.

In those 'old days' memory allocation was for discrete chunks of 'BYTES'. There were no physical 'arrays of variable length'. If you used up your 'chunk' you usually had to:

1. allocate a new, bigger, chunk (which was almost always somewhere else physically in memory)

2. move the existing data to that new chunk

3. add new data to the unused part of the new chunk.

The memory allocation was for BYTES - not characters. If you need to allow for a character string of 17 bytes then you needed to calculate the space needed based on the character set you were using and the number of bytes that 17 of the LARGEST characters would take.

Although the current one has an option to allocate space dynamically that 'fixed length' allocation is exactly how the JDBC drivers have been allocating space for decades.

That is why a table with columns defined as VARCHAR2(4000) can require such a large 'array'/buffer of storage for the result set of a query when using JDBC.

unknown-7404

I see many developers when creating tables use base 2 lengths for varchar2 columns.

1. never seen it myself in 30+ years

2. totally pointless and useless as a guideline or any other reason

3. bytes and characters are NOT the same thing.

The actual space required for a string depends on:

1. the NLS_LENGTH_SEMANTICS setting (BYTE or CHAR)

2. the character set being used

3. the actual characters being stored.

BEDE

Well, that depends on what kind of developers there are.

In one system I've been working with for years there are sizes 20, 30, 50, 100, 250, 1000 and 4000(maximum up to 11g).

I'm not sure whether the base 2 length is something that widespread in ERP systems using Oracle DB.

As for base, I'd rather use hex. For instance 0x032 sounds like more poetic age...

mathguy

Sometimes it seems like you spend much more time writing your own thoughts than reading what others have written.

The OP observed that some (many?) developers seem to define VARCHAR2 columns with lengths like 32 and 128 - powers of 2. And wanted to know where that habit may come from.

I offered a possible (historical) reason. VARCHAR2 is (or "may be" - I don't know how Oracle implements things) a variable array of characters (in C, which is what Oracle is written in). Or, in any case, people may have come to Oracle from having worked with dynamic arrays in C. Meaning, arrays of length that may change during execution - so the program (in C) must allocate memory dynamically, it can't all be set in stone at compile time. This has nothing to do with data type. In the old days we didn't have Oracle do this for us, we had to do the memory management in the code, along with the desired code logic.

What part of all of this is either unclear, or seems to you unrelated to the OP's question? I wonder if anyone else had the same questions you did.

Mark D Powell

All, I suspect the practice might stem from the way Oracle passes varchar2 variables to and from PL/SQL routines.  If you have stored code one of the causes of ending up with cursor mismatches is due to bind variable size mismatches.  Data is copied into variables of standard lengths for the calls.  I cannot remember the actual sizes used and I think the list actually changed between recent releases but the power of 2 type progression seems likely: 32, 64, 128, etc....

- -

Anybody remember seeing any articles / blogs on the subject of varchar2 parameter passing?  Though I do not think it would necessarily make sense to define all table columns in these same size increments since Oracle should handle the parameter sizing on the PL/SQL call and return.

- -

HTH -- Mark D Powell --

[Deleted User]

You can be a bit of a nitpicker sometimes, but okay, I'll bite:

You obviously haven't worked with the organization I have worked with/am working with, that much is clear from your questions . And there are plenty of examples where you do not exactly know what length of text will go into a varchar2 column, but you DO know that it will never exceed more than a certain amount. If for example the business wishes to give the end user the room to leave a brief comment or a note for another end user: you'll never guess what goes in there, but you can say: let's keep the maximum to 140 characters. Sometimes a user may decide to leave that column empty, sometimes they will complain that there's not enough space. Properly designed datamodel, where I still do NOT know in advance what will end up in a certain column.

And I said characters because I tend to still think in ASCII only, where a single character DOES take up a single byte. Generally I have not much use for much more extended character sets, so that's how I still think. I usually have to force myself to allow for UTF-8 or something, but then again: disk space is so cheap these days that it doesn't matter much anymore anyway (at least not in the way it mattered when I started out).

unknown-7404

What part of all of this is either unclear, or seems to you unrelated to the OP's question?

All of it. As you just said yourself

This has nothing to do with data type

Exactly - and 'data type' is what OPs thread is about. I don't see how memory, C, arrays and the like have ANYTHING to do  with how the choice of length for a VARCHAR2 datatype is made.

mathguy

No, the question was not about data type. Read it again: it was about powers of 2 used as "standard" (in some people's minds, or at least practice) length when declaring VARCHAR2 columns. Why 32 and 128, and not 50 or 100? And that "standard" or "old habit" may come from places that have nothing to do with Oracle or with databases in general.

Know what? Feel free to ignore my answer, or to badmouth it all you want. I will wait for anyone else to raise the same questions before I will spend any more of my time on them.

AndrewSayer

John_K wrote:

This is more of a discussion out of interest rather than an issue I am having... I see many developers when creating tables use base 2 lengths for varchar2 columns. I've always been a believer in the right size for the job, however I'm interested into where this has come from. I.e. a column is needed to store a descriptive piece of text. About 30 characters. However - they will instead use Varchar2(32). Need something a bit longer - it of course is a varchar2(128) or similar!

Is there any logic in this at all? I presume no storage gain, as the strings are variable length. Or is it just developers like to think in base 2?

Avoiding the direct question now that it’s been addressed by others, but you still might find this relevant.

Maybe it’s because I started my career this decade, but I’ve never had to consider the impact of data type length besides its use as a constraint. Given the choice of defining the constraints myself, I use

1 if its a character

10 for smalllest bits of data

30 for names (it was good enough for Oracle)

50 for longer names

100 for descriptions

400 for long descriptions

1000 For when the user isn’t too sure but definitely doesn’t want a clob

4000 bytes if it’s inside a dml error logging table.

If I find that the usage is much different to what was chosen then I could reduce the sizes but it takes a lot more effort than just accepted it (the only real impact will relate to array fetch sizes when many rows are required in a query)

unknown-7404

No, the question was not about data type.

Yes it IS about datatype. You pretty much acknowledge that when you just said:

. . . when declaring VARCHAR2 columns.

That is a direct reference to declaring a VARCHAR2 'datatype' for an Oracle table - not an array, nested table, pl/sql variable, user object attribute.

And that "standard" or "old habit" may come from places that have nothing to do with Oracle or with databases in general.

A question related to 'declaring VARCHAR2 columns' has EVERYTHING to do with Oracle.

Feel free to ignore my answer, or to badmouth it all you want.

I'm not 'badmouthing' it - I'm trying to clarify it and correct the misdirection some of your statements imply when you start talking about C programming, arrays in C, client interfaces to the database and the like.

The use of a power of 2 for a length in a programming language has NOTHING to do with declaring VARCHAR2 columns. There IS NO direct correlation at all.

The typical use of powers of 2 for storage in programming languages is related to physical limitations of the CPU and memory architectures involved.

CPU instructions that load data from memory load that data into 'registers'. Those registers are of fixed length. So a CPU instruction that loads a 32 bit register loaded 32 bits or 4 bytes. Even if you only wanted to work with 1, 2 or 3 of those bytes you would need to load four bytes and ignore the bytes you didn't want to access.

Memory allocations were always an even number of bytes. Some systems allocated 'pages' that were always a power of two.

So code evolved to deal with the 'common' objects whose length was a multiple of two. Not necessarily a 'power' of two.

None of that has ANYTHING to do with declaring a VARCHAR2 datatype for a table column. And as I previously said the more important considerations are the character set being used and the length semantics being used.

It is meaningless to talk about the 'length' of a character string without taking 1) character set, 2) length semantics and 3) the actual characters in the string into account.

Depending on those three attributes the generic string 'abc' will be three CHARACTERS in length but can take a different number of bytes to represent.

SeánMacGC

Despite all of the foregoing, just remember that the Oracle RDBMS is implemented in the 'C' language, under the covers, so not all of the discussion here about 2n storage definitions and the validity thereof hereon are academic!

Billy Verreynne

Mike Kutz wrote:

Why create VARCHAR2() with lengths 2n?

'Cause all the l33t k00l k1|()s are doing it.

Agree. I don't know this for a fact, I just know this is true... :-)

https://www.youtube.com/watch?v=Siftu72DfJI

John_K

Ok, it was merely an observation that's all, that I've seen as a consultant working across many different sites. It was intended as a light hearted discussion on whether anybody did the same, and whether it was just a course of habit. All completely out of interest. Peace and love.

1 - 24
Locked Post
New comments cannot be posted to this locked post.

Post Details

Locked on Mar 14 2018
Added on Feb 13 2018
24 comments
4,256 views