I have a DB running on HP-UX supporting a COTS application that contains daemons running on the DB server. I've recently updated the character set of the DB to WE8MSWIN1252 per the COTS vendor's direction. Part of this application involves UTF-8 encoded XML files on the DB server.
Does anyone know what the differences are between the en_US.utf8 and univ.utf8 locales? Through testing I have identified that there are differences: for example, the tr command, which uses LC_COLLATE, acts differently, as does the output of the date command, which uses LC_TIME.
Also, I am seriously considering setting LANG to one of the two values and then changing all the LC_ variables back to C, to limit the impact this character set change has on the user community. Is this an OK approach or a dumb idea?
You are asking questions for an HP-UX globalization forum here ;-)
From an Oracle perspective, the univ.utf8 locale may cause some issues for OUI or other Java-based tools if the Java Runtime cannot map it correctly to a Java locale. Therefore, check this first. I have seen reports on the Web of OUI failing in Siebel installation attempts.
There is hardly any documentation about the univ.utf8 locale. Based on some indications, I guess the main difference is that the univ.utf8 locale supports the full Unicode standard classification for characters, while the en_US.utf8 locale possibly does not. Also, if collation seems to differ, then either en_US uses a proper English collation (with uppercase and lowercase letters sorted linguistically => aa < Ab < aB < Ba < bB) and univ uses the binary collation (as with LANG=C => Ab < Ba < aB < aa < bB), or the other way round, that is, en_US uses binary collation and univ uses the UCA multilingual collation standard. You would have to test this or ask HP. There may be a similar difference for date formatting: some US standard format in en_US (like DD-MON-RR in Oracle) versus some international format (like YYYY-MM-DD) in univ.
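A quick way to run that collation test is to sort the same five sample strings under each locale and compare the result with the C baseline. This is only a sketch: sort_sample is a made-up helper name, and the locale names are taken from the question on the assumption that both are installed (check with locale -a). On most systems sort silently falls back to C when the locale is missing, so an identical result is ambiguous.

```shell
#!/bin/sh
# sort_sample LOCALE
# Hypothetical helper: sorts the five sample strings under the given locale
# and prints them on a single line, separated by spaces.
sort_sample() {
    printf '%s\n' aa Ab aB Ba bB | LC_ALL="$1" sort | paste -s -d ' ' -
}

# Binary (byte value) order is the baseline to compare against.
c_order=$(sort_sample C)
echo "C: $c_order"          # prints "C: Ab Ba aB aa bB"

# Locale names below are assumptions taken from the question.
for loc in en_US.utf8 univ.utf8; do
    order=$(sort_sample "$loc" 2>/dev/null)
    if [ "$order" = "$c_order" ]; then
        echo "$loc: same as C (binary collation, or locale not installed)"
    else
        echo "$loc: $order"
    fi
done
```

The same pattern works for LC_TIME: run date under each locale (for example LC_ALL=en_US.utf8 date) and compare the output.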
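Regarding setting LANG vs LC_, my advice would be to keep the setup consistent rather than mixing approaches. LANG supplies the default for every locale category, an individual LC_ variable overrides LANG for its own category, and LC_ALL overrides all of them. You should standardize on one of the two: either the simple LANG setting with no fine-grained control and no LC_ variables, or LANG as the overall default with particular categories overridden by individual LC_ variables. LC_ALL cannot serve as the base for the second approach, because it takes precedence over every other LC_ variable.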
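For reference, the lookup order POSIX defines for each locale category is LC_ALL first, then the category's own LC_ variable, then LANG. The sketch below just encodes that rule so you can try out combinations without touching the real environment. resolve_locale is a hypothetical helper, not a system command, and the final fallback to C is an assumption (POSIX leaves the default implementation-defined when nothing is set).

```shell
#!/bin/sh
# resolve_locale LANG_VALUE CATEGORY_VALUE LC_ALL_VALUE
# Hypothetical helper: prints the locale that would take effect for one
# category, following the POSIX precedence LC_ALL > LC_<category> > LANG.
# Pass an empty string for a variable that is unset or null.
resolve_locale() {
    if [ -n "$3" ]; then
        echo "$3"            # LC_ALL wins over everything
    elif [ -n "$2" ]; then
        echo "$2"            # then the category variable, e.g. LC_COLLATE
    elif [ -n "$1" ]; then
        echo "$1"            # then LANG
    else
        echo "C"             # assumed default; implementation-defined in POSIX
    fi
}

# LANG=en_US.utf8, LC_COLLATE=C, LC_ALL unset: collation follows LC_COLLATE.
resolve_locale en_US.utf8 C ""            # prints "C"

# LANG=en_US.utf8, category unset, LC_ALL unset: the category follows LANG.
resolve_locale en_US.utf8 "" ""           # prints "en_US.utf8"

# LC_ALL set: it overrides both LANG and the category variable.
resolve_locale en_US.utf8 C univ.utf8     # prints "univ.utf8"
```

To see what the current session would use for LC_TIME, call it with the real variables: resolve_locale "${LANG:-}" "${LC_TIME:-}" "${LC_ALL:-}". The locale command, run with no arguments, reports the effective setting for every category.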