Forum Stats

  • 3,839,032 Users
  • 2,262,440 Discussions
  • 7,900,836 Comments

Discussions

Issues with charset WE8ISO8859P1

luizsfjr
luizsfjr Member Posts: 4 Green Ribbon
edited Jul 21, 2022 6:58PM in APEX Discussions

Hello everyone.


I am having problems reading and storing strings in a database. Basically, some characters are being mistakenly replaced with "¿".


For example, when I try to store the following string in the DB 'Centro Cultural TCU presents “Diego e Frida – Um sorriso no meio do caminho"', it is stored 'Centro Cultural TCU presents ¿Diego e Frida ¿ Um sorriso no meio do caminho"', i.e. the characters " " " and " - " have been converted to "¿". Unfortunately, this happens not only with these characters, but also with accents. The DB version I am working with is "Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production" and my charset is "WE8ISO8859P1".


My boss said that at the moment it is impossible to make the charset change, since this would impact many applications and this would be chaos. 


Has anyone experienced this? Does anyone have any ideas on how to resolve this situation?

Answers

  • fac586
    fac586 Senior Technical Architect Member Posts: 21,101 Red Diamond

    Please update your community profile with a recognisable username instead of "User_2VE2C".

    For example, when I try to store the following string in the DB 'Centro Cultural TCU presents “Diego e Frida – Um sorriso no meio do caminho"', it is stored 'Centro Cultural TCU presents ¿Diego e Frida ¿ Um sorriso no meio do caminho"', i.e. the characters " " " and " - " have been converted to "¿". Unfortunately, this happens not only with these characters, but also with accents. The DB version I am working with is "Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production" and my charset is "WE8ISO8859P1".

    Post the result of this query (as text, not a screenshot): select * from nls_database_parameters where parameter like '%SET'

    What data type are you trying to store it in?

    Where does that content come from?

    ISO-8859-1 does not include typographic quotation marks or the en dash, so the text referred to cannot be accurately stored as VARCHAR2 data in a WE8ISO8859P1 database. If accented characters are also not stored correctly then it seems that the data source has a different encoding—probably Windows-1252.

  • luizsfjr
    luizsfjr Member Posts: 4 Green Ribbon

    Hi, @fac586. Thank you for your comment.


    Here is the result of the query select * from nls_database_parameters where parameter like '%SET': 


    PARAMETER VALUE

    NLS_NCHAR_CHARACTERSET AL16UTF16

    NLS_CHARACTERSET WE8ISO8859P1


    Well, the problem seems to happen in storage in varchar2 and CLOB data types. In many cases the data comes from the user's copy and paste action, more specifically, the user receives an email and pastes the content of the email into the system. In addition, we also store data from web services that carry a UTF-8 encoded text.

  • fac586
    fac586 Senior Technical Architect Member Posts: 21,101 Red Diamond

    Well, the problem seems to happen in storage in varchar2 and CLOB data types. In many cases the data comes from the user's copy and paste action, more specifically, the user receives an email and pastes the content of the email into the system. In addition, we also store data from web services that carry a UTF-8 encoded text.

    On that basis the best solution is to change the database character set to AL32UTF8. Yes it's disruptive but in 2022 everyone should be using a UTF-8 database.

    Otherwise I think you'd have to be looking at a client-side fix because by the time the data is in session state and available for processing in the database the unsupported characters will already have been lost.

  • luizsfjr
    luizsfjr Member Posts: 4 Green Ribbon

    Hi, @fac586. Thank you for your comment.


    Here is the result of the query select * from nls_database_parameters where parameter like '%SET': 


    PARAMETER VALUE

    NLS_NCHAR_CHARACTERSET AL16UTF16

    NLS_CHARACTERSET WE8ISO8859P1


    Well, the problem seems to happen in storage in varchar2 and CLOB data types. In many cases the data comes from the user's copy and paste action, more specifically, the user receives an email and pastes the content of the email into the system. In addition, we also store data from web services that carry a UTF-8 encoded text.