19 Replies Latest reply: Sep 4, 2013 2:48 AM by FouadAzem

most unique delimiter between key and value.

FouadAzem Newbie

Hi,

 

I need to implement a key-delimiter-value format: the application inserts the value and a delimiter, which is not known in advance, into the DB.

After that I need to retrieve the key and the value. The issue is that this delimiter is defined by some custom code and could be anything.

My problem begins when the delimiter defined by the custom code is itself part of the 'key' and/or the 'value'. In that case, retrieving the key and value by parsing, splitting, or any regex gives the wrong key and value.

Please advise: what is your suggestion? I have been searching the internet for any information that could help.

 

 

Thanks in advance.

  • 1. Re: most unique delimiter between key and value.
    rp0428 Guru

    I need to implement a key-delimiter-value format: the application inserts the value and a delimiter, which is not known in advance, into the DB.

    After that I need to retrieve the key and the value. The issue is that this delimiter is defined by some custom code and could be anything.

    My problem begins when the delimiter defined by the custom code is itself part of the 'key' and/or the 'value'. In that case, retrieving the key and value by parsing, splitting, or any regex gives the wrong key and value.

     

    That problem is no different than the one that occurs for delimited files.

     

    First you have to know what the delimiter is in order to do the parsing.

     

    Second you need to implement some sort of escape mechanism if that delimiter is allowed to be an actual data value.

     

    One common mechanism is to double the character; that is, every delimiter character that occurs inside the data is written as two consecutive instances.

     

    If the key value is 'MY=KEY', the delimiter is '=' and the value is 'MY=VALUE' then the 'encoder' would 'escape' the embedded equals character by doubling:

    MY==KEY=MY==VALUE

    When you parse you would use a single '=' as the delimiter and then for the 'key' and 'value' components you would search for two consecutive '=' characters and replace them with one character.
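A minimal Java sketch of the doubling scheme described above, for a single key/value pair. The class and method names (`DoublingEscape`, `escape`, `encode`, `decode`) are invented for this illustration and do not come from any library; the delimiter is assumed to be a single character.

```java
// Hypothetical helper illustrating the "double the delimiter" escape for one pair.
final class DoublingEscape {

    // Write every delimiter that occurs inside the data twice.
    static String escape(String field, char delim) {
        String d = String.valueOf(delim);
        return field.replace(d, d + d);
    }

    static String encode(String key, String value, char delim) {
        return escape(key, delim) + delim + escape(value, delim);
    }

    // Scan left to right: a doubled delimiter is data,
    // the first single delimiter separates key from value.
    static String[] decode(String encoded, char delim) {
        String d = String.valueOf(delim);
        StringBuilder key = new StringBuilder();
        int i = 0;
        while (i < encoded.length()) {
            char c = encoded.charAt(i);
            if (c != delim) {
                key.append(c);
                i++;
            } else if (i + 1 < encoded.length() && encoded.charAt(i + 1) == delim) {
                key.append(delim); // two in a row: one literal delimiter
                i += 2;
            } else {
                // Single delimiter: everything after it is the escaped value.
                String value = encoded.substring(i + 1).replace(d + d, d);
                return new String[] { key.toString(), value };
            }
        }
        throw new IllegalArgumentException("no unescaped delimiter in: " + encoded);
    }

    public static void main(String[] args) {
        String stored = encode("MY=KEY", "MY=VALUE", '=');
        String[] kv = decode(stored, '=');
        System.out.println(stored + " -> [" + kv[0] + "] [" + kv[1] + "]");
    }
}
```

One caveat worth checking before relying on this scheme: when a value starts with the delimiter, or a key ends with one, the run of delimiters becomes odd in length and the decoder cannot tell which side the extra character belongs to. For example, key `A` with value `=B` and key `A=` with value `B` both encode to `A===B`. The enclosing mechanism described next does not have that problem.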

     

    Another common 'escape' mechanism is to enclose a value in another character (e.g. double-quote). Either ALWAYS enclose it, or OPTIONALLY enclose it only if it contains a delimiter character:

    "MY=KEY"="MYVALUE"
    "MY=KEY"=MYVALUE

    MYKEY=MYVALUE

    In the first example the encoder encloses both components in double-quotes (the ALWAYS style); the key needs it because of the embedded '='. The decoder would parse by looking for matching beginning/end double-quotes to extract the components and would then strip (remove) the start/end quotes.

     

    The second example only has the 'key' enclosed in double quotes because only the key has an embedded delimiter.

     

    The third example has no embedded delimiters.

     

    The encoder/decoder need to agree on and use the same set of rules for escaping possible embedded delimiters.
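As a rough illustration of the enclosing rule, here is a small Java sketch of the encoder side only. The names (`QuoteEnclose`, `quoteIfNeeded`, `quoteAlways`, `encode`) are made up for this example, and it deliberately ignores data that itself contains a double quote, which needs an additional rule.

```java
// Hypothetical encoder for the "enclose in double quotes" mechanism.
final class QuoteEnclose {

    // OPTIONAL style: enclose only when the field contains the delimiter.
    static String quoteIfNeeded(String field, char delim) {
        return field.indexOf(delim) >= 0 ? "\"" + field + "\"" : field;
    }

    // ALWAYS style: enclose every field, whether it needs it or not.
    static String quoteAlways(String field) {
        return "\"" + field + "\"";
    }

    // Build one key/value pair using the OPTIONAL style.
    static String encode(String key, String value, char delim) {
        return quoteIfNeeded(key, delim) + delim + quoteIfNeeded(value, delim);
    }

    public static void main(String[] args) {
        System.out.println(quoteAlways("MY=KEY") + "=" + quoteAlways("MYVALUE"));
        System.out.println(encode("MY=KEY", "MYVALUE", '='));
        System.out.println(encode("MYKEY", "MYVALUE", '='));
    }
}
```

The three lines printed by `main` correspond to the three examples above. Whichever style is chosen, the decoder has to know about it, since it must look for an opening quote before it looks for the delimiter.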

  • 2. Re: most unique delimiter between key and value.
    FouadAzem Newbie

    Hi,

     

    It's not so clear. Can you please share an example or post some code to make it clearer what you mean?

     

     

    Thanks in advance.

  • 3. Re: most unique delimiter between key and value.
    FouadAzem Newbie

    Hi,

     

    Let's say I have two delimiter parameters: one is the field delimiter and the other is the list delimiter.

    The field delimiter comes as '+' and the list delimiter in this specific case comes as ';' from the custom code; in my code I have them as variables which are already populated as described.

    Now, I am saving the keys and values in the DB as follows:

     

    key1+value;key2+value2;key3+value3

     

    The issue begins when key1, value1, key2, value2, key3 and/or value3 contain one of the delimiters, for example:

     

    ke+y1+val+u;e;+ke;y2++va+lue2;+k+e;y3+value+3

     

    Now, I still save them in the same manner, but when retrieving from the DB all the keys and values will be retrieved wrongly if I keep using the same parsing procedure.

    I should mention that the same delimiters used for writing to the DB will be used for reading the value; the issue is that the delimiters can also exist as part of the key and/or value, as described above.

     

     

    Thanks in advance.

  • 4. Re: most unique delimiter between key and value.
    rp0428 Guru

    What I posted is about as clear as it can be. If that isn't adequate then you will have to wait for others to help you.

     

    So far I have posted a lot more than you have.

  • 5. Re: most unique delimiter between key and value.
    939520 Explorer

    Well, here's my two cents worth:

    Do not attempt to parse out the keys and values of the file based on the delimiter(s) all in one step. It's too complicated. Instead break up the problem as follows: Visually determine the business rules for distinguishing true delimiter(s) from those embedded in keys and values by examining the file contents. Copy the file to memory. Programmatically go through the file in memory and replace the identified delimiter(s) using your business rules with some other character or character sequence that will never occur in the keys or values. Example: the pipe character (  |  ), or the characters @@@@@. Visually examine the new file contents to see if there are any key/value pairs that don't look correct and apply more business rules for that situation. Once that's done, you can apply a simple regexp to parse the file, using your replacement delimiter. Lastly, you might consider coding some type of trap (if possible) to throw an exception if it comes across a condition not covered by your business rules so you can see what line it appears on, so you can add a new business rule in the future.

     

    Personally, I would rather contact the supplier of the file and tell him to fix his program so it uses a delimiter that doesn't appear in the keys or values.

    By the way, if the delimiter(s) can be any set of characters, and they can be embedded in keys and/or values, I doubt there is a finite set of business rules to cover all possibilities.

  • 6. Re: most unique delimiter between key and value.
    rp0428 Guru

    By the way, if the delimiter(s) can be any set of characters, and they can be embedded in keys and/or values, I doubt there is a finite set of business rules to cover all possibilities.

    Then you need to reread my reply from above. That is easily handled by enclosing in double quotes any key/value that contains a delimiter character.

  • 7. Re: most unique delimiter between key and value.
    rp0428 Guru

    The field delimiter comes as '+' and the list delimiter in this specific case comes as ';' from the custom code; in my code I have them as variables which are already populated as described.

    Now, I am saving the keys and values in the DB as follows:

     

    key1+value;key2+value2;key3+value3

     

    The issue begins when key1, value1, key2, value2, key3 and/or value3 contain one of the delimiters, for example:

     

    ke+y1+val+u;e;+ke;y2++va+lue2;+k+e;y3+value+3

    So? You do just what I said in my reply above. Use double-quotes to enclose data that contains a delimiter.

     

    So let's use your example. I am using [ and ] to make the text stand out; only the characters between [ and ] are the data value.

     

    So the key is [ke+y1] and the value is [val+u;e]

     

    You examine the key and find an embedded field delimiter. That tells you to enclose the field in double quotes.

     

    So the key that you store becomes ["ke+y1"]. Later when you parse you find the data begins with a double quote so you know to look for the next double quote to find the end of the data: the parsed key becomes [ke+y1] after you remove the double quotes.

     

    Same for the value. The original value is [val+u;e]. You examine the value and find an embedded field delimiter and an embedded list delimiter. That tells you to enclose the field in double quotes.

     

    So the value that you store becomes ["val+u;e"]. Later when you parse you find the value begins with a double quote so you know to look for the next double quote to find the end of the data: the parsed value becomes [val+u;e] after you remove the double quotes.

     

    Because you are using double quote to mean start/end of data then you also need to detect any double quote characters that are embedded. If a double quote character is embedded you simply replace it with TWO double quotes.

     

    The double quotes that you add as escape characters will NEVER occur twice together. So any consecutive double quotes that your parser detects are data characters and you replace them with ONE double quote.
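The per-field rules just described might look roughly like this in Java. This is only a sketch under stated assumptions: the class and method names (`QuotedField`, `encode`, `decode`) are invented here, the two delimiters are single characters, and neither delimiter is itself the double quote.

```java
// Hypothetical per-field encoder/decoder: enclose in double quotes when needed,
// and double any double quote that is part of the data.
final class QuotedField {

    static String encode(String field, char fieldDelim, char listDelim) {
        boolean special = field.indexOf(fieldDelim) >= 0
                || field.indexOf(listDelim) >= 0
                || field.indexOf('"') >= 0;
        if (!special) {
            return field; // nothing to protect, store as-is
        }
        // Double embedded quotes, then add one enclosing quote on each side.
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Takes one stored field that has already been isolated by the parser.
    static String decode(String stored) {
        boolean enclosed = stored.length() >= 2
                && stored.charAt(0) == '"'
                && stored.charAt(stored.length() - 1) == '"';
        if (!enclosed) {
            return stored; // the encoder left it untouched
        }
        String inner = stored.substring(1, stored.length() - 1);
        return inner.replace("\"\"", "\""); // two in a row become one data quote
    }

    public static void main(String[] args) {
        for (String s : new String[] { "ke+y1", "val+u;e", "say \"hi\"", "plain" }) {
            String stored = encode(s, '+', ';');
            System.out.println("[" + s + "] -> [" + stored + "] -> [" + decode(stored) + "]");
        }
    }
}
```

Because any field containing a double quote is always enclosed, a stored field that starts with a double quote can only be an enclosed one. A value made up of nothing but two double quotes, for instance, is stored as six: the two data quotes doubled to four, plus the enclosing pair.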

  • 8. Re: most unique delimiter between key and value.
    FouadAzem Newbie

    Your willingness to help, all of you, is most appreciated.

    I don't understand the need for the brackets, as in "So the key is [ke+y1] and the value is [val+u;e]".

    Secondly,

    If a double quote character is embedded you simply replace it with TWO double quotes.

     

    The double quotes that you add as escape characters will NEVER occur twice together. So any consecutive double quotes that your parser detects are data characters and you replace them with ONE double quote.

     

     

    Again, I don't understand the need for that step of using one double quote, which I think will make this much harder.

    Bottom line: using the double quote whenever a delimiter is contained inside the key or value is acceptable. The main issue now is how to handle the case where the user himself inserts a key and value which contain the double quote.

  • 9. Re: most unique delimiter between key and value.
    rp0428 Guru
    I don't understand the need for the brackets, as in "So the key is [ke+y1] and the value is [val+u;e]".

    I explained that above.

    I am using [ and ] to make the text stand out; only the characters between [ and ] are the data value.

    I had to use SOMETHING to make the data values stand out. Otherwise you wouldn't know which characters are part of my sentence and which ones are part of the key or value. I used the brackets and, as I said, ONLY THE CHARACTERS BETWEEN [ AND ] ARE THE DATA VALUE.

    Again, I don't understand the need for that step of using one double quote, which I think will make this much harder.

    I have no idea what you just said.

     

    The rules are VERY SIMPLE.

     

    1. We are using THREE delimiter/special characters. You said you want TWO delimiters: a field delimiter and a list delimiter. And we need ONE more (the double quote) to act as an escape mechanism.

    2. If you examine a key and it contains ANY of those three delimiter characters then you MUST enclose the key in double quotes; one at the start and one at the end. If the key included an embedded double quote then you must add another one to make two in a row.

    3. If you examine a value and it contains ANY of those three delimiter characters then you MUST enclose the value in double quotes; one at the start and one at the end. If the value included an embedded double quote then you must add another one to make two in a row.

    The main issue now is how to handle the case where the user himself inserts a key and value which contain the double quote.

    And that is yet another reason why you should NOT be storing data this way. You should use a standard relational table with two columns: key and value and add an index on the key. Or use an index-organized table.

     

    If a user inserts a key and value the user needs to follow the rules above.

     

    THERE ARE NO SHORTCUTS! You either have to do it right or be willing to suffer the consequences.
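For readers who want to see the three rules applied to a whole stored string, here is one possible Java sketch. Everything named here (`KeyValueCodec`, `encode`, `decode`, `readField`, `expect`) is hypothetical and written for this illustration; it assumes single-character delimiters that are distinct and are not the double quote.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical codec: two caller-supplied delimiters plus '"' as the escape character.
final class KeyValueCodec {
    private final char fieldDelim;
    private final char listDelim;

    KeyValueCodec(char fieldDelim, char listDelim) {
        if (fieldDelim == '"' || listDelim == '"' || fieldDelim == listDelim) {
            throw new IllegalArgumentException("delimiters must be distinct and not a double quote");
        }
        this.fieldDelim = fieldDelim;
        this.listDelim = listDelim;
    }

    // Rules 2 and 3: enclose when a special character is present, doubling embedded quotes.
    private String quote(String s) {
        boolean special = s.indexOf(fieldDelim) >= 0 || s.indexOf(listDelim) >= 0 || s.indexOf('"') >= 0;
        return special ? "\"" + s.replace("\"", "\"\"") + "\"" : s;
    }

    String encode(Map<String, String> pairs) {
        StringBuilder out = new StringBuilder();
        boolean first = true;
        for (Map.Entry<String, String> e : pairs.entrySet()) {
            if (!first) out.append(listDelim);
            first = false;
            out.append(quote(e.getKey())).append(fieldDelim).append(quote(e.getValue()));
        }
        return out.toString();
    }

    Map<String, String> decode(String stored) {
        Map<String, String> pairs = new LinkedHashMap<>();
        int[] pos = { 0 }; // cursor shared with the helpers below
        while (pos[0] < stored.length()) {
            String key = readField(stored, pos);
            expect(stored, pos, fieldDelim);
            String value = readField(stored, pos);
            if (pos[0] < stored.length()) expect(stored, pos, listDelim);
            pairs.put(key, value);
        }
        return pairs;
    }

    // Reads one field starting at the cursor and leaves the cursor just past it.
    private String readField(String s, int[] pos) {
        StringBuilder field = new StringBuilder();
        int i = pos[0];
        if (i < s.length() && s.charAt(i) == '"') {
            i++; // skip the opening quote
            while (true) {
                if (i >= s.length()) throw new IllegalArgumentException("unterminated quoted field");
                char c = s.charAt(i);
                if (c != '"') {
                    field.append(c);
                    i++;
                } else if (i + 1 < s.length() && s.charAt(i + 1) == '"') {
                    field.append('"'); // two in a row: one data quote
                    i += 2;
                } else {
                    i++; // closing quote
                    break;
                }
            }
        } else {
            while (i < s.length() && s.charAt(i) != fieldDelim && s.charAt(i) != listDelim) {
                field.append(s.charAt(i++));
            }
        }
        pos[0] = i;
        return field.toString();
    }

    private void expect(String s, int[] pos, char c) {
        if (pos[0] >= s.length() || s.charAt(pos[0]) != c) {
            throw new IllegalArgumentException("expected '" + c + "' at position " + pos[0]);
        }
        pos[0]++;
    }
}
```

A typical use would be `new KeyValueCodec('+', ';')`, then `encode` before writing to the DB and `decode` after reading. The decoder walks the string character by character rather than calling `split`, because a plain split cannot tell a delimiter inside quotes from a real one.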

  • 10. Re: most unique delimiter between key and value.
    FouadAzem Newbie

    Hi ,

     

    Who said that there is only one way to do things in Java?

    I wasn't ignoring your idea, with all due respect; I just thought to do it differently and use a backslash sequence as the escape mechanism.

    Use StringEscapeUtils and write each embedded delimiter as \\delimiter; the real delimiter will be without the \\ before it.

    Earlier I asked about the following string:

     

    ke+y1+val+u;e;+ke;y2++va+lue2;+k+e;y3+value+3

     

    Now each embedded delimiter will be prefixed with \\ and the real delimiter will be without the \\:

    String str = "key\\++value;ke\\;y\\+key+value;key\\+key+value\\;value;key+value";

     

    Please advise.
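For comparison, a backslash-based escape along the lines proposed in this post could be sketched as follows. This is plain JDK code and does not use StringEscapeUtils; the names (`BackslashEscape`, `escape`, `splitUnescaped`, `unescape`) are invented for the example. It assumes single-character delimiters, neither of which is the backslash, and it escapes the backslash itself, without which a key or value ending in a backslash would be misread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical backslash escape: "\" before any delimiter or backslash that is data.
final class BackslashEscape {

    static String escape(String field, char fieldDelim, char listDelim) {
        StringBuilder out = new StringBuilder();
        for (char c : field.toCharArray()) {
            if (c == '\\' || c == fieldDelim || c == listDelim) {
                out.append('\\'); // mark the next character as data
            }
            out.append(c);
        }
        return out.toString();
    }

    // Split on delimiters that are not preceded by an escape; escapes are kept intact.
    static List<String> splitUnescaped(String s, char delim) {
        List<String> parts = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                cur.append(c).append(s.charAt(++i)); // copy the escape pair unchanged
            } else if (c == delim) {
                parts.add(cur.toString());
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        parts.add(cur.toString());
        return parts;
    }

    // Remove the escape characters once the field has been isolated.
    static String unescape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                c = s.charAt(++i);
            }
            out.append(c);
        }
        return out.toString();
    }
}
```

To read a stored string back, one would call `splitUnescaped` with the list delimiter first, then with the field delimiter on each pair, and only then `unescape` each key and value. Unescaping before splitting would reintroduce the original ambiguity.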

  • 11. Re: most unique delimiter between key and value.
    rp0428 Guru

    Who said that there is only one way to do things in Java?

    The proper rules for escaping delimited text have NOTHING to do with Java. Those rules have existed for over 40 years and I gave them to you above.

    I wasn't ignoring your idea, with all due respect; I just thought to do it differently and use a backslash sequence as the escape mechanism.

    If you want to do it 'differently' then go ahead. You are on your own. If you want it to work you have to follow the rules I provided above. Those aren't my rules; those are THE rules.

    Use StringEscapeUtils and write each embedded delimiter as \\delimiter; the real delimiter will be without the \\ before it.

    So read the API doc for the 'escapeCsv' method:

    http://commons.apache.org/proper/commons-lang/javadocs/api-3.1/org/apache/commons/lang3/StringEscapeUtils.html#escapeCsv(java.lang.String)

    Returns a String value for a CSV column enclosed in double quotes, if required.

    If the value contains a comma, newline or double quote, then the String value is returned enclosed in double quotes.

     

    Any double quote characters in the value are escaped with another double quote.

    If the value does not contain a comma, newline or double quote, then the String value is returned unchanged.

    Those are EXACTLY the same rules I gave you above IF you use 'comma' and 'newline' as your field and list delimiter characters.

     

    If you use some other delimiters then you need to apply the SAME rules that 'escapeCsv' applies except you need to check for YOUR two delimiters instead of checking for comma and newline.

     

    So I've told you the rules TWICE now. You try to refer me to StringEscapeUtils and as I just showed you above THEY use the same rules I have tried to tell you.

     

    Yet you still insist on wanting to do it 'differently'. Good luck with that. We showed you the correct way to do it but feel free to do it however you want. But if that is what you want to do then you don't really need to ask for help in the forums.

  • 12. Re: most unique delimiter between key and value.
    FouadAzem Newbie

    Hi,

     

    Can you give a real example of how to implement the rules that you specified:

     

     

    1. We are using THREE delimiter/special characters. You said you want TWO delimiters: a field delimiter and a list delimiter. And we need ONE more (the double quote) to act as an escape mechanism.

    2. If you examine a key and it contains ANY of those three delimiter characters then you MUST enclose the key in double quotes; one at the start and one at the end. If the key included an embedded double quote then you must add another one to make two in a row.

    3. If you examine a value and it contains ANY of those three delimiter characters then you MUST enclose the value in double quotes; one at the start and one at the end. If the value included an embedded double quote then you must add another one to make two in a row.

     

     

    Just please take into consideration that the input could be as follows: "" or """" or """"""

  • 13. Re: most unique delimiter between key and value.
    rp0428 Guru

    1. We are using THREE delimiter/special characters. You said you want TWO delimiters: a field delimiter and a list delimiter. And we need ONE more (the double quote) to act as an escape mechanism.

     

    NO - you are NOT using three you are using two just as I explained. Reread what you, yourself, just said.

     

    You are using double quote 'as an escape mechanism', not as a delimiter.

    Can you give a real example of how to implement the rules that you specified:

    I already did that.

    So read the API doc for the 'escapeCsv' method:

    YOU have to do the work. YOU have to read the docs.

     

    Just download the source code for that Apache project and see how their code 'implements the rules' that I gave you.

     

    Everything you need is all just sitting there right in front of you waiting for YOU to take action and use it.

  • 14. Re: most unique delimiter between key and value.
    FouadAzem Newbie

    Yes, I have only two delimiters.

    I should use the double quote as an escape mechanism, not as a delimiter.

    Thank you a lot.
