6 Replies Latest reply: Feb 1, 2013 4:52 AM by 987592

    Make searchable a list of keywords containing special chars, like ".net" ?

    987592
      After many attempts and several exchanges with Endeca support, it looks like there is no easy way to make certain keywords searchable.
      For example, ".net" contains a dot, which is removed before the search, so Endeca returns the same results as for "net".

      The only workaround suggested to us was to alter the data during the Forge run, replacing this expression with a specific term that the calling application would search for and then convert back.
      But we can't do that, as other applications also query our Dgraph and we can't ask each of them to do the replacement in both directions.

      This is why we would appreciate some kind of new configuration file where we could list searchable words containing any character, such as punctuation (as already exists for searchable chars).
      Or any suggestion for doing it without writing a lot of code!
        • 1. Re: Make searchable a list of keywords containing special chars, like ".net" ?
          Michael Peel-Oracle
          If you add "." as a searchable character then a search for ".net" should only return results for ".net" rather than "net" (unless .net doesn't exist and it spell-corrects). Given you've mentioned searchable characters in your post, I'm guessing you are aware of this and actually want to perform term extraction on your data (so domain.net would match a search for ".net") - is that correct?

          Michael
          • 2. Re: Make searchable a list of keywords containing special chars, like ".net" ?
            987592
            Hi Michael,

            If we add "." as a searchable char, then any word ending with a dot can no longer be found without the dot: for example, "java" won't be found anymore, but "java." will.

            Sébastien
            • 3. Re: Make searchable a list of keywords containing special chars, like ".net" ?
              Michael Peel-Oracle
              That is correct - if you make "." searchable, then the engine will treat all occurrences equally - "java" would match "java", but not "java." (unless there were no occurrences of "java", in which case it would spell-correct to "java."). If you want "." to be indexed for certain words but not others, do you know what those words are in advance, i.e. do you have a pre-defined list of them? If so, I would look at doing term extraction - that is the only way I can think of to get what you want without creating issues elsewhere (depending on your data).
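
              To make the trade-off concrete, here is a toy Python model of the behaviour described above. It is a simplification for illustration only: `tokenize` and `searchable_chars` are made-up names, and this is not how the Endeca tokenizer is actually implemented or configured.

```python
import re

def tokenize(text, searchable_chars=""):
    """Toy model: lowercase the text and split it into tokens.

    Letters, digits and any character listed in searchable_chars are kept;
    everything else acts as a separator. Not Endeca's real tokenizer.
    """
    allowed = "a-z0-9" + re.escape(searchable_chars)
    return re.findall(f"[{allowed}]+", text.lower())

text = "We use Java. We also use .NET and Java daily."

# Default: "." is not searchable, so ".NET" is indexed as "net".
default_tokens = tokenize(text)

# "." searchable: ".net" is kept, but "Java." is now a different token from "Java".
dot_tokens = tokenize(text, searchable_chars=".")

print(default_tokens)
print(dot_tokens)
```

              In this model, making "." searchable preserves ".net" but also turns the sentence-final "Java." into its own token, which is the side effect Sébastien describes.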


              Michael
              • 4. Re: Make searchable a list of keywords containing special chars, like ".net" ?
                987592
                Michael,

                Yes, we know in advance the terms we want to be searchable. It's a very short list:
                - With a dot, the only one we know of is ".net"; it's by far the most important one to make searchable (customers ask for this every day)
                - With other special chars, there may be "$U", "C++" or "C#" (we added "+" and "#" as searchable chars and it works fine, but given the choice we would prefer to list only these terms separately and remove "+" and "#" as searchable chars)

                Having control over this list, so that we can update it when needed (some brand names may be useful), would be perfect. But the basic need is only for ".net".

                I did not understand what you mean by "term extraction"?

                Sébastien
                • 5. Re: Make searchable a list of keywords containing special chars, like ".net" ?
                  Michael Peel-Oracle
                  Hi Sébastien

                  Term extraction is when you monitor the data during ingest and pull out specific terms you are interested in. It is usually done to extract metadata from unstructured data (e.g. if "java" appears in the text of an uploaded CV, you would "extract" that term into a separate property, and map that separate property to your "Programming Languages" dimension).

                  I appreciate you want some solution that just works out of the box, but unfortunately given the use of "." as a sentence end, there is no way (or at least, none I can think of) that you can make "." a searchable character only if it appears at the start of the word "net" and not in any other cases. The way I would handle it would be to:
                  1) Add a manipulator (Java or Perl) to the pipeline that takes a pre-defined list of terms and loops through all (relevant) properties on the records
                  2) Any matches for those terms in the data get tagged to a new property on the record
                  3) If a match contains a non-searchable, non-alphanumeric character, have these characters replaced with searchable ones, e.g. "." -> "~", so .net -> ~net
                  4) Repeat the transformation step in (3) on incoming search terms in your web application (e.g. "." -> "~", so .net becomes ~net)
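
                  As a rough illustration of steps 1-4, here is a minimal sketch in plain Python rather than an actual Forge manipulator (which would be written in Java or Perl). Everything named here is hypothetical: the term list, the `special_terms` property, the record-as-dictionary shape and the helper functions are assumptions for the example, not Endeca APIs. It also assumes "~", "+" and "#" are configured as searchable characters, and that a term only counts when it is not glued to other letters or digits (so "domain.net" is not tagged).

```python
import re

# Hypothetical pre-defined list of terms to extract (step 1).
SPECIAL_TERMS = [".net", "c++", "c#"]

# Step 3: map non-searchable characters to searchable ones ("." -> "~").
CHAR_MAP = str.maketrans({".": "~"})

def encode(term):
    """Lowercase a term and replace non-searchable characters."""
    return term.lower().translate(CHAR_MAP)

def extract_terms(record, source_props, target_prop="special_terms"):
    """Steps 1-3: scan the given properties and tag matches on a new property.

    `record` is modelled as a plain dict of property name -> string value.
    """
    found = []
    for prop in source_props:
        text = record.get(prop, "").lower()
        for term in SPECIAL_TERMS:
            # Match the term only when it stands alone, not inside a longer word.
            pattern = r"(?<![a-z0-9])" + re.escape(term) + r"(?![a-z0-9])"
            if re.search(pattern, text) and encode(term) not in found:
                found.append(encode(term))
    if found:
        record[target_prop] = found
    return record

def rewrite_query(query):
    """Step 4: apply the same replacement to search terms in the web application."""
    words = []
    for word in query.split():
        words.append(encode(word) if word.lower() in SPECIAL_TERMS else word)
    return " ".join(words)

record = {"title": "Senior .NET developer", "body": "Java. C# and domain.net"}
extract_terms(record, ["title", "body"])
print(record["special_terms"])
print(rewrite_query("senior .NET developer"))
```

                  The same `encode` step has to run on both sides, which is why every application querying the Dgraph would need the step 4 transformation.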

                  Regards

                  Michael
                  • 6. Re: Make searchable a list of keywords containing special chars, like ".net" ?
                    987592
                    Michael,

                    Thank you for your answer.

                    Unfortunately, as I explained at the beginning of my first post, we can't ask every web application using the same Dgraph to add an on-the-fly transformation.

                    We also use snippeting, which indicates the property each snippet comes from. If we duplicate data into a new property dedicated to .net, we lose the information about the originating property (we could imagine storing this information in the new property itself, as something like a JSON record, but that is far too complicated for the web apps using our Dgraph).

                    Sebastien

                    Edited by: 984589 on Feb 1, 2013 02:52