5 Replies. Latest reply: Sep 4, 2012 12:56 PM by jfarris - oracle

    properties file for pipeline?

    951547
      Hi, we have an i18n project and we are going to set up a separate Endeca instance for each language. We would like to reuse the same pipeline, but some of its data is language-specific, e.g. the CAS port is 8500 for English but would be 8501 for French, and so on. Since Eden has been unavailable for a few weeks now, I can't research whether we can use a properties file with the pipeline, so that instead of

      <RECORD_ADAPTER COL_DELIMITER="" DIRECTION="INPUT" FILTER_EMPTY_PROPS="FALSE" FORMAT="JAVA_ADAPTER" FRC_PVAL_IDX="FALSE" JAVA_CLASSNAME="com.endeca.itl.recordstore.forge.RecordStoreSource" JAVA_CLASSPATH="/opt/endeca/CAS/2.2.1/lib/recordstore-forge-adapter/recordstore-forge-adapter-2.2.1.jar" MULTI="FALSE" NAME="DocDataSheets" PREFIX="" REC_DELIMITER="" REQUIRE_DATA="FALSE" ROW_DELIMITER="" STATE="FALSE" URL="">
      <COMMENT></COMMENT>
      <PASS_THROUGH NAME="HOST">localhost</PASS_THROUGH>
      <PASS_THROUGH NAME="PORT">8500</PASS_THROUGH>
      <PASS_THROUGH NAME="READ_TYPE">BASELINE</PASS_THROUGH>
      <PASS_THROUGH NAME="CLIENT_ID">forge_client_2</PASS_THROUGH>
      <PASS_THROUGH NAME="INSTANCE_NAME">DocDataSheets-rs</PASS_THROUGH>
      </RECORD_ADAPTER>

      we could just put

      <RECORD_ADAPTER COL_DELIMITER="" DIRECTION="INPUT" FILTER_EMPTY_PROPS="FALSE" FORMAT="JAVA_ADAPTER" FRC_PVAL_IDX="FALSE" JAVA_CLASSNAME="com.endeca.itl.recordstore.forge.RecordStoreSource" JAVA_CLASSPATH="/opt/endeca/CAS/2.2.1/lib/recordstore-forge-adapter/recordstore-forge-adapter-2.2.1.jar" MULTI="FALSE" NAME="DocDataSheets" PREFIX="" REC_DELIMITER="" REQUIRE_DATA="FALSE" ROW_DELIMITER="" STATE="FALSE" URL="">
      <COMMENT></COMMENT>
      <PASS_THROUGH NAME="HOST">localhost</PASS_THROUGH>
      <PASS_THROUGH NAME="PORT">${casPort}</PASS_THROUGH>
      <PASS_THROUGH NAME="READ_TYPE">BASELINE</PASS_THROUGH>
      <PASS_THROUGH NAME="CLIENT_ID">forge_client_2</PASS_THROUGH>
      <PASS_THROUGH NAME="INSTANCE_NAME">DocDataSheets-rs</PASS_THROUGH>
      </RECORD_ADAPTER>

      and put the values in a properties file.

      Is this possible to do with the pipeline? Thanks
        • 1. Re: properties file for pipeline?
          Michael Peel-Oracle
          Yes, you can use entities - for example &instance_name; - in the pipeline.epx, then pass in the value for &instance_name; as a forge command line key/value, for example in your <forge /> element in the AppConfig.xml:
          <arg>-c</arg>
          <arg>instance_name=DocDataSheets-rs-FR</arg>

          One caveat is that Developer Studio - on save - removes entities from the pipeline. To get around this, the usual in-field approach is to:
          1) add the placeholder in a format other than an entity, e.g. @instance_name@
          2) create a simple script that replaces any instance of @variable@ with &variable;
          3) add the script to run in the BaselineUpdate script prior to the forge running
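The placeholder conversion in step 2 can be sketched in a few lines of Python. This is only an illustration, not the script used in the field: the function name `placeholders_to_entities` and the rule for what counts as a variable name are assumptions, and the pattern would also rewrite any other text that happens to sit between two @ signs.

```python
import re

# Assumed naming rule: a letter or underscore, then letters, digits, "_", "." or "-".
PLACEHOLDER = re.compile(r"@([A-Za-z_][A-Za-z0-9_.-]*)@")


def placeholders_to_entities(text):
    """Rewrite every @name@ placeholder as the XML entity reference &name;."""
    return PLACEHOLDER.sub(r"&\1;", text)


if __name__ == "__main__":
    line = '<PASS_THROUGH NAME="INSTANCE_NAME">@instance_name@</PASS_THROUGH>'
    print(placeholders_to_entities(line))
```

Running the conversion on a copy of pipeline.epx, rather than on the file that Developer Studio edits, keeps the @name@ form intact in the saved pipeline.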

          Note that I've used instance_name as the variable in the example above. I don't think you need different CAS ports: if I remember correctly, CAS works as a single installation, and you would define multiple crawling instances (one for each language, in your case). You might also need to parameterise the CLIENT_ID value so that it is unique, if you run different language baselines in parallel.

          Michael
          • 2. Re: properties file for pipeline?
            951547
            Hi Michael,

            We are still getting some errors using your way, but it gave us another idea to work on. We decided to try to replace the values first inside the pipeline using a script before actually running the baseline for the specified language. We'll see how this goes.
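A minimal sketch of that substitute-before-baseline idea, assuming the ${casPort} placeholder style from the original question and a plain key=value properties file per language. The function names and the file format are hypothetical; nothing here is an Endeca API.

```python
from string import Template


def parse_properties(text):
    """Parse simple key=value lines; blank lines and # comments are skipped."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def render_pipeline(template_text, props):
    """Fill ${name} placeholders; unknown names and stray '$' are left as-is."""
    return Template(template_text).safe_substitute(props)


if __name__ == "__main__":
    french = parse_properties("# French instance\ncasPort=8501\n")
    print(render_pipeline('<PASS_THROUGH NAME="PORT">${casPort}</PASS_THROUGH>', french))
```

Writing the rendered text to a separate working copy before forge runs would leave the shared template pipeline untouched between languages.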
            • 3. Re: properties file for pipeline?
              jfarris - oracle
              It looks like there might be a limitation on what entities forge will replace using the -c param.

              I was able to get this working using &default_path; which is one of the entities defined in Endeca_Root/conf/dtd/common.dtd.

              I haven't tested it out, but you might be able to update the .dtd to include your new entity type, or use the -d param to point to your own custom .dtd.

              From the ForgeGuide.pdf

              Forge has a set of XML entity definitions whose values can be overridden at the command line, such as current_date, current_time, and end_of_line. You can specify a replacement string for the default entity values using the -c option, or in an .ini file specified with -i (described below).
              The format is:
              <configValName=configVal>
              For example:
              end_of_line="\n"
              which would be specified on the command line with:
              -c end_of_line="\n"

              or included as a line in an .ini file specified with -i.
              This allows you to assign pipeline values to Forge at the command line. In the above example, you would specify &end_of_line; in your pipeline file instead of hard-coding "\n", then invoke Forge with the -c option shown above. Forge would substitute "\n" whenever it encountered &end_of_line;.
              For a complete list of entities and their default values, see the ENTITY definitions in Endeca_Root/conf/dtd/common.dtd
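For a per-language setup, the -c pairs could be generated from a small mapping instead of being typed by hand. The sketch below only builds the name=value arguments and the matching <arg> lines in the form shown earlier in this thread; the helper names are hypothetical, and it assumes forge will accept the entity (see the limitation noted above).

```python
from xml.sax.saxutils import escape


def forge_entity_args(values):
    """Turn {"name": "value"} into ["-c", "name=value", ...]."""
    args = []
    for name, value in values.items():
        args += ["-c", f"{name}={value}"]
    return args


def as_appconfig_args(args):
    """Render each argument as an <arg> element, escaping XML special characters."""
    return "\n".join(f"<arg>{escape(a)}</arg>" for a in args)


if __name__ == "__main__":
    french = {"instance_name": "DocDataSheets-rs-FR"}
    print(as_appconfig_args(forge_entity_args(french)))
```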
              • 4. Re: properties file for pipeline?
                Michael Peel-Oracle
                Ah yes, forgot about that. The perl script that handles the replacement logic also adds any variable definitions to the DTD spec. (You add the variables as @variable-name@, and the script converts them to &variable-name; in the temporary pipeline.epx in ./data/processing just before the forge component runs; using @variable-name@ avoids Developer Studio removing &variable-name; references on save.) The top of the pipeline.epx therefore gets rewritten to look like:
                <!DOCTYPE PIPELINE SYSTEM "pipeline.dtd" [
                <!ENTITY variable-name "">
                ]>
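The DOCTYPE rewrite can be sketched as follows. This is not Matt's perl script, which is not reproduced in this thread; it is a Python illustration that assumes the pipeline starts with a plain `<!DOCTYPE PIPELINE SYSTEM "...">` line with no internal subset, and the function name `declare_entities` is hypothetical.

```python
import re

# Matches a DOCTYPE that has no internal subset yet, e.g.
# <!DOCTYPE PIPELINE SYSTEM "pipeline.dtd">
DOCTYPE = re.compile(r'<!DOCTYPE\s+PIPELINE\s+SYSTEM\s+"([^"]+)"\s*>')


def declare_entities(text, names):
    """Add an empty <!ENTITY name ""> declaration for each name to the DOCTYPE."""
    if not names:
        return text
    decls = "\n".join(f'<!ENTITY {name} "">' for name in names)

    def add_subset(match):
        return f'<!DOCTYPE PIPELINE SYSTEM "{match.group(1)}" [\n{decls}\n]>'

    new_text, count = DOCTYPE.subn(add_subset, text, count=1)
    if count == 0:
        raise ValueError("no plain DOCTYPE PIPELINE declaration found")
    return new_text


if __name__ == "__main__":
    print(declare_entities('<!DOCTYPE PIPELINE SYSTEM "pipeline.dtd">', ["variable-name"]))
```

The declarations are left empty, as in the snippet above, on the assumption that the real values are supplied to forge with -c at run time.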

                This approach was put together by Matt Brandwein a few years back. He uploaded a perl script that does this to Eden, along with a simple usage guide (it was primarily meant to get around having to include passwords for ODBC connections, if I remember rightly). I'm not sure whether it has been migrated to the Oracle forums yet; it would be worth a look if so.

                Michael
                • 5. Re: properties file for pipeline?
                  jfarris - oracle
                  That makes sense, thanks for clarifying that, Michael. I'll give that a try, since it's not really the cleanest approach to use the pre-defined entities for this.

                  Would've been great to have Matt's perl still - I think code snippets were pulled from all those EDeN threads before moving to OTN.