1 Reply Latest reply: Aug 10, 2013 2:51 AM by tsuji

    How to check the data for duplicates in xml

    962769

      Hi all,

       

      I have an XML document similar to the one below. I need an XQuery that removes tags whose data is the same. For example, in the first <customer> below, the <houseno> of the first <address> and the <houseno> of the second <address> are the same; in that case there should be only one <houseno> tag with that data in the output XML. Please see the input and output XML formats below.

      I am able to produce the output XML, but with the same <houseno> repeated. I cannot find a way to check the data and stop the duplicate tag from being created in the output.

      Could you please suggest how I can do this? It would be a great help. Thanks a ton in advance.

       

      Input XML

      <customers>
          <customer>
              <address>
                  <houseno>212</houseno>
                  <phone>121221</phone>
              </address>
              <address>
                  <houseno>212</houseno>
                  <phone>42334</phone>
              </address>
          </customer>
          <customer>
              <address>
                  <houseno>3243</houseno>
                  <phone>6565</phone>
              </address>
              <address>
                  <houseno>3434</houseno>
                  <phone>78778</phone>
              </address>
          </customer>
      </customers>

       

      Output XML Expected

       

      <customers>
          <customer>
              <address>
                  <houseno>212</houseno>
                  <phone>121221</phone>
                  <phone>42334</phone>
              </address>
          </customer>
          <customer>
              <address>
                  <houseno>3243</houseno>
                  <phone>6565</phone>
                  <houseno>3434</houseno>
                  <phone>78778</phone>
              </address>
          </customer>
      </customers>

       

      Output XML Which I am getting

       

      <customers>
          <customer>
              <address>
                  <houseno>212</houseno>
                  <houseno>212</houseno>
                  <phone>121221</phone>
                  <phone>42334</phone>
              </address>
          </customer>
          <customer>
              <address>
                  <houseno>3243</houseno>
                  <phone>6565</phone>
                  <houseno>3434</houseno>
                  <phone>6565</phone>
              </address>
          </customer>
      </customers>

       

      Regards

        • 1. Re: How to check the data for duplicates in xml
          tsuji

          First of all, about the desired output:

          [quote]

          <customers>
              <customer>
                  <address>
                      <houseno>212</houseno>
                      <phone>121221</phone>
                      <phone>42334</phone>
                  </address>
              </customer>
              <customer>
                  <address>
                      <houseno>3243</houseno>
                      <phone>6565</phone>
                      <houseno>3434</houseno>
                      <phone>78778</phone>
                  </address>
              </customer>
          </customers>

          [/quote]

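          I don't think this is a very good choice, and it is likely to cause no end of trouble at a later stage of using the data. In the second <customer>, <houseno> and <phone> are flat siblings, so which phone belongs to which house number can only be inferred from the order of the elements.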

           

          I would rather propose a structure like this, which to my thinking is a better choice.

          [code]

          <customers>
              <customer>
                  <address>
                      <house houseno="212">
                          <phone>121221</phone>
                          <phone>42334</phone>
                      </house>
                  </address>
              </customer>
              <customer>
                  <address>
                      <house houseno="3243">
                          <phone>6565</phone>
                      </house>
                      <house houseno="3434">
                          <phone>78778</phone>
                      </house>
                  </address>
              </customer>
          </customers>

          [/code]
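
          To make the difference concrete, here is a small sketch of looking up the phones for house number 3434 in each layout. The inline <address> fragments and the variable names $flat and $nested are only illustrative (they are not part of the original data file); they mirror the second customer in the flat and the nested form.

```xquery
(: Hypothetical inline data: the second customer's address in both layouts. :)
let $flat :=
    <address>
        <houseno>3243</houseno>
        <phone>6565</phone>
        <houseno>3434</houseno>
        <phone>78778</phone>
    </address>
let $nested :=
    <address>
        <house houseno="3243"><phone>6565</phone></house>
        <house houseno="3434"><phone>78778</phone></house>
    </address>
return (
    (: Flat layout: a phone belongs to its nearest preceding <houseno> sibling. :)
    $flat/phone[preceding-sibling::houseno[1] = "3434"],
    (: Nested layout: the phone is simply a child of its <house>. :)
    $nested/house[@houseno = "3434"]/phone
)
```

          Both expressions select the phone 78778, but the flat lookup only works as long as the sibling order is preserved, whereas the nested one does not depend on order at all.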

          In that case, the following query produces the regrouped output.

          [code]

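          <customers>{
              (: replace "your_data.xml" with the path to the input document :)
              let $doc := doc("your_data.xml")
              for $customer in $doc/customers/customer
              return
              <customer>
                  <address>{
                      (: one <house> per distinct house number of this customer :)
                      for $houseno in distinct-values($customer/address/houseno)
                      return
                      <house houseno="{$houseno}">{
                          (: collect the phones of every address with that house number :)
                          for $phone in $customer/address[houseno = $houseno]/phone
                          return
                          <phone>{data($phone)}</phone>
                      }</house>
                  }</address>
              </customer>
          }</customers>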

          [/code]
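
          If the flat output from the question really is required, the same distinct-values() idea can be adapted. The following is only a sketch under the same assumptions ("your_data.xml" is a placeholder for the input document): it emits each distinct <houseno> once, followed by the phones of every <address> that carries that number.

```xquery
(: Sketch only: "your_data.xml" is a placeholder for the input document. :)
<customers>{
    for $customer in doc("your_data.xml")/customers/customer
    return
    <customer>
        <address>{
            (: each distinct house number appears once per customer :)
            for $houseno in distinct-values($customer/address/houseno)
            return (
                <houseno>{$houseno}</houseno>,
                (: the matching <phone> elements are copied as they are :)
                $customer/address[houseno = $houseno]/phone
            )
        }</address>
    </customer>
}</customers>
```

          Note that the XQuery specification leaves the order of the values returned by distinct-values() implementation-dependent, so the house numbers may not necessarily come out in document order on every processor.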