8 Replies Latest reply: Nov 15, 2012 7:14 AM by don123 RSS

    duplicate geometry

    don123
      hi

      I want to store duplicate geometry intentionally, will it have any problems ??
      Because same data comes from different sources and going to be big database system.
      I already posted this question in this forum, but i am posting again to get more suggestions.

      regards
      syed masood
        • 1. Re: duplicate geometry
          John O'Toole
          Hi Syed,
          I want to store duplicate geometry intentionally, will it have any problems ??.... I already posted this question in this forum, but i am posting again to get more suggestions.
          If you didn't get enough suggestions first time around, it is probably because you are really the only person who can answer that question. Depending on your exact circumstances the answer could be "Yes", "No" or "It won't make much difference".
          Because same data comes from different sources and going to be big database system
          How about you give us some more useful context.

          - What sort of data are we talking about? Roads? Points of Interest? Cadastral data?....
          - How many different data sources?
          - What are the rough data volumes you will be receiving?
          - How often will you receive the data from these data sources?
          - How many duplicates are you expecting?
          - Why do you not want to deduplicate?
          - What will the data be used for? Will the users of the data care about which data source it came from? Will the users of the data care if it is duplicated?

          John
          • 2. Re: duplicate geometry
            Simon Greener
            Syed,

            Your question can be turned around to point to a more generic question (for remember: "spatial is not special"!).

            Do database tables with date columns mind if you store the same date more than once? Same for numbers, strings etc?

            Clearly, the answer is no !

            Attribute Uniqueness

            Now, if a duplicate date was not allowed for the same observation of a plant in a table called FLORA, we would stop this by creating a unique constraint on the table:
            ALTER TABLE flora ADD CONSTRAINT flora_plant_uq UNIQUE(plant_id,obs_date);
            So the issue about what constitutes valid duplication or not is the way you implement constraints on the table.

            Constraints and Models

            Constraints reflect the "business rules" that constrain the data you are storing so that it is "fit for [your] purpose"

            So, in the end, it is all about the data model.

            One should start all data storage by creating a data model that exposes the rules you want for the data.

            So, for our OBS_DATE column in our FLORA table we may wish that the values entered are valid. But we define valid to mean from 1st January 2005 onwards.
            alter table flora add constraint flora_obs_date_ck check(obs_date >= '01/JAN/2005');
            SQL Developer 3.x has a free data modelling tool built in: it is an excellent tool for designing basic data models.

            Geometry Constraints

            It is difficult to implement uniqueness constraints that involve sdo_geometry data. Perhaps one day our SQL/OGC standards bodies and implementing database vendors will catch up on this problem.

            Even so, we can do something. For example, if the flora table had an sdo_geometry column, called location, which was a point, this will work:
            alter table flora add constraint flora_location_uq unique (location.sdo_point.x,location.sdo_point.y);
            -- or with other columns
            alter table flora add constraint flora_location_uq unique (plant_id,obs_date,location.sdo_point.x,location.sdo_point.y);
            However, this would not work for linestring/polygon data. For example, let's assume the FLORA table allowed observations of a plant to include a patch ie not a single plant (eg a tree) but an area of blackberries. We can't do the following because one cannot reference a function in the constraint:
            alter table flora add constraint flora_location_uq unique (plant_id,obs_date,location.get_wkt());
            OK, so how can we do this?

            AFAIK, we can only do this via a trigger:
            create or replace trigger flora_location_ai 
            after insert on flora 
            for each row 
            Declare
              v_ok pls_integer;
            begin
              SELECT 1
                INTO v_ok
                FROM flora g
               WHERE g.plant_id = new.plant_id
                 AND g.obs_date = new.obs_date
                 AND SDO_EQUAL(g.geom,:new.geom) = 'TRUE';
              -- We found a duplicate, so raise an error
              raise_application_error(-20001,'Duplicate location found for (' || :new.plant_id || ',' || new.obs_date || ')');
              EXCEPTION
                 WHEN NO_DATA_FOUND THEN
                     RETURN; -- No duplicates
            End;
            /
            Summary

            So, in short, you can store anything in any Oracle table. In a table with 1 million rows you could have:

            1. Only 2 unique values of an column called SPECIES.
            2. 20,000 unique OBS_DATE values.
            3. 200,00 unique LOCATION values.

            But whether this is right or not (garbage) depends on the model and the constraints that represent "correctness".

            So, Syed, start with the model.

            regards
            Simon

            Edited by: Simon Greener on Nov 10, 2012 10:23 AM
            • 3. Re: duplicate geometry
              don123
              john

              As you said, the features include roads, admin areas and point of interests, but from 4 to 5 different data sources, they provide same information and so requires to keep in same table and differentiate the records by using attribute, for example, roads features provided by different data sources, logical to keep them in one table.

              The frequency may be 3-6 months, data source is priority than duplication

              simon

              thanks for your detailed explanation.

              regards
              • 4. Re: duplicate geometry
                John O'Toole
                data source is priority than duplication
                In that case it sounds like you might need to store the duplicates. An alternative would be to store one version of each geometry and add metadata that indicates all the data sources that provided that geometry. However that then complicates your loading process as you need to detect duplicates, set the metadata appropriately and then deal with it again when data is supplied again.

                John
                • 5. Re: duplicate geometry
                  don123
                  john / simon

                  Thanks for explaining that duplicate geometry is NOT going to be invalid geometry.
                  However, if i want to implement topology data model, network data model, will it cause any problem ?
                  Do i need to create separate tables or views according data source to implement topology and network models.

                  regards
                  • 6. Re: duplicate geometry
                    don123
                    Simon / John

                    Do I need to create separate tables or view based on data source to work with topology and network data models ?

                    Please suggest..

                    regards
                    • 7. Re: duplicate geometry
                      John O'Toole
                      Hi nameless,

                      You are deviating from the original question here a bit and the questions are getting very general and therefore more difficult to answer. My suggestion is to do some prototyping of the scenario you want to implement and then when you get stuck, come back here with specific questions including reproducible test cases that allow others to respond easily.

                      John
                      • 8. Re: duplicate geometry
                        don123
                        hi john, i present the example below.


                        road_name datasource geometry
                        ------------------------------------------------------------------
                        NH10 ABC mdsys.sdo_geometry
                        NH10 XYZ mdsys.sdo_geometry
                        NH10 PQR mdsys.sdo_geometry



                        i have same line feature, representing natioinal highway 10, from three different data sources and it has same geometry (duplicate geometry).

                        If i want to implement topology or some solutions like network data model, i need to use the data from only one data source.
                        I am storing the geometry in same 'road' table with datasource as attribute.
                        In this case, i need your suggestions so that i build good data model.
                        Whether to store road geometry in separate tables (road_abc, road_xyz, road_pqr) according to data source or store in same table ??
                        Hope i explained the problem..

                        regards