This discussion is archived
8 Replies Latest reply: Nov 15, 2012 5:14 AM by don123 RSS

duplicate geometry

don123 Newbie
Currently Being Moderated
hi

I want to store duplicate geometry intentionally, will it have any problems ??
Because same data comes from different sources and going to be big database system.
I already posted this question in this forum, but i am posting again to get more suggestions.

regards
syed masood
  • 1. Re: duplicate geometry
    John O'Toole Journeyer
    Currently Being Moderated
    Hi Syed,
    I want to store duplicate geometry intentionally, will it have any problems ??.... I already posted this question in this forum, but i am posting again to get more suggestions.
    If you didn't get enough suggestions first time around, it is probably because you are really the only person who can answer that question. Depending on your exact circumstances the answer could be "Yes", "No" or "It won't make much difference".
    Because same data comes from different sources and going to be big database system
    How about you give us some more useful context.

    - What sort of data are we talking about? Roads? Points of Interest? Cadastral data?....
    - How many different data sources?
    - What are the rough data volumes you will be receiving?
    - How often will you receive the data from these data sources?
    - How many duplicates are you expecting?
    - Why do you not want to deduplicate?
    - What will the data be used for? Will the users of the data care about which data source it came from? Will the users of the data care if it is duplicated?

    John
  • 2. Re: duplicate geometry
    Simon Greener Journeyer
    Currently Being Moderated
    Syed,

    Your question can be turned around to point to a more generic question (for remember: "spatial is not special"!).

    Do database tables with date columns mind if you store the same date more than once? Same for numbers, strings etc?

    Clearly, the answer is no !

    Attribute Uniqueness

    Now, if a duplicate date was not allowed for the same observation of a plant in a table called FLORA, we would stop this by creating a unique constraint on the table:
    ALTER TABLE flora ADD CONSTRAINT flora_plant_uq UNIQUE(plant_id,obs_date);
    So the issue about what constitutes valid duplication or not is the way you implement constraints on the table.

    Constraints and Models

    Constraints reflect the "business rules" that constrain the data you are storing so that it is "fit for [your] purpose"

    So, in the end, it is all about the data model.

    One should start all data storage by creating a data model that exposes the rules you want for the data.

    So, for our OBS_DATE column in our FLORA table we may wish that the values entered are valid. But we define valid to mean from 1st January 2005 onwards.
    alter table flora add constraint flora_obs_date_ck check(obs_date >= '01/JAN/2005');
    SQL Developer 3.x has a free data modelling tool built in: it is an excellent tool for designing basic data models.

    Geometry Constraints

    It is difficult to implement uniqueness constraints that involve sdo_geometry data. Perhaps one day our SQL/OGC standards bodies and implementing database vendors will catch up on this problem.

    Even so, we can do something. For example, if the flora table had an sdo_geometry column, called location, which was a point, this will work:
    alter table flora add constraint flora_location_uq unique (location.sdo_point.x,location.sdo_point.y);
    -- or with other columns
    alter table flora add constraint flora_location_uq unique (plant_id,obs_date,location.sdo_point.x,location.sdo_point.y);
    However, this would not work for linestring/polygon data. For example, let's assume the FLORA table allowed observations of a plant to include a patch ie not a single plant (eg a tree) but an area of blackberries. We can't do the following because one cannot reference a function in the constraint:
    alter table flora add constraint flora_location_uq unique (plant_id,obs_date,location.get_wkt());
    OK, so how can we do this?

    AFAIK, we can only do this via a trigger:
    create or replace trigger flora_location_ai 
    after insert on flora 
    for each row 
    Declare
      v_ok pls_integer;
    begin
      SELECT 1
        INTO v_ok
        FROM flora g
       WHERE g.plant_id = new.plant_id
         AND g.obs_date = new.obs_date
         AND SDO_EQUAL(g.geom,:new.geom) = 'TRUE';
      -- We found a duplicate, so raise an error
      raise_application_error(-20001,'Duplicate location found for (' || :new.plant_id || ',' || new.obs_date || ')');
      EXCEPTION
         WHEN NO_DATA_FOUND THEN
             RETURN; -- No duplicates
    End;
    /
    Summary

    So, in short, you can store anything in any Oracle table. In a table with 1 million rows you could have:

    1. Only 2 unique values of an column called SPECIES.
    2. 20,000 unique OBS_DATE values.
    3. 200,00 unique LOCATION values.

    But whether this is right or not (garbage) depends on the model and the constraints that represent "correctness".

    So, Syed, start with the model.

    regards
    Simon

    Edited by: Simon Greener on Nov 10, 2012 10:23 AM
  • 3. Re: duplicate geometry
    don123 Newbie
    Currently Being Moderated
    john

    As you said, the features include roads, admin areas and point of interests, but from 4 to 5 different data sources, they provide same information and so requires to keep in same table and differentiate the records by using attribute, for example, roads features provided by different data sources, logical to keep them in one table.

    The frequency may be 3-6 months, data source is priority than duplication

    simon

    thanks for your detailed explanation.

    regards
  • 4. Re: duplicate geometry
    John O'Toole Journeyer
    Currently Being Moderated
    data source is priority than duplication
    In that case it sounds like you might need to store the duplicates. An alternative would be to store one version of each geometry and add metadata that indicates all the data sources that provided that geometry. However that then complicates your loading process as you need to detect duplicates, set the metadata appropriately and then deal with it again when data is supplied again.

    John
  • 5. Re: duplicate geometry
    don123 Newbie
    Currently Being Moderated
    john / simon

    Thanks for explaining that duplicate geometry is NOT going to be invalid geometry.
    However, if i want to implement topology data model, network data model, will it cause any problem ?
    Do i need to create separate tables or views according data source to implement topology and network models.

    regards
  • 6. Re: duplicate geometry
    don123 Newbie
    Currently Being Moderated
    Simon / John

    Do I need to create separate tables or view based on data source to work with topology and network data models ?

    Please suggest..

    regards
  • 7. Re: duplicate geometry
    John O'Toole Journeyer
    Currently Being Moderated
    Hi nameless,

    You are deviating from the original question here a bit and the questions are getting very general and therefore more difficult to answer. My suggestion is to do some prototyping of the scenario you want to implement and then when you get stuck, come back here with specific questions including reproducible test cases that allow others to respond easily.

    John
  • 8. Re: duplicate geometry
    don123 Newbie
    Currently Being Moderated
    hi john, i present the example below.


    road_name datasource geometry
    ------------------------------------------------------------------
    NH10 ABC mdsys.sdo_geometry
    NH10 XYZ mdsys.sdo_geometry
    NH10 PQR mdsys.sdo_geometry



    i have same line feature, representing natioinal highway 10, from three different data sources and it has same geometry (duplicate geometry).

    If i want to implement topology or some solutions like network data model, i need to use the data from only one data source.
    I am storing the geometry in same 'road' table with datasource as attribute.
    In this case, i need your suggestions so that i build good data model.
    Whether to store road geometry in separate tables (road_abc, road_xyz, road_pqr) according to data source or store in same table ??
    Hope i explained the problem..

    regards

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points