This content has been marked as final. Show 8 replies
I want to store duplicate geometry intentionally, will it have any problems ??.... I already posted this question in this forum, but i am posting again to get more suggestions.If you didn't get enough suggestions first time around, it is probably because you are really the only person who can answer that question. Depending on your exact circumstances the answer could be "Yes", "No" or "It won't make much difference".
Because same data comes from different sources and going to be big database systemHow about you give us some more useful context.
- What sort of data are we talking about? Roads? Points of Interest? Cadastral data?....
- How many different data sources?
- What are the rough data volumes you will be receiving?
- How often will you receive the data from these data sources?
- How many duplicates are you expecting?
- Why do you not want to deduplicate?
- What will the data be used for? Will the users of the data care about which data source it came from? Will the users of the data care if it is duplicated?
Your question can be turned around to point to a more generic question (for remember: "spatial is not special"!).
Do database tables with date columns mind if you store the same date more than once? Same for numbers, strings etc?
Clearly, the answer is no !
Now, if a duplicate date was not allowed for the same observation of a plant in a table called FLORA, we would stop this by creating a unique constraint on the table:
So the issue about what constitutes valid duplication or not is the way you implement constraints on the table.
ALTER TABLE flora ADD CONSTRAINT flora_plant_uq UNIQUE(plant_id,obs_date);
Constraints and Models
Constraints reflect the "business rules" that constrain the data you are storing so that it is "fit for [your] purpose"
So, in the end, it is all about the data model.
One should start all data storage by creating a data model that exposes the rules you want for the data.
So, for our OBS_DATE column in our FLORA table we may wish that the values entered are valid. But we define valid to mean from 1st January 2005 onwards.
SQL Developer 3.x has a free data modelling tool built in: it is an excellent tool for designing basic data models.
alter table flora add constraint flora_obs_date_ck check(obs_date >= '01/JAN/2005');
It is difficult to implement uniqueness constraints that involve sdo_geometry data. Perhaps one day our SQL/OGC standards bodies and implementing database vendors will catch up on this problem.
Even so, we can do something. For example, if the flora table had an sdo_geometry column, called location, which was a point, this will work:
However, this would not work for linestring/polygon data. For example, let's assume the FLORA table allowed observations of a plant to include a patch ie not a single plant (eg a tree) but an area of blackberries. We can't do the following because one cannot reference a function in the constraint:
alter table flora add constraint flora_location_uq unique (location.sdo_point.x,location.sdo_point.y); -- or with other columns alter table flora add constraint flora_location_uq unique (plant_id,obs_date,location.sdo_point.x,location.sdo_point.y);
OK, so how can we do this?
alter table flora add constraint flora_location_uq unique (plant_id,obs_date,location.get_wkt());
AFAIK, we can only do this via a trigger:
create or replace trigger flora_location_ai after insert on flora for each row Declare v_ok pls_integer; begin SELECT 1 INTO v_ok FROM flora g WHERE g.plant_id = new.plant_id AND g.obs_date = new.obs_date AND SDO_EQUAL(g.geom,:new.geom) = 'TRUE'; -- We found a duplicate, so raise an error raise_application_error(-20001,'Duplicate location found for (' || :new.plant_id || ',' || new.obs_date || ')'); EXCEPTION WHEN NO_DATA_FOUND THEN RETURN; -- No duplicates End; /
So, in short, you can store anything in any Oracle table. In a table with 1 million rows you could have:
1. Only 2 unique values of an column called SPECIES.
2. 20,000 unique OBS_DATE values.
3. 200,00 unique LOCATION values.
But whether this is right or not (garbage) depends on the model and the constraints that represent "correctness".
So, Syed, start with the model.
Edited by: Simon Greener on Nov 10, 2012 10:23 AM
As you said, the features include roads, admin areas and point of interests, but from 4 to 5 different data sources, they provide same information and so requires to keep in same table and differentiate the records by using attribute, for example, roads features provided by different data sources, logical to keep them in one table.
The frequency may be 3-6 months, data source is priority than duplication
thanks for your detailed explanation.
data source is priority than duplicationIn that case it sounds like you might need to store the duplicates. An alternative would be to store one version of each geometry and add metadata that indicates all the data sources that provided that geometry. However that then complicates your loading process as you need to detect duplicates, set the metadata appropriately and then deal with it again when data is supplied again.
john / simon
Thanks for explaining that duplicate geometry is NOT going to be invalid geometry.
However, if i want to implement topology data model, network data model, will it cause any problem ?
Do i need to create separate tables or views according data source to implement topology and network models.
You are deviating from the original question here a bit and the questions are getting very general and therefore more difficult to answer. My suggestion is to do some prototyping of the scenario you want to implement and then when you get stuck, come back here with specific questions including reproducible test cases that allow others to respond easily.
hi john, i present the example below.
road_name datasource geometry
NH10 ABC mdsys.sdo_geometry
NH10 XYZ mdsys.sdo_geometry
NH10 PQR mdsys.sdo_geometry
i have same line feature, representing natioinal highway 10, from three different data sources and it has same geometry (duplicate geometry).
If i want to implement topology or some solutions like network data model, i need to use the data from only one data source.
I am storing the geometry in same 'road' table with datasource as attribute.
In this case, i need your suggestions so that i build good data model.
Whether to store road geometry in separate tables (road_abc, road_xyz, road_pqr) according to data source or store in same table ??
Hope i explained the problem..