I need to create a materialized RDF model from a relational dataset of 28 tables, with around 500 columns and 10 million rows in total.
I'm following these steps to build the RDF graph:
```sql
-- 1. Insert the R2RML mapping triples into the staging table
insert into STAGE_TABLE ...;
-- (14439 rows inserted into STAGE_TABLE)

-- 2. Create a virtual RDF model from this table
execute sem_apis.create_rdfview_model(
  model_name        => 'virtual_model',
  tables            => NULL,
  r2rml_table_owner => 'MYUSER',
  r2rml_table_name  => 'STAGE_TABLE');

-- 3. Export the triples from the virtual model into the staging table
truncate table STAGE_TABLE drop storage;
execute sem_apis.export_rdfview_model(
  model_name      => 'virtual_model',
  rdf_table_owner => 'MYUSER',
  rdf_table_name  => 'STAGE_TABLE');

-- 4. Create the RDF model
execute sem_apis.create_sem_model('mat_model', 'rdf_data', 'triple');

-- 5. Bulk-load the triples from the staging table into the RDF model
execute sem_apis.bulk_load_from_staging_table('mat_model', 'MYUSER', 'STAGE_TABLE');
```
When I tried to build the RDF graph, I ran out of disk space: the server has only 200 GB, and the TEMPORARY tablespace in particular grew very large (~112 GB).
I would like to know:
1.- Why is so much space needed to build the RDF graph? 200 GB was not enough to create it from a relational dataset of 20 GB.
2.- Is there a relationship between the metrics above (number of tables, columns, and rows) and the amount of space that will be required?
Since you have already materialized (exported) the RDF triples from the RDF view, please check count(*) on MYUSER.STAGE_TABLE.
For direct mapping, if a table has C columns and R rows, the number of RDF triples that usually gets generated is (C + 1) * R.
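To get a feel for the numbers, here is a minimal sketch of that estimate. The per-table column and row figures below are invented for illustration (the question only gives the totals of 28 tables, ~500 columns, and ~10 million rows); the formula is the (C + 1) * R from above, where the +1 accounts for the rdf:type triple each row produces.

```python
def direct_mapping_triples(columns: int, rows: int) -> int:
    """Estimated triple count for one table under direct mapping:
    one triple per column value plus one rdf:type triple per row."""
    return (columns + 1) * rows

# Hypothetical per-table figures (the real split across 28 tables is unknown)
tables = [
    (20, 1_000_000),   # (columns, rows)
    (15, 2_500_000),
    (30, 500_000),
]

total = sum(direct_mapping_triples(c, r) for c, r in tables)
print(total)  # 76_500_000 triples for just these three tables
```

Summing this over all 28 tables gives the expected row count of the staging table, which in turn drives the temporary and permanent space the bulk load needs.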
For R2RML mapping (which you appear to be using), it depends on the actual mapping (e.g., how many PredicateObjectMaps are defined, and so on).
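The same back-of-the-envelope logic applies to R2RML, assuming each TriplesMap emits roughly one triple per PredicateObjectMap per row of its logical table, plus one rdf:type triple per row when an rr:class is declared. The TriplesMap figures below are hypothetical:

```python
def r2rml_triples(pom_count: int, rows: int, has_class: bool = True) -> int:
    """Estimated triples for one TriplesMap: one triple per
    PredicateObjectMap per row, plus an rdf:type triple per row
    when the subject map declares an rr:class."""
    return (pom_count + (1 if has_class else 0)) * rows

# Hypothetical TriplesMaps: (PredicateObjectMaps, logical-table rows)
maps = [(12, 1_000_000), (8, 2_500_000)]
estimate = sum(r2rml_triples(p, r) for p, r in maps)
print(estimate)  # 35_500_000 triples for these two maps
```

Counting the PredicateObjectMaps per TriplesMap in your mapping and multiplying by the logical-table row counts gives a reasonable sizing estimate before you export.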