Skip to Main Content

DevOps, CI/CD and Automation

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

Interested in getting your voice heard by members of the Developer Marketing team at Oracle? Check out this post for AppDev or this post for AI focus group information.

Load xml data in Oracle table

841964Feb 23 2011 — edited Jun 28 2013
Hi,

I've already gone through few threads in forums itself, but for my requirement nothing comes closer, I am posting my query. I've one xml file like this
<?xml version="1.0"?>
<ACCOUNT_HEADER_ACK>
<HEADER>
<STATUS_CODE>100</STATUS_CODE>
<STATUS_REMARKS>check</STATUS_REMARKS>
</HEADER>
<DETAILS>
<DETAIL>
<SEGMENT_NUMBER>2</SEGMENT_NUMBER>
<REMARKS>rp polytechnic</REMARKS>
</DETAIL>
<DETAIL>
<SEGMENT_NUMBER>3</SEGMENT_NUMBER>
<REMARKS>rp polytechnic administration</REMARKS>
</DETAIL>
<DETAIL>
<SEGMENT_NUMBER>4</SEGMENT_NUMBER>
<REMARKS>rp polytechnic finance</REMARKS>
</DETAIL>
<DETAIL>
<SEGMENT_NUMBER>5</SEGMENT_NUMBER>
<REMARKS>rp polytechnic logistics</REMARKS>
</DETAIL>
</DETAILS>
<HEADER>
<STATUS_CODE>500</STATUS_CODE>
<STATUS_REMARKS>process exception</STATUS_REMARKS>
</HEADER>
<DETAILS>
<DETAIL>
<SEGMENT_NUMBER>20</SEGMENT_NUMBER>
<REMARKS> base polytechnic</REMARKS>
</DETAIL>
<DETAIL>
<SEGMENT_NUMBER>30</SEGMENT_NUMBER>
</DETAIL>
<DETAIL>
<SEGMENT_NUMBER>40</SEGMENT_NUMBER>
<REMARKS> base polytechnic finance</REMARKS>
</DETAIL>
<DETAIL>
<SEGMENT_NUMBER>50</SEGMENT_NUMBER>
<REMARKS> base polytechnic logistics</REMARKS>
</DETAIL>
</DETAILS>
</ACCOUNT_HEADER_ACK>

Here the xml structure is like Master and child structure I want to insert that data into Oracle tables by using the sql*loader initally I tried to created one control file but in the control file I don't know how to terminate, so I created two control files

load data
infile 'acct.xml' "str '</DETAIL>'"
truncate
into table xxrp_acct_detail
TRAILING NULLCOLS
(
dummy filler terminated by "<DETAIL>",
SEGMENT_NUMBER enclosed by "<SEGMENT_NUMBER>" and "</SEGMENT_NUMBER>",
REMARKS enclosed by "<REMARKS>" and "</REMARKS>"
)


load data
infile acct.xml' "str '</HEADER>'"
truncate
into table xxrp_acct_header
fields terminated by "<HEADER>"
TRAILING NULLCOLS
(
dummy filler terminated by "<HEADER>",
STATUS_CODE enclosed by "<STATUS_CODE>" and "</STATUS_CODE>",
STATUS_REMARKS enclosed by "<STATUS_REMARKS>" and "</STATUS_REMARKS>"
)

I am referring the same xml file in both the control files, where as for the for the first control file I was able to load the records but the second which I assume as header table not able to load the rest records. I am getting the below mentioned error.

Record 2: Rejected - Error on table XXRP_ACCT_HEADER, column DUMMY.
Field in data file exceeds maximum length
Record 3: Rejected - Error on table XXRP_ACCT_HEADER, column DUMMY.
Field in data file exceeds maximum length

Actually if its possible to seggrate in one control file then it will be helpful for me. I am also open for the external table option also. Please help me in this regard.

Thanks in advance.

Regards
Nagendra
This post has been answered by odie_63 on Feb 23 2011
Jump to Answer

Comments

sb92075

if you enable SQL_TRACE, then you'll see where actual time is being spend & can even generate EXPLAIN PLAN for all impactful SQL statements

Hemant K Chitale

>CREATE BITMAP INDEX "QQRCSBI0"."I_S_RMP_EVAL_CSC_MSG_ACTIONS" ON "QQRCSBI0"."S_RMP_EVALUATION_CSC_MESSAGE" (DECODE(INSTR("XML_MESSAGE_TEXT",'<actions>'),0,0,1))

That's strange.  You are creating (a) a Bitmap Index  on (b) a Function  on (c) an out-of-line LOB.

Why do you want to index the LOB ?  Can you store the desired "flag"  (whether "<actions>" exists or not) in a separate column and then index it ?

Hemant K Chitale

Dave Rabone

"I am inserting about 11.000 new records into a table with 5.3 million existing records"

... which is adding 0.2% of the existing rows, so I would not expect the statistics to change significantly, certainly not to the extent that the optimiser would make different decisions. It might be different if you were doing something funky with histograms, but you aren't.

As many have said previously "The quickest way to do something is not to do it at all"

978225

Dave Rabone wrote:

"I am inserting about 11.000 new records into a table with 5.3 million existing records"

... which is adding 0.2% of the existing rows, so I would not expect the statistics to change significantly, certainly not to the extent that the optimiser would make different decisions. It might be different if you were doing something funky with histograms, but you aren't.

As many have said previously "The quickest way to do something is not to do it at all"

Are you saying recollecting the stats is not necessary in the first place?

That may be true for just doing the insert once, but after doing this often enough, I will be so far off that I do need to recollect stats, and then I'll be back to my original problem, won't I?

unknown-7404

A couple of other anomalies:

Why do you have DEFAULT NULL on a column with a NOT NULL constraint?

"DWH_LM_TS_UTC" DATE DEFAULT NULL NOT NULL ENABLE,    

Why are you creating two composite indexes on the same two columns?

"S_RMP_EVALUATION_CSC_MESSAGE" ("ID", "DWH_LM_TS_UTC")   

"S_RMP_EVALUATION_CSC_MESSAGE" ("DWH_LM_TS_UTC", "ID")  

The ID column is already a primary key. So if you provide ID only it will use that index. And if you provide both column values then either of the two indexes will do the job.

978225

Hemant K Chitale wrote:

Why do you want to index the LOB ?  Can you store the desired "flag"  (whether "<actions>" exists or not) in a separate column and then index it ?

Hemant K Chitale

I figured that's what functional indexes are for. And bitmap because it's only 2 possible values, so a regular index wouldn't perform.

I will try your suggestion as a workaround for my problem, but I would still need the bitmap index and I would still have to search through the LOB's value, so it's mostly still the same thing.

Any idea why this is happening in the first place?

Hemant K Chitale

>And bitmap because it's only 2 possible values

If the distribution among the two values (0 and 1) would be equal (50% each), a bitmap index wouldn't really help.  A bitmap index on such a column can be useful if it can be combined with other bitmap indexes on the same table --- which doesn't apply to this table.

If, however, the distribution is, say 90-10 and you are querying for the 10% of rows, a bitmap index can be helpful.

You suspect (or have identified) that the gather_stats has been taking much longer since the introduction of this new index.  Can you evaluate your existing SQL executions to see if the index is being used at all ?  SQL statements must use the same set of decode(instr...) functions to be able to use the index.

A function based index actuall adds a new virtual column to the table gather_table_stats has to read that column.  I wonder how Oracle performs a gather_stats on a function on a LOB.  Never seen it before.

Hemant K Chitale


978225

Why do you have DEFAULT NULL on a column with a NOT NULL constraint?

"DWH_LM_TS_UTC" DATE DEFAULT NULL NOT NULL ENABLE,  

Because I need it NOT NULL and because the only DEFAULT value that would make sense for this case is the result of a user-defined function, which Oracle will not accept as DEFAULT. So this way, I basically force any INSERT to specify a value. If not, the INSERT will fail.

Why are you creating two composite indexes on the same two columns?

"S_RMP_EVALUATION_CSC_MESSAGE" ("ID", "DWH_LM_TS_UTC")  

"S_RMP_EVALUATION_CSC_MESSAGE" ("DWH_LM_TS_UTC", "ID"

I was trying to see if any of these two indexes would be picked up by the queries I am trying to optimize. I will most likely end up removing at least one if not both.

978225

Hemant K Chitale wrote:

>And bitmap because it's only 2 possible values

If the distribution among the two values (0 and 1) would be equal (50% each), a bitmap index wouldn't really help.  A bitmap index on such a column can be useful if it can be combined with other bitmap indexes on the same table --- which doesn't apply to this table.

If, however, the distribution is, say 90-10 and you are querying for the 10% of rows, a bitmap index can be helpful.

It's more like 90/10 and I am querying for the 90 part, but did it this was because I was thinking the index was going to once evaluate the formula and then store the result in the index structure - basically exactly what your original suggestion with the extra flag column would do, but stored differently.

You suspect (or have identified) that the gather_stats has been taking much longer since the introduction of this new index.  Can you evaluate your existing SQL executions to see if the index is being used at all ?  SQL statements must use the same set of decode(instr...) functions to be able to use the index.

Yes, the query is using exactly the same expression as the index.

A function based index actuall adds a new virtual column to the table gather_table_stats has to read that column.  I wonder how Oracle performs a gather_stats on a function on a LOB.  Never seen it before.

Maybe this is where my disconnect is - when is the index's function evaluated? Is it when the index is updated after insert/update of a record, or is it during query time? I would have expected the first, meaning that evaluation of the function should already have happened when I ask Oracle to collect the stats. I would not expect Oracle to evaluate the function for each newly inserted row as part of collecting the stats?!?! Or am I wrong?



Hemant K Chitale

The function should be evaluated when the index is created and when rows are inserted/updated.

However, I am stumped by your using it against a LOB.

What sort of queries do you run against the table ?  If the queries are retrieving 90% of the rows and you are retrieving columns other than the return value of the function, Oracle has to go to the table and not use the index.

Hemant K Chitale

Jonathan Lewis

Hemant has given you the clue.

When you create a function-based index Oracle supports it with a virtual column.

When you gather table stats Oracle evaluates the function for each row in the table - it doesn't try to look at the index to derive the statistics from the index (which, technically, would be possible).

Your extra time for collecting stats is probably spent hacking through all those LOBs.

This might be a case where you should add a column and store the result of the function (and then not bother with an index on the column).

Regards

Jonathan Lewis

978225

However, I am stumped by your using it against a LOB.

Well, it's just a column. It may contain larger content and be stored differently, but still - it's just a column... Isn't it?

What sort of queries do you run against the table ?  If the queries are retrieving 90% of the rows and you are retrieving columns other than the return value of the function, Oracle has to go to the table and not use the index.

The content of the LOB is XML which is later inserted into an object-relational XmlType column. There are some XMLs that contain a tag (<actions>) which violates the structure, thus I am trying to filter those out. I was using an XPath expression for that but figured that a plain text search should be faster, because that will save the engine the effort of parsing the whole thing in some kind of an XML structure.

The query is basically a

INSERT INTO ... SELECT ... from S_RMP_EVALUATION_CSC_MESSAGE

WHERE DECODE(INSTR("XML_MESSAGE_TEXT",'<actions>'),0,0,1)=0

Then I figured I add the function-based BITMAP index to prevent Oracle from having to re-evaluate at query time and instead just use the pre-calculated index value.

You are right, it is not a covering index, so Oracle still needs to access the table data to get the other columns for the INSERT, but I figured it would use the index to select the matching records first and then move on to the records in the table. At that point, a lot would have been gained already, compared to evaluating the WHERE condition during query time.

978225

When you create a function-based index Oracle supports it with a virtual column.

Ok, so far so good. So the virtual column holds the result of the function.

When you gather table stats Oracle evaluates the function for each row in the table - it doesn't try to look at the index to derive the statistics from the index (which, technically, would be possible).

This is what I did not expect. If it already has it evaluated and stored in the virtual column, I would have expected it to simply read it from there rather than re-evaluating the function for each record... Are you sure about this?

Do you happen to know if the same thing is going to happen with a user-declared virtual column? I would be creating the same virtual column as Oracle is doing internally... Guess I'll try that...

Your extra time for collecting stats is probably spent hacking through all those LOBs.

This might be a case where you should add a column and store the result of the function (and then not bother with an index on the column).

This would be an explanation for why things take so long. Although I don't see why Oracle wouldn't simply be using the value from the virtual column.

Jonathan Lewis
Answer

It's a VIRTUAL column - Oracle doesn't store the value of the function in the table for each row, it simply stores the description of the defining function as the column definition in the data dictionary.  The index DOES hold the actual calculated value (with the rowid of the row that each value came from), but Oracle doesn't have a mechanism for looking at the index when it's gathering column stats.  (In fact such a mechanism could only be created for a single column index anyway).

Regards

Jonathan Lewis

Marked as Answer by 978225 · Sep 27 2020
Hemant K Chitale

I actually see the avg_row_len increasing -- as if the computed value is stored with the table.

SQL> create table hkc_test_fbi_vc as select * from dba_objects where rownum < 100;

Table created.

SQL> exec dbms_stats.gather_table_stats('','HKC_TEST_FBI_VC');

PL/SQL procedure successfully completed.

SQL> select num_rows, avg_row_len, blocks
  2  from user_tables
  3  where table_name = 'HKC_TEST_FBI_VC';

  NUM_ROWS AVG_ROW_LEN     BLOCKS
---------- ----------- ----------
        99          76          5

1 row selected.

SQL> create index hkc_test_fbi_vc_ndx on hkc_test_fbi_vc(rpad(owner,100,'X'));

Index created.

SQL> exec dbms_stats.gather_table_stats('','HKC_TEST_FBI_VC');

PL/SQL procedure successfully completed.

SQL> select num_rows, avg_row_len, blocks
  2  from user_tables
  3  where table_name = 'HKC_TEST_FBI_VC';

  NUM_ROWS AVG_ROW_LEN     BLOCKS
---------- ----------- ----------
        99         177          5

1 row selected.

SQL>

As expected, the new column, being virtual, doesn't appear in a describe of the table.

SQL> desc hkc_test_fbi_vc
Name                                                                                Null?    Type
----------------------------------------------------------------------------------- -------- ---------------

OWNER                                                                                        VARCHAR2(30)
OBJECT_NAME                                                                                  VARCHAR2(128)
SUBOBJECT_NAME                                                                               VARCHAR2(30)
OBJECT_ID                                                                                    NUMBER
DATA_OBJECT_ID                                                                               NUMBER
OBJECT_TYPE                                                                                  VARCHAR2(19)
CREATED                                                                                      DATE
LAST_DDL_TIME                                                                                DATE
TIMESTAMP                                                                                    VARCHAR2(19)
STATUS                                                                                       VARCHAR2(7)
TEMPORARY                                                                                    VARCHAR2(1)
GENERATED                                                                                    VARCHAR2(1)
SECONDARY                                                                                    VARCHAR2(1)
NAMESPACE                                                                                    NUMBER
EDITION_NAME                                                                                 VARCHAR2(30)

SQL>

SQL> select column_name, column_id from user_tab_columns
  2  where table_name = 'HKC_TEST_FBI_VC'
  3  order by 2;

COLUMN_NAME                     COLUMN_ID
------------------------------ ----------
OWNER                                   1
OBJECT_NAME                             2
SUBOBJECT_NAME                          3
OBJECT_ID                               4
DATA_OBJECT_ID                          5
OBJECT_TYPE                             6
CREATED                                 7
LAST_DDL_TIME                           8
TIMESTAMP                               9
STATUS                                 10
TEMPORARY                              11
GENERATED                              12
SECONDARY                              13
NAMESPACE                              14
EDITION_NAME                           15

15 rows selected.

SQL>

SQL> select leaf_blocks, blevel, distinct_keys
  2  from user_indexes
  3  where index_name = 'HKC_TEST_FBI_VC_NDX';

LEAF_BLOCKS     BLEVEL DISTINCT_KEYS
----------- ---------- -------------
          2          1             1

1 row selected.

SQL>

(which is OK because all 99 rows have the same OWNER value SYS)

SQL> select distinct owner from hkc_test_fbi_vc;

OWNER
------------------------------
SYS

1 row selected.

SQL>

Hemant K Chitale

unknown-7404
I actually see the avg_row_len increasing -- as if the computed value is stored with the table.

. . .

As expected, the new column, being virtual, doesn't appear in a describe of the table.

That's because that data is sourced from ALL_TAB_COLUMNS which filters hidden columns.

It shows in ALL_TAB_COLS with an AVG_COL_LEN of 101. Sum up all of those and you get 178.

Hemant K Chitale

This was with reference to Jonathan's assertion "Oracle doesn't store the value of the function in the table for each row, it simply stores the description of the defining function as the column definition in the data dictionary"

I believe that the  actual values derived from the function are stored with each row.

Hemant K Chitale


unknown-7404
This was with reference to Jonathan's assertion "Oracle doesn't store the value of the function in the table for each row, it simply stores the description of the defining function as the column definition in the data dictionary"

I believe that the  actual values derived from the function are stored with each row.

If you dump a table block you will see that the virtual data isn't really stored. The value is, of course, stored in an index along with the ROWID of the corresponding row as Jonathan said.

An easier way to see it isn't really stored is to modify your example a bit to make the hidden column REALLY big and then see how many rows you can fit into a block:

CREATE TABLE TEST_FBI_VC1
(
  OWNER           VARCHAR2(30 BYTE)
)

create index hkc_test_fbi_vc_ndx1 on test_fbi_vc1(rpad(owner,3000,'X'));

INSERT INTO TEST_FBI_VC1 VALUES ('ABC')

exec dbms_stats.gather_table_stats('','TEST_FBI_VC1');

SELECT NUM_ROWS, BLOCKS, EMPTY_BLOCKS
FROM ALL_TABLES
WHERE TABLE_NAME = 'TEST_FBI_VC1'

NUM_ROWS,BLOCKS,EMPTY_BLOCKS

540,5,0

Hmmm - 540 rows of 3003 bytes doesn't seem like it would fit in only 5 blocks (the initial allocation).

Jonathan Lewis

Hemant K Chitale wrote:

I actually see the avg_row_len increasing -- as if the computed value is stored with the table.

Hemant,

That's a good point, and worth making.

It's an example of how any new feature that appears in Oracle has a number of little bits of related code that have not been brought up to date. There's some generic code that calculates avg_row_len as sum(avg_col_len), and doesn't exclude the virtual columns.

Arguably, of course, it's the correct thing to do since (if I recall correctly) the optimizer actually uses avg_row_len in its volume (bytes) calculations when your SQL includes select * from base_table" - maybe avg_row_len has always meant "length of in-memory image of row" rather than "length of on-disc size of row".

If you want an example of another anomaly you just have to look at LOBs. If you store an empty_clob() in a column Oracle reports its column size as 86 bytes, even though the on-disc size is 35 bytes (or thereabouts).  Presumably the in-memory size of a LOB locator is the size of a statically declared structure that has a maximum size of 86 bytes.

(Tip: try experimenting with ANALYZE on a table, then add a virtual column - ANALYZE uses the on-disc size as the row length)

Regards

Jonathan Lewis.

1 - 19
Locked Post
New comments cannot be posted to this locked post.

Post Details

Locked on Jul 26 2013
Added on Feb 23 2011
14 comments
77,858 views