Support for Office 2007 files (especially docx)?

591866 (Feb 26 2008, edited Mar 31 2009)
Hi,
Can you please tell me whether Oracle Text can be used with the current Office 2007 file formats?

I need support for the Word 2007 (docx) format very soon.
Is it already available?
If not, is there a plan to support those formats?

Thank you

Comments

Roger Ford-Oracle
It's planned for a future release, and we're hoping to backport to various releases, but I'm afraid I can't give you any specifics at the moment.
613654
Is there any update on this?
591866
Hi,
Is there any news about support for the docx Office format?
This is really a problem for me because my clients want to switch to the new Office file format and store the files in Oracle.

Is there any workaround for this missing feature?
I'm thinking of converting the docx content to ANSI text and then running the full-text search on that content...

thank you for any response!
user304344
Hi,

There are many workarounds for indexing docx files with Oracle Text 10.2 or below. One of them is to parse the docx file (using Java and XSLT) and extract the content. The extracted content can then be passed to Oracle Text, and your document is indexed.
If you need more information (e.g. the XSLTs), don't hesitate to ask. (http://tinyurl.com/d9ld5p)
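
To make the Java and XSLT approach a bit more concrete, here is a minimal sketch using only the JDK (java.util.zip and javax.xml.transform). The class name DocxTextExtractor, the helper methods, and the tiny inline stylesheet are assumptions made for illustration; a real setup would use a fuller stylesheet and would also handle headers, footers and other parts. It reads word/document.xml out of the docx (which is a zip archive) and turns it into plain text, one line per paragraph.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class DocxTextExtractor {

    // Illustrative stylesheet (an assumption, not a complete one):
    // emits the text of every w:t inside each w:p, then a line break.
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'>"
      + "<xsl:output method='text' encoding='UTF-8'/>"
      + "<xsl:template match='/'>"
      + "<xsl:for-each select='//w:p'>"
      + "<xsl:for-each select='.//w:t'><xsl:value-of select='.'/></xsl:for-each>"
      + "<xsl:text>&#10;</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Returns the bytes of one part (zip entry) of the docx, or null if absent.
    static byte[] readPart(byte[] docx, String partName) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(docx))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals(partName)) {
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    byte[] buf = new byte[8192];
                    int n;
                    while ((n = zip.read(buf)) != -1) {
                        out.write(buf, 0, n);
                    }
                    return out.toByteArray();
                }
            }
        }
        return null;
    }

    // Applies the stylesheet to the main document part and returns plain text.
    static String extractText(byte[] docx) throws Exception {
        byte[] xml = readPart(docx, "word/document.xml");
        if (xml == null) {
            throw new IOException("word/document.xml not found");
        }
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        transformer.transform(
            new StreamSource(new ByteArrayInputStream(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Usage: java DocxTextExtractor some-file.docx
        System.out.print(extractText(Files.readAllBytes(Paths.get(args[0]))));
    }
}
```

The returned string is what you would store in a text column (or hand to whatever feeds your index) in place of the original docx.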

Regards,
TS
70846
What are some of the other workarounds? We're a PL/SQL shop and MS2007 is killing us!
user304344
Hi,

The workaround described above would also work for your PL/SQL app. You can use the PL/SQL XSLT processor to extract the content you need from the Office 2007 documents. See DBMS_XSLPROCESSOR.

What is your business case? How do you store and index your documents?

P.S. Here is an example of an XSLT that extracts the text content from a docx document. The internal structure of a docx document is based on the Open Packaging Conventions (see http://openxmldeveloper.org/articles/OPC_parts.aspx).
****************************
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
xmlns:v="urn:schemas-microsoft-com:vml"
exclude-result-prefixes="w v">
<xsl:output method="text" indent="no" encoding="UTF-8" version="1.0"/>

<!-- document root -->
<xsl:template match="/">
<!-- root element in document -->
<xsl:apply-templates select="w:document"/>
<!-- root element in header -->
<xsl:apply-templates select="w:hdr"/>
<!-- root element in footer -->
<xsl:apply-templates select="w:ftr"/>
<!-- root element in comments -->
<xsl:apply-templates select="w:comments"/>
<!-- root element in footnotes -->
<xsl:apply-templates select="w:footnotes"/>
<!-- root element in endnotes -->
<xsl:apply-templates select="w:endnotes"/>
<!-- root element in glossary -->
<xsl:apply-templates select="w:glossaryDocument"/>
</xsl:template>


<!-- ****************************
start document
**************************** -->
<xsl:template match="w:document">
<xsl:for-each select="//w:p">
<xsl:apply-templates select="*/w:t"/>
<xsl:text>
</xsl:text>
</xsl:for-each>
<!-- used for word art text -->
<xsl:apply-templates select="//v:textpath"/>
</xsl:template>

<!-- get all text nodes within a para -->
<xsl:template match="*/w:t">
<xsl:value-of select="."/>
</xsl:template>

<!-- word art text -->
<xsl:template match="//v:textpath">
<xsl:value-of select="@string"/>
</xsl:template>

<!-- ****************************
end document
**************************** -->

<!-- ****************************
start header
**************************** -->
<xsl:template match="w:hdr">
<xsl:for-each select="//w:p">
<xsl:apply-templates select="*/w:t"/>
<xsl:text>
</xsl:text>
</xsl:for-each>
</xsl:template>

<!-- ****************************
end header
**************************** -->

<!-- ****************************
start footer
**************************** -->
<xsl:template match="w:ftr">
<xsl:for-each select="//w:p">
<xsl:apply-templates select="*/w:t"/>
<xsl:text>
</xsl:text>
</xsl:for-each>
</xsl:template>

<!-- ****************************
end footer
**************************** -->

<!-- ****************************
start comments
**************************** -->
<xsl:template match="w:comments">
<xsl:for-each select="//w:t">
<xsl:value-of select="."/>
<xsl:text> </xsl:text>
</xsl:for-each>
</xsl:template>

<!-- ****************************
end comments
**************************** -->

<!-- ****************************
start footnotes
**************************** -->
<xsl:template match="w:footnotes">
<xsl:for-each select="//w:p">
<xsl:apply-templates select="*/w:t"/>
<xsl:text> </xsl:text>
</xsl:for-each>
</xsl:template>

<!-- ****************************
end footnotes
**************************** -->

<!-- ****************************
start endnotes
**************************** -->
<xsl:template match="w:endnotes">
<xsl:for-each select="//w:p">
<xsl:apply-templates select="*/w:t"/>
<xsl:text> </xsl:text>
</xsl:for-each>
</xsl:template>

<!-- ****************************
end endnotes
**************************** -->

<!-- ****************************
start glossary
**************************** -->
<xsl:template match="w:glossaryDocument">
<xsl:for-each select="//w:p">
<xsl:apply-templates select="*/w:t"/>
<xsl:text>
</xsl:text>
</xsl:for-each>
<!-- used for word art text -->
<xsl:apply-templates select="//v:textpath"/>
</xsl:template>

<!-- ****************************
end glossary
**************************** -->

</xsl:stylesheet>
*********************************
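
Since w:hdr, w:ftr, w:comments, w:footnotes, w:endnotes and w:glossaryDocument are each the root element of a separate part inside the package, the stylesheet above has to be run once per part, not just on the main document. The following is only a sketch of how the candidate parts could be picked out in Java. The class name DocxParts and the name pattern are assumptions: the pattern follows the part names Word conventionally writes (word/header1.xml, word/footnotes.xml and so on), while the package format itself locates parts through relationship files, so a stricter implementation would resolve those instead.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

class DocxParts {

    // Conventional Word part names (an assumption, see the note above).
    private static final Pattern TEXT_PART = Pattern.compile(
        "word/(document|header\\d*|footer\\d*|comments|footnotes|endnotes)\\.xml"
        + "|word/glossary/document\\.xml");

    // Returns the names of the parts worth passing through the stylesheet,
    // in the order they appear in the zip archive.
    static List<String> textParts(byte[] docx) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(docx))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (TEXT_PART.matcher(entry.getName()).matches()) {
                    names.add(entry.getName());
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java DocxParts some-file.docx
        byte[] docx = Files.readAllBytes(Paths.get(args[0]));
        for (String name : textParts(docx)) {
            System.out.println(name);
        }
    }
}
```

Each name returned here can then be read from the archive and transformed separately, and the resulting text concatenated before indexing.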

Edited by: user304344 on Mar 30, 2009 10:23 PM
70846
Thanks!
The documents are stored as BLOBs, uploaded through the PL/SQL Gateway, and indexed with an Oracle Text index.
user304344
How is Oracle Text provisioned with new or updated documents? Via a PL/SQL call?
70846
Yes. Oracle Text reindexes on commit to the upload table. Everything is called from PL/SQL.
Locked on Apr 28 2009
Added on Feb 26 2008