org.opencms.search.documents
Interface I_CmsSearchExtractor

All Known Subinterfaces:
I_CmsDocumentFactory
All Known Implementing Classes:
A_CmsVfsDocument, CmsDocumentGeneric, CmsDocumentHtml, CmsDocumentMsExcel, CmsDocumentMsPowerPoint, CmsDocumentMsWord, CmsDocumentOpenOffice, CmsDocumentPdf, CmsDocumentPlainText, CmsDocumentRtf, CmsDocumentXmlContent, CmsDocumentXmlPage

public interface I_CmsSearchExtractor

Defines a text extractor for the integrated search engine.

The job of a search extractor is to extract indexable plain text from a resource in the OpenCms VFS. This may be from the resource content, for example from a PDF file, or from the resource properties, for example the Title, Keywords and Description properties.
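To illustrate the two sources of text named above, here is a minimal, self-contained sketch. It does not use the OpenCms API. SketchResource and SketchTextExtractor are hypothetical stand-ins for a VFS resource and an extractor. The sketch assumes the content is UTF-8 plain text, whereas a PDF or Office extractor would parse the binary format at that step. It collects the Title, Keywords and Description property values first and then appends the content text.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical stand-in for a VFS resource: raw content bytes plus properties.
// This is NOT the OpenCms CmsResource/CmsFile API; it only mirrors the idea.
class SketchResource {
    final byte[] content;
    final Map<String, String> properties;

    SketchResource(byte[] content, Map<String, String> properties) {
        this.content = content;
        this.properties = properties;
    }
}

// Hypothetical extractor that combines text from properties and from content.
class SketchTextExtractor {
    // Property names taken from the interface description above.
    static final String[] INDEXED_PROPERTIES = {"Title", "Keywords", "Description"};

    String extract(SketchResource resource) {
        StringBuilder text = new StringBuilder();
        // 1. Text from resource properties, in a fixed order; missing or blank values are skipped.
        for (String name : INDEXED_PROPERTIES) {
            String value = resource.properties.get(name);
            if (value != null && !value.isBlank()) {
                text.append(value.trim()).append('\n');
            }
        }
        // 2. Text from the resource content, assumed here to be UTF-8 plain text.
        //    A PDF or Office extractor would parse the binary format at this step instead.
        text.append(new String(resource.content, StandardCharsets.UTF_8).trim());
        return text.toString().trim();
    }
}
```

For a resource with the Title "Release notes", the Description "Changes in 6.0" and the content "Hello index", the sketch returns the three values joined by newlines. The missing Keywords property is skipped.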

Since:
6.0.0
Version:
$Revision: 1.10 $
Author:
Carsten Weinholz

Method Summary
 I_CmsExtractionResult extractContent(CmsObject cms, CmsResource resource, CmsSearchIndex index)
          Extracts the content of a given resource for indexing, according to the resource file type and the configuration of the given index.
 

Method Detail

extractContent

I_CmsExtractionResult extractContent(CmsObject cms,
                                     CmsResource resource,
                                     CmsSearchIndex index)
                                     throws CmsException
Extracts the content of a given resource for indexing, according to the resource file type and the configuration of the given index.

Parameters:
cms - the OpenCms context (CmsObject) used to access the resource
resource - the resource to extract the content from
index - the index to extract the content for
Returns:
the extracted content of the resource
Throws:
CmsException - if something goes wrong during extraction
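The interface only requires that extraction depends on the resource file type and on the index configuration, and that failures are reported as a checked exception. It leaves the mechanism to implementations. The following self-contained sketch shows one possible approach, without using the OpenCms API. ExtractionException stands in for CmsException, and FormatExtractor and TypeDispatchingExtractor are hypothetical. A set of indexed type names stands in for the index configuration. The sketch selects a per-type extractor and rejects types that the index does not accept. Throwing for such types is a design choice of this sketch, not documented OpenCms behavior.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical checked exception standing in for CmsException.
class ExtractionException extends Exception {
    ExtractionException(String message) {
        super(message);
    }
}

// Hypothetical per-format extractor (plain text, HTML, PDF, ...).
interface FormatExtractor {
    String extract(byte[] content) throws ExtractionException;
}

// Hypothetical dispatcher: picks an extractor by resource type,
// but only for types the target index is configured to accept.
class TypeDispatchingExtractor {
    private final Map<String, FormatExtractor> byType = new HashMap<>();

    void register(String resourceType, FormatExtractor extractor) {
        byType.put(resourceType, extractor);
    }

    String extractContent(String resourceType, byte[] content, Set<String> indexedTypes)
            throws ExtractionException {
        // The "index configuration" is modeled as a set of accepted type names.
        if (!indexedTypes.contains(resourceType)) {
            throw new ExtractionException("type not indexed: " + resourceType);
        }
        FormatExtractor extractor = byType.get(resourceType);
        if (extractor == null) {
            throw new ExtractionException("no extractor for type: " + resourceType);
        }
        return extractor.extract(content);
    }
}
```

Usage: after registering a UTF-8 plain-text extractor for the type "plain", `extractContent("plain", bytes, Set.of("plain"))` returns the decoded text. A request for a type outside the index's set throws ExtractionException.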