| Interface Summary | Description |
|---|---|
| I_CmsExtractionResult | The result of a document text extraction. |
| I_CmsTextExtractor | Allows extraction of the indexable "plain" text plus (optional) meta information from a given binary input document format. |
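
The two interfaces define the package's contract: a binary document goes in, and indexable plain text plus optional metadata comes out. Below is a minimal, self-contained sketch of that contract in plain Java. It does not use the OpenCms types. The names `TextExtractor`, `ExtractionResult`, `SimpleExtractionResult` and `PlainTextExtractor`, the `extractText(byte[], String)` signature, and the `"encoding"` metadata key are hypothetical stand-ins chosen for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical result type: plain text plus optional metadata (not the OpenCms API). */
interface ExtractionResult {
    String getContent();
    Map<String, String> getMetaInfo();
}

/** Hypothetical extractor contract: binary document in, indexable text out. */
interface TextExtractor {
    // Declared broadly so that format parsers (PDF, Office, ...) may throw.
    ExtractionResult extractText(byte[] content, String encoding) throws Exception;
}

/** Immutable result holding the text and a defensive copy of the metadata. */
final class SimpleExtractionResult implements ExtractionResult {
    private final String content;
    private final Map<String, String> metaInfo;

    SimpleExtractionResult(String content, Map<String, String> metaInfo) {
        this.content = content;
        this.metaInfo = (metaInfo == null)
            ? Collections.<String, String>emptyMap()
            : Collections.unmodifiableMap(new HashMap<>(metaInfo));
    }

    public String getContent() { return content; }
    public Map<String, String> getMetaInfo() { return metaInfo; }
}

/** Trivial extractor for plain-text input: decodes bytes and collapses whitespace. */
final class PlainTextExtractor implements TextExtractor {
    public ExtractionResult extractText(byte[] content, String encoding) {
        // Fall back to UTF-8 when the caller does not know the encoding (an assumption).
        Charset charset = (encoding == null) ? StandardCharsets.UTF_8 : Charset.forName(encoding);
        String text = new String(content, charset).trim().replaceAll("\\s+", " ");
        Map<String, String> meta = new HashMap<>();
        meta.put("encoding", charset.name()); // hypothetical metadata key
        return new SimpleExtractionResult(text, meta);
    }
}
```

For example, `new PlainTextExtractor().extractText("  Hello\n\tworld  ".getBytes(StandardCharsets.UTF_8), "UTF-8").getContent()` returns `"Hello world"`.
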

| Class Summary | Description |
|---|---|
| A_CmsTextExtractor | Base utility class that allows extraction of the indexable "plain" text from a given document format. |
| A_CmsTextExtractorMsOfficeBase | Base class to extract summary information from MS office documents. |
| CmsExtractionResult | The result of a document text extraction. |
| CmsExtractorHtml | Extracts the text from an HTML document. |
| CmsExtractorMsExcel | Extracts the text from an MS Excel document. |
| CmsExtractorMsPowerPoint | Extracts the text from an MS PowerPoint document. |
| CmsExtractorMsWord | Extracts the text from an MS Word document. |
| CmsExtractorOpenOffice | Extracts the text from OpenOffice documents (.ods, .odf). |
| CmsExtractorPdf | Extracts the text from a PDF document. |
| CmsExtractorRtf | Extracts the text from an RTF document. |
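
`CmsExtractorHtml` is described only as extracting the text from an HTML document. As a rough illustration of what that involves, here is a naive, regex-based sketch that drops `<script>` and `<style>` blocks, strips the remaining tags, decodes a handful of entities, and reads the `<title>` as metadata. The class `NaiveHtmlExtractor` is hypothetical and is not how OpenCms implements this; regular expressions cannot handle all real-world HTML, so a production extractor should rely on a proper HTML parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical, regex-based HTML text extraction (illustration only). */
final class NaiveHtmlExtractor {
    // Whole <script>/<style> elements, case-insensitive, spanning newlines.
    private static final Pattern SCRIPT_OR_STYLE =
        Pattern.compile("(?is)<(script|style)\\b[^>]*>.*?</\\1\\s*>");
    private static final Pattern TITLE =
        Pattern.compile("(?is)<title\\b[^>]*>(.*?)</title\\s*>");
    // Any remaining tag, comment or doctype.
    private static final Pattern TAG = Pattern.compile("(?s)<[^>]*>");

    /** Returns the decoded page title, or null if there is none. */
    static String extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? decodeEntities(m.group(1)).trim() : null;
    }

    /** Returns the visible text with whitespace collapsed to single spaces. */
    static String extractText(String html) {
        String text = SCRIPT_OR_STYLE.matcher(html).replaceAll(" ");
        text = TAG.matcher(text).replaceAll(" "); // a space keeps words from merging
        text = decodeEntities(text);
        return text.trim().replaceAll("\\s+", " ");
    }

    /** Decodes a few common entities; &amp; goes last to avoid double decoding. */
    private static String decodeEntities(String s) {
        return s.replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&nbsp;", " ")
                .replace("&amp;", "&");
    }
}
```

Note that `&nbsp;` is mapped to an ordinary space here, which is usually what a search index wants.
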

Contains a generic, low-level framework for extracting plain text content from various popular file formats.
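
Since each extractor class handles one format, a caller needs some way to route a document to the right one. One simple approach, sketched below under the assumption that the file extension identifies the format, is a registry keyed by extension. `ExtractorRegistry` and its methods are hypothetical; this page does not say how OpenCms selects an extractor.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical registry that picks a text extractor by file extension. */
final class ExtractorRegistry {
    private final Map<String, Function<byte[], String>> byExtension = new HashMap<>();

    /** Registers an extractor; extensions are matched case-insensitively. */
    void register(String extension, Function<byte[], String> extractor) {
        byExtension.put(extension.toLowerCase(Locale.ROOT), extractor);
    }

    /** Returns the extracted text, or null if no extractor matches the file's extension. */
    String extract(String fileName, byte[] content) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return null; // no usable extension
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        Function<byte[], String> extractor = byExtension.get(ext);
        return (extractor == null) ? null : extractor.apply(content);
    }
}
```

For instance, after `register("txt", b -> new String(b, StandardCharsets.UTF_8).trim())`, a call to `extract("README.TXT", ...)` uses that extractor, while `report.pdf` yields `null` until a PDF extractor is registered. Keying on MIME type instead of extension would follow the same pattern.
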