org.opencms.search.extractors
Class CmsExtractorMsPowerPoint

java.lang.Object
  extended by org.opencms.search.extractors.A_CmsTextExtractor
      extended by org.opencms.search.extractors.A_CmsTextExtractorMsOfficeBase
          extended by org.opencms.search.extractors.CmsExtractorMsPowerPoint
All Implemented Interfaces:
org.apache.poi.poifs.eventfilesystem.POIFSReaderListener, I_CmsTextExtractor

public final class CmsExtractorMsPowerPoint
extends A_CmsTextExtractorMsOfficeBase

Extracts the text from an MS PowerPoint document.

Since:
6.0.0
Version:
$Revision: 1.17 $
Author:
Alexander Kandzior

Field Summary
 
Fields inherited from class org.opencms.search.extractors.A_CmsTextExtractorMsOfficeBase
ENCODING_CP1252, ENCODING_UTF16, POWERPOINT_EVENT_NAME, PPT_TEXTBYTE_ATOM, PPT_TEXTCHAR_ATOM
 
Fields inherited from class org.opencms.search.extractors.A_CmsTextExtractor
m_inputBuffer
 
Method Summary
 I_CmsExtractionResult extractText(java.io.InputStream in, java.lang.String encoding)
          Extracts the text and meta information from the document on the input stream, using the specified content encoding.
static I_CmsTextExtractor getExtractor()
          Returns an instance of this text extractor.
 void processPOIFSReaderEvent(org.apache.poi.poifs.eventfilesystem.POIFSReaderEvent event)
           
 
Methods inherited from class org.opencms.search.extractors.A_CmsTextExtractorMsOfficeBase
cleanup, createExtractionResult
 
Methods inherited from class org.opencms.search.extractors.A_CmsTextExtractor
combineContentItem, extractText, extractText, extractText, getStreamCopy, removeControlChars
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

getExtractor

public static I_CmsTextExtractor getExtractor()
Returns an instance of this text extractor.

Returns:
an instance of this text extractor

extractText

public I_CmsExtractionResult extractText(java.io.InputStream in,
                                         java.lang.String encoding)
                                  throws java.lang.Exception
Description copied from interface: I_CmsTextExtractor
Extracts the text and meta information from the document on the input stream, using the specified content encoding.

The encoding is a hint for the text extractor. If the value given is null, the text extractor should try to determine the encoding itself.

Specified by:
extractText in interface I_CmsTextExtractor
Overrides:
extractText in class A_CmsTextExtractor
Parameters:
in - the input stream for the document to extract the text from
encoding - the encoding to use as a hint; may be null, in which case the extractor should try to determine the encoding itself
Returns:
the extracted text and meta information
Throws:
java.lang.Exception - if the text extraction fails
See Also:
I_CmsTextExtractor.extractText(java.io.InputStream, java.lang.String)
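
The description above treats the encoding argument as a hint that may be null. The following is a minimal sketch of how such a hint could be resolved before decoding. It is not taken from the OpenCms sources: the class name EncodingHintSketch, the method resolve, and the choice of UTF-8 as fallback are assumptions made for illustration only.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Illustrative only: resolves an optional encoding hint to a usable Charset. */
public final class EncodingHintSketch {

    /** Assumed fallback for a missing or unusable hint (not specified by this page). */
    public static final Charset FALLBACK = StandardCharsets.UTF_8;

    private EncodingHintSketch() {
        // static helper, no instances
    }

    /**
     * Returns the charset named by the hint, or the fallback if the hint is
     * null, blank, syntactically illegal, or not supported by this JVM.
     */
    public static Charset resolve(String encodingHint) {
        if (encodingHint == null || encodingHint.trim().isEmpty()) {
            return FALLBACK;
        }
        try {
            return Charset.forName(encodingHint.trim());
        } catch (IllegalArgumentException e) {
            // IllegalCharsetNameException and UnsupportedCharsetException
            // are both subclasses of IllegalArgumentException
            return FALLBACK;
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve(null));
        System.out.println(resolve("ISO-8859-1"));
        System.out.println(resolve("no-such-charset"));
    }
}
```

A real extractor for a binary format may ignore the hint entirely when the document itself declares how its text is stored; the sketch only shows the null-tolerant contract described above.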

processPOIFSReaderEvent

public void processPOIFSReaderEvent(org.apache.poi.poifs.eventfilesystem.POIFSReaderEvent event)
Specified by:
processPOIFSReaderEvent in interface org.apache.poi.poifs.eventfilesystem.POIFSReaderListener
Overrides:
processPOIFSReaderEvent in class A_CmsTextExtractorMsOfficeBase
See Also:
POIFSReaderListener.processPOIFSReaderEvent(org.apache.poi.poifs.eventfilesystem.POIFSReaderEvent)
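
This page gives no description for processPOIFSReaderEvent, but the inherited constants PPT_TEXTBYTE_ATOM, PPT_TEXTCHAR_ATOM, ENCODING_CP1252 and ENCODING_UTF16 suggest that text records are decoded according to their record type. The sketch below illustrates that idea in isolation. Everything in it is an assumption rather than the actual implementation: the class and method names are hypothetical, the record type values 4008 and 4000 are taken from the PowerPoint binary format rather than from this page, and little-endian byte order is assumed for the UTF-16 case.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Illustrative only: decodes the payload of a PowerPoint text record by type. */
public final class PptTextAtomSketch {

    /** Assumed record type of a text record with two bytes per character. */
    public static final int TEXT_CHARS_ATOM = 4000;

    /** Assumed record type of a text record with one byte per character. */
    public static final int TEXT_BYTES_ATOM = 4008;

    private PptTextAtomSketch() {
        // static helper, no instances
    }

    /**
     * Decodes the payload according to the record type and returns an
     * empty string for record types that carry no text.
     */
    public static String decode(int recordType, byte[] payload) {
        switch (recordType) {
            case TEXT_BYTES_ATOM:
                // single-byte text, decoded here as Windows code page 1252
                return new String(payload, Charset.forName("windows-1252"));
            case TEXT_CHARS_ATOM:
                // two-byte text, byte order assumed to be little-endian
                return new String(payload, StandardCharsets.UTF_16LE);
            default:
                return "";
        }
    }

    public static void main(String[] args) {
        System.out.println(decode(TEXT_BYTES_ATOM, new byte[] {72, 105}));
        System.out.println(decode(TEXT_CHARS_ATOM, new byte[] {72, 0, 105, 0}));
    }
}
```

In a listener, a method like decode would be applied to each record found while scanning the stream delivered by the reader event, with the decoded fragments appended to a buffer.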