public class PDFParserConfig extends Object implements Serializable
| Constructor and Description |
|---|
| PDFParserConfig() |
| PDFParserConfig(InputStream is): Loads properties from InputStream and then tries to close InputStream. |
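The stream constructor's contract (load properties, then try to close the stream) can be sketched with the standard library alone. This is an illustration of the pattern, not Tika's implementation: the class `StreamConfigLoader`, the method `loadAndClose`, and the property key `sortByPosition` are all hypothetical names chosen for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical helper, not part of Tika: loads properties from a stream
// and then makes a best-effort attempt to close that stream.
public class StreamConfigLoader {

    public static Properties loadAndClose(InputStream is) {
        Properties props = new Properties();
        try {
            if (is != null) {
                props.load(is);
            }
        } catch (IOException e) {
            // Loading failed: continue with whatever was read (possibly nothing).
        } finally {
            if (is != null) {
                try {
                    is.close();
                } catch (IOException e) {
                    // Closing is best-effort only.
                }
            }
        }
        return props;
    }

    public static void main(String[] args) {
        // The key name below is an assumption made for illustration.
        byte[] data = "sortByPosition=true\n".getBytes(StandardCharsets.ISO_8859_1);
        Properties props = loadAndClose(new ByteArrayInputStream(data));
        boolean sort = Boolean.parseBoolean(props.getProperty("sortByPosition", "false"));
        System.out.println(sort);
    }
}
```

Because the constructor takes ownership of closing the stream, a caller following this pattern would not need to wrap the stream in its own try-with-resources block.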
| Modifier and Type | Method and Description |
|---|---|
| void | configure(org.apache.tika.parser.pdf.PDF2XHTML pdf2XHTML): Configures the given pdf2XHTML. |
| boolean | equals(Object obj) |
| Float | getAverageCharTolerance() |
| boolean | getEnableAutoSpace() |
| boolean | getExtractAcroFormContent() |
| boolean | getExtractAnnotationText() |
| boolean | getSortByPosition() |
| Float | getSpacingTolerance() |
| boolean | getSuppressDuplicateOverlappingText() |
| boolean | getUseNonSequentialParser() |
| int | hashCode() |
| void | setAverageCharTolerance(Float averageCharTolerance): See PDFTextStripper.setAverageCharTolerance(float) |
| void | setEnableAutoSpace(boolean enableAutoSpace): If true (the default), the parser should estimate where spaces should be inserted between words. |
| void | setExtractAcroFormContent(boolean extractAcroFormContent): If true (the default), extract content from AcroForms at the end of the document. |
| void | setExtractAnnotationText(boolean extractAnnotationText): If true (the default), text in annotations will be extracted. |
| void | setSortByPosition(boolean sortByPosition): If true, sort text tokens by their x/y position before extracting text. |
| void | setSpacingTolerance(Float spacingTolerance): See PDFTextStripper.setSpacingTolerance(float) |
| void | setSuppressDuplicateOverlappingText(boolean suppressDuplicateOverlappingText): If true, the parser should try to remove duplicated text over the same region. |
| void | setUseNonSequentialParser(boolean useNonSequentialParser): If true, uses PDFBox's non-sequential parser. |
| String | toString() |

public PDFParserConfig()

public PDFParserConfig(InputStream is)
Parameters: is

public void configure(org.apache.tika.parser.pdf.PDF2XHTML pdf2XHTML)
Parameters: pdf2XHTML

public void setExtractAcroFormContent(boolean extractAcroFormContent)
Parameters: extractAcroFormContent

public boolean getExtractAcroFormContent()
See Also: setExtractAcroFormContent(boolean)

public boolean getEnableAutoSpace()
See Also: setEnableAutoSpace(boolean)

public void setEnableAutoSpace(boolean enableAutoSpace)

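To make the idea behind setEnableAutoSpace concrete, here is a minimal sketch of estimating word breaks from glyph positions. It is an assumption-laden illustration, not the algorithm PDFBox or Tika uses: the class `AutoSpaceSketch`, the `Glyph` type, and the `gapFactor` threshold are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not PDFBox's or Tika's implementation.
public class AutoSpaceSketch {

    // One character with its left edge and advance width on the page.
    public static final class Glyph {
        final char ch;
        final float x;
        final float width;

        public Glyph(char ch, float x, float width) {
            this.ch = ch;
            this.x = x;
            this.width = width;
        }
    }

    // Joins the glyphs of a single line, inserting a space wherever the gap
    // to the previous glyph exceeds gapFactor times that glyph's width.
    public static String join(List<Glyph> line, float gapFactor) {
        StringBuilder sb = new StringBuilder();
        Glyph prev = null;
        for (Glyph g : line) {
            if (prev != null) {
                float gap = g.x - (prev.x + prev.width);
                if (gap > gapFactor * prev.width) {
                    sb.append(' ');
                }
            }
            sb.append(g.ch);
            prev = g;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Glyph> line = new ArrayList<>();
        line.add(new Glyph('a', 0f, 5f));
        line.add(new Glyph('b', 5f, 5f));
        line.add(new Glyph('c', 14f, 5f)); // 4-unit gap after 'b'
        line.add(new Glyph('d', 19f, 5f));
        System.out.println(join(line, 0.5f)); // prints "ab cd"
    }
}
```

In this sketch a larger `gapFactor` yields fewer inserted spaces; the tolerance setters documented on this page point to PDFTextStripper for the real tuning knobs.
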
public boolean getSuppressDuplicateOverlappingText()

public void setSuppressDuplicateOverlappingText(boolean suppressDuplicateOverlappingText)

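Some PDFs draw the same text twice at nearly the same spot, for example to fake a bold face, which is the case setSuppressDuplicateOverlappingText targets. The sketch below shows one way such duplicates could be filtered. It is not the PDFBox or Tika implementation; `DuplicateTextFilter`, `Glyph`, and the `tolerance` parameter are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not PDFBox's or Tika's implementation.
public class DuplicateTextFilter {

    // A piece of text and the position where it is drawn.
    public static final class Glyph {
        final String text;
        final float x;
        final float y;

        public Glyph(String text, float x, float y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    // Keeps the first occurrence of each glyph and drops later glyphs that
    // have the same text and lie within `tolerance` of a kept glyph on both axes.
    public static List<Glyph> suppressDuplicates(List<Glyph> glyphs, float tolerance) {
        List<Glyph> kept = new ArrayList<>();
        for (Glyph g : glyphs) {
            boolean duplicate = false;
            for (Glyph k : kept) {
                if (k.text.equals(g.text)
                        && Math.abs(k.x - g.x) < tolerance
                        && Math.abs(k.y - g.y) < tolerance) {
                    duplicate = true;
                    break;
                }
            }
            if (!duplicate) {
                kept.add(g);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Glyph> glyphs = new ArrayList<>();
        glyphs.add(new Glyph("A", 10.0f, 20.0f));
        glyphs.add(new Glyph("A", 10.3f, 20.0f)); // overprint of the first "A"
        glyphs.add(new Glyph("B", 18.0f, 20.0f));
        System.out.println(suppressDuplicates(glyphs, 1.0f).size()); // prints 2
    }
}
```

The pairwise comparison is quadratic in the number of glyphs, which hints at why a parser might leave this kind of filtering off unless it is requested.
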
public boolean getExtractAnnotationText()
See Also: setExtractAnnotationText(boolean)

public void setExtractAnnotationText(boolean extractAnnotationText)

public boolean getSortByPosition()
See Also: setSortByPosition(boolean)

public void setSortByPosition(boolean sortByPosition)

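Sorting by position matters because a PDF content stream may store text in an order unrelated to reading order. The sketch below illustrates the general idea with a plain comparator. It is a simplified assumption, not the PDFBox or Tika algorithm: `PositionSort` and `Token` are hypothetical, y is assumed to grow downward, and tokens on one line are assumed to share exactly the same y.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration, not PDFBox's or Tika's implementation.
public class PositionSort {

    // A text token and the coordinates of its origin on the page.
    public static final class Token {
        final String text;
        final float x;
        final float y;

        public Token(String text, float x, float y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    // Returns a copy ordered top-to-bottom (by y), then left-to-right (by x).
    public static List<Token> sortByPosition(List<Token> tokens) {
        List<Token> sorted = new ArrayList<>(tokens);
        sorted.sort(Comparator.comparingDouble((Token t) -> t.y)
                .thenComparingDouble(t -> t.x));
        return sorted;
    }

    public static void main(String[] args) {
        List<Token> tokens = new ArrayList<>();
        tokens.add(new Token("world", 50f, 10f));
        tokens.add(new Token("Second", 10f, 30f));
        tokens.add(new Token("Hello", 10f, 10f));

        StringBuilder sb = new StringBuilder();
        for (Token t : sortByPosition(tokens)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(t.text);
        }
        System.out.println(sb); // prints "Hello world Second"
    }
}
```

A strict x/y ordering like this can interleave text from multi-column layouts, which may be one reason the option is not described as a default on this page.
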
public boolean getUseNonSequentialParser()
See Also: setUseNonSequentialParser(boolean)

public void setUseNonSequentialParser(boolean useNonSequentialParser)
Default is false (use the traditional parser).
Parameters: useNonSequentialParser

public Float getAverageCharTolerance()
See Also: setAverageCharTolerance(Float)

public void setAverageCharTolerance(Float averageCharTolerance)
See Also: PDFTextStripper.setAverageCharTolerance(float)

public Float getSpacingTolerance()
See Also: setSpacingTolerance(Float)

public void setSpacingTolerance(Float spacingTolerance)
See Also: PDFTextStripper.setSpacingTolerance(float)

Copyright © 2007-2014 The Apache Software Foundation. All Rights Reserved.