Class LanguagePrefixedTokenStream

java.lang.Object
org.apache.lucene.util.AttributeSource
org.apache.lucene.analysis.TokenStream
org.apache.lucene.analysis.Tokenizer
org.alfresco.solr.schema.highlight.LanguagePrefixedTokenStream
All Implemented Interfaces:
Closeable, AutoCloseable

public final class LanguagePrefixedTokenStream extends org.apache.lucene.analysis.Tokenizer
A TokenStream decorator that dynamically determines the field type and the analyzer used to analyze an input text. Although this class extends Tokenizer, it is not actually a tokenizer: selecting the analyzer dynamically requires access to an IndexSchema instance, which is usually not available to the components of an analysis chain (e.g. tokenizers, token filters, char filters). The field type and the analyzer that control the text analysis are computed as follows:
  • The input reader given to this chain is pre-processed in order to detect the locale language code at the very beginning of the text. The locale language prefix consists of:
    • a beginning sentinel token #0;
    • a language code (two or three chars);
    • a closing sentinel token #0;
  • If a language code has been found, it is used to determine a field type name composed of the prefix "highlighted_text_" and the detected language code (e.g. highlighted_text_ + en = highlighted_text_en).
  • If the field type above doesn't exist in the schema, the same procedure is repeated using the prefix "text_" (e.g. text_ + en = text_en).
  • If that field type doesn't exist in the schema either, then the "text___" general text field type is used.
  • The input text is analyzed using the (query or index) analyzer associated with the field type determined above.
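
The resolution steps above can be sketched in plain Java. This is only an illustration under stated assumptions, not the actual implementation of this class: the name FieldTypeResolutionSketch and its methods are hypothetical, the schema is modelled as a plain set of field type names instead of an IndexSchema, the sentinel is assumed to be the NUL character (U+0000), and falling back to "text___" when no prefix is present is also an assumption.

```java
import java.util.Optional;
import java.util.Set;

/** Hypothetical sketch of the field type resolution described above. */
final class FieldTypeResolutionSketch {

    // ASSUMPTION: the sentinel token is the NUL character (U+0000).
    static final char SENTINEL = '\u0000';

    static final String HIGHLIGHT_PREFIX = "highlighted_text_";
    static final String TEXT_PREFIX = "text_";
    static final String FALLBACK_TYPE = "text___";

    private FieldTypeResolutionSketch() { }

    /** Returns the two or three character language code found at the very beginning, if any. */
    static Optional<String> detectLanguage(String text) {
        if (text == null || text.isEmpty() || text.charAt(0) != SENTINEL) {
            return Optional.empty();
        }
        int close = text.indexOf(SENTINEL, 1);
        int length = close - 1;
        if (close < 0 || length < 2 || length > 3) {
            return Optional.empty();
        }
        return Optional.of(text.substring(1, close));
    }

    /** Returns the text without the language prefix (unchanged if there is no prefix). */
    static String stripLanguagePrefix(String text) {
        return detectLanguage(text)
                .map(language -> text.substring(language.length() + 2))
                .orElse(text);
    }

    /**
     * Tries highlighted_text_<lang>, then text_<lang>, then the general text type.
     * The set of names stands in for the field types declared in the schema.
     */
    static String resolveFieldType(String text, Set<String> schemaFieldTypes) {
        Optional<String> language = detectLanguage(text);
        if (language.isPresent()) {
            String highlighted = HIGHLIGHT_PREFIX + language.get();
            if (schemaFieldTypes.contains(highlighted)) {
                return highlighted;
            }
            String plain = TEXT_PREFIX + language.get();
            if (schemaFieldTypes.contains(plain)) {
                return plain;
            }
        }
        // ASSUMPTION: the general type is also used when no language prefix is found.
        return FALLBACK_TYPE;
    }
}
```

In this sketch, a schema that declares only text_en resolves a text prefixed with the en language code to text_en, while a schema that declares neither language-specific type resolves it to text___. The analyzer lookup itself (query or index analyzer of the resolved field type) is left out because it depends on the IndexSchema instance.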
Author:
Andrea Gazzarini
  • Nested Class Summary

    Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource

    org.apache.lucene.util.AttributeSource.State
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    protected org.apache.lucene.analysis.Analyzer
    analyzer
    protected String
    fieldName
    protected org.apache.solr.schema.IndexSchema
    indexSchema
     

    Fields inherited from class org.apache.lucene.analysis.Tokenizer

    input

    Fields inherited from class org.apache.lucene.analysis.TokenStream

    DEFAULT_TOKEN_ATTRIBUTE_FACTORY
  • Method Summary

    Modifier and Type
    Method
    Description
    void
    close()
    void
    end()
    boolean
    incrementToken()
    void
    reset()

    Methods inherited from class org.apache.lucene.analysis.Tokenizer

    correctOffset, setReader

    Methods inherited from class org.apache.lucene.util.AttributeSource

    addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString

    Methods inherited from class java.lang.Object

    clone, finalize, getClass, notify, notifyAll, wait, wait, wait
  • Field Details

    • fieldName

      protected String fieldName
    • indexSchema

      protected org.apache.solr.schema.IndexSchema indexSchema
    • mode

    • analyzer

      protected org.apache.lucene.analysis.Analyzer analyzer
  • Method Details

    • reset

      public void reset() throws IOException
      Overrides:
      reset in class org.apache.lucene.analysis.Tokenizer
      Throws:
      IOException
    • incrementToken

      public boolean incrementToken() throws IOException
      Specified by:
      incrementToken in class org.apache.lucene.analysis.TokenStream
      Throws:
      IOException
    • end

      public void end() throws IOException
      Overrides:
      end in class org.apache.lucene.analysis.TokenStream
      Throws:
      IOException
    • close

      public void close() throws IOException
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable
      Overrides:
      close in class org.apache.lucene.analysis.Tokenizer
      Throws:
      IOException