Package org.apache.lucene.analysis.query
Class QueryAutoStopWordAnalyzer
java.lang.Object
org.apache.lucene.analysis.Analyzer
org.apache.lucene.analysis.AnalyzerWrapper
org.apache.lucene.analysis.query.QueryAutoStopWordAnalyzer
- All Implemented Interfaces:
Closeable,AutoCloseable
An Analyzer used primarily at query time to wrap another analyzer and provide a layer of protection which prevents very common words from being passed into queries.
For very large indexes the cost of reading TermDocs for a very common word can be high. This analyzer was created after experience with a 38 million document index in which one term appeared in around 50% of documents, causing TermQueries for that term to take 2 seconds.
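The idea can be illustrated without Lucene at all. The following is a simplified, self-contained model and not Lucene's implementation: the class name `CommonTermFilterSketch`, its methods, and the in-memory "index" of token sets are all hypothetical. It counts how many documents contain each term, treats terms above a document-frequency fraction as stop words, and drops them from a tokenized query.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the idea behind QueryAutoStopWordAnalyzer; not Lucene code.
class CommonTermFilterSketch {

    // Count, for each term, the number of documents that contain it.
    static Map<String, Integer> documentFrequencies(List<Set<String>> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (Set<String> doc : docs) {
            for (String term : doc) {
                df.merge(term, 1, Integer::sum);
            }
        }
        return df;
    }

    // Terms found in more than maxPercentDocs (0.0 to 1.0) of the documents.
    static Set<String> commonTerms(List<Set<String>> docs, float maxPercentDocs) {
        Set<String> stop = new HashSet<>();
        for (Map.Entry<String, Integer> e : documentFrequencies(docs).entrySet()) {
            if (e.getValue() > docs.size() * maxPercentDocs) {
                stop.add(e.getKey());
            }
        }
        return stop;
    }

    // Drop the common terms from a tokenized query, keeping the original order.
    static List<String> filterQuery(List<String> queryTokens, Set<String> stop) {
        List<String> kept = new ArrayList<>();
        for (String token : queryTokens) {
            if (!stop.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Set<String>> docs = List.of(
            Set.of("the", "quick", "fox"),
            Set.of("the", "lazy", "dog"),
            Set.of("the", "fox", "sleeps"),
            Set.of("a", "dog", "barks"));
        // "the" is in 3 of 4 documents, which is above the 50% cutoff.
        Set<String> stop = commonTerms(docs, 0.5f);
        System.out.println(filterQuery(List.of("the", "lazy", "fox"), stop)); // prints [lazy, fox]
    }
}
```

In this model "fox" appears in exactly half of the documents and is kept, because the comparison is strictly "greater than", matching the wording used for the constructors.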
- Since:
- 3.1
-
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.analysis.Analyzer
Analyzer.ReuseStrategy, Analyzer.TokenStreamComponents
-
Field Summary
Fields
static final float defaultMaxDocFreqPercent
Fields inherited from class org.apache.lucene.analysis.Analyzer
GLOBAL_REUSE_STRATEGY, PER_FIELD_REUSE_STRATEGY
-
Constructor Summary
Constructors
QueryAutoStopWordAnalyzer(Analyzer delegate, IndexReader indexReader)
Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency percentage greater than defaultMaxDocFreqPercent
QueryAutoStopWordAnalyzer(Analyzer delegate, IndexReader indexReader, float maxPercentDocs)
Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency percentage greater than the given maxPercentDocs
QueryAutoStopWordAnalyzer(Analyzer delegate, IndexReader indexReader, int maxDocFreq)
Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency greater than the given maxDocFreq
QueryAutoStopWordAnalyzer(Analyzer delegate, IndexReader indexReader, Collection<String> fields, float maxPercentDocs)
Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for the given selection of fields from terms with a document frequency percentage greater than the given maxPercentDocs
QueryAutoStopWordAnalyzer(Analyzer delegate, IndexReader indexReader, Collection<String> fields, int maxDocFreq)
Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for the given selection of fields from terms with a document frequency greater than the given maxDocFreq
-
Method Summary
Term[] getStopWords()
Provides information on which stop words have been identified for all fields
String[] getStopWords(String fieldName)
Provides information on which stop words have been identified for a field
protected Analyzer getWrappedAnalyzer(String fieldName)
protected Analyzer.TokenStreamComponents wrapComponents(String fieldName, Analyzer.TokenStreamComponents components)
Methods inherited from class org.apache.lucene.analysis.AnalyzerWrapper
attributeFactory, createComponents, getOffsetGap, getPositionIncrementGap, initReader, initReaderForNormalization, normalize, wrapReader, wrapReaderForNormalization, wrapTokenStreamForNormalization
Methods inherited from class org.apache.lucene.analysis.Analyzer
close, getReuseStrategy, normalize, tokenStream, tokenStream
-
Field Details
-
defaultMaxDocFreqPercent
public static final float defaultMaxDocFreqPercent
-
Constructor Details
-
QueryAutoStopWordAnalyzer
public QueryAutoStopWordAnalyzer(Analyzer delegate, IndexReader indexReader) throws IOException
Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency percentage greater than defaultMaxDocFreqPercent
- Parameters:
delegate - Analyzer whose TokenStream will be filtered
indexReader - IndexReader to identify the stopwords from
- Throws:
IOException - Can be thrown while reading from the IndexReader
-
QueryAutoStopWordAnalyzer
public QueryAutoStopWordAnalyzer(Analyzer delegate, IndexReader indexReader, int maxDocFreq) throws IOException
Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency greater than the given maxDocFreq
- Parameters:
delegate - Analyzer whose TokenStream will be filtered
indexReader - IndexReader to identify the stopwords from
maxDocFreq - Document frequency that a term must exceed in order to be treated as a stopword
- Throws:
IOException - Can be thrown while reading from the IndexReader
-
QueryAutoStopWordAnalyzer
public QueryAutoStopWordAnalyzer(Analyzer delegate, IndexReader indexReader, float maxPercentDocs) throws IOException
Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency percentage greater than the given maxPercentDocs
- Parameters:
delegate - Analyzer whose TokenStream will be filtered
indexReader - IndexReader to identify the stopwords from
maxPercentDocs - The maximum fraction (between 0.0 and 1.0) of index documents that may contain a term before the term is considered to be a stop word
- Throws:
IOException - Can be thrown while reading from the IndexReader
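The two "all fields" constructors above differ only in how the cutoff is expressed: an absolute document count or a fraction of the index. As a rough illustration, the sketch below assumes a fractional cutoff is turned into an absolute count by multiplying by the number of documents and truncating, and that a term qualifies when its document frequency is strictly greater than that count. The conversion rule and the names `ThresholdSketch`, `toMaxDocFreq`, and `isStopWord` are assumptions made for illustration, not taken from the Lucene source.

```java
// Hypothetical helper illustrating the two cutoff styles; not Lucene code.
class ThresholdSketch {

    // Assumed conversion: fraction of documents (0.0 to 1.0) to an absolute count.
    static int toMaxDocFreq(int numDocs, float maxPercentDocs) {
        return (int) (numDocs * maxPercentDocs);
    }

    // A term qualifies only when its document frequency exceeds the cutoff.
    static boolean isStopWord(int docFreq, int maxDocFreq) {
        return docFreq > maxDocFreq;
    }

    public static void main(String[] args) {
        int numDocs = 1000;
        int cutoff = toMaxDocFreq(numDocs, 0.25f);   // 250 documents
        System.out.println(cutoff);                  // prints 250
        System.out.println(isStopWord(250, cutoff)); // prints false: not strictly greater
        System.out.println(isStopWord(251, cutoff)); // prints true
    }
}
```

An absolute count stays fixed as the index grows, while a fraction scales with it, which is usually the deciding factor when choosing between the two forms.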
-
QueryAutoStopWordAnalyzer
public QueryAutoStopWordAnalyzer(Analyzer delegate, IndexReader indexReader, Collection<String> fields, float maxPercentDocs) throws IOException
Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for the given selection of fields from terms with a document frequency percentage greater than the given maxPercentDocs
- Parameters:
delegate - Analyzer whose TokenStream will be filtered
indexReader - IndexReader to identify the stopwords from
fields - Selection of fields to calculate stopwords for
maxPercentDocs - The maximum fraction (between 0.0 and 1.0) of index documents that may contain a term before the term is considered to be a stop word
- Throws:
IOException - Can be thrown while reading from the IndexReader
-
QueryAutoStopWordAnalyzer
public QueryAutoStopWordAnalyzer(Analyzer delegate, IndexReader indexReader, Collection<String> fields, int maxDocFreq) throws IOException
Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for the given selection of fields from terms with a document frequency greater than the given maxDocFreq
- Parameters:
delegate - Analyzer whose TokenStream will be filtered
indexReader - IndexReader to identify the stopwords from
fields - Selection of fields to calculate stopwords for
maxDocFreq - Document frequency that a term must exceed in order to be treated as a stopword
- Throws:
IOException - Can be thrown while reading from the IndexReader
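Restricting the calculation to a selection of fields can be pictured with a small in-memory model. In the sketch below, which is an illustration under stated assumptions rather than Lucene's implementation, the index is a hypothetical map from field name to per-term document frequencies, and only the requested fields are scanned. The class `PerFieldStopWordsSketch` and the choice to skip unknown fields are assumptions.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of per-field stop word calculation; not Lucene code.
class PerFieldStopWordsSketch {

    // fieldDocFreqs: field name -> (term -> number of documents containing the term).
    static Map<String, Set<String>> stopWordsPerField(
            Map<String, Map<String, Integer>> fieldDocFreqs,
            Collection<String> fields,
            int maxDocFreq) {
        Map<String, Set<String>> result = new HashMap<>();
        for (String field : fields) {
            Map<String, Integer> terms = fieldDocFreqs.get(field);
            if (terms == null) {
                continue; // assumption: fields absent from the index are skipped
            }
            Set<String> stop = new HashSet<>();
            for (Map.Entry<String, Integer> e : terms.entrySet()) {
                if (e.getValue() > maxDocFreq) {
                    stop.add(e.getKey());
                }
            }
            result.put(field, stop);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> index = Map.of(
            "title", Map.of("lucene", 40, "guide", 3),
            "body", Map.of("the", 95, "lucene", 60, "analyzer", 12));
        // Only "body" is inspected, so "title" gets no stop words at all.
        Map<String, Set<String>> stops = stopWordsPerField(index, List.of("body"), 50);
        System.out.println(stops.keySet()); // prints [body]
    }
}
```

Note that in this model the same term can be a stop word in one field and an ordinary term in another, since frequencies are tracked per field.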
-
-
Method Details
-
getWrappedAnalyzer
protected Analyzer getWrappedAnalyzer(String fieldName)
- Specified by:
getWrappedAnalyzer in class AnalyzerWrapper
-
wrapComponents
protected Analyzer.TokenStreamComponents wrapComponents(String fieldName, Analyzer.TokenStreamComponents components)
- Overrides:
wrapComponents in class AnalyzerWrapper
-
getStopWords
public String[] getStopWords(String fieldName)
Provides information on which stop words have been identified for a field
- Parameters:
fieldName - The field for which the identified stop words will be returned
- Returns:
the stop words identified for a field
-
getStopWords
public Term[] getStopWords()
Provides information on which stop words have been identified for all fields
- Returns:
the stop words (as terms)
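The difference between the two getStopWords variants is the shape of the result: plain strings for one field, or field-qualified terms across all fields. The sketch below models that with a hypothetical per-field map. The class `StopWordLookupSketch`, the "field:text" string standing in for a Term, the sorted ordering, and the empty array for an unknown field are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical model of the two lookup shapes; not Lucene code.
class StopWordLookupSketch {

    // Per-field view: just the words, sorted here only to make output stable.
    static String[] stopWordsFor(Map<String, Set<String>> perField, String fieldName) {
        Set<String> words = perField.getOrDefault(fieldName, Set.of());
        return new TreeSet<>(words).toArray(new String[0]);
    }

    // All-fields view: each word qualified by its field, written as "field:text".
    static List<String> allStopWords(Map<String, Set<String>> perField) {
        List<String> all = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : new TreeMap<>(perField).entrySet()) {
            for (String word : new TreeSet<>(e.getValue())) {
                all.add(e.getKey() + ":" + word);
            }
        }
        return all;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> perField = Map.of(
            "body", Set.of("the", "of"),
            "title", Set.of("the"));
        System.out.println(String.join(",", stopWordsFor(perField, "body"))); // prints of,the
        System.out.println(allStopWords(perField)); // prints [body:of, body:the, title:the]
    }
}
```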
-