Package org.apache.lucene.search.spell
Class WordBreakSpellChecker
java.lang.Object
    org.apache.lucene.search.spell.WordBreakSpellChecker
A spell checker whose sole function is to offer suggestions by combining multiple terms into one
word and/or breaking terms into multiple words.
Nested Class Summary
static enum WordBreakSpellChecker.BreakSuggestionSortMethod - Determines the order in which word break suggestions are listed.
Field Summary
static final Term SEPARATOR_TERM - Term that can be used to prohibit adjacent terms from being combined.
Constructor Summary
WordBreakSpellChecker() - Creates a new spellchecker with default configuration values.
Method Summary
int getMaxChanges() - Returns the maximum number of changes to perform on the input.
int getMaxCombineWordLength() - Returns the maximum length of a combined suggestion.
int getMaxEvaluations() - Returns the maximum number of word combinations to evaluate.
int getMinBreakWordLength() - Returns the minimum size of a broken word.
int getMinSuggestionFrequency() - Returns the minimum frequency a term must have to be part of a suggestion.
void setMaxChanges(int maxChanges) - The maximum number of changes (word breaks or combinations) to make on the original term(s).
void setMaxCombineWordLength(int maxCombineWordLength) - The maximum length of a suggestion made by combining 1 or more original terms.
void setMaxEvaluations(int maxEvaluations) - The maximum number of word combinations to evaluate.
void setMinBreakWordLength(int minBreakWordLength) - The minimum length to break words down to.
void setMinSuggestionFrequency(int minSuggestionFrequency) - The minimum frequency a term must have to be included as part of a suggestion.
SuggestWord[][] suggestWordBreaks(Term term, int maxSuggestions, IndexReader ir, SuggestMode suggestMode, WordBreakSpellChecker.BreakSuggestionSortMethod sortMethod) - Generate suggestions by breaking the passed-in term into multiple words.
CombineSuggestion[] suggestWordCombinations(Term[] terms, int maxSuggestions, IndexReader ir, SuggestMode suggestMode) - Generate suggestions by combining one or more of the passed-in terms into single words.
Field Details
SEPARATOR_TERM
public static final Term SEPARATOR_TERM
Term that can be used to prohibit adjacent terms from being combined.
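A minimal sketch of the idea behind SEPARATOR_TERM, assuming a simplified model in which query terms are plain strings: a separator placed between two terms means that pair is never considered for combination. The sentinel string SEPARATOR and the class SeparatorSketch are hypothetical stand-ins, not Lucene's field or its internal logic; the code needs only the JDK.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model: terms are plain strings, and SEPARATOR is a hypothetical
// sentinel standing in for WordBreakSpellChecker.SEPARATOR_TERM.
class SeparatorSketch {
    static final String SEPARATOR = "<SEP>";

    // Returns each adjacent pair that may be combined, i.e. pairs in which
    // neither element is the separator.
    static List<String> combinableAdjacentPairs(List<String> terms) {
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i + 1 < terms.size(); i++) {
            String left = terms.get(i);
            String right = terms.get(i + 1);
            if (left.equals(SEPARATOR) || right.equals(SEPARATOR)) {
                continue; // the separator blocks combination across it
            }
            pairs.add(left + "+" + right);
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> query = List.of("ice", "cream", SEPARATOR, "cone");
        System.out.println(combinableAdjacentPairs(query)); // prints [ice+cream]
    }
}
```

For the input ice, cream, SEPARATOR, cone, only the pair ice+cream survives; cream and cone are kept apart by the separator.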
Constructor Details
WordBreakSpellChecker
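public WordBreakSpellChecker()
Creates a new spellchecker with default configuration values.
See Also:
setMaxChanges(int), setMaxCombineWordLength(int), setMaxEvaluations(int), setMinBreakWordLength(int), setMinSuggestionFrequency(int)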
Method Details
suggestWordBreaks
public SuggestWord[][] suggestWordBreaks(Term term, int maxSuggestions, IndexReader ir, SuggestMode suggestMode, WordBreakSpellChecker.BreakSuggestionSortMethod sortMethod) throws IOException
Generate suggestions by breaking the passed-in term into multiple words. The scores returned are equal to the number of word breaks needed, so a lower score is generally preferred over a higher score.
Parameters:
suggestMode - default = SuggestMode.SUGGEST_WHEN_NOT_IN_INDEX
sortMethod - default = WordBreakSpellChecker.BreakSuggestionSortMethod.NUM_CHANGES_THEN_MAX_FREQUENCY
Returns:
one or more arrays of words formed by breaking up the original term
Throws:
IOException - If there is a low-level I/O error.
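The following JDK-only sketch models the behavior described for suggestWordBreaks under stated assumptions: a Map from word to frequency stands in for the IndexReader, every piece must be a known word with at least minSuggestionFrequency occurrences and at least minBreakWordLength characters, at most maxChanges breaks are made, and the score is the number of breaks. Results are ordered by score and then by the highest single-word frequency, which is this sketch's reading of NUM_CHANGES_THEN_MAX_FREQUENCY. WordBreakSketch, Suggestion, and suggestBreaks are hypothetical names, SuggestMode handling is omitted, and this is not Lucene's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy model of word-break suggestions. A Map (word -> frequency) stands in for the
// IndexReader; names and rules are assumptions, not Lucene's implementation.
class WordBreakSketch {

    // The words of one suggestion, its score (number of breaks), and the highest
    // frequency among its words (used as a tie-breaker when sorting).
    record Suggestion(List<String> words, int score, int maxFreq) {}

    static List<Suggestion> suggestBreaks(String term, Map<String, Integer> dict, int maxSuggestions,
                                          int maxChanges, int minBreakWordLength, int minSuggestionFrequency) {
        List<Suggestion> out = new ArrayList<>();
        breakRest(term, new ArrayList<>(), dict, maxChanges, minBreakWordLength, minSuggestionFrequency, out);
        // Fewest breaks first, then the highest single-word frequency.
        out.sort((a, b) -> a.score() != b.score()
                ? Integer.compare(a.score(), b.score())
                : Integer.compare(b.maxFreq(), a.maxFreq()));
        return new ArrayList<>(out.subList(0, Math.min(maxSuggestions, out.size())));
    }

    // Splits 'rest' at every allowed position; 'prefix' holds the words accepted so far.
    private static void breakRest(String rest, List<String> prefix, Map<String, Integer> dict, int breaksLeft,
                                  int minLen, int minFreq, List<Suggestion> out) {
        if (breaksLeft == 0) {
            return; // no more changes allowed
        }
        for (int i = minLen; i <= rest.length() - minLen; i++) {
            String left = rest.substring(0, i);
            if (!isWord(left, dict, minFreq)) {
                continue;
            }
            List<String> words = new ArrayList<>(prefix);
            words.add(left);
            String right = rest.substring(i);
            if (isWord(right, dict, minFreq)) {
                List<String> full = new ArrayList<>(words);
                full.add(right);
                out.add(new Suggestion(full, full.size() - 1, maxFreq(full, dict)));
            }
            // Try breaking the remainder further while changes remain.
            breakRest(right, words, dict, breaksLeft - 1, minLen, minFreq, out);
        }
    }

    private static boolean isWord(String w, Map<String, Integer> dict, int minFreq) {
        Integer freq = dict.get(w);
        return freq != null && freq >= minFreq;
    }

    private static int maxFreq(List<String> words, Map<String, Integer> dict) {
        int max = 0;
        for (String w : words) {
            max = Math.max(max, dict.get(w));
        }
        return max;
    }

    public static void main(String[] args) {
        Map<String, Integer> dict = Map.of("the", 100, "big", 30, "dog", 40, "bigdog", 5);
        for (Suggestion s : suggestBreaks("thebigdog", dict, 5, 2, 1, 1)) {
            System.out.println(s.words() + " score=" + s.score());
        }
    }
}
```

Running main prints [the, bigdog] score=1 followed by [the, big, dog] score=2: the single-break suggestion ranks first because it needs fewer changes.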
suggestWordCombinations
public CombineSuggestion[] suggestWordCombinations(Term[] terms, int maxSuggestions, IndexReader ir, SuggestMode suggestMode) throws IOException
Generate suggestions by combining one or more of the passed-in terms into single words. Each returned CombineSuggestion contains both a SuggestWord and an array detailing which passed-in terms were involved in creating this combination. The scores returned are equal to the number of word combinations needed, which is also one less than the length of the array CombineSuggestion.originalTermIndexes(). Generally, a suggestion with a lower score is preferred over one with a higher score.
To prevent two adjacent terms from being combined (for instance, if one is mandatory and the other is prohibited), separate the two terms with SEPARATOR_TERM.
When suggestMode equals SuggestMode.SUGGEST_WHEN_NOT_IN_INDEX, each suggestion will include at least one term not in the index.
When suggestMode equals SuggestMode.SUGGEST_MORE_POPULAR, each suggestion will have the same or better frequency than the most popular included term.
Returns:
an array of words generated by combining original terms
Throws:
IOException - If there is a low-level I/O error.
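A JDK-only sketch of the combination rules described above, under simplifying assumptions: a Map from word to frequency stands in for the index, runs of up to maxChanges + 1 adjacent terms are concatenated (subject to maxCombineWordLength), a separator string plays the role of SEPARATOR_TERM, and the score is originalTermIndexes.length - 1. The two modes follow this section's wording as read here: in WHEN_NOT_IN_INDEX at least one combined input term must be missing from the index and minSuggestionFrequency applies, while in MORE_POPULAR the combined word must be at least as frequent as the most popular included term. WordCombineSketch, Mode, and Combination are hypothetical names, not Lucene's CombineSuggestion or SuggestMode.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Toy model of word-combination suggestions. A Map (word -> frequency) stands in for
// the index; Mode, Combination and SEPARATOR are hypothetical, not Lucene's types.
class WordCombineSketch {
    enum Mode { WHEN_NOT_IN_INDEX, MORE_POPULAR }

    static final String SEPARATOR = "<SEP>"; // stand-in for SEPARATOR_TERM

    // The combined word, the indexes of the input terms it came from, and its score.
    record Combination(String word, int[] originalTermIndexes, int score) {}

    static List<Combination> suggestCombinations(String[] terms, Map<String, Integer> dict, int maxSuggestions,
                                                 Mode mode, int maxChanges, int maxCombineWordLength,
                                                 int minSuggestionFrequency) {
        List<Combination> out = new ArrayList<>();
        for (int start = 0; start < terms.length; start++) {
            if (terms[start].equals(SEPARATOR)) {
                continue;
            }
            StringBuilder word = new StringBuilder(terms[start]);
            boolean anyMissing = !dict.containsKey(terms[start]);
            int maxIncludedFreq = dict.getOrDefault(terms[start], 0);
            // Each extension of the run by one term is one more combination (change).
            for (int end = start + 1; end < terms.length && end - start <= maxChanges; end++) {
                if (terms[end].equals(SEPARATOR)) {
                    break; // never combine across a separator
                }
                word.append(terms[end]);
                anyMissing |= !dict.containsKey(terms[end]);
                maxIncludedFreq = Math.max(maxIncludedFreq, dict.getOrDefault(terms[end], 0));
                if (word.length() > maxCombineWordLength) {
                    break; // longer runs would only be longer
                }
                Integer freq = dict.get(word.toString());
                if (freq == null) {
                    continue; // the combined word is not in the index
                }
                boolean accept = switch (mode) {
                    case WHEN_NOT_IN_INDEX -> anyMissing && freq >= minSuggestionFrequency;
                    case MORE_POPULAR -> freq >= maxIncludedFreq; // minimum frequency not applied
                };
                if (accept) {
                    int[] indexes = new int[end - start + 1];
                    for (int k = 0; k < indexes.length; k++) {
                        indexes[k] = start + k;
                    }
                    out.add(new Combination(word.toString(), indexes, indexes.length - 1));
                }
            }
        }
        out.sort((a, b) -> Integer.compare(a.score(), b.score())); // fewest combinations first
        return new ArrayList<>(out.subList(0, Math.min(maxSuggestions, out.size())));
    }

    public static void main(String[] args) {
        Map<String, Integer> dict = Map.of("ice", 50, "cream", 40, "icecream", 30, "sand", 10, "sandwich", 20);
        String[] query = {"ice", "cream", "sand", "wich"};
        for (Combination c : suggestCombinations(query, dict, 5, Mode.WHEN_NOT_IN_INDEX, 1, 20, 1)) {
            System.out.println(c.word() + " " + Arrays.toString(c.originalTermIndexes()) + " score=" + c.score());
        }
    }
}
```

With the sample data, main prints sandwich [2, 3] score=1. The candidate icecream is rejected in WHEN_NOT_IN_INDEX mode because both ice and cream are already in the toy index.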
getMinSuggestionFrequency
public int getMinSuggestionFrequency()
Returns the minimum frequency a term must have to be part of a suggestion.
See Also:
setMinSuggestionFrequency(int)
getMaxCombineWordLength
public int getMaxCombineWordLength()
Returns the maximum length of a combined suggestion.
See Also:
setMaxCombineWordLength(int)
getMinBreakWordLength
public int getMinBreakWordLength()
Returns the minimum size of a broken word.
See Also:
setMinBreakWordLength(int)
getMaxChanges
public int getMaxChanges()
Returns the maximum number of changes to perform on the input.
See Also:
setMaxChanges(int)
getMaxEvaluations
public int getMaxEvaluations()
Returns the maximum number of word combinations to evaluate.
See Also:
setMaxEvaluations(int)
setMinSuggestionFrequency
public void setMinSuggestionFrequency(int minSuggestionFrequency)
The minimum frequency a term must have to be included as part of a suggestion. Default = 1. Not applicable when used with SuggestMode.SUGGEST_MORE_POPULAR.
See Also:
getMinSuggestionFrequency()
setMaxCombineWordLength
public void setMaxCombineWordLength(int maxCombineWordLength)
The maximum length of a suggestion made by combining 1 or more original terms. Default = 20.
See Also:
getMaxCombineWordLength()
setMinBreakWordLength
public void setMinBreakWordLength(int minBreakWordLength)
The minimum length to break words down to. Default = 1.
See Also:
getMinBreakWordLength()
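To make minBreakWordLength concrete, this hypothetical helper lists every two-word split of a term in which both pieces are at least minBreakWordLength characters long. It does no dictionary lookup and is not Lucene code. For the six-letter term candle, a minimum of 1 allows five split points, 2 allows three, 3 allows only one (can / dle), and 4 allows none.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: the two-word splits of a term in which both pieces have at
// least minBreakWordLength characters. No dictionary lookup is performed.
class MinBreakLengthSketch {
    static List<String[]> singleBreaks(String term, int minBreakWordLength) {
        int min = Math.max(1, minBreakWordLength); // an empty piece is never useful
        List<String[]> splits = new ArrayList<>();
        for (int i = min; i <= term.length() - min; i++) {
            splits.add(new String[] {term.substring(0, i), term.substring(i)});
        }
        return splits;
    }

    public static void main(String[] args) {
        for (int minLen = 1; minLen <= 4; minLen++) {
            System.out.println("minBreakWordLength=" + minLen + ": "
                    + singleBreaks("candle", minLen).size() + " split(s)");
        }
    }
}
```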
setMaxChanges
public void setMaxChanges(int maxChanges)
The maximum number of changes (word breaks or combinations) to make on the original term(s). Default = 1.
See Also:
getMaxChanges()
setMaxEvaluations
public void setMaxEvaluations(int maxEvaluations)
The maximum number of word combinations to evaluate. Default = 1000. A higher value might improve result quality; a lower value might improve performance.
See Also:
getMaxEvaluations()
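The quality/performance trade-off described for setMaxEvaluations can be illustrated with a budgeted search. In this sketch, assumed for illustration only, one evaluation is one examined split point, and the search stops once maxEvaluations split points have been checked; Lucene may count evaluations differently. EvaluationBudgetSketch and findSplits are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical budgeted search: one "evaluation" is one examined split point, and the
// search stops once maxEvaluations split points have been checked.
class EvaluationBudgetSketch {
    static List<String> findSplits(String term, Set<String> dict, int maxEvaluations) {
        List<String> found = new ArrayList<>();
        int evaluations = 0;
        for (int i = 1; i < term.length(); i++) {
            if (evaluations >= maxEvaluations) {
                break; // budget exhausted: remaining split points are never examined
            }
            evaluations++;
            String left = term.substring(0, i);
            String right = term.substring(i);
            if (dict.contains(left) && dict.contains(right)) {
                found.add(left + " " + right);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Set<String> dict = Set.of("note", "book");
        System.out.println(findSplits("notebook", dict, 3)); // prints []
        System.out.println(findSplits("notebook", dict, 4)); // prints [note book]
    }
}
```

For notebook with the toy dictionary {note, book}, a budget of 3 stops before reaching the split at position 4 and returns nothing, while a budget of 4 finds note book: a smaller budget does less work but can miss suggestions.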