Package org.apache.lucene.util.automaton
Class DaciukMihovAutomatonBuilder
- java.lang.Object
-
- org.apache.lucene.util.automaton.DaciukMihovAutomatonBuilder
-
public final class DaciukMihovAutomatonBuilder extends java.lang.ObjectBuilds a minimal, deterministicAutomatonthat accepts a set of strings. The algorithm requires sorted input data, but is very fast (nearly linear with the input size).
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description private static classDaciukMihovAutomatonBuilder.StateDFSA state withcharlabels on transitions.
-
Field Summary
Fields Modifier and Type Field Description private static java.util.Comparator<CharsRef>comparatorA comparator used for enforcing sorted UTF8 order, used in assertions only.(package private) static intMAX_TERM_LENGTHThis builder rejects terms that are more than 1k chars long since it then uses recursion based on the length of the string, which might cause stack overflows.private CharsRefpreviousPrevious sequence added to the automaton inadd(CharsRef).private DaciukMihovAutomatonBuilder.StaterootRoot automaton state.private java.util.HashMap<DaciukMihovAutomatonBuilder.State,DaciukMihovAutomatonBuilder.State>stateRegistryA "registry" for state interning.
-
Constructor Summary
Constructors Modifier Constructor Description privateDaciukMihovAutomatonBuilder()The default constructor is private.
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description voidadd(CharsRef current)Add another character sequence to this automaton.private voidaddSuffix(DaciukMihovAutomatonBuilder.State state, java.lang.CharSequence current, int fromIndex)Add a suffix ofcurrentstarting atfromIndex(inclusive) to statestate.static Automatonbuild(java.util.Collection<BytesRef> input)Build a minimal, deterministic automaton from a sorted list ofBytesRefrepresenting strings in UTF-8.DaciukMihovAutomatonBuilder.Statecomplete()Finalize the automaton and return the root state.private static intconvert(Automaton.Builder a, DaciukMihovAutomatonBuilder.State s, java.util.IdentityHashMap<DaciukMihovAutomatonBuilder.State,java.lang.Integer> visited)Internal recursive traversal for conversion.private voidreplaceOrRegister(DaciukMihovAutomatonBuilder.State state)Replace last child ofstatewith an already registered state or stateRegistry the last child state.private booleansetPrevious(CharsRef current)Copycurrentinto an internal buffer.
-
-
-
Field Detail
-
MAX_TERM_LENGTH
static final int MAX_TERM_LENGTH
This builder rejects terms that are more than 1k chars long since it then uses recursion based on the length of the string, which might cause stack overflows.- See Also:
- Constant Field Values
-
stateRegistry
private java.util.HashMap<DaciukMihovAutomatonBuilder.State,DaciukMihovAutomatonBuilder.State> stateRegistry
A "registry" for state interning.
-
root
private DaciukMihovAutomatonBuilder.State root
Root automaton state.
-
previous
private CharsRef previous
Previous sequence added to the automaton inadd(CharsRef).
-
comparator
private static final java.util.Comparator<CharsRef> comparator
A comparator used for enforcing sorted UTF8 order, used in assertions only.
-
-
Method Detail
-
add
public void add(CharsRef current)
Add another character sequence to this automaton. The sequence must be lexicographically larger or equal compared to any previous sequences added to this automaton (the input must be sorted).
-
complete
public DaciukMihovAutomatonBuilder.State complete()
Finalize the automaton and return the root state. No more strings can be added to the builder after this call.- Returns:
- Root automaton state.
-
convert
private static int convert(Automaton.Builder a, DaciukMihovAutomatonBuilder.State s, java.util.IdentityHashMap<DaciukMihovAutomatonBuilder.State,java.lang.Integer> visited)
Internal recursive traversal for conversion.
-
build
public static Automaton build(java.util.Collection<BytesRef> input)
Build a minimal, deterministic automaton from a sorted list ofBytesRefrepresenting strings in UTF-8. These strings must be binary-sorted.
-
setPrevious
private boolean setPrevious(CharsRef current)
Copycurrentinto an internal buffer.
-
replaceOrRegister
private void replaceOrRegister(DaciukMihovAutomatonBuilder.State state)
Replace last child ofstatewith an already registered state or stateRegistry the last child state.
-
addSuffix
private void addSuffix(DaciukMihovAutomatonBuilder.State state, java.lang.CharSequence current, int fromIndex)
Add a suffix ofcurrentstarting atfromIndex(inclusive) to statestate.
-
-