public final class StringMetrics extends Object
Consists of well-known metrics and of methods to create string metrics from
list or set metrics. All metrics are set up with sensible defaults; to
customize a metric, use StringMetricBuilder.
The available metrics are listed in the method summary below.
All methods return immutable objects provided the arguments are also immutable.
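The composition described above (simplify each input, tokenize it, then hand the tokens to a list or set metric) can be illustrated without the library. The following is a minimal stdlib-only sketch, not the Simmetrics implementation. The names SetMetric, StringSimilarity, forSetMetric, and whitespace are hypothetical stand-ins for Metric<Set<String>>, StringMetric, createForSetMetric, and Tokenizers.whitespace(), and the set metric is plain Jaccard similarity.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.function.Function;

// Hypothetical, stdlib-only illustration of a composite string metric:
// simplify -> tokenize -> set metric. None of these types are Simmetrics types.
public class CompositeMetricSketch {

    // Hypothetical stand-in for a set metric returning a similarity in [0, 1].
    interface SetMetric {
        float compare(Set<String> a, Set<String> b);
    }

    // Hypothetical stand-in for a string metric.
    interface StringSimilarity {
        float compare(String a, String b);
    }

    // Jaccard similarity |A intersect B| / |A union B|; two empty sets count as identical.
    static final SetMetric JACCARD = (a, b) -> {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0f;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (float) intersection.size() / union.size();
    };

    // Hypothetical analogue of createForSetMetric: the simplifier runs first,
    // the tokenizer splits the simplified string, the set metric compares tokens.
    static StringSimilarity forSetMetric(SetMetric metric,
                                         Function<String, String> simplifier,
                                         Function<String, Set<String>> tokenizer) {
        return (a, b) -> metric.compare(
                tokenizer.apply(simplifier.apply(a)),
                tokenizer.apply(simplifier.apply(b)));
    }

    // Whitespace tokenizer: splits on runs of whitespace and drops empty tokens.
    static Set<String> whitespace(String s) {
        Set<String> tokens = new HashSet<>();
        for (String t : s.split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        StringSimilarity jaccard = forSetMetric(
                JACCARD,
                s -> s.toLowerCase(Locale.ROOT), // simplifier: lower-case the input
                CompositeMetricSketch::whitespace);
        // {the, quick, fox} vs {the, slow, fox}: 2 shared tokens out of 4 distinct.
        System.out.println(jaccard.compare("The quick fox", "the slow FOX")); // 0.5
    }
}
```

Because the returned lambda captures nothing but its three arguments, it holds no mutable state of its own and is immutable whenever those arguments are, which mirrors the immutability guarantee stated above.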
| Modifier and Type | Method | Description |
|---|---|---|
| static StringMetric | blockDistance() | Returns a string metric that uses a Tokenizers.whitespace() tokenizer and the BlockDistance metric. |
| static float[] | compare(StringMetric metric, String c, List<String> strings) | Deprecated. Trivial with no clear use case. |
| static float[] | compare(StringMetric metric, String c, String... strings) | Deprecated. Trivial with no clear use case. |
| static float[] | compareArrays(StringMetric metric, String[] a, String[] b) | Deprecated. Trivial with no clear use case. |
| static StringMetric | cosineSimilarity() | Returns a string metric that uses a Tokenizers.whitespace() tokenizer and the CosineSimilarity metric. |
| static StringMetric | create(Metric<String> metric) | Either constructs a new string metric or returns the original metric. |
| static StringMetric | create(Metric<String> metric, Simplifier simplifier) | Constructs a new composite string metric. |
| static StringMetric | createForListMetric(Metric<List<String>> metric, Simplifier simplifier, Tokenizer tokenizer) | Creates a new composite string metric. The tokenizer is used to tokenize the simplified strings. |
| static StringMetric | createForListMetric(Metric<List<String>> metric, Tokenizer tokenizer) | Creates a new composite string metric. |
| static StringMetric | createForSetMetric(Metric<Set<String>> metric, Simplifier simplifier, Tokenizer tokenizer) | Creates a new composite string metric. The tokenizer is used to tokenize the simplified strings. |
| static StringMetric | createForSetMetric(Metric<Set<String>> metric, Tokenizer tokenizer) | Creates a new composite string metric. |
| static StringMetric | damerauLevenshtein() | Returns a string metric that uses the DamerauLevenshtein metric. |
| static StringMetric | diceSimilarity() | Returns a string metric that uses a Tokenizers.whitespace() tokenizer and the DiceSimilarity metric. |
| static StringMetric | euclideanDistance() | Returns a string metric that uses a Tokenizers.whitespace() tokenizer and the EuclideanDistance metric. |
| static StringMetric | identity() | Returns a string metric that uses the Identity metric. |
| static StringMetric | jaccardSimilarity() | Returns a string metric that uses a Tokenizers.whitespace() tokenizer and the JaccardSimilarity metric. |
| static StringMetric | jaro() | Returns a string metric that uses the Jaro metric. |
| static StringMetric | jaroWinkler() | Returns a string metric that uses the JaroWinkler metric. |
| static StringMetric | levenshtein() | Returns a string metric that uses the Levenshtein metric. |
| static StringMetric | matchingCoefficient() | Returns a string metric that uses a Tokenizers.whitespace() tokenizer and the MatchingCoefficient metric. |
| static StringMetric | mongeElkan() | Returns a string metric that uses a Tokenizers.whitespace() tokenizer and the MongeElkan metric with an internal SmithWatermanGotoh metric. |
| static StringMetric | needlemanWunch() | Returns a string metric that uses the NeedlemanWunch metric. |
| static StringMetric | overlapCoefficient() | Returns a string metric that uses a Tokenizers.whitespace() tokenizer and the OverlapCoefficient metric. |
| static StringMetric | qGramsDistance() | Returns a string metric that uses a Tokenizers.qGramWithPadding(int) tokenizer with q=3 and the BlockDistance metric. |
| static StringMetric | simonWhite() | Returns a string metric that uses a Tokenizers.whitespace() tokenizer followed by a Tokenizers.qGramWithPadding(int) tokenizer with q=2, and the SimonWhite metric. |
| static StringMetric | smithWaterman() | Returns a string metric that uses the SmithWaterman metric. |
| static StringMetric | smithWatermanGotoh() | Returns a string metric that uses the SmithWatermanGotoh metric. |
| static StringMetric | soundex() | Returns a string metric that uses Soundex and the JaroWinkler metric. |
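Several entries in the summary chain a tokenizer and a metric; qGramsDistance(), for example, pairs Tokenizers.qGramWithPadding(int) for q=3 with the BlockDistance metric. The sketch below illustrates that pairing with the standard library only. It is an illustration under stated assumptions, not the library's code: the padding character '#', the helper names qGramsWithPadding, blockSimilarity, and qGramsDistance, and the normalization 1 - d / (|A| + |B|) (where d is the L1 distance between q-gram counts and |A|, |B| are the number of q-grams of each input) are all hypothetical choices.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, stdlib-only illustration of padded q-grams + block distance.
public class QGramBlockDistanceSketch {

    // Assumption: '#' as padding character; the library may pad differently.
    static final char PAD = '#';

    // Pads the input with q-1 PAD characters on each side and returns every
    // substring of length q, in order, keeping duplicates.
    static List<String> qGramsWithPadding(String s, int q) {
        if (q < 1) {
            throw new IllegalArgumentException("q must be >= 1");
        }
        StringBuilder padded = new StringBuilder();
        for (int i = 0; i < q - 1; i++) padded.append(PAD);
        padded.append(s);
        for (int i = 0; i < q - 1; i++) padded.append(PAD);
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + q <= padded.length(); i++) {
            grams.add(padded.substring(i, i + q));
        }
        return grams;
    }

    // Counts how often each token occurs (a multiset).
    static Map<String, Integer> counts(List<String> tokens) {
        Map<String, Integer> m = new HashMap<>();
        for (String t : tokens) {
            m.merge(t, 1, Integer::sum);
        }
        return m;
    }

    // Block (L1, Manhattan) distance between token counts, turned into a
    // similarity with the assumed normalization 1 - d / (|A| + |B|).
    static float blockSimilarity(List<String> a, List<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0f;
        }
        Map<String, Integer> ca = counts(a);
        Map<String, Integer> cb = counts(b);
        Set<String> keys = new HashSet<>(ca.keySet());
        keys.addAll(cb.keySet());
        int distance = 0;
        for (String k : keys) {
            distance += Math.abs(ca.getOrDefault(k, 0) - cb.getOrDefault(k, 0));
        }
        return 1.0f - (float) distance / (a.size() + b.size());
    }

    // Hypothetical analogue of qGramsDistance(): padded 3-grams + block distance.
    static float qGramsDistance(String a, String b) {
        return blockSimilarity(qGramsWithPadding(a, 3), qGramsWithPadding(b, 3));
    }

    public static void main(String[] args) {
        System.out.println(qGramsWithPadding("ab", 3));       // [##a, #ab, ab#, b##]
        System.out.println(qGramsDistance("night", "night")); // 1.0
        System.out.println(qGramsDistance("night", "nacht")); // about 0.43
    }
}
```

For "night" and "nacht" the padded 3-grams share only ##n, ht#, and t##, so 8 of the 14 q-grams are unmatched and the sketch returns 1 - 8/14, about 0.43. Padding lets the first and last characters appear in as many q-grams as the middle ones, so differences at the edges of a string are not under-weighted.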
@Deprecated public static float[] compare(StringMetric metric, String c, List<String> strings)
Deprecated. Trivial with no clear use case.
Parameters:
metric - to compare c with each value in the list
c - string to compare the list against
strings - to compare c against

@Deprecated public static float[] compare(StringMetric metric, String c, String... strings)
Deprecated. Trivial with no clear use case.
Parameters:
metric - to compare c with each value in strings
c - string to compare strings against
strings - to compare c against

@Deprecated public static float[] compareArrays(StringMetric metric, String[] a, String[] b)
Deprecated. Trivial with no clear use case.
Parameters:
metric - to compare each element in a with the corresponding element in b
a - array of strings to compare
b - array of strings to compare
Throws:
IllegalArgumentException - when a and b have different lengths

public static StringMetric cosineSimilarity()
Returns a string metric that uses a Tokenizers.whitespace() tokenizer and the CosineSimilarity metric.

public static StringMetric create(Metric<String> metric)
Either constructs a new string metric or returns the original metric.
Parameters:
metric - a metric for strings

public static StringMetric create(Metric<String> metric, Simplifier simplifier)
Constructs a new composite string metric.
Parameters:
metric - a metric for strings
simplifier - a simplifier
Throws:
NullPointerException - when metric or simplifier is null
See Also:
StringMetricBuilder

public static StringMetric createForListMetric(Metric<List<String>> metric, Simplifier simplifier, Tokenizer tokenizer)
Creates a new composite string metric. The tokenizer is used to tokenize the simplified strings.
Parameters:
metric - a list metric
simplifier - a simplifier
tokenizer - a tokenizer
Throws:
NullPointerException - when metric, simplifier, or tokenizer is null
See Also:
StringMetricBuilder

public static StringMetric createForListMetric(Metric<List<String>> metric, Tokenizer tokenizer)
Creates a new composite string metric.
Parameters:
metric - a list metric
tokenizer - a tokenizer
Throws:
NullPointerException - when metric or tokenizer is null
See Also:
StringMetricBuilder

public static StringMetric createForSetMetric(Metric<Set<String>> metric, Simplifier simplifier, Tokenizer tokenizer)
Creates a new composite string metric. The tokenizer is used to tokenize the simplified strings.
Parameters:
metric - a set metric
simplifier - a simplifier
tokenizer - a tokenizer
Throws:
NullPointerException - when metric, simplifier, or tokenizer is null
See Also:
StringMetricBuilder

public static StringMetric createForSetMetric(Metric<Set<String>> metric, Tokenizer tokenizer)
Creates a new composite string metric.
Parameters:
metric - a set metric
tokenizer - a tokenizer
Throws:
NullPointerException - when metric or tokenizer is null
See Also:
StringMetricBuilder

public static StringMetric blockDistance()
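Returns a string metric that uses a Tokenizers.whitespace() tokenizer and the BlockDistance metric.

public static StringMetric damerauLevenshtein()
Returns a string metric that uses the DamerauLevenshtein metric.

public static StringMetric diceSimilarity()
Returns a string metric that uses a Tokenizers.whitespace() tokenizer and the DiceSimilarity metric.

public static StringMetric euclideanDistance()
Returns a string metric that uses a Tokenizers.whitespace() tokenizer and the EuclideanDistance metric.

public static StringMetric identity()
Returns a string metric that uses the Identity metric.

public static StringMetric jaccardSimilarity()
Returns a string metric that uses a Tokenizers.whitespace() tokenizer and the JaccardSimilarity metric.

public static StringMetric jaro()
Returns a string metric that uses the Jaro metric.

public static StringMetric jaroWinkler()
Returns a string metric that uses the JaroWinkler metric.

public static StringMetric levenshtein()
Returns a string metric that uses the Levenshtein metric.

public static StringMetric matchingCoefficient()
Returns a string metric that uses a Tokenizers.whitespace() tokenizer and the MatchingCoefficient metric.

public static StringMetric mongeElkan()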
Returns a string metric that uses a Tokenizers.whitespace() tokenizer and the MongeElkan metric with an internal SmithWatermanGotoh metric.

public static StringMetric needlemanWunch()
Returns a string metric that uses the NeedlemanWunch metric.

public static StringMetric overlapCoefficient()
Returns a string metric that uses a Tokenizers.whitespace() tokenizer and the OverlapCoefficient metric.

public static StringMetric qGramsDistance()
Returns a string metric that uses a Tokenizers.qGramWithPadding(int) tokenizer with q=3 and the BlockDistance metric.

public static StringMetric simonWhite()
Returns a string metric that uses a Tokenizers.whitespace() tokenizer followed by a Tokenizers.qGramWithPadding(int) tokenizer with q=2, and the SimonWhite metric.

public static StringMetric smithWaterman()
Returns a string metric that uses the SmithWaterman metric.

public static StringMetric smithWatermanGotoh()
Returns a string metric that uses the SmithWatermanGotoh metric.

public static StringMetric soundex()
Returns a string metric that uses Soundex and the JaroWinkler metric.

Copyright © 2014–2018. All rights reserved.