string_similarity
-----------------
corpustools.symbolsim.string_similarity.string_similarity(corpus_context, query, algorithm, **kwargs)
This function computes the similarity of pairs of words across a corpus.
Parameters:
corpus_context : CorpusContext
Context manager for a corpus.
query : string, tuple, or list of tuples
The word or words to compare. If this is a string, every word in the corpus is compared to it; if it is a tuple of two strings, those two words are compared to each other; if it is a list of tuples, the two strings in each tuple are compared to each other.
algorithm : string
The string similarity algorithm to use. Currently supported values are ‘khorsi’, ‘edit_distance’, and ‘phono_edit_distance’.
max_rel : float
Excludes all words whose relatedness score is higher than max_rel.
min_rel : float
Excludes all words whose relatedness score is lower than min_rel.
stop_check : callable or None
Optional function that is checked to decide whether to terminate the computation gracefully before it finishes.
call_back : callable or None
Optional function used to report progress information while the computation runs.
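To make the ‘edit_distance’ option more concrete, here is a minimal sketch of standard Levenshtein distance, assuming that option counts unit-cost insertions, deletions, and substitutions. This is not taken from corpustools: its own implementation, and the ‘phono_edit_distance’ variant in particular, may weight operations differently. The function name `levenshtein` is hypothetical. Because it accepts any sequence of strings, it works on spellings as well as on lists of transcription segments. For a distance measure like this one, a lower score means the two words are more alike, which is worth keeping in mind when choosing max_rel and min_rel.

```python
from typing import Sequence


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Hypothetical helper: unit-cost Levenshtein distance between a and b.

    Works on plain strings (spelling) or on lists of segment symbols
    (transcriptions). Not the corpustools implementation.
    """
    # At the start of row i, prev[j] is the distance between a[:i-1] and b[:j].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into an empty sequence takes i deletions
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute x with y (free if identical)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))                # 3
print(levenshtein(["k", "ae", "t"], ["b", "ae", "t"]))  # 1
```

Keeping only two rows of the dynamic-programming table is a common space-saving choice; it does not change the result.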
Returns:
list of tuples
The first two elements of each tuple are the words that were compared, and the final element is their relatedness score.
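The documented contract can be summarized in a small, self-contained sketch. It covers the three query shapes, relatedness filtering, early termination, progress reporting, and the list of (word1, word2, score) tuples that is returned. This is not the corpustools implementation. The following are all assumptions made for illustration: `similarity_sketch`, its `words` parameter (standing in for the corpus context), its `score` parameter (standing in for the `algorithm` name), inclusive min_rel/max_rel bounds, a `(done, total)` call_back signature, and returning partial results when stop_check fires. The sketch also compares a string query with itself if that word is in the corpus, and the description above does not say whether string_similarity does so.

```python
from typing import Callable, List, Optional, Sequence, Tuple, Union

Pair = Tuple[str, str]
Query = Union[str, Pair, List[Pair]]
Result = Tuple[str, str, float]


def similarity_sketch(
    words: Sequence[str],
    query: Query,
    score: Callable[[str, str], float],
    min_rel: Optional[float] = None,
    max_rel: Optional[float] = None,
    stop_check: Optional[Callable[[], bool]] = None,
    call_back: Optional[Callable[[int, int], None]] = None,
) -> List[Result]:
    """Hypothetical stand-in for string_similarity's documented behaviour.

    `words` replaces the corpus context and `score` replaces the `algorithm`
    name; neither is part of the real corpustools API.
    """
    # Expand the three documented query shapes into (word1, word2) pairs.
    if isinstance(query, str):
        pairs = [(query, w) for w in words]  # the query word against every corpus word
    elif isinstance(query, tuple):
        pairs = [query]                      # one explicit pair
    else:
        pairs = list(query)                  # a list of explicit pairs

    results: List[Result] = []
    for done, (w1, w2) in enumerate(pairs):
        if stop_check is not None and stop_check():
            break  # assumption: stop early and keep what has been computed so far
        rel = score(w1, w2)
        # Keep only scores inside the optional [min_rel, max_rel] window (inclusive bounds assumed).
        if (min_rel is None or rel >= min_rel) and (max_rel is None or rel <= max_rel):
            results.append((w1, w2, rel))
        if call_back is not None:
            call_back(done + 1, len(pairs))  # assumed (done, total) progress convention
    return results


# Toy usage: the score is the absolute difference in word length (illustration only).
corpus = ["cat", "cast", "at"]
print(similarity_sketch(corpus, "cat", lambda a, b: abs(len(a) - len(b)), max_rel=0))
# [('cat', 'cat', 0)]
```

Passing the scoring function in as a callable keeps the sketch independent of any one similarity measure, so any pairwise measure that returns a number can be plugged in.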