string_similarity

corpustools.symbolsim.string_similarity.string_similarity(corpus_context, query, algorithm, **kwargs)

Computes the similarity of pairs of words across a corpus.

Parameters:

corpus_context : CorpusContext

Context manager for a corpus

query: string, tuple, or list of tuples

If this is a string, every word in the corpus is compared to it. If this is a tuple of two strings, those two words are compared to each other. If this is a list of tuples, the two strings in each tuple are compared to each other.

algorithm: string

The string similarity algorithm to use. Currently supported values are ‘khorsi’, ‘edit_distance’, and ‘phono_edit_distance’.

max_rel: double

Filters out all words whose relatedness score is higher than max_rel

min_rel: double

Filters out all words whose relatedness score is lower than min_rel

stop_check : callable or None

Optional function to check whether to gracefully terminate early

call_back : callable or None

Optional function that receives progress information while the function runs
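
The parameters above can be illustrated with a small, self-contained sketch. It does not use corpustools and is not the library's implementation: toy_string_similarity, expand_query, and edit_distance are hypothetical names, a plain list of strings stands in for the corpus context, and a basic Levenshtein distance stands in for the supported algorithms. The order of words in each result tuple, the inclusion of the query word itself, and the (done, total) arguments passed to call_back are assumptions made for the example.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance, used here only as a stand-in scorer."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def expand_query(words, query):
    """Turn the three accepted query shapes into a list of word pairs."""
    if isinstance(query, str):
        # Single word: compare it against every word in the (toy) corpus.
        return [(query, word) for word in words]
    if isinstance(query, tuple):
        # One pair of words.
        return [query]
    # Otherwise assume a list of pairs.
    return list(query)


def toy_string_similarity(words, query, score=edit_distance,
                          min_rel=None, max_rel=None,
                          stop_check=None, call_back=None):
    """Hypothetical stand-in that mimics the documented parameters."""
    pairs = expand_query(words, query)
    results = []
    for done, (first, second) in enumerate(pairs, start=1):
        # Terminate early if the caller asks for it.
        if stop_check is not None and stop_check():
            break
        relatedness = score(first, second)
        # Report progress; the (done, total) signature is an assumption.
        if call_back is not None:
            call_back(done, len(pairs))
        # Drop scores outside the requested bounds.
        if min_rel is not None and relatedness < min_rel:
            continue
        if max_rel is not None and relatedness > max_rel:
            continue
        results.append((first, second, relatedness))
    return results


toy_words = ["cat", "cut", "dog"]
all_against_cat = toy_string_similarity(toy_words, "cat")
one_pair = toy_string_similarity(toy_words, ("cat", "cut"))
close_pairs = toy_string_similarity(
    toy_words, [("cat", "dog"), ("cut", "cat")], max_rel=2)
```

With a real corpus, the first argument would be a CorpusContext rather than a list, and algorithm would select the scorer by name.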

Returns:

list of tuples:

The first two elements of each tuple are the words that were compared, and the final element is their relatedness score.
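
As a brief sketch of working with the return value, the snippet below unpacks and ranks a list of (word, word, score) tuples. The sample data and the rank_pairs helper are hypothetical. Whether a higher or a lower score means "more similar" depends on the algorithm chosen (an edit distance, for instance, is smaller for closer words), so the sort direction is left to the caller.

```python
from operator import itemgetter


def rank_pairs(results, descending=False):
    """Sort (word, word, score) tuples by their score (third element)."""
    return sorted(results, key=itemgetter(2), reverse=descending)


# Made-up results in the documented shape: two words, then a score.
sample_results = [("cat", "cut", 1.0), ("cat", "dog", 3.0), ("cat", "cat", 0.0)]

ranked = rank_pairs(sample_results)
for first, second, score in ranked:
    print(f"{first}\t{second}\t{score}")
```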