string_similarity
- corpustools.symbolsim.string_similarity.string_similarity(corpus_context, query, algorithm, **kwargs)
This function computes the similarity of pairs of words across a corpus.
- Parameters
- corpus_context: CorpusContext
Context manager for a corpus
- query: string, tuple, or list of tuples
If this is a string, every word in the corpus is compared to it. If this is a tuple of two strings, those two words are compared to each other. If this is a list of tuples, the two strings in each tuple are compared to each other.
- algorithm: string
The string similarity algorithm to use. Currently supported: ‘khorsi’, ‘edit_distance’, and ‘phono_edit_distance’.
- max_rel: double
Filters out all words whose relatedness score is higher than max_rel
- min_rel: double
Filters out all words whose relatedness score is lower than min_rel
- stop_check: callable or None
Optional function to check whether to gracefully terminate early
- call_back: callable or None
Optional function to supply progress information during the function
- Returns
- list of tuples:
Each tuple has three elements: the first two are the words that were compared, and the last is their relatedness score.
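The three accepted forms of query and the min_rel/max_rel filtering described above can be illustrated with a small self-contained sketch. This is not the corpustools implementation: the names sketch_similarity, expand_query, and edit_distance are hypothetical, a plain list of strings stands in for the corpus context, and an ordinary Levenshtein distance stands in for the library's scoring. The order of words inside each result tuple and the zero-argument stop_check signature are also assumptions.

```python
from typing import Callable, List, Optional, Tuple, Union

# Hypothetical alias for the three query forms described in the parameter list.
Query = Union[str, Tuple[str, str], List[Tuple[str, str]]]


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance, used here only as a stand-in scorer."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def expand_query(words: List[str], query: Query) -> List[Tuple[str, str]]:
    """Turn any of the three query forms into a list of word pairs."""
    if isinstance(query, str):
        # A single string is compared against every word in the corpus.
        return [(query, w) for w in words]
    if isinstance(query, tuple):
        # A single pair is compared as-is.
        return [query]
    # Otherwise assume a list of pairs.
    return list(query)


def sketch_similarity(
    words: List[str],
    query: Query,
    min_rel: Optional[float] = None,
    max_rel: Optional[float] = None,
    stop_check: Optional[Callable[[], bool]] = None,
) -> List[Tuple[str, str, int]]:
    """Score each pair and drop scores outside [min_rel, max_rel]."""
    results = []
    for w1, w2 in expand_query(words, query):
        # Assumed behaviour: stop early and return what has been collected.
        if stop_check is not None and stop_check():
            break
        score = edit_distance(w1, w2)
        if min_rel is not None and score < min_rel:
            continue
        if max_rel is not None and score > max_rel:
            continue
        results.append((w1, w2, score))
    return results


words = ["cat", "cart", "dog"]
print(sketch_similarity(words, "cat", max_rel=1))
# [('cat', 'cat', 0), ('cat', 'cart', 1)]
print(sketch_similarity(words, ("cat", "dog")))
# [('cat', 'dog', 3)]
```

In this sketch a bound of None means that bound is not applied; whether the library treats missing min_rel or max_rel values the same way is not stated in the documentation above.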