khorsi

corpustools.symbolsim.khorsi.khorsi(word1, word2, freq_base, sequence_type, max_distance=None)

Calculate the string similarity of two words from their segments and the frequencies of those segments in a corpus, following Khorsi (2012).

Parameters:

word1: Word

First Word object to compare

word2: Word

Second Word object to compare

freq_base: dictionary

A dictionary mapping each segment to its frequency of occurrence in a corpus

sequence_type: string

The type of segments to be used (‘spelling’ = Roman letters, ‘transcription’ = IPA symbols)

Returns:

float

A number representing the relatedness of two words based on Khorsi (2012)
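
The idea behind the measure can be illustrated with a small standalone sketch. It does not call corpustools and is not the library's implementation: it works on plain strings rather than Word objects, and the helper names build_freq_base and khorsi_sketch are hypothetical. It assumes that the score adds log(1 / relative frequency) for each segment in the longest shared contiguous run of the two words and subtracts the same quantity for every remaining segment, and that relative frequency is the segment count divided by the sum of all counts in freq_base. Both points are assumptions made for illustration only.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def build_freq_base(words):
    """Map each segment to its number of occurrences across the given words."""
    return dict(Counter(seg for word in words for seg in word))


def khorsi_sketch(seq1, seq2, freq_base):
    """Illustrative similarity score; NOT the corpustools implementation."""
    # Assumption: relative frequency = count / sum of all counts.
    total = sum(freq_base.values())

    # Assumption: shared material is the longest common contiguous run.
    matcher = SequenceMatcher(None, seq1, seq2, autojunk=False)
    m = matcher.find_longest_match(0, len(seq1), 0, len(seq2))
    shared = list(seq1[m.a:m.a + m.size])
    leftover = (list(seq1[:m.a]) + list(seq1[m.a + m.size:])
                + list(seq2[:m.b]) + list(seq2[m.b + m.size:]))

    def cost(seg):
        # log(1 / relative frequency): rarer segments weigh more.
        return math.log(total / freq_base[seg])

    return sum(cost(s) for s in shared) - sum(cost(s) for s in leftover)


# Toy corpus of spellings, used only for this illustration.
freq_base = build_freq_base(["cat", "cab", "bat"])
print(khorsi_sketch("cat", "cat", freq_base))
print(khorsi_sketch("cat", "cab", freq_base))
```

Under these assumptions, identical words score highest, and each unshared segment lowers the score by an amount that grows as the segment becomes rarer in the corpus.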