MostFrequentVariantContext

class corpustools.contextmanagers.MostFrequentVariantContext(corpus, sequence_type, type_or_token, attribute=None, frequency_threshold=0, log_count=True)

Corpus context that uses the most frequent pronunciation variant of each word for transcriptions and tiers.

See the documentation of BaseCorpusContext for additional information.
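The idea of "most frequent pronunciation variant" can be illustrated with a small standalone sketch. It does not use corpustools and is not the library's implementation; the `most_frequent_variant` helper and the toy `tokens` data are hypothetical, and ties are assumed to resolve to the variant seen first.

```python
from collections import Counter

def most_frequent_variant(variants):
    """Return the transcription that occurs most often among the tokens.

    Hypothetical helper for illustration only, not part of corpustools.
    """
    # Lists are unhashable, so count tuples of segments instead.
    counts = Counter(tuple(v) for v in variants)
    # most_common(1) gives [(variant, count)]; ties go to the first seen.
    best, _ = counts.most_common(1)[0]
    return list(best)

# Toy data: each word type maps to the transcriptions of its observed tokens.
tokens = {
    "probably": [
        ["p", "r", "a", "b", "l", "i"],
        ["p", "r", "a", "l", "i"],
        ["p", "r", "a", "b", "l", "i"],
    ],
    "and": [["ae", "n"], ["ae", "n", "d"], ["ae", "n"]],
}

canonical = {word: most_frequent_variant(v) for word, v in tokens.items()}
```

Here each word is represented by a single transcription, which is the general behavior the class description suggests for this context.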

Methods

`__init__`(corpus, sequence_type, type_or_token)	Initialize self.
`get_frequency_base`([gramsize, halve_edges, …])	Generate (and cache) frequencies for each segment in the Corpus.
`get_phone_probs`([gramsize, probability, …])	Generate (and cache) phonotactic probabilities for segments in the Corpus.

get_frequency_base(gramsize=1, halve_edges=False, probability=False)

Generate (and cache) frequencies for each segment in the Corpus.

Parameters:

halve_edges : boolean: If True, word boundary symbols (‘#’) will only be counted once per word, rather than twice. Defaults to False.
gramsize : integer: Size of n-gram to use for getting frequency, defaults to 1 (unigram)
probability : boolean: If True, frequency counts will be normalized by total frequency, defaults to False

Returns:

dict: Keys are segments (or sequences of segments) and values are their frequency in the Corpus
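A minimal sketch of how the three parameters could interact, written against plain Python data rather than a Corpus object. The function `frequency_base`, the `(segments, frequency)` input shape, and the exact treatment of `halve_edges` (halving the unigram count of '#') are assumptions made for illustration. The library's own computation may differ in detail.

```python
from collections import defaultdict

def frequency_base(words, gramsize=1, halve_edges=False, probability=False):
    """Count n-grams over (segments, frequency) pairs.

    Illustrative stand-in only; `words` is a hypothetical input shape.
    """
    freq = defaultdict(float)
    for segments, word_freq in words:
        # Assumption: a boundary symbol is added on both sides of each word.
        seq = ["#"] + list(segments) + ["#"]
        for i in range(len(seq) - gramsize + 1):
            gram = tuple(seq[i:i + gramsize])
            # Unigrams are keyed by the bare segment, longer grams by tuple.
            key = gram[0] if gramsize == 1 else gram
            freq[key] += word_freq
    # Assumption: halving only affects the unigram boundary count.
    if halve_edges and "#" in freq:
        freq["#"] /= 2
    if probability:
        total = sum(freq.values())
        return {k: v / total for k, v in freq.items()}
    return dict(freq)

# Toy corpus: "kat" with frequency 2, "ta" with frequency 1.
toy = [(["k", "a", "t"], 2.0), (["t", "a"], 1.0)]
unigrams = frequency_base(toy)
bigrams = frequency_base(toy, gramsize=2)
```

With `halve_edges=True` the boundary count in this sketch drops from 6.0 to 3.0, so '#' is weighted once per word rather than twice.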

get_phone_probs(gramsize=1, probability=True, preserve_position=True)

Generate (and cache) phonotactic probabilities for segments in the Corpus.

Parameters:

gramsize : integer: Size of n-gram to use for getting frequency, defaults to 1 (unigram)
probability : boolean: If True, frequency counts will be normalized by total frequency, defaults to True
preserve_position : boolean: If True, segments in different positions in the transcription will not be collapsed, defaults to True
log_count : boolean: If True, token frequencies will be logarithmically transformed prior to being summed

Returns:

dict: Keys are segments (or sequences of segments) and values are their phonotactic probability in the Corpus
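The effect of `preserve_position` can be sketched on toy data. This is a hedged illustration, not the library's code: `phone_probs` is a hypothetical function, the `(gram, position)` key shape is an assumption, and so is the choice to normalize within each position. The log transform controlled by `log_count` is left out.

```python
from collections import defaultdict

def phone_probs(words, gramsize=1, probability=True, preserve_position=True):
    """Estimate n-gram probabilities from (segments, frequency) pairs.

    Illustrative stand-in only; key shapes here are assumptions.
    """
    counts = defaultdict(float)
    totals = defaultdict(float)  # one total per position, or a single total
    for segments, word_freq in words:
        for i in range(len(segments) - gramsize + 1):
            gram = tuple(segments[i:i + gramsize])
            # Keep the position in the key so that, e.g., initial and
            # final "t" are counted separately.
            key = (gram, i) if preserve_position else gram
            counts[key] += word_freq
            totals[i if preserve_position else None] += word_freq
    if not probability:
        return dict(counts)
    result = {}
    for key, value in counts.items():
        position = key[1] if preserve_position else None
        result[key] = value / totals[position]
    return result

# Toy corpus: "kat" with frequency 2, "ta" with frequency 1.
toy = [(["k", "a", "t"], 2.0), (["t", "a"], 1.0)]
by_position = phone_probs(toy)
collapsed = phone_probs(toy, preserve_position=False)
```

In `by_position`, "t" has separate entries for position 0 and position 2, while `collapsed` merges them into a single entry.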