MostFrequentVariantContext
class corpustools.contextmanagers.MostFrequentVariantContext(corpus, sequence_type, type_or_token, attribute=None, frequency_threshold=0, log_count=True)
Corpus context that uses the most frequent pronunciation variants for transcriptions and tiers.
See the documentation of BaseCorpusContext for additional information.
Methods
__init__(corpus, sequence_type, type_or_token): Initialize self.
get_frequency_base([gramsize, halve_edges, …]): Generate (and cache) frequencies for each segment in the Corpus.
get_phone_probs([gramsize, probability, …]): Generate (and cache) phonotactic probabilities for segments in the Corpus.
get_frequency_base(gramsize=1, halve_edges=False, probability=False)
Generate (and cache) frequencies for each segment in the Corpus.
Parameters:
- halve_edges : boolean
If True, word boundary symbols (‘#’) will only be counted once per word, rather than twice. Defaults to False.
- gramsize : integer
Size of n-gram to use for getting frequency. Defaults to 1 (unigram).
- probability : boolean
If True, frequency counts will be normalized by total frequency. Defaults to False.
Returns:
- dict
Keys are segments (or sequences of segments) and values are their frequency in the Corpus.
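The parameters above can be illustrated with a small standalone sketch. It is not the corpustools implementation: the function name sketch_frequency_base, the (segments, frequency) input format, and the choice to halve the ‘#’ count only for unigrams are all assumptions made for illustration.

```python
from collections import Counter


def sketch_frequency_base(words, gramsize=1, halve_edges=False, probability=False):
    """Illustrative n-gram counter; NOT the corpustools implementation.

    `words` is a hypothetical list of (segments, frequency) pairs.
    """
    counts = Counter()
    for segments, frequency in words:
        # Pad each word with the word boundary symbol on both sides.
        padded = ["#"] + list(segments) + ["#"]
        for i in range(len(padded) - gramsize + 1):
            gram = tuple(padded[i:i + gramsize])
            # Unigrams are keyed by the bare segment, longer n-grams by a tuple.
            key = gram[0] if gramsize == 1 else gram
            counts[key] += frequency
    if halve_edges and "#" in counts:
        # Assumption: count the boundary once per word instead of twice.
        counts["#"] /= 2
    if probability:
        # Normalize every count by the total frequency mass.
        total = sum(counts.values())
        return {key: value / total for key, value in counts.items()}
    return dict(counts)


# Toy data: "kat" with frequency 2 and "ta" with frequency 1.
toy_words = [(["k", "a", "t"], 2), (["t", "a"], 1)]
unigrams = sketch_frequency_base(toy_words)
halved = sketch_frequency_base(toy_words, halve_edges=True)
bigram_probs = sketch_frequency_base(toy_words, gramsize=2, probability=True)
```

With this toy data, each word contributes its frequency to ‘#’ twice, so setting halve_edges=True cuts the boundary count in half, and the bigram probabilities sum to 1.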
get_phone_probs(gramsize=1, probability=True, preserve_position=True)
Generate (and cache) phonotactic probabilities for segments in the Corpus.
Parameters:
- gramsize : integer
Size of n-gram to use for getting frequency. Defaults to 1 (unigram).
- probability : boolean
If True, frequency counts will be normalized by total frequency. Defaults to True.
- preserve_position : boolean
If True, segments in different positions in the transcription will not be collapsed. Defaults to True.
- log_count : boolean
If True, token frequencies will be logarithmically transformed prior to being summed.
Returns:
- dict
Keys are segments (or sequences of segments) and values are their phonotactic probability in the Corpus.
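As a rough illustration of how preserve_position and log_count might interact, here is a standalone sketch. It is not the corpustools implementation: the function name sketch_phone_probs, the (segments, frequency) input format, the (gram, position) key layout, per-position normalization, and the use of math.log1p as the log transform are all assumptions. In the signatures documented here, log_count is an argument of the class constructor; the sketch accepts it as a function argument only to stay self-contained.

```python
import math
from collections import defaultdict


def sketch_phone_probs(words, gramsize=1, probability=True,
                       preserve_position=True, log_count=True):
    """Illustrative positional counter; NOT the corpustools implementation.

    `words` is a hypothetical list of (segments, frequency) pairs.
    Keys of the result are (gram, position); position is None when
    positions are collapsed.
    """
    counts = defaultdict(float)
    totals = defaultdict(float)
    for segments, frequency in words:
        # Assumption: log1p keeps the weight positive when frequency is 1.
        weight = math.log1p(frequency) if log_count else frequency
        for i in range(len(segments) - gramsize + 1):
            gram = tuple(segments[i:i + gramsize])
            position = i if preserve_position else None
            counts[(gram, position)] += weight
            totals[position] += weight
    if probability:
        # Assumption: normalize within each position (or globally if collapsed).
        return {key: value / totals[key[1]] for key, value in counts.items()}
    return dict(counts)


# Toy data: "kat" with frequency 2 and "ta" with frequency 1.
toy_words = [(["k", "a", "t"], 2), (["t", "a"], 1)]
positional = sketch_phone_probs(toy_words, log_count=False)
collapsed = sketch_phone_probs(toy_words, preserve_position=False, log_count=False)
logged = sketch_phone_probs(toy_words, log_count=True)
```

In this toy data, ‘t’ occurs word-initially in one word and word-finally in the other. With preserve_position=True those occurrences are tracked under separate keys; with preserve_position=False they are pooled into a single entry.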