WeightedVariantContext
class corpustools.contextmanagers.WeightedVariantContext(corpus, sequence_type, type_or_token, attribute=None, frequency_threshold=0)

Corpus context that weights the frequency of pronunciation variants, either by the number of variants or by token frequency, for transcriptions and tiers.

See the documentation of BaseCorpusContext for additional information.
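The weighting described above can be illustrated with a small standalone sketch. It is one plausible reading of the description, not the corpustools implementation: the helper `weighted_variants` and the dictionary of variant token counts are hypothetical. In "type" mode each of a word's n variants contributes 1/n, so the word still counts as one type overall; in "token" mode each variant contributes its own token count.

```python
from typing import Dict, List, Tuple


def weighted_variants(variants: Dict[str, int], type_or_token: str) -> List[Tuple[str, float]]:
    """Return (variant, weight) pairs for a single word.

    Hypothetical helper, not part of corpustools:
    - "type": each of the word's n variants contributes 1/n.
    - "token": each variant contributes its own token count.
    """
    if not variants:
        return []
    if type_or_token == "type":
        share = 1.0 / len(variants)
        return [(variant, share) for variant in variants]
    if type_or_token == "token":
        return [(variant, float(count)) for variant, count in variants.items()]
    raise ValueError("type_or_token must be 'type' or 'token'")


# A word with two pronunciations and made-up token counts.
cat_variants = {"k.æ.t": 3, "k.a.t": 1}
print(weighted_variants(cat_variants, "type"))   # [('k.æ.t', 0.5), ('k.a.t', 0.5)]
print(weighted_variants(cat_variants, "token"))  # [('k.æ.t', 3.0), ('k.a.t', 1.0)]
```

Splitting a type count of 1 across variants keeps a word with many pronunciations from outweighing a word with only one.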
Methods

__init__(corpus, sequence_type, type_or_token)
get_frequency_base([gramsize, halve_edges, ...])
    Generate (and cache) frequencies for each segment in the Corpus.
get_phone_probs([gramsize, probability, ...])
    Generate (and cache) phonotactic probabilities for segments in the Corpus.
get_frequency_base(gramsize=1, halve_edges=False, probability=False)

Generate (and cache) frequencies for each segment in the Corpus.

Parameters:
gramsize : integer
    Size of n-gram to use for getting frequency. Defaults to 1 (unigram).
halve_edges : boolean
    If True, word boundary symbols ('#') will only be counted once per word, rather than twice. Defaults to False.
probability : boolean
    If True, frequency counts will be normalized by total frequency. Defaults to False.

Returns: dict
    Keys are segments (or sequences of segments) and values are their frequency in the Corpus.
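The parameters above can be made concrete with a minimal standalone sketch. It assumes each word is given as a hypothetical `(segments, frequency)` pair and that every word is padded with one '#' on each side; `frequency_base` is an illustrative function, not the corpustools method, and it applies halve_edges only to unigrams because the text does not say how boundaries in longer n-grams are handled.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence, Tuple, Union

Gram = Union[str, Tuple[str, ...]]


def frequency_base(
    words: Iterable[Tuple[Sequence[str], float]],
    gramsize: int = 1,
    halve_edges: bool = False,
    probability: bool = False,
) -> Dict[Gram, float]:
    """Count segment n-grams over (segments, frequency) pairs (hypothetical sketch)."""
    counts: Counter = Counter()
    for segments, freq in words:
        # Pad each word with a boundary symbol on both sides.
        padded = ["#"] + list(segments) + ["#"]
        for i in range(len(padded) - gramsize + 1):
            gram = tuple(padded[i:i + gramsize])
            # Unigrams are keyed by the bare segment, longer n-grams by a tuple.
            counts[gram[0] if gramsize == 1 else gram] += freq
        if halve_edges and gramsize == 1:
            # Padding adds '#' twice per word; remove one of the two counts.
            counts["#"] -= freq
    if probability:
        # Normalize every count by the summed frequency of all n-grams.
        total = sum(counts.values())
        return {gram: count / total for gram, count in counts.items()}
    return dict(counts)


words = [(["k", "a", "t"], 1.0), (["a", "t"], 2.0)]
print(frequency_base(words))                     # {'#': 6.0, 'k': 1.0, 'a': 3.0, 't': 3.0}
print(frequency_base(words, halve_edges=True))   # {'#': 3.0, 'k': 1.0, 'a': 3.0, 't': 3.0}
```

With halve_edges=True the boundary count drops from 6.0 to 3.0, so '#' no longer dominates the normalized distribution.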
get_phone_probs(gramsize=1, probability=True, preserve_position=True, log_count=True)

Generate (and cache) phonotactic probabilities for segments in the Corpus.

Parameters:
gramsize : integer
    Size of n-gram to use for getting frequency. Defaults to 1 (unigram).
probability : boolean
    If True, frequency counts will be normalized by total frequency. Defaults to True.
preserve_position : boolean
    If True, segments in different positions in the transcription will not be collapsed. Defaults to True.
log_count : boolean
    If True, token frequencies will be logarithmically transformed prior to being summed. Defaults to True.

Returns: dict
    Keys are segments (or sequences of segments) and values are their phonotactic probability in the Corpus.
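The interaction of preserve_position, log_count and probability can be sketched as a position-specific probability: each n-gram's summed (optionally log-transformed) frequency is divided by the total at that position. This is a hypothetical illustration, not the corpustools code. The `(segments, frequency)` input, the `(gram, position)` key shape, and the use of log10(freq + 1) so that frequency-1 words still contribute are all assumptions.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Sequence, Tuple


def phone_probs(
    words: Iterable[Tuple[Sequence[str], float]],
    gramsize: int = 1,
    probability: bool = True,
    preserve_position: bool = True,
    log_count: bool = True,
) -> Dict[Hashable, float]:
    """Positional n-gram probabilities (hypothetical sketch, not corpustools code)."""
    counts: Dict[Hashable, float] = defaultdict(float)
    totals: Dict[Hashable, float] = defaultdict(float)
    for segments, freq in words:
        # Assumption: log10(freq + 1) so that frequency-1 words are not zeroed out.
        weight = math.log10(freq + 1) if log_count else freq
        segs = list(segments)
        for i in range(len(segs) - gramsize + 1):
            gram = tuple(segs[i:i + gramsize])
            # preserve_position keeps the same n-gram at different positions
            # as separate keys; otherwise positions are collapsed.
            key = (gram, i) if preserve_position else gram
            counts[key] += weight
            # Denominator per position, or a single shared total when collapsed.
            totals[i if preserve_position else None] += weight
    if not probability:
        return dict(counts)
    return {
        key: value / totals[key[1] if preserve_position else None]
        for key, value in counts.items()
    }


words = [(["k", "a"], 9.0), (["a", "t"], 99.0)]
probs = phone_probs(words)
print(probs[(("a",), 0)])  # 2/3: 'a' is word-initial only in the more frequent word
```

Log-transforming before summing compresses the gap between very frequent and rare words, so a single high-frequency word cannot dominate a position.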