# WeightedVariantContext

class corpustools.contextmanagers.WeightedVariantContext(corpus, sequence_type, type_or_token, attribute=None, frequency_threshold=0)

Corpus context that weights the frequency of each pronunciation variant, either by the number of variants or by token frequency, for transcriptions and tiers.

See the documentation of BaseCorpusContext for additional information.
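The weighting described above can be illustrated with a small standalone sketch. It does not use corpustools: `weight_variants` is a hypothetical helper, and the two weighting rules (an equal share per variant for "type", a share proportional to token count for "token") are one plausible reading of the description, not a confirmed account of the library's behavior.

```python
def weight_variants(variant_counts, type_or_token):
    """Return a weight for each pronunciation variant of one word.

    variant_counts maps each variant (a tuple of segments) to its token
    count. Hypothetical helper for illustration; not part of corpustools.
    """
    if not variant_counts:
        return {}
    if type_or_token == "type":
        # Assumed reading: every variant gets an equal share of the word.
        share = 1.0 / len(variant_counts)
        return {variant: share for variant in variant_counts}
    if type_or_token == "token":
        # Assumed reading: each variant is weighted by its share of tokens.
        total = sum(variant_counts.values())
        if total == 0:
            return {variant: 0.0 for variant in variant_counts}
        return {variant: count / total
                for variant, count in variant_counts.items()}
    raise ValueError("type_or_token must be 'type' or 'token'")


# Two made-up variants of one word, with made-up token counts.
variants = {("k", "ae", "t"): 3, ("k", "ae"): 1}
print(weight_variants(variants, "type"))
print(weight_variants(variants, "token"))
```

Under these assumptions the weights for a word always sum to 1, so a word with many variants does not count more heavily than a word with a single pronunciation.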

Methods

| Method | Description |
| --- | --- |
| `__init__(corpus, sequence_type, type_or_token)` | |
| `get_frequency_base([gramsize, halve_edges, ...])` | Generate (and cache) frequencies for each segment in the Corpus. |
| `get_phone_probs([gramsize, probability, ...])` | Generate (and cache) phonotactic probabilities for segments in the Corpus. |

get_frequency_base(gramsize=1, halve_edges=False, probability=False)

Generate (and cache) frequencies for each segment in the Corpus.

Parameters:

- gramsize : integer. Size of n-gram to use for getting frequency. Defaults to 1 (unigram).
- halve_edges : boolean. If True, word boundary symbols (‘#’) will only be counted once per word, rather than twice. Defaults to False.
- probability : boolean. If True, frequency counts will be normalized by total frequency. Defaults to False.

Returns:

- dict. Keys are segments (or sequences of segments) and values are their frequency in the Corpus.

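To make the roles of gramsize, halve_edges, and probability concrete, here is a minimal standalone sketch of n-gram frequency counting. It is not the corpustools implementation: `ngram_frequencies` and the `(segments, frequency)` input format are hypothetical, and applying halve_edges only to the unigram ‘#’ count is an assumption made for this sketch.

```python
from collections import defaultdict


def ngram_frequencies(words, gramsize=1, halve_edges=False, probability=False):
    """Count n-grams over (segments, frequency) pairs.

    Hypothetical stand-in for illustration; not the corpustools code.
    """
    counts = defaultdict(float)
    for segments, frequency in words:
        # Pad each word with the boundary symbol on both sides.
        padded = ["#"] + list(segments) + ["#"]
        for start in range(len(padded) - gramsize + 1):
            gram = tuple(padded[start:start + gramsize])
            # Unigrams are keyed by the bare segment, longer n-grams by tuple.
            key = gram[0] if gramsize == 1 else gram
            counts[key] += frequency
    if halve_edges and "#" in counts:
        # Assumption: halving the unigram '#' count leaves one boundary per word.
        counts["#"] /= 2
    if probability:
        # Normalize every count by the total over all n-grams.
        total = sum(counts.values())
        return {key: value / total for key, value in counts.items()}
    return dict(counts)


# Two made-up words with made-up frequencies.
words = [(("k", "a", "t"), 2.0), (("t", "a"), 1.0)]
unigrams = ngram_frequencies(words)
halved = ngram_frequencies(words, halve_edges=True)
bigrams = ngram_frequencies(words, gramsize=2)
probs = ngram_frequencies(words, probability=True)
```

In this sketch each word contributes its frequency to every n-gram it contains, so the boundary symbol is counted twice per word unless halve_edges is set.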
get_phone_probs(gramsize=1, probability=True, preserve_position=True, log_count=True)

Generate (and cache) phonotactic probabilities for segments in the Corpus.

Parameters:

- gramsize : integer. Size of n-gram to use for getting frequency. Defaults to 1 (unigram).
- probability : boolean. If True, frequency counts will be normalized by total frequency. Defaults to True.
- preserve_position : boolean. If True, segments in different positions in the transcription will not be collapsed. Defaults to True.
- log_count : boolean. If True, token frequencies will be logarithmically transformed prior to being summed. Defaults to True.

Returns:

- dict. Keys are segments (or sequences of segments) and values are their phonotactic probability in the Corpus.
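The effect of preserve_position and log_count can be sketched in standalone Python. This is an illustration only, not the corpustools implementation: `positional_probs`, the `(segments, frequency)` input format, the `(gram, position)` key layout, and the use of `log(frequency + 1)` as the log transform are all assumptions made for this sketch.

```python
import math
from collections import defaultdict


def positional_probs(words, gramsize=1, probability=True,
                     preserve_position=True, log_count=True):
    """Estimate n-gram probabilities, optionally per position.

    Hypothetical stand-in for illustration; not the corpustools code.
    """
    counts = defaultdict(float)
    totals = defaultdict(float)
    for segments, frequency in words:
        # Assumption: log(frequency + 1) keeps the weight positive
        # even for words with a frequency of 1.
        weight = math.log(frequency + 1) if log_count else frequency
        for position in range(len(segments) - gramsize + 1):
            gram = tuple(segments[position:position + gramsize])
            # With preserve_position, the same gram in two positions
            # is tracked under two separate keys.
            slot = position if preserve_position else None
            key = (gram, position) if preserve_position else gram
            counts[key] += weight
            totals[slot] += weight
    if not probability:
        return dict(counts)
    result = {}
    for key, value in counts.items():
        slot = key[1] if preserve_position else None
        # Normalize within the position (or over everything if collapsed).
        result[key] = value / totals[slot] if totals[slot] else 0.0
    return result


# Two made-up words with made-up frequencies.
words = [(("k", "a", "t"), 9.0), (("t", "a"), 4.0)]
by_position = positional_probs(words, log_count=False)
collapsed = positional_probs(words, preserve_position=False, log_count=False)
logged = positional_probs(words)
```

In this sketch, preserving position means a probability answers "how likely is this segment in this slot", while collapsing positions answers "how likely is this segment anywhere in a word".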