
class corpustools.contextmanagers.WeightedVariantContext(corpus, sequence_type, type_or_token, attribute=None, frequency_threshold=0, log_count=True)

Corpus context that weights the frequency of pronunciation variants by either the number of variants or their token frequency, for transcriptions and tiers.

See the documentation of BaseCorpusContext for additional information.
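
The two weighting schemes named above can be illustrated with a small standalone sketch. It does not use corpustools and is not the library's implementation: the function name `weight_variants`, its input format (a dict mapping each variant's segments to a token count), and the exact formulas are assumptions chosen only to show the idea. Under "type" weighting every variant of a word gets an equal share; under "token" weighting each variant gets a share proportional to how often it was observed.

```python
def weight_variants(variant_counts, type_or_token="type"):
    """Hypothetical helper: split one word's weight across its variants.

    variant_counts maps a variant (tuple of segments) to its token count.
    Returns a dict mapping each variant to a weight; weights sum to 1.
    """
    if not variant_counts:
        return {}
    if type_or_token == "type":
        # Equal share per variant, regardless of how often each occurs.
        share = 1 / len(variant_counts)
        return {variant: share for variant in variant_counts}
    if type_or_token == "token":
        # Share proportional to each variant's token count.
        total = sum(variant_counts.values())
        if total == 0:
            raise ValueError("token weighting needs a nonzero total count")
        return {v: count / total for v, count in variant_counts.items()}
    raise ValueError("type_or_token must be 'type' or 'token'")


# Toy data (made up for illustration): two variants of one word.
variants = {("k", "a", "t"): 30, ("k", "a"): 10}
by_type = weight_variants(variants, "type")
by_token = weight_variants(variants, "token")
```

With this toy input, type weighting assigns each variant 0.5, while token weighting assigns 0.75 and 0.25.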


__init__(corpus, sequence_type, type_or_token) Initialize self.
get_frequency_base([gramsize, halve_edges, …]) Generate (and cache) frequencies for each segment in the Corpus.
get_phone_probs([gramsize, probability, …]) Generate (and cache) phonotactic probabilities for segments in the Corpus.
get_frequency_base(gramsize=1, halve_edges=False, probability=False, need_wb=True)

Generate (and cache) frequencies for each segment in the Corpus.

halve_edges : boolean

If True, word boundary symbols (‘#’) will only be counted once per word, rather than twice. Defaults to False.

gramsize : integer

Size of n-gram to use for getting frequency, defaults to 1 (unigram)

probability : boolean

If True, frequency counts will be normalized by total frequency, defaults to False

need_wb : boolean

If True, word boundaries are added. Defaults to True. Set to False when boundaries are not wanted, e.g., for the environment filter in mutual information.


A dictionary whose keys are segments (or sequences of segments) and whose values are their frequency in the Corpus.
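
To make the parameters above concrete, here is a minimal standalone sketch of frequency-weighted n-gram counting. It is an illustration under stated assumptions, not the corpustools code: `ngram_frequencies` and its input format (a dict from segment tuples to word frequencies) are hypothetical, and `halve_edges` is modeled simply by halving the unigram count of '#'.

```python
from collections import Counter


def ngram_frequencies(words, gramsize=1, halve_edges=False,
                      probability=False, need_wb=True):
    """Hypothetical helper: count n-grams weighted by word frequency.

    words maps a tuple of segments to that word's frequency.
    """
    counts = Counter()
    for segments, freq in words.items():
        seq = list(segments)
        if need_wb:
            # Surround the word with boundary symbols.
            seq = ["#"] + seq + ["#"]
        for i in range(len(seq) - gramsize + 1):
            gram = tuple(seq[i:i + gramsize])
            # Unigrams are keyed by the bare segment, longer grams by tuple.
            key = gram[0] if gramsize == 1 else gram
            counts[key] += freq
    if halve_edges and "#" in counts:
        # Assumption: count each word's boundary once instead of twice.
        counts["#"] /= 2
    if probability:
        total = sum(counts.values())
        if total:
            return {k: v / total for k, v in counts.items()}
    return dict(counts)


# Toy data (made up for illustration).
toy = {("k", "a", "t"): 2, ("t", "a"): 1}
unigrams = ngram_frequencies(toy)
halved = ngram_frequencies(toy, halve_edges=True)
bigrams = ngram_frequencies(toy, gramsize=2)
probs = ngram_frequencies(toy, probability=True)
```

In this toy corpus the boundary symbol is counted 6 times by default (twice per word, weighted by frequency) and 3 times with `halve_edges=True`.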

get_phone_probs(gramsize=1, probability=True, preserve_position=True)

Generate (and cache) phonotactic probabilities for segments in the Corpus.

gramsize : integer

Size of n-gram to use for getting frequency, defaults to 1 (unigram)

probability : boolean

If True, frequency counts will be normalized by total frequency, defaults to True

preserve_position : boolean

If True, segments in different positions in the transcription will not be collapsed, defaults to True

log_count : boolean

If True, token frequencies will be logarithmically transformed prior to being summed.


A dictionary whose keys are segments (or sequences of segments) and whose values are their phonotactic probability in the Corpus.