
class corpustools.contextmanagers.SeparatedTokensVariantContext(corpus, sequence_type, type_or_token, attribute=None, frequency_threshold=0, log_count=True)

Corpus context that treats pronunciation variants as separate types for transcriptions and tiers.

See the documentation of BaseCorpusContext for additional information.


__init__(corpus, sequence_type, type_or_token): Initialize self.
get_frequency_base([gramsize, halve_edges, …]): Generate (and cache) frequencies for each segment in the Corpus.
get_phone_probs([gramsize, probability, …]): Generate (and cache) phonotactic probabilities for segments in the Corpus.

get_frequency_base(gramsize=1, halve_edges=False, probability=False)

Generate (and cache) frequencies for each segment in the Corpus.

halve_edges : boolean

If True, word boundary symbols (‘#’) will only be counted once per word, rather than twice. Defaults to False.

gramsize : integer

Size of n-gram to use for getting frequency, defaults to 1 (unigram)

probability : boolean

If True, frequency counts will be normalized by total frequency, defaults to False


Returns a dictionary whose keys are segments (or sequences of segments) and whose values are their frequencies in the Corpus.
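
The counting described above can be illustrated with a minimal, self-contained sketch. This is not the corpustools implementation: the helper name `ngram_frequencies`, the `(segments, frequency)` input format, the use of tuples as keys, and the choice to apply `halve_edges` only to the unigram boundary count are all assumptions made for illustration.

```python
from collections import Counter

def ngram_frequencies(words, gramsize=1, halve_edges=False, probability=False):
    """Hypothetical helper: count n-grams over boundary-padded words.

    words: iterable of (segments, frequency) pairs (assumed input format).
    Keys of the result are tuples of segments (an assumption of this sketch).
    """
    counts = Counter()
    for segments, freq in words:
        # Pad each word with the boundary symbol on both edges.
        padded = ["#"] + list(segments) + ["#"]
        for i in range(len(padded) - gramsize + 1):
            counts[tuple(padded[i:i + gramsize])] += freq
    # Assumption: halving applies to the unigram boundary count only,
    # so each word contributes '#' once rather than twice.
    if halve_edges and ("#",) in counts:
        counts[("#",)] /= 2
    if probability:
        # Normalize every count by the total frequency.
        total = sum(counts.values())
        return {gram: c / total for gram, c in counts.items()}
    return dict(counts)

# Toy data: two words with token frequencies 2 and 1.
toy = [(["k", "a", "t"], 2), (["a", "t"], 1)]
freqs = ngram_frequencies(toy)
halved = ngram_frequencies(toy, halve_edges=True)
probs = ngram_frequencies(toy, probability=True)
bigrams = ngram_frequencies(toy, gramsize=2)
```

With the toy data, the boundary symbol is counted six times by default and three times with `halve_edges=True`, and the normalized values sum to 1 when `probability=True`.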

get_phone_probs(gramsize=1, probability=True, preserve_position=True)

Generate (and cache) phonotactic probabilities for segments in the Corpus.

gramsize : integer

Size of n-gram to use for getting frequency, defaults to 1 (unigram)

probability : boolean

If True, frequency counts will be normalized by total frequency, defaults to True

preserve_position : boolean

If True, segments in different positions in the transcription will not be collapsed, defaults to True

log_count : boolean

If True, token frequencies will be logarithmically transformed prior to being summed


Returns a dictionary whose keys are segments (or sequences of segments) and whose values are their phonotactic probabilities in the Corpus.
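
As a rough illustration of how `preserve_position` and `log_count` interact, the sketch below computes unigram positional probabilities on toy data. It is a simplified stand-in, not the library's code: the helper name `positional_probs`, the input format, the `(segment, position)` key shape, and the use of `log(1 + frequency)` as the log transform are assumptions, and n-grams larger than 1 are not handled.

```python
import math
from collections import defaultdict

def positional_probs(words, preserve_position=True, log_count=True):
    """Hypothetical helper: unigram phonotactic probabilities.

    words: iterable of (segments, frequency) pairs (assumed input format).
    """
    counts = defaultdict(float)
    totals = defaultdict(float)
    for segments, freq in words:
        # Assumption: log(1 + freq) stands in for the log transform,
        # so that a frequency of 1 still contributes a nonzero weight.
        weight = math.log1p(freq) if log_count else float(freq)
        for pos, seg in enumerate(segments):
            # Keep positions apart, or collapse them into one pool.
            key = (seg, pos) if preserve_position else seg
            slot = pos if preserve_position else None
            counts[key] += weight
            totals[slot] += weight
    probs = {}
    for key, count in counts.items():
        slot = key[1] if preserve_position else None
        # Normalize by the total weight of the same position (or pool).
        probs[key] = count / totals[slot]
    return probs

toy = [(["k", "a", "t"], 2), (["a", "t"], 1)]
by_position = positional_probs(toy, log_count=False)
collapsed = positional_probs(toy, preserve_position=False, log_count=False)
logged = positional_probs(toy)
```

With raw counts, "k" accounts for 2/3 of the weight in initial position, while collapsing positions gives "a" a probability of 3/8 across the whole toy corpus.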