API Reference

Lexicon classes

lexicon.Attribute(name, att_type[, …]) Attribute objects collect summary information about attributes of Words or WordTokens, with different attribute types allowing for different behaviour
lexicon.Corpus(name[, update]) Lexicon to store information about Words, such as transcriptions, spellings and frequencies
lexicon.Inventory([update]) Inventories contain information about a Corpus’s segmental inventory
lexicon.FeatureMatrix(name, feature_entries) An object that stores feature values for segments
lexicon.Segment(symbol[, features]) Class for segment symbols
lexicon.Transcription(seg_list) Transcription object, sequence of symbols
lexicon.Word(**kwargs) An object representing a word in a corpus
lexicon.EnvironmentFilter(middle_segments[, …]) Filter to use for searching words to generate Environments that match
lexicon.Environment(middle, position[, lhs, rhs]) Specific sequence of segments that was a match for an EnvironmentFilter
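
The Segment/FeatureMatrix relationship above can be pictured with plain Python data structures. The sketch below is illustrative only, not the PCT API: the feature names and values are invented for the example.

```python
# Illustrative stand-in for a FeatureMatrix: a mapping from segment
# symbols to their feature values (all names/values invented here).
feature_matrix = {
    "p": {"voice": "-", "cont": "-", "lab": "+"},
    "b": {"voice": "+", "cont": "-", "lab": "+"},
    "s": {"voice": "-", "cont": "+", "lab": "-"},
}

def matches(symbol, **features):
    """Check whether a segment symbol has the given feature values."""
    return all(feature_matrix[symbol].get(k) == v for k, v in features.items())

print(matches("b", voice="+"))                           # True
print([s for s in feature_matrix if matches(s, voice="-")])  # ['p', 's']
```

Looking up natural classes by feature value, as in the last line, is the kind of query a FeatureMatrix supports over a real inventory.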

Speech corpus classes

spontaneous.Discourse(kwargs) Discourse objects are collections of linear text with word tokens
spontaneous.Speaker(name, **kwargs) Speaker objects contain information about the producers of WordTokens or Discourses
spontaneous.SpontaneousSpeechCorpus(name, …) SpontaneousSpeechCorpus objects are a collection of Discourse objects, plus Corpus objects for frequency information.
spontaneous.WordToken([update]) WordToken objects are individual productions of Words

Corpus context managers

contextmanagers.BaseCorpusContext(corpus, …) Abstract Corpus context class that all other contexts inherit from.
contextmanagers.CanonicalVariantContext(…) Corpus context that uses canonical forms for transcriptions and tiers
contextmanagers.MostFrequentVariantContext(…) Corpus context that uses the most frequent pronunciation variants for transcriptions and tiers
contextmanagers.SeparatedTokensVariantContext(…) Corpus context that treats pronunciation variants as separate types for transcriptions and tiers
contextmanagers.WeightedVariantContext(…) Corpus context that weights frequency of pronunciation variants by the number of variants or the token frequency for transcriptions and tiers
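
All of these contexts are entered with a `with` statement, which yields a corpus view whose transcriptions follow the chosen variant policy. The toy class below mimics only the enter/exit shape of that pattern, not any PCT behaviour; the attribute name `corpus` is an assumption for illustration.

```python
from contextlib import AbstractContextManager

# Illustrative only: the usage pattern the variant contexts support.
# AbstractContextManager supplies a default __enter__ that returns self.
class ToyContext(AbstractContextManager):
    def __init__(self, corpus):
        self.corpus = corpus

    def __exit__(self, *exc):
        return False  # do not suppress exceptions raised in the with-block

with ToyContext({"words": ["cat"]}) as c:
    print(c.corpus["words"])  # ['cat']
```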

Corpus IO functions

Corpus binaries

binary.download_binary(name, path[, call_back]) Download a binary file of example corpora and feature matrices.
binary.load_binary(path) Unpickle a binary file
binary.save_binary(obj, path) Pickle a Corpus or FeatureMatrix object for later loading
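
Under the hood, save_binary and load_binary are a pickle round-trip. The sketch below shows that round-trip with a plain dict standing in for a real Corpus or FeatureMatrix object.

```python
import os
import pickle
import tempfile

# A stand-in for a Corpus object; any picklable object works the same way.
corpus = {"name": "toy", "words": ["cat", "dog"]}

path = os.path.join(tempfile.gettempdir(), "toy.corpus")
with open(path, "wb") as f:   # what save_binary does: pickle to disk
    pickle.dump(corpus, f)
with open(path, "rb") as f:   # what load_binary does: unpickle from disk
    restored = pickle.load(f)

print(restored == corpus)     # True
os.remove(path)
```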

Loading from CSV

csv.load_corpus_csv(corpus_name, path, delimiter) Load a corpus from a column-delimited text file
csv.load_feature_matrix_csv(name, path, …) Load a FeatureMatrix from a column-delimited text file
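
The kind of parsing load_corpus_csv performs looks roughly like the standalone sketch below: one word per row, with columns for spelling, transcription, and frequency. The column names, the tab delimiter, and the `.`-separated transcription format are assumptions for illustration, not PCT's required format.

```python
import csv
import io

# Toy column-delimited corpus data (tab-separated, dot-joined transcriptions).
data = "spelling\ttranscription\tfrequency\ncat\tk.ae.t\t12\ndog\td.aa.g\t7\n"

words = []
for row in csv.DictReader(io.StringIO(data), delimiter="\t"):
    words.append({
        "spelling": row["spelling"],
        "transcription": row["transcription"].split("."),  # into segments
        "frequency": int(row["frequency"]),
    })

print(words[0])
```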

Export to CSV

csv.export_corpus_csv(corpus, path[, …]) Save a corpus as a column-delimited text file
csv.export_feature_matrix_csv(…[, delimiter]) Save a FeatureMatrix as a column-delimited text file

Running text

text_spelling.inspect_discourse_spelling(path) Generate a list of AnnotationTypes for a specified text file for parsing it as an orthographic text
text_spelling.load_discourse_spelling(…[, …]) Load a discourse from a text file containing running text of orthography
text_spelling.load_directory_spelling(…[, …]) Loads a directory of orthographic texts
text_spelling.export_discourse_spelling(…) Export an orthography discourse to a text file
text_transcription.inspect_discourse_transcription(path) Generate a list of AnnotationTypes for a specified text file for parsing it as a transcribed text
text_transcription.load_discourse_transcription(…) Load a discourse from a text file containing running transcribed text
text_transcription.load_directory_transcription(…) Loads a directory of transcribed texts.
text_transcription.export_discourse_transcription(…) Export a transcribed discourse to a text file

Interlinear gloss text

text_ilg.inspect_discourse_ilg(path[, number]) Generate a list of AnnotationTypes for a specified text file for parsing it as an interlinear gloss text file
text_ilg.load_discourse_ilg(corpus_name, …) Load a discourse from a text file containing interlinear glosses
text_ilg.load_directory_ilg(corpus_name, …) Loads a directory of interlinear gloss text files
text_ilg.export_discourse_ilg(discourse, path) Export a discourse to an interlinear gloss text file, with a maximal line size of 10 words

Other standards

multiple_files.inspect_discourse_multiple_files(…) Generate a list of AnnotationTypes for a specified dialect
multiple_files.load_discourse_multiple_files(…) Load a discourse from a text file containing interlinear glosses
multiple_files.load_directory_multiple_files(…) Loads a directory of corpus standard files (separated into words files and phones files)

Analysis functions

Frequency of alternation

Frequency of alternation is currently not supported in PCT.

freq_of_alt.calc_freq_of_alt(corpus_context, …) Returns a double that is a measure of the frequency of alternation of two sounds in a given corpus

Functional load


Kullback-Leibler divergence

kl.KullbackLeibler(corpus_context, seg1, …) Calculates KL distances between two Phoneme objects in some context, either the left or right-hand side.
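
The comparison KullbackLeibler performs can be sketched in standalone form: compare the distributions of two segments over their contexts on one side. The toy counts and the add-one smoothing below are assumptions for illustration; PCT's exact smoothing may differ.

```python
from math import log2

def kl_divergence(p_counts, q_counts):
    """KL divergence (in bits) between two smoothed context distributions."""
    contexts = set(p_counts) | set(q_counts)
    # add-one smoothing so unseen contexts do not zero out the ratio
    p_tot = sum(p_counts.get(c, 0) + 1 for c in contexts)
    q_tot = sum(q_counts.get(c, 0) + 1 for c in contexts)
    total = 0.0
    for c in contexts:
        p = (p_counts.get(c, 0) + 1) / p_tot
        q = (q_counts.get(c, 0) + 1) / q_tot
        total += p * log2(p / q)
    return total

seg1 = {"a": 10, "i": 2}   # contexts in which segment 1 occurs
seg2 = {"a": 3, "u": 9}    # contexts in which segment 2 occurs
print(round(kl_divergence(seg1, seg2), 3))
```

Identical distributions give a divergence of 0; the more the two segments favour different contexts, the larger the value.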

Mutual information

mutual_information.pointwise_mi(…[, …]) Calculate the mutual information for a bigram.
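
The quantity being calculated is pMI(xy) = log2(p(xy) / (p(x) p(y))), positive when the bigram occurs more often than chance. The sketch below takes probabilities directly; the values are toy numbers, and the real function estimates them from corpus counts.

```python
from math import log2

def pointwise_mi(p_xy, p_x, p_y):
    """Pointwise mutual information (in bits) of a bigram xy."""
    return log2(p_xy / (p_x * p_y))

# Segments x and y each occur with probability 0.1; the bigram xy occurs
# with probability 0.04, i.e. four times more often than chance expects.
print(pointwise_mi(0.04, 0.1, 0.1))   # about 2 bits
```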

Transitional probability

Neighborhood density

neighborhood_density.neighborhood_density(…) Calculate the neighborhood density of a particular word in the corpus.
neighborhood_density.find_mutation_minpairs(…) Find all minimal pairs of the query word based only on segment mutations (not deletions/insertions)
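
Neighborhood density can be sketched in standalone form: count the words in the lexicon whose edit distance (substitutions, insertions, deletions) from the query is exactly 1. PCT computes the distance over transcriptions; the orthographic toy lexicon below is just for illustration.

```python
def edit_distance(a, b):
    """Standard Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def neighborhood_density(word, lexicon):
    """Number of lexicon words exactly one edit away from `word`."""
    return sum(1 for w in lexicon if edit_distance(word, w) == 1)

lexicon = ["bat", "cot", "cast", "at", "dog"]
print(neighborhood_density("cat", lexicon))   # 4: bat, cot, cast, at
```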

Phonotactic probability

phonotactic_probability.phonotactic_probability_vitevitch(…) Calculate the phonotactic_probability of a particular word using the Vitevitch & Luce algorithm
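
The positional-probability idea behind the Vitevitch & Luce measure can be sketched as: for each position in the word, how often does the word's segment occur in that position across the lexicon, averaged over positions? The real computation is frequency-weighted and also uses biphone probabilities; the toy lexicon below is an illustration only.

```python
def positional_probability(word, lexicon):
    """Average over positions of how often word[i] appears at position i."""
    probs = []
    for i, seg in enumerate(word):
        pool = [w for w in lexicon if len(w) > i]   # words long enough
        probs.append(sum(1 for w in pool if w[i] == seg) / len(pool))
    return sum(probs) / len(probs)

lexicon = [("k", "ae", "t"), ("b", "ae", "t"), ("k", "aa", "r")]
print(round(positional_probability(("k", "ae", "t"), lexicon), 3))
```

Words built from segments that are common in their positions score high; words with rare positional segments score low.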

Predictability of distribution

pred_of_dist.calc_prod_all_envs(…[, …]) Main function for calculating predictability of distribution for two segments over a corpus, regardless of environment.
pred_of_dist.calc_prod(corpus_context, envs) Main function for calculating predictability of distribution for two segments over specified environments in a corpus.
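
At the core of predictability of distribution is an entropy calculation over the two segments' counts in an environment: H = 0 means the choice of segment is fully predictable (only one occurs), and H = 1 bit means it is perfectly unpredictable (both equally frequent). The sketch below takes raw counts; the toy numbers are illustrative only.

```python
from math import log2

def predictability(count1, count2):
    """Entropy (in bits) of the choice between two segments in an environment."""
    total = count1 + count2
    h = 0.0
    for c in (count1, count2):
        if c:                        # 0 * log2(0) is taken as 0
            p = c / total
            h -= p * log2(p)
    return h

print(predictability(10, 0))   # 0.0: fully predictable
print(predictability(5, 5))    # 1.0: fully unpredictable
```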

Symbol similarity

string_similarity.string_similarity(…) This function computes similarity of pairs of words across a corpus.
edit_distance.edit_distance(word1, word2, …) Returns the Levenshtein edit distance between the strings of two words, word1 and word2; code drawn from http://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance#Python
khorsi.khorsi(word1, word2, freq_base, …) Calculate the string similarity of two words given a set of characters and their frequencies in a corpus based on Khorsi (2012)
phono_edit_distance.phono_edit_distance(…) Returns an analogue to Levenshtein edit distance that uses phonological features instead of characters
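
The feature-based distance can be sketched as a weighted Levenshtein: substituting two segments costs the number of feature values on which they differ. The tiny feature table and the insertion/deletion cost of 2 are assumptions for illustration, not a real PCT FeatureMatrix or its weights.

```python
# Toy feature table: each segment maps to a tuple of feature values.
FEATURES = {
    "p": ("-voice", "-cont"),
    "b": ("+voice", "-cont"),
    "s": ("-voice", "+cont"),
}

def sub_cost(a, b):
    """Substitution cost: number of differing feature values."""
    return sum(x != y for x, y in zip(FEATURES[a], FEATURES[b]))

def phono_edit_distance(w1, w2, indel=2):
    """Levenshtein distance over segments with feature-weighted substitution."""
    prev = [j * indel for j in range(len(w2) + 1)]
    for i, a in enumerate(w1, 1):
        cur = [i * indel]
        for j, b in enumerate(w2, 1):
            cur.append(min(prev[j] + indel,
                           cur[j - 1] + indel,
                           prev[j - 1] + sub_cost(a, b)))
        prev = cur
    return prev[-1]

print(phono_edit_distance(("p",), ("b",)))   # 1: differ only in voicing
print(phono_edit_distance(("b",), ("s",)))   # 2: differ in voice and cont
```

Segments that share most features are cheap to substitute, so phonologically similar words end up closer than plain character-based edit distance would make them.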