API Reference

Lexicon classes

lexicon.Attribute(name, att_type[, ...]) Attributes are for collecting summary information about attributes of
lexicon.Corpus(name) Lexicon to store information about Words, such as transcriptions,
lexicon.Inventory([data]) Inventories contain information about a Corpus’ segmental inventory.
lexicon.FeatureMatrix(name, feature_entries) An object that stores feature values for segments
lexicon.Segment(symbol) Class for segment symbols
lexicon.Transcription(seg_list) Transcription object, sequence of symbols
lexicon.Word(**kwargs) An object representing a word in a corpus
lexicon.EnvironmentFilter(middle_segments[, ...]) Filter to use for searching words to generate Environments that match
lexicon.Environment(middle, position[, lhs, rhs]) Specific sequence of segments that was a match for an EnvironmentFilter

Speech corpus classes

spontaneous.Discourse(**kwargs) Discourse objects are collections of linear text with word tokens
spontaneous.Speaker(name, **kwargs) Speaker objects contain information about the producers of WordTokens
spontaneous.SpontaneousSpeechCorpus(name, ...) SpontaneousSpeechCorpus objects are a collection of Discourse objects and Corpus objects for frequency information.
spontaneous.WordToken(**kwargs) WordToken objects are individual productions of Words

Corpus context managers

contextmanagers.BaseCorpusContext(corpus, ...) Abstract Corpus context class that all other contexts inherit from.
contextmanagers.CanonicalVariantContext(...) Corpus context that uses canonical forms for transcriptions and tiers
contextmanagers.MostFrequentVariantContext(...) Corpus context that uses the most frequent pronunciation variants
contextmanagers.SeparatedTokensVariantContext(...) Corpus context that treats pronunciation variants as separate types
contextmanagers.WeightedVariantContext(...) Corpus context that weights frequency of pronunciation variants by the

Corpus IO functions

Corpus binaries

binary.download_binary(name, path[, call_back]) Download a binary file of example corpora and feature matrices.
binary.load_binary(path) Unpickle a binary file
binary.save_binary(obj, path) Pickle a Corpus or FeatureMatrix object for later loading
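
The save and load entries above are described as pickling and unpickling. The round trip can be sketched with the standard library alone. The helper names save_obj and load_obj and the toy dictionary are hypothetical stand-ins for illustration, not the package's own functions or objects.

```python
import pickle
import tempfile
from pathlib import Path


def save_obj(obj, path):
    """Pickle obj to path (hypothetical stand-in, not the package's function)."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load_obj(path):
    """Unpickle and return the object stored at path."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Round-trip a plain dictionary standing in for a corpus object.
toy_corpus = {"name": "toy", "words": ["cat", "bat"]}
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "toy.corpus"
    save_obj(toy_corpus, target)
    restored = load_obj(target)
```

As with any pickle file, only load binaries from a source you trust, since unpickling can execute arbitrary code.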

Loading from CSV

csv.load_corpus_csv(corpus_name, path, delimiter) Load a corpus from a column-delimited text file
csv.load_feature_matrix_csv(name, path, ...) Load a FeatureMatrix from a column-delimited text file
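
As a rough illustration of what "column-delimited text file" means here, the sketch below parses a small in-memory table with the standard csv module. The column names, the period used to separate segments, and the helper read_rows are all assumptions made for the example; the loaders listed above may expect a different layout.

```python
import csv
import io

# Toy column-delimited text; the column names are illustrative only.
raw = "spelling,transcription,frequency\ncat,k.ae.t,10\nbat,b.ae.t,4\n"


def read_rows(handle, delimiter=","):
    """Return one dict per row, keyed by the header line."""
    return list(csv.DictReader(handle, delimiter=delimiter))


rows = read_rows(io.StringIO(raw))
for row in rows:
    # Split the period-separated transcription into a list of segments.
    row["transcription"] = row["transcription"].split(".")
    # Frequencies arrive as text and need converting to numbers.
    row["frequency"] = float(row["frequency"])
```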

Export to CSV

csv.export_corpus_csv(corpus, path[, ...]) Save a corpus as a column-delimited text file
csv.export_feature_matrix_csv(...[, delimiter]) Save a FeatureMatrix as a column-delimited text file

TextGrids

textgrid.inspect_discourse_textgrid
textgrid.load_discourse_textgrid
textgrid.load_directory_textgrid

Running text

text_spelling.inspect_discourse_spelling(path) Generate a list of AnnotationTypes for a specified text file for parsing
text_spelling.load_discourse_spelling(...[, ...]) Load a discourse from a text file containing running text of
text_spelling.load_directory_spelling(...[, ...]) Loads a directory of orthographic texts
text_spelling.export_discourse_spelling(...) Export an orthography discourse to a text file
text_transcription.inspect_discourse_transcription(path) Generate a list of AnnotationTypes for a specified text file for parsing
text_transcription.load_discourse_transcription(...) Load a discourse from a text file containing running transcribed text
text_transcription.load_directory_transcription(...) Loads a directory of transcribed texts.
text_transcription.export_discourse_transcription(...) Export a transcribed discourse to a text file

Interlinear gloss text

text_ilg.inspect_discourse_ilg(path[, number]) Generate a list of AnnotationTypes for a specified text file for parsing
text_ilg.load_discourse_ilg(corpus_name, ...) Load a discourse from a text file containing interlinear glosses
text_ilg.load_directory_ilg(corpus_name, ...) Loads a directory of interlinear gloss text files
text_ilg.export_discourse_ilg(discourse, path) Export a discourse to an interlinear gloss text file, with a maximal

Other standards

multiple_files.inspect_discourse_multiple_files(...) Generate a list of AnnotationTypes for a specified dialect
multiple_files.load_discourse_multiple_files(...) Load a discourse from a text file containing interlinear glosses
multiple_files.load_directory_multiple_files(...) Loads a directory of corpus standard files (separated into words files

Analysis functions

Frequency of alternation

freq_of_alt.calc_freq_of_alt(corpus_context, ...) Returns a double that is a measure of the frequency of

Functional load

functional_load.minpair_fl(corpus_context, ...) Calculate the functional load of the contrast between two segments as a count of minimal pairs.
functional_load.deltah_fl(corpus_context, ...) Calculate the functional load of the contrast between two segments as the decrease in corpus entropy caused by a merger.
functional_load.relative_minpair_fl(...[, ...]) Calculate the average functional load of the contrasts between a segment and all other segments, as a count of minimal pairs.
functional_load.relative_deltah_fl(...[, ...]) Calculate the average functional load of the contrasts between a segment and all other segments, as the decrease in corpus entropy caused by a merger.
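
The two definitions of functional load named above (a count of minimal pairs, and the decrease in entropy caused by a merger) can be sketched on a toy lexicon. This is a minimal standalone illustration, not the package's implementation: words are plain tuples of segments, frequencies are a plain dictionary, and every function name is hypothetical.

```python
import math
from itertools import combinations


def is_minimal_pair(w1, w2, seg1, seg2):
    """True if w1 and w2 differ in exactly one position, by seg1 vs seg2."""
    if len(w1) != len(w2):
        return False
    diffs = [(a, b) for a, b in zip(w1, w2) if a != b]
    return len(diffs) == 1 and set(diffs[0]) == {seg1, seg2}


def count_minimal_pairs(words, seg1, seg2):
    """Count unordered word pairs distinguished only by seg1 vs seg2."""
    return sum(is_minimal_pair(a, b, seg1, seg2)
               for a, b in combinations(words, 2))


def entropy(freqs):
    """Shannon entropy (bits) of the word distribution given by freqs."""
    total = sum(freqs.values())
    return -sum((f / total) * math.log2(f / total)
                for f in freqs.values() if f > 0)


def delta_h(freqs, seg1, seg2):
    """Entropy lost when seg2 is merged into seg1 across the lexicon."""
    merged = {}
    for word, f in freqs.items():
        key = tuple(seg1 if s == seg2 else s for s in word)
        merged[key] = merged.get(key, 0) + f
    return entropy(freqs) - entropy(merged)


# Four equally frequent toy words; only ("p","a") vs ("b","a") contrast p/b.
freqs = {("p", "a"): 1, ("b", "a"): 1, ("t", "a"): 1, ("p", "i"): 1}
pairs = count_minimal_pairs(list(freqs), "p", "b")
loss = delta_h(freqs, "p", "b")
```

In this toy lexicon the merger collapses one pair of words into a single homophone, so both measures register the contrast: one minimal pair, and a drop in entropy from 2 bits to 1.5 bits.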

Kullback-Leibler divergence

kl.KullbackLeibler(corpus_context, seg1, ...) Calculates KL distances between two Phoneme objects in some context, either the left-hand or the right-hand side.
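
For orientation, the general Kullback-Leibler divergence between two discrete distributions can be sketched as follows. The two dictionaries are assumed to be probability distributions over neighbouring segments for two phonemes; that framing, the numbers, and the function name are illustrative assumptions rather than the package's own procedure.

```python
import math


def kl_divergence(p, q):
    """D(P || Q) in bits; assumes q[k] > 0 wherever p[k] > 0."""
    return sum(pk * math.log2(pk / q[k]) for k, pk in p.items() if pk > 0)


# Hypothetical distributions of following segments for two phonemes.
p = {"a": 0.5, "i": 0.5}
q = {"a": 0.25, "i": 0.75}

d_pq = kl_divergence(p, q)
d_qp = kl_divergence(q, p)
```

The divergence is not symmetric, so d_pq and d_qp generally differ; it is zero only when the two distributions are identical.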

Mutual information

mutual_information.pointwise_mi(...[, ...]) Calculate the mutual information for a bigram.
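
Pointwise mutual information for a bigram compares how often two segments occur together with how often they would be expected to if they were independent. The sketch below counts over a toy list of words written as tuples of segments. The function name and the counting choices (type counts, no word-boundary symbols) are assumptions made for illustration.

```python
import math
from collections import Counter


def bigram_pmi(words, first, second):
    """log2( p(first, second) / (p(first) * p(second)) ) over the word list.

    Assumes the bigram occurs at least once; otherwise log2(0) is undefined.
    """
    unigrams = Counter()
    bigrams = Counter()
    for w in words:
        unigrams.update(w)
        bigrams.update(zip(w, w[1:]))
    p_xy = bigrams[(first, second)] / sum(bigrams.values())
    p_x = unigrams[first] / sum(unigrams.values())
    p_y = unigrams[second] / sum(unigrams.values())
    return math.log2(p_xy / (p_x * p_y))


words = [("k", "a", "t"), ("b", "a", "t"), ("t", "a", "k")]
pmi_at = bigram_pmi(words, "a", "t")
```

A positive value means the two segments co-occur more often than chance would predict; a negative value means less often.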

Neighborhood density

neighborhood_density.neighborhood_density(...) Calculate the neighborhood density of a particular word in the corpus.
neighborhood_density.find_mutation_minpairs(...) Find all minimal pairs of the query word based only on segment
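
Neighborhood density is commonly defined as the number of words that differ from a query word by a single substitution, insertion, or deletion. The sketch below assumes that definition and a toy lexicon of segment tuples; the function names are hypothetical, and the listed functions may support other distance measures or thresholds.

```python
def within_one_edit(a, b):
    """True if a and b differ by exactly one substitution, insertion or deletion."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make a the shorter (or equal-length) sequence
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1  # skip the shared prefix
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]  # one substitution at position i
    return a[i:] == b[i + 1:]  # b has one extra segment at position i


def neighbors(query, lexicon):
    """All lexicon entries exactly one edit away from query."""
    return [w for w in lexicon if within_one_edit(query, w)]


lexicon = [tuple(w) for w in ["kat", "bat", "kab", "at", "kats", "dog"]]
found = neighbors(tuple("kat"), lexicon)
density = len(found)
```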

Phonotactic probability

phonotactic_probability.phonotactic_probability_vitevitch(...) Calculate the phonotactic probability of a particular word using

Predictability of distribution

pred_of_dist.calc_prod_all_envs(...[, ...]) Main function for calculating predictability of distribution for two segments over a corpus, regardless of environment.
pred_of_dist.calc_prod(corpus_context, envs) Main function for calculating predictability of distribution for two segments over specified environments in a corpus.
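
One common way to quantify predictability of distribution is the Shannon entropy of the two segments' frequencies within each environment, averaged across environments in proportion to how often each environment occurs. The sketch below assumes that entropy-based definition; the counts, environment labels, and function names are hypothetical and are not taken from the functions listed above.

```python
import math


def env_entropy(count1, count2):
    """Entropy (bits) of two segments' counts in a single environment."""
    total = count1 + count2
    if total == 0:
        return 0.0
    h = 0.0
    for c in (count1, count2):
        if c:
            p = c / total
            h -= p * math.log2(p)
    return h


def weighted_entropy(env_counts):
    """Average of per-environment entropies, weighted by environment frequency.

    Assumes at least one environment has a non-zero count.
    """
    grand = sum(c1 + c2 for c1, c2 in env_counts.values())
    return sum((c1 + c2) / grand * env_entropy(c1, c2)
               for c1, c2 in env_counts.values())


# Hypothetical counts of (segment 1, segment 2) in two environments.
env_counts = {"_a": (10, 0), "_i": (5, 5)}
per_env = {env: env_entropy(*c) for env, c in env_counts.items()}
overall = weighted_entropy(env_counts)
```

An entropy of 0 means the choice between the two segments is fully predictable in that environment; an entropy of 1 means the two are equally likely there.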

Symbol similarity

string_similarity.string_similarity(...) This function computes similarity of pairs of words across a corpus.
edit_distance.edit_distance(word1, word2, ...) Returns the Levenshtein edit distance between the strings of two words, word1 and word2. Code drawn from http://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance#Python.
khorsi.khorsi(word1, word2, freq_base, ...) Calculate the string similarity of two words given a set of
phono_edit_distance.phono_edit_distance(...) Returns an analogue to Levenshtein edit distance but uses
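
For reference, the Levenshtein edit distance mentioned above can be computed with a standard row-by-row dynamic programme. This is a generic sketch that works on any two sequences (strings or tuples of segments); the function name is hypothetical, and this is not the package's own code.

```python
def levenshtein(s1, s2):
    """Minimum number of insertions, deletions and substitutions turning s1 into s2."""
    if len(s1) < len(s2):
        s1, s2 = s2, s1  # keep the shorter sequence on the inner loop
    previous = list(range(len(s2) + 1))  # distances from the empty prefix of s1
    for i, c1 in enumerate(s1, start=1):
        current = [i]  # distance from s1[:i] to the empty prefix of s2
        for j, c2 in enumerate(s2, start=1):
            insert = previous[j] + 1
            delete = current[j - 1] + 1
            substitute = previous[j - 1] + (c1 != c2)
            current.append(min(insert, delete, substitute))
        previous = current
    return previous[-1]


d_words = levenshtein("kitten", "sitting")
d_segments = levenshtein(("k", "ae", "t"), ("b", "ae", "t"))
```

Because the function only compares elements for equality, multi-character segment symbols such as "ae" count as single units when words are passed as tuples.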