API Reference

Lexicon classes

`lexicon.Attribute`(name, att_type[, ...])	Attributes are for collecting summary information about attributes of
`lexicon.Corpus`(name)	Lexicon to store information about Words, such as transcriptions,
`lexicon.Inventory`([data])	Inventories contain information about a Corpus’ segmental inventory.
`lexicon.FeatureMatrix`(name, feature_entries)	An object that stores feature values for segments
`lexicon.Segment`(symbol)	Class for segment symbols
`lexicon.Transcription`(seg_list)	Transcription object, sequence of symbols
`lexicon.Word`(**kwargs)	An object representing a word in a corpus
`lexicon.EnvironmentFilter`(middle_segments[, ...])	Filter to use for searching words to generate Environments that match
`lexicon.Environment`(middle, position[, lhs, rhs])	Specific sequence of segments that was a match for an EnvironmentFilter

Speech corpus classes

`spontaneous.Discourse`(**kwargs)	Discourse objects are collections of linear text with word tokens
`spontaneous.Speaker`(name, **kwargs)	Speaker objects contain information about the producers of WordTokens
`spontaneous.SpontaneousSpeechCorpus`(name, ...)	SpontaneousSpeechCorpus objects are a collection of Discourse objects and Corpus objects for frequency information.
`spontaneous.WordToken`(**kwargs)	WordToken objects are individual productions of Words

Corpus context managers

`contextmanagers.BaseCorpusContext`(corpus, ...)	Abstract Corpus context class that all other contexts inherit from.
`contextmanagers.CanonicalVariantContext`(...)	Corpus context that uses canonical forms for transcriptions and tiers
`contextmanagers.MostFrequentVariantContext`(...)	Corpus context that uses the most frequent pronunciation variants
`contextmanagers.SeparatedTokensVariantContext`(...)	Corpus context that treats pronunciation variants as separate types
`contextmanagers.WeightedVariantContext`(...)	Corpus context that weights frequency of pronunciation variants by the

Corpus IO functions

Corpus binaries

`binary.download_binary`(name, path[, call_back])	Download a binary file of example corpora and feature matrices.
`binary.load_binary`(path)	Unpickle a binary file
`binary.save_binary`(obj, path)	Pickle a Corpus or FeatureMatrix object for later loading
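
The pickle-based save and load described above can be pictured with the standard library alone. This is a minimal sketch and says nothing about the `binary` module's internals: `save_obj`, `load_obj`, and `roundtrip` are hypothetical stand-ins, not library functions.

```python
import os
import pickle
import tempfile


def save_obj(obj, path):
    """Hypothetical stand-in: pickle any object to `path`."""
    with open(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_obj(path):
    """Hypothetical stand-in: unpickle the object stored at `path`."""
    with open(path, "rb") as f:
        return pickle.load(f)


def roundtrip(obj):
    """Save to a temporary file, load it back, then remove the file."""
    fd, path = tempfile.mkstemp(suffix=".corpus")
    os.close(fd)
    try:
        save_obj(obj, path)
        return load_obj(path)
    finally:
        os.remove(path)
```

Unpickling executes code stored in the file, so a loader of this kind should only be pointed at files from a trusted source.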

Loading from CSV

`csv.load_corpus_csv`(corpus_name, path, delimiter)	Load a corpus from a column-delimited text file
`csv.load_feature_matrix_csv`(name, path, ...)	Load a FeatureMatrix from a column-delimited text file

Export to CSV

`csv.export_corpus_csv`(corpus, path[, ...])	Save a corpus as a column-delimited text file
`csv.export_feature_matrix_csv`(...[, delimiter])	Save a FeatureMatrix as a column-delimited text file

TextGrids

`textgrid.inspect_discourse_textgrid`
`textgrid.load_discourse_textgrid`
`textgrid.load_directory_textgrid`

Running text

`text_spelling.inspect_discourse_spelling`(path)	Generate a list of AnnotationTypes for a specified text file for parsing
`text_spelling.load_discourse_spelling`(...[, ...])	Load a discourse from a text file containing running text of
`text_spelling.load_directory_spelling`(...[, ...])	Loads a directory of orthographic texts
`text_spelling.export_discourse_spelling`(...)	Export an orthography discourse to a text file
`text_transcription.inspect_discourse_transcription`(path)	Generate a list of AnnotationTypes for a specified text file for parsing
`text_transcription.load_discourse_transcription`(...)	Load a discourse from a text file containing running transcribed text
`text_transcription.load_directory_transcription`(...)	Loads a directory of transcribed texts.
`text_transcription.export_discourse_transcription`(...)	Export a transcribed discourse to a text file

Interlinear gloss text

`text_ilg.inspect_discourse_ilg`(path[, number])	Generate a list of AnnotationTypes for a specified text file for parsing
`text_ilg.load_discourse_ilg`(corpus_name, ...)	Load a discourse from a text file containing interlinear glosses
`text_ilg.load_directory_ilg`(corpus_name, ...)	Loads a directory of interlinear gloss text files
`text_ilg.export_discourse_ilg`(discourse, path)	Export a discourse to an interlinear gloss text file, with a maximal

Other standards

`multiple_files.inspect_discourse_multiple_files`(...)	Generate a list of AnnotationTypes for a specified dialect
`multiple_files.load_discourse_multiple_files`(...)	Load a discourse from a text file containing interlinear glosses
`multiple_files.load_directory_multiple_files`(...)	Loads a directory of corpus standard files (separated into words files

Analysis functions

Frequency of alternation

`freq_of_alt.calc_freq_of_alt`(corpus_context, ...)	Returns a double that is a measure of the frequency of

Functional load

`functional_load.minpair_fl`(corpus_context, ...)	Calculate the functional load of the contrast between two segments as a count of minimal pairs.
`functional_load.deltah_fl`(corpus_context, ...)	Calculate the functional load of the contrast between two segments as the decrease in corpus entropy caused by a merger.
`functional_load.relative_minpair_fl`(...[, ...])	Calculate the average functional load of the contrasts between a segment and all other segments, as a count of minimal pairs.
`functional_load.relative_deltah_fl`(...[, ...])	Calculate the average functional load of the contrasts between a segment and all other segments, as the decrease in corpus entropy caused by a merger.
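
The two measures named above, a minimal-pair count and an entropy decrease under merger, can be sketched on a toy word list. This is an illustration of the general idea only, assuming words are tuples of segment symbols and frequencies are plain counts; `count_minimal_pairs`, `entropy`, and `delta_h` are hypothetical helpers, and the library's own normalisation and options may differ.

```python
from collections import Counter
from itertools import combinations
from math import log2


def count_minimal_pairs(words, seg1, seg2):
    """Count word pairs that differ in exactly one position, where that
    position holds seg1 in one word and seg2 in the other."""
    count = 0
    for w1, w2 in combinations(set(words), 2):
        if len(w1) != len(w2):
            continue
        diffs = [(a, b) for a, b in zip(w1, w2) if a != b]
        if len(diffs) == 1 and set(diffs[0]) == {seg1, seg2}:
            count += 1
    return count


def entropy(freqs):
    """Shannon entropy (bits) of a mapping from word to frequency."""
    total = sum(freqs.values())
    return -sum((f / total) * log2(f / total) for f in freqs.values() if f > 0)


def delta_h(word_freqs, seg1, seg2):
    """Entropy lost when seg2 is merged into seg1.

    Words that become homophones after the merger pool their frequencies.
    """
    merged = Counter()
    for word, freq in word_freqs.items():
        merged[tuple(seg1 if s == seg2 else s for s in word)] += freq
    return entropy(word_freqs) - entropy(merged)
```

With the toy lexicon `pat`, `bat`, `kat` (frequencies 1, 1, 2), merging `b` into `p` collapses `pat` and `bat` into one type, so the entropy falls from 1.5 bits to 1.0 bit.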

Kullback-Leibler divergence

`kl.KullbackLeibler`(corpus_context, seg1, ...)	Calculates KL distances between two Phoneme objects in some context, either the left or right-hand side.
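
As a rough picture of a context-based KL comparison, the sketch below collects the segments found on one side of each target segment and compares the two resulting distributions. It is a simplified illustration under stated assumptions, not the library's procedure: `context_counts` and `kl_divergence` are hypothetical names, `#` is an assumed word-edge marker, and the add-one smoothing is a choice made here so that the logarithm is always defined.

```python
from collections import Counter
from math import log2


def context_counts(words, seg, side="lhs"):
    """Count the segments adjacent to `seg` on the chosen side.

    '#' marks a word edge (an assumption of this sketch).
    """
    counts = Counter()
    for word in words:
        padded = ["#"] + list(word) + ["#"]
        for i in range(1, len(padded) - 1):
            if padded[i] == seg:
                neighbour = padded[i - 1] if side == "lhs" else padded[i + 1]
                counts[neighbour] += 1
    return counts


def kl_divergence(p_counts, q_counts, smoothing=1.0):
    """D(P || Q) in bits, smoothing counts over the union of contexts."""
    contexts = set(p_counts) | set(q_counts)
    p_total = sum(p_counts.values()) + smoothing * len(contexts)
    q_total = sum(q_counts.values()) + smoothing * len(contexts)
    result = 0.0
    for c in contexts:
        p = (p_counts[c] + smoothing) / p_total
        q = (q_counts[c] + smoothing) / q_total
        result += p * log2(p / q)
    return result
```

KL divergence is asymmetric, so `kl_divergence(a, b)` and `kl_divergence(b, a)` generally differ; a symmetric score can be obtained by summing the two directions.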

Mutual information

`mutual_information.pointwise_mi`(...[, ...])	Calculate the mutual information for a bigram.
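
Pointwise mutual information for a bigram compares how often two segments occur together with how often they would co-occur if they were independent. The sketch below shows the textbook formula on a toy word list; `pointwise_mi_sketch` is a hypothetical helper, and details such as word-boundary handling or frequency weighting in the library are not represented.

```python
from collections import Counter
from math import log2


def pointwise_mi_sketch(words, bigram):
    """log2( p(a, b) / (p(a) * p(b)) ) from unweighted type counts."""
    unigrams = Counter()
    bigrams = Counter()
    for word in words:
        segs = list(word)
        unigrams.update(segs)
        bigrams.update(zip(segs, segs[1:]))

    a, b = bigram
    if bigrams[(a, b)] == 0:
        raise ValueError("bigram %r does not occur in the word list" % (bigram,))

    p_ab = bigrams[(a, b)] / sum(bigrams.values())
    p_a = unigrams[a] / sum(unigrams.values())
    p_b = unigrams[b] / sum(unigrams.values())
    return log2(p_ab / (p_a * p_b))
```

A positive value means the two segments occur together more often than chance would predict; a negative value means they co-occur less often.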

Neighborhood density

`neighborhood_density.neighborhood_density`(...)	Calculate the neighborhood density of a particular word in the corpus.
`neighborhood_density.find_mutation_minpairs`(...)	Find all minimal pairs of the query word based only on segment
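
One common definition of neighborhood density counts the words that are a single substitution, insertion, or deletion away from the query. The sketch below implements that definition directly; `is_neighbor` and `density` are hypothetical names, and the library may use other distance measures or thresholds.

```python
def is_neighbor(w1, w2):
    """True if w2 is exactly one substitution, insertion, or deletion from w1."""
    if w1 == w2 or abs(len(w1) - len(w2)) > 1:
        return False
    if len(w1) > len(w2):
        w1, w2 = w2, w1  # make w1 the shorter (or equal-length) word

    # Skip the shared prefix.
    i = 0
    while i < len(w1) and w1[i] == w2[i]:
        i += 1

    if len(w1) == len(w2):
        # Substitution: everything after the mismatch must agree.
        return w1[i + 1:] == w2[i + 1:]
    # Insertion/deletion: drop one segment from the longer word.
    return w1[i:] == w2[i + 1:]


def density(query, lexicon):
    """Number of distinct words in `lexicon` that are neighbors of `query`."""
    return sum(1 for word in set(lexicon) if is_neighbor(query, word))
```

Both functions work on strings or on tuples of segment symbols, since they rely only on indexing, slicing, and equality.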

Phonotactic probability

`phonotactic_probability.phonotactic_probability_vitevitch`(...)	Calculate the phonotactic probability of a particular word using

Predictability of distribution

`pred_of_dist.calc_prod_all_envs`(...[, ...])	Main function for calculating predictability of distribution for two segments over a corpus, regardless of environment.
`pred_of_dist.calc_prod`(corpus_context, envs)	Main function for calculating predictability of distribution for two segments over specified environments in a corpus.
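
An entropy-based view of predictability of distribution treats each environment as a choice between the two segments: entropy is 0 when only one segment occurs there and 1 bit when both are equally likely. The sketch below assumes that interpretation and a frequency-weighted average across environments; `prod_entropy` and `weighted_prod` are hypothetical helpers working on raw counts, not the library's functions.

```python
from math import log2


def prod_entropy(count1, count2):
    """Entropy (bits) of the choice between two segments in one environment."""
    total = count1 + count2
    if total == 0:
        return 0.0
    h = 0.0
    for c in (count1, count2):
        if c:
            p = c / total
            h -= p * log2(p)
    return h


def weighted_prod(env_counts):
    """Average the per-environment entropies, weighted by environment frequency.

    `env_counts` maps an environment label to a (count1, count2) pair.
    """
    grand_total = sum(c1 + c2 for c1, c2 in env_counts.values())
    if grand_total == 0:
        return 0.0
    return sum(
        ((c1 + c2) / grand_total) * prod_entropy(c1, c2)
        for c1, c2 in env_counts.values()
    )
```

For example, if the two segments are evenly split word-initially (5 and 5) and only one of them occurs word-finally (10 and 0), the weighted result is 0.5 bits.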

Symbol similarity

`string_similarity.string_similarity`(...)	This function computes similarity of pairs of words across a corpus.

`edit_distance.edit_distance`(word1, word2, ...)	Returns the Levenshtein edit distance between the strings of two words, word1 and word2; code drawn from http://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance#Python.

`khorsi.khorsi`(word1, word2, freq_base, ...)	Calculate the string similarity of two words given a set of

`phono_edit_distance.phono_edit_distance`(...)	Returns an analogue to Levenshtein edit distance but uses
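
Levenshtein edit distance, which underlies several of the measures above, can be computed with a standard two-row dynamic program. This is a generic sketch, not the code used by `edit_distance`; `levenshtein` is a hypothetical name, and every insertion, deletion, and substitution is assumed to cost 1.

```python
def levenshtein(seq1, seq2):
    """Minimum number of single-symbol edits turning seq1 into seq2."""
    if len(seq1) < len(seq2):
        seq1, seq2 = seq2, seq1  # keep the inner row as short as possible

    # previous[j] holds the distance between the processed prefix of seq1
    # and the first j symbols of seq2.
    previous = list(range(len(seq2) + 1))
    for i, a in enumerate(seq1, start=1):
        current = [i]
        for j, b in enumerate(seq2, start=1):
            current.append(min(
                previous[j] + 1,             # deletion
                current[j - 1] + 1,          # insertion
                previous[j - 1] + (a != b),  # substitution (free on a match)
            ))
        previous = current
    return previous[-1]
```

The function accepts strings or tuples of segment symbols. A feature-based analogue would replace the fixed substitution cost of 1 with a cost derived from how many features the two segments differ in.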