API Reference¶

Lexicon classes¶

`lexicon.Attribute`(name, att_type[, ...])	Attributes are for collecting summary information about attributes of Words or WordTokens, with different types of attributes allowing for different behaviour
`lexicon.Corpus`(name[, update])	Lexicon to store information about Words, such as transcriptions, spellings and frequencies
`lexicon.Inventory`([update])	Inventories contain information about a Corpus' segmental inventory. This class exists mainly for the purposes
`lexicon.FeatureMatrix`(name, feature_entries)	An object that stores feature values for segments
`lexicon.Segment`(symbol[, features])	Class for segment symbols
`lexicon.Transcription`(seg_list)	Transcription object, sequence of symbols
`lexicon.Word`(**kwargs)	An object representing a word in a corpus
`lexicon.EnvironmentFilter`(middle_segments[, ...])	Filter to use for searching words to generate Environments that match
`lexicon.Environment`(middle, position[, lhs, rhs])	Specific sequence of segments that was a match for an EnvironmentFilter

Speech corpus classes¶

`spontaneous.Discourse`(kwargs)	Discourse objects are collections of linear text with word tokens
`spontaneous.Speaker`(name, **kwargs)	Speaker objects contain information about the producers of WordTokens or Discourses
`spontaneous.SpontaneousSpeechCorpus`(name, ...)	SpontaneousSpeechCorpus objects are a collection of Discourse objects and Corpus objects for frequency information.
`spontaneous.WordToken`([update])	WordToken objects are individual productions of Words

Corpus context managers¶

`contextmanagers.BaseCorpusContext`(corpus, ...)	Abstract Corpus context class that all other contexts inherit from.
`contextmanagers.CanonicalVariantContext`(...)	Corpus context that uses canonical forms for transcriptions and tiers
`contextmanagers.MostFrequentVariantContext`(...)	Corpus context that uses the most frequent pronunciation variants for transcriptions and tiers
`contextmanagers.SeparatedTokensVariantContext`(...)	Corpus context that treats pronunciation variants as separate types for transcriptions and tiers
`contextmanagers.WeightedVariantContext`(...)	Corpus context that weights frequency of pronunciation variants by the number of variants or the token frequency for transcriptions and tiers

Corpus IO functions¶

Corpus binaries¶

`binary.download_binary`(name, path[, call_back])	Download a binary file of example corpora and feature matrices.
`binary.load_binary`(path)	Unpickle a binary file
`binary.save_binary`(obj, path)	Pickle a Corpus or FeatureMatrix object for later loading
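
The save and load entries above are described as pickling and unpickling. As a rough illustration of that round trip, the sketch below uses only the standard-library `pickle` module. The helper names `save_object` and `load_object` and the dictionary standing in for a Corpus are hypothetical; this is not the code behind `binary.save_binary` or `binary.load_binary`.

```python
import os
import pickle
import tempfile


def save_object(obj, path):
    """Pickle obj to path (hypothetical stand-in for a save helper)."""
    with open(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_object(path):
    """Unpickle and return the object stored at path."""
    with open(path, "rb") as f:
        return pickle.load(f)


# A plain dict stands in for a Corpus or FeatureMatrix object (assumption).
toy_corpus = {"name": "toy", "words": {"cat": 3, "dog": 5}}
tmp_path = os.path.join(tempfile.mkdtemp(), "toy.corpus")
save_object(toy_corpus, tmp_path)
restored = load_object(tmp_path)
```

Unpickling can execute arbitrary code, so binary files should only be loaded from sources you trust.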

Loading from CSV¶

`csv.load_corpus_csv`(corpus_name, path, delimiter)	Load a corpus from a column-delimited text file
`csv.load_feature_matrix_csv`(name, path, ...)	Load a FeatureMatrix from a column-delimited text file

Export to CSV¶

`csv.export_corpus_csv`(corpus, path[, ...])	Save a corpus as a column-delimited text file
`csv.export_feature_matrix_csv`(...[, delimiter])	Save a FeatureMatrix as a column-delimited text file

TextGrids¶

`pct_textgrid.inspect_discourse_textgrid`(path)	Generate a list of AnnotationTypes for a specified TextGrid file
`pct_textgrid.load_discourse_textgrid`(...[, ...])	Load a discourse from a TextGrid file
`pct_textgrid.load_directory_textgrid`(...[, ...])	Loads a directory of TextGrid files

Running text¶

`text_spelling.inspect_discourse_spelling`(path)	Generate a list of AnnotationTypes for a specified text file for parsing it as an orthographic text
`text_spelling.load_discourse_spelling`(...[, ...])	Load a discourse from a text file containing running text of orthography
`text_spelling.load_directory_spelling`(...[, ...])	Loads a directory of orthographic texts
`text_spelling.export_discourse_spelling`(...)	Export an orthography discourse to a text file
`text_transcription.inspect_discourse_transcription`(path)	Generate a list of AnnotationTypes for a specified text file for parsing it as a transcribed text
`text_transcription.load_discourse_transcription`(...)	Load a discourse from a text file containing running transcribed text
`text_transcription.load_directory_transcription`(...)	Loads a directory of transcribed texts.
`text_transcription.export_discourse_transcription`(...)	Export a transcribed discourse to a text file

Interlinear gloss text¶

`text_ilg.inspect_discourse_ilg`(path[, number])	Generate a list of AnnotationTypes for a specified text file for parsing it as an interlinear gloss text file
`text_ilg.load_discourse_ilg`(corpus_name, ...)	Load a discourse from a text file containing interlinear glosses
`text_ilg.load_directory_ilg`(corpus_name, ...)	Loads a directory of interlinear gloss text files
`text_ilg.export_discourse_ilg`(discourse, path)	Export a discourse to an interlinear gloss text file, with a maximal line size of 10 words

Other standards¶

`multiple_files.inspect_discourse_multiple_files`(...)	Generate a list of AnnotationTypes for a specified dialect
`multiple_files.load_discourse_multiple_files`(...)	Load a discourse from corpus standard files (separated into words files and phones files)
`multiple_files.load_directory_multiple_files`(...)	Loads a directory of corpus standard files (separated into words files and phones files)

Analysis functions¶

Frequency of alternation¶

Frequency of alternation is currently not supported in PCT.

`freq_of_alt.calc_freq_of_alt`(corpus_context, ...)	Returns a double that is a measure of the frequency of alternation of two sounds in a given corpus

Functional load¶

Kullback-Leibler divergence¶

`kl.KullbackLeibler`(corpus_context, seg1, ...)	Calculates KL distances between two Phoneme objects in some context, either the left or right-hand side.
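
To make the idea of comparing two segments by their one-sided contexts concrete, here is a minimal sketch of the textbook Kullback-Leibler divergence over neighboring-segment counts. Everything in it is an assumption for illustration: the function names, the `#` word-boundary symbol, the add-one smoothing, and the use of plain strings as words. It does not reproduce `kl.KullbackLeibler`, which may differ in smoothing, symmetry and input types.

```python
import math
from collections import Counter


def context_distribution(words, segment, side="lhs"):
    """Count the segments adjacent to `segment` on the chosen side."""
    counts = Counter()
    for word in words:
        for i, seg in enumerate(word):
            if seg != segment:
                continue
            j = i - 1 if side == "lhs" else i + 1
            # '#' marks a word boundary (illustrative convention).
            neighbor = word[j] if 0 <= j < len(word) else "#"
            counts[neighbor] += 1
    return counts


def kl_divergence(p_counts, q_counts, smoothing=1.0):
    """D(P || Q) in bits, with additive smoothing over the union of contexts."""
    support = set(p_counts) | set(q_counts)
    p_total = sum(p_counts.values()) + smoothing * len(support)
    q_total = sum(q_counts.values()) + smoothing * len(support)
    total = 0.0
    for ctx in support:
        p = (p_counts.get(ctx, 0) + smoothing) / p_total
        q = (q_counts.get(ctx, 0) + smoothing) / q_total
        total += p * math.log2(p / q)
    return total


# Right-hand contexts of "t" and "d" in a toy word list.
t_ctx = context_distribution(["ta", "ti"], "t", side="rhs")
d_ctx = context_distribution(["da", "di"], "d", side="rhs")
same = kl_divergence(t_ctx, d_ctx)
```

Segments that occur in identical contexts give a divergence of zero; the value grows as their context distributions diverge.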

Mutual information¶

`mutual_information.pointwise_mi`(...[, ...])	Calculate the mutual information for a bigram.
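
For readers unfamiliar with the measure, the following sketch computes the standard pointwise mutual information of a bigram, log2(p(x, y) / (p(x) p(y))), from raw counts over a word list. The function name `toy_pointwise_mi` and its arguments are hypothetical, and the counting choices (token counts, no word-boundary symbols) are assumptions; `mutual_information.pointwise_mi` may count differently.

```python
import math
from collections import Counter


def toy_pointwise_mi(words, bigram):
    """Pointwise mutual information (in bits) of `bigram` over `words`."""
    unigrams = Counter()
    bigrams = Counter()
    for word in words:
        unigrams.update(word)
        # Pair each segment with its successor.
        bigrams.update(zip(word, word[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    first, second = bigram
    if n_bi == 0 or bigrams[(first, second)] == 0:
        raise ValueError("bigram does not occur in the word list")
    p_xy = bigrams[(first, second)] / n_bi
    p_x = unigrams[first] / n_uni
    p_y = unigrams[second] / n_uni
    return math.log2(p_xy / (p_x * p_y))


pmi_ab = toy_pointwise_mi(["ab", "ab"], ("a", "b"))
```

Positive values mean the two segments co-occur more often than their individual frequencies would predict; negative values mean they co-occur less often.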

Transitional probability¶

Neighborhood density¶

`neighborhood_density.neighborhood_density`(...)	Calculate the neighborhood density of a particular word in the corpus.
`neighborhood_density.find_mutation_minpairs`(...)	Find all minimal pairs of the query word based only on segment mutations (not deletions/insertions)
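
The difference between the two entries above (all one-edit neighbors versus substitution-only minimal pairs) can be sketched in a few lines. The helper names and the use of plain strings as words are assumptions for illustration only, and the sketch ignores options the real functions may offer, such as other distance thresholds or similarity algorithms.

```python
def differs_by_one_substitution(a, b):
    """True if a and b have equal length and differ in exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def differs_by_one_indel(a, b):
    """True if deleting one segment from the longer form yields the shorter."""
    if abs(len(a) - len(b)) != 1:
        return False
    shorter, longer = sorted((a, b), key=len)
    return any(longer[:i] + longer[i + 1:] == shorter
               for i in range(len(longer)))


def find_neighbors(query, lexicon, substitutions_only=False):
    """Return the words in lexicon that are one edit away from query."""
    found = []
    for word in lexicon:
        if differs_by_one_substitution(query, word):
            found.append(word)
        elif not substitutions_only and differs_by_one_indel(query, word):
            found.append(word)
    return found


lexicon = ["cat", "bat", "cut", "at", "cart", "dog"]
all_neighbors = find_neighbors("cat", lexicon)
mutation_pairs = find_neighbors("cat", lexicon, substitutions_only=True)
density = len(all_neighbors)
```

Here the neighborhood density of "cat" is the size of `all_neighbors`, while `mutation_pairs` keeps only the neighbors reached by changing one segment.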

Phonotactic probability¶

`phonotactic_probability.phonotactic_probability_vitevitch`(...)	Calculate the phonotactic probability of a particular word using the Vitevitch & Luce algorithm

Predictability of distribution¶

`pred_of_dist.calc_prod_all_envs`(...[, ...])	Main function for calculating predictability of distribution for two segments over a corpus, regardless of environment.
`pred_of_dist.calc_prod`(corpus_context, envs)	Main function for calculating predictability of distribution for two segments over specified environments in a corpus.
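
As a rough guide to what these functions measure, the sketch below assumes the usual entropy-based formulation of predictability of distribution: within an environment, the entropy (in bits) of the two segments' relative frequencies, with environments combined by a frequency-weighted average. The function names and the dictionary input format are hypothetical, and the real functions may differ in counting (type or token frequency) and in what they return.

```python
import math


def pair_entropy(count1, count2):
    """Entropy in bits of two segments' relative frequencies."""
    total = count1 + count2
    if total == 0:
        return 0.0
    h = 0.0
    for count in (count1, count2):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def weighted_entropy(env_counts):
    """Frequency-weighted average entropy over environments.

    env_counts maps an environment label to (count_seg1, count_seg2).
    """
    grand_total = sum(c1 + c2 for c1, c2 in env_counts.values())
    if grand_total == 0:
        return 0.0
    return sum(((c1 + c2) / grand_total) * pair_entropy(c1, c2)
               for c1, c2 in env_counts.values())


# Toy counts: the segments contrast before "a" but only one occurs before "i".
toy_envs = {"_a": (5, 5), "_i": (10, 0)}
overall = weighted_entropy(toy_envs)
```

Under this formulation, an entropy of 0 means the choice between the two segments is fully predictable in that environment, and 1 means it is fully unpredictable.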

Symbol similarity¶

These functions compute the similarity of pairs of words across a corpus.

`edit_distance.edit_distance`(word1, word2, ...)	Returns the Levenshtein edit distance between a string from each of two words, word1 and word2. Code drawn from http://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance#Python.

`khorsi.khorsi`(word1, word2, freq_base, ...)	Calculate the string similarity of two words given a set of characters and their frequencies in a corpus based on Khorsi (2012)

`phono_edit_distance.phono_edit_distance`(...)	Returns an analogue to Levenshtein edit distance but uses phonological features instead of characters
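
For reference, Levenshtein edit distance itself can be sketched with the standard two-row dynamic-programming algorithm below. The function name `levenshtein` is hypothetical, and it operates on plain strings or sequences rather than Word objects. It is not the code used by `edit_distance.edit_distance`; the phonological variant would replace the 0/1 substitution cost with a feature-based cost.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions from a to b."""
    # Keep the shorter sequence as b so each row stays as small as possible.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


distance = levenshtein("kitten", "sitting")
```

Because the function only indexes and compares elements, it works equally on spelling strings and on lists of segment symbols.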