API Reference
Lexicon classes

Attributes collect summary information about attributes of Words or WordTokens; different types of attributes allow for different behaviour.

Lexicon that stores information about Words, such as transcriptions, spellings and frequencies.

Inventories contain information about a Corpus' segmental inventory. This class exists mainly for the purposes

An object that stores feature values for segments.

Class for segment symbols.

Transcription object: a sequence of symbols.

An object representing a word in a corpus.

Filter used to search words and generate the Environments that match it.

A specific sequence of segments that matched an EnvironmentFilter.
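The following is a minimal sketch of how an environment filter might be matched against a transcription. It assumes a filter consists of a set of target segments plus optional one-segment left and right context sets, with word boundaries represented as None. `EnvFilter` and `find_environments` are hypothetical names and are not PCT's EnvironmentFilter API.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Sequence, Tuple

@dataclass(frozen=True)
class EnvFilter:
    """Hypothetical stand-in for an environment filter, not PCT's class."""
    target: FrozenSet[str]
    lhs: Optional[FrozenSet[str]] = None  # None means "any left context"
    rhs: Optional[FrozenSet[str]] = None  # None means "any right context"

def find_environments(
    transcription: Sequence[str], f: EnvFilter
) -> List[Tuple[Optional[str], str, Optional[str]]]:
    """Return (left, target, right) for every position that matches the filter.

    Word boundaries are represented as None.
    """
    matches = []
    for i, seg in enumerate(transcription):
        if seg not in f.target:
            continue
        left = transcription[i - 1] if i > 0 else None
        right = transcription[i + 1] if i + 1 < len(transcription) else None
        if f.lhs is not None and left not in f.lhs:
            continue
        if f.rhs is not None and right not in f.rhs:
            continue
        matches.append((left, seg, right))
    return matches

# Example: [t] or [d] between vowels.
vowels = frozenset("aeiou")
flt = EnvFilter(target=frozenset("td"), lhs=vowels, rhs=vowels)
intervocalic = find_environments(list("atada"), flt)
# intervocalic == [('a', 't', 'a'), ('a', 'd', 'a')]
```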
Speech corpus classes

Discourse objects are collections of linear text with word tokens.

Speaker objects contain information about the producers of WordTokens or Discourses.

SpontaneousSpeechCorpus objects are collections of Discourse objects and Corpus objects for frequency information.

WordToken objects are individual productions of Words.
Corpus context managers

Abstract Corpus context class that all other contexts inherit from.

Corpus context that uses canonical forms for transcriptions and tiers.

Corpus context that uses the most frequent pronunciation variants for transcriptions and tiers.

Corpus context that treats pronunciation variants as separate types for transcriptions and tiers.

Corpus context that weights the frequency of pronunciation variants by the number of variants or by token frequency, for transcriptions and tiers.
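The four variant contexts differ in which transcriptions a word contributes and in the frequency attached to each. The sketch below shows one possible reading of those descriptions. It assumes that each word records a canonical transcription, a type frequency, and a token count for each pronunciation variant. It also assumes that token-based weighting splits the word's frequency in proportion to the variants' token counts. The `Word` record and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class Word:
    """Hypothetical word record, not PCT's Word class."""
    canonical: str
    frequency: float
    variants: Dict[str, int]  # variant transcription -> token count

def canonical_items(w: Word) -> List[Tuple[str, float]]:
    # Canonical: a single item, the canonical form with the word's frequency.
    return [(w.canonical, w.frequency)]

def most_frequent_items(w: Word) -> List[Tuple[str, float]]:
    # Most frequent: a single item, the variant with the highest token count.
    best = max(w.variants, key=w.variants.get)
    return [(best, w.frequency)]

def separated_items(w: Word) -> List[Tuple[str, float]]:
    # Separated: every variant is its own type, counted by its tokens.
    return [(v, float(c)) for v, c in w.variants.items()]

def weighted_items(w: Word, by_tokens: bool = False) -> List[Tuple[str, float]]:
    # Weighted (assumed reading): split the word's frequency across variants,
    # either evenly (by number of variants) or in proportion to token counts.
    if by_tokens:
        total = sum(w.variants.values())
        return [(v, w.frequency * c / total) for v, c in w.variants.items()]
    n = len(w.variants)
    return [(v, w.frequency / n) for v in w.variants]

cat = Word("kat", 10.0, {"kat": 6, "kaet": 4})
```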
Corpus IO functions
Corpus binaries

Download a binary file of example corpora and feature matrices.

Unpickle a binary file.

Pickle a Corpus or FeatureMatrix object for later loading.
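Saving and loading binaries amounts to pickling and unpickling an object. Below is a minimal sketch that uses the standard `pickle` module, with a plain dict standing in for a Corpus or FeatureMatrix. `save_object` and `load_object` are hypothetical helper names and are not PCT functions.

```python
import os
import pickle
import tempfile

def save_object(obj, path):
    """Pickle obj to path (hypothetical helper)."""
    with open(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)

def load_object(path):
    """Unpickle and return the object stored at path (hypothetical helper)."""
    with open(path, "rb") as f:
        return pickle.load(f)

# Round-trip a stand-in "corpus" through a temporary file.
corpus = {"cat": {"transcription": ["k", "ae", "t"], "frequency": 12}}
path = os.path.join(tempfile.mkdtemp(), "example.corpus")
save_object(corpus, path)
restored = load_object(path)
```

Only unpickle files from trusted sources, because unpickling can execute arbitrary code.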
Loading from CSV

Load a corpus from a column-delimited text file.

Load a FeatureMatrix from a column-delimited text file.
Export to CSV

Save a corpus as a column-delimited text file.

Save a FeatureMatrix as a column-delimited text file.
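Column-delimited files map naturally onto Python's `csv` module. The sketch below assumes a header row with `spelling`, `transcription` and `frequency` columns, and assumes that the segments in a transcription are separated by dots. The column names, the dot separator and the helper names are all assumptions, not PCT's file format.

```python
import csv
import io
from typing import Dict, List, TextIO

FIELDS = ["spelling", "transcription", "frequency"]  # assumed column names

def load_lexicon(f: TextIO, delimiter: str = ",") -> List[Dict]:
    """Read rows into dicts, splitting transcriptions into segment lists."""
    rows = []
    for row in csv.DictReader(f, delimiter=delimiter):
        rows.append({
            "spelling": row["spelling"],
            "transcription": row["transcription"].split("."),
            "frequency": float(row["frequency"]),
        })
    return rows

def export_lexicon(rows: List[Dict], f: TextIO, delimiter: str = ",") -> None:
    """Write rows back out, joining segments with dots."""
    writer = csv.DictWriter(f, fieldnames=FIELDS, delimiter=delimiter,
                            lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow({**row, "transcription": ".".join(row["transcription"])})

source = "spelling\ttranscription\tfrequency\ncat\tk.ae.t\t12\n"
lexicon = load_lexicon(io.StringIO(source), delimiter="\t")
out = io.StringIO()
export_lexicon(lexicon, out, delimiter="\t")
```

When reading or writing real files, open them with `newline=""`, as the `csv` module documentation recommends.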
TextGrids
Generate a list of AnnotationTypes for a specified TextGrid file.

Load a discourse from a TextGrid file.

Load a directory of TextGrid files.
Running text
Generate a list of AnnotationTypes for a specified text file, for parsing it as orthographic text.

Load a discourse from a text file containing running orthographic text.

Load a directory of orthographic texts.

Export an orthographic discourse to a text file.

Generate a list of AnnotationTypes for a specified text file, for parsing it as transcribed text.

Load a discourse from a text file containing running transcribed text.

Load a directory of transcribed texts.

Export a transcribed discourse to a text file.
Interlinear gloss text

Generate a list of AnnotationTypes for a specified text file, for parsing it as an interlinear gloss text file.

Load a discourse from a text file containing interlinear glosses.

Load a directory of interlinear gloss text files.

Export a discourse to an interlinear gloss text file, with a maximum line length of 10 words.
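One way to enforce a 10-word line limit is to split the word list into chunks and emit one block of aligned tiers per chunk. This is a hypothetical sketch. It assumes that each word contributes a spelling and a gloss and that blocks are separated by blank lines. The real export layout may differ.

```python
from typing import List, Tuple

def export_ilg_lines(words: List[Tuple[str, str]], max_words: int = 10) -> str:
    """Format (spelling, gloss) pairs as interlinear blocks of at most max_words words."""
    blocks = []
    for start in range(0, len(words), max_words):
        chunk = words[start:start + max_words]
        spelling_line = " ".join(spelling for spelling, _ in chunk)
        gloss_line = " ".join(gloss for _, gloss in chunk)
        blocks.append(spelling_line + "\n" + gloss_line)
    # A blank line separates consecutive blocks (assumed layout).
    return "\n\n".join(blocks)

words = [(f"w{i}", f"g{i}") for i in range(12)]
text = export_ilg_lines(words)
```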
Other standards
Generate a list of AnnotationTypes for a specified dialect.

Load a discourse from a text file containing interlinear glosses.

Load a directory of corpus standard files (separated into words files and phones files).
Analysis functions
Frequency of alternation
Frequency of alternation is currently not supported in PCT.

Returns a float that measures the frequency of alternation of two sounds in a given corpus.
Kullback-Leibler divergence

Calculates the KL divergence between two Phoneme objects in a given context, either the left-hand or the right-hand side.
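KL divergence compares the distributions of contexts in which two segments occur. The sketch below makes several assumptions. Segments are plain strings rather than Phoneme objects. Contexts are single neighbouring segments, with `#` marking a word boundary. Counts are add-one smoothed over the union of observed contexts, and logs are base 2. The smoothing and the one-directional D(P1 || P2) are assumptions, and PCT's implementation may differ.

```python
import math
from collections import Counter
from typing import List, Sequence

def context_counts(words: List[Sequence[str]], seg: str, side: str = "right") -> Counter:
    """Count the neighbouring segment of each occurrence of seg ('#' = word boundary)."""
    counts = Counter()
    for w in words:
        padded = ["#"] + list(w) + ["#"]
        for i in range(1, len(padded) - 1):
            if padded[i] == seg:
                counts[padded[i + 1] if side == "right" else padded[i - 1]] += 1
    return counts

def kl_divergence(words: List[Sequence[str]], seg1: str, seg2: str, side: str = "right") -> float:
    """D_KL(P1 || P2) in bits over contexts, with add-one smoothing."""
    c1 = context_counts(words, seg1, side)
    c2 = context_counts(words, seg2, side)
    contexts = set(c1) | set(c2)
    total1 = sum(c1.values()) + len(contexts)
    total2 = sum(c2.values()) + len(contexts)
    kl = 0.0
    for c in contexts:
        p = (c1[c] + 1) / total1
        q = (c2[c] + 1) / total2
        kl += p * math.log2(p / q)
    return kl
```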
Mutual information

Calculate the mutual information for a bigram.
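For a bigram xy, pointwise mutual information is log2(p(xy) / (p(x) p(y))). The sketch below assumes that unigram and bigram probabilities are estimated from within-word type counts over a list of transcriptions, with no frequency weighting and no word-boundary symbols. These estimation choices are assumptions.

```python
import math
from collections import Counter
from typing import List, Sequence

def bigram_mi(words: List[Sequence[str]], a: str, b: str) -> float:
    """Pointwise mutual information (bits) of the segment bigram ab."""
    unigrams = Counter()
    bigrams = Counter()
    for w in words:
        unigrams.update(w)
        bigrams.update(zip(w, w[1:]))
    if bigrams[(a, b)] == 0:
        raise ValueError(f"bigram {a}{b} does not occur in the corpus")
    p_ab = bigrams[(a, b)] / sum(bigrams.values())
    n_uni = sum(unigrams.values())
    p_a = unigrams[a] / n_uni
    p_b = unigrams[b] / n_uni
    return math.log2(p_ab / (p_a * p_b))
```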
Transitional probability
Neighborhood density
Calculate the neighborhood density of a particular word in the corpus.
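A common definition of neighborhood density counts the words that are exactly one substitution, insertion or deletion away from the query. The sketch below uses that definition. PCT may support other distance measures or thresholds, and the function names are hypothetical.

```python
from typing import Iterable, Sequence

def within_one_edit(a: Sequence[str], b: Sequence[str]) -> bool:
    """True if a and b differ by exactly one substitution, insertion or deletion."""
    a, b = list(a), list(b)
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        # Same length: exactly one position may differ (a substitution).
        return sum(x != y for x, y in zip(a, b)) == 1
    if len(a) > len(b):
        a, b = b, a  # make b the longer sequence
    # b is one segment longer: deleting one of its segments must yield a.
    return any(b[:i] + b[i + 1:] == a for i in range(len(b)))

def neighborhood_density(query: Sequence[str], corpus: Iterable[Sequence[str]]) -> int:
    """Count corpus words that are one edit away from the query."""
    return sum(1 for w in corpus if within_one_edit(query, w))
```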
|
Find all minimal pairs of the query word based only on segment mutations (not deletions or insertions).
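Restricting the comparison to words of the same length that differ in exactly one position yields the substitution-only minimal pairs described above. The sketch below returns the differing segment pair alongside each word. That output format is an illustrative choice and is not PCT's.

```python
from typing import Iterable, List, Sequence, Tuple

def mutation_minimal_pairs(
    query: Sequence[str], corpus: Iterable[Sequence[str]]
) -> List[Tuple[Tuple[str, ...], Tuple[str, str]]]:
    """Return (word, (query_segment, word_segment)) for one-substitution neighbours."""
    pairs = []
    for w in corpus:
        if len(w) != len(query):
            continue  # deletions and insertions are excluded
        diffs = [(q, s) for q, s in zip(query, w) if q != s]
        if len(diffs) == 1:
            pairs.append((tuple(w), diffs[0]))
    return pairs
```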
Phonotactic probability

Calculate the phonotactic probability of a particular word using the Vitevitch & Luce algorithm.
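The Vitevitch & Luce approach weights each word by the log10 of its frequency and computes position-specific segment and biphone probabilities. The sketch below follows that approach under two assumptions: every frequency is greater than 1, so that all weights are positive, and a word's score is reported as the two sums. This section does not state whether PCT sums, averages or adds a constant, and the function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import List, Sequence, Tuple

def positional_weights(corpus: List[Tuple[Sequence[str], float]]):
    """Sum log10-frequency weights per (position, segment) and per (position, biphone)."""
    seg_num = defaultdict(float)  # (position, segment) -> summed weight
    seg_den = defaultdict(float)  # position -> summed weight of words reaching it
    bi_num = defaultdict(float)   # (position, seg1, seg2) -> summed weight
    bi_den = defaultdict(float)   # position -> summed weight of words with a biphone there
    for word, freq in corpus:
        weight = math.log10(freq)  # assumes freq > 1
        for i, seg in enumerate(word):
            seg_num[(i, seg)] += weight
            seg_den[i] += weight
        for i in range(len(word) - 1):
            bi_num[(i, word[i], word[i + 1])] += weight
            bi_den[i] += weight
    return seg_num, seg_den, bi_num, bi_den

def phonotactic_probability(word: Sequence[str], corpus: List[Tuple[Sequence[str], float]]):
    """Return (sum of positional segment probs, sum of positional biphone probs)."""
    seg_num, seg_den, bi_num, bi_den = positional_weights(corpus)
    seg_total = sum(seg_num[(i, s)] / seg_den[i]
                    for i, s in enumerate(word) if seg_den[i])
    bi_total = sum(bi_num[(i, word[i], word[i + 1])] / bi_den[i]
                   for i in range(len(word) - 1) if bi_den[i])
    return seg_total, bi_total
```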
Predictability of distribution

Main function for calculating predictability of distribution for two segments over a corpus, regardless of environment.

Main function for calculating predictability of distribution for two segments over specified environments in a corpus.
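The sketch below assumes an entropy-based measure of predictability: the entropy of the choice between the two segments, computed once over the whole corpus and once per environment. The per-environment entropies are then averaged, each weighted by how often either segment occurs in that environment. It also assumes that per-environment counts of the two segments are already available. All names are hypothetical.

```python
import math
from typing import Dict, Tuple

def binary_entropy(c1: float, c2: float) -> float:
    """Entropy (bits) of choosing between two segments given their counts."""
    total = c1 + c2
    h = 0.0
    for c in (c1, c2):
        if c > 0:
            p = c / total
            h -= p * math.log2(p)
    return h

def predictability_overall(counts: Dict[str, Tuple[float, float]]) -> float:
    """Entropy regardless of environment: pool both segments' counts."""
    return binary_entropy(sum(a for a, _ in counts.values()),
                          sum(b for _, b in counts.values()))

def predictability_by_env(counts: Dict[str, Tuple[float, float]]) -> float:
    """Environment-weighted entropy: sum over environments of P(env) * H(env)."""
    grand = sum(a + b for a, b in counts.values())
    return sum((a + b) / grand * binary_entropy(a, b) for a, b in counts.values())
```

With fully complementary counts such as `{"_i": (10, 0), "_a": (0, 10)}`, the overall entropy is 1 bit but the environment-weighted entropy is 0, which reflects perfect predictability from context.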
Symbol similarity
This function computes the similarity of pairs of words across a corpus.

Returns the Levenshtein edit distance between the strings of two words, word1 and word2; code drawn from http://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance#Python.
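Below is a two-row dynamic-programming sketch of Levenshtein distance with unit costs. It accepts either strings or lists of segment symbols. It is written for illustration and is not the code PCT uses.

```python
from typing import Sequence

def edit_distance(a: Sequence, b: Sequence) -> int:
    """Levenshtein distance with unit costs for insertion, deletion and substitution."""
    if len(a) < len(b):
        a, b = b, a  # keep each row as short as possible
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # deletion
                current[j - 1] + 1,          # insertion
                previous[j - 1] + (x != y),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]
```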
|
Calculate the string similarity of two words, given a set of characters and their frequencies in a corpus, based on Khorsi (2012).
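The following sketches a Khorsi-style measure. It rewards the part two words share by summing the surprisal log(1/freq) of each shared symbol, then subtracts the surprisal of every leftover symbol. Several details are assumptions. The shared part is taken to be the longest common contiguous substring, found with `difflib.SequenceMatcher.find_longest_match`. `freqs` is assumed to hold relative symbol frequencies greater than 0, and logs are natural. Check these choices against Khorsi (2012) before relying on them.

```python
import math
from difflib import SequenceMatcher
from typing import Dict, Sequence

def khorsi_similarity(w1: Sequence[str], w2: Sequence[str], freqs: Dict[str, float]) -> float:
    """Khorsi-style similarity: reward the shared substring, penalise the rest."""
    def surprisal(symbol: str) -> float:
        return math.log(1 / freqs[symbol])

    m = SequenceMatcher(None, w1, w2, autojunk=False).find_longest_match(
        0, len(w1), 0, len(w2))
    shared = list(w1[m.a:m.a + m.size])
    leftover = (list(w1[:m.a]) + list(w1[m.a + m.size:])
                + list(w2[:m.b]) + list(w2[m.b + m.size:]))
    return sum(surprisal(s) for s in shared) - sum(surprisal(s) for s in leftover)
```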
Returns an analogue of Levenshtein edit distance that uses phonological features instead of characters.
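The following sketches a feature-based edit distance. It assumes that a substitution costs the number of features on which the two segments disagree, and that an insertion or deletion costs the total number of features. The toy feature dictionary and the function names are hypothetical and are not PCT's FeatureMatrix.

```python
from typing import Dict, Sequence

Features = Dict[str, Dict[str, str]]  # segment -> {feature: '+' or '-'}

def feature_diff(a: str, b: str, fm: Features) -> int:
    """Number of features on which two segments disagree."""
    return sum(fm[a][f] != fm[b][f] for f in fm[a])

def phono_edit_distance(w1: Sequence[str], w2: Sequence[str], fm: Features) -> int:
    """Levenshtein-style distance over segments with feature-based costs."""
    indel = len(next(iter(fm.values())))  # assumed cost: every feature of a segment
    prev = [j * indel for j in range(len(w2) + 1)]
    for i, x in enumerate(w1, start=1):
        cur = [i * indel]
        for j, y in enumerate(w2, start=1):
            cur.append(min(prev[j] + indel,                         # deletion
                           cur[j - 1] + indel,                      # insertion
                           prev[j - 1] + feature_diff(x, y, fm)))   # substitution
        prev = cur
    return prev[-1]

fm = {
    "p": {"voice": "-", "labial": "+", "nasal": "-"},
    "b": {"voice": "+", "labial": "+", "nasal": "-"},
    "m": {"voice": "+", "labial": "+", "nasal": "+"},
    "t": {"voice": "-", "labial": "-", "nasal": "-"},
}
```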