API Reference¶
Lexicon classes¶
lexicon.Attribute (name, att_type[, …]) |
Attributes are for collecting summary information about attributes of Words or WordTokens, with different types of attributes allowing for different behaviour |
lexicon.Corpus (name[, update]) |
Lexicon to store information about Words, such as transcriptions, spellings and frequencies |
lexicon.Inventory ([update]) |
Inventories contain information about a Corpus’ segmental inventory. This class exists mainly for the purposes |
lexicon.FeatureMatrix (name, feature_entries) |
An object that stores feature values for segments |
lexicon.Segment (symbol[, features]) |
Class for segment symbols |
lexicon.Transcription (seg_list) |
Transcription object, sequence of symbols |
lexicon.Word (**kwargs) |
An object representing a word in a corpus |
lexicon.EnvironmentFilter (middle_segments[, …]) |
Filter to use for searching words to generate Environments that match |
lexicon.Environment (middle, position[, lhs, rhs]) |
Specific sequence of segments that was a match for an EnvironmentFilter |
Speech corpus classes¶
spontaneous.Discourse (kwargs) |
Discourse objects are collections of linear text with word tokens |
spontaneous.Speaker (name, **kwargs) |
Speaker objects contain information about the producers of WordTokens or Discourses |
spontaneous.SpontaneousSpeechCorpus (name, …) |
SpontaneousSpeechCorpus objects are a collection of Discourse objects and Corpus objects for frequency information. |
spontaneous.WordToken ([update]) |
WordToken objects are individual productions of Words |
Corpus context managers¶
contextmanagers.BaseCorpusContext (corpus, …) |
Abstract Corpus context class that all other contexts inherit from. |
contextmanagers.CanonicalVariantContext (…) |
Corpus context that uses canonical forms for transcriptions and tiers |
contextmanagers.MostFrequentVariantContext (…) |
Corpus context that uses the most frequent pronunciation variants for transcriptions and tiers |
contextmanagers.SeparatedTokensVariantContext (…) |
Corpus context that treats pronunciation variants as separate types for transcriptions and tiers |
contextmanagers.WeightedVariantContext (…) |
Corpus context that weights frequency of pronunciation variants by the number of variants or the token frequency for transcriptions and tiers |
Corpus IO functions¶
Corpus binaries¶
binary.download_binary (name, path[, call_back]) |
Download a binary file of example corpora and feature matrices. |
binary.load_binary (path) |
Unpickle a binary file |
binary.save_binary (obj, path) |
Pickle a Corpus or FeatureMatrix object for later loading |
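The save/load pair above is described as pickling and unpickling. As a rough illustration of that round trip, here is a minimal sketch using only the standard library. The function names `save_binary_sketch` and `load_binary_sketch` are hypothetical stand-ins, not the library's own functions, and a plain dict stands in for a Corpus or FeatureMatrix object.

```python
import os
import pickle
import tempfile


def save_binary_sketch(obj, path):
    """Pickle obj to path (hypothetical stand-in for a save function)."""
    with open(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_binary_sketch(path):
    """Unpickle and return whatever object was stored at path."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Round trip, with a plain dict standing in for a Corpus or FeatureMatrix.
original = {"name": "example", "words": ["cat", "bat"]}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.corpus")
    save_binary_sketch(original, path)
    restored = load_binary_sketch(path)
```

Unpickling can execute arbitrary code, so binary files of this kind should only be loaded from trusted sources.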
Loading from CSV¶
csv.load_corpus_csv (corpus_name, path, delimiter) |
Load a corpus from a column-delimited text file |
csv.load_feature_matrix_csv (name, path, …) |
Load a FeatureMatrix from a column-delimited text file |
Export to CSV¶
csv.export_corpus_csv (corpus, path[, …]) |
Save a corpus as a column-delimited text file |
csv.export_feature_matrix_csv (…[, delimiter]) |
Save a FeatureMatrix as a column-delimited text file |
TextGrids¶
textgrid.inspect_discourse_textgrid |
textgrid.load_discourse_textgrid |
textgrid.load_directory_textgrid |
Running text¶
text_spelling.inspect_discourse_spelling (path) |
Generate a list of AnnotationTypes for a specified text file for parsing it as an orthographic text |
text_spelling.load_discourse_spelling (…[, …]) |
Load a discourse from a text file containing running text of orthography |
text_spelling.load_directory_spelling (…[, …]) |
Loads a directory of orthographic texts |
text_spelling.export_discourse_spelling (…) |
Export an orthography discourse to a text file |
text_transcription.inspect_discourse_transcription (path) |
Generate a list of AnnotationTypes for a specified text file for parsing it as a transcribed text |
text_transcription.load_discourse_transcription (…) |
Load a discourse from a text file containing running transcribed text |
text_transcription.load_directory_transcription (…) |
Loads a directory of transcribed texts. |
text_transcription.export_discourse_transcription (…) |
Export a transcribed discourse to a text file |
Interlinear gloss text¶
text_ilg.inspect_discourse_ilg (path[, number]) |
Generate a list of AnnotationTypes for a specified text file for parsing it as an interlinear gloss text file |
text_ilg.load_discourse_ilg (corpus_name, …) |
Load a discourse from a text file containing interlinear glosses |
text_ilg.load_directory_ilg (corpus_name, …) |
Loads a directory of interlinear gloss text files |
text_ilg.export_discourse_ilg (discourse, path) |
Export a discourse to an interlinear gloss text file, with a maximal line size of 10 words |
Other standards¶
multiple_files.inspect_discourse_multiple_files (…) |
Generate a list of AnnotationTypes for a specified dialect |
multiple_files.load_discourse_multiple_files (…) |
Load a discourse from a text file containing interlinear glosses |
multiple_files.load_directory_multiple_files (…) |
Loads a directory of corpus standard files (separated into words files and phones files) |
Analysis functions¶
Frequency of alternation¶
Frequency of alternation is currently not supported in PCT.
freq_of_alt.calc_freq_of_alt (corpus_context, …) |
Returns a double that is a measure of the frequency of alternation of two sounds in a given corpus |
Functional load¶
functional_load.minpair_fl |
functional_load.deltah_fl |
functional_load.relative_minpair_fl |
functional_load.relative_deltah_fl |
Kullback-Leibler divergence¶
kl.KullbackLeibler (corpus_context, seg1, …) |
Calculates the KL divergence between two Phoneme objects in a given context, on either the left-hand or the right-hand side. |
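For readers unfamiliar with the measure, the following is a minimal sketch of Kullback-Leibler divergence between two discrete distributions, such as the distributions of neighbouring segments observed next to each of two sounds. It is not the `KullbackLeibler` implementation itself. The function name `kl_divergence` and the dict-based inputs are assumptions made for illustration.

```python
import math


def kl_divergence(p, q):
    """Return D(P || Q) in bits for two distributions given as dicts.

    Assumes q[k] > 0 for every k with p[k] > 0; otherwise the
    divergence is infinite and this sketch raises an error.
    """
    return sum(pk * math.log2(pk / q[k]) for k, pk in p.items() if pk > 0)


# Hypothetical context distributions for two segments.
p = {"a": 0.5, "b": 0.5}
q = {"a": 0.25, "b": 0.75}
d_pq = kl_divergence(p, q)
d_qp = kl_divergence(q, p)
```

The measure is asymmetric (`d_pq` and `d_qp` generally differ), so it is not a distance in the strict metric sense.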
Mutual information¶
mutual_information.pointwise_mi (…[, …]) |
Calculate the mutual information for a bigram. |
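As an illustration of what a bigram mutual information score measures, here is a minimal sketch of pointwise mutual information, log2(p(x, y) / (p(x) p(y))), estimated from raw counts. It is not the `pointwise_mi` implementation. The function name, the use of plain sequences instead of a corpus context, and the absence of word-boundary symbols are all assumptions.

```python
import math
from collections import Counter


def pointwise_mi_sketch(sequences, bigram):
    """Estimate PMI (in bits) of an ordered pair from symbol sequences.

    Assumes the sequences contain at least one bigram in total.
    """
    unigrams = Counter()
    bigrams = Counter()
    for seq in sequences:
        unigrams.update(seq)
        bigrams.update(zip(seq, seq[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    x, y = bigram
    p_xy = bigrams[(x, y)] / n_bi
    if p_xy == 0:
        # The pair never occurs adjacently, so PMI is unbounded below.
        return float("-inf")
    p_x = unigrams[x] / n_uni
    p_y = unigrams[y] / n_uni
    return math.log2(p_xy / (p_x * p_y))


score = pointwise_mi_sketch(["ab", "ab"], ("a", "b"))
```

Positive values mean the two symbols co-occur more often than their individual frequencies would predict, and negative values mean they co-occur less often.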
Transitional probability¶
Neighborhood density¶
neighborhood_density.neighborhood_density (…) |
Calculate the neighborhood density of a particular word in the corpus. |
neighborhood_density.find_mutation_minpairs (…) |
Find all minimal pairs of the query word based only on segment mutations (not deletions/insertions) |
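The restriction to segment mutations described above can be sketched as follows: two transcriptions form a mutation minimal pair when they have the same length and differ in exactly one position. This is a minimal illustration, not the library's `find_mutation_minpairs`. The function names and the representation of transcriptions as tuples of segment symbols are assumptions.

```python
def is_mutation_minpair(a, b):
    """True if a and b have equal length and differ in exactly one segment."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def find_mutation_minpairs_sketch(query, lexicon):
    """Return the lexicon entries that are one substitution away from query."""
    return [w for w in lexicon if is_mutation_minpair(query, w)]


lexicon = [
    ("b", "æ", "t"),
    ("k", "æ", "t"),
    ("k", "æ", "t", "s"),  # one insertion away, so excluded
    ("k", "ʌ", "t"),
    ("d", "ɔ", "g"),
]
neighbors = find_mutation_minpairs_sketch(("k", "æ", "t"), lexicon)
```

Under this definition the length of `neighbors` gives a substitution-only neighborhood density. A definition that also counts deletions and insertions would need an edit-distance comparison instead.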
Phonotactic probability¶
phonotactic_probability.phonotactic_probability_vitevitch (…) |
Calculate the phonotactic probability of a particular word using the Vitevitch & Luce algorithm |
Predictability of distribution¶
pred_of_dist.calc_prod_all_envs (…[, …]) |
Main function for calculating predictability of distribution for two segments over a corpus, regardless of environment. |
pred_of_dist.calc_prod (corpus_context, envs) |
Main function for calculating predictability of distribution for two segments over specified environments in a corpus. |
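One common way to quantify how predictably two segments are distributed is the Shannon entropy of their relative frequencies within an environment. The sketch below shows that calculation only. Whether it matches the exact computation in `calc_prod` or `calc_prod_all_envs` is an assumption, and the function name and count-based inputs are hypothetical.

```python
import math


def two_segment_entropy(count1, count2):
    """Entropy in bits of two segments' relative frequencies.

    0.0 means fully predictable (only one segment occurs) and
    1.0 means fully unpredictable (both occur equally often).
    """
    total = count1 + count2
    if total == 0:
        return 0.0
    h = 0.0
    for c in (count1, count2):
        if c:
            p = c / total
            h -= p * math.log2(p)
    return h


contrastive = two_segment_entropy(5, 5)
complementary = two_segment_entropy(10, 0)
```

Values near 0 suggest complementary distribution in that environment, and values near 1 suggest that the two segments contrast there.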
Symbol similarity¶
string_similarity.string_similarity (…) |
This function computes similarity of pairs of words across a corpus. |
edit_distance.edit_distance (word1, word2, …) |
Returns the Levenshtein edit distance between the strings of two words, word1 and word2. Code drawn from http://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance#Python. |
khorsi.khorsi (word1, word2, freq_base, …) |
Calculate the string similarity of two words given a set of characters and their frequencies in a corpus based on Khorsi (2012) |
phono_edit_distance.phono_edit_distance (…) |
Returns an analogue to Levenshtein edit distance that uses phonological features instead of characters |
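For reference, the Levenshtein edit distance named in the `edit_distance` entry can be computed with a standard dynamic-programming scheme. The sketch below is an independent illustration over arbitrary sequences, not the library's code. The function name `levenshtein` is an assumption, and every insertion, deletion and substitution is given a cost of 1.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter sequence in the inner loop
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            insert = current[j - 1] + 1
            delete = previous[j] + 1
            substitute = previous[j - 1] + (x != y)
            current.append(min(insert, delete, substitute))
        previous = current
    return previous[-1]


distance = levenshtein("kitten", "sitting")
```

A feature-based analogue such as the one described for `phono_edit_distance` would replace the unit substitution cost with a cost derived from how many features the two segments differ in.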