API Reference
Lexicon classes
lexicon.Attribute(name, att_type[, …])
Attributes are for collecting summary information about attributes of Words or WordTokens, with different types of attributes allowing for different behaviour
lexicon.Corpus(name[, update])
Lexicon to store information about Words, such as transcriptions, spellings and frequencies
lexicon.Inventory([update])
Inventories contain information about a Corpus’ segmental inventory. This class exists mainly for the purposes …
lexicon.FeatureMatrix(name, feature_entries)
An object that stores feature values for segments
lexicon.Segment(symbol[, features])
Class for segment symbols
lexicon.Transcription(seg_list)
Transcription object, a sequence of segment symbols
lexicon.Word(**kwargs)
An object representing a word in a corpus
lexicon.EnvironmentFilter(middle_segments[, …])
Filter for searching words to find the Environments that match it
lexicon.Environment(middle, position[, lhs, rhs])
Specific sequence of segments that was a match for an EnvironmentFilter
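To illustrate what an EnvironmentFilter match amounts to, here is a minimal sketch in plain Python that does not use PCT's classes. It assumes a transcription is a list of segment symbols, that a filter is a set of middle segments plus optional sets of allowed left- and right-hand neighbours, and that `#` marks a word boundary. `EnvMatch` and `find_environments` are hypothetical names standing in for Environment and EnvironmentFilter.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Set


@dataclass
class EnvMatch:
    """Hypothetical stand-in for an Environment: the matched middle segment,
    its index in the transcription, and its immediate neighbours."""
    middle: str
    position: int
    lhs: Optional[str]
    rhs: Optional[str]


def find_environments(transcription: Sequence[str],
                      middle_segments: Set[str],
                      lhs: Optional[Set[str]] = None,
                      rhs: Optional[Set[str]] = None) -> List[EnvMatch]:
    """Return every position where a middle segment occurs and whose
    neighbours satisfy the optional lhs/rhs sets ('#' = word boundary)."""
    padded = ["#"] + list(transcription) + ["#"]
    matches = []
    for i in range(1, len(padded) - 1):
        seg = padded[i]
        if seg not in middle_segments:
            continue
        left, right = padded[i - 1], padded[i + 1]
        if lhs is not None and left not in lhs:
            continue
        if rhs is not None and right not in rhs:
            continue
        # i - 1 converts back to an index into the unpadded transcription
        matches.append(EnvMatch(seg, i - 1, left, right))
    return matches


# /t/ before /i/ in "tati": only the second /t/ matches
matches = find_environments(["t", "a", "t", "i"], {"t"}, rhs={"i"})
```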
Speech corpus classes
spontaneous.Discourse(kwargs)
Discourse objects are collections of linear text with word tokens
spontaneous.Speaker(name, **kwargs)
Speaker objects contain information about the producers of WordTokens or Discourses
spontaneous.SpontaneousSpeechCorpus(name, …)
SpontaneousSpeechCorpus objects are collections of Discourse objects and Corpus objects for frequency information.
spontaneous.WordToken([update])
WordToken objects are individual productions of Words
Corpus context managers
contextmanagers.BaseCorpusContext(corpus, …)
Abstract Corpus context class that all other contexts inherit from.
contextmanagers.CanonicalVariantContext(…)
Corpus context that uses canonical forms for transcriptions and tiers
contextmanagers.MostFrequentVariantContext(…)
Corpus context that uses the most frequent pronunciation variants for transcriptions and tiers
contextmanagers.SeparatedTokensVariantContext(…)
Corpus context that treats pronunciation variants as separate types for transcriptions and tiers
contextmanagers.WeightedVariantContext(…)
Corpus context that weights frequency of pronunciation variants by the number of variants or the token frequency for transcriptions and tiers
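The two weighting schemes described for WeightedVariantContext can be sketched as follows. This is an illustration under assumptions, not PCT's code: a word's frequency and a Counter of token counts per pronunciation variant are passed in, and `weighted_variant_frequencies` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, Tuple

Variant = Tuple[str, ...]


def weighted_variant_frequencies(word_freq: float,
                                 variant_tokens: Counter,
                                 by_token_frequency: bool = False) -> Dict[Variant, float]:
    """Split one word's frequency across its pronunciation variants
    (hypothetical helper, not PCT's API)."""
    if by_token_frequency:
        # Each variant gets a share proportional to how often it was produced.
        total = sum(variant_tokens.values())
        return {v: word_freq * n / total for v, n in variant_tokens.items()}
    # Otherwise every variant gets an equal share (weight by number of variants).
    k = len(variant_tokens)
    return {v: word_freq / k for v in variant_tokens}


variants = Counter({("k", "a", "t"): 3, ("k", "a", "d"): 1})
equal_split = weighted_variant_frequencies(8, variants)
token_split = weighted_variant_frequencies(8, variants, by_token_frequency=True)
```

With these inputs the equal split gives 4.0 to each variant, while the token-based split gives 6.0 and 2.0.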
Corpus IO functions
Corpus binaries
binary.download_binary(name, path[, call_back])
Download a binary file of example corpora and feature matrices.
binary.load_binary(path)
Unpickle a binary file
binary.save_binary(obj, path)
Pickle a Corpus or FeatureMatrix object for later loading
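save_binary and load_binary are described as pickling and unpickling. The sketch below shows only that underlying mechanism with the standard-library `pickle` module, using a toy dictionary in place of a real Corpus. `save_obj` and `load_obj` are hypothetical helpers, and PCT's own functions may do more than this. Because unpickling can execute arbitrary code, only load binaries from trusted sources.

```python
import os
import pickle
import tempfile


def save_obj(obj, path):
    """Serialize any picklable object to a binary file (hypothetical helper)."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load_obj(path):
    """Read an object back from a binary file written by save_obj."""
    with open(path, "rb") as f:
        return pickle.load(f)


# A toy stand-in for a Corpus: spelling -> transcription and frequency
toy_corpus = {"cat": {"transcription": ["k", "ae", "t"], "frequency": 12}}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.corpus")
    save_obj(toy_corpus, path)
    restored = load_obj(path)
```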
Loading from CSV
csv.load_corpus_csv(corpus_name, path, delimiter)
Load a corpus from a column-delimited text file
csv.load_feature_matrix_csv(name, path, …)
Load a FeatureMatrix from a column-delimited text file
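As a rough picture of what loading a corpus from a column-delimited file involves, the sketch below reads rows with the standard `csv` module. The column names (`spelling`, `transcription`, `frequency`) and the use of `.` to separate segments within a transcription are assumptions for this example, not PCT's required format, and `load_corpus_rows` is a hypothetical helper.

```python
import csv
import io


def load_corpus_rows(fileobj, delimiter=","):
    """Build a simple spelling -> entry mapping from a delimited file.

    Assumed (hypothetical) columns: spelling, transcription, frequency,
    with segments in the transcription separated by '.'.
    """
    reader = csv.DictReader(fileobj, delimiter=delimiter)
    corpus = {}
    for row in reader:
        corpus[row["spelling"]] = {
            "transcription": row["transcription"].split("."),
            "frequency": float(row["frequency"]),
        }
    return corpus


sample = io.StringIO(
    "spelling\ttranscription\tfrequency\n"
    "cat\tk.ae.t\t12\n"
    "dog\td.o.g\t7\n"
)
corpus = load_corpus_rows(sample, delimiter="\t")
```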
Export to CSV
csv.export_corpus_csv(corpus, path[, …])
Save a corpus as a column-delimited text file
csv.export_feature_matrix_csv(…[, delimiter])
Save a FeatureMatrix as a column-delimited text file
TextGrids
textgrid.inspect_discourse_textgrid
textgrid.load_discourse_textgrid
textgrid.load_directory_textgrid
Running text
text_spelling.inspect_discourse_spelling(path)
Generate a list of AnnotationTypes for a specified text file for parsing it as an orthographic text
text_spelling.load_discourse_spelling(…[, …])
Load a discourse from a text file containing running text of orthography
text_spelling.load_directory_spelling(…[, …])
Loads a directory of orthographic texts
text_spelling.export_discourse_spelling(…)
Export an orthography discourse to a text file
text_transcription.inspect_discourse_transcription(path)
Generate a list of AnnotationTypes for a specified text file for parsing it as a transcribed text
text_transcription.load_discourse_transcription(…)
Load a discourse from a text file containing running transcribed text
text_transcription.load_directory_transcription(…)
Loads a directory of transcribed texts.
text_transcription.export_discourse_transcription(…)
Export a transcribed discourse to a text file
Interlinear gloss text
text_ilg.inspect_discourse_ilg(path[, number])
Generate a list of AnnotationTypes for a specified text file for parsing it as an interlinear gloss text file
text_ilg.load_discourse_ilg(corpus_name, …)
Load a discourse from a text file containing interlinear glosses
text_ilg.load_directory_ilg(corpus_name, …)
Loads a directory of interlinear gloss text files
text_ilg.export_discourse_ilg(discourse, path)
Export a discourse to an interlinear gloss text file, with a maximal line size of 10 words
Other standards
multiple_files.inspect_discourse_multiple_files(…)
Generate a list of AnnotationTypes for a specified dialect
multiple_files.load_discourse_multiple_files(…)
Load a discourse from corpus standard files (a words file and a phones file)
multiple_files.load_directory_multiple_files(…)
Loads a directory of corpus standard files (separated into words files and phones files)
Analysis functions
Frequency of alternation
Frequency of alternation is currently not supported in PCT.
freq_of_alt.calc_freq_of_alt(corpus_context, …)
Returns a float that measures the frequency of alternation of two sounds in a given corpus
Functional load
functional_load.minpair_fl
functional_load.deltah_fl
functional_load.relative_minpair_fl
functional_load.relative_deltah_fl
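The functional load entries above carry no descriptions. Assuming the names refer to the usual minimal-pair count and change-in-entropy (delta H) measures, the sketch below illustrates both for a pair of segments. It is not PCT's implementation: words are tuples of segments, the delta H merger rewrites s2 as s1, the result is normalized by the pre-merger entropy (an assumption of this sketch), and all function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def minpair_count(words, s1, s2):
    """Count pairs of word types that differ only by s1 vs s2 in one position."""
    count = 0
    for w1, w2 in combinations(set(words), 2):
        if len(w1) != len(w2):
            continue
        diffs = [(a, b) for a, b in zip(w1, w2) if a != b]
        if len(diffs) == 1 and set(diffs[0]) == {s1, s2}:
            count += 1
    return count


def entropy(freqs):
    """Shannon entropy (bits) of a word -> frequency distribution."""
    total = sum(freqs.values())
    return -sum((f / total) * math.log2(f / total) for f in freqs.values() if f > 0)


def deltah(word_freqs, s1, s2):
    """Relative drop in lexicon entropy when s2 is merged into s1 (sketch)."""
    before = entropy(word_freqs)
    merged = Counter()
    for word, f in word_freqs.items():
        # Neutralize the contrast; words that become homophones pool their frequency.
        merged[tuple(s1 if seg == s2 else seg for seg in word)] += f
    after = entropy(merged)
    return (before - after) / before if before else 0.0


lexicon = {("p", "a", "t"): 1, ("b", "a", "t"): 1,
           ("p", "i", "t"): 1, ("b", "i", "g"): 1}
pairs = minpair_count(list(lexicon), "p", "b")
fl = deltah(lexicon, "p", "b")
```

Here only pat/bat is a p/b minimal pair, and merging /b/ into /p/ lowers the entropy from 2 bits to 1.5 bits, a relative drop of 0.25.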
Kullback-Leibler divergence
kl.KullbackLeibler(corpus_context, seg1, …)
Calculates KL distances between two Phoneme objects in some context, either the left- or right-hand side.
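A minimal sketch of the idea, under stated assumptions: it collects the right- or left-hand neighbours of each segment (with `#` as a word boundary), applies add-one smoothing, and returns the one-directional divergence KL(P || Q) in bits. PCT's KullbackLeibler class may smooth, symmetrize, or report results differently; the helper names are hypothetical.

```python
import math
from collections import Counter


def context_counts(words, seg, side="right"):
    """Count the neighbouring segments of `seg` on one side ('#' = boundary)."""
    counts = Counter()
    for w in words:
        padded = ("#",) + tuple(w) + ("#",)
        for i in range(1, len(padded) - 1):
            if padded[i] == seg:
                counts[padded[i + 1] if side == "right" else padded[i - 1]] += 1
    return counts


def kl_divergence(words, seg1, seg2, side="right"):
    """KL(P || Q) in bits, where P and Q are the context distributions of seg1 and seg2."""
    c1 = context_counts(words, seg1, side)
    c2 = context_counts(words, seg2, side)
    contexts = set(c1) | set(c2)
    # Add-one smoothing so neither distribution assigns zero probability.
    n1 = sum(c1.values()) + len(contexts)
    n2 = sum(c2.values()) + len(contexts)
    kl = 0.0
    for c in contexts:
        p = (c1[c] + 1) / n1
        q = (c2[c] + 1) / n2
        kl += p * math.log2(p / q)
    return kl


words = [("s", "a"), ("z", "i")]
divergence = kl_divergence(words, "s", "z", side="right")
```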
Mutual information
mutual_information.pointwise_mi(…[, …])
Calculate the mutual information for a bigram.
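Pointwise mutual information for a bigram xy is log2(p(xy) / (p(x) p(y))). The sketch below estimates these probabilities from segment unigram and bigram counts over a list of transcriptions. How PCT counts unigrams (for example, whether word boundaries are included) is not stated here, so this counting scheme is an assumption, and `pointwise_mi` is a plain-Python stand-in rather than PCT's function.

```python
import math
from collections import Counter


def pointwise_mi(words, bigram):
    """PMI (bits) of a segment bigram, estimated from within-word counts."""
    unigrams, bigrams = Counter(), Counter()
    for w in words:
        unigrams.update(w)
        bigrams.update(zip(w, w[1:]))  # adjacent segment pairs inside the word
    if bigrams[bigram] == 0:
        raise ValueError(f"bigram {bigram!r} does not occur in the corpus")
    p_xy = bigrams[bigram] / sum(bigrams.values())
    n = sum(unigrams.values())
    p_x = unigrams[bigram[0]] / n
    p_y = unigrams[bigram[1]] / n
    return math.log2(p_xy / (p_x * p_y))


words = [("t", "a"), ("t", "a"), ("a", "t")]
pmi = pointwise_mi(words, ("t", "a"))
```

Here p(ta) = 2/3 and p(t) = p(a) = 1/2, so the PMI is log2(8/3), about 1.415 bits.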
Transitional probability
Neighborhood density
neighborhood_density.neighborhood_density(…)
Calculate the neighborhood density of a particular word in the corpus.
neighborhood_density.find_mutation_minpairs(…)
Find all minimal pairs of the query word based only on segment mutations (not deletions/insertions)
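Neighborhood density is commonly defined as the number of words within one edit (substitution, insertion, or deletion) of the query, while find_mutation_minpairs restricts this to substitutions only. The sketch below assumes that one-edit definition and counts word types, ignoring frequency; the function names are hypothetical.

```python
def within_one_edit(a, b):
    """True if a and b differ by exactly one substitution, insertion, or deletion."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        return sum(x != y for x, y in zip(a, b)) == 1
    if len(a) > len(b):
        a, b = b, a  # make `a` the shorter sequence
    # `b` is one segment longer: deleting some segment of `b` must yield `a`.
    return any(b[:i] + b[i + 1:] == a for i in range(len(b)))


def neighborhood_density(query, lexicon):
    """Number of word types in the lexicon one edit away from the query."""
    return sum(1 for w in lexicon if within_one_edit(query, w))


def mutation_minpairs(query, lexicon):
    """Words of the same length that differ from the query in exactly one segment."""
    return [w for w in lexicon
            if len(w) == len(query) and sum(x != y for x, y in zip(query, w)) == 1]


lexicon = [("k", "a", "t"), ("b", "a", "t"), ("k", "a", "t", "s"),
           ("a", "t"), ("k", "i", "d")]
density = neighborhood_density(("k", "a", "t"), lexicon)
minpairs = mutation_minpairs(("k", "a", "t"), lexicon)
```

For "kat", the neighbours are "bat" (substitution), "kats" (insertion), and "at" (deletion), but only "bat" counts as a mutation minimal pair.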
Phonotactic probability
phonotactic_probability.phonotactic_probability_vitevitch(…)
Calculate the phonotactic probability of a particular word using the Vitevitch & Luce algorithm
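A simplified sketch of the two quantities behind the Vitevitch & Luce approach: positional segment probability and positional biphone probability, each summed over the query word. To keep it short, this version counts word types, whereas the algorithm as generally described weights words by log frequency, so its numbers will not match PCT's output. `phonotactic_probability` here is a hypothetical helper.

```python
from collections import Counter


def phonotactic_probability(query, lexicon):
    """Return (sum of positional segment probs, sum of positional biphone probs).

    Type-based simplification: every word in `lexicon` counts once.
    """
    seg_counts, pos_totals = Counter(), Counter()
    bi_counts, bi_totals = Counter(), Counter()
    for word in lexicon:
        for i, seg in enumerate(word):
            seg_counts[(i, seg)] += 1   # words with `seg` at position i
            pos_totals[i] += 1          # words long enough to have position i
        for i in range(len(word) - 1):
            bi_counts[(i, word[i], word[i + 1])] += 1
            bi_totals[i] += 1
    seg_sum = sum(seg_counts[(i, s)] / pos_totals[i]
                  for i, s in enumerate(query) if pos_totals[i])
    bi_sum = sum(bi_counts[(i, query[i], query[i + 1])] / bi_totals[i]
                 for i in range(len(query) - 1) if bi_totals[i])
    return seg_sum, bi_sum


lexicon = [("k", "a", "t"), ("k", "i", "t"), ("b", "a", "t")]
seg_sum, bi_sum = phonotactic_probability(("k", "a", "t"), lexicon)
```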
Predictability of distribution
pred_of_dist.calc_prod_all_envs(…[, …])
Main function for calculating predictability of distribution for two segments over a corpus, regardless of environment.
pred_of_dist.calc_prod(corpus_context, envs)
Main function for calculating predictability of distribution for two segments over specified environments in a corpus.
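Predictability of distribution is usually expressed as entropy: in each environment, the uncertainty (in bits) about which of the two segments occurs, with the overall value a frequency-weighted average over environments. The sketch assumes counts per environment are already available; the environment-independent calc_prod_all_envs corresponds roughly to applying `binary_entropy` to the total counts. Names are hypothetical, and PCT's handling of type versus token frequency is not modelled.

```python
import math


def binary_entropy(n1, n2):
    """Entropy (bits) of the choice between two segments with counts n1 and n2."""
    total = n1 + n2
    h = 0.0
    for n in (n1, n2):
        if n:
            p = n / total
            h -= p * math.log2(p)
    return h


def predictability(env_counts):
    """env_counts maps an environment label -> (count of seg1, count of seg2).

    Returns per-environment entropies and their frequency-weighted average.
    """
    per_env = {e: binary_entropy(a, b) for e, (a, b) in env_counts.items()}
    grand = sum(a + b for a, b in env_counts.values())
    if grand == 0:
        return per_env, 0.0
    weighted = sum((a + b) / grand * per_env[e] for e, (a, b) in env_counts.items())
    return per_env, weighted


# Hypothetical t/d counts: fully predictable word-finally, contrastive between vowels
counts = {"_#": (5, 0), "V_V": (2, 2)}
per_env, overall = predictability(counts)
all_envs = binary_entropy(7, 2)  # ignoring environment altogether
```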
Symbol similarity
string_similarity.string_similarity(…)
This function computes similarity of pairs of words across a corpus.
edit_distance.edit_distance(word1, word2, …)
Returns the Levenshtein edit distance between the strings of two words, word1 and word2; code drawn from http://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance#Python.
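For reference, here is the standard two-row dynamic-programming formulation of Levenshtein distance. It is a sketch, not the wikibooks code the function cites, and it works on strings or on lists of segment symbols.

```python
def edit_distance(word1, word2):
    """Levenshtein distance: minimum number of insertions, deletions,
    and substitutions needed to turn word1 into word2."""
    prev = list(range(len(word2) + 1))  # distances from the empty prefix of word1
    for i, a in enumerate(word1, start=1):
        curr = [i]
        for j, b in enumerate(word2, start=1):
            curr.append(min(prev[j] + 1,             # delete a
                            curr[j - 1] + 1,         # insert b
                            prev[j - 1] + (a != b)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


d_str = edit_distance("kitten", "sitting")
d_seg = edit_distance(["k", "ae", "t"], ["b", "ae", "t"])
```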
khorsi.khorsi(word1, word2, freq_base, …)
Calculate the string similarity of two words given a set of characters and their frequencies in a corpus based on Khorsi (2012)
phono_edit_distance.phono_edit_distance(…)
Returns an analogue to Levenshtein edit distance that uses phonological features instead of characters
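A minimal sketch of a feature-based edit distance, assuming a toy feature matrix (`FEATURES` is made up for illustration), a substitution cost equal to the number of differing feature values, and an insertion or deletion cost equal to the number of features on the segment. PCT's actual cost scheme may differ, and the function names here are hypothetical.

```python
# Hypothetical toy feature matrix; a real one would come from a FeatureMatrix.
FEATURES = {
    "p": {"voice": "-", "labial": "+", "nasal": "-"},
    "b": {"voice": "+", "labial": "+", "nasal": "-"},
    "m": {"voice": "+", "labial": "+", "nasal": "+"},
    "t": {"voice": "-", "labial": "-", "nasal": "-"},
}


def sub_cost(a, b, features):
    """Number of features on which segments a and b disagree."""
    return sum(features[a][f] != features[b][f] for f in features[a])


def indel_cost(a, features):
    """Assumed cost of inserting or deleting a segment: its feature count."""
    return len(features[a])


def phono_edit_distance(w1, w2, features):
    """Levenshtein-style DP with feature-based substitution and indel costs."""
    prev = [0]
    for b in w2:
        prev.append(prev[-1] + indel_cost(b, features))
    for a in w1:
        curr = [prev[0] + indel_cost(a, features)]
        for j, b in enumerate(w2, start=1):
            curr.append(min(prev[j] + indel_cost(a, features),      # delete a
                            curr[j - 1] + indel_cost(b, features),  # insert b
                            prev[j - 1] + sub_cost(a, b, features)))
        prev = curr
    return prev[-1]


pb = phono_edit_distance(["p"], ["b"], FEATURES)  # differ only in voicing
pm = phono_edit_distance(["p"], ["m"], FEATURES)  # differ in voicing and nasality
```

Unlike plain Levenshtein distance, which would score both pairs as 1, this version treats /p/ to /m/ as a larger change than /p/ to /b/.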