API Reference
Lexicon classes
lexicon.Attribute (name, att_type[, ...]) |
Attributes are for collecting summary information about attributes of |
lexicon.Corpus (name[, update]) |
Lexicon to store information about Words, such as transcriptions, |
lexicon.Inventory ([update]) |
Inventories contain information about a Corpus’ segmental inventory. |
lexicon.FeatureMatrix (name, feature_entries) |
An object that stores feature values for segments |
lexicon.Segment (symbol[, features]) |
Class for segment symbols |
lexicon.Transcription (seg_list) |
Transcription object, sequence of symbols |
lexicon.Word ([update]) |
An object representing a word in a corpus |
lexicon.EnvironmentFilter (middle_segments[, ...]) |
Filter to use for searching words to generate Environments that match |
lexicon.Environment (middle, position[, lhs, rhs]) |
Specific sequence of segments that was a match for an EnvironmentFilter |
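The following standalone sketch illustrates what an environment search involves. It scans a transcription for a set of middle segments and checks their immediate neighbours. It does not use PCT's EnvironmentFilter or Environment classes. `Env` and `find_environments` are hypothetical names. The one-segment context window and the use of `None` at a word edge are simplifying assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Set


@dataclass(frozen=True)
class Env:
    """Hypothetical record of one match: the middle segment, its index,
    and its left/right neighbours (None at a word edge)."""
    middle: str
    position: int
    lhs: Optional[str]
    rhs: Optional[str]


def find_environments(transcription: Sequence[str],
                      middle_segments: Set[str],
                      lhs: Optional[Set[str]] = None,
                      rhs: Optional[Set[str]] = None) -> List[Env]:
    """Return every occurrence of a middle segment whose left neighbour is in
    `lhs` and whose right neighbour is in `rhs`; None means no constraint."""
    matches = []
    for i, seg in enumerate(transcription):
        if seg not in middle_segments:
            continue
        left = transcription[i - 1] if i > 0 else None
        right = transcription[i + 1] if i + 1 < len(transcription) else None
        if lhs is not None and left not in lhs:
            continue
        if rhs is not None and right not in rhs:
            continue
        matches.append(Env(seg, i, left, right))
    return matches


if __name__ == "__main__":
    # Find /a/ before /t/ in a toy transcription of "kata".
    print(find_environments(["k", "a", "t", "a"], {"a"}, rhs={"t"}))
```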
Speech corpus classes
spontaneous.Discourse (kwargs) |
Discourse objects are collections of linear text with word tokens |
spontaneous.Speaker (name, **kwargs) |
Speaker objects contain information about the producers of WordTokens |
spontaneous.SpontaneousSpeechCorpus (name, ...) |
SpontaneousSpeechCorpus objects are collections of Discourse objects and Corpus objects for frequency information. |
spontaneous.WordToken ([update]) |
WordToken objects are individual productions of Words |
Corpus context managers
contextmanagers.BaseCorpusContext (corpus, ...) |
Abstract Corpus context class that all other contexts inherit from. |
contextmanagers.CanonicalVariantContext (...) |
Corpus context that uses canonical forms for transcriptions and tiers |
contextmanagers.MostFrequentVariantContext (...) |
Corpus context that uses the most frequent pronunciation variants |
contextmanagers.SeparatedTokensVariantContext (...) |
Corpus context that treats pronunciation variants as separate types |
contextmanagers.WeightedVariantContext (...) |
Corpus context that weights frequency of pronunciation variants by the |
Corpus IO functions
Corpus binaries
binary.download_binary (name, path[, call_back]) |
Download a binary file of example corpora and feature matrices. |
binary.load_binary (path) |
Unpickle a binary file |
binary.save_binary (obj, path) |
Pickle a Corpus or FeatureMatrix object for later loading |
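save_binary and load_binary are described as pickling and unpickling. Their round trip can therefore be approximated with the standard library's `pickle` module. The sketch below uses a plain dict in place of a Corpus. `save_binary_sketch` and `load_binary_sketch` are hypothetical names, not PCT's functions. As with any pickle file, only load binaries from sources you trust, because unpickling can execute arbitrary code.

```python
import os
import pickle
import tempfile


def save_binary_sketch(obj, path):
    """Pickle `obj` to `path` (hypothetical stand-in for save_binary)."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load_binary_sketch(path):
    """Unpickle and return the object stored at `path`."""
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    # A dict of word -> frequency stands in for a Corpus object.
    toy = {"cat": 3, "dog": 5}
    path = os.path.join(tempfile.mkdtemp(), "toy.corpus")
    save_binary_sketch(toy, path)
    assert load_binary_sketch(path) == toy
```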
Loading from CSV
csv.load_corpus_csv (corpus_name, path, delimiter) |
Load a corpus from a column-delimited text file |
csv.load_feature_matrix_csv (name, path, ...) |
Load a FeatureMatrix from a column-delimited text file |
Export to CSV
csv.export_corpus_csv (corpus, path[, ...]) |
Save a corpus as a column-delimited text file |
csv.export_feature_matrix_csv (...[, delimiter]) |
Save a FeatureMatrix as a column-delimited text file |
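The CSV loaders and exporters above read and write column-delimited text files. The following standard-library sketch shows such a round trip, assuming one header line followed by one row per word. Several details are hypothetical and do not describe PCT's actual file layout: the column names `spelling`, `transcription` and `frequency`, the dot-separated transcription, and the helper names `export_rows` and `load_rows`.

```python
import csv
import io
from typing import Dict, List, TextIO


def export_rows(rows: List[Dict[str, str]], handle: TextIO, delimiter: str = ",") -> None:
    """Write a header line, then one delimited line per word record."""
    writer = csv.DictWriter(handle, fieldnames=list(rows[0]), delimiter=delimiter)
    writer.writeheader()
    writer.writerows(rows)


def load_rows(handle: TextIO, delimiter: str = ",") -> List[Dict[str, str]]:
    """Read the header line, then return one dict per following line.
    All values come back as strings; convert frequency with float() if needed."""
    return [dict(row) for row in csv.DictReader(handle, delimiter=delimiter)]


if __name__ == "__main__":
    rows = [
        {"spelling": "cat", "transcription": "k.æ.t", "frequency": "3"},
        {"spelling": "dog", "transcription": "d.ɒ.ɡ", "frequency": "5"},
    ]
    buf = io.StringIO()
    export_rows(rows, buf, delimiter="\t")
    buf.seek(0)
    assert load_rows(buf, delimiter="\t") == rows
```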
TextGrids
textgrid.inspect_discourse_textgrid |
|
textgrid.load_discourse_textgrid |
|
textgrid.load_directory_textgrid |
Running text
text_spelling.inspect_discourse_spelling (path) |
Generate a list of AnnotationTypes for a specified text file for parsing |
text_spelling.load_discourse_spelling (...[, ...]) |
Load a discourse from a text file containing running text of |
text_spelling.load_directory_spelling (...[, ...]) |
Loads a directory of orthographic texts |
text_spelling.export_discourse_spelling (...) |
Export an orthography discourse to a text file |
text_transcription.inspect_discourse_transcription (path) |
Generate a list of AnnotationTypes for a specified text file for parsing |
text_transcription.load_discourse_transcription (...) |
Load a discourse from a text file containing running transcribed text |
text_transcription.load_directory_transcription (...) |
Loads a directory of transcribed texts. |
text_transcription.export_discourse_transcription (...) |
Export a transcribed discourse to a text file |
Interlinear gloss text
text_ilg.inspect_discourse_ilg (path[, number]) |
Generate a list of AnnotationTypes for a specified text file for parsing |
text_ilg.load_discourse_ilg (corpus_name, ...) |
Load a discourse from a text file containing interlinear glosses |
text_ilg.load_directory_ilg (corpus_name, ...) |
Loads a directory of interlinear gloss text files |
text_ilg.export_discourse_ilg (discourse, path) |
Export a discourse to an interlinear gloss text file, with a maximal |
Other standards
multiple_files.inspect_discourse_multiple_files (...) |
Generate a list of AnnotationTypes for a specified dialect |
multiple_files.load_discourse_multiple_files (...) |
Load a discourse from a set of corpus standard files |
multiple_files.load_directory_multiple_files (...) |
Loads a directory of corpus standard files (separated into words files |
Analysis functions
Frequency of alternation
freq_of_alt.calc_freq_of_alt (corpus_context, ...) |
Returns a double that is a measure of the frequency of |
Functional load
functional_load.minpair_fl (corpus_context, ...) |
Calculate the functional load of the contrast between two segments as a count of minimal pairs. |
functional_load.deltah_fl (corpus_context, ...) |
Calculate the functional load of the contrast between two segments as the decrease in corpus entropy caused by a merger. |
functional_load.relative_minpair_fl (...[, ...]) |
Calculate the average functional load of the contrasts between a segment and all other segments, as a count of minimal pairs. |
functional_load.relative_deltah_fl (...[, ...]) |
Calculate the average functional load of the contrasts between a segment and all other segments, as the decrease in corpus entropy caused by a merger. |
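The two functional load measures can be illustrated on a toy lexicon. This sketch counts minimal pairs for a contrast. It also computes the entropy drop when one segment is merged into the other, using frequency-weighted Shannon entropy over word types. It illustrates the general idea only and is not PCT's implementation. No attempt is made to match PCT's frequency or normalisation settings. `minpair_count`, `entropy` and `deltah` are hypothetical names. The relative_* variants could be approximated by averaging these values over every other segment in the inventory.

```python
import math
from collections import defaultdict
from itertools import combinations
from typing import Dict, Tuple

Lexicon = Dict[Tuple[str, ...], float]  # transcription -> frequency


def minpair_count(lexicon: Lexicon, s1: str, s2: str) -> int:
    """Count word pairs of equal length that differ only by s1 vs s2 in one position."""
    count = 0
    for w1, w2 in combinations(lexicon, 2):
        if len(w1) != len(w2):
            continue
        diffs = [(a, b) for a, b in zip(w1, w2) if a != b]
        if len(diffs) == 1 and set(diffs[0]) == {s1, s2}:
            count += 1
    return count


def entropy(lexicon: Lexicon) -> float:
    """Shannon entropy (bits) of the frequency distribution over words."""
    total = sum(lexicon.values())
    return -sum((f / total) * math.log2(f / total) for f in lexicon.values() if f > 0)


def deltah(lexicon: Lexicon, s1: str, s2: str) -> float:
    """Entropy lost when every s2 is rewritten as s1 and homophones are pooled."""
    merged: Dict[Tuple[str, ...], float] = defaultdict(float)
    for word, freq in lexicon.items():
        merged[tuple(s1 if seg == s2 else seg for seg in word)] += freq
    return entropy(lexicon) - entropy(merged)


if __name__ == "__main__":
    lex = {("p", "a", "t"): 1.0, ("b", "a", "t"): 1.0, ("p", "i", "t"): 2.0}
    print(minpair_count(lex, "p", "b"), deltah(lex, "p", "b"))
```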
Kullback-Leibler divergence
kl.KullbackLeibler (corpus_context, seg1, ...) |
Calculates the Kullback-Leibler divergence between two segments in some context, either the left-hand or the right-hand side. |
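The following sketch shows the kind of comparison this involves. It collects the distribution of left-hand or right-hand neighbours for each of two segments, then computes KL(P || Q) in bits. Several choices here are assumptions, not necessarily what PCT does: the add-one smoothing (used so that no context has zero probability), the `#` word-boundary symbol, and the one-directional form of the divergence. `context_counts` and `kl_divergence` are hypothetical names.

```python
import math
from collections import Counter
from typing import Iterable, Sequence


def context_counts(words: Iterable[Sequence[str]], seg: str, side: str) -> Counter:
    """Count the segments (or '#' at a word edge) immediately to the left
    (side='lhs') or right (side='rhs') of each occurrence of `seg`."""
    counts: Counter = Counter()
    for word in words:
        padded = ["#", *word, "#"]
        for i in range(1, len(padded) - 1):
            if padded[i] == seg:
                counts[padded[i - 1] if side == "lhs" else padded[i + 1]] += 1
    return counts


def kl_divergence(words, seg1: str, seg2: str, side: str = "rhs") -> float:
    """KL(P || Q) in bits, where P and Q are the add-one-smoothed context
    distributions of seg1 and seg2 over the contexts either one occurs in."""
    words = list(words)  # allow a one-shot iterator to be scanned twice
    c1 = context_counts(words, seg1, side)
    c2 = context_counts(words, seg2, side)
    contexts = set(c1) | set(c2)
    n1 = sum(c1.values()) + len(contexts)
    n2 = sum(c2.values()) + len(contexts)
    total = 0.0
    for ctx in contexts:
        p = (c1[ctx] + 1) / n1
        q = (c2[ctx] + 1) / n2
        total += p * math.log2(p / q)
    return total


if __name__ == "__main__":
    toy = [("s", "a"), ("ʃ", "i")]
    print(kl_divergence(toy, "s", "ʃ", side="rhs"))
```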
Mutual information
mutual_information.pointwise_mi (...[, ...]) |
Calculate the pointwise mutual information for a bigram. |
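For a bigram xy, pointwise mutual information is log2(p(xy) / (p(x) p(y))). The sketch below estimates these probabilities from segment unigrams and within-word adjacent bigrams in a toy word list. It ignores word boundaries and frequencies, which is an assumption and not necessarily how pointwise_mi counts. The function name is hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, Sequence, Tuple


def pointwise_mi(words: Iterable[Sequence[str]], bigram: Tuple[str, str]) -> float:
    """log2 p(xy) / (p(x) p(y)), with p(xy) over adjacent within-word pairs
    and p(x), p(y) over all segment tokens."""
    unigrams: Counter = Counter()
    bigrams: Counter = Counter()
    for word in words:
        unigrams.update(word)
        bigrams.update(zip(word, word[1:]))
    if bigrams[bigram] == 0:
        raise ValueError(f"bigram {bigram!r} does not occur in the corpus")
    p_xy = bigrams[bigram] / sum(bigrams.values())
    n = sum(unigrams.values())
    p_x = unigrams[bigram[0]] / n
    p_y = unigrams[bigram[1]] / n
    return math.log2(p_xy / (p_x * p_y))


if __name__ == "__main__":
    print(pointwise_mi(["ab", "ba"], ("a", "b")))
```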
Neighborhood density
neighborhood_density.neighborhood_density (...) |
Calculate the neighborhood density of a particular word in the corpus. |
neighborhood_density.find_mutation_minpairs (...) |
Find all minimal pairs of the query word based only on segment |
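The following sketch shows a common definition of neighbourhood density: the number of other words within Levenshtein distance 1 of the query. Alongside it is a substitution-only neighbour search. The substitution-only search is our reading of "mutation" in find_mutation_minpairs, and may not match PCT exactly. All function names are hypothetical. Strings stand in for transcriptions.

```python
from typing import List, Sequence


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Classic dynamic-programming edit distance with unit costs."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,             # deletion
                           cur[j - 1] + 1,          # insertion
                           prev[j - 1] + (x != y))) # substitution or match
        prev = cur
    return prev[-1]


def neighborhood_density(query: str, lexicon: List[str], max_distance: int = 1) -> int:
    """Number of other words within `max_distance` edits of `query`."""
    return sum(1 for w in lexicon if w != query and levenshtein(query, w) <= max_distance)


def mutation_neighbours(query: str, lexicon: List[str]) -> List[str]:
    """Words of the same length that differ from `query` in exactly one segment."""
    return [w for w in lexicon
            if len(w) == len(query) and sum(a != b for a, b in zip(query, w)) == 1]


if __name__ == "__main__":
    lex = ["kat", "bat", "kit", "at", "kats", "dog"]
    print(neighborhood_density("kat", lex), mutation_neighbours("kat", lex))
```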
Phonotactic probability
phonotactic_probability.phonotactic_probability_vitevitch (...) |
Calculate the phonotactic probability of a particular word using |
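The following sketch illustrates positional segment probability, one ingredient of phonotactic probability. The probability of segment s at position i is the number of words with s at i divided by the number of words that have a position i. A word's score is the sum of these probabilities. This is a simplified, type-count variant with biphone probabilities omitted. It is not PCT's phonotactic_probability_vitevitch, and the function names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def positional_counts(lexicon: List[Sequence[str]]) -> Tuple[Counter, Counter]:
    """Return (position, segment) counts and per-position word counts."""
    seg_counts: Counter = Counter()  # (position, segment) -> number of words
    pos_counts: Counter = Counter()  # position -> number of words that long
    for word in lexicon:
        for i, seg in enumerate(word):
            seg_counts[(i, seg)] += 1
            pos_counts[i] += 1
    return seg_counts, pos_counts


def phonotactic_probability(word: Sequence[str], lexicon: List[Sequence[str]]) -> float:
    """Sum of positional segment probabilities; positions beyond the longest
    word in the lexicon contribute nothing."""
    seg_counts, pos_counts = positional_counts(lexicon)
    total = 0.0
    for i, seg in enumerate(word):
        if pos_counts[i]:
            total += seg_counts[(i, seg)] / pos_counts[i]
    return total


if __name__ == "__main__":
    print(phonotactic_probability("kat", ["kat", "kit", "bat"]))
```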
Predictability of distribution
pred_of_dist.calc_prod_all_envs (...[, ...]) |
Main function for calculating predictability of distribution for two segments over a corpus, regardless of environment. |
pred_of_dist.calc_prod (corpus_context, envs) |
Main function for calculating predictability of distribution for two segments over specified environments in a corpus. |
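Predictability of distribution is commonly expressed as entropy. Within one environment, H = -sum p log2 p over the two segments' counts. Across environments, the per-environment entropies are averaged, weighted by how often each environment occurs. In the sketch below, 0 means complementary distribution and 1 means the two segments are fully unpredictable. The input format (environment label mapped to a pair of counts) and the function names are hypothetical. Calling `binary_entropy` on the summed counts gives an environment-independent figure, in the spirit of calc_prod_all_envs.

```python
import math
from typing import Dict, Tuple


def binary_entropy(n1: int, n2: int) -> float:
    """Entropy (bits) of choosing between two segments seen n1 and n2 times."""
    total = n1 + n2
    h = 0.0
    for n in (n1, n2):
        if n:
            p = n / total
            h -= p * math.log2(p)
    return h


def weighted_entropy(env_counts: Dict[str, Tuple[int, int]]) -> float:
    """Average per-environment entropy, weighted by environment frequency."""
    grand = sum(n1 + n2 for n1, n2 in env_counts.values())
    if grand == 0:
        return 0.0
    return sum((n1 + n2) / grand * binary_entropy(n1, n2)
               for n1, n2 in env_counts.values())


if __name__ == "__main__":
    # Counts of (segment 1, segment 2) in two hypothetical environments.
    print(weighted_entropy({"_i": (4, 0), "_a": (2, 2)}))
```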
Symbol similarity
string_similarity.string_similarity (...) |
This function computes similarity of pairs of words across a corpus. |
edit_distance.edit_distance (word1, word2, ...) |
Returns the Levenshtein edit distance between the strings of two words, word1 and word2. Code drawn from http://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance#Python. |
khorsi.khorsi (word1, word2, freq_base, ...) |
Calculate the string similarity of two words given a set of |
phono_edit_distance.phono_edit_distance (...) |
Returns an analogue to Levenshtein edit distance but uses |
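The following sketch shows one way to build a feature-based analogue of Levenshtein distance. It replaces the unit substitution cost with the number of features on which two segments differ. It assumes, for illustration only, that phono_edit_distance works along these lines. Several pieces are hypothetical: the toy feature table, the fixed insertion/deletion cost, and the function names. None of them are taken from PCT's feature matrices.

```python
from typing import Dict, Sequence

Features = Dict[str, Dict[str, str]]  # hypothetical: segment -> {feature: value}

FEATURES: Features = {
    "p": {"voice": "-", "labial": "+", "continuant": "-"},
    "b": {"voice": "+", "labial": "+", "continuant": "-"},
    "t": {"voice": "-", "labial": "-", "continuant": "-"},
    "a": {"voice": "+", "labial": "-", "continuant": "+"},
}


def substitution_cost(a: str, b: str, features: Features) -> int:
    """Number of features whose values differ between segments a and b."""
    fa, fb = features[a], features[b]
    return sum(fa[f] != fb[f] for f in fa)


def phono_edit_distance(w1: Sequence[str], w2: Sequence[str],
                        features: Features, indel_cost: int) -> int:
    """Levenshtein-style DP with feature-weighted substitutions and a
    fixed cost for each insertion or deletion."""
    prev = [j * indel_cost for j in range(len(w2) + 1)]
    for i, x in enumerate(w1, 1):
        cur = [i * indel_cost]
        for j, y in enumerate(w2, 1):
            cur.append(min(prev[j] + indel_cost,
                           cur[j - 1] + indel_cost,
                           prev[j - 1] + substitution_cost(x, y, features)))
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    print(phono_edit_distance("pat", "bat", FEATURES, indel_cost=3))
```

Setting the insertion/deletion cost to the number of features makes deleting a segment as expensive as the most dissimilar substitution. That is one reasonable choice, not the only one.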