Corpus
- class corpustools.corpus.classes.lexicon.Corpus(name, update=False)
Lexicon to store information about Words, such as transcriptions, spellings and frequencies.
- Parameters
- name : string
Name to identify Corpus
- Attributes
- name : str
Name of the corpus, used only for ease of reference
- attributes : list of Attributes
List of Attributes that Words in the Corpus have
- wordlist : dict
Dictionary where every key is a unique string representing a word in a corpus, and each entry is a Word object
- words : list of strings
All the keys for the wordlist of the Corpus
- specifier : FeatureSpecifier
See the FeatureSpecifier object
- inventory : Inventory
Inventory that contains information about segments in the Corpus
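The relationship between `wordlist` and `words` described above can be illustrated with a small stand-in. This is only a sketch: `ToyWord` and `ToyCorpus` are hypothetical classes written for this example, not part of corpustools, and keying each word by its spelling is an assumption rather than the library's documented scheme.

```python
from dataclasses import dataclass, field


@dataclass
class ToyWord:
    """Hypothetical stand-in for a Word: spelling, transcription, frequency."""
    spelling: str
    transcription: list
    frequency: float = 0.0


@dataclass
class ToyCorpus:
    """Hypothetical stand-in mirroring only the attribute layout listed above."""
    name: str
    # Maps a unique string key to a ToyWord, like the documented `wordlist`.
    wordlist: dict = field(default_factory=dict)

    @property
    def words(self):
        # All the keys of the wordlist, as a list of strings.
        return list(self.wordlist)

    def add_word(self, word):
        # Assumption: the spelling serves as the unique key.
        self.wordlist[word.spelling] = word


toy = ToyCorpus("example")
toy.add_word(ToyWord("cat", ["k", "æ", "t"], 12.0))
toy.add_word(ToyWord("at", ["æ", "t"], 30.0))
print(toy.words)
```

In this toy model, `words` is derived from `wordlist` on every access, so the two cannot drift apart; whether the real class stores or derives the list is not stated here.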
Methods
- __init__(name[, update])
- add_abstract_tier(attribute, spec)
Add an abstract tier (currently primarily for generating CV skeletons from tiers).
- add_attribute(attribute[, initialize_defaults])
Add an Attribute of any type to the Corpus or replace an existing Attribute.
- add_count_attribute(attribute, ...)
Add a Numeric Attribute that is a count of the segments in a tier that match a given specification.
- add_tier(attribute, spec)
Add a Tier Attribute based on the transcription of words as a new Attribute that includes all segments that match the specification.
- add_word(word[, allow_duplicates])
Add a word to the Corpus.
- check_coverage()
Checks the coverage of the specifier (FeatureMatrix) of the Corpus over the inventory of the Corpus.
- features_to_segments(feature_description)
Given a feature description, return the segments in the inventory that match that feature description.
- find(word[, ignore_case])
Search for a Word in the corpus.
- find_all(spelling)
Find all Word objects with the specified spelling.
- generate_alternative_inventories()
- get_features()
Get a list of the features used to describe Segments.
- get_or_create_word(**kwargs)
Get a Word object that has the spelling and transcription specified or create that Word, add it to the Corpus and return it.
- get_random_subset(size[, new_corpus_name])
Get a new corpus consisting of a random selection from the current corpus.
- initDefaults()
- iter_sort()
Sorts the keys in the corpus dictionary, then yields the values in that order.
- iter_words()
Sorts the keys in the corpus dictionary, then yields the values in that order.
- key(word)
- keys()
- random_word()
Return a randomly selected Word.
- remove_attribute(attribute)
Remove an Attribute from the Corpus and from all its Word objects.
- remove_word(word_key)
Remove a Word from the Corpus using its identifier in the Corpus.
- retranscribe(segmap)
- segment_to_features(seg)
Given a segment, return the features for that segment.
- set_default_representations()
- set_feature_matrix(matrix)
Set the feature system to be used by the corpus and make sure every word is using it too.
- subset(filters, mode)
Generate a subset of the corpus based on filters.
- symbol_to_segment(symbol)
- update(old_corpus)
- update_features()
- update_inventory(transcription)
Update the inventory of the Corpus to ensure it contains all the segments in the given transcription.
- update_wordlist(new_wordlist)
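The behaviour described for iter_sort (sort the keys, then yield the values in that order) and for get_random_subset (a random selection from the current corpus) can be sketched over a plain dictionary. The functions below are hypothetical illustrations that operate on a `dict`, not the library's implementation; the `seed` parameter in particular is an assumption added to make the example repeatable.

```python
import random


def iter_sort(wordlist):
    """Yield the values of `wordlist` in sorted-key order."""
    for key in sorted(wordlist):
        yield wordlist[key]


def random_subset(wordlist, size, seed=None):
    """Return a new dict holding `size` randomly chosen entries.

    Hypothetical helper: `seed` is not part of the documented signature.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(wordlist), size)
    return {key: wordlist[key] for key in chosen}


# Plain strings stand in for Word objects in this example.
example = {"dog": "DOG", "cat": "CAT", "at": "AT"}
ordered = list(iter_sort(example))
picked = random_subset(example, 2, seed=0)
```

Sorting the keys before sampling keeps the selection independent of dictionary insertion order, which makes a seeded run reproducible.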