Corpus
- class corpustools.corpus.classes.lexicon.Corpus(name, update=False)
Lexicon to store information about Words, such as transcriptions, spellings and frequencies
- Parameters
- name : string
Name to identify Corpus
- Attributes
- name : str
Name of the corpus, used only for ease of reference
- attributes : list of Attributes
List of Attributes that Words in the Corpus have
- wordlist : dict
Dictionary where every key is a unique string representing a word in a corpus, and each entry is a Word object
- words : list of strings
All the keys for the wordlist of the Corpus
- specifier : FeatureSpecifier
See the FeatureSpecifier object
- inventory : Inventory
Inventory that contains information about segments in the Corpus
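The relationship between the wordlist and words attributes can be illustrated with a small stand-in. This is a minimal sketch, not the corpustools implementation: MiniWord and MiniCorpus are hypothetical classes written for this example, and using the spelling as the dictionary key is an assumption.

```python
from dataclasses import dataclass


@dataclass
class MiniWord:
    """Hypothetical stand-in for a Word (not the corpustools class)."""
    spelling: str
    transcription: list
    frequency: float = 0.0


class MiniCorpus:
    """Hypothetical stand-in that mirrors the documented attributes."""

    def __init__(self, name):
        self.name = name
        # Maps a unique string key to a word object.
        # Assumption: the key is simply the spelling.
        self.wordlist = {}

    @property
    def words(self):
        # All the keys of the wordlist, as a list of strings.
        return list(self.wordlist.keys())

    def add_word(self, word):
        self.wordlist[word.spelling] = word


corpus = MiniCorpus("example")
corpus.add_word(MiniWord("cat", ["k", "æ", "t"], 12.0))
corpus.add_word(MiniWord("dog", ["d", "ɔ", "g"], 7.0))

print(corpus.words)
print(corpus.wordlist["cat"].frequency)
```

In this sketch, words is derived from wordlist on every access, so the two can never disagree.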
Methods
- __init__(name[, update])
- add_abstract_tier(attribute, spec)
Add an abstract tier (currently primarily for generating CV skeletons from tiers).
- add_attribute(attribute[, initialize_defaults])
Add an Attribute of any type to the Corpus or replace an existing Attribute.
- add_count_attribute(attribute, ...)
Add a Numeric Attribute that is a count of the segments in a tier that match a given specification.
- add_tier(attribute, spec)
Add a Tier Attribute based on the transcription of words as a new Attribute that includes all segments that match the specification.
- add_word(word[, allow_duplicates])
Add a word to the Corpus.
- check_coverage()
Check the coverage of the specifier (FeatureMatrix) of the Corpus over the inventory of the Corpus.
- features_to_segments(feature_description)
Given a feature description, return the segments in the inventory that match that feature description.
- find(word[, ignore_case])
Search for a Word in the corpus.
- find_all(spelling)
Find all Word objects with the specified spelling.
- generate_alternative_inventories()
- get_features()
Get a list of the features used to describe Segments.
- get_or_create_word(**kwargs)
Get a Word object that has the spelling and transcription specified or create that Word, add it to the Corpus and return it.
- get_random_subset(size[, new_corpus_name])
Get a new corpus consisting of a random selection from the current corpus.
- initDefaults()
- iter_sort()
Sorts the keys in the corpus dictionary, then yields the values in that order.
- iter_words()
Sorts the keys in the corpus dictionary, then yields the values in that order.
- key(word)
- keys()
- random_word()
Return a randomly selected Word.
- remove_attribute(attribute)
Remove an Attribute from the Corpus and from all its Word objects.
- remove_word(word_key)
Remove a Word from the Corpus using its identifier in the Corpus.
- retranscribe(segmap)
- segment_to_features(seg)
Given a segment, return the features for that segment.
- set_default_representations()
- set_feature_matrix(matrix)
Set the feature system to be used by the corpus and make sure every word is using it too.
- subset(filters, mode)
Generate a subset of the corpus based on filters.
- symbol_to_segment(symbol)
- update(old_corpus)
- update_features()
- update_inventory(transcription)
Update the inventory of the Corpus to ensure it contains all the segments in the given transcription.
- update_wordlist(new_wordlist)