Corpus

class corpustools.corpus.classes.lexicon.Corpus(name, update=False)

Lexicon that stores information about Words, such as transcriptions, spellings, and frequencies.

Parameters

name (string): Name to identify Corpus

Attributes

name (str): Name of the corpus, used only for ease of reference
attributes (list of Attributes): List of Attributes that Words in the Corpus have
wordlist (dict): Dictionary where every key is a unique string representing a word in a corpus, and each entry is a Word object
words (list of strings): All the keys for the wordlist of the Corpus
specifier (FeatureSpecifier): See the FeatureSpecifier object
inventory (Inventory): Inventory that contains information about segments in the Corpus
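
The relationship between `wordlist` and `words` described above can be pictured with a small stand-in. This is a minimal sketch under stated assumptions, not the corpustools implementation: `ToyWord` and `ToyCorpus` are hypothetical names, and only the layout (a dict keyed by unique strings, with `words` listing those keys) is taken from the attribute descriptions.

```python
from dataclasses import dataclass, field


@dataclass
class ToyWord:
    """Hypothetical stand-in for a Word: spelling, transcription, frequency."""
    spelling: str
    transcription: list
    frequency: float = 0.0


@dataclass
class ToyCorpus:
    """Hypothetical stand-in for Corpus; mirrors only the documented attributes."""
    name: str
    wordlist: dict = field(default_factory=dict)

    @property
    def words(self):
        # All the keys of the wordlist, as the `words` attribute is described.
        return list(self.wordlist.keys())


toy = ToyCorpus("example")
# Each key is a unique string; each entry is a word object.
toy.wordlist["cat"] = ToyWord("cat", ["k", "ae", "t"], 12.0)
toy.wordlist["dog"] = ToyWord("dog", ["d", "o", "g"], 7.0)
print(toy.words)
```

The real class may compute keys differently (see `key`(word) in the method list), so the spelling-as-key choice here is only an illustration.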

Methods

`__init__`(name[, update])
`add_abstract_tier`(attribute, spec)	Add an abstract tier (currently primarily for generating CV skeletons from tiers).
`add_attribute`(attribute[, initialize_defaults])	Add an Attribute of any type to the Corpus or replace an existing Attribute.
`add_count_attribute`(attribute, ...)	Add a Numeric Attribute that is a count of the segments in a tier that match a given specification.
`add_tier`(attribute, spec)	Add a Tier Attribute based on the transcription of words as a new Attribute that includes all segments that match the specification.
`add_word`(word[, allow_duplicates])	Add a word to the Corpus.
`check_coverage`()	Checks the coverage of the specifier (FeatureMatrix) of the Corpus over the inventory of the Corpus
`features_to_segments`(feature_description)	Given a feature description, return the segments in the inventory that match that feature description
`find`(word[, ignore_case])	Search for a Word in the corpus
`find_all`(spelling)	Find all Word objects with the specified spelling
`generate_alternative_inventories`()
`get_features`()	Get a list of the features used to describe Segments
`get_or_create_word`(**kwargs)	Get a Word object that has the spelling and transcription specified or create that Word, add it to the Corpus and return it.
`get_random_subset`(size[, new_corpus_name])	Get a new corpus consisting of a random selection from the current corpus
`initDefaults`()
`iter_sort`()	Sorts the keys in the corpus dictionary, then yields the values in that order
`iter_words`()	Sorts the keys in the corpus dictionary, then yields the values in that order
`key`(word)
`keys`()
`random_word`()	Return a randomly selected Word
`remove_attribute`(attribute)	Remove an Attribute from the Corpus and from all its Word objects.
`remove_word`(word_key)	Remove a Word from the Corpus using its identifier in the Corpus.
`retranscribe`(segmap)
`segment_to_features`(seg)	Given a segment, return the features for that segment.
`set_default_representations`()
`set_feature_matrix`(matrix)	Set the feature system to be used by the corpus and make sure every word is using it too.
`subset`(filters, mode)	Generate a subset of the corpus based on filters.
`symbol_to_segment`(symbol)
`update`(old_corpus)
`update_features`()
`update_inventory`(transcription)	Update the inventory of the Corpus to ensure it contains all the segments in the given transcription
`update_wordlist`(new_wordlist)
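
The summaries of `iter_sort` and `iter_words` above both describe sorting the dictionary keys and then yielding the values in that order. The pattern can be sketched as a plain generator. This is an illustration of the described behavior only: `iter_sorted_values` is a hypothetical helper, and the string values stand in for Word objects.

```python
def iter_sorted_values(wordlist):
    """Yield the values of a dict in sorted-key order (hypothetical helper)."""
    for key in sorted(wordlist):
        yield wordlist[key]


# Placeholder strings stand in for Word objects in this sketch.
toy_wordlist = {"dog": "Word(dog)", "cat": "Word(cat)", "ant": "Word(ant)"}
ordered = list(iter_sorted_values(toy_wordlist))
print(ordered)
```

Because the values are yielded lazily, a caller can stop iterating early without building the full ordered list.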