vocabs
Concrete Vocab classes.
-
class deeppavlov.vocabs.wiki_sqlite.WikiSQLiteVocab(load_path: str, join_docs: bool = True, shuffle: bool = False, **kwargs)
Get content from an SQLite database by document ids.
Parameters: - load_path – path to the local DB file
- join_docs – whether to join the extracted docs with a single space (' ')
- shuffle – whether to shuffle the data
Attributes: - join_docs – whether to join the extracted docs with a single space (' ')
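As an illustration of id-based lookup with optional joining, here is a minimal sketch built on Python's standard `sqlite3` module. It is not the DeepPavlov implementation: the `documents(id, text)` table layout and the `fetch_docs` helper are assumptions made for this example only.

```python
import sqlite3
from typing import List, Union


def fetch_docs(conn: sqlite3.Connection, doc_ids: List[str],
               join_docs: bool = True) -> Union[str, List[str]]:
    """Fetch texts by id from a hypothetical `documents(id, text)` table."""
    texts = []
    for doc_id in doc_ids:
        row = conn.execute(
            "SELECT text FROM documents WHERE id = ?", (doc_id,)
        ).fetchone()
        if row is not None:  # silently skip ids that are not in the table
            texts.append(row[0])
    # join_docs=True mirrors the documented behaviour of joining with ' '
    return " ".join(texts) if join_docs else texts


# Build a tiny in-memory database to exercise the helper.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id TEXT PRIMARY KEY, text TEXT)")
conn.executemany(
    "INSERT INTO documents VALUES (?, ?)",
    [("a", "first doc"), ("b", "second doc")],
)

joined = fetch_docs(conn, ["a", "b"])                      # one string
separate = fetch_docs(conn, ["a", "b"], join_docs=False)   # list of strings
```

With `join_docs=True` the caller receives one string; with `join_docs=False` the texts stay separate, which is useful when each document has to be processed on its own.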
-
class deeppavlov.vocabs.typos.RussianWordsVocab(data_dir: [pathlib.Path, str] = '', *args, **kwargs)
Implementation of StaticDictionary that builds data from https://github.com/danakt/russian-words/
Parameters: - data_dir – path to the directory where the built trie will be stored. Relative paths are interpreted as relative to the pipeline’s data directory
Attributes: - dict_name – logical name of the dictionary
- alphabet – set of all the characters used in this dictionary
- words_set – set of all the words
- words_trie – trie structure of all the words
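The `data_dir` rule (relative paths resolve against the pipeline's data directory) can be sketched with `pathlib`. The `resolve_data_dir` helper and the `/opt/pipeline/data` root are hypothetical; how the library actually locates its data directory is not shown on this page.

```python
from pathlib import PurePosixPath
from typing import Union


def resolve_data_dir(data_dir: Union[PurePosixPath, str],
                     pipeline_data_root: PurePosixPath) -> PurePosixPath:
    """Return data_dir unchanged if absolute, else place it under the root."""
    path = PurePosixPath(data_dir)
    return path if path.is_absolute() else pipeline_data_root / path


# Hypothetical pipeline data directory, used only for this illustration.
root = PurePosixPath("/opt/pipeline/data")

relative = resolve_data_dir("vocabs/russian_words", root)
absolute = resolve_data_dir("/srv/tries", root)
```

`PurePosixPath` is used so the sketch behaves the same on every platform; real code that touches the filesystem would use `pathlib.Path` instead.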
-
class deeppavlov.vocabs.typos.StaticDictionary(data_dir: [pathlib.Path, str] = '', *args, dictionary_name: str = 'dictionary', **kwargs)
Trie vocabulary used in spelling correction algorithms.
Parameters: - data_dir – path to the directory where the built trie will be stored. Relative paths are interpreted as relative to the pipeline’s data directory
- dictionary_name – logical name of the dictionary
- raw_dictionary_path – path to the source file with the list of words
Attributes: - dict_name – logical name of the dictionary
- alphabet – set of all the characters used in this dictionary
- words_set – set of all the words
- words_trie – trie structure of all the words
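To make the three data attributes concrete, the sketch below derives an alphabet, a word set, and a trie from a plain list of words. The nested-dict trie, the `END` marker, and the `build_dictionary` and `has_prefix` helpers are assumptions for illustration; the internal layout of `words_trie` in the library may differ.

```python
from typing import Dict, Iterable, Set, Tuple

END = "$"  # hypothetical end-of-word marker; assumes "$" never occurs in words


def build_dictionary(words: Iterable[str]) -> Tuple[Set[str], Set[str], Dict]:
    """Build (alphabet, words_set, words_trie) from raw words."""
    words_set: Set[str] = set()
    for raw in words:
        word = raw.strip().lower()
        if word:
            words_set.add(word)

    # Every character that appears in at least one word.
    alphabet = {ch for word in words_set for ch in word}

    # Nested-dict trie: each key is a character, each value a child node.
    words_trie: Dict = {}
    for word in words_set:
        node = words_trie
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True  # mark that a complete word ends at this node
    return alphabet, words_set, words_trie


def has_prefix(trie: Dict, prefix: str) -> bool:
    """Check whether any stored word starts with the given prefix."""
    node = trie
    for ch in prefix:
        if ch not in node:
            return False
        node = node[ch]
    return True


alphabet, words_set, words_trie = build_dictionary(["cat", "car", "dog"])
```

A trie suits spelling correction because candidate words can be explored one character at a time, and a branch is abandoned as soon as its prefix is not in the dictionary.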
-
class deeppavlov.vocabs.typos.Wiki100KDictionary(data_dir: [pathlib.Path, str] = '', *args, **kwargs)
Implementation of StaticDictionary that builds data from Wiktionary.
Parameters: - data_dir – path to the directory where the built trie will be stored. Relative paths are interpreted as relative to the pipeline’s data directory
Attributes: - dict_name – logical name of the dictionary
- alphabet – set of all the characters used in this dictionary
- words_set – set of all the words
- words_trie – trie structure of all the words
-