deeppavlov.models.spelling_correction¶
- class deeppavlov.models.spelling_correction.brillmoore.ErrorModel(dictionary: StaticDictionary, window: int = 1, candidates_count: int = 1, *args, **kwargs)[source]¶
Component that uses a statistics-based error model to find the best candidates in a static dictionary. Based on "An Improved Error Model for Noisy Channel Spelling Correction" by Eric Brill and Robert C. Moore.
- Parameters
dictionary – a StaticDictionary object
window – maximum context window size
candidates_count – maximum number of replacement candidates to return for every token in the input
- costs¶
logarithmic probabilities of character sequence replacements
- dictionary¶
a StaticDictionary object
- window¶
maximum context window size
- candidates_count¶
maximum number of replacement candidates to return for every token in the input
- __call__(data: Iterable[Iterable[str]], *args, **kwargs) → List[List[List[Tuple[float, str]]]] [source]¶
Propose candidates for tokens in sentences
- Parameters
data – batch of tokenized sentences
- Returns
for every sentence in the batch, a list that holds, for every token, a list of (probability, candidate) pairs
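The noisy-channel idea behind this component can be sketched without the library. The block below is a minimal illustration, not DeepPavlov's implementation: `best_log_prob`, `propose`, the `max_len` limit, the toy `costs` table and the rule that an unchanged single character costs 0.0 are all assumptions made for the example. It scores how likely a typed token is given an intended dictionary word by splitting both strings into short substrings and summing the logarithmic probabilities of the substring replacements.

```python
import math
from typing import Dict, Iterable, List, Tuple


def best_log_prob(intended: str, typed: str,
                  costs: Dict[Tuple[str, str], float],
                  max_len: int = 2) -> float:
    """Best log-probability of turning `intended` into `typed` (hypothetical helper)."""
    n, m = len(intended), len(typed)
    neg_inf = float("-inf")
    # d[i][j]: best score for rewriting intended[:i] as typed[:j]
    d = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if d[i][j] == neg_inf:
                continue
            for a in range(max_len + 1):
                for b in range(max_len + 1):
                    if (a == 0 and b == 0) or i + a > n or j + b > m:
                        continue
                    src, dst = intended[i:i + a], typed[j:j + b]
                    if (src, dst) in costs:
                        step = costs[(src, dst)]
                    elif a == 1 and src == dst:
                        step = 0.0  # toy assumption: keeping a character is free
                    else:
                        continue  # replacement unknown to the toy table
                    d[i + a][j + b] = max(d[i + a][j + b], d[i][j] + step)
    return d[n][m]


def propose(token: str, words: Iterable[str],
            costs: Dict[Tuple[str, str], float],
            candidates_count: int = 1) -> List[Tuple[float, str]]:
    """Return up to `candidates_count` (log-probability, word) pairs, best first."""
    scored = [(best_log_prob(word, token, costs), word) for word in words]
    scored = [pair for pair in scored if pair[0] > float("-inf")]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:candidates_count]


# Toy replacement table: (intended substring, typed substring) -> log10 probability
toy_costs = {("ph", "f"): math.log10(0.05), ("a", "e"): math.log10(0.01)}
candidates = propose("fone", ["phone", "bone", "fine"], toy_costs, candidates_count=2)
```

In this sketch only "phone" survives for the token "fone", because the toy table has no replacement that would explain "bone" or "fine".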
- class deeppavlov.models.spelling_correction.levenshtein.LevenshteinSearcherComponent(words: Iterable[str], max_distance: int = 1, error_probability: float = 0.0001, vocab_penalty: Optional[float] = None, **kwargs)[source]¶
Component that finds replacement candidates for tokens within a set Damerau-Levenshtein distance
- Parameters
words – list of every correct word
max_distance – maximum allowed Damerau-Levenshtein distance between source words and candidates
error_probability – assigned probability for every edit
vocab_penalty – assigned probability of an out-of-vocabulary token being the correct one without changes
- max_distance¶
maximum allowed Damerau-Levenshtein distance between source words and candidates
- error_probability¶
assigned logarithmic probability for every edit
- vocab_penalty¶
assigned logarithmic probability of an out-of-vocabulary token being the correct one without changes
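To make the parameters above concrete, here is a minimal sketch of distance-based candidate search. It is not the library's searcher: `osa_distance` and `find_candidates` are hypothetical names, the distance is the restricted (optimal string alignment) variant of Damerau-Levenshtein, and scoring a candidate as `distance * log10(error_probability)` is an assumption made for illustration, as is the use of base-10 logarithms.

```python
import math
from typing import Iterable, List, Optional, Tuple


def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein distance (adjacent transpositions cost 1)."""
    prev2: Optional[List[int]] = None
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        cur = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1,          # deletion
                         cur[j - 1] + 1,       # insertion
                         prev[j - 1] + cost)   # substitution or match
            if (i > 1 and j > 1 and prev2 is not None
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                cur[j] = min(cur[j], prev2[j - 2] + 1)  # transposition
        prev2, prev = prev, cur
    return prev[len(b)]


def find_candidates(token: str, words: Iterable[str], max_distance: int = 1,
                    error_probability: float = 1e-4,
                    vocab_penalty: float = 1e-4) -> List[Tuple[float, str]]:
    """Return (log-probability, candidate) pairs within `max_distance` of `token`."""
    log_error = math.log10(error_probability)
    vocabulary = set(words)
    result = []
    for word in sorted(vocabulary):
        distance = osa_distance(token, word)
        if distance <= max_distance:
            # assumption: every edit multiplies the probability by error_probability
            result.append((distance * log_error, word))
    if token not in vocabulary:
        # assumption: an unknown token may still be kept unchanged, at a penalty
        result.append((math.log10(vocab_penalty), token))
    return result


found = find_candidates("teh", ["the", "ten", "tea", "dog"])
```

Under these assumptions "the", "ten" and "tea" are each one edit away from "teh", so all three are returned with the same score, together with the unchanged token; "dog" is three edits away and is dropped.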
- class deeppavlov.models.spelling_correction.electors.top1_elector.TopOneElector(*args, **kwargs)[source]¶
Component that chooses the candidate with the highest base probability for every token
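The selection rule can be sketched in a few lines. This is an illustration only: `elect_top_one` is a hypothetical stand-in, and the input is assumed to have the nested (probability, candidate) shape that the candidate-producing components in this module return.

```python
from typing import List, Tuple

Candidate = Tuple[float, str]


def elect_top_one(batch: List[List[List[Candidate]]]) -> List[List[str]]:
    """For every token, keep the candidate whose base score is highest."""
    return [
        [max(candidates, key=lambda pair: pair[0])[1] for candidates in sentence]
        for sentence in batch
    ]


toy_batch = [[
    [(-0.1, "hello"), (-3.0, "hallo")],
    [(-2.0, "word"), (-0.5, "world")],
]]
chosen = elect_top_one(toy_batch)
```

Because each token is decided independently, this rule ignores the surrounding words.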
- class deeppavlov.models.spelling_correction.electors.kenlm_elector.KenlmElector(load_path: Path, beam_size: int = 4, *args, **kwargs)[source]¶
Component that chooses a candidate with the highest product of base and language model probabilities
- Parameters
load_path – path to the kenlm model file
beam_size – beam size for highest probability search
- lm¶
kenlm object
- beam_size¶
beam size for highest probability search
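The role of `beam_size` can be illustrated with a small beam search. The sketch below does not call kenlm: `beam_elect`, `toy_lm_score` and the bigram table are hypothetical stand-ins, and all scores are assumed to be logarithmic, so the product of base and language model probabilities becomes a sum.

```python
from typing import Callable, Dict, List, Tuple

Candidate = Tuple[float, str]


def beam_elect(sentence: List[List[Candidate]],
               lm_score: Callable[[List[str]], float],
               beam_size: int = 4) -> List[str]:
    """Pick one candidate per token, maximising base score plus LM score."""
    beams: List[Tuple[float, List[str]]] = [(0.0, [])]  # (base log-prob, words so far)
    for candidates in sentence:
        expanded = [
            (base + log_p, words + [candidate])
            for base, words in beams
            for log_p, candidate in candidates
        ]
        # rank partial hypotheses by combined score and keep the best few
        expanded.sort(key=lambda beam: beam[0] + lm_score(beam[1]), reverse=True)
        beams = expanded[:beam_size]
    return beams[0][1]


# Toy bigram "language model"; unseen bigrams get a flat penalty (an assumption).
toy_bigrams: Dict[Tuple[str, str], float] = {("I", "see"): -0.5, ("see", "you"): -0.5}


def toy_lm_score(words: List[str]) -> float:
    return sum(toy_bigrams.get(pair, -3.0) for pair in zip(words, words[1:]))


toy_sentence = [
    [(-0.1, "I")],
    [(-0.5, "sea"), (-0.7, "see")],
    [(-0.1, "you")],
]
elected = beam_elect(toy_sentence, toy_lm_score, beam_size=2)
```

Here "sea" has the better base score, but the toy language model prefers "I see you", so the combined score selects "see". A larger beam keeps more partial hypotheses alive at the cost of more language model evaluations.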