deeppavlov.models.torch_bert
class deeppavlov.models.preprocessors.torch_transformers_preprocessor.TorchTransformersPreprocessor(vocab_file: str, do_lower_case: bool = True, max_seq_length: int = 512, return_tokens: bool = False, **kwargs)[source]

Tokenizes text into subtokens, encodes subtokens with their indices, and creates token and segment masks.
Check details in the bert_dp.preprocessing.convert_examples_to_features() function.
- Parameters
vocab_file – path to vocabulary
do_lower_case – set True if lowercasing is needed
max_seq_length – max sequence length in subtokens, including [SEP] and [CLS] tokens
return_tokens – whether to return tuple of input features and tokens, or only input features
max_seq_length – max sequence length in subtokens, including [SEP] and [CLS] tokens

return_tokens – whether to return a tuple of input features and tokens, or only input features

tokenizer – instance of Bert FullTokenizer
__call__(texts_a: List[str], texts_b: Optional[List[str]] = None) → Union[List[transformers.data.processors.utils.InputFeatures], Tuple[List[transformers.data.processors.utils.InputFeatures], List[List[str]]]][source]

Tokenize and create masks. texts_a and texts_b are separated by the [SEP] token.
- Parameters
texts_a – list of texts
texts_b – list of texts; can be None, e.g. for a single-sentence classification task
- Returns
a batch of transformers.data.processors.utils.InputFeatures with subtokens, subtoken ids, subtoken mask, and segment mask, or a tuple of a batch of InputFeatures and a batch of subtokens
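A minimal usage sketch (the vocabulary path below is a placeholder, not a file shipped with the library; point it to a BERT vocabulary compatible with the tokenizer):

    from deeppavlov.models.preprocessors.torch_transformers_preprocessor import TorchTransformersPreprocessor

    # placeholder path: point this to a real BERT vocabulary file
    preprocessor = TorchTransformersPreprocessor(vocab_file="path/to/vocab.txt",
                                                 do_lower_case=True,
                                                 max_seq_length=64)

    # single-sentence mode: texts_b is omitted
    features = preprocessor(["I like this movie.", "The plot is dull."])

    # sentence-pair mode: texts_a[i] and texts_b[i] are joined with [SEP]
    features = preprocessor(["What is the capital of France?"],
                            ["Paris is the capital and largest city of France."])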
class deeppavlov.models.preprocessors.torch_transformers_preprocessor.TorchTransformersNerPreprocessor(vocab_file: str, do_lower_case: bool = False, max_seq_length: int = 512, max_subword_length: Optional[int] = None, token_masking_prob: float = 0.0, provide_subword_tags: bool = False, subword_mask_mode: str = 'first', **kwargs)[source]

Takes tokens and splits them into BERT subtokens, encoding subtokens with their indices. Creates a mask of subtokens (one for the first subtoken, zero for the others).
If tags are provided, calculates tags for subtokens.
- Parameters
vocab_file – path to vocabulary
do_lower_case – set True if lowercasing is needed
max_seq_length – max sequence length in subtokens, including [SEP] and [CLS] tokens
max_subword_length – replace a token with <unk> if its length is larger than this (defaults to None, which is equal to +infinity)
token_masking_prob – probability of masking token while training
provide_subword_tags – output tags for subwords or for words
subword_mask_mode – subword to select inside word tokens, can be “first” or “last” (default=”first”)
max_seq_length – max sequence length in subtokens, including [SEP] and [CLS] tokens

max_subword_length – max length of a BERT subtoken

tokenizer – instance of Bert FullTokenizer
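A hedged sketch of how this preprocessor is typically fed; the vocabulary path is a placeholder, and calling it with a batch of token lists plus an optional batch of tag lists is an assumption based on the class description above:

    from deeppavlov.models.preprocessors.torch_transformers_preprocessor import TorchTransformersNerPreprocessor

    preprocessor = TorchTransformersNerPreprocessor(vocab_file="path/to/vocab.txt",  # placeholder
                                                    max_seq_length=512,
                                                    max_subword_length=15,
                                                    subword_mask_mode="first")

    tokens = [["John", "lives", "in", "New", "York"]]
    tags = [["B-PER", "O", "O", "B-LOC", "I-LOC"]]
    # produces subtokens, subtoken indices, a first-subtoken mask and, if tags are given, per-subtoken tags
    outputs = preprocessor(tokens, tags)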
class deeppavlov.models.preprocessors.torch_transformers_preprocessor.TorchBertRankerPreprocessor(vocab_file: str, do_lower_case: bool = True, max_seq_length: int = 512, return_tokens: bool = False, **kwargs)[source]

Tokenizes text into subtokens, encodes subtokens with their indices, and creates token and segment masks for ranking.
Builds features for pairs of the context with each of the response candidates.
__call__(batch: List[List[str]]) → List[List[transformers.data.processors.utils.InputFeatures]][source]

Tokenize and create masks.
- Parameters
batch – list of elements where the first element represents the batch with contexts and the rest of the elements represent response candidates batches
- Returns
list of feature batches with subtokens, subtoken ids, subtoken mask, segment mask.
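A minimal sketch of the expected input layout (the vocabulary path and texts are placeholders):

    from deeppavlov.models.preprocessors.torch_transformers_preprocessor import TorchBertRankerPreprocessor

    preprocessor = TorchBertRankerPreprocessor(vocab_file="path/to/vocab.txt",  # placeholder
                                               max_seq_length=128)

    # first element: batch of contexts; remaining elements: batches of response candidates
    batch = [
        ["how do i reset my password?", "what are your opening hours?"],       # contexts
        ["use the 'forgot password' link", "we are open from 9 to 5"],         # candidate batch 1
        ["our office is closed on sundays", "passwords must be 8 characters"]  # candidate batch 2
    ]
    feature_batches = preprocessor(batch)  # one feature batch per candidate batch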
class deeppavlov.models.torch_bert.torch_transformers_classifier.TorchTransformersClassifierModel(n_classes, pretrained_bert, one_hot_labels: bool = False, multilabel: bool = False, return_probas: bool = False, attention_probs_keep_prob: Optional[float] = None, hidden_keep_prob: Optional[float] = None, optimizer: str = 'AdamW', optimizer_parameters: Optional[dict] = None, clip_norm: Optional[float] = None, bert_config_file: Optional[str] = None, is_binary: Optional[bool] = False, **kwargs)[source]

BERT-based model for text classification on PyTorch.
It uses the output of the [CLS] token and predicts labels using a linear transformation.
- Parameters
n_classes – number of classes
pretrained_bert – pretrained Bert checkpoint path or key title (e.g. “bert-base-uncased”)
one_hot_labels – set True if one-hot encoding for labels is used
multilabel – set True if it is multi-label classification
return_probas – set True if class probabilities should be returned instead of the most probable label
attention_probs_keep_prob – keep_prob for Bert self-attention layers
hidden_keep_prob – keep_prob for Bert hidden layers
optimizer – optimizer name from torch.optim
optimizer_parameters – dictionary with optimizer's parameters, e.g. {'lr': 0.1, 'weight_decay': 0.001, 'momentum': 0.9}
clip_norm – clip gradients by norm coefficient
bert_config_file – path to Bert configuration file (not used if pretrained_bert is key title)
__call__(features: Dict[str, torch.tensor]) → Union[List[int], List[List[float]]][source]

Make predictions for the given features (texts).
- Parameters
features – batch of InputFeatures
- Returns
predicted classes or probabilities of each class
train_on_batch(features: Dict[str, torch.tensor], y: Union[List[int], List[List[int]]]) → Dict[source]

Train the model on the given batch. This method calls train_op using features and y (labels).
- Parameters
features – batch of InputFeatures
y – batch of labels (class id or one-hot encoding)
- Returns
dict with loss and learning_rate values
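A sketch of how the classifier consumes preprocessor output, roughly mirroring the wiring of DeepPavlov's BERT classification configs. The vocabulary path is a placeholder, and save/load paths normally handled by the base class (or supplied through a config and build_model) are omitted:

    from deeppavlov.models.preprocessors.torch_transformers_preprocessor import TorchTransformersPreprocessor
    from deeppavlov.models.torch_bert.torch_transformers_classifier import TorchTransformersClassifierModel

    preprocessor = TorchTransformersPreprocessor(vocab_file="path/to/vocab.txt")  # placeholder
    model = TorchTransformersClassifierModel(n_classes=2,
                                             pretrained_bert="bert-base-uncased",
                                             return_probas=True,
                                             optimizer_parameters={"lr": 2e-5})

    features = preprocessor(["great acting and a gripping story", "a waste of two hours"])
    probas = model(features)                          # class probabilities, since return_probas=True
    metrics = model.train_on_batch(features, [1, 0])  # labels as class ids; returns loss and learning rate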
class deeppavlov.models.torch_bert.torch_transformers_sequence_tagger.TorchTransformersSequenceTagger(n_tags: int, pretrained_bert: str, bert_config_file: Optional[str] = None, return_probas: bool = False, attention_probs_keep_prob: Optional[float] = None, hidden_keep_prob: Optional[float] = None, optimizer: str = 'AdamW', optimizer_parameters: dict = {'lr': 0.001, 'weight_decay': 1e-06}, learning_rate_drop_patience: int = 20, learning_rate_drop_div: float = 2.0, load_before_drop: bool = True, clip_norm: Optional[float] = None, min_learning_rate: float = 1e-07, **kwargs)[source]

Transformer-based model on PyTorch for text tagging. It predicts a label for every token (not subtoken) in the text. You can use it for sequence labeling tasks, such as morphological tagging or named entity recognition.
- Parameters
n_tags – number of distinct tags
pretrained_bert – pretrained Bert checkpoint path or key title (e.g. “bert-base-uncased”)
return_probas – set this to True if you need the probabilities instead of raw answers
bert_config_file – path to Bert configuration file, or None, if pretrained_bert is a string name
attention_probs_keep_prob – keep_prob for Bert self-attention layers
hidden_keep_prob – keep_prob for Bert hidden layers
optimizer – optimizer name from torch.optim
optimizer_parameters – dictionary with optimizer's parameters, e.g. {'lr': 0.1, 'weight_decay': 0.001, 'momentum': 0.9}
learning_rate_drop_patience – how many validations with no improvements to wait
learning_rate_drop_div – the divider of the learning rate after learning_rate_drop_patience unsuccessful validations
load_before_drop – whether to load best model before dropping learning rate or not
clip_norm – clip gradients by norm
min_learning_rate – min value of learning rate if learning rate decay is used
__call__(input_ids: Union[List[List[int]], numpy.ndarray], input_masks: Union[List[List[int]], numpy.ndarray], y_masks: Union[List[List[int]], numpy.ndarray]) → Union[List[List[int]], List[numpy.ndarray]][source]

Predicts tag indices for a given batch of subword tokens.
- Parameters
input_ids – indices of the subwords
input_masks – mask that determines where to attend and where not to
y_masks – mask which determines the first subword units in the word
- Returns
Label indices or class probabilities for each token (not subtoken)
train_on_batch(input_ids: Union[List[List[int]], numpy.ndarray], input_masks: Union[List[List[int]], numpy.ndarray], y_masks: Union[List[List[int]], numpy.ndarray], y: List[List[int]], *args, **kwargs) → Dict[str, float][source]

- Parameters
input_ids – batch of indices of subwords
input_masks – batch of masks which determine what should be attended
args – arguments passed to _build_feed_dict and corresponding to additional input and output tensors of the derived class.
kwargs – keyword arguments passed to _build_feed_dict and corresponding to additional input and output tensors of the derived class.
- Returns
dict with fields ‘loss’, ‘head_learning_rate’, and ‘bert_learning_rate’
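A minimal sketch with toy inputs; the subword id values are illustrative only, and the direct construction below omits the save/load handling that the base class normally performs:

    import numpy as np
    from deeppavlov.models.torch_bert.torch_transformers_sequence_tagger import TorchTransformersSequenceTagger

    tagger = TorchTransformersSequenceTagger(n_tags=7,  # e.g. an IOB tag set with three entity types plus O
                                             pretrained_bert="bert-base-uncased")

    # one sequence of six subwords; ids are illustrative, not real vocabulary indices
    input_ids   = np.array([[101, 2198, 3268, 1999, 2624, 102]])
    input_masks = np.array([[1, 1, 1, 1, 1, 1]])   # attend to every subword
    y_masks     = np.array([[0, 1, 1, 1, 1, 0]])   # 1 marks the first subword of each word

    predictions = tagger(input_ids, input_masks, y_masks)  # one tag index (or distribution) per word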
class deeppavlov.models.torch_bert.torch_transformers_squad.TorchTransformersSquad(pretrained_bert: str, attention_probs_keep_prob: Optional[float] = None, hidden_keep_prob: Optional[float] = None, optimizer: str = 'AdamW', optimizer_parameters: Optional[dict] = None, bert_config_file: Optional[str] = None, learning_rate_drop_patience: int = 20, learning_rate_drop_div: float = 2.0, load_before_drop: bool = True, clip_norm: Optional[float] = None, min_learning_rate: float = 1e-06, **kwargs)[source]

BERT-based model on PyTorch for a SQuAD-like problem setting: it predicts the start and end positions of the answer for a given question and context.
The [CLS] token is used as no_answer. If the model selects the [CLS] token as the most probable answer, it means there is no answer in the given context.
Start and end positions of the answer are predicted by a linear transformation of BERT outputs.
- Parameters
pretrained_bert – pretrained Bert checkpoint path or key title (e.g. “bert-base-uncased”)
attention_probs_keep_prob – keep_prob for Bert self-attention layers
hidden_keep_prob – keep_prob for Bert hidden layers
optimizer – optimizer name from torch.optim
optimizer_parameters – dictionary with optimizer's parameters, e.g. {'lr': 0.1, 'weight_decay': 0.001, 'momentum': 0.9}
bert_config_file – path to Bert configuration file, or None, if pretrained_bert is a string name
learning_rate_drop_patience – how many validations with no improvements to wait
learning_rate_drop_div – the divider of the learning rate after learning_rate_drop_patience unsuccessful validations
load_before_drop – whether to load best model before dropping learning rate or not
clip_norm – clip gradients by norm
min_learning_rate – min value of learning rate if learning rate decay is used
__call__(features: List[transformers.data.processors.utils.InputFeatures]) → Tuple[List[int], List[int], List[float], List[float]][source]

Get predictions using features as input.
- Parameters
features – batch of InputFeatures instances
- Returns
answer start positions, end positions, start logits, and end logits
- Return type
predictions
train_on_batch(features: List[transformers.data.processors.utils.InputFeatures], y_st: List[List[int]], y_end: List[List[int]]) → Dict[source]

Train the model on the given batch. This method calls train_op using features and labels from y_st and y_end.
- Parameters
features – batch of InputFeatures instances
y_st – batch of lists of ground truth answer start positions
y_end – batch of lists of ground truth answer end positions
- Returns
dict with loss and learning_rate values
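A hedged sketch of the call interface. Building features with TorchTransformersPreprocessor on (question, context) pairs is an assumption made for illustration only, since the matching SQuAD preprocessor is configured separately in DeepPavlov configs; the vocabulary path and span positions are placeholders:

    from deeppavlov.models.preprocessors.torch_transformers_preprocessor import TorchTransformersPreprocessor
    from deeppavlov.models.torch_bert.torch_transformers_squad import TorchTransformersSquad

    preprocessor = TorchTransformersPreprocessor(vocab_file="path/to/vocab.txt")  # placeholder
    model = TorchTransformersSquad(pretrained_bert="bert-base-uncased")

    questions = ["What is DeepPavlov built on?"]
    contexts = ["DeepPavlov is an open-source conversational AI library built on PyTorch."]
    features = preprocessor(questions, contexts)

    starts, ends, start_logits, end_logits = model(features)
    # training step: ground-truth answer spans given as start/end subtoken positions (values illustrative)
    metrics = model.train_on_batch(features, y_st=[[12]], y_end=[[13]])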
class deeppavlov.models.torch_bert.torch_transformers_squad.TorchTransformersSquadInfer(squad_model_config: str, vocab_file: str, do_lower_case: bool, max_seq_length: int = 512, batch_size: int = 10, lang: str = 'en', **kwargs)[source]

This model wraps BertSQuADModel to make predictions on sequences longer than 512 tokens.
It splits the context into chunks of max_seq_length - 3 - len(question) length, preserving sentence boundaries.
It reassembles batches with chunks instead of full contexts to optimize performance. For example, with batch_size = 5, number_of_contexts == 2, number of first context chunks == 8, and number of second context chunks == 2, two batches with 5 chunks each will be created.
For each context the best answer is selected via logits or scores from BertSQuADModel.
- Parameters
squad_model_config – path to DeepPavlov BertSQuADModel config file
vocab_file – path to Bert vocab file
do_lower_case – set True if lowercasing is needed
max_seq_length – max sequence length in subtokens, including [SEP] and [CLS] tokens
batch_size – size of batch to use during inference
lang – either en or ru; it is used to select the sentence tokenizer
__call__(contexts: List[str], questions: List[str], **kwargs) → Tuple[List[str], List[int], List[float]][source]

Get predictions for the given contexts and questions.
- Parameters
contexts – batch of contexts
questions – batch of questions
- Returns
answer, answer start position, logits or scores
- Return type
predictions
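A minimal sketch; the config path and vocabulary path are placeholders for a real DeepPavlov SQuAD config file and its matching BERT vocabulary:

    from deeppavlov.models.torch_bert.torch_transformers_squad import TorchTransformersSquadInfer

    infer = TorchTransformersSquadInfer(squad_model_config="path/to/squad_torch_bert_config.json",  # placeholder
                                        vocab_file="path/to/vocab.txt",                             # placeholder
                                        do_lower_case=True,
                                        max_seq_length=384,
                                        batch_size=10,
                                        lang="en")

    # a long context is split into chunks internally and the best answer is selected across chunks
    contexts = ["DeepPavlov is an open-source conversational AI library built on PyTorch. " * 50]
    questions = ["What is DeepPavlov built on?"]
    answers, starts, scores = infer(contexts, questions)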
class deeppavlov.models.torch_bert.torch_bert_ranker.TorchBertRankerModel(pretrained_bert: str, bert_config_file: Optional[str] = None, n_classes: int = 2, return_probas: bool = True, optimizer: str = 'AdamW', clip_norm: Optional[float] = None, optimizer_parameters: Optional[dict] = None, **kwargs)[source]

BERT-based model for interaction-based text ranking on PyTorch.
A linear transformation is trained over the BERT pooled output of the [CLS] token. Predicted class probabilities are used as a similarity measure for ranking.
- Parameters
pretrained_bert – pretrained Bert checkpoint path or key title (e.g. “bert-base-uncased”)
bert_config_file – path to Bert configuration file (not used if pretrained_bert is key title)
n_classes – number of classes
return_probas – set True if class probabilities are returned instead of the most probable label
optimizer – optimizer name from torch.optim
optimizer_parameters – dictionary with optimizer's parameters, e.g. {'lr': 0.1, 'weight_decay': 0.001, 'momentum': 0.9}
__call__(features_li: List[List[transformers.data.processors.utils.InputFeatures]]) → Union[List[int], List[List[float]]][source]

Calculate scores for the given context over candidate responses.
- Parameters
features_li – list of elements where each element contains the batch of features for contexts with particular response candidates
- Returns
predicted scores for contexts over response candidates
train_on_batch(features_li: List[List[transformers.data.processors.utils.InputFeatures]], y: Union[List[int], List[List[int]]]) → Dict[source]

Train the model on the given batch.
- Parameters
features_li – list with the single element containing the batch of InputFeatures
y – batch of labels (class id or one-hot encoding)
- Returns
dict with loss and learning rate values
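A sketch that chains the ranker with TorchBertRankerPreprocessor; the vocabulary path and texts are placeholders, and save/load paths handled by the base class are omitted:

    from deeppavlov.models.preprocessors.torch_transformers_preprocessor import TorchBertRankerPreprocessor
    from deeppavlov.models.torch_bert.torch_bert_ranker import TorchBertRankerModel

    preprocessor = TorchBertRankerPreprocessor(vocab_file="path/to/vocab.txt")  # placeholder
    ranker = TorchBertRankerModel(pretrained_bert="bert-base-uncased",
                                  n_classes=2,
                                  return_probas=True)

    # one context batch followed by two response-candidate batches
    batch = [
        ["how do i reset my password?"],
        ["use the 'forgot password' link on the login page"],
        ["we are open from 9 to 5 on weekdays"]
    ]
    features_li = preprocessor(batch)
    scores = ranker(features_li)  # similarity scores of each candidate for the context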