deeppavlov.models.torch_bert
class deeppavlov.models.preprocessors.torch_bert_preprocessor.TorchBertPreprocessor(vocab_file: str, do_lower_case: bool = True, max_seq_length: int = 512, return_tokens: bool = False, **kwargs)

    Tokenizes text into subtokens, encodes subtokens with their indices, and creates token and segment masks.
    See the bert_dp.preprocessing.convert_examples_to_features() function for details.

    Parameters
        vocab_file – path to the vocabulary file
        do_lower_case – set True if lowercasing is needed
        max_seq_length – max sequence length in subtokens, including [SEP] and [CLS] tokens
        return_tokens – whether to return a tuple of InputFeatures and tokens, or InputFeatures only
    max_seq_length
        max sequence length in subtokens, including [SEP] and [CLS] tokens

    return_tokens
        whether to return a tuple of InputFeatures and tokens, or InputFeatures only

    tokenizer
        instance of BERT FullTokenizer
    __call__(texts_a: List[str], texts_b: Optional[List[str]] = None) → Union[List[transformers.data.processors.utils.InputFeatures], Tuple[List[transformers.data.processors.utils.InputFeatures], List[List[str]]]]

        Tokenizes texts and creates masks; texts_a and texts_b are separated by the [SEP] token.

        Parameters
            texts_a – list of texts
            texts_b – list of texts; may be None, e.g. for a single-sentence classification task

        Returns
            batch of transformers.data.processors.utils.InputFeatures with subtokens, subtoken ids, subtoken mask, and segment mask, or a tuple of the batch of InputFeatures and the batch of subtokens
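A minimal usage sketch based on the documented signature (the vocab path is a placeholder; any BERT WordPiece vocab file will do):

    # Minimal sketch, assuming a local WordPiece vocab file (path is a placeholder).
    from deeppavlov.models.preprocessors.torch_bert_preprocessor import TorchBertPreprocessor

    preprocessor = TorchBertPreprocessor(
        vocab_file="bert-base-uncased/vocab.txt",  # placeholder path
        do_lower_case=True,
        max_seq_length=64,
    )

    # Single-sentence mode: texts_b is omitted.
    features = preprocessor(["This movie was great!", "Terrible plot."])

    # Sentence-pair mode: texts_a[i] and texts_b[i] are joined with [SEP].
    pair_features = preprocessor(
        ["What is the capital of France?"],
        ["Paris is the capital and largest city of France."],
    )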
class deeppavlov.models.preprocessors.torch_bert_preprocessor.TorchBertNerPreprocessor(vocab_file: str, do_lower_case: bool = False, max_seq_length: int = 512, max_subword_length: int = None, token_masking_prob: float = 0.0, provide_subword_tags: bool = False, subword_mask_mode: str = 'first', **kwargs)

    Takes tokens, splits them into BERT subtokens, and encodes the subtokens with their indices. Creates a mask of subtokens (one for the first subtoken of each word, zero for the others; see the standalone sketch below). If tags are provided, calculates tags for subtokens.

    Parameters
        vocab_file – path to the vocabulary file
        do_lower_case – set True if lowercasing is needed
        max_seq_length – max sequence length in subtokens, including [SEP] and [CLS] tokens
        max_subword_length – replace a token with <unk> if its length is larger than this (defaults to None, which is equivalent to +infinity)
        token_masking_prob – probability of masking a token during training
        provide_subword_tags – output tags for subwords rather than for words
        subword_mask_mode – which subword to select inside word tokens, either "first" or "last" (default: "first")
    max_seq_length
        max sequence length in subtokens, including [SEP] and [CLS] tokens

    max_subword_length
        max length of a BERT subtoken

    tokenizer
        instance of BERT FullTokenizer
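The masking rule, including both subword_mask_mode options, can be illustrated standalone (not library code; the tokenization of "aardvark" follows the example used for token_from_subtoken below):

    # Standalone illustration of the subword mask rule (not library code):
    # one flag per subtoken, set on the first or last piece of each word.
    def subword_mask(word_pieces, mode="first"):
        mask = []
        for pieces in word_pieces:
            flags = [0] * len(pieces)
            flags[0 if mode == "first" else -1] = 1
            mask.extend(flags)
        return mask

    pieces = [["Your"], ["aar", "##dvark"], ["is"], ["awesome"]]
    print(subword_mask(pieces, "first"))  # [1, 1, 0, 1, 1]
    print(subword_mask(pieces, "last"))   # [1, 0, 1, 1, 1]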
class deeppavlov.models.preprocessors.torch_bert_preprocessor.TorchBertRankerPreprocessor(vocab_file: str, do_lower_case: bool = True, max_seq_length: int = 512, return_tokens: bool = False, **kwargs)

    Tokenizes text into subtokens, encodes subtokens with their indices, and creates token and segment masks for ranking.
    Builds features for a pair of the context with each of the response candidates.
    __call__(batch: List[List[str]]) → List[List[transformers.data.processors.utils.InputFeatures]]

        Tokenizes and creates masks.

        Parameters
            batch – list of elements where the first element is the batch of contexts and the remaining elements are the batches of response candidates

        Returns
            list of feature batches with subtokens, subtoken ids, subtoken mask, and segment mask
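The expected input layout can be sketched as follows (hypothetical data; ranker_prep stands for an already constructed TorchBertRankerPreprocessor):

    # Hypothetical batch: the first inner list holds the contexts, each
    # further list holds one response candidate per context.
    batch = [
        ["hello, how can I help?", "what is the weather like?"],  # contexts
        ["hi, I need a refund",    "sunny all day"],              # candidate batch 1
        ["goodbye",                "it is raining"],              # candidate batch 2
    ]
    feature_batches = ranker_prep(batch)  # one InputFeatures batch per candidate batch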
class deeppavlov.models.torch_bert.torch_bert_classifier.TorchBertClassifierModel(n_classes, pretrained_bert, one_hot_labels: bool = False, multilabel: bool = False, return_probas: bool = False, attention_probs_keep_prob: Optional[float] = None, hidden_keep_prob: Optional[float] = None, optimizer: str = 'AdamW', optimizer_parameters: dict = {'betas': (0.9, 0.999), 'eps': 1e-06, 'lr': 0.001, 'weight_decay': 0.01}, clip_norm: Optional[float] = None, bert_config_file: Optional[str] = None, **kwargs)

    BERT-based model for text classification on PyTorch.
    It uses the output of the [CLS] token and predicts labels using a linear transformation.

    Parameters
        n_classes – number of classes
        pretrained_bert – pretrained BERT checkpoint path or key title (e.g. "bert-base-uncased")
        one_hot_labels – set True if one-hot label encoding is used
        multilabel – set True for multi-label classification
        return_probas – set True if class probabilities are needed instead of the most probable label
        attention_probs_keep_prob – keep_prob for BERT self-attention layers
        hidden_keep_prob – keep_prob for BERT hidden layers
        optimizer – optimizer name from torch.optim
        optimizer_parameters – dictionary with optimizer parameters, e.g. {'lr': 0.1, 'weight_decay': 0.001, 'momentum': 0.9}
        clip_norm – clip gradients by this norm coefficient
        bert_config_file – path to the BERT configuration file (not used if pretrained_bert is a key title)
    __call__(features: List[transformers.data.processors.utils.InputFeatures]) → Union[List[int], List[List[float]]]

        Makes predictions for the given features (texts).

        Parameters
            features – batch of InputFeatures

        Returns
            predicted classes or probabilities of each class
    train_on_batch(features: List[transformers.data.processors.utils.InputFeatures], y: Union[List[int], List[List[int]]]) → Dict

        Trains the model on the given batch; calls train_op using features and y (labels).

        Parameters
            features – batch of InputFeatures
            y – batch of labels (class ids or one-hot encodings)

        Returns
            dict with loss and learning_rate values
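A hedged sketch of one training step, chaining TorchBertPreprocessor with this classifier (preprocessor and model stand for already constructed instances; the 'loss' key follows the return description above):

    # Sketch of one training step, assuming `preprocessor` and `model` are
    # constructed TorchBertPreprocessor / TorchBertClassifierModel instances.
    texts = ["great film", "dull and slow"]
    labels = [1, 0]                    # class ids (one-hot encodings also accepted)

    features = preprocessor(texts)     # List[InputFeatures]
    metrics = model.train_on_batch(features, labels)
    print(metrics["loss"])             # train_on_batch returns loss and learning_rate

    preds = model(features)            # class ids, or probabilities if return_probas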
deeppavlov.models.torch_bert.torch_bert_sequence_tagger.token_from_subtoken(units: torch.Tensor, mask: torch.Tensor) → torch.Tensor

    Assembles token-level units from subtoken-level units.

    Parameters
        units – torch.Tensor of shape [batch_size, SUBTOKEN_seq_length, n_features]
        mask – mask of token beginnings. For example, for the tokens

            [[[CLS], My, capybara, [SEP]],
             [[CLS], Your, aar, ##dvark, is, awesome, [SEP]]]

        the mask will be

            [[0, 1, 1, 0, 0, 0, 0],
             [0, 1, 1, 0, 1, 1, 0]]
    Returns
        word_level_units – units assembled from the positions marked with ones in the mask. For the example above, these units correspond to

            [[My, capybara],
             [Your, aar, is, awesome]]

        and the shape of the resulting tensor will be [batch_size, TOKEN_seq_length, n_features]
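The docstring example above maps directly onto a runnable call (feature values are dummies):

    import torch
    from deeppavlov.models.torch_bert.torch_bert_sequence_tagger import token_from_subtoken

    units = torch.randn(2, 7, 3)  # [batch_size=2, SUBTOKEN_seq_length=7, n_features=3]
    mask = torch.tensor([[0, 1, 1, 0, 0, 0, 0],
                         [0, 1, 1, 0, 1, 1, 0]])
    token_units = token_from_subtoken(units, mask)
    print(token_units.shape)  # torch.Size([2, 4, 3]): padded to the longest token count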
class deeppavlov.models.torch_bert.torch_bert_sequence_tagger.TorchBertSequenceTagger(n_tags: int, pretrained_bert: str, bert_config_file: Optional[str] = None, return_probas: bool = False, attention_probs_keep_prob: Optional[float] = None, hidden_keep_prob: Optional[float] = None, optimizer: str = 'AdamW', optimizer_parameters: dict = {'lr': 0.001, 'weight_decay': 1e-06}, learning_rate_drop_patience: int = 20, learning_rate_drop_div: float = 2.0, load_before_drop: bool = True, clip_norm: Optional[float] = None, min_learning_rate: float = 1e-07, **kwargs)

    BERT-based model on PyTorch for text tagging. It predicts a label for every token (not subtoken) in the text. You can use it for sequence labeling tasks such as morphological tagging or named entity recognition.

    Parameters
        n_tags – number of distinct tags
        pretrained_bert – pretrained BERT checkpoint path or key title (e.g. "bert-base-uncased")
        return_probas – set True if you need class probabilities instead of raw answers
        bert_config_file – path to the BERT configuration file, or None if pretrained_bert is a string name
        attention_probs_keep_prob – keep_prob for BERT self-attention layers
        hidden_keep_prob – keep_prob for BERT hidden layers
        optimizer – optimizer name from torch.optim
        optimizer_parameters – dictionary with optimizer parameters, e.g. {'lr': 0.1, 'weight_decay': 0.001, 'momentum': 0.9}
        learning_rate_drop_patience – how many validations with no improvements to wait before dropping the learning rate
        learning_rate_drop_div – the divider of the learning rate after learning_rate_drop_patience unsuccessful validations
        load_before_drop – whether to load the best model before dropping the learning rate
        clip_norm – clip gradients by this norm
        min_learning_rate – minimum learning rate value if learning rate decay is used
    __call__(input_ids: Union[List[List[int]], numpy.ndarray], input_masks: Union[List[List[int]], numpy.ndarray], y_masks: Union[List[List[int]], numpy.ndarray]) → Union[List[List[int]], List[numpy.ndarray]]

        Predicts tag indices for a given batch of subword tokens.

        Parameters
            input_ids – indices of the subwords
            input_masks – mask that determines where to attend and where not to
            y_masks – mask that determines the first subword unit in each word

        Returns
            label indices or class probabilities for each token (not subtoken)
    train_on_batch(input_ids: Union[List[List[int]], numpy.ndarray], input_masks: Union[List[List[int]], numpy.ndarray], y_masks: Union[List[List[int]], numpy.ndarray], y: List[List[int]], *args, **kwargs) → Dict[str, float]

        Parameters
            input_ids – batch of indices of subwords
            input_masks – batch of masks determining what should be attended to
            args – arguments passed to _build_feed_dict, corresponding to additional input and output tensors of the derived class
            kwargs – keyword arguments passed to _build_feed_dict, corresponding to additional input and output tensors of the derived class

        Returns
            dict with fields 'loss', 'head_learning_rate', and 'bert_learning_rate'
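A sketch of an inference call (tagger stands for a constructed TorchBertSequenceTagger; the subtoken ids are placeholders, and the masks follow the documented layout):

    import numpy as np

    # Dummy subtokens for "[CLS] my ca ##pybara [SEP]" (ids are placeholders).
    input_ids   = np.array([[101, 2026, 6187, 22571, 102]])
    input_masks = np.array([[1, 1, 1, 1, 1]])          # attend to every subtoken
    y_masks     = np.array([[0, 1, 1, 0, 0]])          # word-initial subtokens only
    tag_ids = tagger(input_ids, input_masks, y_masks)  # one tag index per word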
class deeppavlov.models.torch_bert.torch_bert_squad.TorchBertSQuADModel(pretrained_bert: str, attention_probs_keep_prob: Optional[float] = None, hidden_keep_prob: Optional[float] = None, optimizer: str = 'AdamW', optimizer_parameters: dict = {'betas': (0.9, 0.999), 'eps': 1e-06, 'lr': 0.01, 'weight_decay': 0.01}, bert_config_file: Optional[str] = None, learning_rate_drop_patience: int = 20, learning_rate_drop_div: float = 2.0, load_before_drop: bool = True, clip_norm: Optional[float] = None, min_learning_rate: float = 1e-06, **kwargs)

    BERT-based model on PyTorch for the SQuAD-like problem setting: it predicts the start and end positions of the answer for a given question and context.
    The [CLS] token is used as the no-answer marker: if the model selects the [CLS] token as the most probable answer, there is no answer in the given context.
    The start and end positions of the answer are predicted by a linear transformation of BERT outputs.

    Parameters
        pretrained_bert – pretrained BERT checkpoint path or key title (e.g. "bert-base-uncased")
        attention_probs_keep_prob – keep_prob for BERT self-attention layers
        hidden_keep_prob – keep_prob for BERT hidden layers
        optimizer – optimizer name from torch.optim
        optimizer_parameters – dictionary with optimizer parameters, e.g. {'lr': 0.1, 'weight_decay': 0.001, 'momentum': 0.9}
        bert_config_file – path to the BERT configuration file, or None if pretrained_bert is a string name
        learning_rate_drop_patience – how many validations with no improvements to wait before dropping the learning rate
        learning_rate_drop_div – the divider of the learning rate after learning_rate_drop_patience unsuccessful validations
        load_before_drop – whether to load the best model before dropping the learning rate
        clip_norm – clip gradients by this norm
        min_learning_rate – minimum learning rate value if learning rate decay is used
    __call__(features: List[transformers.data.processors.utils.InputFeatures]) → Tuple[List[int], List[int], List[float], List[float]]

        Gets predictions using features as input.

        Parameters
            features – batch of InputFeatures instances

        Returns
            predictions – answer start and end positions with the corresponding start and end logits
    train_on_batch(features: List[transformers.data.processors.utils.InputFeatures], y_st: List[List[int]], y_end: List[List[int]]) → Dict

        Trains the model on the given batch; calls train_op using features and labels from y_st and y_end.

        Parameters
            features – batch of InputFeatures instances
            y_st – batch of lists of ground-truth answer start positions
            y_end – batch of lists of ground-truth answer end positions

        Returns
            dict with loss and learning_rate values
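A sketch of reading the output under the [CLS]-as-no-answer convention described above (squad_model stands for a constructed TorchBertSQuADModel; features come from a preprocessor):

    # Position 0 is the [CLS] token, so a predicted start of 0 means "no answer".
    starts, ends, start_logits, end_logits = squad_model(features)
    for start, end in zip(starts, ends):
        if start == 0:  # [CLS] selected as the most probable answer
            print("no answer in the given context")
        else:
            print(f"answer spans subtoken positions {start}..{end}")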
class deeppavlov.models.torch_bert.torch_bert_squad.TorchBertSQuADInferModel(squad_model_config: str, vocab_file: str, do_lower_case: bool, max_seq_length: int = 512, batch_size: int = 10, lang: str = 'en', **kwargs)

    This model wraps BertSQuADModel to make predictions on sequences longer than 512 tokens.
    It splits the context into chunks of length max_seq_length - 3 - len(question), preserving sentence boundaries, and reassembles batches from chunks instead of full contexts to optimize performance. For example, with

        batch_size = 5
        number of contexts == 2
        number of first-context chunks == 8
        number of second-context chunks == 2

    two batches of 5 chunks each will be created (see the sketch after this class). For each context, the best answer is selected via logits or scores from BertSQuADModel.

    Parameters
        squad_model_config – path to the DeepPavlov BertSQuADModel config file
        vocab_file – path to the BERT vocab file
        do_lower_case – set True if lowercasing is needed
        max_seq_length – max sequence length in subtokens, including [SEP] and [CLS] tokens
        batch_size – batch size to use during inference
        lang – either 'en' or 'ru'; used to select the sentence tokenizer
    __call__(contexts: List[str], questions: List[str], **kwargs) → Tuple[List[str], List[int], List[float]]

        Gets predictions for the given contexts and questions.

        Parameters
            contexts – batch of contexts
            questions – batch of questions

        Returns
            predictions – answers, answer start positions, and logits or scores
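The chunk-batching example above can be illustrated standalone (not library code):

    # 8 chunks from context 1 plus 2 from context 2 are packed into
    # batches of batch_size = 5, giving two inference batches of 5 chunks.
    from itertools import islice

    def batched(items, size):
        it = iter(items)
        while batch := list(islice(it, size)):
            yield batch

    chunks = [("context_1", i) for i in range(8)] + [("context_2", i) for i in range(2)]
    print([len(b) for b in batched(chunks, 5)])  # [5, 5]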
class deeppavlov.models.torch_bert.torch_bert_ranker.TorchBertRankerModel(pretrained_bert: str, bert_config_file: Optional[str] = None, n_classes: int = 2, return_probas: bool = True, optimizer: str = 'AdamW', optimizer_parameters: dict = {'betas': (0.9, 0.999), 'eps': 1e-06, 'lr': 2e-05, 'weight_decay': 0.01}, **kwargs)

    BERT-based model for interaction-based text ranking on PyTorch.
    A linear transformation is trained over the BERT pooled output of the [CLS] token. The predicted class probabilities are used as a similarity measure for ranking.

    Parameters
        pretrained_bert – pretrained BERT checkpoint path or key title (e.g. "bert-base-uncased")
        bert_config_file – path to the BERT configuration file (not used if pretrained_bert is a key title)
        n_classes – number of classes
        return_probas – set True if class probabilities should be returned instead of the most probable label
        optimizer – optimizer name from torch.optim
        optimizer_parameters – dictionary with optimizer parameters, e.g. {'lr': 0.1, 'weight_decay': 0.001, 'momentum': 0.9}
    __call__(features_li: List[List[transformers.data.processors.utils.InputFeatures]]) → Union[List[int], List[List[float]]]

        Calculates scores for the given contexts over the candidate responses.

        Parameters
            features_li – list of elements where each element contains the batch of features for the contexts paired with a particular response candidate

        Returns
            predicted scores for the contexts over the response candidates
    train_on_batch(features_li: List[List[transformers.data.processors.utils.InputFeatures]], y: Union[List[int], List[List[int]]]) → Dict

        Trains the model on the given batch.

        Parameters
            features_li – list with a single element containing the batch of InputFeatures
            y – batch of labels (class ids or one-hot encodings)

        Returns
            dict with loss and learning rate values
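A sketch of scoring one context against two candidates, chaining TorchBertRankerPreprocessor and this model (both instances are assumed to be already built):

    # `ranker_prep` and `ranker_model` stand for constructed instances of
    # TorchBertRankerPreprocessor and TorchBertRankerModel.
    features_li = ranker_prep([
        ["how do I reset my password?"],                 # contexts
        ["click 'forgot password' on the login page"],   # candidate batch 1
        ["our office opens at 9 am"],                    # candidate batch 2
    ])
    scores = ranker_model(features_li)  # probabilities usable as similarity scores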
class deeppavlov.models.torch_bert.torch_bert_as_summarizer.TorchBertAsSummarizer(pretrained_bert: str, vocab_file: str, max_summary_length: int, bert_config_file: Optional[str] = None, max_summary_length_in_tokens: bool = False, max_seq_length: int = 128, do_lower_case: bool = False, lang: str = 'ru', save_path: Optional[str] = None, **kwargs)

    Naive extractive summarization model based on BERT, on PyTorch. The BERT model was trained on the Masked Language Modeling (MLM) and Next Sentence Prediction (NSP) tasks. For an input of the form

        [CLS] text_a [SEP] text_b [SEP]

    the NSP head was trained to detect whether text_b follows text_a in the original document. This NSP head can be used to stack sentences from a long document onto an initial sentence:

        summary_0 = init_sentence
        summary_1 = summary_0 + argmax(nsp_score(candidates))
        summary_2 = summary_1 + argmax(nsp_score(candidates))
        ...

    where candidates are all sentences from the document (see the sketch after the parameter list).

    Parameters
        pretrained_bert – pretrained BERT checkpoint path or key title (e.g. "bert-base-uncased")
        bert_config_file – path to the BERT configuration file (not used if pretrained_bert is a key title)
        vocab_file – path to the BERT vocabulary file
        max_summary_length – limit on summary length: the number of sentences if max_summary_length_in_tokens is set to False, else the number of tokens
        max_summary_length_in_tokens – use the number of tokens as the summary length. Defaults to False.
        max_seq_length – max sequence length in subtokens, including [SEP] and [CLS] tokens; used by BERT to compute NSP scores. Defaults to 128.
        do_lower_case – set True if lowercasing is needed. Defaults to False.
        lang – use ru_sent_tokenizer for 'ru' and nltk.sent_tokenize for other languages. Defaults to 'ru'.
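    The stacking loop above can be sketched in plain Python (a simplification, not the library implementation; nsp_score stands for a pairwise scorer such as _get_nsp_predictions):

        # Greedy NSP summarization sketch (simplified; not the library code).
        def build_summary(init_sentence, candidates, nsp_score, max_sentences):
            summary = [init_sentence]
            pool = list(candidates)
            while pool and len(summary) < max_sentences:
                # Score every remaining sentence as a continuation of the summary.
                scores = nsp_score([" ".join(summary)] * len(pool), pool)
                best = max(range(len(pool)), key=scores.__getitem__)
                summary.append(pool.pop(best))
            return summary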
    __call__(texts: List[str], init_sentences: Optional[List[str]] = None) → List[List[str]]

        Builds a summary for each text in texts.

        Parameters
            texts – texts to build summaries for
            init_sentences – each init_sentence is used as the first sentence of the corresponding summary. Defaults to None.

        Returns
            summaries tokenized into sentences

        Return type
            List[List[str]]
    _get_nsp_predictions(sentences: List[str], candidates: List[str])

        Computes the NextSentence probability for every (sentence_i, candidate_i) pair:

            [CLS] sentence_i [SEP] candidate_i [SEP]

        Parameters
            sentences – list of sentences
            candidates – list of candidates to be the next sentence

        Returns
            probabilities that each candidate is the next sentence
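A hypothetical end-to-end call (summarizer stands for a constructed TorchBertAsSummarizer):

    # Each summary comes back as a list of sentences, starting with the
    # init sentence when one is provided.
    summaries = summarizer(
        ["<full text of a long document>"],
        init_sentences=["The main finding is summarized first."],
    )
    print(summaries[0])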