deeppavlov.models.bert¶
-
class
deeppavlov.models.preprocessors.bert_preprocessor.
BertPreprocessor
(vocab_file: str, do_lower_case: bool = True, max_seq_length: int = 512, **kwargs)[source]¶ Tokenize text into subtokens, encode subtokens with their indices, create token and segment masks.
Check details in
bert_dp.preprocessing.convert_examples_to_features()
function.
- Parameters
vocab_file – path to vocabulary
do_lower_case – set True if lowercasing is needed
max_seq_length – max sequence length in subtokens, including [SEP] and [CLS] tokens
-
max_seq_length
¶ max sequence length in subtokens, including [SEP] and [CLS] tokens
-
tokenizer
¶ instance of Bert FullTokenizer
-
__call__
(texts_a: List[str], texts_b: Optional[List[str]] = None) → List[bert_dp.preprocessing.InputFeatures][source]¶ Call Bert
bert_dp.preprocessing.convert_examples_to_features()
function to tokenize and create masks.
texts_a and texts_b are separated by the [SEP] token.
- Parameters
texts_a – list of texts
texts_b – list of texts; may be None, e.g. for a single-sentence classification task
- Returns
batch of
bert_dp.preprocessing.InputFeatures
with subtokens, subtoken ids, subtoken mask, segment mask.
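As a rough illustration of the layout described above, the sketch below builds the subtoken sequence, subtoken ids, input mask and segment mask for a text pair. It does not call bert_dp; the whitespace tokenizer, the toy vocabulary and the helper name build_features are assumptions made for this example only.

```python
from typing import Dict, List, Optional

# Toy vocabulary (assumption); a real one is read from vocab_file.
VOCAB = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
         "how": 4, "are": 5, "you": 6, "fine": 7}


def build_features(text_a: str, text_b: Optional[str] = None,
                   max_seq_length: int = 12) -> Dict[str, List]:
    """Hypothetical helper that mimics the [CLS] a [SEP] b [SEP] layout."""
    tokens = ["[CLS]"] + text_a.lower().split() + ["[SEP]"]
    segment_ids = [0] * len(tokens)
    if text_b is not None:
        tokens_b = text_b.lower().split() + ["[SEP]"]
        tokens += tokens_b
        segment_ids += [1] * len(tokens_b)
    if len(tokens) > max_seq_length:
        raise ValueError("sequence is longer than max_seq_length")
    input_ids = [VOCAB.get(t, VOCAB["[UNK]"]) for t in tokens]
    pad = max_seq_length - len(input_ids)
    return {
        "tokens": tokens,
        "input_ids": input_ids + [VOCAB["[PAD]"]] * pad,
        "input_mask": [1] * len(input_ids) + [0] * pad,
        "segment_ids": segment_ids + [0] * pad,
    }


features = build_features("How are you", "Fine", max_seq_length=8)
print(features["tokens"])
print(features["segment_ids"])
```

When texts_b is None, only the first segment is produced and every segment id is 0.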
-
class
deeppavlov.models.preprocessors.bert_preprocessor.
BertNerPreprocessor
(vocab_file: str, do_lower_case: bool = False, max_seq_length: int = 512, max_subword_length: int = None, token_masking_prob: float = 0.0, provide_subword_tags: bool = False, subword_mask_mode: str = 'first', **kwargs)[source]¶ Takes tokens and splits them into bert subtokens, encodes subtokens with their indices. Creates a mask of subtokens (one for the first subtoken, zero for the others).
If tags are provided, calculates tags for subtokens.
- Parameters
vocab_file – path to vocabulary
do_lower_case – set True if lowercasing is needed
max_seq_length – max sequence length in subtokens, including [SEP] and [CLS] tokens
max_subword_length – replace a token with <unk> if its length is larger than this (defaults to None, which is equal to +infinity)
token_masking_prob – probability of masking token while training
provide_subword_tags – output tags for subwords or for words
subword_mask_mode – subword to select inside word tokens, can be “first” or “last” (default: “first”)
-
max_seq_length
¶ max sequence length in subtokens, including [SEP] and [CLS] tokens
-
max_subword_length
¶ max length of a BERT subtoken
-
tokenizer
¶ instance of Bert FullTokenizer
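The subtoken mask and the two subword_mask_mode values can be sketched as follows. The splitter toy_wordpiece is a hypothetical stand-in for the BERT tokenizer (it simply cuts tokens into 3-character pieces), so only the shape of the mask is meaningful here.

```python
from typing import List, Tuple


def toy_wordpiece(token: str) -> List[str]:
    """Hypothetical splitter: pieces of at most 3 characters, '##' marks continuations."""
    pieces = [token[i:i + 3] for i in range(0, len(token), 3)]
    return [pieces[0]] + ["##" + p for p in pieces[1:]]


def subword_mask(tokens: List[str], mode: str = "first") -> Tuple[List[str], List[int]]:
    """Return subtokens with [CLS]/[SEP] and a mask with a single 1 per word."""
    subtokens, mask = ["[CLS]"], [0]
    for token in tokens:
        pieces = toy_wordpiece(token)
        piece_mask = [0] * len(pieces)
        # Mark the first or the last subtoken of the word.
        piece_mask[0 if mode == "first" else -1] = 1
        subtokens += pieces
        mask += piece_mask
    return subtokens + ["[SEP]"], mask + [0]


subtokens, first_mask = subword_mask(["My", "aardvark"], mode="first")
_, last_mask = subword_mask(["My", "aardvark"], mode="last")
print(subtokens)
print(first_mask, last_mask)
```

In both modes the mask contains exactly one 1 per input token, which is what allows word-level tags to be aligned with subtoken-level outputs.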
-
class
deeppavlov.models.preprocessors.bert_preprocessor.
BertRankerPreprocessor
(vocab_file: str, do_lower_case: bool = True, max_seq_length: int = 512, **kwargs)[source]¶ Tokenize text to sub-tokens, encode sub-tokens with their indices, create tokens and segment masks for ranking.
Builds features for a pair of context with each of the response candidates.
-
__call__
(batch: List[List[str]]) → List[List[bert_dp.preprocessing.InputFeatures]][source]¶ Call BERT
bert_dp.preprocessing.convert_examples_to_features()
function to tokenize and create masks.
- Parameters
batch – list of elements where the first element represents the batch with contexts and the rest of the elements represent response candidate batches
- Returns
list of feature batches with subtokens, subtoken ids, subtoken mask, segment mask.
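The expected structure of batch, and the pairing of each context with each response candidate, can be sketched as below. The helper pair_with_candidates is hypothetical; in the real preprocessor each pair would then be encoded as text_a and text_b.

```python
from typing import List, Tuple


def pair_with_candidates(batch: List[List[str]]) -> List[List[Tuple[str, str]]]:
    """Pair every context with the matching item of each candidate batch."""
    contexts, candidate_batches = batch[0], batch[1:]
    return [list(zip(contexts, candidates)) for candidates in candidate_batches]


batch = [
    ["hi there", "what time is it"],   # contexts
    ["hello", "it is noon"],           # first response candidates
    ["goodbye", "I do not know"],      # second response candidates
]
pairs = pair_with_candidates(batch)
for candidate_pairs in pairs:
    print(candidate_pairs)
```

The result has one list per candidate batch, which matches the List[List[...]] shape of the return type.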
-
-
class
deeppavlov.models.preprocessors.bert_preprocessor.
BertSepRankerPreprocessor
(vocab_file: str, do_lower_case: bool = True, max_seq_length: int = 512, **kwargs)[source]¶ Tokenize text to sub-tokens, encode sub-tokens with their indices, create tokens and segment masks for ranking.
Builds features for a context and for each of the response candidates separately.
-
__call__
(batch: List[List[str]]) → List[List[bert_dp.preprocessing.InputFeatures]][source]¶ Call BERT
bert_dp.preprocessing.convert_examples_to_features()
function to tokenize and create masks.
- Parameters
batch – list of elements where the first element represents the batch with contexts and the rest of the elements represent response candidate batches
- Returns
list of feature batches with subtokens, subtoken ids, subtoken mask, segment mask for the context and each of response candidates separately.
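For contrast with pair-wise encoding, a minimal sketch of the separate encoding is given below: the context batch and every candidate batch are each wrapped on their own, with no second segment. The helper encode_separately and the plain-string output are assumptions made for illustration.

```python
from typing import List


def encode_separately(batch: List[List[str]]) -> List[List[str]]:
    """Wrap every text on its own as [CLS] text [SEP]; no text pairs are formed."""
    return [[f"[CLS] {text} [SEP]" for text in texts] for texts in batch]


separate = encode_separately([["hi there"], ["hello"], ["goodbye"]])
print(separate)
```

The first element of the result corresponds to the contexts and the remaining elements to the response candidates.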
-
-
class
deeppavlov.models.preprocessors.bert_preprocessor.
BertSepRankerPredictorPreprocessor
(resps=None, resp_vecs=None, conts=None, cont_vecs=None, **kwargs)[source]¶ Tokenize text to sub-tokens, encode sub-tokens with their indices, create tokens and segment masks for ranking.
Builds features for a context and for each of the response candidates separately. In addition, builds features for a response (and corresponding context) text base.
- Parameters
resps – list of strings containing the base of text responses
resp_vecs – BERT vector representations of resps; if None, features for the response base will be built
conts – list of strings containing the base of text contexts
cont_vecs – BERT vector representations of conts; if None, features for the context base will be built
-
__call__
(batch: List[List[str]]) → List[List[bert_dp.preprocessing.InputFeatures]]¶ Call BERT
bert_dp.preprocessing.convert_examples_to_features()
function to tokenize and create masks.
- Parameters
batch – list of elements where the first element represents the batch with contexts and the rest of the elements represent response candidate batches
- Returns
list of feature batches with subtokens, subtoken ids, subtoken mask, segment mask for the context and each of response candidates separately.
-
class
deeppavlov.models.bert.bert_classifier.
BertClassifierModel
(bert_config_file, n_classes, keep_prob, one_hot_labels=False, multilabel=False, return_probas=False, attention_probs_keep_prob=None, hidden_keep_prob=None, optimizer=None, num_warmup_steps=None, weight_decay_rate=0.01, pretrained_bert=None, min_learning_rate=1e-06, **kwargs)[source]¶ Bert-based model for text classification.
It uses output from [CLS] token and predicts labels using linear transformation.
- Parameters
bert_config_file – path to Bert configuration file
n_classes – number of classes
keep_prob – dropout keep_prob for non-Bert layers
one_hot_labels – set True if one-hot encoding for labels is used
multilabel – set True if it is multi-label classification
return_probas – set True to return class probabilities instead of the most probable label
attention_probs_keep_prob – keep_prob for Bert self-attention layers
hidden_keep_prob – keep_prob for Bert hidden layers
optimizer – name of tf.train.* optimizer or None for AdamWeightDecayOptimizer
num_warmup_steps –
weight_decay_rate – L2 weight decay for AdamWeightDecayOptimizer
pretrained_bert – pretrained Bert checkpoint
min_learning_rate – min value of learning rate if learning rate decay is used
-
__call__
(features: List[bert_dp.preprocessing.InputFeatures]) → Union[List[int], List[List[float]]][source]¶ Make prediction for given features (texts).
- Parameters
features – batch of InputFeatures
- Returns
predicted classes or probabilities of each class
-
train_on_batch
(features: List[bert_dp.preprocessing.InputFeatures], y: Union[List[int], List[List[int]]]) → Dict[source]¶ Train model on given batch. This method calls train_op using features and y (labels).
- Parameters
features – batch of InputFeatures
y – batch of labels (class id or one-hot encoding)
- Returns
dict with loss and learning_rate values
-
deeppavlov.models.bert.bert_sequence_tagger.
token_from_subtoken
(units: tensorflow.Tensor, mask: tensorflow.Tensor) → tensorflow.Tensor[source]¶ Assemble token level units from subtoken level units
- Parameters
units – tf.Tensor of shape [batch_size, SUBTOKEN_seq_length, n_features]
mask – mask of token beginnings. For example, for tokens
[[[CLS], My, capybara, [SEP]], [[CLS], Your, aar, ##dvark, is, awesome, [SEP]]]
the mask will be
[[0, 1, 1, 0, 0, 0, 0], [0, 1, 1, 0, 1, 1, 0]]
- Returns
Units assembled from ones in the mask. For the example above, these units will correspond to the following
[[My, capybara], [Your, aar, is, awesome]]
The shape of this tensor will be [batch_size, TOKEN_seq_length, n_features]
- Return type
word_level_units
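The selection described above can be sketched in plain Python on nested lists instead of TensorFlow tensors. The name token_from_subtoken_py and the zero-padding of shorter rows are assumptions of this sketch, not a copy of the library code.

```python
from typing import List

Vector = List[float]


def token_from_subtoken_py(units: List[List[Vector]],
                           mask: List[List[int]]) -> List[List[Vector]]:
    """Keep units where mask == 1 and zero-pad rows to the longest token sequence."""
    n_features = len(units[0][0])
    selected = [[u for u, m in zip(row, row_mask) if m == 1]
                for row, row_mask in zip(units, mask)]
    max_tokens = max(len(row) for row in selected)
    return [row + [[0.0] * n_features for _ in range(max_tokens - len(row))]
            for row in selected]


# One feature per subtoken: its position, so the selection is easy to follow.
units = [[[float(i)] for i in range(7)] for _ in range(2)]
mask = [[0, 1, 1, 0, 0, 0, 0],
        [0, 1, 1, 0, 1, 1, 0]]
word_level_units = token_from_subtoken_py(units, mask)
print(word_level_units)
```

The first row keeps positions 1 and 2 and is padded to four tokens; the second row keeps positions 1, 2, 4 and 5.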
-
class
deeppavlov.models.bert.bert_sequence_tagger.
BertSequenceNetwork
(keep_prob: float, bert_config_file: str, pretrained_bert: str = None, attention_probs_keep_prob: float = None, hidden_keep_prob: float = None, encoder_layer_ids: List[int] = (-1, ), encoder_dropout: float = 0.0, optimizer: str = None, weight_decay_rate: float = 1e-06, ema_decay: float = None, ema_variables_on_cpu: bool = True, freeze_embeddings: bool = False, learning_rate: float = 0.001, bert_learning_rate: float = 2e-05, min_learning_rate: float = 1e-07, learning_rate_drop_patience: int = 20, learning_rate_drop_div: float = 2.0, load_before_drop: bool = True, clip_norm: float = 1.0, **kwargs)[source]¶ Basic class for BERT-based sequential architectures.
- Parameters
keep_prob – dropout keep_prob for non-Bert layers
bert_config_file – path to Bert configuration file
pretrained_bert – pretrained Bert checkpoint
attention_probs_keep_prob – keep_prob for Bert self-attention layers
hidden_keep_prob – keep_prob for Bert hidden layers
encoder_layer_ids – list of averaged layers from Bert encoder (layer ids)
optimizer – name of tf.train.* optimizer or None for AdamWeightDecayOptimizer
weight_decay_rate – L2 weight decay for AdamWeightDecayOptimizer
encoder_dropout – dropout probability of encoder output layer
ema_decay – decay rate of the exponential moving average applied to network parameters, a value from 0.0 to 1.0. Values closer to 1.0 put more weight on the parameter history, and values closer to 0.0 put more weight on the current parameters.
ema_variables_on_cpu – whether to put EMA variables to CPU. It may save a lot of GPU memory
freeze_embeddings – set True to not train input embeddings
learning_rate – learning rate of BERT head
bert_learning_rate – learning rate of BERT body
min_learning_rate – min value of learning rate if learning rate decay is used
learning_rate_drop_patience – how many validations with no improvements to wait
learning_rate_drop_div – the divider of the learning rate after learning_rate_drop_patience unsuccessful validations
load_before_drop – whether to load best model before dropping learning rate or not
clip_norm – clip gradients by norm
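To make the role of ema_decay concrete, here is the standard exponential moving average update written in plain Python. That the class applies exactly this formula to its variables is an assumption; the helper ema_update is hypothetical.

```python
from typing import List


def ema_update(shadow: List[float], params: List[float], decay: float) -> List[float]:
    """shadow <- decay * shadow + (1 - decay) * params, element-wise."""
    return [decay * s + (1.0 - decay) * p for s, p in zip(shadow, params)]


shadow = [0.0, 0.0]
for params in ([1.0, 2.0], [1.0, 2.0]):
    shadow = ema_update(shadow, params, decay=0.5)
print(shadow)
```

With decay close to 1.0 the averaged values move slowly and mostly reflect past parameters; with decay close to 0.0 they follow the current parameters almost immediately.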
-
train_on_batch
(input_ids: Union[List[List[int]], numpy.ndarray], input_masks: Union[List[List[int]], numpy.ndarray], *args, **kwargs) → Dict[str, float][source]¶
- Parameters
input_ids – batch of indices of subwords
input_masks – batch of masks which determine what should be attended
args – arguments passed to _build_feed_dict and corresponding to additional input and output tensors of the derived class.
kwargs – keyword arguments passed to _build_feed_dict and corresponding to additional input and output tensors of the derived class.
- Returns
dict with fields ‘loss’, ‘head_learning_rate’, and ‘bert_learning_rate’
-
class
deeppavlov.models.bert.bert_sequence_tagger.
BertSequenceTagger
(n_tags: List[str], keep_prob: float, bert_config_file: str, pretrained_bert: str = None, attention_probs_keep_prob: float = None, hidden_keep_prob: float = None, use_crf=False, encoder_layer_ids: List[int] = (-1,), encoder_dropout: float = 0.0, optimizer: str = None, weight_decay_rate: float = 1e-06, use_birnn: bool = False, birnn_cell_type: str = 'lstm', birnn_hidden_size: int = 128, ema_decay: float = None, ema_variables_on_cpu: bool = True, return_probas: bool = False, freeze_embeddings: bool = False, learning_rate: float = 0.001, bert_learning_rate: float = 2e-05, min_learning_rate: float = 1e-07, learning_rate_drop_patience: int = 20, learning_rate_drop_div: float = 2.0, load_before_drop: bool = True, clip_norm: float = 1.0, **kwargs)[source]¶ BERT-based model for text tagging. It predicts a label for every token (not subtoken) in the text. You can use it for sequence labeling tasks, such as morphological tagging or named entity recognition. See
deeppavlov.models.bert.bert_sequence_tagger.BertSequenceNetwork
for the description of inherited parameters.
- Parameters
n_tags – number of distinct tags
use_crf – whether to use CRF on top or not
use_birnn – whether to use a bidirectional RNN after BERT layers. For NER and morphological tagging we usually set it to False as otherwise the model overfits
birnn_cell_type – the type of Bidirectional RNN. Either lstm or gru
birnn_hidden_size – number of hidden units in the BiRNN layer in each direction
return_probas – set this to True if you need the probabilities instead of raw answers
-
__call__
(input_ids: Union[List[List[int]], numpy.ndarray], input_masks: Union[List[List[int]], numpy.ndarray], y_masks: Union[List[List[int]], numpy.ndarray]) → Union[List[List[int]], List[numpy.ndarray]][source]¶ Predicts tag indices for a given subword tokens batch
- Parameters
input_ids – indices of the subwords
input_masks – mask that determines where to attend and where not to
y_masks – mask which determines the first subword units in the word
- Returns
Label indices or class probabilities for each token (not subtoken)
-
class
deeppavlov.models.bert.bert_squad.
BertSQuADModel
(bert_config_file: str, keep_prob: float, attention_probs_keep_prob: Optional[float] = None, hidden_keep_prob: Optional[float] = None, optimizer: Optional[str] = None, weight_decay_rate: Optional[float] = 0.01, pretrained_bert: Optional[str] = None, min_learning_rate: float = 1e-06, **kwargs)[source]¶ Bert-based model for SQuAD-like problem setting: It predicts start and end position of answer for given question and context.
[CLS] token is used as no_answer. If model selects [CLS] token as most probable answer, it means that there is no answer in given context.
Start and end position of answer are predicted by linear transformation of Bert outputs.
- Parameters
bert_config_file – path to Bert configuration file
keep_prob – dropout keep_prob for non-Bert layers
attention_probs_keep_prob – keep_prob for Bert self-attention layers
hidden_keep_prob – keep_prob for Bert hidden layers
optimizer – name of tf.train.* optimizer or None for AdamWeightDecayOptimizer
weight_decay_rate – L2 weight decay for AdamWeightDecayOptimizer
pretrained_bert – pretrained Bert checkpoint
min_learning_rate – min value of learning rate if learning rate decay is used
-
__call__
(features: List[bert_dp.preprocessing.InputFeatures]) → Tuple[List[int], List[int], List[float], List[float]][source]¶ get predictions using features as input
- Parameters
features – batch of InputFeatures instances
- Returns
start, end positions, logits for answer and no_answer score
- Return type
predictions
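One way to read the no_answer convention is sketched below: the best span maximizes the sum of a start logit and an end logit with start <= end, and position 0 stands for [CLS]. The exhaustive search and the helper best_span are assumptions made for illustration; the decoding inside the model may differ.

```python
from typing import List, Tuple


def best_span(start_logits: List[float], end_logits: List[float]) -> Tuple[int, int, float]:
    """Return (start, end, score) maximizing start_logit + end_logit with start <= end."""
    best = (0, 0, start_logits[0] + end_logits[0])
    for s, s_logit in enumerate(start_logits):
        for e in range(s, len(end_logits)):
            score = s_logit + end_logits[e]
            if score > best[2]:
                best = (s, e, score)
    return best


# Position 0 corresponds to the [CLS] token.
start, end, score = best_span([0.1, 2.0, 0.3, 0.2], [0.0, 0.5, 3.0, 0.1])
has_answer = not (start == 0 and end == 0)
no_answer = best_span([5.0, 1.0, 0.2], [4.0, 0.3, 1.0])
print(start, end, score, has_answer)
print(no_answer)
```

In the second call both the start and the end point at [CLS], which under the convention above means that the context contains no answer.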
-
train_on_batch
(features: List[bert_dp.preprocessing.InputFeatures], y_st: List[List[int]], y_end: List[List[int]]) → Dict[source]¶ Train model on given batch. This method calls train_op using features and labels from y_st and y_end
- Parameters
features – batch of InputFeatures instances
y_st – batch of lists of ground truth answer start positions
y_end – batch of lists of ground truth answer end positions
- Returns
dict with loss and learning_rate values
-
class
deeppavlov.models.bert.bert_squad.
BertSQuADInferModel
(squad_model_config: str, vocab_file: str, do_lower_case: bool, max_seq_length: int = 512, batch_size: int = 10, lang='en', **kwargs)[source]¶ This model wraps BertSQuADModel to make predictions on sequences longer than 512 tokens.
It splits the context into chunks of length max_seq_length - 3 - len(question), preserving sentence boundaries.
It reassembles batches with chunks instead of full contexts to optimize performance, e.g.:
batch_size = 5
number_of_contexts == 2
number of first context chunks == 8
number of second context chunks == 2
we will create two batches with 5 chunks
For each context the best answer is selected via logits or scores from BertSQuADModel.
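The batch reassembly from the example above (8 + 2 chunks with batch_size = 5) can be sketched as follows. The helper rebatch_chunks and the chunk names are hypothetical; the context index is kept next to each chunk so that answers can later be grouped per context.

```python
from typing import List, Tuple


def rebatch_chunks(chunks_per_context: List[List[str]],
                   batch_size: int) -> List[List[Tuple[int, str]]]:
    """Flatten chunks of all contexts and cut them into batches of batch_size."""
    flat = [(ctx_id, chunk)
            for ctx_id, chunks in enumerate(chunks_per_context)
            for chunk in chunks]
    return [flat[i:i + batch_size] for i in range(0, len(flat), batch_size)]


chunks_per_context = [[f"c0_chunk{i}" for i in range(8)],
                      [f"c1_chunk{i}" for i in range(2)]]
batches = rebatch_chunks(chunks_per_context, batch_size=5)
print([len(b) for b in batches])
```

The second batch mixes the last three chunks of the first context with both chunks of the second one, so no batch is left partially filled.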
- Parameters
squad_model_config – path to DeepPavlov BertSQuADModel config file
vocab_file – path to Bert vocab file
do_lower_case – set True if lowercasing is needed
max_seq_length – max sequence length in subtokens, including [SEP] and [CLS] tokens
batch_size – size of batch to use during inference
lang – either en or ru, it is used to select sentence tokenizer
-
__call__
(contexts: List[str], questions: List[str], **kwargs) → Tuple[List[str], List[int], List[float]][source]¶ get predictions for given contexts and questions
- Parameters
contexts – batch of contexts
questions – batch of questions
- Returns
answer, answer start position, logits or scores
- Return type
predictions
-
class
deeppavlov.models.bert.bert_ranker.
BertRankerModel
(bert_config_file, n_classes=2, keep_prob=0.9, return_probas=True, **kwargs)[source]¶ BERT-based model for interaction-based text ranking.
Linear transformation is trained over the BERT pooled output from [CLS] token. Predicted probabilities of classes are used as a similarity measure for ranking.
- Parameters
bert_config_file – path to Bert configuration file
n_classes – number of classes
keep_prob – dropout keep_prob for non-Bert layers
return_probas – set True if class probabilities are returned instead of the most probable label
-
__call__
(features_li: List[List[bert_dp.preprocessing.InputFeatures]]) → Union[List[int], List[List[float]]][source]¶ Calculate scores for the given context over candidate responses.
- Parameters
features_li – list of elements where each element contains the batch of features for contexts with particular response candidates
- Returns
predicted scores for contexts over response candidates
-
train_on_batch
(features_li: List[List[bert_dp.preprocessing.InputFeatures]], y: Union[List[int], List[List[int]]]) → Dict[source]¶ Train the model on the given batch.
- Parameters
features_li – list with the single element containing the batch of InputFeatures
y – batch of labels (class id or one-hot encoding)
- Returns
dict with loss and learning rate values
-
class
deeppavlov.models.bert.bert_ranker.
BertSepRankerModel
(bert_config_file, keep_prob=0.9, attention_probs_keep_prob=None, hidden_keep_prob=None, optimizer=None, weight_decay_rate=0.01, pretrained_bert=None, min_learning_rate=1e-06, **kwargs)[source]¶ BERT-based model for representation-based text ranking.
BERT pooled output from [CLS] token is used to get a separate representation of a context and a response. Similarity measure is calculated as cosine similarity between these representations.
- Parameters
bert_config_file – path to Bert configuration file
keep_prob – dropout keep_prob for non-Bert layers
attention_probs_keep_prob – keep_prob for Bert self-attention layers
hidden_keep_prob – keep_prob for Bert hidden layers
optimizer – name of tf.train.* optimizer or None for AdamWeightDecayOptimizer
weight_decay_rate – L2 weight decay for AdamWeightDecayOptimizer
pretrained_bert – pretrained Bert checkpoint
min_learning_rate – min value of learning rate if learning rate decay is used
-
__call__
(features_li: List[List[bert_dp.preprocessing.InputFeatures]]) → Union[List[int], List[List[float]]][source]¶ Calculate scores for the given context over candidate responses.
- Parameters
features_li – list of elements where the first element represents the context batch of features and the rest of elements represent response candidates batches of features
- Returns
predicted scores for contexts over response candidates
-
train_on_batch
(features_li: List[List[bert_dp.preprocessing.InputFeatures]], y: Union[List[int], List[List[int]]]) → Dict[source]¶ Train the model on the given batch.
- Parameters
features_li – list with two elements, one containing the batch of context features and the other containing the batch of response features
y – batch of labels (class id or one-hot encoding)
- Returns
dict with loss and learning rate values
-
class
deeppavlov.models.bert.bert_ranker.
BertSepRankerPredictor
(bert_config_file, interact_mode=0, batch_size=32, resps=None, resp_features=None, resp_vecs=None, conts=None, cont_features=None, cont_vecs=None, **kwargs)[source]¶ Bert-based model for ranking and receiving a text response.
BERT pooled output from [CLS] token is used to get a separate representation of a context and a response. A similarity score is calculated as cosine similarity between these representations. Based on this similarity score, the text response is retrieved from a given base of possible responses (and corresponding contexts). Contexts of responses are used additionally to get the best possible result of retrieval from the base.
- Parameters
bert_config_file – path to Bert configuration file
interact_mode – mode setting a policy to retrieve the response from the base
batch_size – batch size for building response (and context) vectors over the base
keep_prob – dropout keep_prob for non-Bert layers
resps – list of strings containing the base of text responses
resp_vecs – BERT vector representations of resps; if None, they will be built
resp_features – features of resps to build their BERT vector representations
conts – list of strings containing the base of text contexts
cont_vecs – BERT vector representations of conts; if None, they will be built
cont_features – features of conts to build their BERT vector representations
-
__call__
(features_li)[source]¶ Get the context vector representation and retrieve the text response from the database.
Uses cosine similarity scores over vectors of responses (and corresponding contexts) from the base. Based on these scores retrieves the text response from the base.
- Parameters
features_li – list of elements where elements represent context batches of features
- Returns
text response with the highest similarity score and its similarity score from the response base
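Retrieval by cosine similarity can be sketched as below, using only the response vectors. The helpers cosine and retrieve and the 2-dimensional vectors are assumptions for illustration; the sketch ignores interact_mode and the context vectors of the base.

```python
import math
from typing import List, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def retrieve(context_vec: List[float], resps: List[str],
             resp_vecs: List[List[float]]) -> Tuple[str, float]:
    """Return the response whose vector is most similar to the context vector."""
    scores = [cosine(context_vec, vec) for vec in resp_vecs]
    best = max(range(len(resps)), key=scores.__getitem__)
    return resps[best], scores[best]


resps = ["hello", "goodbye"]
resp_vecs = [[1.0, 0.0], [0.0, 1.0]]
response, score = retrieve([2.0, 0.0], resps, resp_vecs)
print(response, score)
```

Because cosine similarity ignores vector length, the context vector [2.0, 0.0] matches the first response exactly.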
-
class
deeppavlov.models.bert.bert_as_summarizer.
BertAsSummarizer
(bert_config_file: str, pretrained_bert: str, vocab_file: str, max_summary_length: int, max_summary_length_in_tokens: Optional[bool] = False, max_seq_length: Optional[int] = 128, do_lower_case: Optional[bool] = False, lang: Optional[str] = 'ru', **kwargs)[source]¶ Naive Extractive Summarization model based on BERT. BERT model was trained on Masked Language Modeling (MLM) and Next Sentence Prediction (NSP) tasks. NSP head was trained to detect in
[CLS] text_a [SEP] text_b [SEP]
if text_b follows text_a in the original document.
This NSP head can be used to stack sentences from a long document, based on an initial sentence:
summary_0 = init_sentence
summary_1 = summary_0 + argmax(nsp_score(candidates))
summary_2 = summary_1 + argmax(nsp_score(candidates))
…
, where candidates are all sentences from a document.
- Parameters
bert_config_file – path to Bert configuration file
pretrained_bert – path to pretrained Bert checkpoint
vocab_file – path to Bert vocabulary
max_summary_length – limit on summary length; the number of sentences is used if max_summary_length_in_tokens is set to False, otherwise the number of tokens is used.
max_summary_length_in_tokens – use the number of tokens as the length of the summary. Defaults to False.
max_seq_length – max sequence length in subtokens, including [SEP] and [CLS] tokens. max_seq_length is used in Bert to compute NSP scores. Defaults to 128.
do_lower_case – set True if lowercasing is needed. Defaults to False.
lang – use ru_sent_tokenizer for 'ru' and nltk.sent_tokenize for other languages. Defaults to 'ru'.
-
__call__
(texts: List[str], init_sentences: Optional[List[str]] = None) → List[List[str]][source]¶ Builds summary for text from texts
- Parameters
texts – texts to build summaries for
init_sentences – init_sentence is used as the first sentence in summary. Defaults to None.
- Returns
summaries tokenized on sentences
- Return type
List[List[str]]
-
_get_nsp_predictions
(sentences: List[str], candidates: List[str])[source]¶ Compute NextSentence probability for every (sentence_i, candidate_i) pair.
[CLS] sentence_i [SEP] candidate_i [SEP]
- Parameters
sentences – list of sentences
candidates – list of candidates to be the next sentence
- Returns
probabilities that candidate is a next sentence
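The sentence stacking procedure described for BertAsSummarizer can be sketched as a greedy loop. The scorer toy_nsp_score is a hypothetical stand-in for the NSP head (it counts shared words), and removing already selected sentences from the candidates is an assumption of this sketch.

```python
from typing import Callable, List


def greedy_summary(sentences: List[str], init_sentence: str,
                   nsp_score: Callable[[str, str], float],
                   max_summary_length: int) -> List[str]:
    """Repeatedly append the candidate with the highest next-sentence score."""
    summary = [init_sentence]
    candidates = [s for s in sentences if s != init_sentence]
    while candidates and len(summary) < max_summary_length:
        text_a = " ".join(summary)
        best = max(candidates, key=lambda c: nsp_score(text_a, c))
        summary.append(best)
        candidates.remove(best)
    return summary


def toy_nsp_score(text_a: str, text_b: str) -> float:
    """Hypothetical stand-in for the NSP head: number of shared words."""
    return float(len(set(text_a.lower().split()) & set(text_b.lower().split())))


sentences = ["cats sleep a lot", "cats also purr",
             "stocks fell today", "purr means content"]
summary = greedy_summary(sentences, sentences[0], toy_nsp_score, max_summary_length=3)
print(summary)
```

Here max_summary_length counts sentences, which corresponds to max_summary_length_in_tokens set to False.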