deeppavlov.models.multitask_bert
-
class deeppavlov.dataset_readers.multitask_reader.MultiTaskReader
Class to read several datasets simultaneously.
-
class deeppavlov.dataset_iterators.multitask_iterator.MultiTaskIterator(data: dict, tasks: dict)
Merges data from several dataset iterators. During batch generation, batches from the merged dataset iterators are combined into one batch. If the merged datasets differ in size, the smaller datasets are repeated until their size equals that of the largest dataset.
- Parameters
data – dictionary whose keys are task names and whose values are dictionaries with fields "train", "valid", "test".
tasks – dictionary whose keys are task names and whose values are init params of dataset iterators.
-
data
Dictionary of data with fields “train”, “valid” and “test” (or some of them).
-
gen_batches(batch_size: int, data_type: str = 'train', shuffle: Optional[bool] = None) → Iterator[Tuple[tuple, tuple]]
Generates batches and expected outputs for training neural networks. Batches from the task iterators are combined into one batch. Every element of the largest dataset is used once, whereas smaller datasets are repeated until their size equals that of the largest dataset.
- Parameters
batch_size – number of samples in a batch
data_type – can be either ‘train’, ‘test’, or ‘valid’
shuffle – whether to shuffle the dataset before batching
- Yields
A tuple of a batch of inputs and a batch of expected outputs. Inputs and outputs are tuples. Each element of inputs or outputs is a tuple whose elements are the x values of the merged tasks, in the order in which tasks appear in the tasks argument of the __init__ method.
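A minimal sketch of the merging behaviour described for gen_batches, assuming each task's data is an in-memory list of (x, y) pairs. The names repeat_to_length and gen_merged_batches are hypothetical and not part of DeepPavlov, and shuffling is omitted. Smaller datasets are cycled so that every element of the largest dataset is used exactly once.

```python
from itertools import cycle, islice
from typing import Any, Dict, Iterator, List, Tuple

Sample = Tuple[Any, Any]  # one (x, y) pair


def repeat_to_length(samples: List[Sample], length: int) -> List[Sample]:
    """Cycle through samples until exactly `length` items are produced."""
    return list(islice(cycle(samples), length))


def gen_merged_batches(data: Dict[str, List[Sample]],
                       batch_size: int) -> Iterator[Tuple[tuple, tuple]]:
    task_names = list(data)  # task order = insertion order of the dict
    max_len = max(len(samples) for samples in data.values())
    # Every task is stretched to the size of the largest dataset.
    aligned = [repeat_to_length(data[name], max_len) for name in task_names]
    for start in range(0, max_len, batch_size):
        inputs, outputs = [], []
        for i in range(start, min(start + batch_size, max_len)):
            # Each element holds one x (or y) value per task, in task order.
            inputs.append(tuple(task[i][0] for task in aligned))
            outputs.append(tuple(task[i][1] for task in aligned))
        yield tuple(inputs), tuple(outputs)
```

For example, merging a three-sample task with a one-sample task using batch_size=2 yields two batches, and the single sample of the smaller task fills all three positions.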
-
get_instances(data_type: str = 'train')
Returns a tuple of inputs and outputs from all datasets. The lengths of inputs and outputs equal the size of the largest dataset; smaller datasets are repeated until their sizes equal that of the largest dataset.
- Parameters
data_type – can be either ‘train’, ‘test’, or ‘valid’
- Returns
a tuple of all inputs for a data type and all expected outputs for a data type
-
class deeppavlov.models.multitask_bert.multitask_bert.MultiTaskBert(*args, **kwargs)
The component for multi-task BERT. It builds the BERT body and launches building of the BERT heads.
The component aggregates components implementing BERT heads. The head components are called tasks. The __call__ and train_on_batch methods of MultiTaskBert are used for inference and training of BERT heads. BERT head components, which are derived from MTBertTask, can be used only inside this class. One training iteration consists of one train_on_batch call for every task.
If inference_task_names is None, the component is created for training. Otherwise, the component is created for inference. If the component is created for inference, several tasks can be run simultaneously. For an explanation, see the description of the inference_task_names parameter.
- Parameters
tasks – a dictionary whose keys are task names and whose values are objects of MTBertTask subclasses. Task names are used as variable scopes in the computational graph, so it is important to use the same names in the multi-task BERT train and inference configuration files.
bert_config_file – path to BERT configuration file
pretrained_bert – pre-trained BERT checkpoint
attention_probs_keep_prob – keep_prob for BERT self-attention layers
hidden_keep_prob – keep_prob for BERT hidden layers
body_learning_rate – learning rate of BERT body
min_body_learning_rate – min value of body learning rate if learning rate decay is used
learning_rate_drop_patience – how many validations with no improvement to wait
learning_rate_drop_div – the divider of the learning rate after learning_rate_drop_patience unsuccessful validations
load_before_drop – whether to load the best model before dropping the learning rate or not
clip_norm – clip gradients by norm
freeze_embeddings – set to False to train input embeddings
inference_task_names – names of tasks on which inference is done. If this parameter is provided, the component is created for inference; otherwise, it is created for training.
If inference_task_names is a string, then it is the name of a task called separately from other tasks (in an individual tf.Session.run call).
If inference_task_names is a list, then its elements are either strings or lists of strings, and the two options can be combined. For example, ["task_name1", ["task_name2", "task_name3"], ["task_name4", "task_name5"]].
If an element of the inference_task_names list is a string, the element is the name of a task that is computed when the __call__ method is called.
If an element of the inference_task_names parameter is a list of strings ["task_name1", "task_name2", ...], then the tasks "task_name1", "task_name2" and so on are run simultaneously in one tf.Session.run call. This option is available if the tasks "task_name1", "task_name2" and so on have common inputs. Even though the tasks share inputs, if positional arguments are used in the __call__ and train_on_batch methods, all arguments are passed individually. For instance, if "task_name1", "task_name2", and "task_name3" all take an argument named x in the model pipe, then the __call__ method takes the arguments (x, x, x).
in_distribution –
The distribution of the variables listed in the "in" config parameter between tasks. in_distribution can be None if only 1 task is called. In that case, all variables listed in "in" are arguments of that task.
in_distribution can be a dictionary of int. If that is the case, then the keys of in_distribution are task names and the values are the numbers of variables from the "in" config parameter that are inputs of the corresponding task. The variables in the "in" parameter have to be in the same order as the tasks are listed in in_distribution.
in_distribution can be a dictionary of lists of str. The strings are names of variables from the "in" configuration parameter. If the "in" parameter is a list, then in_distribution works the same way as when in_distribution is a dictionary of int: the values of in_distribution, which are lists, are replaced by their lengths. If the "in" parameter in the component config is a dictionary, then the order of strings in the in_distribution values has to match the order of arguments of the train_on_batch and get_sess_run_infer_args methods of the task components.
in_y_distribution – the same as in_distribution, but for the "in_y" config parameter.
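The grouping rules for inference_task_names can be illustrated with a small sketch. The helper normalize_task_names is hypothetical and not part of DeepPavlov: it only shows how a string, a list of strings, or a mixed list maps to groups, where each inner list stands for the tasks that would share one tf.Session.run call.

```python
from typing import List, Union

TaskNames = Union[str, List[Union[str, List[str]]]]


def normalize_task_names(task_names: TaskNames) -> List[List[str]]:
    """Return groups of task names; each group would share one session run."""
    if isinstance(task_names, str):
        # A single string: one task, called on its own.
        return [[task_names]]
    groups = []
    for element in task_names:
        if isinstance(element, str):
            groups.append([element])      # task run independently
        else:
            groups.append(list(element))  # tasks with common inputs, run together
    return groups
```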
-
train_on_batch(*args, **kwargs) → Dict[str, Dict[str, float]]
Calls the train_on_batch method of every task. This method takes args or kwargs, but not both. The order of args follows the order of tasks in the component parameters:
args = [task1_in_x[0], task1_in_x[1], task1_in_x[2], ..., task1_in_y[0], task1_in_y[1], ..., task2_in_x[0], ...]
If kwargs are used and the in_distribution and in_y_distribution attributes are dictionaries of lists of strings, then the keys of kwargs have to be the same as the strings in in_distribution and in_y_distribution. If in_distribution and in_y_distribution are dictionaries of int, then kwargs values are treated the same way as args.
- Parameters
args – task inputs and expected outputs
kwargs – task inputs and expected outputs
- Returns
dictionary of dictionaries with task losses and learning rates.
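A minimal sketch of how a flat args list in the documented order could be split per task. It assumes in_distribution and in_y_distribution are dictionaries (of int or of lists of str) and that tasks are taken in the key order of in_distribution; to_counts and split_train_args are hypothetical helpers, not the component's internals.

```python
from typing import Any, Dict, List, Sequence, Tuple, Union

Distribution = Union[Dict[str, int], Dict[str, List[str]]]


def to_counts(distribution: Distribution) -> Dict[str, int]:
    """Replace list values (variable names) by their lengths; keep int values."""
    return {task: value if isinstance(value, int) else len(value)
            for task, value in distribution.items()}


def split_train_args(args: Sequence[Any],
                     in_distribution: Distribution,
                     in_y_distribution: Distribution) -> Dict[str, Tuple[list, list]]:
    """Split [t1_x..., t1_y..., t2_x..., t2_y..., ...] into per-task (x, y) lists."""
    x_counts = to_counts(in_distribution)
    y_counts = to_counts(in_y_distribution)
    per_task: Dict[str, Tuple[list, list]] = {}
    pos = 0
    for task, n_x in x_counts.items():
        n_y = y_counts.get(task, 0)
        x = list(args[pos:pos + n_x])
        y = list(args[pos + n_x:pos + n_x + n_y])
        per_task[task] = (x, y)
        pos += n_x + n_y
    if pos != len(args):
        raise ValueError(f"expected {pos} positional arguments, got {len(args)}")
    return per_task
```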
-
__call__(*args, **kwargs)
Calls one or several BERT heads depending on the provided task names. args and kwargs contain inputs of BERT tasks and cannot be used together. If args are used, their content has to be
args = [task1_in_x[0], task1_in_x[1], ..., task2_in_x[0], task2_in_x[1], ...]
If kwargs are used and in_distribution is a dictionary of int, then the order of kwargs has to be the same as the args order described in the previous paragraph. If in_distribution is a dictionary of lists of str, then all variable names listed in the in_distribution values have to be present in kwargs keys.
- Returns
list of results of called tasks.
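When in_distribution is a dictionary of lists of str, kwargs are looked up by variable name. The hypothetical helper gather_task_inputs below sketches that lookup under this assumption: an input shared by several tasks is passed to each of them, and a missing name raises an error. It is an illustration, not the actual MultiTaskBert code.

```python
from typing import Any, Dict, List


def gather_task_inputs(kwargs: Dict[str, Any],
                       in_distribution: Dict[str, List[str]],
                       task_names: List[str]) -> Dict[str, List[Any]]:
    """Collect positional inputs for each called task from kwargs by variable name."""
    required = {name for task in task_names for name in in_distribution[task]}
    missing = required - set(kwargs)
    if missing:
        raise KeyError(f"missing inputs: {sorted(missing)}")
    # Order inside each list follows in_distribution, i.e. the task's argument order.
    return {task: [kwargs[name] for name in in_distribution[task]]
            for task in task_names}
```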
-
call(args: Tuple[Any], kwargs: Dict[str, Any], task_names: Optional[Union[List[str], str]], in_distribution: Optional[Union[Dict[str, int], Dict[str, List[str]]]] = None)
Calls one or several BERT heads depending on the task names provided in the task_names parameter. args and kwargs contain inputs of BERT tasks and cannot be used simultaneously. If args are used, their content has to be
args = [task1_in_x[0], task1_in_x[1], ..., task2_in_x[0], task2_in_x[1], ...]
If kwargs are used, the kwargs keys have to match the content of the in_names params of the called tasks.
- Parameters
args – generally, the args parameter of the __call__ method of this component or of MTBertReUser. Inputs of one or several tasks. Has to be empty if the kwargs argument is used.
kwargs – generally, the kwargs parameter of the __call__ method of this component or of MTBertReUser. Inputs of one or several tasks. Has to be empty if the args argument is used.
task_names – names of the tasks that are called. If str, then 1 task is called. If a task name is an element of the task_names list, then this task is run independently. If an element of task_names is a list of strings, then the tasks in the inner list are run simultaneously.
in_distribution – a distribution of variables from the "in" config parameter between tasks. For details, see the __init__ method docstring.
- Returns
list of results of called tasks.
-
class deeppavlov.models.multitask_bert.multitask_bert.MTBertTask(keep_prob: float = 1.0, return_probas: Optional[bool] = None, learning_rate: float = 0.001)
Abstract class for multi-task BERT tasks. Objects of its subclasses are linked with the BERT body when the MultiTaskBert.build method is called. Training is performed when the MultiTaskBert.train_on_batch method is called. The objects of classes derived from MTBertTask don't have a __call__ method. Instead, they have get_sess_run_infer_args and post_process_preds methods, which are called from the call method of the MultiTaskBert class. The get_sess_run_infer_args method returns fetches and feed_dict for inference, and the post_process_preds method retrieves predictions from the computed fetches. Classes derived from MTBertTask must implement a get_sess_run_train_args method that returns fetches and feed_dict for training.
- Parameters
keep_prob – dropout keep_prob for non-BERT layers
return_probas – set this to True if you need the probabilities instead of raw answers
learning_rate – learning rate of BERT head
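The division of labour between get_sess_run_infer_args, post_process_preds and the caller can be sketched without TensorFlow. In this hypothetical example, SketchTask and ThresholdTask stand in for MTBertTask and a subclass, string names stand in for tensors and placeholders, and fake_run stands in for tf.Session.run; none of these names exist in DeepPavlov.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, List, Tuple

Fetches = List[str]        # stand-in for List[tf.Tensor]
FeedDict = Dict[str, Any]  # stand-in for Dict[tf.placeholder, Any]


class SketchTask(ABC):
    """Hypothetical, TensorFlow-free mirror of the MTBertTask contract."""

    def __init__(self, return_probas: bool = False):
        self.return_probas = return_probas

    @abstractmethod
    def get_sess_run_infer_args(self, *args) -> Tuple[Fetches, FeedDict]: ...

    @abstractmethod
    def get_sess_run_train_args(self, *args) -> Tuple[Fetches, FeedDict]: ...

    def post_process_preds(self, sess_run_res: List[Any]) -> Any:
        # Only one fetch is requested at inference time in this sketch.
        return sess_run_res[0]


class ThresholdTask(SketchTask):
    def get_sess_run_infer_args(self, scores):
        fetch = "probas" if self.return_probas else "preds"
        return [fetch], {"scores": scores}

    def get_sess_run_train_args(self, scores, labels, body_learning_rate):
        # Task inputs first (same order as for inference), then expected outputs.
        return ["loss"], {"scores": scores, "labels": labels, "lr": body_learning_rate}


def fake_run(fetches: Fetches, feed_dict: FeedDict) -> List[Any]:
    """Stand-in for tf.Session.run: evaluates each named fetch on feed_dict."""
    graph: Dict[str, Callable[[FeedDict], Any]] = {
        "probas": lambda fd: [min(max(s, 0.0), 1.0) for s in fd["scores"]],
        "preds": lambda fd: [int(s > 0.5) for s in fd["scores"]],
        "loss": lambda fd: sum(abs(s - t) for s, t in zip(fd["scores"], fd["labels"])) / len(fd["labels"]),
    }
    return [graph[name](feed_dict) for name in fetches]


def infer(task: SketchTask, *inputs) -> Any:
    # In the real component, MultiTaskBert.call plays this role.
    fetches, feed_dict = task.get_sess_run_infer_args(*inputs)
    return task.post_process_preds(fake_run(fetches, feed_dict))
```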
-
build(bert_body: bert_dp.modeling.BertModel, optimizer_params: Dict[str, Union[str, float]], shared_placeholders: Dict[str, tensorflow.placeholder], sess: tensorflow.Session, mode: str, get_train_op_func: Callable, freeze_embeddings: bool, bert_head_variable_scope: str) → None
Initiates building of the BERT head and initializes the optimizer parameters and the placeholders that are common for all tasks.
- Parameters
bert_body – instance of BertModel.
optimizer_params – a dictionary with four fields:
'optimizer' (str) – the name of the optimizer class,
'body_learning_rate' (float) – initial value of the BERT body learning rate,
'min_body_learning_rate' (float) – min BERT body learning rate for learning rate decay,
'weight_decay_rate' (float) – L2 weight decay for AdamWeightDecayOptimizer.
shared_placeholders – a dictionary with placeholders used in all tasks. The dictionary contains the fields 'input_ids', 'input_masks', 'learning_rate', 'keep_prob', 'is_train', 'token_types'.
sess – current tf.Session instance
mode – 'train' or 'inference'
get_train_op_func – a function returning a tf.Operation, with a signature similar to LRScheduledTFModel.get_train_op without the self argument. It returns the train operation for the specified loss and variable scopes.
freeze_embeddings – set to False to train input embeddings.
bert_head_variable_scope – variable scope for the BERT head.
-
abstract _init_graph() → None
Builds the BERT head, initializes task-specific placeholders, and creates attributes containing output probabilities and model loss. The optimizer is initialized not in this method but in _init_optimizer.
-
get_train_op(loss: tensorflow.Tensor, body_learning_rate: Union[tensorflow.Tensor, float], **kwargs) → tensorflow.Operation
Returns the operation for task training. The head learning rate is calculated as the product of body_learning_rate and the quotient of the initial head learning rate and the initial body learning rate.
- Parameters
loss – the task loss
body_learning_rate – the learning rate for the BERT body
- Returns
train operation for the task
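The head learning rate rule amounts to keeping a fixed ratio between the head and body learning rates, so any decay of the body learning rate is applied proportionally to the head. A minimal sketch; the function name head_learning_rate is hypothetical.

```python
def head_learning_rate(body_learning_rate: float,
                       initial_head_lr: float,
                       initial_body_lr: float) -> float:
    """Scale the current body learning rate by the initial head/body ratio."""
    return body_learning_rate * (initial_head_lr / initial_body_lr)
```

For example, with a hypothetical initial body learning rate of 1e-5 and the MTBertSequenceTaggingTask default head learning rate of 1e-3, decaying the body learning rate to 5e-6 gives a head learning rate of about 5e-4.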
-
train_on_batch(*args, **kwargs) → Dict[str, float]
Trains the task on one batch. This method works correctly if you override get_sess_run_train_args for your task.
- Parameters
kwargs – the keys are body_learning_rate and the "in" and "in_y" params of the task.
- Returns
dictionary with the calculated task loss and the body and head learning rates.
-
abstract get_sess_run_infer_args(*args) → Tuple[List[tensorflow.Tensor], Dict[tensorflow.placeholder, Any]]
Returns fetches and feed_dict for inference. Fetches are lists of tensors, and feed_dict is a dictionary with the placeholder values required for computing the fetches. The method is used inside the MultiTaskBert __call__ method.
If self.return_probas is True, the fetches contain the probabilities tensor; otherwise, they contain the predictions tensor. Overriding methods take task inputs as positional arguments.
ATTENTION! Let the get_sess_run_infer_args method have n_x_args arguments. Then the order of the first n_x_args arguments of the get_sess_run_train_args method has to match the order of the get_sess_run_infer_args arguments.
- Parameters
args – task inputs.
- Returns
fetches and feed_dict
-
abstract get_sess_run_train_args(*args) → Tuple[List[tensorflow.Tensor], Dict[tensorflow.placeholder, Any]]
Returns fetches and feed_dict for the task train_on_batch method. Overriding methods take task inputs as positional arguments.
ATTENTION! Let the get_sess_run_infer_args method have n_x_args arguments. Then the order of the first n_x_args arguments of the get_sess_run_train_args method has to match the order of the get_sess_run_infer_args arguments.
- Parameters
args – task inputs followed by expected outputs.
- Returns
fetches and feed_dict
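The ATTENTION note can be checked mechanically. The sketch below is a hypothetical helper, not part of DeepPavlov; it compares parameter names obtained with inspect.signature, assuming that matching names in matching positions is what "matching order" means for your task. The example classes reuse the documented argument names of MTBertSequenceTaggingTask.

```python
import inspect
from typing import Callable, List


def positional_names(func: Callable) -> List[str]:
    """Names of positional parameters, excluding `self`, *args and **kwargs."""
    kinds = (inspect.Parameter.POSITIONAL_ONLY, inspect.Parameter.POSITIONAL_OR_KEYWORD)
    return [p.name for p in inspect.signature(func).parameters.values()
            if p.kind in kinds and p.name != "self"]


def check_arg_order(task_cls: type) -> bool:
    """True if the train args start with the infer args, in the same order."""
    infer_args = positional_names(task_cls.get_sess_run_infer_args)
    train_args = positional_names(task_cls.get_sess_run_train_args)
    return train_args[:len(infer_args)] == infer_args


class GoodTagger:  # hypothetical: inputs first, then expected outputs
    def get_sess_run_infer_args(self, input_ids, input_masks, y_masks): ...
    def get_sess_run_train_args(self, input_ids, input_masks, y_masks, y, body_learning_rate): ...


class BadTagger:  # hypothetical: y placed before the inputs breaks the rule
    def get_sess_run_infer_args(self, input_ids, input_masks, y_masks): ...
    def get_sess_run_train_args(self, y, input_ids, input_masks, y_masks, body_learning_rate): ...
```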
-
class deeppavlov.models.multitask_bert.multitask_bert.MTBertSequenceTaggingTask(n_tags: Optional[int] = None, use_crf: Optional[bool] = None, use_birnn: bool = False, birnn_cell_type: str = 'lstm', birnn_hidden_size: int = 128, keep_prob: float = 1.0, encoder_dropout: float = 0.0, return_probas: Optional[bool] = None, encoder_layer_ids: Optional[List[int]] = None, learning_rate: float = 0.001)
BERT head for text tagging. It predicts a label for every token (not subtoken) in the text. You can use it for sequence labelling tasks, such as morphological tagging or named entity recognition. Objects of this class should be passed to the constructor of the MultiTaskBert class in the tasks param.
- Parameters
n_tags – number of distinct tags
use_crf – whether to use CRF on top or not
use_birnn – whether to use a bidirectional RNN after the BERT layers. For NER and morphological tagging we usually set it to False, as otherwise the model overfits
birnn_cell_type – the type of the bidirectional RNN. Either "lstm" or "gru"
birnn_hidden_size – number of hidden units in the BiRNN layer in each direction
keep_prob – dropout keep_prob for non-BERT layers
encoder_dropout – dropout probability of the encoder output layer
return_probas – set this to True if you need the probabilities instead of raw answers
encoder_layer_ids – list of averaged layers from the BERT encoder (layer ids)
optimizer – name of tf.train.* optimizer or None for AdamWeightDecayOptimizer
weight_decay_rate – L2 weight decay for AdamWeightDecayOptimizer
learning_rate – learning rate of BERT head
-
get_sess_run_infer_args(input_ids: Union[List[List[int]], numpy.ndarray], input_masks: Union[List[List[int]], numpy.ndarray], y_masks: Union[List[List[int]], numpy.ndarray]) → Tuple[List[tensorflow.Tensor], Dict[tensorflow.placeholder, Any]]
Returns fetches and feed_dict for model inference. The method is called from MultiTaskBert.__call__.
- Parameters
input_ids – indices of the subwords in vocabulary
input_masks – mask that determines where to attend and where not to
y_masks – mask which determines the first subword units in the word
- Returns
list of fetches and feed_dict
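The role of y_masks can be illustrated with plain lists. Assuming a mask value of 1 marks the first subword of each word and 0 marks continuation subwords and special tokens, word-level predictions are obtained by keeping only the masked positions. The helper word_level and the example tokenization are hypothetical; this is not the task's actual decoding code.

```python
from typing import List, Sequence


def word_level(subtoken_preds: Sequence[Sequence[int]],
               y_masks: Sequence[Sequence[int]]) -> List[List[int]]:
    """Keep predictions only where y_mask is 1 (first subword of each word)."""
    return [[pred for pred, keep in zip(preds, mask) if keep]
            for preds, mask in zip(subtoken_preds, y_masks)]


# Hypothetical subtokens: [CLS] Deep ##Pav ##lov rocks [SEP]
y_masks = [[0, 1, 0, 0, 1, 0]]
subtoken_preds = [[0, 3, 4, 4, 1, 0]]
print(word_level(subtoken_preds, y_masks))  # [[3, 1]]
```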
-
get_sess_run_train_args(input_ids: Union[List[List[int]], numpy.ndarray], input_masks: Union[List[List[int]], numpy.ndarray], y_masks: Union[List[List[int]], numpy.ndarray], y: Union[List[List[int]], numpy.ndarray], body_learning_rate: float) → Tuple[List[tensorflow.Tensor], Dict[tensorflow.placeholder, Any]]
Returns fetches and feed_dict for the model train_on_batch method.
- Parameters
input_ids – indices of the subwords in vocabulary
input_masks – mask that determines where to attend and where not to
y_masks – mask which determines the first subword units in the word
y – indices of ground truth tags
body_learning_rate – learning rate for BERT body
- Returns
list of fetches and feed_dict
-
post_process_preds(sess_run_res: List[numpy.ndarray]) → Union[List[List[int]], List[numpy.ndarray]]
Decodes CRF if needed and returns predictions or probabilities.
- Parameters
sess_run_res – list of computed fetches gathered by get_sess_run_infer_args
- Returns
predictions or probabilities depending on the return_probas attribute
-
class deeppavlov.models.multitask_bert.multitask_bert.MTBertClassificationTask(n_classes: Optional[int] = None, return_probas: Optional[bool] = None, one_hot_labels: Optional[bool] = None, keep_prob: float = 1.0, multilabel: bool = False, learning_rate: float = 2e-05, optimizer: str = 'Adam')
Task for text classification.
It uses the output of the [CLS] token and predicts labels using a linear transformation.
- Parameters
n_classes – number of classes
return_probas – set to True if class probabilities are needed instead of the most probable label
one_hot_labels – set to True if one-hot encoding for labels is used
keep_prob – dropout keep_prob for non-BERT layers
multilabel – set to True for multi-label classification
learning_rate – learning rate of BERT head
optimizer – name of tf.train.* optimizer or None for AdamWeightDecayOptimizer
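A framework-free sketch of the head described above: a linear transformation of the [CLS] vector followed by softmax or, for multilabel, an independent sigmoid per class. The function names, the pure-Python arithmetic, and the 0.5 threshold for multilabel predictions are assumptions for illustration; the actual task is built as a TensorFlow graph.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def classification_head(cls_vectors: List[List[float]], weights: List[List[float]],
                        bias: List[float], multilabel: bool = False,
                        return_probas: bool = False) -> list:
    results = []
    for vec in cls_vectors:
        # Linear transformation: logits[c] = sum_h vec[h] * weights[h][c] + bias[c]
        logits = [sum(v * row[c] for v, row in zip(vec, weights)) + bias[c]
                  for c in range(len(bias))]
        probas = [sigmoid(z) for z in logits] if multilabel else softmax(logits)
        if return_probas:
            results.append(probas)
        elif multilabel:
            results.append([int(p > 0.5) for p in probas])  # assumed threshold
        else:
            results.append(max(range(len(probas)), key=probas.__getitem__))  # argmax
    return results
```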
-
get_sess_run_infer_args(features: List[bert_dp.preprocessing.InputFeatures]) → Tuple[List[tensorflow.Tensor], Dict[tensorflow.placeholder, Any]]
Returns fetches and feed_dict for model inference. The method is called from MultiTaskBert.__call__.
- Parameters
features – text features created by BERT preprocessor.
- Returns
list of fetches and feed_dict
-
get_sess_run_train_args(features: List[bert_dp.preprocessing.InputFeatures], y: Union[List[int], List[List[int]]], body_learning_rate: float) → Tuple[List[tensorflow.Tensor], Dict[tensorflow.placeholder, Any]]
Returns fetches and feed_dict for the model train_on_batch method.
- Parameters
features – text features created by BERT preprocessor.
y – batch of labels (class id or one-hot encoding)
body_learning_rate – learning rate for BERT body
- Returns
list of fetches and feed_dict
-
class deeppavlov.models.multitask_bert.multitask_bert.MTBertReUser(mt_bert: deeppavlov.models.multitask_bert.multitask_bert.MultiTaskBert, task_names: Union[str, List[Union[List[str], str]]], in_distribution: Optional[Union[Dict[str, int], Dict[str, List[str]]]] = None, *args, **kwargs)
Instances of this class are used for multi-task BERT inference. In an inference config, the MultiTaskBert class may not perform inference for some tasks. For example, you may need to apply two models with BERT sequentially. In that case, mt_bert_reuser is created to call the remaining tasks.
- Parameters
mt_bert – an instance of MultiTaskBert
task_names – names of inferred tasks. If task_names is a str, then task_names is the name of the only inferred task. If task_names is a list, then its elements can be either strings or lists of strings. If an element of task_names is a string, then this element is the name of a task that is run independently. If an element of task_names is a list of strings, then the element is a list of names of tasks that have common inputs and are run simultaneously. For detailed information, see the inference_task_names parameter of MultiTaskBert.
-
__call__(*args, **kwargs) → List[Any]
Infers the tasks listed in the task_names parameter. One of the parameters args and kwargs has to be empty.
- Parameters
args – inputs and labels of inferred tasks.
kwargs – inputs and labels of inferred tasks.
- Returns
list of results of inference of the tasks listed in task_names
-
class deeppavlov.models.multitask_bert.multitask_bert.InputSplitter(keys_to_extract: Union[List[str], Tuple[str, …]], **kwargs)
An instance of this class in a pipe splits a batch of sequences of identical length, or of dictionaries with identical keys, into a tuple of batches.
- Parameters
keys_to_extract – a sequence of ints or strings that have to match keys of split dictionaries.
-
__call__(inp: Union[List[dict], List[List[int]], List[Tuple[int]]]) → List[list]
Returns batches of values from inp. Every batch contains the values that have the same key from the keys_to_extract attribute. The order of elements of keys_to_extract is preserved.
- Parameters
inp – A sequence of dictionaries with identical keys
- Returns
A list of lists of values of dictionaries from inp
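A minimal sketch of the splitting InputSplitter performs, assuming inp is a batch of dictionaries (or sequences) and keys_to_extract holds their keys (or integer indices). split_inputs is a hypothetical stand-in, not the class itself.

```python
from typing import Any, List, Mapping, Sequence, Union


def split_inputs(keys_to_extract: Sequence[Union[str, int]],
                 inp: Sequence[Union[Mapping[str, Any], Sequence[Any]]]) -> List[list]:
    """Return one batch per key, preserving the order of keys_to_extract."""
    return [[item[key] for item in inp] for key in keys_to_extract]
```

For a batch of two dictionaries with keys "input_ids" and "y", split_inputs(["y", "input_ids"], batch) returns the batch of "y" values followed by the batch of "input_ids" values.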