Features¶
Models¶
NER model [docs]¶
Named Entity Recognition in DeepPavlov is solved with a BERT-based model. The model predicts a tag (in BIO format) for each token in the input.
BERT-based model is described in BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.
| Dataset | Lang | Model | Test F1 |
|---|---|---|---|
| Persons-1000 dataset with additional LOC and ORG markup (Collection 3) | Ru | | 97.9 |
| | | | 88.4 ± 0.5 |
| | | | 93.3 ± 0.3 |
| OntoNotes | Multi | | 88.9 |
| | En | | 89.2 |
| CoNLL-2003 | En | | 91.7 |
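For readers new to the tagging scheme: BIO marks the first token of an entity as `B-<type>`, continuation tokens as `I-<type>`, and everything else as `O`. A minimal illustrative sketch in plain Python (not the DeepPavlov API):

```python
def spans_to_bio(tokens, spans):
    """Convert (start, end, label) token-index spans to BIO tags.

    `end` is exclusive; tokens outside any span get the "O" tag.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"        # first token of the entity
        for i in range(start + 1, end):   # continuation tokens
            tags[i] = f"I-{label}"
    return tags

tokens = ["John", "lives", "in", "New", "York"]
spans = [(0, 1, "PER"), (3, 5, "LOC")]
print(spans_to_bio(tokens, spans))
# → ['B-PER', 'O', 'O', 'B-LOC', 'I-LOC']
```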
Classification model [docs]¶
Model for classification tasks (intents, sentiment, etc.) at the word level. Shallow-and-wide CNN, Deep CNN, BiLSTM, BiLSTM with self-attention, and other models are available. The model also supports multilabel classification of texts. Several pre-trained models are listed in the table below.
| Task | Dataset | Lang | Model | Metric | Valid | Test | Downloads |
|---|---|---|---|---|---|---|---|
| Insult detection | | En | | ROC-AUC | 0.9327 | 0.8602 | 1.1 Gb |
| Sentiment | | | | Accuracy | 0.6293 | 0.6626 | 1.1 Gb |
| Sentiment | | Ru | | Accuracy | 0.9918 | 0.9923 | 5.8 Gb |
| | | | | F1-weighted | 0.6787 | 0.7005 | 1.3 Gb |
| | | | | | 0.739 | 0.7724 | 1.5 Gb |
| | | | | | 0.703 ± 0.0031 | 0.7348 ± 0.0028 | 690 Mb |
| | | | | | 0.7376 ± 0.0045 | 0.7645 ± 0.035 | 1.0 Gb |
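The multilabel mode mentioned above amounts to thresholding a per-class probability for each label instead of taking a single argmax. A minimal sketch in plain Python (illustrative only, not DeepPavlov code; the label names and the 0.5 threshold are assumptions):

```python
import math

def multilabel_predict(logits, labels, threshold=0.5):
    """Return every label whose sigmoid probability exceeds `threshold`."""
    probs = [1 / (1 + math.exp(-z)) for z in logits]
    return [lab for lab, p in zip(labels, probs) if p > threshold]

labels = ["toxic", "insult", "threat"]
print(multilabel_predict([2.0, 0.3, -1.5], labels))
# sigmoid(2.0)≈0.88 and sigmoid(0.3)≈0.57 pass, sigmoid(-1.5)≈0.18 does not
# → ['toxic', 'insult']
```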
As no intent-recognition results for the DSTC-2 data had been published, the presented model is compared on the SNIPS dataset. Model scores were evaluated in the same way as in 3 to allow comparison with the results reported by the dataset authors. The results were achieved by tuning model parameters and using embeddings trained on a Reddit corpus.
| Model | AddToPlaylist | BookRestaurant | GetWeather | PlayMusic | RateBook | SearchCreativeWork | SearchScreeningEvent |
|---|---|---|---|---|---|---|---|
| api.ai | 0.9931 | 0.9949 | 0.9935 | 0.9811 | 0.9992 | 0.9659 | 0.9801 |
| ibm.watson | 0.9931 | 0.9950 | 0.9950 | 0.9822 | 0.9996 | 0.9643 | 0.9750 |
| microsoft.luis | 0.9943 | 0.9935 | 0.9925 | 0.9815 | 0.9988 | 0.9620 | 0.9749 |
| wit.ai | 0.9877 | 0.9913 | 0.9921 | 0.9766 | 0.9977 | 0.9458 | 0.9673 |
| snips.ai | 0.9873 | 0.9921 | 0.9939 | 0.9729 | 0.9985 | 0.9455 | 0.9613 |
| recast.ai | 0.9894 | 0.9943 | 0.9910 | 0.9660 | 0.9981 | 0.9424 | 0.9539 |
| amazon.lex | 0.9930 | 0.9862 | 0.9825 | 0.9709 | 0.9981 | 0.9427 | 0.9581 |
| Shallow-and-wide CNN | 0.9956 | 0.9973 | 0.9968 | 0.9871 | 0.9998 | 0.9752 | 0.9854 |
Automatic spelling correction model [docs]¶
Pipelines that use candidate search in a static dictionary and an ARPA language model to correct spelling errors.
Note
About 4.4 GB of disk space is required for the Russian language model and about 7 GB for the English one.
Comparison on the test set for the SpellRuEval competition on Automatic Spelling Correction for Russian:
| Correction method | Precision | Recall | F-measure | Speed (sentences/s) |
|---|---|---|---|---|
| Yandex.Speller | 83.09 | 59.86 | 69.59 | |
| | 53.26 | 53.74 | 53.50 | 29.3 |
| Hunspell + lm | 41.03 | 48.89 | 44.61 | 2.1 |
| JamSpell | 44.57 | 35.69 | 39.64 | 136.2 |
| Hunspell | 30.30 | 34.02 | 32.06 | 20.3 |
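The candidate-search-plus-language-model approach can be sketched in a few lines: generate candidates within edit distance 1, keep only those found in the dictionary, and pick the one the language model scores highest. The sketch below is illustrative, not the DeepPavlov pipeline; a toy unigram frequency table stands in for the ARPA language model:

```python
def edits1(word, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """All strings within Damerau-Levenshtein distance 1 of `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    replaces = [l + c + r[1:] for l, r in splits if r for c in alphabet]
    inserts = [l + c + r for l, r in splits for c in alphabet]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    return set(deletes + replaces + inserts + transposes)

def correct(word, freqs):
    """Pick the in-dictionary candidate with the highest unigram score."""
    candidates = ({word} | edits1(word)) & freqs.keys()
    return max(candidates, key=freqs.get) if candidates else word

freqs = {"the": 1000, "they": 120, "then": 80}
print(correct("teh", freqs))  # → 'the' (transposition fixed)
```

A real pipeline would score whole sentences with the n-gram language model rather than single words, but the candidate generation step works the same way.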
Ranking model [docs]¶
Available pre-trained models for paraphrase identification:
| Dataset | Model config | Val (accuracy) | Test (accuracy) | Val (F1) | Test (F1) | Val (log_loss) | Test (log_loss) | Downloads |
|---|---|---|---|---|---|---|---|---|
| | | 89.8 | 84.2 | 92.2 | 87.4 | – | – | 1325M |
| | | 76.1 ± 0.2 | 64.5 ± 0.5 | 81.8 ± 0.2 | 73.9 ± 0.8 | – | – | 618M |
| | | 86.5 ± 0.5 | 78.9 ± 0.4 | 89.6 ± 0.3 | 83.2 ± 0.5 | – | – | 930M |
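At inference time, ranking models of this kind score a query against every candidate and sort candidates by that score. The toy sketch below substitutes word-overlap cosine similarity for a learned encoder, so it illustrates only the ranking loop, not the actual model:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) \
         * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank(query, candidates):
    """Return candidates sorted by similarity to the query, best first."""
    q = Counter(query.lower().split())
    scored = [(cosine(q, Counter(c.lower().split())), c) for c in candidates]
    return [c for _, c in sorted(scored, reverse=True)]

candidates = ["what is your name", "how old are you", "what is the weather"]
print(rank("tell me your name", candidates)[0])  # → 'what is your name'
```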
References:
Yu Wu, Wei Wu, Ming Zhou, and Zhoujun Li. 2017. Sequential match network: A new architecture for multi-turn response selection in retrieval-based chatbots. In ACL, pages 372–381. https://www.aclweb.org/anthology/P17-1046
Xiangyang Zhou, Lu Li, Daxiang Dong, Yi Liu, Ying Chen, Wayne Xin Zhao, Dianhai Yu and Hua Wu. 2018. Multi-Turn Response Selection for Chatbots with Deep Attention Matching Network. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1118-1127, ACL. http://aclweb.org/anthology/P18-1103
Chongyang Tao, Wei Wu, Can Xu, Wenpeng Hu, Dongyan Zhao, and Rui Yan. Multi-Representation Fusion Network for Multi-turn Response Selection in Retrieval-based Chatbots. In WSDM’19. https://dl.acm.org/citation.cfm?id=3290985
Gu, Jia-Chen & Ling, Zhen-Hua & Liu, Quan. (2019). Interactive Matching Network for Multi-Turn Response Selection in Retrieval-Based Chatbots. https://arxiv.org/abs/1901.01824
TF-IDF Ranker model [docs]¶
Based on Reading Wikipedia to Answer Open-Domain Questions. The model solves the task of document retrieval for a given query.
| Dataset | Model | Wiki dump | Recall@5 | Downloads |
|---|---|---|---|---|
| | | enwiki (2018-02-11) | 75.6 | 33 GB |
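The underlying idea — weight query terms by inverse document frequency and rank documents by their weighted overlap — can be sketched as follows. This is a toy illustration, not the actual implementation (which operates over a full Wikipedia dump with far more elaborate features):

```python
import math
from collections import Counter

def tfidf_rank(query, docs):
    """Rank documents by TF-IDF-weighted overlap with the query."""
    tokenized = [d.lower().split() for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each word appears.
    df = Counter(w for toks in tokenized for w in set(toks))
    idf = {w: math.log(n / df[w]) for w in df}
    scored = []
    for i, toks in enumerate(tokenized):
        tf = Counter(toks)
        score = sum(tf[w] * idf.get(w, 0.0) for w in query.lower().split())
        scored.append((score, i))
    return [docs[i] for _, i in sorted(scored, reverse=True)]

docs = [
    "the capital of france is paris",
    "the capital of germany is berlin",
    "berlin hosts a famous film festival",
]
print(tfidf_rank("capital of france", docs)[0])
# → 'the capital of france is paris'
```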
Question Answering model [docs]¶
Models in this section solve the task of finding the answer to a question in a given context (the SQuAD task format). DeepPavlov provides two models for this task: BERT-based and R-Net. Both predict the start and end positions of the answer in the given context.
BERT-based model is described in BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.
RuBERT-based model is described in Adaptation of Deep Bidirectional Multilingual Transformers for Russian Language.
| Dataset | Model config | Lang | EM (dev) | F-1 (dev) | Downloads |
|---|---|---|---|---|---|
| | | en | 81.49 | 88.86 | 1.2 Gb |
| | | en | 75.71 | 80.72 | 1.2 Gb |
| | | ru | 66.21 | 84.71 | 1.7 Mb |
| | DeepPavlov RuBERT, trained with tfidf-retrieved negative samples | ru | 66.24 | 84.71 | 1.6 Gb |
| | | ru | 44.2 ± 0.46 | 65.1 ± 0.36 | 867 Mb |
| | | ru | 61.23 ± 0.42 | 80.36 ± 0.28 | 1.18 Gb |
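Given per-token start and end scores, the answer span is typically decoded as the (start, end) pair with the highest combined score, subject to start ≤ end and a maximum span length. A minimal sketch of this decoding step (illustrative; the scores and `max_len` here are made up):

```python
def best_span(start_scores, end_scores, max_len=15):
    """Pick (start, end) maximizing start + end score, with start <= end."""
    best, best_score = (0, 0), float("-inf")
    for s, s_score in enumerate(start_scores):
        # Only consider ends at or after the start, within max_len tokens.
        for e in range(s, min(s + max_len, len(end_scores))):
            score = s_score + end_scores[e]
            if score > best_score:
                best_score, best = score, (s, e)
    return best

start = [0.1, 2.0, 0.3, 0.2]
end = [0.0, 0.5, 3.0, 0.1]
print(best_span(start, end))  # → (1, 2)
```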
For the case when the answer is not necessarily present in the given context, there is the qa_squad2_bert model, which outputs an empty string when the context contains no answer.
AutoML¶
Hyperparameter optimization [docs]¶
Hyperparameter optimization by cross-validation for DeepPavlov models; it requires only minor changes to a config file.
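The idea of cross-validated hyperparameter search can be sketched generically as follows. This is not the DeepPavlov API; `train_and_score` and the toy objective are placeholders for a real training run:

```python
from itertools import product
from statistics import mean

def grid_search_cv(train_and_score, data, param_grid, k=3):
    """Evaluate every parameter combination with k-fold cross-validation."""
    folds = [data[i::k] for i in range(k)]  # simple round-robin split
    best_params, best_score = None, float("-inf")
    for values in product(*param_grid.values()):
        params = dict(zip(param_grid.keys(), values))
        scores = []
        for i in range(k):
            val = folds[i]
            train = [x for j, f in enumerate(folds) if j != i for x in f]
            scores.append(train_and_score(train, val, params))
        if mean(scores) > best_score:
            best_score, best_params = mean(scores), params
    return best_params, best_score

def toy_score(train, val, params):
    # Toy objective: peaks at lr == 0.1 and ignores the data entirely.
    return -(params["lr"] - 0.1) ** 2

data = list(range(12))
params, score = grid_search_cv(toy_score, data, {"lr": [0.01, 0.1, 1.0]})
print(params)  # → {'lr': 0.1}
```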
Embeddings¶
Pre-trained embeddings [docs]¶
Word vectors for the Russian language trained on joint Russian Wikipedia and Lenta.ru corpora.
Examples of some models¶
Run insults detection model with console interface:
python -m deeppavlov interact insults_kaggle_bert -d
Run insults detection model with REST API:
python -m deeppavlov riseapi insults_kaggle_bert -d
Predict whether it is an insult on every line in a file:
python -m deeppavlov predict insults_kaggle_bert -d --batch-size 15 < /data/in.txt > /data/out.txt