Open Domain Question Answering Model on Wikipedia¶
Task definition¶
Open Domain Question Answering (ODQA) is the task of finding an exact answer to an arbitrary question in Wikipedia articles. Given only a question, the system returns the best answer it can find. The default ODQA implementation takes a batch of queries as input and returns the best answer for each.
Quick Start¶
The example below uses the basic ODQA config en_odqa_infer_wiki. Check which other ODQA configs are available and simply replace en_odqa_infer_wiki with the config name of your choice.
Before using the model, make sure that all required packages are installed by running the command:
python -m deeppavlov install en_odqa_infer_wiki
Training (if you have your own data)
from deeppavlov import train_evaluate_model_from_config

# The ranker and the reader are trained separately, each from its own config
train_evaluate_model_from_config('en_ranker_tfidf_wiki', download=True)
train_evaluate_model_from_config('qa_squad2_bert', download=True)
Building
from deeppavlov import build_model
odqa = build_model('en_odqa_infer_wiki', download=True)
Inference
result = odqa(['What is the name of Darth Vader\'s son?'])
print(result)
Output:
>> Luke Skywalker
Languages¶
DeepPavlov provides pretrained ODQA models for the English and Russian languages.
Models¶
The ODQA model has a modular architecture consisting of two components: a ranker and a reader. The ranker is based on DrQA [1], proposed by Facebook Research, and the reader is based on R-NET [2], proposed by Microsoft Research Asia, and its implementation [3] by Wenxuan Zhou.
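The two-stage design can be sketched in plain Python: a TF-IDF ranker scores documents against the question, and a reader then extracts an answer from the top-ranked documents. The ranker and reader below are toy stand-ins written for illustration (all function names and documents are hypothetical), not DeepPavlov's actual components:

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercased word tokens; a real ranker would use a proper tokenizer.
    return [t.strip('.,?!') for t in text.lower().split()]

def tfidf_rank(question, docs, top_n=2):
    """Rank documents by TF-IDF weighted overlap with the question (toy DrQA-style ranker)."""
    n_docs = len(docs)
    df = Counter()                       # document frequency of each term
    for doc in docs:
        df.update(set(tokenize(doc)))
    scores = []
    for i, doc in enumerate(docs):
        tf = Counter(tokenize(doc))      # term frequency inside this document
        score = sum(
            tf[term] * math.log(n_docs / df[term])
            for term in tokenize(question) if term in tf
        )
        scores.append((score, i))
    scores.sort(reverse=True)
    return [docs[i] for _, i in scores[:top_n]]

def read_answer(question, docs):
    """Toy 'reader': return the sentence sharing the most terms with the question."""
    q_terms = set(tokenize(question))
    best, best_overlap = "", -1
    for doc in docs:
        for sentence in doc.split('.'):
            overlap = len(q_terms & set(tokenize(sentence)))
            if overlap > best_overlap:
                best, best_overlap = sentence.strip(), overlap
    return best

docs = [
    "Luke Skywalker is the son of Darth Vader. He trained as a Jedi.",
    "The Death Star was a space station.",
    "Leia Organa was a princess of Alderaan.",
]
top_docs = tfidf_rank("Who is the son of Darth Vader?", docs)
print(read_answer("Who is the son of Darth Vader?", top_docs))
# prints: Luke Skywalker is the son of Darth Vader
```

The real pipeline replaces the toy reader with a neural span-extraction model, but the division of labor is the same: retrieval narrows the search space, reading picks the answer.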
Running ODQA¶
Note
About 22 GB of RAM is required. It is possible to run on a 16 GB machine, but then the swap size should be at least 8 GB.
Training¶
The ODQA ranker and reader should be trained separately. Read about training the ranker here. Read about training the reader in our separate [reader tutorial](SQuAD.ipynb#4.-Train-the-model-on-your-data).
Interacting¶
When interacting, the ODQA model returns a plain answer to the user’s question.
Run the following to interact with English ODQA:
python -m deeppavlov interact en_odqa_infer_wiki -d
Run the following to interact with Russian ODQA:
python -m deeppavlov interact ru_odqa_infer_wiki -d
Configuration¶
The ODQA configs are intended for inference only. For training, use the ranker configs and the [reader tutorial](SQuAD.ipynb#4.-Train-the-model-on-your-data) accordingly.
There are several ODQA configs available:
| Config | Description |
|--------|-------------|
| `en_odqa_infer_wiki` | Basic config for the English language. Consists of a TF-IDF ranker and a reader. Searches for an answer in an English Wikipedia dump. |
| `ru_odqa_infer_wiki` | Basic config for the Russian language. Consists of a TF-IDF ranker and a reader. Searches for an answer in a Russian Wikipedia dump. |
|  | Extended config for the English language. Consists of a TF-IDF ranker, a Popularity Ranker, and a reader. Searches for an answer in an English Wikipedia dump. |
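A DeepPavlov config is a JSON file that chains components into a pipeline under a `chainer` key. The fragment below is a simplified illustration of that structure; the `class_name` values and variable names are placeholders, not the exact contents of en_odqa_infer_wiki:

```json
{
  "chainer": {
    "in": ["questions"],
    "pipe": [
      {"class_name": "tfidf_ranker", "in": ["questions"], "out": ["doc_ids"]},
      {"class_name": "squad_reader", "in": ["questions", "doc_ids"], "out": ["answers"]}
    ],
    "out": ["answers"]
  }
}
```

Each pipeline step declares its inputs and outputs, which is how the ranker's retrieved documents are wired into the reader.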
Comparison¶
Scores for ODQA models:
| Model | Lang | Dataset | WikiDump | Ranker@5 F1 | Ranker@5 EM | Ranker@25 F1 | Ranker@25 EM |
|-------|------|---------|----------|-------------|-------------|--------------|--------------|
| DeepPavlov ODQA | En |  | enwiki20180211 | 29.03 | 22.75 | 31.38 | 25.96 |
|  | En |  | enwiki20161221 | - | 27.1 | - | - |
|  | En |  |  | 37.5 | 29.1 | - | - |
| DeepPavlov ODQA | Ru | SDSJ Task B (dev) | ruwiki20180401 | 42.02 | 29.56 | - | - |
EM stands for “exact-match accuracy”. Metrics are computed for the top 5 and top 25 documents returned by the retrieval module.
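The two metrics can be computed per question as follows. This is a generic SQuAD-style implementation written for illustration (function names are hypothetical, and normalization here is simplified to lowercasing), not DeepPavlov's evaluation code:

```python
from collections import Counter

def exact_match(prediction, answer):
    """EM: 1.0 if the normalized prediction equals the normalized gold answer, else 0.0."""
    return float(prediction.strip().lower() == answer.strip().lower())

def token_f1(prediction, answer):
    """Token-level F1 between the prediction and the gold answer."""
    pred_tokens = prediction.lower().split()
    gold_tokens = answer.lower().split()
    common = Counter(pred_tokens) & Counter(gold_tokens)  # multiset intersection
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("Luke Skywalker", "luke skywalker"))               # 1.0
print(round(token_f1("the Luke Skywalker", "Luke Skywalker"), 2))    # 0.8
```

EM rewards only verbatim matches, while token F1 gives partial credit when the predicted span overlaps the gold answer, which is why the F1 column is always at least as high as EM.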