Frequently Asked Questions (FAQ)¶

This is implementation of FAQ model which helps to classify incoming questions.

:: What is your open hours?
>> 8am - 8pm

Quick Start¶

Building¶

from deeppavlov import build_model, configs

faq = build_model(configs.faq.tfidf_logreg_en_faq, download=True)

Inference¶

result = faq(['What is your open hours?'])

If some required packages are missing, install all the requirements by running in command line:

python -m deeppavlov install fasttext_avg_autofaq
python -m deeppavlov install fasttext_tfidf_autofaq
python -m deeppavlov install tfidf_autofaq
python -m deeppavlov install tfidf_logreg_autofaq
python -m deeppavlov install tfidf_logreg_en_faq

Config¶

As usual, config consists of:

dataset_reader
dataset_iterator
chainer

You can use you own dataset_reader, dataset_iterator for speficic data. Let’s consider chainer in more details.

Config Structure¶

chainer - pipeline manager
- in - pipeline input data: question
- out - pipeline output data: answer + score[0,1]
preprocessing - it can be tokenization, lemmatization, stemming and etc. In example tfidf_logreg_autofaq.json there are tokenization and lemmatization.
vectorizer - vectorizer of incoming sentences. It can be word embeddings vectorizer, bag of words vectorizer, tf-idf vectorizer and etc. Th output is vectorized sentences (numeric vectors).
classifier - This is faq model that classify incoming question. Model receive vectorized train sentences and vectorized question for inference. Output is classified answer from train dataset.

Vectorizers¶

Vectorizers produce numeric vectors of input sentences

sentence2vector_v2w_tfidf - Sentence vectorizer: weighted sum of word embeddings from sentence
- in - input data: question
- fit_on - train data: [token lemmas of question, word embeddings]
- save_path - path where to save model
- load_path - path where to load model
- out - output data: vectorized sentence

Classifiers for FAQ¶

This is models that classify incoming question and find corresponding answer

cos_sim_classifier - Classifier based on cosine similarity
- in - input data: question
- fit_on - train data: [vectorized sentences, answers]
- save_path - path where to save model
- load_path - path where to load model
- out - output data: [answer, score]
logreg_classifier - Logistic Regression classifier, that output most probable answer with score
- in - input data: question
- fit_on - train data: [vectorized sentences, answers]
- c - regularization parameter for logistic regression model
- penalty - regularization type: ‘l1’ or ‘l2’
- save_path - path where to save model
- load_path - path where to load model
- out - output data: [answer, score]

Running FAQ¶

Training¶

To train your own model by running command train, for example:

python -m deeppavlov train tfidf_autofaq

Interacting¶

After model has trained, you can use it for inference: model will return answers from FAQ data that used for train.

python -m deeppavlov interact tfidf_autofaq -d

Inference example:

:: What is your open hours?
>> 8am - 8pm

Available Data and Pretrained Models¶

As an example you can try pretrained models on FAQ dataset in English: MIPT FAQ for entrants - https://mipt.ru/english/edu/faqs/

tfidf_logreg_classifier_en_mipt_faq.pkl - pre-trained logistic regression classifier for classifying input question (vectorized by tfidf)
tfidf_vectorizer_en_mipt_faq.pkl - pre-trained model for TF-IDF vectorizer based on MIPT FAQ

Example config - tfidf_logreg_en_faq.json

Also you can use pretrained model on Russan FAQ dataset from school-site: https://gobu.ftl.name/page/1279/

tfidf_cos_sim_classifier.pkl - pre-trained cosine similarity classifier for classifying input question (vectorized by tfidf)
tfidf_logreg_classifier_v2.pkl - pre-trained logistic regression classifier for classifying input question (vectorized by tfidf)
fasttext_cos_classifier.pkl - pre-trained cosine similarity classifier for classifying input question (vectorized by word embeddings)
tfidf_vectorizer_ruwiki_v2.pkl - pre-trained model for TF-IDF vectorizer based on Russian Wikipedia