deeppavlov.models.doc_retrieval
Ranking classes.
class deeppavlov.models.doc_retrieval.tfidf_ranker.TfidfRanker(vectorizer: deeppavlov.models.vectorizers.hashing_tfidf_vectorizer.HashingTfIdfVectorizer, top_n=5, active: bool = True, **kwargs)
Rank documents according to input strings.
Parameters:
- vectorizer – a vectorizer instance (HashingTfIdfVectorizer)
- top_n – the number of doc ids to return
- active – whether to return the number of ids specified by top_n (True) or all ids (False)
Attributes:
- top_n – the number of doc ids to return
- vectorizer – an instance of the vectorizer class
- index2doc – inverted doc_index
- iterator – a dataset iterator used for generating batches while fitting the vectorizer
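The interaction between top_n and active can be illustrated with a minimal sketch. This is not DeepPavlov's implementation: the helper name rank_top_n and the plain dictionary of per-document scores are assumptions made for illustration, standing in for the similarity scores that a TF-IDF vectorizer would produce.

```python
from typing import Dict, List, Tuple


def rank_top_n(scores: Dict[str, float],
               top_n: int = 5,
               active: bool = True) -> Tuple[List[str], List[float]]:
    """Hypothetical helper (not part of DeepPavlov).

    Sorts doc ids by descending score. When `active` is True only the
    `top_n` best ids are kept; otherwise all ids are returned.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if active:
        ranked = ranked[:top_n]
    doc_ids = [doc_id for doc_id, _ in ranked]
    doc_scores = [score for _, score in ranked]
    return doc_ids, doc_scores


# Made-up scores for three documents.
scores = {"doc_a": 0.12, "doc_b": 0.87, "doc_c": 0.45}

top_ids, top_scores = rank_top_n(scores, top_n=2, active=True)
all_ids, _ = rank_top_n(scores, top_n=2, active=False)
```

With active=True the result is truncated to the two highest-scoring ids; with active=False the top_n value is ignored and every id is returned in ranked order.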
class deeppavlov.models.doc_retrieval.logit_ranker.LogitRanker(squad_model: deeppavlov.core.models.component.Component, batch_size: int = 50, sort_noans: bool = False, **kwargs)
Select the best answer using squad model logits. Each element of the input batch is itself treated as a batch: it is split into several smaller batches, each smaller batch is sent to the squad model separately, and a single best answer is returned for that element.
Parameters:
- squad_model – a loaded squad model
- batch_size – batch size to use with the squad model
- sort_noans – whether to rank no-answer (noans) predictions below the other candidate answers
Attributes:
- squad_model – a loaded squad model
- batch_size – batch size to use with the squad model
__call__(contexts_batch: List[List[str]], questions_batch: List[List[str]]) → List[str]
Sort the results obtained from the squad reader by logits and return the answer with the maximum logit.
Parameters:
- contexts_batch – a batch of contexts which should be treated as a single batch in the outer JSON config
- questions_batch – a batch of questions which should be treated as a single batch in the outer JSON config
Returns: a batch of best answers
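The batching and max-logit selection described for LogitRanker can be sketched roughly as follows. This is an illustrative approximation rather than the library's code: best_answers and toy_reader are hypothetical names, the reader is assumed to return one (answer, logit) pair per context, and the sort_noans option is not modeled.

```python
from typing import Callable, List, Tuple

# Assumed reader interface: takes parallel lists of contexts and questions
# and returns one (answer_text, logit) pair per context.
ReaderFn = Callable[[List[str], List[str]], List[Tuple[str, float]]]


def best_answers(reader: ReaderFn,
                 contexts_batch: List[List[str]],
                 questions_batch: List[List[str]],
                 batch_size: int = 50) -> List[str]:
    """Hypothetical sketch: one best answer per element of the outer batch."""
    answers = []
    for contexts, questions in zip(contexts_batch, questions_batch):
        scored: List[Tuple[str, float]] = []
        # Send the contexts to the reader in chunks of `batch_size`.
        for start in range(0, len(contexts), batch_size):
            stop = start + batch_size
            scored.extend(reader(contexts[start:stop], questions[start:stop]))
        if not scored:
            answers.append("")  # no contexts, so no answer
            continue
        # Keep the answer with the highest logit.
        best_text, _ = max(scored, key=lambda pair: pair[1])
        answers.append(best_text)
    return answers


def toy_reader(contexts: List[str], questions: List[str]) -> List[Tuple[str, float]]:
    """Stand-in for a squad model: the 'logit' is the word overlap count."""
    results = []
    for context, question in zip(contexts, questions):
        overlap = set(context.lower().split()) & set(question.lower().split())
        results.append((context, float(len(overlap))))
    return results


contexts_batch = [[
    "Paris is the capital of France",
    "Berlin is in Germany",
    "France borders Spain",
]]
questions_batch = [["what is the capital of France"] * 3]

answers = best_answers(toy_reader, contexts_batch, questions_batch, batch_size=2)
```

Here the three contexts are processed in two chunks (two contexts, then one), and the context with the largest overlap score is returned as the single answer for that element of the outer batch.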