Retrieval for Open-Domain QA Datasets

Models are evaluated on the TriviaQA and Natural Questions test sets by Top-20 and Top-100 retrieval accuracy.
Command to generate run:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index wikipedia-dpr-100w \
  --topics dpr-trivia-test \
  --output run.odqa.BM25-k1_0.9_b_0.4.dpr-trivia-test.hits-100.txt \
  --bm25 --k1 0.9 --b 0.4 \
  --hits 100
Converting the run from TREC format to JSON:
python -m pyserini.eval.convert_trec_run_to_dpr_retrieval_run \
  --topics dpr-trivia-test \
  --index wikipedia-dpr \
  --input run.odqa.BM25-k1_0.9_b_0.4.dpr-trivia-test.hits-100.txt \
  --output run.odqa.BM25-k1_0.9_b_0.4.dpr-trivia-test.hits-100.json
Evaluation commands:
python -m pyserini.eval.evaluate_dpr_retrieval \
  --retrieval run.odqa.BM25-k1_0.9_b_0.4.dpr-trivia-test.hits-100.json \
  --topk 20 100
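The runs above rank passages with BM25 (k1 = 0.9, b = 0.4). As a rough sketch of what that scoring computes, here is a simplified, whitespace-tokenized version over a toy in-memory corpus (illustrative only; Lucene's analysis chain and IDF details differ):

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, corpus, k1=0.9, b=0.4):
    """Score one document against a query with (simplified) BM25."""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N
    tf = Counter(doc_terms)
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)
        if df == 0:
            continue
        idf = math.log(1 + (N - df + 0.5) / (df + 0.5))
        # Term-frequency saturation (k1) and length normalization (b)
        norm = tf[term] * (k1 + 1) / (
            tf[term] + k1 * (1 - b + b * len(doc_terms) / avgdl))
        score += idf * norm
    return score

corpus = [
    "the capital of france is paris".split(),
    "bm25 is a ranking function used by search engines".split(),
    "paris is a city in france".split(),
]
query = "capital of france".split()
ranked = sorted(range(len(corpus)),
                key=lambda i: bm25_score(query, corpus[i], corpus),
                reverse=True)
print(ranked[0])  # prints 0: doc 0 matches all three query terms
```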
Command to generate run:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index wikipedia-dpr-100w \
  --topics nq-test \
  --output run.odqa.BM25-k1_0.9_b_0.4.nq-test.hits-100.txt \
  --bm25 --k1 0.9 --b 0.4 \
  --hits 100
Converting the run from TREC format to JSON:
python -m pyserini.eval.convert_trec_run_to_dpr_retrieval_run \
  --topics nq-test \
  --index wikipedia-dpr \
  --input run.odqa.BM25-k1_0.9_b_0.4.nq-test.hits-100.txt \
  --output run.odqa.BM25-k1_0.9_b_0.4.nq-test.hits-100.json
Evaluation commands:
python -m pyserini.eval.evaluate_dpr_retrieval \
  --retrieval run.odqa.BM25-k1_0.9_b_0.4.nq-test.hits-100.json \
  --topk 20 100
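The evaluation script reports top-k retrieval accuracy: the fraction of questions for which at least one of the top-k retrieved passages contains a gold answer string. A simplified sketch over hypothetical data (the real evaluate_dpr_retrieval also normalizes answers and matches on tokens, not raw substrings):

```python
def top_k_accuracy(retrieval, k):
    """Fraction of questions with a gold answer inside any top-k passage.

    `retrieval` maps each question id to its gold answers and ranked
    passage texts, loosely mirroring the converted JSON run.
    """
    hits = 0
    for entry in retrieval.values():
        answers = [a.lower() for a in entry["answers"]]
        passages = entry["contexts"][:k]
        if any(ans in p.lower() for p in passages for ans in answers):
            hits += 1
    return hits / len(retrieval)

# Toy example with made-up questions:
retrieval = {
    "q1": {"answers": ["Paris"],
           "contexts": ["Paris is the capital of France.", "..."]},
    "q2": {"answers": ["1969"],
           "contexts": ["The moon landing happened.", "..."]},
}
print(top_k_accuracy(retrieval, 20))  # 0.5: only q1's answer appears
```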
Command to generate run:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index wikipedia-dpr-100w \
  --topics dpr-trivia-test \
  --output run.odqa.BM25-k1_0.9_b_0.4_dpr-topics.dpr-trivia-test.hits-100.txt \
  --bm25 --k1 0.9 --b 0.4 \
  --hits 100
Converting the run from TREC format to JSON:
python -m pyserini.eval.convert_trec_run_to_dpr_retrieval_run \
  --topics dpr-trivia-test \
  --index wikipedia-dpr \
  --input run.odqa.BM25-k1_0.9_b_0.4_dpr-topics.dpr-trivia-test.hits-100.txt \
  --output run.odqa.BM25-k1_0.9_b_0.4_dpr-topics.dpr-trivia-test.hits-100.json
Evaluation commands:
python -m pyserini.eval.evaluate_dpr_retrieval \
  --retrieval run.odqa.BM25-k1_0.9_b_0.4_dpr-topics.dpr-trivia-test.hits-100.json \
  --topk 20 100
Command to generate run:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index wikipedia-dpr-100w \
  --topics dpr-nq-test \
  --output run.odqa.BM25-k1_0.9_b_0.4_dpr-topics.dpr-nq-test.hits-100.txt \
  --bm25 --k1 0.9 --b 0.4 \
  --hits 100
Converting the run from TREC format to JSON:
python -m pyserini.eval.convert_trec_run_to_dpr_retrieval_run \
  --topics dpr-nq-test \
  --index wikipedia-dpr \
  --input run.odqa.BM25-k1_0.9_b_0.4_dpr-topics.dpr-nq-test.hits-100.txt \
  --output run.odqa.BM25-k1_0.9_b_0.4_dpr-topics.dpr-nq-test.hits-100.json
Evaluation commands:
python -m pyserini.eval.evaluate_dpr_retrieval \
  --retrieval run.odqa.BM25-k1_0.9_b_0.4_dpr-topics.dpr-nq-test.hits-100.json \
  --topk 20 100
Command to generate runs:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index wikipedia-dpr-100w \
  --topics dpr-trivia-test-gar-t5-answers \
  --output run.odqa.GarT5-RRF.dpr-trivia-test.answers.hits-1000.txt \
  --bm25 --k1 0.9 --b 0.4 \
  --hits 1000
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index wikipedia-dpr-100w \
  --topics dpr-trivia-test-gar-t5-titles \
  --output run.odqa.GarT5-RRF.dpr-trivia-test.titles.hits-1000.txt \
  --bm25 --k1 0.9 --b 0.4 \
  --hits 1000
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index wikipedia-dpr-100w \
  --topics dpr-trivia-test-gar-t5-sentences \
  --output run.odqa.GarT5-RRF.dpr-trivia-test.sentences.hits-1000.txt \
  --bm25 --k1 0.9 --b 0.4 \
  --hits 1000
Fusing the results with reciprocal rank fusion (RRF):
python -m pyserini.fusion \
  --runs run.odqa.GarT5-RRF.dpr-trivia-test.answers.hits-1000.txt \
	 run.odqa.GarT5-RRF.dpr-trivia-test.titles.hits-1000.txt \
	 run.odqa.GarT5-RRF.dpr-trivia-test.sentences.hits-1000.txt \
  --output run.odqa.GarT5-RRF.dpr-trivia-test.hits-100.fusion.txt \
  --k 100
Converting the run from TREC format to JSON:
python -m pyserini.eval.convert_trec_run_to_dpr_retrieval_run \
  --topics dpr-trivia-test \
  --index wikipedia-dpr \
  --input run.odqa.GarT5-RRF.dpr-trivia-test.hits-100.fusion.txt \
  --output run.odqa.GarT5-RRF.dpr-trivia-test.hits-100.fusion.json
Evaluation commands:
python -m pyserini.eval.evaluate_dpr_retrieval \
  --retrieval run.odqa.GarT5-RRF.dpr-trivia-test.hits-100.fusion.json \
  --topk 20 100
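The pyserini.fusion step above combines the answers/titles/sentences runs by reciprocal rank fusion: each run contributes 1/(k + rank) per document. A plain-Python sketch of the core computation (60 is the conventional RRF smoothing constant; Pyserini's default may differ):

```python
from collections import defaultdict

def reciprocal_rank_fusion(runs, k=60, depth=100):
    """Fuse ranked lists; each run adds 1/(k + rank) to a document's score."""
    scores = defaultdict(float)
    for run in runs:  # run: list of docids, best first
        for rank, docid in enumerate(run[:depth], start=1):
            scores[docid] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

run_a = ["d1", "d2", "d3"]
run_b = ["d3", "d1", "d4"]
fused = reciprocal_rank_fusion([run_a, run_b])
print(fused[0])  # d1: ranked high in both lists
```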
Command to generate runs:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index wikipedia-dpr-100w \
  --topics nq-test-gar-t5-answers \
  --output run.odqa.GarT5-RRF.nq-test.answers.hits-1000.txt \
  --bm25 --k1 0.9 --b 0.4 \
  --hits 1000
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index wikipedia-dpr-100w \
  --topics nq-test-gar-t5-titles \
  --output run.odqa.GarT5-RRF.nq-test.titles.hits-1000.txt \
  --bm25 --k1 0.9 --b 0.4 \
  --hits 1000
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index wikipedia-dpr-100w \
  --topics nq-test-gar-t5-sentences \
  --output run.odqa.GarT5-RRF.nq-test.sentences.hits-1000.txt \
  --bm25 --k1 0.9 --b 0.4 \
  --hits 1000
Fusing the results with reciprocal rank fusion (RRF):
python -m pyserini.fusion \
  --runs run.odqa.GarT5-RRF.nq-test.answers.hits-1000.txt \
	 run.odqa.GarT5-RRF.nq-test.titles.hits-1000.txt \
	 run.odqa.GarT5-RRF.nq-test.sentences.hits-1000.txt \
  --output run.odqa.GarT5-RRF.nq-test.hits-100.fusion.txt \
  --k 100
Converting the run from TREC format to JSON:
python -m pyserini.eval.convert_trec_run_to_dpr_retrieval_run \
  --topics nq-test \
  --index wikipedia-dpr \
  --input run.odqa.GarT5-RRF.nq-test.hits-100.fusion.txt \
  --output run.odqa.GarT5-RRF.nq-test.hits-100.fusion.json
Evaluation commands:
python -m pyserini.eval.evaluate_dpr_retrieval \
  --retrieval run.odqa.GarT5-RRF.nq-test.hits-100.fusion.json \
  --topk 20 100
Command to generate run:
python -m pyserini.search.faiss \
  --threads 16 --batch-size 512 \
  --index wikipedia-dpr-100w.dpr-multi \
  --encoder facebook/dpr-question_encoder-multiset-base \
  --topics dpr-trivia-test \
  --output run.odqa.DPR.dpr-trivia-test.hits-100.txt \
  --hits 100
Converting the run from TREC format to JSON:
python -m pyserini.eval.convert_trec_run_to_dpr_retrieval_run \
  --topics dpr-trivia-test \
  --index wikipedia-dpr \
  --input run.odqa.DPR.dpr-trivia-test.hits-100.txt \
  --output run.odqa.DPR.dpr-trivia-test.hits-100.json
Evaluation commands:
python -m pyserini.eval.evaluate_dpr_retrieval \
  --retrieval run.odqa.DPR.dpr-trivia-test.hits-100.json \
  --topk 20 100
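DPR retrieval encodes the question with the encoder above and scores passages by inner product with their precomputed embeddings; the prebuilt flat FAISS index performs this search exhaustively. A toy pure-Python sketch of that search step (no real encoder; the vectors are made up):

```python
def inner_product_search(query_vec, passage_vecs, k):
    """Exhaustive maximum-inner-product search, as a flat IP index does."""
    scores = [
        (i, sum(q * p for q, p in zip(query_vec, vec)))
        for i, vec in enumerate(passage_vecs)
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]

passages = [[0.1, 0.9], [0.8, 0.2], [0.7, 0.7]]
query = [1.0, 0.1]
top = inner_product_search(query, passages, 2)
print([i for i, _ in top])  # [1, 2]
```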
Command to generate run:
python -m pyserini.search.faiss \
  --threads 16 --batch-size 512 \
  --index wikipedia-dpr-100w.dpr-single-nq \
  --encoder facebook/dpr-question_encoder-single-nq-base \
  --topics nq-test \
  --output run.odqa.DPR.nq-test.hits-100.txt \
  --hits 100
Converting the run from TREC format to JSON:
python -m pyserini.eval.convert_trec_run_to_dpr_retrieval_run \
  --topics nq-test \
  --index wikipedia-dpr \
  --input run.odqa.DPR.nq-test.hits-100.txt \
  --output run.odqa.DPR.nq-test.hits-100.json
Evaluation commands:
python -m pyserini.eval.evaluate_dpr_retrieval \
  --retrieval run.odqa.DPR.nq-test.hits-100.json \
  --topk 20 100
Command to generate run:
python -m pyserini.search.faiss \
  --threads 16 --batch-size 512 \
  --index wikipedia-dpr-100w.dkrr-tqa \
  --encoder castorini/dkrr-dpr-tqa-retriever \
  --topics dpr-trivia-test \
  --output run.odqa.DPR-DKRR.dpr-trivia-test.hits-100.txt \
  --query-prefix question: \
  --hits 100
Converting the run from TREC format to JSON:
python -m pyserini.eval.convert_trec_run_to_dpr_retrieval_run \
  --topics dpr-trivia-test \
  --index wikipedia-dpr \
  --input run.odqa.DPR-DKRR.dpr-trivia-test.hits-100.txt \
  --output run.odqa.DPR-DKRR.dpr-trivia-test.hits-100.json
Evaluation commands:
python -m pyserini.eval.evaluate_dpr_retrieval \
  --retrieval run.odqa.DPR-DKRR.dpr-trivia-test.hits-100.json \
  --topk 20 100
Command to generate run:
python -m pyserini.search.faiss \
  --threads 16 --batch-size 512 \
  --index wikipedia-dpr-100w.dkrr-nq \
  --encoder castorini/dkrr-dpr-nq-retriever \
  --topics nq-test \
  --output run.odqa.DPR-DKRR.nq-test.hits-100.txt \
  --query-prefix question: \
  --hits 100
Converting the run from TREC format to JSON:
python -m pyserini.eval.convert_trec_run_to_dpr_retrieval_run \
  --topics nq-test \
  --index wikipedia-dpr \
  --input run.odqa.DPR-DKRR.nq-test.hits-100.txt \
  --output run.odqa.DPR-DKRR.nq-test.hits-100.json
Evaluation commands:
python -m pyserini.eval.evaluate_dpr_retrieval \
  --retrieval run.odqa.DPR-DKRR.nq-test.hits-100.json \
  --topk 20 100
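The conversion step reads the standard six-column TREC run format (qid Q0 docid rank score tag) and looks each docid up in the wikipedia-dpr index to attach passage text. Parsing the run file itself is straightforward (hypothetical input lines):

```python
from collections import defaultdict

def read_trec_run(lines):
    """Parse 'qid Q0 docid rank score tag' lines into per-query ranked lists."""
    run = defaultdict(list)
    for line in lines:
        qid, _q0, docid, rank, score, _tag = line.split()
        run[qid].append((int(rank), docid, float(score)))
    for qid in run:
        run[qid].sort()  # ensure rank order within each query
    return run

lines = [
    "q1 Q0 doc7 1 12.5 BM25",
    "q1 Q0 doc3 2 11.0 BM25",
]
run = read_trec_run(lines)
print(run["q1"][0][1])  # doc7
```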
Command to generate run:
python -m pyserini.search.hybrid \
 dense  --index wikipedia-dpr-100w.dpr-multi \
	--encoder facebook/dpr-question_encoder-multiset-base \
 sparse --index wikipedia-dpr-100w \
 fusion --alpha 0.95 \
 run	--topics dpr-trivia-test \
	--output run.odqa.DPR-Hybrid.dpr-trivia-test.hits-100.txt \
	--threads 16 --batch-size 512 \
	--hits 100
Converting the run from TREC format to JSON:
python -m pyserini.eval.convert_trec_run_to_dpr_retrieval_run \
  --topics dpr-trivia-test \
  --index wikipedia-dpr \
  --input run.odqa.DPR-Hybrid.dpr-trivia-test.hits-100.txt \
  --output run.odqa.DPR-Hybrid.dpr-trivia-test.hits-100.json
Evaluation commands:
python -m pyserini.eval.evaluate_dpr_retrieval \
  --retrieval run.odqa.DPR-Hybrid.dpr-trivia-test.hits-100.json \
  --topk 20 100
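The hybrid searcher interpolates dense and sparse scores with the weight --alpha (0.95 here). A sketch of one common form of that interpolation, with made-up scores (Pyserini's implementation additionally handles score normalization and backs off differently for documents missing from one list):

```python
def hybrid_fuse(dense, sparse, alpha, k):
    """Interpolate dense and sparse scores: dense + alpha * sparse.

    Documents absent from one list contribute 0 from it (simplified;
    a real implementation may back off to that list's minimum score).
    """
    fused = {}
    for docid in set(dense) | set(sparse):
        fused[docid] = dense.get(docid, 0.0) + alpha * sparse.get(docid, 0.0)
    return sorted(fused, key=fused.get, reverse=True)[:k]

dense = {"d1": 0.9, "d2": 0.5}   # hypothetical dense (DPR) scores
sparse = {"d2": 10.0, "d3": 8.0}  # hypothetical sparse (BM25) scores
print(hybrid_fuse(dense, sparse, alpha=0.95, k=3))  # ['d2', 'd3', 'd1']
```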
Command to generate run:
python -m pyserini.search.hybrid \
 dense  --index wikipedia-dpr-100w.dpr-single-nq \
	--encoder facebook/dpr-question_encoder-single-nq-base \
 sparse --index wikipedia-dpr-100w \
 fusion --alpha 1.2 \
 run	--topics nq-test \
	--output run.odqa.DPR-Hybrid.nq-test.hits-100.txt \
	--threads 16 --batch-size 512 \
	--hits 100
Converting the run from TREC format to JSON:
python -m pyserini.eval.convert_trec_run_to_dpr_retrieval_run \
  --topics nq-test \
  --index wikipedia-dpr \
  --input run.odqa.DPR-Hybrid.nq-test.hits-100.txt \
  --output run.odqa.DPR-Hybrid.nq-test.hits-100.json
Evaluation commands:
python -m pyserini.eval.evaluate_dpr_retrieval \
  --retrieval run.odqa.DPR-Hybrid.nq-test.hits-100.json \
  --topk 20 100
The DKRR and GarT5-RRF runs for TriviaQA can be generated using the commands above. Fusing them with RRF:
python -m pyserini.fusion \
  --runs run.odqa.DPR-DKRR.dpr-trivia-test.hits-100.txt \
	 run.odqa.GarT5-RRF.dpr-trivia-test.hits-100.fusion.txt \
  --output run.odqa.GarT5RRF-DKRR-RRF.dpr-trivia-test.txt \
  --k 100
Converting the run from TREC format to JSON:
python -m pyserini.eval.convert_trec_run_to_dpr_retrieval_run \
  --topics dpr-trivia-test \
  --index wikipedia-dpr \
  --input run.odqa.GarT5RRF-DKRR-RRF.dpr-trivia-test.txt \
  --output run.odqa.GarT5RRF-DKRR-RRF.dpr-trivia-test.json
Evaluation commands:
python -m pyserini.eval.evaluate_dpr_retrieval \
  --retrieval run.odqa.GarT5RRF-DKRR-RRF.dpr-trivia-test.json \
  --topk 20 100
The DKRR and GarT5-RRF runs for Natural Questions can be generated using the commands above. Fusing them with RRF:
python -m pyserini.fusion \
  --runs run.odqa.DPR-DKRR.nq-test.hits-100.txt \
	 run.odqa.GarT5-RRF.nq-test.hits-100.fusion.txt \
  --output run.odqa.GarT5RRF-DKRR-RRF.nq-test.txt \
  --k 100
Converting the run from TREC format to JSON:
python -m pyserini.eval.convert_trec_run_to_dpr_retrieval_run \
  --topics nq-test \
  --index wikipedia-dpr \
  --input run.odqa.GarT5RRF-DKRR-RRF.nq-test.txt \
  --output run.odqa.GarT5RRF-DKRR-RRF.nq-test.json
Evaluation commands:
python -m pyserini.eval.evaluate_dpr_retrieval \
  --retrieval run.odqa.GarT5RRF-DKRR-RRF.nq-test.json \
  --topk 20 100