MS MARCO V1 Document

The two-click reproduction matrix below provides commands for reproducing experimental results reported in a number of papers. Instructions for programmatic execution are shown at the bottom of this page (scroll down).

Each condition below is evaluated with AP@100, nDCG@10, and R@1K on the TREC 2019 and TREC 2020 Deep Learning track query sets, and with RR@100 and R@1K on the MS MARCO dev queries.
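
Each command below can also be invoked through Pyserini's Python API rather than the CLI. As a minimal sketch of the first condition (the query string is an arbitrary example; results are printed rather than written to a run file):

from pyserini.search.lucene import LuceneSearcher

# Prebuilt index is downloaded automatically on first use.
searcher = LuceneSearcher.from_prebuilt_index('msmarco-v1-doc')
searcher.set_bm25(0.9, 0.4)  # matches --bm25 --k1 0.9 --b 0.4 below

# Retrieve the top 1000 documents for a single query.
hits = searcher.search('what is a lobster roll?', k=1000)
for i, hit in enumerate(hits[:10]):
    print(f'{i + 1:2} {hit.docid:16} {hit.score:.4f}')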

Condition: bm25-doc-default (BM25, default parameters, full-document index)

Command to generate run on TREC 2019 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc \
  --topics dl19-doc \
  --output run.msmarco-v1-doc.bm25-doc-default.dl19.txt \
  --bm25 --k1 0.9 --b 0.4
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl19-doc \
  run.msmarco-v1-doc.bm25-doc-default.dl19.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl19-doc \
  run.msmarco-v1-doc.bm25-doc-default.dl19.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl19-doc \
  run.msmarco-v1-doc.bm25-doc-default.dl19.txt
Command to generate run on TREC 2020 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc \
  --topics dl20 \
  --output run.msmarco-v1-doc.bm25-doc-default.dl20.txt \
  --bm25 --k1 0.9 --b 0.4
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl20-doc \
  run.msmarco-v1-doc.bm25-doc-default.dl20.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl20-doc \
  run.msmarco-v1-doc.bm25-doc-default.dl20.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl20-doc \
  run.msmarco-v1-doc.bm25-doc-default.dl20.txt
Command to generate run on dev queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc \
  --topics msmarco-doc-dev \
  --output run.msmarco-v1-doc.bm25-doc-default.dev.txt \
  --bm25 --k1 0.9 --b 0.4
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m recip_rank msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-doc-default.dev.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-doc-default.dev.txt

Condition: bm25-doc-segmented-default (BM25, default parameters, segmented index with MaxP aggregation)

Command to generate run on TREC 2019 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented \
  --topics dl19-doc \
  --output run.msmarco-v1-doc.bm25-doc-segmented-default.dl19.txt \
  --bm25 --k1 0.9 --b 0.4 --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl19-doc \
  run.msmarco-v1-doc.bm25-doc-segmented-default.dl19.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl19-doc \
  run.msmarco-v1-doc.bm25-doc-segmented-default.dl19.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl19-doc \
  run.msmarco-v1-doc.bm25-doc-segmented-default.dl19.txt
Command to generate run on TREC 2020 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented \
  --topics dl20 \
  --output run.msmarco-v1-doc.bm25-doc-segmented-default.dl20.txt \
  --bm25 --k1 0.9 --b 0.4 --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl20-doc \
  run.msmarco-v1-doc.bm25-doc-segmented-default.dl20.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl20-doc \
  run.msmarco-v1-doc.bm25-doc-segmented-default.dl20.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl20-doc \
  run.msmarco-v1-doc.bm25-doc-segmented-default.dl20.txt
Command to generate run on dev queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented \
  --topics msmarco-doc-dev \
  --output run.msmarco-v1-doc.bm25-doc-segmented-default.dev.txt \
  --bm25 --k1 0.9 --b 0.4 --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m recip_rank msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-doc-segmented-default.dev.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-doc-segmented-default.dev.txt
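
In the segmented conditions, retrieval runs over passage-length segments and --max-passage aggregates segment scores into document scores (MaxP): 10000 passage hits (--hits) are collapsed into at most 1000 documents (--max-passage-hits) by keeping each document's best-scoring segment. Below is a minimal sketch of that aggregation over a TREC-format run file, assuming segment ids of the form docid#segment as used by the prebuilt segmented indexes; the function is illustrative, not part of Pyserini:

from collections import defaultdict

def maxp(passage_run, doc_run, delimiter='#', k=1000):
    # qid -> docid -> best segment score seen so far
    best = defaultdict(dict)
    with open(passage_run) as f:
        for line in f:
            qid, _, pid, _, score, _ = line.split()
            docid = pid.split(delimiter)[0]  # strip the segment suffix
            if float(score) > best[qid].get(docid, float('-inf')):
                best[qid][docid] = float(score)
    with open(doc_run, 'w') as out:
        for qid, docs in best.items():
            ranked = sorted(docs.items(), key=lambda x: -x[1])[:k]
            for rank, (docid, score) in enumerate(ranked, start=1):
                out.write(f'{qid} Q0 {docid} {rank} {score:.6f} maxp\n')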

Condition: bm25-rm3-doc-default (BM25 + RM3, default parameters, full-document index)

Command to generate run on TREC 2019 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc \
  --topics dl19-doc \
  --output run.msmarco-v1-doc.bm25-rm3-doc-default.dl19.txt \
  --bm25 --rm3 --k1 0.9 --b 0.4
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl19-doc \
  run.msmarco-v1-doc.bm25-rm3-doc-default.dl19.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl19-doc \
  run.msmarco-v1-doc.bm25-rm3-doc-default.dl19.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl19-doc \
  run.msmarco-v1-doc.bm25-rm3-doc-default.dl19.txt
Command to generate run on TREC 2020 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc \
  --topics dl20 \
  --output run.msmarco-v1-doc.bm25-rm3-doc-default.dl20.txt \
  --bm25 --rm3 --k1 0.9 --b 0.4
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl20-doc \
  run.msmarco-v1-doc.bm25-rm3-doc-default.dl20.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl20-doc \
  run.msmarco-v1-doc.bm25-rm3-doc-default.dl20.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl20-doc \
  run.msmarco-v1-doc.bm25-rm3-doc-default.dl20.txt
Command to generate run on dev queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc \
  --topics msmarco-doc-dev \
  --output run.msmarco-v1-doc.bm25-rm3-doc-default.dev.txt \
  --bm25 --rm3 --k1 0.9 --b 0.4
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m recip_rank msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-rm3-doc-default.dev.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-rm3-doc-default.dev.txt
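
The --rm3 flag layers RM3 pseudo-relevance feedback on top of BM25. Programmatically this corresponds to calling set_rm3() on the searcher; a minimal sketch with Pyserini's default feedback settings (10 expansion terms from 10 feedback documents, weight 0.5 on the original query):

from pyserini.search.lucene import LuceneSearcher

searcher = LuceneSearcher.from_prebuilt_index('msmarco-v1-doc')
searcher.set_bm25(0.9, 0.4)
# RM3 query expansion with Pyserini's default parameters.
searcher.set_rm3(fb_terms=10, fb_docs=10, original_query_weight=0.5)
hits = searcher.search('what is a lobster roll?', k=1000)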

Condition: bm25-rm3-doc-segmented-default (BM25 + RM3, default parameters, segmented index with MaxP aggregation)

Command to generate run on TREC 2019 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented \
  --topics dl19-doc \
  --output run.msmarco-v1-doc.bm25-rm3-doc-segmented-default.dl19.txt \
  --bm25 --rm3 --k1 0.9 --b 0.4 --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl19-doc \
  run.msmarco-v1-doc.bm25-rm3-doc-segmented-default.dl19.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl19-doc \
  run.msmarco-v1-doc.bm25-rm3-doc-segmented-default.dl19.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl19-doc \
  run.msmarco-v1-doc.bm25-rm3-doc-segmented-default.dl19.txt
Command to generate run on TREC 2020 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented \
  --topics dl20 \
  --output run.msmarco-v1-doc.bm25-rm3-doc-segmented-default.dl20.txt \
  --bm25 --rm3 --k1 0.9 --b 0.4 --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl20-doc \
  run.msmarco-v1-doc.bm25-rm3-doc-segmented-default.dl20.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl20-doc \
  run.msmarco-v1-doc.bm25-rm3-doc-segmented-default.dl20.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl20-doc \
  run.msmarco-v1-doc.bm25-rm3-doc-segmented-default.dl20.txt
Command to generate run on dev queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented \
  --topics msmarco-doc-dev \
  --output run.msmarco-v1-doc.bm25-rm3-doc-segmented-default.dev.txt \
  --bm25 --rm3 --k1 0.9 --b 0.4 --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m recip_rank msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-rm3-doc-segmented-default.dev.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-rm3-doc-segmented-default.dev.txt

Condition: bm25-rocchio-doc-default (BM25 + Rocchio, default parameters, full-document index)

Command to generate run on TREC 2019 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc \
  --topics dl19-doc \
  --output run.msmarco-v1-doc.bm25-rocchio-doc-default.dl19.txt \
  --bm25 --rocchio --k1 0.9 --b 0.4
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl19-doc \
  run.msmarco-v1-doc.bm25-rocchio-doc-default.dl19.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl19-doc \
  run.msmarco-v1-doc.bm25-rocchio-doc-default.dl19.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl19-doc \
  run.msmarco-v1-doc.bm25-rocchio-doc-default.dl19.txt
Command to generate run on TREC 2020 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc \
  --topics dl20 \
  --output run.msmarco-v1-doc.bm25-rocchio-doc-default.dl20.txt \
  --bm25 --rocchio --k1 0.9 --b 0.4
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl20-doc \
  run.msmarco-v1-doc.bm25-rocchio-doc-default.dl20.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl20-doc \
  run.msmarco-v1-doc.bm25-rocchio-doc-default.dl20.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl20-doc \
  run.msmarco-v1-doc.bm25-rocchio-doc-default.dl20.txt
Command to generate run on dev queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc \
  --topics msmarco-doc-dev \
  --output run.msmarco-v1-doc.bm25-rocchio-doc-default.dev.txt \
  --bm25 --rocchio --k1 0.9 --b 0.4
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m recip_rank msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-rocchio-doc-default.dev.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-rocchio-doc-default.dev.txt

Condition: bm25-rocchio-doc-segmented-default (BM25 + Rocchio, default parameters, segmented index with MaxP aggregation)

Command to generate run on TREC 2019 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented \
  --topics dl19-doc \
  --output run.msmarco-v1-doc.bm25-rocchio-doc-segmented-default.dl19.txt \
  --bm25 --rocchio --k1 0.9 --b 0.4 --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl19-doc \
  run.msmarco-v1-doc.bm25-rocchio-doc-segmented-default.dl19.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl19-doc \
  run.msmarco-v1-doc.bm25-rocchio-doc-segmented-default.dl19.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl19-doc \
  run.msmarco-v1-doc.bm25-rocchio-doc-segmented-default.dl19.txt
Command to generate run on TREC 2020 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented \
  --topics dl20 \
  --output run.msmarco-v1-doc.bm25-rocchio-doc-segmented-default.dl20.txt \
  --bm25 --rocchio --k1 0.9 --b 0.4 --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl20-doc \
  run.msmarco-v1-doc.bm25-rocchio-doc-segmented-default.dl20.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl20-doc \
  run.msmarco-v1-doc.bm25-rocchio-doc-segmented-default.dl20.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl20-doc \
  run.msmarco-v1-doc.bm25-rocchio-doc-segmented-default.dl20.txt
Command to generate run on dev queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented \
  --topics msmarco-doc-dev \
  --output run.msmarco-v1-doc.bm25-rocchio-doc-segmented-default.dev.txt \
  --bm25 --rocchio --k1 0.9 --b 0.4 --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m recip_rank msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-rocchio-doc-segmented-default.dev.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-rocchio-doc-segmented-default.dev.txt
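
The --rocchio flag swaps RM3 for Rocchio feedback. In recent Pyserini versions the corresponding programmatic call is set_rocchio(); a minimal sketch with default settings:

from pyserini.search.lucene import LuceneSearcher

searcher = LuceneSearcher.from_prebuilt_index('msmarco-v1-doc')
searcher.set_bm25(0.9, 0.4)
# Rocchio feedback with Pyserini's defaults.
searcher.set_rocchio()
hits = searcher.search('what is a lobster roll?', k=1000)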

Condition: bm25-doc-tuned (BM25, tuned parameters, full-document index)

Command to generate run on TREC 2019 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc \
  --topics dl19-doc \
  --output run.msmarco-v1-doc.bm25-doc-tuned.dl19.txt \
  --bm25 --k1 4.46 --b 0.82
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl19-doc \
  run.msmarco-v1-doc.bm25-doc-tuned.dl19.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl19-doc \
  run.msmarco-v1-doc.bm25-doc-tuned.dl19.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl19-doc \
  run.msmarco-v1-doc.bm25-doc-tuned.dl19.txt
Command to generate run on TREC 2020 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc \
  --topics dl20 \
  --output run.msmarco-v1-doc.bm25-doc-tuned.dl20.txt \
  --bm25 --k1 4.46 --b 0.82
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl20-doc \
  run.msmarco-v1-doc.bm25-doc-tuned.dl20.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl20-doc \
  run.msmarco-v1-doc.bm25-doc-tuned.dl20.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl20-doc \
  run.msmarco-v1-doc.bm25-doc-tuned.dl20.txt
Command to generate run on dev queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc \
  --topics msmarco-doc-dev \
  --output run.msmarco-v1-doc.bm25-doc-tuned.dev.txt \
  --bm25 --k1 4.46 --b 0.82
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m recip_rank msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-doc-tuned.dev.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-doc-tuned.dev.txt

Condition: bm25-doc-segmented-tuned (BM25, tuned parameters, segmented index with MaxP aggregation)

Command to generate run on TREC 2019 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented \
  --topics dl19-doc \
  --output run.msmarco-v1-doc.bm25-doc-segmented-tuned.dl19.txt \
  --bm25 --k1 2.16 --b 0.61 --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl19-doc \
  run.msmarco-v1-doc.bm25-doc-segmented-tuned.dl19.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl19-doc \
  run.msmarco-v1-doc.bm25-doc-segmented-tuned.dl19.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl19-doc \
  run.msmarco-v1-doc.bm25-doc-segmented-tuned.dl19.txt
Command to generate run on TREC 2020 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented \
  --topics dl20 \
  --output run.msmarco-v1-doc.bm25-doc-segmented-tuned.dl20.txt \
  --bm25 --k1 2.16 --b 0.61 --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl20-doc \
  run.msmarco-v1-doc.bm25-doc-segmented-tuned.dl20.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl20-doc \
  run.msmarco-v1-doc.bm25-doc-segmented-tuned.dl20.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl20-doc \
  run.msmarco-v1-doc.bm25-doc-segmented-tuned.dl20.txt
Command to generate run on dev queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented \
  --topics msmarco-doc-dev \
  --output run.msmarco-v1-doc.bm25-doc-segmented-tuned.dev.txt \
  --bm25 --k1 2.16 --b 0.61 --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m recip_rank msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-doc-segmented-tuned.dev.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-doc-segmented-tuned.dev.txt

Condition: bm25-rm3-doc-tuned (BM25 + RM3, tuned parameters, full-document index)

Command to generate run on TREC 2019 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc \
  --topics dl19-doc \
  --output run.msmarco-v1-doc.bm25-rm3-doc-tuned.dl19.txt \
  --bm25 --rm3 --k1 4.46 --b 0.82
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl19-doc \
  run.msmarco-v1-doc.bm25-rm3-doc-tuned.dl19.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl19-doc \
  run.msmarco-v1-doc.bm25-rm3-doc-tuned.dl19.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl19-doc \
  run.msmarco-v1-doc.bm25-rm3-doc-tuned.dl19.txt
Command to generate run on TREC 2020 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc \
  --topics dl20 \
  --output run.msmarco-v1-doc.bm25-rm3-doc-tuned.dl20.txt \
  --bm25 --rm3 --k1 4.46 --b 0.82
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl20-doc \
  run.msmarco-v1-doc.bm25-rm3-doc-tuned.dl20.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl20-doc \
  run.msmarco-v1-doc.bm25-rm3-doc-tuned.dl20.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl20-doc \
  run.msmarco-v1-doc.bm25-rm3-doc-tuned.dl20.txt
Command to generate run on dev queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc \
  --topics msmarco-doc-dev \
  --output run.msmarco-v1-doc.bm25-rm3-doc-tuned.dev.txt \
  --bm25 --rm3 --k1 4.46 --b 0.82
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m recip_rank msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-rm3-doc-tuned.dev.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-rm3-doc-tuned.dev.txt

Condition: bm25-rm3-doc-segmented-tuned (BM25 + RM3, tuned parameters, segmented index with MaxP aggregation)

Command to generate run on TREC 2019 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented \
  --topics dl19-doc \
  --output run.msmarco-v1-doc.bm25-rm3-doc-segmented-tuned.dl19.txt \
  --bm25 --rm3 --k1 2.16 --b 0.61 --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl19-doc \
  run.msmarco-v1-doc.bm25-rm3-doc-segmented-tuned.dl19.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl19-doc \
  run.msmarco-v1-doc.bm25-rm3-doc-segmented-tuned.dl19.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl19-doc \
  run.msmarco-v1-doc.bm25-rm3-doc-segmented-tuned.dl19.txt
Command to generate run on TREC 2020 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented \
  --topics dl20 \
  --output run.msmarco-v1-doc.bm25-rm3-doc-segmented-tuned.dl20.txt \
  --bm25 --rm3 --k1 2.16 --b 0.61 --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl20-doc \
  run.msmarco-v1-doc.bm25-rm3-doc-segmented-tuned.dl20.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl20-doc \
  run.msmarco-v1-doc.bm25-rm3-doc-segmented-tuned.dl20.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl20-doc \
  run.msmarco-v1-doc.bm25-rm3-doc-segmented-tuned.dl20.txt
Command to generate run on dev queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented \
  --topics msmarco-doc-dev \
  --output run.msmarco-v1-doc.bm25-rm3-doc-segmented-tuned.dev.txt \
  --bm25 --rm3 --k1 2.16 --b 0.61 --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m recip_rank msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-rm3-doc-segmented-tuned.dev.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-rm3-doc-segmented-tuned.dev.txt

Condition: bm25-rocchio-doc-tuned (BM25 + Rocchio, tuned parameters, full-document index)

Command to generate run on TREC 2019 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc \
  --topics dl19-doc \
  --output run.msmarco-v1-doc.bm25-rocchio-doc-tuned.dl19.txt \
  --bm25 --rocchio --k1 4.46 --b 0.82
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl19-doc \
  run.msmarco-v1-doc.bm25-rocchio-doc-tuned.dl19.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl19-doc \
  run.msmarco-v1-doc.bm25-rocchio-doc-tuned.dl19.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl19-doc \
  run.msmarco-v1-doc.bm25-rocchio-doc-tuned.dl19.txt
Command to generate run on TREC 2020 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc \
  --topics dl20 \
  --output run.msmarco-v1-doc.bm25-rocchio-doc-tuned.dl20.txt \
  --bm25 --rocchio --k1 4.46 --b 0.82
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl20-doc \
  run.msmarco-v1-doc.bm25-rocchio-doc-tuned.dl20.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl20-doc \
  run.msmarco-v1-doc.bm25-rocchio-doc-tuned.dl20.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl20-doc \
  run.msmarco-v1-doc.bm25-rocchio-doc-tuned.dl20.txt
Command to generate run on dev queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc \
  --topics msmarco-doc-dev \
  --output run.msmarco-v1-doc.bm25-rocchio-doc-tuned.dev.txt \
  --bm25 --rocchio --k1 4.46 --b 0.82
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m recip_rank msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-rocchio-doc-tuned.dev.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-rocchio-doc-tuned.dev.txt

Condition: bm25-rocchio-doc-segmented-tuned (BM25 + Rocchio, tuned parameters, segmented index with MaxP aggregation)

Command to generate run on TREC 2019 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented \
  --topics dl19-doc \
  --output run.msmarco-v1-doc.bm25-rocchio-doc-segmented-tuned.dl19.txt \
  --bm25 --rocchio --k1 2.16 --b 0.61 --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl19-doc \
  run.msmarco-v1-doc.bm25-rocchio-doc-segmented-tuned.dl19.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl19-doc \
  run.msmarco-v1-doc.bm25-rocchio-doc-segmented-tuned.dl19.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl19-doc \
  run.msmarco-v1-doc.bm25-rocchio-doc-segmented-tuned.dl19.txt
Command to generate run on TREC 2020 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented \
  --topics dl20 \
  --output run.msmarco-v1-doc.bm25-rocchio-doc-segmented-tuned.dl20.txt \
  --bm25 --rocchio --k1 2.16 --b 0.61 --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl20-doc \
  run.msmarco-v1-doc.bm25-rocchio-doc-segmented-tuned.dl20.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl20-doc \
  run.msmarco-v1-doc.bm25-rocchio-doc-segmented-tuned.dl20.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl20-doc \
  run.msmarco-v1-doc.bm25-rocchio-doc-segmented-tuned.dl20.txt
Command to generate run on dev queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented \
  --topics msmarco-doc-dev \
  --output run.msmarco-v1-doc.bm25-rocchio-doc-segmented-tuned.dev.txt \
  --bm25 --rocchio --k1 2.16 --b 0.61 --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m recip_rank msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-rocchio-doc-segmented-tuned.dev.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-rocchio-doc-segmented-tuned.dev.txt

Condition: bm25-d2q-t5-doc-default (BM25, default parameters, doc2query-T5 expanded full-document index)

Command to generate run on TREC 2019 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc.d2q-t5 \
  --topics dl19-doc \
  --output run.msmarco-v1-doc.bm25-d2q-t5-doc-default.dl19.txt \
  --bm25 --k1 0.9 --b 0.4
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl19-doc \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-default.dl19.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl19-doc \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-default.dl19.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl19-doc \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-default.dl19.txt
Command to generate run on TREC 2020 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc.d2q-t5 \
  --topics dl20 \
  --output run.msmarco-v1-doc.bm25-d2q-t5-doc-default.dl20.txt \
  --bm25 --k1 0.9 --b 0.4
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl20-doc \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-default.dl20.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl20-doc \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-default.dl20.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl20-doc \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-default.dl20.txt
Command to generate run on dev queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc.d2q-t5 \
  --topics msmarco-doc-dev \
  --output run.msmarco-v1-doc.bm25-d2q-t5-doc-default.dev.txt \
  --bm25 --k1 0.9 --b 0.4
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m recip_rank msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-default.dev.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-default.dev.txt
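
The d2q-t5 conditions search indexes in which every document was expanded with queries predicted by doc2query-T5 (docTTTTTquery) before indexing; retrieval itself is unchanged, only the prebuilt index name differs. The RM3 variants below use the *.d2q-t5-docvectors indexes, which additionally store document vectors needed for relevance feedback. A minimal programmatic sketch:

from pyserini.search.lucene import LuceneSearcher

# Same BM25 search as before, over the doc2query-T5 expanded index.
searcher = LuceneSearcher.from_prebuilt_index('msmarco-v1-doc.d2q-t5')
searcher.set_bm25(0.9, 0.4)
hits = searcher.search('what is a lobster roll?', k=1000)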

Condition: bm25-d2q-t5-doc-segmented-default (BM25, default parameters, doc2query-T5 expanded segmented index with MaxP aggregation)

Command to generate run on TREC 2019 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented.d2q-t5 \
  --topics dl19-doc \
  --output run.msmarco-v1-doc.bm25-d2q-t5-doc-segmented-default.dl19.txt \
  --bm25 --k1 0.9 --b 0.4 --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl19-doc \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-segmented-default.dl19.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl19-doc \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-segmented-default.dl19.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl19-doc \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-segmented-default.dl19.txt
Command to generate run on TREC 2020 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented.d2q-t5 \
  --topics dl20 \
  --output run.msmarco-v1-doc.bm25-d2q-t5-doc-segmented-default.dl20.txt \
  --bm25 --k1 0.9 --b 0.4 --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl20-doc \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-segmented-default.dl20.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl20-doc \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-segmented-default.dl20.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl20-doc \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-segmented-default.dl20.txt
Command to generate run on dev queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented.d2q-t5 \
  --topics msmarco-doc-dev \
  --output run.msmarco-v1-doc.bm25-d2q-t5-doc-segmented-default.dev.txt \
  --bm25 --k1 0.9 --b 0.4 --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m recip_rank msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-segmented-default.dev.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-segmented-default.dev.txt

Condition: bm25-rm3-d2q-t5-doc-default (BM25 + RM3, default parameters, doc2query-T5 expanded full-document index with stored docvectors)

Command to generate run on TREC 2019 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc.d2q-t5-docvectors \
  --topics dl19-doc \
  --output run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-default.dl19.txt \
  --bm25 --rm3 --k1 0.9 --b 0.4
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl19-doc \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-default.dl19.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl19-doc \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-default.dl19.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl19-doc \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-default.dl19.txt
Command to generate run on TREC 2020 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc.d2q-t5-docvectors \
  --topics dl20 \
  --output run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-default.dl20.txt \
  --bm25 --rm3 --k1 0.9 --b 0.4
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl20-doc \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-default.dl20.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl20-doc \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-default.dl20.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl20-doc \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-default.dl20.txt
Command to generate run on dev queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc.d2q-t5-docvectors \
  --topics msmarco-doc-dev \
  --output run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-default.dev.txt \
  --bm25 --rm3 --k1 0.9 --b 0.4
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m recip_rank msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-default.dev.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-default.dev.txt

Condition: bm25-rm3-d2q-t5-doc-segmented-default (BM25 + RM3, default parameters, doc2query-T5 expanded segmented index with stored docvectors, MaxP aggregation)

Command to generate run on TREC 2019 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented.d2q-t5-docvectors \
  --topics dl19-doc \
  --output run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-segmented-default.dl19.txt \
  --bm25 --rm3 --k1 0.9 --b 0.4 --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl19-doc \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-segmented-default.dl19.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl19-doc \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-segmented-default.dl19.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl19-doc \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-segmented-default.dl19.txt
Command to generate run on TREC 2020 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented.d2q-t5-docvectors \
  --topics dl20 \
  --output run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-segmented-default.dl20.txt \
  --bm25 --rm3 --k1 0.9 --b 0.4 --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl20-doc \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-segmented-default.dl20.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl20-doc \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-segmented-default.dl20.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl20-doc \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-segmented-default.dl20.txt
Command to generate run on dev queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented.d2q-t5-docvectors \
  --topics msmarco-doc-dev \
  --output run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-segmented-default.dev.txt \
  --bm25 --rm3 --k1 0.9 --b 0.4 --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m recip_rank msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-segmented-default.dev.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-segmented-default.dev.txt

Condition: bm25-d2q-t5-doc-tuned (BM25, tuned parameters, doc2query-T5 expanded full-document index)

Command to generate run on TREC 2019 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc.d2q-t5 \
  --topics dl19-doc \
  --output run.msmarco-v1-doc.bm25-d2q-t5-doc-tuned.dl19.txt \
  --bm25 --k1 4.68 --b 0.87
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl19-doc \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-tuned.dl19.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl19-doc \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-tuned.dl19.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl19-doc \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-tuned.dl19.txt
Command to generate run on TREC 2020 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc.d2q-t5 \
  --topics dl20 \
  --output run.msmarco-v1-doc.bm25-d2q-t5-doc-tuned.dl20.txt \
  --bm25 --k1 4.68 --b 0.87
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl20-doc \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-tuned.dl20.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl20-doc \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-tuned.dl20.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl20-doc \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-tuned.dl20.txt
Command to generate run on dev queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc.d2q-t5 \
  --topics msmarco-doc-dev \
  --output run.msmarco-v1-doc.bm25-d2q-t5-doc-tuned.dev.txt \
  --bm25 --k1 4.68 --b 0.87
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m recip_rank msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-tuned.dev.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-tuned.dev.txt

Condition: bm25-d2q-t5-doc-segmented-tuned (BM25, tuned parameters, doc2query-T5 expanded segmented index with MaxP aggregation)

Command to generate run on TREC 2019 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented.d2q-t5 \
  --topics dl19-doc \
  --output run.msmarco-v1-doc.bm25-d2q-t5-doc-segmented-tuned.dl19.txt \
  --bm25 --k1 2.56 --b 0.59 --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl19-doc \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-segmented-tuned.dl19.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl19-doc \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-segmented-tuned.dl19.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl19-doc \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-segmented-tuned.dl19.txt
Command to generate run on TREC 2020 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented.d2q-t5 \
  --topics dl20 \
  --output run.msmarco-v1-doc.bm25-d2q-t5-doc-segmented-tuned.dl20.txt \
  --bm25 --k1 2.56 --b 0.59 --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl20-doc \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-segmented-tuned.dl20.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl20-doc \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-segmented-tuned.dl20.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl20-doc \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-segmented-tuned.dl20.txt
Command to generate run on dev queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented.d2q-t5 \
  --topics msmarco-doc-dev \
  --output run.msmarco-v1-doc.bm25-d2q-t5-doc-segmented-tuned.dev.txt \
  --bm25 --k1 2.56 --b 0.59 --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m recip_rank msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-segmented-tuned.dev.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-segmented-tuned.dev.txt

Condition: bm25-rm3-d2q-t5-doc-tuned (BM25 + RM3, tuned parameters, doc2query-T5 expanded full-document index with stored docvectors)

Command to generate run on TREC 2019 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc.d2q-t5-docvectors \
  --topics dl19-doc \
  --output run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-tuned.dl19.txt \
  --bm25 --rm3 --k1 4.68 --b 0.87
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl19-doc \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-tuned.dl19.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl19-doc \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-tuned.dl19.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl19-doc \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-tuned.dl19.txt
Command to generate run on TREC 2020 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc.d2q-t5-docvectors \
  --topics dl20 \
  --output run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-tuned.dl20.txt \
  --bm25 --rm3 --k1 4.68 --b 0.87
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl20-doc \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-tuned.dl20.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl20-doc \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-tuned.dl20.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl20-doc \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-tuned.dl20.txt
Command to generate run on dev queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc.d2q-t5-docvectors \
  --topics msmarco-doc-dev \
  --output run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-tuned.dev.txt \
  --bm25 --rm3 --k1 4.68 --b 0.87
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m recip_rank msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-tuned.dev.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-tuned.dev.txt

Condition: bm25-rm3-d2q-t5-doc-segmented-tuned (BM25 + RM3, tuned parameters, doc2query-T5 expanded segmented index with stored docvectors, MaxP aggregation)

Command to generate run on TREC 2019 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented.d2q-t5-docvectors \
  --topics dl19-doc \
  --output run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-segmented-tuned.dl19.txt \
  --bm25 --rm3 --k1 2.56 --b 0.59 --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl19-doc \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-segmented-tuned.dl19.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl19-doc \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-segmented-tuned.dl19.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl19-doc \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-segmented-tuned.dl19.txt
Command to generate run on TREC 2020 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented.d2q-t5-docvectors \
  --topics dl20 \
  --output run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-segmented-tuned.dl20.txt \
  --bm25 --rm3 --k1 2.56 --b 0.59 --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl20-doc \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-segmented-tuned.dl20.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl20-doc \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-segmented-tuned.dl20.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl20-doc \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-segmented-tuned.dl20.txt
Command to generate run on dev queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented.d2q-t5-docvectors \
  --topics msmarco-doc-dev \
  --output run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-segmented-tuned.dev.txt \
  --bm25 --rm3 --k1 2.56 --b 0.59 --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m recip_rank msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-segmented-tuned.dev.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-segmented-tuned.dev.txt

Condition: unicoil-noexp (uniCOIL without document expansion, pre-encoded queries)

Command to generate run on TREC 2019 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented.unicoil-noexp \
  --topics dl19-doc-unicoil-noexp \
  --output run.msmarco-v1-doc.unicoil-noexp.dl19.txt \
  --impact --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl19-doc \
  run.msmarco-v1-doc.unicoil-noexp.dl19.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl19-doc \
  run.msmarco-v1-doc.unicoil-noexp.dl19.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl19-doc \
  run.msmarco-v1-doc.unicoil-noexp.dl19.txt
Command to generate run on TREC 2020 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented.unicoil-noexp \
  --topics dl20-unicoil-noexp \
  --output run.msmarco-v1-doc.unicoil-noexp.dl20.txt \
  --impact --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl20-doc \
  run.msmarco-v1-doc.unicoil-noexp.dl20.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl20-doc \
  run.msmarco-v1-doc.unicoil-noexp.dl20.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl20-doc \
  run.msmarco-v1-doc.unicoil-noexp.dl20.txt
Command to generate run on dev queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented.unicoil-noexp \
  --topics msmarco-doc-dev-unicoil-noexp \
  --output run.msmarco-v1-doc.unicoil-noexp.dev.txt \
  --impact --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m recip_rank msmarco-doc-dev \
  run.msmarco-v1-doc.unicoil-noexp.dev.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 msmarco-doc-dev \
  run.msmarco-v1-doc.unicoil-noexp.dev.txt
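
The uniCOIL conditions perform learned sparse ("impact") retrieval rather than BM25: --impact scores documents with precomputed term impacts. The *-unicoil-noexp topic variants above hold queries already encoded into term weights, whereas the PyTorch conditions below encode raw queries on the fly with the model named by --encoder. A minimal programmatic sketch of the on-the-fly variant, assuming LuceneImpactSearcher takes the encoder name as its second argument (as in recent Pyserini versions):

from pyserini.search.lucene import LuceneImpactSearcher

# The second argument names the query encoder that turns raw query
# text into uniCOIL term weights.
searcher = LuceneImpactSearcher.from_prebuilt_index(
    'msmarco-v1-doc-segmented.unicoil-noexp',
    'castorini/unicoil-noexp-msmarco-passage')
hits = searcher.search('what is a lobster roll?', k=1000)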

Condition: unicoil-noexp-pytorch (uniCOIL without document expansion, on-the-fly query encoding with PyTorch)

Command to generate run on TREC 2019 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented.unicoil-noexp \
  --topics dl19-doc \
  --encoder castorini/unicoil-noexp-msmarco-passage \
  --output run.msmarco-v1-doc.unicoil-noexp-pytorch.dl19.txt \
  --impact --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl19-doc \
  run.msmarco-v1-doc.unicoil-noexp-pytorch.dl19.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl19-doc \
  run.msmarco-v1-doc.unicoil-noexp-pytorch.dl19.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl19-doc \
  run.msmarco-v1-doc.unicoil-noexp-pytorch.dl19.txt
Command to generate run on TREC 2020 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented.unicoil-noexp \
  --topics dl20 \
  --encoder castorini/unicoil-noexp-msmarco-passage \
  --output run.msmarco-v1-doc.unicoil-noexp-pytorch.dl20.txt \
  --impact --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl20-doc \
  run.msmarco-v1-doc.unicoil-noexp-pytorch.dl20.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl20-doc \
  run.msmarco-v1-doc.unicoil-noexp-pytorch.dl20.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl20-doc \
  run.msmarco-v1-doc.unicoil-noexp-pytorch.dl20.txt
Command to generate run on dev queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented.unicoil-noexp \
  --topics msmarco-doc-dev \
  --encoder castorini/unicoil-noexp-msmarco-passage \
  --output run.msmarco-v1-doc.unicoil-noexp-pytorch.dev.txt \
  --impact --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m recip_rank msmarco-doc-dev \
  run.msmarco-v1-doc.unicoil-noexp-pytorch.dev.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 msmarco-doc-dev \
  run.msmarco-v1-doc.unicoil-noexp-pytorch.dev.txt

Condition: unicoil (uniCOIL with doc2query-T5 document expansion, pre-encoded queries)

Command to generate run on TREC 2019 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented.unicoil \
  --topics dl19-doc-unicoil \
  --output run.msmarco-v1-doc.unicoil.dl19.txt \
  --impact --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl19-doc \
  run.msmarco-v1-doc.unicoil.dl19.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl19-doc \
  run.msmarco-v1-doc.unicoil.dl19.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl19-doc \
  run.msmarco-v1-doc.unicoil.dl19.txt
Command to generate run on TREC 2020 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented.unicoil \
  --topics dl20-unicoil \
  --output run.msmarco-v1-doc.unicoil.dl20.txt \
  --impact --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl20-doc \
  run.msmarco-v1-doc.unicoil.dl20.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl20-doc \
  run.msmarco-v1-doc.unicoil.dl20.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl20-doc \
  run.msmarco-v1-doc.unicoil.dl20.txt
Command to generate run on dev queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented.unicoil \
  --topics msmarco-doc-dev-unicoil \
  --output run.msmarco-v1-doc.unicoil.dev.txt \
  --impact --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m recip_rank msmarco-doc-dev \
  run.msmarco-v1-doc.unicoil.dev.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 msmarco-doc-dev \
  run.msmarco-v1-doc.unicoil.dev.txt

Condition: unicoil-pytorch (uniCOIL with doc2query-T5 document expansion, on-the-fly query encoding with PyTorch)

Command to generate run on TREC 2019 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented.unicoil \
  --topics dl19-doc \
  --encoder castorini/unicoil-msmarco-passage \
  --output run.msmarco-v1-doc.unicoil-pytorch.dl19.txt \
  --impact --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl19-doc \
  run.msmarco-v1-doc.unicoil-pytorch.dl19.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl19-doc \
  run.msmarco-v1-doc.unicoil-pytorch.dl19.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl19-doc \
  run.msmarco-v1-doc.unicoil-pytorch.dl19.txt
Command to generate run on TREC 2020 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented.unicoil \
  --topics dl20 \
  --encoder castorini/unicoil-msmarco-passage \
  --output run.msmarco-v1-doc.unicoil-pytorch.dl20.txt \
  --impact --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl20-doc \
  run.msmarco-v1-doc.unicoil-pytorch.dl20.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl20-doc \
  run.msmarco-v1-doc.unicoil-pytorch.dl20.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl20-doc \
  run.msmarco-v1-doc.unicoil-pytorch.dl20.txt
Command to generate run on dev queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented.unicoil \
  --topics msmarco-doc-dev \
  --encoder castorini/unicoil-msmarco-passage \
  --output run.msmarco-v1-doc.unicoil-pytorch.dev.txt \
  --impact --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m recip_rank msmarco-doc-dev \
  run.msmarco-v1-doc.unicoil-pytorch.dev.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 msmarco-doc-dev \
  run.msmarco-v1-doc.unicoil-pytorch.dev.txt

Programmatic Execution

All experimental runs shown in the above matrix can be executed programmatically, following the instructions below. To list all the experimental conditions:

python -m pyserini.2cr.msmarco --collection v1-doc --list-conditions

These conditions correspond to the named condition blocks above.

For all conditions, just show the commands in a "dry run":

python -m pyserini.2cr.msmarco --collection v1-doc --all --display-commands --dry-run

To actually run all the experimental conditions:

python -m pyserini.2cr.msmarco --collection v1-doc --all --display-commands

With the above command, run files will be placed in the current directory. Use the option --directory runs/ to place the runs in a sub-directory.
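
For example, to run all conditions and write the run files into runs/:

python -m pyserini.2cr.msmarco --collection v1-doc --all --display-commands --directory runs/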

To show the commands for a specific condition:

python -m pyserini.2cr.msmarco --collection v1-doc --condition bm25-doc-default --display-commands --dry-run

This will generate exactly the commands for the specified condition (corresponding to a single condition block above).

To actually run a specific condition:

python -m pyserini.2cr.msmarco --collection v1-doc --condition bm25-doc-default --display-commands

Again, with the above command, run files will be placed in the current directory. Use the option --directory runs/ to place the runs in a sub-directory.

Finally, to generate this page:

python -m pyserini.2cr.msmarco --collection v1-doc --generate-report --output msmarco-v1-doc.html

The output file msmarco-v1-doc.html should be identical to this page.