MS MARCO V1 Doc Reproductions

The two-click* reproduction matrix below provides commands for reproducing experimental results reported in a number of papers, denoted by the references in square brackets. These runs take advantage of prebuilt indexes in Pyserini, so you don't need access to the raw corpus. Instructions for programmatic execution are shown at the bottom of this page.
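
The retrieval commands in the matrix share one shape and differ only in the index, the topics key, the run name, and the trailing ranking flags. As a rough illustration of that pattern (not the programmatic execution route mentioned above), the sketch below composes one such command as an argument list. The helper name `build_search_command` and its parameters are hypothetical; the flag values are copied from the commands on this page.

```python
import shlex


def build_search_command(index, topics, run_tag, query_set, extra_flags):
    """Compose the argv for one retrieval run (hypothetical helper).

    Only builds the argument list; nothing is executed here.
    """
    output = f"run.msmarco-v1-doc.{run_tag}.{query_set}.txt"
    return [
        "python", "-m", "pyserini.search.lucene",
        "--threads", "16", "--batch-size", "128",
        "--index", index,
        "--topics", topics,
        "--output", output,
        *extra_flags,
    ]


# BM25 on full documents, TREC 2019 queries, k1=0.9 and b=0.4.
cmd = build_search_command(
    index="msmarco-v1-doc",
    topics="dl19-doc",
    run_tag="bm25-doc-default",
    query_set="dl19",
    extra_flags=["--bm25", "--k1", "0.9", "--b", "0.4"],
)
print(shlex.join(cmd))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would launch the run, assuming Pyserini and its Java dependencies are installed.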

Metrics reported for each query set:
TREC 2019: AP@100, nDCG@10, R@1K
TREC 2020: AP@100, nDCG@10, R@1K
dev: RR@100, R@1K
Command to generate run on TREC 2019 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc \
  --topics dl19-doc \
  --output run.msmarco-v1-doc.bm25-doc-default.dl19.txt \
  --bm25 --k1 0.9 --b 0.4
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl19-doc \
  run.msmarco-v1-doc.bm25-doc-default.dl19.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl19-doc \
  run.msmarco-v1-doc.bm25-doc-default.dl19.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl19-doc \
  run.msmarco-v1-doc.bm25-doc-default.dl19.txt
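
The three evaluation commands above correspond to AP@100 (`-M 100 -m map`, which truncates each ranking to 100 documents before computing average precision), nDCG@10 (`-m ndcg_cut.10`), and R@1K (`-m recall.1000`). The sketch below shows what these measures compute for a single query. It is a simplified illustration, not trec_eval's implementation: the data layout (a ranked list of docids and a dict of graded judgments) and the function names are assumptions, and any grade above 0 is treated as relevant.

```python
import math


def average_precision(ranking, qrels, cutoff=100):
    """AP over the top `cutoff` results, normalized by all relevant docs."""
    relevant = {docid for docid, grade in qrels.items() if grade > 0}
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, docid in enumerate(ranking[:cutoff], start=1):
        if docid in relevant:
            hits += 1
            total += hits / rank  # precision at this relevant position
    return total / len(relevant)


def ndcg(ranking, qrels, cutoff=10):
    """nDCG with gain = grade and a log2(rank + 1) discount."""
    def dcg(grades):
        return sum(g / math.log2(rank + 1) for rank, g in enumerate(grades, start=1))

    gains = [qrels.get(docid, 0) for docid in ranking[:cutoff]]
    ideal = sorted(qrels.values(), reverse=True)[:cutoff]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


def recall(ranking, qrels, cutoff=1000):
    """Fraction of relevant docs found in the top `cutoff` results."""
    relevant = {docid for docid, grade in qrels.items() if grade > 0}
    if not relevant:
        return 0.0
    found = sum(1 for docid in ranking[:cutoff] if docid in relevant)
    return found / len(relevant)


# Toy data for one query (docids and grades are made up).
toy_qrels = {"D1": 3, "D3": 1, "D9": 2, "D4": 0}
toy_ranking = ["D1", "D2", "D3", "D4"]
print(average_precision(toy_ranking, toy_qrels))
print(ndcg(toy_ranking, toy_qrels))
print(recall(toy_ranking, toy_qrels))
```

The reported figures are averages of these per-query values over the query set.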
Command to generate run on TREC 2020 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc \
  --topics dl20 \
  --output run.msmarco-v1-doc.bm25-doc-default.dl20.txt \
  --bm25 --k1 0.9 --b 0.4
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl20-doc \
  run.msmarco-v1-doc.bm25-doc-default.dl20.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl20-doc \
  run.msmarco-v1-doc.bm25-doc-default.dl20.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl20-doc \
  run.msmarco-v1-doc.bm25-doc-default.dl20.txt
Command to generate run on dev queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc \
  --topics msmarco-doc-dev \
  --output run.msmarco-v1-doc.bm25-doc-default.dev.txt \
  --bm25 --k1 0.9 --b 0.4
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m recip_rank msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-doc-default.dev.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-doc-default.dev.txt
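
For the dev queries, `-M 100 -m recip_rank` yields RR@100: the reciprocal of the rank of the first relevant document within the top 100, averaged over queries. With `-c`, trec_eval averages over every query in the judgments, so a query missing from the run contributes 0. A minimal sketch under those assumptions follows; the function names and the dict-based layout of runs and judgments are hypothetical.

```python
def reciprocal_rank(ranking, relevant, cutoff=100):
    """1 / rank of the first relevant doc in the top `cutoff`, else 0."""
    for rank, docid in enumerate(ranking[:cutoff], start=1):
        if docid in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(run, qrels, cutoff=100):
    """Average RR over all judged queries; unanswered queries score 0."""
    scores = [
        reciprocal_rank(run.get(qid, []), relevant, cutoff)
        for qid, relevant in qrels.items()
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Toy data: q3 has judgments but no results in the run.
toy_run = {"q1": ["D7", "D2", "D5"], "q2": ["D4"]}
toy_judgments = {"q1": {"D5"}, "q2": {"D4"}, "q3": {"D8"}}
print(mean_reciprocal_rank(toy_run, toy_judgments))
```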
Command to generate run on TREC 2019 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented \
  --topics dl19-doc \
  --output run.msmarco-v1-doc.bm25-doc-segmented-default.dl19.txt \
  --bm25 --k1 0.9 --b 0.4 --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl19-doc \
  run.msmarco-v1-doc.bm25-doc-segmented-default.dl19.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl19-doc \
  run.msmarco-v1-doc.bm25-doc-segmented-default.dl19.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl19-doc \
  run.msmarco-v1-doc.bm25-doc-segmented-default.dl19.txt
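
Runs on the segmented index retrieve segments rather than whole documents, so the command adds `--hits 10000 --max-passage-hits 1000 --max-passage`: fetch up to 10000 segment hits, keep only the best-scoring segment per document, and return the top 1000 documents. The sketch below illustrates that aggregation step. It assumes segment ids of the form `<docid>#<segment number>`; the function name and input layout are hypothetical, and this is not Pyserini's own code.

```python
def max_passage(segment_hits, max_docs=1000, delimiter="#"):
    """Collapse (segment_id, score) pairs to documents by maximum segment score."""
    best = {}
    for segment_id, score in segment_hits:
        docid = segment_id.split(delimiter)[0]
        # Keep the highest score seen for each parent document.
        if docid not in best or score > best[docid]:
            best[docid] = score
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:max_docs]


# Toy segment hits (ids and scores are made up).
toy_hits = [("D1#2", 9.5), ("D2#0", 9.1), ("D1#0", 8.7), ("D3#1", 7.2)]
print(max_passage(toy_hits, max_docs=2))
```

Fetching many more segments than the final document count leaves room for several segments of the same document to appear among the hits without shrinking the final list.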
Command to generate run on TREC 2020 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented \
  --topics dl20 \
  --output run.msmarco-v1-doc.bm25-doc-segmented-default.dl20.txt \
  --bm25 --k1 0.9 --b 0.4 --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl20-doc \
  run.msmarco-v1-doc.bm25-doc-segmented-default.dl20.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl20-doc \
  run.msmarco-v1-doc.bm25-doc-segmented-default.dl20.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl20-doc \
  run.msmarco-v1-doc.bm25-doc-segmented-default.dl20.txt
Command to generate run on dev queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented \
  --topics msmarco-doc-dev \
  --output run.msmarco-v1-doc.bm25-doc-segmented-default.dev.txt \
  --bm25 --k1 0.9 --b 0.4 --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m recip_rank msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-doc-segmented-default.dev.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-doc-segmented-default.dev.txt
Command to generate run on TREC 2019 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc \
  --topics dl19-doc \
  --output run.msmarco-v1-doc.bm25-rm3-doc-default.dl19.txt \
  --bm25 --rm3 --k1 0.9 --b 0.4
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl19-doc \
  run.msmarco-v1-doc.bm25-rm3-doc-default.dl19.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl19-doc \
  run.msmarco-v1-doc.bm25-rm3-doc-default.dl19.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl19-doc \
  run.msmarco-v1-doc.bm25-rm3-doc-default.dl19.txt
Command to generate run on TREC 2020 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc \
  --topics dl20 \
  --output run.msmarco-v1-doc.bm25-rm3-doc-default.dl20.txt \
  --bm25 --rm3 --k1 0.9 --b 0.4
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl20-doc \
  run.msmarco-v1-doc.bm25-rm3-doc-default.dl20.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl20-doc \
  run.msmarco-v1-doc.bm25-rm3-doc-default.dl20.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl20-doc \
  run.msmarco-v1-doc.bm25-rm3-doc-default.dl20.txt
Command to generate run on dev queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc \
  --topics msmarco-doc-dev \
  --output run.msmarco-v1-doc.bm25-rm3-doc-default.dev.txt \
  --bm25 --rm3 --k1 0.9 --b 0.4
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m recip_rank msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-rm3-doc-default.dev.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-rm3-doc-default.dev.txt
Command to generate run on TREC 2019 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented \
  --topics dl19-doc \
  --output run.msmarco-v1-doc.bm25-rm3-doc-segmented-default.dl19.txt \
  --bm25 --rm3 --k1 0.9 --b 0.4 --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl19-doc \
  run.msmarco-v1-doc.bm25-rm3-doc-segmented-default.dl19.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl19-doc \
  run.msmarco-v1-doc.bm25-rm3-doc-segmented-default.dl19.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl19-doc \
  run.msmarco-v1-doc.bm25-rm3-doc-segmented-default.dl19.txt
Command to generate run on TREC 2020 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented \
  --topics dl20 \
  --output run.msmarco-v1-doc.bm25-rm3-doc-segmented-default.dl20.txt \
  --bm25 --rm3 --k1 0.9 --b 0.4 --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl20-doc \
  run.msmarco-v1-doc.bm25-rm3-doc-segmented-default.dl20.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl20-doc \
  run.msmarco-v1-doc.bm25-rm3-doc-segmented-default.dl20.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl20-doc \
  run.msmarco-v1-doc.bm25-rm3-doc-segmented-default.dl20.txt
Command to generate run on dev queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented \
  --topics msmarco-doc-dev \
  --output run.msmarco-v1-doc.bm25-rm3-doc-segmented-default.dev.txt \
  --bm25 --rm3 --k1 0.9 --b 0.4 --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m recip_rank msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-rm3-doc-segmented-default.dev.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-rm3-doc-segmented-default.dev.txt
Command to generate run on TREC 2019 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc \
  --topics dl19-doc \
  --output run.msmarco-v1-doc.bm25-rocchio-doc-default.dl19.txt \
  --bm25 --rocchio --k1 0.9 --b 0.4
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl19-doc \
  run.msmarco-v1-doc.bm25-rocchio-doc-default.dl19.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl19-doc \
  run.msmarco-v1-doc.bm25-rocchio-doc-default.dl19.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl19-doc \
  run.msmarco-v1-doc.bm25-rocchio-doc-default.dl19.txt
Command to generate run on TREC 2020 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc \
  --topics dl20 \
  --output run.msmarco-v1-doc.bm25-rocchio-doc-default.dl20.txt \
  --bm25 --rocchio --k1 0.9 --b 0.4
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl20-doc \
  run.msmarco-v1-doc.bm25-rocchio-doc-default.dl20.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl20-doc \
  run.msmarco-v1-doc.bm25-rocchio-doc-default.dl20.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl20-doc \
  run.msmarco-v1-doc.bm25-rocchio-doc-default.dl20.txt
Command to generate run on dev queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc \
  --topics msmarco-doc-dev \
  --output run.msmarco-v1-doc.bm25-rocchio-doc-default.dev.txt \
  --bm25 --rocchio --k1 0.9 --b 0.4
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m recip_rank msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-rocchio-doc-default.dev.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-rocchio-doc-default.dev.txt
Command to generate run on TREC 2019 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented \
  --topics dl19-doc \
  --output run.msmarco-v1-doc.bm25-rocchio-doc-segmented-default.dl19.txt \
  --bm25 --rocchio --k1 0.9 --b 0.4 --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl19-doc \
  run.msmarco-v1-doc.bm25-rocchio-doc-segmented-default.dl19.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl19-doc \
  run.msmarco-v1-doc.bm25-rocchio-doc-segmented-default.dl19.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl19-doc \
  run.msmarco-v1-doc.bm25-rocchio-doc-segmented-default.dl19.txt
Command to generate run on TREC 2020 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented \
  --topics dl20 \
  --output run.msmarco-v1-doc.bm25-rocchio-doc-segmented-default.dl20.txt \
  --bm25 --rocchio --k1 0.9 --b 0.4 --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl20-doc \
  run.msmarco-v1-doc.bm25-rocchio-doc-segmented-default.dl20.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl20-doc \
  run.msmarco-v1-doc.bm25-rocchio-doc-segmented-default.dl20.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl20-doc \
  run.msmarco-v1-doc.bm25-rocchio-doc-segmented-default.dl20.txt
Command to generate run on dev queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented \
  --topics msmarco-doc-dev \
  --output run.msmarco-v1-doc.bm25-rocchio-doc-segmented-default.dev.txt \
  --bm25 --rocchio --k1 0.9 --b 0.4 --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m recip_rank msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-rocchio-doc-segmented-default.dev.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-rocchio-doc-segmented-default.dev.txt
Command to generate run on TREC 2019 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc \
  --topics dl19-doc \
  --output run.msmarco-v1-doc.bm25-doc-tuned.dl19.txt \
  --bm25
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl19-doc \
  run.msmarco-v1-doc.bm25-doc-tuned.dl19.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl19-doc \
  run.msmarco-v1-doc.bm25-doc-tuned.dl19.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl19-doc \
  run.msmarco-v1-doc.bm25-doc-tuned.dl19.txt
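
The commands for runs named `tuned` pass `--bm25` without explicit `--k1` and `--b` values, whereas the runs named `default` pass `--k1 0.9 --b 0.4`. To make concrete what these two parameters control, the sketch below computes a textbook BM25 term weight: `k1` governs how quickly repeated occurrences of a term saturate, and `b` governs how strongly document length is penalized. This is an illustration only; Lucene's implementation may differ in details, the function name is hypothetical, and the tuned values themselves are not listed on this page, so none are assumed here.

```python
import math


def bm25_term_weight(tf, doc_len, avg_doc_len, doc_freq, num_docs, k1=0.9, b=0.4):
    """Textbook BM25 weight of one query term in one document."""
    idf = math.log(1 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalization: b=0 ignores length, b=1 normalizes fully.
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1) / (tf + norm)


# Same term statistics, two document lengths (all numbers are made up).
short_doc = bm25_term_weight(tf=3, doc_len=500, avg_doc_len=1000, doc_freq=100, num_docs=100000)
long_doc = bm25_term_weight(tf=3, doc_len=4000, avg_doc_len=1000, doc_freq=100, num_docs=100000)
print(short_doc > long_doc)
```

With `b` above 0, the longer document receives the lower weight for the same term frequency, which is why suitable settings can differ between full documents and shorter segments.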
Command to generate run on TREC 2020 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc \
  --topics dl20 \
  --output run.msmarco-v1-doc.bm25-doc-tuned.dl20.txt \
  --bm25
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl20-doc \
  run.msmarco-v1-doc.bm25-doc-tuned.dl20.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl20-doc \
  run.msmarco-v1-doc.bm25-doc-tuned.dl20.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl20-doc \
  run.msmarco-v1-doc.bm25-doc-tuned.dl20.txt
Command to generate run on dev queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc \
  --topics msmarco-doc-dev \
  --output run.msmarco-v1-doc.bm25-doc-tuned.dev.txt \
  --bm25
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m recip_rank msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-doc-tuned.dev.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-doc-tuned.dev.txt
Command to generate run on TREC 2019 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented \
  --topics dl19-doc \
  --output run.msmarco-v1-doc.bm25-doc-segmented-tuned.dl19.txt \
  --bm25 --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl19-doc \
  run.msmarco-v1-doc.bm25-doc-segmented-tuned.dl19.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl19-doc \
  run.msmarco-v1-doc.bm25-doc-segmented-tuned.dl19.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl19-doc \
  run.msmarco-v1-doc.bm25-doc-segmented-tuned.dl19.txt
Command to generate run on TREC 2020 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented \
  --topics dl20 \
  --output run.msmarco-v1-doc.bm25-doc-segmented-tuned.dl20.txt \
  --bm25 --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl20-doc \
  run.msmarco-v1-doc.bm25-doc-segmented-tuned.dl20.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl20-doc \
  run.msmarco-v1-doc.bm25-doc-segmented-tuned.dl20.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl20-doc \
  run.msmarco-v1-doc.bm25-doc-segmented-tuned.dl20.txt
Command to generate run on dev queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented \
  --topics msmarco-doc-dev \
  --output run.msmarco-v1-doc.bm25-doc-segmented-tuned.dev.txt \
  --bm25 --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m recip_rank msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-doc-segmented-tuned.dev.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-doc-segmented-tuned.dev.txt
Command to generate run on TREC 2019 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc \
  --topics dl19-doc \
  --output run.msmarco-v1-doc.bm25-rm3-doc-tuned.dl19.txt \
  --bm25 --rm3
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl19-doc \
  run.msmarco-v1-doc.bm25-rm3-doc-tuned.dl19.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl19-doc \
  run.msmarco-v1-doc.bm25-rm3-doc-tuned.dl19.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl19-doc \
  run.msmarco-v1-doc.bm25-rm3-doc-tuned.dl19.txt
Command to generate run on TREC 2020 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc \
  --topics dl20 \
  --output run.msmarco-v1-doc.bm25-rm3-doc-tuned.dl20.txt \
  --bm25 --rm3
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl20-doc \
  run.msmarco-v1-doc.bm25-rm3-doc-tuned.dl20.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl20-doc \
  run.msmarco-v1-doc.bm25-rm3-doc-tuned.dl20.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl20-doc \
  run.msmarco-v1-doc.bm25-rm3-doc-tuned.dl20.txt
Command to generate run on dev queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc \
  --topics msmarco-doc-dev \
  --output run.msmarco-v1-doc.bm25-rm3-doc-tuned.dev.txt \
  --bm25 --rm3
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m recip_rank msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-rm3-doc-tuned.dev.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-rm3-doc-tuned.dev.txt
Command to generate run on TREC 2019 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented \
  --topics dl19-doc \
  --output run.msmarco-v1-doc.bm25-rm3-doc-segmented-tuned.dl19.txt \
  --bm25 --rm3 --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl19-doc \
  run.msmarco-v1-doc.bm25-rm3-doc-segmented-tuned.dl19.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl19-doc \
  run.msmarco-v1-doc.bm25-rm3-doc-segmented-tuned.dl19.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl19-doc \
  run.msmarco-v1-doc.bm25-rm3-doc-segmented-tuned.dl19.txt
Command to generate run on TREC 2020 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented \
  --topics dl20 \
  --output run.msmarco-v1-doc.bm25-rm3-doc-segmented-tuned.dl20.txt \
  --bm25 --rm3 --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl20-doc \
  run.msmarco-v1-doc.bm25-rm3-doc-segmented-tuned.dl20.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl20-doc \
  run.msmarco-v1-doc.bm25-rm3-doc-segmented-tuned.dl20.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl20-doc \
  run.msmarco-v1-doc.bm25-rm3-doc-segmented-tuned.dl20.txt
Command to generate run on dev queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented \
  --topics msmarco-doc-dev \
  --output run.msmarco-v1-doc.bm25-rm3-doc-segmented-tuned.dev.txt \
  --bm25 --rm3 --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m recip_rank msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-rm3-doc-segmented-tuned.dev.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-rm3-doc-segmented-tuned.dev.txt
Command to generate run on TREC 2019 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc \
  --topics dl19-doc \
  --output run.msmarco-v1-doc.bm25-rocchio-doc-tuned.dl19.txt \
  --bm25 --rocchio
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl19-doc \
  run.msmarco-v1-doc.bm25-rocchio-doc-tuned.dl19.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl19-doc \
  run.msmarco-v1-doc.bm25-rocchio-doc-tuned.dl19.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl19-doc \
  run.msmarco-v1-doc.bm25-rocchio-doc-tuned.dl19.txt
Command to generate run on TREC 2020 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc \
  --topics dl20 \
  --output run.msmarco-v1-doc.bm25-rocchio-doc-tuned.dl20.txt \
  --bm25 --rocchio
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl20-doc \
  run.msmarco-v1-doc.bm25-rocchio-doc-tuned.dl20.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl20-doc \
  run.msmarco-v1-doc.bm25-rocchio-doc-tuned.dl20.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl20-doc \
  run.msmarco-v1-doc.bm25-rocchio-doc-tuned.dl20.txt
Command to generate run on dev queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc \
  --topics msmarco-doc-dev \
  --output run.msmarco-v1-doc.bm25-rocchio-doc-tuned.dev.txt \
  --bm25 --rocchio
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m recip_rank msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-rocchio-doc-tuned.dev.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-rocchio-doc-tuned.dev.txt
Command to generate run on TREC 2019 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented \
  --topics dl19-doc \
  --output run.msmarco-v1-doc.bm25-rocchio-doc-segmented-tuned.dl19.txt \
  --bm25 --rocchio --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl19-doc \
  run.msmarco-v1-doc.bm25-rocchio-doc-segmented-tuned.dl19.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl19-doc \
  run.msmarco-v1-doc.bm25-rocchio-doc-segmented-tuned.dl19.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl19-doc \
  run.msmarco-v1-doc.bm25-rocchio-doc-segmented-tuned.dl19.txt
Command to generate run on TREC 2020 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented \
  --topics dl20 \
  --output run.msmarco-v1-doc.bm25-rocchio-doc-segmented-tuned.dl20.txt \
  --bm25 --rocchio --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl20-doc \
  run.msmarco-v1-doc.bm25-rocchio-doc-segmented-tuned.dl20.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl20-doc \
  run.msmarco-v1-doc.bm25-rocchio-doc-segmented-tuned.dl20.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl20-doc \
  run.msmarco-v1-doc.bm25-rocchio-doc-segmented-tuned.dl20.txt
Command to generate run on dev queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented \
  --topics msmarco-doc-dev \
  --output run.msmarco-v1-doc.bm25-rocchio-doc-segmented-tuned.dev.txt \
  --bm25 --rocchio --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m recip_rank msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-rocchio-doc-segmented-tuned.dev.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-rocchio-doc-segmented-tuned.dev.txt
Command to generate run on TREC 2019 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc.d2q-t5 \
  --topics dl19-doc \
  --output run.msmarco-v1-doc.bm25-d2q-t5-doc-default.dl19.txt \
  --bm25 --k1 0.9 --b 0.4
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl19-doc \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-default.dl19.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl19-doc \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-default.dl19.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl19-doc \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-default.dl19.txt
Command to generate run on TREC 2020 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc.d2q-t5 \
  --topics dl20 \
  --output run.msmarco-v1-doc.bm25-d2q-t5-doc-default.dl20.txt \
  --bm25 --k1 0.9 --b 0.4
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl20-doc \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-default.dl20.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl20-doc \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-default.dl20.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl20-doc \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-default.dl20.txt
Command to generate run on dev queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc.d2q-t5 \
  --topics msmarco-doc-dev \
  --output run.msmarco-v1-doc.bm25-d2q-t5-doc-default.dev.txt \
  --bm25 --k1 0.9 --b 0.4
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m recip_rank msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-default.dev.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-default.dev.txt
Command to generate run on TREC 2019 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented.d2q-t5 \
  --topics dl19-doc \
  --output run.msmarco-v1-doc.bm25-d2q-t5-doc-segmented-default.dl19.txt \
  --bm25 --k1 0.9 --b 0.4 --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl19-doc \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-segmented-default.dl19.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl19-doc \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-segmented-default.dl19.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl19-doc \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-segmented-default.dl19.txt
Command to generate run on TREC 2020 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented.d2q-t5 \
  --topics dl20 \
  --output run.msmarco-v1-doc.bm25-d2q-t5-doc-segmented-default.dl20.txt \
  --bm25 --k1 0.9 --b 0.4 --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl20-doc \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-segmented-default.dl20.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl20-doc \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-segmented-default.dl20.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl20-doc \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-segmented-default.dl20.txt
Command to generate run on dev queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented.d2q-t5 \
  --topics msmarco-doc-dev \
  --output run.msmarco-v1-doc.bm25-d2q-t5-doc-segmented-default.dev.txt \
  --bm25 --k1 0.9 --b 0.4 --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m recip_rank msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-segmented-default.dev.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-segmented-default.dev.txt
Command to generate run on TREC 2019 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc.d2q-t5-docvectors \
  --topics dl19-doc \
  --output run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-default.dl19.txt \
  --bm25 --rm3 --k1 0.9 --b 0.4
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl19-doc \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-default.dl19.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl19-doc \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-default.dl19.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl19-doc \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-default.dl19.txt
Command to generate run on TREC 2020 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc.d2q-t5-docvectors \
  --topics dl20 \
  --output run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-default.dl20.txt \
  --bm25 --rm3 --k1 0.9 --b 0.4
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl20-doc \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-default.dl20.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl20-doc \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-default.dl20.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl20-doc \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-default.dl20.txt
Command to generate run on dev queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc.d2q-t5-docvectors \
  --topics msmarco-doc-dev \
  --output run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-default.dev.txt \
  --bm25 --rm3 --k1 0.9 --b 0.4
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m recip_rank msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-default.dev.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-default.dev.txt
Command to generate run on TREC 2019 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented.d2q-t5-docvectors \
  --topics dl19-doc \
  --output run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-segmented-default.dl19.txt \
  --bm25 --rm3 --k1 0.9 --b 0.4 --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl19-doc \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-segmented-default.dl19.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl19-doc \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-segmented-default.dl19.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl19-doc \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-segmented-default.dl19.txt
Command to generate run on TREC 2020 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented.d2q-t5-docvectors \
  --topics dl20 \
  --output run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-segmented-default.dl20.txt \
  --bm25 --rm3 --k1 0.9 --b 0.4 --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl20-doc \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-segmented-default.dl20.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl20-doc \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-segmented-default.dl20.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl20-doc \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-segmented-default.dl20.txt
Command to generate run on dev queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented.d2q-t5-docvectors \
  --topics msmarco-doc-dev \
  --output run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-segmented-default.dev.txt \
  --bm25 --rm3 --k1 0.9 --b 0.4 --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m recip_rank msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-segmented-default.dev.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-segmented-default.dev.txt
Command to generate run on TREC 2019 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc.d2q-t5 \
  --topics dl19-doc \
  --output run.msmarco-v1-doc.bm25-d2q-t5-doc-tuned.dl19.txt \
  --bm25
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl19-doc \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-tuned.dl19.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl19-doc \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-tuned.dl19.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl19-doc \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-tuned.dl19.txt
Command to generate run on TREC 2020 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc.d2q-t5 \
  --topics dl20 \
  --output run.msmarco-v1-doc.bm25-d2q-t5-doc-tuned.dl20.txt \
  --bm25
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl20-doc \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-tuned.dl20.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl20-doc \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-tuned.dl20.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl20-doc \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-tuned.dl20.txt
Command to generate run on dev queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc.d2q-t5 \
  --topics msmarco-doc-dev \
  --output run.msmarco-v1-doc.bm25-d2q-t5-doc-tuned.dev.txt \
  --bm25
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m recip_rank msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-tuned.dev.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-tuned.dev.txt
Command to generate run on TREC 2019 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented.d2q-t5 \
  --topics dl19-doc \
  --output run.msmarco-v1-doc.bm25-d2q-t5-doc-segmented-tuned.dl19.txt \
  --bm25 --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl19-doc \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-segmented-tuned.dl19.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl19-doc \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-segmented-tuned.dl19.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl19-doc \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-segmented-tuned.dl19.txt
Command to generate run on TREC 2020 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented.d2q-t5 \
  --topics dl20 \
  --output run.msmarco-v1-doc.bm25-d2q-t5-doc-segmented-tuned.dl20.txt \
  --bm25 --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl20-doc \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-segmented-tuned.dl20.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl20-doc \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-segmented-tuned.dl20.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl20-doc \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-segmented-tuned.dl20.txt
Command to generate run on dev queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented.d2q-t5 \
  --topics msmarco-doc-dev \
  --output run.msmarco-v1-doc.bm25-d2q-t5-doc-segmented-tuned.dev.txt \
  --bm25 --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m recip_rank msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-segmented-tuned.dev.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-d2q-t5-doc-segmented-tuned.dev.txt
Command to generate run on TREC 2019 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc.d2q-t5-docvectors \
  --topics dl19-doc \
  --output run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-tuned.dl19.txt \
  --bm25 --rm3
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl19-doc \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-tuned.dl19.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl19-doc \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-tuned.dl19.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl19-doc \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-tuned.dl19.txt
Command to generate run on TREC 2020 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc.d2q-t5-docvectors \
  --topics dl20 \
  --output run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-tuned.dl20.txt \
  --bm25 --rm3
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl20-doc \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-tuned.dl20.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl20-doc \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-tuned.dl20.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl20-doc \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-tuned.dl20.txt
Command to generate run on dev queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc.d2q-t5-docvectors \
  --topics msmarco-doc-dev \
  --output run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-tuned.dev.txt \
  --bm25 --rm3
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m recip_rank msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-tuned.dev.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-tuned.dev.txt
Command to generate run on TREC 2019 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented.d2q-t5-docvectors \
  --topics dl19-doc \
  --output run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-segmented-tuned.dl19.txt \
  --bm25 --rm3 --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl19-doc \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-segmented-tuned.dl19.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl19-doc \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-segmented-tuned.dl19.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl19-doc \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-segmented-tuned.dl19.txt
Command to generate run on TREC 2020 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented.d2q-t5-docvectors \
  --topics dl20 \
  --output run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-segmented-tuned.dl20.txt \
  --bm25 --rm3 --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl20-doc \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-segmented-tuned.dl20.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl20-doc \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-segmented-tuned.dl20.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl20-doc \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-segmented-tuned.dl20.txt
Command to generate run on dev queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented.d2q-t5-docvectors \
  --topics msmarco-doc-dev \
  --output run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-segmented-tuned.dev.txt \
  --bm25 --rm3 --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m recip_rank msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-segmented-tuned.dev.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 msmarco-doc-dev \
  run.msmarco-v1-doc.bm25-rm3-d2q-t5-doc-segmented-tuned.dev.txt
Command to generate run on TREC 2019 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented.unicoil-noexp \
  --topics dl19-doc \
  --encoder castorini/unicoil-noexp-msmarco-passage \
  --output run.msmarco-v1-doc.unicoil-noexp-pytorch.dl19.txt \
  --impact --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl19-doc \
  run.msmarco-v1-doc.unicoil-noexp-pytorch.dl19.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl19-doc \
  run.msmarco-v1-doc.unicoil-noexp-pytorch.dl19.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl19-doc \
  run.msmarco-v1-doc.unicoil-noexp-pytorch.dl19.txt
Command to generate run on TREC 2020 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented.unicoil-noexp \
  --topics dl20 \
  --encoder castorini/unicoil-noexp-msmarco-passage \
  --output run.msmarco-v1-doc.unicoil-noexp-pytorch.dl20.txt \
  --impact --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl20-doc \
  run.msmarco-v1-doc.unicoil-noexp-pytorch.dl20.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl20-doc \
  run.msmarco-v1-doc.unicoil-noexp-pytorch.dl20.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl20-doc \
  run.msmarco-v1-doc.unicoil-noexp-pytorch.dl20.txt
Command to generate run on dev queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented.unicoil-noexp \
  --topics msmarco-doc-dev \
  --encoder castorini/unicoil-noexp-msmarco-passage \
  --output run.msmarco-v1-doc.unicoil-noexp-pytorch.dev.txt \
  --impact --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m recip_rank msmarco-doc-dev \
  run.msmarco-v1-doc.unicoil-noexp-pytorch.dev.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 msmarco-doc-dev \
  run.msmarco-v1-doc.unicoil-noexp-pytorch.dev.txt
Command to generate run on TREC 2019 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented.unicoil \
  --topics dl19-doc \
  --encoder castorini/unicoil-msmarco-passage \
  --output run.msmarco-v1-doc.unicoil-pytorch.dl19.txt \
  --impact --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl19-doc \
  run.msmarco-v1-doc.unicoil-pytorch.dl19.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl19-doc \
  run.msmarco-v1-doc.unicoil-pytorch.dl19.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl19-doc \
  run.msmarco-v1-doc.unicoil-pytorch.dl19.txt
Command to generate run on TREC 2020 queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented.unicoil \
  --topics dl20 \
  --encoder castorini/unicoil-msmarco-passage \
  --output run.msmarco-v1-doc.unicoil-pytorch.dl20.txt \
  --impact --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m map dl20-doc \
  run.msmarco-v1-doc.unicoil-pytorch.dl20.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl20-doc \
  run.msmarco-v1-doc.unicoil-pytorch.dl20.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 dl20-doc \
  run.msmarco-v1-doc.unicoil-pytorch.dl20.txt
Command to generate run on dev queries:
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-doc-segmented.unicoil \
  --topics msmarco-doc-dev \
  --encoder castorini/unicoil-msmarco-passage \
  --output run.msmarco-v1-doc.unicoil-pytorch.dev.txt \
  --impact --hits 10000 --max-passage-hits 1000 --max-passage
Evaluation commands:
python -m pyserini.eval.trec_eval -c -M 100 -m recip_rank msmarco-doc-dev \
  run.msmarco-v1-doc.unicoil-pytorch.dev.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 msmarco-doc-dev \
  run.msmarco-v1-doc.unicoil-pytorch.dev.txt
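
The segmented conditions above retrieve passages (--hits 10000) and then keep only the best-scoring segment per document (--max-passage, truncated to --max-passage-hits 1000). The idea can be sketched as follows. This is an illustration only, not Pyserini's implementation: the function name max_passage is hypothetical, and segment ids of the form docid#n (with "#" as the delimiter) are an assumption.

```python
from typing import Dict, List, Tuple


def max_passage(hits: List[Tuple[str, float]],
                max_hits: int = 1000,
                delimiter: str = "#") -> List[Tuple[str, float]]:
    """Collapse segment-level hits into document-level hits.

    Each document keeps the score of its best-scoring segment.
    Assumes segment ids look like "<docid><delimiter><segment number>".
    """
    best: Dict[str, float] = {}
    for segment_id, score in hits:
        docid = segment_id.split(delimiter)[0]
        # Keep the highest segment score seen so far for this document.
        if docid not in best or score > best[docid]:
            best[docid] = score
    # Rank documents by their best segment score, highest first.
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:max_hits]
```

Retrieving more segment hits than the final cutoff (10000 versus 1000 here) leaves room for several segments of the same document to collapse into a single entry.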

Programmatic Execution

All experimental runs in the table above can be executed programmatically by following the instructions below. To list all the experimental conditions:

python -m pyserini.2cr.msmarco --collection v1-doc --list-conditions

These conditions correspond to the table rows above.

To show the commands for all conditions without executing them (a "dry run"):

python -m pyserini.2cr.msmarco --collection v1-doc --all --display-commands --dry-run

To actually run all the experimental conditions:

python -m pyserini.2cr.msmarco --collection v1-doc --all --display-commands

With the above command, run files will be placed in the current directory. Use the option --directory runs/ to place the runs in a sub-directory.
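
To check which run files a condition should leave behind, the file names can be predicted from the pattern used in the commands above. A minimal sketch, assuming the pattern run.msmarco-v1-doc.<condition>.<topics>.txt inferred from the file names on this page; the helper expected_run_files is hypothetical and Pyserini's own naming logic may differ.

```python
from pathlib import Path
from typing import List

# Topic suffixes as they appear in the run file names on this page.
TOPIC_SUFFIXES = ("dl19", "dl20", "dev")


def expected_run_files(condition: str, directory: str = ".") -> List[Path]:
    """List the run files one condition is expected to produce.

    The naming pattern is inferred from this page, not taken from Pyserini.
    """
    return [
        Path(directory) / f"run.msmarco-v1-doc.{condition}.{suffix}.txt"
        for suffix in TOPIC_SUFFIXES
    ]


if __name__ == "__main__":
    # Report which expected files are present under runs/.
    for path in expected_run_files("bm25-doc-default", "runs/"):
        print(path, "found" if path.is_file() else "missing")
```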

To show the commands for a specific condition:

python -m pyserini.2cr.msmarco --collection v1-doc --condition bm25-doc-default --display-commands --dry-run

This prints exactly the commands listed above for that condition, which corresponds to one row in the table.

To actually run a specific condition:

python -m pyserini.2cr.msmarco --collection v1-doc --condition bm25-doc-default --display-commands

Again, with the above command, run files will be placed in the current directory. Use the option --directory runs/ to place the runs in a sub-directory.
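
When several conditions need to be scripted, the invocations above can be assembled in Python. A minimal sketch that uses only the flags shown on this page; the helper build_2cr_command is hypothetical and not part of Pyserini, and combining --directory with --dry-run is an assumption.

```python
import sys
from typing import List, Optional


def build_2cr_command(collection: str,
                      condition: Optional[str] = None,
                      directory: Optional[str] = None,
                      dry_run: bool = True) -> List[str]:
    """Assemble an argument list for `python -m pyserini.2cr.msmarco`.

    Hypothetical helper: it only combines flags documented on this page.
    """
    cmd = [sys.executable, "-m", "pyserini.2cr.msmarco",
           "--collection", collection]
    # Select one named condition, or all conditions if none is given.
    cmd += ["--condition", condition] if condition else ["--all"]
    cmd.append("--display-commands")
    if directory:
        cmd += ["--directory", directory]
    if dry_run:
        cmd.append("--dry-run")
    return cmd


if __name__ == "__main__":
    # Print the dry-run invocation; the list can also be passed to subprocess.run.
    print(" ".join(build_2cr_command("v1-doc", "bm25-doc-default", directory="runs/")))
```

Defaulting dry_run to True keeps an accidental call from starting a long retrieval job.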

Finally, to generate this page:

python -m pyserini.2cr.msmarco --collection v1-doc --generate-report --output msmarco-v1-doc.html

The output file msmarco-v1-doc.html should be identical to this page.
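
One way to verify this is a byte-for-byte comparison against a saved copy of the page. A minimal sketch using Python's standard filecmp module; the function name report_matches and the path of the saved copy are hypothetical.

```python
import filecmp
import os


def report_matches(generated: str, reference: str) -> bool:
    """Return True if both files exist and have identical contents."""
    if not (os.path.isfile(generated) and os.path.isfile(reference)):
        return False
    # shallow=False compares file contents instead of relying on os.stat() data.
    return filecmp.cmp(generated, reference, shallow=False)


if __name__ == "__main__":
    # "saved/msmarco-v1-doc.html" stands in for a local copy of this page.
    same = report_matches("msmarco-v1-doc.html", "saved/msmarco-v1-doc.html")
    print("identical" if same else "different or missing")
```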