The two-click* reproduction matrix below provides commands for reproducing the experimental results reported in the papers cited in square brackets. Instructions for programmatic execution are shown at the bottom of this page (scroll down).
| | Method | Multi-Pass | First-Stage Method | Top-k | TREC 2019 nDCG@10 | TREC 2020 nDCG@10 |
|---|---|---|---|---|---|---|
| [1] | RankVicuna 7B V1 | - | SPLADE++ EnsembleDistil | 100 | 0.7325 | 0.7458 |
| [2] | RankZephyr 7B V1 - Full | - | SPLADE++ EnsembleDistil | 100 | 0.7781 | 0.8147 |
| [2] | RankZephyr 7B V1 - Full | 3 | SPLADE++ EnsembleDistil | 100 | 0.7777 | 0.8031 |
| [3] | monoT5 3B MSMARCO-10k | - | BM25 | 100 | 0.7173 | 0.6887 |
| [3] | duoT5 3B MSMARCO-10k | - | BM25 | 100 | 0.7302 | 0.6913 |
| [3] | LiT5-Distill Large | - | BM25 | 100 | 0.7247 | 0.7049 |
| [3] | RankVicuna 7B V1 | - | BM25 | 100 | 0.6790 | 0.6582 |
| [3] | RankZephyr 7B V1 - Full | - | BM25 | 100 | 0.7365 | 0.7080 |
| [3] | FIRST Mistral | - | BM25 | 100 | 0.7294 | 0.7043 |
| [3] | Qwen 2.5 7B Instruct | - | BM25 | 100 | 0.6987 | 0.6298 |
| [3] | LLaMA 3.1 8B Instruct | - | BM25 | 100 | 0.6779 | 0.6326 |
| [3] | Gemini 2.0 Flash | - | BM25 | 100 | 0.7362 | 0.6930 |
| [3] | RankGPT (gpt-4o-mini) | - | BM25 | 100 | 0.7345 | 0.6841 |
| [3] | RankGPT-APEER (gpt-4o-mini) | - | BM25 | 100 | 0.7312 | 0.6845 |
| [3] | LRL (gpt-4o-mini) | - | BM25 | 100 | 0.7296 | 0.6807 |
[1] Ronak Pradeep, Sahel Sharifymoghaddam, and Jimmy Lin. RankVicuna: Zero-Shot Listwise Document Reranking with Open-Source Large Language Models. arXiv:2309.15088, September 2023.
[2] Ronak Pradeep, Sahel Sharifymoghaddam, and Jimmy Lin. RankZephyr: Effective and Robust Zero-Shot Listwise Reranking is a Breeze! arXiv:2312.02724, December 2023.
[3] Sahel Sharifymoghaddam, Ronak Pradeep, Andre Slavescu, Ryan Nguyen, Andrew Xu, Zijian Chen, Yilin Zhang, Yidi Chen, Jasper Xian, and Jimmy Lin. RankLLM: A Python Package for Reranking with LLMs. Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2025), May 2025.
Create and activate a Conda environment:

conda create -n rz python=3.10
conda activate rz
All experimental runs shown in the table above can be executed programmatically by following the instructions below. To list all experimental conditions:
python -m src.rank_llm.2cr.msmarco --collection v1-passage --list-conditions
These conditions correspond to the table rows above.
For all conditions, just show the commands in a "dry run":
python -m src.rank_llm.2cr.msmarco --collection v1-passage --all --display-commands --dry-run
To actually run all the experimental conditions:
python -m src.rank_llm.2cr.msmarco --collection v1-passage --all --display-commands
With the above command, run files will be placed in the current directory. Use the option --directory runs/ to place the runs in a sub-directory.
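As a minimal sketch of the resulting layout (the actual run files are produced by the rank_llm commands above; the run-file name here is purely hypothetical), the --directory option simply collects run files under the named sub-directory instead of the current directory:

```shell
# Create the sub-directory that --directory runs/ would populate,
# and stand in a placeholder for a run file (name is hypothetical).
mkdir -p runs
touch runs/run.bm25-default.dl19.txt
ls runs/
```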
To show the commands for a specific condition:
python -m src.rank_llm.2cr.msmarco --collection v1-passage --condition bm25-default --display-commands --dry-run
This generates exactly the commands for the specified condition (corresponding to a row in the table above).
To actually run a specific condition:
python -m src.rank_llm.2cr.msmarco --collection v1-passage --condition bm25-default --display-commands
Again, with the above command, run files will be placed in the current directory. Use the option --directory runs/ to place the runs in a sub-directory.
Finally, to generate this page:
python -m src.rank_llm.2cr.msmarco --collection v1-passage --generate-report --output msmarco-v1-passage.html
The output file msmarco-v1-passage.html should be identical to this page.
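The "should be identical" claim can be checked mechanically. Below is a minimal sketch using Python's standard difflib, assuming a saved copy of this page is available locally; both file names are placeholders, not outputs of any rank_llm command:

```python
import difflib
from pathlib import Path


def html_identical(generated: str, reference: str) -> bool:
    """Return True when the two files contain exactly the same lines."""
    a = Path(generated).read_text().splitlines()
    b = Path(reference).read_text().splitlines()
    # difflib.ndiff marks differing lines with "- " (only in a) or "+ " (only in b).
    return not any(line.startswith(("- ", "+ ")) for line in difflib.ndiff(a, b))


# Example (file names are hypothetical):
# html_identical("msmarco-v1-passage.html", "saved-copy-of-this-page.html")
```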