MS MARCO V2 Passage

The two-click* reproduction matrix below provides commands for reproducing experimental results reported in a number of papers, denoted by the references in square brackets. Instructions for programmatic execution are shown at the bottom of this page (scroll down).

The matrix columns are Multi-Pass, First-Stage, Method, and Top-k, followed by nDCG@10 scores on the TREC 2021, TREC 2022, and TREC 2023 query sets. The commands below are grouped by method; a note on recomputing the nDCG@10 scores follows the last command block.

monoT5-3B

Command to generate and evaluate run on TREC 2021 queries:
python src/rank_llm/scripts/run_rank_llm.py  \
  --model_path=castorini/monot5-3b-msmarco-10k \
  --top_k_candidates=100 --dataset=dl21 \
  --retrieval_method=bm25 \
  --prompt_template_path=src/rank_llm/rerank/prompt_templates/monot5_template.yaml  \
  --context_size=4096 \
  --variable_passages
Command to generate and evaluate run on TREC 2022 queries:
python src/rank_llm/scripts/run_rank_llm.py  \
  --model_path=castorini/monot5-3b-msmarco-10k \
  --top_k_candidates=100 --dataset=dl22 \
  --retrieval_method=bm25 \
  --prompt_template_path=src/rank_llm/rerank/prompt_templates/monot5_template.yaml  \
  --context_size=4096 \
  --variable_passages
Command to generate and evaluate run on TREC 2023 queries:
python src/rank_llm/scripts/run_rank_llm.py  \
  --model_path=castorini/monot5-3b-msmarco-10k \
  --top_k_candidates=100 --dataset=dl23 \
  --retrieval_method=bm25 \
  --prompt_template_path=src/rank_llm/rerank/prompt_templates/monot5_template.yaml  \
  --context_size=4096 \
  --variable_passages

duoT5-3B

Command to generate and evaluate run on TREC 2021 queries:
python src/rank_llm/scripts/run_rank_llm.py  \
  --model_path=castorini/duot5-3b-msmarco-10k \
  --top_k_candidates=100 --dataset=dl21 \
  --retrieval_method=bm25 \
  --prompt_template_path=src/rank_llm/rerank/prompt_templates/duot5_template.yaml  \
  --context_size=4096 \
  --variable_passages
Command to generate and evaluate run on TREC 2022 queries:
python src/rank_llm/scripts/run_rank_llm.py  \
  --model_path=castorini/duot5-3b-msmarco-10k \
  --top_k_candidates=100 --dataset=dl22 \
  --retrieval_method=bm25 \
  --prompt_template_path=src/rank_llm/rerank/prompt_templates/duot5_template.yaml  \
  --context_size=4096 \
  --variable_passages
Command to generate and evaluate run on TREC 2023 queries:
python src/rank_llm/scripts/run_rank_llm.py  \
  --model_path=castorini/duot5-3b-msmarco-10k \
  --top_k_candidates=100 --dataset=dl23 \
  --retrieval_method=bm25 \
  --prompt_template_path=src/rank_llm/rerank/prompt_templates/duot5_template.yaml  \
  --context_size=4096 \
  --variable_passages

LiT5-Distill (large)

Command to generate and evaluate run on TREC 2021 queries:
python src/rank_llm/scripts/run_rank_llm.py  \
  --model_path=castorini/LiT5-Distill-large \
  --top_k_candidates=100 --dataset=dl21 \
  --retrieval_method=bm25 \
  --prompt_template_path=src/rank_llm/rerank/prompt_templates/rank_fid_template.yaml  \
  --context_size=4096 \
  --variable_passages
Command to generate and evaluate run on TREC 2022 queries:
python src/rank_llm/scripts/run_rank_llm.py  \
  --model_path=castorini/LiT5-Distill-large \
  --top_k_candidates=100 --dataset=dl22 \
  --retrieval_method=bm25 \
  --prompt_template_path=src/rank_llm/rerank/prompt_templates/rank_fid_template.yaml  \
  --context_size=4096 \
  --variable_passages
Command to generate and evaluate run on TREC 2023 queries:
python src/rank_llm/scripts/run_rank_llm.py  \
  --model_path=castorini/LiT5-Distill-large \
  --top_k_candidates=100 --dataset=dl23 \
  --retrieval_method=bm25 \
  --prompt_template_path=src/rank_llm/rerank/prompt_templates/rank_fid_template.yaml  \
  --context_size=4096 \
  --variable_passages

RankVicuna (7B)

Command to generate and evaluate run on TREC 2021 queries:
python src/rank_llm/scripts/run_rank_llm.py  \
  --model_path=castorini/rank_vicuna_7b_v1 \
  --top_k_candidates=100 --dataset=dl21 \
  --retrieval_method=bm25 \
  --prompt_template_path=src/rank_llm/rerank/prompt_templates/rank_zephyr_template.yaml  \
  --context_size=4096 \
  --variable_passages
Command to generate and evaluate run on TREC 2022 queries:
python src/rank_llm/scripts/run_rank_llm.py  \
  --model_path=castorini/rank_vicuna_7b_v1 \
  --top_k_candidates=100 --dataset=dl22 \
  --retrieval_method=bm25 \
  --prompt_template_path=src/rank_llm/rerank/prompt_templates/rank_zephyr_template.yaml  \
  --context_size=4096 \
  --variable_passages
Command to generate and evaluate run on TREC 2023 queries:
python src/rank_llm/scripts/run_rank_llm.py  \
  --model_path=castorini/rank_vicuna_7b_v1 \
  --top_k_candidates=100 --dataset=dl23 \
  --retrieval_method=bm25 \
  --prompt_template_path=src/rank_llm/rerank/prompt_templates/rank_zephyr_template.yaml  \
  --context_size=4096 \
  --variable_passages

RankZephyr (7B)

Command to generate and evaluate run on TREC 2021 queries:
python src/rank_llm/scripts/run_rank_llm.py  \
  --model_path=castorini/rank_zephyr_7b_v1_full \
  --top_k_candidates=100 --dataset=dl21 \
  --retrieval_method=bm25 \
  --prompt_template_path=src/rank_llm/rerank/prompt_templates/rank_zephyr_template.yaml  \
  --context_size=4096 \
  --variable_passages
Command to generate and evaluate run on TREC 2022 queries:
python src/rank_llm/scripts/run_rank_llm.py  \
  --model_path=castorini/rank_zephyr_7b_v1_full \
  --top_k_candidates=100 --dataset=dl22 \
  --retrieval_method=bm25 \
  --prompt_template_path=src/rank_llm/rerank/prompt_templates/rank_zephyr_template.yaml  \
  --context_size=4096 \
  --variable_passages
Command to generate and evaluate run on TREC 2023 queries:
python src/rank_llm/scripts/run_rank_llm.py  \
  --model_path=castorini/rank_zephyr_7b_v1_full \
  --top_k_candidates=100 --dataset=dl23 \
  --retrieval_method=bm25 \
  --prompt_template_path=src/rank_llm/rerank/prompt_templates/rank_zephyr_template.yaml  \
  --context_size=4096 \
  --variable_passages

FirstMistral

Command to generate and evaluate run on TREC 2021 queries:
python src/rank_llm/scripts/run_rank_llm.py  \
  --model_path=castorini/first_mistral \
  --top_k_candidates=100 --dataset=dl21 \
  --retrieval_method=bm25 \
  --prompt_template_path=src/rank_llm/rerank/prompt_templates/rank_zephyr_template.yaml  \
  --context_size=4096 \
  --variable_passages --use_alpha --use_logits
Command to generate and evaluate run on TREC 2022 queries:
python src/rank_llm/scripts/run_rank_llm.py  \
  --model_path=castorini/first_mistral \
  --top_k_candidates=100 --dataset=dl22 \
  --retrieval_method=bm25 \
  --prompt_template_path=src/rank_llm/rerank/prompt_templates/rank_zephyr_template.yaml  \
  --context_size=4096 \
  --variable_passages --use_alpha --use_logits
Command to generate and evaluate run on TREC 2023 queries:
python src/rank_llm/scripts/run_rank_llm.py  \
  --model_path=castorini/first_mistral \
  --top_k_candidates=100 --dataset=dl23 \
  --retrieval_method=bm25 \
  --prompt_template_path=src/rank_llm/rerank/prompt_templates/rank_zephyr_template.yaml  \
  --context_size=4096 \
  --variable_passages --use_alpha --use_logits

Qwen2.5-7B-Instruct

Command to generate and evaluate run on TREC 2021 queries:
python src/rank_llm/scripts/run_rank_llm.py  \
  --model_path=Qwen/Qwen2.5-7B-Instruct \
  --top_k_candidates=100 --dataset=dl21 \
  --retrieval_method=bm25 \
  --prompt_template_path=src/rank_llm/rerank/prompt_templates/rank_zephyr_template.yaml  \
  --context_size=4096 \
  --variable_passages
Command to generate and evaluate run on TREC 2022 queries:
python src/rank_llm/scripts/run_rank_llm.py  \
  --model_path=Qwen/Qwen2.5-7B-Instruct \
  --top_k_candidates=100 --dataset=dl22 \
  --retrieval_method=bm25 \
  --prompt_template_path=src/rank_llm/rerank/prompt_templates/rank_zephyr_template.yaml  \
  --context_size=4096 \
  --variable_passages
Command to generate and evaluate run on TREC 2023 queries:
python src/rank_llm/scripts/run_rank_llm.py  \
  --model_path=Qwen/Qwen2.5-7B-Instruct \
  --top_k_candidates=100 --dataset=dl23 \
  --retrieval_method=bm25 \
  --prompt_template_path=src/rank_llm/rerank/prompt_templates/rank_zephyr_template.yaml  \
  --context_size=4096 \
  --variable_passages

Llama-3.1-8B-Instruct

Command to generate and evaluate run on TREC 2021 queries:
python src/rank_llm/scripts/run_rank_llm.py  \
  --model_path=meta-llama/Llama-3.1-8B-Instruct \
  --top_k_candidates=100 --dataset=dl21 \
  --retrieval_method=bm25 \
  --prompt_template_path=src/rank_llm/rerank/prompt_templates/rank_zephyr_template.yaml  \
  --context_size=4096 \
  --variable_passages
Command to generate and evaluate run on TREC 2022 queries:
python src/rank_llm/scripts/run_rank_llm.py  \
  --model_path=meta-llama/Llama-3.1-8B-Instruct \
  --top_k_candidates=100 --dataset=dl22 \
  --retrieval_method=bm25 \
  --prompt_template_path=src/rank_llm/rerank/prompt_templates/rank_zephyr_template.yaml  \
  --context_size=4096 \
  --variable_passages
Command to generate and evaluate run on TREC 2023 queries:
python src/rank_llm/scripts/run_rank_llm.py  \
  --model_path=meta-llama/Llama-3.1-8B-Instruct \
  --top_k_candidates=100 --dataset=dl23 \
  --retrieval_method=bm25 \
  --prompt_template_path=src/rank_llm/rerank/prompt_templates/rank_zephyr_template.yaml  \
  --context_size=4096 \
  --variable_passages

Gemini 2.0 Flash

Command to generate and evaluate run on TREC 2021 queries:
python src/rank_llm/scripts/run_rank_llm.py  \
  --model_path=gemini-2.0-flash \
  --top_k_candidates=100 --dataset=dl21 \
  --retrieval_method=bm25 \
  --prompt_template_path=src/rank_llm/rerank/prompt_templates/rank_zephyr_template.yaml  \
  --context_size=4096 \
  --variable_passages
Command to generate and evaluate run on TREC 2022 queries:
python src/rank_llm/scripts/run_rank_llm.py  \
  --model_path=gemini-2.0-flash \
  --top_k_candidates=100 --dataset=dl22 \
  --retrieval_method=bm25 \
  --prompt_template_path=src/rank_llm/rerank/prompt_templates/rank_zephyr_template.yaml  \
  --context_size=4096 \
  --variable_passages
Command to generate and evaluate run on TREC 2023 queries:
python src/rank_llm/scripts/run_rank_llm.py  \
  --model_path=gemini-2.0-flash \
  --top_k_candidates=100 --dataset=dl23 \
  --retrieval_method=bm25 \
  --prompt_template_path=src/rank_llm/rerank/prompt_templates/rank_zephyr_template.yaml  \
  --context_size=4096 \
  --variable_passages

RankGPT (gpt-4o-mini)

Command to generate and evaluate run on TREC 2021 queries:
python src/rank_llm/scripts/run_rank_llm.py  \
  --model_path=gpt-4o-mini \
  --top_k_candidates=100 --dataset=dl21 \
  --retrieval_method=bm25 \
  --prompt_template_path=src/rank_llm/rerank/prompt_templates/rank_gpt_template.yaml \
  --context_size=4096 \
  --variable_passages --use_azure_openai
Command to generate and evaluate run on TREC 2022 queries:
python src/rank_llm/scripts/run_rank_llm.py  \
  --model_path=gpt-4o-mini \
  --top_k_candidates=100 --dataset=dl22 \
  --retrieval_method=bm25 \
  --prompt_template_path=src/rank_llm/rerank/prompt_templates/rank_gpt_template.yaml \
  --context_size=4096 \
  --variable_passages --use_azure_openai
Command to generate and evaluate run on TREC 2023 queries:
python src/rank_llm/scripts/run_rank_llm.py  \
  --model_path=gpt-4o-mini \
  --top_k_candidates=100 --dataset=dl23 \
  --retrieval_method=bm25 \
  --prompt_template_path=src/rank_llm/rerank/prompt_templates/rank_gpt_template.yaml \
  --context_size=4096 \
  --variable_passages --use_azure_openai

APEER (gpt-4o-mini)

Command to generate and evaluate run on TREC 2021 queries:
python src/rank_llm/scripts/run_rank_llm.py  \
  --model_path=gpt-4o-mini \
  --top_k_candidates=100 --dataset=dl21 \
  --retrieval_method=bm25 \
  --prompt_template_path=src/rank_llm/rerank/prompt_templates/rank_gpt_apeer_template.yaml \
  --context_size=4096 \
  --variable_passages --use_azure_openai
Command to generate and evaluate run on TREC 2022 queries:
python src/rank_llm/scripts/run_rank_llm.py  \
  --model_path=gpt-4o-mini \
  --top_k_candidates=100 --dataset=dl22 \
  --retrieval_method=bm25 \
  --prompt_template_path=src/rank_llm/rerank/prompt_templates/rank_gpt_apeer_template.yaml \
  --context_size=4096 \
  --variable_passages --use_azure_openai
Command to generate and evaluate run on TREC 2023 queries:
python src/rank_llm/scripts/run_rank_llm.py  \
  --model_path=gpt-4o-mini \
  --top_k_candidates=100 --dataset=dl23 \
  --retrieval_method=bm25 \
  --prompt_template_path=src/rank_llm/rerank/prompt_templates/rank_gpt_apeer_template.yaml \
  --context_size=4096 \
  --variable_passages --use_azure_openai

LRL (gpt-4o-mini)

Command to generate and evaluate run on TREC 2021 queries:
python src/rank_llm/scripts/run_rank_llm.py  \
  --model_path=gpt-4o-mini \
  --top_k_candidates=100 --dataset=dl21 \
  --retrieval_method=bm25 \
  --prompt_template_path=src/rank_llm/rerank/prompt_templates/rank_lrl_template.yaml \
  --context_size=4096 \
  --variable_passages --use_azure_openai
Command to generate and evaluate run on TREC 2022 queries:
python src/rank_llm/scripts/run_rank_llm.py  \
  --model_path=gpt-4o-mini \
  --top_k_candidates=100 --dataset=dl22 \
  --retrieval_method=bm25 \
  --prompt_template_path=src/rank_llm/rerank/prompt_templates/rank_lrl_template.yaml \
  --context_size=4096 \
  --variable_passages --use_azure_openai
Command to generate and evaluate run on TREC 2023 queries:
python src/rank_llm/scripts/run_rank_llm.py  \
  --model_path=gpt-4o-mini \
  --top_k_candidates=100 --dataset=dl23 \
  --retrieval_method=bm25 \
  --prompt_template_path=src/rank_llm/rerank/prompt_templates/rank_lrl_template.yaml \
  --context_size=4096 \
  --variable_passages --use_azure_openai
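
Each command above both generates a run file and evaluates it. If you want to recompute the nDCG@10 scores yourself, one option is Pyserini's trec_eval wrapper; a minimal example for a TREC 2021 run, assuming Pyserini is installed, that it recognizes dl21-passage as the qrels identifier, and using a hypothetical run file name:

python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl21-passage runs/run.monot5-3b.dl21.txt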

Programmatic Execution

Create and activate a Conda environment:

conda create -n rz python=3.10
conda activate rz
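
The commands on this page are run from a source checkout of the rank_llm repository. Assuming that layout, a typical next step (not spelled out on this page) is to install the package and its dependencies into the new environment:

pip install -e .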

All experimental runs shown in the above table can be executed programmatically by following the instructions below. To list all the experimental conditions:

python -m src.rank_llm.2cr.msmarco --collection v2-passage --list-conditions

These conditions correspond to the table rows above.
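
These steps can also be driven from Python. The sketch below shells out to the same module and captures the condition list; parsing one condition name per line is an assumption about the output format, so adjust as needed:

import subprocess

# Ask the 2CR module for its experimental conditions and capture stdout.
result = subprocess.run(
    ["python", "-m", "src.rank_llm.2cr.msmarco",
     "--collection", "v2-passage", "--list-conditions"],
    capture_output=True, text=True, check=True,
)

# Assumption: the module prints one condition name per line.
conditions = [line.strip() for line in result.stdout.splitlines() if line.strip()]
print(conditions)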

For all conditions, just show the commands in a "dry run":

python -m src.rank_llm.2cr.msmarco --collection v2-passage --all --display-commands --dry-run

To actually run all the experimental conditions:

python -m src.rank_llm.2cr.msmarco --collection v2-passage --all --display-commands

With the above command, run files will be placed in the current directory. Use the option --directory runs/ to place the runs in a sub-directory.
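
For example, to run everything and collect all run files under runs/:

python -m src.rank_llm.2cr.msmarco --collection v2-passage --all --display-commands --directory runs/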

To show the commands for a specific condition:

python -m src.rank_llm.2cr.msmarco --collection v2-passage --condition lrl --display-commands --dry-run

This will generate exactly the commands for a specific condition above (corresponding to a row in the table).

To actually run a specific condition:

python -m src.rank_llm.2cr.msmarco --collection v2-passage --condition lrl --display-commands

Again, with the above command, run files will be placed in the current directory. Use the option --directory runs/ to place the runs in a sub-directory.
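
Putting the pieces together, the following sketch loops over every condition and materializes each one's run files under runs/; as in the earlier sketch, parsing one condition name per line is an assumption:

import subprocess

BASE = ["python", "-m", "src.rank_llm.2cr.msmarco", "--collection", "v2-passage"]

# Discover the experimental conditions (assumes one name per line of output).
listing = subprocess.run(BASE + ["--list-conditions"],
                         capture_output=True, text=True, check=True)
conditions = [line.strip() for line in listing.stdout.splitlines() if line.strip()]

# Execute each condition, placing the resulting run files under runs/.
for condition in conditions:
    subprocess.run(BASE + ["--condition", condition,
                           "--display-commands", "--directory", "runs/"],
                   check=True)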

Finally, to generate this page:

python -m src.rank_llm.2cr.msmarco --collection v2-passage --generate-report --output msmarco-v2-passage.html

The output file msmarco-v2-passage.html should be identical to this page.