Pyserini Reproductions

The two-click^* reproductions below provide commands for reproducing experimental results on BRIGHT. Instructions for programmatic execution are shown at the bottom of this page (scroll down).

Main Results

The main results table provides commands for reproducing runs using the following models:

BM25 BoW: BM25 "bag-of-words" baseline (bm25) [1]
Query-side BM25: Query-side BM25 baseline (bm25qs) [1]
SPLADE: SPLADE-v3 (splade-v3) [1]
BGE: BGE-large-en-v1.5 (Lucene flat index) (bge-large-en-v1.5.flat) [1]
Diver: Diver-Retriever-4B (Faiss flat index) (diver-retriever-4b) [2]
Reason-Embed: reason-embed-qwen3-4b-0928 (Faiss flat index) (reason-embed-qwen3-4b-0928) [3]
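The BM25 and evaluation commands listed per domain in the table below follow a regular naming pattern, so they can be generated rather than copied by hand. The following is a minimal sketch, not Pyserini's own programmatic interface: the helper names bm25_run_command and eval_commands and the DOMAINS list are hypothetical, and DOMAINS covers only the rows visible in this section. The flags and file names are taken from the commands on this page.

```python
import shlex
from typing import List, Sequence

# Hypothetical list: only the domains whose rows appear in this section.
DOMAINS = [
    "biology", "earth-science", "economics", "psychology", "robotics",
    "stackoverflow", "sustainable-living", "pony", "leetcode",
]


def bm25_run_command(domain: str, variant: str = "bm25") -> List[str]:
    """Build the Lucene search command for one domain.

    variant is "bm25" (bag-of-words) or "bm25qs" (query-side BM25),
    mirroring the --bm25 / --bm25qs flags used on this page.
    """
    if variant not in ("bm25", "bm25qs"):
        raise ValueError(f"unsupported variant: {variant}")
    return [
        "python", "-m", "pyserini.search.lucene",
        "--index", f"bright-{domain}",
        "--topics", f"bright-{domain}",
        "--output", f"run.bright.{variant}.{domain}.txt",
        "--output-format", "trec",
        "--hits", "1000", f"--{variant}", "--remove-query",
    ]


def eval_commands(
    domain: str,
    run_file: str,
    metrics: Sequence[str] = ("ndcg_cut.10", "recall.100", "recall.1000"),
) -> List[List[str]]:
    """Build one trec_eval command per metric for a given run file."""
    return [
        ["python", "-m", "pyserini.eval.trec_eval",
         "-c", "-m", metric, f"bright-{domain}", run_file]
        for metric in metrics
    ]


# Print the commands for one domain as copy-pasteable shell lines.
run = bm25_run_command("biology")
print(shlex.join(run))
for cmd in eval_commands("biology", "run.bright.bm25.biology.txt"):
    print(shlex.join(cmd))
```

Each command is returned as an argument list, so it can be passed directly to subprocess.run(cmd, check=True) without shell quoting concerns. Looping over DOMAINS would cover every row shown here.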

	BM25 BoW		Query-side BM25		SPLADE		BGE		Diver		Reason-Embed
	nDCG@10	R@100	nDCG@10	R@100	nDCG@10	R@100	nDCG@10	R@100	nDCG@10	R@100	nDCG@10	R@100
biology	0.182	0.420	0.197	0.458	0.210	0.560	0.124	0.408	0.425	0.798	0.541	0.924
BM25:
Command to generate run:
`python -m pyserini.search.lucene \
  --index bright-biology \
  --topics bright-biology \
  --output run.bright.bm25.biology.txt \
  --output-format trec \
  --hits 1000 --bm25 --remove-query`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-biology \
  run.bright.bm25.biology.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-biology \
  run.bright.bm25.biology.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-biology \
  run.bright.bm25.biology.txt`
BM25QS:
Command to generate run:
`python -m pyserini.search.lucene \
  --index bright-biology \
  --topics bright-biology \
  --output run.bright.bm25qs.biology.txt \
  --output-format trec \
  --hits 1000 --bm25qs --remove-query`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-biology \
  run.bright.bm25qs.biology.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-biology \
  run.bright.bm25qs.biology.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-biology \
  run.bright.bm25qs.biology.txt`
SPLADE:
Command to generate run:
`python -m pyserini.search.lucene \
  --index bright-biology.splade-v3 \
  --topics bright-biology \
  --onnx-encoder SpladeV3 \
  --output run.bright.splade-v3.biology.txt \
  --output-format trec \
  --hits 1000 --impact --remove-query`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-biology \
  run.bright.splade-v3.biology.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-biology \
  run.bright.splade-v3.biology.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-biology \
  run.bright.splade-v3.biology.txt`
BGE:
Command to generate run:
`python -m pyserini.search.lucene --dense --flat \
  --index bright-biology.bge-large-en-v1.5.flat \
  --topics bright-biology \
  --onnx-encoder BgeLargeEn15 \
  --output run.bright.bge-large-en-v1.5.flat.biology.txt \
  --output-format trec \
  --hits 1000 --remove-query`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-biology \
  run.bright.bge-large-en-v1.5.flat.biology.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-biology \
  run.bright.bge-large-en-v1.5.flat.biology.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-biology \
  run.bright.bge-large-en-v1.5.flat.biology.txt`
Diver:
Command to generate run:
`python -m pyserini.search.faiss \
  --encoder AQ-MedAI/Diver-Retriever-4B \
  --encoder-class qwen3 \
  --index bright-biology.diver-retriever-4b \
  --query-prefix $'Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery:' \
  --topics bright-biology-original \
  --output run.bright.diver-retriever-4b.biology.txt \
  --hits 1000 --remove-query \
  --topics-format raw_jsonl --explicit-truncate --fp16 --l2-norm --max-length 16384 --device cuda:0`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-biology \
  run.bright.diver-retriever-4b.biology.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-biology \
  run.bright.diver-retriever-4b.biology.txt`
Reason-Embed:
Command to generate run:
`python -m pyserini.search.faiss \
  --encoder hanhainebula/reason-embed-qwen3-4b-0928 \
  --encoder-class qwen3 \
  --index bright-biology.reason-embed-qwen3-4b-0928 \
  --query-prefix $'Instruct: Given a Biology post, retrieve relevant passages that help answer the post.\nQuery: ' \
  --topics bright-biology-original \
  --output run.bright.reason-embed-qwen3-4b-0928.biology.txt \
  --hits 1000 --remove-query \
  --topics-format raw_jsonl --fp16 --l2-norm --max-length 8192 --device cuda:0`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-biology \
  run.bright.reason-embed-qwen3-4b-0928.biology.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-biology \
  run.bright.reason-embed-qwen3-4b-0928.biology.txt`
earth-science	0.279	0.596	0.279	0.601	0.267	0.578	0.255	0.540	0.464	0.791	0.537	0.854
BM25:
Command to generate run:
`python -m pyserini.search.lucene \
  --index bright-earth-science \
  --topics bright-earth-science \
  --output run.bright.bm25.earth-science.txt \
  --output-format trec \
  --hits 1000 --bm25 --remove-query`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-earth-science \
  run.bright.bm25.earth-science.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-earth-science \
  run.bright.bm25.earth-science.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-earth-science \
  run.bright.bm25.earth-science.txt`
BM25QS:
Command to generate run:
`python -m pyserini.search.lucene \
  --index bright-earth-science \
  --topics bright-earth-science \
  --output run.bright.bm25qs.earth-science.txt \
  --output-format trec \
  --hits 1000 --bm25qs --remove-query`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-earth-science \
  run.bright.bm25qs.earth-science.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-earth-science \
  run.bright.bm25qs.earth-science.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-earth-science \
  run.bright.bm25qs.earth-science.txt`
SPLADE:
Command to generate run:
`python -m pyserini.search.lucene \
  --index bright-earth-science.splade-v3 \
  --topics bright-earth-science \
  --onnx-encoder SpladeV3 \
  --output run.bright.splade-v3.earth-science.txt \
  --output-format trec \
  --hits 1000 --impact --remove-query`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-earth-science \
  run.bright.splade-v3.earth-science.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-earth-science \
  run.bright.splade-v3.earth-science.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-earth-science \
  run.bright.splade-v3.earth-science.txt`
BGE:
Command to generate run:
`python -m pyserini.search.lucene --dense --flat \
  --index bright-earth-science.bge-large-en-v1.5.flat \
  --topics bright-earth-science \
  --onnx-encoder BgeLargeEn15 \
  --output run.bright.bge-large-en-v1.5.flat.earth-science.txt \
  --output-format trec \
  --hits 1000 --remove-query`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-earth-science \
  run.bright.bge-large-en-v1.5.flat.earth-science.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-earth-science \
  run.bright.bge-large-en-v1.5.flat.earth-science.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-earth-science \
  run.bright.bge-large-en-v1.5.flat.earth-science.txt`
Diver:
Command to generate run:
`python -m pyserini.search.faiss \
  --encoder AQ-MedAI/Diver-Retriever-4B \
  --encoder-class qwen3 \
  --index bright-earth-science.diver-retriever-4b \
  --query-prefix $'Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery:' \
  --topics bright-earth-science-original \
  --output run.bright.diver-retriever-4b.earth-science.txt \
  --hits 1000 --remove-query \
  --topics-format raw_jsonl --explicit-truncate --fp16 --l2-norm --max-length 16384 --device cuda:0`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-earth-science \
  run.bright.diver-retriever-4b.earth-science.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-earth-science \
  run.bright.diver-retriever-4b.earth-science.txt`
Reason-Embed:
Command to generate run:
`python -m pyserini.search.faiss \
  --encoder hanhainebula/reason-embed-qwen3-4b-0928 \
  --encoder-class qwen3 \
  --index bright-earth-science.reason-embed-qwen3-4b-0928 \
  --query-prefix $'Instruct: Given an Earth Science post, retrieve relevant passages that help answer the post.\nQuery: ' \
  --topics bright-earth-science-original \
  --output run.bright.reason-embed-qwen3-4b-0928.earth-science.txt \
  --hits 1000 --remove-query \
  --topics-format raw_jsonl --fp16 --l2-norm --max-length 8192 --device cuda:0`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-earth-science \
  run.bright.reason-embed-qwen3-4b-0928.earth-science.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-earth-science \
  run.bright.reason-embed-qwen3-4b-0928.earth-science.txt`
economics	0.165	0.408	0.152	0.377	0.160	0.448	0.166	0.487	0.222	0.573	0.339	0.716
BM25:
Command to generate run:
`python -m pyserini.search.lucene \
  --index bright-economics \
  --topics bright-economics \
  --output run.bright.bm25.economics.txt \
  --output-format trec \
  --hits 1000 --bm25 --remove-query`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-economics \
  run.bright.bm25.economics.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-economics \
  run.bright.bm25.economics.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-economics \
  run.bright.bm25.economics.txt`
BM25QS:
Command to generate run:
`python -m pyserini.search.lucene \
  --index bright-economics \
  --topics bright-economics \
  --output run.bright.bm25qs.economics.txt \
  --output-format trec \
  --hits 1000 --bm25qs --remove-query`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-economics \
  run.bright.bm25qs.economics.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-economics \
  run.bright.bm25qs.economics.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-economics \
  run.bright.bm25qs.economics.txt`
SPLADE:
Command to generate run:
`python -m pyserini.search.lucene \
  --index bright-economics.splade-v3 \
  --topics bright-economics \
  --onnx-encoder SpladeV3 \
  --output run.bright.splade-v3.economics.txt \
  --output-format trec \
  --hits 1000 --impact --remove-query`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-economics \
  run.bright.splade-v3.economics.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-economics \
  run.bright.splade-v3.economics.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-economics \
  run.bright.splade-v3.economics.txt`
BGE:
Command to generate run:
`python -m pyserini.search.lucene --dense --flat \
  --index bright-economics.bge-large-en-v1.5.flat \
  --topics bright-economics \
  --onnx-encoder BgeLargeEn15 \
  --output run.bright.bge-large-en-v1.5.flat.economics.txt \
  --output-format trec \
  --hits 1000 --remove-query`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-economics \
  run.bright.bge-large-en-v1.5.flat.economics.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-economics \
  run.bright.bge-large-en-v1.5.flat.economics.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-economics \
  run.bright.bge-large-en-v1.5.flat.economics.txt`
Diver:
Command to generate run:
`python -m pyserini.search.faiss \
  --encoder AQ-MedAI/Diver-Retriever-4B \
  --encoder-class qwen3 \
  --index bright-economics.diver-retriever-4b \
  --query-prefix $'Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery:' \
  --topics bright-economics-original \
  --output run.bright.diver-retriever-4b.economics.txt \
  --hits 1000 --remove-query \
  --topics-format raw_jsonl --explicit-truncate --fp16 --l2-norm --max-length 16384 --device cuda:0`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-economics \
  run.bright.diver-retriever-4b.economics.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-economics \
  run.bright.diver-retriever-4b.economics.txt`
Reason-Embed:
Command to generate run:
`python -m pyserini.search.faiss \
  --encoder hanhainebula/reason-embed-qwen3-4b-0928 \
  --encoder-class qwen3 \
  --index bright-economics.reason-embed-qwen3-4b-0928 \
  --query-prefix $'Instruct: Given an Economics post, retrieve relevant passages that help answer the post.\nQuery: ' \
  --topics bright-economics-original \
  --output run.bright.reason-embed-qwen3-4b-0928.economics.txt \
  --hits 1000 --remove-query \
  --topics-format raw_jsonl --fp16 --l2-norm --max-length 8192 --device cuda:0`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-economics \
  run.bright.reason-embed-qwen3-4b-0928.economics.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-economics \
  run.bright.reason-embed-qwen3-4b-0928.economics.txt`
psychology	0.134	0.377	0.127	0.410	0.153	0.469	0.180	0.465	0.337	0.747	0.459	0.816
BM25:
Command to generate run:
`python -m pyserini.search.lucene \
  --index bright-psychology \
  --topics bright-psychology \
  --output run.bright.bm25.psychology.txt \
  --output-format trec \
  --hits 1000 --bm25 --remove-query`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-psychology \
  run.bright.bm25.psychology.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-psychology \
  run.bright.bm25.psychology.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-psychology \
  run.bright.bm25.psychology.txt`
BM25QS:
Command to generate run:
`python -m pyserini.search.lucene \
  --index bright-psychology \
  --topics bright-psychology \
  --output run.bright.bm25qs.psychology.txt \
  --output-format trec \
  --hits 1000 --bm25qs --remove-query`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-psychology \
  run.bright.bm25qs.psychology.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-psychology \
  run.bright.bm25qs.psychology.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-psychology \
  run.bright.bm25qs.psychology.txt`
SPLADE:
Command to generate run:
`python -m pyserini.search.lucene \
  --index bright-psychology.splade-v3 \
  --topics bright-psychology \
  --onnx-encoder SpladeV3 \
  --output run.bright.splade-v3.psychology.txt \
  --output-format trec \
  --hits 1000 --impact --remove-query`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-psychology \
  run.bright.splade-v3.psychology.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-psychology \
  run.bright.splade-v3.psychology.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-psychology \
  run.bright.splade-v3.psychology.txt`
BGE:
Command to generate run:
`python -m pyserini.search.lucene --dense --flat \
  --index bright-psychology.bge-large-en-v1.5.flat \
  --topics bright-psychology \
  --onnx-encoder BgeLargeEn15 \
  --output run.bright.bge-large-en-v1.5.flat.psychology.txt \
  --output-format trec \
  --hits 1000 --remove-query`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-psychology \
  run.bright.bge-large-en-v1.5.flat.psychology.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-psychology \
  run.bright.bge-large-en-v1.5.flat.psychology.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-psychology \
  run.bright.bge-large-en-v1.5.flat.psychology.txt`
Diver:
Command to generate run:
`python -m pyserini.search.faiss \
  --encoder AQ-MedAI/Diver-Retriever-4B \
  --encoder-class qwen3 \
  --index bright-psychology.diver-retriever-4b \
  --query-prefix $'Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery:' \
  --topics bright-psychology-original \
  --output run.bright.diver-retriever-4b.psychology.txt \
  --hits 1000 --remove-query \
  --topics-format raw_jsonl --explicit-truncate --fp16 --l2-norm --max-length 16384 --device cuda:0`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-psychology \
  run.bright.diver-retriever-4b.psychology.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-psychology \
  run.bright.diver-retriever-4b.psychology.txt`
Reason-Embed:
Command to generate run:
`python -m pyserini.search.faiss \
  --encoder hanhainebula/reason-embed-qwen3-4b-0928 \
  --encoder-class qwen3 \
  --index bright-psychology.reason-embed-qwen3-4b-0928 \
  --query-prefix $'Instruct: Given a Psychology post, retrieve relevant passages that help answer the post.\nQuery: ' \
  --topics bright-psychology-original \
  --output run.bright.reason-embed-qwen3-4b-0928.psychology.txt \
  --hits 1000 --remove-query \
  --topics-format raw_jsonl --fp16 --l2-norm --max-length 8192 --device cuda:0`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-psychology \
  run.bright.reason-embed-qwen3-4b-0928.psychology.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-psychology \
  run.bright.reason-embed-qwen3-4b-0928.psychology.txt`
robotics	0.109	0.371	0.139	0.467	0.158	0.447	0.123	0.344	0.209	0.515	0.345	0.706
BM25:
Command to generate run:
`python -m pyserini.search.lucene \
  --index bright-robotics \
  --topics bright-robotics \
  --output run.bright.bm25.robotics.txt \
  --output-format trec \
  --hits 1000 --bm25 --remove-query`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-robotics \
  run.bright.bm25.robotics.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-robotics \
  run.bright.bm25.robotics.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-robotics \
  run.bright.bm25.robotics.txt`
BM25QS:
Command to generate run:
`python -m pyserini.search.lucene \
  --index bright-robotics \
  --topics bright-robotics \
  --output run.bright.bm25qs.robotics.txt \
  --output-format trec \
  --hits 1000 --bm25qs --remove-query`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-robotics \
  run.bright.bm25qs.robotics.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-robotics \
  run.bright.bm25qs.robotics.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-robotics \
  run.bright.bm25qs.robotics.txt`
SPLADE:
Command to generate run:
`python -m pyserini.search.lucene \
  --index bright-robotics.splade-v3 \
  --topics bright-robotics \
  --onnx-encoder SpladeV3 \
  --output run.bright.splade-v3.robotics.txt \
  --output-format trec \
  --hits 1000 --impact --remove-query`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-robotics \
  run.bright.splade-v3.robotics.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-robotics \
  run.bright.splade-v3.robotics.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-robotics \
  run.bright.splade-v3.robotics.txt`
BGE:
Command to generate run:
`python -m pyserini.search.lucene --dense --flat \
  --index bright-robotics.bge-large-en-v1.5.flat \
  --topics bright-robotics \
  --onnx-encoder BgeLargeEn15 \
  --output run.bright.bge-large-en-v1.5.flat.robotics.txt \
  --output-format trec \
  --hits 1000 --remove-query`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-robotics \
  run.bright.bge-large-en-v1.5.flat.robotics.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-robotics \
  run.bright.bge-large-en-v1.5.flat.robotics.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-robotics \
  run.bright.bge-large-en-v1.5.flat.robotics.txt`
Diver:
Command to generate run:
`python -m pyserini.search.faiss \
  --encoder AQ-MedAI/Diver-Retriever-4B \
  --encoder-class qwen3 \
  --index bright-robotics.diver-retriever-4b \
  --query-prefix $'Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery:' \
  --topics bright-robotics-original \
  --output run.bright.diver-retriever-4b.robotics.txt \
  --hits 1000 --remove-query \
  --topics-format raw_jsonl --explicit-truncate --fp16 --l2-norm --max-length 16384 --device cuda:0`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-robotics \
  run.bright.diver-retriever-4b.robotics.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-robotics \
  run.bright.diver-retriever-4b.robotics.txt`
Reason-Embed:
Command to generate run:
`python -m pyserini.search.faiss \
  --encoder hanhainebula/reason-embed-qwen3-4b-0928 \
  --encoder-class qwen3 \
  --index bright-robotics.reason-embed-qwen3-4b-0928 \
  --query-prefix $'Instruct: Given a Robotics post, retrieve relevant passages that help answer the post.\nQuery: ' \
  --topics bright-robotics-original \
  --output run.bright.reason-embed-qwen3-4b-0928.robotics.txt \
  --hits 1000 --remove-query \
  --topics-format raw_jsonl --fp16 --l2-norm --max-length 8192 --device cuda:0`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-robotics \
  run.bright.reason-embed-qwen3-4b-0928.robotics.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-robotics \
  run.bright.reason-embed-qwen3-4b-0928.robotics.txt`
stackoverflow	0.163	0.409	0.185	0.463	0.129	0.432	0.110	0.532	0.211	0.672	0.358	0.808
BM25:
Command to generate run:
`python -m pyserini.search.lucene \
  --index bright-stackoverflow \
  --topics bright-stackoverflow \
  --output run.bright.bm25.stackoverflow.txt \
  --output-format trec \
  --hits 1000 --bm25 --remove-query`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-stackoverflow \
  run.bright.bm25.stackoverflow.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-stackoverflow \
  run.bright.bm25.stackoverflow.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-stackoverflow \
  run.bright.bm25.stackoverflow.txt`
BM25QS:
Command to generate run:
`python -m pyserini.search.lucene \
  --index bright-stackoverflow \
  --topics bright-stackoverflow \
  --output run.bright.bm25qs.stackoverflow.txt \
  --output-format trec \
  --hits 1000 --bm25qs --remove-query`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-stackoverflow \
  run.bright.bm25qs.stackoverflow.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-stackoverflow \
  run.bright.bm25qs.stackoverflow.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-stackoverflow \
  run.bright.bm25qs.stackoverflow.txt`
SPLADE:
Command to generate run:
`python -m pyserini.search.lucene \
  --index bright-stackoverflow.splade-v3 \
  --topics bright-stackoverflow \
  --onnx-encoder SpladeV3 \
  --output run.bright.splade-v3.stackoverflow.txt \
  --output-format trec \
  --hits 1000 --impact --remove-query`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-stackoverflow \
  run.bright.splade-v3.stackoverflow.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-stackoverflow \
  run.bright.splade-v3.stackoverflow.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-stackoverflow \
  run.bright.splade-v3.stackoverflow.txt`
BGE:
Command to generate run:
`python -m pyserini.search.lucene --dense --flat \
  --index bright-stackoverflow.bge-large-en-v1.5.flat \
  --topics bright-stackoverflow \
  --onnx-encoder BgeLargeEn15 \
  --output run.bright.bge-large-en-v1.5.flat.stackoverflow.txt \
  --output-format trec \
  --hits 1000 --remove-query`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-stackoverflow \
  run.bright.bge-large-en-v1.5.flat.stackoverflow.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-stackoverflow \
  run.bright.bge-large-en-v1.5.flat.stackoverflow.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-stackoverflow \
  run.bright.bge-large-en-v1.5.flat.stackoverflow.txt`
Diver:
Command to generate run:
`python -m pyserini.search.faiss \
  --encoder AQ-MedAI/Diver-Retriever-4B \
  --encoder-class qwen3 \
  --index bright-stackoverflow.diver-retriever-4b \
  --query-prefix $'Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery:' \
  --topics bright-stackoverflow-original \
  --output run.bright.diver-retriever-4b.stackoverflow.txt \
  --hits 1000 --remove-query \
  --topics-format raw_jsonl --explicit-truncate --fp16 --l2-norm --max-length 16384 --device cuda:0`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-stackoverflow \
  run.bright.diver-retriever-4b.stackoverflow.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-stackoverflow \
  run.bright.diver-retriever-4b.stackoverflow.txt`
Reason-Embed:
Command to generate run:
`python -m pyserini.search.faiss \
  --encoder hanhainebula/reason-embed-qwen3-4b-0928 \
  --encoder-class qwen3 \
  --index bright-stackoverflow.reason-embed-qwen3-4b-0928 \
  --query-prefix $'Instruct: Given a Stack Overflow post, retrieve relevant passages that help answer the post.\nQuery: ' \
  --topics bright-stackoverflow-original \
  --output run.bright.reason-embed-qwen3-4b-0928.stackoverflow.txt \
  --hits 1000 --remove-query \
  --topics-format raw_jsonl --fp16 --l2-norm --max-length 8192 --device cuda:0`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-stackoverflow \
  run.bright.reason-embed-qwen3-4b-0928.stackoverflow.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-stackoverflow \
  run.bright.reason-embed-qwen3-4b-0928.stackoverflow.txt`
sustainable-living	0.161	0.420	0.151	0.465	0.150	0.492	0.144	0.516	0.243	0.682	0.372	0.804
BM25:
Command to generate run:
`python -m pyserini.search.lucene \
  --index bright-sustainable-living \
  --topics bright-sustainable-living \
  --output run.bright.bm25.sustainable-living.txt \
  --output-format trec \
  --hits 1000 --bm25 --remove-query`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-sustainable-living \
  run.bright.bm25.sustainable-living.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-sustainable-living \
  run.bright.bm25.sustainable-living.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-sustainable-living \
  run.bright.bm25.sustainable-living.txt`
BM25QS:
Command to generate run:
`python -m pyserini.search.lucene \
  --index bright-sustainable-living \
  --topics bright-sustainable-living \
  --output run.bright.bm25qs.sustainable-living.txt \
  --output-format trec \
  --hits 1000 --bm25qs --remove-query`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-sustainable-living \
  run.bright.bm25qs.sustainable-living.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-sustainable-living \
  run.bright.bm25qs.sustainable-living.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-sustainable-living \
  run.bright.bm25qs.sustainable-living.txt`
SPLADE:
Command to generate run:
`python -m pyserini.search.lucene \
  --index bright-sustainable-living.splade-v3 \
  --topics bright-sustainable-living \
  --onnx-encoder SpladeV3 \
  --output run.bright.splade-v3.sustainable-living.txt \
  --output-format trec \
  --hits 1000 --impact --remove-query`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-sustainable-living \
  run.bright.splade-v3.sustainable-living.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-sustainable-living \
  run.bright.splade-v3.sustainable-living.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-sustainable-living \
  run.bright.splade-v3.sustainable-living.txt`
BGE:
Command to generate run:
`python -m pyserini.search.lucene --dense --flat \
  --index bright-sustainable-living.bge-large-en-v1.5.flat \
  --topics bright-sustainable-living \
  --onnx-encoder BgeLargeEn15 \
  --output run.bright.bge-large-en-v1.5.flat.sustainable-living.txt \
  --output-format trec \
  --hits 1000 --remove-query`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-sustainable-living \
  run.bright.bge-large-en-v1.5.flat.sustainable-living.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-sustainable-living \
  run.bright.bge-large-en-v1.5.flat.sustainable-living.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-sustainable-living \
  run.bright.bge-large-en-v1.5.flat.sustainable-living.txt`
Diver:
Command to generate run:
`python -m pyserini.search.faiss \
  --encoder AQ-MedAI/Diver-Retriever-4B \
  --encoder-class qwen3 \
  --index bright-sustainable-living.diver-retriever-4b \
  --query-prefix $'Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery:' \
  --topics bright-sustainable-living-original \
  --output run.bright.diver-retriever-4b.sustainable-living.txt \
  --hits 1000 --remove-query \
  --topics-format raw_jsonl --explicit-truncate --fp16 --l2-norm --max-length 16384 --device cuda:0`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-sustainable-living \
  run.bright.diver-retriever-4b.sustainable-living.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-sustainable-living \
  run.bright.diver-retriever-4b.sustainable-living.txt`
Reason-Embed:
Command to generate run:
`python -m pyserini.search.faiss \
  --encoder hanhainebula/reason-embed-qwen3-4b-0928 \
  --encoder-class qwen3 \
  --index bright-sustainable-living.reason-embed-qwen3-4b-0928 \
  --query-prefix $'Instruct: Given a Sustainable Living post, retrieve relevant passages that help answer the post.\nQuery: ' \
  --topics bright-sustainable-living-original \
  --output run.bright.reason-embed-qwen3-4b-0928.sustainable-living.txt \
  --hits 1000 --remove-query \
  --topics-format raw_jsonl --fp16 --l2-norm --max-length 8192 --device cuda:0`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-sustainable-living \
  run.bright.reason-embed-qwen3-4b-0928.sustainable-living.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-sustainable-living \
  run.bright.reason-embed-qwen3-4b-0928.sustainable-living.txt`
pony	0.043	0.172	0.079	0.282	0.144	0.245	0.034	0.180	0.128	0.248	0.118	0.268
BM25:
Command to generate run:
`python -m pyserini.search.lucene \
  --index bright-pony \
  --topics bright-pony \
  --output run.bright.bm25.pony.txt \
  --output-format trec \
  --hits 1000 --bm25 --remove-query`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-pony \
  run.bright.bm25.pony.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-pony \
  run.bright.bm25.pony.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-pony \
  run.bright.bm25.pony.txt`
BM25QS:
Command to generate run:
`python -m pyserini.search.lucene \
  --index bright-pony \
  --topics bright-pony \
  --output run.bright.bm25qs.pony.txt \
  --output-format trec \
  --hits 1000 --bm25qs --remove-query`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-pony \
  run.bright.bm25qs.pony.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-pony \
  run.bright.bm25qs.pony.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-pony \
  run.bright.bm25qs.pony.txt`
SPLADE:
Command to generate run:
`python -m pyserini.search.lucene \
  --index bright-pony.splade-v3 \
  --topics bright-pony \
  --onnx-encoder SpladeV3 \
  --output run.bright.splade-v3.pony.txt \
  --output-format trec \
  --hits 1000 --impact --remove-query`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-pony \
  run.bright.splade-v3.pony.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-pony \
  run.bright.splade-v3.pony.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-pony \
  run.bright.splade-v3.pony.txt`
BGE:
Command to generate run:
`python -m pyserini.search.lucene --dense --flat \
  --index bright-pony.bge-large-en-v1.5.flat \
  --topics bright-pony \
  --onnx-encoder BgeLargeEn15 \
  --output run.bright.bge-large-en-v1.5.flat.pony.txt \
  --output-format trec \
  --hits 1000 --remove-query`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-pony \
  run.bright.bge-large-en-v1.5.flat.pony.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-pony \
  run.bright.bge-large-en-v1.5.flat.pony.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-pony \
  run.bright.bge-large-en-v1.5.flat.pony.txt`
Diver:
Command to generate run:
`python -m pyserini.search.faiss \
  --encoder AQ-MedAI/Diver-Retriever-4B \
  --encoder-class qwen3 \
  --index bright-pony.diver-retriever-4b \
  --query-prefix $'Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery:' \
  --topics bright-pony-original \
  --output run.bright.diver-retriever-4b.pony.txt \
  --hits 1000 --remove-query \
  --topics-format raw_jsonl --explicit-truncate --fp16 --l2-norm --max-length 16384 --device cuda:0`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-pony \
  run.bright.diver-retriever-4b.pony.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-pony \
  run.bright.diver-retriever-4b.pony.txt`
Reason-Embed:
Command to generate run:
`python -m pyserini.search.faiss \
  --encoder hanhainebula/reason-embed-qwen3-4b-0928 \
  --encoder-class qwen3 \
  --index bright-pony.reason-embed-qwen3-4b-0928 \
  --query-prefix $'Instruct: Given a Pony question, retrieve relevant passages that help answer the question.\nQuery: ' \
  --topics bright-pony-original \
  --output run.bright.reason-embed-qwen3-4b-0928.pony.txt \
  --hits 1000 --remove-query \
  --topics-format raw_jsonl --fp16 --l2-norm --max-length 8192 --device cuda:0`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-pony \
  run.bright.reason-embed-qwen3-4b-0928.pony.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-pony \
  run.bright.reason-embed-qwen3-4b-0928.pony.txt`
leetcode	0.247	0.508	0.250	0.538	0.260	0.502	0.267	0.529	0.371	0.702	0.371	0.745
BM25
Command to generate run:
`python -m pyserini.search.lucene \
  --index bright-leetcode \
  --topics bright-leetcode \
  --output run.bright.bm25.leetcode.txt \
  --output-format trec \
  --hits 1000 --bm25 --remove-query`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-leetcode \
  run.bright.bm25.leetcode.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-leetcode \
  run.bright.bm25.leetcode.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-leetcode \
  run.bright.bm25.leetcode.txt`

BM25QS
Command to generate run:
`python -m pyserini.search.lucene \
  --index bright-leetcode \
  --topics bright-leetcode \
  --output run.bright.bm25qs.leetcode.txt \
  --output-format trec \
  --hits 1000 --bm25qs --remove-query`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-leetcode \
  run.bright.bm25qs.leetcode.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-leetcode \
  run.bright.bm25qs.leetcode.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-leetcode \
  run.bright.bm25qs.leetcode.txt`

SPLADE
Command to generate run:
`python -m pyserini.search.lucene \
  --index bright-leetcode.splade-v3 \
  --topics bright-leetcode \
  --onnx-encoder SpladeV3 \
  --output run.bright.splade-v3.leetcode.txt \
  --output-format trec \
  --hits 1000 --impact --remove-query`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-leetcode \
  run.bright.splade-v3.leetcode.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-leetcode \
  run.bright.splade-v3.leetcode.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-leetcode \
  run.bright.splade-v3.leetcode.txt`

BGE
Command to generate run:
`python -m pyserini.search.lucene --dense --flat \
  --index bright-leetcode.bge-large-en-v1.5.flat \
  --topics bright-leetcode \
  --onnx-encoder BgeLargeEn15 \
  --output run.bright.bge-large-en-v1.5.flat.leetcode.txt \
  --output-format trec \
  --hits 1000 --remove-query`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-leetcode \
  run.bright.bge-large-en-v1.5.flat.leetcode.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-leetcode \
  run.bright.bge-large-en-v1.5.flat.leetcode.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-leetcode \
  run.bright.bge-large-en-v1.5.flat.leetcode.txt`

Diver
Command to generate run:
`python -m pyserini.search.faiss \
  --encoder AQ-MedAI/Diver-Retriever-4B \
  --encoder-class qwen3 \
  --index bright-leetcode.diver-retriever-4b \
  --query-prefix $'Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery:' \
  --topics bright-leetcode-original \
  --output run.bright.diver-retriever-4b.leetcode.txt \
  --hits 1000 --remove-query \
  --topics-format raw_jsonl --explicit-truncate --fp16 --l2-norm --max-length 16384 --device cuda:0`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-leetcode \
  run.bright.diver-retriever-4b.leetcode.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-leetcode \
  run.bright.diver-retriever-4b.leetcode.txt`

Reason-Embed
Command to generate run:
`python -m pyserini.search.faiss \
  --encoder hanhainebula/reason-embed-qwen3-4b-0928 \
  --encoder-class qwen3 \
  --index bright-leetcode.reason-embed-qwen3-4b-0928 \
  --query-prefix $'Instruct: Given a Coding problem, retrieve relevant examples that help answer the problem.\nQuery: ' \
  --topics bright-leetcode-original \
  --output run.bright.reason-embed-qwen3-4b-0928.leetcode.txt \
  --hits 1000 --remove-query \
  --topics-format raw_jsonl --fp16 --l2-norm --max-length 8192 --device cuda:0`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-leetcode \
  run.bright.reason-embed-qwen3-4b-0928.leetcode.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-leetcode \
  run.bright.reason-embed-qwen3-4b-0928.leetcode.txt`
aops	0.065	0.197	0.063	0.217	0.069	0.262	0.064	0.240	0.108	0.344	0.115	0.373
BM25
Command to generate run:
`python -m pyserini.search.lucene \
  --index bright-aops \
  --topics bright-aops \
  --output run.bright.bm25.aops.txt \
  --output-format trec \
  --hits 1000 --bm25 --remove-query`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-aops \
  run.bright.bm25.aops.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-aops \
  run.bright.bm25.aops.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-aops \
  run.bright.bm25.aops.txt`

BM25QS
Command to generate run:
`python -m pyserini.search.lucene \
  --index bright-aops \
  --topics bright-aops \
  --output run.bright.bm25qs.aops.txt \
  --output-format trec \
  --hits 1000 --bm25qs --remove-query`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-aops \
  run.bright.bm25qs.aops.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-aops \
  run.bright.bm25qs.aops.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-aops \
  run.bright.bm25qs.aops.txt`

SPLADE
Command to generate run:
`python -m pyserini.search.lucene \
  --index bright-aops.splade-v3 \
  --topics bright-aops \
  --onnx-encoder SpladeV3 \
  --output run.bright.splade-v3.aops.txt \
  --output-format trec \
  --hits 1000 --impact --remove-query`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-aops \
  run.bright.splade-v3.aops.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-aops \
  run.bright.splade-v3.aops.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-aops \
  run.bright.splade-v3.aops.txt`

BGE
Command to generate run:
`python -m pyserini.search.lucene --dense --flat \
  --index bright-aops.bge-large-en-v1.5.flat \
  --topics bright-aops \
  --onnx-encoder BgeLargeEn15 \
  --output run.bright.bge-large-en-v1.5.flat.aops.txt \
  --output-format trec \
  --hits 1000 --remove-query`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-aops \
  run.bright.bge-large-en-v1.5.flat.aops.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-aops \
  run.bright.bge-large-en-v1.5.flat.aops.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-aops \
  run.bright.bge-large-en-v1.5.flat.aops.txt`

Diver
Command to generate run:
`python -m pyserini.search.faiss \
  --encoder AQ-MedAI/Diver-Retriever-4B \
  --encoder-class qwen3 \
  --index bright-aops.diver-retriever-4b \
  --query-prefix $'Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery:' \
  --topics bright-aops-original \
  --output run.bright.diver-retriever-4b.aops.txt \
  --hits 1000 --remove-query \
  --topics-format raw_jsonl --explicit-truncate --fp16 --l2-norm --max-length 16384 --device cuda:0`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-aops \
  run.bright.diver-retriever-4b.aops.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-aops \
  run.bright.diver-retriever-4b.aops.txt`

Reason-Embed
Command to generate run:
`python -m pyserini.search.faiss \
  --encoder hanhainebula/reason-embed-qwen3-4b-0928 \
  --encoder-class qwen3 \
  --index bright-aops.reason-embed-qwen3-4b-0928 \
  --query-prefix $'Instruct: Given a Math problem, retrieve relevant examples that help answer the problem.\nQuery: ' \
  --topics bright-aops-original \
  --output run.bright.reason-embed-qwen3-4b-0928.aops.txt \
  --hits 1000 --remove-query \
  --topics-format raw_jsonl --fp16 --l2-norm --max-length 8192 --device cuda:0`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-aops \
  run.bright.reason-embed-qwen3-4b-0928.aops.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-aops \
  run.bright.reason-embed-qwen3-4b-0928.aops.txt`
theoremqa-theorems	0.021	0.134	0.049	0.184	0.055	0.262	0.053	0.259	0.374	0.795	0.452	0.917
BM25
Command to generate run:
`python -m pyserini.search.lucene \
  --index bright-theoremqa-theorems \
  --topics bright-theoremqa-theorems \
  --output run.bright.bm25.theoremqa-theorems.txt \
  --output-format trec \
  --hits 1000 --bm25 --remove-query`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-theoremqa-theorems \
  run.bright.bm25.theoremqa-theorems.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-theoremqa-theorems \
  run.bright.bm25.theoremqa-theorems.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-theoremqa-theorems \
  run.bright.bm25.theoremqa-theorems.txt`

BM25QS
Command to generate run:
`python -m pyserini.search.lucene \
  --index bright-theoremqa-theorems \
  --topics bright-theoremqa-theorems \
  --output run.bright.bm25qs.theoremqa-theorems.txt \
  --output-format trec \
  --hits 1000 --bm25qs --remove-query`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-theoremqa-theorems \
  run.bright.bm25qs.theoremqa-theorems.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-theoremqa-theorems \
  run.bright.bm25qs.theoremqa-theorems.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-theoremqa-theorems \
  run.bright.bm25qs.theoremqa-theorems.txt`

SPLADE
Command to generate run:
`python -m pyserini.search.lucene \
  --index bright-theoremqa-theorems.splade-v3 \
  --topics bright-theoremqa-theorems \
  --onnx-encoder SpladeV3 \
  --output run.bright.splade-v3.theoremqa-theorems.txt \
  --output-format trec \
  --hits 1000 --impact --remove-query`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-theoremqa-theorems \
  run.bright.splade-v3.theoremqa-theorems.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-theoremqa-theorems \
  run.bright.splade-v3.theoremqa-theorems.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-theoremqa-theorems \
  run.bright.splade-v3.theoremqa-theorems.txt`

BGE
Command to generate run:
`python -m pyserini.search.lucene --dense --flat \
  --index bright-theoremqa-theorems.bge-large-en-v1.5.flat \
  --topics bright-theoremqa-theorems \
  --onnx-encoder BgeLargeEn15 \
  --output run.bright.bge-large-en-v1.5.flat.theoremqa-theorems.txt \
  --output-format trec \
  --hits 1000 --remove-query`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-theoremqa-theorems \
  run.bright.bge-large-en-v1.5.flat.theoremqa-theorems.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-theoremqa-theorems \
  run.bright.bge-large-en-v1.5.flat.theoremqa-theorems.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-theoremqa-theorems \
  run.bright.bge-large-en-v1.5.flat.theoremqa-theorems.txt`

Diver
Command to generate run:
`python -m pyserini.search.faiss \
  --encoder AQ-MedAI/Diver-Retriever-4B \
  --encoder-class qwen3 \
  --index bright-theoremqa-theorems.diver-retriever-4b \
  --query-prefix $'Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery:' \
  --topics bright-theoremqa-theorems-original \
  --output run.bright.diver-retriever-4b.theoremqa-theorems.txt \
  --hits 1000 --remove-query \
  --topics-format raw_jsonl --explicit-truncate --fp16 --l2-norm --max-length 16384 --device cuda:0`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-theoremqa-theorems \
  run.bright.diver-retriever-4b.theoremqa-theorems.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-theoremqa-theorems \
  run.bright.diver-retriever-4b.theoremqa-theorems.txt`

Reason-Embed
Command to generate run:
`python -m pyserini.search.faiss \
  --encoder hanhainebula/reason-embed-qwen3-4b-0928 \
  --encoder-class qwen3 \
  --index bright-theoremqa-theorems.reason-embed-qwen3-4b-0928 \
  --query-prefix $'Instruct: Given a Math problem, retrieve relevant theorems that help answer the problem.\nQuery: ' \
  --topics bright-theoremqa-theorems-original \
  --output run.bright.reason-embed-qwen3-4b-0928.theoremqa-theorems.txt \
  --hits 1000 --remove-query \
  --topics-format raw_jsonl --fp16 --l2-norm --max-length 8192 --device cuda:0`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-theoremqa-theorems \
  run.bright.reason-embed-qwen3-4b-0928.theoremqa-theorems.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-theoremqa-theorems \
  run.bright.reason-embed-qwen3-4b-0928.theoremqa-theorems.txt`
theoremqa-questions	0.073	0.159	0.104	0.212	0.111	0.266	0.141	0.282	0.381	0.585	0.409	0.637
BM25
Command to generate run:
`python -m pyserini.search.lucene \
  --index bright-theoremqa-questions \
  --topics bright-theoremqa-questions \
  --output run.bright.bm25.theoremqa-questions.txt \
  --output-format trec \
  --hits 1000 --bm25 --remove-query`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-theoremqa-questions \
  run.bright.bm25.theoremqa-questions.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-theoremqa-questions \
  run.bright.bm25.theoremqa-questions.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-theoremqa-questions \
  run.bright.bm25.theoremqa-questions.txt`

BM25QS
Command to generate run:
`python -m pyserini.search.lucene \
  --index bright-theoremqa-questions \
  --topics bright-theoremqa-questions \
  --output run.bright.bm25qs.theoremqa-questions.txt \
  --output-format trec \
  --hits 1000 --bm25qs --remove-query`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-theoremqa-questions \
  run.bright.bm25qs.theoremqa-questions.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-theoremqa-questions \
  run.bright.bm25qs.theoremqa-questions.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-theoremqa-questions \
  run.bright.bm25qs.theoremqa-questions.txt`

SPLADE
Command to generate run:
`python -m pyserini.search.lucene \
  --index bright-theoremqa-questions.splade-v3 \
  --topics bright-theoremqa-questions \
  --onnx-encoder SpladeV3 \
  --output run.bright.splade-v3.theoremqa-questions.txt \
  --output-format trec \
  --hits 1000 --impact --remove-query`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-theoremqa-questions \
  run.bright.splade-v3.theoremqa-questions.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-theoremqa-questions \
  run.bright.splade-v3.theoremqa-questions.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-theoremqa-questions \
  run.bright.splade-v3.theoremqa-questions.txt`

BGE
Command to generate run:
`python -m pyserini.search.lucene --dense --flat \
  --index bright-theoremqa-questions.bge-large-en-v1.5.flat \
  --topics bright-theoremqa-questions \
  --onnx-encoder BgeLargeEn15 \
  --output run.bright.bge-large-en-v1.5.flat.theoremqa-questions.txt \
  --output-format trec \
  --hits 1000 --remove-query`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-theoremqa-questions \
  run.bright.bge-large-en-v1.5.flat.theoremqa-questions.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-theoremqa-questions \
  run.bright.bge-large-en-v1.5.flat.theoremqa-questions.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-theoremqa-questions \
  run.bright.bge-large-en-v1.5.flat.theoremqa-questions.txt`

Diver
Command to generate run:
`python -m pyserini.search.faiss \
  --encoder AQ-MedAI/Diver-Retriever-4B \
  --encoder-class qwen3 \
  --index bright-theoremqa-questions.diver-retriever-4b \
  --query-prefix $'Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery:' \
  --topics bright-theoremqa-questions-original \
  --output run.bright.diver-retriever-4b.theoremqa-questions.txt \
  --hits 1000 --remove-query \
  --topics-format raw_jsonl --explicit-truncate --fp16 --l2-norm --max-length 16384 --device cuda:0`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-theoremqa-questions \
  run.bright.diver-retriever-4b.theoremqa-questions.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-theoremqa-questions \
  run.bright.diver-retriever-4b.theoremqa-questions.txt`

Reason-Embed
Command to generate run:
`python -m pyserini.search.faiss \
  --encoder hanhainebula/reason-embed-qwen3-4b-0928 \
  --encoder-class qwen3 \
  --index bright-theoremqa-questions.reason-embed-qwen3-4b-0928 \
  --query-prefix $'Instruct: Given a Math problem, retrieve relevant examples that help answer the problem.\nQuery: ' \
  --topics bright-theoremqa-questions-original \
  --output run.bright.reason-embed-qwen3-4b-0928.theoremqa-questions.txt \
  --hits 1000 --remove-query \
  --topics-format raw_jsonl --fp16 --l2-norm --max-length 8192 --device cuda:0`
Evaluation commands:
`python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-theoremqa-questions \
  run.bright.reason-embed-qwen3-4b-0928.theoremqa-questions.txt
python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-theoremqa-questions \
  run.bright.reason-embed-qwen3-4b-0928.theoremqa-questions.txt`
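
The evaluation commands in the table follow a regular pattern: the run file is named run.bright.<condition>.<task>.txt and the qrels key is bright-<task>. The sketch below rebuilds those command strings in Python from that pattern. The helper names run_file and eval_commands are hypothetical and not part of Pyserini; the code only formats strings and does not execute anything.

```python
import shlex
from typing import List, Sequence


def run_file(condition: str, task: str) -> str:
    """Run-file name following the pattern seen in the table (an inferred convention)."""
    return f"run.bright.{condition}.{task}.txt"


def eval_commands(
    condition: str,
    task: str,
    metrics: Sequence[str] = ("ndcg_cut.10", "recall.100"),
) -> List[str]:
    """Build one trec_eval command string per metric for a condition/task pair."""
    qrels = f"bright-{task}"
    return [
        shlex.join(
            ["python", "-m", "pyserini.eval.trec_eval",
             "-c", "-m", metric, qrels, run_file(condition, task)]
        )
        for metric in metrics
    ]


# Print the two evaluation commands for query-side BM25 on the pony task.
for command in eval_commands("bm25qs", "pony"):
    print(command)
```

The Lucene-based conditions in the table are additionally evaluated with recall.1000, which can be requested by passing metrics=("ndcg_cut.10", "recall.100", "recall.1000").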

References

[1] Yijun Ge, Sahel Sharifymoghaddam, and Jimmy Lin. Lighting the Way for BRIGHT: Reproducible Baselines with Anserini, Pyserini, and RankLLM. arXiv:2509.02558, September 2025.
[2] Meixiu Long, Duolin Sun, Dan Yang, Junjie Wang, Yue Shen, Jian Wang, Peng Wei, Jinjie Gu, and Jiahai Wang. DIVER: A Multi-Stage Approach for Reasoning-intensive Information Retrieval. arXiv:2508.07995, August 2025.
[3] Jianlyu Chen, Junwei Lan, Chaofan Li, Defu Lian, and Zheng Liu. ReasonEmbed: Enhanced Text Embeddings for Reasoning-Intensive Document Retrieval. arXiv:2510.08252, October 2025.

Programmatic Execution

All experimental runs shown in the above table can be programmatically executed based on the instructions below. To list all the experimental conditions:

python -m pyserini.2cr.bright --list-conditions

These conditions correspond to the model columns of the table above.

To show the commands for all conditions without executing them (a "dry run"):

python -m pyserini.2cr.bright --all --display-commands --dry-run

To actually run all the experimental conditions:

python -m pyserini.2cr.bright --all --display-commands

With the above command, run files will be placed in the current directory. Use the option --directory runs/ to place the runs in a sub-directory.

To show the commands for a specific condition:

python -m pyserini.2cr.bright --condition bm25qs --display-commands --dry-run

This prints exactly the commands shown above for that condition (corresponding to one model column in the table).

To actually run a specific condition:

python -m pyserini.2cr.bright --condition bm25qs --display-commands

Again, with the above command, run files will be placed in the current directory. Use the option --directory runs/ to place the runs in a sub-directory.
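
The invocations above differ only in a few flags, so they can be assembled from a script. The following is a minimal sketch that builds the argument list for python -m pyserini.2cr.bright using only the options mentioned on this page (--all, --condition, --display-commands, --dry-run, --directory). The function name two_click_argv is hypothetical, and the sketch assumes --directory can be combined with the other options as described above.

```python
import sys
from typing import List, Optional


def two_click_argv(
    condition: Optional[str] = None,
    dry_run: bool = True,
    directory: Optional[str] = None,
) -> List[str]:
    """Assemble the argument list for the BRIGHT two-click reproduction module."""
    argv = [sys.executable, "-m", "pyserini.2cr.bright"]
    # A single condition if one is named, otherwise every condition.
    argv += ["--condition", condition] if condition else ["--all"]
    argv.append("--display-commands")
    if dry_run:
        argv.append("--dry-run")
    if directory:
        argv += ["--directory", directory]
    return argv


argv = two_click_argv("bm25qs", dry_run=True, directory="runs/")
print(" ".join(argv))

# To execute the command (requires a working Pyserini installation):
#   import subprocess
#   subprocess.run(argv, check=True)
```

Defaulting dry_run to True is a deliberate choice in this sketch: the dense-retrieval conditions above request a GPU (--device cuda:0), so it seems safer to display commands first and run them only on request.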

Finally, to generate this page:

python -m pyserini.2cr.bright --generate-report --output bright.html

The output file bright.html should be identical to this page.
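
One way to check this is a byte-for-byte comparison between the generated file and a saved copy of this page. The sketch below assumes such a copy exists under the hypothetical name bright.reference.html; the helper same_report is illustrative and not part of Pyserini. A copy saved through a browser may be rewritten on save, so a mismatch does not necessarily indicate a problem with the report.

```python
import filecmp
from pathlib import Path


def same_report(generated, reference) -> bool:
    """Return True if the two files have identical contents (not just matching metadata)."""
    return filecmp.cmp(generated, reference, shallow=False)


generated = Path("bright.html")
reference = Path("bright.reference.html")  # hypothetical saved copy of this page

# Only compare when both files are present, so the sketch runs anywhere.
if generated.exists() and reference.exists():
    print("identical" if same_report(generated, reference) else "differs")
else:
    print("nothing to compare: generate the report and save a reference copy first")
```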
