Pyserini BRIGHT Regressions

The two-click* reproductions below provide commands for reproducing experimental results on BRIGHT. Instructions for programmatic execution are at the bottom of this page.

Main Results

The main results table provides commands for reproducing runs using the following models:

BM25 BoW | Query-side BM25 | SPLADE | BGE | Diver | Reason-Embed
Each model column reports nDCG@10 and R@100.
Command to generate run:
python -m pyserini.search.lucene \
  --index bright-biology \
  --topics bright-biology \
  --output run.bright.bm25.biology.txt \
  --output-format trec \
  --hits 1000 --bm25 --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-biology \
  run.bright.bm25.biology.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-biology \
  run.bright.bm25.biology.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-biology \
  run.bright.bm25.biology.txt
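
For readers unfamiliar with the metrics named in the evaluation commands above, here is a minimal Python sketch of nDCG@k and recall@k for a single query. It assumes linear gain and a log2(rank + 1) discount, which is the common textbook definition; it is not the trec_eval implementation and may differ from it in tie-breaking and other details. The function names and the toy data are hypothetical.

```python
import math

def ndcg_at_k(ranked_docids, qrels, k=10):
    """nDCG@k for one query; qrels maps docid -> graded relevance."""
    # DCG over the top-k retrieved documents; unjudged documents count as 0.
    dcg = sum(
        qrels.get(docid, 0) / math.log2(rank + 1)
        for rank, docid in enumerate(ranked_docids[:k], start=1)
    )
    # Ideal DCG: judged relevant documents in the best possible order.
    ideal = sorted((g for g in qrels.values() if g > 0), reverse=True)[:k]
    idcg = sum(g / math.log2(rank + 1) for rank, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0

def recall_at_k(ranked_docids, qrels, k=100):
    """Fraction of the relevant documents that appear in the top k."""
    relevant = {docid for docid, g in qrels.items() if g > 0}
    if not relevant:
        return 0.0
    return len(relevant & set(ranked_docids[:k])) / len(relevant)

# Toy data, made up for illustration only.
toy_qrels = {"d1": 1, "d2": 1, "d3": 0}
toy_ranking = ["d1", "d3", "d4", "d2"]
print(ndcg_at_k(toy_ranking, toy_qrels, k=10))
print(recall_at_k(toy_ranking, toy_qrels, k=100))
```

The reported scores come from averaging per-query values over all queries in a dataset, which `-m ndcg_cut.10` and `-m recall.100` do for you.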
Command to generate run:
python -m pyserini.search.lucene \
  --index bright-biology \
  --topics bright-biology \
  --output run.bright.bm25qs.biology.txt \
  --output-format trec \
  --hits 1000 --bm25qs --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-biology \
  run.bright.bm25qs.biology.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-biology \
  run.bright.bm25qs.biology.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-biology \
  run.bright.bm25qs.biology.txt
Command to generate run:
python -m pyserini.search.lucene \
  --index bright-biology.splade-v3 \
  --topics bright-biology \
  --onnx-encoder SpladeV3 \
  --output run.bright.splade-v3.biology.txt \
  --output-format trec \
  --hits 1000 --impact --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-biology \
  run.bright.splade-v3.biology.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-biology \
  run.bright.splade-v3.biology.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-biology \
  run.bright.splade-v3.biology.txt
Command to generate run:
python -m pyserini.search.lucene --dense --flat \
  --index bright-biology.bge-large-en-v1.5.flat \
  --topics bright-biology \
  --onnx-encoder BgeLargeEn15 \
  --output run.bright.bge-large-en-v1.5.flat.biology.txt \
  --output-format trec \
  --hits 1000 --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-biology \
  run.bright.bge-large-en-v1.5.flat.biology.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-biology \
  run.bright.bge-large-en-v1.5.flat.biology.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-biology \
  run.bright.bge-large-en-v1.5.flat.biology.txt
Command to generate run:
python -m pyserini.search.faiss \
  --encoder AQ-MedAI/Diver-Retriever-4B \
  --encoder-class qwen3 \
  --index bright-biology.diver-retriever-4b \
  --query-prefix $'Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery:' \
  --topics bright-biology-original \
  --output run.bright.diver-retriever-4b.biology.txt \
  --hits 1000 --remove-query \
  --topics-format raw_jsonl --explicit-truncate --fp16 --l2-norm --max-length 16384 --device cuda:0
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-biology \
  run.bright.diver-retriever-4b.biology.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-biology \
  run.bright.diver-retriever-4b.biology.txt
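
The `$'...'` quoting around `--query-prefix` in the Diver command above is a shell feature (ANSI-C quoting in bash) that turns `\n` into a real newline before the program sees it. When the same command is launched from Python rather than from a shell, the newline can be written directly in the string. The sketch below builds the argument list for that command; the helper name `diver_command` is hypothetical, and all flags are copied from the command above.

```python
import shlex

def diver_command(dataset: str) -> list[str]:
    """Build the Diver run command shown above as an argv list (hypothetical helper)."""
    # A real newline: the Python equivalent of \n inside the shell's $'...' quoting.
    prefix = (
        "Instruct: Given a web search query, retrieve relevant passages "
        "that answer the query\nQuery:"
    )
    return [
        "python", "-m", "pyserini.search.faiss",
        "--encoder", "AQ-MedAI/Diver-Retriever-4B",
        "--encoder-class", "qwen3",
        "--index", f"bright-{dataset}.diver-retriever-4b",
        "--query-prefix", prefix,
        "--topics", f"bright-{dataset}-original",
        "--output", f"run.bright.diver-retriever-4b.{dataset}.txt",
        "--hits", "1000", "--remove-query",
        "--topics-format", "raw_jsonl", "--explicit-truncate", "--fp16",
        "--l2-norm", "--max-length", "16384", "--device", "cuda:0",
    ]

cmd = diver_command("biology")
# shlex.join only quotes the arguments for display. To execute, pass the list
# itself to subprocess.run(cmd, check=True) so that no shell is involved.
print(shlex.join(cmd))
```

Passing the arguments as a list avoids a second round of shell quoting, so the prefix reaches the encoder exactly as written.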
Command to generate run:
python -m pyserini.search.faiss \
  --encoder hanhainebula/reason-embed-qwen3-4b-0928 \
  --encoder-class qwen3 \
  --index bright-biology.reason-embed-qwen3-4b-0928 \
  --query-prefix $'Instruct: Given a Biology post, retrieve relevant passages that help answer the post.\nQuery: ' \
  --topics bright-biology-original \
  --output run.bright.reason-embed-qwen3-4b-0928.biology.txt \
  --hits 1000 --remove-query \
  --topics-format raw_jsonl --fp16 --l2-norm --max-length 8192 --device cuda:0
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-biology \
  run.bright.reason-embed-qwen3-4b-0928.biology.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-biology \
  run.bright.reason-embed-qwen3-4b-0928.biology.txt
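
The four Lucene-based commands above (BM25, query-side BM25, SPLADE, BGE) follow a regular naming pattern, so they can be generated from a small table of per-model settings. The following is a sketch under that assumption, not an official Pyserini script: the `LUCENE_MODELS` table and the `lucene_command` helper are hypothetical names, and the flag values are copied from the commands on this page.

```python
# Per-model settings copied from the commands above (hypothetical table).
LUCENE_MODELS = {
    "bm25": {"suffix": "", "pre": [], "encoder": None, "scoring": ["--bm25"]},
    "bm25qs": {"suffix": "", "pre": [], "encoder": None, "scoring": ["--bm25qs"]},
    "splade-v3": {
        "suffix": ".splade-v3", "pre": [],
        "encoder": "SpladeV3", "scoring": ["--impact"],
    },
    "bge-large-en-v1.5.flat": {
        "suffix": ".bge-large-en-v1.5.flat", "pre": ["--dense", "--flat"],
        "encoder": "BgeLargeEn15", "scoring": [],
    },
}

def lucene_command(model: str, dataset: str) -> list[str]:
    """Return the argv list for one Lucene-based run on one BRIGHT dataset."""
    cfg = LUCENE_MODELS[model]
    cmd = ["python", "-m", "pyserini.search.lucene", *cfg["pre"],
           "--index", f"bright-{dataset}{cfg['suffix']}",
           "--topics", f"bright-{dataset}"]
    if cfg["encoder"]:
        cmd += ["--onnx-encoder", cfg["encoder"]]
    cmd += ["--output", f"run.bright.{model}.{dataset}.txt",
            "--output-format", "trec",
            "--hits", "1000", *cfg["scoring"], "--remove-query"]
    return cmd

# Print every Lucene-based command for two of the datasets on this page.
for dataset in ["biology", "earth-science"]:
    for model in LUCENE_MODELS:
        print(" ".join(lucene_command(model, dataset)))
```

The Diver and Reason-Embed runs are left out of the table because they use `pyserini.search.faiss` with model-specific query prefixes.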
Command to generate run:
python -m pyserini.search.lucene \
  --index bright-earth-science \
  --topics bright-earth-science \
  --output run.bright.bm25.earth-science.txt \
  --output-format trec \
  --hits 1000 --bm25 --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-earth-science \
  run.bright.bm25.earth-science.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-earth-science \
  run.bright.bm25.earth-science.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-earth-science \
  run.bright.bm25.earth-science.txt
Command to generate run:
python -m pyserini.search.lucene \
  --index bright-earth-science \
  --topics bright-earth-science \
  --output run.bright.bm25qs.earth-science.txt \
  --output-format trec \
  --hits 1000 --bm25qs --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-earth-science \
  run.bright.bm25qs.earth-science.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-earth-science \
  run.bright.bm25qs.earth-science.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-earth-science \
  run.bright.bm25qs.earth-science.txt
Command to generate run:
python -m pyserini.search.lucene \
  --index bright-earth-science.splade-v3 \
  --topics bright-earth-science \
  --onnx-encoder SpladeV3 \
  --output run.bright.splade-v3.earth-science.txt \
  --output-format trec \
  --hits 1000 --impact --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-earth-science \
  run.bright.splade-v3.earth-science.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-earth-science \
  run.bright.splade-v3.earth-science.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-earth-science \
  run.bright.splade-v3.earth-science.txt
Command to generate run:
python -m pyserini.search.lucene --dense --flat \
  --index bright-earth-science.bge-large-en-v1.5.flat \
  --topics bright-earth-science \
  --onnx-encoder BgeLargeEn15 \
  --output run.bright.bge-large-en-v1.5.flat.earth-science.txt \
  --output-format trec \
  --hits 1000 --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-earth-science \
  run.bright.bge-large-en-v1.5.flat.earth-science.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-earth-science \
  run.bright.bge-large-en-v1.5.flat.earth-science.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-earth-science \
  run.bright.bge-large-en-v1.5.flat.earth-science.txt
Command to generate run:
python -m pyserini.search.faiss \
  --encoder AQ-MedAI/Diver-Retriever-4B \
  --encoder-class qwen3 \
  --index bright-earth-science.diver-retriever-4b \
  --query-prefix $'Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery:' \
  --topics bright-earth-science-original \
  --output run.bright.diver-retriever-4b.earth-science.txt \
  --hits 1000 --remove-query \
  --topics-format raw_jsonl --explicit-truncate --fp16 --l2-norm --max-length 16384 --device cuda:0
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-earth-science \
  run.bright.diver-retriever-4b.earth-science.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-earth-science \
  run.bright.diver-retriever-4b.earth-science.txt
Command to generate run:
python -m pyserini.search.faiss \
  --encoder hanhainebula/reason-embed-qwen3-4b-0928 \
  --encoder-class qwen3 \
  --index bright-earth-science.reason-embed-qwen3-4b-0928 \
  --query-prefix $'Instruct: Given an Earth Science post, retrieve relevant passages that help answer the post.\nQuery: ' \
  --topics bright-earth-science-original \
  --output run.bright.reason-embed-qwen3-4b-0928.earth-science.txt \
  --hits 1000 --remove-query \
  --topics-format raw_jsonl --fp16 --l2-norm --max-length 8192 --device cuda:0
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-earth-science \
  run.bright.reason-embed-qwen3-4b-0928.earth-science.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-earth-science \
  run.bright.reason-embed-qwen3-4b-0928.earth-science.txt
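
When collecting scores from many evaluation commands, it can help to parse the text that trec_eval prints. The sketch below assumes the usual trec_eval summary layout of three whitespace-separated fields per line (measure name, the literal `all`, and the value) and skips any other lines the wrapper may print. The function name `parse_trec_eval` is hypothetical, and the sample text uses placeholder values, not real BRIGHT scores.

```python
def parse_trec_eval(output: str) -> dict[str, float]:
    """Extract {measure: value} from trec_eval summary lines."""
    scores = {}
    for line in output.splitlines():
        parts = line.split()
        # Summary lines look like: "<measure>  all  <value>".
        if len(parts) == 3 and parts[1] == "all":
            try:
                scores[parts[0]] = float(parts[2])
            except ValueError:
                continue  # third field is not numeric; skip the line
    return scores

# Placeholder text for illustration only; the numbers are NOT real results.
sample = "some log line printed before the results\nndcg_cut_10\tall\t0.5000\n"
print(parse_trec_eval(sample))
```

In practice the `output` string would be the captured standard output of one of the evaluation commands above, for example via `subprocess.run(..., capture_output=True, text=True).stdout`.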
Command to generate run:
python -m pyserini.search.lucene \
  --index bright-economics \
  --topics bright-economics \
  --output run.bright.bm25.economics.txt \
  --output-format trec \
  --hits 1000 --bm25 --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-economics \
  run.bright.bm25.economics.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-economics \
  run.bright.bm25.economics.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-economics \
  run.bright.bm25.economics.txt
Command to generate run:
python -m pyserini.search.lucene \
  --index bright-economics \
  --topics bright-economics \
  --output run.bright.bm25qs.economics.txt \
  --output-format trec \
  --hits 1000 --bm25qs --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-economics \
  run.bright.bm25qs.economics.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-economics \
  run.bright.bm25qs.economics.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-economics \
  run.bright.bm25qs.economics.txt
Command to generate run:
python -m pyserini.search.lucene \
  --index bright-economics.splade-v3 \
  --topics bright-economics \
  --onnx-encoder SpladeV3 \
  --output run.bright.splade-v3.economics.txt \
  --output-format trec \
  --hits 1000 --impact --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-economics \
  run.bright.splade-v3.economics.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-economics \
  run.bright.splade-v3.economics.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-economics \
  run.bright.splade-v3.economics.txt
Command to generate run:
python -m pyserini.search.lucene --dense --flat \
  --index bright-economics.bge-large-en-v1.5.flat \
  --topics bright-economics \
  --onnx-encoder BgeLargeEn15 \
  --output run.bright.bge-large-en-v1.5.flat.economics.txt \
  --output-format trec \
  --hits 1000 --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-economics \
  run.bright.bge-large-en-v1.5.flat.economics.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-economics \
  run.bright.bge-large-en-v1.5.flat.economics.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-economics \
  run.bright.bge-large-en-v1.5.flat.economics.txt
Command to generate run:
python -m pyserini.search.faiss \
  --encoder AQ-MedAI/Diver-Retriever-4B \
  --encoder-class qwen3 \
  --index bright-economics.diver-retriever-4b \
  --query-prefix $'Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery:' \
  --topics bright-economics-original \
  --output run.bright.diver-retriever-4b.economics.txt \
  --hits 1000 --remove-query \
  --topics-format raw_jsonl --explicit-truncate --fp16 --l2-norm --max-length 16384 --device cuda:0
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-economics \
  run.bright.diver-retriever-4b.economics.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-economics \
  run.bright.diver-retriever-4b.economics.txt
Command to generate run:
python -m pyserini.search.faiss \
  --encoder hanhainebula/reason-embed-qwen3-4b-0928 \
  --encoder-class qwen3 \
  --index bright-economics.reason-embed-qwen3-4b-0928 \
  --query-prefix $'Instruct: Given an Economics post, retrieve relevant passages that help answer the post.\nQuery: ' \
  --topics bright-economics-original \
  --output run.bright.reason-embed-qwen3-4b-0928.economics.txt \
  --hits 1000 --remove-query \
  --topics-format raw_jsonl --fp16 --l2-norm --max-length 8192 --device cuda:0
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-economics \
  run.bright.reason-embed-qwen3-4b-0928.economics.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-economics \
  run.bright.reason-embed-qwen3-4b-0928.economics.txt
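
Before evaluating, a run file can be given a quick sanity check. With `--output-format trec`, each line is expected to hold six whitespace-separated fields: query id, the literal `Q0`, document id, rank, score, and a run tag. The sketch below is a hypothetical helper, not part of Pyserini. It assumes ids contain no whitespace and that ranks start at 1, and it verifies that no query has more than the 1000 hits requested by `--hits 1000`.

```python
from collections import defaultdict

def load_run(lines):
    """Group TREC-format run lines by query id."""
    run = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        qid, _q0, docid, rank, score, _tag = line.split()
        run[qid].append((int(rank), docid, float(score)))
    return run

def check_run(run, max_hits=1000):
    """Return a list of human-readable problems; empty means none were found."""
    problems = []
    for qid, hits in run.items():
        if len(hits) > max_hits:
            problems.append(f"{qid}: {len(hits)} hits exceeds {max_hits}")
        ranks = sorted(rank for rank, _docid, _score in hits)
        if ranks != list(range(1, len(hits) + 1)):
            problems.append(f"{qid}: ranks are not 1..{len(hits)}")
    return problems

# Made-up lines for illustration; a real file would be read with open(path).
example_lines = [
    "q1 Q0 docA 1 12.5 example-tag",
    "q1 Q0 docB 2 11.0 example-tag",
]
print(check_run(load_run(example_lines)))
```

To check an actual run, pass an open file object for one of the `run.bright.*.txt` files to `load_run`.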
Command to generate run:
python -m pyserini.search.lucene \
  --index bright-psychology \
  --topics bright-psychology \
  --output run.bright.bm25.psychology.txt \
  --output-format trec \
  --hits 1000 --bm25 --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-psychology \
  run.bright.bm25.psychology.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-psychology \
  run.bright.bm25.psychology.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-psychology \
  run.bright.bm25.psychology.txt
Command to generate run:
python -m pyserini.search.lucene \
  --index bright-psychology \
  --topics bright-psychology \
  --output run.bright.bm25qs.psychology.txt \
  --output-format trec \
  --hits 1000 --bm25qs --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-psychology \
  run.bright.bm25qs.psychology.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-psychology \
  run.bright.bm25qs.psychology.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-psychology \
  run.bright.bm25qs.psychology.txt
Command to generate run:
python -m pyserini.search.lucene \
  --index bright-psychology.splade-v3 \
  --topics bright-psychology \
  --onnx-encoder SpladeV3 \
  --output run.bright.splade-v3.psychology.txt \
  --output-format trec \
  --hits 1000 --impact --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-psychology \
  run.bright.splade-v3.psychology.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-psychology \
  run.bright.splade-v3.psychology.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-psychology \
  run.bright.splade-v3.psychology.txt
Command to generate run:
python -m pyserini.search.lucene --dense --flat \
  --index bright-psychology.bge-large-en-v1.5.flat \
  --topics bright-psychology \
  --onnx-encoder BgeLargeEn15 \
  --output run.bright.bge-large-en-v1.5.flat.psychology.txt \
  --output-format trec \
  --hits 1000 --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-psychology \
  run.bright.bge-large-en-v1.5.flat.psychology.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-psychology \
  run.bright.bge-large-en-v1.5.flat.psychology.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-psychology \
  run.bright.bge-large-en-v1.5.flat.psychology.txt
Command to generate run:
python -m pyserini.search.faiss \
  --encoder AQ-MedAI/Diver-Retriever-4B \
  --encoder-class qwen3 \
  --index bright-psychology.diver-retriever-4b \
  --query-prefix $'Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery:' \
  --topics bright-psychology-original \
  --output run.bright.diver-retriever-4b.psychology.txt \
  --hits 1000 --remove-query \
  --topics-format raw_jsonl --explicit-truncate --fp16 --l2-norm --max-length 16384 --device cuda:0
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-psychology \
  run.bright.diver-retriever-4b.psychology.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-psychology \
  run.bright.diver-retriever-4b.psychology.txt
Command to generate run:
python -m pyserini.search.faiss \
  --encoder hanhainebula/reason-embed-qwen3-4b-0928 \
  --encoder-class qwen3 \
  --index bright-psychology.reason-embed-qwen3-4b-0928 \
  --query-prefix $'Instruct: Given a Psychology post, retrieve relevant passages that help answer the post.\nQuery: ' \
  --topics bright-psychology-original \
  --output run.bright.reason-embed-qwen3-4b-0928.psychology.txt \
  --hits 1000 --remove-query \
  --topics-format raw_jsonl --fp16 --l2-norm --max-length 8192 --device cuda:0
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-psychology \
  run.bright.reason-embed-qwen3-4b-0928.psychology.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-psychology \
  run.bright.reason-embed-qwen3-4b-0928.psychology.txt
Command to generate run:
python -m pyserini.search.lucene \
  --index bright-robotics \
  --topics bright-robotics \
  --output run.bright.bm25.robotics.txt \
  --output-format trec \
  --hits 1000 --bm25 --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-robotics \
  run.bright.bm25.robotics.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-robotics \
  run.bright.bm25.robotics.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-robotics \
  run.bright.bm25.robotics.txt
Command to generate run:
python -m pyserini.search.lucene \
  --index bright-robotics \
  --topics bright-robotics \
  --output run.bright.bm25qs.robotics.txt \
  --output-format trec \
  --hits 1000 --bm25qs --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-robotics \
  run.bright.bm25qs.robotics.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-robotics \
  run.bright.bm25qs.robotics.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-robotics \
  run.bright.bm25qs.robotics.txt
Command to generate run:
python -m pyserini.search.lucene \
  --index bright-robotics.splade-v3 \
  --topics bright-robotics \
  --onnx-encoder SpladeV3 \
  --output run.bright.splade-v3.robotics.txt \
  --output-format trec \
  --hits 1000 --impact --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-robotics \
  run.bright.splade-v3.robotics.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-robotics \
  run.bright.splade-v3.robotics.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-robotics \
  run.bright.splade-v3.robotics.txt
Command to generate run:
python -m pyserini.search.lucene --dense --flat \
  --index bright-robotics.bge-large-en-v1.5.flat \
  --topics bright-robotics \
  --onnx-encoder BgeLargeEn15 \
  --output run.bright.bge-large-en-v1.5.flat.robotics.txt \
  --output-format trec \
  --hits 1000 --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-robotics \
  run.bright.bge-large-en-v1.5.flat.robotics.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-robotics \
  run.bright.bge-large-en-v1.5.flat.robotics.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-robotics \
  run.bright.bge-large-en-v1.5.flat.robotics.txt
Command to generate run:
python -m pyserini.search.faiss \
  --encoder AQ-MedAI/Diver-Retriever-4B \
  --encoder-class qwen3 \
  --index bright-robotics.diver-retriever-4b \
  --query-prefix $'Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery:' \
  --topics bright-robotics-original \
  --output run.bright.diver-retriever-4b.robotics.txt \
  --hits 1000 --remove-query \
  --topics-format raw_jsonl --explicit-truncate --fp16 --l2-norm --max-length 16384 --device cuda:0
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-robotics \
  run.bright.diver-retriever-4b.robotics.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-robotics \
  run.bright.diver-retriever-4b.robotics.txt
Command to generate run:
python -m pyserini.search.faiss \
  --encoder hanhainebula/reason-embed-qwen3-4b-0928 \
  --encoder-class qwen3 \
  --index bright-robotics.reason-embed-qwen3-4b-0928 \
  --query-prefix $'Instruct: Given a Robotics post, retrieve relevant passages that help answer the post.\nQuery: ' \
  --topics bright-robotics-original \
  --output run.bright.reason-embed-qwen3-4b-0928.robotics.txt \
  --hits 1000 --remove-query \
  --topics-format raw_jsonl --fp16 --l2-norm --max-length 8192 --device cuda:0
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-robotics \
  run.bright.reason-embed-qwen3-4b-0928.robotics.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-robotics \
  run.bright.reason-embed-qwen3-4b-0928.robotics.txt
Command to generate run:
python -m pyserini.search.lucene \
  --index bright-stackoverflow \
  --topics bright-stackoverflow \
  --output run.bright.bm25.stackoverflow.txt \
  --output-format trec \
  --hits 1000 --bm25 --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-stackoverflow \
  run.bright.bm25.stackoverflow.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-stackoverflow \
  run.bright.bm25.stackoverflow.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-stackoverflow \
  run.bright.bm25.stackoverflow.txt
Command to generate run:
python -m pyserini.search.lucene \
  --index bright-stackoverflow \
  --topics bright-stackoverflow \
  --output run.bright.bm25qs.stackoverflow.txt \
  --output-format trec \
  --hits 1000 --bm25qs --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-stackoverflow \
  run.bright.bm25qs.stackoverflow.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-stackoverflow \
  run.bright.bm25qs.stackoverflow.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-stackoverflow \
  run.bright.bm25qs.stackoverflow.txt
Command to generate run:
python -m pyserini.search.lucene \
  --index bright-stackoverflow.splade-v3 \
  --topics bright-stackoverflow \
  --onnx-encoder SpladeV3 \
  --output run.bright.splade-v3.stackoverflow.txt \
  --output-format trec \
  --hits 1000 --impact --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-stackoverflow \
  run.bright.splade-v3.stackoverflow.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-stackoverflow \
  run.bright.splade-v3.stackoverflow.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-stackoverflow \
  run.bright.splade-v3.stackoverflow.txt
Command to generate run:
python -m pyserini.search.lucene --dense --flat \
  --index bright-stackoverflow.bge-large-en-v1.5.flat \
  --topics bright-stackoverflow \
  --onnx-encoder BgeLargeEn15 \
  --output run.bright.bge-large-en-v1.5.flat.stackoverflow.txt \
  --output-format trec \
  --hits 1000 --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-stackoverflow \
  run.bright.bge-large-en-v1.5.flat.stackoverflow.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-stackoverflow \
  run.bright.bge-large-en-v1.5.flat.stackoverflow.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-stackoverflow \
  run.bright.bge-large-en-v1.5.flat.stackoverflow.txt
Command to generate run:
python -m pyserini.search.faiss \
  --encoder AQ-MedAI/Diver-Retriever-4B \
  --encoder-class qwen3 \
  --index bright-stackoverflow.diver-retriever-4b \
  --query-prefix $'Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery:' \
  --topics bright-stackoverflow-original \
  --output run.bright.diver-retriever-4b.stackoverflow.txt \
  --hits 1000 --remove-query \
  --topics-format raw_jsonl --explicit-truncate --fp16 --l2-norm --max-length 16384 --device cuda:0
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-stackoverflow \
  run.bright.diver-retriever-4b.stackoverflow.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-stackoverflow \
  run.bright.diver-retriever-4b.stackoverflow.txt
Command to generate run:
python -m pyserini.search.faiss \
  --encoder hanhainebula/reason-embed-qwen3-4b-0928 \
  --encoder-class qwen3 \
  --index bright-stackoverflow.reason-embed-qwen3-4b-0928 \
  --query-prefix $'Instruct: Given a Stack Overflow post, retrieve relevant passages that help answer the post.\nQuery: ' \
  --topics bright-stackoverflow-original \
  --output run.bright.reason-embed-qwen3-4b-0928.stackoverflow.txt \
  --hits 1000 --remove-query \
  --topics-format raw_jsonl --fp16 --l2-norm --max-length 8192 --device cuda:0
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-stackoverflow \
  run.bright.reason-embed-qwen3-4b-0928.stackoverflow.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-stackoverflow \
  run.bright.reason-embed-qwen3-4b-0928.stackoverflow.txt
Command to generate run:
python -m pyserini.search.lucene \
  --index bright-sustainable-living \
  --topics bright-sustainable-living \
  --output run.bright.bm25.sustainable-living.txt \
  --output-format trec \
  --hits 1000 --bm25 --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-sustainable-living \
  run.bright.bm25.sustainable-living.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-sustainable-living \
  run.bright.bm25.sustainable-living.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-sustainable-living \
  run.bright.bm25.sustainable-living.txt
Command to generate run:
python -m pyserini.search.lucene \
  --index bright-sustainable-living \
  --topics bright-sustainable-living \
  --output run.bright.bm25qs.sustainable-living.txt \
  --output-format trec \
  --hits 1000 --bm25qs --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-sustainable-living \
  run.bright.bm25qs.sustainable-living.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-sustainable-living \
  run.bright.bm25qs.sustainable-living.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-sustainable-living \
  run.bright.bm25qs.sustainable-living.txt
Command to generate run:
python -m pyserini.search.lucene \
  --index bright-sustainable-living.splade-v3 \
  --topics bright-sustainable-living \
  --onnx-encoder SpladeV3 \
  --output run.bright.splade-v3.sustainable-living.txt \
  --output-format trec \
  --hits 1000 --impact --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-sustainable-living \
  run.bright.splade-v3.sustainable-living.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-sustainable-living \
  run.bright.splade-v3.sustainable-living.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-sustainable-living \
  run.bright.splade-v3.sustainable-living.txt
Command to generate run:
python -m pyserini.search.lucene --dense --flat \
  --index bright-sustainable-living.bge-large-en-v1.5.flat \
  --topics bright-sustainable-living \
  --onnx-encoder BgeLargeEn15 \
  --output run.bright.bge-large-en-v1.5.flat.sustainable-living.txt \
  --output-format trec \
  --hits 1000 --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-sustainable-living \
  run.bright.bge-large-en-v1.5.flat.sustainable-living.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-sustainable-living \
  run.bright.bge-large-en-v1.5.flat.sustainable-living.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-sustainable-living \
  run.bright.bge-large-en-v1.5.flat.sustainable-living.txt
Command to generate run:
python -m pyserini.search.faiss \
  --encoder AQ-MedAI/Diver-Retriever-4B \
  --encoder-class qwen3 \
  --index bright-sustainable-living.diver-retriever-4b \
  --query-prefix $'Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery:' \
  --topics bright-sustainable-living-original \
  --output run.bright.diver-retriever-4b.sustainable-living.txt \
  --hits 1000 --remove-query \
  --topics-format raw_jsonl --explicit-truncate --fp16 --l2-norm --max-length 16384 --device cuda:0
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-sustainable-living \
  run.bright.diver-retriever-4b.sustainable-living.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-sustainable-living \
  run.bright.diver-retriever-4b.sustainable-living.txt
Command to generate run:
python -m pyserini.search.faiss \
  --encoder hanhainebula/reason-embed-qwen3-4b-0928 \
  --encoder-class qwen3 \
  --index bright-sustainable-living.reason-embed-qwen3-4b-0928 \
  --query-prefix $'Instruct: Given a Sustainable Living post, retrieve relevant passages that help answer the post.\nQuery: ' \
  --topics bright-sustainable-living-original \
  --output run.bright.reason-embed-qwen3-4b-0928.sustainable-living.txt \
  --hits 1000 --remove-query \
  --topics-format raw_jsonl --fp16 --l2-norm --max-length 8192 --device cuda:0
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-sustainable-living \
  run.bright.reason-embed-qwen3-4b-0928.sustainable-living.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-sustainable-living \
  run.bright.reason-embed-qwen3-4b-0928.sustainable-living.txt
Command to generate run:
python -m pyserini.search.lucene \
  --index bright-pony \
  --topics bright-pony \
  --output run.bright.bm25.pony.txt \
  --output-format trec \
  --hits 1000 --bm25 --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-pony \
  run.bright.bm25.pony.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-pony \
  run.bright.bm25.pony.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-pony \
  run.bright.bm25.pony.txt
Command to generate run:
python -m pyserini.search.lucene \
  --index bright-pony \
  --topics bright-pony \
  --output run.bright.bm25qs.pony.txt \
  --output-format trec \
  --hits 1000 --bm25qs --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-pony \
  run.bright.bm25qs.pony.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-pony \
  run.bright.bm25qs.pony.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-pony \
  run.bright.bm25qs.pony.txt
Command to generate run:
python -m pyserini.search.lucene \
  --index bright-pony.splade-v3 \
  --topics bright-pony \
  --onnx-encoder SpladeV3 \
  --output run.bright.splade-v3.pony.txt \
  --output-format trec \
  --hits 1000 --impact --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-pony \
  run.bright.splade-v3.pony.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-pony \
  run.bright.splade-v3.pony.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-pony \
  run.bright.splade-v3.pony.txt
Command to generate run:
python -m pyserini.search.lucene --dense --flat \
  --index bright-pony.bge-large-en-v1.5.flat \
  --topics bright-pony \
  --onnx-encoder BgeLargeEn15 \
  --output run.bright.bge-large-en-v1.5.flat.pony.txt \
  --output-format trec \
  --hits 1000 --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-pony \
  run.bright.bge-large-en-v1.5.flat.pony.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-pony \
  run.bright.bge-large-en-v1.5.flat.pony.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-pony \
  run.bright.bge-large-en-v1.5.flat.pony.txt
Command to generate run:
python -m pyserini.search.faiss \
  --encoder AQ-MedAI/Diver-Retriever-4B \
  --encoder-class qwen3 \
  --index bright-pony.diver-retriever-4b \
  --query-prefix $'Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery:' \
  --topics bright-pony-original \
  --output run.bright.diver-retriever-4b.pony.txt \
  --hits 1000 --remove-query \
  --topics-format raw_jsonl --explicit-truncate --fp16 --l2-norm --max-length 16384 --device cuda:0
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-pony \
  run.bright.diver-retriever-4b.pony.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-pony \
  run.bright.diver-retriever-4b.pony.txt
Command to generate run:
python -m pyserini.search.faiss \
  --encoder hanhainebula/reason-embed-qwen3-4b-0928 \
  --encoder-class qwen3 \
  --index bright-pony.reason-embed-qwen3-4b-0928 \
  --query-prefix $'Instruct: Given a Pony question, retrieve relevant passages that help answer the question.\nQuery: ' \
  --topics bright-pony-original \
  --output run.bright.reason-embed-qwen3-4b-0928.pony.txt \
  --hits 1000 --remove-query \
  --topics-format raw_jsonl --fp16 --l2-norm --max-length 8192 --device cuda:0
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-pony \
  run.bright.reason-embed-qwen3-4b-0928.pony.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-pony \
  run.bright.reason-embed-qwen3-4b-0928.pony.txt
Command to generate run:
python -m pyserini.search.lucene \
  --index bright-leetcode \
  --topics bright-leetcode \
  --output run.bright.bm25.leetcode.txt \
  --output-format trec \
  --hits 1000 --bm25 --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-leetcode \
  run.bright.bm25.leetcode.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-leetcode \
  run.bright.bm25.leetcode.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-leetcode \
  run.bright.bm25.leetcode.txt
Command to generate run:
python -m pyserini.search.lucene \
  --index bright-leetcode \
  --topics bright-leetcode \
  --output run.bright.bm25qs.leetcode.txt \
  --output-format trec \
  --hits 1000 --bm25qs --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-leetcode \
  run.bright.bm25qs.leetcode.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-leetcode \
  run.bright.bm25qs.leetcode.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-leetcode \
  run.bright.bm25qs.leetcode.txt
Command to generate run:
python -m pyserini.search.lucene \
  --index bright-leetcode.splade-v3 \
  --topics bright-leetcode \
  --onnx-encoder SpladeV3 \
  --output run.bright.splade-v3.leetcode.txt \
  --output-format trec \
  --hits 1000 --impact --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-leetcode \
  run.bright.splade-v3.leetcode.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-leetcode \
  run.bright.splade-v3.leetcode.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-leetcode \
  run.bright.splade-v3.leetcode.txt
Command to generate run:
python -m pyserini.search.lucene --dense --flat \
  --index bright-leetcode.bge-large-en-v1.5.flat \
  --topics bright-leetcode \
  --onnx-encoder BgeLargeEn15 \
  --output run.bright.bge-large-en-v1.5.flat.leetcode.txt \
  --output-format trec \
  --hits 1000 --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-leetcode \
  run.bright.bge-large-en-v1.5.flat.leetcode.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-leetcode \
  run.bright.bge-large-en-v1.5.flat.leetcode.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-leetcode \
  run.bright.bge-large-en-v1.5.flat.leetcode.txt
Command to generate run:
python -m pyserini.search.faiss \
  --encoder AQ-MedAI/Diver-Retriever-4B \
  --encoder-class qwen3 \
  --index bright-leetcode.diver-retriever-4b \
  --query-prefix $'Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery:' \
  --topics bright-leetcode-original \
  --output run.bright.diver-retriever-4b.leetcode.txt \
  --hits 1000 --remove-query \
  --topics-format raw_jsonl --explicit-truncate --fp16 --l2-norm --max-length 16384 --device cuda:0
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-leetcode \
  run.bright.diver-retriever-4b.leetcode.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-leetcode \
  run.bright.diver-retriever-4b.leetcode.txt
Command to generate run:
python -m pyserini.search.faiss \
  --encoder hanhainebula/reason-embed-qwen3-4b-0928 \
  --encoder-class qwen3 \
  --index bright-leetcode.reason-embed-qwen3-4b-0928 \
  --query-prefix $'Instruct: Given a Coding problem, retrieve relevant examples that help answer the problem.\nQuery: ' \
  --topics bright-leetcode-original \
  --output run.bright.reason-embed-qwen3-4b-0928.leetcode.txt \
  --hits 1000 --remove-query \
  --topics-format raw_jsonl --fp16 --l2-norm --max-length 8192 --device cuda:0
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-leetcode \
  run.bright.reason-embed-qwen3-4b-0928.leetcode.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-leetcode \
  run.bright.reason-embed-qwen3-4b-0928.leetcode.txt
Command to generate run:
python -m pyserini.search.lucene \
  --index bright-aops \
  --topics bright-aops \
  --output run.bright.bm25.aops.txt \
  --output-format trec \
  --hits 1000 --bm25 --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-aops \
  run.bright.bm25.aops.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-aops \
  run.bright.bm25.aops.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-aops \
  run.bright.bm25.aops.txt
Command to generate run:
python -m pyserini.search.lucene \
  --index bright-aops \
  --topics bright-aops \
  --output run.bright.bm25qs.aops.txt \
  --output-format trec \
  --hits 1000 --bm25qs --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-aops \
  run.bright.bm25qs.aops.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-aops \
  run.bright.bm25qs.aops.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-aops \
  run.bright.bm25qs.aops.txt
Command to generate run:
python -m pyserini.search.lucene \
  --index bright-aops.splade-v3 \
  --topics bright-aops \
  --onnx-encoder SpladeV3 \
  --output run.bright.splade-v3.aops.txt \
  --output-format trec \
  --hits 1000 --impact --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-aops \
  run.bright.splade-v3.aops.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-aops \
  run.bright.splade-v3.aops.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-aops \
  run.bright.splade-v3.aops.txt
Command to generate run:
python -m pyserini.search.lucene --dense --flat \
  --index bright-aops.bge-large-en-v1.5.flat \
  --topics bright-aops \
  --onnx-encoder BgeLargeEn15 \
  --output run.bright.bge-large-en-v1.5.flat.aops.txt \
  --output-format trec \
  --hits 1000 --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-aops \
  run.bright.bge-large-en-v1.5.flat.aops.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-aops \
  run.bright.bge-large-en-v1.5.flat.aops.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-aops \
  run.bright.bge-large-en-v1.5.flat.aops.txt
Command to generate run:
python -m pyserini.search.faiss \
  --encoder AQ-MedAI/Diver-Retriever-4B \
  --encoder-class qwen3 \
  --index bright-aops.diver-retriever-4b \
  --query-prefix $'Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery:' \
  --topics bright-aops-original \
  --output run.bright.diver-retriever-4b.aops.txt \
  --hits 1000 --remove-query \
  --topics-format raw_jsonl --explicit-truncate --fp16 --l2-norm --max-length 16384 --device cuda:0
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-aops \
  run.bright.diver-retriever-4b.aops.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-aops \
  run.bright.diver-retriever-4b.aops.txt
Command to generate run:
python -m pyserini.search.faiss \
  --encoder hanhainebula/reason-embed-qwen3-4b-0928 \
  --encoder-class qwen3 \
  --index bright-aops.reason-embed-qwen3-4b-0928 \
  --query-prefix $'Instruct: Given a Math problem, retrieve relevant examples that help answer the problem.\nQuery: ' \
  --topics bright-aops-original \
  --output run.bright.reason-embed-qwen3-4b-0928.aops.txt \
  --hits 1000 --remove-query \
  --topics-format raw_jsonl --fp16 --l2-norm --max-length 8192 --device cuda:0
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-aops \
  run.bright.reason-embed-qwen3-4b-0928.aops.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-aops \
  run.bright.reason-embed-qwen3-4b-0928.aops.txt
Command to generate run:
python -m pyserini.search.lucene \
  --index bright-theoremqa-theorems \
  --topics bright-theoremqa-theorems \
  --output run.bright.bm25.theoremqa-theorems.txt \
  --output-format trec \
  --hits 1000 --bm25 --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-theoremqa-theorems \
  run.bright.bm25.theoremqa-theorems.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-theoremqa-theorems \
  run.bright.bm25.theoremqa-theorems.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-theoremqa-theorems \
  run.bright.bm25.theoremqa-theorems.txt
Command to generate run:
python -m pyserini.search.lucene \
  --index bright-theoremqa-theorems \
  --topics bright-theoremqa-theorems \
  --output run.bright.bm25qs.theoremqa-theorems.txt \
  --output-format trec \
  --hits 1000 --bm25qs --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-theoremqa-theorems \
  run.bright.bm25qs.theoremqa-theorems.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-theoremqa-theorems \
  run.bright.bm25qs.theoremqa-theorems.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-theoremqa-theorems \
  run.bright.bm25qs.theoremqa-theorems.txt
Command to generate run:
python -m pyserini.search.lucene \
  --index bright-theoremqa-theorems.splade-v3 \
  --topics bright-theoremqa-theorems \
  --onnx-encoder SpladeV3 \
  --output run.bright.splade-v3.theoremqa-theorems.txt \
  --output-format trec \
  --hits 1000 --impact --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-theoremqa-theorems \
  run.bright.splade-v3.theoremqa-theorems.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-theoremqa-theorems \
  run.bright.splade-v3.theoremqa-theorems.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-theoremqa-theorems \
  run.bright.splade-v3.theoremqa-theorems.txt
Command to generate run:
python -m pyserini.search.lucene --dense --flat \
  --index bright-theoremqa-theorems.bge-large-en-v1.5.flat \
  --topics bright-theoremqa-theorems \
  --onnx-encoder BgeLargeEn15 \
  --output run.bright.bge-large-en-v1.5.flat.theoremqa-theorems.txt \
  --output-format trec \
  --hits 1000 --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-theoremqa-theorems \
  run.bright.bge-large-en-v1.5.flat.theoremqa-theorems.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-theoremqa-theorems \
  run.bright.bge-large-en-v1.5.flat.theoremqa-theorems.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-theoremqa-theorems \
  run.bright.bge-large-en-v1.5.flat.theoremqa-theorems.txt
Command to generate run:
python -m pyserini.search.faiss \
  --encoder AQ-MedAI/Diver-Retriever-4B \
  --encoder-class qwen3 \
  --index bright-theoremqa-theorems.diver-retriever-4b \
  --query-prefix $'Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery:' \
  --topics bright-theoremqa-theorems-original \
  --output run.bright.diver-retriever-4b.theoremqa-theorems.txt \
  --hits 1000 --remove-query \
  --topics-format raw_jsonl --explicit-truncate --fp16 --l2-norm --max-length 16384 --device cuda:0
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-theoremqa-theorems \
  run.bright.diver-retriever-4b.theoremqa-theorems.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-theoremqa-theorems \
  run.bright.diver-retriever-4b.theoremqa-theorems.txt
Command to generate run:
python -m pyserini.search.faiss \
  --encoder hanhainebula/reason-embed-qwen3-4b-0928 \
  --encoder-class qwen3 \
  --index bright-theoremqa-theorems.reason-embed-qwen3-4b-0928 \
  --query-prefix $'Instruct: Given a Math problem, retrieve relevant theorems that help answer the problem.\nQuery: ' \
  --topics bright-theoremqa-theorems-original \
  --output run.bright.reason-embed-qwen3-4b-0928.theoremqa-theorems.txt \
  --hits 1000 --remove-query \
  --topics-format raw_jsonl --fp16 --l2-norm --max-length 8192 --device cuda:0
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-theoremqa-theorems \
  run.bright.reason-embed-qwen3-4b-0928.theoremqa-theorems.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-theoremqa-theorems \
  run.bright.reason-embed-qwen3-4b-0928.theoremqa-theorems.txt
Command to generate run:
python -m pyserini.search.lucene \
  --index bright-theoremqa-questions \
  --topics bright-theoremqa-questions \
  --output run.bright.bm25.theoremqa-questions.txt \
  --output-format trec \
  --hits 1000 --bm25 --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-theoremqa-questions \
  run.bright.bm25.theoremqa-questions.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-theoremqa-questions \
  run.bright.bm25.theoremqa-questions.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-theoremqa-questions \
  run.bright.bm25.theoremqa-questions.txt
Command to generate run:
python -m pyserini.search.lucene \
  --index bright-theoremqa-questions \
  --topics bright-theoremqa-questions \
  --output run.bright.bm25qs.theoremqa-questions.txt \
  --output-format trec \
  --hits 1000 --bm25qs --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-theoremqa-questions \
  run.bright.bm25qs.theoremqa-questions.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-theoremqa-questions \
  run.bright.bm25qs.theoremqa-questions.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-theoremqa-questions \
  run.bright.bm25qs.theoremqa-questions.txt
Command to generate run:
python -m pyserini.search.lucene \
  --index bright-theoremqa-questions.splade-v3 \
  --topics bright-theoremqa-questions \
  --onnx-encoder SpladeV3 \
  --output run.bright.splade-v3.theoremqa-questions.txt \
  --output-format trec \
  --hits 1000 --impact --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-theoremqa-questions \
  run.bright.splade-v3.theoremqa-questions.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-theoremqa-questions \
  run.bright.splade-v3.theoremqa-questions.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-theoremqa-questions \
  run.bright.splade-v3.theoremqa-questions.txt
Command to generate run:
python -m pyserini.search.lucene --dense --flat \
  --index bright-theoremqa-questions.bge-large-en-v1.5.flat \
  --topics bright-theoremqa-questions \
  --onnx-encoder BgeLargeEn15 \
  --output run.bright.bge-large-en-v1.5.flat.theoremqa-questions.txt \
  --output-format trec \
  --hits 1000 --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-theoremqa-questions \
  run.bright.bge-large-en-v1.5.flat.theoremqa-questions.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-theoremqa-questions \
  run.bright.bge-large-en-v1.5.flat.theoremqa-questions.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.1000 bright-theoremqa-questions \
  run.bright.bge-large-en-v1.5.flat.theoremqa-questions.txt
Command to generate run:
python -m pyserini.search.faiss \
  --encoder AQ-MedAI/Diver-Retriever-4B \
  --encoder-class qwen3 \
  --index bright-theoremqa-questions.diver-retriever-4b \
  --query-prefix $'Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery:' \
  --topics bright-theoremqa-questions-original \
  --output run.bright.diver-retriever-4b.theoremqa-questions.txt \
  --hits 1000 --remove-query \
  --topics-format raw_jsonl --explicit-truncate --fp16 --l2-norm --max-length 16384 --device cuda:0
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-theoremqa-questions \
  run.bright.diver-retriever-4b.theoremqa-questions.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-theoremqa-questions \
  run.bright.diver-retriever-4b.theoremqa-questions.txt
Command to generate run:
python -m pyserini.search.faiss \
  --encoder hanhainebula/reason-embed-qwen3-4b-0928 \
  --encoder-class qwen3 \
  --index bright-theoremqa-questions.reason-embed-qwen3-4b-0928 \
  --query-prefix $'Instruct: Given a Math problem, retrieve relevant examples that help answer the problem.\nQuery: ' \
  --topics bright-theoremqa-questions-original \
  --output run.bright.reason-embed-qwen3-4b-0928.theoremqa-questions.txt \
  --hits 1000 --remove-query \
  --topics-format raw_jsonl --fp16 --l2-norm --max-length 8192 --device cuda:0
Evaluation commands:
python -m pyserini.eval.trec_eval \
  -c -m ndcg_cut.10 bright-theoremqa-questions \
  run.bright.reason-embed-qwen3-4b-0928.theoremqa-questions.txt

python -m pyserini.eval.trec_eval \
  -c -m recall.100 bright-theoremqa-questions \
  run.bright.reason-embed-qwen3-4b-0928.theoremqa-questions.txt
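
The Lucene commands above pass --output-format trec, so each run file should hold one whitespace-separated line per retrieved document in the standard TREC run layout: query id, the literal Q0, document id, rank, score, and run tag. As a minimal sketch for inspecting such a file (the helper name read_trec_run and the sample ids are hypothetical and not part of Pyserini):

```python
from collections import defaultdict


def read_trec_run(lines):
    """Group (docid, rank, score) tuples by query id from TREC-format run lines."""
    run = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or malformed lines
        qid, _q0, docid, rank, score, _tag = fields
        run[qid].append((docid, int(rank), float(score)))
    return dict(run)


# Hypothetical sample lines; real ids come from the BRIGHT topics and corpora.
sample = [
    "q1 Q0 docA 1 12.5 Anserini",
    "q1 Q0 docB 2 11.0 Anserini",
    "q2 Q0 docC 1 9.75 Anserini",
]
run = read_trec_run(sample)
print(len(run["q1"]), len(run["q2"]))
```

For a real run, pass an open file handle instead of the sample list, for example read_trec_run(open("run.bright.bm25.pony.txt")).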

Programmatic Execution

All experimental runs shown in the table above can be executed programmatically by following the instructions below. To list all the experimental conditions:

python -m pyserini.2cr.bright --list-conditions

These conditions correspond to the models listed in the table above.

To show the commands for all conditions without executing them (a "dry run"):

python -m pyserini.2cr.bright --all --display-commands --dry-run

To actually run all the experimental conditions:

python -m pyserini.2cr.bright --all --display-commands

With the above command, run files will be placed in the current directory. Use the option --directory runs/ to place the runs in a sub-directory.

To show the commands for a specific condition:

python -m pyserini.2cr.bright --condition bm25qs --display-commands --dry-run

This prints exactly the commands for the specified condition (one of the models in the table above).
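
A condition expands to one retrieval command per BRIGHT dataset, and the run files follow the pattern run.bright.<condition>.<dataset>.txt seen in the commands above. The sketch below reproduces that pattern for the two BM25 conditions only. The function name bm25_commands and the short dataset list are illustrative, and this is not a description of how pyserini.2cr.bright is implemented:

```python
def bm25_commands(condition, dataset):
    """Return (search, evals) command strings for 'bm25' or 'bm25qs'."""
    if condition not in ("bm25", "bm25qs"):
        raise ValueError("this sketch only covers the two BM25 conditions")
    run = f"run.bright.{condition}.{dataset}.txt"
    search = (
        "python -m pyserini.search.lucene "
        f"--index bright-{dataset} --topics bright-{dataset} "
        f"--output {run} --output-format trec "
        f"--hits 1000 --{condition} --remove-query"
    )
    # The three metrics used for the BM25 runs on this page.
    evals = [
        f"python -m pyserini.eval.trec_eval -c -m {metric} bright-{dataset} {run}"
        for metric in ("ndcg_cut.10", "recall.100", "recall.1000")
    ]
    return search, evals


# A subset of the datasets that appear on this page.
for dataset in ("pony", "leetcode", "aops"):
    search, evals = bm25_commands("bm25qs", dataset)
    print(search)
    for line in evals:
        print(line)
```

The other conditions need extra flags (an encoder, a query prefix, a device), so they are deliberately left out of this sketch.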

To actually run a specific condition:

python -m pyserini.2cr.bright --condition bm25qs --display-commands

Again, with the above command, run files will be placed in the current directory. Use the option --directory runs/ to place the runs in a sub-directory.
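
When scripting these invocations, it can help to assemble the argument list in one place. The following is a minimal sketch that uses only the options shown on this page (--all, --condition, --display-commands, --dry-run, --directory). The helper name build_2cr_command is hypothetical, and combining --dry-run with --directory is an assumption rather than something this page confirms:

```python
import sys


def build_2cr_command(condition=None, directory=None, dry_run=False):
    """Assemble the argv list for `python -m pyserini.2cr.bright`."""
    cmd = [sys.executable, "-m", "pyserini.2cr.bright"]
    # Either one named condition or all of them.
    cmd += ["--condition", condition] if condition else ["--all"]
    cmd.append("--display-commands")
    if dry_run:
        cmd.append("--dry-run")
    if directory:
        cmd += ["--directory", directory]
    return cmd


cmd = build_2cr_command(condition="bm25qs", directory="runs/", dry_run=True)
print(" ".join(cmd[1:]))
# To execute for real: import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting issues and keeps the dry-run and real invocations in sync.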

Finally, to generate this page:

python -m pyserini.2cr.bright --generate-report --output bright.html

The output file bright.html should be identical to this page.
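
One way to check that claim is a byte-for-byte comparison against a saved copy of this page. This is a minimal sketch using the standard library; the helper name reports_match and the file name bright.reference.html are hypothetical:

```python
import filecmp


def reports_match(generated, reference):
    """Return True when the two files are byte-for-byte identical."""
    # shallow=False forces a content comparison instead of trusting os.stat() data.
    return filecmp.cmp(generated, reference, shallow=False)


# Example usage, assuming both files exist in the current directory:
#   reports_match("bright.html", "bright.reference.html")
```

A strict comparison like this will also flag differences in whitespace or line endings, so a mismatch does not necessarily mean the reported scores differ.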