The two-click* reproductions below provide commands for reproducing experimental results on BRIGHT.
Instructions for programmatic execution are shown at the bottom of this page (scroll down).
Main Results
The main results table provides commands for reproducing runs using the following models:
- BM25 BoW: BM25 "bag-of-words" baseline (bm25) [1]
- Query-side BM25: query-side BM25 baseline (bm25qs) [1]
- SPLADE: SPLADE-v3 (splade-v3) [1]
- BGE: BGE-large-en-v1.5 (Lucene flat index) (bge-large-en-v1.5.flat) [1]
- Diver: Diver-Retriever-4B (Faiss flat index) (diver-retriever-4b) [2]
- Reason-Embed: reason-embed-qwen3-4b-0928 (Faiss flat index) (reason-embed-qwen3-4b-0928) [3]
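The BM25 commands on this page follow a fixed naming pattern: the index and topics are both `bright-<domain>`, and the run file is `run.bright.<model>.<domain>.txt`. The sketch below rebuilds those command lines for the domains shown in this section and prints them as a dry run. The function name `bm25_commands` and the `DOMAINS` list are illustrative assumptions, not part of Pyserini, and the list only covers the domains visible here.

```python
import shlex

# Domains that appear in this section of the page (not necessarily all of BRIGHT).
DOMAINS = [
    "biology", "earth-science", "economics", "psychology", "robotics",
    "stackoverflow", "sustainable-living", "pony", "leetcode",
]


def bm25_commands(domain, model="bm25"):
    """Return (search, evals) argument lists for one domain.

    `model` is "bm25" or "bm25qs"; both flags appear in the commands on this page.
    This helper is hypothetical and only mirrors the command lines shown here.
    """
    run_file = f"run.bright.{model}.{domain}.txt"
    search = [
        "python", "-m", "pyserini.search.lucene",
        "--index", f"bright-{domain}",
        "--topics", f"bright-{domain}",
        "--output", run_file,
        "--output-format", "trec",
        "--hits", "1000", f"--{model}", "--remove-query",
    ]
    evals = [
        ["python", "-m", "pyserini.eval.trec_eval",
         "-c", "-m", metric, f"bright-{domain}", run_file]
        for metric in ("ndcg_cut.10", "recall.100", "recall.1000")
    ]
    return search, evals


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for domain in DOMAINS:
        search, evals = bm25_commands(domain)
        print(shlex.join(search))
        for cmd in evals:
            print(shlex.join(cmd))
```

To execute rather than print, each argument list could be passed to `subprocess.run(cmd, check=True)`; keeping the commands as lists avoids shell quoting issues.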
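| Domain | BM25 BoW | | Query-side BM25 | | SPLADE | | BGE | | Diver | | Reason-Embed | |
| | nDCG@10 | R@100 | nDCG@10 | R@100 | nDCG@10 | R@100 | nDCG@10 | R@100 | nDCG@10 | R@100 | nDCG@10 | R@100 |
| biology | 0.182 | 0.420 | 0.197 | 0.458 | 0.210 | 0.560 | 0.124 | 0.408 | 0.425 | 0.798 | 0.541 | 0.924 |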
Command to generate run:
python -m pyserini.search.lucene \
--index bright-biology \
--topics bright-biology \
--output run.bright.bm25.biology.txt \
--output-format trec \
--hits 1000 --bm25 --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-biology \
run.bright.bm25.biology.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-biology \
run.bright.bm25.biology.txt
python -m pyserini.eval.trec_eval \
-c -m recall.1000 bright-biology \
run.bright.bm25.biology.txt
Command to generate run:
python -m pyserini.search.lucene \
--index bright-biology \
--topics bright-biology \
--output run.bright.bm25qs.biology.txt \
--output-format trec \
--hits 1000 --bm25qs --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-biology \
run.bright.bm25qs.biology.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-biology \
run.bright.bm25qs.biology.txt
python -m pyserini.eval.trec_eval \
-c -m recall.1000 bright-biology \
run.bright.bm25qs.biology.txt
Command to generate run:
python -m pyserini.search.lucene \
--index bright-biology.splade-v3 \
--topics bright-biology \
--onnx-encoder SpladeV3 \
--output run.bright.splade-v3.biology.txt \
--output-format trec \
--hits 1000 --impact --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-biology \
run.bright.splade-v3.biology.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-biology \
run.bright.splade-v3.biology.txt
python -m pyserini.eval.trec_eval \
-c -m recall.1000 bright-biology \
run.bright.splade-v3.biology.txt
Command to generate run:
python -m pyserini.search.lucene --dense --flat \
--index bright-biology.bge-large-en-v1.5.flat \
--topics bright-biology \
--onnx-encoder BgeLargeEn15 \
--output run.bright.bge-large-en-v1.5.flat.biology.txt \
--output-format trec \
--hits 1000 --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-biology \
run.bright.bge-large-en-v1.5.flat.biology.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-biology \
run.bright.bge-large-en-v1.5.flat.biology.txt
python -m pyserini.eval.trec_eval \
-c -m recall.1000 bright-biology \
run.bright.bge-large-en-v1.5.flat.biology.txt
Command to generate run:
python -m pyserini.search.faiss \
--encoder AQ-MedAI/Diver-Retriever-4B \
--encoder-class qwen3 \
--index bright-biology.diver-retriever-4b \
--query-prefix $'Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery:' \
--topics bright-biology-original \
--output run.bright.diver-retriever-4b.biology.txt \
--hits 1000 --remove-query \
--topics-format raw_jsonl --explicit-truncate --fp16 --l2-norm --max-length 16384 --device cuda:0
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-biology \
run.bright.diver-retriever-4b.biology.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-biology \
run.bright.diver-retriever-4b.biology.txt
Command to generate run:
python -m pyserini.search.faiss \
--encoder hanhainebula/reason-embed-qwen3-4b-0928 \
--encoder-class qwen3 \
--index bright-biology.reason-embed-qwen3-4b-0928 \
--query-prefix $'Instruct: Given a Biology post, retrieve relevant passages that help answer the post.\nQuery: ' \
--topics bright-biology-original \
--output run.bright.reason-embed-qwen3-4b-0928.biology.txt \
--hits 1000 --remove-query \
--topics-format raw_jsonl --fp16 --l2-norm --max-length 8192 --device cuda:0
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-biology \
run.bright.reason-embed-qwen3-4b-0928.biology.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-biology \
run.bright.reason-embed-qwen3-4b-0928.biology.txt
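The `--query-prefix` values in the Faiss commands use Bash ANSI-C quoting (`$'...'`), so the `\n` becomes a real newline between the instruction and the `Query:` marker; with plain single quotes it would stay a literal backslash and `n`. A minimal sketch of building the same strings in Python follows. The helper name `instruct_prefix` is an assumption for illustration, and the task texts are copied from the biology commands above.

```python
def instruct_prefix(task, trailing_space=True):
    """Build an 'Instruct: <task>' + newline + 'Query:' prefix.

    Hypothetical helper: it only reproduces the string that Bash passes
    to --query-prefix when the argument is written as $'...\\n...'.
    """
    marker = "Query: " if trailing_space else "Query:"
    return f"Instruct: {task}\n{marker}"


# Reason-Embed prefix for biology (ends with "Query: ", including the space).
biology_prefix = instruct_prefix(
    "Given a Biology post, retrieve relevant passages that help answer the post."
)

# Diver prefix (ends with "Query:" and no trailing space, as in the command above).
diver_prefix = instruct_prefix(
    "Given a web search query, retrieve relevant passages that answer the query",
    trailing_space=False,
)

# When launching via subprocess with an argument list, no shell quoting is
# needed: the string, newline included, is passed through unchanged.
args = ["--query-prefix", biology_prefix]

if __name__ == "__main__":
    print(repr(biology_prefix))
    print(repr(diver_prefix))
```

Printing with `repr` makes the embedded newline and the trailing space visible, which is an easy way to check that a prefix matches the one in the command.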
| earth-science | 0.279 | 0.596 | 0.279 | 0.601 | 0.267 | 0.578 | 0.255 | 0.540 | 0.464 | 0.791 | 0.537 | 0.854 |
Command to generate run:
python -m pyserini.search.lucene \
--index bright-earth-science \
--topics bright-earth-science \
--output run.bright.bm25.earth-science.txt \
--output-format trec \
--hits 1000 --bm25 --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-earth-science \
run.bright.bm25.earth-science.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-earth-science \
run.bright.bm25.earth-science.txt
python -m pyserini.eval.trec_eval \
-c -m recall.1000 bright-earth-science \
run.bright.bm25.earth-science.txt
Command to generate run:
python -m pyserini.search.lucene \
--index bright-earth-science \
--topics bright-earth-science \
--output run.bright.bm25qs.earth-science.txt \
--output-format trec \
--hits 1000 --bm25qs --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-earth-science \
run.bright.bm25qs.earth-science.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-earth-science \
run.bright.bm25qs.earth-science.txt
python -m pyserini.eval.trec_eval \
-c -m recall.1000 bright-earth-science \
run.bright.bm25qs.earth-science.txt
Command to generate run:
python -m pyserini.search.lucene \
--index bright-earth-science.splade-v3 \
--topics bright-earth-science \
--onnx-encoder SpladeV3 \
--output run.bright.splade-v3.earth-science.txt \
--output-format trec \
--hits 1000 --impact --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-earth-science \
run.bright.splade-v3.earth-science.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-earth-science \
run.bright.splade-v3.earth-science.txt
python -m pyserini.eval.trec_eval \
-c -m recall.1000 bright-earth-science \
run.bright.splade-v3.earth-science.txt
Command to generate run:
python -m pyserini.search.lucene --dense --flat \
--index bright-earth-science.bge-large-en-v1.5.flat \
--topics bright-earth-science \
--onnx-encoder BgeLargeEn15 \
--output run.bright.bge-large-en-v1.5.flat.earth-science.txt \
--output-format trec \
--hits 1000 --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-earth-science \
run.bright.bge-large-en-v1.5.flat.earth-science.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-earth-science \
run.bright.bge-large-en-v1.5.flat.earth-science.txt
python -m pyserini.eval.trec_eval \
-c -m recall.1000 bright-earth-science \
run.bright.bge-large-en-v1.5.flat.earth-science.txt
Command to generate run:
python -m pyserini.search.faiss \
--encoder AQ-MedAI/Diver-Retriever-4B \
--encoder-class qwen3 \
--index bright-earth-science.diver-retriever-4b \
--query-prefix $'Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery:' \
--topics bright-earth-science-original \
--output run.bright.diver-retriever-4b.earth-science.txt \
--hits 1000 --remove-query \
--topics-format raw_jsonl --explicit-truncate --fp16 --l2-norm --max-length 16384 --device cuda:0
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-earth-science \
run.bright.diver-retriever-4b.earth-science.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-earth-science \
run.bright.diver-retriever-4b.earth-science.txt
Command to generate run:
python -m pyserini.search.faiss \
--encoder hanhainebula/reason-embed-qwen3-4b-0928 \
--encoder-class qwen3 \
--index bright-earth-science.reason-embed-qwen3-4b-0928 \
--query-prefix $'Instruct: Given an Earth Science post, retrieve relevant passages that help answer the post.\nQuery: ' \
--topics bright-earth-science-original \
--output run.bright.reason-embed-qwen3-4b-0928.earth-science.txt \
--hits 1000 --remove-query \
--topics-format raw_jsonl --fp16 --l2-norm --max-length 8192 --device cuda:0
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-earth-science \
run.bright.reason-embed-qwen3-4b-0928.earth-science.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-earth-science \
run.bright.reason-embed-qwen3-4b-0928.earth-science.txt
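To collect scores from the evaluation commands, one option is to capture their output and pick out the `trec_eval` summary lines, which have the form `measure all value`. The sketch below assumes that format and ignores any other lines the wrapper may print. The names `parse_trec_eval` and `run_and_parse` are hypothetical helpers, not part of Pyserini.

```python
import subprocess


def parse_trec_eval(output):
    """Extract {measure: value} from trec_eval summary lines ('measure all value')."""
    scores = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[1] == "all":
            try:
                scores[fields[0]] = float(fields[2])
            except ValueError:
                continue  # skip non-numeric summary fields such as runid
    return scores


def run_and_parse(cmd):
    """Run one evaluation command (an argument list) and parse its stdout."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_trec_eval(result.stdout)


if __name__ == "__main__":
    # Illustrative text only, not real output from any run on this page.
    sample = "some log line\nndcg_cut_10 \tall\t0.2500\nrecall_100 \tall\t0.5000\n"
    print(parse_trec_eval(sample))
```

Matching on the literal `all` in the second field keeps the parser tolerant of log or download messages that may precede the scores.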
| economics | 0.165 | 0.408 | 0.152 | 0.377 | 0.160 | 0.448 | 0.166 | 0.487 | 0.222 | 0.573 | 0.339 | 0.716 |
Command to generate run:
python -m pyserini.search.lucene \
--index bright-economics \
--topics bright-economics \
--output run.bright.bm25.economics.txt \
--output-format trec \
--hits 1000 --bm25 --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-economics \
run.bright.bm25.economics.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-economics \
run.bright.bm25.economics.txt
python -m pyserini.eval.trec_eval \
-c -m recall.1000 bright-economics \
run.bright.bm25.economics.txt
Command to generate run:
python -m pyserini.search.lucene \
--index bright-economics \
--topics bright-economics \
--output run.bright.bm25qs.economics.txt \
--output-format trec \
--hits 1000 --bm25qs --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-economics \
run.bright.bm25qs.economics.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-economics \
run.bright.bm25qs.economics.txt
python -m pyserini.eval.trec_eval \
-c -m recall.1000 bright-economics \
run.bright.bm25qs.economics.txt
Command to generate run:
python -m pyserini.search.lucene \
--index bright-economics.splade-v3 \
--topics bright-economics \
--onnx-encoder SpladeV3 \
--output run.bright.splade-v3.economics.txt \
--output-format trec \
--hits 1000 --impact --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-economics \
run.bright.splade-v3.economics.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-economics \
run.bright.splade-v3.economics.txt
python -m pyserini.eval.trec_eval \
-c -m recall.1000 bright-economics \
run.bright.splade-v3.economics.txt
Command to generate run:
python -m pyserini.search.lucene --dense --flat \
--index bright-economics.bge-large-en-v1.5.flat \
--topics bright-economics \
--onnx-encoder BgeLargeEn15 \
--output run.bright.bge-large-en-v1.5.flat.economics.txt \
--output-format trec \
--hits 1000 --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-economics \
run.bright.bge-large-en-v1.5.flat.economics.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-economics \
run.bright.bge-large-en-v1.5.flat.economics.txt
python -m pyserini.eval.trec_eval \
-c -m recall.1000 bright-economics \
run.bright.bge-large-en-v1.5.flat.economics.txt
Command to generate run:
python -m pyserini.search.faiss \
--encoder AQ-MedAI/Diver-Retriever-4B \
--encoder-class qwen3 \
--index bright-economics.diver-retriever-4b \
--query-prefix $'Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery:' \
--topics bright-economics-original \
--output run.bright.diver-retriever-4b.economics.txt \
--hits 1000 --remove-query \
--topics-format raw_jsonl --explicit-truncate --fp16 --l2-norm --max-length 16384 --device cuda:0
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-economics \
run.bright.diver-retriever-4b.economics.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-economics \
run.bright.diver-retriever-4b.economics.txt
Command to generate run:
python -m pyserini.search.faiss \
--encoder hanhainebula/reason-embed-qwen3-4b-0928 \
--encoder-class qwen3 \
--index bright-economics.reason-embed-qwen3-4b-0928 \
--query-prefix $'Instruct: Given an Economics post, retrieve relevant passages that help answer the post.\nQuery: ' \
--topics bright-economics-original \
--output run.bright.reason-embed-qwen3-4b-0928.economics.txt \
--hits 1000 --remove-query \
--topics-format raw_jsonl --fp16 --l2-norm --max-length 8192 --device cuda:0
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-economics \
run.bright.reason-embed-qwen3-4b-0928.economics.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-economics \
run.bright.reason-embed-qwen3-4b-0928.economics.txt
| psychology | 0.134 | 0.377 | 0.127 | 0.410 | 0.153 | 0.469 | 0.180 | 0.465 | 0.337 | 0.747 | 0.459 | 0.816 |
Command to generate run:
python -m pyserini.search.lucene \
--index bright-psychology \
--topics bright-psychology \
--output run.bright.bm25.psychology.txt \
--output-format trec \
--hits 1000 --bm25 --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-psychology \
run.bright.bm25.psychology.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-psychology \
run.bright.bm25.psychology.txt
python -m pyserini.eval.trec_eval \
-c -m recall.1000 bright-psychology \
run.bright.bm25.psychology.txt
Command to generate run:
python -m pyserini.search.lucene \
--index bright-psychology \
--topics bright-psychology \
--output run.bright.bm25qs.psychology.txt \
--output-format trec \
--hits 1000 --bm25qs --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-psychology \
run.bright.bm25qs.psychology.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-psychology \
run.bright.bm25qs.psychology.txt
python -m pyserini.eval.trec_eval \
-c -m recall.1000 bright-psychology \
run.bright.bm25qs.psychology.txt
Command to generate run:
python -m pyserini.search.lucene \
--index bright-psychology.splade-v3 \
--topics bright-psychology \
--onnx-encoder SpladeV3 \
--output run.bright.splade-v3.psychology.txt \
--output-format trec \
--hits 1000 --impact --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-psychology \
run.bright.splade-v3.psychology.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-psychology \
run.bright.splade-v3.psychology.txt
python -m pyserini.eval.trec_eval \
-c -m recall.1000 bright-psychology \
run.bright.splade-v3.psychology.txt
Command to generate run:
python -m pyserini.search.lucene --dense --flat \
--index bright-psychology.bge-large-en-v1.5.flat \
--topics bright-psychology \
--onnx-encoder BgeLargeEn15 \
--output run.bright.bge-large-en-v1.5.flat.psychology.txt \
--output-format trec \
--hits 1000 --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-psychology \
run.bright.bge-large-en-v1.5.flat.psychology.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-psychology \
run.bright.bge-large-en-v1.5.flat.psychology.txt
python -m pyserini.eval.trec_eval \
-c -m recall.1000 bright-psychology \
run.bright.bge-large-en-v1.5.flat.psychology.txt
Command to generate run:
python -m pyserini.search.faiss \
--encoder AQ-MedAI/Diver-Retriever-4B \
--encoder-class qwen3 \
--index bright-psychology.diver-retriever-4b \
--query-prefix $'Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery:' \
--topics bright-psychology-original \
--output run.bright.diver-retriever-4b.psychology.txt \
--hits 1000 --remove-query \
--topics-format raw_jsonl --explicit-truncate --fp16 --l2-norm --max-length 16384 --device cuda:0
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-psychology \
run.bright.diver-retriever-4b.psychology.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-psychology \
run.bright.diver-retriever-4b.psychology.txt
Command to generate run:
python -m pyserini.search.faiss \
--encoder hanhainebula/reason-embed-qwen3-4b-0928 \
--encoder-class qwen3 \
--index bright-psychology.reason-embed-qwen3-4b-0928 \
--query-prefix $'Instruct: Given a Psychology post, retrieve relevant passages that help answer the post.\nQuery: ' \
--topics bright-psychology-original \
--output run.bright.reason-embed-qwen3-4b-0928.psychology.txt \
--hits 1000 --remove-query \
--topics-format raw_jsonl --fp16 --l2-norm --max-length 8192 --device cuda:0
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-psychology \
run.bright.reason-embed-qwen3-4b-0928.psychology.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-psychology \
run.bright.reason-embed-qwen3-4b-0928.psychology.txt
| robotics | 0.109 | 0.371 | 0.139 | 0.467 | 0.158 | 0.447 | 0.123 | 0.344 | 0.209 | 0.515 | 0.345 | 0.706 |
Command to generate run:
python -m pyserini.search.lucene \
--index bright-robotics \
--topics bright-robotics \
--output run.bright.bm25.robotics.txt \
--output-format trec \
--hits 1000 --bm25 --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-robotics \
run.bright.bm25.robotics.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-robotics \
run.bright.bm25.robotics.txt
python -m pyserini.eval.trec_eval \
-c -m recall.1000 bright-robotics \
run.bright.bm25.robotics.txt
Command to generate run:
python -m pyserini.search.lucene \
--index bright-robotics \
--topics bright-robotics \
--output run.bright.bm25qs.robotics.txt \
--output-format trec \
--hits 1000 --bm25qs --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-robotics \
run.bright.bm25qs.robotics.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-robotics \
run.bright.bm25qs.robotics.txt
python -m pyserini.eval.trec_eval \
-c -m recall.1000 bright-robotics \
run.bright.bm25qs.robotics.txt
Command to generate run:
python -m pyserini.search.lucene \
--index bright-robotics.splade-v3 \
--topics bright-robotics \
--onnx-encoder SpladeV3 \
--output run.bright.splade-v3.robotics.txt \
--output-format trec \
--hits 1000 --impact --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-robotics \
run.bright.splade-v3.robotics.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-robotics \
run.bright.splade-v3.robotics.txt
python -m pyserini.eval.trec_eval \
-c -m recall.1000 bright-robotics \
run.bright.splade-v3.robotics.txt
Command to generate run:
python -m pyserini.search.lucene --dense --flat \
--index bright-robotics.bge-large-en-v1.5.flat \
--topics bright-robotics \
--onnx-encoder BgeLargeEn15 \
--output run.bright.bge-large-en-v1.5.flat.robotics.txt \
--output-format trec \
--hits 1000 --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-robotics \
run.bright.bge-large-en-v1.5.flat.robotics.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-robotics \
run.bright.bge-large-en-v1.5.flat.robotics.txt
python -m pyserini.eval.trec_eval \
-c -m recall.1000 bright-robotics \
run.bright.bge-large-en-v1.5.flat.robotics.txt
Command to generate run:
python -m pyserini.search.faiss \
--encoder AQ-MedAI/Diver-Retriever-4B \
--encoder-class qwen3 \
--index bright-robotics.diver-retriever-4b \
--query-prefix $'Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery:' \
--topics bright-robotics-original \
--output run.bright.diver-retriever-4b.robotics.txt \
--hits 1000 --remove-query \
--topics-format raw_jsonl --explicit-truncate --fp16 --l2-norm --max-length 16384 --device cuda:0
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-robotics \
run.bright.diver-retriever-4b.robotics.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-robotics \
run.bright.diver-retriever-4b.robotics.txt
Command to generate run:
python -m pyserini.search.faiss \
--encoder hanhainebula/reason-embed-qwen3-4b-0928 \
--encoder-class qwen3 \
--index bright-robotics.reason-embed-qwen3-4b-0928 \
--query-prefix $'Instruct: Given a Robotics post, retrieve relevant passages that help answer the post.\nQuery: ' \
--topics bright-robotics-original \
--output run.bright.reason-embed-qwen3-4b-0928.robotics.txt \
--hits 1000 --remove-query \
--topics-format raw_jsonl --fp16 --l2-norm --max-length 8192 --device cuda:0
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-robotics \
run.bright.reason-embed-qwen3-4b-0928.robotics.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-robotics \
run.bright.reason-embed-qwen3-4b-0928.robotics.txt
| stackoverflow | 0.163 | 0.409 | 0.185 | 0.463 | 0.129 | 0.432 | 0.110 | 0.532 | 0.211 | 0.672 | 0.358 | 0.808 |
Command to generate run:
python -m pyserini.search.lucene \
--index bright-stackoverflow \
--topics bright-stackoverflow \
--output run.bright.bm25.stackoverflow.txt \
--output-format trec \
--hits 1000 --bm25 --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-stackoverflow \
run.bright.bm25.stackoverflow.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-stackoverflow \
run.bright.bm25.stackoverflow.txt
python -m pyserini.eval.trec_eval \
-c -m recall.1000 bright-stackoverflow \
run.bright.bm25.stackoverflow.txt
Command to generate run:
python -m pyserini.search.lucene \
--index bright-stackoverflow \
--topics bright-stackoverflow \
--output run.bright.bm25qs.stackoverflow.txt \
--output-format trec \
--hits 1000 --bm25qs --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-stackoverflow \
run.bright.bm25qs.stackoverflow.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-stackoverflow \
run.bright.bm25qs.stackoverflow.txt
python -m pyserini.eval.trec_eval \
-c -m recall.1000 bright-stackoverflow \
run.bright.bm25qs.stackoverflow.txt
Command to generate run:
python -m pyserini.search.lucene \
--index bright-stackoverflow.splade-v3 \
--topics bright-stackoverflow \
--onnx-encoder SpladeV3 \
--output run.bright.splade-v3.stackoverflow.txt \
--output-format trec \
--hits 1000 --impact --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-stackoverflow \
run.bright.splade-v3.stackoverflow.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-stackoverflow \
run.bright.splade-v3.stackoverflow.txt
python -m pyserini.eval.trec_eval \
-c -m recall.1000 bright-stackoverflow \
run.bright.splade-v3.stackoverflow.txt
Command to generate run:
python -m pyserini.search.lucene --dense --flat \
--index bright-stackoverflow.bge-large-en-v1.5.flat \
--topics bright-stackoverflow \
--onnx-encoder BgeLargeEn15 \
--output run.bright.bge-large-en-v1.5.flat.stackoverflow.txt \
--output-format trec \
--hits 1000 --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-stackoverflow \
run.bright.bge-large-en-v1.5.flat.stackoverflow.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-stackoverflow \
run.bright.bge-large-en-v1.5.flat.stackoverflow.txt
python -m pyserini.eval.trec_eval \
-c -m recall.1000 bright-stackoverflow \
run.bright.bge-large-en-v1.5.flat.stackoverflow.txt
Command to generate run:
python -m pyserini.search.faiss \
--encoder AQ-MedAI/Diver-Retriever-4B \
--encoder-class qwen3 \
--index bright-stackoverflow.diver-retriever-4b \
--query-prefix $'Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery:' \
--topics bright-stackoverflow-original \
--output run.bright.diver-retriever-4b.stackoverflow.txt \
--hits 1000 --remove-query \
--topics-format raw_jsonl --explicit-truncate --fp16 --l2-norm --max-length 16384 --device cuda:0
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-stackoverflow \
run.bright.diver-retriever-4b.stackoverflow.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-stackoverflow \
run.bright.diver-retriever-4b.stackoverflow.txt
Command to generate run:
python -m pyserini.search.faiss \
--encoder hanhainebula/reason-embed-qwen3-4b-0928 \
--encoder-class qwen3 \
--index bright-stackoverflow.reason-embed-qwen3-4b-0928 \
--query-prefix $'Instruct: Given a Stack Overflow post, retrieve relevant passages that help answer the post.\nQuery: ' \
--topics bright-stackoverflow-original \
--output run.bright.reason-embed-qwen3-4b-0928.stackoverflow.txt \
--hits 1000 --remove-query \
--topics-format raw_jsonl --fp16 --l2-norm --max-length 8192 --device cuda:0
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-stackoverflow \
run.bright.reason-embed-qwen3-4b-0928.stackoverflow.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-stackoverflow \
run.bright.reason-embed-qwen3-4b-0928.stackoverflow.txt
| sustainable-living | 0.161 | 0.420 | 0.151 | 0.465 | 0.150 | 0.492 | 0.144 | 0.516 | 0.243 | 0.682 | 0.372 | 0.804 |
Command to generate run:
python -m pyserini.search.lucene \
--index bright-sustainable-living \
--topics bright-sustainable-living \
--output run.bright.bm25.sustainable-living.txt \
--output-format trec \
--hits 1000 --bm25 --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-sustainable-living \
run.bright.bm25.sustainable-living.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-sustainable-living \
run.bright.bm25.sustainable-living.txt
python -m pyserini.eval.trec_eval \
-c -m recall.1000 bright-sustainable-living \
run.bright.bm25.sustainable-living.txt
Command to generate run:
python -m pyserini.search.lucene \
--index bright-sustainable-living \
--topics bright-sustainable-living \
--output run.bright.bm25qs.sustainable-living.txt \
--output-format trec \
--hits 1000 --bm25qs --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-sustainable-living \
run.bright.bm25qs.sustainable-living.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-sustainable-living \
run.bright.bm25qs.sustainable-living.txt
python -m pyserini.eval.trec_eval \
-c -m recall.1000 bright-sustainable-living \
run.bright.bm25qs.sustainable-living.txt
Command to generate run:
python -m pyserini.search.lucene \
--index bright-sustainable-living.splade-v3 \
--topics bright-sustainable-living \
--onnx-encoder SpladeV3 \
--output run.bright.splade-v3.sustainable-living.txt \
--output-format trec \
--hits 1000 --impact --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-sustainable-living \
run.bright.splade-v3.sustainable-living.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-sustainable-living \
run.bright.splade-v3.sustainable-living.txt
python -m pyserini.eval.trec_eval \
-c -m recall.1000 bright-sustainable-living \
run.bright.splade-v3.sustainable-living.txt
Command to generate run:
python -m pyserini.search.lucene --dense --flat \
--index bright-sustainable-living.bge-large-en-v1.5.flat \
--topics bright-sustainable-living \
--onnx-encoder BgeLargeEn15 \
--output run.bright.bge-large-en-v1.5.flat.sustainable-living.txt \
--output-format trec \
--hits 1000 --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-sustainable-living \
run.bright.bge-large-en-v1.5.flat.sustainable-living.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-sustainable-living \
run.bright.bge-large-en-v1.5.flat.sustainable-living.txt
python -m pyserini.eval.trec_eval \
-c -m recall.1000 bright-sustainable-living \
run.bright.bge-large-en-v1.5.flat.sustainable-living.txt
Command to generate run:
python -m pyserini.search.faiss \
--encoder AQ-MedAI/Diver-Retriever-4B \
--encoder-class qwen3 \
--index bright-sustainable-living.diver-retriever-4b \
--query-prefix $'Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery:' \
--topics bright-sustainable-living-original \
--output run.bright.diver-retriever-4b.sustainable-living.txt \
--hits 1000 --remove-query \
--topics-format raw_jsonl --explicit-truncate --fp16 --l2-norm --max-length 16384 --device cuda:0
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-sustainable-living \
run.bright.diver-retriever-4b.sustainable-living.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-sustainable-living \
run.bright.diver-retriever-4b.sustainable-living.txt
Command to generate run:
python -m pyserini.search.faiss \
--encoder hanhainebula/reason-embed-qwen3-4b-0928 \
--encoder-class qwen3 \
--index bright-sustainable-living.reason-embed-qwen3-4b-0928 \
--query-prefix $'Instruct: Given a Sustainable Living post, retrieve relevant passages that help answer the post.\nQuery: ' \
--topics bright-sustainable-living-original \
--output run.bright.reason-embed-qwen3-4b-0928.sustainable-living.txt \
--hits 1000 --remove-query \
--topics-format raw_jsonl --fp16 --l2-norm --max-length 8192 --device cuda:0
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-sustainable-living \
run.bright.reason-embed-qwen3-4b-0928.sustainable-living.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-sustainable-living \
run.bright.reason-embed-qwen3-4b-0928.sustainable-living.txt
| pony | 0.043 | 0.172 | 0.079 | 0.282 | 0.144 | 0.245 | 0.034 | 0.180 | 0.128 | 0.248 | 0.118 | 0.268 |
Command to generate run:
python -m pyserini.search.lucene \
--index bright-pony \
--topics bright-pony \
--output run.bright.bm25.pony.txt \
--output-format trec \
--hits 1000 --bm25 --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-pony \
run.bright.bm25.pony.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-pony \
run.bright.bm25.pony.txt
python -m pyserini.eval.trec_eval \
-c -m recall.1000 bright-pony \
run.bright.bm25.pony.txt
Command to generate run:
python -m pyserini.search.lucene \
--index bright-pony \
--topics bright-pony \
--output run.bright.bm25qs.pony.txt \
--output-format trec \
--hits 1000 --bm25qs --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-pony \
run.bright.bm25qs.pony.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-pony \
run.bright.bm25qs.pony.txt
python -m pyserini.eval.trec_eval \
-c -m recall.1000 bright-pony \
run.bright.bm25qs.pony.txt
Command to generate run:
python -m pyserini.search.lucene \
--index bright-pony.splade-v3 \
--topics bright-pony \
--onnx-encoder SpladeV3 \
--output run.bright.splade-v3.pony.txt \
--output-format trec \
--hits 1000 --impact --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-pony \
run.bright.splade-v3.pony.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-pony \
run.bright.splade-v3.pony.txt
python -m pyserini.eval.trec_eval \
-c -m recall.1000 bright-pony \
run.bright.splade-v3.pony.txt
Command to generate run:
python -m pyserini.search.lucene --dense --flat \
--index bright-pony.bge-large-en-v1.5.flat \
--topics bright-pony \
--onnx-encoder BgeLargeEn15 \
--output run.bright.bge-large-en-v1.5.flat.pony.txt \
--output-format trec \
--hits 1000 --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-pony \
run.bright.bge-large-en-v1.5.flat.pony.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-pony \
run.bright.bge-large-en-v1.5.flat.pony.txt
python -m pyserini.eval.trec_eval \
-c -m recall.1000 bright-pony \
run.bright.bge-large-en-v1.5.flat.pony.txt
Command to generate run:
python -m pyserini.search.faiss \
--encoder AQ-MedAI/Diver-Retriever-4B \
--encoder-class qwen3 \
--index bright-pony.diver-retriever-4b \
--query-prefix $'Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery:' \
--topics bright-pony-original \
--output run.bright.diver-retriever-4b.pony.txt \
--hits 1000 --remove-query \
--topics-format raw_jsonl --explicit-truncate --fp16 --l2-norm --max-length 16384 --device cuda:0
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-pony \
run.bright.diver-retriever-4b.pony.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-pony \
run.bright.diver-retriever-4b.pony.txt
Command to generate run:
python -m pyserini.search.faiss \
--encoder hanhainebula/reason-embed-qwen3-4b-0928 \
--encoder-class qwen3 \
--index bright-pony.reason-embed-qwen3-4b-0928 \
--query-prefix $'Instruct: Given a Pony question, retrieve relevant passages that help answer the question.\nQuery: ' \
--topics bright-pony-original \
--output run.bright.reason-embed-qwen3-4b-0928.pony.txt \
--hits 1000 --remove-query \
--topics-format raw_jsonl --fp16 --l2-norm --max-length 8192 --device cuda:0
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-pony \
run.bright.reason-embed-qwen3-4b-0928.pony.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-pony \
run.bright.reason-embed-qwen3-4b-0928.pony.txt
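The $'...' quoting used for --query-prefix in the Faiss commands is a Bash feature that turns \n into a real newline. When the same run is launched from Python without a shell, the newline has to be written into the string directly. A minimal sketch that reuses only the flags of the Reason-Embed command for pony; the function name is hypothetical, and the command is built but not executed.

```python
def reason_embed_pony_command(device="cuda:0"):
    """Build the Reason-Embed run for the pony task as an argv list."""
    # "\n" is a real newline here, matching what Bash produces for $'...\n...'.
    prefix = ("Instruct: Given a Pony question, retrieve relevant passages "
              "that help answer the question.\nQuery: ")
    return [
        "python", "-m", "pyserini.search.faiss",
        "--encoder", "hanhainebula/reason-embed-qwen3-4b-0928",
        "--encoder-class", "qwen3",
        "--index", "bright-pony.reason-embed-qwen3-4b-0928",
        "--query-prefix", prefix,
        "--topics", "bright-pony-original",
        "--output", "run.bright.reason-embed-qwen3-4b-0928.pony.txt",
        "--hits", "1000", "--remove-query",
        "--topics-format", "raw_jsonl", "--fp16", "--l2-norm",
        "--max-length", "8192", "--device", device,
    ]

cmd = reason_embed_pony_command()
prefix = cmd[cmd.index("--query-prefix") + 1]
print(repr(prefix))
```

Passing the list to subprocess.run(cmd, check=True) avoids shell quoting altogether, since each element reaches the program as one argument.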
| leetcode | 0.247 | 0.508 | 0.250 | 0.538 | 0.260 | 0.502 | 0.267 | 0.529 | 0.371 | 0.702 | 0.371 | 0.745 |
Command to generate run:
python -m pyserini.search.lucene \
--index bright-leetcode \
--topics bright-leetcode \
--output run.bright.bm25.leetcode.txt \
--output-format trec \
--hits 1000 --bm25 --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-leetcode \
run.bright.bm25.leetcode.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-leetcode \
run.bright.bm25.leetcode.txt
python -m pyserini.eval.trec_eval \
-c -m recall.1000 bright-leetcode \
run.bright.bm25.leetcode.txt
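The BM25 commands are identical across tasks apart from the task name, so they can be generated from a template. A minimal sketch, assuming only the flags shown in the commands on this page; the function names are hypothetical and nothing is executed.

```python
def bm25_run_command(task):
    """Return the BM25 search command for one BRIGHT task as an argv list."""
    return [
        "python", "-m", "pyserini.search.lucene",
        "--index", f"bright-{task}",
        "--topics", f"bright-{task}",
        "--output", f"run.bright.bm25.{task}.txt",
        "--output-format", "trec",
        "--hits", "1000", "--bm25", "--remove-query",
    ]

def eval_commands(task, run_file,
                  metrics=("ndcg_cut.10", "recall.100", "recall.1000")):
    """Return one trec_eval command (argv list) per metric."""
    return [
        ["python", "-m", "pyserini.eval.trec_eval",
         "-c", "-m", metric, f"bright-{task}", run_file]
        for metric in metrics
    ]

for task in ("leetcode", "aops"):
    run_cmd = bm25_run_command(task)
    run_file = run_cmd[run_cmd.index("--output") + 1]
    print(" ".join(run_cmd))
    for cmd in eval_commands(task, run_file):
        print(" ".join(cmd))
```

Each list can be passed to subprocess.run as is once Pyserini is installed.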
Command to generate run:
python -m pyserini.search.lucene \
--index bright-leetcode \
--topics bright-leetcode \
--output run.bright.bm25qs.leetcode.txt \
--output-format trec \
--hits 1000 --bm25qs --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-leetcode \
run.bright.bm25qs.leetcode.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-leetcode \
run.bright.bm25qs.leetcode.txt
python -m pyserini.eval.trec_eval \
-c -m recall.1000 bright-leetcode \
run.bright.bm25qs.leetcode.txt
Command to generate run:
python -m pyserini.search.lucene \
--index bright-leetcode.splade-v3 \
--topics bright-leetcode \
--onnx-encoder SpladeV3 \
--output run.bright.splade-v3.leetcode.txt \
--output-format trec \
--hits 1000 --impact --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-leetcode \
run.bright.splade-v3.leetcode.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-leetcode \
run.bright.splade-v3.leetcode.txt
python -m pyserini.eval.trec_eval \
-c -m recall.1000 bright-leetcode \
run.bright.splade-v3.leetcode.txt
Command to generate run:
python -m pyserini.search.lucene --dense --flat \
--index bright-leetcode.bge-large-en-v1.5.flat \
--topics bright-leetcode \
--onnx-encoder BgeLargeEn15 \
--output run.bright.bge-large-en-v1.5.flat.leetcode.txt \
--output-format trec \
--hits 1000 --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-leetcode \
run.bright.bge-large-en-v1.5.flat.leetcode.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-leetcode \
run.bright.bge-large-en-v1.5.flat.leetcode.txt
python -m pyserini.eval.trec_eval \
-c -m recall.1000 bright-leetcode \
run.bright.bge-large-en-v1.5.flat.leetcode.txt
Command to generate run:
python -m pyserini.search.faiss \
--encoder AQ-MedAI/Diver-Retriever-4B \
--encoder-class qwen3 \
--index bright-leetcode.diver-retriever-4b \
--query-prefix $'Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery:' \
--topics bright-leetcode-original \
--output run.bright.diver-retriever-4b.leetcode.txt \
--hits 1000 --remove-query \
--topics-format raw_jsonl --explicit-truncate --fp16 --l2-norm --max-length 16384 --device cuda:0
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-leetcode \
run.bright.diver-retriever-4b.leetcode.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-leetcode \
run.bright.diver-retriever-4b.leetcode.txt
Command to generate run:
python -m pyserini.search.faiss \
--encoder hanhainebula/reason-embed-qwen3-4b-0928 \
--encoder-class qwen3 \
--index bright-leetcode.reason-embed-qwen3-4b-0928 \
--query-prefix $'Instruct: Given a Coding problem, retrieve relevant examples that help answer the problem.\nQuery: ' \
--topics bright-leetcode-original \
--output run.bright.reason-embed-qwen3-4b-0928.leetcode.txt \
--hits 1000 --remove-query \
--topics-format raw_jsonl --fp16 --l2-norm --max-length 8192 --device cuda:0
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-leetcode \
run.bright.reason-embed-qwen3-4b-0928.leetcode.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-leetcode \
run.bright.reason-embed-qwen3-4b-0928.leetcode.txt
| aops | 0.065 | 0.197 | 0.063 | 0.217 | 0.069 | 0.262 | 0.064 | 0.240 | 0.108 | 0.344 | 0.115 | 0.373 |
Command to generate run:
python -m pyserini.search.lucene \
--index bright-aops \
--topics bright-aops \
--output run.bright.bm25.aops.txt \
--output-format trec \
--hits 1000 --bm25 --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-aops \
run.bright.bm25.aops.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-aops \
run.bright.bm25.aops.txt
python -m pyserini.eval.trec_eval \
-c -m recall.1000 bright-aops \
run.bright.bm25.aops.txt
Command to generate run:
python -m pyserini.search.lucene \
--index bright-aops \
--topics bright-aops \
--output run.bright.bm25qs.aops.txt \
--output-format trec \
--hits 1000 --bm25qs --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-aops \
run.bright.bm25qs.aops.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-aops \
run.bright.bm25qs.aops.txt
python -m pyserini.eval.trec_eval \
-c -m recall.1000 bright-aops \
run.bright.bm25qs.aops.txt
Command to generate run:
python -m pyserini.search.lucene \
--index bright-aops.splade-v3 \
--topics bright-aops \
--onnx-encoder SpladeV3 \
--output run.bright.splade-v3.aops.txt \
--output-format trec \
--hits 1000 --impact --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-aops \
run.bright.splade-v3.aops.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-aops \
run.bright.splade-v3.aops.txt
python -m pyserini.eval.trec_eval \
-c -m recall.1000 bright-aops \
run.bright.splade-v3.aops.txt
Command to generate run:
python -m pyserini.search.lucene --dense --flat \
--index bright-aops.bge-large-en-v1.5.flat \
--topics bright-aops \
--onnx-encoder BgeLargeEn15 \
--output run.bright.bge-large-en-v1.5.flat.aops.txt \
--output-format trec \
--hits 1000 --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-aops \
run.bright.bge-large-en-v1.5.flat.aops.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-aops \
run.bright.bge-large-en-v1.5.flat.aops.txt
python -m pyserini.eval.trec_eval \
-c -m recall.1000 bright-aops \
run.bright.bge-large-en-v1.5.flat.aops.txt
Command to generate run:
python -m pyserini.search.faiss \
--encoder AQ-MedAI/Diver-Retriever-4B \
--encoder-class qwen3 \
--index bright-aops.diver-retriever-4b \
--query-prefix $'Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery:' \
--topics bright-aops-original \
--output run.bright.diver-retriever-4b.aops.txt \
--hits 1000 --remove-query \
--topics-format raw_jsonl --explicit-truncate --fp16 --l2-norm --max-length 16384 --device cuda:0
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-aops \
run.bright.diver-retriever-4b.aops.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-aops \
run.bright.diver-retriever-4b.aops.txt
Command to generate run:
python -m pyserini.search.faiss \
--encoder hanhainebula/reason-embed-qwen3-4b-0928 \
--encoder-class qwen3 \
--index bright-aops.reason-embed-qwen3-4b-0928 \
--query-prefix $'Instruct: Given a Math problem, retrieve relevant examples that help answer the problem.\nQuery: ' \
--topics bright-aops-original \
--output run.bright.reason-embed-qwen3-4b-0928.aops.txt \
--hits 1000 --remove-query \
--topics-format raw_jsonl --fp16 --l2-norm --max-length 8192 --device cuda:0
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-aops \
run.bright.reason-embed-qwen3-4b-0928.aops.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-aops \
run.bright.reason-embed-qwen3-4b-0928.aops.txt
| theoremqa-theorems | 0.021 | 0.134 | 0.049 | 0.184 | 0.055 | 0.262 | 0.053 | 0.259 | 0.374 | 0.795 | 0.452 | 0.917 |
Command to generate run:
python -m pyserini.search.lucene \
--index bright-theoremqa-theorems \
--topics bright-theoremqa-theorems \
--output run.bright.bm25.theoremqa-theorems.txt \
--output-format trec \
--hits 1000 --bm25 --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-theoremqa-theorems \
run.bright.bm25.theoremqa-theorems.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-theoremqa-theorems \
run.bright.bm25.theoremqa-theorems.txt
python -m pyserini.eval.trec_eval \
-c -m recall.1000 bright-theoremqa-theorems \
run.bright.bm25.theoremqa-theorems.txt
Command to generate run:
python -m pyserini.search.lucene \
--index bright-theoremqa-theorems \
--topics bright-theoremqa-theorems \
--output run.bright.bm25qs.theoremqa-theorems.txt \
--output-format trec \
--hits 1000 --bm25qs --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-theoremqa-theorems \
run.bright.bm25qs.theoremqa-theorems.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-theoremqa-theorems \
run.bright.bm25qs.theoremqa-theorems.txt
python -m pyserini.eval.trec_eval \
-c -m recall.1000 bright-theoremqa-theorems \
run.bright.bm25qs.theoremqa-theorems.txt
Command to generate run:
python -m pyserini.search.lucene \
--index bright-theoremqa-theorems.splade-v3 \
--topics bright-theoremqa-theorems \
--onnx-encoder SpladeV3 \
--output run.bright.splade-v3.theoremqa-theorems.txt \
--output-format trec \
--hits 1000 --impact --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-theoremqa-theorems \
run.bright.splade-v3.theoremqa-theorems.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-theoremqa-theorems \
run.bright.splade-v3.theoremqa-theorems.txt
python -m pyserini.eval.trec_eval \
-c -m recall.1000 bright-theoremqa-theorems \
run.bright.splade-v3.theoremqa-theorems.txt
Command to generate run:
python -m pyserini.search.lucene --dense --flat \
--index bright-theoremqa-theorems.bge-large-en-v1.5.flat \
--topics bright-theoremqa-theorems \
--onnx-encoder BgeLargeEn15 \
--output run.bright.bge-large-en-v1.5.flat.theoremqa-theorems.txt \
--output-format trec \
--hits 1000 --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-theoremqa-theorems \
run.bright.bge-large-en-v1.5.flat.theoremqa-theorems.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-theoremqa-theorems \
run.bright.bge-large-en-v1.5.flat.theoremqa-theorems.txt
python -m pyserini.eval.trec_eval \
-c -m recall.1000 bright-theoremqa-theorems \
run.bright.bge-large-en-v1.5.flat.theoremqa-theorems.txt
Command to generate run:
python -m pyserini.search.faiss \
--encoder AQ-MedAI/Diver-Retriever-4B \
--encoder-class qwen3 \
--index bright-theoremqa-theorems.diver-retriever-4b \
--query-prefix $'Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery:' \
--topics bright-theoremqa-theorems-original \
--output run.bright.diver-retriever-4b.theoremqa-theorems.txt \
--hits 1000 --remove-query \
--topics-format raw_jsonl --explicit-truncate --fp16 --l2-norm --max-length 16384 --device cuda:0
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-theoremqa-theorems \
run.bright.diver-retriever-4b.theoremqa-theorems.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-theoremqa-theorems \
run.bright.diver-retriever-4b.theoremqa-theorems.txt
Command to generate run:
python -m pyserini.search.faiss \
--encoder hanhainebula/reason-embed-qwen3-4b-0928 \
--encoder-class qwen3 \
--index bright-theoremqa-theorems.reason-embed-qwen3-4b-0928 \
--query-prefix $'Instruct: Given a Math problem, retrieve relevant theorems that help answer the problem.\nQuery: ' \
--topics bright-theoremqa-theorems-original \
--output run.bright.reason-embed-qwen3-4b-0928.theoremqa-theorems.txt \
--hits 1000 --remove-query \
--topics-format raw_jsonl --fp16 --l2-norm --max-length 8192 --device cuda:0
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-theoremqa-theorems \
run.bright.reason-embed-qwen3-4b-0928.theoremqa-theorems.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-theoremqa-theorems \
run.bright.reason-embed-qwen3-4b-0928.theoremqa-theorems.txt
| theoremqa-questions | 0.073 | 0.159 | 0.104 | 0.212 | 0.111 | 0.266 | 0.141 | 0.282 | 0.381 | 0.585 | 0.409 | 0.637 |
Command to generate run:
python -m pyserini.search.lucene \
--index bright-theoremqa-questions \
--topics bright-theoremqa-questions \
--output run.bright.bm25.theoremqa-questions.txt \
--output-format trec \
--hits 1000 --bm25 --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-theoremqa-questions \
run.bright.bm25.theoremqa-questions.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-theoremqa-questions \
run.bright.bm25.theoremqa-questions.txt
python -m pyserini.eval.trec_eval \
-c -m recall.1000 bright-theoremqa-questions \
run.bright.bm25.theoremqa-questions.txt
Command to generate run:
python -m pyserini.search.lucene \
--index bright-theoremqa-questions \
--topics bright-theoremqa-questions \
--output run.bright.bm25qs.theoremqa-questions.txt \
--output-format trec \
--hits 1000 --bm25qs --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-theoremqa-questions \
run.bright.bm25qs.theoremqa-questions.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-theoremqa-questions \
run.bright.bm25qs.theoremqa-questions.txt
python -m pyserini.eval.trec_eval \
-c -m recall.1000 bright-theoremqa-questions \
run.bright.bm25qs.theoremqa-questions.txt
Command to generate run:
python -m pyserini.search.lucene \
--index bright-theoremqa-questions.splade-v3 \
--topics bright-theoremqa-questions \
--onnx-encoder SpladeV3 \
--output run.bright.splade-v3.theoremqa-questions.txt \
--output-format trec \
--hits 1000 --impact --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-theoremqa-questions \
run.bright.splade-v3.theoremqa-questions.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-theoremqa-questions \
run.bright.splade-v3.theoremqa-questions.txt
python -m pyserini.eval.trec_eval \
-c -m recall.1000 bright-theoremqa-questions \
run.bright.splade-v3.theoremqa-questions.txt
Command to generate run:
python -m pyserini.search.lucene --dense --flat \
--index bright-theoremqa-questions.bge-large-en-v1.5.flat \
--topics bright-theoremqa-questions \
--onnx-encoder BgeLargeEn15 \
--output run.bright.bge-large-en-v1.5.flat.theoremqa-questions.txt \
--output-format trec \
--hits 1000 --remove-query
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-theoremqa-questions \
run.bright.bge-large-en-v1.5.flat.theoremqa-questions.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-theoremqa-questions \
run.bright.bge-large-en-v1.5.flat.theoremqa-questions.txt
python -m pyserini.eval.trec_eval \
-c -m recall.1000 bright-theoremqa-questions \
run.bright.bge-large-en-v1.5.flat.theoremqa-questions.txt
Command to generate run:
python -m pyserini.search.faiss \
--encoder AQ-MedAI/Diver-Retriever-4B \
--encoder-class qwen3 \
--index bright-theoremqa-questions.diver-retriever-4b \
--query-prefix $'Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery:' \
--topics bright-theoremqa-questions-original \
--output run.bright.diver-retriever-4b.theoremqa-questions.txt \
--hits 1000 --remove-query \
--topics-format raw_jsonl --explicit-truncate --fp16 --l2-norm --max-length 16384 --device cuda:0
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-theoremqa-questions \
run.bright.diver-retriever-4b.theoremqa-questions.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-theoremqa-questions \
run.bright.diver-retriever-4b.theoremqa-questions.txt
Command to generate run:
python -m pyserini.search.faiss \
--encoder hanhainebula/reason-embed-qwen3-4b-0928 \
--encoder-class qwen3 \
--index bright-theoremqa-questions.reason-embed-qwen3-4b-0928 \
--query-prefix $'Instruct: Given a Math problem, retrieve relevant examples that help answer the problem.\nQuery: ' \
--topics bright-theoremqa-questions-original \
--output run.bright.reason-embed-qwen3-4b-0928.theoremqa-questions.txt \
--hits 1000 --remove-query \
--topics-format raw_jsonl --fp16 --l2-norm --max-length 8192 --device cuda:0
Evaluation commands:
python -m pyserini.eval.trec_eval \
-c -m ndcg_cut.10 bright-theoremqa-questions \
run.bright.reason-embed-qwen3-4b-0928.theoremqa-questions.txt
python -m pyserini.eval.trec_eval \
-c -m recall.100 bright-theoremqa-questions \
run.bright.reason-embed-qwen3-4b-0928.theoremqa-questions.txt
References
[1] Yijun Ge, Sahel Sharifymoghaddam, and Jimmy Lin.
Lighting the Way for BRIGHT: Reproducible Baselines with Anserini, Pyserini, and RankLLM.
arXiv:2509.02558, September 2025.
[2] Meixiu Long, Duolin Sun, Dan Yang, Junjie Wang, Yue Shen, Jian Wang, Peng Wei, Jinjie Gu, and Jiahai Wang.
DIVER: A Multi-Stage Approach for Reasoning-intensive Information Retrieval.
arXiv:2508.07995, August 2025.
[3] Jianlyu Chen, Junwei Lan, Chaofan Li, Defu Lian, and Zheng Liu.
ReasonEmbed: Enhanced Text Embeddings for Reasoning-Intensive Document Retrieval.
arXiv:2510.08252, October 2025.
Programmatic Execution
All experimental runs shown in the table above can be executed programmatically by following the instructions below.
To list all the experimental conditions:
python -m pyserini.2cr.bright --list-conditions
These conditions correspond to the models listed under Main Results.
To show the commands for all conditions without executing them (a "dry run"):
python -m pyserini.2cr.bright --all --display-commands --dry-run
To actually run all the experimental conditions:
python -m pyserini.2cr.bright --all --display-commands
With the above command, run files will be placed in the current directory.
Use the option --directory runs/ to place the runs in a sub-directory.
To show the commands for a specific condition:
python -m pyserini.2cr.bright --condition bm25qs --display-commands --dry-run
This will generate exactly the commands shown above for that condition (corresponding to one model in the table).
To actually run a specific condition:
python -m pyserini.2cr.bright --condition bm25qs --display-commands
Again, with the above command, run files will be placed in the current directory.
Use the option --directory runs/ to place the runs in a sub-directory.
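These invocations can also be scripted. The following sketch builds the argument list for pyserini.2cr.bright from the options described in this section (--condition, --all, --display-commands, --dry-run, --directory). The helper name two_click_command is hypothetical, and the command is only executed if the last line is uncommented.

```python
import subprocess
import sys

def two_click_command(condition=None, dry_run=True, directory=None):
    """Build an argv list for pyserini.2cr.bright; condition=None means --all."""
    cmd = [sys.executable, "-m", "pyserini.2cr.bright"]
    cmd += ["--condition", condition] if condition else ["--all"]
    cmd.append("--display-commands")
    if dry_run:
        cmd.append("--dry-run")
    if directory:
        cmd += ["--directory", directory]
    return cmd

# Dry run for one condition, then a real run of everything into runs/.
print(" ".join(two_click_command("bm25qs")[1:]))
cmd = two_click_command(dry_run=False, directory="runs/")
print(" ".join(cmd[1:]))
# subprocess.run(cmd, check=True)  # uncomment to execute (requires Pyserini)
```

Defaulting dry_run to True is a deliberate choice in this sketch, so that a full set of runs is never started by accident.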
Finally, to generate this page:
python -m pyserini.2cr.bright --generate-report --output bright.html
The output file bright.html should be identical to this page.
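One way to check this is a byte-for-byte comparison between the generated report and a saved copy of this page. A minimal sketch using only the Python standard library; the helper name same_contents and the file names are placeholders, and the demonstration compares two temporary files rather than real reports.

```python
import filecmp
import tempfile
from pathlib import Path

def same_contents(path_a, path_b):
    """True if the two files are byte-for-byte identical."""
    # shallow=False forces a content comparison instead of trusting os.stat().
    return filecmp.cmp(path_a, path_b, shallow=False)

with tempfile.TemporaryDirectory() as tmp:
    generated = Path(tmp) / "bright.html"
    saved = Path(tmp) / "saved.html"
    generated.write_text("<html>report</html>")
    saved.write_text("<html>report</html>")
    identical = same_contents(generated, saved)
    saved.write_text("<html>changed</html>")
    different = same_contents(generated, saved)

print(identical, different)
```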