Heading-aware Chunking (+18.4%)
Experiment

Compared fixed-size chunking (800 tokens) against heading-aware chunking (~900 tokens). Evaluated on 500 golden queries.

Queries Tested: 500
Documents: 2,340
Duration: 4m 32s
Date: May 24, 2025
A: Baseline
Strategy: Fixed-size
Chunk Size: 800 tokens
Overlap: 100 tokens
Split Logic: Token count
Section Titles: Not preserved

B: Variant (WINNER)
Strategy: Heading-aware
Chunk Size: ~900 tokens
Overlap: Semantic boundary
Split Logic: H1/H2/H3 headers
Section Titles: Preserved in chunk
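
The two configurations above can be sketched roughly as follows. This is a minimal illustration, not the pipeline used in the experiment: it assumes Markdown-style `#` headings for H1 to H3, approximates token counts with whitespace-separated words, and the names `fixed_size_chunks`, `heading_aware_chunks`, and `Chunk` are hypothetical.

```python
import re
from dataclasses import dataclass

# Assumption: documents use Markdown-style headings ("#", "##", "###").
HEADING_RE = re.compile(r"^(#{1,3})\s+(.*\S)\s*$")


@dataclass
class Chunk:
    titles: tuple[str, ...]  # heading path, e.g. ("Billing", "Refunds")
    text: str                # heading path line followed by the section body


def count_tokens(text: str) -> int:
    # Whitespace word count as a stand-in for a real tokenizer.
    return len(text.split())


def fixed_size_chunks(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Baseline A: split purely by token count with a sliding overlap."""
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break
    return chunks


def heading_aware_chunks(text: str, max_tokens: int = 900) -> list[Chunk]:
    """Variant B: split at H1/H2/H3 and keep the section titles in each chunk."""
    chunks: list[Chunk] = []
    path: list[tuple[int, str]] = []  # (level, title) of the enclosing headings
    body: list[str] = []

    def flush() -> None:
        content = "\n".join(body).strip()
        body.clear()
        if not content:
            return
        titles = tuple(title for _, title in path)
        header = " > ".join(titles) + "\n" if titles else ""
        # Oversized sections are split on paragraph boundaries. The budget
        # ignores the header line, and a single paragraph longer than
        # max_tokens is kept whole.
        piece: list[str] = []
        for para in re.split(r"\n\s*\n", content):
            if piece and count_tokens("\n\n".join(piece + [para])) > max_tokens:
                chunks.append(Chunk(titles, header + "\n\n".join(piece)))
                piece = []
            piece.append(para)
        chunks.append(Chunk(titles, header + "\n\n".join(piece)))

    for line in text.splitlines():
        match = HEADING_RE.match(line)
        if match:
            flush()  # close the section that ran up to this heading
            level = len(match.group(1))
            while path and path[-1][0] >= level:
                path.pop()
            path.append((level, match.group(2)))
        else:
            body.append(line)
    flush()
    return chunks
```

Because the heading path travels with every chunk, a query that names a section (for example "annual plan cancellation") can match the title text even when the body never repeats it, which is one plausible reason for the retrieval gains reported here.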
Metric Comparison (baseline → variant)
Overall RagScore: 64/100 → 76/100 (+12)
Retrieval Recall@5: 71% → 84% (+13pp)
Context Precision: 58% → 69% (+11pp)
Citation Accuracy: 61% → 73% (+12pp)
Faithfulness: 82% → 86% (+4pp)
Freshness Score: 74% → 78% (+4pp)
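
For readers unfamiliar with the retrieval metrics, here is a minimal sketch of how Recall@5 and a percentage-point delta might be computed for one query. The report does not define its metrics, so these are common textbook definitions rather than the tool's own formulas; the function names and chunk IDs are hypothetical.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int = 5) -> float:
    """Fraction of the relevant chunks that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def precision_at_k(retrieved: list[str], relevant: set[str], k: int = 5) -> float:
    """Fraction of the top-k results that are relevant (a simple precision)."""
    top = retrieved[:k]
    if not top:
        return 0.0
    return sum(1 for chunk_id in top if chunk_id in relevant) / len(top)


def delta_pp(baseline: float, variant: float) -> int:
    """Difference between two rates, in whole percentage points."""
    return round((variant - baseline) * 100)


# Hypothetical ranking for one golden query: only "c3" lands in the top 5.
retrieved = ["c1", "c7", "c3", "c9", "c4", "c2"]
relevant = {"c3", "c2"}
query_recall = recall_at_k(retrieved, relevant)        # 1 of 2 relevant found
query_precision = precision_at_k(retrieved, relevant)  # 1 of 5 results relevant

# Aggregate rates taken from the comparison above.
recall_gain = delta_pp(0.71, 0.84)
```

Averaging the per-query values over all 500 golden queries gives an aggregate rate, and comparing the two runs with `delta_pp` yields percentage-point differences of the kind listed in the comparison.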
Sample Query Results (10 of 500), baseline → variant
환불 정책 변경사항? (Refund policy changes?): 34 → 78
Enterprise SLA response time?: 41 → 72
SSO configuration steps?: 52 → 68
Data residency in APAC?: 58 → 81
Annual plan cancellation?: 29 → 71
GDPR deletion process?: 47 → 75
Webhook notification setup?: 55 → 69
Bulk embedding API?: 63 → 82
Role-based access control?: 44 → 73
Audit log retention period?: 50 → 77
Ready to apply?

This will re-chunk 2,340 documents and rebuild the vector index. Estimated time: ~15 min.