Research Build

RAG & LLM Benchmarking - English and Nepali

Built a structured evaluation setup for multilingual retrieval and generation that compares chunking strategies, benchmark tasks, and model behavior on realistic English and Nepali use cases.

Year: 2026

Impact

Made model and retrieval tradeoffs visible before deployment, which helped select practical LLM setups that fit production constraints rather than relying on intuition alone.

Problem

RAG pipelines can look good in demos but fail under real document structures, multilingual prompts, and smaller self-hosted models, especially when latency matters.

Approach

I evaluated recursive, section-aware, paragraph-based, and table-aware chunking methods while testing multiple local models on medical and financial document tasks in both English and Nepali.
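To show how two of these chunking strategies differ, here is a minimal sketch. It is not the implementation used in this project. The function names, the `max_chars` budget, and the separator list are illustrative assumptions. The separator list includes the Devanagari danda ("।"), which Nepali uses as its sentence terminator, so Nepali sentences can split cleanly. Section-aware and table-aware chunking are left out because they depend on document parsing.

```python
def paragraph_chunks(text: str) -> list[str]:
    """Paragraph-based chunking: every non-empty blank-line-separated block is one chunk."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


# Hypothetical separator order, from coarsest to finest. "। " is the Nepali/Devanagari full stop.
SEPARATORS: tuple[str, ...] = ("\n\n", "\n", "। ", ". ", " ")


def recursive_chunks(text: str, max_chars: int = 200,
                     separators: tuple[str, ...] = SEPARATORS) -> list[str]:
    """Recursive chunking: split on the coarsest separator present, greedily merge the
    pieces up to max_chars, and re-split any oversized piece with finer separators."""
    text = text.strip()
    if len(text) <= max_chars:
        return [text] if text else []
    for i, sep in enumerate(separators):
        if sep not in text:
            continue
        chunks: list[str] = []
        current = ""
        for part in text.split(sep):
            candidate = part if not current else current + sep + part
            if len(candidate) <= max_chars:
                current = candidate  # still fits, keep merging
            else:
                if current:
                    chunks.append(current)
                current = part
        if current:
            chunks.append(current)
        result: list[str] = []
        for chunk in chunks:
            if len(chunk) > max_chars:
                # Too long even alone: recurse using only the finer separators
                result.extend(recursive_chunks(chunk, max_chars, separators[i + 1:]))
            elif chunk.strip():
                result.append(chunk.strip())
        return result
    # No separator left to try: fall back to fixed-size character cuts
    return [text[j:j + max_chars] for j in range(0, len(text), max_chars)]


if __name__ == "__main__":
    doc = "Para one.\n\nPara two is here."
    print(paragraph_chunks(doc))      # one chunk per paragraph
    print(recursive_chunks(doc, 12))  # finer splits once a paragraph exceeds the budget
```

Paragraph-based chunking keeps the author's own boundaries, but the chunk sizes can vary widely. Recursive chunking enforces a size budget at the cost of sometimes splitting mid-paragraph. That tradeoff is one reason to benchmark these strategies side by side.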

Outcome

The work produced a clearer benchmark of which techniques actually improve retrieval quality and which only add complexity without meaningful gains in production settings.