Research Build

RAG & LLM Benchmarking - English and Nepali

Created a structured evaluation setup for multilingual retrieval and generation, comparing chunking strategies, benchmark tasks, and model behavior in realistic English and Nepali use cases.

Year: 2026
Impact

Made model and retrieval tradeoffs visible before deployment, helping select practical LLM setups that fit production constraints instead of relying on intuition alone.

Problem

RAG pipelines can look good in demos but fail under real document structures, multilingual prompts, and smaller self-hosted models, especially when latency matters.

Approach

I evaluated recursive, section-aware, paragraph-based, and table-aware chunking methods while testing multiple local models on medical and financial document tasks in both English and Nepali.
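To make the difference between two of these strategies concrete, here is a minimal Python sketch of paragraph-based and recursive chunking. It assumes plain-text input and character-based length limits (production pipelines often count tokens instead), and it does not cover the section-aware or table-aware variants. The function names, the `max_chars` parameter, and the separator list (including the Nepali danda `।` as a sentence boundary) are illustrative assumptions, not the code used in this project.

```python
import re
from typing import List, Sequence

# Hypothetical separator order, coarsest first. The Nepali danda "।" is an
# assumed sentence boundary so Devanagari text can split on sentences too.
DEFAULT_SEPARATORS = ("\n\n", "\n", "। ", ". ", " ")


def paragraph_chunks(text: str) -> List[str]:
    """Paragraph-based chunking: split on blank lines and drop empty pieces."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def recursive_chunks(text: str, max_chars: int = 200,
                     separators: Sequence[str] = DEFAULT_SEPARATORS) -> List[str]:
    """Recursive chunking: split on the coarsest separator present, pack pieces
    greedily up to max_chars, and recurse with finer separators on any piece
    that is still too long. Every returned chunk has len <= max_chars."""
    if len(text) <= max_chars:
        return [text] if text.strip() else []
    for i, sep in enumerate(separators):
        if sep not in text:
            continue
        chunks: List[str] = []
        current = ""
        for piece in text.split(sep):
            candidate = current + sep + piece if current else piece
            if len(candidate) <= max_chars:
                current = candidate  # piece still fits in the open chunk
                continue
            if current:
                chunks.append(current)  # close the open chunk
            if len(piece) > max_chars:
                # Piece alone is too long: retry it with finer separators only.
                chunks.extend(recursive_chunks(piece, max_chars, separators[i + 1:]))
                current = ""
            else:
                current = piece
        if current:
            chunks.append(current)
        return [c for c in chunks if c.strip()]
    # No usable separator left: fall back to fixed-size character windows.
    return [text[j:j + max_chars] for j in range(0, len(text), max_chars)]


if __name__ == "__main__":
    doc = ("Dosage guidance for adults.\n\n"
           "Side effects and warnings.\n\n"
           "औषधि खाने तरिका। दिनमा दुई पटक।")
    print(paragraph_chunks(doc))
    print(recursive_chunks(doc, max_chars=40))
```

Paragraph-based chunking keeps author-defined units intact but can produce very uneven chunk sizes, while recursive chunking bounds chunk length at the cost of sometimes splitting related sentences apart. That tradeoff is the kind of difference a benchmark like this one is meant to surface.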

Outcome

The work produced a clearer benchmark for what actually improves retrieval quality and what only adds complexity without meaningful gains in production settings.