India’s Sarvam AI outperformed Google Gemini and ChatGPT, and the world was impressed
India finally has an AI model that appears to be world-class, at least for some India-specific tasks. Sarvam has released an OCR tool called Sarvam Vision, which outperforms Gemini and ChatGPT at reading documents in Indian languages, as well as Bulbul v3, which excels at AI voice generation.

When it comes to AI models, the focus is mostly on the US and China. India, despite its scale and deep talent pool, has rarely been seen as a source of mainstream AI development. But Bengaluru-based startup Sarvam AI is changing that perception with what it calls “Sovereign AI.” The company is developing foundational AI models from scratch in India. This week, two of its tools, Sarvam Vision and Bulbul, are generating a lot of discussion, and for all the right reasons.
Sarvam Vision is outperforming larger and better-known AI models like ChatGPT, Google Gemini, and Anthropic's Claude on some benchmarks in optical character recognition (OCR), which is its area of expertise. Its performance appears to be so good that it is receiving praise from users and experts alike.
Sarvam AI co-founder Pratyush Kumar recently shared details of the latest achievements of the company’s in-house AI models in a series of posts on X. According to the company, Sarvam Vision achieved an accuracy score of 84.3 percent on olmOCR-Bench. This score is higher than that of Gemini 3 Pro and of recent OCR models such as DeepSeek OCR v2, while ChatGPT ranks much lower.
Sarvam Vision also scored well on OmniDocBench v1.5, a benchmark that tests how AI systems read and understand real-world documents. It scored 93.28 percent overall, with particularly strong results on complex layouts, technical tables and mathematical formulas. These are areas where traditional OCR systems often struggle because of irregular formatting and dense content.
The performance of these tools has attracted global attention. Sarvam, which was earlier questioned for its focus on Indic-language models, is now turning that skepticism into approval.
Tech commentator Deedy Das, who had previously questioned the value of building small Indic-language models, recently admitted that he had underestimated the company. In a post on
“I was wrong about Sarvam. When I wrote about them a year ago, I thought the direction to train small Indic language models was wrong. But boy, have they turned it around,” he wrote. “They have the best text-to-speech, speech-to-text, and OCR models for Indic languages, and it’s really valuable. The pricing is very reasonable.”
Users have also praised the tools. One user described his experience with Sarvam’s models and wrote, “I used this a few days ago! Oh man wow.”
Bulbul brings AI voice to Indic languages
Apart from the OCR tool, Sarvam has also launched a new AI voice model called Bulbul v3. It is a text-to-speech model that generates spoken audio from written text. In that respect, it is similar to the tools offered by ElevenLabs, a company widely considered a leader in this field.
“Today we are releasing Bulbul v3, our most capable text-to-speech model, designed to deliver natural, expressive, and production-ready voices for Indian languages,” Sarvam said in a blog post. “Bulbul V3 minimizes failure modes, delivers content-accurate, stable speech on critical inputs for India-specific use cases.”
Currently, the tool supports over 35 voices in 11 Indian languages. The company says it plans to expand language support to a total of 22 languages.
Bulbul is also garnering some praise. Prateek Desai, founder of KissanAI, wrote on