Has Sarvam AI Really Beaten ChatGPT, Google Gemini? Yes and no
Sarvam AI has been at the center of many discussions on social media in the last few days. The Indian AI company has outperformed ChatGPT and Google Gemini in some key areas. So, can we say that Sarvam AI has beaten the best including Google Gemini and ChatGPT? Yes and no.


It is not every day that an Indian startup comes into the limelight by beating the world’s best AI companies. But when it happens, we can’t stop talking about it here in India. That’s what’s happening with Sarvam AI, a company that has come up with two new tools called Vision and Bulbul. These tools are so good that, in some tests, they outperform ChatGPT, Anthropic’s Claude, and Google Gemini. The result is the kind of headline you might have read here on India Today: “India’s Sarvam AI beats Google Gemini and ChatGPT, world impressed.”
But now that the conversation around Sarvam AI is in full swing, people are starting to ask questions. Is Sarvam AI really better than Gemini or ChatGPT?
Let’s take a closer look at the whole Sarvam-AI-beats-ChatGPT saga, because there are nuances involved. Sarvam AI does outperform Google Gemini and ChatGPT. And yet, it also does not. Confused? Let us explain.
Some things Sarvam AI does better than ChatGPT, Gemini
On February 5, Sarvam AI co-founder Pratyush Kumar announced that the startup’s Sarvam Vision had outperformed every major AI model on olmOCR-Bench. This benchmark measures optical character recognition (OCR), which is the ability of AI models to read and understand text in images, scanned documents, and other visual inputs. It tests whether AI models can correctly recognize complex fonts, handwriting and other data in such inputs.


On olmOCR-Bench, Sarvam Vision scored an accuracy of 84.3 percent. The homegrown AI model outperformed OpenAI’s ChatGPT, Google’s Gemini 3 Pro and even China’s DeepSeek OCR v2. On OmniDocBench v1.5, Sarvam Vision scored 93.28 percent. The AI performed particularly well with complex layouts, technical tables and mathematical formulas.
Vision is apparently especially reliable at OCR on Indic scripts. This is probably because it has been trained on Indian languages and Indian writing systems. It is better acquainted with Indian scripts, including those of regional languages. While ChatGPT, Gemini and others also have good OCR capabilities, they are not tuned for Indic scripts the way Sarvam Vision is.
This effectively means that Sarvam AI can reliably handle scanned documents, forms and mixed-language content. It could also give Indian companies an affordable, homegrown alternative to foreign AI models for services like document processing.
Then there is Bulbul V3. This is another AI tool released by Sarvam, and it is creating quite a buzz because, when it comes to generating Indian voices, it reportedly does better than ElevenLabs, the global leader in text-to-speech AI models. Again, the reason Bulbul gets high scores in benchmarks related to the Indian context is that it is specifically tailored to the way languages are spoken in India.
Sarvam models are small, Gemini and ChatGPT are ahead
So, if you’re talking about specific use cases like OCR and text-to-speech, Sarvam AI is actually beating Google Gemini and ChatGPT in India-specific workloads. But this is largely the result of focusing on a particular type of work. The models behind Vision and Bulbul are not general-purpose models like ChatGPT and Gemini.
This is where the “no” part of our answer comes in. Sarvam AI outperforms Gemini and ChatGPT in certain types of workloads. It is not competitive with these two global models in regular, everyday use of AI. For example, Gemini can give you a mock JEE test paper and then guide you while you solve it. Sarvam AI can’t. Similarly, ChatGPT can read your X-ray film and then give you some idea of what it is seeing. Sarvam AI can’t.
In other words, ChatGPT and Gemini are jacks of all trades, while Sarvam AI is the master of two very specific workloads at the moment.
This is largely because Sarvam AI’s models are much smaller than the global AI models. For example, the Sarvam Vision AI model has 3 billion parameters. This pales in comparison to something like Google Gemini 3, which is rumored to have around 2 trillion parameters.
In general, more parameters tend to mean a more capable AI. However, training and running a large AI model also means you need hundreds of thousands of GPUs, something that is not yet available in India.
Sarvam AI is still a great achievement
Despite their limited scope, Vision and Bulbul are a major achievement for an Indian company. They may not be the kind of AI models that can perform complex coding tasks or have philosophical chats with a user for hours, but they show something: Indian companies are capable of building world-class AI tools almost from scratch.
They also highlight that what limits India in AI is not the capability of Indian startups. The constraint on the ground is compute and infrastructure. The reason Sarvam AI models are small is that the company does not have access to the infrastructure needed to train larger AI models, namely hundreds of thousands of GPUs and huge data centers.
In this way, Sarvam Vision and Bulbul are a proof of concept. The fact that they are beating world-class AI models at specific tasks proves that Indian companies can compete when they focus on the right problems, and that makes them worthy of celebration.
