We tested Sarvam AI against global models. Here's what we found


To test Sarvam AI's claims about the superiority of its large language models (LLMs) over global models like Google Gemini and ChatGPT on task-based, real-world prompts, India Today's Open Source Intelligence (OSINT) team conducted its own validation.

Sarvam AI
When asked to translate a Sanskrit verse into Hindi and explain its meaning in simple English, India Today found that Sarvam AI gave the most balanced output.

As the global AI race gathers momentum in the shadow of Silicon Valley and China, Bengaluru-based startup Sarvam AI has made an ambitious claim: its large language model (LLM) "outperforms" other global models like Google Gemini and ChatGPT on task-based, real-world prompts.

To verify this claim, India Today's Open Source Intelligence (OSINT) team conducted an independent test using a set of real-world prompts.


We ran the same prompts in parallel through Sarvam AI, ChatGPT (v5.2), Google Gemini (3i), DeepSeek, and Grok. What we found is important from the perspective of India's search for a sovereign AI model.

How different AI models responded to our prompts

Prompt 1: Translate the Sanskrit shloka "…" into Hindi and explain its meaning in simple English.

India Today found that Sarvam AI gave the most balanced output, combining an accurate Hindi translation with a culturally grounded and philosophically restrained English interpretation.

While ChatGPT, Gemini and Grok also produced technically correct translations, their explanations were less relevant to Indian users. DeepSeek failed to respond, pointing to a limitation in handling classical Indic languages. Overall, Sarvam demonstrated strong Sanskrit understanding and contextual relevance.

Prompt 2: Rural governance application: "The road in my village is bad; write an application in Gujarati to give to the Panchayat."

In our testing, Sarvam produced the simplest, submission-ready application, using clear administrative language accessible to rural and semi-literate users.

Gemini and Grok offered more detail but were comparatively verbose. ChatGPT took a middle path, while DeepSeek's response, although clear, was less user-friendly. From the point of view of last-mile governance, Sarvam proved to be the most practical.

In Indic-language, governance, and document-comprehension tasks, Sarvam's edge lies not in raw verbosity but in its India-specific linguistic grounding.

OCR for handwritten text extraction

Prompt 3: Extract all visible text exactly as written.

When we conducted handwritten optical character recognition (OCR) testing, Sarvam AI produced the most accurate word-for-word extraction, matching the original text without any omissions, distortions, or additions.

Gemini was largely accurate but showed minor discrepancies in capitalisation. In contrast, ChatGPT and Grok introduced visible errors at the end of the output, including content not present in the source image. DeepSeek was less reliable, missing key elements such as the date at the beginning and the page number at the end, reducing overall accuracy.

OCR for table parsing

Prompt 4: Perform OCR on the attached image and return the table as exactly structured data with correct rows, columns, numbers, and headers, without any explanation.


In India Today's table-based OCR test, the gap widened further. The indigenous AI model preserved the original table structure, bilingual Hindi-English text, and numerical data with only minor transcription issues.

Grok produced a table but expanded the content beyond the source, introducing errors and figures and labels not present in the original. ChatGPT omitted important information, including table titles, source lines, and footnotes, while Gemini 3 also missed table titles. DeepSeek again performed poorly, failing to capture the title, source line, and footnote.

However, this evaluation does not assess reasoning depth, coding capability, or long-context performance, areas where global frontier models may still hold an advantage.

– ends