Uproar over ECI scan

As Rahul Gandhi and the Election Commission of India (ECI) trade allegations, you are going to hear about something called OCR again and again. What is OCR, and how is it related to the quarrel between Rahul Gandhi and the ECI? We explain the technical angle.

What is OCR, and why many are crying foul over scanned Bihar voter lists

In short

  • ECI has published a draft voter list for Bihar in scanned format
  • Scanned format requires OCR to remove data
  • Many people allege that scanned images make voter roll analysis cumbersome

It has politics on one side. On the other, there is a technical angle. And in some instances, the two touch each other. Yes, we are talking about the brouhaha over the draft voter list for the upcoming Bihar elections. The Election Commission of India is carrying out a revision of the voter lists in the state, and it has now come out with a draft list. There are reports that more than 6 million voter IDs have been removed for various reasons. As always in politics, some are happy with the changes and some are not. Those who are not, and the Leader of the Opposition Rahul Gandhi is among them, want to analyze the draft lists. Unfortunately, they will have to try harder to do it.

This is where the technical angle comes in. One component of this whole affair has many in the opposition camp, as well as many activists, researchers, journalists and commentators, up in arms. At its center are the scanned Bihar voter lists, and a technology called OCR.

What is the controversy?

The dispute is simple, at least on its face. The ECI earlier made the voter lists available on its website in a format that was easy to search. Think of a regular PDF file with regular structured tables and text, where the text is digitally embedded in the file, as if typed on a keyboard.

But recently, amid its quarrel with Gandhi and the dispute over the revision of the rolls, the ECI decided to remove these files from public access. Instead, according to media reports, the files were replaced with scanned copies of the voter lists. These scanned copies are essentially images: the text on them is printed, but to a computer it is just part of the picture.

Enter OCR

The move has led to allegations that the ECI does not want to share voter lists in a format that is easy to analyze and digest. For example, if we type this feature in a Word file and then make a PDF from that Word file, the result is a document that a machine can easily read. But if we write this entire piece on paper, take a picture of it and feed it to a computer, the computer may fail, completely or partially, to extract and analyze the data.
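
One rough way to tell the two kinds of PDF apart is to check whether a file carries a text layer at all. The Python below is a minimal sketch, assuming the third-party pypdf library (its `PdfReader` class and `page.extract_text()` method); the helper names and the 20-character threshold are illustrative assumptions, not anything the ECI or its critics are known to use. A PDF made from a Word file returns its words, while a purely scanned page typically returns little or nothing.

```python
from typing import List


def page_texts(pdf_path: str) -> List[str]:
    """Return the embedded text of every page (empty strings for image-only pages)."""
    # Third-party dependency (pip install pypdf), imported lazily so the
    # heuristic below can be used on its own.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    return [page.extract_text() or "" for page in reader.pages]


def looks_scanned(texts: List[str], min_chars: int = 20) -> bool:
    """Heuristic: a document is 'scanned' if no page has a meaningful text layer.

    min_chars is an assumed, illustrative threshold; a real check would tune it.
    """
    return all(len(t.strip()) < min_chars for t in texts)
```

On a hypothetical file, `looks_scanned(page_texts("draft_roll.pdf"))` should come back True for an image-only PDF and False for one exported from a word processor, which is the whole difference the dispute turns on.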

This is the dispute. The drafts published by the ECI are not in a machine-readable format. Instead, they are scanned images converted into PDFs, which means optical character recognition has to be run on them before the data can be extracted for further analysis. This has prompted many people to slam the ECI. For example, Tehseen Poonawalla tweeted, questioning why the electoral roll had been moved away from a text format and whether privacy was the real reason for hiding the data, and urged: “Please share the voter lists in a digital, machine-readable format.”

Why the ECI has replaced the PDF voter lists with scanned ones is not a question we are concerned with in this piece. Here, let’s look at the technology and see whether the claims made against the ECI seem accurate.


In some ways the claims are not accurate, but in other ways they seem to point to something real. But first things first: even scanned images can be analyzed.

When data is not typed, it has to be extracted from the image or file using OCR, which stands for optical character recognition. For example, if you have a handwritten note, your Android phone, iPhone or Mac can easily read it and convert what is written into digital text that becomes searchable. At a large scale, there are scripts and machine-learning algorithms that can automate the whole process.

The problem with OCR, however, is that it is time-consuming. When you are working with thousands of files, it takes a lot of time (think weeks, not days) as well as a lot of computing resources, meaning very fast machines. It also requires some skill to extract the data: you may have to use tools such as Tesseract along with Python scripts.
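
For a sense of what such a script involves, here is a minimal Python sketch, not a production tool. It assumes the third-party pdf2image package (`convert_from_path`, which needs Poppler installed) and pytesseract (`image_to_string`, which needs the Tesseract binary and language data). It also assumes that voter ID (EPIC) numbers look like three capital letters followed by seven digits; that pattern, the `hin+eng` language setting and the folder layout are illustrative assumptions, not the ECI's documented format.

```python
import re
from pathlib import Path
from typing import Dict, List

# Assumed EPIC pattern: 3 uppercase letters + 7 digits (illustrative only).
EPIC_RE = re.compile(r"\b[A-Z]{3}\d{7}\b")


def extract_epic_ids(text: str) -> List[str]:
    """Return every substring of OCR output that matches the assumed EPIC pattern."""
    return EPIC_RE.findall(text)


def ocr_pdf(pdf_path: Path, lang: str = "hin+eng", dpi: int = 300) -> str:
    """Render each page of a scanned PDF to an image and run Tesseract on it."""
    # Third-party dependencies, imported lazily so the parser above works without them.
    from pdf2image import convert_from_path  # needs Poppler
    import pytesseract  # needs Tesseract plus the assumed 'hin' and 'eng' data

    pages = convert_from_path(str(pdf_path), dpi=dpi)
    return "\n".join(pytesseract.image_to_string(img, lang=lang) for img in pages)


def ocr_folder(folder: Path) -> Dict[str, List[str]]:
    """OCR every PDF in a (hypothetical) folder and map file name -> IDs found."""
    results: Dict[str, List[str]] = {}
    for pdf in sorted(folder.glob("*.pdf")):
        results[pdf.name] = extract_epic_ids(ocr_pdf(pdf))
    return results
```

Every page has to be rendered and recognized, so the cost grows with the page count. Spreading files across CPU cores with `concurrent.futures.ProcessPoolExecutor` shortens the wait but does not remove the work.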


And even if you can extract the data well, with say 99.9 percent accuracy, when the quantity of data is large, as with the draft voter lists and their millions of entries, even a 0.1 percent error rate can derail the entire analysis: 0.1 percent of, say, 5 million entries is still 5,000 records. Even when AI tools are used, and AI is quite good at OCR these days, the results may not be 100 percent accurate.
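
The arithmetic gets worse once errors compound. If OCR reads each character correctly with probability p, and errors are assumed to be independent (a simplification), a 10-character voter ID survives intact only with probability p to the power 10. The short sketch below turns that into an expected count of corrupted records; the record count, ID length and accuracy figures are illustrative assumptions, not figures from the Bihar rolls.

```python
def expected_bad_records(n_records: int, char_accuracy: float, chars_per_record: int) -> float:
    """Expected number of records with at least one misread character.

    Assumes every character is read correctly with the same probability and
    that errors are independent, which real OCR errors are not.
    """
    p_record_ok = char_accuracy ** chars_per_record
    return n_records * (1 - p_record_ok)


if __name__ == "__main__":
    # Illustrative numbers: 5 million entries, one 10-character voter ID each.
    for acc in (0.99, 0.999, 0.9999):
        bad = expected_bad_records(5_000_000, acc, 10)
        print(f"{acc:.2%} per character -> ~{bad:,.0f} corrupted IDs")
```

At 99.9 percent per-character accuracy, roughly 1 percent of 10-character IDs would contain at least one error, which is tens of thousands of records out of 5 million.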

In other words, it is not that scanned lists cannot be read by machines. They can be. But it takes time, and 100 percent accuracy is not guaranteed even with the best available tools.

How OCR complicates analysis

The lack of a guarantee on accuracy means that there is always some doubt about any analysis of a large data set derived from scanned images. It is this element of doubt, which the ECI has added to the mix intentionally or unknowingly, that has raised the hackles of many people in the country.
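
One way analysts try to contain that doubt is to keep the OCR engine's own confidence scores and send low-confidence reads to a human. The sketch below assumes the dictionary shape returned by pytesseract's `image_to_data(image, output_type=pytesseract.Output.DICT)`: parallel `text` and `conf` lists, with a confidence of -1 for boxes that are not words. The threshold of 80 is an arbitrary, illustrative choice.

```python
from typing import Dict, List, Sequence, Tuple


def low_confidence_words(
    data: Dict[str, Sequence], threshold: float = 80.0
) -> List[Tuple[str, float]]:
    """Return (word, confidence) pairs below the (assumed) threshold.

    `data` is assumed to follow pytesseract's Output.DICT layout:
    parallel "text" and "conf" lists, with conf == -1 for non-word boxes.
    """
    flagged: List[Tuple[str, float]] = []
    for word, conf in zip(data["text"], data["conf"]):
        score = float(conf)
        if score < 0 or not str(word).strip():
            continue  # skip layout boxes and empty strings
        if score < threshold:
            flagged.append((str(word), score))
    return flagged


def review_share(data: Dict[str, Sequence], threshold: float = 80.0) -> float:
    """Fraction of recognized words that a human would need to double-check."""
    words = [w for w, c in zip(data["text"], data["conf"]) if float(c) >= 0 and str(w).strip()]
    if not words:
        return 0.0
    return len(low_confidence_words(data, threshold)) / len(words)
```

Manual review shrinks the doubt but does not remove it, and it adds exactly the time and labor that critics complain about.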

Perhaps this is why critics of the ECI say that the change from digital to image-based voter lists is not just a cosmetic change of format; it is a step backwards for access. With the earlier format, watchdog groups and analysts could run quick, software-powered checks to flag duplicate IDs, ghost voters, or large-scale deletions in specific areas. With scanned images, this process becomes slow, error-prone and, in some cases, almost impossible, OCR or not.
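
The kind of quick check critics have in mind is easy to sketch once the rolls are structured data. The Python below is a minimal illustration on a hypothetical record layout (voter ID, name, age, constituency): it flags IDs that appear more than once and measures, per constituency, the share of entries in an earlier roll that are missing from the draft. It is not any watchdog group's actual tooling, and a real analysis would also need fuzzy name matching to catch ghost voters.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Voter:
    # Hypothetical record layout, for illustration only.
    epic_id: str
    name: str
    age: int
    constituency: str


def duplicate_ids(roll: Iterable[Voter]) -> List[str]:
    """Return voter IDs that occur more than once in a roll."""
    counts = Counter(v.epic_id for v in roll)
    return sorted(epic for epic, n in counts.items() if n > 1)


def deletions_by_constituency(old_roll: Iterable[Voter], new_roll: Iterable[Voter]) -> Dict[str, int]:
    """Count IDs present in the old roll but absent from the new one, per constituency."""
    new_ids = {v.epic_id for v in new_roll}
    return dict(Counter(v.constituency for v in old_roll if v.epic_id not in new_ids))


def deletion_rates(old_roll: List[Voter], new_roll: List[Voter]) -> Dict[str, float]:
    """Share of each constituency's old entries missing from the new roll, to spot outliers."""
    totals = Counter(v.constituency for v in old_roll)
    removed = deletions_by_constituency(old_roll, new_roll)
    return {c: removed.get(c, 0) / totals[c] for c in totals}
```

With digital rolls, checks like these are cheap to run. With scanned rolls, every ID first has to survive OCR, and a single misread character can make a real voter look deleted or a duplicate look unique.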

– Ends
