Suchir Balaji, a 26-year-old OpenAI researcher turned whistleblower, was found dead in an apartment in San Francisco, US, last month. His death on November 26 was ruled a suicide by the San Francisco Medical Examiner’s Office, and police found no evidence of foul play.
Balaji, who left OpenAI in August, had in recent months spoken out against the artificial intelligence company’s practice of training chatbots on copyrighted material taken from the internet. The artificial intelligence (AI) giant is fighting multiple lawsuits related to its data-gathering practices.
About Suchir Balaji
Indian American Suchir Balaji grew up in Cupertino, California. A remarkably bright child, he excelled in programming competitions, placing 31st in the ACM ICPC 2018 World Finals and winning first place in the 2017 Pacific Northwest Regional and Berkeley Programming Competitions.
Balaji also placed 7th in Kaggle’s TSA-sponsored “Passenger Screening Algorithm Challenge”, earning a $100,000 prize. According to his LinkedIn profile, he was the US Open 2016 National Champion and a USACO finalist.
Like most others in his field, Balaji was fascinated by the promise of artificial intelligence from an early age. In an interview with The New York Times in October, he explained that his interest in AI began when he saw a news story about the technology in his teens and imagined that neural networks could solve humanity’s biggest problems.
According to the NYT report, he said, “I thought AI was something that could be used to solve unsolvable problems, like curing diseases and stopping aging… I thought we could invent some kind of scientist that could help solve them.”
Even before graduating, he worked at Scale AI and Helia, and was a software engineer at Quora. In 2020, Balaji became one of a number of Berkeley graduates who went to work for OpenAI.
Suchir Balaji’s time at OpenAI
He worked at OpenAI for four years. For a year and a half of that time, he helped the company collect and organize the massive amounts of internet data used to create its online chatbot, ChatGPT.
Balaji told the NYT that during his early years at OpenAI, he did not carefully consider whether the company had the legal right to build its products using both copyrighted and open internet data. It was only after the release of ChatGPT in late 2022 that he began to consider the issue and concluded that technologies like ChatGPT were harming the internet by using copyrighted data and violating the law in the process.
By 2024, Balaji had decided that he no longer wanted to contribute to technologies that he believed would cause more harm than good to society. He left the company in August this year without a new job lined up and started working on “personal projects”.
He died a day after he was named in a court filing as someone whose files OpenAI would search as part of a lawsuit against the AI giant.
Suchir Balaji’s allegation against OpenAI
After leaving OpenAI, Suchir Balaji publicly spoke out against the way AI companies use copyrighted data to create their technologies. He alleged that AI models are too dependent on the labor of others as they are trained on copyrighted material pulled from the internet without permission.
“This is not a sustainable model for the Internet ecosystem as a whole,” he told the NYT.
He also addressed his concerns on his personal website, where he stated that although generative models rarely produce output identical to their training data, the act of copying copyrighted material during training may violate the law if it is not protected under “fair use”.
“Since fair use is determined on a case-by-case basis, no blanket statement can be made about when generative AI qualifies as fair use,” he said.
Balaji argued that in many cases chatbots directly compete with the copyrighted works they learned from. Generative models are designed to mimic online data, he said, so they can replace “basically anything” on the internet, from news stories to online forums.
According to him, the biggest problem is that as AI technologies gradually replace existing internet services, they generate false and sometimes completely fabricated information, which researchers call “hallucinations”.
The internet is changing for the worse, he said.
Accusations against AI companies
Balaji wasn’t alone in worrying about AI companies misusing copyrighted data to train their chatbots. Several American and Canadian news publishers, including The New York Times, have filed lawsuits against OpenAI and its primary partner, Microsoft, claiming the companies used millions of their articles to create chatbots that now compete with news outlets as sources of reliable information.
Several best-selling authors, including John Grisham, have also filed lawsuits against the company.
OpenAI disputes the claims
OpenAI has denied Balaji’s claims and insisted that its data use follows fair use principles and legal precedents.
“We build our AI models using publicly available data, in a manner protected by fair use and related principles, and supported by long-standing and widely accepted legal precedents. We view this principle as fair to creators, necessary for innovators and critical to American competitiveness,” OpenAI said in a statement.
The company told the BBC in November that its software is “based on fair use and related international copyright principles that are fair to creators and support innovation”.
Responding to Balaji’s death, an OpenAI spokesperson said, “We are deeply saddened to learn of this incredibly sad news today and our thoughts are with Suchir’s loved ones at this difficult time.”