AI agent hacks Stanford computer network, beats professional human hackers making six-figure salaries
Stanford’s AI agent ARTEMIS successfully identified security holes that expert hackers had overlooked. In a 16-hour run, the agent outperformed professional penetration testers, revealing vulnerabilities that even experienced hackers had missed.

Stanford University’s computer network has become the latest battleground in the growing rivalry between humans and artificial intelligence, and this time the machines are ahead of us. In a remarkable experiment, an AI agent developed by Stanford researchers managed to outwit professional cybersecurity experts by uncovering hidden vulnerabilities in thousands of devices, while working on a much smaller budget than its human counterparts.
The AI, named ARTEMIS, was deployed across Stanford’s private and public computer science networks for 16 hours, searching nearly 8,000 devices, including servers, computers and smart systems. By the end of the test, ARTEMIS had outperformed nine out of ten professional penetration testers, revealing vulnerabilities that even experienced hackers had missed.
The results were published this week in a new Stanford study led by researchers Justin Lin, Eliot Jones and Donovan Jasper, who are experts in cybersecurity, AI agents, and machine-learning security.
AI beats human hackers
According to the report, unlike traditional AI tools that falter during long, multi-step tasks, ARTEMIS was designed to operate autonomously for extended periods, scanning, investigating, and analyzing complex systems.
For the experiment, the researchers invited ten experienced penetration testers and asked each to work for at least 10 hours. ARTEMIS ran for 16 hours over two days, although direct comparisons focused on the first 10 hours of its activity.
Within that time, the AI detected nine valid security flaws with an accuracy rate of 82 percent, putting it ahead of almost all the human testers. The researchers said that while other existing AI systems lag behind humans, ARTEMIS’s performance was “on par with the strongest participants”.
Perhaps even more surprising was the cost difference. The study said it costs about $18 (Rs 1,630) per hour to run ARTEMIS, while the average human penetration tester in the United States earns about $125,000 (more than Rs 1.13 crore) per year. A more advanced version of the agent costs $59 (Rs 5,343) per hour, which is still significantly cheaper than hiring a top-tier expert.
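For a rough sense of that gap, the salary figure works out to about $60 an hour. A minimal back-of-the-envelope calculation, assuming a standard 2,080-hour work year (an assumption, not a figure from the study):

```python
# Rough cost comparison using the article's figures.
# Assumption (not from the study): a 2,080-hour full-time work year.
human_salary = 125_000    # USD per year, average US penetration tester
hours_per_year = 40 * 52  # 2,080 hours
human_hourly = human_salary / hours_per_year

print(f"Human tester: ~${human_hourly:.0f}/hour")  # ~$60/hour
print("ARTEMIS: $18/hour (base), $59/hour (advanced)")
```

By that estimate, the base agent runs at roughly a third of a human tester’s hourly salary cost, before even accounting for benefits and overhead.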
“Such a capability has the potential to dramatically reduce the cost of cybersecurity auditing,” one of the researchers said in the report, adding that AI can handle the repetitive, time-consuming tasks that often overwhelm human teams.
How ARTEMIS cracked what humans missed
Part of ARTEMIS’s edge lies in its design. Whenever it detected something “notable” in its network scans, the AI immediately launched “sub-agents”: smaller, autonomous systems that investigated the anomaly in parallel. This allowed ARTEMIS to analyze multiple threats simultaneously, something a single human tester cannot easily do.
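The study does not publish ARTEMIS’s source code, so the sketch below is only a hypothetical illustration of the pattern the researchers describe: an orchestrator scans hosts and, on each “notable” finding, spawns an independent sub-agent to dig deeper while the scan keeps moving. Every name here (scan_host, investigate, the port list) is invented for illustration.

```python
import asyncio

# Assumption for illustration: "notable" means a common service port is open.
NOTABLE_PORTS = {22, 80, 443, 8080}

async def scan_host(host: str) -> list[int]:
    """Stand-in for a port scan; returns the open ports found on a host."""
    await asyncio.sleep(0.1)  # simulate network latency
    return [80, 8080]         # placeholder result

async def investigate(host: str, port: int) -> str:
    """Sub-agent: probes one finding in depth, independently of the main loop."""
    await asyncio.sleep(0.2)  # simulate a longer, focused probe
    return f"{host}:{port} -> banner grabbed, no default credentials"

async def orchestrator(hosts: list[str]) -> None:
    sub_agents = []
    for host in hosts:
        for port in await scan_host(host):
            if port in NOTABLE_PORTS:
                # Spawn a sub-agent immediately; the scan loop never blocks on it.
                sub_agents.append(asyncio.create_task(investigate(host, port)))
    for report in await asyncio.gather(*sub_agents):
        print(report)

asyncio.run(orchestrator(["10.0.0.5", "10.0.0.17"]))
```

The key design choice is concurrency: the main loop hands each anomaly off and moves on, which mirrors the paper’s point that ARTEMIS could pursue multiple threats at once.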
In one case, it discovered a security flaw on an old server that human testers had overlooked because their browsers refused to load it. ARTEMIS bypassed the problem entirely, accessing the system through a command-line interface instead.
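The study does not say exactly why the browsers balked, but a common reason a modern browser refuses an old server is a legacy TLS configuration, which a scripted client can be told to tolerate. A minimal sketch of that kind of workaround, with a hypothetical target URL (neither the URL nor the cause comes from the ARTEMIS paper):

```python
import ssl
import urllib.request

# Hypothetical recreation of the workaround: modern browsers reject servers
# that only speak legacy TLS, but a scripted client can opt in explicitly.
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
ctx.minimum_version = ssl.TLSVersion.TLSv1  # allow legacy protocol versions
ctx.set_ciphers("DEFAULT@SECLEVEL=0")       # accept weak legacy cipher suites
ctx.check_hostname = False                  # tolerate a stale certificate...
ctx.verify_mode = ssl.CERT_NONE             # ...on an internal audit target

# "old-server.example.edu" is a placeholder, not a real host from the study.
with urllib.request.urlopen("https://old-server.example.edu/", context=ctx) as resp:
    print(resp.status, resp.read(200))
```

A command-line client such as curl can do much the same with its --insecure and TLS-version flags, which is the kind of text-based route an agent can take when a browser cannot.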
The AI is not perfect, however. ARTEMIS faltered on tasks that required navigating a graphical user interface, missing a serious vulnerability because it could not perform simple clicks or interact with visual elements. It also produced some false positives, mistaking harmless network activity for signs of an attack.
The researchers noted that ARTEMIS performed excellently in “code-like environments”, where it could quickly interpret data and text-based output, but struggled whenever the system relied heavily on graphical displays.
AI and the new hacking frontier
The Stanford study comes amid growing concerns that advances in artificial intelligence are making hacking easier and more dangerous. AI tools are increasingly being used by cyber criminals to automate attacks, generate fake identities, and even infiltrate corporate networks.
A few months ago, a North Korean hacking group reportedly used ChatGPT to create fake military IDs for a phishing operation. Meanwhile, a report from Anthropic found that North Korean operatives had exploited its Claude AI models to apply for remote jobs at Fortune 500 companies, gaining insider access to corporate systems.
The same report said a Chinese threat actor had used Claude to launch attacks against Vietnamese telecommunications and government infrastructure.
While ARTEMIS was designed for research and defense, its performance underscores how rapidly the technology is evolving, and how soon even the best human hackers may find themselves competing with their own digital apprentices.

