Cloudflare has released new findings on the growing role of frontier AI models in cybersecurity research, highlighting both the opportunities and risks these advanced systems create for defenders. The insights were published through the company’s Project Glasswing initiative, which evaluates how cyber-focused AI models perform in real-world security environments.
According to Cloudflare, the rapid acceleration of cyber threats means organizations can no longer rely solely on faster detection and response times. Instead, the company argues that security teams must focus on designing systems that are resilient by default, making exploitation significantly harder even when vulnerabilities exist. Cloudflare noted that shrinking attacker timelines require defenders to prioritize durable engineering and resilience over reactive speed alone.
As part of the research, Cloudflare tested an AI model called Mythos against live production code across several critical infrastructure areas, including runtime systems, edge data paths, protocol stacks, control planes, and open-source dependencies. One of the company’s most notable findings was the model’s ability to combine several seemingly low-severity vulnerabilities into a single, more dangerous exploit chain. While other AI systems were able to identify individual bugs, Cloudflare said Mythos demonstrated a stronger capability to connect those weaknesses into coordinated attack paths that traditional assessments might overlook.
The company also raised concerns about the unpredictability of AI safety mechanisms during cybersecurity tasks. Researchers observed inconsistent refusal behaviors from the model, where it sometimes declined to perform vulnerability analysis without any clear or transparent policy explanation. In one example, Mythos initially refused to analyze a codebase for vulnerabilities but later completed the same task after researchers removed a hidden .git folder, despite no changes being made to the actual code under review.
Cloudflare further emphasized that human oversight remains critical when using AI for security research. The company reported that Mythos generated a large number of speculative findings and false positives, particularly in memory-unsafe programming languages such as C and C++. This tendency toward over-reporting created substantial triage work for human analysts, reducing operational efficiency and increasing the burden on security teams tasked with separating legitimate threats from noise.
The findings from Project Glasswing reflect a broader challenge facing the cybersecurity industry as AI models become increasingly sophisticated in offensive and defensive security applications. Cloudflare’s research suggests that while frontier AI systems may significantly improve vulnerability discovery and threat analysis, organizations must also address governance, reliability, and operational resilience to safely integrate these technologies into real-world security workflows.

