At Symbiotic Security, we’re building more than just tools that automatically fix vulnerable code. We’re building systems that help developers understand why their code is vulnerable by taking an educational approach, complete with AI-driven insights and guidance.
Together with my incredible team, we’ve been working on AI-powered remediation leveraging Large Language Models (LLMs) for the official launch of version 1 of our code security solution - a topic you can (and should) read more about here. In particular, we’ve focused on Infrastructure as Code (IaC) remediation, an area where very little research and data exist. Here’s what we’ve learned so far:
Our first attempt was simple: detect a vulnerability, hand it to an LLM, and ask for a fix. We tried this with tools like Copilot and a variety of local models. The results weren’t terrible: between 65 and 75 percent of the AI-generated fixes were accepted by developers without any changes.
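To make that first pipeline concrete, here is a minimal sketch of the naive approach, assuming an OpenAI-style chat client; the model name, prompt wording, and helper function are illustrative rather than our production code:

```python
# A minimal sketch of the naive approach: hand the vulnerable snippet to an
# LLM and ask for a fix. The client, model name, and prompt wording are
# illustrative only.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def naive_fix(vulnerable_snippet: str, finding: str) -> str:
    """Ask a general-purpose LLM to rewrite a vulnerable IaC snippet."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": "You are a security engineer. Return only the "
                           "corrected Infrastructure as Code snippet.",
            },
            {
                "role": "user",
                "content": f"Finding: {finding}\n\nVulnerable code:\n{vulnerable_snippet}",
            },
        ],
    )
    return response.choices[0].message.content
```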
But in security, “good enough” is never actually good enough. Developers need a high level of confidence that a fix is correct, safe, and won’t cause downstream issues. Anything less adds friction and erodes trust in the system.
If you’re working with an off-the-shelf LLM, results will vary significantly depending on the model’s training data, reasoning capabilities, and fine-tuning.
To test this, we ran an experiment on 13 different LLMs, asking them to fix specific vulnerabilities. Each model got four attempts per vulnerability, and we measured:
Here are our full findings:
Here are a few key takeaways:
When models failed, it often wasn’t because they didn’t try to fix the problem. Instead, they introduced syntax errors, created new vulnerabilities in the process, or suggested solutions that just didn’t hold up in the real world. That’s why we added a post-processing step to sanitize AI-generated code - without it, the failure rate would have been much higher.
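For illustration, the sketch below captures the shape of that check: a syntax gate followed by a re-scan of the proposed fix, with a per-vulnerability attempt budget like the one used in the experiment. The `model_fix` and `rescan` callables are placeholders, it assumes the Terraform CLI is installed, and it is not our actual pipeline:

```python
import pathlib
import subprocess
import tempfile

MAX_ATTEMPTS = 4  # each model gets four attempts per vulnerability

def sanitize_and_check(candidate_fix: str, rescan) -> bool:
    """Reject an AI-generated Terraform fix if it does not parse, or if a
    re-scan still reports (or newly reports) vulnerabilities."""
    with tempfile.TemporaryDirectory() as tmp:
        path = pathlib.Path(tmp) / "main.tf"
        path.write_text(candidate_fix)
        # Syntax gate: initialize without a backend, then validate the HCL.
        init = subprocess.run(["terraform", "init", "-backend=false"],
                              cwd=tmp, capture_output=True)
        valid = subprocess.run(["terraform", "validate"],
                               cwd=tmp, capture_output=True)
        if init.returncode != 0 or valid.returncode != 0:
            return False
        # Security gate: the scanner must report no findings on the fix.
        return not rescan(path)

def evaluate(model_fix, vulnerability, rescan) -> bool:
    """True if the model produces an accepted fix within the attempt budget."""
    return any(
        sanitize_and_check(model_fix(vulnerability), rescan)
        for _ in range(MAX_ATTEMPTS)
    )
```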
After a lot of trial and error, here are a few key things that have made a real difference in our results:
Especially in IaC security, we’ve seen clear diminishing returns. Larger LLMs often perform no better than smaller ones, likely because they’re all drawing from the same limited pool of public IaC data.
Instead of relying on generic inference, our approach enhances the LLM with structured context:
• Detailed descriptions of the vulnerability
• Examples of secure code (as close as possible to the vulnerable snippet)
• Guided remediation steps
By augmenting LLMs rather than relying on their pre-trained security knowledge, we saw a major improvement in results.
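As a rough sketch of what that augmentation looks like in practice - the field names and prompt wording here are illustrative, not our exact format:

```python
from dataclasses import dataclass

@dataclass
class RemediationContext:
    """Structured context injected alongside the vulnerable snippet;
    the fields mirror the three ingredients listed above."""
    description: str              # detailed description of the vulnerability
    secure_examples: list[str]    # secure snippets close to the vulnerable one
    remediation_steps: list[str]  # guided, ordered remediation steps

def build_prompt(vulnerable_snippet: str, ctx: RemediationContext) -> str:
    """Assemble an augmented prompt instead of relying on generic inference."""
    examples = "\n---\n".join(ctx.secure_examples)
    steps = "\n".join(f"{i}. {s}" for i, s in enumerate(ctx.remediation_steps, 1))
    return (
        "Fix the following Infrastructure as Code vulnerability.\n\n"
        f"Vulnerability description:\n{ctx.description}\n\n"
        f"Secure reference examples:\n{examples}\n\n"
        f"Remediation steps to follow:\n{steps}\n\n"
        f"Vulnerable code:\n{vulnerable_snippet}\n\n"
        "Return only the corrected code."
    )
```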
Rather than relying on a single AI model, we use specialized AI agents, which drastically improves reliability:
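As a simplified illustration of that idea - the agent roles below are hypothetical, not our actual architecture - a remediation pipeline can chain specialized agents instead of calling one model end to end:

```python
from typing import Callable, Optional

def remediation_pipeline(
    vulnerable_snippet: str,
    finding: str,
    analyst: Callable[[str, str], str],
    fixer: Callable[[str, str], str],
    reviewer: Callable[[str, str], bool],
) -> Optional[str]:
    """Chain of specialized agents: one explains the root cause, one proposes
    a fix, and one reviews the fix before it is surfaced to the developer."""
    analysis = analyst(finding, vulnerable_snippet)     # why is this vulnerable?
    candidate = fixer(vulnerable_snippet, analysis)     # propose a targeted fix
    approved = reviewer(vulnerable_snippet, candidate)  # syntax + security review
    return candidate if approved else None
```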
Beyond just automated remediation, we provide an interactive AI chat so developers can:
Below is an example of AI remediation combined with developer interaction, where more context leads to better AI remediation:
To see it in action for yourself, check out the demos here.
For IaC security remediation, larger LLMs don’t necessarily perform better. There’s a clear plateau effect, showing that classical AI models - trained mostly on publicly available data - have reached their limit in security remediation. Since these models rely on generic security knowledge from open-source repositories, documentation, and research papers, they struggle with real-world, context-specific vulnerabilities.
The next leap in AI remediation isn’t about bigger models - it’s about better context. Specific, high-quality contextual data is now the true fuel for improving AI performance. By injecting precise vulnerability descriptions, secure code patterns, and real-world remediation strategies, we can bypass the plateau and unlock significantly better results.