NeuroStrike Research
Security Research Team
SQL Injection in AI-Generated Code: Why LLMs Can't Write Safe Database Queries

SQL injection was supposed to be a solved problem. Parameterized queries exist in every language. ORMs handle escaping automatically. Yet here we are in 2025, and AI coding assistants are reintroducing injection vulnerabilities at scale. We tested five LLMs across 1,000 database query generation tasks. The results aren't encouraging.

The Experiment

We prompted GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro, Llama 3.1 70B, and Codestral with 200 database query tasks each. Tasks ranged from simple CRUD ("get user by ID") to complex operations ("full-text search with pagination and filtering"). We evaluated each generated query for injection vulnerabilities using both static analysis and manual review.

The results:

  • Simple CRUD operations: 3-7% injection rate across all models. ORMs used correctly most of the time.
  • Search and filtering: 22-38% injection rate. Models frequently concatenate user input for LIKE clauses and dynamic WHERE conditions.
  • Aggregation and reporting: 15-28% injection rate. Models use raw queries for GROUP BY, window functions, and CTEs.
  • Multi-table operations with dynamic columns: 31-42% injection rate. Dynamic column selection almost always uses string interpolation.

Your AI-built app might have vulnerabilities

Get a full breach simulation with proof-of-concept exploits — not just a header check.

Run a Vibe Scan

Why ORMs Don't Save You

ORMs prevent injection only while developers stay inside the ORM's type-safe API. Every popular ORM ships an escape hatch:

// Prisma escape hatches
prisma.$queryRaw`...`        // Safe: tagged template
prisma.$queryRawUnsafe(...)   // Unsafe: no parameterization
prisma.$executeRaw`...`      // Safe: tagged template
prisma.$executeRawUnsafe(...) // Unsafe: no parameterization
// Drizzle escape hatches
import { sql } from "drizzle-orm";
sql`...`                     // Safe: tagged template
sql.raw(...)                  // Unsafe: raw string injection

LLMs reach for the unsafe variants when the task requires features the ORM can't express cleanly. The model optimizes for "code that works," not "code that's safe."
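The safety difference comes entirely from the tagged-template mechanism: the template's static chunks and the interpolated values never meet as a single string. A minimal sketch of that mechanism (the `sqlTag` helper below is illustrative, not a real Prisma or Drizzle API):

```typescript
// sqlTag mimics what $queryRaw does under the hood: it turns the
// template's static chunks into SQL text with $1, $2, ... placeholders
// and collects the interpolated values as separate bind parameters.
function sqlTag(strings: TemplateStringsArray, ...values: unknown[]) {
  const text = strings.reduce(
    (sql, chunk, i) => sql + chunk + (i < values.length ? `$${i + 1}` : ""),
    "",
  );
  return { text, params: values };
}

const userInput = "'; DROP TABLE products; --";
const query = sqlTag`SELECT * FROM products WHERE name = ${userInput}`;

console.log(query.text);   // SELECT * FROM products WHERE name = $1
console.log(query.params); // [ "'; DROP TABLE products; --" ]
```

The malicious payload never becomes part of the SQL text; it travels to the database driver only as data in `params`.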

The Fundamental Problem: Token Prediction vs. Security Reasoning

An LLM doesn't understand what SQL injection is. It predicts the next most likely token based on training data. When the training data contains millions of examples of string interpolation in SQL — from tutorials, blog posts, Stack Overflow answers, and legacy codebases — the model reproduces that pattern.

Parameterized queries are less common in the training data than raw string concatenation. The signal for "this is the safe way to do it" is weaker than "this is the common way to do it." Safety is a minority pattern in public code.

Stanford's 2023 study ("Do Users Write More Insecure Code with AI Assistants?") found that participants using AI assistants produced significantly less secure code and were more likely to believe their code was secure. The confidence effect is the real danger: developers trust AI output and skip the review they'd normally do.


Real-World Example: Search Endpoint

Here's a real pattern we've extracted from Cursor-generated code (anonymized). The developer asked for a product search with category filtering:

// Generated by AI — SQL injection via searchTerm and category
export async function GET(req: Request) {
  const { searchParams } = new URL(req.url);
  const searchTerm = searchParams.get("q") || "";
  const category = searchParams.get("category") || "";

  let query = 'SELECT * FROM products WHERE 1=1';
  if (searchTerm) {
    query += ` AND name ILIKE '%${searchTerm}%'`;
  }
  if (category) {
    query += ` AND category = '${category}'`;
  }

  const results = await prisma.$queryRawUnsafe(query);
  return Response.json(results);
}
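To see what actually reaches the database, replay the concatenation above with a crafted q value. The payload below is a classic UNION-based probe; the users table, its columns, and the column count are hypothetical, chosen for illustration:

```typescript
// Same string-building as the handler above, with a malicious search term.
const searchTerm = "%' UNION SELECT email, password_hash, NULL FROM users --";

let query = "SELECT * FROM products WHERE 1=1";
query += ` AND name ILIKE '%${searchTerm}%'`;

console.log(query);
// SELECT * FROM products WHERE 1=1 AND name ILIKE '%%' UNION SELECT email, password_hash, NULL FROM users --%'
```

The leading `%'` closes the ILIKE string, the UNION appends attacker-chosen columns, and the trailing `--` comments out the leftover `%'`. The column count and types only need to line up with the products SELECT, which takes an attacker a handful of guesses.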

This is textbook SQL injection. An attacker can extract the entire database through the search parameter. The fix uses Prisma's type-safe API:

// Fixed: use Prisma's query builder
export async function GET(req: Request) {
  const { searchParams } = new URL(req.url);
  const searchTerm = searchParams.get("q") || "";
  const category = searchParams.get("category") || "";

  const results = await prisma.product.findMany({
    where: {
      AND: [
        searchTerm ? { name: { contains: searchTerm, mode: "insensitive" } } : {},
        category ? { category: { equals: category } } : {},
      ],
    },
    take: 50,
  });
  return Response.json(results);
}

Detecting Injection in AI-Generated Code

Static analysis tools like Semgrep can catch obvious cases — $queryRawUnsafe with template literals, sql.raw with variables. But they miss indirect injection where user input flows through multiple functions before reaching the query. Taint tracking is expensive and most free tools don't do it well.

The more reliable approach is dynamic testing: send actual injection payloads to the running application and see if they execute. This is what NeuroStrike's scanner does. It identifies database-backed endpoints, generates injection payloads tailored to the detected ORM and database, and checks for successful exploitation through timing analysis, error-based detection, and out-of-band data exfiltration.
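A minimal version of the timing check can be sketched in a few lines. Everything here is illustrative: the endpoint, the parameter name, and the Postgres pg_sleep payload are assumptions, and a production scanner repeats the measurement to rule out network jitter:

```typescript
// Time-based probe: if pg_sleep(3) executes server-side, the probed
// request takes ~3s longer than the baseline, strong evidence that
// the parameter reaches the database unparameterized.
async function timeRequest(
  fetchFn: typeof fetch,
  url: string,
): Promise<number> {
  const start = Date.now();
  await fetchFn(url);
  return Date.now() - start;
}

async function probeTimingInjection(
  baseUrl: string,
  param: string,
  fetchFn: typeof fetch = fetch,
): Promise<boolean> {
  const baseline = await timeRequest(fetchFn, `${baseUrl}?${param}=widget`);
  const payload = encodeURIComponent("widget'||pg_sleep(3)--");
  const probed = await timeRequest(fetchFn, `${baseUrl}?${param}=${payload}`);
  return probed - baseline > 2500; // ms; well below the injected 3s delay
}
```

Error-based and out-of-band checks follow the same shape: send a payload whose side effect is observable from outside, then look for that side effect rather than trusting the response body.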

Practical Recommendations

  1. Ban $queryRawUnsafe and sql.raw in your ESLint config. Use eslint-plugin-security or write a custom rule.
  2. When you must use raw SQL, use tagged template literals ($queryRaw`...`) that support automatic parameterization.
  3. Validate and sanitize all user input at the API boundary with Zod or similar.
  4. Use database-level permissions: your app's database user should not have DROP or ALTER permissions.
  5. Run dynamic security testing against your deployed app. Static analysis alone misses 40-60% of injection vulnerabilities according to OWASP's benchmark.
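Recommendation 1 can be enforced mechanically with ESLint's built-in no-restricted-syntax rule. A sketch of a flat-config entry (it matches on member names, so it assumes the Prisma client's unsafe methods are called as properties and Drizzle's helper is imported as `sql`):

```javascript
// eslint.config.js: fail the lint on unsafe raw-query escape hatches
module.exports = [
  {
    rules: {
      "no-restricted-syntax": [
        "error",
        {
          selector:
            "MemberExpression[property.name=/^\\$(queryRawUnsafe|executeRawUnsafe)$/]",
          message:
            "Use the tagged-template $queryRaw/$executeRaw: they parameterize interpolated values.",
        },
        {
          selector:
            "CallExpression[callee.object.name='sql'][callee.property.name='raw']",
          message:
            "sql.raw() injects strings verbatim; use the sql`` tagged template.",
        },
      ],
    },
  },
];
```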

The models will get better at avoiding injection over time. But "better" isn't "solved." Until LLMs can reason about security semantics rather than predict common patterns, every AI-generated database query needs verification.

