How CodeQL Could Have Detected the Critical n8n Vulnerability (CVE-2026-21858)

When CVE-2026-21858 dropped with a perfect CVSS 10.0 score, I wanted to understand not just what the vulnerability was, but why our existing security tooling missed it. The answer reveals a fundamental limitation in pattern-based static analysis and showcases why CodeQL's semantic analysis approach matters for catching this class of vulnerability.

The Vulnerability: Content-Type Confusion in n8n

CVE-2026-21858, nicknamed "Ni8mare" by the Cyera Labs researchers who discovered it, is an unauthenticated arbitrary file read vulnerability in n8n's form webhook handler that chains into remote code execution.

Affected Versions: All versions below 1.121.0

How the Attack Works

When a legitimate file upload occurs via multipart/form-data, n8n's middleware:

Saves the uploaded file to a random temp path like /tmp/upload_abc123
Populates req.body.files with server-controlled metadata

The vulnerable code blindly trusts this files object:

// VULNERABLE CODE (v1.65.0)
export async function prepareFormReturnItem(context) {
    const bodyData = (context.getBodyData().data as IDataObject) ?? {};
    const files = (context.getBodyData().files as IDataObject) ?? {};
 
    for (const key of Object.keys(files)) {
        const file = files[key];
        // Copies file from filepath to persistent storage
        await context.nodeHelpers.copyBinaryFile(
            file.filepath,  // Attacker controls this!
            file.originalFilename,
            file.mimetype,
        );
    }
}

An attacker can send Content-Type: application/json instead of multipart/form-data:

curl -X POST https://target/webhook/form \
  -H "Content-Type: application/json" \
  -d '{
    "files": {
      "upload": {
        "filepath": "/etc/passwd",
        "originalFilename": "stolen.txt"
      }
    }
  }'

Since the code never validates the Content-Type, it processes the JSON body identically to a real file upload, copying /etc/passwd (or any file) to attacker-accessible storage.
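The confusion can be reproduced in a few lines. The sketch below is a simplified model, not n8n's actual middleware: `FakeRequest`, `parseBody`, and `pathsToCopy` are hypothetical names, and the multipart branch just hard-codes the temp path from the example above. It only illustrates that the same handler receives a server-chosen path in one case and an attacker-chosen path in the other.

```typescript
// Hypothetical, simplified model of the request handling described above.
// None of these names are n8n's real internals.
interface UploadedFile {
  filepath: string;
  originalFilename: string;
}

interface ParsedBody {
  data?: Record<string, unknown>;
  files?: Record<string, UploadedFile>;
}

interface FakeRequest {
  contentType: string;
  rawBody: string; // JSON text, or just the submitted filename in the multipart case
}

// Multipart branch: the server picks the temp path.
// JSON branch: the whole body, including any "files" key, comes from the client.
function parseBody(req: FakeRequest): ParsedBody {
  if (req.contentType === 'multipart/form-data') {
    return {
      data: {},
      files: {
        upload: { filepath: '/tmp/upload_abc123', originalFilename: req.rawBody },
      },
    };
  }
  return JSON.parse(req.rawBody) as ParsedBody;
}

// Mirrors the vulnerable handler: reads `files` without checking how the body was parsed.
function pathsToCopy(req: FakeRequest): string[] {
  const files = parseBody(req).files ?? {};
  return Object.keys(files).map((key) => files[key].filepath);
}

const legitimate: FakeRequest = { contentType: 'multipart/form-data', rawBody: 'photo.png' };
const forged: FakeRequest = {
  contentType: 'application/json',
  rawBody: JSON.stringify({
    files: { upload: { filepath: '/etc/passwd', originalFilename: 'stolen.txt' } },
  }),
};

const legitimatePaths = pathsToCopy(legitimate); // path chosen by the server
const forgedPaths = pathsToCopy(forged);         // path chosen by the attacker
```

Nothing in `pathsToCopy` can tell the two results apart, which is the whole problem: the trust decision has to be made from the request's content type, not from the shape of the body.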

The Impact Chain

Arbitrary File Read → Steal n8n credentials/keys → Forge admin session →
Create malicious workflow → Remote Code Execution

This achieves CVSS 10.0 because it's:

Unauthenticated: No login required
Network exploitable: Simple HTTP request
Full compromise: Chains to RCE

The Fix: A Single Validation Check

The fix in v1.121.0 adds content-type validation before processing:

// FIXED CODE (v1.121.0)
import * as a from 'node:assert';
 
export async function prepareFormReturnItem(context) {
    const req = context.getRequestObject();
 
    // THE FIX: Validate content-type first
    a.ok(req.contentType === 'multipart/form-data',
         'Expected multipart/form-data');
 
    const bodyData = (context.getBodyData().data as IDataObject) ?? {};
    const files = (context.getBodyData().files as IDataObject) ?? {};
    // ... rest of code
}

Now JSON requests are rejected before files is ever accessed.
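To make the effect of the check concrete, here is a minimal standalone sketch of the same guard-first ordering. It is an illustration under assumptions, not the n8n code: `FormRequest`, `assertMultipart`, `filePathsFromForm`, and `tryRead` are hypothetical names, and a plain `Error` stands in for the assertion failure.

```typescript
// Hypothetical request shape; only `contentType` and `body.files` matter here.
interface FormRequest {
  contentType?: string;
  body: { files?: Record<string, { filepath: string }> };
}

// Throws unless the body was produced by the multipart parser.
function assertMultipart(req: FormRequest): void {
  if (req.contentType !== 'multipart/form-data') {
    throw new Error('Expected multipart/form-data');
  }
}

function filePathsFromForm(req: FormRequest): string[] {
  assertMultipart(req); // must run before `files` is read
  const files = req.body.files ?? {};
  return Object.keys(files).map((key) => files[key].filepath);
}

// Small helper so both outcomes can be shown as plain strings.
function tryRead(req: FormRequest): string {
  try {
    return filePathsFromForm(req).join(',');
  } catch (error) {
    return `rejected: ${(error as Error).message}`;
  }
}

const jsonAttempt = tryRead({
  contentType: 'application/json',
  body: { files: { upload: { filepath: '/etc/passwd' } } },
});

const multipartAttempt = tryRead({
  contentType: 'multipart/form-data',
  body: { files: { upload: { filepath: '/tmp/upload_abc123' } } },
});
```

The forged JSON request never reaches the line that reads `files`, while the multipart request passes through unchanged.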

Why Semgrep Missed This

When I investigated our Semgrep SAST results, I found that our scans had produced over 3,700 findings across the n8n codebase, yet none of them flagged this vulnerability. The vulnerable files were being scanned - they just produced zero findings.

Running Semgrep locally confirmed this:

$ semgrep --config=auto packages/nodes-base/nodes/Form/utils/utils.ts
Ran 299 rules on 1 file: 0 findings.

The fundamental issue: Semgrep detects the presence of bad patterns, not the absence of good patterns.

What Semgrep Can Match

# Semgrep can only ask: "Does this pattern exist?"
pattern: $X.getBodyData().files

This matches both vulnerable and fixed code because both contain getBodyData().files. A single pattern clause like this cannot express "this pattern exists WITHOUT that other pattern before it."
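The same effect shows up with any presence-only check. The sketch below uses a regular expression as a deliberately naive stand-in. It is not Semgrep's matching engine, and `accessesFiles` and the two snippet strings are illustrative assumptions taken from the code shown earlier.

```typescript
// A naive presence-only rule: "does the text contain .getBodyData().files?"
// This is a plain regular expression, not Semgrep's matcher.
const accessesFiles = /\.getBodyData\(\)\.files\b/;

const vulnerableSnippet = `
const files = (context.getBodyData().files as IDataObject) ?? {};
`;

const fixedSnippet = `
a.ok(req.contentType === 'multipart/form-data', 'Expected multipart/form-data');
const files = (context.getBodyData().files as IDataObject) ?? {};
`;

// Both evaluate to true: the rule cannot tell the two versions apart.
const matchesVulnerable = accessesFiles.test(vulnerableSnippet);
const matchesFixed = accessesFiles.test(fixedSnippet);
```

A rule that fires on both versions is either noise (if enabled) or silence (if disabled), and neither separates the vulnerable code from the fixed code.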

The Pattern Matching Limitation

What Semgrep CAN Detect	What Semgrep CANNOT Detect
`eval(userInput)` - dangerous function	Missing input validation
SQL injection patterns	Missing authentication check
`dangerouslySetInnerHTML` - XSS sink	Missing content-type verification
`child_process.exec(cmd)`	Missing authorization

The vulnerable code has no dangerous pattern - it's just normal TypeScript accessing object properties:

const files = (context.getBodyData().files as IDataObject) ?? {};

This is perfectly valid, safe-looking code. The vulnerability is that a validation check is missing before this line.

Why CodeQL Can Detect This

CodeQL operates fundamentally differently from pattern matchers. It performs taint tracking with barriers - tracing data flow and recognizing when sanitizers block dangerous paths.

The Taint Tracking Model

┌─────────────────────────────────────────────────────────────────┐
│                    CodeQL Taint Analysis                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  SOURCE: context.getBodyData().files                            │
│          └─ Data marked as "tainted" (attacker-controlled)      │
│                                                                 │
│  BARRIER: a.ok(req.contentType === 'multipart/form-data')       │
│           └─ If present before sink, blocks taint flow          │
│                                                                 │
│  SINK: copyBinaryFile(file.filepath, ...)                       │
│        └─ Dangerous file operation                              │
│                                                                 │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  VULNERABLE:  Source ────────────────────────────► Sink         │
│               (no barrier)                         ALERT        │
│                                                                 │
│  FIXED:       Source ────► Barrier ────X (blocked)              │
│                                                    SAFE         │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

CodeQL can express: "Alert if data flows from A to B WITHOUT passing through C" - exactly what's needed for this CVE.
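As a rough mental model, "flows from A to B without passing through C" is a reachability question on a graph. The toy sketch below is an assumption-laden simplification and not how CodeQL is implemented: `FlowGraph` and `flowsWithoutBarrier` are hypothetical names, and the content-type check is modelled as a node on the data path, whereas the real check is a guard on control flow.

```typescript
// Toy data-flow graph: edges point from where a value is produced to where it is used.
type FlowGraph = Record<string, string[]>;

// Returns true if `sink` is reachable from `source` without stepping through a barrier node.
function flowsWithoutBarrier(
  graph: FlowGraph,
  source: string,
  sink: string,
  barriers: string[],
): boolean {
  const visited: Record<string, boolean> = {};
  const queue: string[] = [source];
  while (queue.length > 0) {
    const node = queue.shift() as string;
    if (barriers.indexOf(node) !== -1 || visited[node]) continue; // taint stops at a barrier
    if (node === sink) return true;
    visited[node] = true;
    for (const next of graph[node] ?? []) queue.push(next);
  }
  return false;
}

const SOURCE = 'getBodyData().files';
const SINK = 'copyBinaryFile(arg 0)';
const CHECK = 'contentType check';

// Vulnerable shape: source reaches the sink directly.
const vulnerableGraph: FlowGraph = {
  [SOURCE]: ['files[key]'],
  'files[key]': ['file.filepath'],
  'file.filepath': [SINK],
};

// Fixed shape: every path from the source goes through the check first.
const fixedGraph: FlowGraph = {
  [SOURCE]: [CHECK],
  [CHECK]: ['files[key]'],
  'files[key]': ['file.filepath'],
  'file.filepath': [SINK],
};

const vulnerableAlert = flowsWithoutBarrier(vulnerableGraph, SOURCE, SINK, [CHECK]); // true
const fixedAlert = flowsWithoutBarrier(fixedGraph, SOURCE, SINK, [CHECK]);           // false
```

The alert condition is the path itself, so the vulnerable and fixed versions produce different answers even though both contain the same source and sink.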

Writing the Custom CodeQL Query

I wrote a CodeQL query specifically for this vulnerability pattern:

/**
 * @name CVE-2026-21858: Arbitrary file read via content-type confusion
 * @description Detects when getBodyData().files is accessed without
 *              validating content-type, enabling arbitrary file read
 * @kind problem
 * @problem.severity error
 * @security-severity 10.0
 * @precision high
 * @id js/n8n-content-type-confusion
 * @tags security
 *       external/cwe/cwe-434
 */
 
import javascript
import semmle.javascript.security.dataflow.TaintedPathQuery
 
/**
 * A source of tainted file data from getBodyData().files
 */
class GetBodyDataFilesAccess extends DataFlow::Node {
  GetBodyDataFilesAccess() {
    exists(DataFlow::CallNode getBodyDataCall, DataFlow::PropRead filesRead |
      getBodyDataCall.getCalleeName() = "getBodyData" and
      filesRead.getBase().getALocalSource() = getBodyDataCall and
      filesRead.getPropertyName() = "files" and
      this = filesRead
    )
  }
}
 
/**
 * A sink where file paths are used in file operations
 */
class FilePathSink extends DataFlow::Node {
  FilePathSink() {
    exists(DataFlow::CallNode call |
      call.getCalleeName() = "copyBinaryFile" and
      this = call.getArgument(0)
    )
  }
}
 
/**
 * Content-type validation that acts as a barrier
 */
class ContentTypeValidation extends DataFlow::CallNode {
  ContentTypeValidation() {
    (this.getCalleeName() = "ok" or this.getCalleeName() = "assert") and
    exists(DataFlow::Node arg |
      arg = this.getAnArgument() and
      arg.toString().matches("%contentType%multipart/form-data%")
    )
  }
}
 
/**
 * Configuration for tracking tainted file paths
 */
module ContentTypeConfusionConfig implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node source) {
    source instanceof GetBodyDataFilesAccess
  }
 
  predicate isSink(DataFlow::Node sink) {
    sink instanceof FilePathSink or
    exists(DataFlow::PropRead pr |
      pr.getPropertyName() = "filepath" and
      sink = pr
    )
  }
 
  predicate isBarrier(DataFlow::Node node) {
    exists(ContentTypeValidation validation |
      validation.getFile() = node.getFile() and
      validation.getLocation().getStartLine() <
        node.getLocation().getStartLine()
    )
  }
}
 
module ContentTypeConfusionFlow = TaintTracking::Global<ContentTypeConfusionConfig>;
 
from DataFlow::Node source, DataFlow::Node sink
where ContentTypeConfusionFlow::flow(source, sink)
select sink,
  "Potential arbitrary file read: getBodyData().files accessed without " +
  "content-type validation, allowing attacker to specify arbitrary file paths."

Query Components Explained

Component	What It Matches	Purpose
Source	`context.getBodyData().files`	Where attacker-controlled data enters
Sink	`copyBinaryFile(file.filepath, ...)`	Where the data is used dangerously
Barrier	`a.ok(req.contentType === 'multipart/form-data')`	Validation that makes it safe

Note: The barrier detection here is simplified - it checks if a content-type validation appears earlier in the same file, not whether it's actually in the control flow path. A production query would use CodeQL's control flow analysis to verify the validation dominates the sink. For n8n's codebase where the validation and file access are in the same function, this simplified approach works.
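The gap between "appears earlier" and "dominates" is easy to see on a tiny control-flow graph. The sketch below is illustrative only: `Cfg`, `dominates`, `appearsEarlier`, and the node names are hypothetical, and it uses the textbook definition of dominance (a guard dominates a target if removing the guard makes the target unreachable from the entry), not CodeQL's own libraries.

```typescript
// Control-flow graph: each key is a statement, each value lists its successors.
type Cfg = Record<string, string[]>;

// `guard` dominates `target` if every path from `entry` to `target` passes through `guard`.
// Equivalently: with `guard` removed, `target` cannot be reached from `entry`.
// (A target that is unreachable altogether counts as dominated.)
function dominates(cfg: Cfg, entry: string, guard: string, target: string): boolean {
  const seen: Record<string, boolean> = {};
  const stack: string[] = [entry];
  while (stack.length > 0) {
    const node = stack.pop() as string;
    if (node === guard || seen[node]) continue; // never walk through the guard
    if (node === target) return false;          // found a path that bypasses the guard
    seen[node] = true;
    for (const next of cfg[node] ?? []) stack.push(next);
  }
  return true;
}

// The simplified heuristic from the query: does the guard come first in source order?
function appearsEarlier(order: string[], guard: string, target: string): boolean {
  return order.indexOf(guard) < order.indexOf(target);
}

// Straight-line code, as in the fixed n8n function.
const straightLine: Cfg = {
  entry: ['check'],
  check: ['readFiles'],
  readFiles: ['copy'],
};

// The check sits inside one branch of an `if`, so it can be skipped.
const conditional: Cfg = {
  entry: ['branch'],
  branch: ['check', 'readFiles'],
  check: ['readFiles'],
  readFiles: ['copy'],
};
const conditionalOrder = ['entry', 'branch', 'check', 'readFiles', 'copy'];

const straightLineSafe = dominates(straightLine, 'entry', 'check', 'copy');       // true
const conditionalSafe = dominates(conditional, 'entry', 'check', 'copy');         // false
const conditionalLooksSafe = appearsEarlier(conditionalOrder, 'check', 'copy');   // true
```

In the conditional case the line-order heuristic would treat the check as a barrier even though one branch skips it, which is the kind of false negative a dominance-based barrier avoids.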

Validating the Query

I created test fixtures for both the vulnerable and fixed versions:

Vulnerable Version (v1.65.0 style)

// utils.ts - VULNERABLE
export async function prepareFormReturnItem(
    context: IWebhookFunctions,
    formFields: FormFieldsParameter,
    mode: 'test' | 'production',
) {
    const bodyData = (context.getBodyData().data as IDataObject) ?? {};
    const files = (context.getBodyData().files as IDataObject) ?? {};
 
    for (const key of Object.keys(files)) {
        const file = files[key] as FileUpload;
        await context.nodeHelpers.copyBinaryFile(
            file.filepath,
            file.originalFilename,
            file.mimetype,
        );
    }
}

Fixed Version (v1.121.0 style)

// utils.ts - FIXED
import * as a from 'node:assert';
 
export async function prepareFormReturnItem(
    context: IWebhookFunctions,
    formFields: FormFieldsParameter,
    mode: 'test' | 'production',
) {
    const req = context.getRequestObject() as MultiPartFormData.Request;
    a.ok(req.contentType === 'multipart/form-data',
         'Expected multipart/form-data');
 
    const bodyData = (context.getBodyData().data as IDataObject) ?? {};
    const files = (context.getBodyData().files as IDataObject) ?? {};
    // ... same file processing code
}

Running the Tests

# Create CodeQL databases
$ codeql database create vulnerable-db --language=javascript \
    --source-root=vulnerable
$ codeql database create fixed-db --language=javascript \
    --source-root=fixed
 
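# Run query against vulnerable version
$ codeql database analyze vulnerable-db cve-2026-21858.ql \
    --format=sarifv2.1.0 --output=vulnerable-results.sarif
 
# Run the same query against the fixed version
$ codeql database analyze fixed-db cve-2026-21858.ql \
    --format=sarifv2.1.0 --output=fixed-results.sarif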

Results

Version	Content-Type Check	CodeQL Result
Vulnerable (v1.65.0)	Missing	1 finding - CVE detected
Fixed (v1.121.0)	Present	0 findings - Barrier blocks flow

The SARIF output for the vulnerable version:

{
  "runs": [{
    "results": [{
      "ruleId": "js/n8n-content-type-confusion",
      "message": {
        "text": "Potential arbitrary file read: getBodyData().files accessed without content-type validation"
      },
      "locations": [{
        "physicalLocation": {
          "artifactLocation": { "uri": "utils.ts" },
          "region": { "startLine": 29 }
        }
      }]
    }]
  }]
}
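If the SARIF file is going to gate a build, the relevant fields can be pulled out with a few lines of code. The following is a minimal sketch under assumptions: `SarifLog` covers only the fields used here, `findingsForRule` is a hypothetical helper, and the sample object is the result shown above inlined rather than read from disk.

```typescript
// Minimal subset of the SARIF 2.1.0 structure needed for this sketch.
interface SarifLog {
  runs: Array<{
    results?: Array<{
      ruleId?: string;
      message: { text?: string };
      locations?: Array<{
        physicalLocation?: {
          artifactLocation?: { uri?: string };
          region?: { startLine?: number };
        };
      }>;
    }>;
  }>;
}

// Collects "file:line" strings for every result produced by the given rule.
function findingsForRule(log: SarifLog, ruleId: string): string[] {
  const found: string[] = [];
  for (const run of log.runs) {
    for (const result of run.results ?? []) {
      if (result.ruleId !== ruleId) continue;
      for (const location of result.locations ?? []) {
        const physical = location.physicalLocation;
        const uri = physical?.artifactLocation?.uri ?? '<unknown file>';
        const line = physical?.region?.startLine ?? 0;
        found.push(`${uri}:${line}`);
      }
    }
  }
  return found;
}

// The result from the vulnerable run, inlined for illustration.
const sampleLog: SarifLog = {
  runs: [{
    results: [{
      ruleId: 'js/n8n-content-type-confusion',
      message: {
        text: 'Potential arbitrary file read: getBodyData().files accessed without content-type validation',
      },
      locations: [{
        physicalLocation: {
          artifactLocation: { uri: 'utils.ts' },
          region: { startLine: 29 },
        },
      }],
    }],
  }],
};

const findings = findingsForRule(sampleLog, 'js/n8n-content-type-confusion');
const shouldFailBuild = findings.length > 0; // a CI job could exit non-zero on this
```

In a real pipeline the log would come from parsing the `--output` file, and the exit code of the script would decide whether the build continues.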

Semgrep vs CodeQL: Capability Comparison

Capability	Semgrep	CodeQL
Pattern matching	Yes	Yes
Data flow tracking	Limited	Full
Taint analysis	Basic	Advanced
Barrier/Sanitizer detection	Limited (taint-mode sanitizers)	Yes
"Flow without validation" queries	No	Yes
Custom framework modeling	Limited	Extensive
Query language expressiveness	YAML-based	Full QL (Datalog variant)

When Would CodeQL Have Caught This?

The honest answer: probably not with default rules.

CodeQL's power lies in its ability to write queries like the one above, but:

No built-in rule exists for this n8n-specific pattern
Custom APIs like getBodyData() and copyBinaryFile() aren't standard Node.js APIs
Someone needs to write the query - either security researchers or the n8n team

However, once written, this query could be added to n8n's CI/CD pipeline, shared with the security community, or generalized for similar content-type confusion patterns in other codebases.

The Practical Reality

Here's the problem: most engineering teams aren't going to write custom CodeQL queries. It took me about an hour to develop and validate this one, and I already knew exactly what vulnerability I was looking for. Proactively discovering new vulnerability classes through custom static analysis requires security expertise that many teams don't have in-house.

This is exactly why we built Waclaude. Rather than expecting every team to become CodeQL experts, Waclaude aggregates findings from multiple security tools, correlates them with known CVEs, and helps teams prioritize and auto-remediate vulnerabilities before they ship. When CVE-2026-21858 dropped, teams using Waclaude with SCA scanning were alerted immediately if they were running vulnerable n8n versions.

Different security tools have fundamentally different detection capabilities. Pattern matchers find bad code. Semantic analyzers find missing code. SCA tools find known vulnerabilities in dependencies. No single tool catches everything, and understanding these gaps is the first step to closing them.