When CVE-2026-21858 dropped with a perfect CVSS 10.0 score, I wanted to understand not just what the vulnerability was, but why our existing security tooling missed it. The answer reveals a fundamental limitation in pattern-based static analysis and showcases why CodeQL's semantic analysis approach matters for catching this class of vulnerability.
The Vulnerability: Content-Type Confusion in n8n
CVE-2026-21858, nicknamed "Ni8mare" by the Cyera Labs researchers who discovered it, is an unauthenticated arbitrary file read vulnerability in n8n's form webhook handler that chains into remote code execution.
Affected Versions: All versions below 1.121.0
How the Attack Works
When a legitimate file upload occurs via multipart/form-data, n8n's middleware:
- Saves the uploaded file to a random temp path like /tmp/upload_abc123
- Populates req.body.files with server-controlled metadata
The vulnerable code blindly trusts this files object:
// VULNERABLE CODE (v1.65.0)
export async function prepareFormReturnItem(context) {
  const bodyData = (context.getBodyData().data as IDataObject) ?? {};
  const files = (context.getBodyData().files as IDataObject) ?? {};
  for (const key of Object.keys(files)) {
    const file = files[key];
    // Copies file from filepath to persistent storage
    await context.nodeHelpers.copyBinaryFile(
      file.filepath, // Attacker controls this!
      file.originalFilename,
      file.mimetype,
    );
  }
}
An attacker can send Content-Type: application/json instead of multipart/form-data:
curl -X POST https://target/webhook/form \
  -H "Content-Type: application/json" \
  -d '{
    "files": {
      "upload": {
        "filepath": "/etc/passwd",
        "originalFilename": "stolen.txt"
      }
    }
  }'
Since the code never validates the Content-Type, it processes the JSON body identically to a real file upload, copying /etc/passwd (or any file) to attacker-accessible storage.
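The confusion can be reproduced in miniature. The sketch below is a simplified model, not n8n's actual middleware: `parseBody`, `pathsToCopy`, and `FakeRequest` are hypothetical names invented for illustration. It shows that a handler which reads `files` without checking how the body was parsed cannot tell a server-generated temp path from one the client typed into a JSON document.

```typescript
// Simplified model of the confusion. All names are hypothetical stand-ins,
// not n8n's real middleware or handler.
type UploadedFile = { filepath: string; originalFilename: string };
type ParsedBody = { files?: Record<string, UploadedFile> };

interface FakeRequest {
  contentType: string;
  rawBody: string;
}

// Stand-in for the body parser: multipart uploads get a server-chosen temp
// path, while JSON bodies are passed through exactly as the client sent them.
function parseBody(req: FakeRequest): ParsedBody {
  if (req.contentType === 'multipart/form-data') {
    return {
      files: {
        upload: { filepath: '/tmp/upload_abc123', originalFilename: req.rawBody },
      },
    };
  }
  return JSON.parse(req.rawBody) as ParsedBody;
}

// Stand-in for the vulnerable handler: it reads `files` without asking how
// the body was parsed, and returns the paths it would go on to copy.
function pathsToCopy(req: FakeRequest): string[] {
  const files = parseBody(req).files ?? {};
  return Object.values(files).map((file) => file.filepath);
}

const legitimate: FakeRequest = {
  contentType: 'multipart/form-data',
  rawBody: 'report.pdf',
};
const forged: FakeRequest = {
  contentType: 'application/json',
  rawBody: JSON.stringify({
    files: { upload: { filepath: '/etc/passwd', originalFilename: 'stolen.txt' } },
  }),
};

console.log(pathsToCopy(legitimate)); // path chosen by the server
console.log(pathsToCopy(forged)); // path chosen by the attacker
```

Both requests reach the same code path; only the origin of `filepath` differs, and nothing in the handler records that origin.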
The Impact Chain
Arbitrary File Read → Steal n8n credentials/keys → Forge admin session →
Create malicious workflow → Remote Code Execution
This achieves CVSS 10.0 because it's:
- Unauthenticated: No login required
- Network exploitable: Simple HTTP request
- Full compromise: Chains to RCE
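To see how those three properties add up to a 10.0, here is a sketch of the CVSS v3.1 base score arithmetic. The vector used is my assumption (AV:N/AC:L/PR:N/UI:N with high confidentiality, integrity, and availability impact), not something taken from the advisory; the weights and formulas come from the published CVSS v3.1 specification. Under that assumption, the score only reaches 10.0 when scope is changed, which is what "chains to RCE" would correspond to.

```typescript
// CVSS v3.1 metric weights from the published specification.
// The choice of metric values below is an assumed vector for illustration.
const AV_NETWORK = 0.85; // Attack Vector: Network
const AC_LOW = 0.77; // Attack Complexity: Low
const PR_NONE = 0.85; // Privileges Required: None
const UI_NONE = 0.85; // User Interaction: None
const CIA_HIGH = 0.56; // Confidentiality / Integrity / Availability: High

// Roundup as defined in the CVSS v3.1 specification: round up to one decimal
// place while avoiding floating-point artifacts.
function roundUp(value: number): number {
  const scaled = Math.round(value * 100000);
  return scaled % 10000 === 0
    ? scaled / 100000
    : (Math.floor(scaled / 10000) + 1) / 10;
}

function baseScore(scopeChanged: boolean): number {
  // Impact sub-score with all three impact metrics set to High.
  const iss = 1 - Math.pow(1 - CIA_HIGH, 3);
  const impact = scopeChanged
    ? 7.52 * (iss - 0.029) - 3.25 * Math.pow(iss - 0.02, 15)
    : 6.42 * iss;
  const exploitability = 8.22 * AV_NETWORK * AC_LOW * PR_NONE * UI_NONE;
  if (impact <= 0) return 0;
  const raw = scopeChanged
    ? 1.08 * (impact + exploitability)
    : impact + exploitability;
  return roundUp(Math.min(raw, 10));
}

console.log(baseScore(true)); // scope changed
console.log(baseScore(false)); // scope unchanged
```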
The Fix: A Single Validation Check
The fix in v1.121.0 adds content-type validation before processing:
// FIXED CODE (v1.121.0)
export async function prepareFormReturnItem(context) {
  const req = context.getRequestObject();
  // THE FIX: Validate content-type first
  a.ok(req.contentType === 'multipart/form-data',
    'Expected multipart/form-data');
  const bodyData = (context.getBodyData().data as IDataObject) ?? {};
  const files = (context.getBodyData().files as IDataObject) ?? {};
  // ... rest of code
}
Now JSON requests are rejected before files is ever accessed.
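The same guard can be sketched outside of n8n. This is a minimal, dependency-free illustration rather than the project's implementation: `mediaType`, `assertMultipart`, and `readFilesSafely` are hypothetical helpers, and the header parsing assumes a raw Content-Type header that may carry a `boundary` parameter (n8n's `req.contentType` already appears to hold just the media type).

```typescript
// Hypothetical guard, written without any framework. Not n8n's actual code.

// Extract the bare media type from a raw header such as
// "multipart/form-data; boundary=----abc".
function mediaType(contentTypeHeader: string | undefined): string {
  const [type = ''] = (contentTypeHeader ?? '').split(';');
  return type.trim().toLowerCase();
}

// Throw unless the request really is a multipart upload.
function assertMultipart(contentTypeHeader: string | undefined): void {
  if (mediaType(contentTypeHeader) !== 'multipart/form-data') {
    throw new Error('Expected multipart/form-data');
  }
}

// The check runs first, so `files` is never read for a non-multipart body.
function readFilesSafely(
  contentTypeHeader: string | undefined,
  body: { files?: Record<string, { filepath: string }> },
): string[] {
  assertMultipart(contentTypeHeader);
  return Object.values(body.files ?? {}).map((file) => file.filepath);
}

const forgedBody = { files: { upload: { filepath: '/etc/passwd' } } };

try {
  readFilesSafely('application/json', forgedBody);
} catch (error) {
  console.log((error as Error).message); // the forged request is rejected
}
```

The important design point is ordering: the assertion sits above the first read of `files`, so there is no code path on which attacker-supplied metadata is used before the request type is known.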
Why Semgrep Missed This
When I investigated our Semgrep SAST results, I found that our scans had produced over 3,700 findings across the n8n codebase, but none of them flagged this vulnerability. The vulnerable files were being scanned; they just produced zero findings.
Running Semgrep locally confirmed this:
$ semgrep --config=auto packages/nodes-base/nodes/Form/utils/utils.ts
Ran 299 rules on 1 file: 0 findings.
The fundamental issue: Semgrep detects the presence of bad patterns, not the absence of good patterns.
What Semgrep Can Match
# Semgrep can only ask: "Does this pattern exist?"
pattern: $X.getBodyData().files
This matches both vulnerable and fixed code because both contain getBodyData().files. A single pattern like this cannot express "this pattern exists WITHOUT that other pattern before it."
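The "presence, not absence" problem is easy to demonstrate with a toy matcher. A regular expression is a much cruder tool than Semgrep's syntax-aware matching, so treat this purely as a stand-in for the idea; `filesAccess` and `matchesPattern` are names made up for the sketch.

```typescript
// Toy stand-in for a syntactic rule: it only asks whether the text occurs.
const filesAccess = /\.getBodyData\(\)\.files\b/;

const vulnerableSnippet = `
const files = (context.getBodyData().files as IDataObject) ?? {};
`;

const fixedSnippet = `
a.ok(req.contentType === 'multipart/form-data', 'Expected multipart/form-data');
const files = (context.getBodyData().files as IDataObject) ?? {};
`;

function matchesPattern(source: string): boolean {
  return filesAccess.test(source);
}

// Both snippets match, so the rule either flags the fixed code too
// (a false positive) or has to be dropped (a false negative).
console.log(matchesPattern(vulnerableSnippet), matchesPattern(fixedSnippet));
```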
The Pattern Matching Limitation
| What Semgrep CAN Detect | What Semgrep CANNOT Detect |
|---|---|
| eval(userInput) - dangerous function | Missing input validation |
| SQL injection patterns | Missing authentication check |
| dangerouslySetInnerHTML - XSS sink | Missing content-type verification |
| child_process.exec(cmd) | Missing authorization |
The vulnerable code has no dangerous pattern - it's just normal TypeScript accessing object properties:
const files = (context.getBodyData().files as IDataObject) ?? {};
This is perfectly valid, safe-looking code. The vulnerability is that a validation check is missing before this line.
Why CodeQL Can Detect This
CodeQL operates fundamentally differently from pattern matchers. It performs taint tracking with barriers - tracing data flow and recognizing when sanitizers block dangerous paths.
The Taint Tracking Model
┌─────────────────────────────────────────────────────────────────┐
│ CodeQL Taint Analysis │
├─────────────────────────────────────────────────────────────────┤
│ │
│ SOURCE: context.getBodyData().files │
│ └─ Data marked as "tainted" (attacker-controlled) │
│ │
│ BARRIER: a.ok(req.contentType === 'multipart/form-data') │
│ └─ If present before sink, blocks taint flow │
│ │
│ SINK: copyBinaryFile(file.filepath, ...) │
│ └─ Dangerous file operation │
│ │
├─────────────────────────────────────────────────────────────────┤
│ │
│ VULNERABLE: Source ────────────────────────────► Sink │
│ (no barrier) ALERT │
│ │
│ FIXED: Source ────► Barrier ────X (blocked) │
│ SAFE │
│ │
└─────────────────────────────────────────────────────────────────┘
CodeQL can express: "Alert if data flows from A to B WITHOUT passing through C" - exactly what's needed for this CVE.
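The "flows from A to B without passing through C" question is, at its core, graph reachability with some nodes removed. The sketch below models that on a hand-built graph that mirrors the diagram; it is not how CodeQL is implemented, and the node names and the `flowsWithoutBarrier` helper are illustrative assumptions.

```typescript
// Toy data-flow graph: node name -> nodes that receive its value.
type FlowGraph = Map<string, string[]>;

// Breadth-first search from source to sink that refuses to walk through
// any node marked as a barrier.
function flowsWithoutBarrier(
  graph: FlowGraph,
  source: string,
  sink: string,
  barriers: Set<string>,
): boolean {
  const seen = new Set<string>();
  const queue: string[] = [source];
  while (queue.length > 0) {
    const node = queue.shift() as string;
    if (node === sink) return true;
    if (seen.has(node) || barriers.has(node)) continue;
    seen.add(node);
    for (const next of graph.get(node) ?? []) queue.push(next);
  }
  return false;
}

// Vulnerable shape: the files object reaches copyBinaryFile directly.
const vulnerableFlow: FlowGraph = new Map([
  ['getBodyData().files', ['file.filepath']],
  ['file.filepath', ['copyBinaryFile']],
]);

// Fixed shape, drawn as in the diagram: the check sits on the path.
const fixedFlow: FlowGraph = new Map([
  ['getBodyData().files', ['contentTypeCheck']],
  ['contentTypeCheck', ['file.filepath']],
  ['file.filepath', ['copyBinaryFile']],
]);

const barrierNodes = new Set(['contentTypeCheck']);

console.log(
  flowsWithoutBarrier(vulnerableFlow, 'getBodyData().files', 'copyBinaryFile', barrierNodes),
  flowsWithoutBarrier(fixedFlow, 'getBodyData().files', 'copyBinaryFile', barrierNodes),
);
```

An alert corresponds to the first call returning true; the second returns false because every path to the sink runs through the barrier.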
Writing the Custom CodeQL Query
I wrote a CodeQL query specifically for this vulnerability pattern:
/**
 * @name CVE-2026-21858: Arbitrary file read via content-type confusion
 * @description Detects when getBodyData().files is accessed without
 *              validating content-type, enabling arbitrary file read
 * @kind problem
 * @problem.severity error
 * @security-severity 10.0
 * @precision high
 * @id js/n8n-content-type-confusion
 * @tags security
 *       external/cwe/cwe-434
 */
import javascript
import semmle.javascript.security.dataflow.TaintedPathQuery
/**
 * A source of tainted file data from getBodyData().files
 */
class GetBodyDataFilesAccess extends DataFlow::Node {
  GetBodyDataFilesAccess() {
    exists(DataFlow::CallNode getBodyDataCall, DataFlow::PropRead filesRead |
      getBodyDataCall.getCalleeName() = "getBodyData" and
      filesRead.getBase().getALocalSource() = getBodyDataCall and
      filesRead.getPropertyName() = "files" and
      this = filesRead
    )
  }
}
/**
 * A sink where file paths are used in file operations
 */
class FilePathSink extends DataFlow::Node {
  FilePathSink() {
    exists(DataFlow::CallNode call |
      call.getCalleeName() = "copyBinaryFile" and
      this = call.getArgument(0)
    )
  }
}
/**
 * Content-type validation that acts as a barrier
 */
class ContentTypeValidation extends DataFlow::CallNode {
  ContentTypeValidation() {
    (this.getCalleeName() = "ok" or this.getCalleeName() = "assert") and
    exists(DataFlow::Node arg |
      arg = this.getAnArgument() and
      arg.toString().matches("%contentType%multipart/form-data%")
    )
  }
}
/**
 * Configuration for tracking tainted file paths
 */
module ContentTypeConfusionConfig implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node source) {
    source instanceof GetBodyDataFilesAccess
  }
  predicate isSink(DataFlow::Node sink) {
    sink instanceof FilePathSink or
    exists(DataFlow::PropRead pr |
      pr.getPropertyName() = "filepath" and
      sink = pr
    )
  }
  predicate isBarrier(DataFlow::Node node) {
    exists(ContentTypeValidation validation |
      validation.getFile() = node.getFile() and
      validation.getLocation().getStartLine() <
        node.getLocation().getStartLine()
    )
  }
}
module ContentTypeConfusionFlow = TaintTracking::Global<ContentTypeConfusionConfig>;
from DataFlow::Node source, DataFlow::Node sink
where ContentTypeConfusionFlow::flow(source, sink)
select sink,
  "Potential arbitrary file read: getBodyData().files accessed without " +
    "content-type validation, allowing attacker to specify arbitrary file paths."
Query Components Explained
| Component | What It Matches | Purpose |
|---|---|---|
| Source | context.getBodyData().files | Where attacker-controlled data enters |
| Sink | copyBinaryFile(file.filepath, ...) | Where the data is used dangerously |
| Barrier | a.ok(req.contentType === 'multipart/form-data') | Validation that makes it safe |
Note: The barrier detection here is simplified - it checks if a content-type validation appears earlier in the same file, not whether it's actually in the control flow path. A production query would use CodeQL's control flow analysis to verify the validation dominates the sink. For n8n's codebase where the validation and file access are in the same function, this simplified approach works.
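To make the difference between "appears earlier in the file" and "dominates the sink" concrete, here is a small sketch of the textbook iterative dominator computation on a toy control-flow graph. It is not CodeQL's API; `Cfg`, `dominators`, `dominates`, and the node names are all invented for the example. A node A dominates B when every path from the entry to B passes through A.

```typescript
// Toy control-flow graph: node name -> successor node names.
// Every node must appear as a key, even if it has no successors.
type Cfg = Map<string, string[]>;

// Iterative dominator computation:
//   dom(entry) = {entry}
//   dom(n)     = {n} plus the intersection of dom(p) over all predecessors p
function dominators(cfg: Cfg, entry: string): Map<string, Set<string>> {
  const nodes = Array.from(cfg.keys());
  const preds = new Map<string, string[]>();
  nodes.forEach((n) => preds.set(n, []));
  cfg.forEach((succs, from) => succs.forEach((to) => preds.get(to)?.push(from)));

  // Start from "everything dominates everything" and shrink to a fixpoint.
  const dom = new Map<string, Set<string>>();
  nodes.forEach((n) => dom.set(n, new Set(n === entry ? [entry] : nodes)));

  let changed = true;
  while (changed) {
    changed = false;
    for (const n of nodes) {
      if (n === entry) continue;
      const ps = preds.get(n) ?? [];
      let next = new Set<string>(ps.length > 0 ? dom.get(ps[0]) : []);
      for (const p of ps.slice(1)) {
        const pd = dom.get(p) ?? new Set<string>();
        next = new Set(Array.from(next).filter((x) => pd.has(x)));
      }
      next.add(n);
      // Sets only ever shrink, so a size change means the set changed.
      if (next.size !== (dom.get(n)?.size ?? 0)) {
        dom.set(n, next);
        changed = true;
      }
    }
  }
  return dom;
}

function dominates(cfg: Cfg, entry: string, a: string, b: string): boolean {
  return dominators(cfg, entry).get(b)?.has(a) ?? false;
}

// Straight-line function, like the fixed n8n code: the check always runs.
const straightLine: Cfg = new Map([
  ['entry', ['check']],
  ['check', ['readFiles']],
  ['readFiles', ['sink']],
  ['sink', []],
]);

// The check lives in only one branch: it appears earlier in the source text,
// but the other branch reaches the sink without it.
const branching: Cfg = new Map([
  ['entry', ['check', 'skip']],
  ['check', ['sink']],
  ['skip', ['sink']],
  ['sink', []],
]);

console.log(dominates(straightLine, 'entry', 'check', 'sink'));
console.log(dominates(branching, 'entry', 'check', 'sink'));
```

The line-number heuristic in the query would treat both graphs as safe, since the check precedes the sink textually in each. Only the dominance test separates them.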
Validating the Query
I created test fixtures for both the vulnerable and fixed versions:
Vulnerable Version (v1.65.0 style)
// utils.ts - VULNERABLE
export async function prepareFormReturnItem(
  context: IWebhookFunctions,
  formFields: FormFieldsParameter,
  mode: 'test' | 'production',
) {
  const bodyData = (context.getBodyData().data as IDataObject) ?? {};
  const files = (context.getBodyData().files as IDataObject) ?? {};
  for (const key of Object.keys(files)) {
    const file = files[key] as FileUpload;
    await context.nodeHelpers.copyBinaryFile(
      file.filepath,
      file.originalFilename,
      file.mimetype,
    );
  }
}
Fixed Version (v1.121.0 style)
// utils.ts - FIXED
import * as a from 'node:assert';
export async function prepareFormReturnItem(
  context: IWebhookFunctions,
  formFields: FormFieldsParameter,
  mode: 'test' | 'production',
) {
  const req = context.getRequestObject() as MultiPartFormData.Request;
  a.ok(req.contentType === 'multipart/form-data',
    'Expected multipart/form-data');
  const bodyData = (context.getBodyData().data as IDataObject) ?? {};
  const files = (context.getBodyData().files as IDataObject) ?? {};
  // ... same file processing code
}
Running the Tests
# Create CodeQL databases
$ codeql database create vulnerable-db --language=javascript \
    --source-root=vulnerable
$ codeql database create fixed-db --language=javascript \
    --source-root=fixed
# Run query against vulnerable version
$ codeql database analyze vulnerable-db cve-2026-21858.ql \
    --format=sarifv2.1.0 --output=vulnerable-results.sarif
# Run the same query against the fixed version
$ codeql database analyze fixed-db cve-2026-21858.ql \
    --format=sarifv2.1.0 --output=fixed-results.sarif
Results
| Version | Content-Type Check | CodeQL Result |
|---|---|---|
| Vulnerable (v1.65.0) | Missing | 1 finding - CVE detected |
| Fixed (v1.121.0) | Present | 0 findings - Barrier blocks flow |
The SARIF output for the vulnerable version:
{
  "runs": [{
    "results": [{
      "ruleId": "js/n8n-content-type-confusion",
      "message": {
        "text": "Potential arbitrary file read: getBodyData().files accessed without content-type validation"
      },
      "locations": [{
        "physicalLocation": {
          "artifactLocation": { "uri": "utils.ts" },
          "region": { "startLine": 29 }
        }
      }]
    }]
  }]
}
Semgrep vs CodeQL: Capability Comparison
| Capability | Semgrep | CodeQL |
|---|---|---|
| Pattern matching | Yes | Yes |
| Data flow tracking | Limited | Full |
| Taint analysis | Basic | Advanced |
| Barrier/Sanitizer detection | No | Yes |
| "Flow without validation" queries | No | Yes |
| Custom framework modeling | Limited | Extensive |
| Query language expressiveness | YAML-based | Full QL (Datalog variant) |
When Would CodeQL Have Caught This?
The honest answer: probably not with default rules.
CodeQL's power lies in letting you write queries like the one above, but:
- No built-in rule exists for this n8n-specific pattern
- Custom APIs like getBodyData() and copyBinaryFile() aren't standard Node.js APIs
- Someone needs to write the query - either security researchers or the n8n team
However, once written, this query could be added to n8n's CI/CD pipeline, shared with the security community, or generalized for similar content-type confusion patterns in other codebases.
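As a rough idea of what the CI step could look like, the sketch below counts results for one rule in a SARIF log and decides whether the build should fail. It only relies on the `runs[].results[].ruleId` shape shown in the SARIF excerpt earlier; `countFindings` and `shouldFailBuild` are hypothetical helpers, and the inline JSON stands in for a results file that a real pipeline would read from disk.

```typescript
// Minimal slice of the SARIF structure that this check needs.
interface SarifResult {
  ruleId?: string;
}
interface SarifLog {
  runs?: { results?: SarifResult[] }[];
}

// Count how many results in the log were produced by the given rule.
function countFindings(sarifJson: string, ruleId: string): number {
  const log = JSON.parse(sarifJson) as SarifLog;
  let count = 0;
  for (const run of log.runs ?? []) {
    for (const result of run.results ?? []) {
      if (result.ruleId === ruleId) count += 1;
    }
  }
  return count;
}

// Gate: any finding for the rule blocks the merge.
function shouldFailBuild(sarifJson: string, ruleId: string): boolean {
  return countFindings(sarifJson, ruleId) > 0;
}

const RULE_ID = 'js/n8n-content-type-confusion';

// Inline samples standing in for vulnerable-results.sarif and a clean run.
const vulnerableSarif = JSON.stringify({
  runs: [{ results: [{ ruleId: RULE_ID }] }],
});
const cleanSarif = JSON.stringify({ runs: [{ results: [] }] });

console.log(shouldFailBuild(vulnerableSarif, RULE_ID));
console.log(shouldFailBuild(cleanSarif, RULE_ID));
```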
The Practical Reality
Here's the problem: most engineering teams aren't going to write custom CodeQL queries. It took me about an hour to develop and validate this one, and I already knew exactly what vulnerability I was looking for. Proactively discovering new vulnerability classes through custom static analysis requires security expertise that many teams don't have in-house.
This is exactly why we built Waclaude. Rather than expecting every team to become CodeQL experts, Waclaude aggregates findings from multiple security tools, correlates them with known CVEs, and helps teams prioritize and auto-remediate vulnerabilities before they ship. When CVE-2026-21858 dropped, teams using Waclaude with SCA scanning were alerted immediately if they were running vulnerable n8n versions.
Different security tools have fundamentally different detection capabilities. Pattern matchers find bad code. Semantic analyzers find missing code. SCA tools find known vulnerabilities in dependencies. No single tool catches everything, and understanding these gaps is the first step to closing them.