[Bug] KQL Validation Add Wildcard w/ Space token value (#5753)

* [Bug] KQL Validation Add Wildcard w/ Space token value

## Summary
Fixes KQL parser to support wildcard values containing spaces (e.g., `*S3 Browser*`), which work in Kibana but were rejected by our unit tests.

**Issue:** #5750

## Changes

### Grammar (`lib/kql/kql/kql.g`)
- Added `WILDCARD_LITERAL` token with priority 3 to match wildcard patterns containing spaces
- Uses negative lookahead to stop before `or`/`and`/`not` keywords
- Added to `value` rule (not `literal`) so field names remain unaffected

### Parser (`lib/kql/kql/parser.py`)
- Handle new `WILDCARD_LITERAL` token type as wildcards
- Quoted strings (`"*text*"`) now treated as literals, matching Kibana behavior

## Behavior

| Query | Before | After |
|-------|--------|-------|
| `field: *S3 Browser*` |  Parse error |  Wildcard |
| `field: *test*` |  Wildcard |  Wildcard |
| `common.*: value` |  Works |  Works |
| `field: "*text*"` | Wildcard |  Literal (matches Kibana) |

## Test plan
- [x] All 63 existing KQL unit tests pass
- [x] New wildcard-with-spaces patterns parse correctly
- [x] Wildcard field names (`common.*`) still work
- [x] Keywords (`or`, `and`, `not`) correctly recognized as separators
- [x] Tested against rule file from PR #5694

* update pyproject version

* update kibana and kql pyproject.toml versions

update kibana and kql pyproject.toml versions

* update wildcard_literal pattern to account for false matches with leading keywords

Add Negative lookahead at start of Pattern 2 - uses (?!(?:or|and|not)\b) at the start to prevent matching values that begin with keywords like 'not /path*'

* adding NOT keyword token and support for wildcard in the middle of spaced phrase

# KQL Parser Changes - Wildcard Spaces and NOT Prefix Fix

## Overview

This update fixes two issues in the KQL parser:
1. **Wildcard values with spaces** - Values like `*S3 Browser*` now parse correctly
2. **NOT prefix false match** - Values like `not /tmp/go-build*` are no longer incorrectly consumed as a single wildcard literal

## Files Modified

### `lib/kql/kql/kql.g` (Grammar)

**Added `optional_not` rule** to handle `NOT` as an explicit grammar element:
```
?list_of_values: "(" or_list_of_values ")"
| optional_not value
?optional_not: NOT optional_not
|
```

**Expanded `WILDCARD_LITERAL`** with 4 patterns to support all wildcard-with-space cases:

| Pattern | Description | Example |
|---------|-------------|---------|
| 1 | Starts with `*` | `*S3 Browser`, `*S3 Browser*` |
| 2 | Ends with `*` (doesn't start with `*`) | `S3 Browser*` |
| 3a | `*` appears after a space | `S3 B*owser` |
| 3b | `*` appears before a space | `S3* Browser` |

### `lib/kql/kql/parser.py`

Added methods to handle the new grammar rules:
- `list_of_values()` - handles `optional_not value` structure
- `optional_not()` - counts NOT occurrences and wraps values with `NotValue`

### `lib/kql/kql/kql2eql.py`

Added corresponding methods for EQL conversion:
- `list_of_values()` - handles `optional_not value` structure
- `optional_not()` - counts NOT occurrences and wraps with `eql.ast.Not`

## Test Results

All 63 kuery tests pass. Verified wildcard cases:

| Input | Result |
|-------|--------|
| `field: *S3 Browser*` | `field:*S3\ Browser*` |
| `field: S3 Browser*` | `field:S3\ Browser*` |
| `field: *S3 Browser` | `field:*S3\ Browser` |
| `field: S3 B*owser` | `field:S3\ B*owser` |
| `field: S3* Browser` | `field:S3*\ Browser` |
| `field: foo* bar* baz` | `field:foo*\ bar*\ baz` |
| `process.executable: not /tmp/go-build*` | `not process.executable:/tmp/go-build*` |
| `field < value` | `field < value` (range expression, not wildcard) |

## Technical Notes

### Pattern 3a Fix
Pattern 3a requires at least one character AFTER the `*` (uses `[...]+` instead of `[...]*`). This prevents Pattern 2 from incorrectly matching shorter strings like `S3 B*` when the full value is `S3 B*owser`.

### NOT Keyword Handling
The `optional_not` grammar approach explicitly parses `NOT` as a keyword before the value, preventing it from being consumed as part of a wildcard literal. This is safer than regex-only approaches because:
- `NOT` token only matches the exact word "not" (case-insensitive)
- Values like `notafile*` are still parsed as `UNQUOTED_LITERAL`
- Edge case: literal value "not" must be quoted: `field: "not"`

* Changes to Addresses Review Comments

### Changes to Addresses Review Comments @Mikaayenson

1. **Fixed regex patterns to prevent trailing whitespace capture** (`kql.g`)
   - Added `(?=\s|$|[()":{}])` lookahead to all WILDCARD_LITERAL patterns
   - This ensures patterns stop at boundaries without capturing trailing whitespace

2. **Removed `.rstrip()` workaround** (`parser.py`)
   - No longer needed since regex now handles boundaries correctly

3. **Added explicit WILDCARD_LITERAL handling** (`kql2eql.py`)
   - Now checks `token.type == "WILDCARD_LITERAL"` explicitly
   - Mirrors the approach used in `parser.py`

4. **Added unit tests** (`tests/kuery/test_parser.py`)
   - `test_wildcard_with_spaces` - all 4 WILDCARD_LITERAL patterns
   - `test_wildcard_with_spaces_and_keywords` - wildcards with `and`/`or` boundaries
   - `test_not_prefix_with_wildcard` - NOT keyword not consumed as wildcard
   - `test_quoted_wildcard_as_literal` - quoted wildcards are literal strings
   - `test_triple_not_optimization` - `not not not foo` → `not foo`

* changed test directory from tmp

* changed format of new tests

* Update pyproject.toml

Update pyproject.toml

---------

Co-authored-by: Eric Forte <119343520+eric-forte-elastic@users.noreply.github.com>
This commit is contained in:
Isai
2026-03-18 17:38:24 -04:00
committed by GitHub
parent f84617ba8e
commit 7ae298005d
7 changed files with 147 additions and 9 deletions
+1 -1
View File
@@ -1,6 +1,6 @@
[project]
name = "detection-rules-kibana"
version = "0.4.4"
version = "0.4.5"
description = "Kibana API utilities for Elastic Detection Rules"
license = {text = "Elastic License v2"}
keywords = ["Elastic", "Kibana", "Detection Rules", "Security", "Elasticsearch"]
+19 -1
View File
@@ -15,7 +15,9 @@ field_value_expression: field ":" list_of_values
?value_expression: value
?list_of_values: "(" or_list_of_values ")"
| value
| optional_not value
?optional_not: NOT optional_not
|
?or_list_of_values: and_list_of_values (OR and_list_of_values)*
?and_list_of_values: not_list_of_values (AND not_list_of_values)*
?not_list_of_values: NOT? list_of_values
@@ -23,6 +25,7 @@ field_value_expression: field ":" list_of_values
field: literal
value: QUOTED_STRING
| WILDCARD_LITERAL
| UNQUOTED_LITERAL
@@ -34,6 +37,21 @@ RANGE_OPERATOR: "<="
| "<"
| ">"
// Wildcard literal - for wildcard values containing spaces
// Priority 3 ensures it matches before keywords (priority 2) and unquoted literals
// Uses word boundary \b to stop before 'or', 'and', 'not' keywords
// MUST contain at least one space to differentiate from field names like common.*
// Must not start with not/or/and so "not /tmp/go-build*" is not matched (value is UNQUOTED_LITERAL)
// All patterns end with (?=\s|$|[()":{}]) to prevent capturing trailing whitespace
// Pattern 1: Starts with * (e.g., *S3 Browser, *S3 Browser*)
// Pattern 2: Ends with * but doesn't start with * (e.g., S3 Browser*)
// Pattern 3a: Middle * - star appears AFTER a space (e.g., S3 B*owser)
// Pattern 3b: Middle * - star appears BEFORE a space (e.g., S3* Browser)
WILDCARD_LITERAL.3: /\*[^\s\r\n()"':{}]*(?:\s+(?!(?:or|and|not)\b)[^\s\r\n()"':{}]+)+\*?(?=\s|$|[()":{}])/i
| /(?!(?:not|or|and)\b)[^\s\r\n()"':{}][^\s\r\n()"':{}]*(?:\s+(?!(?:or|and|not)\b)[^\s\r\n()"':{}]+)+\*(?=\s|$|[()":{}])/i
| /(?!(?:not|or|and)\b)[^\s\r\n()"'*:{}][^\s\r\n()"':{}]*\s+(?!(?:or|and|not)\b)[^\s\r\n()"':{}]*\*[^\s\r\n()"':{}]+(?:\s+(?!(?:or|and|not)\b)[^\s\r\n()"':{}]+)*(?<!\*)(?=\s|$|[()":{}])/i
| /(?!(?:not|or|and)\b)[^\s\r\n()"'*:{}][^\s\r\n()"':{}]*\*[^\s\r\n()"':{}]*\s+(?!(?:or|and|not)\b)[^\s\r\n()"':{}]+(?:\s+(?!(?:or|and|not)\b)[^\s\r\n()"':{}]+)*(?<!\*)(?=\s|$|[()":{}])/i
UNQUOTED_LITERAL: UNQUOTED_CHAR+
UNQUOTED_CHAR: "\\" /[trn]/ // escaped whitespace
| "\\" /[\\():<>"*{}]/ // escaped specials
+33 -3
View File
@@ -72,19 +72,49 @@ class KqlToEQL(BaseKqlParser):
def not_list_of_values(self, tree):
return eql.ast.Not(self.visit(tree.children[-1]))
def list_of_values(self, tree):
if len(tree.children) == 2:
# optional_not value
opt_not_tree, value_tree = tree.children
not_count = self.visit(opt_not_tree)
value = self.visit(value_tree)
for _ in range(not_count):
value = eql.ast.Not(value)
return value
else:
# "(" or_list_of_values ")"
return self.visit(tree.children[1])
def optional_not(self, tree):
count = 0
for child in tree.children:
if hasattr(child, "type") and child.type == "NOT":
count += 1
else:
count += self.visit(child)
return count
def field(self, tree):
literal = self.visit(tree.children[0])
return eql.utils.to_unicode(literal)
def value(self, tree):
# TODO: check the logic for kuery.peg
value = self.unescape_literal(tree.children[0])
if self.scoped_field is None:
raise self.error(tree, "Value not tied to field")
token = tree.children[0]
value = self.unescape_literal(token)
field_name = self.scoped_field
field = self.to_eql_field(field_name)
# Handle wildcard literals (may contain spaces) and unquoted literals with wildcards
if token.type == "WILDCARD_LITERAL" or (token.type == "UNQUOTED_LITERAL" and "*" in str(token.value)):
if eql.utils.is_string(value) and value.replace("*", "").strip() == "":
return eql.ast.IsNotNull(field)
value_ast = eql.ast.Literal.from_python(value)
return eql.ast.FunctionCall("wildcard", [field, value_ast])
value = self.convert_value(field_name, value, tree)
value_ast = eql.ast.Literal.from_python(value)
+32 -2
View File
@@ -338,6 +338,28 @@ class KqlParser(BaseKqlParser):
self.assert_lower_token(*tree.child_tokens)
return NotValue(self.visit(tree.children[-1]))
def list_of_values(self, tree):
if len(tree.children) == 2:
# optional_not value
opt_not_tree, value_tree = tree.children
not_count = self.visit(opt_not_tree)
value = self.visit(value_tree)
for _ in range(not_count):
value = NotValue(value)
return value
else:
# "(" or_list_of_values ")"
return self.visit(tree.children[1])
def optional_not(self, tree):
count = 0
for child in tree.children:
if hasattr(child, "type") and child.type == "NOT":
count += 1
else:
count += self.visit(child)
return count
def literal(self, tree):
return self.unescape_literal(tree.children[0])
@@ -353,9 +375,11 @@ class KqlParser(BaseKqlParser):
token = tree.children[0]
value = self.unescape_literal(token)
if token.type == "UNQUOTED_LITERAL" and "*" in token.value:
# Handle wildcard literals (may contain spaces) and unquoted literals with wildcards
if token.type == "WILDCARD_LITERAL" or (token.type == "UNQUOTED_LITERAL" and "*" in token.value):
field_type = self.get_field_type(field_name)
if len(value.replace("*", "")) == 0:
if len(token.value.replace("*", "").strip()) == 0:
return Exists()
if field_type is not None and field_type not in ("keyword", "wildcard"):
@@ -364,6 +388,12 @@ class KqlParser(BaseKqlParser):
return Wildcard(token.value)
# For quoted strings, treat as literal values (wildcards in quotes are literal in Kibana)
# This bypasses Value.from_python's wildcard conversion for quoted strings
if token.type == "QUOTED_STRING":
value = self.convert_value(field_name, value, tree)
return String(eql.utils.to_unicode(value)) if eql.utils.is_string(value) else Value.from_python(value)
# try to convert the value to the appropriate type
# example: 1 -> "1" if the field is actually keyword
value = self.convert_value(field_name, value, tree)
+1 -1
View File
@@ -1,6 +1,6 @@
[project]
name = "detection-rules-kql"
version = "0.1.9"
version = "0.1.10"
description = "Kibana Query Language parser for Elastic Detection Rules"
license = {text = "Elastic License v2"}
keywords = ["Elastic", "sour", "Detection Rules", "Security", "Elasticsearch", "kql"]
+1 -1
View File
@@ -1,6 +1,6 @@
[project]
name = "detection_rules"
version = "1.6.4"
version = "1.6.5"
description = "Detection Rules is the home for rules used by Elastic Security. This repository is used for the development, maintenance, testing, validation, and release of rules for Elastic Securitys Detection Engine."
readme = "README.md"
requires-python = ">=3.12"
+60
View File
@@ -7,12 +7,16 @@ import unittest
import kql
from kql.ast import (
AndExpr,
Exists,
Field,
FieldComparison,
FieldRange,
NotExpr,
NotValue,
Number,
String,
Wildcard,
)
@@ -112,3 +116,59 @@ class ParserTests(unittest.TestCase):
kql.lark_parse('"Test-ServiceDaclPermission" \nor "Update-ExeFunctions"')
kql.lark_parse('"Test-ServiceDaclPermission" or\n "Update-ExeFunctions"')
kql.lark_parse('"Test-ServiceDaclPermissionOr" or\n "Update-ExeAndFunctions"')
def test_wildcard_with_spaces(self):
"""Test wildcard values containing spaces (WILDCARD_LITERAL patterns)."""
# Pattern 1: Starts with * (e.g., *S3 Browser, *S3 Browser*)
self.validate("field: *S3 Browser*", FieldComparison(Field("field"), Wildcard("*S3 Browser*")))
self.validate("field: *S3 Browser", FieldComparison(Field("field"), Wildcard("*S3 Browser")))
# Pattern 2: Ends with * but doesn't start with * (e.g., S3 Browser*)
self.validate("field: S3 Browser*", FieldComparison(Field("field"), Wildcard("S3 Browser*")))
# Pattern 3a: Middle * - star appears AFTER a space (e.g., S3 B*owser)
self.validate("field: S3 B*owser", FieldComparison(Field("field"), Wildcard("S3 B*owser")))
# Pattern 3b: Middle * - star appears BEFORE a space (e.g., S3* Browser)
self.validate("field: S3* Browser", FieldComparison(Field("field"), Wildcard("S3* Browser")))
# Multiple wildcards with spaces
self.validate("field: foo* bar* baz", FieldComparison(Field("field"), Wildcard("foo* bar* baz")))
def test_wildcard_with_spaces_and_keywords(self):
"""Test wildcard values containing spaces followed by keywords."""
# Wildcard followed by 'and' keyword
result = kql.parse("field: *S3 Browser* and other: value", optimize=False)
self.assertIsInstance(result, AndExpr)
# Wildcard followed by 'or' keyword
result = kql.parse("field: S3 Browser* or other: value", optimize=False)
self.assertIsNotNone(result)
# Keywords inside wildcard values (should be part of the wildcard)
self.validate("field: *or something*", FieldComparison(Field("field"), Wildcard("*or something*")))
self.validate("field: *not this*", FieldComparison(Field("field"), Wildcard("*not this*")))
def test_not_prefix_with_wildcard(self):
"""Test NOT keyword is not consumed as part of wildcard literal."""
# NOT prefix should create NotExpr, not be part of the wildcard
result = kql.parse("process.executable: not /test/go-build*", optimize=False)
self.assertIsInstance(result, FieldComparison)
self.assertIsInstance(result.value, NotValue)
self.assertIsInstance(result.value.value, Wildcard)
self.assertEqual(result.value.value.value, "/test/go-build*")
def test_quoted_wildcard_as_literal(self):
"""Test that quoted wildcards are treated as literal strings, not wildcards."""
# Quoted wildcard should be a String, not a Wildcard
self.validate('field: "*text*"', FieldComparison(Field("field"), String("*text*")))
def test_triple_not_optimization(self):
"""Test that triple NOT optimizes correctly: not(not(not(x))) = not(x)."""
# Triple NOT should optimize to single NOT
result = kql.parse("process.name: not not not foo", optimize=True)
# After optimization, not(not(not(foo))) should become not(foo)
# The structure is NotExpr(FieldComparison(..., String(...)))
self.assertIsInstance(result, NotExpr)
self.assertIsInstance(result.expr, FieldComparison)
self.assertEqual(result.expr.value.value, "foo")