[Bug] KQL Validation Add Wildcard w/ Space token value (#5753)

* [Bug] KQL Validation Add Wildcard w/ Space token value ## Summary Fixes KQL parser to support wildcard values containing spaces (e.g., `*S3 Browser*`), which work in Kibana but were rejected by our unit tests. **Issue:** #5750 ## Changes ### Grammar (`lib/kql/kql/kql.g`) - Added `WILDCARD_LITERAL` token with priority 3 to match wildcard patterns containing spaces - Uses negative lookahead to stop before `or`/`and`/`not` keywords - Added to `value` rule (not `literal`) so field names remain unaffected ### Parser (`lib/kql/kql/parser.py`) - Handle new `WILDCARD_LITERAL` token type as wildcards - Quoted strings (`"*text*"`) now treated as literals, matching Kibana behavior ## Behavior | Query | Before | After | |-------|--------|-------| | `field: *S3 Browser*` | ❌ Parse error | ✅ Wildcard | | `field: *test*` | ✅ Wildcard | ✅ Wildcard | | `common.*: value` | ✅ Works | ✅ Works | | `field: "*text*"` | Wildcard | ✅ Literal (matches Kibana) | ## Test plan - [x] All 63 existing KQL unit tests pass - [x] New wildcard-with-spaces patterns parse correctly - [x] Wildcard field names (`common.*`) still work - [x] Keywords (`or`, `and`, `not`) correctly recognized as separators - [x] Tested against rule file from PR #5694 * update pyproject version * update kibana and kql pyproject.toml versions update kibana and kql pyproject.toml versions * update wildcard_literal pattern to account for false matches with leading keywords Add Negative lookahead at start of Pattern 2 - uses (?!(?:or|and|not)\b) at the start to prevent matching values that begin with keywords like 'not /path*' * adding NOT keyword token and support for wildcard in the middle of spaced phrase # KQL Parser Changes - Wildcard Spaces and NOT Prefix Fix ## Overview This update fixes two issues in the KQL parser: 1. **Wildcard values with spaces** - Values like `*S3 Browser*` now parse correctly 2. **NOT prefix false match** - Values like `not /tmp/go-build*` are no longer incorrectly consumed as a single wildcard literal ## Files Modified ### `lib/kql/kql/kql.g` (Grammar) **Added `optional_not` rule** to handle `NOT` as an explicit grammar element: ``` ?list_of_values: "(" or_list_of_values ")" | optional_not value ?optional_not: NOT optional_not | ``` **Expanded `WILDCARD_LITERAL`** with 4 patterns to support all wildcard-with-space cases: | Pattern | Description | Example | |---------|-------------|---------| | 1 | Starts with `*` | `*S3 Browser`, `*S3 Browser*` | | 2 | Ends with `*` (doesn't start with `*`) | `S3 Browser*` | | 3a | `*` appears after a space | `S3 B*owser` | | 3b | `*` appears before a space | `S3* Browser` | ### `lib/kql/kql/parser.py` Added methods to handle the new grammar rules: - `list_of_values()` - handles `optional_not value` structure - `optional_not()` - counts NOT occurrences and wraps values with `NotValue` ### `lib/kql/kql/kql2eql.py` Added corresponding methods for EQL conversion: - `list_of_values()` - handles `optional_not value` structure - `optional_not()` - counts NOT occurrences and wraps with `eql.ast.Not` ## Test Results All 63 kuery tests pass. Verified wildcard cases: | Input | Result | |-------|--------| | `field: *S3 Browser*` | `field:*S3\ Browser*` | | `field: S3 Browser*` | `field:S3\ Browser*` | | `field: *S3 Browser` | `field:*S3\ Browser` | | `field: S3 B*owser` | `field:S3\ B*owser` | | `field: S3* Browser` | `field:S3*\ Browser` | | `field: foo* bar* baz` | `field:foo*\ bar*\ baz` | | `process.executable: not /tmp/go-build*` | `not process.executable:/tmp/go-build*` | | `field < value` | `field < value` (range expression, not wildcard) | ## Technical Notes ### Pattern 3a Fix Pattern 3a requires at least one character AFTER the `*` (uses `[...]+` instead of `[...]*`). This prevents Pattern 2 from incorrectly matching shorter strings like `S3 B*` when the full value is `S3 B*owser`. ### NOT Keyword Handling The `optional_not` grammar approach explicitly parses `NOT` as a keyword before the value, preventing it from being consumed as part of a wildcard literal. This is safer than regex-only approaches because: - `NOT` token only matches the exact word "not" (case-insensitive) - Values like `notafile*` are still parsed as `UNQUOTED_LITERAL` - Edge case: literal value "not" must be quoted: `field: "not"` * Changes to Addresses Review Comments ### Changes to Addresses Review Comments @Mikaayenson 1. **Fixed regex patterns to prevent trailing whitespace capture** (`kql.g`) - Added `(?=\s|$|[()":{}])` lookahead to all WILDCARD_LITERAL patterns - This ensures patterns stop at boundaries without capturing trailing whitespace 2. **Removed `.rstrip()` workaround** (`parser.py`) - No longer needed since regex now handles boundaries correctly 3. **Added explicit WILDCARD_LITERAL handling** (`kql2eql.py`) - Now checks `token.type == "WILDCARD_LITERAL"` explicitly - Mirrors the approach used in `parser.py` 4. **Added unit tests** (`tests/kuery/test_parser.py`) - `test_wildcard_with_spaces` - all 4 WILDCARD_LITERAL patterns - `test_wildcard_with_spaces_and_keywords` - wildcards with `and`/`or` boundaries - `test_not_prefix_with_wildcard` - NOT keyword not consumed as wildcard - `test_quoted_wildcard_as_literal` - quoted wildcards are literal strings - `test_triple_not_optimization` - `not not not foo` → `not foo` * changed test directory from tmp * changed format of new tests * Update pyproject.toml Update pyproject.toml --------- Co-authored-by: Eric Forte <119343520+eric-forte-elastic@users.noreply.github.com>
2026-03-18 17:38:24 -04:00
parent f84617ba8e
commit 7ae298005d
7 changed files with 147 additions and 9 deletions
@@ -1,6 +1,6 @@
 [project]
 name = "detection-rules-kibana"
-version = "0.4.4"
+version = "0.4.5"
 description = "Kibana API utilities for Elastic Detection Rules"
 license = {text = "Elastic License v2"}
 keywords = ["Elastic", "Kibana", "Detection Rules", "Security", "Elasticsearch"]
@@ -15,7 +15,9 @@ field_value_expression: field ":" list_of_values
 ?value_expression: value

 ?list_of_values: "(" or_list_of_values ")"
-               | value
+               | optional_not value
+?optional_not: NOT optional_not
+              |
 ?or_list_of_values:  and_list_of_values (OR and_list_of_values)*
 ?and_list_of_values: not_list_of_values (AND not_list_of_values)*
 ?not_list_of_values: NOT? list_of_values
@@ -23,6 +25,7 @@ field_value_expression: field ":" list_of_values
 field: literal

 value: QUOTED_STRING
+      | WILDCARD_LITERAL
      | UNQUOTED_LITERAL


@@ -34,6 +37,21 @@ RANGE_OPERATOR: "<="
              | "<"
              | ">"

+// Wildcard literal - for wildcard values containing spaces
+// Priority 3 ensures it matches before keywords (priority 2) and unquoted literals
+// Uses word boundary \b to stop before 'or', 'and', 'not' keywords
+// MUST contain at least one space to differentiate from field names like common.*
+// Must not start with not/or/and so "not /tmp/go-build*" is not matched (value is UNQUOTED_LITERAL)
+// All patterns end with (?=\s|$|[()":{}]) to prevent capturing trailing whitespace
+// Pattern 1: Starts with * (e.g., *S3 Browser, *S3 Browser*)
+// Pattern 2: Ends with * but doesn't start with * (e.g., S3 Browser*)
+// Pattern 3a: Middle * - star appears AFTER a space (e.g., S3 B*owser)
+// Pattern 3b: Middle * - star appears BEFORE a space (e.g., S3* Browser)
+WILDCARD_LITERAL.3: /\*[^\s\r\n()"':{}]*(?:\s+(?!(?:or|and|not)\b)[^\s\r\n()"':{}]+)+\*?(?=\s|$|[()":{}])/i
+                  | /(?!(?:not|or|and)\b)[^\s\r\n()"':{}][^\s\r\n()"':{}]*(?:\s+(?!(?:or|and|not)\b)[^\s\r\n()"':{}]+)+\*(?=\s|$|[()":{}])/i
+                  | /(?!(?:not|or|and)\b)[^\s\r\n()"'*:{}][^\s\r\n()"':{}]*\s+(?!(?:or|and|not)\b)[^\s\r\n()"':{}]*\*[^\s\r\n()"':{}]+(?:\s+(?!(?:or|and|not)\b)[^\s\r\n()"':{}]+)*(?<!\*)(?=\s|$|[()":{}])/i
+                  | /(?!(?:not|or|and)\b)[^\s\r\n()"'*:{}][^\s\r\n()"':{}]*\*[^\s\r\n()"':{}]*\s+(?!(?:or|and|not)\b)[^\s\r\n()"':{}]+(?:\s+(?!(?:or|and|not)\b)[^\s\r\n()"':{}]+)*(?<!\*)(?=\s|$|[()":{}])/i
+
 UNQUOTED_LITERAL: UNQUOTED_CHAR+
 UNQUOTED_CHAR: "\\" /[trn]/              // escaped whitespace
             | "\\" /[\\():<>"*{}]/      // escaped specials
@@ -72,19 +72,49 @@ class KqlToEQL(BaseKqlParser):
    def not_list_of_values(self, tree):
        return eql.ast.Not(self.visit(tree.children[-1]))

+    def list_of_values(self, tree):
+        if len(tree.children) == 2:
+            # optional_not value
+            opt_not_tree, value_tree = tree.children
+            not_count = self.visit(opt_not_tree)
+            value = self.visit(value_tree)
+            for _ in range(not_count):
+                value = eql.ast.Not(value)
+            return value
+        else:
+            # "(" or_list_of_values ")"
+            return self.visit(tree.children[1])
+
+    def optional_not(self, tree):
+        count = 0
+        for child in tree.children:
+            if hasattr(child, "type") and child.type == "NOT":
+                count += 1
+            else:
+                count += self.visit(child)
+        return count
+
    def field(self, tree):
        literal = self.visit(tree.children[0])
        return eql.utils.to_unicode(literal)

    def value(self, tree):
-        # TODO: check the logic for kuery.peg
-        value = self.unescape_literal(tree.children[0])
-
        if self.scoped_field is None:
            raise self.error(tree, "Value not tied to field")

+        token = tree.children[0]
+        value = self.unescape_literal(token)
+
        field_name = self.scoped_field
        field = self.to_eql_field(field_name)
+
+        # Handle wildcard literals (may contain spaces) and unquoted literals with wildcards
+        if token.type == "WILDCARD_LITERAL" or (token.type == "UNQUOTED_LITERAL" and "*" in str(token.value)):
+            if eql.utils.is_string(value) and value.replace("*", "").strip() == "":
+                return eql.ast.IsNotNull(field)
+            value_ast = eql.ast.Literal.from_python(value)
+            return eql.ast.FunctionCall("wildcard", [field, value_ast])
+
        value = self.convert_value(field_name, value, tree)
        value_ast = eql.ast.Literal.from_python(value)

@@ -338,6 +338,28 @@ class KqlParser(BaseKqlParser):
        self.assert_lower_token(*tree.child_tokens)
        return NotValue(self.visit(tree.children[-1]))

+    def list_of_values(self, tree):
+        if len(tree.children) == 2:
+            # optional_not value
+            opt_not_tree, value_tree = tree.children
+            not_count = self.visit(opt_not_tree)
+            value = self.visit(value_tree)
+            for _ in range(not_count):
+                value = NotValue(value)
+            return value
+        else:
+            # "(" or_list_of_values ")"
+            return self.visit(tree.children[1])
+
+    def optional_not(self, tree):
+        count = 0
+        for child in tree.children:
+            if hasattr(child, "type") and child.type == "NOT":
+                count += 1
+            else:
+                count += self.visit(child)
+        return count
+
    def literal(self, tree):
        return self.unescape_literal(tree.children[0])

@@ -353,9 +375,11 @@ class KqlParser(BaseKqlParser):
        token = tree.children[0]
        value = self.unescape_literal(token)

-        if token.type == "UNQUOTED_LITERAL" and "*" in token.value:
+        # Handle wildcard literals (may contain spaces) and unquoted literals with wildcards
+        if token.type == "WILDCARD_LITERAL" or (token.type == "UNQUOTED_LITERAL" and "*" in token.value):
            field_type = self.get_field_type(field_name)
-            if len(value.replace("*", "")) == 0:
+
+            if len(token.value.replace("*", "").strip()) == 0:
                return Exists()

            if field_type is not None and field_type not in ("keyword", "wildcard"):
@@ -364,6 +388,12 @@ class KqlParser(BaseKqlParser):

            return Wildcard(token.value)

+        # For quoted strings, treat as literal values (wildcards in quotes are literal in Kibana)
+        # This bypasses Value.from_python's wildcard conversion for quoted strings
+        if token.type == "QUOTED_STRING":
+            value = self.convert_value(field_name, value, tree)
+            return String(eql.utils.to_unicode(value)) if eql.utils.is_string(value) else Value.from_python(value)
+
        # try to convert the value to the appropriate type
        # example: 1 -> "1" if the field is actually keyword
        value = self.convert_value(field_name, value, tree)
@@ -1,6 +1,6 @@
 [project]
 name = "detection-rules-kql"
-version = "0.1.9"
+version = "0.1.10"
 description = "Kibana Query Language parser for Elastic Detection Rules"
 license = {text = "Elastic License v2"}
 keywords = ["Elastic", "sour", "Detection Rules", "Security", "Elasticsearch", "kql"]
@@ -1,6 +1,6 @@
 [project]
 name = "detection_rules"
-version = "1.6.4"
+version = "1.6.5"
 description = "Detection Rules is the home for rules used by Elastic Security. This repository is used for the development, maintenance, testing, validation, and release of rules for Elastic Security’s Detection Engine."
 readme = "README.md"
 requires-python = ">=3.12"
@@ -7,12 +7,16 @@ import unittest

 import kql
 from kql.ast import (
+    AndExpr,
    Exists,
    Field,
    FieldComparison,
    FieldRange,
+    NotExpr,
+    NotValue,
    Number,
    String,
+    Wildcard,
 )


@@ -112,3 +116,59 @@ class ParserTests(unittest.TestCase):
        kql.lark_parse('"Test-ServiceDaclPermission" \nor "Update-ExeFunctions"')
        kql.lark_parse('"Test-ServiceDaclPermission" or\n "Update-ExeFunctions"')
        kql.lark_parse('"Test-ServiceDaclPermissionOr" or\n "Update-ExeAndFunctions"')
+
+    def test_wildcard_with_spaces(self):
+        """Test wildcard values containing spaces (WILDCARD_LITERAL patterns)."""
+        # Pattern 1: Starts with * (e.g., *S3 Browser, *S3 Browser*)
+        self.validate("field: *S3 Browser*", FieldComparison(Field("field"), Wildcard("*S3 Browser*")))
+        self.validate("field: *S3 Browser", FieldComparison(Field("field"), Wildcard("*S3 Browser")))
+
+        # Pattern 2: Ends with * but doesn't start with * (e.g., S3 Browser*)
+        self.validate("field: S3 Browser*", FieldComparison(Field("field"), Wildcard("S3 Browser*")))
+
+        # Pattern 3a: Middle * - star appears AFTER a space (e.g., S3 B*owser)
+        self.validate("field: S3 B*owser", FieldComparison(Field("field"), Wildcard("S3 B*owser")))
+
+        # Pattern 3b: Middle * - star appears BEFORE a space (e.g., S3* Browser)
+        self.validate("field: S3* Browser", FieldComparison(Field("field"), Wildcard("S3* Browser")))
+
+        # Multiple wildcards with spaces
+        self.validate("field: foo* bar* baz", FieldComparison(Field("field"), Wildcard("foo* bar* baz")))
+
+    def test_wildcard_with_spaces_and_keywords(self):
+        """Test wildcard values containing spaces followed by keywords."""
+        # Wildcard followed by 'and' keyword
+        result = kql.parse("field: *S3 Browser* and other: value", optimize=False)
+        self.assertIsInstance(result, AndExpr)
+
+        # Wildcard followed by 'or' keyword
+        result = kql.parse("field: S3 Browser* or other: value", optimize=False)
+        self.assertIsNotNone(result)
+
+        # Keywords inside wildcard values (should be part of the wildcard)
+        self.validate("field: *or something*", FieldComparison(Field("field"), Wildcard("*or something*")))
+        self.validate("field: *not this*", FieldComparison(Field("field"), Wildcard("*not this*")))
+
+    def test_not_prefix_with_wildcard(self):
+        """Test NOT keyword is not consumed as part of wildcard literal."""
+        # NOT prefix should create NotExpr, not be part of the wildcard
+        result = kql.parse("process.executable: not /test/go-build*", optimize=False)
+        self.assertIsInstance(result, FieldComparison)
+        self.assertIsInstance(result.value, NotValue)
+        self.assertIsInstance(result.value.value, Wildcard)
+        self.assertEqual(result.value.value.value, "/test/go-build*")
+
+    def test_quoted_wildcard_as_literal(self):
+        """Test that quoted wildcards are treated as literal strings, not wildcards."""
+        # Quoted wildcard should be a String, not a Wildcard
+        self.validate('field: "*text*"', FieldComparison(Field("field"), String("*text*")))
+
+    def test_triple_not_optimization(self):
+        """Test that triple NOT optimizes correctly: not(not(not(x))) = not(x)."""
+        # Triple NOT should optimize to single NOT
+        result = kql.parse("process.name: not not not foo", optimize=True)
+        # After optimization, not(not(not(foo))) should become not(foo)
+        # The structure is NotExpr(FieldComparison(..., String(...)))
+        self.assertIsInstance(result, NotExpr)
+        self.assertIsInstance(result.expr, FieldComparison)
+        self.assertEqual(result.expr.value.value, "foo")