sigma-rules/pyproject.toml
Isai 7ae298005d [Bug] KQL Validation Add Wildcard w/ Space token value (#5753)
* [Bug] KQL Validation Add Wildcard w/ Space token value

## Summary
Fixes KQL parser to support wildcard values containing spaces (e.g., `*S3 Browser*`), which work in Kibana but were rejected by our unit tests.

**Issue:** #5750

## Changes

### Grammar (`lib/kql/kql/kql.g`)
- Added `WILDCARD_LITERAL` token with priority 3 to match wildcard patterns containing spaces
- Uses negative lookahead to stop before `or`/`and`/`not` keywords
- Added to `value` rule (not `literal`) so field names remain unaffected

### Parser (`lib/kql/kql/parser.py`)
- Handle new `WILDCARD_LITERAL` token type as wildcards
- Quoted strings (`"*text*"`) now treated as literals, matching Kibana behavior

## Behavior

| Query | Before | After |
|-------|--------|-------|
| `field: *S3 Browser*` | Parse error | Wildcard |
| `field: *test*` | Wildcard | Wildcard |
| `common.*: value` | Works | Works |
| `field: "*text*"` | Wildcard | Literal (matches Kibana) |

## Test plan
- [x] All 63 existing KQL unit tests pass
- [x] New wildcard-with-spaces patterns parse correctly
- [x] Wildcard field names (`common.*`) still work
- [x] Keywords (`or`, `and`, `not`) correctly recognized as separators
- [x] Tested against rule file from PR #5694

* update pyproject version

* update kibana and kql pyproject.toml versions

* update wildcard_literal pattern to account for false matches with leading keywords

Add a negative lookahead, `(?!(?:or|and|not)\b)`, at the start of Pattern 2 so that it no longer matches values that begin with a keyword, such as `not /path*`.
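
The effect of the leading lookahead can be shown with a minimal Python sketch. Both regexes are simplified stand-ins for Pattern 2 (a value ending in `*`), not the patterns in `kql.g`; `NAIVE` and `GUARDED` are hypothetical names.

```python
import re

# Simplified stand-in for "ends with *": words separated by single spaces,
# followed by a trailing "*". Not the real grammar pattern.
BODY = r"[^\s*]+(?: [^\s*]+)*\*"

# Without the guard, a leading keyword is swallowed into the value.
NAIVE = re.compile(BODY, re.IGNORECASE)

# With the guard, a value may not begin with or/and/not as a whole word.
GUARDED = re.compile(r"(?!(?:or|and|not)\b)" + BODY, re.IGNORECASE)

query_value = "not /tmp/go-build*"
print(NAIVE.match(query_value))    # matches the whole string, keyword included
print(GUARDED.match(query_value))  # None: "not" is left for the grammar

# A word that merely starts with "not" is still accepted.
print(GUARDED.match("notafile*").group(0))
```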

* adding NOT keyword token and support for wildcard in the middle of spaced phrase

# KQL Parser Changes - Wildcard Spaces and NOT Prefix Fix

## Overview

This update fixes two issues in the KQL parser:
1. **Wildcard values with spaces** - Values like `*S3 Browser*` now parse correctly
2. **NOT prefix false match** - Values like `not /tmp/go-build*` are no longer incorrectly consumed as a single wildcard literal

## Files Modified

### `lib/kql/kql/kql.g` (Grammar)

**Added `optional_not` rule** to handle `NOT` as an explicit grammar element:
```
?list_of_values: "(" or_list_of_values ")"
    | optional_not value
?optional_not: NOT optional_not
    |
```

**Expanded `WILDCARD_LITERAL`** with 4 patterns to support all wildcard-with-space cases:

| Pattern | Description | Example |
|---------|-------------|---------|
| 1 | Starts with `*` | `*S3 Browser`, `*S3 Browser*` |
| 2 | Ends with `*` (doesn't start with `*`) | `S3 Browser*` |
| 3a | `*` appears after a space | `S3 B*owser` |
| 3b | `*` appears before a space | `S3* Browser` |

### `lib/kql/kql/parser.py`

Added methods to handle the new grammar rules:
- `list_of_values()` - handles `optional_not value` structure
- `optional_not()` - counts NOT occurrences and wraps values with `NotValue`

### `lib/kql/kql/kql2eql.py`

Added corresponding methods for EQL conversion:
- `list_of_values()` - handles `optional_not value` structure
- `optional_not()` - counts NOT occurrences and wraps with `eql.ast.Not`

## Test Results

All 63 kuery tests pass. Verified wildcard cases:

| Input | Result |
|-------|--------|
| `field: *S3 Browser*` | `field:*S3\ Browser*` |
| `field: S3 Browser*` | `field:S3\ Browser*` |
| `field: *S3 Browser` | `field:*S3\ Browser` |
| `field: S3 B*owser` | `field:S3\ B*owser` |
| `field: S3* Browser` | `field:S3*\ Browser` |
| `field: foo* bar* baz` | `field:foo*\ bar*\ baz` |
| `process.executable: not /tmp/go-build*` | `not process.executable:/tmp/go-build*` |
| `field < value` | `field < value` (range expression, not wildcard) |
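
The normalized output in the table can be approximated with a short Python sketch. The `render` helper is hypothetical, and it assumes that spaces are the only characters escaped and that a negation is hoisted in front of the field, as the rows above suggest.

```python
def render(field: str, value: str, negated: bool = False) -> str:
    """Render a field/value pair in the normalized form shown in the table."""
    # Assumption: only spaces inside the value need a backslash escape.
    escaped = value.replace(" ", "\\ ")
    expr = f"{field}:{escaped}"
    # A NOT in front of the value is rendered in front of the whole expression.
    return f"not {expr}" if negated else expr


print(render("field", "*S3 Browser*"))
print(render("field", "foo* bar* baz"))
print(render("process.executable", "/tmp/go-build*", negated=True))
```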

## Technical Notes

### Pattern 3a Fix
Pattern 3a requires at least one character AFTER the `*` (uses `[...]+` instead of `[...]*`). This prevents Pattern 2 from incorrectly matching shorter strings like `S3 B*` when the full value is `S3 B*owser`.

### NOT Keyword Handling
The `optional_not` grammar approach explicitly parses `NOT` as a keyword before the value, preventing it from being consumed as part of a wildcard literal. This is safer than regex-only approaches because:
- `NOT` token only matches the exact word "not" (case-insensitive)
- Values like `notafile*` are still parsed as `UNQUOTED_LITERAL`
- Edge case: literal value "not" must be quoted: `field: "not"`

* Changes to Address Review Comments

### Changes to Address Review Comments from @Mikaayenson

1. **Fixed regex patterns to prevent trailing whitespace capture** (`kql.g`)
   - Added `(?=\s|$|[()":{}])` lookahead to all WILDCARD_LITERAL patterns
   - This ensures patterns stop at boundaries without capturing trailing whitespace

2. **Removed `.rstrip()` workaround** (`parser.py`)
   - No longer needed since regex now handles boundaries correctly

3. **Added explicit WILDCARD_LITERAL handling** (`kql2eql.py`)
   - Now checks `token.type == "WILDCARD_LITERAL"` explicitly
   - Mirrors the approach used in `parser.py`

4. **Added unit tests** (`tests/kuery/test_parser.py`)
   - `test_wildcard_with_spaces` - all 4 WILDCARD_LITERAL patterns
   - `test_wildcard_with_spaces_and_keywords` - wildcards with `and`/`or` boundaries
   - `test_not_prefix_with_wildcard` - NOT keyword not consumed as wildcard
   - `test_quoted_wildcard_as_literal` - quoted wildcards are literal strings
   - `test_triple_not_optimization` - `not not not foo` → `not foo`
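
For item 1, the difference between capturing and not capturing trailing whitespace can be illustrated with a minimal Python sketch. Both regexes are hypothetical simplifications, not the patterns in `kql.g`; only the `(?=\s|$|[()":{}])` lookahead is taken from the description above.

```python
import re

# Loose stand-in: whitespace is allowed anywhere, so trailing spaces before
# a delimiter end up inside the match and would need .rstrip() afterwards.
LOOSE = re.compile(r'\*[^()":{}]*')

# Bounded stand-in: a space is consumed only when another word follows, and
# the lookahead checks, without consuming, that the match ends at whitespace,
# end of input, or a delimiter.
BOUNDED = re.compile(
    r'\*[^\s()":{}]+(?: +[^\s()":{}]+)*(?=\s|$|[()":{}])'
)

text = "*S3 Browser  )"
print(repr(LOOSE.match(text).group(0)))    # includes the trailing spaces
print(repr(BOUNDED.match(text).group(0)))  # stops right after the last word
```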

* changed test directory from tmp

* changed format of new tests

* Update pyproject.toml

---------

Co-authored-by: Eric Forte <119343520+eric-forte-elastic@users.noreply.github.com>
2026-03-18 17:38:24 -04:00

[project]
name = "detection_rules"
version = "1.6.5"
description = "Detection Rules is the home for rules used by Elastic Security. This repository is used for the development, maintenance, testing, validation, and release of rules for Elastic Security's Detection Engine."
readme = "README.md"
requires-python = ">=3.12"
license = {file = "LICENSE.txt"}
keywords = ["Detection Rules", "Continuous Monitoring", "Data Protection", "Elastic", "Elastic Endgame", "Endpoint Security"]
classifiers = [
"Topic :: Software Development :: Build Tools",
"Operating System :: OS Independent",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.12",
"Topic :: Security",
"Topic :: Software Development :: Libraries :: Python Modules",
"Topic :: Software Development :: Libraries",
"Topic :: Software Development :: Testing",
"Topic :: Software Development",
"Topic :: Utilities"
]
dependencies = [
"Click~=8.3.0",
"elasticsearch~=8.12.1",
"eql==0.9.19",
"jsl==0.2.4",
"jsonschema>=4.21.1",
"marko==2.2.1",
"marshmallow-dataclass==8.7.1",
"marshmallow-jsonschema~=0.13.0",
"marshmallow-union~=0.1.15",
"marshmallow~=3.26.1",
"pywin32 ; platform_system=='Windows'",
# FIXME: pytoml is outdated and should not be used
"pytoml==0.1.21",
"PyYAML~=6.0.1",
"requests~=2.31.0",
"toml==0.10.2",
"typing-inspect==0.9.0",
"typing-extensions>=4.12",
"XlsxWriter~=3.2.0",
"semver==3.0.4",
"PyGithub==2.8.1",
"detection-rules-kql @ git+https://github.com/elastic/detection-rules.git#subdirectory=lib/kql",
"detection-rules-kibana @ git+https://github.com/elastic/detection-rules.git#subdirectory=lib/kibana",
"setuptools==78.1.1"
]
[project.optional-dependencies]
dev = [
"pep8-naming==0.15.1",
"flake8==7.3.0",
"pyflakes==3.4.0",
"pytest>=8.1.1",
"nodeenv==1.9.1",
"pre-commit==3.8.0",
"ruff>=0.11",
"pyright>=1.1",
]
hunting = ["tabulate==0.9.0"]
[project.urls]
"Homepage" = "https://github.com/elastic/detection-rules"
"Bug Reports" = "https://github.com/elastic/detection-rules/issues"
"Research" = "https://www.elastic.co/security-labs"
"Elastic" = "https://www.elastic.co"
[build-system]
requires = ["setuptools", "wheel", "setuptools_scm"]
build-backend = "setuptools.build_meta"
[tool.setuptools]
package-data = {"kql" = ["*.g"]}
packages = ["detection_rules", "hunting"]
[tool.pytest.ini_options]
filterwarnings = [
"ignore::DeprecationWarning"
]
[tool.ruff]
line-length = 120
indent-width = 4
include = [
"pyproject.toml",
"detection_rules/**/*.py",
"hunting/**/*.py",
"tests/**/*.py",
]
show-fixes = true
[tool.ruff.lint]
select = [
"E", # pycodestyle
"F", # Pyflakes
"UP", # pyupgrade
"B", # flake8-bugbear
"SIM", # flake8-simplify
"I", # isort
"N", # pep8-naming
"UP", # pyupgrade
"YTT", # flake8-2020
"ANN", # flake8-annotations
"ASYNC", # flake8-async
"S", # flake8-bandit
"BLE", # flake8-blind-except
"B", # flake8-bugbear
"A", # flake8-builtins
"COM", # flake8-commas
"C4", # flake8-comprehensions
"DTZ", # flake8-datetimez
"T10", # flake8-debugger
"DJ", # flake8-django
"EM", # flake8-errmsg
"EXE", # flake8-executable
"ISC", # flake8-implicit-str-concat
"ICN", # flake8-import-conventions
"G", # flake8-logging-format
"INP", # flake8-no-pep420
"PIE", # flake8-pie
"PYI", # flake8-pyi
"PT", # flake8-pytest-style
"Q", # flake8-quotes
"RSE", # flake8-raise
"RET", # flake8-return
"SLF", # flake8-self
"SLOT", # flake8-slots
"TID", # flake8-tidy-imports
"TCH", # flake8-type-checking
"INT", # flake8-gettext
"ARG", # flake8-unused-arguments
"PTH", # flake8-use-pathlib
"TD", # flake8-todos
"FIX", # flake8-fixme
"ERA", # eradicate
"PGH", # pygrep-hooks
"PL", # Pylint
"TRY", # tryceratops
"FLY", # flynt
"PERF", # Perflint
"RUF", # Ruff-specific rules
]
ignore = [
"ANN401", # any-type
"EM101", # raw-string-in-exception
"EM102", # f-string-in-exception
"PT009", # pytest-unittest-assertion
"TRY003", # raise-vanilla-args
"N815", # mixed-case-variable-in-class-scope
"PLC0415", # import-outside-top-level, erratic behavior
"S603", # subprocess-without-shell-equals-true, prone to false positives
"COM812", # missing-trailing-comma, might cause issues with ruff formatter
]
[tool.ruff.lint.per-file-ignores]
"tests/*" = [
"ANN001", # missing-type-function-argument
"ANN002", # missing-type-args
"ANN003", # missing-type-kwargs
"ANN101", # missing-type-self
"ANN102", # missing-type-cls
"ANN201", # missing-return-type-undocumented-public-function
"ANN202", # missing-return-type-private-function
"ANN205", # missing-return-type-static-method
"ARG001", # unused-function-argument
"ANN206", # missing-return-type-class-method
"PLR2004", # magic-value-comparison
"SIM300", # yoda-conditions
"S101", # assert
"PT009", # pytest-unittest-assertion
"PT012", # pytest-raises-with-multiple-statements
"PT027", # pytest-unittest-raises-assertion
"FIX001", # line-contains-fixme
"FIX002", # line-contains-todo
# FIXME: the long static strings should be moved to the resource files
"E501", # line-too-long
# FIXME: we should avoid TODOs in the code as much as possible
"TD002", # missing-todo-author
"TD003", # missing-todo-link
]
[tool.pyright]
include = [
"detection_rules/",
"hunting/",
]
exclude = [
"tests/",
]
reportMissingTypeStubs = true
reportUnusedCallResult = "error"
typeCheckingMode = "strict"