[FR] Add copilot instructions to catch the gotchas (#5733)
Co-authored-by: Eric Forte <119343520+eric-forte-elastic@users.noreply.github.com> Co-authored-by: Isai <59296946+imays11@users.noreply.github.com> Co-authored-by: Ruben Groenewoud <78494512+Aegrah@users.noreply.github.com> Co-authored-by: Samirbous <64742097+Samirbous@users.noreply.github.com> Co-authored-by: Jonhnathan <26856693+w0rk3r@users.noreply.github.com>
This commit is contained in:
committed by
GitHub
parent
2605d38018
commit
25f3d6a879
@@ -0,0 +1,190 @@
|
||||
---
|
||||
applyTo:
|
||||
- rules/**/*.toml
|
||||
- rules_building_block/**/*.toml
|
||||
---
|
||||
|
||||
|
||||
You are a **detection engineer**. Your task is to **review pull requests** (PRs) to improve the quality of detection rules in the `detection-rules` repository. Take your time, think carefully, and perform a thorough review before writing any suggestions.
|
||||
|
||||
This repository supports multiple query languages (**KQL**, **EQL**, **ES|QL**) and multiple rule types (**query**, **eql**, **esql**, **threshold**, **new_terms**, **threat_match**, **machine_learning**). Each has distinct fields, constraints, and performance implications. Review accordingly.
|
||||
|
||||
There are different types of pull requests:
|
||||
|
||||
- **New rule:** The PR name often starts with `[New Rule]`
|
||||
- **Tuning an existing rule:** The PR name often starts with `[Tuning]`
|
||||
- **Deprecating a rule:** The PR name often starts with `[Deprecation]`
|
||||
- **New/tuning hunt:** The PR name often starts with `[Hunt]`
|
||||
- **Mixed PRs:** May combine tuning, new rules, or deprecations
|
||||
|
||||
For each rule `.toml` file in the PR, review the **metadata**, **rule fields**, and **query** sections and their respective "metadata" and "rule" TOML tables and "query" key using the following guidance:
|
||||
|
||||
---
|
||||
|
||||
### <Metadata>
|
||||
|
||||
- Report typos in `rule.name` and `rule.description`.
|
||||
- For new rules, `creation_date` should match the date the PR was first opened.
|
||||
- `updated_date` should match `creation_date` for new rules, and the date the PR was opened for tunings.
|
||||
- Deprecations happen across **two separate PRs in two releases**:
|
||||
- **First PR (release N):** Prepend `"Deprecated - "` to `rule.name` and references to the rule name, such as `### Investigating <rule.name>`. The rule stays at its current maturity. No other field changes besides `updated_date`.
|
||||
- **Second PR (release N+1):** Set `maturity = "deprecated"`, add `deprecation_date`, and move the rule to `rules/_deprecated/`.
|
||||
- `updated_date` and `deprecation_date` should match the date of the current PR merge.
|
||||
- Assess and suggest a better `rule.name` if the existing name does not accurately reflect the rule's query logic and detection scope.
|
||||
- Assess and suggest improvements to `rule.description` — limit to two or three sentences that clearly summarize the query logic and detection scope.
|
||||
- `min_stack_version` should support the widest stack versions unless the rule uses features exclusive to a newer stack version.
|
||||
- If `min_stack_comments` is set, verify it explains why `min_stack_version` is restricted.
|
||||
- Report any inconsistencies in the **MITRE ATT&CK** mappings: missing relevant techniques, unrelated techniques, or missing subtechniques where applicable. When possible, suggest the most accurate and up-to-date technique/subtechnique mappings based on the query logic.
|
||||
- Review the **references** section and report inconsistencies between referenced content and the rule's description or logic.
|
||||
|
||||
</Metadata>
|
||||
|
||||
---
|
||||
|
||||
### <Rule Fields>
|
||||
|
||||
#### Common Fields (All Rule Types)
|
||||
|
||||
- `risk_score` should align with `severity`:
|
||||
- `low` → 21
|
||||
- `medium` → 47
|
||||
- `high` → 73
|
||||
- `critical` → 99
|
||||
- `tags` should be relevant:
|
||||
- Include `Domain:` tags (Endpoint, Cloud, Network, Container).
|
||||
- Include `OS:` tags (Windows, Linux, macOS) where applicable.
|
||||
- Include `Data Source:` tags matching the used integrations.
|
||||
- Include `Resources: Investigation Guide` if the `note` field is present.
|
||||
- Include `Rule Type: BBR` for building block rules.
|
||||
- Include `Rule Type: ML` or `Rule Type: Machine Learning` for ML rules.
|
||||
- Include `Tactic:` tags matching each MITRE tactic in the threat mapping.
|
||||
- `index` patterns should be neither too specific nor too vague — they must accurately match the relevant data stream (e.g., `logs-endpoint.events.process-*` for process events, not `logs-endpoint.events.*` unless multiple event types are needed).
|
||||
- `from` and `interval` should not create gaps. The lookback window (`from`) must cover at least the `interval` period. The default `interval` period, if not explicitly changed, is 5 minutes.
|
||||
- `timestamp_override` should be set to `"event.ingested"` for most rules to avoid ingestion delay issues.
|
||||
- `max_signals` defaults to 100. Only override with justification.
|
||||
|
||||
#### Investigation Guide (`note` field)
|
||||
|
||||
- If present, the note should include actionable triage steps, false positive guidance, and response/remediation steps.
|
||||
- When OSQuery queries are included in investigation guides, validate their syntax and ensure the referenced tables and columns are correct.
|
||||
- The `setup` field should include necessary steps to configure the integration or data source.
|
||||
- Check for typos in the investigation guide content.
|
||||
|
||||
#### Building Block Rules
|
||||
|
||||
- Must be placed in the `rules_building_block/` directory.
|
||||
- Must have a `risk_score` of `21` and `severity` of `"low"`.
|
||||
|
||||
#### Alert Suppression
|
||||
|
||||
- Verify the suppression `group_by` fields are meaningful for deduplication.
|
||||
|
||||
#### Threshold Rules
|
||||
|
||||
- Verify the `threshold.value` is meaningful and not too low (noisy) or too high (misses detections).
|
||||
- When `threshold.cardinality` is used, verify the cardinality field and value make sense for the detection logic.
|
||||
|
||||
#### New Terms Rules
|
||||
|
||||
- `history_window_start` should be appropriate for the detection context (typically 5–10 days) longer values may impact performance.
|
||||
- Verify the `new_terms_fields` combination makes semantic sense — detecting "first seen" on arbitrary field combinations can produce excessive noise.
|
||||
- Assess whether it is truly necessary to leverage multiple `new_terms_fields` keys, as each newly added key negatively impacts performance.
|
||||
|
||||
#### Threat Match Rules
|
||||
|
||||
- `threat_indicator_path` should be set (default: `threat.indicator`).
|
||||
- `from` and `interval` for threat match rules are typically wider (e.g., `from: "now-65m"`, `interval: "1h"`) due to indicator matching overhead.
|
||||
|
||||
#### Machine Learning Rules
|
||||
|
||||
- `setup` should document the required ML job installation steps.
|
||||
|
||||
</Rule Fields>
|
||||
|
||||
---
|
||||
|
||||
### <Query — All Languages>
|
||||
|
||||
- Review for typos in known system file names (e.g., `WmiPrvS.exe` instead of `WmiPrvSe.exe`).
|
||||
- Ensure the **query logic aligns with the rule description** (e.g., the description says "Detect Certutil abuse," but the query looks for `svchost.exe`).
|
||||
- Verify there are no duplicate entries in the query (e.g., same exclusion listed twice).
|
||||
- Flag risky false-positive exclusions (e.g., `not file.path : "C:\\Users\\*"` — paths under `Users` are world-writable and attacker-controlled).
|
||||
- Check exclusions where the drive letter is hardcoded (e.g., `"C:\\Program Files\\*"` should use `"?:\\Program Files\\*"` to cover all drive letters). Applies to **Windows rules only**.
|
||||
- Flag unnecessary or overly broad wildcard usage when more specific patterns would work.
|
||||
|
||||
</Query — All Languages>
|
||||
|
||||
---
|
||||
|
||||
### <Query — EQL Specific>
|
||||
|
||||
- The **`:` operator** is **case-insensitive** and supports wildcards but can be expensive. Use it only where necessary (e.g., in file paths that could be controlled by an attacker). Use `==` for exact matches.
|
||||
- String comparison operators with `~` suffix are **case-insensitive** (e.g., `like~`, `==~`). Without `~`, they are case-sensitive.
|
||||
- In EQL, `?` matches any single character and `*` matches zero or more characters in wildcard contexts.
|
||||
- Verify all paths include proper escape characters (`\\` for backslashes in Windows paths).
|
||||
- Validate regular expressions (prefixed by `regex` or `regex~`).
|
||||
- On Linux/macOS, file paths are typically case-sensitive — prefer `==` or `like` over `:` for file path comparisons to avoid the overhead of case-insensitive matching.
|
||||
- Sequences with `maxspan > 5m` are generally inefficient unless justified for evasion prevention.
|
||||
- For sequences, verify the join keys (`by` clause) are appropriate and indexed fields.
|
||||
- Simplify overly complex logic (e.g., a sequence detecting `cmd.exe` spawning `svchost.exe` followed by a network event — the first condition is already sufficiently suspicious).
|
||||
- For **LOLBIN detection on Windows**, always include the original file name for resilience: `(process.name : "curl.exe" or process.pe.original_file_name == "curl.exe")` instead of just `process.name : "curl.exe"`.
|
||||
- For LOLBIN detection covering multiple related binaries, suggest additions if critical ones are missing (e.g., for `process.name : ("osascript", "python", "perl")`, suggest adding `ruby` and `node`).
|
||||
- For network and C2 rules where the scenario does not expect connections to loopback or private IPs, suggest excluding them with `not cidrmatch(destination.ip, "10.0.0.0/8", "127.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")`.
|
||||
- `event.type`, `event.action`, and `event.category` should be correct for the event being detected. Commonly:
|
||||
- Process: `event.type == "start"` with `event.action in ("exec", "exec_event", "start")`
|
||||
- File: `event.type in ("creation", "change")` or similar
|
||||
- Network: `event.type == "start"` with `event.category == "network"`
|
||||
|
||||
</Query — EQL Specific>
|
||||
|
||||
---
|
||||
|
||||
### <Query — KQL Specific>
|
||||
|
||||
- KQL uses `:` for field matching with wildcard support. Case sensitivity depends on the field mapping — `keyword` fields are case-sensitive, `text` fields with standard analyzers are case-insensitive.
|
||||
- Boolean logic uses `and`, `or`, `not` — parentheses must be correct for operator precedence.
|
||||
- KQL does not support sequences or joins — if temporal correlation is needed, suggest EQL instead.
|
||||
- For complex exclusion logic, consider whether `[[rule.filters]]` would be cleaner than inline `not` clauses.
|
||||
|
||||
</Query — KQL Specific>
|
||||
|
||||
---
|
||||
|
||||
### <Query — ES|QL Specific>
|
||||
|
||||
- Validate `EVAL` expressions for correct syntax and type handling.
|
||||
- `DISSECT` and `GROK` patterns should be validated for correctness.
|
||||
- For aggregate queries using `| stats ... by`, verify the aggregation makes sense for the detection logic and the `by` fields provide meaningful grouping.
|
||||
- ES|QL does not support sequences — if temporal correlation is needed, suggest EQL instead.
|
||||
- `LIKE` and `RLIKE` are case-sensitive in ES|QL. Use `TO_LOWER()` if case-insensitive matching is needed.
|
||||
- `IN` operator is case-sensitive in ES|QL. For case-insensitive list matching, use `TO_LOWER(field) IN ("value1", "value2")`.
|
||||
- `MV_*` (multi-value) functions require proper null handling — always check for `IS NOT NULL` before using.
|
||||
- Prefer `| keep` with explicit field lists over `| keep *` for clarity and to control which fields appear in alerts.
|
||||
- Verify that `FROM` source indices are correct and not overly broad.
|
||||
|
||||
</Query — ES|QL Specific>
|
||||
|
||||
---
|
||||
|
||||
### <Performance>
|
||||
|
||||
- Avoid expensive regex on high-volume fields (e.g., `process.command_line regex ".*"` patterns).
|
||||
- EQL `:` operator is case-insensitive with wildcard support — it is more expensive than `==` or `like`. Use it judiciously.
|
||||
- Lookback windows (`from`) should be as narrow as practical. Wider windows scan more data.
|
||||
- Threshold rules with very low `threshold.value` (e.g., 1) are effectively standard query rules with extra overhead — verify this is intentional.
|
||||
- New terms rules with very long `history_window_start` values (e.g., 30+ days) increase resource consumption.
|
||||
- Threat match rules are inherently expensive — verify `from` and `interval` are appropriate and not running too frequently.
|
||||
- For EQL sequences, large `maxspan` values increase memory and compute usage.
|
||||
- Aggregate ES|QL queries scanning large time windows should have appropriate filters to reduce the dataset early.
|
||||
- Avoid leading wildcards in field comparisons where possible (e.g., `process.name : "*script.exe"` is expensive).
|
||||
|
||||
</Performance>
|
||||
|
||||
---
|
||||
|
||||
### <Suggestions>
|
||||
|
||||
Keep suggestions **short and focused** — no need to be verbose.
|
||||
*(Maximum 1–2 sentences per suggestion.)*
|
||||
|
||||
</Suggestions>
|
||||
+1
-1
@@ -1,6 +1,6 @@
|
||||
[project]
|
||||
name = "detection_rules"
|
||||
version = "1.5.44"
|
||||
version = "1.5.45"
|
||||
description = "Detection Rules is the home for rules used by Elastic Security. This repository is used for the development, maintenance, testing, validation, and release of rules for Elastic Security’s Detection Engine."
|
||||
readme = "README.md"
|
||||
requires-python = ">=3.12"
|
||||
|
||||
Reference in New Issue
Block a user