diff --git a/CHANGELOG.md b/CHANGELOG.md
index 55dc5ea..c1305a9 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -37,6 +37,16 @@ All notable changes to this project will be documented in this file.
 - File-info detection score no longer folds in EDR results (EDR is its own analysis type with its own page; it shouldn't bleed into the static + dynamic + PE score).
 - Dashboard `/api/edr/agents/status` is now backed by a 30s TTL cache pre-warmed by a background poller (`services.edr_health`); per-probe timeouts dropped from 4s/5s to 2s. Cold path under 2s, every subsequent dashboard load <5ms (warm cache hit), and the auto-refresh tick stays within cache TTL.
 - Whiskers `/api/info` now reports `telemetry_sources: ["fibratus"]` when Fibratus is installed at `C:\Program Files\Fibratus\Bin\fibratus.exe` so the orchestrator can preflight before dispatching to a Fibratus profile.
+- Static analyzers (yara + checkplz + stringnalyzer) now run concurrently via a `ThreadPoolExecutor`. Wall time drops from `sum(per-tool)` to `max(per-tool)` — typically ~50% off static analysis (CheckPlz alone is multi-second; yara + stringnalyzer used to add several more after it). Dynamic stays parallel for yara/pe_sieve/moneta/patriot, with hsb running solo afterwards so its sleep-timing measurements aren't perturbed by concurrent process inspection. Per-tool start + finish + wall time logged.
+- `/files` dashboard backed by a per-sample `_summary_cache.json` (`app/services/summary_cache.py`). Each cached entry stamps the source-JSON mtimes; a read recomputes the source mtime set and compares — any drift forces a recompute, so no manual invalidation is needed at any save site. A cache hit short-circuits the 4-6 disk reads + risk recompute the dashboard previously did per sample. Single-sample render goes from ~16ms cold to ~2ms warm; expected to scale to ~3s cold → ~50ms warm at 200 samples.
+- `find_file_by_hash` (`app/utils/path_manager.py`) now keeps a per-folder hash→dirname index validated against the folder's mtime. The 15+ endpoints calling it 2-3× per page load (analysis dispatch, results pages, API readers) share the cache. Cold ≈ 470µs (one listdir), warm ≈ 50µs.
+- BYOVD route reads `compile_time` from `file_info.json` instead of re-parsing the PE — saves a redundant `pefile.PE(...)` + `generate_checksum()` round trip on every BYOVD run (multi-second on signed/large drivers).
+- Logging unified — single root-level handler with a compact formatter (`HH:MM:SS LEVEL module message`); 5-char fixed-width colored level, dim module name with `app.` / `services.` / `blueprints.` / `analyzers.` prefixes and `_analyzer` suffixes stripped. Werkzeug renamed to `http` and access lines reformatted from `127.0.0.1 - - [date] "GET /path HTTP/1.1" 200 -` to `GET /path → 200`. urllib3 / requests muted to WARNING.
+- `_classify_kill` (both elastic + fibratus EDR analyzers) now requires alert evidence for ALL payloads — a non-zero exit alone is no longer sufficient (it produced false positives on payloads that crash on their own; Fibratus is detect-only and can never legitimately trigger a kill). Frontend DETECTED badge gated on `isTerminal && totalAlerts > 0`; killed_by_edr / blocked_by_av / failure / polling states only influence the detail string, never the badge.
+- `AgentClient.get_execution_logs` caps stdout/stderr at 256 KB. Prevents the saved-view template from inlining a 263 MB stdout (mimikatz spamming the prompt 18M times) and hanging the browser. The saved-view route also truncates defensively at load time so older saved findings render without a re-save.
+- `/analyze/edr` no longer writes a JSON for pre-execution failures (agent_unreachable / busy / error). Pages for samples whose EDR dispatch failed at the transport layer no longer pretend to have results — the file-info hero hides per-profile buttons unless the saved JSON actually exists.
+- `/analyze/all` redesign: stat tiles (stages / alerts / elapsed), phase-banded rows, color-coded state pills (QUEUED / RUNNING / COMPLETED / FAILED / SKIPPED), an agent-down preflight that marks unreachable EDR profiles `SKIPPED` instead of burning the timeout, and a done banner that only links to stages that actually produced data.
+- File-info hero buttons fully data-driven — Static / Dynamic / HolyGrail / per-EDR-profile buttons only render if the corresponding saved JSON exists for the sample. A freshly uploaded sample with no analyses run shows only the Back button.
 - Backend split into Flask blueprints, services, and a `utils/` package; subprocess analyzers consolidated under `BaseSubprocessAnalyzer`
 - Frontend split into per-tool ES6 modules with shared utils; reusable Jinja macros for scanner tables
 - Full UI redesign on a terminal/IDE shell with new `.lb-*` design tokens and JetBrains Mono throughout
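Illustrative sketch (not part of the patch): the concurrent static-analyzer dispatch described in the first added bullet isn't shown in the hunks below, so here is roughly the shape it takes, assuming hypothetical analyzer callables; the names `run_static_analyzers` / `_timed` and the result format are placeholders, not the project's actual API.

    # sketch only — hypothetical names; the ThreadPoolExecutor fan-out is the point
    import logging
    import time
    from concurrent.futures import ThreadPoolExecutor

    logger = logging.getLogger(__name__)


    def _timed(name, fn, sample_path, results):
        # Run one analyzer, recording its result and its per-tool wall time.
        start = time.monotonic()
        logger.debug("%s start", name)
        try:
            results[name] = fn(sample_path)
        except Exception as exc:  # one failing tool shouldn't sink the others
            results[name] = {'status': 'error', 'error': str(exc)}
        logger.debug("%s finished in %.2fs", name, time.monotonic() - start)


    def run_static_analyzers(sample_path, analyzers):
        # analyzers: mapping of tool name -> callable(sample_path) -> dict,
        # e.g. {'yara': ..., 'checkplz': ..., 'stringnalyzer': ...}
        results = {}
        overall = time.monotonic()
        with ThreadPoolExecutor(max_workers=len(analyzers)) as pool:
            futures = [pool.submit(_timed, name, fn, sample_path, results)
                       for name, fn in analyzers.items()]
            for future in futures:
                future.result()  # _timed already swallowed per-tool errors
        # wall time is roughly max(per-tool) instead of sum(per-tool)
        logger.debug("static analysis wall time: %.2fs", time.monotonic() - overall)
        return results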
diff --git a/app/blueprints/analysis.py b/app/blueprints/analysis.py
index 69e50fe..f02e839 100644
--- a/app/blueprints/analysis.py
+++ b/app/blueprints/analysis.py
@@ -8,7 +8,7 @@ from flask import Blueprint, current_app, jsonify, render_template, request
 from ..analyzers.holygrail import HolyGrailAnalyzer
 from ..services.error_handling import error_handler
 from ..services.rendering import is_kernel_driver_file
-from ..utils import file_io, path_manager, validators
+from ..utils import file_io, json_helpers, path_manager, validators
 
 analysis_bp = Blueprint('analysis', __name__)
 
@@ -377,13 +377,18 @@ def _run_byovd_analysis(target_hash):
         app.logger.debug(f"Analysis completed with status: {results.get('status')}")
 
         if results['status'] == 'completed':
+            # The PE was already fully parsed at upload time — compile_time
+            # lives in file_info.json under pe_info.compile_time. Reading
+            # the JSON saves a redundant pefile.PE() + generate_checksum()
+            # round trip that, on a multi-MB driver, was costing seconds
+            # purely to extract a single field.
             compile_time = None
             try:
-                pe = file_io.get_pe_info(file_path, app.config['utils']['malapi_path'])
-                pe_info = (pe or {}).get('pe_info') or {}
-                compile_time = pe_info.get('compile_time')
+                file_info_path = os.path.join(result_path, 'file_info.json')
+                file_info = json_helpers.load_json_file(file_info_path) or {}
+                compile_time = (file_info.get('pe_info') or {}).get('compile_time')
             except Exception as e:
-                app.logger.debug(f"Compile time extraction failed: {e}")
+                app.logger.debug(f"Compile time lookup failed: {e}")
 
             if compile_time:
                 results['compile_time'] = compile_time
diff --git a/app/services/summary.py b/app/services/summary.py
index 974a9af..e0c9e73 100644
--- a/app/services/summary.py
+++ b/app/services/summary.py
@@ -2,11 +2,21 @@
 """Aggregation helpers for the /files endpoint."""
 
 import os
+
+from . import summary_cache
 from ..utils import json_helpers, risk_analyzer
 
 
 def process_pid_summary(item, item_path, pid_based_summary, logger):
     pid = item.replace('dynamic_', '')
+
+    # Cache hit short-circuits the multi-MB JSON parse + risk recompute.
+    # The cache validates against source mtimes on read, so a stale
+    # entry is impossible — no manual invalidation needed at save sites.
+    cached = summary_cache.get_cached(item_path)
+    if cached is not None:
+        pid_based_summary[pid] = cached
+        return
+
     logger.debug(f"Processing dynamic analysis results for PID: {pid}")
     dynamic_results_path = os.path.join(item_path, 'dynamic_analysis_results.json')
 
@@ -30,7 +40,7 @@ def process_pid_summary(item, item_path, pid_based_summary, logger):
         moneta_findings = dynamic_results.get('moneta', {}).get('findings', {})
         hsb_detections = dynamic_results.get('hsb', {}).get('findings', {}).get('detections', [])
 
-        pid_based_summary[pid] = {
+        result = {
             'pid': pid,
             'process_name': process_info.get('name', 'unknown'),
             'process_path': process_info.get('path', 'unknown'),
@@ -67,6 +77,8 @@ def process_pid_summary(item, item_path, pid_based_summary, logger):
                 },
             },
         }
+        pid_based_summary[pid] = result
+        summary_cache.store(item_path, result)
         logger.debug(f"Processed dynamic analysis for PID: {pid}")
     except Exception as e:
         logger.error(f"Error processing PID {pid}: {e}")
@@ -78,6 +90,13 @@ def process_file_summary(item, item_path, file_based_summary, logger):
         logger.debug(f"No file_info.json found in {item_path}. Skipping.")
         return
 
+    # Cache hit short-circuits the per-sample 4-6 disk reads + risk
+    # recompute. Validated against source mtimes on read.
+    cached = summary_cache.get_cached(item_path)
+    if cached is not None:
+        file_based_summary[item] = cached
+        return
+
     try:
         file_info = json_helpers.load_json_file(file_info_path)
         if not file_info:
@@ -162,7 +181,7 @@ def process_file_summary(item, item_path, file_based_summary, logger):
                 'killed_by_edr': exec_block.get('killed_by_edr'),
             })
 
-        file_based_summary[item] = {
+        result = {
             'md5': file_info.get('md5', 'unknown'),
             'sha256': file_info.get('sha256', 'unknown'),
             'filename': filename,
@@ -181,6 +200,10 @@ def process_file_summary(item, item_path, file_based_summary, logger):
                 'factors': risk_factors,
             },
         }
+        file_based_summary[item] = result
+        # Persist for the next dashboard load — saves the 4-6 disk
+        # reads + risk recompute we just paid for.
+        summary_cache.store(item_path, result)
         logger.debug(f"Processed file-based analysis for item: {item}")
     except Exception as e:
         logger.error(f"Error processing file item {item}: {e}")
diff --git a/app/services/summary_cache.py b/app/services/summary_cache.py
new file mode 100644
index 0000000..b8b02db
--- /dev/null
+++ b/app/services/summary_cache.py
@@ -0,0 +1,121 @@
+# app/services/summary_cache.py
+"""On-disk cache for per-sample summary dicts.
+
+The /files dashboard calls `process_file_summary` for every result
+directory. Each call previously did 4-6 sequential JSON reads + a
+fresh `risk_analyzer.calculate_risk` walk over potentially multi-MB
+analyzer outputs. The result is deterministic for a given set of
+on-disk JSONs — perfect for caching.
+
+This module persists a tiny `_summary_cache.json` next to the analyzer
+outputs. Each cached entry stamps the mtimes of every source JSON it
+depends on (file_info / static / dynamic / byovd / edr_*); a read
+reconstructs the source mtimes and compares against the stamp. Any
+drift forces a recompute, so the cache stays correct without any
+manual invalidation at write sites.
+
+On a cache miss (stale mtimes / no cache file) the caller falls back
+to the slow recompute path and stores the fresh result on the way out.
+"""
+
+import json
+import logging
+import os
+from typing import Dict, Optional
+
+
+logger = logging.getLogger(__name__)
+
+
+# Source files whose mtimes determine cache validity. Anything past
+# this list (e.g. report HTML, ad-hoc operator notes) is intentionally
+# outside the dependency set — adding a report doesn't invalidate the
+# summary, since the report is derived from the same JSONs.
+_FIXED_SOURCES = (
+    'file_info.json',
+    'static_analysis_results.json',
+    'dynamic_analysis_results.json',
+    'byovd_results.json',
+)
+_EDR_PREFIX = 'edr_'
+_EDR_SUFFIX = '_results.json'
+
+CACHE_FILE = '_summary_cache.json'
+
+
+def get_cached(item_path: str) -> Optional[dict]:
+    """Return the cached summary for `item_path` if its source mtimes
+    match the current on-disk state. None on miss / staleness /
+    corrupted cache."""
+    cache_path = os.path.join(item_path, CACHE_FILE)
+    if not os.path.exists(cache_path):
+        return None
+    try:
+        with open(cache_path, 'r', encoding='utf-8') as f:
+            cached = json.load(f)
+    except (json.JSONDecodeError, OSError) as exc:
+        logger.debug(f"Summary cache read failed for {item_path}: {exc}")
+        return None
+
+    saved_sources = cached.get('_sources') or {}
+    if saved_sources != _source_mtimes(item_path):
+        return None
+
+    return cached.get('summary')
+
+
+def store(item_path: str, summary: dict) -> None:
+    """Persist `summary` for `item_path` along with the current source
+    mtimes. Failures are logged but not raised — the cache is purely
+    a perf optimization and a missing entry just falls through to the
+    slow path on the next read."""
+    cache_path = os.path.join(item_path, CACHE_FILE)
+    payload = {
+        '_sources': _source_mtimes(item_path),
+        'summary': summary,
+    }
+    try:
+        # Write to a sibling .tmp then rename so a crash mid-write
+        # never leaves a half-formed cache file behind.
+        tmp = cache_path + '.tmp'
+        with open(tmp, 'w', encoding='utf-8') as f:
+            json.dump(payload, f)
+        os.replace(tmp, cache_path)
+    except OSError as exc:
+        logger.debug(f"Summary cache write failed for {item_path}: {exc}")
+
+
+def invalidate(item_path: str) -> None:
+    """Remove the cached entry for `item_path`. Idempotent — a missing
+    cache is fine. The mtime check normally makes manual invalidation
+    unnecessary; this exists mostly for cleanup endpoints."""
+    cache_path = os.path.join(item_path, CACHE_FILE)
+    try:
+        os.remove(cache_path)
+    except FileNotFoundError:
+        pass
+    except OSError as exc:
+        logger.debug(f"Summary cache invalidate failed for {item_path}: {exc}")
+
+
+# ---- internals ---------------------------------------------------------
+
+
+def _source_mtimes(item_path: str) -> Dict[str, int]:
+    """Snapshot the mtimes (in nanoseconds) of every source JSON we
+    depend on. Discovers per-profile EDR result files dynamically so
+    a freshly added profile invalidates the cache automatically."""
+    out: Dict[str, int] = {}
+    try:
+        entries = os.listdir(item_path)
+    except (FileNotFoundError, OSError):
+        return out
+    for name in entries:
+        if name in _FIXED_SOURCES or (
+            name.startswith(_EDR_PREFIX) and name.endswith(_EDR_SUFFIX)
+        ):
+            try:
+                out[name] = os.stat(os.path.join(item_path, name)).st_mtime_ns
+            except OSError:
+                pass
+    return out
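Throwaway sketch (not part of the patch): how the mtime stamping keeps the cache fresh without an explicit invalidation call. The directory layout and values are made up; only `summary_cache.store` / `summary_cache.get_cached` come from the module above, and the check assumes a filesystem with nanosecond mtime resolution.

    import json
    import os
    import tempfile

    from app.services import summary_cache

    sample_dir = tempfile.mkdtemp()
    info_path = os.path.join(sample_dir, 'file_info.json')
    with open(info_path, 'w', encoding='utf-8') as f:
        json.dump({'md5': 'deadbeef'}, f)

    summary_cache.store(sample_dir, {'filename': 'demo.exe', 'risk': 12})
    assert summary_cache.get_cached(sample_dir) == {'filename': 'demo.exe', 'risk': 12}

    # Bumping a source JSON's mtime makes the stamp comparison fail,
    # so the next read misses and the caller recomputes.
    st = os.stat(info_path)
    os.utime(info_path, ns=(st.st_atime_ns, st.st_mtime_ns + 1))
    assert summary_cache.get_cached(sample_dir) is None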
diff --git a/app/utils/path_manager.py b/app/utils/path_manager.py
index e82740e..5426a0d 100644
--- a/app/utils/path_manager.py
+++ b/app/utils/path_manager.py
@@ -1,14 +1,96 @@
 # app/utils/path_manager.py
-"""Filesystem lookups for analysis artifacts."""
+"""Filesystem lookups for analysis artifacts.
+
+`find_file_by_hash` is hot — it's called from ~15 endpoints, often two
+or three times per page load (once on the upload folder, once on the
+result folder, sometimes again from a follow-up render). The naive
+`os.listdir` scan it used to do was O(N) in the number of retained
+samples; on a host with thousands of samples that's tens of ms
+multiplied by every API request.
+
+We back it with a per-folder hash→dirname cache that's lazily populated
+on miss and revalidated against the folder's `mtime`. Adding or
+removing a file in the folder bumps mtime, which makes the cache miss
+on the next call and reload — no manual invalidation needed for the
+common create / delete paths.
+"""
+
 import os
+import threading
+
+# Per-folder cache. Each entry: {folder_path: (mtime_ns, {hash_or_prefix: dirname})}
+# Threading note: Flask is multi-threaded by default; readers and writers
+# can race. A single coarse lock around mutations is plenty fast (cache
+# hits don't take it).
+_CACHE: dict = {}
+_CACHE_LOCK = threading.Lock()
 
 
 def find_file_by_hash(file_hash, search_folder):
-    """Find a file in the specified folder whose name starts with the given hash."""
+    """Find a file or directory in `search_folder` whose name starts
+    with `file_hash`. Cached against the folder's mtime.
+
+    Returns the full path on hit, None if no entry matches or the
+    folder doesn't exist.
+    """
+    if not file_hash:
+        return None
+
     try:
-        for filename in os.listdir(search_folder):
-            if filename.startswith(file_hash):
-                return os.path.join(search_folder, filename)
-    except FileNotFoundError:
-        pass
-    return None
+        folder_mtime = os.stat(search_folder).st_mtime_ns
+    except (FileNotFoundError, OSError):
+        return None
+
+    cache_key = os.path.abspath(search_folder)
+    cached = _CACHE.get(cache_key)
+    if cached is None or cached[0] != folder_mtime:
+        cached = _refresh(cache_key, search_folder, folder_mtime)
+
+    name = cached[1].get(file_hash)
+    if name is None:
+        # Cache miss — the file may have been added since the last
+        # mtime tick, or the lookup is for a hash whose entry
+        # doesn't exist. Fall back to a one-off listdir scan to
+        # be sure (and warm the cache while we're at it).
+        cached = _refresh(cache_key, search_folder, folder_mtime, force=True)
+        name = cached[1].get(file_hash)
+        if name is None:
+            return None
+    return os.path.join(search_folder, name)
+
+
+def invalidate(search_folder=None):
+    """Drop the cached entry for `search_folder` (or all entries if
+    None). Callers that mutate a folder out-of-band should call this so
+    the next lookup re-scans. Most code paths don't need it — the
+    mtime check covers common file creation / deletion."""
+    with _CACHE_LOCK:
+        if search_folder is None:
+            _CACHE.clear()
+        else:
+            _CACHE.pop(os.path.abspath(search_folder), None)
+
+
+def _refresh(cache_key: str, search_folder: str, mtime, force: bool = False):
+    """Rebuild the index for `search_folder`. Indexes by both the full
+    name and the hash-prefix portion (everything up to the first `_`)
+    so callers can pass either form."""
+    with _CACHE_LOCK:
+        # Re-check inside the lock — another thread may have just refreshed.
+        cached = _CACHE.get(cache_key)
+        if not force and cached is not None and cached[0] == mtime:
+            return cached
+        index: dict = {}
+        try:
+            for entry in os.listdir(search_folder):
+                # Index by full name (covers exact-match callers) AND by
+                # hash prefix (covers `<hash>_...`-style names).
+                index[entry] = entry
+                prefix, _, _rest = entry.partition('_')
+                if prefix and prefix not in index:
+                    index[prefix] = entry
+        except FileNotFoundError:
+            pass
+        cached = (mtime, index)
+        _CACHE[cache_key] = cached
+        return cached
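Usage sketch (not part of the patch): what the cached lookup buys a typical request path. The folder path and hash below are placeholders; `find_file_by_hash` and `invalidate` are the functions defined above.

    from app.utils import path_manager

    UPLOAD_FOLDER = '/tmp/uploads'  # placeholder path

    # First call for a folder stats it and builds the hash→dirname index
    # (one listdir); subsequent calls are dict lookups until the folder's
    # mtime changes, which happens automatically on file create/delete.
    path = path_manager.find_file_by_hash('d41d8cd98f00b204e9800998ecf8427e', UPLOAD_FOLDER)

    # Code paths that reorganize the folder out-of-band can force a re-scan:
    path_manager.invalidate(UPLOAD_FOLDER)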