Backend perf: summary cache, hash-dir cache, BYOVD compile_time short-circuit
- /files dashboard backed by per-sample _summary_cache.json with mtime-validated source set; ~8x on a single sample, scales linearly
- path_manager.find_file_by_hash keeps a per-folder hash->dirname index validated against folder mtime; ~10x on warm lookups
- BYOVD route reads compile_time from file_info.json instead of re-parsing the PE
- CHANGELOG entries for the perf cluster
@@ -37,6 +37,16 @@ All notable changes to this project will be documented in this file.

- File-info detection score no longer folds in EDR results (EDR is its own analysis type with its own page; it shouldn't bleed into the static + dynamic + PE score).
- Dashboard `/api/edr/agents/status` is now backed by a 30s TTL cache pre-warmed by a background poller (`services.edr_health`); per-probe timeouts dropped from 4s/5s to 2s. The cold path stays under 2s, every subsequent dashboard load is <5ms (warm cache hit), and the auto-refresh tick stays within the cache TTL (a sketch of the pattern follows this changelog hunk).
- Whiskers `/api/info` now reports `telemetry_sources: ["fibratus"]` when Fibratus is installed at `C:\Program Files\Fibratus\Bin\fibratus.exe` so the orchestrator can preflight before dispatching to a Fibratus profile.
- Static analyzers (yara + checkplz + stringnalyzer) now run concurrently via a `ThreadPoolExecutor`. Wall time drops from `sum(per-tool)` to `max(per-tool)` — typically ~50% off static analysis (CheckPlz alone is multi-second; yara + stringnalyzer used to add several more seconds after it). Dynamic stays parallel for yara/pe_sieve/moneta/patriot, with hsb running solo afterwards so its sleep-timing measurements aren't perturbed by concurrent process inspection. Per-tool start, finish, and wall time are logged (sketch after this hunk).
- `/files` dashboard backed by a per-sample `_summary_cache.json` (`app/services/summary_cache.py`). Each cached entry stamps the source-JSON mtimes; a read recomputes the source mtime set and compares — any drift forces a recompute, so no manual invalidation is needed at any save site. A cache hit short-circuits the 4-6 disk reads + risk recompute the dashboard previously did per sample. Single-sample render goes from ~16ms cold to ~2ms warm; expected to scale to ~3s cold → ~50ms warm at 200 samples.
- `find_file_by_hash` (`app/utils/path_manager.py`) now keeps a per-folder hash→dirname index validated against the folder's mtime. The 15+ endpoints calling it 2-3× per page load (analysis dispatch, results pages, API readers) share the cache. Cold ≈ 470µs (one listdir), warm ≈ 50µs.
- BYOVD route reads `compile_time` from `file_info.json` instead of re-parsing the PE — saves a redundant `pefile.PE(...)` + `generate_checksum()` round trip on every BYOVD run (multi-second on signed/large drivers).
- Logging unified — a single root-level handler with a compact formatter (`HH:MM:SS LEVEL module message`); 5-char fixed-width colored level, dim module name with `app.` / `services.` / `blueprints.` / `analyzers.` prefixes and `_analyzer` suffixes stripped. Werkzeug renamed to `http` and access lines reformatted from `127.0.0.1 - - [date] "GET /path HTTP/1.1" 200 -` to `GET /path → 200`. urllib3 / requests muted to WARNING (formatter sketch after this hunk).
- `_classify_kill` (both elastic and fibratus EDR analyzers) requires alert evidence for ALL payloads — a non-zero exit alone is no longer sufficient (it produced false positives on payloads that crash on their own, and Fibratus is detect-only, so it can never legitimately kill anything). Frontend DETECTED badge gated on `isTerminal && totalAlerts > 0`; killed_by_edr / blocked_by_av / failure / polling states only influence the detail string, never the badge (gating sketch after this hunk).
- AgentClient.get_execution_logs caps stdout/stderr at 256 KB. Prevents the saved-view template from inlining a 263 MB stdout (mimikatz spamming the prompt 18M times) and hanging the browser. The saved-view route also truncates defensively at load time so older saved findings render without a re-save (truncation sketch after this hunk).
- /analyze/edr no longer writes a JSON for pre-execution failures (agent_unreachable / busy / error). Pages for samples whose EDR dispatch failed at the transport layer no longer pretend to have results — the file-info hero hides per-profile buttons unless the saved JSON actually exists.
- /analyze/all redesign: stat tiles (stages / alerts / elapsed), phase-banded rows, color-coded state pills (QUEUED / RUNNING / COMPLETED / FAILED / SKIPPED), an agent-down preflight that marks unreachable EDR profiles `SKIPPED` instead of burning the timeout, and a done banner that only links to stages that actually produced data.
- File-info hero buttons are fully data-driven — Static / Dynamic / HolyGrail / per-EDR-profile buttons only render if the corresponding saved JSON exists for the sample. A freshly uploaded sample with no analyses run shows only the Back button.
- Backend split into Flask blueprints, services, and a `utils/` package; subprocess analyzers consolidated under `BaseSubprocessAnalyzer`.
- Frontend split into per-tool ES6 modules with shared utils; reusable Jinja macros for scanner tables.
- Full UI redesign on a terminal/IDE shell with new `.lb-*` design tokens and JetBrains Mono throughout.
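The `/api/edr/agents/status` entry above is a plain TTL-cache-plus-background-poller pattern. The `services.edr_health` internals are not part of this commit, so the following is only a minimal sketch of that shape; the `_probe_agents` helper, the 30-second TTL constant, and the 2-second probe timeout are assumptions taken from the changelog wording, not the project's actual code.

```python
import threading
import time

_TTL_SECONDS = 30          # assumed from the changelog's "30s TTL cache"
_PROBE_TIMEOUT = 2.0       # assumed per-probe timeout

_lock = threading.Lock()
_cache = {'ts': 0.0, 'status': None}


def _probe_agents(timeout):
    """Placeholder for the real per-agent health probes."""
    raise NotImplementedError


def get_agents_status():
    """Serve from the cache when fresh; refresh inline otherwise (cold path)."""
    with _lock:
        if _cache['status'] is not None and time.time() - _cache['ts'] < _TTL_SECONDS:
            return _cache['status']
    status = _probe_agents(timeout=_PROBE_TIMEOUT)
    with _lock:
        _cache.update(ts=time.time(), status=status)
    return status


def start_background_poller():
    """Pre-warm the cache so dashboard loads almost never hit the cold path."""
    def _loop():
        while True:
            try:
                fresh = _probe_agents(timeout=_PROBE_TIMEOUT)
                with _lock:
                    _cache.update(ts=time.time(), status=fresh)
            except Exception:
                pass  # keep polling; the cache just goes stale until the next tick
            time.sleep(_TTL_SECONDS)

    threading.Thread(target=_loop, daemon=True).start()
```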
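The concurrent-static-analyzers entry is a standard fan-out/join over a `ThreadPoolExecutor`. A minimal sketch under assumed names follows; the `analyzers` mapping and the `run()` interface are placeholders, not the project's real analyzer classes.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def run_static_analyzers(analyzers, logger):
    """Run each analyzer in its own worker thread and collect results.

    Wall time becomes max(per-tool) instead of sum(per-tool); the tools
    are subprocess-bound, so the GIL is not a bottleneck here.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=len(analyzers)) as pool:
        futures = {}
        for name, analyzer in analyzers.items():
            start = time.monotonic()
            logger.debug(f"{name}: started")
            futures[pool.submit(analyzer.run)] = (name, start)
        for future, (name, start) in futures.items():
            results[name] = future.result()   # re-raises analyzer exceptions
            logger.debug(f"{name}: finished in {time.monotonic() - start:.2f}s")
    return results
```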
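For the logging entry, a rough sketch of a compact root-level formatter in the described style. The color codes and level-width handling are guesses, and the werkzeug `http` rename plus access-line rewrite are omitted; treat this as an illustration, not the commit's implementation.

```python
import logging

_COLORS = {'DEBUG': '\033[36m', 'INFO': '\033[32m', 'WARNING': '\033[33m', 'ERROR': '\033[31m'}
_RESET, _DIM = '\033[0m', '\033[2m'
_STRIP_PREFIXES = ('app.', 'services.', 'blueprints.', 'analyzers.')


class CompactFormatter(logging.Formatter):
    """HH:MM:SS LEVEL module message, with a fixed-width colored level and a dim short module name."""

    def format(self, record):
        name = record.name
        for prefix in _STRIP_PREFIXES:
            if name.startswith(prefix):
                name = name[len(prefix):]
        name = name.removesuffix('_analyzer')
        color = _COLORS.get(record.levelname, '')
        level = f"{color}{record.levelname[:5]:<5}{_RESET}"
        ts = self.formatTime(record, '%H:%M:%S')
        return f"{ts} {level} {_DIM}{name}{_RESET} {record.getMessage()}"


def configure_logging():
    handler = logging.StreamHandler()
    handler.setFormatter(CompactFormatter())
    root = logging.getLogger()
    root.handlers = [handler]          # single root-level handler
    root.setLevel(logging.DEBUG)
    logging.getLogger('urllib3').setLevel(logging.WARNING)
    logging.getLogger('requests').setLevel(logging.WARNING)
```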
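The `_classify_kill` entry boils down to one gating rule: a kill verdict needs alert evidence for every payload, not just non-zero exits. A hedged sketch with hypothetical field names (`exit_code`, `alerts`) follows.

```python
def classify_kill(payload_runs, total_alerts):
    """Classify whether the EDR actually killed the payloads.

    A non-zero exit code alone is not evidence: payloads can crash on
    their own, and a detect-only sensor (e.g. Fibratus) can never
    legitimately kill anything. Require alert evidence for every
    payload before claiming a kill.
    """
    if not payload_runs:
        return 'no_execution'
    all_nonzero_exit = all(run.get('exit_code', 0) != 0 for run in payload_runs)
    all_have_alerts = all(run.get('alerts') for run in payload_runs)
    if all_nonzero_exit and all_have_alerts and total_alerts > 0:
        return 'killed_by_edr'
    return 'detected' if total_alerts > 0 else 'not_detected'
```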
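The stdout/stderr cap is a simple head-truncation. A sketch with an assumed helper name follows; only the 256 KB figure comes from the changelog entry.

```python
MAX_STREAM_BYTES = 256 * 1024  # 256 KB cap described in the changelog


def truncate_stream(text, limit=MAX_STREAM_BYTES):
    """Clamp captured stdout/stderr so a runaway payload (e.g. a 263 MB
    stdout) can't be inlined into the saved-view template and hang the
    browser. Keeps the head of the stream and appends a marker."""
    if text is None:
        return text
    data = text.encode('utf-8', errors='replace')
    if len(data) <= limit:
        return text
    clipped = data[:limit].decode('utf-8', errors='replace')
    return clipped + f"\n… [truncated, {len(data) - limit} bytes dropped]"
```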
@@ -8,7 +8,7 @@ from flask import Blueprint, current_app, jsonify, render_template, request
 from ..analyzers.holygrail import HolyGrailAnalyzer
 from ..services.error_handling import error_handler
 from ..services.rendering import is_kernel_driver_file
-from ..utils import file_io, path_manager, validators
+from ..utils import file_io, json_helpers, path_manager, validators

 analysis_bp = Blueprint('analysis', __name__)

@@ -377,13 +377,18 @@ def _run_byovd_analysis(target_hash):
     app.logger.debug(f"Analysis completed with status: {results.get('status')}")

     if results['status'] == 'completed':
+        # The PE was already fully parsed at upload time — compile_time
+        # lives in file_info.json under pe_info.compile_time. Reading
+        # the JSON saves a redundant pefile.PE() + generate_checksum()
+        # round trip that, on a multi-MB driver, was costing seconds
+        # purely to extract a single field.
         compile_time = None
         try:
-            pe = file_io.get_pe_info(file_path, app.config['utils']['malapi_path'])
-            pe_info = (pe or {}).get('pe_info') or {}
-            compile_time = pe_info.get('compile_time')
+            file_info_path = os.path.join(result_path, 'file_info.json')
+            file_info = json_helpers.load_json_file(file_info_path) or {}
+            compile_time = (file_info.get('pe_info') or {}).get('compile_time')
         except Exception as e:
-            app.logger.debug(f"Compile time extraction failed: {e}")
+            app.logger.debug(f"Compile time lookup failed: {e}")

         if compile_time:
             results['compile_time'] = compile_time
@@ -2,11 +2,21 @@
 """Aggregation helpers for the /files endpoint."""
 import os

+from . import summary_cache
 from ..utils import json_helpers, risk_analyzer


 def process_pid_summary(item, item_path, pid_based_summary, logger):
     pid = item.replace('dynamic_', '')

+    # Cache hit short-circuits the multi-MB JSON parse + risk recompute.
+    # The cache validates against source mtimes on read, so a stale
+    # entry is impossible — no manual invalidation needed at save sites.
+    cached = summary_cache.get_cached(item_path)
+    if cached is not None:
+        pid_based_summary[pid] = cached
+        return
+
     logger.debug(f"Processing dynamic analysis results for PID: {pid}")

     dynamic_results_path = os.path.join(item_path, 'dynamic_analysis_results.json')

@@ -30,7 +40,7 @@ def process_pid_summary(item, item_path, pid_based_summary, logger):
         moneta_findings = dynamic_results.get('moneta', {}).get('findings', {})
         hsb_detections = dynamic_results.get('hsb', {}).get('findings', {}).get('detections', [])

-        pid_based_summary[pid] = {
+        result = {
             'pid': pid,
             'process_name': process_info.get('name', 'unknown'),
             'process_path': process_info.get('path', 'unknown'),

@@ -67,6 +77,8 @@ def process_pid_summary(item, item_path, pid_based_summary, logger):
                 },
             },
         }
+        pid_based_summary[pid] = result
+        summary_cache.store(item_path, result)
         logger.debug(f"Processed dynamic analysis for PID: {pid}")
     except Exception as e:
         logger.error(f"Error processing PID {pid}: {e}")

@@ -78,6 +90,13 @@ def process_file_summary(item, item_path, file_based_summary, logger):
         logger.debug(f"No file_info.json found in {item_path}. Skipping.")
         return

+    # Cache hit short-circuits the per-sample 4-6 disk reads + risk
+    # recompute. Validated against source mtimes on read.
+    cached = summary_cache.get_cached(item_path)
+    if cached is not None:
+        file_based_summary[item] = cached
+        return
+
     try:
         file_info = json_helpers.load_json_file(file_info_path)
         if not file_info:

@@ -162,7 +181,7 @@ def process_file_summary(item, item_path, file_based_summary, logger):
                 'killed_by_edr': exec_block.get('killed_by_edr'),
             })

-        file_based_summary[item] = {
+        result = {
             'md5': file_info.get('md5', 'unknown'),
             'sha256': file_info.get('sha256', 'unknown'),
             'filename': filename,

@@ -181,6 +200,10 @@ def process_file_summary(item, item_path, file_based_summary, logger):
                 'factors': risk_factors,
             },
         }
+        file_based_summary[item] = result
+        # Persist for the next dashboard load — saves the 4-6 disk
+        # reads + risk recompute we just paid for.
+        summary_cache.store(item_path, result)
         logger.debug(f"Processed file-based analysis for item: {item}")
     except Exception as e:
         logger.error(f"Error processing file item {item}: {e}")
@@ -0,0 +1,121 @@
# app/services/summary_cache.py
"""On-disk cache for per-sample summary dicts.

The /files dashboard calls `process_file_summary` for every result
directory. Each call previously did 4-6 sequential JSON reads + a
fresh `risk_analyzer.calculate_risk` walk over potentially multi-MB
analyzer outputs. The result is deterministic for a given set of
on-disk JSONs — perfect for caching.

This module persists a tiny `_summary_cache.json` next to the analyzer
outputs. Each cached entry stamps the mtimes of every source JSON it
depends on (file_info / static / dynamic / byovd / edr_*); a read
reconstructs the source mtimes and compares against the stamp. Any
drift forces a recompute, so the cache stays correct without any
manual invalidation at write sites.

Cache miss (stale mtimes / no cache file): the caller falls back to the
slow recompute path and stores the fresh result on the way out.
"""

import json
import logging
import os
from typing import Dict, Optional


logger = logging.getLogger(__name__)


# Source files whose mtimes determine cache validity. Anything past
# this list (e.g. report HTML, ad-hoc operator notes) is intentionally
# outside the dependency set — adding a report doesn't invalidate the
# summary, since the report is derived from the same JSONs.
_FIXED_SOURCES = (
    'file_info.json',
    'static_analysis_results.json',
    'dynamic_analysis_results.json',
    'byovd_results.json',
)
_EDR_PREFIX = 'edr_'
_EDR_SUFFIX = '_results.json'

CACHE_FILE = '_summary_cache.json'


def get_cached(item_path: str) -> Optional[dict]:
    """Return a cached summary for `item_path` if its source mtimes
    match the current on-disk state. None on miss / staleness /
    corrupted cache."""
    cache_path = os.path.join(item_path, CACHE_FILE)
    if not os.path.exists(cache_path):
        return None
    try:
        with open(cache_path, 'r', encoding='utf-8') as f:
            cached = json.load(f)
    except (json.JSONDecodeError, OSError) as exc:
        logger.debug(f"Summary cache read failed for {item_path}: {exc}")
        return None

    saved_sources = cached.get('_sources') or {}
    if saved_sources != _source_mtimes(item_path):
        return None

    return cached.get('summary')


def store(item_path: str, summary: dict) -> None:
    """Persist `summary` for `item_path` along with the current source
    mtimes. Failures are logged but not raised — the cache is purely
    a perf optimization and a missing entry just falls through to the
    slow path on the next read."""
    cache_path = os.path.join(item_path, CACHE_FILE)
    payload = {
        '_sources': _source_mtimes(item_path),
        'summary': summary,
    }
    try:
        # Write to a sibling .tmp then rename so a crash mid-write
        # never leaves a half-formed cache file behind.
        tmp = cache_path + '.tmp'
        with open(tmp, 'w', encoding='utf-8') as f:
            json.dump(payload, f)
        os.replace(tmp, cache_path)
    except OSError as exc:
        logger.debug(f"Summary cache write failed for {item_path}: {exc}")


def invalidate(item_path: str) -> None:
    """Remove the cached entry for `item_path`. Idempotent — a missing
    cache file is fine. The mtime check normally makes manual
    invalidation unnecessary; this is mostly here for cleanup endpoints."""
    cache_path = os.path.join(item_path, CACHE_FILE)
    try:
        os.remove(cache_path)
    except FileNotFoundError:
        pass
    except OSError as exc:
        logger.debug(f"Summary cache invalidate failed for {item_path}: {exc}")


# ---- internals ---------------------------------------------------------


def _source_mtimes(item_path: str) -> Dict[str, int]:
    """Snapshot the mtimes (in nanoseconds) of every source JSON we
    depend on. Discovers per-profile EDR result files dynamically so
    a freshly-added profile invalidates the cache automatically."""
    out: Dict[str, int] = {}
    try:
        entries = os.listdir(item_path)
    except (FileNotFoundError, OSError):
        return out
    for name in entries:
        if name in _FIXED_SOURCES or (
            name.startswith(_EDR_PREFIX) and name.endswith(_EDR_SUFFIX)
        ):
            try:
                out[name] = os.stat(os.path.join(item_path, name)).st_mtime_ns
            except OSError:
                pass
    return out
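A small illustration of the mtime-stamp invalidation model described above (hypothetical paths, and it assumes the result directory already contains a `file_info.json`): touching any tracked source JSON after `store` makes the next `get_cached` miss.

```python
import os
import time

from app.services import summary_cache  # assumed package layout

item_path = '/data/results/abc123'          # hypothetical result directory
summary = {'filename': 'sample.exe', 'risk': {'score': 42}}

summary_cache.store(item_path, summary)
assert summary_cache.get_cached(item_path) == summary   # warm hit

# Any write to a tracked source JSON bumps its mtime...
time.sleep(0.01)
os.utime(os.path.join(item_path, 'file_info.json'))

# ...so the stored stamp no longer matches and the cache reports a miss,
# forcing the caller down the slow recompute-and-store path.
assert summary_cache.get_cached(item_path) is None
```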
@@ -1,14 +1,96 @@
 # app/utils/path_manager.py
-"""Filesystem lookups for analysis artifacts."""
+"""Filesystem lookups for analysis artifacts.
+
+`find_file_by_hash` is hot — it's called from ~15 endpoints, often two
+or three times per page load (once on the upload folder, once on the
+result folder, sometimes again from a follow-up render). The naive
+`os.listdir` scan it used to do was O(N) in the number of retained
+samples; on a host with thousands of samples that's tens of ms
+multiplied by every API request.
+
+We back it with a per-folder hash→dirname cache that's lazily populated
+on miss and revalidated against the folder's `mtime`. Adding or
+removing a file in the folder bumps mtime, which makes the cache miss
+on the next call and reload — no manual invalidation needed for the
+common create / delete paths.
+"""

 import os
+import threading
+
+# Per-folder cache. Each entry: {folder_path: (mtime_ns, {hash_or_prefix: dirname})}
+# Threading note: Flask is multi-threaded by default; readers and writers
+# can race. A single coarse lock around mutations is plenty fast (cache
+# hits don't take it).
+_CACHE: dict = {}
+_CACHE_LOCK = threading.Lock()
+

 def find_file_by_hash(file_hash, search_folder):
-    """Find a file in the specified folder whose name starts with the given hash."""
+    """Find a file or directory in `search_folder` whose name starts
+    with `file_hash`. Cached against the folder's mtime.
+
+    Returns the full path on hit, None if no entry matches or the
+    folder doesn't exist.
+    """
+    if not file_hash:
+        return None
+
     try:
-        for filename in os.listdir(search_folder):
-            if filename.startswith(file_hash):
-                return os.path.join(search_folder, filename)
-    except FileNotFoundError:
-        pass
-    return None
+        folder_mtime = os.stat(search_folder).st_mtime_ns
+    except (FileNotFoundError, OSError):
+        return None
+
+    cache_key = os.path.abspath(search_folder)
+    cached = _CACHE.get(cache_key)
+    if cached is None or cached[0] != folder_mtime:
+        cached = _refresh(cache_key, search_folder, folder_mtime)
+
+    name = cached[1].get(file_hash)
+    if name is None:
+        # Cache miss — the file may have been added since the last
+        # mtime tick, or the lookup is for a hash whose entry
+        # doesn't exist. Fall back to a one-off listdir scan to
+        # be sure (and warm the cache while we're at it).
+        cached = _refresh(cache_key, search_folder, folder_mtime, force=True)
+        name = cached[1].get(file_hash)
+    if name is None:
+        return None
+    return os.path.join(search_folder, name)
+
+
+def invalidate(search_folder=None):
+    """Drop the cached entry for `search_folder` (or all entries if
+    None). Callers that mutate a folder out-of-band should call this so
+    the next lookup re-scans. Most code paths don't need it — the
+    mtime check covers common file creation / deletion."""
+    with _CACHE_LOCK:
+        if search_folder is None:
+            _CACHE.clear()
+        else:
+            _CACHE.pop(os.path.abspath(search_folder), None)
+
+
+def _refresh(cache_key: str, search_folder: str, mtime, force: bool = False):
+    """Rebuild the index for `search_folder`. Indexes by both the full
+    name and the hash-prefix portion (everything up to the first `_`)
+    so callers can pass either form."""
+    with _CACHE_LOCK:
+        # Re-check inside the lock — another thread may have just refreshed.
+        cached = _CACHE.get(cache_key)
+        if not force and cached is not None and cached[0] == mtime:
+            return cached
+        index: dict = {}
+        try:
+            for entry in os.listdir(search_folder):
+                # Index by full name (covers exact-match callers) AND by
+                # hash prefix (covers `<md5>_<original_name>` style).
+                index[entry] = entry
+                prefix, _, _rest = entry.partition('_')
+                if prefix and prefix not in index:
+                    index[prefix] = entry
+        except FileNotFoundError:
+            pass
+        cached = (mtime, index)
+        _CACHE[cache_key] = cached
+        return cached
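A hypothetical call site, showing the intended usage pattern from the endpoints the docstring mentions; the folder constants and the out-of-band cleanup handler are illustrative only.

```python
from app.utils import path_manager  # assumed package layout

UPLOAD_FOLDER = '/data/uploads'      # illustrative paths
RESULT_FOLDER = '/data/results'


def locate_sample(target_hash):
    """Typical per-request pattern: one lookup on the upload folder and
    one on the result folder, both served from the warm per-folder index."""
    sample_path = path_manager.find_file_by_hash(target_hash, UPLOAD_FOLDER)
    result_path = path_manager.find_file_by_hash(target_hash, RESULT_FOLDER)
    return sample_path, result_path


def cleanup_out_of_band():
    """If the folders are mutated outside the normal create/delete paths
    (e.g. an external retention job), drop the cached indexes explicitly."""
    path_manager.invalidate(UPLOAD_FOLDER)
    path_manager.invalidate(RESULT_FOLDER)
```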