2026-05-08 17:45:23 -05:00


GreySec MAL — Master Kanban

Product: GreySec Malware Analysis Lab
Type: Internal Build Project
Status: BUILDING
Updated: 2026-05-07
Parent debrief: ~/greysec/ops/debriefs/malware-lab-2026-05-07.md


Background

GreySec MAL is a self-hosted malware analysis sandbox for red team operators. It takes a binary payload, detonates it in an isolated Windows 11 VM instrumented with EDR (Fibratus + Whiskers + RedEdr), captures behavioral events via RabbitMQ, and produces a client-facing analysis report with a Detection Score (0-100) and MITRE ATT&CK kill chain map.

Architecture:

Payload Upload → LitterBox (:1337) → SMB Share Mount → Windows VM (:1337)
                                                            ↓
                                              Fibratus (kernel events)
                                              Whiskers (REST API :8080)
                                              RedEdr (EDR reporting)
                                                            ↓
                                              RabbitMQ (event queue)
                                                            ↓
                                              variant_event_consumer (Python)
                                                            ↓
                                              Supabase (structured data)
                                                            ↓
                                              Detection Score + MITRE ATT&CK Report

Current status: ARCHITECTURE VERIFIED. Four bugs (3 critical, 1 degraded) block end-to-end operation. Fix order is strict.


Pipeline Definition

What the product IS: Drop a binary. Get a Detection Score + MITRE ATT&CK kill chain. Client data never leaves your infrastructure.

What the client receives:

  • Detection Score (0-100) — how likely this payload is to be flagged by EDR
  • MITRE ATT&CK kill chain map — which tactics and techniques the payload uses
  • Behavioral analysis summary — what the payload actually did (file ops, network ops, process ops)
  • Raw event log (optional) — full Fibratus event stream for manual review

Target buyer:

  • Red team operators testing C2 payloads before deployment
  • MSSPs running adversary simulation for clients
  • Security teams with HIPAA/BAA obligations that prevent cloud malware analysis
  • Law firms and financial institutions with strict client confidentiality requirements

SLA (target):

  • Analysis turnaround: < 5 minutes for typical payloads (< 10MB)
  • Report available: via web dashboard or API
  • Uptime: 99% (target, TBD with Adam)

Current State

What Works

  • v1 Python payload: ran for 16 seconds, generated real EDR events, Fibratus saw them, Whiskers returned them via /api/alerts/fibratus/since — core event path verified
  • RabbitMQ → variant_event_consumer → Supabase: working
  • Docker-compose stack: LitterBox, RabbitMQ, Fibratus bridge, consumer all start cleanly
  • Pre-flight check script exists at ~/bin/greysec/pre-flight-vm-check.sh (not yet run in a session)

What Is Broken

| # | Bug | Severity | Fix time | Cascade |
|---|-----|----------|----------|---------|
| 1 | VM share mount \\172.28.0.1\share unreachable from Windows VM; payloads may not reach analysis dir | CRITICAL | 30 min | Blocks all testing |
| 2 | RedEdr returns zero events despite Fibratus seeing real syscalls; event data doesn't reach final report | CRITICAL | 30-60 min | Blocks EDR validation |
| 3 | Whiskers has no Windows service wrapper; dies when parent process exits, requires manual PAExec restart | CRITICAL | 1 hour | Blocks reliability |
| 4 | manager.py (lines 418-419) hardcodes init_wait_time = 5 regardless of config; payloads killed at 5s | DEGRADED | 30 min | Blocks extended runs |

Fix order: 1 → 2 → 3 → 4. Issue 4 is blocked by Issue 1 (can't test 4 until share mount works).


BOARD

BACKLOG

  • Build Detection Score algorithm (0-100 from Fibratus event frequency + severity + MITRE technique count)
  • Build web dashboard for results (currently Supabase only — no client-facing UI)
  • Build client upload portal (currently manual curl to localhost:1337)
  • Build MITRE ATT&CK kill chain mapper (Fibratus events → ATT&CK tactic/technique IDs)
  • Write greysec-malware-pipeline skill (standalone — not yet created)
  • Add payload hardening guidance output (what to change in the binary to lower Detection Score)
  • Set up TLS for LitterBox API (currently plain HTTP — fine for internal, not for client-facing portal)
  • Build multi-user access control (when portal is client-facing, need auth)
  • Benchmark performance: typical payload analysis time, max payload size, concurrent analysis capacity

IN PROGRESS

(empty — no work currently active)

VALIDATING

(empty)

DONE

  • Architecture design (RabbitMQ + Fibratus + Whiskers + Supabase)
  • Docker-compose stack (LitterBox + RabbitMQ + bridges)
  • v1 Python payload proves end-to-end event path
  • Pre-flight VM check script written (~/bin/greysec/pre-flight-vm-check.sh)
  • Supabase schema for analysis results

BLOCKED

  • ISSUE 1: VM share mount — Cannot test payloads until SMB share is reachable from inside VM
  • ISSUE 2: RedEdr zero events — Cannot validate EDR reporting until share mount works

Technical Fix Tasks

Task 1: Fix VM Share Mount (CRITICAL — do first)

What: \\172.28.0.1\share (SMB) not reachable from inside Windows VM at 172.28.0.10

Suspected root cause: The Docker bridge network (172.28.0.0/24) may not be attached to the VM's network interface, or Windows Firewall may be blocking SMB port 445.

Fix approach A: Verify the Docker bridge attachment and open Windows Firewall for SMB.
Fix approach B (preferred): Replace the SMB mount with an HTTP upload endpoint reachable from inside the VM. This is more reliable across the Docker bridge and needs no firewall holes for SMB.

Files to touch:

  • ~/greysec/tools/LitterBox/docker-compose.yml (change mount mechanism)
  • May need new endpoint in ~/greysec/tools/LitterBox/app/analyzers/payload_receiver.py

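Who: qwen2.5-coder:14b
Time: ~30 minutes
Verification: From inside the VM, curl.exe -F "file=@test.exe" http://172.28.0.1:PORT/upload returns 200 (call curl.exe explicitly in Windows PowerShell, where curl is an alias for Invoke-WebRequest)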

Acceptance criteria:

  • VM can reach LitterBox upload endpoint
  • Payload file appears in VM analysis directory
  • LitterBox begins processing within 10 seconds of upload

Task 2: Fix RedEdr Zero Events (CRITICAL — do second)

What: Fibratus sees real syscalls. Whiskers /api/alerts/fibratus/since returns events. But RedEdr report shows nothing.

Suspected failure point: The trace path is Fibratus writes to the Windows Application Event Log → Whiskers reads it via wevtutil → publishes over HTTP → consumer receives. Because the Whiskers endpoint does return events, the break is somewhere between Whiskers and the final report.

Fix approach:

  1. Check Fibratus filter rules — are they capturing the right event types?
  2. Check Whiskers polling interval — is it fast enough?
  3. Check variant_event_consumer.py — is it parsing Whiskers output correctly?
  4. Run a known-syscall payload and trace events at each hop

Files to touch:

  • ~/bin/greysec/fibratus_rabbitmq_bridge.py
  • ~/bin/greysec/variant_event_consumer.py
  • Fibratus config ~/greysec/tools/fibratus/config.yaml

Who: qwen2.5-coder:14b
Time: ~30-60 minutes (diagnosis + fix)
Verification: Run the ransomware_sim_v1.py payload → confirm events appear in the RedEdr report, not just at the Whiskers endpoint

Acceptance criteria:

  • Payload makes real OpenProcess/CreateFile syscalls
  • Fibratus events appear in Whiskers /api/alerts/fibratus/since output
  • Events are parsed and stored in Supabase
  • RedEdr-format report shows the events with correct timestamps

Task 3: Install Whiskers as Windows Service (CRITICAL — do third)

What: Whiskers dies when PAExec parent exits. No persistence across VM restart or process crash.

Fix: Install Whiskers as a Windows service using nssm (Non-Sucking Service Manager) or instsrv.

Files to touch:

  • VM-side setup: install nssm, run nssm install Whiskers "C:\path\to\whiskers.exe" "--port 8080"

Who: qwen2.5-coder:14b
Time: ~1 hour
Verification: Reboot VM → wait 5 minutes → confirm Whiskers is still reachable at http://172.28.0.10:8080/api/alerts/fibratus/since

Acceptance criteria:

  • Whiskers survives VM reboot without manual intervention
  • Whiskers survives its own parent process exiting
  • Health check curl http://172.28.0.10:8080/health returns 200
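The reboot check can be scripted so it does not depend on someone watching a clock. A minimal sketch using only the standard library; `wait_for_health` is a hypothetical helper, and the `/health` path is the one this plan assumes Whiskers exposes (if it does not, point it at `/api/alerts/fibratus/since` instead).

```python
import http.client
import time
import urllib.request


def wait_for_health(url: str, deadline_s: float = 300.0, interval_s: float = 5.0) -> bool:
    """Poll `url` until it answers HTTP 200 or `deadline_s` seconds pass.

    Returns True on the first 200, False if the deadline expires first.
    """
    end = time.monotonic() + deadline_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval_s) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # URLError, HTTPError (non-2xx) and socket timeouts are OSError subclasses;
            # malformed responses raise HTTPException. Treat all as "not up yet".
            pass
        if time.monotonic() >= end:
            return False
        time.sleep(interval_s)
```

For the reboot test, `wait_for_health("http://172.28.0.10:8080/health", deadline_s=300)` mirrors the 5-minute window. It returns as soon as the service answers, which shows how fast Whiskers comes back; to confirm it also stays up, call it again after the 5-minute mark.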

Task 4: Fix manager.py Timeout Handler (DEGRADED — do fourth)

What: ~/greysec/tools/LitterBox/app/analyzers/manager.py lines 418-419 hardcode init_wait_time = 5 in the "terminated after" error handler, overriding config.yaml.

Fix: Change init_wait_time = 5 to init_wait_time = config.get('wait_time', 15) or similar.
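The change itself is one line, but validating the value avoids trading one silent failure for another (for example a mistyped or zero wait_time). A minimal sketch, assuming config.yaml has already been loaded into a dict and that `wait_time` sits at the level the handler reads; `resolve_init_wait_time` is a hypothetical helper, not existing LitterBox code.

```python
from typing import Any, Mapping

DEFAULT_WAIT_TIME = 15  # fallback proposed in the fix above


def resolve_init_wait_time(config: Mapping[str, Any], default: int = DEFAULT_WAIT_TIME) -> int:
    """Return the configured wait_time in seconds, falling back to `default`.

    Raises ValueError for values that cannot be read as a positive number of seconds,
    instead of silently reverting to a hardcoded timeout.
    """
    value = config.get("wait_time", default)
    try:
        seconds = int(value)
    except (TypeError, ValueError):
        raise ValueError(f"wait_time must be a number of seconds, got {value!r}") from None
    if seconds <= 0:
        raise ValueError(f"wait_time must be positive, got {seconds}")
    return seconds
```

Inside the "terminated after" handler, the hardcoded line would then become `init_wait_time = resolve_init_wait_time(config)`.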

Files to touch:

  • ~/greysec/tools/LitterBox/app/analyzers/manager.py (lines ~418-419)

Who: qwen2.5-coder:14b
Time: ~30 minutes
Verification: Set wait_time: 30 in config.yaml → run a 20-second payload → confirm it runs for 20+ seconds, not 5

Acceptance criteria:

  • Config value respected, not hardcoded fallback
  • C payloads (v2, v3) that need > 5 seconds run to completion

Product Build Tasks

Task 5: Detection Score Algorithm

What: The primary client deliverable. A score from 0-100 that rates how likely this payload is to be detected by EDR.

Approach: Combine:

  • Event count: how many syscalls per minute
  • Event severity: which syscalls (OpenProcess = medium, VirtualAllocEx + WriteProcessMemory = high)
  • MITRE technique count: how many distinct ATT&CK techniques used
  • Network indicators: outbound connections = higher score
  • Process injection indicators: highest score

Output: JSON field in Supabase + dashboard display
Formula (target): score = min(100, (event_count * 0.1) + (technique_count * 15) + (severity_multiplier * 20) + (network_indicator * 25))

Who: qwen2.5-coder:14b or glm-5.1:cloud for algorithm design
Time: ~2 hours
Verification: Run 3 known-clean files (e.g., calc.exe, notepad.exe) → score < 20. Run the ransomware_sim payload → score > 60.
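The target formula translates directly into code, which makes it easy to check against the verification thresholds. A minimal sketch: the plan does not define the ranges of `severity_multiplier` or `network_indicator`, so this assumes 0.0-1.0 and 0/1 respectively; process injection (listed above as the highest-scoring signal) has no dedicated term in the formula, so here it only counts through `technique_count` and severity. `ScoreInputs` and `detection_score` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ScoreInputs:
    event_count: int            # Fibratus events attributed to the payload
    technique_count: int        # distinct ATT&CK technique IDs observed
    severity_multiplier: float  # assumed 0.0-1.0; not defined in the plan
    network_indicator: int      # assumed 0 or 1 (any outbound connection)


def detection_score(x: ScoreInputs) -> float:
    """Apply the target formula: a weighted sum capped at 100."""
    raw = (x.event_count * 0.1
           + x.technique_count * 15
           + x.severity_multiplier * 20
           + x.network_indicator * 25)
    return min(100.0, raw)
```

Working the clean-file threshold through this formula: one mapped technique plus a severity multiplier of 1.0 already scores 35, and 200 events alone score 20. A clean binary therefore has to produce no mapped techniques and fewer than 200 events to land under 20, which is worth confirming against real calc.exe and notepad.exe runs before the weights are frozen.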


Task 6: Web Dashboard

What: Client-facing results dashboard. Currently Supabase only — no UI.

Stack: TBD (recommended: Python Flask or FastAPI + HTMX for simplicity, or integrate into the existing GreySec dashboard)

Pages:

  • Upload page: drag-and-drop binary, job ID returned
  • Results page: Detection Score, MITRE kill chain visualization, behavioral summary
  • History: past analyses for the client's org

Who: qwen2.5-coder:14b (or Adam if a design decision is needed)
Time: ~4 hours
Dependencies: Tasks 1, 2, and 5 must be complete first


Task 7: Client Upload Portal

What: Authenticated API endpoint for clients to submit binaries. Currently manual curl to localhost.

Features:

  • API key auth per client org
  • File type validation (.exe, .dll, .bin, .ps1, .py)
  • Max file size: 50MB
  • Sandbox: each org gets isolated analysis environment (future scope — V1 is shared infra)
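The auth and validation rules above are small enough to sketch. A minimal sketch, assuming API keys are held per org as plain strings once loaded from Config/config.yaml (loading is not shown) and reading "50MB" as 50 MiB; `authenticate` and `validate_upload` are hypothetical names, not existing functions in payload_receiver.py.

```python
import hmac
from pathlib import PurePath
from typing import Mapping, Optional

ALLOWED_SUFFIXES = {".exe", ".dll", ".bin", ".ps1", ".py"}
MAX_UPLOAD_BYTES = 50 * 1024 * 1024  # "50MB" read as 50 MiB (assumption)


def authenticate(api_key: str, keys_by_org: Mapping[str, str]) -> Optional[str]:
    """Return the org whose key matches `api_key`, or None.

    hmac.compare_digest avoids content-dependent short-circuiting in each comparison,
    and the loop checks every org instead of stopping at the first match.
    """
    matched: Optional[str] = None
    for org, key in keys_by_org.items():
        if hmac.compare_digest(api_key.encode(), key.encode()):
            matched = org
    return matched


def validate_upload(filename: str, size_bytes: int) -> None:
    """Raise ValueError if the file type or size is outside the portal's limits."""
    suffix = PurePath(filename).suffix.lower()
    if suffix not in ALLOWED_SUFFIXES:
        raise ValueError(f"unsupported file type: {suffix or '(no extension)'}")
    if size_bytes <= 0:
        raise ValueError("empty upload")
    if size_bytes > MAX_UPLOAD_BYTES:
        raise ValueError(f"file too large: {size_bytes} bytes (limit {MAX_UPLOAD_BYTES})")
```

An extension check only catches honest mistakes; checking content as well (for example, that .exe and .dll uploads start with the PE "MZ" header) would be a natural next step.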

Files to touch:

  • ~/greysec/tools/LitterBox/app/analyzers/payload_receiver.py (new endpoints)
  • ~/greysec/tools/LitterBox/Config/config.yaml (API key config)

Who: qwen2.5-coder:14b
Time: ~2 hours
Dependencies: Task 1 (share mount fix) must be complete


Task 8: MITRE ATT&CK Kill Chain Mapper

What: Map Fibratus syscall events to MITRE ATT&CK tactic and technique IDs automatically.

Approach: Build a mapping table:

  • NtOpenProcess → T1055 (Process Injection), when the opened handle is later used for memory writes or remote threads
  • NtCreateFile on sensitive paths → T1005 (Data from Local System)
  • VirtualAllocEx + WriteProcessMemory → T1055 (Process Injection)
  • CreateRemoteThread → T1055 (Process Injection)
  • RegSetValue → T1112 (Modify Registry)
  • URLDownloadToFile → T1105 (Ingress Tool Transfer)
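Most rows in the table are a dictionary lookup, but two depend on context: NtCreateFile only counts on sensitive paths, and VirtualAllocEx + WriteProcessMemory is a pair against the same target. A minimal sketch, assuming events arrive in time order as (event name, target) tuples using the API names from the table (Fibratus' own event type names may differ, so a translation step would likely sit in front of this). The sensitive-path prefixes are hypothetical examples, and the NtOpenProcess row is left out because a handle open alone is ambiguous. `map_kill_chain` is a hypothetical name.

```python
from typing import Dict, Iterable, List, Set, Tuple

Technique = Tuple[str, str]  # (ATT&CK technique ID, technique name)

# One-to-one rows from the mapping table.
DIRECT: Dict[str, Technique] = {
    "CreateRemoteThread": ("T1055", "Process Injection"),
    "RegSetValue": ("T1112", "Modify Registry"),
    "URLDownloadToFile": ("T1105", "Ingress Tool Transfer"),
}

# Hypothetical examples of "sensitive paths" (registry hives, AD database); compared lowercase.
SENSITIVE_PREFIXES = (
    "c:\\windows\\system32\\config\\",
    "c:\\windows\\ntds\\",
)


def map_kill_chain(events: Iterable[Tuple[str, str]]) -> List[Technique]:
    """Return distinct techniques in order of first occurrence (not ATT&CK tactic order)."""
    chain: List[Technique] = []
    seen_ids: Set[str] = set()
    alloc_targets: Set[str] = set()  # processes that received a VirtualAllocEx

    def add(tech: Technique) -> None:
        if tech[0] not in seen_ids:
            seen_ids.add(tech[0])
            chain.append(tech)

    for name, target in events:
        if name in DIRECT:
            add(DIRECT[name])
        elif name == "VirtualAllocEx":
            alloc_targets.add(target)
        elif name == "WriteProcessMemory" and target in alloc_targets:
            add(("T1055", "Process Injection"))
        elif name == "NtCreateFile" and target.lower().startswith(SENSITIVE_PREFIXES):
            add(("T1005", "Data from Local System"))
    return chain
```

Ordering by first occurrence gives the "sequence of techniques" the output calls for, and `len(map_kill_chain(events)) >= 3` maps directly onto the Definition of Done check for the ransomware_sim payload.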

Output: Kill chain visualization (text or SVG) showing the sequence of ATT&CK techniques used
Files to touch: ~/bin/greysec/variant_event_consumer.py (add mapping logic)

Who: qwen2.5-coder:14b
Time: ~2 hours (building the mapping table is the work)
Dependencies: Task 2 (RedEdr events must flow)


Definition of Done

GreySec MAL is operational when:

  1. All 4 blocking bugs (Tasks 1-4: 3 critical, 1 degraded) are fixed and verified
  2. A known-malicious payload (ransomware_sim_v1.py) produces a Detection Score > 60
  3. MITRE ATT&CK kill chain shows at least 3 techniques for that payload
  4. A known-clean payload (notepad.exe) produces a Detection Score < 20
  5. Analysis turnaround is < 5 minutes for a 1MB binary
  6. Client upload portal accepts a binary via API and returns a job ID
  7. Results are accessible via web dashboard within 5 minutes of upload
  8. Skill file greysec-malware-pipeline exists and documents the full operational procedure
  9. Time tracking is hooked into the pipeline (AI minutes logged to TIME-LOG)
  10. gbrain logging is hooked into the pipeline (findings logged post-analysis)

DEBT (Action Items from This Kanban)

| Action item | Priority | Status | Notes |
|-------------|----------|--------|-------|
| Fix VM share mount (Task 1) | CRITICAL | open | Do first; blocks all testing |
| Fix RedEdr zero events (Task 2) | CRITICAL | open | Do second; blocks reporting |
| Install Whiskers as Windows service (Task 3) | CRITICAL | open | Do third; blocks reliability |
| Fix manager.py timeout (Task 4) | DEGRADED | open | Do fourth |
| Build Detection Score algorithm (Task 5) | HIGH | open | Primary deliverable metric |
| Build web dashboard (Task 6) | HIGH | open | Client-facing UI |
| Build client upload portal (Task 7) | HIGH | open | API for clients |
| Build MITRE ATT&CK mapper (Task 8) | HIGH | open | Kill chain output |
| Write greysec-malware-pipeline skill | MEDIUM | open | Docs |
| Add TIME-LOG hook | MEDIUM | open | Cost tracking |
| Add gbrain logging hook | MEDIUM | open | Knowledge capture |