ghstshdw 11b6032fcc Initial commit: malware-analysis-pipeline

2026-05-08 17:45:23 -05:00


GreySec MAL — Master Kanban

Product: GreySec Malware Analysis Lab
Type: Internal Build Project
Status: BUILDING
Updated: 2026-05-07
Parent debrief: ~/greysec/ops/debriefs/malware-lab-2026-05-07.md

Background

GreySec MAL is a self-hosted malware analysis sandbox for red team operators. It takes a binary payload, detonates it in an isolated Windows 11 VM instrumented with EDR (Fibratus + Whiskers + RedEdr), captures behavioral events via RabbitMQ, and produces a client-facing analysis report with a Detection Score (0-100) and MITRE ATT&CK kill chain map.

Architecture:

Payload Upload → LitterBox (:1337) → SMB Share Mount → Windows VM (:1337)
                                                            ↓
                                              Fibratus (kernel events)
                                              Whiskers (REST API :8080)
                                              RedEdr (EDR reporting)
                                                            ↓
                                              RabbitMQ (event queue)
                                                            ↓
                                              variant_event_consumer (Python)
                                                            ↓
                                              Supabase (structured data)
                                                            ↓
                                              Detection Score + MITRE ATT&CK Report

Current status: ARCHITECTURE VERIFIED. 4 bugs (3 critical, 1 degraded) block end-to-end operation. Fix order is strict.

Pipeline Definition

What the product IS: Drop a binary. Get a Detection Score + MITRE ATT&CK kill chain. Client data never leaves your infrastructure.

What the client receives:

Detection Score (0-100) — how likely this payload is to be flagged by EDR
MITRE ATT&CK kill chain map — which tactics and techniques the payload uses
Behavioral analysis summary — what the payload actually did (file ops, network ops, process ops)
Raw event log (optional) — full Fibratus event stream for manual review

Target buyer:

Red team operators testing C2 payloads before deployment
MSSPs running adversary simulation for clients
Security teams with HIPAA/BAA obligations that prevent cloud malware analysis
Law firms and financial institutions with strict client confidentiality requirements

SLA (target):

Analysis turnaround: < 5 minutes for typical payloads (< 10MB)
Report available: via web dashboard or API
Uptime: 99% (target, TBD with Adam)

Current State

What Works

v1 Python payload: ran for 16 seconds, generated real EDR events, Fibratus saw them, Whiskers returned them via /api/alerts/fibratus/since — core event path verified
RabbitMQ → variant_event_consumer → Supabase: working
Docker-compose stack: LitterBox, RabbitMQ, Fibratus bridge, consumer all start cleanly
Pre-flight check script exists at ~/bin/greysec/pre-flight-vm-check.sh (not yet run in a session)

What Is Broken

#	Bug	Severity	Fix Time	Cascade
1	VM share mount `\\172.28.0.1\share` unreachable from Windows VM — payloads may not reach analysis dir	CRITICAL	30 min	Blocks all testing
2	RedEdr returns zero events despite Fibratus seeing real syscalls — event data doesn't reach final report	CRITICAL	30-60 min	Blocks EDR validation
3	Whiskers has no Windows service wrapper — dies when parent process exits, requires manual PAExec restart	CRITICAL	1 hour	Blocks reliability
4	manager.py lines 418-419 hardcode `init_wait_time = 5` regardless of config; payloads are killed at 5s	DEGRADED	30 min	Blocks extended runs

Fix order: 1 → 2 → 3 → 4. Issue 4 is blocked by Issue 1 (can't test 4 until share mount works).

BOARD

BACKLOG

Build Detection Score algorithm (0-100 from Fibratus event frequency + severity + MITRE technique count)
Build web dashboard for results (currently Supabase only — no client-facing UI)
Build client upload portal (currently manual curl to localhost:1337)
Build MITRE ATT&CK kill chain mapper (Fibratus events → ATT&CK tactic/technique IDs)
Write greysec-malware-pipeline skill (standalone — not yet created)
Add payload hardening guidance output (what to change in the binary to lower Detection Score)
Set up TLS for LitterBox API (currently plain HTTP — fine for internal, not for client-facing portal)
Build multi-user access control (when portal is client-facing, need auth)
Benchmark performance: typical payload analysis time, max payload size, concurrent analysis capacity

IN PROGRESS

(empty — no work currently active)

VALIDATING

(empty)

DONE

Architecture design (RabbitMQ + Fibratus + Whiskers + Supabase)
Docker-compose stack (LitterBox + RabbitMQ + bridges)
v1 Python payload proves end-to-end event path
Pre-flight VM check script written (~/bin/greysec/pre-flight-vm-check.sh)
Supabase schema for analysis results

BLOCKED

ISSUE 1: VM share mount — Cannot test payloads until SMB share is reachable from inside VM
ISSUE 2: RedEdr zero events — Cannot validate EDR reporting until share mount works

Technical Fix Tasks

Task 1: Fix VM Share Mount (CRITICAL — do first)

What: \\172.28.0.1\share (SMB) is not reachable from inside the Windows VM at 172.28.0.10

Root cause (suspected): The Docker bridge network (172.28.0.0/24) may not be attached to the VM's network interface, or Windows Firewall may be blocking SMB port 445.

Fix approach A: Verify the Docker bridge attachment and open Windows Firewall for SMB.
Fix approach B (preferred): Replace the SMB mount with an HTTP upload endpoint inside VM. This is more reliable across the Docker bridge and needs no firewall holes.

Files to touch:

~/greysec/tools/LitterBox/docker-compose.yml (change mount mechanism)
May need new endpoint in ~/greysec/tools/LitterBox/app/analyzers/payload_receiver.py

Who: qwen2.5-coder:14b
Time: ~30 minutes
Verification: From inside VM: curl -F "file=@test.exe" http://172.28.0.1:PORT/upload returns 200

Acceptance criteria:

VM can reach LitterBox upload endpoint
Payload file appears in VM analysis directory
LitterBox begins processing within 10 seconds of upload
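Before choosing between approach A and approach B, it may help to confirm from inside the VM which TCP ports on the host are actually reachable. The following is a minimal sketch, not part of the existing tooling: `port_reachable` is a hypothetical helper, and it only tests whether a TCP connection can be opened, not whether SMB or HTTP works end to end.

```python
import socket


def port_reachable(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unroutable networks.
        return False
```

Run inside the VM, `port_reachable("172.28.0.1", 445)` checks the SMB path for approach A. The same call with the upload endpoint's port checks approach B once that port is decided. If both return False, the bridge attachment is the more likely culprit than a single firewall rule.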

Task 2: Fix RedEdr Zero Events (CRITICAL — do second)

What: Fibratus sees real syscalls. Whiskers /api/alerts/fibratus/since returns events. But RedEdr report shows nothing.

Root cause: Not yet identified. Trace path: Fibratus writes to the Windows Application Event Log → Whiskers reads it via wevtutil → publishes over HTTP → consumer receives. Events are lost somewhere between Whiskers and the final report.

Fix approach:

Check Fibratus filter rules — are they capturing the right event types?
Check Whiskers polling interval — is it fast enough?
Check variant_event_consumer.py — is it parsing Whiskers output correctly?
Run a known-syscall payload and trace events at each hop
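The last step, tracing events at each hop, comes down to counting events per stage and finding the first stage where the count drops to zero. A minimal sketch, assuming the counts are gathered by hand or by separate scripts; the hop names and the `first_silent_hop` helper are illustrative and do not exist in the current code.

```python
from typing import Optional, Sequence, Tuple


def first_silent_hop(counts: Sequence[Tuple[str, int]]) -> Optional[str]:
    """Return the first hop that recorded zero events, or None if all saw some.

    counts is an ordered list of (hop_name, event_count), upstream first.
    The fault lies at the returned hop or on the link feeding it.
    """
    for name, count in counts:
        if count == 0:
            return name
    return None


# Example with made-up numbers: events vanish before the report stage.
observed = [
    ("fibratus", 42),
    ("whiskers", 42),
    ("consumer", 40),
    ("supabase", 40),
    ("rededr_report", 0),
]
print(first_silent_hop(observed))
```

Recording the counts for one known-syscall payload run narrows the search to a single hop before anyone edits the bridge or the consumer.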

Files to touch:

~/bin/greysec/fibratus_rabbitmq_bridge.py
~/bin/greysec/variant_event_consumer.py
Fibratus config ~/greysec/tools/fibratus/config.yaml

Who: qwen2.5-coder:14b
Time: ~30-60 minutes (diagnosis + fix)
Verification: Run ransomware_sim_v1.py payload → confirm events in RedEdr report, not just Whiskers endpoint

Acceptance criteria:

Payload makes real OpenProcess/CreateFile syscalls
Fibratus events appear in Whiskers /api/alerts/fibratus/since output
Events are parsed and stored in Supabase
RedEdr-format report shows the events with correct timestamps

Task 3: Install Whiskers as Windows Service (CRITICAL — do third)

What: Whiskers dies when PAExec parent exits. No persistence across VM restart or process crash.

Fix: Install Whiskers as a Windows service using nssm (Non-Sucking Service Manager) or instsrv.

Files to touch:

VM-side setup: install nssm, then run nssm install Whiskers "C:\path\to\whiskers.exe" --port 8080 (arguments are passed unquoted so that --port and 8080 reach Whiskers as separate arguments)

Who: qwen2.5-coder:14b
Time: ~1 hour
Verification: Reboot VM → wait 5 minutes → confirm Whiskers still reachable at http://172.28.0.10:8080/api/alerts/fibratus/since

Acceptance criteria:

Whiskers survives VM reboot without manual intervention
Whiskers survives its own parent process exiting
Health check curl http://172.28.0.10:8080/health returns 200
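The reboot verification can be scripted from the host instead of waiting and checking by hand. The sketch below polls the health URL until it answers 200 or a deadline passes. It assumes Whiskers exposes the `/health` route named in the acceptance criteria; `wait_for_health` is a hypothetical helper, and proxies are bypassed on the assumption that the VM sits on an internal bridge.

```python
import http.client
import time
import urllib.error
import urllib.request


def wait_for_health(url: str, timeout_s: float = 300.0, interval_s: float = 5.0) -> bool:
    """Poll url until it returns HTTP 200; give up after timeout_s seconds."""
    # Empty ProxyHandler: talk to the VM directly, ignoring any proxy env vars.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with opener.open(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, http.client.HTTPException, OSError):
            pass  # Not up yet (refused, timed out, or non-2xx); retry below.
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(interval_s, remaining))
```

After triggering the reboot, `wait_for_health("http://172.28.0.10:8080/health")` returns True as soon as the service is back, or False if it has not answered within five minutes.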

Task 4: Fix manager.py Timeout Handler (DEGRADED — do fourth)

What: ~/greysec/tools/LitterBox/app/analyzers/manager.py lines 418-419 hardcode init_wait_time = 5 in the "terminated after" error handler, overriding config.yaml.

Fix: Change init_wait_time = 5 to init_wait_time = config.get('wait_time', 15) or similar.
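One way to write the lookup so that a missing or malformed value falls back to a default rather than a hardcoded 5. This is a sketch under assumptions: it treats the loaded config as a flat dict with a `wait_time` key, which may not match how manager.py actually holds its config, and `resolve_wait_time` is a hypothetical helper.

```python
from typing import Any, Mapping

DEFAULT_WAIT_TIME_S = 15  # fallback value suggested in the fix above


def resolve_wait_time(config: Mapping[str, Any], default: int = DEFAULT_WAIT_TIME_S) -> int:
    """Return the configured wait time in seconds, or default if unset or invalid."""
    raw = config.get("wait_time", default)
    try:
        value = int(raw)
    except (TypeError, ValueError):
        return default
    # Zero or negative would terminate the payload immediately, so reject it.
    return value if value > 0 else default
```

With this in place, the error handler would read `init_wait_time = resolve_wait_time(config)`, so the verification case (`wait_time: 30`) yields 30 and an empty config yields 15.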

Files to touch:

~/greysec/tools/LitterBox/app/analyzers/manager.py (lines ~418-419)

Who: qwen2.5-coder:14b
Time: ~30 minutes
Verification: Set wait_time: 30 in config.yaml → run a 20-second payload → confirm it runs for 20+ seconds, not 5

Acceptance criteria:

Config value respected, not hardcoded fallback
C payloads (v2, v3) that need > 5 seconds run to completion

Product Build Tasks

Task 5: Detection Score Algorithm

What: The primary client deliverable. A score from 0-100 that rates how likely this payload is to be detected by EDR.

Approach: Combine:

Event count: how many syscalls per minute
Event severity: which syscalls (OpenProcess = medium, VirtualAlloc + WriteProcessMemory = high)
MITRE technique count: how many distinct ATT&CK techniques used
Network indicators: outbound connections = higher score
Process injection indicators: highest score

Output: JSON field in Supabase + dashboard display
Formula (target): score = min(100, (event_count * 0.1) + (technique_count * 15) + (severity_multiplier * 20) + (network_indicator * 25))
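The target formula translates directly into a small function. This is a sketch of the formula as written, not a tuned algorithm: the document does not define input ranges, so the code assumes non-negative numbers, with `network_indicator` read as 0 or 1 and `severity_multiplier` as a small non-negative factor.

```python
def detection_score(event_count: float, technique_count: int,
                    severity_multiplier: float, network_indicator: float) -> float:
    """Compute the target Detection Score, capped at 100.

    Inputs are assumed non-negative; their exact ranges are still undefined.
    """
    raw = (event_count * 0.1
           + technique_count * 15
           + severity_multiplier * 20
           + network_indicator * 25)
    return min(100.0, raw)


# 100 events, 2 techniques, severity 1, one outbound connection:
# 10 + 30 + 20 + 25 = 85
print(detection_score(100, 2, 1, 1))
```

Note that with these weights a single mapped technique contributes 15 points on its own, which leaves little headroom under the "< 20" target for clean files. The weights may need adjusting once real event data is available.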

Who: qwen2.5-coder:14b or glm-5.1:cloud for algorithm design
Time: ~2 hours
Verification: Run known-clean files (calc.exe, notepad.exe) → score < 20. Run ransomware_sim payload → score > 60.

Task 6: Web Dashboard

What: Client-facing results dashboard. Currently Supabase only — no UI.

Stack: TBD (recommend: Simple Python Flask or FastAPI + HTMX for simplicity, or integrate into existing GreySec dashboard)

Pages:

Upload page: drag-and-drop binary, job ID returned
Results page: Detection Score, MITRE kill chain visualization, behavioral summary
History: past analyses for the client's org

Who: qwen2.5-coder:14b (or Adam if design decision needed)
Time: ~4 hours
Dependencies: Tasks 1, 2, and 5 complete first

Task 7: Client Upload Portal

What: Authenticated API endpoint for clients to submit binaries. Currently manual curl to localhost.

Features:

API key auth per client org
File type validation (.exe, .dll, .bin, .ps1, .py)
Max file size: 50MB
Sandbox: each org gets isolated analysis environment (future scope — V1 is shared infra)
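The file type and size rules above can be enforced before a payload is queued. A minimal sketch, assuming the check runs on the client-supplied filename and the received byte count; `validate_upload` is a hypothetical helper, and the 50MB limit is interpreted here as 50 MiB.

```python
import os
from typing import Tuple

ALLOWED_EXTENSIONS = {".exe", ".dll", ".bin", ".ps1", ".py"}
MAX_UPLOAD_BYTES = 50 * 1024 * 1024  # "50MB" from the feature list, read as 50 MiB


def validate_upload(filename: str, size_bytes: int) -> Tuple[bool, str]:
    """Return (accepted, reason) for an uploaded file's name and size."""
    # Strip any directory components, including Windows-style ones.
    name = os.path.basename(filename.replace("\\", "/"))
    ext = os.path.splitext(name)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        return False, f"extension {ext or '(none)'} not allowed"
    if size_bytes <= 0:
        return False, "empty file"
    if size_bytes > MAX_UPLOAD_BYTES:
        return False, "file exceeds 50MB limit"
    return True, "ok"
```

An extension check only filters obvious mistakes. It says nothing about the file's real contents, which is acceptable here because every accepted file is detonated in the sandbox anyway.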

Files to touch:

~/greysec/tools/LitterBox/app/analyzers/payload_receiver.py (new endpoints)
~/greysec/tools/LitterBox/Config/config.yaml (API key config)

Who: qwen2.5-coder:14b
Time: ~2 hours
Dependencies: Task 1 (share mount fix) must be complete

Task 8: MITRE ATT&CK Kill Chain Mapper

What: Map Fibratus syscall events to MITRE ATT&CK tactic and technique IDs automatically.

Approach: Build a mapping table:

NtOpenProcess → T1055 (Process Injection)
NtCreateFile on sensitive paths → T1005 (Data from Local System)
VirtualAllocEx + WriteProcessMemory → T1055 (Process Injection)
CreateRemoteThread → T1055 (Process Injection)
RegSetValue → T1112 (Modify Registry)
URLDownloadToFile → T1105 (Ingress Tool Transfer)
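A lookup table plus a de-duplicating pass is enough to produce a first kill chain. The sketch below is based on the table above, limited to one current ATT&CK technique per call. It makes several assumptions: events arrive as dicts with an `api` key and an optional `path` key (the real Fibratus event schema may differ), the sensitive path prefixes are placeholders, and VirtualAllocEx and WriteProcessMemory are mapped individually rather than as a pair.

```python
from typing import Dict, Iterable, List, Mapping, Optional, Tuple

Technique = Tuple[str, str]  # (ATT&CK technique ID, technique name)

TECHNIQUE_MAP: Dict[str, Technique] = {
    "NtOpenProcess": ("T1055", "Process Injection"),
    "VirtualAllocEx": ("T1055", "Process Injection"),
    "WriteProcessMemory": ("T1055", "Process Injection"),
    "CreateRemoteThread": ("T1055", "Process Injection"),
    "RegSetValue": ("T1112", "Modify Registry"),
    "URLDownloadToFile": ("T1105", "Ingress Tool Transfer"),
}

# Placeholder prefixes; the real "sensitive paths" list is still to be defined.
SENSITIVE_PREFIXES = ("c:\\users\\", "c:\\windows\\system32\\config\\")


def map_event(event: Mapping[str, str]) -> Optional[Technique]:
    """Map one event to a technique, or None if it is not in the table."""
    api = event.get("api", "")
    if api == "NtCreateFile":
        path = str(event.get("path", "")).lower()
        if path.startswith(SENSITIVE_PREFIXES):
            return ("T1005", "Data from Local System")
        return None
    return TECHNIQUE_MAP.get(api)


def kill_chain(events: Iterable[Mapping[str, str]]) -> List[Technique]:
    """Return distinct techniques in order of first appearance."""
    seen = set()
    chain: List[Technique] = []
    for event in events:
        technique = map_event(event)
        if technique is not None and technique[0] not in seen:
            seen.add(technique[0])
            chain.append(technique)
    return chain
```

The length of the returned list is the distinct technique count that both the Detection Score and the "at least 3 techniques" check in the Definition of Done rely on.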

Output: Kill chain visualization (text or SVG) showing sequence of ATT&CK techniques used
Files to touch: ~/bin/greysec/variant_event_consumer.py (add mapping logic)

Who: qwen2.5-coder:14b
Time: ~2 hours (building the mapping table is the work)
Dependencies: Task 2 (RedEdr events must flow)

Definition of Done

GreySec MAL is operational when:

All 4 bugs (3 critical, 1 degraded) are fixed and verified
A known-malicious payload (ransomware_sim_v1.py) produces a Detection Score > 60
MITRE ATT&CK kill chain shows at least 3 techniques for that payload
A known-clean payload (notepad.exe) produces a Detection Score < 20
Analysis turnaround is < 5 minutes for a 1MB binary
Client upload portal accepts a binary via API and returns a job ID
Results are accessible via web dashboard within 5 minutes of upload
Skill file greysec-malware-pipeline exists and documents the full operational procedure
Time tracking is hooked into the pipeline (AI minutes logged to TIME-LOG)
gbrain logging is hooked into the pipeline (findings logged post-analysis)

DEBT (Action Items from This Kanban)

Action Item	Priority	Status	Notes
Fix VM share mount (Task 1)	CRITICAL	open	Do first — blocks all testing
Fix RedEdr zero events (Task 2)	CRITICAL	open	Do second — blocks reporting
Install Whiskers as Windows service (Task 3)	CRITICAL	open	Do third — blocks reliability
Fix manager.py timeout (Task 4)	DEGRADED	open	Do fourth
Build Detection Score algorithm (Task 5)	HIGH	open	Primary deliverable metric
Build web dashboard (Task 6)	HIGH	open	Client-facing UI
Build client upload portal (Task 7)	HIGH	open	API for clients
Build MITRE ATT&CK mapper (Task 8)	HIGH	open	Kill chain output
Write greysec-malware-pipeline skill	MEDIUM	open	Docs
Add TIME-LOG hook	MEDIUM	open	Cost tracking
Add gbrain logging hook	MEDIUM	open	Knowledge capture
