# GreySec MAL: Master Kanban

**Product:** GreySec Malware Analysis Lab
**Type:** Internal Build Project
**Status:** BUILDING
**Updated:** 2026-05-07
**Parent debrief:** `~/greysec/ops/debriefs/malware-lab-2026-05-07.md`

---

## Background

GreySec MAL is a self-hosted malware analysis sandbox for red team operators. It takes a binary payload, detonates it in an isolated Windows 11 VM instrumented with EDR tooling (Fibratus, Whiskers, and RedEdr), captures behavioral events via RabbitMQ, and produces a client-facing analysis report with a Detection Score (0-100) and a MITRE ATT&CK kill chain map.

**Architecture:**
```
Payload Upload → LitterBox (:1337) → SMB Share Mount → Windows VM (:1337)
↓
Fibratus (kernel events)
Whiskers (REST API :8080)
RedEdr (EDR reporting)
↓
RabbitMQ (event queue)
↓
variant_event_consumer (Python)
↓
Supabase (structured data)
↓
Detection Score + MITRE ATT&CK Report
```

**Current status:** ARCHITECTURE VERIFIED. 4 bugs (3 critical, 1 degraded) block end-to-end operation. Fix order is strict.

---

## Pipeline Definition

**What the product IS:**
Drop a binary. Get a Detection Score + MITRE ATT&CK kill chain. Client data never leaves your infrastructure.

**What the client receives:**
- Detection Score (0-100): how likely this payload is to be flagged by EDR
- MITRE ATT&CK kill chain map: which tactics and techniques the payload uses
- Behavioral analysis summary: what the payload actually did (file ops, network ops, process ops)
- Raw event log (optional): full Fibratus event stream for manual review

**Target buyer:**
- Red team operators testing C2 payloads before deployment
- MSSPs running adversary simulation for clients
- Security teams with HIPAA/BAA obligations that prevent cloud malware analysis
- Law firms and financial institutions with strict client confidentiality requirements

**SLA (target):**
- Analysis turnaround: < 5 minutes for typical payloads (< 10MB)
- Report available: via web dashboard or API
- Uptime: 99% (target, TBD with Adam)

---

## Current State

### What Works
- v1 Python payload ran for 16 seconds and generated real EDR events; Fibratus saw them and Whiskers returned them via `/api/alerts/fibratus/since`. Core event path verified.
- RabbitMQ → variant_event_consumer → Supabase: working
- Docker-compose stack: LitterBox, RabbitMQ, Fibratus bridge, and consumer all start cleanly
- Pre-flight check script exists at `~/bin/greysec/pre-flight-vm-check.sh` (not yet run in a session)

### What Is Broken

| # | Bug | Severity | Fix Time | Cascade |
|---|-----|----------|----------|---------|
| 1 | VM share mount `\\172.28.0.1\share` unreachable from Windows VM; payloads may not reach analysis dir | CRITICAL | 30 min | Blocks all testing |
| 2 | RedEdr returns zero events despite Fibratus seeing real syscalls; event data doesn't reach final report | CRITICAL | 30-60 min | Blocks EDR validation |
| 3 | Whiskers has no Windows service wrapper; it dies when its parent process exits and requires a manual PAExec restart | CRITICAL | 1 hour | Blocks reliability |
| 4 | `manager.py` lines 418-419 hardcode `init_wait_time = 5` regardless of config; payloads killed at 5s | DEGRADED | 30 min | Blocks extended runs |

**Fix order:** 1 → 2 → 3 → 4. Issue 4 is blocked by Issue 1 (can't test 4 until share mount works).

---

## BOARD

### BACKLOG

- [ ] Build Detection Score algorithm (0-100 from Fibratus event frequency + severity + MITRE technique count)
- [ ] Build web dashboard for results (currently Supabase only, no client-facing UI)
- [ ] Build client upload portal (currently manual `curl` to localhost:1337)
- [ ] Build MITRE ATT&CK kill chain mapper (Fibratus events → ATT&CK tactic/technique IDs)
- [ ] Write `greysec-malware-pipeline` skill (standalone, not yet created)
- [ ] Add payload hardening guidance output (what to change in the binary to lower Detection Score)
- [ ] Set up TLS for LitterBox API (currently plain HTTP; fine for internal, not for client-facing portal)
- [ ] Build multi-user access control (when portal is client-facing, need auth)
- [ ] Benchmark performance: typical payload analysis time, max payload size, concurrent analysis capacity

### IN PROGRESS

_(empty, no work currently active)_

### VALIDATING

_(empty)_

### DONE

- [x] Architecture design (RabbitMQ + Fibratus + Whiskers + Supabase)
- [x] Docker-compose stack (LitterBox + RabbitMQ + bridges)
- [x] v1 Python payload proves the core event path
- [x] Pre-flight VM check script written (`~/bin/greysec/pre-flight-vm-check.sh`)
- [x] Supabase schema for analysis results

### BLOCKED

- [ ] **ISSUE 1 (VM share mount):** Cannot test payloads until SMB share is reachable from inside VM
- [ ] **ISSUE 2 (RedEdr zero events):** Cannot validate EDR reporting until share mount works

---

## Technical Fix Tasks

### Task 1: Fix VM Share Mount (CRITICAL, do first)

**What:** `\\172.28.0.1\share` (SMB) is not reachable from inside the Windows VM at 172.28.0.10.

**Suspected root cause:** The Docker bridge network (172.28.0.0/24) may not be attached to the VM network interface, or SMB port 445 may be blocked by Windows Firewall.

**Fix approach A:** Verify Docker bridge attachment and open Windows Firewall for SMB.
**Fix approach B (preferred):** Replace the SMB mount with an HTTP upload endpoint inside the VM, which is more reliable across the Docker bridge and needs no firewall holes.

**Files to touch:**
- `~/greysec/tools/LitterBox/docker-compose.yml` (change mount mechanism)
- May need new endpoint in `~/greysec/tools/LitterBox/app/analyzers/payload_receiver.py`

**Who:** qwen2.5-coder:14b
**Time:** ~30 minutes
**Verification:** From inside VM: `curl -F "file=@test.exe" http://172.28.0.1:PORT/upload` returns 200

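A plain TCP probe run from inside the VM can help tell the two suspected causes apart before committing to approach A or B. The following is a minimal sketch, not part of the existing tooling: `port_reachable` and `probe` are hypothetical helper names, and it assumes a Python interpreter is available in the VM. Port 445 is SMB and 1337 is the LitterBox port from the architecture diagram.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unroutable networks.
        return False


def probe(host: str, ports: dict) -> dict:
    """Map each label in `ports` to whether its TCP port is reachable on host."""
    return {label: port_reachable(host, port) for label, port in ports.items()}


# Example call from inside the VM (labels are arbitrary):
#   probe("172.28.0.1", {"smb": 445, "litterbox": 1337})
# If "litterbox" is reachable but "smb" is not, a firewall rule on 445 is the
# likelier culprit; if neither is reachable, suspect the bridge attachment.
```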
**Acceptance criteria:**
- VM can reach LitterBox upload endpoint
- Payload file appears in VM analysis directory
- LitterBox begins processing within 10 seconds of upload

---

### Task 2: Fix RedEdr Zero Events (CRITICAL, do second)

**What:** Fibratus sees real syscalls and Whiskers `/api/alerts/fibratus/since` returns events, but the RedEdr report shows nothing.

**Root cause (not yet identified):** The event path is: Fibratus writes to the Windows Application Event Log → Whiskers reads via `wevtutil` → publishes over HTTP → consumer receives. Something breaks between Whiskers and the final report.

**Fix approach:**
1. Check Fibratus filter rules: are they capturing the right event types?
2. Check Whiskers polling interval: is it fast enough?
3. Check `variant_event_consumer.py`: is it parsing Whiskers output correctly?
4. Run a known-syscall payload and trace events at each hop

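Step 4 amounts to counting events at each hop for a single run and finding where the count first drops to zero. A minimal sketch of that bookkeeping follows. The hop labels are hypothetical keys taken from the architecture diagram, and the counts are assumed to be gathered by hand or by whatever per-component query is available.

```python
from typing import Mapping, Optional, Sequence

# Hop labels follow the architecture diagram; the key names themselves are
# hypothetical and chosen only for this sketch.
HOPS = ("fibratus", "whiskers", "rabbitmq", "consumer", "supabase", "rededr_report")


def first_silent_hop(
    counts: Mapping[str, int], hops: Sequence[str] = HOPS
) -> Optional[str]:
    """Return the first hop that recorded zero events, or None if all saw some."""
    for hop in hops:
        if counts.get(hop, 0) <= 0:
            return hop
    return None


# Example: events reach Supabase but never show up in the report.
observed = {
    "fibratus": 42,
    "whiskers": 42,
    "rabbitmq": 40,
    "consumer": 40,
    "supabase": 40,
    "rededr_report": 0,
}
print(first_silent_hop(observed))
```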
**Files to touch:**
- `~/bin/greysec/fibratus_rabbitmq_bridge.py`
- `~/bin/greysec/variant_event_consumer.py`
- Fibratus config `~/greysec/tools/fibratus/config.yaml`

**Who:** qwen2.5-coder:14b
**Time:** ~30-60 minutes (diagnosis + fix)
**Verification:** Run `ransomware_sim_v1.py` payload → confirm events in RedEdr report, not just Whiskers endpoint

**Acceptance criteria:**
- Payload makes real OpenProcess/CreateFile syscalls
- Fibratus events appear in Whiskers `/api/alerts/fibratus/since` output
- Events are parsed and stored in Supabase
- RedEdr-format report shows the events with correct timestamps

---

### Task 3: Install Whiskers as Windows Service (CRITICAL, do third)

**What:** Whiskers dies when its PAExec parent exits. It has no persistence across VM restart or process crash.

**Fix:** Install Whiskers as a Windows service using `nssm` (the Non-Sucking Service Manager) or `instsrv`.

**Files to touch:**
- VM-side setup: install nssm, run `nssm install Whiskers "C:\path\to\whiskers.exe" --port 8080`

**Who:** qwen2.5-coder:14b
**Time:** ~1 hour
**Verification:** Reboot VM → wait 5 minutes → confirm Whiskers still reachable at `http://172.28.0.10:8080/api/alerts/fibratus/since`

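The post-reboot check can be automated as a polling loop rather than a fixed five-minute wait. The sketch below is illustrative only: `http_ok` and `wait_until_healthy` are hypothetical names, and the default of 30 attempts at 10-second intervals is simply one way to cover the 5-minute window.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET to url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Non-2xx responses raise HTTPError (a URLError subclass);
        # connection failures and timeouts land here too.
        return False


def wait_until_healthy(
    probe: Callable[[], bool],
    attempts: int = 30,
    delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False


# Example, run from the Docker host after rebooting the VM:
#   wait_until_healthy(lambda: http_ok("http://172.28.0.10:8080/health"))
```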
**Acceptance criteria:**
- Whiskers survives VM reboot without manual intervention
- Whiskers survives its own parent process exiting
- Health check `curl http://172.28.0.10:8080/health` returns 200

---

### Task 4: Fix manager.py Timeout Handler (DEGRADED, do fourth)

**What:** `~/greysec/tools/LitterBox/app/analyzers/manager.py` lines 418-419 hardcode `init_wait_time = 5` in the `"terminated after"` error handler, overriding `config.yaml`.

**Fix:** Change `init_wait_time = 5` to `init_wait_time = config.get('wait_time', 15)` or similar.

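One way to make the lookup robust against a missing or malformed value is sketched below. This is not the actual `manager.py` code: `resolve_wait_time` is a hypothetical helper, and it assumes the loaded config is a flat dict with a `wait_time` key, which may not match how LitterBox nests its settings.

```python
DEFAULT_WAIT_TIME = 15.0  # fallback suggested in the fix note above


def resolve_wait_time(config: dict, default: float = DEFAULT_WAIT_TIME) -> float:
    """Return the configured wait time in seconds, falling back to `default`.

    The fallback applies when the key is missing, not numeric, or not positive.
    """
    raw = config.get("wait_time", default)
    try:
        value = float(raw)
    except (TypeError, ValueError):
        return float(default)
    return value if value > 0 else float(default)


# In the error handler, the hardcoded assignment would then become:
#   init_wait_time = resolve_wait_time(config)
```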
**Files to touch:**
- `~/greysec/tools/LitterBox/app/analyzers/manager.py` (lines ~418-419)

**Who:** qwen2.5-coder:14b
**Time:** ~30 minutes
**Verification:** Set `wait_time: 30` in config.yaml → run a 20-second payload → confirm it runs for 20+ seconds, not 5

**Acceptance criteria:**
- Config value respected, not hardcoded fallback
- C payloads (v2, v3) that need > 5 seconds run to completion

---

## Product Build Tasks

### Task 5: Detection Score Algorithm

**What:** The primary client deliverable. A score from 0-100 that rates how likely this payload is to be detected by EDR.

**Approach:** Combine:
- Event count: how many syscalls per minute
- Event severity: which syscalls (OpenProcess = medium, VirtualAlloc + WriteProcessMemory = high)
- MITRE technique count: how many distinct ATT&CK techniques used
- Network indicators: outbound connections = higher score
- Process injection indicators: highest score

**Output:** JSON field in Supabase + dashboard display
**Formula (target):** `score = min(100, (event_count * 0.1) + (technique_count * 15) + (severity_multiplier * 20) + (network_indicator * 25))`

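The target formula translates directly into a small function. The sketch below assumes `severity_multiplier` is a float in [0, 1] and `network_indicator` is 0 or 1; neither range is specified above, so treat both as placeholders. The formula as written has no separate process-injection term, so the sketch does not add one.

```python
def detection_score(
    event_count: int,
    technique_count: int,
    severity_multiplier: float,
    network_indicator: int,
) -> float:
    """Compute the 0-100 Detection Score from the target formula."""
    raw = (
        event_count * 0.1
        + technique_count * 15
        + severity_multiplier * 20
        + network_indicator * 25
    )
    # Clamp to the advertised 0-100 range (the lower bound guards against
    # negative inputs, which the formula itself does not rule out).
    return max(0.0, min(100.0, raw))


# 200 events, 2 techniques, medium severity (0.5), outbound traffic seen:
# 20 + 30 + 10 + 25 = 85
print(detection_score(200, 2, 0.5, 1))
```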
**Who:** qwen2.5-coder:14b or glm-5.1:cloud for algorithm design
**Time:** ~2 hours
**Verification:** Run known-clean files (calc.exe, notepad.exe) → score < 20. Run ransomware_sim payload → score > 60.

---

### Task 6: Web Dashboard

**What:** Client-facing results dashboard. Currently Supabase only, with no UI.

**Stack:** TBD (recommended: Flask or FastAPI + HTMX for simplicity, or integrate into the existing GreySec dashboard)

**Pages:**
- Upload page: drag-and-drop binary, job ID returned
- Results page: Detection Score, MITRE kill chain visualization, behavioral summary
- History: past analyses for the client's org

**Who:** qwen2.5-coder:14b (or Adam if design decision needed)
**Time:** ~4 hours
**Dependencies:** Tasks 1, 2, and 5 must be complete first

---

### Task 7: Client Upload Portal

**What:** Authenticated API endpoint for clients to submit binaries. Currently manual `curl` to localhost.

**Features:**
- API key auth per client org
- File type validation (.exe, .dll, .bin, .ps1, .py)
- Max file size: 50MB
- Sandbox: each org gets isolated analysis environment (future scope; V1 is shared infra)

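The first three features can be checked before a payload ever touches the analysis queue. The sketch below is a framework-independent illustration, not the `payload_receiver.py` implementation: `authenticate` and `validate_upload` are hypothetical names, the org-to-key mapping stands in for whatever `config.yaml` ends up holding, and 50MB is interpreted as 50 * 1024 * 1024 bytes. Note that an extension check is only a first filter, not true file type detection.

```python
import hmac
from pathlib import PurePath
from typing import Optional

ALLOWED_EXTENSIONS = {".exe", ".dll", ".bin", ".ps1", ".py"}
MAX_UPLOAD_BYTES = 50 * 1024 * 1024  # assumption: "50MB" means 50 MiB


def authenticate(api_key: str, org_keys: dict) -> Optional[str]:
    """Return the org whose key matches api_key, or None if there is no match."""
    for org, expected in org_keys.items():
        # compare_digest avoids leaking key contents through timing differences.
        if hmac.compare_digest(api_key.encode(), expected.encode()):
            return org
    return None


def validate_upload(filename: str, size_bytes: int) -> Optional[str]:
    """Return None if the upload is acceptable, otherwise a rejection reason."""
    ext = PurePath(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        return f"unsupported file type: {ext or '(none)'}"
    if size_bytes <= 0:
        return "empty file"
    if size_bytes > MAX_UPLOAD_BYTES:
        return "file exceeds 50MB limit"
    return None
```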
**Files to touch:**
- `~/greysec/tools/LitterBox/app/analyzers/payload_receiver.py` (new endpoints)
- `~/greysec/tools/LitterBox/Config/config.yaml` (API key config)

**Who:** qwen2.5-coder:14b
**Time:** ~2 hours
**Dependencies:** Task 1 (share mount fix) must be complete

---

### Task 8: MITRE ATT&CK Kill Chain Mapper

**What:** Map Fibratus syscall events to MITRE ATT&CK tactic and technique IDs automatically.

**Approach:** Build a mapping table:
- `NtOpenProcess` → T1055 (Process Injection)
- `NtCreateFile` on sensitive paths → T1005 (Data from Local System)
- `VirtualAllocEx` + `WriteProcessMemory` → T1055 (Process Injection)
- `CreateRemoteThread` → T1055 (Process Injection)
- `RegSetValue` → T1112 (Modify Registry)
- `URLDownloadToFile` → T1105 (Ingress Tool Transfer)

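A mapping table of this kind can start as a plain dict plus one path-sensitive rule. The sketch below is an illustration under assumptions: events are taken to be dicts with a `name` and optional `path` field (the real Fibratus event schema may differ), `SENSITIVE_PREFIXES` is a hypothetical placeholder list, and the technique IDs are limited to T1055, T1005, T1112, and T1105.

```python
# Call name -> ATT&CK technique IDs. Event field names are assumptions.
TECHNIQUE_MAP = {
    "NtOpenProcess": ["T1055"],
    "VirtualAllocEx": ["T1055"],
    "WriteProcessMemory": ["T1055"],
    "CreateRemoteThread": ["T1055"],
    "RegSetValue": ["T1112"],
    "URLDownloadToFile": ["T1105"],
}

TECHNIQUE_NAMES = {
    "T1005": "Data from Local System",
    "T1055": "Process Injection",
    "T1105": "Ingress Tool Transfer",
    "T1112": "Modify Registry",
}

# Hypothetical placeholder; the real list of sensitive paths is still to be defined.
SENSITIVE_PREFIXES = ("c:\\users\\",)


def techniques_for(event: dict) -> list:
    """Return the technique IDs suggested by a single event."""
    name = event.get("name", "")
    if name == "NtCreateFile":
        path = event.get("path", "").lower()
        return ["T1005"] if path.startswith(SENSITIVE_PREFIXES) else []
    return TECHNIQUE_MAP.get(name, [])


def kill_chain(events: list) -> list:
    """Return distinct technique IDs in order of first appearance."""
    seen = []
    for event in events:
        for technique_id in techniques_for(event):
            if technique_id not in seen:
                seen.append(technique_id)
    return seen
```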
**Output:** Kill chain visualization (text or SVG) showing sequence of ATT&CK techniques used
**Files to touch:** `~/bin/greysec/variant_event_consumer.py` (add mapping logic)

**Who:** qwen2.5-coder:14b
**Time:** ~2 hours (building the mapping table is the work)
**Dependencies:** Task 2 (RedEdr events must flow)

---

## Definition of Done

GreySec MAL is operational when:
1. All 4 bugs (Tasks 1-4) are fixed and verified
2. A known-malicious payload (`ransomware_sim_v1.py`) produces a Detection Score > 60
3. MITRE ATT&CK kill chain shows at least 3 techniques for that payload
4. A known-clean payload (notepad.exe) produces a Detection Score < 20
5. Analysis turnaround is < 5 minutes for a 1MB binary
6. Client upload portal accepts a binary via API and returns a job ID
7. Results are accessible via web dashboard within 5 minutes of upload
8. Skill file `greysec-malware-pipeline` exists and documents the full operational procedure
9. Time tracking is hooked into the pipeline (AI minutes logged to TIME-LOG)
10. gbrain logging is hooked into the pipeline (findings logged post-analysis)

---

## DEBT (Action Items from This Kanban)

| Action Item | Priority | Status | Notes |
|------------|----------|--------|-------|
| Fix VM share mount (Task 1) | CRITICAL | open | Do first; blocks all testing |
| Fix RedEdr zero events (Task 2) | CRITICAL | open | Do second; blocks reporting |
| Install Whiskers as Windows service (Task 3) | CRITICAL | open | Do third; blocks reliability |
| Fix manager.py timeout (Task 4) | DEGRADED | open | Do fourth |
| Build Detection Score algorithm (Task 5) | HIGH | open | Primary deliverable metric |
| Build web dashboard (Task 6) | HIGH | open | Client-facing UI |
| Build client upload portal (Task 7) | HIGH | open | API for clients |
| Build MITRE ATT&CK mapper (Task 8) | HIGH | open | Kill chain output |
| Write greysec-malware-pipeline skill | MEDIUM | open | Docs |
| Add TIME-LOG hook | MEDIUM | open | Cost tracking |
| Add gbrain logging hook | MEDIUM | open | Knowledge capture |