2026-05-08 17:45:23 -05:00


GreySec MAL — Technical Runbook

Product: GreySec Malware Analysis Lab
Version: 1.0
Updated: 2026-05-07
Parent: ~/greysec/tools/malware-analysis-pipeline/kanban.md


Architecture

                            Docker Host (Linux)
                                172.28.0.1
                                     │
        ┌─────────────[Docker Bridge: 172.28.0.0/24]──────────────┐
        │                                                         │
  LitterBox API                                             Windows 11 VM
  :1337                                                      172.28.0.10
  (upload portal)                                                 │
  (orchestration)                                         Fibratus (kernel)
  (result storage)                                                │
                                                          Whiskers (:8080)
  RabbitMQ                                                        │
  :5672                                                   RedEdr reporting
        │                                                         │
  variant_event_consumer                                    [SHARE MOUNT]
        │                                                   C:\analysis\
  Supabase
  (results DB)
        │
  Web Dashboard

Key address: LitterBox API = http://172.28.0.1:1337
Key address: Whiskers (inside VM) = http://172.28.0.10:8080


Component Inventory

| Component | Location | Port | Purpose | Credentials |
|---|---|---|---|---|
| LitterBox API | ~/greysec/tools/LitterBox/ | 1337 | Upload portal + orchestration | None (local) |
| RabbitMQ | Docker container | 5672 (AMQP), 15672 (management API) | Event queue | guest/guest (local) |
| variant_event_consumer | ~/bin/greysec/variant_event_consumer.py | - | Parse events → Supabase | Via env |
| fibratus_rabbitmq_bridge | ~/bin/greysec/fibratus_rabbitmq_bridge.py | - | Bridge Fibratus to RabbitMQ | Via env |
| Whiskers | Inside Windows VM | 8080 | EDR REST API | None |
| Fibratus | Inside Windows VM | - | Kernel event capture | - |
| RedEdr | Inside Windows VM | - | EDR reporting (RedEdr.exe) | - |
| Supabase | Cloud (or local) | 3000 | Results database | greysec-dev-key-2026 |
| pre-flight-vm-check.sh | ~/bin/greysec/pre-flight-vm-check.sh | - | VM health check script | - |

Prerequisites

Before running the pipeline:

  1. Docker daemon running on Linux host
  2. Windows 11 VM running at 172.28.0.10
  3. Kali container reachable from host
  4. Supabase accessible at localhost:3000 (or cloud)
  5. MacBook Ollama reachable at 100.127.137.64 (for AI augmentations)
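The prerequisites can be probed from the Linux host in one pass. This is a sketch under stated assumptions: bash's `/dev/tcp` redirection and coreutils `timeout` are available, and Ollama listens on its default port 11434 (the runbook only gives the host address). The Kali container is left out because its name is not recorded here. `port_open`, `report` and `check_prereqs` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Sketch: probe the prerequisites from the Linux host.
# Assumes bash /dev/tcp support, coreutils `timeout`, and Ollama on its default port 11434.

# port_open HOST PORT: succeeds if a TCP connection opens within 3 seconds.
port_open() {
  timeout 3 bash -c ': >"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# report NAME CMD...: runs CMD quietly, prints ok/FAIL, and returns non-zero on failure.
report() {
  local name="$1"; shift
  if "$@" >/dev/null 2>&1; then
    echo "ok    $name"
  else
    echo "FAIL  $name"
    return 1
  fi
}

# check_prereqs: runs every probe (no early exit) and fails if any probe failed.
check_prereqs() {
  local rc=0
  report "Docker daemon"          docker info                     || rc=1
  report "Windows VM 172.28.0.10" ping -c 1 -W 2 172.28.0.10      || rc=1
  report "Supabase :3000"         port_open localhost 3000        || rc=1
  report "Ollama :11434"          port_open 100.127.137.64 11434  || rc=1
  return "$rc"
}
```

Running `check_prereqs` prints one line per prerequisite; every probe runs even after a failure, so one pass shows everything that is missing.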

Pre-Flight Checklist

Run before every session:

# 1. Check Docker containers
docker ps --format "table {{.Names}}\t{{.Status}}" | grep -E "litterbox|rabbitmq|fibratus"

# 2. Check VM is running
ping -c 2 172.28.0.10

# 3. Check Whiskers is up
curl -s http://172.28.0.10:8080/health
# Expected: {"status":"ok"} or similar

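# 4. Check RabbitMQ is up (management API on port 15672)
curl -s -u guest:guest http://localhost:15672/api/overview | jq '.queue_totals | {messages, messages_ready, messages_unacknowledged}'
# Expected: {"messages": N, "messages_ready": N, "messages_unacknowledged": N}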

# 5. Check Supabase reachable
curl -s http://localhost:3000/health | jq '.status'
# Expected: "ready"

# 6. Check share mount from VM side (AFTER FIX — currently broken)
# From inside VM:
# curl -F "file=@test.exe" http://172.28.0.1:1337/upload

If any check fails, resolve before uploading payloads.
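The checklist can also run as a single script that reports every check and counts failures, so each session starts from one PASS/FAIL summary. This is a minimal sketch that reuses the endpoints and guest/guest credentials from the commands above. Check 6 runs inside the VM, so it is not included. `check` and `preflight` are illustrative names, and `curl -f` treats any HTTP error status as a failure.

```bash
#!/usr/bin/env bash
# Sketch: run the host-side pre-flight checks in one pass and count failures.
# Endpoints and guest/guest credentials are the ones used in the checklist above.

failures=0

# check NAME CMD...: runs CMD quietly, prints PASS/FAIL, and counts failures.
check() {
  local name="$1"; shift
  if "$@" >/dev/null 2>&1; then
    printf 'PASS  %s\n' "$name"
  else
    printf 'FAIL  %s\n' "$name"
    failures=$(( failures + 1 ))
  fi
}

# preflight: resets the counter, runs checks 1-5, and succeeds only if none failed.
preflight() {
  failures=0
  check "litterbox container" bash -c 'docker ps --format "{{.Names}}" | grep -q litterbox'
  check "rabbitmq container"  bash -c 'docker ps --format "{{.Names}}" | grep -q rabbitmq'
  check "VM ping"             ping -c 2 -W 2 172.28.0.10
  # curl -f turns HTTP errors (4xx/5xx) into a non-zero exit status
  check "Whiskers health"     curl -sf --max-time 5 http://172.28.0.10:8080/health
  check "RabbitMQ API"        curl -sf --max-time 5 -u guest:guest http://localhost:15672/api/overview
  check "Supabase health"     curl -sf --max-time 5 http://localhost:3000/health
  (( failures == 0 ))
}
```

Run `preflight || echo "$failures check(s) failed"`, then fix anything marked FAIL before uploading.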


Startup Sequence

Run in order. Wait for each to be healthy before moving to the next.

# 1. Start Docker stack
cd ~/greysec/tools/LitterBox
docker-compose up -d
# Wait 30 seconds

# 2. Verify containers are up
docker ps | grep -E "litterbox|rabbitmq"

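# 3. Start variant_event_consumer in the background; nohup keeps it alive if the terminal closes
# (the log path is an example)
cd ~/greysec/tools/LitterBox
nohup python3 ~/bin/greysec/variant_event_consumer.py >> ~/variant_event_consumer.log 2>&1 &
# Or use supervisor/systemd if running as service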

# 4. Verify VM is running
ping -c 1 172.28.0.10

# 5. Start Whiskers (manual until Task 3 is done; repeat after every VM reboot)
# Either run whiskers.exe directly inside the VM, or launch it remotely with PAExec:
# PAExec \\172.28.0.10 -u administrator -p [password] -d "C:\path\to\whiskers.exe"
# (-d returns immediately instead of waiting for whiskers.exe to exit)

# 6. Verify Whiskers is responding
curl -s http://172.28.0.10:8080/health

# 7. Verify Fibratus is running inside VM
# On VM: sc query fibratus
# Should show RUNNING

# 8. Verify the consumer is attached to RabbitMQ (its queue should show consumers >= 1)
curl -s -u guest:guest http://localhost:15672/api/queues | jq '.[] | {name, consumers}'
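The sequence asks you to wait until each component is healthy, but step 1 uses a fixed 30-second sleep. A polling helper makes "wait until healthy" explicit and gives up after a deadline instead of hanging. This is a sketch: `wait_for` is a hypothetical helper, and the timeouts in the example comments are guesses you should tune.

```bash
#!/usr/bin/env bash
# Sketch: poll a health check until it passes or a deadline expires.

# wait_for NAME TIMEOUT_SECONDS CMD...: retries CMD every 2 s until it succeeds.
wait_for() {
  local name="$1" timeout="$2"; shift 2
  local deadline=$(( SECONDS + timeout ))   # SECONDS is bash's built-in elapsed-time counter
  until "$@" >/dev/null 2>&1; do
    if (( SECONDS >= deadline )); then
      echo "timed out after ${timeout}s waiting for $name" >&2
      return 1
    fi
    sleep 2
  done
  echo "$name is up"
}

# Example use in the startup sequence (timeouts are guesses):
#   wait_for "RabbitMQ API" 60  curl -sf -u guest:guest http://localhost:15672/api/overview
#   wait_for "VM"           180 ping -c 1 -W 2 172.28.0.10
#   wait_for "Whiskers"     120 curl -sf http://172.28.0.10:8080/health
```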

Shutdown Sequence

# 1. Stop uploading new payloads (drain queue)
# Check RabbitMQ for pending messages
curl -s -u guest:guest http://localhost:15672/api/queues | jq '.[] | select(.messages > 0)'

# 2. Stop variant_event_consumer
pkill -f variant_event_consumer

# 3. Stop Whiskers (if Task 3 not done — manual)
# On VM: taskkill /IM whiskers.exe /F

# 4. Stop Docker containers
cd ~/greysec/tools/LitterBox
docker-compose down

# 5. Shutdown VM
# virsh shutdown greysec-win11
# or from inside VM: shutdown /s /t 0
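Step 1 asks for the queue to drain before anything is stopped. Instead of reading the queue list by eye, you can poll the management API until every queue reports zero messages. This sketch assumes `jq` is installed (as elsewhere in this runbook) and that the consumer is still running, so the queue can actually drain. `queue_depth` and `wait_drained` are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch: wait for RabbitMQ to drain before stopping the consumer (shutdown step 1).
# Assumes the management API on localhost:15672 with guest/guest and jq installed.

# queue_depth: prints the total number of messages across all queues.
queue_depth() {
  curl -sf --max-time 5 -u guest:guest http://localhost:15672/api/queues |
    jq 'map(.messages // 0) | add // 0'
}

# wait_drained TIMEOUT CMD...: polls CMD (which prints an integer) every 5 s until it prints 0.
wait_drained() {
  local timeout="$1"; shift
  local deadline=$(( SECONDS + timeout )) depth
  while depth=$("$@") && [ "$depth" -ne 0 ]; do
    if (( SECONDS >= deadline )); then
      echo "still $depth message(s) pending after ${timeout}s" >&2
      return 1
    fi
    sleep 5
  done
  # The loop ends when CMD fails, prints 0, or prints a non-number; only 0 counts as drained.
  [ "${depth:-}" = 0 ]
}
```

With `wait_drained 300 queue_depth && pkill -f variant_event_consumer`, the consumer is stopped only after the queues are empty.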

Payload Upload Procedure

Via CLI (current method)

# Upload a payload
curl -X POST http://172.28.0.1:1337/upload \
  -F "file=@ransomware_sim_v1.py" \
  -F "timeout=30" \
  -F "metadata={\"name\":\"ransomware_sim_v1\",\"category\":\"test\",\"submitted_by\":\"operator\"}"

# Check job status
curl http://172.28.0.1:1337/jobs/[JOB_ID]

# Get results
curl http://172.28.0.1:1337/results/[JOB_ID]
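A small wrapper keeps the metadata `name` in sync with the uploaded file and fails early if the path has a typo. This is a sketch: `LITTERBOX_URL`, `payload_name` and `upload_payload` are hypothetical, the form fields mirror the curl call above, and the hand-built JSON assumes file names contain no quotes or backslashes.

```bash
#!/usr/bin/env bash
# Sketch: upload wrapper; LITTERBOX_URL is a hypothetical override variable.

# payload_name PATH: file name without its directory or final extension.
payload_name() {
  local base="${1##*/}"
  printf '%s\n' "${base%.*}"
}

# upload_payload PATH [TIMEOUT]: same form fields as the curl call above.
upload_payload() {
  local file="$1" timeout="${2:-30}" name
  [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }
  name=$(payload_name "$file")
  # The JSON is built by hand, so file names must not contain quotes or backslashes.
  curl -sS -X POST "${LITTERBOX_URL:-http://172.28.0.1:1337}/upload" \
    -F "file=@${file}" \
    -F "timeout=${timeout}" \
    -F "metadata={\"name\":\"${name}\",\"category\":\"test\",\"submitted_by\":\"operator\"}"
}
```

For example, `upload_payload ~/greysec/engagements/litterbox-fibratus-deploy/payloads/ransomware_sim_v1.py 30`. Until Issue 4 is fixed, any timeout above 5 seconds is still cut short (see the troubleshooting entry for the 5-second kill).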

Via Client Portal (future — Task 7)

# Authenticated upload (future)
curl -X POST https://[client-portal-host]/upload \
  -H "Authorization: Bearer [API_KEY]" \
  -F "file=@payload.exe" \
  -F "timeout=60"
# Returns job_id for polling

Reading the Results

Detection Score (0-100)

The primary deliverable metric.

| Score | Rating | Interpretation | Action |
|---|---|---|---|
| 0-20 | Clean | No suspicious syscalls | Deployable in most environments |
| 21-40 | Low | Minor suspicious activity | Review behavioral summary before deployment |
| 41-60 | Medium | Multiple suspicious syscalls | Modify payload or test in isolated environment |
| 61-80 | High | Significant EDR coverage | Likely to be blocked by most EDR products |
| 81-100 | Critical | Extensive offensive tooling | Not recommended for production use |
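Scripts that post-process results can use the table's cut-offs through a single lookup, so a generated report never disagrees with the table. This is a sketch: `score_band` is a hypothetical helper, and it assumes integer scores from 0 to 100.

```bash
#!/usr/bin/env bash
# score_band SCORE: prints the band from the Detection Score table for an integer 0-100.
score_band() {
  [[ "$1" =~ ^[0-9]{1,3}$ ]] || { echo "invalid score: $1" >&2; return 1; }
  local s=$(( 10#$1 ))   # force base 10 so "08" is not parsed as octal
  (( s <= 100 )) || { echo "score out of range: $1" >&2; return 1; }
  if   (( s <= 20 )); then echo "Clean"
  elif (( s <= 40 )); then echo "Low"
  elif (( s <= 60 )); then echo "Medium"
  elif (( s <= 80 )); then echo "High"
  else                     echo "Critical"
  fi
}
```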

MITRE ATT&CK Kill Chain

The sequence of ATT&CK tactics and techniques the payload used.

Format:

[1] T1059.001 — PowerShell: one-liner downloader
[2] T1055 — Process Injection: VirtualAllocEx + WriteProcessMemory
[3] T1055 — Process Injection: CreateRemoteThread
[4] T1105 — Ingress Tool Transfer: URLDownloadToFile

What to look for:

  • Technique count > 3: sophisticated payload
  • T1055 (Process Injection): likely evasion attempt
  • T1105 (Ingress Tool Transfer): network indicators
  • T1486 (Data Encrypted for Impact): ransomware behavior
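The heuristics under "What to look for" can be applied mechanically to a kill chain saved as text. This sketch makes one stated assumption: "technique count" means the number of distinct technique IDs, so repeated T1055 steps count once. Sub-techniques such as T1055.012 trigger their parent's flag. `triage_killchain` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch: apply the "What to look for" heuristics to a kill chain saved as text.
# Assumption: "technique count" means distinct technique IDs.

# triage_killchain: reads kill-chain lines on stdin, prints the count and any flags.
triage_killchain() {
  local ids n
  # Pull out IDs such as T1055 or T1059.001, one per line, de-duplicated.
  ids=$(grep -oE 'T[0-9]{4}(\.[0-9]{3})?' | sort -u)
  n=$(grep -c . <<<"$ids")
  echo "distinct techniques: $n"
  if (( n > 3 )); then echo "FLAG: more than 3 techniques (sophisticated payload)"; fi
  # Prefix matches so sub-techniques (e.g. T1055.012) trigger the parent flag.
  if grep -q '^T1055' <<<"$ids"; then echo "FLAG: T1055 process injection (likely evasion attempt)"; fi
  if grep -q '^T1105' <<<"$ids"; then echo "FLAG: T1105 ingress tool transfer (review network indicators)"; fi
  if grep -q '^T1486' <<<"$ids"; then echo "FLAG: T1486 data encrypted for impact (ransomware behavior)"; fi
}
```

For example, `triage_killchain < killchain.txt`.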

Behavioral Summary

Text summary of what the payload did:

  • File operations (created/modified/deleted)
  • Network operations (outbound connections, DNS queries)
  • Process operations (spawned children, injected into processes)
  • Registry operations (modified keys)

Troubleshooting Guide

Problem: Payload never starts processing

Symptoms: Upload returns 200 OK but no job in queue.

Diagnosis:

  1. Check share mount is reachable from VM (see Issue 1)
  2. Check curl -v http://172.28.0.1:1337/jobs — does job appear?
  3. Check LitterBox logs: docker logs litterbox-api

Fix order: Verify share mount → verify upload endpoint → check LitterBox logs


Problem: Payload killed at exactly 5 seconds

Symptoms: All payloads die at 5 seconds, regardless of timeout setting.

Diagnosis: This is Issue 4. Check manager.py lines 418-419.

grep -n "init_wait_time" ~/greysec/tools/LitterBox/app/analyzers/manager.py
# Should show hardcoded value = 5

Fix: Stop hardcoding init_wait_time and read it from config.yaml instead.
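After the fix, a quick guard can catch the hardcoded form if it creeps back in. This sketch flags any line that assigns a numeric literal to `init_wait_time` (for example `init_wait_time = 5`). It cannot prove that the config.yaml value is actually used; it only confirms the literal is gone. `assert_no_hardcoded_wait` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch: regression guard for Issue 4 (hardcoded init_wait_time).

# assert_no_hardcoded_wait FILE: prints offending lines and returns 1 if any are found.
assert_no_hardcoded_wait() {
  local file="$1" hits
  [ -r "$file" ] || { echo "cannot read $file" >&2; return 2; }
  # Matches "init_wait_time = 5" style assignments, not reads from a config mapping.
  hits=$(grep -nE 'init_wait_time[[:space:]]*=[[:space:]]*[0-9]' "$file")
  if [ -n "$hits" ]; then
    printf 'hardcoded init_wait_time:\n%s\n' "$hits" >&2
    return 1
  fi
  echo "no hardcoded init_wait_time in $file"
}
```

For example, `assert_no_hardcoded_wait ~/greysec/tools/LitterBox/app/analyzers/manager.py`.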


Problem: Whiskers endpoint returns 502 or timeout

Symptoms: curl http://172.28.0.10:8080/api/alerts/fibratus/since fails.

Diagnosis: Whiskers process died (Issue 3 — no keepalive).

Fix (immediate): PAExec back into the VM and restart Whiskers.
Fix (permanent): Task 3, install Whiskers as a Windows service.


Problem: RedEdr report is empty despite real syscalls

Symptoms: Whiskers returns events but RedEdr shows nothing.

Diagnosis: This is Issue 2. Fibratus sees events but they don't reach the final report.

Fix: Trace the event path:

  1. Inside VM: run fibratus dump — are events being captured by Fibratus?
  2. curl http://172.28.0.10:8080/api/alerts/fibratus/since — does Whiskers see them?
  3. Check variant_event_consumer logs — is it receiving from RabbitMQ?
  4. Check Supabase malware_analyses table — are events stored?

Find the break point and fix at that layer.
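Steps 2 to 4 run from the host and can be chained so the script stops at the first broken hop. This is a sketch: `trace_hops` is hypothetical, each argument is a `LABEL::COMMAND` pair, and step 1 still has to be checked inside the VM. The Supabase line in the example comments assumes a PostgREST-style `/malware_analyses` URL on port 3000 and may need an API key header in your setup.

```bash
#!/usr/bin/env bash
# Sketch: walk host-side hops in order and report the first one that fails.

# trace_hops "LABEL::COMMAND" ...: runs each COMMAND with bash -c; stops at the first failure.
trace_hops() {
  local hop label cmd
  for hop in "$@"; do
    label="${hop%%::*}"
    cmd="${hop#*::}"
    if bash -c "$cmd" >/dev/null 2>&1; then
      echo "ok     $label"
    else
      echo "BREAK  $label"
      return 1
    fi
  done
  echo "all hops ok"
}

# Example (the Supabase URL is an assumption):
#   trace_hops \
#     "Whiskers answers::curl -sf --max-time 5 http://172.28.0.10:8080/api/alerts/fibratus/since | grep -q ." \
#     "consumer running::pgrep -f variant_event_consumer" \
#     "Supabase has rows::curl -sf 'http://localhost:3000/malware_analyses?limit=1' | grep -q '{'"
```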


Problem: RabbitMQ queue not draining

Symptoms: curl -u guest:guest http://localhost:15672/api/queues shows messages accumulating.

Diagnosis: variant_event_consumer is not running or is crashing on messages.

Fix:

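# Restart the consumer in the foreground so tracebacks print to the terminal
# (python3 -v only traces the interpreter's module imports; it does not change the script's logging)
cd ~/greysec/tools/LitterBox
python3 ~/bin/greysec/variant_event_consumer.py

# Check consumer is running (pgrep avoids matching the grep process itself)
pgrep -af variant_event_consumer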
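If the consumer keeps dying, a restart loop that logs each exit code can serve as a stopgap until it runs under supervisor or systemd. This is a sketch: `keep_running`, `MAX_RESTARTS` and `RESTART_DELAY` are hypothetical. If the consumer acknowledges messages only after processing them, a message that crashes it will be redelivered once the dead consumer's channel closes, so the loop will keep repeating. The logged exit codes make that pattern visible but do not fix it.

```bash
#!/usr/bin/env bash
# Sketch: stopgap restart loop with timestamped exit codes.

# keep_running CMD...: reruns CMD after it exits; MAX_RESTARTS=0 (default) retries forever.
keep_running() {
  local max="${MAX_RESTARTS:-0}" delay="${RESTART_DELAY:-5}" restarts=0 status
  while :; do
    "$@"
    status=$?
    echo "$(date '+%F %T') exited with status $status" >&2
    restarts=$(( restarts + 1 ))
    if (( max > 0 && restarts > max )); then
      echo "giving up after $max restart(s)" >&2
      return "$status"
    fi
    sleep "$delay"
  done
}

# Example:
#   cd ~/greysec/tools/LitterBox
#   MAX_RESTARTS=10 keep_running python3 ~/bin/greysec/variant_event_consumer.py
```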

Problem: VM unreachable at 172.28.0.10

Symptoms: ping 172.28.0.10 fails.

Diagnosis: VM is down or Docker bridge network changed.

Fix:

# Check VM status
virsh list --all

# Restart VM
virsh start greysec-win11

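# Verify Docker bridge; expect "172.28.0.0/24". If the lab uses a user-defined
# network instead of the default bridge, inspect that name (see docker network ls).
docker network inspect bridge | jq '.[0].IPAM.Config[0].Subnet'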
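The first two commands can be combined so the VM is started only when libvirt reports it as shut off, and any other state is left for a human to handle. This is a sketch built on `virsh domstate`. `ensure_vm_running` is hypothetical and defaults to the `greysec-win11` domain name used above.

```bash
#!/usr/bin/env bash
# Sketch: start the VM only if libvirt reports it shut off.

# ensure_vm_running [DOMAIN]: starts DOMAIN when shut off; other non-running states need a human.
ensure_vm_running() {
  local dom="${1:-greysec-win11}" state
  state=$(virsh domstate "$dom" 2>/dev/null) || { echo "virsh cannot find domain $dom" >&2; return 1; }
  case "$state" in
    running)    echo "$dom is already running" ;;
    "shut off") virsh start "$dom" || return 1 ;;
    *)          echo "$dom is '$state'; handle it manually" >&2; return 1 ;;
  esac
}
```

After a successful start, repeat `ping 172.28.0.10` until the guest finishes booting.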

Escalation Path

If you encounter any of these, ping @Adam immediately:

  1. VM will not start or boots to BSOD
  2. Docker stack fails to start after host reboot
  3. Supabase is unreachable and not recoverable within 5 minutes
  4. MacBook Ollama needs to be re-authenticated (token expired)
  5. Any of the 4 critical bugs cannot be resolved within 2 hours of focused work

Before escalating:

  • Document what you tried
  • Note exact error messages
  • Note which component is failing (ping the exact hop)

Format for escalation:

[@Adam] [COMPONENT] is broken: [ONE-LINE DESCRIPTION]
What I tried: [SHORT LIST]
Error: [EXACT ERROR]
Last working: [WHEN — if known]

Appendix: Test Payloads

| Name | Path | Purpose | Expected Behavior |
|---|---|---|---|
| ransomware_sim_v1.py | ~/greysec/engagements/litterbox-fibratus-deploy/payloads/ | Detection Score test | 60-80 score, multiple ATT&CK techniques |
| ransomware_sim_v2.c | ~/greysec/engagements/litterbox-fibratus-deploy/payloads/ | Extended run test | Run > 5 seconds, capture output |
| ransomware_sim_v3.c | ~/greysec/engagements/litterbox-fibratus-deploy/payloads/ | Extended run test | Same as v2 |
| calc.exe | Windows system binary | Clean baseline test | Score < 20 |
| notepad.exe | Windows system binary | Clean baseline test | Score < 20 |
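Once a score is available, the score expectations in this table can be checked automatically. This is a sketch: `expected_ok` is hypothetical. It leaves out v2 and v3 because their expectations concern run time, not score.

```bash
#!/usr/bin/env bash
# expected_ok PAYLOAD SCORE: 0 if SCORE meets the Appendix expectation, 1 if not,
# 2 if the table records no score expectation for PAYLOAD.
expected_ok() {
  local name="$1" score="$2"
  case "$name" in
    ransomware_sim_v1.py) (( score >= 60 && score <= 80 )) ;;
    calc.exe|notepad.exe) (( score < 20 )) ;;
    *) echo "no score expectation recorded for $name" >&2; return 2 ;;
  esac
}
```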