# GreySec MAL — Technical Runbook
**Product:** GreySec Malware Analysis Lab
**Version:** 1.0
**Updated:** 2026-05-07
**Parent:** `~/greysec/tools/malware-analysis-pipeline/kanban.md`

---

## Architecture

```
                                    Docker Host (Linux)
                                    172.28.0.1
                                         │
                    ┌───────────────────[Docker Bridge: 172.28.0.0/24]────────────────────┐
                    │                                                                  │
            LitterBox API                                               Windows 11 VM
            :1337                                                      172.28.0.10
            (upload portal)                                                   │
            (orchestration)                                            Fibratus (kernel)
            (result storage)                                                 │
                                                                     Whiskers (:8080)
            RabbitMQ                                                      │
            :5672                                                        │
                 │                                                  RedEdr reporting
            variant_event_consumer                                            │
                 │                                                          │
            Supabase                                                       │
            (results DB)                                                    │
                 │                                                    [SHARE MOUNT]
            Web Dashboard                                              C:\analysis\
```

- **Key address:** LitterBox API = `http://172.28.0.1:1337`
- **Key address:** Whiskers (inside VM) = `http://172.28.0.10:8080`

---

## Component Inventory

| Component | Location | Port | Purpose | Credentials |
|-----------|----------|------|---------|-------------|
| LitterBox API | `~/greysec/tools/LitterBox/` | 1337 | Upload portal + orchestration | None (local) |
| RabbitMQ | Docker container | 5672 (AMQP), 15672 (management API) | Event queue | guest/guest (local) |
| variant_event_consumer | `~/bin/greysec/variant_event_consumer.py` | — | Parse events → Supabase | Via env |
| fibratus_rabbitmq_bridge | `~/bin/greysec/fibratus_rabbitmq_bridge.py` | — | Bridge Fibratus to RabbitMQ | Via env |
| Whiskers | Inside Windows VM | 8080 | EDR REST API | None |
| Fibratus | Inside Windows VM | — | Kernel event capture | — |
| RedEdr | Inside Windows VM | — | EDR reporting (RedEdr.exe) | — |
| Supabase | Cloud (or local) | 3000 | Results database | greysec-dev-key-2026 |
| pre-flight-vm-check.sh | `~/bin/greysec/pre-flight-vm-check.sh` | — | VM health check script | — |

---

## Prerequisites

Before running the pipeline:

1. Docker daemon running on Linux host
2. Windows 11 VM running at 172.28.0.10
3. Kali container reachable from host
4. Supabase accessible at localhost:3000 (or cloud)
5. MacBook Ollama reachable at 100.127.137.64 (for AI augmentations)

---

## Pre-Flight Checklist

Run before every session:

```bash
# 1. Check Docker containers
docker ps --format "table {{.Names}}\t{{.Status}}" | grep -E "litterbox|rabbitmq|fibratus"

# 2. Check VM is running
ping -c 2 172.28.0.10

# 3. Check Whiskers is up
curl -s http://172.28.0.10:8080/health
# Expected: {"status":"ok"} or similar

# 4. Check RabbitMQ is up (management API on port 15672)
curl -s -u guest:guest http://localhost:15672/api/overview | jq '.queue_totals'
# Expected: a JSON object with message counts, e.g. {"messages": N, ...}; may be {} before any queue exists

# 5. Check Supabase reachable
curl -s http://localhost:3000/health | jq '.status'
# Expected: "ready"

# 6. Check share mount from VM side (AFTER FIX — currently broken)
# From inside VM:
# curl -F "file=@test.exe" http://172.28.0.1:1337/upload
```

If any check fails, resolve before uploading payloads.
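
The checks above can be wrapped so that one failure does not hide the others. The following is a minimal sketch, not part of the pipeline: `run_check` and `preflight_summary` are hypothetical helper names, and this runbook does not confirm whether `pre-flight-vm-check.sh` works the same way.

```bash
# Sketch only: run_check and preflight_summary are hypothetical helpers.
# Labels of failed checks, separated by "; ".
FAILED_CHECKS=""

# run_check LABEL COMMAND [ARGS...]
# Runs the command quietly and records LABEL if it exits non-zero.
run_check() {
  _label="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    printf 'PASS  %s\n' "$_label"
  else
    printf 'FAIL  %s\n' "$_label"
    FAILED_CHECKS="${FAILED_CHECKS}${FAILED_CHECKS:+; }${_label}"
  fi
}

# preflight_summary: returns 0 only if no check was recorded as failed.
preflight_summary() {
  if [ -z "$FAILED_CHECKS" ]; then
    echo "All checks passed"
    return 0
  fi
  printf 'Failed checks: %s\n' "$FAILED_CHECKS"
  return 1
}
```

For example, `run_check "VM ping" ping -c 2 172.28.0.10` followed by `run_check "Whiskers" curl -sf http://172.28.0.10:8080/health` and then `preflight_summary` would report every failing check in one pass.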

---

## Startup Sequence

Run in order. Wait for each to be healthy before moving to the next.

```bash
# 1. Start Docker stack
cd ~/greysec/tools/LitterBox
docker-compose up -d
sleep 30  # give the containers time to come up

# 2. Verify containers are up
docker ps | grep -E "litterbox|rabbitmq"

# 3. Start variant_event_consumer
cd ~/greysec/tools/LitterBox
python3 ~/bin/greysec/variant_event_consumer.py &
# Or use supervisor/systemd if running as service

# 4. Verify VM is running
ping -c 1 172.28.0.10

# 5. Start Whiskers (manual PAExec — until Task 3 is done)
# From inside VM or via PAExec:
# PAExec \\172.28.0.10 -u administrator -p [password] "C:\path\to\whiskers.exe"
# Until Task 3 is done, this is manual and needs to be redone after VM reboot

# 6. Verify Whiskers is responding
curl -s http://172.28.0.10:8080/health

# 7. Verify Fibratus is running inside VM
# On VM: sc query fibratus
# Should show RUNNING

# 8. Verify RabbitMQ connection from consumer
curl -s -u guest:guest http://localhost:15672/api/overview | jq '.message_stats'
```
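
Waiting "until healthy" can be made explicit with a small polling helper instead of a fixed pause. This is a sketch under assumptions: `wait_for` is a hypothetical name, and the attempt counts and delays in the usage note are illustrative rather than tuned for this stack.

```bash
# Sketch only: wait_for is a hypothetical helper, not part of the pipeline.
# wait_for ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Retries COMMAND until it succeeds or ATTEMPTS is exhausted.
wait_for() {
  _attempts="$1"
  _delay="$2"
  shift 2
  _n=1
  while [ "$_n" -le "$_attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    _n=$((_n + 1))
    # Sleep only if another attempt remains.
    if [ "$_n" -le "$_attempts" ]; then
      sleep "$_delay"
    fi
  done
  return 1
}
```

A call such as `wait_for 15 2 curl -sf http://172.28.0.10:8080/health` would poll Whiskers for up to about 30 seconds and return non-zero if it never answers, which makes the failing step obvious when the sequence is scripted.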

---

## Shutdown Sequence

```bash
# 1. Stop uploading new payloads (drain queue)
# Check RabbitMQ for pending messages
curl -s -u guest:guest http://localhost:15672/api/queues | jq '.[] | select(.messages > 0)'

# 2. Stop variant_event_consumer
pkill -f variant_event_consumer

# 3. Stop Whiskers (if Task 3 not done — manual)
# On VM: taskkill /IM whiskers.exe /F

# 4. Stop Docker containers
cd ~/greysec/tools/LitterBox
docker-compose down

# 5. Shutdown VM
# virsh shutdown greysec-win11
# or from inside VM: shutdown /s /t 0
```

---

## Payload Upload Procedure

### Via CLI (current method)

```bash
# Upload a payload
curl -X POST http://172.28.0.1:1337/upload \
  -F "file=@ransomware_sim_v1.py" \
  -F "timeout=30" \
  -F 'metadata={"name":"ransomware_sim_v1","category":"test","submitted_by":"operator"}'

# Check job status
curl http://172.28.0.1:1337/jobs/[JOB_ID]

# Get results
curl http://172.28.0.1:1337/results/[JOB_ID]
```
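
The `metadata` field is a JSON string typed on the command line, which becomes error-prone once a value contains quotes or backslashes. The sketch below builds that string from three arguments. `json_escape` and `build_metadata` are hypothetical helpers; the field names come from the upload example above, and only quotes and backslashes are escaped (control characters are not handled).

```bash
# Sketch only: json_escape and build_metadata are hypothetical helpers.
# json_escape STRING: escape backslashes and double quotes for use in JSON.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# build_metadata NAME CATEGORY SUBMITTED_BY
# Prints a JSON object with the fields used by the upload example.
build_metadata() {
  printf '{"name":"%s","category":"%s","submitted_by":"%s"}' \
    "$(json_escape "$1")" "$(json_escape "$2")" "$(json_escape "$3")"
}
```

The result can then be passed as `-F "metadata=$(build_metadata ransomware_sim_v1 test operator)"` in the upload command.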

### Via Client Portal (future — Task 7)

```bash
# Authenticated upload (future)
curl -X POST https://[client-portal-host]/upload \
  -H "Authorization: Bearer [API_KEY]" \
  -F "file=@payload.exe" \
  -F "timeout=60"
# Returns job_id for polling
```

---

## Reading the Results

### Detection Score (0-100)

The primary deliverable metric.

| Score | Interpretation | Action |
|-------|---------------|--------|
| 0-20 | Clean — no suspicious syscalls | Deployable in most environments |
| 21-40 | Low — minor suspicious activity | Review behavioral summary before deployment |
| 41-60 | Medium — multiple suspicious syscalls | Modify payload or test in isolated environment |
| 61-80 | High — significant EDR coverage | Likely to be blocked by most EDR products |
| 81-100 | Critical — extensive offensive tooling | Not recommended for production use |
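
When results are processed in bulk, the bands in this table can be applied mechanically. The following is an illustrative sketch: `score_band` is a hypothetical helper, and it assumes the score arrives as a whole number between 0 and 100.

```bash
# Sketch only: score_band is a hypothetical helper mirroring the table above.
# score_band SCORE: prints the band name, or "invalid" with a non-zero exit.
score_band() {
  _s="$1"
  # Reject empty input and anything that is not a non-negative integer.
  case "$_s" in
    ''|*[!0-9]*) echo "invalid"; return 1 ;;
  esac
  if   [ "$_s" -le 20 ];  then echo "Clean"
  elif [ "$_s" -le 40 ];  then echo "Low"
  elif [ "$_s" -le 60 ];  then echo "Medium"
  elif [ "$_s" -le 80 ];  then echo "High"
  elif [ "$_s" -le 100 ]; then echo "Critical"
  else echo "invalid"; return 1
  fi
}
```

For instance, `score_band 61` prints `High`, since 61 is the first value of the 61-80 band.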

### MITRE ATT&CK Kill Chain

The sequence of ATT&CK tactics and techniques the payload used.

**Format:**
```
[1] T1059.001 — PowerShell: one-liner downloader
[2] T1055 — Process Injection: VirtualAllocEx + WriteProcessMemory
[3] T1055 — Process Injection: CreateRemoteThread
[4] T1105 — Ingress Tool Transfer: URLDownloadToFile
```

**What to look for:**
- Technique count > 3: sophisticated payload
- T1055 (Process Injection): likely evasion attempt
- T1105 (Ingress Tool Transfer): network indicators
- T1486 (Data Encrypted for Impact): ransomware behavior
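
The checks in this list can be applied to a kill chain in the format shown above. This is a rough sketch with two assumptions: "technique count" is taken to mean distinct technique IDs, and a sub-technique such as `T1055.001` counts as a match for its parent. `has_technique` and `summarize_kill_chain` are hypothetical names.

```bash
# Sketch only: both helpers are hypothetical, not part of the pipeline.
# has_technique IDS BASE_ID: true if BASE_ID or one of its sub-techniques
# appears in the newline-separated list IDS.
has_technique() {
  printf '%s\n' "$1" | grep -qE "^$2(\.[0-9]+)?\$"
}

# summarize_kill_chain: reads kill-chain lines on stdin, prints the number of
# distinct technique IDs and one flag line per rule from the list above.
summarize_kill_chain() (
  ids="$(grep -oE 'T[0-9]{4}(\.[0-9]{3})?' | sort -u || true)"
  count="$(printf '%s\n' "$ids" | grep -c . || true)"
  echo "techniques=$count"
  if [ "$count" -gt 3 ]; then echo "flag: more than 3 distinct techniques"; fi
  if has_technique "$ids" T1055; then echo "flag: T1055 process injection"; fi
  if has_technique "$ids" T1105; then echo "flag: T1105 ingress tool transfer"; fi
  if has_technique "$ids" T1486; then echo "flag: T1486 data encrypted for impact"; fi
)
```

Piping a saved kill chain into it, for example `summarize_kill_chain < killchain.txt` (file name illustrative), gives a quick triage view before reading the full report.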

### Behavioral Summary

Text summary of what the payload did:
- File operations (created/modified/deleted)
- Network operations (outbound connections, DNS queries)
- Process operations (spawned children, injected into processes)
- Registry operations (modified keys)

---

## Troubleshooting Guide

### Problem: Payload never starts processing

**Symptoms:** Upload returns 200 OK but no job in queue.

**Diagnosis:**
1. Check share mount is reachable from VM (see Issue 1)
2. Check `curl -v http://172.28.0.1:1337/jobs` — does job appear?
3. Check LitterBox logs: `docker logs litterbox-api`

**Fix order:** Verify share mount → verify upload endpoint → check LitterBox logs

---

### Problem: Payload killed at exactly 5 seconds

**Symptoms:** All payloads die at 5 seconds, regardless of timeout setting.

**Diagnosis:** This is Issue 4. Check `manager.py` lines 418-419.

```bash
grep -n "init_wait_time" ~/greysec/tools/LitterBox/app/analyzers/manager.py
# Should show hardcoded value = 5
```

**Fix:** Replace the hardcoded `init_wait_time` with the value configured in `config.yaml`.

---

### Problem: Whiskers endpoint returns 502 or timeout

**Symptoms:** `curl http://172.28.0.10:8080/api/alerts/fibratus/since` fails.

**Diagnosis:** Whiskers process died (Issue 3 — no keepalive).

**Fix (immediate):** PAExec back into VM and restart Whiskers.
**Fix (permanent):** Task 3 — install as Windows service.

---

### Problem: RedEdr report is empty despite real syscalls

**Symptoms:** Whiskers returns events but RedEdr shows nothing.

**Diagnosis:** This is Issue 2. Fibratus sees events but they don't reach the final report.

**Fix:** Trace the event path:
1. Inside VM: run `fibratus dump` — are events being captured by Fibratus?
2. `curl http://172.28.0.10:8080/api/alerts/fibratus/since` — does Whiskers see them?
3. Check `variant_event_consumer` logs — is it receiving from RabbitMQ?
4. Check Supabase `malware_analyses` table — are events stored?

Find the break point and fix at that layer.
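
The host-side hops of this trace can be checked in order, stopping at the first one that fails. The sketch below is illustrative: `first_break` is a hypothetical helper that takes label and command-string pairs, and the VM-side and Supabase checks still have to be supplied by the operator.

```bash
# Sketch only: first_break is a hypothetical helper.
# first_break LABEL COMMAND_STRING [LABEL COMMAND_STRING ...]
# Runs each command string in order and stops at the first failure.
first_break() {
  while [ "$#" -ge 2 ]; do
    if sh -c "$2" >/dev/null 2>&1; then
      printf 'ok     %s\n' "$1"
    else
      printf 'BREAK  %s\n' "$1"
      return 1
    fi
    shift 2
  done
  echo "no break found"
}
```

A possible call is `first_break "Whiskers" "curl -sf http://172.28.0.10:8080/api/alerts/fibratus/since" "RabbitMQ" "curl -sf -u guest:guest http://localhost:15672/api/overview"`; the first `BREAK` line names the layer to investigate.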

---

### Problem: RabbitMQ queue not draining

**Symptoms:** `curl -u guest:guest http://localhost:15672/api/queues` shows messages accumulating.

**Diagnosis:** `variant_event_consumer` is not running or is crashing on messages.

**Fix:**
```bash
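# Check whether the consumer is running
pgrep -af variant_event_consumer

# Restart the consumer in the foreground with unbuffered output so errors show immediately
# (python3 -v only traces module imports; it does not enable application logging)
python3 -u ~/bin/greysec/variant_event_consumer.py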
```

---

### Problem: VM unreachable at 172.28.0.10

**Symptoms:** `ping 172.28.0.10` fails.

**Diagnosis:** VM is down or Docker bridge network changed.

**Fix:**
```bash
# Check VM status
virsh list --all

# Restart VM
virsh start greysec-win11

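# Verify Docker bridge subnet (expected: "172.28.0.0/24")
# If the stack uses a custom network, substitute its name from `docker network ls`
docker network inspect bridge | jq '.[0].IPAM.Config[0].Subnet'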
```
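
If the bridge subnet has changed, the VM address no longer falls inside it. The sketch below checks that with plain shell arithmetic. `ip_to_int` and `ip_in_subnet` are hypothetical helpers that assume dotted-decimal IPv4 input without leading zeros.

```bash
# Sketch only: ip_to_int and ip_in_subnet are hypothetical helpers.
# ip_to_int A.B.C.D: prints the address as a 32-bit integer.
ip_to_int() (
  IFS=.
  set -- $1
  echo "$(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))"
)

# ip_in_subnet IP NETWORK/PREFIX: true if IP lies inside the subnet.
ip_in_subnet() (
  ip="$1"
  net="${2%/*}"
  bits="${2#*/}"
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$ip") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
)
```

For example, `ip_in_subnet 172.28.0.10 172.28.0.0/24` succeeds, while the same address tested against a different subnet reported by `docker network inspect` would fail and point to a network change rather than a VM outage.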

---

## Escalation Path

**If you encounter any of these, ping @Adam immediately:**

1. VM will not start or boots to BSOD
2. Docker stack fails to start after host reboot
3. Supabase is unreachable and not recoverable within 5 minutes
4. MacBook Ollama needs to be re-authenticated (token expired)
5. Any of the 4 critical bugs cannot be resolved within 2 hours of focused work

**Before escalating:**
- Document what you tried
- Note exact error messages
- Note which component is failing (ping the exact hop)

**Format for escalation:**
```
[@Adam] [COMPONENT] is broken: [ONE-LINE DESCRIPTION]
What I tried: [SHORT LIST]
Error: [EXACT ERROR]
Last working: [WHEN — if known]
```
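
To keep escalations consistent, the template can be filled from arguments. This is a small sketch: `format_escalation` is a hypothetical helper, and defaulting the last field to `unknown` is an assumption based on the "if known" note in the template.

```bash
# Sketch only: format_escalation is a hypothetical helper.
# format_escalation COMPONENT DESCRIPTION TRIED ERROR [LAST_WORKING]
format_escalation() {
  printf '[@Adam] %s is broken: %s\n' "$1" "$2"
  printf 'What I tried: %s\n' "$3"
  printf 'Error: %s\n' "$4"
  printf 'Last working: %s\n' "${5:-unknown}"
}
```

Calling it with four arguments leaves the last line as `Last working: unknown`.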

---

## Appendix: Test Payloads

| Name | Path | Purpose | Expected Behavior |
|------|------|---------|-------------------|
| ransomware_sim_v1.py | `~/greysec/engagements/litterbox-fibratus-deploy/payloads/` | Detection Score test | 60-80 score, multiple ATT&CK techniques |
| ransomware_sim_v2.c | `~/greysec/engagements/litterbox-fibratus-deploy/payloads/` | Extended run test | Run > 5 seconds, capture output |
| ransomware_sim_v3.c | `~/greysec/engagements/litterbox-fibratus-deploy/payloads/` | Extended run test | Same as v2 |
| calc.exe | Windows system binary | Clean baseline test | Score < 20 |
| notepad.exe | Windows system binary | Clean baseline test | Score < 20 |