2026-05-08 17:46:06 -05:00

GreySec RED — Master Kanban

Product: GreySec Exploit Development Pipeline
Type: Internal Build Project
Status: BUILDING
Updated: 2026-05-07
Parent debrief: ~/greysec/ops/debriefs/exploit-lab-2026-05-07.md


Background

GreySec RED is an AI-augmented reverse engineering and exploit development lab. It takes a binary target, runs it through a two-agent pipeline (RE Agent + Exploit Writer), and produces a complete vulnerability brief plus a working exploit.

Architecture:

Binary Target → RE Agent (Kali + qwen2.5-coder:abliterator)
                   ↓
              analysis.md + struct.json
                   ↓
              Exploit Writer (Kali + qwen2.5-coder:abliterator)
                   ↓
              exploit.py + shellcode.bin + test-results.md

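Current status: PARTIALLY OPERATIONAL. The pipeline produces analysis and exploits for the easy binaries (stack0, stack1, format0) but fails silently on harder ones (heap0): no struct.json is produced and the exploit is written from guesswork. Validation gates have been added to the scripts but are not yet tested.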


Pipeline Definition

What the product IS: Drop a binary. Get a vulnerability brief, a working exploit, and shellcode. No manual RE required.

What the client receives:

  • analysis.md — full vulnerability analysis: vulnerable function, offset calculation, attack vector, constraints
  • struct.json — structured vulnerability data: offset, return address, bad chars, suggested shellcode type
  • exploit.py — working pwntools exploit targeting the binary directly
  • shellcode.bin — position-independent shellcode for the target architecture
  • test-results.md — proof the exploit was run against the real binary and worked
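The deliverable list names the contents of struct.json but does not pin down a schema. As a minimal sketch, the check below treats the four items from the bullet as required fields. The field names (`offset`, `return_address`, `bad_chars`, `shellcode_type`) and their types are assumptions for illustration; the real schema is whatever re-agent.sh emits.

```python
import json

# Hypothetical field names and types, inferred from the struct.json bullet above.
REQUIRED_FIELDS = {
    "offset": int,
    "return_address": str,   # assumed to be a hex string
    "bad_chars": list,
    "shellcode_type": str,
}


def validate_struct(text):
    """Return a list of problems found in a struct.json document (empty list = OK)."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be an object"]

    problems = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in data:
            problems.append(f"missing field: {name}")
            continue
        value = data[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            problems.append(f"wrong type for {name}: expected {expected_type.__name__}")
    return problems
```

Returning a list of problems rather than raising on the first one lets a validation gate report everything that is wrong with a single agent run.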

Target buyer:

  • Security teams building internal red team toolchains
  • Exploit developers who need fast turnaround on binary targets
  • CTF players and competitive hacking teams needing rapid challenge solutions
  • Security researchers analyzing third-party binaries for CVEs

Current State

What Works

  • RE Agent on stack0, stack1, format0: analysis.md + struct.json produced and exploit.py written; correctness is not yet backed by a test-results.md (see issue 8)
  • Agent scripts functional: re-agent.sh and exploit-writer.sh exist and execute
  • Directory structure clean: reports/, exploits/, agents/ properly organized
  • Model access confirmed: Kali container can reach MacBook Ollama at 100.127.137.64 (when SSH works)
  • Exploit approach correct: pwntools process/stdin for stack0, argv for stack1

What Is Broken or Missing

| # | Issue | Severity | Fix Time | Status |
|---|-------|----------|----------|--------|
| 1 | re-agent.sh has no validation gate: if struct.json is not produced, exploit-writer.sh runs blind | CRITICAL | 30 min | Script updated, needs testing |
| 2 | exploit-writer.sh has no test loop: exploit.py is never run against the real binary | CRITICAL | 30 min | Script updated, needs testing |
| 3 | No gbrain logging hooks: pipeline findings not captured to institutional knowledge | HIGH | 15 min | Script updated, needs testing |
| 4 | No TIME-LOG hook: ~2.5 hours of AI time completely untracked | HIGH | 15 min | Script updated, needs testing |
| 5 | heap0 RE Agent failed silently: no struct.json, exploit written from guesswork | HIGH | 20 min | Needs re-run with fixed re-agent.sh |
| 6 | No shellcode.bin produced for any binary: referenced in scripts, never built | MEDIUM | 30 min | Needs msfvenom or pwntools asm step |
| 7 | MacBook SSH blocked (password rejected twice): abliterator model unreachable | CRITICAL | TBD | Adam needs to fix SSH or Tailscale |
| 8 | No test-results.md for any binary: "verified" in kanban was false | HIGH | - | Fixes #1 and #2 address this |
| 9 | Skill file greysec-exploit-lab does not exist | MEDIUM | 1 hour | Needs writing |
| 10 | exploit.py for vuln_test uses nonstandard "launcher" approach | MEDIUM | 30 min | Needs rewrite as direct pwntools exploit |
BOARD

BACKLOG

  • Write greysec-exploit-lab skill (standalone operational procedure)
  • Build shellcode.bin generation step (msfvenom or pwntools asm) for each binary
  • Rewrite vuln_test exploit.py as direct pwntools targeting, not C tempfile launcher
  • Add gbrain knowledge logging for each completed target (pipeline should auto-log findings)
  • Benchmark: how fast is the pipeline on intermediate binaries (heap2/3, format1-4, net0-4)?
  • Windows DLL analysis path — can the pipeline handle PE/DLL analysis?
  • ARM/IoT binary path — what changes needed for non-x86 targets?
  • Multi-arch support: x86, x64, ARM, MIPS — shellcode generation per arch

IN PROGRESS

  • Validate updated re-agent.sh and exploit-writer.sh on heap0
  • Unblock MacBook SSH (Adam's decision needed)

VALIDATING

(empty — waiting for heap0 re-run and MacBook SSH fix)

DONE

  • Agent pipeline architecture (RE Agent + Exploit Writer, two-stage)
  • Agent scripts written (re-agent.sh, exploit-writer.sh)
  • Directory structure created (reports/, exploits/, agents/)
  • Target binaries processed end-to-end: stack0, stack1, format0 (3 of 5; no test-results.md recorded yet, see issue 8)
  • Agent scripts updated with validation gates, gbrain hooks, TIME-LOG hooks

BLOCKED

  • MacBook SSH — abliterator model unreachable, all RE runs on cloud fallback
  • heap0 re-run — blocked by MacBook SSH (RE Agent works better with abliterator model)

Technical Fix Tasks

Task 1: Validate re-agent.sh + exploit-writer.sh on heap0 (HIGH PRIORITY)

What: The updated scripts have validation gates and test loops. Run them against heap0 to confirm they work.

Test procedure:

cd ~/greysec/engagements/exploit-lab
./agents/re-agent.sh heap0 /opt/protostar/bin/heap0
# Expected: analysis.md + struct.json produced
# Expected: FAIL and exit 1 if struct.json missing

./agents/exploit-writer.sh heap0
# Expected: exploit.py written
# Expected: exploit.py run against real binary
# Expected: test-results.md with PASS or FAIL

Acceptance criteria:

  • re-agent.sh exits 1 if struct.json not produced
  • exploit-writer.sh refuses to run if struct.json missing
  • exploit.py is run and result captured in test-results.md
  • Both scripts log to gbrain and TIME-LOG
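The acceptance criteria above describe two behaviours: a gate that fails when struct.json is absent, and a test loop that runs the exploit and records the outcome. The sketch below illustrates both in Python under stated assumptions. The function names, the per-target report directory, and the use of a success marker string in the output are all hypothetical; the actual re-agent.sh and exploit-writer.sh may implement this differently.

```python
import subprocess
import sys
from pathlib import Path


def gate(report_dir):
    """Validation gate: return 0 if struct.json exists and is non-empty, else 1."""
    struct_file = Path(report_dir) / "struct.json"
    if not struct_file.is_file() or struct_file.stat().st_size == 0:
        print(f"FAIL: {struct_file} missing or empty", file=sys.stderr)
        return 1
    return 0


def run_and_record(cmd, report_dir, success_marker, timeout=30):
    """Run an exploit command and write PASS/FAIL plus its output to test-results.md.

    success_marker is an assumption: a string the target prints only when
    exploitation succeeded. A timeout is treated as FAIL.
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
        output = proc.stdout + proc.stderr
    except subprocess.TimeoutExpired:
        output = f"timed out after {timeout}s"
    verdict = "PASS" if success_marker in output else "FAIL"
    results = Path(report_dir) / "test-results.md"
    results.write_text(
        f"Result: {verdict}\nCommand: {' '.join(cmd)}\n\nOutput:\n{output}\n"
    )
    return verdict
```

A wrapper could call `sys.exit(gate("reports/heap0"))` before the exploit stage, then `run_and_record([sys.executable, "exploit.py"], "reports/heap0", "<marker>")` afterwards, with the marker chosen per target. Keying PASS on observed output rather than on the exit code matters because a crashed target and a successful one can both exit non-zero.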

Who: qwen2.5-coder:14b (can be self-verified)
Time: 20-30 minutes


Task 2: Fix MacBook SSH Access (CRITICAL — Adam's Decision Needed)

What: Password SSH to adamsloggett@100.127.137.64 rejected twice. The abliterator model only lives on MacBook.

Option A — Fix the password: The current Mac password is V4sTGRZqm#dW5@aW (rotated 2026-05-05). Try from the Linux host directly to confirm whether this is a Tailscale SSH issue or a password issue:

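# Run from the Linux host. 100.127.137.64 is in Tailscale's 100.64.0.0/10 range,
# so this path still crosses Tailscale; use the Mac's LAN address to rule Tailscale out.
# -v prints which authentication methods are offered and tried.
ssh -v adamsloggett@100.127.137.64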

Option B (use Tailscale SSH instead): Tailscale SSH replaces password auth with the tailnet's node identity and SSH access policy. Note that on macOS the Tailscale SSH server may only be available with the open-source tailscaled build rather than the GUI app, so check that before relying on this option. This requires:

# On the Mac: enable the Tailscale SSH server
tailscale set --ssh

# On the Linux host: use Tailscale hostname
ssh adamsloggett@macbook-pro-2
# Or, via the Tailscale CLI: tailscale ssh adamsloggett@macbook-pro-2

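Option C (host the abliterator model on Linux): If the MacBook stays unreachable, the model cannot be copied from its Ollama instance, so pull huihui_ai/qwen2.5-coder-abliterate:latest from the Ollama registry directly onto the Linux host (ollama pull huihui_ai/qwen2.5-coder-abliterate:latest). This requires enough disk space (~10GB).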

Who: Adam needs to pick an option and act. Hermes will execute once the path is confirmed.
Time: Option A or B: ~15 minutes. Option C: ~30 minutes.


Task 3: Re-Run heap0 with Fixed Scripts

What: heap0's RE Agent failed silently in the original run. With the validation gate in place, re-run it.

Procedure:

cd ~/greysec/engagements/exploit-lab
./agents/re-agent.sh heap0 /opt/protostar/bin/heap0
# Should produce struct.json or FAIL

./agents/exploit-writer.sh heap0
# Should produce tested exploit.py

Acceptance criteria:

  • struct.json produced for heap0
  • analysis.md accurate (verify offset and WINNER address)
  • exploit.py written and tested — PASS in test-results.md
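Verifying the offset and WINNER address can be done mechanically by comparing struct.json against operator-supplied reference values. The sketch below is one way to do that. The field names are hypothetical, and values are accepted either as integers or as strings such as "0x08048464".

```python
def to_int(value):
    """Convert an int, or a decimal/hex string such as "0x08048464", to an int."""
    if isinstance(value, int):
        return value
    return int(str(value), 0)  # base 0 honours the 0x prefix


def mismatches(struct, expected):
    """Return {field: (found, wanted)} for every expected field that differs."""
    diffs = {}
    for field, wanted in expected.items():
        found = struct.get(field)
        try:
            same = found is not None and to_int(found) == to_int(wanted)
        except ValueError:
            same = False  # unparseable values count as a mismatch
        if not same:
            diffs[field] = (found, wanted)
    return diffs
```

For heap0 this might be called as `mismatches(json.load(open("struct.json")), {"return_address": 0x08048464})`, with further reference values added once they have been confirmed against the binary. An empty result means every reference value matched.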

Who: qwen2.5-coder:14b
Time: 20-30 minutes


Task 4: Shellcode Generation Step

What: No shellcode.bin has ever been produced. The agent scripts reference it but the step doesn't exist.

Approach: Use msfvenom or pwntools' asm() to generate shellcode based on the binary architecture.

For Protostar binaries (x86, static):

# Example for stack0: the payload runs /bin/sh; -o writes the raw bytes to shellcode.bin
msfvenom -p linux/x86/exec CMD=/bin/sh -a x86 --platform linux -f raw -o shellcode.bin

Or via pwntools in exploit.py:

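from pwn import *
context.update(arch="i386", os="linux")  # Protostar targets are 32-bit x86 Linux
shellcode = asm(shellcraft.i386.linux.sh())
open("shellcode.bin", "wb").write(shellcode)  # persist the bytes as the shellcode.bin deliverable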

Files to touch: exploit-writer.sh. Add shellcode generation as a step that runs after exploit.py has been written.

Who: qwen2.5-coder:14b
Time: 30 minutes to add the step and test on stack0
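Generating shellcode "based on the binary architecture" first requires knowing that architecture. One way to get it without extra tooling is to read the ELF header directly, sketched below. The `elf_arch` helper and its short architecture labels are hypothetical; the `e_machine` constants come from the ELF specification.

```python
import struct

# e_machine values defined by the ELF specification, mapped to short labels.
ELF_MACHINES = {3: "x86", 8: "mips", 40: "arm", 62: "x64", 183: "aarch64"}


def elf_arch(header):
    """Return (arch, bits, little_endian) from the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    bits = {1: 32, 2: 64}.get(header[4])       # EI_CLASS
    endian = {1: "<", 2: ">"}.get(header[5])   # EI_DATA
    if bits is None or endian is None:
        raise ValueError("invalid ELF identification bytes")
    # e_machine is a 16-bit field at byte offset 18, after e_ident and e_type.
    (machine,) = struct.unpack_from(endian + "H", header, 18)
    arch = ELF_MACHINES.get(machine, f"unknown({machine})")
    return arch, bits, endian == "<"
```

Usage would look like `elf_arch(open(path, "rb").read(20))`. The result can then select the msfvenom payload or the pwntools `context.arch` for the shellcode step, which also covers the multi-arch item in the backlog.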


Task 5: Rewrite vuln_test exploit.py (MEDIUM)

What: The current vuln_test exploit.py uses a nonstandard "launcher" approach: Python writes a C helper program to a temporary file, compiles it, and runs that instead of attacking the target itself. Rewrite it as a direct pwntools process targeting the actual /tmp/vuln_test binary.

Why: The agent contract specifies a pwntools process/exploit that targets the binary directly. The launcher approach is a workaround that suggests the model didn't fully understand how to exploit the binary via the standard pwntools interface.

Who: qwen2.5-coder:14b
Time: 30 minutes


Capability Expansion Tasks

Tier 1: Beginner (WHAT EXISTS)

Binaries: stack0, stack1, format0 (Protostar)
Skills needed: Basic buffer overflow, format string exploitation
Time per binary: ~20-30 minutes with pipeline
Status: OPERATIONAL, 3 of 3 complete

Tier 2: Intermediate (IN BACKLOG)

Binaries: heap2, heap3, format1-4, net0-4 (Protostar)
Skills needed: Heap grooming, UAF, fastbin dup, format string chaining
Time per binary: ~40-60 minutes with pipeline
What needs building: None (same pipeline, harder targets)

Tier 3: Advanced (IN BACKLOG)

Binaries: Fusion (Web, HTTP, SQL, etc.; the advanced follow-on to Protostar)
Skills needed: ROP chains, ASLR/DEP bypass, heap techniques
What needs building: ROP gadget finder integration, libc database lookup
Time per binary: ~60-90 minutes with pipeline

Tier 4: Elite (FUTURE)

Targets: Real-world binaries, DLL analysis, kernel modules
Skills needed: Full RE, CVE research, kernel exploitation
What needs building: Windows VM path, DLL analysis pipeline, kernel debug setup


Product Tiers (Internal Planning)

| Tier | Target | Output | Complexity | Time |
|------|--------|--------|------------|------|
| Beginner | stack/heap/format (Protostar) | analysis + exploit | Easy | 20-30 min |
| Intermediate | Protostar advanced, VWA | analysis + exploit + ROP | Medium | 40-60 min |
| Advanced | real-world binaries | analysis + struct.json + suggested exploit path | Hard | 60-120 min |
| Elite | 0-day research | analysis only (no exploit; model limitations) | Expert | TBD |

Definition of Done

GreySec RED is operational when:

  1. re-agent.sh validates struct.json and exits non-zero if missing
  2. exploit-writer.sh tests exploit.py against real binary and reports PASS/FAIL
  3. gbrain logging is wired and firing after each completed target
  4. TIME-LOG is updated after each pipeline run
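  5. heap0 re-run produces correct struct.json (validated against known values: WINNER 0x08048464 and the data-to-fp offset; this board records offset 80, but the stock Protostar heap0 places fp 72 bytes after data, so confirm the figure against the binary's own "data is at ..., fp is at ..." output)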
  6. shellcode.bin is generated for at least stack0
  7. All 5 target binaries (the Protostar binaries stack0, stack1, format0, heap0, plus vuln_test) have PASS in test-results.md
  8. MacBook SSH is unblocked and abliterator model is reachable
  9. Skill file greysec-exploit-lab exists and documents operational procedure
  10. At least one intermediate binary (heap2 or format1) has been processed end-to-end and PASS
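Item 4 asks for TIME-LOG to be updated after each pipeline run. A minimal sketch of such a hook is a context manager that times a stage and appends one line when it finishes, whether it succeeded or raised. The tab-separated line format and the `time_logged` name are assumptions; the real TIME-LOG format is not specified here.

```python
import time
from contextlib import contextmanager
from datetime import datetime, timezone


@contextmanager
def time_logged(log_path, target, stage):
    """Append one line per stage: UTC start time, target, stage, seconds, status."""
    started = datetime.now(timezone.utc).isoformat(timespec="seconds")
    t0 = time.monotonic()
    status = "ok"
    try:
        yield
    except BaseException:
        status = "error"
        raise  # the failure still propagates to the caller
    finally:
        elapsed = time.monotonic() - t0
        with open(log_path, "a", encoding="utf-8") as log:
            log.write(f"{started}\t{target}\t{stage}\t{elapsed:.1f}s\t{status}\n")
```

Wrapping each stage as `with time_logged("TIME-LOG.md", "heap0", "re-agent"): ...` records failed runs too, which matters here because the untracked time noted in issue 4 includes runs that did not succeed.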

DEBT (Action Items from This Kanban)

| Action Item | Priority | Status | Notes |
|-------------|----------|--------|-------|
| Validate updated scripts on heap0 | CRITICAL | open | Confirm validation gates work |
| Unblock MacBook SSH | CRITICAL | blocked | Adam's decision needed |
| Re-run heap0 with fixed scripts | HIGH | open | After Task 1 |
| Build shellcode.bin generation step | HIGH | open | msfvenom or pwntools asm |
| Rewrite vuln_test exploit.py | MEDIUM | open | Direct pwntools approach |
| Test intermediate binaries (heap2, format1) | MEDIUM | open | Pipeline validation |
| Write greysec-exploit-lab skill | MEDIUM | open | Operational docs |
| Add ROP gadget finder for advanced tier | LOW | backlog | Future |
| Validate pipeline against Windows DLL | LOW | backlog | Future |