ghstshdw 0bd211c81d Initial commit: exploit-pipeline

2026-05-08 17:46:06 -05:00

GreySec RED — Master Kanban

Product: GreySec Exploit Development Pipeline
Type: Internal Build Project
Status: BUILDING
Updated: 2026-05-07
Parent debrief: ~/greysec/ops/debriefs/exploit-lab-2026-05-07.md

Background

GreySec RED is an AI-augmented reverse engineering and exploit development lab. It takes a binary target, runs it through a two-agent pipeline (RE Agent + Exploit Writer), and produces a complete vulnerability brief plus a working exploit.

Architecture:

Binary Target → RE Agent (Kali + qwen2.5-coder:abliterator)
                   ↓
              analysis.md + struct.json
                   ↓
              Exploit Writer (Kali + qwen2.5-coder:abliterator)
                   ↓
              exploit.py + shellcode.bin + test-results.md

Current status: PARTIALLY OPERATIONAL. Works on easy binaries (stack0, stack1, format0). Fails silently on harder ones (heap0). Validation layer missing.
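The missing validation layer amounts to a gate between the two stages: each stage must leave its artifacts on disk before the next one starts. A minimal sketch of that idea follows. The `Stage` tuple shape and the `run_pipeline` name are hypothetical and are not taken from re-agent.sh or exploit-writer.sh.

```python
from pathlib import Path
from typing import Callable, Sequence

# Hypothetical stage description: a name, a callable that does the work,
# and the artifact filenames the stage is required to leave behind.
Stage = tuple[str, Callable[[Path], None], Sequence[str]]


def run_pipeline(workdir: Path, stages: Sequence[Stage]) -> list[str]:
    """Run stages in order and return their names if every stage succeeds.

    Raises RuntimeError at the first stage whose required artifacts are
    missing, so later stages never run on incomplete input.
    """
    completed: list[str] = []
    for name, run, required in stages:
        run(workdir)
        missing = [f for f in required if not (workdir / f).is_file()]
        if missing:
            # Fail loudly instead of letting the next stage run blind.
            raise RuntimeError(f"{name}: missing artifacts {missing}")
        completed.append(name)
    return completed
```

With this shape, a heap0-style silent failure in the first stage would surface as an error naming struct.json rather than as an exploit written from guesswork.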

Pipeline Definition

What the product IS: Drop a binary. Get a vulnerability brief, a working exploit, and shellcode. No manual RE required.

What the client receives:

analysis.md — full vulnerability analysis: vulnerable function, offset calculation, attack vector, constraints
struct.json — structured vulnerability data: offset, return address, bad chars, suggested shellcode type
exploit.py — working pwntools exploit targeting the binary directly
shellcode.bin — position-independent shellcode for the target architecture
test-results.md — proof the exploit was run against the real binary and worked
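Because exploit-writer.sh depends on struct.json, a structural check on that file is a natural first gate. The sketch below assumes the JSON key names (`offset`, `return_address`, `bad_chars`, `shellcode_type`); the list above names the fields but not their exact spelling, so adjust them to whatever the RE Agent actually emits.

```python
import json

# Assumed key names; the source lists the fields but not their JSON spelling.
REQUIRED_FIELDS = {
    "offset": int,
    "return_address": str,   # assumed to be stored as a hex string
    "bad_chars": list,
    "shellcode_type": str,
}


def validate_struct(text: str) -> list[str]:
    """Return a list of problems found in a struct.json payload (empty if none)."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be an object"]
    problems = []
    for key, expected in REQUIRED_FIELDS.items():
        if key not in data:
            problems.append(f"missing field: {key}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(data[key], bool) or not isinstance(data[key], expected):
            problems.append(f"{key}: expected {expected.__name__}")
    return problems
```

An empty list means the file is structurally usable; it says nothing about whether the values are correct for the target.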

Target buyer:

Security teams building internal red team toolchains
Exploit developers who need fast turnaround on binary targets
CTF players and competitive hacking teams needing rapid challenge solutions
Security researchers analyzing third-party binaries for CVEs

Current State

What Works

RE Agent on stack0, stack1, format0: analysis.md + struct.json produced and reviewed as correct; exploit.py written, but not yet backed by a recorded test run (see issue #8)
Agent scripts functional: re-agent.sh and exploit-writer.sh exist and execute
Directory structure clean: reports/, exploits/, agents/ properly organized
Model access confirmed: Kali container can reach MacBook Ollama at 100.127.137.64 (when SSH works)
Exploit approach correct: pwntools process/stdin for stack0, argv for stack1

What Is Broken or Missing

#	Issue	Severity	Fix Time	Status
1	`re-agent.sh` has no validation gate — if struct.json not produced, exploit-writer.sh runs blind	CRITICAL	30 min	Script updated — needs testing
2	`exploit-writer.sh` has no test loop — exploit.py never run against real binary	CRITICAL	30 min	Script updated — needs testing
3	No gbrain logging hooks — pipeline findings not captured to institutional knowledge	HIGH	15 min	Script updated — needs testing
4	No TIME-LOG hook — ~2.5 hours of AI time completely untracked	HIGH	15 min	Script updated — needs testing
5	heap0 RE Agent failed silently — no struct.json, exploit written from guesswork	HIGH	20 min	Needs re-run with fixed re-agent.sh
6	No shellcode.bin produced for any binary — referenced in scripts, never built	MEDIUM	30 min	Needs msfvenom or pwntools asm step
7	MacBook SSH blocked — password rejected twice — abliterator model unreachable	CRITICAL	TBD	Adam needs to fix SSH or Tailscale
8	No test-results.md for any binary — "verified" in kanban was false	HIGH	—	Fixes #1 and #2 address this
9	Skill file `greysec-exploit-lab` does not exist	MEDIUM	1 hour	Needs writing
10	`exploit.py` for vuln_test uses nonstandard "launcher" approach	MEDIUM	30 min	Needs rewrite as direct pwntools exploit
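Issue #4 (untracked AI time) can be addressed with a small timing wrapper around each stage. The sketch below is one possible shape, not the hook already added to the scripts: the tab-separated line format, the log path, and the `time_logged` name are all assumptions.

```python
import time
from contextlib import contextmanager
from datetime import datetime, timezone
from pathlib import Path


@contextmanager
def time_logged(log_path: Path, target: str, stage: str):
    """Append one line per stage run: UTC timestamp, target, stage, seconds, outcome."""
    start = time.monotonic()
    outcome = "ok"
    try:
        yield
    except Exception:
        outcome = "error"
        raise
    finally:
        # Runs on success and on failure, so failed runs are tracked too.
        elapsed = time.monotonic() - start
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        with log_path.open("a", encoding="utf-8") as fh:
            fh.write(f"{stamp}\t{target}\t{stage}\t{elapsed:.1f}\t{outcome}\n")
```

Wrapping a stage as `with time_logged(path, "heap0", "re-agent"): ...` records the run even when the stage raises, which matters for the silent-failure case.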

BOARD

BACKLOG

Write greysec-exploit-lab skill (standalone operational procedure)
Build shellcode.bin generation step (msfvenom or pwntools asm) for each binary
Rewrite vuln_test exploit.py as direct pwntools targeting, not C tempfile launcher
Add gbrain knowledge logging for each completed target (pipeline should auto-log findings)
Benchmark: how fast is the pipeline on intermediate binaries (heap2/3, format1-4, net0-4)?
Windows DLL analysis path — can the pipeline handle PE/DLL analysis?
ARM/IoT binary path — what changes needed for non-x86 targets?
Multi-arch support: x86, x64, ARM, MIPS — shellcode generation per arch

IN PROGRESS

Validate updated re-agent.sh and exploit-writer.sh on heap0
Unblock MacBook SSH (Adam's decision needed)

VALIDATING

(empty — waiting for heap0 re-run and MacBook SSH fix)

DONE

Agent pipeline architecture (RE Agent + Exploit Writer, two-stage)
Agent scripts written (re-agent.sh, exploit-writer.sh)
Directory structure created (reports/, exploits/, agents/)
Protostar binaries processed: stack0, stack1, format0 (3 of 5 targets; analysis reviewed, test-results.md still missing per issue #8)
Agent scripts updated with validation gates, gbrain hooks, TIME-LOG hooks

BLOCKED

MacBook SSH — abliterator model unreachable, all RE runs on cloud fallback
heap0 re-run — blocked by MacBook SSH (RE Agent works better with abliterator model)

Technical Fix Tasks

Task 1: Validate re-agent.sh + exploit-writer.sh on heap0 (HIGH PRIORITY)

What: The updated scripts have validation gates and test loops. Run them against heap0 to confirm they work.

Test procedure:

cd ~/greysec/engagements/exploit-lab
./agents/re-agent.sh heap0 /opt/protostar/bin/heap0
# Expected: analysis.md + struct.json produced
# Expected: FAIL and exit 1 if struct.json missing

./agents/exploit-writer.sh heap0
# Expected: exploit.py written
# Expected: exploit.py run against real binary
# Expected: test-results.md with PASS or FAIL

Acceptance criteria:

re-agent.sh exits 1 if struct.json not produced
exploit-writer.sh refuses to run if struct.json missing
exploit.py is run and result captured in test-results.md
Both scripts log to gbrain and TIME-LOG
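The "run and capture" criterion can be sketched as a small harness that executes a command, decides PASS or FAIL, and writes the evidence to test-results.md. This is an illustration under assumptions: the PASS rule used here (exit code 0 plus a caller-supplied success marker in the output) and the report layout are not taken from exploit-writer.sh.

```python
import subprocess
from pathlib import Path


def run_and_record(cmd: list[str], results_path: Path,
                   success_marker: str, timeout_s: float = 30.0) -> bool:
    """Run cmd, decide PASS/FAIL, and write the evidence to results_path."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
        output = proc.stdout + proc.stderr
        # Assumed PASS rule: clean exit and the expected marker in the output.
        passed = proc.returncode == 0 and success_marker in output
        detail = f"exit code {proc.returncode}"
    except subprocess.TimeoutExpired:
        output, passed, detail = "", False, f"timed out after {timeout_s}s"
    verdict = "PASS" if passed else "FAIL"
    results_path.write_text(
        f"Result: {verdict}\nCommand: {' '.join(cmd)}\nDetail: {detail}\n\n"
        f"Output:\n{output}",
        encoding="utf-8",
    )
    return passed
```

Requiring a marker in addition to the exit code guards against a script that exits cleanly without achieving anything, which is how a false "verified" status can creep in.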

Who: qwen2.5-coder:14b (can be self-verified)
Time: 20-30 minutes

Task 2: Fix MacBook SSH Access (CRITICAL — Adam's Decision Needed)

What: Password SSH to adamsloggett@100.127.137.64 rejected twice. The abliterator model only lives on MacBook.

Option A — Fix the password: The current Mac password is V4sTGRZqm#dW5@aW (rotated 2026-05-05). Try from the Linux host directly to confirm whether this is a Tailscale SSH issue or a password issue:

ssh adamsloggett@100.127.137.64
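# 100.127.137.64 is in the Tailscale range (100.64.0.0/10), so this attempt still goes through Tailscale.
# To rule Tailscale out, repeat from the Linux host using the Mac's LAN address or hostname.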

Option B (use Tailscale SSH instead): Tailscale SSH (tailscale ssh adamsloggett@100.127.137.64) bypasses password auth by authenticating the connection with the tailnet identity and ACL policy. This requires:

# On the Mac: enable Tailscale SSH
tailscale set --ssh

# On the Linux host: use Tailscale hostname
ssh adamsloggett@macbook-pro-2
# Or: tailscale ssh adamsloggett@macbook-pro-2

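Option C (host the abliterator model on Linux): If the MacBook stays unreachable, the model cannot be copied from the MacBook's Ollama. Instead, pull huihui_ai/qwen2.5-coder-abliterate:latest directly on the Linux host (assuming the tag is available from the Ollama registry) and serve it there. This requires enough disk space (~10GB).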

Who: Adam needs to pick an option and act. Hermes will execute once the path is confirmed.
Time: Option A or B: ~15 minutes. Option C: ~30 minutes.

Task 3: Re-Run heap0 with Fixed Scripts

What: heap0's RE Agent failed silently in the original run. With the validation gate in place, re-run it.

Procedure:

cd ~/greysec/engagements/exploit-lab
./agents/re-agent.sh heap0 /opt/protostar/bin/heap0
# Should produce struct.json or FAIL

./agents/exploit-writer.sh heap0
# Should produce tested exploit.py

Acceptance criteria:

struct.json produced for heap0
analysis.md accurate (verify offset and WINNER address)
exploit.py written and tested — PASS in test-results.md
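Verifying the offset and WINNER address against reference values is easy to get wrong by eye when one side is decimal and the other is hex. A minimal comparison sketch follows; the `compare_to_known` name and the idea of keeping reference values in a dict are assumptions, and the keys must match whatever struct.json actually uses.

```python
def to_int(value) -> int:
    """Accept ints or strings such as "80" or "0x50"; int(..., 0) infers the base."""
    return value if isinstance(value, int) else int(str(value), 0)


def compare_to_known(found: dict, known: dict) -> list[str]:
    """Return one message per field that is missing or differs from the reference."""
    mismatches = []
    for key, expected in known.items():
        if key not in found:
            mismatches.append(f"{key}: missing")
        elif to_int(found[key]) != to_int(expected):
            mismatches.append(
                f"{key}: got {to_int(found[key]):#x}, expected {to_int(expected):#x}"
            )
    return mismatches
```

Note that `int(text, 0)` rejects decimal strings with leading zeros, so addresses should be written with an explicit `0x` prefix.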

Who: qwen2.5-coder:14b
Time: 20-30 minutes

Task 4: Shellcode Generation Step

What: No shellcode.bin has ever been produced. The agent scripts reference it but the step doesn't exist.

Approach: Use msfvenom or pwntools' asm() to generate shellcode based on the binary architecture.

For Protostar binaries (x86, static):

# Example for stack0: payload calls execve("/bin/sh"); -o writes the raw bytes to shellcode.bin
msfvenom -p linux/x86/exec CMD=/bin/sh -a x86 --platform linux -f raw -o shellcode.bin

Or via pwntools in exploit.py:

from pwn import *
shellcode = asm(shellcraft.i386.linux.sh())

Files to touch: exploit-writer.sh — add shellcode generation as a post-exploit step.

Who: qwen2.5-coder:14b
Time: 30 minutes to add the step and test on stack0

Task 5: Rewrite vuln_test exploit.py (MEDIUM)

What: The current vuln_test exploit.py uses a nonstandard "launcher" approach (the Python script writes a C helper program to a tempfile and compiles it). Rewrite it as a direct pwntools process targeting the actual /tmp/vuln_test binary.

Why: The agent contract specifies pwntools process/exploit targeting the binary directly. The launcher approach is a workaround that suggests the model didn't fully understand how to exploit the binary via pwntools standard interface.

Who: qwen2.5-coder:14b
Time: 30 minutes

Capability Expansion Tasks

Tier 1: Beginner (WHAT EXISTS)

Binaries: stack0, stack1, format0 (Protostar)
Skills needed: Basic buffer overflow, format string exploitation
Time per binary: ~20-30 minutes with pipeline
Status: OPERATIONAL (3 of 3 complete)

Tier 2: Intermediate (IN BACKLOG)

Binaries: heap2, heap3, format1-4, net0-4 (Protostar)
Skills needed: Heap grooming, UAF, fastbin dup, format string chaining
Time per binary: ~40-60 minutes with pipeline
What needs building: None (same pipeline, harder targets)

Tier 3: Advanced (IN BACKLOG)

Binaries: Fusion (Web, HTTP, SQL, etc.; advanced Protostar)
Skills needed: ROP chains, ASLR/DEP bypass, heap techniques
What needs building: ROP gadget finder integration, libc database lookup
Time per binary: ~60-90 minutes with pipeline

Tier 4: Elite (FUTURE)

Targets: Real-world binaries, DLL analysis, kernel modules
Skills needed: Full RE, CVE research, kernel exploitation
What needs building: Windows VM path, DLL analysis pipeline, kernel debug setup

Product Tiers (Internal Planning)

Tier	Target	Output	Complexity	Time
Beginner	stack/heap/format (Protostar)	analysis + exploit	Easy	20-30 min
Intermediate	Protostar advanced, VWA	analysis + exploit + ROP	Medium	40-60 min
Advanced	real-world binaries	analysis + struct.json + suggested exploit path	Hard	60-120 min
Elite	0-day research	analysis only (no exploit — model limitations)	Expert	TBD

Definition of Done

GreySec RED is operational when:

re-agent.sh validates struct.json and exits non-zero if missing
exploit-writer.sh tests exploit.py against real binary and reports PASS/FAIL
gbrain logging is wired and firing after each completed target
TIME-LOG is updated after each pipeline run
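heap0 re-run produces correct struct.json (validated against the recorded values: offset 80, WINNER 0x08048464; re-derive the offset in a debugger first, since public heap0 write-ups for the stock Protostar VM report 72)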
shellcode.bin is generated for at least stack0
All 5 target binaries (Protostar stack0, stack1, format0, heap0, plus the local vuln_test) have PASS in test-results.md
MacBook SSH is unblocked and abliterator model is reachable
Skill file greysec-exploit-lab exists and documents operational procedure
At least one intermediate binary (heap2 or format1) has been processed end-to-end and PASS
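The PASS criteria in the checklist above can be rolled up mechanically instead of being tracked by hand. The sketch below assumes a reports/&lt;target&gt;/test-results.md layout and a literal "PASS" token in each file; only the target names come from this kanban.

```python
from pathlib import Path

# Target names come from the checklist; the reports/<target>/ layout and the
# "PASS" token inside test-results.md are assumptions.
TARGETS = ["stack0", "stack1", "format0", "heap0", "vuln_test"]


def targets_without_pass(reports_dir: Path, targets=TARGETS) -> list[str]:
    """List targets whose test-results.md is missing or does not contain PASS."""
    pending = []
    for name in targets:
        results = reports_dir / name / "test-results.md"
        if not results.is_file() or "PASS" not in results.read_text(encoding="utf-8"):
            pending.append(name)
    return pending
```

An empty return value means that criterion is met; a missing file counts as pending, so a target cannot be marked done without recorded evidence.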

DEBT (Action Items from This Kanban)

Action Item	Priority	Status	Notes
Validate updated scripts on heap0	CRITICAL	open	Confirm validation gates work
Unblock MacBook SSH	CRITICAL	blocked	Adam's decision needed
Re-run heap0 with fixed scripts	HIGH	open	After Task 1
Build shellcode.bin generation step	HIGH	open	msfvenom or pwntools asm
Rewrite vuln_test exploit.py	MEDIUM	open	Direct pwntools approach
Test intermediate binaries (heap2, format1)	MEDIUM	open	Pipeline validation
Write greysec-exploit-lab skill	MEDIUM	open	Operational docs
Add ROP gadget finder for advanced tier	LOW	backlog	Future
Validate pipeline against Windows DLL	LOW	backlog	Future
