2026-05-08 17:44:26 -05:00

GreySec PHI Scanner

Detects Protected Health Information (PHI) across files, databases, and Windows hosts to support HIPAA Security Risk Assessments.

What It Scans

| Target | How | PHI Types |
|---|---|---|
| File systems | Regex + entropy scan | SSN, MRN, phone, email, DOB, license, account, URL |
| MSSQL / PostgreSQL | Direct SQL query + Presidio NLP | All 18 HIPAA PHI identifiers |
| Windows hosts (remote) | SMB upload + WinRM exec + SMB download | File-based PHI patterns |
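
To illustrate what a "regex + entropy" file scan can look like, here is a minimal sketch. The patterns, the `scan_text` and `shannon_entropy` names, and the 1.5-bit threshold are all assumptions for illustration; the rules in scanner.py are not shown in this README and may use entropy differently. In this sketch, entropy is used to drop low-variety matches such as `000-00-0000`, which are usually placeholders rather than real identifiers.

```python
import math
import re
from collections import Counter

# Hypothetical patterns for illustration only; not the scanner's real rule set.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of the string in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def scan_text(text: str, min_entropy: float = 1.5) -> list[tuple[str, str]]:
    """Return (label, value) pairs for matches that pass the entropy filter."""
    findings = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            value = match.group()
            # Assumed heuristic: low-entropy matches are likely placeholders.
            if shannon_entropy(value) >= min_entropy:
                findings.append((label, value))
    return findings
```

Calling `scan_text` on a line containing `123-45-6789` reports it as an SSN candidate, while `000-00-0000` falls below the threshold and is skipped. A real deployment would tune both the patterns and the threshold against synthetic data such as the files in test_data/.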

Quick Start

# Scan a directory
python3 -m greysec_phi_scanner scan /path/to/patient_data

# Scan a database
phi-scan scan --config configs/hq.yaml

# Generate HTML report
phi-scan report --results results.json -o report.html --client "Acme Hospital"

Architecture

phi-scanner/
├── src/greysec_phi_scanner/
│   ├── scanner.py         # Core regex file scanner
│   ├── config.py          # Pydantic config models
│   ├── cli.py             # Typer CLI (scan/report/discover/inventory)
│   ├── db/
│   │   └── scanner.py     # MSSQL + PostgreSQL scanning
│   ├── windows/
│   │   ├── winrm_scan.py  # Remote Windows scan (SMB + WinRM)
│   │   └── host_detector.py # LDAP host discovery
│   ├── inventory/
│   │   └── db.py          # SQLite inventory
│   └── reporting/
│       └── html_report.py # GreySec-branded HTML reports
├── test_data/             # Synthetic PHI for testing
└── docs/deployment.md     # Multi-location deployment guide

Report Output

  • Cover page with client name, date, classification
  • Executive summary with KPI cards (HIGH/MED/LOW)
  • Scope table with files scanned per source
  • Findings by source with severity badges
  • Risk & Impact narrative (no remediation guidance, per GreySec business rule)
  • Appendix with full raw JSON data
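
As a rough idea of how the HIGH/MED/LOW KPI cards could be derived from scan results, the sketch below tallies findings by severity. The results schema is not documented in this README, so the `severity` key and the `severity_counts` helper are assumptions, not the tool's actual format.

```python
from collections import Counter

# Severity levels named in the report's KPI cards.
LEVELS = ("HIGH", "MED", "LOW")


def severity_counts(findings: list[dict]) -> dict[str, int]:
    """Count findings per severity level, reporting 0 for absent levels."""
    # Assumes each finding is a dict with a "severity" string (hypothetical schema).
    counts = Counter(str(f.get("severity", "")).upper() for f in findings)
    return {level: counts.get(level, 0) for level in LEVELS}
```

Unknown or missing severity values are ignored here; a stricter variant could raise an error instead so that malformed results never reach a client report.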

Multi-Location Deployment

Each engagement location gets its own config.yaml with:

  • Target-specific paths/credentials
  • Secrets supplied through ${VAR} environment variable references
  • SQLite inventory at ~/.greysec/phi_inventory.db
  • Reports per location under ~/engagements/<client>/reports/
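
The `${VAR}` convention for secrets can be sketched as a small expansion pass over an already-parsed config. This is only an illustration: the `expand_secrets` function and the config keys in the usage example are hypothetical, and the loader in config.py may resolve variables differently.

```python
import os
import re

# Matches ${NAME} where NAME is a conventional environment variable name.
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_secrets(value, env=None):
    """Recursively replace ${VAR} references in strings with values from env."""
    env = os.environ if env is None else env
    if isinstance(value, dict):
        return {key: expand_secrets(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [expand_secrets(item, env) for item in value]
    if isinstance(value, str):
        def _substitute(match):
            name = match.group(1)
            if name not in env:
                # Fail loudly rather than scanning with an empty credential.
                raise KeyError(f"environment variable {name} is not set")
            return env[name]
        return _VAR.sub(_substitute, value)
    return value  # Numbers, booleans, and None pass through unchanged.
```

For example, with hypothetical keys:

```python
config = {"db": {"host": "sql01", "password": "${DB_PASSWORD}"}, "port": 1433}
resolved = expand_secrets(config, {"DB_PASSWORD": "example-only"})
```

Raising on an unset variable is a deliberate choice in this sketch, since a silently empty password tends to surface later as a confusing authentication failure.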