GreySec PHI Scanner
Detects Protected Health Information (PHI) across files, databases, and Windows hosts to support HIPAA Security Risk Assessments.
What It Scans
| Target | How | PHI Types |
|---|---|---|
| File systems | Regex + entropy scan | SSN, MRN, phone, email, DOB, license, account, URL |
| MSSQL / PostgreSQL | Direct SQL query + Presidio NLP | All 18 HIPAA PHI identifiers |
| Windows hosts (remote) | SMB upload + WinRM exec + SMB download | File-based PHI patterns |
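The "Regex + entropy scan" row can be illustrated with a minimal sketch. The patterns, the `scan_text` function, and the entropy thresholds below are assumptions made for illustration; this README does not show the scanner's actual rules or say how it applies entropy. Here entropy is used in one common way: flagging long, random-looking tokens that no regex matched.

```python
import math
import re
from collections import Counter

# Hypothetical patterns for illustration only; not the scanner's real rule set.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "PHONE": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
}


def shannon_entropy(text):
    """Return the Shannon entropy of `text` in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def scan_text(text, entropy_threshold=4.0, min_token_len=24):
    """Return (line_number, finding_type, matched_text) tuples for `text`.

    The threshold and minimum token length are assumed values, not
    settings taken from the scanner.
    """
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        # Pass 1: pattern matches for known PHI formats.
        for phi_type, pattern in PATTERNS.items():
            for match in pattern.finditer(line):
                findings.append((line_no, phi_type, match.group()))
        # Pass 2: long whitespace-delimited tokens with high character entropy.
        for token in line.split():
            if len(token) >= min_token_len and shannon_entropy(token) >= entropy_threshold:
                findings.append((line_no, "HIGH_ENTROPY", token))
    return findings


sample = "Patient SSN 123-45-6789\nContact jane.doe@example.org or 555-867-5309\n"
results = scan_text(sample)
```

Regex matching alone tends to miss identifiers with no fixed format, which is why a second, format-agnostic signal such as entropy is often paired with it.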
Quick Start
# Scan a directory
python3 -m greysec_phi_scanner scan /path/to/patient_data
# Scan a database
phi-scan scan --config configs/hq.yaml
# Generate HTML report
phi-scan report --results results.json -o report.html --client "Acme Hospital"
Architecture
phi-scanner/
├── src/greysec_phi_scanner/
│ ├── scanner.py # Core regex file scanner
│ ├── config.py # Pydantic config models
│ ├── cli.py # Typer CLI (scan/report/discover/inventory)
│ ├── db/
│ │ └── scanner.py # MSSQL + PostgreSQL scanning
│ ├── windows/
│ │ ├── winrm_scan.py # Remote Windows scan (SMB + WinRM)
│ │ └── host_detector.py # LDAP host discovery
│ ├── inventory/
│ │ └── db.py # SQLite inventory
│ └── reporting/
│ └── html_report.py # GreySec-branded HTML reports
├── test_data/ # Synthetic PHI for testing
└── docs/deployment.md # Multi-location deployment guide
Report Output
- Cover page with client name, date, classification
- Executive summary with KPI cards (HIGH/MED/LOW)
- Scope table with files scanned per source
- Findings by source with severity badges
- Risk & Impact narrative (no remediation guidance, per GreySec business rule)
- Appendix with full raw JSON data
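As a rough illustration of how the KPI cards and per-source findings could be derived from scan results, the sketch below tallies findings by severity. The record shape (`source` and `severity` keys) and the `summarize` function are hypothetical; the actual `results.json` schema is not documented here.

```python
SEVERITIES = ("HIGH", "MED", "LOW")


def summarize(findings):
    """Count findings per severity overall and per source.

    `findings` is assumed to be a list of dicts with "source" and
    "severity" keys; this shape is an illustration, not the real schema.
    """
    kpi = {sev: 0 for sev in SEVERITIES}
    by_source = {}
    for finding in findings:
        sev = finding["severity"]
        if sev not in kpi:
            raise ValueError(f"unknown severity: {sev!r}")
        kpi[sev] += 1
        # Each source gets its own zero-initialised severity tally.
        per_source = by_source.setdefault(finding["source"], {s: 0 for s in SEVERITIES})
        per_source[sev] += 1
    return {"kpi": kpi, "by_source": by_source}


example = [
    {"source": "fileserver", "severity": "HIGH"},
    {"source": "fileserver", "severity": "LOW"},
    {"source": "postgres", "severity": "HIGH"},
]
summary = summarize(example)
```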
Multi-Location Deployment
Each engagement location gets its own config.yaml with:
- Target-specific paths/credentials
- Environment variable ${VAR} for secrets
- SQLite inventory at ~/.greysec/phi_inventory.db
- Reports per location under ~/engagements/<client>/reports/
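The ${VAR} convention for secrets can be sketched as a small substitution pass over an already-parsed config. The `expand_env` helper and the config keys in the example are hypothetical, and the scanner's own loader (the Pydantic models in config.py) may resolve variables differently.

```python
import os
import re

# Matches ${NAME} where NAME is a conventional environment variable name.
_VAR_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value, env=None):
    """Recursively replace ${VAR} placeholders in strings, dicts, and lists.

    Raises KeyError when a referenced variable is unset, so a missing
    secret fails loudly instead of becoming an empty string.
    """
    env = os.environ if env is None else env

    def replace(match):
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    if isinstance(value, str):
        return _VAR_PATTERN.sub(replace, value)
    if isinstance(value, dict):
        return {key: expand_env(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [expand_env(item, env) for item in value]
    return value  # numbers, booleans, None pass through unchanged


# Hypothetical config fragment, shaped as it might look after YAML parsing.
raw_config = {"database": {"user": "scanner", "password": "${DB_PASS}", "port": 5432}}
resolved = expand_env(raw_config, env={"DB_PASS": "s3cret"})
```

Passing `env` explicitly keeps the helper easy to test; in normal use it would fall back to `os.environ`, so the secret never has to be written into the per-location config file.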