Documentation
Complete guide to protecting your healthcare application from PII leaks
Quick Start (60 Seconds)
The fastest way to protect your healthcare application from leaking PII:
Get SDK Access
Install the SafeKey Lab Python SDK:
pip install safekeylab
Initialize Client
Set up your API key (get one from your dashboard):
from safekeylab import SafeKeyLabClient
# Use production API endpoint
client = SafeKeyLabClient(
api_key="sk-...",
base_url="https://safekeylab-api-1054985024815.us-central1.run.app"
)
Protect Your Data
Make your first API call:
response = client.protect_text(
"Patient John Doe, MRN 123456, DOB 01/15/1980"
)
print(response['redacted_text'])
# Output: "Patient [REDACTED], MRN [REDACTED], DOB [REDACTED]"
Installation
SafeKey Lab supports multiple programming languages and frameworks:
Python
pip install safekeylab
Node.js
npm install @safekeylab/sdk
Java
<dependency>
<groupId>com.safekeylab</groupId>
<artifactId>safekeylab-sdk</artifactId>
<version>1.0.0</version>
</dependency>
Authentication
All API requests require authentication using an API key. You can obtain your API key from the SafeKey Lab dashboard.
Using Environment Variables
export SAFEKEYLAB_API_KEY="sk-your-api-key"
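This page does not say whether the SDK reads SAFEKEYLAB_API_KEY automatically. A safe pattern is to read the variable yourself and pass it as api_key. The sketch below does that. The helper name load_api_key is hypothetical, and the prefix check assumes every key starts with "sk-" like the examples on this page.

```python
import os


def load_api_key(var_name: str = "SAFEKEYLAB_API_KEY") -> str:
    """Read the SafeKey Lab API key from the environment (hypothetical helper)."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"{var_name} is not set; export it before starting the app")
    # Assumption: keys use the "sk-" prefix shown in the examples on this page
    if not key.startswith("sk-"):
        raise ValueError(f"{var_name} does not look like a SafeKey Lab key (expected 'sk-...')")
    return key


# Usage: pass the key explicitly instead of hard-coding it
# client = SafeKeyLabClient(
#     api_key=load_api_key(),
#     base_url="https://safekeylab-api-1054985024815.us-central1.run.app",
# )
```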
Passing Directly to Client
from safekeylab import SafeKeyLabClient
client = SafeKeyLabClient(
api_key="sk-your-api-key",
base_url="https://safekeylab-api-1054985024815.us-central1.run.app"
)
Request Headers
When making direct HTTP requests, include your API key in the X-API-Key header:
X-API-Key: sk-your-api-key
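Clients without an SDK can set the same header using only Python's standard library. The sketch below builds, but does not send, a request for the POST /v1/protect endpoint listed in the API Reference. The {"text": ...} body follows the curl example there. The helper name build_protect_request is hypothetical. The commented-out send step also assumes the raw HTTP response has the same redacted_text field as the SDK response.

```python
import json
import urllib.request

BASE_URL = "https://safekeylab-api-1054985024815.us-central1.run.app/v1"


def build_protect_request(text: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST /protect request carrying the X-API-Key header."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/protect",
        data=body,
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Sending it (needs network access and a valid key; response shape is assumed):
# req = build_protect_request("Patient John Doe, DOB 01/15/1980", "sk-...")
# with urllib.request.urlopen(req, timeout=30) as resp:
#     print(json.loads(resp.read())["redacted_text"])
```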
Your First Request
After authentication, you're ready to make your first API request. Here's a complete example:
from safekeylab import SafeKeyLabClient
# Initialize the client
client = SafeKeyLabClient(
api_key="sk-your-api-key",
base_url="https://safekeylab-api-1054985024815.us-central1.run.app"
)
# Your first PII protection request
text_with_pii = """
Patient: John Doe
DOB: 01/15/1980
MRN: 123456789
Diagnosis: Type 2 Diabetes
Provider: Dr. Smith at Mayo Clinic
"""
# Protect the text
response = client.protect_text(text_with_pii)
print(response['redacted_text'])
# Output shows all PII replaced with [REDACTED] tags
# Get detailed detection results
if 'entities' in response:
    for entity in response['entities']:
        print(f"Found {entity['type']} at position {entity['start']}-{entity['end']}")
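Response Structure
Example response (abbreviated: total_entities counts all five detected entities, but only the first three are listed):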
{
"redacted_text": "Patient: [REDACTED]\nDOB: [REDACTED]...",
"entities": [
{"type": "PERSON", "text": "John Doe", "start": 9, "end": 17},
{"type": "DATE", "text": "01/15/1980", "start": 23, "end": 33},
{"type": "MEDICAL_RECORD_NUMBER", "text": "123456789", "start": 39, "end": 48}
],
"statistics": {
"total_entities": 5,
"processing_time": 0.042
}
}
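Each entity carries start and end character offsets, so you can build your own redaction from the original text. For example, you can use typed tags instead of a generic [REDACTED]. The sketch below does this. It assumes the offsets are 0-based, end-exclusive, and index into the exact string you sent, which matches the example above, where "John Doe" spans 9 to 17 in a text starting with "Patient: ". The function name is illustrative.

```python
def redact_with_types(text: str, entities: list) -> str:
    """Replace each detected span with a tag naming its entity type."""
    # Work from the last span to the first so earlier offsets stay valid
    for entity in sorted(entities, key=lambda e: e["start"], reverse=True):
        start, end = entity["start"], entity["end"]
        # Sanity check: the offsets must point at the text the API reported
        if text[start:end] != entity["text"]:
            raise ValueError(f"offsets {start}-{end} do not match {entity['text']!r}")
        text = text[:start] + f"[{entity['type']}]" + text[end:]
    return text


sample = "Patient: John Doe\nDOB: 01/15/1980\nMRN: 123456789"
entities = [
    {"type": "PERSON", "text": "John Doe", "start": 9, "end": 17},
    {"type": "DATE", "text": "01/15/1980", "start": 23, "end": 33},
    {"type": "MEDICAL_RECORD_NUMBER", "text": "123456789", "start": 39, "end": 48},
]
print(redact_with_types(sample, entities))
# Patient: [PERSON]
# DOB: [DATE]
# MRN: [MEDICAL_RECORD_NUMBER]
```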
How It Works
SafeKey Lab uses a multi-layered approach to detect and protect PII in healthcare data:
1. Pattern Recognition
Our system uses advanced pattern matching to identify structured data like SSNs, MRNs, phone numbers, and dates. These patterns are continuously updated based on real-world healthcare data formats.
2. Named Entity Recognition (NER)
Machine learning models trained on millions of medical records identify entities like patient names, provider names, and facility names even when they don't follow standard patterns.
3. Context Analysis
The system understands medical context to differentiate between:
- Patient names vs. provider names
- Medical terms vs. personal identifiers
- Generic drug names vs. patient information
4. Redaction & Replacement
Once PII is identified, it's replaced with appropriate tags while maintaining document structure and readability for research purposes.
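The toy version below makes layers 1 and 4 concrete: pattern recognition followed by redaction. It is an illustration only, not SafeKey Lab's detection rules. The three regular expressions cover a few common US formats. Names are deliberately not handled, because detecting them needs the NER and context layers.

```python
import re

# Toy patterns for a few structured identifiers (illustrative, not exhaustive)
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\(\d{3}\)\s?\d{3}-\d{4}"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
}


def detect(text: str) -> list:
    """Return (type, start, end) tuples for every pattern match, in text order."""
    hits = []
    for pii_type, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            hits.append((pii_type, m.start(), m.end()))
    return sorted(hits, key=lambda h: h[1])


def redact(text: str) -> str:
    """Replace detected spans with [REDACTED]; assumes matches do not overlap."""
    for _, start, end in reversed(detect(text)):
        text = text[:start] + "[REDACTED]" + text[end:]
    return text


print(redact("SSN 123-45-6789, call (555) 123-4567, DOB 01/15/1980"))
# SSN [REDACTED], call [REDACTED], DOB [REDACTED]
```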
PII Detection Types
SafeKey Lab detects and redacts 18+ types of PII commonly found in healthcare data:
| Category | Types Detected | Example |
|---|---|---|
| Patient Identifiers | Name, MRN, SSN | John Doe, 123-45-6789 |
| Demographics | DOB, Age, Address | 01/15/1980, 123 Main St |
| Contact Info | Phone, Email, Fax | (555) 123-4567 |
| Medical Info | Provider, Facility, Device ID | Dr. Smith, Mayo Clinic |
| Financial | Insurance ID, Account | BCBS123456 |
Privacy Methods
SafeKey Lab offers multiple privacy protection methods to suit different use cases:
Redaction
Complete removal of PII, replaced with generic tags:
Input: "John Doe, born 01/15/1980"
Output: "[REDACTED], born [REDACTED]"
Tokenization
Replace PII with reversible tokens for data that needs to be re-identified:
Input: "Patient John Doe, MRN 123456"
Output: "Patient TOKEN_PERSON_001, MRN TOKEN_MRN_001"
# Tokens can be reversed with proper authorization
client.detokenize("TOKEN_PERSON_001") # Returns "John Doe"
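Conceptually, tokenization keeps a protected mapping (a "vault") from each token back to its original value. The in-memory class below is a hypothetical sketch of that idea, using the TOKEN_<TYPE>_<NNN> format from the example. It is not how SafeKey Lab stores tokens. A real vault would also need encryption and the access control mentioned above.

```python
from collections import defaultdict


class TokenVault:
    """Hypothetical in-memory token vault mapping values to TOKEN_<TYPE>_<NNN>."""

    def __init__(self):
        self._by_value = {}                # (type, value) -> token
        self._by_token = {}                # token -> original value
        self._counters = defaultdict(int)  # last sequence number used per type

    def tokenize(self, pii_type: str, value: str) -> str:
        key = (pii_type, value)
        if key not in self._by_value:  # the same value always gets the same token
            self._counters[pii_type] += 1
            token = f"TOKEN_{pii_type}_{self._counters[pii_type]:03d}"
            self._by_value[key] = token
            self._by_token[token] = value
        return self._by_value[key]

    def detokenize(self, token: str) -> str:
        return self._by_token[token]  # raises KeyError for unknown tokens


vault = TokenVault()
print(f"Patient {vault.tokenize('PERSON', 'John Doe')}, MRN {vault.tokenize('MRN', '123456')}")
# Patient TOKEN_PERSON_001, MRN TOKEN_MRN_001
print(vault.detokenize("TOKEN_PERSON_001"))  # John Doe
```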
Synthetic Data Generation
Replace real PII with realistic but fake data for testing:
Input: "John Doe, SSN 123-45-6789"
Output: "Sarah Johnson, SSN 987-65-4321" # Synthetic replacements
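One common way to build synthetic identifiers is to draw them from a range that can never belong to a real person. For SSNs, the Social Security Administration does not issue numbers with area 000, 666, or 900-999. The sketch below applies this idea; it is hypothetical and not SafeKey Lab's generator. It maps each real value to a single fake value, so repeated mentions stay consistent within a document.

```python
import random
import re

SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def synthesize_ssns(text: str, seed=None) -> str:
    """Replace each SSN with a fake one from the never-issued 900-999 area range."""
    rng = random.Random(seed)
    mapping = {}  # real SSN -> synthetic SSN, so repeats stay consistent

    def fake(match):
        real = match.group(0)
        if real not in mapping:
            mapping[real] = f"{rng.randint(900, 999)}-{rng.randint(1, 99):02d}-{rng.randint(1, 9999):04d}"
        return mapping[real]

    return SSN_RE.sub(fake, text)


print(synthesize_ssns("SSN 123-45-6789; confirmed SSN 123-45-6789", seed=7))
```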
Differential Privacy
Add calibrated noise to aggregate data while preserving privacy:
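# Protect aggregate statistics
# patient_demographics: your aggregate data, e.g. patient counts by age group
response = client.protect_aggregate(
data=patient_demographics,
epsilon=1.0 # Privacy budget: lower values add more noise and give stronger privacy
)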
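The classic Laplace mechanism for a count query shows what epsilon controls. Adding or removing one patient changes a count by at most 1, so the count has sensitivity 1. Adding Laplace noise with scale sensitivity/epsilon therefore gives epsilon-differential privacy. This is a textbook sketch, not necessarily the mechanism protect_aggregate uses.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_count(true_count: int, epsilon: float, sensitivity: float = 1.0, rng=None) -> float:
    """Release a count with epsilon-differential privacy via the Laplace mechanism."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    return true_count + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)
print(noisy_count(120, epsilon=1.0, rng=rng))  # noise scale 1
print(noisy_count(120, epsilon=0.1, rng=rng))  # noise scale 10: much noisier
```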
Compliance
SafeKey Lab helps you meet and exceed healthcare regulatory requirements:
HIPAA Compliance
- Safe Harbor: Removes all 18 HIPAA identifiers
- Expert Determination: Statistical analysis to ensure re-identification risk < 0.01%
- Minimum Necessary: Only process and expose required data
- Audit Logs: Complete trail of all PHI access and modifications
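Two of the Safe Harbor rules are easy to show in code. First, all elements of dates directly related to a patient, other than the year, must be removed. Second, ages over 89 (and any date element, including the year, that reveals such an age) must be aggregated into a single "90 or older" category. The sketch below applies only these two rules to a structured record, so the other identifiers still need removing. The field names are hypothetical.

```python
from datetime import date

DATE_FIELDS = ("birth_date", "admission_date", "discharge_date")  # hypothetical names


def generalize_record(record: dict) -> dict:
    """Apply two Safe Harbor rules: dates keep only the year; ages over 89 become '90+'."""
    out = dict(record)
    for field in DATE_FIELDS:
        if isinstance(out.get(field), date):
            out[field] = str(out[field].year)  # drop month and day
    if isinstance(out.get("age"), int) and out["age"] > 89:
        out["age"] = "90+"
        # A birth year would reveal an age over 89, so it is dropped as well
        if "birth_date" in out:
            out["birth_date"] = None
    return out


print(generalize_record({"birth_date": date(1930, 1, 15), "age": 94, "admission_date": date(2024, 3, 2)}))
# {'birth_date': None, 'age': '90+', 'admission_date': '2024'}
```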
GDPR Compliance
- Right to Erasure: Complete PII removal capabilities
- Data Minimization: Process only necessary data
- Privacy by Design: Built-in protection at every layer
- Data Portability: Export protected data in standard formats
State Regulations
- CCPA (California): Consumer privacy rights support
- BIPA (Illinois): Biometric data protection
- SHIELD Act (New York): Data breach notification
Certifications
- SOC 2 Type II certified
- ISO 27001:2013 compliant
- HITRUST CSF certified
- FedRAMP authorization in process
MIMIC Dataset Support
SafeKey Lab is specifically optimized for MIMIC-III and MIMIC-IV datasets with 99% accuracy:
from safekeylab import SafeKeyLabClient
client = SafeKeyLabClient(
api_key="sk-...",
base_url="https://safekeylab-api-1054985024815.us-central1.run.app"
)
# Process MIMIC discharge summary
with open('DISCHARGE_SUMMARY.txt', 'r') as f:
    mimic_text = f.read()
response = client.protect_text(mimic_text)
# Save de-identified version
with open('DISCHARGE_SUMMARY_DEIDENTIFIED.txt', 'w') as f:
    f.write(response['redacted_text'])
MIMIC-Specific Features
- Pre-trained on MIMIC discharge summaries and clinical notes
- Handles MIMIC-specific formatting and abbreviations
- Maintains clinical context while removing PII
- Compatible with PhysioNet data use agreements
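MIMIC notes are distributed already de-identified. MIMIC-III wraps surrogate values in [** ... **] brackets, and MIMIC-IV notes typically replace them with three underscores (___). A local normalization pass can give your output one consistent placeholder. The hypothetical sketch below can run before or after protect_text. It does not replace PII detection on text that was never de-identified.

```python
import re

# MIMIC-III surrogates look like [**2151-7-16**] or [**Known lastname 1234**]
MIMIC_III_PLACEHOLDER = re.compile(r"\[\*\*.*?\*\*\]")
# MIMIC-IV uses exactly three underscores; longer runs are usually layout rulers
MIMIC_IV_PLACEHOLDER = re.compile(r"(?<!_)_{3}(?!_)")


def normalize_mimic_placeholders(text: str, tag: str = "[REDACTED]") -> str:
    """Rewrite MIMIC de-identification placeholders to a single tag."""
    text = MIMIC_III_PLACEHOLDER.sub(tag, text)
    return MIMIC_IV_PLACEHOLDER.sub(tag, text)


print(normalize_mimic_placeholders("Admission Date: [**2151-7-16**]  Name: ___"))
# Admission Date: [REDACTED]  Name: [REDACTED]
```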
EHR Systems Integration
SafeKey Lab seamlessly integrates with major Electronic Health Record systems to provide real-time PII protection:
Supported EHR Systems
- Epic: MyChart, Hyperspace, Caboodle
- Cerner: PowerChart, Millennium
- Athenahealth: athenaPractice, athenaOne
- Allscripts: Sunrise, TouchWorks
- NextGen: NextGen Office, NextGen Enterprise
- eClinicalWorks: Version 11+
Integration Methods
FHIR API Integration
Use FHIR R4 standards for modern EHR integration:
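import json
from fhirclient import client as fhir_client
import fhirclient.models.patient as p
from safekeylab import SafeKeyLabClient
client = SafeKeyLabClient(
api_key="sk-...",
base_url="https://safekeylab-api-1054985024815.us-central1.run.app"
)
# Connect to your FHIR R4 server (replace api_base with your server's URL)
smart = fhir_client.FHIRClient(settings={
'app_id': 'my_app',
'api_base': 'https://your-fhir-server.example.com/r4'
})
# Process FHIR patient resource, serialized as JSON text
patient_data = p.Patient.read('patient-id', smart.server)
protected_data = client.protect_text(json.dumps(patient_data.as_json()))
# Protected data maintains FHIR structure
print(protected_data['redacted_text'])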
HL7 Message Processing
Direct integration with HL7 v2.x messages:
# Process HL7 ADT message (raw string so the \& encoding characters need no escaping)
hl7_message = r"""
MSH|^~\&|EPIC|EPICADT|SMS|SMSADT|20240101000000||ADT^A01|1817457|P|2.5|
PID||0493575^^^2^ID 1|123456789||SMITH^JOHN^M||19800101|M||C|123 MAIN ST^^ANYTOWN^OH^12345|
"""
response = client.protect_text(
hl7_message,
format="hl7"
)
# Returns HL7 with PII redacted
print(response['redacted_text'])
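HL7 v2 splits each segment into |-separated fields, which shows where PII sits in a message. The hypothetical standard-library sketch below blanks common identifying PID fields locally. These include PID-3 (patient identifier list), PID-5 (name), PID-7 (date of birth), PID-11 (address), PID-13 (home phone), and PID-19 (SSN). It only illustrates field positions. It is not a substitute for format="hl7" processing, which would also need to cover identifiers in other segments and free text.

```python
# PID field numbers that commonly carry identifiers (HL7 v2.5 numbering)
PID_PII_FIELDS = {2, 3, 4, 5, 6, 7, 9, 11, 13, 14, 19, 20}


def redact_pid_fields(message: str, tag: str = "[REDACTED]") -> str:
    """Blank identifying PID fields; other segments pass through unchanged."""
    segments = []
    for segment in message.strip().splitlines():  # accepts \r, \n or \r\n
        fields = segment.split("|")
        if fields[0] == "PID":
            # fields[n] is PID-n because fields[0] holds the segment name
            for i in PID_PII_FIELDS:
                if i < len(fields) and fields[i]:
                    fields[i] = tag
        segments.append("|".join(fields))
    return "\r".join(segments)  # HL7 v2 separates segments with carriage returns


msg = (
    "MSH|^~\\&|EPIC|EPICADT|SMS|SMSADT|20240101000000||ADT^A01|1817457|P|2.5|\r"
    "PID||0493575^^^2^ID 1|123456789||SMITH^JOHN^M||19800101|M||C|123 MAIN ST^^ANYTOWN^OH^12345|"
)
print(redact_pid_fields(msg).split("\r")[1])
# PID||[REDACTED]|[REDACTED]||[REDACTED]||[REDACTED]|M||C|[REDACTED]|
```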
Direct Database Integration
Connect directly to your EHR database with real-time protection:
# Configure database middleware
from safekeylab.middleware import EHRMiddleware
middleware = EHRMiddleware(
api_key="sk-...",
ehr_type="epic", # or 'cerner', 'athena', etc.
auto_protect=True
)
# All queries automatically protected
results = middleware.query(
"SELECT * FROM patient_records WHERE admission_date > '2024-01-01'"
)
# Results have PII automatically redacted
Compliance & Security
- HIPAA Compliant: Maintains audit logs for all PHI access
- BAA Available: Business Associate Agreement for covered entities
- Encryption: TLS 1.3 for data in transit, AES-256 for data at rest
- Access Controls: Role-based access with MFA support
- Audit Trail: Complete audit logs for compliance reporting
Implementation Timeline
Initial Setup (Day 1)
Configure API credentials and test connectivity
Integration Testing (Days 2-3)
Test with sample data and validate PII detection
Production Rollout (Days 4-5)
Deploy to production with monitoring
Multimodal Support
Process multiple file types with a single API:
Supported Formats
- Documents: PDF, DOCX, TXT, RTF
- Images: PNG, JPG, TIFF (for scanned documents)
- Medical: DICOM, HL7, FHIR
- Audio: MP3, WAV (transcription + redaction)
# Process multimodal content
response = client.protect_multimodal({
"type": "pdf",
"content": pdf_content_base64
})
# Process multiple items
response = client.batch_protect([
"Patient record 1: John Doe, DOB 01/15/1980",
"Patient record 2: Jane Smith, SSN 123-45-6789"
])
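The pdf_content_base64 variable above implies that protect_multimodal takes file content as a base64 string. The sketch below produces one with the standard library. The helper name and file path are placeholders.

```python
import base64
from pathlib import Path


def encode_file_base64(path) -> str:
    """Read a file as bytes and return its base64 encoding as an ASCII string."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")


# Usage (placeholder path):
# pdf_content_base64 = encode_file_base64("discharge_summary.pdf")
# response = client.protect_multimodal({"type": "pdf", "content": pdf_content_base64})
```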
API Reference
Base URLs
# Production endpoint
https://safekeylab-api-1054985024815.us-central1.run.app/v1
Endpoints
POST /protect
Redact PII from text content
curl -X POST https://safekeylab-api-1054985024815.us-central1.run.app/v1/protect \
-H "X-API-Key: sk-..." \
-H "Content-Type: application/json" \
-d '{
"text": "Patient John Doe, DOB 01/15/1980"
}'
POST /batch/protect
Batch process multiple texts
curl -X POST https://safekeylab-api-1054985024815.us-central1.run.app/v1/batch/protect \
-H "X-API-Key: sk-..." \
-H "Content-Type: application/json" \
-d '{
"texts": [
"Patient record 1",
"Patient record 2"
]
}'
GET /health
Check API health status
curl https://safekeylab-api-1054985024815.us-central1.run.app/v1/health
SDKs & Libraries
Official SDKs are available for Python, Node.js/TypeScript, Go, and Java, along with integrations for popular web frameworks:
Official SDKs
Python SDK
# Installation
pip install safekeylab
# Usage
from safekeylab import SafeKeyLabClient
client = SafeKeyLabClient(
api_key="sk-your-api-key",
base_url="https://safekeylab-api-1054985024815.us-central1.run.app"
)
Node.js/TypeScript SDK
// Installation
npm install @safekeylab/sdk
// Usage
import { SafeKeyLabClient } from '@safekeylab/sdk';
const client = new SafeKeyLabClient({
apiKey: 'sk-your-api-key',
baseUrl: 'https://safekeylab-api-1054985024815.us-central1.run.app'
});
Go SDK
// Installation
go get github.com/safekeylab/safekeylab-go
// Usage
import "github.com/safekeylab/safekeylab-go"
client := safekeylab.NewClient(
"sk-your-api-key",
safekeylab.WithBaseURL("https://safekeylab-api-1054985024815.us-central1.run.app"),
)
Java SDK
// Maven
<dependency>
<groupId>com.safekeylab</groupId>
<artifactId>safekeylab-java</artifactId>
<version>1.0.0</version>
</dependency>
// Usage
import com.safekeylab.SafeKeyLabClient;
SafeKeyLabClient client = new SafeKeyLabClient.Builder()
.apiKey("sk-your-api-key")
.baseUrl("https://safekeylab-api-1054985024815.us-central1.run.app")
.build();
Framework Integrations
- Django: safekeylab-django middleware
- Flask: Flask-SafeKeyLab extension
- Express: express-safekeylab middleware
- Spring Boot: spring-boot-starter-safekeylab
- Rails: safekeylab-rails gem
Batch Processing
Process large volumes of data efficiently with batch operations:
Batch Text Protection
from safekeylab import SafeKeyLabClient
client = SafeKeyLabClient(
api_key="sk-your-api-key",
base_url="https://safekeylab-api-1054985024815.us-central1.run.app"
)
# Process multiple texts in one request
texts = [
"Patient John Doe, MRN 123456",
"Jane Smith, SSN 987-65-4321",
"Dr. Johnson at Mayo Clinic"
]
response = client.batch_protect(texts)
for i, result in enumerate(response['results']):
    print(f"Text {i+1}: {result['redacted_text']}")
Batch File Processing
# Process multiple files
import os
files_directory = "/path/to/medical/records"
results = []
for filename in os.listdir(files_directory):
    path = os.path.join(files_directory, filename)
    if not os.path.isfile(path):  # skip subdirectories
        continue
    with open(path, 'r', encoding='utf-8') as f:
        result = client.protect_text(f.read())
    results.append({
        'filename': filename,
        'redacted': result['redacted_text']
    })
Async Batch Processing
For large datasets, use async processing with callbacks:
# Submit batch job
job = client.create_batch_job(
files=["file1.txt", "file2.txt", "file3.txt"],
callback_url="https://your-app.com/webhook"
)
print(f"Job ID: {job['id']}")
print(f"Status: {job['status']}")
# Check job status
status = client.get_job_status(job['id'])
print(f"Progress: {status['processed']}/{status['total']}")
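If you do not run a webhook receiver, you can poll instead. The loop below is a hypothetical sketch that accepts any status function, such as client.get_job_status. It treats a job as finished when processed reaches total, or when status is "completed" or "failed". Those two status strings are an assumption based on the job.completed and job.failed webhook event names. The sleep parameter exists only to make the loop easy to test.

```python
import time

TERMINAL_STATES = ("completed", "failed")  # assumption: mirrors job.completed / job.failed


def wait_for_job(get_status, job_id, interval=5.0, timeout=3600.0, sleep=time.sleep):
    """Poll get_status(job_id) until the job looks finished or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(job_id)
        done = status.get("status") in TERMINAL_STATES or status["processed"] >= status["total"]
        if done:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} not finished after {timeout} seconds")
        sleep(interval)


# Usage with the SDK client:
# final = wait_for_job(client.get_job_status, job['id'], interval=10)
```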
Performance Guidelines
- Batch size: Up to 1000 items per request
- File size: Up to 50MB per file
- Throughput: 10,000+ records/minute
- Parallel processing: Up to 100 concurrent requests
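To stay within these limits with a large list, split it into chunks of at most 1,000 items and cap the number of in-flight requests. The sketch below does this with the standard library's ThreadPoolExecutor. Here protect_batch stands for any function shaped like client.batch_protect: a list of texts in, and a dict with a results list out. The default of 8 workers is an arbitrary choice well under the 100-request ceiling.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_BATCH = 1000  # documented per-request item limit


def chunked(items, size=MAX_BATCH):
    """Yield consecutive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def protect_all(texts, protect_batch, max_workers=8):
    """Protect texts in batches, concurrently, returning results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in submission order, so output order matches input
        responses = pool.map(protect_batch, chunked(texts))
        return [result for resp in responses for result in resp["results"]]


# Usage: results = protect_all(all_texts, client.batch_protect)
```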
Webhooks
Receive real-time notifications for async operations and events:
Setting Up Webhooks
# Configure webhook endpoint
client.configure_webhook({
'url': 'https://your-app.com/webhooks/safekeylab',
'events': ['job.completed', 'job.failed', 'compliance.alert'],
'secret': 'your-webhook-secret'
})
Webhook Events
| Event | Description | Payload |
|---|---|---|
| job.completed | Batch job finished successfully | Job ID, results URL, statistics |
| job.failed | Batch job encountered error | Job ID, error message, partial results |
| compliance.alert | Compliance issue detected | Alert type, affected data, recommendations |
| quota.warning | API quota threshold reached | Current usage, limit, reset time |
Webhook Security
Verify webhook signatures to ensure requests are from SafeKey Lab:
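import hmac
import hashlib
# Use the same secret you passed to configure_webhook()
webhook_secret = 'your-webhook-secret'
def verify_webhook(request):
    signature = request.headers.get('X-SafeKeyLab-Signature')
    if not signature:
        return False  # reject requests without a signature
    body = request.body  # raw request body, usually bytes
    if isinstance(body, str):
        body = body.encode()
    expected = hmac.new(
        webhook_secret.encode(),
        body,
        hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(signature, expected)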
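To test your endpoint before going live, you can sign a sample payload yourself. The sketch below assumes the same scheme as the verification code: the signature header carries the hex-encoded HMAC-SHA256 of the raw request body, keyed with your webhook secret. The payload fields are made up for illustration.

```python
import hashlib
import hmac
import json


def sign(secret: str, body: bytes) -> str:
    """Hex-encoded HMAC-SHA256 of the raw body, keyed with the webhook secret."""
    return hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()


def is_valid(secret: str, body: bytes, signature) -> bool:
    """Constant-time comparison that also rejects a missing signature."""
    return bool(signature) and hmac.compare_digest(signature, sign(secret, body))


secret = "your-webhook-secret"
body = json.dumps({"event": "job.completed", "job_id": "example"}).encode()  # made-up payload
headers = {"X-SafeKeyLab-Signature": sign(secret, body)}
print(is_valid(secret, body, headers["X-SafeKeyLab-Signature"]))         # True
print(is_valid(secret, body + b" ", headers["X-SafeKeyLab-Signature"]))  # False
```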
Retry Policy
- Initial retry: 5 seconds
- Max retries: 5 attempts
- Backoff: Exponential (5s, 10s, 20s, 40s, 80s)
- Timeout: 30 seconds per request
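The backoff listed above doubles the delay after each failed attempt, starting at 5 seconds. The small illustrative function below checks the arithmetic.

```python
def retry_delays(initial=5, retries=5, factor=2):
    """Delays in seconds before each retry under exponential backoff."""
    return [initial * factor ** n for n in range(retries)]


delays = retry_delays()
print(delays)       # [5, 10, 20, 40, 80]
print(sum(delays))  # 155
```

A delivery that keeps failing is therefore retried over about 2.5 minutes of backoff, not counting the up to 30 seconds each attempt may take. A retried event can arrive after your handler already processed it, so make webhook handlers idempotent.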
Monitoring & Analytics
Track your PII protection metrics in real-time through the SafeKey Lab dashboard:
- API call volume and latency
- PII detection accuracy metrics
- Data processing volumes
- Compliance audit logs
- Cost optimization insights