Documentation

Complete guide to protecting your healthcare application from PII leaks

⚡ Quick Integration:
Most developers integrate SafeKey Lab in under 60 seconds. It takes a single API call placed between your app and your data processing pipeline.

Quick Start (60 Seconds)

The fastest way to protect your healthcare application from leaking PII:

1. Get SDK Access

Install the SafeKey Lab Python SDK:

pip install safekeylab

2. Initialize Client

Set up your API key (get one from your dashboard):

from safekeylab import SafeKeyLabClient

# Use production API endpoint
client = SafeKeyLabClient(
    api_key="sk-...",
    base_url="https://safekeylab-api-1054985024815.us-central1.run.app"
)

3. Protect Your Data

Make your first API call:

response = client.protect_text(
    "Patient John Doe, MRN 123456, DOB 01/15/1980"
)

print(response['redacted_text'])
# Output: "Patient [REDACTED], MRN [REDACTED], DOB [REDACTED]"

Installation

SafeKey Lab supports multiple programming languages and frameworks:

Python

pip install safekeylab

Node.js

npm install @safekeylab/sdk

Java

<dependency>
    <groupId>com.safekeylab</groupId>
    <artifactId>safekeylab-sdk</artifactId>
    <version>1.0.0</version>
</dependency>

Authentication

All API requests require authentication using an API key. You can obtain your API key from the SafeKey Lab dashboard.

Using Environment Variables

export SAFEKEYLAB_API_KEY="sk-your-api-key"
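
The exported variable can then be read at startup instead of hard-coding the key. The following is a minimal sketch, not part of the SDK: the helper name load_api_key is hypothetical, and because this guide does not say whether the SDK reads SAFEKEYLAB_API_KEY on its own, the sketch assumes the key is passed to the client explicitly.

```python
import os

ENV_VAR = "SAFEKEYLAB_API_KEY"  # name taken from the export line above


def load_api_key(env=None):
    """Return the API key from the environment, failing loudly if it is missing.

    `env` defaults to os.environ; passing a plain dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    api_key = env.get(ENV_VAR, "").strip()
    if not api_key:
        raise RuntimeError(f"{ENV_VAR} is not set")
    return api_key


# Hypothetical usage with the client shown elsewhere in this guide:
#   client = SafeKeyLabClient(api_key=load_api_key(), base_url="https://...")
```

Failing at startup when the variable is missing tends to be easier to diagnose than an authentication error on the first request.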

Passing Directly to Client

from safekeylab import SafeKeyLabClient

client = SafeKeyLabClient(
    api_key="sk-your-api-key",
    base_url="https://safekeylab-api-1054985024815.us-central1.run.app"
)

Request Headers

When making direct HTTP requests, include your API key in the X-API-Key header:

X-API-Key: sk-your-api-key
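
For callers not using an SDK, a request carrying this header can be assembled with the Python standard library. This is a sketch under assumptions: the /v1/protect path and the {"text": ...} body mirror the curl examples in the API Reference section of this guide, and the helper name build_protect_request is hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://safekeylab-api-1054985024815.us-central1.run.app/v1"


def build_protect_request(text, api_key, base_url=BASE_URL):
    """Build (but do not send) a POST /protect request with the X-API-Key header."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/protect",
        data=payload,
        headers={
            "X-API-Key": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_protect_request("Patient John Doe, DOB 01/15/1980", "sk-your-api-key")

# Sending is left to the caller, for example:
#   with urllib.request.urlopen(request, timeout=30) as resp:
#       result = json.load(resp)
```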

Your First Request

After authentication, you're ready to make your first API request. Here's a complete example:

from safekeylab import SafeKeyLabClient

# Initialize the client
client = SafeKeyLabClient(
    api_key="sk-your-api-key",
    base_url="https://safekeylab-api-1054985024815.us-central1.run.app"
)

# Your first PII protection request
text_with_pii = """
Patient: John Doe
DOB: 01/15/1980
MRN: 123456789
Diagnosis: Type 2 Diabetes
Provider: Dr. Smith at Mayo Clinic
"""

# Protect the text
response = client.protect_text(text_with_pii)

print(response['redacted_text'])
# Output shows all PII replaced with [REDACTED] tags

# Get detailed detection results
if 'entities' in response:
    for entity in response['entities']:
        print(f"Found {entity['type']} at position {entity['start']}-{entity['end']}")

Response Structure

{
    "redacted_text": "Patient: [REDACTED]\nDOB: [REDACTED]...",
    "entities": [
        {"type": "PERSON", "text": "John Doe", "start": 9, "end": 17},
        {"type": "DATE", "text": "01/15/1980", "start": 23, "end": 33},
        {"type": "MEDICAL_RECORD_NUMBER", "text": "123456789", "start": 39, "end": 48}
    ],
    "statistics": {
        "total_entities": 5,
        "processing_time": 0.042
    }
}
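
The start and end values in the sample above are consistent with 0-based, end-exclusive character offsets into the submitted text (without a leading newline). Assuming that interpretation holds, a quick sanity check like the sketch below can catch offset drift before the positions are used for highlighting or auditing. The helper name check_entity_offsets is hypothetical.

```python
def check_entity_offsets(original_text, entities):
    """Return the entities whose [start, end) slice does not match their reported text."""
    return [
        entity
        for entity in entities
        if original_text[entity["start"]:entity["end"]] != entity["text"]
    ]


original_text = "Patient: John Doe\nDOB: 01/15/1980\nMRN: 123456789"
entities = [
    {"type": "PERSON", "text": "John Doe", "start": 9, "end": 17},
    {"type": "DATE", "text": "01/15/1980", "start": 23, "end": 33},
    {"type": "MEDICAL_RECORD_NUMBER", "text": "123456789", "start": 39, "end": 48},
]

mismatches = check_entity_offsets(original_text, entities)
print(mismatches)  # [] when every offset lines up with the original text
```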

How It Works

SafeKey Lab uses a multi-layered approach to detect and protect PII in healthcare data:

1. Pattern Recognition

Our system uses advanced pattern matching to identify structured data like SSNs, MRNs, phone numbers, and dates. These patterns are continuously updated based on real-world healthcare data formats.

2. Named Entity Recognition (NER)

Machine learning models trained on millions of medical records identify entities like patient names, provider names, and facility names even when they don't follow standard patterns.

3. Context Analysis

The system understands medical context to differentiate between:

  • Patient names vs. provider names
  • Medical terms vs. personal identifiers
  • Generic drug names vs. patient information

4. Redaction & Replacement

Once PII is identified, it's replaced with appropriate tags while maintaining document structure and readability for research purposes.
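
To make the first and last layers concrete, here is a toy illustration of pattern-based detection followed by tag replacement. It is not SafeKey Lab's implementation: the three regular expressions and the function name redact_patterns are assumptions chosen for the example, and the NER and context layers are not represented at all.

```python
import re

# Illustrative patterns only; real healthcare formats vary far more than this.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\(\d{3}\)\s?\d{3}-\d{4}"),
    "DATE": re.compile(r"\b\d{2}/\d{2}/\d{4}\b"),
}


def redact_patterns(text, tag="[REDACTED]"):
    """Replace every pattern match with `tag`; return (redacted_text, entities)."""
    matches = []
    for entity_type, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            matches.append(
                {"type": entity_type, "text": m.group(), "start": m.start(), "end": m.end()}
            )
    matches.sort(key=lambda e: e["start"])

    pieces, entities, cursor = [], [], 0
    for entity in matches:
        if entity["start"] < cursor:
            continue  # overlaps a span that was already redacted
        pieces.append(text[cursor:entity["start"]])
        pieces.append(tag)
        entities.append(entity)
        cursor = entity["end"]
    pieces.append(text[cursor:])
    return "".join(pieces), entities


redacted, found = redact_patterns("SSN 123-45-6789, DOB 01/15/1980, phone (555) 123-4567")
print(redacted)  # SSN [REDACTED], DOB [REDACTED], phone [REDACTED]
```

Names such as "John Doe" pass through this sketch untouched, which is the gap the NER and context layers described above are meant to cover.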

🔒 Zero-Trust Architecture
Data is processed in memory only, never stored. Each request is isolated in its own secure container with automatic cleanup after processing.

PII Detection Types

SafeKey Lab detects and redacts 18+ types of PII commonly found in healthcare data:

Category            | Types Detected                | Example
Patient Identifiers | Name, MRN, SSN                | John Doe, 123-45-6789
Demographics        | DOB, Age, Address             | 01/15/1980, 123 Main St
Contact Info        | Phone, Email, Fax             | (555) 123-4567
Medical Info        | Provider, Facility, Device ID | Dr. Smith, Mayo Clinic
Financial           | Insurance ID, Account         | BCBS123456

Privacy Methods

SafeKey Lab offers multiple privacy protection methods to suit different use cases:

Redaction

Complete removal of PII, replaced with generic tags:

Input: "John Doe, born 01/15/1980"
Output: "[REDACTED], born [REDACTED]"

Tokenization

Replace PII with reversible tokens for data that needs to be re-identified:

Input: "Patient John Doe, MRN 123456"
Output: "Patient TOKEN_PERSON_001, MRN TOKEN_MRN_001"

# Tokens can be reversed with proper authorization
client.detokenize("TOKEN_PERSON_001") # Returns "John Doe"
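
The idea behind reversible tokens can be sketched with a small in-memory vault. This is a conceptual illustration only: the TokenVault class is hypothetical, it keeps the mapping in process memory with no authorization check, and it says nothing about how the service stores or protects its own mappings. Only the TOKEN_TYPE_NNN format is taken from the example above.

```python
from collections import defaultdict


class TokenVault:
    """Toy reversible tokenizer: same (type, value) always maps to the same token."""

    def __init__(self):
        self._counters = defaultdict(int)
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, entity_type, value):
        key = (entity_type, value)
        if key in self._value_to_token:
            return self._value_to_token[key]
        self._counters[entity_type] += 1
        token = f"TOKEN_{entity_type}_{self._counters[entity_type]:03d}"
        self._value_to_token[key] = token
        self._token_to_value[token] = value
        return token

    def detokenize(self, token):
        # Raises KeyError for tokens this vault never issued.
        return self._token_to_value[token]


vault = TokenVault()
person_token = vault.tokenize("PERSON", "John Doe")
mrn_token = vault.tokenize("MRN", "123456")
print(f"Patient {person_token}, MRN {mrn_token}")
# Patient TOKEN_PERSON_001, MRN TOKEN_MRN_001
```

Keeping the mapping stable per value is what lets records about the same patient stay linkable after tokenization.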

Synthetic Data Generation

Replace real PII with realistic but fake data for testing:

Input: "John Doe, SSN 123-45-6789"
Output: "Sarah Johnson, SSN 987-65-4321"  # Synthetic replacements

Differential Privacy

Add calibrated noise to aggregate data while preserving privacy:

# Protect aggregate statistics
response = client.protect_aggregate(
    data=patient_demographics,
    epsilon=1.0  # Privacy budget
)
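
For intuition about what the epsilon parameter controls, the sketch below applies the textbook Laplace mechanism to a single count. Which mechanism protect_aggregate actually uses is not stated in this guide, so treat this as a generic illustration. The function names and the sensitivity of 1 (one patient changes a count by at most one) are assumptions.

```python
import random


def laplace_noise(scale, rng=random):
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count, epsilon, sensitivity=1.0, rng=random):
    """Return the count plus Laplace noise with scale sensitivity / epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)  # seeded only to make the demo repeatable
noisy = private_count(true_count=100, epsilon=1.0, rng=rng)
print(round(noisy, 2))
```

A smaller epsilon gives a larger noise scale, so stronger privacy costs accuracy in the released statistic.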

Compliance

SafeKey Lab helps you meet and exceed healthcare regulatory requirements:

HIPAA Compliance

  • Safe Harbor: Removes all 18 HIPAA identifiers
  • Expert Determination: Statistical analysis to ensure re-identification risk < 0.01%
  • Minimum Necessary: Only process and expose required data
  • Audit Logs: Complete trail of all PHI access and modifications

GDPR Compliance

  • Right to Erasure: Complete PII removal capabilities
  • Data Minimization: Process only necessary data
  • Privacy by Design: Built-in protection at every layer
  • Data Portability: Export protected data in standard formats

State Regulations

  • CCPA (California): Consumer privacy rights support
  • BIPA (Illinois): Biometric data protection
  • SHIELD Act (New York): Data breach notification

Certifications

  • SOC 2 Type II certified
  • ISO 27001:2013 compliant
  • HITRUST CSF certified
  • FedRAMP authorization in process

📋 Business Associate Agreement (BAA)
We provide BAAs for all covered entities. Contact [email protected] to request your BAA.

MIMIC Dataset Support

SafeKey Lab is specifically optimized for MIMIC-III and MIMIC-IV datasets with 99% accuracy:

from safekeylab import SafeKeyLabClient

client = SafeKeyLabClient(
    api_key="sk-...",
    base_url="https://safekeylab-api-1054985024815.us-central1.run.app"
)

# Process MIMIC discharge summary
with open('DISCHARGE_SUMMARY.txt', 'r') as f:
    mimic_text = f.read()

response = client.protect_text(mimic_text)

# Save de-identified version
with open('DISCHARGE_SUMMARY_DEIDENTIFIED.txt', 'w') as f:
    f.write(response['redacted_text'])

MIMIC-Specific Features

  • Pre-trained on MIMIC discharge summaries and clinical notes
  • Handles MIMIC-specific formatting and abbreviations
  • Maintains clinical context while removing PII
  • Compatible with PhysioNet data use agreements

EHR Systems Integration

SafeKey Lab seamlessly integrates with major Electronic Health Record systems to provide real-time PII protection:

Supported EHR Systems

  • Epic: MyChart, Hyperspace, Caboodle
  • Cerner: PowerChart, Millennium
  • Athenahealth: athenaPractice, athenaOne
  • Allscripts: Sunrise, TouchWorks
  • NextGen: NextGen Office, NextGen Enterprise
  • eClinicalWorks: Version 11+

Integration Methods

FHIR API Integration

Use FHIR R4 standards for modern EHR integration:

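import json

from fhirclient import client as fhir_client
import fhirclient.models.patient as p
from safekeylab import SafeKeyLabClient

client = SafeKeyLabClient(
    api_key="sk-...",
    base_url="https://safekeylab-api-1054985024815.us-central1.run.app"
)

# Connect to your FHIR server (app_id and api_base are placeholders)
smart = fhir_client.FHIRClient(settings={
    'app_id': 'my_app',
    'api_base': 'https://your-fhir-server.example.com/fhir'
})

# Process FHIR patient resource
patient_data = p.Patient.read('patient-id', smart.server)
# Serialize with json.dumps; str() on a dict yields a Python repr, not valid JSON
protected_data = client.protect_text(json.dumps(patient_data.as_json()))

# Protected data maintains FHIR structure
print(protected_data['redacted_text'])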

HL7 Message Processing

Direct integration with HL7 v2.x messages:

# Process HL7 ADT message
# (raw string, so the "\&" in the MSH encoding characters is not treated as an escape)
hl7_message = r"""
MSH|^~\&|EPIC|EPICADT|SMS|SMSADT|20240101000000||ADT^A01|1817457|P|2.5|
PID||0493575^^^2^ID 1|123456789|SMITH^JOHN^M||19800101|M||C|123 MAIN ST^^ANYTOWN^OH^12345|
"""

response = client.protect_text(
    hl7_message,
    format="hl7"
)

# Returns HL7 with PII redacted
print(response['redacted_text'])

Direct Database Integration

Connect directly to your EHR database with real-time protection:

# Configure database middleware
from safekeylab.middleware import EHRMiddleware

middleware = EHRMiddleware(
    api_key="sk-...",
    ehr_type="epic",  # or 'cerner', 'athena', etc.
    auto_protect=True
)

# All queries automatically protected
results = middleware.query(
    "SELECT * FROM patient_records WHERE admission_date > '2024-01-01'"
)
# Results have PII automatically redacted

Compliance & Security

  • HIPAA Compliant: Maintains audit logs for all PHI access
  • BAA Available: Business Associate Agreement for covered entities
  • Encryption: TLS 1.3 for data in transit, AES-256 for data at rest
  • Access Controls: Role-based access with MFA support
  • Audit Trail: Complete audit logs for compliance reporting

Implementation Timeline

1. Initial Setup (Day 1)

Configure API credentials and test connectivity

2. Integration Testing (Days 2-3)

Test with sample data and validate PII detection

3. Production Rollout (Days 4-5)

Deploy to production with monitoring

🏥 Healthcare-Specific Features
SafeKey Lab understands medical terminology, drug names, procedure codes, and maintains clinical context while removing PII. This ensures your medical records remain useful for research and analysis while protecting patient privacy.

Multimodal Support

Process multiple file types with a single API:

Supported Formats

  • Documents: PDF, DOCX, TXT, RTF
  • Images: PNG, JPG, TIFF (for scanned documents)
  • Medical: DICOM, HL7, FHIR
  • Audio: MP3, WAV (transcription + redaction)

# Process multimodal content
response = client.protect_multimodal({
    "type": "pdf",
    "content": pdf_content_base64
})

# Process multiple items
response = client.batch_protect([
    "Patient record 1: John Doe, DOB 01/15/1980",
    "Patient record 2: Jane Smith, SSN 123-45-6789"
])
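
The pdf_content_base64 value in the multimodal call is not defined in the snippet above. One plausible way to produce it is sketched below, assuming the API expects standard base64 of the raw file bytes with no data-URI prefix. This guide does not confirm that, and the helper name build_pdf_payload is hypothetical.

```python
import base64


def build_pdf_payload(pdf_bytes):
    """Wrap raw PDF bytes in the {"type", "content"} shape used by protect_multimodal."""
    return {
        "type": "pdf",
        "content": base64.b64encode(pdf_bytes).decode("ascii"),
    }


# With a real file:
#   with open("record.pdf", "rb") as f:   # binary mode, not text mode
#       payload = build_pdf_payload(f.read())
payload = build_pdf_payload(b"%PDF-1.4 example bytes")
print(payload["type"], len(payload["content"]))
```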

API Reference

Base URLs

# Production Endpoints (Both are fully operational)
Azure: https://safekeylab-api-1054985024815.us-central1.run.app/v1
GCP: https://safekeylab-api-1054985024815.us-central1.run.app/v1

Endpoints

POST /protect

Redact PII from text content

curl -X POST https://safekeylab-api-1054985024815.us-central1.run.app/v1/protect \
  -H "X-API-Key: sk-..." \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Patient John Doe, DOB 01/15/1980"
  }'

POST /batch/protect

Batch process multiple texts

curl -X POST https://safekeylab-api-1054985024815.us-central1.run.app/v1/batch/protect \
  -H "X-API-Key: sk-..." \
  -H "Content-Type: application/json" \
  -d '{
    "texts": [
      "Patient record 1",
      "Patient record 2"
    ]
  }'

GET /health

Check API health status

curl https://safekeylab-api-1054985024815.us-central1.run.app/v1/health

SDKs & Libraries

Native SDKs for all major programming languages and frameworks:

Official SDKs

Python SDK

# Installation
pip install safekeylab

# Usage
from safekeylab import SafeKeyLabClient

client = SafeKeyLabClient(
    api_key="sk-your-api-key",
    base_url="https://safekeylab-api-1054985024815.us-central1.run.app"
)

Node.js/TypeScript SDK

// Installation
npm install @safekeylab/sdk

// Usage
import { SafeKeyLabClient } from '@safekeylab/sdk';

const client = new SafeKeyLabClient({
    apiKey: 'sk-your-api-key',
    baseUrl: 'https://safekeylab-api-1054985024815.us-central1.run.app'
});

Go SDK

// Installation
go get github.com/safekeylab/safekeylab-go

// Usage
import "github.com/safekeylab/safekeylab-go"

client := safekeylab.NewClient(
    "sk-your-api-key",
    safekeylab.WithBaseURL("https://safekeylab-api-1054985024815.us-central1.run.app"),
)

Java SDK

// Maven
<dependency>
    <groupId>com.safekeylab</groupId>
    <artifactId>safekeylab-java</artifactId>
    <version>1.0.0</version>
</dependency>

// Usage
import com.safekeylab.SafeKeyLabClient;

SafeKeyLabClient client = new SafeKeyLabClient.Builder()
    .apiKey("sk-your-api-key")
    .baseUrl("https://safekeylab-api-1054985024815.us-central1.run.app")
    .build();

Framework Integrations

  • Django: safekeylab-django middleware
  • Flask: Flask-SafeKeyLab extension
  • Express: express-safekeylab middleware
  • Spring Boot: spring-boot-starter-safekeylab
  • Rails: safekeylab-rails gem

Batch Processing

Process large volumes of data efficiently with batch operations:

Batch Text Protection

from safekeylab import SafeKeyLabClient

client = SafeKeyLabClient(
    api_key="sk-your-api-key",
    base_url="https://safekeylab-api-1054985024815.us-central1.run.app"
)

# Process multiple texts in one request
texts = [
    "Patient John Doe, MRN 123456",
    "Jane Smith, SSN 987-65-4321",
    "Dr. Johnson at Mayo Clinic"
]

response = client.batch_protect(texts)

for i, result in enumerate(response['results']):
    print(f"Text {i+1}: {result['redacted_text']}")

Batch File Processing

# Process multiple files
import os

files_directory = "/path/to/medical/records"
results = []

for filename in sorted(os.listdir(files_directory)):
    path = os.path.join(files_directory, filename)
    if not os.path.isfile(path):
        continue  # skip subdirectories

    # Assumes UTF-8 text files
    with open(path, 'r', encoding='utf-8') as f:
        result = client.protect_text(f.read())
    results.append({
        'filename': filename,
        'redacted': result['redacted_text']
    })

Async Batch Processing

For large datasets, use async processing with callbacks:

# Submit batch job
job = client.create_batch_job(
    files=["file1.txt", "file2.txt", "file3.txt"],
    callback_url="https://your-app.com/webhook"
)

print(f"Job ID: {job['id']}")
print(f"Status: {job['status']}")

# Check job status
status = client.get_job_status(job['id'])
print(f"Progress: {status['processed']}/{status['total']}")
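
When a webhook endpoint is not available, the status call can be polled instead. The sketch below is one way to do that under assumptions: it treats a job as finished once processed reaches total (the two fields shown above), it does not handle a failed state because none is documented here, and the helper name wait_for_job is hypothetical. The status function is passed in, so the loop does not depend on the SDK.

```python
import time


def wait_for_job(get_status, job_id, poll_seconds=5.0, timeout_seconds=600.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll get_status(job_id) until processed >= total, or raise TimeoutError."""
    deadline = clock() + timeout_seconds
    while True:
        status = get_status(job_id)
        if status["processed"] >= status["total"]:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish in {timeout_seconds}s")
        sleep(poll_seconds)


# Hypothetical usage with the client from this guide:
#   final_status = wait_for_job(client.get_job_status, job['id'])
```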

Performance Guidelines

  • Batch size: Up to 1000 items per request
  • File size: Up to 50MB per file
  • Throughput: 10,000+ records/minute
  • Parallel processing: Up to 100 concurrent requests
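
To stay within the 1000-item batch limit, a larger list can be split before it is submitted. A minimal sketch follows. The helper name chunked is hypothetical, and the commented loop assumes batch_protect returns a results list as shown earlier.

```python
def chunked(items, size=1000):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


texts = [f"Patient record {i}" for i in range(2500)]
batch_sizes = [len(batch) for batch in chunked(texts)]
print(batch_sizes)  # [1000, 1000, 500]

# Hypothetical usage:
#   for batch in chunked(texts):
#       response = client.batch_protect(batch)
```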

Webhooks

Receive real-time notifications for async operations and events:

Setting Up Webhooks

# Configure webhook endpoint
client.configure_webhook({
    'url': 'https://your-app.com/webhooks/safekeylab',
    'events': ['job.completed', 'job.failed', 'compliance.alert'],
    'secret': 'your-webhook-secret'
})

Webhook Events

Event            | Description                     | Payload
job.completed    | Batch job finished successfully | Job ID, results URL, statistics
job.failed       | Batch job encountered error     | Job ID, error message, partial results
compliance.alert | Compliance issue detected       | Alert type, affected data, recommendations
quota.warning    | API quota threshold reached     | Current usage, limit, reset time

Webhook Security

Verify webhook signatures to ensure requests are from SafeKey Lab:

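import hmac
import hashlib

def verify_webhook(request, webhook_secret):
    """Return True if the signature header matches an HMAC-SHA256 of the body."""
    signature = request.headers.get('X-SafeKeyLab-Signature')
    if not signature:
        return False  # missing header: reject rather than raise

    # Many frameworks expose the raw body as bytes; encode only if it is a str
    body = request.body
    if isinstance(body, str):
        body = body.encode()

    expected = hmac.new(
        webhook_secret.encode(),
        body,
        hashlib.sha256
    ).hexdigest()

    # Constant-time comparison to avoid timing attacks
    return hmac.compare_digest(signature, expected)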

Retry Policy

  • Initial retry: 5 seconds
  • Max retries: 5 attempts
  • Backoff: Exponential (5s, 10s, 20s, 40s, 80s)
  • Timeout: 30 seconds per request
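
The backoff schedule listed above follows from doubling the initial delay on each attempt. The short sketch below reproduces it, which can help when sizing how long a receiver should expect redeliveries of the same event. The function name retry_delays is hypothetical.

```python
def retry_delays(initial_seconds=5, max_retries=5, factor=2):
    """Return the wait before each retry: initial * factor**attempt."""
    return [initial_seconds * factor ** attempt for attempt in range(max_retries)]


delays = retry_delays()
print(delays)       # [5, 10, 20, 40, 80]
print(sum(delays))  # 155 seconds of waiting across all retries
```

Since the same event may arrive more than once within that window, handlers are usually written to be idempotent, for example by recording the job ID once it has been processed.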

Monitoring & Analytics

Track your PII protection metrics in real-time through the SafeKey Lab dashboard:

  • API call volume and latency
  • PII detection accuracy metrics
  • Data processing volumes
  • Compliance audit logs
  • Cost optimization insights

📊 Enterprise Dashboard
Access detailed analytics, audit logs, and compliance reports through your SafeKey Lab dashboard at https://www.safekeylab.com