Disaster Recovery & Attack Prevention
This page covers what to do when things go wrong — and how the MPC architecture and layered encryption policies make most attacks structurally infeasible.
Emergency Private Key Recovery
Recovery of the raw private key is a last-resort operation used only for migration to a new system, legal holds, or catastrophic infrastructure loss. It requires quorum approval and is irreversible if `deleteAfterRecover: true` is set.
```typescript
import { WorkspaceClient, ComponentModule } from 'caller-sdk';

// Application-level audit logger (append-only); declared here for completeness.
declare const audit: { log(entry: Record<string, unknown>): Promise<void> };

const workspace = new WorkspaceClient({ apiKey: process.env.WR_API_KEY! });

async function emergencyRecoverPrivateKey(
  keyId: string,
  quorumApprovers: string[], // logged for audit — minimum 2
): Promise<string> {
  if (quorumApprovers.length < 2) {
    throw new Error('Recovery requires at least 2 quorum approvers');
  }

  // Log BEFORE proceeding — the record must exist even if recovery fails
  await audit.log({
    event: 'PRIVATE_KEY_RECOVERY_INITIATED',
    keyId,
    approvers: quorumApprovers,
    timestamp: new Date().toISOString(),
  });

  // Combines threshold shares from 2+ nodes to reconstruct the raw private key.
  // This is the ONLY White Rabbit operation that produces plaintext key material.
  const { privateKey } = await workspace
    .call(ComponentModule.RECOVER_PRIVATE_KEY, {
      keyId,
      // deleteAfterRecover: true — irreversible; only set once you are certain
      // the node shares are no longer needed.
    })
    .promise();

  await audit.log({
    event: 'PRIVATE_KEY_RECOVERY_COMPLETED',
    keyId,
    approvers: quorumApprovers,
    timestamp: new Date().toISOString(),
  });

  // Transfer to HSM immediately — never log, never write to disk in plaintext.
  return privateKey;
}
```
`RECOVER_PRIVATE_KEY` exposes raw key material. The `privateKey` in the output is a plaintext hex-encoded private key. Handle it accordingly:
- Never log it — exclude from structured logging pipelines
- Never write it to disk in plaintext — transfer directly into an HSM or re-derive wallets from it and destroy the value immediately
- Treat the run stage log as sensitive — the output appears in White Rabbit run logs; restrict access via RBAC
- Set `deleteAfterRecover: true` only once you are certain the node shares are no longer needed
All of the following must be satisfied before executing:
- Written approval from 2-of-3 Custodians (signed email or DocuSign)
- Legal sign-off if driven by a regulatory request
- A designated receiving HSM ready to import the key within 60 seconds
- The full event recorded in an immutable audit log (SIEM / CloudTrail)
Abort if any condition is unmet.
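The preconditions above can be enforced in code before any recovery call is made. A minimal sketch; the `RecoveryPreconditions` shape and its field names are illustrative, not part of the SDK:

```typescript
// Illustrative precondition gate for emergency recovery.
// All field names here are assumptions for the sketch.
interface RecoveryPreconditions {
  custodianApprovals: string[]; // references to signed emails / DocuSign envelopes
  regulatoryRequest: boolean;
  legalSignOff: boolean;        // required when regulatoryRequest is true
  hsmReady: boolean;            // receiving HSM can import within 60 seconds
  auditLogArmed: boolean;       // immutable log (SIEM / CloudTrail) reachable
}

function assertRecoveryPreconditions(p: RecoveryPreconditions): void {
  const failures: string[] = [];
  if (p.custodianApprovals.length < 2) {
    failures.push('need written approval from 2-of-3 custodians');
  }
  if (p.regulatoryRequest && !p.legalSignOff) failures.push('legal sign-off missing');
  if (!p.hsmReady) failures.push('no receiving HSM ready');
  if (!p.auditLogArmed) failures.push('immutable audit log not reachable');
  if (failures.length > 0) {
    // Abort if any condition is unmet.
    throw new Error(`Recovery aborted: ${failures.join('; ')}`);
  }
}
```

Run this gate first, and only call `emergencyRecoverPrivateKey` when it returns without throwing.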
Disaster Recovery Playbook
A disaster is any event that prevents the system from signing transactions: node loss, shard loss, compromise, or infrastructure failure.
Scenario 1 — Single MPC Node Goes Offline
With a 2-of-3 threshold, one node offline does not break signing. The two remaining nodes still produce valid signatures. This is a degraded state — not an emergency — but you must restore before another failure occurs.
```
Node A ✓          Node B ✗ (offline)          Node C ✓
   │                                             │
   └──────────────── 2-of-3 signing ─────────────┘  ← still operational
```
Response:
- Confirm nodes A and C are reachable and signing normally.
- Provision a replacement node.
- Any custodian imports their shard to the new node using `restoreShardToNode` (Export, Rotation & Restore).
- Verify the new `keyId` matches `rootPublicKey` and signing works end-to-end.
- Clear the old `keyId` record from your node registry.
Losing one node is not an emergency — it is why you chose 2-of-3. The ceremony is not repeated; no key material is lost. You simply restore one shard to a new node.
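The restore-and-verify steps above can be sketched as one orchestration function. The dependency interface below is an assumption made so the flow is self-contained and testable; the real `restoreShardToNode` signature is documented in Export, Rotation & Restore:

```typescript
// Illustrative dependencies, injected so the flow can run without the SDK.
interface NodeRestoreDeps {
  restoreShard: (nodeUrl: string) => Promise<{ keyId: string }>;
  getRootPublicKey: (keyId: string) => Promise<string>;
  testSign: (keyId: string) => Promise<boolean>; // end-to-end signing check
}

async function replaceOfflineNode(
  newNodeUrl: string,
  expectedRootPublicKey: string,
  deps: NodeRestoreDeps,
): Promise<string> {
  // Restore a custodian shard onto the freshly provisioned node.
  const { keyId } = await deps.restoreShard(newNodeUrl);

  // Verify the restored key maps to the known root public key before trusting it.
  const rootPublicKey = await deps.getRootPublicKey(keyId);
  if (rootPublicKey !== expectedRootPublicKey) {
    throw new Error('Restored keyId does not match rootPublicKey; abort');
  }

  // Confirm 2-of-3 signing works end-to-end with the new node in the set.
  if (!(await deps.testSign(keyId))) {
    throw new Error('End-to-end signing check failed on the new node');
  }
  return keyId; // safe to clear the old node from the registry now
}
```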
Scenario 2 — Loss of One Custodian Shard
A custodian loses access to their shard (forgotten passphrase, hardware failure, departed employee). You still have 2 shards from other custodians.
Response:
- Do not use `RECOVER_PRIVATE_KEY` — you still have quorum.
- Generate a new age identity for the replacement custodian (locally, per Custodian Setup).
- Use a remaining custodian's shard with `localRewrapShard` to re-wrap it for the new custodian.
- Update your custodian registry.
`RECOVER_PRIVATE_KEY` exposes the raw key. Use it only when you cannot sign at all. Losing one shard is a rotation event, not a disaster.
Scenario 3 — Loss of Two Custodian Shards (Catastrophic)
Two custodians are simultaneously unavailable. You are below threshold and cannot sign.
Response:
- Convene all available custodians and legal.
- SSS guardians for any remaining custodian reconstruct that custodian's age key (`reconstructAgeKey` from Custodian Setup).
- Restore the surviving shard to a temporary node (`restoreShardToNode`).
- You now have one node live — still below the 2-of-3 threshold. You need a second shard.
- If a second SSS reconstruction can be done for another custodian: restore a second shard. You are now at threshold and can sign.
- Re-export shards to newly onboarded custodians and rotate.
The SSS guardian network is specifically designed for this scenario — when a custodian themselves is unavailable. M-of-N guardians can reconstruct the custodian's age key, which unlocks the shard, which allows the node to be restored. Without SSS, loss of two custodians means permanent loss.
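Under the hood, guardian reconstruction is standard Shamir secret sharing: any M of N guardians interpolate the custodian's secret from their shares. A toy 3-of-5 sketch over a small prime field; illustrative only, since `reconstructAgeKey` uses the library's own field and encoding, and real splits use cryptographically random coefficients:

```typescript
// Toy Shamir secret sharing over a small prime field (sketch only).
const P = 2n ** 127n - 1n; // real schemes use larger, carefully chosen fields

function mod(a: bigint): bigint {
  return ((a % P) + P) % P;
}

// Modular inverse via Fermat's little theorem (P is prime).
function inv(a: bigint): bigint {
  let r = 1n, b = mod(a), e = P - 2n;
  while (e > 0n) {
    if (e & 1n) r = mod(r * b);
    b = mod(b * b);
    e >>= 1n;
  }
  return r;
}

// Split `secret` into n shares, any k of which reconstruct it.
function split(secret: bigint, k: number, n: number): Array<[bigint, bigint]> {
  const coeffs = [secret];
  // Fixed coefficients keep the sketch deterministic; real splits use random ones.
  for (let i = 1; i < k; i++) coeffs.push(BigInt(1000 + i * 7919));
  return Array.from({ length: n }, (_, j) => {
    const x = BigInt(j + 1);
    let y = 0n, xp = 1n;
    for (const c of coeffs) { y = mod(y + c * xp); xp = mod(xp * x); }
    return [x, y] as [bigint, bigint];
  });
}

// Reconstruct the secret from any k shares: Lagrange interpolation at x = 0.
function reconstruct(shares: Array<[bigint, bigint]>): bigint {
  let secret = 0n;
  for (const [xi, yi] of shares) {
    let num = 1n, den = 1n;
    for (const [xj] of shares) {
      if (xj === xi) continue;
      num = mod(num * -xj);
      den = mod(den * (xi - xj));
    }
    secret = mod(secret + yi * num * inv(den));
  }
  return secret;
}
```

Any two shares yield no information about the secret, which is why a 3-of-5 guardian split tolerates two unavailable guardians without weakening the custodian's key.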
Scenario 4 — Suspected Age Key or Shard Compromise
A custodian's device is breached, or you suspect their passphrase was exposed.
Response — assess blast radius first:
- If only the shard file was copied: still protected by age encryption + passphrase. Rotate as a precaution.
- If both the shard file and the passphrase were obtained: treat as full shard compromise. Move assets immediately.
Rotation (local, no API private key exposure):
```typescript
// Run on a trusted machine, not the potentially compromised device
await localRewrapShard(
  './custodian-a-shard.json',
  './custodian-a-identity.json',
  '<exposed-passphrase>',
  { name: 'custodian-a-new', agePublicKey: '<new-A-public-key>' },
);
// Old wrappedKeyShare is now cryptographically worthless to the attacker —
// it was encrypted to the old age identity, which is now invalidated.
```
If an attacker holds `wrappedKeyShare` plus the old age private key and the passphrase, they can decrypt the shard. After local re-wrapping to a new identity, the old blob is obsolete — decrypting it yields nothing useful because it no longer maps to an active wrapping. Speed matters: the window is between the attacker decrypting the shard and you completing rotation.
Scenario 5 — Full Infrastructure Loss
Both the MPC node network and all but one custodian shard are lost simultaneously. You cannot sign. This is an extreme scenario.
Response:
- Convene quorum + legal immediately.
- SSS guardians reconstruct the surviving custodian's age key.
- Restore the surviving shard to a temporary node.
- If a second shard can be reconstructed from SSS: import it. You reach threshold.
- Call `RECOVER_PRIVATE_KEY` with `deleteAfterRecover: false` initially.
- Import the raw key into an HSM within 60 seconds.
- Move all assets to a newly generated MPC key.
- Decommission the recovered key — never use a recovered raw key for ongoing operations.
A raw private key in an HSM is significantly less secure than a 2-of-3 MPC key. Use recovery to move assets to safety, then generate a fresh key immediately.
Attack Prevention Policy
Understanding how attacks happen — and why the MPC architecture defeats them — helps teams build with the right threat model.
Attack Surface Map
```
Your application ──► WR_API_KEY ──► White Rabbit API ──► Signing request
                                          │
                      ┌───────────────────┼───────────────────┐
                      ▼                   ▼                   ▼
                 ┌─────────┐         ┌─────────┐         ┌─────────┐
                 │ Node 1  │         │ Node 2  │         │ Node 3  │
                 │ (empty) │         │ (empty) │         │ (empty) │
                 └─────────┘         └─────────┘         └─────────┘
                      │   JIT import only during signing sessions
                      └───────── threshold signature ──────────┘
```
Threat 1 — API Key Theft
An attacker steals your WR_API_KEY and can call signing operations on your behalf.
Mitigations:
- Rotate the API key immediately on suspected exposure (Dashboard → API Keys → Revoke)
- Use separate keys per environment — a leaked dev key cannot touch prod
- Load keys from a secret manager (AWS Secrets Manager, Doppler, HashiCorp Vault), never `.env` files
- Monitor for anomalous signing volume or unknown destination addresses
```typescript
// Load from secret manager — never from process.env in production
const apiKey = await secretsManager.getSecretValue('prod/whiterabbit/api-key');
const workspace = new WorkspaceClient({ apiKey });
```
Threat 2 — Single MPC Node Compromise
An attacker gains access to a node's stored key share.
Why it fails: With nodes empty at rest (JIT model), there is no key material on the node to steal. If a node is compromised during a signing session, the attacker has one share — useless alone. Threshold signing requires coordinated computation across ≥ 2 nodes simultaneously.
Mitigations:
- Nodes empty at rest (`deleteAfterExport: true`) — compromise yields nothing
- Use all 3 official nodes — reduces single-node value to zero
- Monitor node health; anomalous latency or unexpected disconnects should alert
Threat 3 — Custodian Shard Theft
An attacker copies a custodian's shard file.
The layers an attacker must break:
```
wrappedKeyShare (stolen file)
        │
        ▼  age-encrypted (inner layer)
           requires custodian's age private key
        │
        ▼  envelopeEncrypt (outer layer)
           requires custodian's passphrase
        │
        ▼  PBKDF2 600K iterations + AES-256-GCM
           brute force is computationally infeasible
```
Even fully breaking this only yields one shard — useless below threshold. The attacker needs two custodians' shards simultaneously.
Mitigations:
- Store shard files and identity files in separate physical locations
- Require a hardware token (YubiKey) as a second factor for any shard access
- SSS means that even the age private key is not held by a single person
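The outer envelope layer in the diagram can be sketched with Node's built-in crypto. The function names mirror the diagram, but the real `envelopeEncrypt` on-disk format may differ:

```typescript
import { pbkdf2Sync, randomBytes, createCipheriv, createDecipheriv } from 'node:crypto';

const ITERATIONS = 600_000; // matches the PBKDF2 work factor in the diagram

// Derive a key from the passphrase and encrypt with AES-256-GCM.
function envelopeEncrypt(plaintext: Buffer, passphrase: string) {
  const salt = randomBytes(16);
  const key = pbkdf2Sync(passphrase, salt, ITERATIONS, 32, 'sha256');
  const iv = randomBytes(12); // 96-bit nonce, standard for GCM
  const cipher = createCipheriv('aes-256-gcm', key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return { salt, iv, ciphertext, tag: cipher.getAuthTag() };
}

function envelopeDecrypt(
  box: { salt: Buffer; iv: Buffer; ciphertext: Buffer; tag: Buffer },
  passphrase: string,
): Buffer {
  const key = pbkdf2Sync(passphrase, box.salt, ITERATIONS, 32, 'sha256');
  const decipher = createDecipheriv('aes-256-gcm', key, box.iv);
  decipher.setAuthTag(box.tag); // GCM authenticates: tampering or a wrong key throws
  return Buffer.concat([decipher.update(box.ciphertext), decipher.final()]);
}
```

The 600K PBKDF2 iterations are the brute-force barrier: each passphrase guess costs the attacker the full derivation, and GCM's auth tag means a wrong guess fails loudly rather than yielding garbage.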
Threat 4 — Insider Threat
A malicious employee with API key access tries to sign unauthorized transactions.
Mitigations:
- Transaction allow-lists — only pre-approved addresses can be recipients
- Value-based approval gates — transfers above a threshold require a second approval
- Time-locks on large transfers — submit intent, wait 24h, then execute (window to detect and cancel)
- Key separation — trading key, treasury key, and hot wallet are separate MPC keys
- Log signing requests with the authenticated user's identity, not just the API key
```typescript
// `SignParams` and `approvalService` are application-level: the approval
// service is whatever second-channel workflow your team uses.
async function signWithApproval(params: SignParams) {
  if (params.valueUsd > 10_000) {
    const approved = await approvalService.requestApproval({
      requester: params.requesterId,
      action: 'SIGN_TRANSACTION',
      value: params.valueUsd,
      destination: params.to,
    });
    if (!approved) throw new Error('Approval denied or timed out');
  }
  return workspace.call(ComponentModule.SIGN_WITH_KEY_SHARE, params.signParams).promise();
}
```
Threat 5 — Transaction Replay / Front-Running
A signed transaction is replayed on another chain, or an attacker observes a pending transaction and front-runs it.
Mitigations:
- Always include `chainId` in transactions (EIP-155) — prevents cross-chain replay
- Manage `nonce` carefully in multi-pod deployments — duplicate nonces cause one transaction to drop
- Use a private mempool (Flashbots `eth_sendBundle`) for MEV-sensitive operations
- EIP-712 typed data (Permit2, Safe signatures) embeds `chainId` + contract address — cross-chain replay is structurally impossible
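A cheap defensive layer for the first mitigation is refusing to sign anything without an explicit, expected `chainId`. A sketch, where the `TxRequest` shape is illustrative rather than an SDK type:

```typescript
// Illustrative transaction shape for the guard.
interface TxRequest {
  to: string;
  value: bigint;
  nonce: number;
  chainId?: number; // EIP-155: must be set, or the signature is replayable cross-chain
}

function assertReplayProtected(tx: TxRequest, expectedChainId: number): void {
  if (tx.chainId === undefined) {
    throw new Error('Refusing to sign: missing chainId (EIP-155)');
  }
  if (tx.chainId !== expectedChainId) {
    throw new Error(`chainId ${tx.chainId} does not match expected ${expectedChainId}`);
  }
}
```

Run this guard in the signing service itself, before the request ever reaches the MPC nodes, so a buggy or compromised caller cannot produce a replayable signature.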
Threat 6 — Supply Chain Attack
A malicious dependency update injects code that exfiltrates key material or signs unauthorized transactions.
Mitigations:
- Pin exact dependency versions (`package-lock.json` / `yarn.lock`) and audit all upgrades
- Run `npm audit` in CI; fail on critical vulnerabilities
- Use a private npm registry with curated, approved packages (Artifactory, Verdaccio)
- Sign and verify CI build artifacts (GitHub Actions OIDC + SLSA provenance)
- Never run `RECOVER_PRIVATE_KEY` in the same process as untrusted third-party code
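The CI gate can be a small script that parses `npm audit --json` output and fails the build on critical findings. The summary shape below follows npm's `metadata.vulnerabilities` counters; verify it against the output of your npm version:

```typescript
// Assumed subset of the `npm audit --json` report.
interface AuditSummary {
  metadata: {
    vulnerabilities: { critical: number; high: number; moderate: number; low: number };
  };
}

// Fail the build on any critical finding; allow a configurable budget for highs.
function auditGate(report: AuditSummary, maxHigh = 0): void {
  const v = report.metadata.vulnerabilities;
  if (v.critical > 0) {
    throw new Error(`${v.critical} critical vulnerabilities: failing build`);
  }
  if (v.high > maxHigh) {
    throw new Error(`${v.high} high vulnerabilities exceed budget of ${maxHigh}`);
  }
}
```

Wire it into CI by piping `npm audit --json` into this check so a malicious or vulnerable dependency is caught before it ships alongside signing code.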
Institutional Policy Summary
| Concern | Recommendation |
|---|---|
| Age key generation | `age-keygen` locally on each custodian's device — never via the API |
| Age key protection | AES-256-GCM / PBKDF2 600K passphrase before writing to disk |
| Age key backup | SSS 3-of-5; each share encrypted with a separate guardian passphrase |
| MPC node policy | Nodes empty at rest; import only to sign, delete immediately after |
| Re-wrapping | Always local using the `age-encryption` npm package — never `REWRAPPING_KEY_SHARE` |
| `keyId` storage | Primary DB + read replica + encrypted off-site file + printed copy |
| Shard storage | 3 custodians, geographically separated, 3-2-1 rule |
| Passphrase strength | ≥ 24-character diceware or hardware token (YubiKey FIDO2) |
| Rotation cadence | Every 90 days or on any custodian/guardian change |
| Recovery authorization | 2-of-3 custodian written approval + legal sign-off |
| Recovery destination | Certified HSM (AWS CloudHSM, Thales Luna) within 60 seconds |
| Audit logging | Immutable log (CloudTrail, SIEM) for all key operations |
| Access control | Key users cannot export; custodians cannot call signing endpoints |
| Disaster recovery drill | Full restore-from-backup test every quarter |
Pre-Launch Policy Checklist
Generation & Backup
- Key generated: `threshold: 2`, all 3 official servers
- `keyId` in primary DB, read replica, and printed off-site backup
- `rootPublicKey` verified on-chain against a derivation path
- 3 custodian age identities generated locally — not via API
- Each age key split with SSS (3-of-5 recommended); shares distributed
- 3 shards exported with `deleteAfterExport: true` — nodes verified empty
- Restore drill completed: imported shard to test node, signing verified
Access Control
- Production API key is fresh (not reused from development)
- API keys loaded from a secret manager, not from `.env` files
- Signing service account cannot export or recover keys
- Custodians cannot call signing endpoints directly
- Transaction allow-list configured for all signing operations
Monitoring & Response
- Audit log for all key operations (generation, export, import, sign, recover)
- Alerts set: anomalous signing volume, new destination addresses, off-hours signing
- Incident runbook documented and rehearsed for all 5 disaster scenarios
- Emergency contact list current: custodians, SSS guardians, legal, HSM provider
Ongoing Operations
- Rotation scheduled every 90 days in the team calendar
- Quarterly recovery drill scheduled
- Custodian offboarding procedure documented and tested
- Dependency audit running in CI (`npm audit`)