Quick take: splitting keys is not enough. A production system must still behave correctly when some nodes lie, fail, or disappear.
We already covered key-splitting in What Is Threshold Cryptography. The next step is resilience under failure.
The Byzantine Generals Problem
The Byzantine Generals Problem models distributed coordination under untrusted conditions.
Several generals must choose one action together: attack or retreat. If they do not agree, they fail.
The hard part: some generals may be traitors and send conflicting instructions.
In distributed systems language, generals are nodes. Byzantine behavior can appear in two ways:
- A node is compromised and sends malicious data.
- A node is faulty and produces invalid output due to runtime or hardware issues.
A system is Byzantine-resilient when its honest participants can still reach a safe, correct outcome despite such nodes.
How TKeeper Models Faults
TKeeper classifies problematic participants into two classes:
- Imposters: nodes that send invalid payloads or malformed Zero-Knowledge proofs.
- Dead: nodes that stop responding (timeouts, disconnects, crashes).
Both classes are tracked continuously during protocol rounds.
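The two fault classes can be sketched as a small classification step. This is an illustration only, not TKeeper's code: the `RoundReply` type, its fields, and the `classify` function are hypothetical names introduced here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Fault(Enum):
    IMPOSTER = "imposter"  # sent an invalid payload or a malformed proof
    DEAD = "dead"          # stopped responding (timeout, disconnect, crash)


@dataclass
class RoundReply:
    """Hypothetical view of one participant's reply in a single round."""
    node_id: str
    responded: bool
    payload_valid: bool = False
    proof_valid: bool = False


def classify(reply: RoundReply) -> Optional[Fault]:
    """Return the fault class for a reply, or None if the node looks honest."""
    if not reply.responded:
        return Fault.DEAD
    if not (reply.payload_valid and reply.proof_valid):
        return Fault.IMPOSTER
    return None


replies = [
    RoundReply("keeper-1", responded=True, payload_valid=True, proof_valid=False),
    RoundReply("keeper-2", responded=True, payload_valid=True, proof_valid=True),
    RoundReply("keeper-3", responded=False),
]
faults = {r.node_id: classify(r) for r in replies}
```

Checking responsiveness first keeps the two classes disjoint: a node that never answered cannot also be judged on the validity of a payload it never sent.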
Where Faults Are Visible
TKeeper surfaces these signals in multiple places:
- Audit logs: explicit fields for imposters and records for dead nodes.
- Error responses: failed operations include participant-level fault details.
- Successful responses: even when quorum is reached, detected bad actors are still reported.
Example response containing both an imposter and an unavailable node:
{
  "errorType": "SOME_ERROR",
  "imposters": [
    "keeper-1"
  ],
  "dead": [
    "keeper-3"
  ]
}
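A client could read the fault lists out of such a response roughly as follows. This is a minimal sketch: the field names come from the example above, while the `extract_faults` helper and the choice to treat a missing field as an empty list are assumptions made for illustration.

```python
import json

# The example response from above, as a raw JSON string.
raw = """
{
  "errorType": "SOME_ERROR",
  "imposters": ["keeper-1"],
  "dead": ["keeper-3"]
}
"""


def extract_faults(body: str) -> dict:
    """Collect participant-level fault details from a response body."""
    data = json.loads(body)
    # Assumption: a missing field means no faults of that class were reported.
    return {
        "imposters": list(data.get("imposters", [])),
        "dead": list(data.get("dead", [])),
    }


faults = extract_faults(raw)
```

Because successful responses can also report bad actors, a caller may want to run the same extraction on every response, not only on errors.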
Protocol Flow
Most operations run in rounds under a Coordinator (the node that accepted the client request):
- The Coordinator advances round transitions.
- Participants validate each other’s messages.
- When an imposter or dead node is detected, participant sets and audit records are updated.
System behavior then depends on operation type.
1. Signing (GG20, FROST)
Signing is handled conservatively.
- If any imposter or dead node is detected, the protocol is restarted from round zero.
- The new attempt excludes problematic participants.
- Continuing mid-flight after malicious behavior is unsafe for these protocols.
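The restart-and-exclude rule above can be sketched as a retry loop. This is an illustration under stated assumptions, not the GG20 or FROST logic itself: `run_attempt` is a hypothetical callback standing in for one full signing attempt, and the `threshold` and `max_attempts` parameters are assumptions.

```python
from typing import Callable, FrozenSet, Optional, Set, Tuple

# Hypothetical callback: runs one complete attempt from round zero and
# returns (signature or None, set of nodes detected as imposter or dead).
Attempt = Callable[[FrozenSet[str]], Tuple[Optional[bytes], Set[str]]]


def sign_with_restarts(
    participants: Set[str],
    threshold: int,
    run_attempt: Attempt,
    max_attempts: int = 3,
) -> Tuple[bytes, Set[str]]:
    """Restart from round zero, excluding faulty nodes, until an attempt is clean."""
    active = set(participants)
    excluded: Set[str] = set()
    for _ in range(max_attempts):
        if len(active) < threshold:
            break  # not enough participants left to try again
        signature, faulty = run_attempt(frozenset(active))
        if not faulty and signature is not None:
            return signature, excluded
        # Never continue mid-flight: drop the attempt and rebuild the set.
        active -= faulty
        excluded |= faulty
    raise RuntimeError(f"signing failed; excluded participants: {sorted(excluded)}")


def fake_attempt(active: FrozenSet[str]) -> Tuple[Optional[bytes], Set[str]]:
    """Toy attempt for demonstration: keeper-1 always misbehaves."""
    bad = {"keeper-1"} & active
    if bad:
        return None, set(bad)
    return b"signature", set()


sig, excluded = sign_with_restarts(
    {"keeper-1", "keeper-2", "keeper-3", "keeper-4"}, 3, fake_attempt
)
```

In this toy run the first attempt is discarded because of `keeper-1`, and the second attempt succeeds with the remaining three nodes while still reporting the excluded one.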
2. Faulty Coordinator
The Coordinator itself can be Byzantine.
- TKeeper records it as imposter or dead.
- Restart is not possible inside the same execution context.
- The operation fails closed.
Important property: a bad Coordinator can impact availability, but cannot extract private key material.
3. Decryption
Decryption remains threshold-based and more flexible operationally.
- Result data can still be returned with detected fault metadata.
- If at least a threshold number of honest participants remain available, the operation can complete.
- If honest participation drops below the threshold, the operation aborts safely.
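The completion rule for decryption can be sketched as a simple count. This is a hedged illustration: the `decryption_outcome` function and its result shape are hypothetical, and `threshold` is an assumed parameter because the concrete value is not stated here.

```python
from typing import Dict, Iterable


def decryption_outcome(
    participants: Iterable[str],
    imposters: Iterable[str],
    dead: Iterable[str],
    threshold: int,
) -> Dict[str, object]:
    """Decide whether decryption can complete, keeping fault metadata either way."""
    imposters, dead = set(imposters), set(dead)
    honest = set(participants) - imposters - dead
    return {
        # Complete only if enough honest participants remain available.
        "completed": len(honest) >= threshold,
        "imposters": sorted(imposters),
        "dead": sorted(dead),
    }


nodes = ["keeper-1", "keeper-2", "keeper-3", "keeper-4"]
ok = decryption_outcome(nodes, ["keeper-1"], ["keeper-3"], threshold=2)
aborted = decryption_outcome(nodes, ["keeper-1"], ["keeper-3"], threshold=3)
```

Both outcomes carry the same fault lists, which mirrors the idea that detected bad actors are reported whether or not the operation succeeds.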
4. Verification and Encryption
These operations depend on public-key reconstruction.
The same rule applies: if at least a threshold number of honest participants are available, the operation can continue safely.
Practical Outcome
By design, TKeeper tolerates unavailable or untrusted participants as long as a threshold of honest participants remains, while preserving the safety constraints of each protocol.
The system favors fail-safe behavior and explicit fault visibility over silent degradation.
Read next: Key Refresh: Why Long-Haul Attacks Fail