Most messaging protocols assume that sender and receiver are online at the same time, or that a server in between is always reachable. For many real-world environments, neither assumption holds. Field offices lose satellite uplinks for hours. Mesh networks partition when nodes move out of range. Censored networks have routes that appear and disappear based on what the authorities blocked this morning.
The common response is to build around a central server. Put everything in the cloud. But that trades one problem for another. A central server is a single point of failure, a single point of surveillance, and a single point of coercion. If the server is down, nobody talks. If the server is compromised, everybody is compromised. If a court order lands on the server operator, every message is subject to it.
The alternative is a federated store-and-forward architecture -- a network where messages move hop by hop through independently operated relay nodes, stored at each hop until a path to the next one opens up. No single node sees the whole network. No single operator controls the system. Messages get through eventually, as long as any sequence of hops connects sender to receiver over time.
This article describes the design of one such protocol: Estafette. The name comes from the relay race -- estafette in Dutch and French -- where each runner carries the baton forward one leg.
The Problem in Detail
Start with what breaks when you cannot assume continuous connectivity.
A standard TCP connection requires both endpoints to be reachable at the moment the connection is initiated. TLS requires a handshake that completes in real time. HTTP requires a request-response cycle. WebSocket requires a persistent connection. All of these assume that the network path between two machines exists right now and stays stable long enough to complete the exchange.
In an intermittently connected environment, that assumption fails in at least three ways:
Temporal disconnection. The recipient is offline when the sender wants to transmit. This is the simplest case -- email handles it fine. But if the intermediary (the mail server) is also intermittently connected, email breaks too.
Path unavailability. Both sender and receiver are online, but no end-to-end route exists between them. A mesh network may have partitioned. A firewall rule may have changed. An undersea cable may have been cut. The nodes are up, but the path is not.
Asymmetric reachability. Node A can reach Node B, but Node B cannot reach Node A. This is common in NAT-heavy environments, satellite links with asymmetric bandwidth, and networks where outbound connections succeed but inbound connections are blocked.
Store-and-forward addresses all three cases with a single mechanism: accept the message, store it locally, and try to deliver it later. If delivery fails, try again. If a neighbor has a better path, hand the message off. Eventually, the message arrives. The cost is latency. The gain is that the system works when nothing else does.
Store-and-Forward as a Pattern
Store-and-forward is not a new idea. It is one of the oldest patterns in digital communication, and it predates digital communication entirely. The postal system is store-and-forward. A letter is accepted at a post office, sorted, transported to another post office, sorted again, and delivered when the carrier can reach the recipient's mailbox. The sender does not need to know the route. The recipient does not need to be home when the letter is posted.
Email is store-and-forward. An MTA accepts a message, looks up the recipient's MX record, connects to the next server, and hands it off. If the next server is down, the MTA queues the message and retries. SMTP has built-in retry logic with configurable intervals, typically trying for up to five days before giving up.
Usenet was store-and-forward. Each NNTP server received articles from its peers, stored them locally, and forwarded them to other peers that had not yet seen them. There was no central server. Articles propagated through the network like gossip, eventually reaching every server that subscribed to the relevant newsgroup. Propagation could take hours or days depending on the topology and peering frequency.
Delay-Tolerant Networking -- the DTN architecture developed for interplanetary communication -- formalized store-and-forward into a protocol stack. The Bundle Protocol (RFC 5050, later RFC 9171) defines a message format and forwarding rules for networks where a complete end-to-end path may never exist at any one moment. Bundles are stored at each node and forwarded opportunistically. Contact windows -- the periods when two nodes can communicate -- are scheduled or discovered, and bundles queue until a window opens.
What these systems share is a trade-off: they sacrifice latency and ordering guarantees in exchange for delivery across unreliable paths. A store-and-forward system does not promise when your message will arrive. It promises that the system will keep trying until it does, or until a defined expiration time passes.
Why Federation Matters
A centralized store-and-forward system is just email with a single provider. Your message goes from your client to one server, and from that server to the recipient. If the server is compromised, every message is exposed. If the operator is compelled, every user is affected. If the server goes down, nobody communicates.
Federation distributes that risk. In a federated system, relay nodes are operated by different parties under different jurisdictions with different threat models. No single operator sees all the traffic. No single compromise exposes the whole network. No single court order covers every relay.
Federation also distributes the cost. Running a relay node requires storage, bandwidth, and maintenance. In a federated system, that cost is shared among operators. Each operator is responsible for their own node. The protocol defines how nodes cooperate. The economics work because each operator gets something out of participating -- typically, access to the network for their own users.
The cost of federation is complexity. Centralized systems have one authority that makes decisions about identity, routing, abuse, and capacity. Federated systems need protocols that handle all of those problems across trust boundaries. The protocol must work even when some operators are hostile, some are incompetent, and some disappeared six months ago and left their node running.
The Estafette Protocol
Estafette is designed around five principles:
Messages are self-describing. Every message carries enough metadata to be routed, verified, and delivered without consulting any external authority. A relay node that has never seen a message before can determine where it needs to go, whether it is valid, and whether it has expired -- all from the message itself.
Relays are untrusted with content. Relay nodes forward encrypted payloads. They cannot read message content. They see routing metadata -- enough to decide where to forward -- but the payload is opaque. End-to-end encryption is not optional. It is structural.
Delivery is best-effort with bounded lifetime. Messages have a TTL. If delivery has not completed by expiration, the message is dropped. The protocol does not guarantee delivery. It guarantees that the network will try for the duration of the TTL, and then stop. Senders who need confirmation use application-level acknowledgments.
Forwarding is one-hop-at-a-time. A relay does not compute a full path to the destination. It looks at the message, determines the best next hop from its own perspective, and forwards. Each relay makes an independent forwarding decision. This is how IP routing works, and it is how physical mail works. It means the protocol does not require global knowledge of the network topology.
Signatures accumulate. Each relay that handles a message appends its own signature to a chain. The recipient can inspect this chain to see the full path the message traveled, and verify that each hop was handled by a node with a valid identity. This is not for privacy -- it is for accountability. If a relay is misbehaving, the signature chain provides evidence.
Message Format
An Estafette message has three layers: the envelope, the signature chain, and the payload.
The envelope is plaintext metadata that relays read to make forwarding decisions. It contains the message ID (a random 128-bit identifier), the destination address (a public key hash or federated address), the origin address, a creation timestamp, a TTL in seconds, a hop count, and a maximum hop limit. The envelope is small by design -- a few hundred bytes at most. Relays parse it quickly to make forwarding decisions without touching the payload.
The signature chain is an ordered list of relay signatures. Each relay appends a record containing its node ID, a timestamp, and a signature over the envelope plus all previous chain entries. Because each signature covers everything that came before it, the entries are chained together: modifying any earlier entry invalidates all subsequent signatures. The chain grows by roughly 100 bytes per hop.
The payload is an encrypted blob. Estafette does not specify the encryption scheme for the payload -- that is an application-layer decision. The protocol treats the payload as opaque bytes with a length prefix. In practice, most implementations use a hybrid scheme: an authenticated cipher such as XChaCha20-Poly1305 with a per-message key that is wrapped to the recipient's public key, or a ready-made format such as age.
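The three layers can be sketched as plain data structures. This is a minimal illustration, not a wire format: the field names, the default hop limit of 16, and the new_message helper are assumptions made for the example.

```python
import os
import time
from dataclasses import dataclass, field


@dataclass
class Envelope:
    """Plaintext routing metadata (field names are illustrative)."""
    message_id: bytes      # random 128-bit identifier
    destination: str       # public key hash or federated address
    origin: str
    created_at: int        # Unix seconds
    ttl: int               # lifetime in seconds
    hop_count: int = 0
    max_hops: int = 16     # assumed default, not taken from the protocol


@dataclass
class ChainEntry:
    node_id: str
    timestamp: int
    signature: bytes       # covers the envelope plus all earlier entries


@dataclass
class Message:
    envelope: Envelope
    chain: list[ChainEntry] = field(default_factory=list)
    payload: bytes = b""   # opaque ciphertext; relays never parse it


def new_message(origin: str, destination: str, ttl: int, payload: bytes) -> Message:
    """Build an unsigned message with a fresh random ID (hypothetical helper)."""
    env = Envelope(
        message_id=os.urandom(16),   # 16 bytes = 128 bits
        destination=destination,
        origin=origin,
        created_at=int(time.time()),
        ttl=ttl,
    )
    return Message(envelope=env, payload=payload)
```

Keeping the payload as bytes with no accessor methods mirrors the idea that relays have nothing to interpret beyond the envelope.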
Routing Strategy
Estafette uses a hybrid routing approach that combines three mechanisms: explicit relay lists, gossip-based peer discovery, and TTL-based expiration as a safety net.
Explicit Relay Lists
The simplest routing mode. The sender includes an ordered list of relay node addresses in the envelope. Each relay forwards to the next address on the list. If the next relay is unreachable, the current relay stores the message and retries on a backoff schedule. This mode works when the sender knows the network topology in advance -- for example, a field office that always routes through a regional hub before reaching headquarters.
Explicit relay lists are predictable and auditable. The sender knows exactly which nodes will handle the message. The downside is that they are brittle. If any relay on the list is permanently offline, the message stalls. In practice, explicit lists are used for well-known routes and combined with a fallback to gossip-based routing.
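A sketch of how a relay might handle an explicit list: look up its own position, pick the following address, and retry on a growing delay if that address is unreachable. The function names and the backoff parameters (30 seconds, doubling, one-hour cap) are assumptions for illustration; the protocol description above only says "a backoff schedule".

```python
from typing import Iterator, Optional


def next_explicit_hop(route: list[str], current: str) -> Optional[str]:
    """Return the relay that follows `current` on the sender's list.

    Returns None when `current` is the last entry. Raises ValueError if
    `current` is not on the list at all.
    """
    idx = route.index(current)
    return route[idx + 1] if idx + 1 < len(route) else None


def backoff_delays(base: float = 30.0, factor: float = 2.0,
                   cap: float = 3600.0) -> Iterator[float]:
    """Yield retry delays in seconds: base, base*factor, ... capped at `cap`."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)
```

A relay would keep pulling delays from the generator until the next hop answers or the message expires, at which point a fallback to gossip-based routing could take over.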
Gossip-Based Discovery
Relay nodes periodically exchange peer lists with their neighbors. Each node maintains a table of known peers, their addresses, the last time they were reachable, and what destination prefixes they claim to serve. When a relay receives a message without an explicit route, it consults this table to find the best next hop toward the destination.
"Best" is determined by a simple cost function: prefer peers that claim to serve the destination prefix, prefer peers that were recently reachable, prefer peers with fewer hops to the destination (if known). Ties are broken randomly to distribute load.
Gossip discovery is eventually consistent. A new node joining the network will be unknown to distant peers until the gossip propagates. A node that went offline will remain in peer tables until its last-seen timestamp ages out. This is acceptable because Estafette is already designed for latency tolerance. Routing decisions do not need to be optimal. They need to be good enough to make forward progress.
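The cost function can be expressed as a sort key with a random tie-break. The sketch below assumes a particular peer-table shape, compares recency in five-minute buckets so that small timing differences still count as ties, and treats an unknown hop count as worse than any known one. All of those details are illustrative choices, not part of the protocol.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Peer:
    node_id: str
    prefixes: tuple[str, ...]     # destination prefixes the peer claims to serve
    last_seen: float              # Unix seconds of last successful contact
    hops: Optional[int] = None    # hops to the destination, if known


def choose_next_hop(peers, destination, now, rng=random):
    """Prefer prefix match, then recency, then fewer hops; break ties randomly."""
    if not peers:
        return None

    def cost(p: Peer):
        serves = any(destination.startswith(pre) for pre in p.prefixes)
        age_bucket = int((now - p.last_seen) // 300)     # 5-minute buckets (assumed)
        hops = p.hops if p.hops is not None else 1_000   # unknown sorts last
        return (0 if serves else 1, age_bucket, hops)

    best = min(cost(p) for p in peers)
    return rng.choice([p for p in peers if cost(p) == best])
```

Because the choice only needs to make forward progress, a stale or slightly wrong table entry costs an extra hop or a retry rather than a failed delivery.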
TTL-Based Expiration
Every message has a TTL. When the TTL expires, the message is dropped at whatever relay currently holds it. This prevents messages from circulating indefinitely in the case of routing loops, permanently unreachable destinations, or network partitions that never heal.
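The drop decision a relay makes before storing or forwarding can be sketched in a few lines. This version also checks the hop limit from the envelope, and it assumes relay clocks are close enough that comparing a local clock against the sender's creation timestamp is meaningful; a real deployment would need some tolerance for clock skew.

```python
def should_drop(created_at: int, ttl: int, hop_count: int,
                max_hops: int, now: int) -> bool:
    """Return True if the message has expired or used up its hop budget."""
    expired = now >= created_at + ttl
    too_many_hops = hop_count >= max_hops
    return expired or too_many_hops
```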
The TTL is set by the sender based on the urgency of the message. A time-sensitive alert might have a TTL of one hour. A routine report might have a TTL of seven days. A message with no time constraint might use the protocol maximum of 30 days.
TTL expiration is the mechanism that bounds resource consumption. Without it, a relay's storage would grow without limit as messages accumulated for unreachable destinations. With TTL, the maximum storage per relay is bounded by the ingest rate (in bytes per second) multiplied by the maximum TTL (in seconds).
Cryptographic Properties
The security model of Estafette has to account for the fact that messages pass through nodes operated by parties the sender does not fully trust. A relay might be honest-but-curious (following the protocol but trying to learn message content). A relay might be malicious (modifying messages, dropping them selectively, or injecting forged ones). The cryptographic design addresses each threat.
End-to-End Encryption
The payload is encrypted by the sender using the recipient's public key. No relay in the path has the decryption key. This is structurally enforced -- the protocol parser treats the payload as a length-prefixed byte array and provides no API for relays to interpret its contents.
The encryption is performed before the message enters the relay network. The sender encrypts to the recipient's public key, wraps the ciphertext in the Estafette envelope, signs the envelope, and submits it to the first relay. At no point does the cleartext exist on any system other than the sender's and the recipient's.
For group messages, the sender encrypts the payload once with a symmetric key and then encrypts that symmetric key separately for each recipient's public key. The per-recipient key wraps are included in the payload. This avoids encrypting the full message body multiple times while still ensuring that only intended recipients can decrypt.
Signature Verification
The origin signature (hop[0] in the chain) is the sender's Ed25519 signature over the envelope. This lets the recipient verify that the message was composed by the claimed sender and that the envelope has not been modified in transit.
Each relay signature covers the envelope plus all previous chain entries. This means that if anyone modifies the envelope or an earlier chain entry after it has been signed, every signature that covered the original data stops verifying. The recipient checks the full chain: verify hop[0] against the sender's key, verify hop[1] against the first relay's key and the data that includes hop[0], and so on. If any link in the chain is invalid, the message is rejected.
This is not the same as authenticating the relays as trusted. It is verifying that the relays that claim to have handled the message actually did, in the order they claim. A recipient cannot verify that a relay is honest. They can verify that a relay signed what it forwarded.
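The chain check can be sketched as follows. Two loud assumptions: first, the example uses HMAC-SHA256 from the standard library as a stand-in for Ed25519 so that it runs without third-party packages. HMAC is symmetric, so the verifier here needs each signer's secret, which is exactly what a real public-key signature avoids. Second, the envelope is assumed to contain only fields that do not change in transit, and the JSON serialization is an illustrative canonical form rather than a defined one.

```python
import hashlib
import hmac
import json


def sign(key: bytes, data: bytes) -> bytes:
    # Stand-in for an Ed25519 signature (assumption, for runnability only).
    return hmac.new(key, data, hashlib.sha256).digest()


def verify(key: bytes, data: bytes, sig: bytes) -> bool:
    return hmac.compare_digest(sign(key, data), sig)


def signed_bytes(envelope: dict, prior: list) -> bytes:
    """Bytes covered by a signature: the envelope plus all earlier chain entries."""
    entries = [[e["node"], e["ts"], e["sig"].hex()] for e in prior]
    return json.dumps([envelope, entries], sort_keys=True).encode()


def append_hop(envelope: dict, chain: list, node: str, ts: int, key: bytes) -> list:
    """Return a new chain with this node's entry appended."""
    sig = sign(key, signed_bytes(envelope, chain))
    return chain + [{"node": node, "ts": ts, "sig": sig}]


def verify_chain(envelope: dict, chain: list, keys: dict) -> bool:
    """Check hop[0], hop[1], ... in order; any bad link rejects the message."""
    if not chain:
        return False
    for i, entry in enumerate(chain):
        key = keys.get(entry["node"])
        if key is None:
            return False
        if not verify(key, signed_bytes(envelope, chain[:i]), entry["sig"]):
            return False
    return True
```

Changing the envelope breaks hop[0]; changing an earlier entry breaks the first signature that covered it. Either way the recipient rejects the message.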
Replay Protection
The 128-bit random message ID, combined with the creation timestamp and TTL, provides replay protection. Each relay maintains a seen-messages table of recently processed message IDs. If a message arrives with an ID already in the table, it is dropped as a duplicate. The table is pruned based on TTL -- entries older than the maximum TTL can be safely removed because any legitimate message with that ID would have expired.
This mechanism means that an attacker who captures a message and re-injects it into the network will have the replay dropped at the first relay that already processed the original. It does not prevent the attacker from delaying delivery of the original -- that is an availability attack, and store-and-forward systems are inherently more vulnerable to availability attacks than real-time systems.
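A seen-messages table with TTL-based pruning might look like this. The class name and the choice to prune on every lookup are assumptions made to keep the sketch short; a busy relay would more likely prune on a timer or use an index ordered by time.

```python
class SeenTable:
    """Remember message IDs until no valid message could still carry them."""

    def __init__(self, max_ttl: int):
        self.max_ttl = max_ttl
        self._seen: dict[bytes, int] = {}   # message_id -> time first seen

    def check_and_record(self, message_id: bytes, now: int) -> bool:
        """Return True if the message is new, False if it is a duplicate."""
        self.prune(now)
        if message_id in self._seen:
            return False
        self._seen[message_id] = now
        return True

    def prune(self, now: int) -> None:
        """Forget entries first seen at least max_ttl seconds ago."""
        cutoff = now - self.max_ttl
        self._seen = {m: t for m, t in self._seen.items() if t > cutoff}
```

Pruning is safe only in combination with the expiry check: once an entry is forgotten, a replayed copy is rejected because its TTL has run out, not because its ID is remembered.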
Node Discovery and Trust
The question of how nodes find each other and how trust is established is where federated protocols get complicated. Estafette separates discovery from trust and defines different levels of trust for different operations.
Discovery
Nodes discover each other through three mechanisms:
Static configuration. An operator manually adds peer addresses to the node's configuration. This is the bootstrapping mechanism. When you set up a new Estafette node, you configure at least one peer -- typically the operator's own hub node or a well-known community relay.
Gossip exchange. Once connected to at least one peer, a node receives peer announcements through the gossip protocol. These announcements include the peer's address, its public key, and the destination prefixes it serves. Over time, the node builds a view of the reachable network.
DNS-based bootstrap. For public Estafette networks, a set of well-known DNS records (SRV records under _estafette._tcp) can provide initial peer addresses. This is a convenience mechanism for joining an existing network without manual configuration.
Trust Levels
Estafette defines three trust levels, each granting different capabilities:
Forwarding trust. The baseline. You trust this peer to forward messages on your behalf and to forward messages toward you. This does not imply you trust the peer to keep your messages confidential (they are encrypted anyway) or to handle them with any particular priority. It means you believe the peer will make a good-faith effort to deliver.
Storage trust. You trust this peer to store messages for you when you are offline. This is a stronger commitment because the peer is dedicating resources (disk space, bandwidth for retries) to your availability. Peers can advertise storage limits in their gossip announcements.
Identity trust. You trust that this peer's public key actually belongs to the entity it claims to represent. This is the hardest problem. Estafette does not include a built-in PKI. Operators are expected to verify peer identities out-of-band -- exchanging key fingerprints over a trusted channel, using TOFU (trust on first use) with monitoring for key changes, or integrating with an external certificate authority.
Failure Modes
Every distributed system fails. The measure of a protocol is not whether it prevents failure but whether it degrades predictably when failure occurs. Estafette has several well-defined failure modes.
Split-Brain
If the network partitions into two disconnected subsets, messages from nodes in one partition cannot reach nodes in the other. This is not a bug -- it is the fundamental constraint the protocol is designed around. Messages queue at the relay closest to the partition boundary and wait for reconnection. If the partition persists longer than the message TTL, the messages expire.
The risk is that a long partition followed by reconnection causes a burst of queued messages to flood the network. Estafette handles this with rate limiting on the forwarding path. When a relay reconnects to a peer, it drains its queue at a controlled rate rather than dumping everything at once. The drain rate is configurable per-node and is advertised in the gossip protocol so that peers know not to send faster than the remote can absorb.
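One common way to implement a controlled drain is a token bucket, sketched below. The text above does not say which algorithm Estafette uses, so treat this as one plausible choice; the class, the parameter names, and the burst allowance are assumptions.

```python
class TokenBucket:
    """Allow `rate` messages per second on average, with bursts up to `burst`."""

    def __init__(self, rate: float, burst: float):
        self.rate = rate
        self.burst = burst
        self.tokens = burst
        self.updated = 0.0

    def allow(self, now: float) -> bool:
        # Refill for the time elapsed since the last call, capped at `burst`.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def drain(queue: list, bucket: TokenBucket, now: float) -> list:
    """Pop and return the queued messages that may be sent at time `now`."""
    sent = []
    while queue and bucket.allow(now):
        sent.append(queue.pop(0))
    return sent
```

With a rate of 2 per second and a burst of 3, a relay holding ten queued messages would send three immediately on reconnection and two more after each further second.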
Message Duplication
Because relays make independent forwarding decisions, and because a relay may forward a message to multiple peers if it is unsure of the best path, duplicates can occur. The seen-messages table at each relay handles most deduplication. The recipient also maintains a seen-messages table as the final deduplication layer.
In practice, duplication rates in a well-connected network are low -- typically under 2%. In a heavily partitioned network with frequent reconnections, rates can spike as queued messages take multiple paths simultaneously. The protocol treats duplication as an acceptable cost of reliability. Better to deliver a message twice than not at all.
Ordering
Estafette provides no ordering guarantees. Messages may arrive out of order, especially if they take different paths through the network. If an application needs ordering, it must implement it at the application layer -- typically using sequence numbers in the encrypted payload.
This is a deliberate design choice. Providing causal ordering in a store-and-forward system requires either global state (which contradicts federation) or vector clocks (which add complexity and metadata overhead that grows with the number of participants). For the primary use cases -- asynchronous messaging, file delivery, status reports -- application-level sequence numbers are simpler and sufficient.
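Application-level ordering with sequence numbers can be as small as a reorder buffer on the recipient side. The sketch assumes each decrypted payload carries an integer sequence number starting at zero for a given sender; the class and method names are illustrative.

```python
class ReorderBuffer:
    """Release message bodies in sequence order, holding back early arrivals."""

    def __init__(self, first_seq: int = 0):
        self.next_seq = first_seq
        self.pending: dict[int, bytes] = {}

    def push(self, seq: int, body: bytes) -> list[bytes]:
        """Accept one decrypted message; return the bodies now deliverable in order."""
        if seq < self.next_seq or seq in self.pending:
            return []                     # duplicate or already delivered
        self.pending[seq] = body
        ready = []
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready
```

Since a message can expire in transit, a gap may never be filled. An application using this approach needs its own policy for skipping a missing sequence number after some timeout.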
Storage Exhaustion
A relay node has finite disk space. If messages arrive faster than they are forwarded, the queue grows. Estafette handles this with admission control. Each relay advertises a maximum queue size and a per-sender rate limit. When a relay's queue is above 80% capacity, it begins rejecting messages from senders that have already consumed more than their proportional share. At 95% capacity, it rejects all new messages and focuses on draining the existing queue.
Rejected messages are not lost. The sending node (whether an origin or another relay) keeps the message in its own queue and retries later. This pushes backpressure upstream, which is the correct behavior -- the source of the congestion should slow down, not the relay that is full.
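The two thresholds translate into a short admission check. The sketch interprets "proportional share" as total capacity divided by the number of currently active senders, which is an assumption; the text does not define how the share is computed. Integer arithmetic is used for the threshold comparisons to avoid floating-point edge cases.

```python
from enum import Enum


class Verdict(Enum):
    ACCEPT = "accept"
    REJECT = "reject"   # the sending node keeps the message and retries later


def admit(queue_bytes: int, capacity_bytes: int,
          sender_bytes: int, active_senders: int) -> Verdict:
    """Decide whether to accept a new message from one sender."""
    # At 95% or more: reject everything and focus on draining.
    if queue_bytes * 100 >= capacity_bytes * 95:
        return Verdict.REJECT
    # Above 80%: reject senders already over their share (assumed definition).
    if queue_bytes * 100 > capacity_bytes * 80:
        fair_share = capacity_bytes / max(active_senders, 1)
        if sender_bytes > fair_share:
            return Verdict.REJECT
    return Verdict.ACCEPT
```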
Comparison with Existing Protocols
Estafette occupies a specific niche. Understanding where it sits relative to existing systems clarifies when it is the right choice and when something else is better.
SMTP
Email is the original federated store-and-forward system. SMTP has been doing this since 1982. It is battle-tested, universally deployed, and well-understood. But SMTP was designed for a network where servers are always online. MX records point to hosts that are expected to be reachable. Retry windows assume temporary outages, not intermittent connectivity. SMTP's five-day default retry period is a safety net for temporary server failures, not a feature for disconnected operation.
SMTP also has no end-to-end encryption at the protocol level. S/MIME and PGP bolt encryption on top, but they are optional, rarely used, and protect only the message body. The SMTP envelope and the message headers (From, To, Subject, timestamps) remain visible to every relay. Estafette encrypts the payload structurally and minimizes the envelope metadata visible to relays.
Matrix
Matrix is a federated messaging protocol with strong eventual consistency. It uses a DAG (directed acyclic graph) of events to synchronize room state across servers. This gives it stronger ordering guarantees than Estafette and supports rich features like room state, typing notifications, and presence.
The trade-off is that Matrix servers need to be online and reachable most of the time. The federation protocol requires server-to-server HTTP connections to synchronize state. A Matrix server that goes offline for extended periods will miss events and need to backfill, which is bandwidth-intensive and can cause inconsistencies. Matrix is designed for "mostly connected" environments. Estafette is designed for "mostly disconnected" environments.
Briar
Briar is a peer-to-peer messenger designed for censorship resistance. It can communicate over Tor, local Wi-Fi, and Bluetooth. It has no servers -- messages are stored on the sender's device until the recipient comes online.
Briar solves a similar problem to Estafette but makes different trade-offs. Without relay nodes, Briar requires that sender and recipient eventually be online at the same time and able to connect directly (even if that connection is over Tor). Estafette's relay architecture means that sender and recipient never need to be simultaneously reachable -- the relay chain handles temporal disconnection. Briar has better privacy properties (no relays means no relay metadata). Estafette has better delivery properties for long disconnections.
DTN Bundle Protocol
The Bundle Protocol (RFC 5050, revised as RFC 9171) is the closest relative to Estafette in formal protocol design. It was built for interplanetary networking, where light-speed delays and scheduled contact windows make interactive end-to-end connections impractical. The original specification defines custody transfer (a relay formally accepts responsibility for a bundle), fragmentation (splitting large bundles for constrained links), and administrative records (delivery confirmations, custody signals). RFC 9171 keeps fragmentation and status reports but drops custody transfer from the core protocol.
Estafette borrows ideas from the Bundle Protocol but simplifies them for terrestrial use. Estafette does not implement custody transfer because the accountability properties of the signature chain serve a similar purpose with less protocol complexity. Estafette does not implement fragmentation because terrestrial links, even constrained ones, can typically handle messages up to the payload size limit. The Bundle Protocol is the right choice for space networking and academic DTN research. Estafette is the right choice for field-deployable systems where implementation simplicity matters.
Practical Implementation Considerations
Storage on Relay Nodes
A relay node needs enough storage to hold queued messages for the maximum TTL duration at the expected ingest rate. For a relay handling 1,000 messages per day with an average size of 50 KB and a maximum TTL of 7 days, the worst-case storage requirement is roughly 350 MB. That is trivial. For a hub relay handling 100,000 messages per day, the same message size and TTL already give roughly 35 GB, and larger payloads or longer TTLs push the requirement into the hundreds of gigabytes. Still manageable on modern hardware, but it needs monitoring.
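The sizing arithmetic is simple enough to put in a helper. This computes the worst case, where nothing is forwarded before it expires, and it ignores envelope and signature-chain overhead. The function name is made up for the example, and 50 KB is taken as 50,000 bytes.

```python
def worst_case_storage_bytes(msgs_per_day: int, avg_size_bytes: int,
                             max_ttl_days: int) -> int:
    """Upper bound on queued bytes: ingest rate times maximum TTL."""
    return msgs_per_day * avg_size_bytes * max_ttl_days


small = worst_case_storage_bytes(1_000, 50_000, 7)
hub = worst_case_storage_bytes(100_000, 50_000, 7)
print(small / 1e6, "MB")   # 350.0 MB
print(hub / 1e9, "GB")     # 35.0 GB
```

In normal operation most messages leave the queue long before their TTL runs out, so actual usage sits well below this bound. The bound matters during long partitions.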
The storage backend is intentionally unspecified. Reference implementations use SQLite for the message queue (envelope and metadata in rows, payload as a blob or on the filesystem). High-throughput nodes might use LMDB or RocksDB. The protocol does not care. What matters is that the storage is durable -- a node crash should not lose queued messages -- and that lookups by message ID and by destination prefix are fast.
Bandwidth Constraints
In satellite, HF radio, or metered mobile environments, bandwidth is expensive. Estafette includes optional message compression (zstd, negotiated during the peer handshake) and supports priority levels in the envelope. High-priority messages are forwarded before low-priority messages when bandwidth is constrained. Priority is advisory -- a relay can ignore it -- but well-behaved relays honor it.
For extremely constrained links, Estafette supports a "header-only" probe. Before sending a full message, a relay can send just the envelope to the next hop. The next hop checks its seen-messages table and responds with either "send it" or "already have it." This avoids wasting bandwidth on duplicates over expensive links.
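The probe exchange can be sketched with two small functions. For brevity the probe here carries only the message ID rather than the whole envelope, and the probe and transmit callables stand in for whatever link layer is in use; both are assumptions made for the example.

```python
def answer_probe(seen_ids: set[bytes], message_id: bytes) -> bool:
    """Next-hop side: True means "send it", False means "already have it"."""
    return message_id not in seen_ids


def send_over_expensive_link(queue, probe, transmit) -> int:
    """Sender side: probe first, transmit the full message only if wanted.

    `queue` is an iterable of (message_id, payload) pairs. Returns the number
    of payload bytes actually transmitted.
    """
    sent_bytes = 0
    for message_id, payload in queue:
        if probe(message_id):
            transmit(message_id, payload)
            sent_bytes += len(payload)
    return sent_bytes
```

The probe costs one extra round trip per message, so it pays off only when payloads are large relative to the envelope and duplicates are likely.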
Abuse Prevention
An open relay network is a spam vector. Estafette's abuse prevention works at two levels. At the protocol level, messages must be signed by a valid identity. Anonymous messages are not supported. At the operator level, each relay maintains an allowlist or denylist of sender identities and peer nodes. Operators can block specific senders, specific peers, or entire destination prefixes.
Rate limiting per sender identity is the primary spam defense. A relay tracks how many messages each sender has submitted in a rolling window and rejects submissions that exceed the configured limit. Sender identities are cryptographic (tied to a key pair), and generating a new key pair is cheap, so the cost of a fresh identity comes from reputation rather than computation: the new identity has no reputation, and relays with strict policies may not forward its messages until it has been vouched for by a trusted peer.
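A rolling-window limiter keyed by sender identity might look like the following. The class name, the use of a string for the sender key, and the in-memory storage are assumptions for illustration; a relay would also need to evict idle senders to keep the table bounded.

```python
from collections import defaultdict, deque


class SenderRateLimiter:
    """Allow at most `limit` submissions per sender in any `window` seconds."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self._events = defaultdict(deque)   # sender key -> submission times

    def allow(self, sender: str, now: float) -> bool:
        q = self._events[sender]
        # Drop submissions that have fallen out of the rolling window.
        while q and q[0] <= now - self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True
```

Rejected submissions are not recorded, so a sender that keeps hammering the relay does not extend its own lockout. Whether that is the right policy is an operator decision.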
Use Cases
Mesh Networks
Community mesh networks -- like those built with LoRa, Meshtastic, or dedicated RF links -- have intermittent connectivity by nature. Nodes come and go. Links depend on line of sight, weather, and interference. Estafette provides a messaging layer that works with the mesh rather than against it. Each mesh node runs an Estafette relay, stores messages when downstream is unreachable, and forwards when the link comes back.
Intermittent Connectivity
Ships at sea, remote research stations, disaster areas with damaged infrastructure, rural areas with unreliable internet -- anywhere that connectivity is present sometimes but not always. Estafette turns "sometimes connected" into "eventually delivered."
Censorship-Resistant Communication
In environments where the network is actively monitored and filtered, Estafette's relay architecture provides path diversity. A message does not need to take the same route every time. If one path is blocked, the message can route through a different sequence of relays. The encrypted payload means that even if a relay is compromised or operated by the adversary, message content is not exposed.
Estafette does not solve the traffic analysis problem. An adversary who controls enough relays can observe message flow patterns even without reading content. Resistance to traffic analysis requires additional measures -- padding, dummy traffic, timing obfuscation -- that are outside the scope of the base protocol but can be implemented as extensions.
Embassy and Field Office Communication
Organizations with distributed offices in areas with unreliable or untrusted infrastructure need a way to exchange messages that does not depend on the local internet being up or trustworthy. Estafette with explicit relay lists and strong identity verification provides a chain of custody for every message, with encryption that does not depend on the local PKI or certificate authorities that the host country controls.
What Estafette Does Not Do
It is worth being explicit about the limitations.
Estafette does not provide real-time messaging. If both parties are online and connected, use a direct protocol -- QUIC, WebSocket, or even a phone call. Estafette adds latency that is unnecessary when a direct path exists.
Estafette does not provide guaranteed delivery. A message can expire before delivery if the TTL is too short or the partition lasts too long. The protocol is honest about this. Best-effort with bounded lifetime is the correct contract for a store-and-forward system.
Estafette does not hide metadata from relays. The envelope -- destination, origin, timestamps, hop count -- is visible to every relay that handles the message. If metadata privacy is a requirement, you need an onion-routing layer on top of Estafette, which is possible but adds latency and complexity.
Estafette does not scale to millions of users on a single relay. It is designed for networks of hundreds to thousands of nodes with proportional traffic. For internet-scale messaging, centralized architectures with redundant infrastructure are more practical. Estafette is for environments where that infrastructure does not exist, cannot be trusted, or might disappear.
The name fits. In a relay race, no single runner covers the full distance. Each one carries the baton forward as far as they can, then hands it off. The baton reaches the finish line not because any one runner was fast enough, but because the handoffs worked. Estafette is the same. The message gets through not because any one node had a complete path, but because each node carried it forward one hop.