Clustering
GatewayD supports multi-node clustering using the Raft consensus protocol. When multiple GatewayD instances form a cluster, Raft ensures that load balancer state is replicated across all nodes so that every node makes consistent routing decisions.
What gets replicated
Raft replicates load balancer state across the cluster. This includes:
| Replicated state | Description |
|---|---|
| Consistent hash mappings | Maps a hash (derived from client IP and configuration group) to a proxy, ensuring the same client always reaches the same backend |
| Round robin index | Tracks which proxy comes next per server group so all nodes agree on the rotation |
| Weighted round robin weights | Tracks current and effective weights per proxy per group for correct weighted distribution |
| Peer membership | Tracks known peers (ID, Raft address, gRPC address) for cluster coordination |
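The stable-mapping property of the consistent hash row above can be illustrated with a toy lookup. This is a sketch of the *property* (same client IP and group always yields the same proxy), not GatewayD's actual hash function or internal data structures, which are assumptions here:

```python
import hashlib

def pick_proxy(client_ip: str, group: str, proxies: list) -> str:
    """Toy stable-hash lookup: hash the client IP together with the
    configuration group and map it onto the proxy list. The real hash
    function and mapping are internal to GatewayD; this only shows why
    the same client keeps reaching the same backend."""
    digest = hashlib.sha256(f"{group}:{client_ip}".encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(proxies)
    return proxies[index]
```

Because the hash is deterministic, repeated calls with the same inputs always return the same proxy, which is exactly the guarantee the table describes.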
Every time a load balancer decision is made, the state change goes through Raft consensus before returning a result. On follower nodes, writes are forwarded to the leader via gRPC.
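The consensus step above can be sketched as a replicated state machine: nodes never mutate the round robin index directly, they apply committed log entries, so every replica that applies the same log arrives at the same rotation. Raft itself (leader election, log replication, follower-to-leader forwarding over gRPC) is elided in this sketch:

```python
class RoundRobinFSM:
    """Toy finite-state machine: the next-proxy index only advances by
    applying a committed log entry, so two replicas applying the same
    log always agree on the rotation."""

    def __init__(self, proxies):
        self.proxies = proxies
        self.index = 0

    def apply(self, entry):
        # Each committed "advance" entry moves the rotation one step
        # and returns the proxy chosen by that step.
        if entry == "advance":
            chosen = self.proxies[self.index % len(self.proxies)]
            self.index += 1
            return chosen

# Two replicas applying the same committed log make identical decisions.
log = ["advance"] * 4
a = RoundRobinFSM(["p1", "p2", "p3"])
b = RoundRobinFSM(["p1", "p2", "p3"])
picks_a = [a.apply(e) for e in log]
picks_b = [b.apply(e) for e in log]
# Both replicas agree: ['p1', 'p2', 'p3', 'p1']
```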
Configuration parameters
The Raft configuration is a top-level section in gatewayd.yaml:
| Parameter | Type | Default | Description |
|---|---|---|---|
| address | string | 127.0.0.1:2222 | TCP address for Raft consensus protocol communication between nodes |
| nodeId | string | node1 | Unique identifier for this node in the cluster. Falls back to hostname if empty |
| isBootstrap | boolean | True | Whether this node bootstraps a new cluster. Exactly one node must have this set to True |
| isSecure | boolean | False | Enables TLS for the internal gRPC communication between Raft nodes |
| certFile | string | "" | Path to TLS certificate file (PEM). Required when isSecure is True |
| keyFile | string | "" | Path to TLS private key file (PEM). Required when isSecure is True |
| grpcAddress | string | 127.0.0.1:50051 | Address for the Raft-internal gRPC server used for inter-node RPCs |
| directory | string | raft | Directory where Raft data is persisted. Actual path is <directory>/<nodeId>/ |
| peers | array | [] | List of known peers to connect to when joining an existing cluster |
Each entry in peers has three fields:
| Field | Type | Description |
|---|---|---|
| id | string | The peer’s node ID |
| address | string | The peer’s Raft protocol address |
| grpcAddress | string | The peer’s gRPC address for inter-node RPC |
Setting up a cluster
Single node (default)
The default gatewayd.yaml runs a single bootstrap node. No special configuration is needed:
```yaml
raft:
  address: 127.0.0.1:2222
  nodeId: node1
  isBootstrap: True
  grpcAddress: 127.0.0.1:50051
  peers: []
```
Multi-node cluster
To run a three-node cluster, configure each node as follows.
Node 1 (bootstrap node):
```yaml
raft:
  address: 192.168.1.10:2222
  nodeId: node1
  isBootstrap: True
  grpcAddress: 192.168.1.10:50051
  peers:
    - id: node2
      address: 192.168.1.11:2222
      grpcAddress: 192.168.1.11:50051
    - id: node3
      address: 192.168.1.12:2222
      grpcAddress: 192.168.1.12:50051
```
Node 2 (joining node):
```yaml
raft:
  address: 192.168.1.11:2222
  nodeId: node2
  isBootstrap: False
  grpcAddress: 192.168.1.11:50051
  peers:
    - id: node1
      address: 192.168.1.10:2222
      grpcAddress: 192.168.1.10:50051
```
Node 3 (joining node):
```yaml
raft:
  address: 192.168.1.12:2222
  nodeId: node3
  isBootstrap: False
  grpcAddress: 192.168.1.12:50051
  peers:
    - id: node1
      address: 192.168.1.10:2222
      grpcAddress: 192.168.1.10:50051
```
Exactly one node must have `isBootstrap: True`. Non-bootstrap nodes join the cluster by sending `AddPeer` gRPC calls to their configured peers, retrying every 5 seconds with a total timeout of 5 minutes.
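The join behavior can be sketched as a retry loop with a deadline. `send_add_peer` stands in for the real `AddPeer` gRPC call and is an assumption of this sketch; the 5-second interval and 5-minute timeout come from the text above:

```python
import time

JOIN_RETRY_INTERVAL = 5   # seconds between AddPeer attempts
JOIN_TIMEOUT = 5 * 60     # give up after five minutes

def join_cluster(peers, send_add_peer, clock=time.monotonic, sleep=time.sleep):
    """Try each configured peer in turn, retrying every 5 seconds,
    until one accepts the AddPeer call or the 5-minute deadline passes.
    clock/sleep are injectable so the loop is testable."""
    deadline = clock() + JOIN_TIMEOUT
    while clock() < deadline:
        for peer in peers:
            if send_add_peer(peer):
                return True
        sleep(JOIN_RETRY_INTERVAL)
    return False
```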
Using Docker Compose
A ready-made docker-compose-raft.yaml is available in the GatewayD repository that sets up a three-node Raft cluster with PostgreSQL backends and observability tooling. See the deployment page for details.
Peer discovery and management
Peers must be explicitly configured in gatewayd.yaml or added at runtime via the API. There is no automatic DNS-based or multicast discovery.
A background peer synchronizer runs every 30 seconds, reconciling the Raft membership with known peers. If a peer exists in the Raft configuration but is missing from the local state, other peers are queried via GetPeerInfo RPC.
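The reconciliation step can be sketched as two set differences. The exact shape of GatewayD's synchronizer is an assumption here; this only shows the comparison described above (members in the Raft configuration but missing locally are the ones looked up via `GetPeerInfo`):

```python
def reconcile(raft_members, known_peers):
    """Compare Raft membership with locally known peers.

    Returns (missing_locally, stale_locally):
      - missing_locally: in the Raft configuration but unknown locally;
        in the real system these are resolved via GetPeerInfo RPCs.
      - stale_locally: known locally but absent from Raft membership.
    """
    missing_locally = set(raft_members) - set(known_peers)
    stale_locally = set(known_peers) - set(raft_members)
    return missing_locally, stale_locally
```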
Adding and removing peers at runtime
Peers can be managed via the GatewayD REST API:
- Add peer: `POST /v1/raft/add-peer` with `peer_id`, `address`, and `grpc_address`
- Remove peer: `POST /v1/raft/remove-peer` with `peer_id`
- List peers: returns the peer list with status (Leader, Follower, NonVoter, Unknown)
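A minimal sketch of building the add-peer request, using only the endpoint and field names listed above. The base URL (host and port of the GatewayD API) is an assumption — substitute your own deployment's address:

```python
import json

# The API host/port is an assumption; the /v1/raft path comes from the docs.
BASE = "http://localhost:18080/v1/raft"

def add_peer_request(peer_id, address, grpc_address):
    """Build (method, url, body) for the add-peer endpoint.
    Field names match the endpoint description above."""
    body = json.dumps({
        "peer_id": peer_id,
        "address": address,
        "grpc_address": grpc_address,
    })
    return "POST", f"{BASE}/add-peer", body
```

Send the resulting request with any HTTP client (curl, `urllib.request`, etc.).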
Storage
Raft data is persisted using BoltDB under `<directory>/<nodeId>/`:
- `raft-log.db` – Raft log store
- `raft-stable.db` – Stable store (term, vote, etc.)
- File-based snapshots (up to 3 retained)
Ensure the configured directory is writable by the GatewayD process.
Metrics
The following Prometheus metrics are exposed for Raft:
| Metric | Type | Description |
|---|---|---|
| `gatewayd_raft_health_status` | Gauge | 1 if healthy, 0 if unhealthy |
| `gatewayd_raft_leader_status` | Gauge | 1 if leader, 0 if follower |
| `gatewayd_raft_last_contact_seconds` | Gauge | Seconds since last contact with the leader |
| `gatewayd_raft_peer_additions_total` | Counter | Total peer additions |
| `gatewayd_raft_peer_removals_total` | Counter | Total peer removals |
All metrics include a node_id label.
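These metrics lend themselves to a simple alerting rule. The following Prometheus rule is a sketch (it is not shipped with GatewayD, and the threshold and `for` duration are assumptions) that fires when a node reports itself unhealthy:

```yaml
groups:
  - name: gatewayd-raft
    rules:
      - alert: GatewayDRaftUnhealthy
        expr: gatewayd_raft_health_status == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Raft node {{ $labels.node_id }} is unhealthy"
```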
Environment variables
Raft configuration can be overridden with environment variables using the GATEWAYD_ prefix:
| Variable | Maps to |
|---|---|
| `GATEWAYD_RAFT_ADDRESS` | `raft.address` |
| `GATEWAYD_RAFT_NODEID` | `raft.nodeId` |
| `GATEWAYD_RAFT_ISBOOTSTRAP` | `raft.isBootstrap` |
| `GATEWAYD_RAFT_GRPCADDRESS` | `raft.grpcAddress` |
| `GATEWAYD_RAFT_PEERS` | `raft.peers` (JSON array) |
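The table follows a mechanical pattern — the `GATEWAYD_` prefix plus the config key path upper-cased with dots replaced by underscores — which can be expressed as a one-line helper (a sketch derived from the table above, not a GatewayD API):

```python
def env_var_for(config_key, prefix="GATEWAYD_"):
    """Derive the environment variable name for a dotted config key,
    following the pattern shown in the table above."""
    return prefix + config_key.replace(".", "_").upper()

print(env_var_for("raft.nodeId"))  # GATEWAYD_RAFT_NODEID
```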
Limitations
- Always enabled: The Raft subsystem starts with every `gatewayd run`. A single-node cluster is the default when no peers are configured.
- No automatic discovery: Peers must be explicitly configured or added via the API.
- Consensus latency: Every load balancer decision goes through a Raft consensus round-trip. In multi-node clusters, follower nodes forward writes to the leader, which adds latency.
- Hardcoded timeouts: Internal timeouts (leader election: 3s, apply: 2s, transport: 10s, cluster join: 5m) are not configurable (yet).
- Graceful leave: Nodes support graceful cluster departure via `LeaveCluster()`, which removes the node from the Raft configuration before shutting down.