Clustering

GatewayD supports multi-node clustering using the Raft consensus protocol. When multiple GatewayD instances form a cluster, Raft ensures that load balancer state is replicated across all nodes so that every node makes consistent routing decisions.

What gets replicated

Raft replicates load balancer state across the cluster. This includes:

Replicated state	Description
Consistent hash mappings	Maps a hash (derived from client IP and configuration group) to a proxy, ensuring the same client always reaches the same backend
Round robin index	Tracks which proxy comes next per server group so all nodes agree on the rotation
Weighted round robin weights	Tracks current and effective weights per proxy per group for correct weighted distribution
Peer membership	Tracks known peers (ID, Raft address, gRPC address) for cluster coordination

Every time a load balancer decision is made, the state change goes through Raft consensus before returning a result. On follower nodes, writes are forwarded to the leader via gRPC.

Configuration parameters

The Raft configuration is a top-level section in gatewayd.yaml:

Parameter	Type	Default	Description
address	string	`127.0.0.1:2222`	TCP address for Raft consensus protocol communication between nodes
nodeId	string	`node1`	Unique identifier for this node in the cluster. Falls back to hostname if empty
isBootstrap	boolean	`True`	Whether this node bootstraps a new cluster. Exactly one node must have this set to `True`
isSecure	boolean	`False`	Enables TLS for the internal gRPC communication between Raft nodes
certFile	string	`""`	Path to TLS certificate file (PEM). Required when `isSecure` is `True`
keyFile	string	`""`	Path to TLS private key file (PEM). Required when `isSecure` is `True`
grpcAddress	string	`127.0.0.1:50051`	Address for the Raft-internal gRPC server used for inter-node RPCs
directory	string	`raft`	Directory where Raft data is persisted. Actual path is `<directory>/<nodeId>/`
peers	array	`[]`	List of known peers to connect to when joining an existing cluster

Each entry in peers has three fields:

Field	Type	Description
id	string	The peer’s node ID
address	string	The peer’s Raft protocol address
grpcAddress	string	The peer’s gRPC address for inter-node RPC

Setting up a cluster

Single node (default)

The default gatewayd.yaml runs a single bootstrap node. No special configuration is needed:

raft:
  address: 127.0.0.1:2222
  nodeId: node1
  isBootstrap: True
  grpcAddress: 127.0.0.1:50051
  peers: []

Multi-node cluster

To run a three-node cluster, configure each node as follows.

Node 1 (bootstrap node):

raft:
  address: 192.168.1.10:2222
  nodeId: node1
  isBootstrap: True
  grpcAddress: 192.168.1.10:50051
  peers:
    - id: node2
      address: 192.168.1.11:2222
      grpcAddress: 192.168.1.11:50051
    - id: node3
      address: 192.168.1.12:2222
      grpcAddress: 192.168.1.12:50051

Node 2 (joining node):

raft:
  address: 192.168.1.11:2222
  nodeId: node2
  isBootstrap: False
  grpcAddress: 192.168.1.11:50051
  peers:
    - id: node1
      address: 192.168.1.10:2222
      grpcAddress: 192.168.1.10:50051

Node 3 (joining node):

raft:
  address: 192.168.1.12:2222
  nodeId: node3
  isBootstrap: False
  grpcAddress: 192.168.1.12:50051
  peers:
    - id: node1
      address: 192.168.1.10:2222
      grpcAddress: 192.168.1.10:50051

Exactly one node must have isBootstrap: True. Non-bootstrap nodes join the cluster by sending AddPeer gRPC calls to their configured peers, retrying every 5 seconds with a total timeout of 5 minutes.

Using Docker Compose

A ready-made docker-compose-raft.yaml is available in the GatewayD repository that sets up a three-node Raft cluster with PostgreSQL backends and observability tooling. See the deployment page for details.

Peer discovery and management

Peers must be explicitly configured in gatewayd.yaml or added at runtime via the API. There is no automatic DNS-based or multicast discovery.

A background peer synchronizer runs every 30 seconds, reconciling the Raft membership with known peers. If a peer exists in the Raft configuration but is missing from the local state, other peers are queried via GetPeerInfo RPC.

Adding and removing peers at runtime

Peers can be managed via the GatewayD REST API:

Add peer: POST /v1/raft/add-peer with peer_id, address, and grpc_address
Remove peer: POST /v1/raft/remove-peer with peer_id
List peers: Returns the peer list with status (Leader, Follower, NonVoter, Unknown)

Storage

Raft data is persisted using BoltDB under <directory>/<nodeId>/:

raft-log.db – Raft log store
raft-stable.db – Stable store (term, vote, etc.)
File-based snapshots (up to 3 retained)

Ensure the configured directory is writable by the GatewayD process.

Metrics

The following Prometheus metrics are exposed for Raft:

Metric	Type	Description
`gatewayd_raft_health_status`	Gauge	1 if healthy, 0 if unhealthy
`gatewayd_raft_leader_status`	Gauge	1 if leader, 0 if follower
`gatewayd_raft_last_contact_seconds`	Gauge	Milliseconds since last contact with leader
`gatewayd_raft_peer_additions_total`	Counter	Total peer additions
`gatewayd_raft_peer_removals_total`	Counter	Total peer removals

All metrics include a node_id label.

Environment variables

Raft configuration can be overridden with environment variables using the GATEWAYD_ prefix:

Variable	Maps to
`GATEWAYD_RAFT_ADDRESS`	`raft.address`
`GATEWAYD_RAFT_NODEID`	`raft.nodeId`
`GATEWAYD_RAFT_ISBOOTSTRAP`	`raft.isBootstrap`
`GATEWAYD_RAFT_GRPCADDRESS`	`raft.grpcAddress`
`GATEWAYD_RAFT_PEERS`	`raft.peers` (JSON array)

Limitations

Always enabled: The Raft subsystem starts with every gatewayd run. A single-node cluster is the default when no peers are configured.
No automatic discovery: Peers must be explicitly configured or added via the API.
Consensus latency: Every load balancer decision goes through a Raft consensus round-trip. In multi-node clusters, follower nodes forward writes to the leader, which adds latency.
Hardcoded timeouts: Internal timeouts (leader election: 3s, apply: 2s, transport: 10s, cluster join: 5m) are not configurable (yet).
Hardcoded timeouts: Internal timeouts (leader election: 3s, apply: 2s, transport: 10s, cluster join: 5m) are not configurable (yet).
Graceful leave: Nodes support graceful cluster departure via LeaveCluster(), which removes the node from the Raft configuration before shutting down.