Troubleshooting
This page covers common issues you may encounter when running GatewayD and how to resolve them.
GatewayD won’t start
Port already in use
Symptom: GatewayD exits with an error like bind: address already in use.
Solution: Another process is already bound to the port. Find out which one:
lsof -i :15432
Either stop the conflicting process or change the GatewayD listening port in the server configuration.
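As a sketch, the listening address can be changed in the server configuration; the keys below follow the usual gatewayd.yaml layout, but verify them against your version's configuration reference:

```yaml
# gatewayd.yaml (illustrative fragment; key names may differ by version)
servers:
  default:
    network: tcp
    address: 0.0.0.0:15433   # pick a free port instead of the conflicting one
```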
Configuration file not found
Symptom: Error mentioning gatewayd.yaml or gatewayd_plugins.yaml not found.
Solution: Generate default configuration files using the CLI:
gatewayd config init
Or specify the path explicitly:
gatewayd run --config /path/to/gatewayd.yaml --plugin-config /path/to/gatewayd_plugins.yaml
Invalid configuration
Symptom: GatewayD exits with a configuration validation error.
Solution: Lint your configuration files:
gatewayd config lint --config /path/to/gatewayd.yaml
gatewayd config lint --plugin-config /path/to/gatewayd_plugins.yaml
Connection issues
Cannot connect to the backend database
Symptom: Database clients can connect to GatewayD but queries fail or time out. Logs show connection errors to the backend.
Solution:
- Verify the backend database address in the clients configuration.
- Ensure the backend database is reachable from the GatewayD host:
  pg_isready -h <backend-host> -p <backend-port>
- Check that the database credentials in the configuration are correct.
- If using TLS, verify certificate paths and permissions.
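The reachability check can be scripted. The sketch below does a raw TCP probe (useful when the PostgreSQL client tools are not installed on the GatewayD host); the host and port are placeholders to be replaced with the backend address from your clients configuration:

```shell
#!/bin/sh
# Sketch: basic reachability check from the GatewayD host.
# HOST and PORT are placeholders; substitute your backend address.
HOST="${1:-localhost}"
PORT="${2:-5432}"

# Raw TCP probe via bash's /dev/tcp (no PostgreSQL tools required).
if timeout 2 bash -c "exec 3<>/dev/tcp/$HOST/$PORT" 2>/dev/null; then
  REACH=reachable
else
  REACH=unreachable
fi
echo "tcp $HOST:$PORT: $REACH"

# If the PostgreSQL client tools are available, also check server status:
# pg_isready -h "$HOST" -p "$PORT"
```

A TCP probe only proves the port answers; pg_isready additionally confirms the PostgreSQL server is accepting connections.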
Connection pool exhaustion
Symptom: New client connections are refused or delayed. Logs mention pool exhaustion.
Solution:
- Increase the pool size in the pools configuration.
- Check if client connections are being properly closed by the application.
- Monitor connection metrics via Prometheus to understand usage patterns.
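A minimal sketch of raising the pool size, assuming the usual gatewayd.yaml layout (verify the key names for your version); note that every pooled connection holds a backend session, so increases also raise backend load:

```yaml
# gatewayd.yaml (illustrative fragment)
pools:
  default:
    size: 20   # raise cautiously: each pooled connection consumes a backend session
```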
Stale connections
Symptom: Queries intermittently fail with connection reset or broken pipe errors.
Solution: Stale connections occur when the backend database closes idle connections. Enable connection health checking to periodically verify backend connections.
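A sketch of enabling periodic health checks, assuming a healthCheckPeriod setting under the proxies configuration (check your version's configuration reference for the exact key):

```yaml
# gatewayd.yaml (illustrative fragment; healthCheckPeriod is an assumption here)
proxies:
  default:
    healthCheckPeriod: 60s   # probe pooled backend connections every minute
```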
Plugin issues
Plugin fails to start
Symptom: GatewayD logs show errors about a plugin failing to start.
Solution:
- Verify the plugin binary path in gatewayd_plugins.yaml.
- Check that the plugin binary has execute permissions.
- Ensure the plugin’s required environment variables are set.
- Increase the startTimeout in the general plugin configuration if the plugin needs more time to initialize.
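As a sketch, the timeout lives in the general (top-level) section of gatewayd_plugins.yaml; the placement and duration shown are illustrative:

```yaml
# gatewayd_plugins.yaml (illustrative fragment)
startTimeout: 2m   # extend if a plugin needs more time to initialize
```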
Plugin crashes and restarts
Symptom: Logs show a plugin being restarted repeatedly.
Solution:
- Check plugin-specific logs for error details.
- Verify that the plugin’s dependencies are available (e.g., Redis for the cache plugin).
- Set reloadOnCrash: False in the general plugin configuration to prevent restart loops while debugging.
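For example, in the general section of gatewayd_plugins.yaml (placement shown is illustrative):

```yaml
# gatewayd_plugins.yaml (illustrative fragment)
reloadOnCrash: False   # keep a crashed plugin down so its logs stay inspectable
```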
Checksum verification failure
Symptom: GatewayD refuses to load a plugin due to checksum mismatch.
Solution: Regenerate the checksum for the plugin binary and update it in gatewayd_plugins.yaml. See checksum verification for details.
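Regenerating the checksum is a one-liner with sha256sum. In the sketch below, a throwaway file stands in for the plugin binary so the example is self-contained; point PLUGIN_BIN at the real binary path from your plugin configuration:

```shell
#!/bin/sh
# Sketch: compute a SHA-256 checksum to paste into gatewayd_plugins.yaml.
# PLUGIN_BIN defaults to a throwaway placeholder file for illustration.
PLUGIN_BIN="${1:-/tmp/example-plugin}"
[ -f "$PLUGIN_BIN" ] || printf 'placeholder plugin bytes' > "$PLUGIN_BIN"

# sha256sum prints "<hash>  <filename>"; keep only the hash.
CHECKSUM=$(sha256sum "$PLUGIN_BIN" | awk '{print $1}')
echo "$CHECKSUM"
```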
Raft clustering issues
Node fails to join the cluster
Symptom: A non-bootstrap node logs errors about failing to join within the timeout (5 minutes).
Solution:
- Verify that the bootstrap node is running and reachable.
- Check that raft.address and raft.grpcAddress are accessible from the joining node.
- Ensure that nodeId values are unique across all nodes.
- Check that firewall rules allow traffic on the configured Raft and gRPC ports.
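A sketch of the per-node Raft settings; raft.address, raft.grpcAddress, and nodeId are the keys named above, while the other key names and all addresses are assumptions to verify against your version's configuration reference:

```yaml
# gatewayd.yaml (illustrative fragment; addresses are placeholders)
raft:
  nodeId: node2                 # must be unique across the cluster
  address: 10.0.0.2:2223        # Raft transport, must be reachable by peers
  grpcAddress: 10.0.0.2:50051   # gRPC endpoint used when joining
  isBootstrap: false            # only the bootstrap node sets this to true
```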
Split-brain or leader election issues
Symptom: Nodes report conflicting leaders or frequent leader elections.
Solution:
- Ensure network connectivity between all nodes is stable and low-latency.
- Use an odd number of nodes (3 or 5) for proper quorum.
- Monitor gatewayd_raft_last_contact_seconds to identify network issues.
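Monitoring that metric can be automated with a Prometheus alerting rule. The sketch below uses standard rule-file syntax; the one-second threshold and two-minute hold are starting points, not recommendations:

```yaml
# Illustrative Prometheus alerting rule (standard rule-file syntax)
groups:
  - name: gatewayd-raft
    rules:
      - alert: RaftFollowerStale
        expr: gatewayd_raft_last_contact_seconds > 1
        for: 2m
        annotations:
          summary: "Raft follower has not heard from the leader recently"
```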
Performance
High latency
Solution:
- Enable debug logging temporarily to identify bottlenecks:
  GATEWAYD_LOGGERS_DEFAULT_LEVEL=debug gatewayd run
- Check Prometheus metrics for connection pool utilization and proxy latency.
- In multi-node clusters, Raft consensus adds latency to every load balancer decision. Consider whether single-node deployment is sufficient for your use case.
High memory usage
Solution:
- Review pool sizes – each backend connection consumes memory.
- Check plugin memory usage, especially plugins that buffer data (e.g., the cache plugin’s Redis connection count).
- Monitor metrics to correlate memory usage with connection count.
Debugging
Enable debug logging
Set the log level to debug for verbose output:
loggers:
default:
level: debug
Or via an environment variable:
GATEWAYD_LOGGERS_DEFAULT_LEVEL=debug gatewayd run
Enable tracing
Send OpenTelemetry traces to a supported backend:
gatewayd run --tracing
Traces are sent via gRPC to the configured collector. See observability for details.
Check API health
The HTTP API provides a health check endpoint:
curl http://localhost:18080/healthz