Technical Deep Dive Into GitLab’s Database Load Balancer Architecture
Presented by:

Dylan Griffith
Dylan Griffith is a Principal Engineer at GitLab. In his 7 years there, he has worked across many areas of the project, with a focus on building scalable architectures and optimising all aspects of GitLab's usage of Postgres.
Problem: Idle Replicas
Many open source tools exist today for creating high availability deployments of Postgres. Most of these tools involve a single primary database and a fleet of replicas, each with a small amount of replication lag. Almost all of these solutions mean running multiple Postgres servers with the same disk space, CPU and memory so that any replica is ready for failover at any time. This increases infrastructure cost, but the increase can be offset if the replicas also handle read-only queries. Doing so reduces the resources your primary database needs and thus the total cost of your infrastructure. However, all of these solutions share the same technical challenge: “how do I know whether I can send this specific read-only query to a replica that is lagging slightly behind the primary?” If a user writes data to the database and is then redirected to view the record they just created, will we end up serving a 404 because the query hit a stale replica?
Solution: GitLab's Open Source Database Load Balancer
Because GitLab has a huge volume of read-only queries and strict application constraints around reading fresh data, we had a clear opportunity to maximise resource utilisation by sending as many read-only queries as possible to replicas that are sufficiently up to date. We had to define precisely what “sufficiently up to date” means so that we neither surprise users nor cause serious data corruption bugs. To this end, we’ve made use of other open source components (such as PgBouncer and Patroni) and built a sophisticated open source application-side load balancer for Rails to utilise replicas efficiently.
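One common way to make “sufficiently up to date” concrete in Postgres is to compare WAL positions (LSNs): after a user's write, record the primary's WAL insert location (for example with `pg_current_wal_insert_lsn()`), and treat a replica as fresh for that user only once its replay location (`pg_last_wal_replay_lsn()`) has reached that point. The Python sketch below assumes this approach; the abstract does not spell out GitLab's exact rule, and `parse_lsn` and `choose_host` are hypothetical helpers, not part of GitLab's code.

```python
from typing import Dict, Optional


def parse_lsn(lsn: str) -> int:
    """Convert a textual Postgres LSN such as '16/B374D848' into a byte position.

    The part before the slash holds the high 32 bits, the part after it the low 32 bits.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def choose_host(
    replica_replay_lsns: Dict[str, str],
    required_lsn: Optional[str],
    primary: str = "primary",
) -> str:
    """Hypothetical routing rule: pick a replica that has replayed up to required_lsn.

    replica_replay_lsns maps host name -> pg_last_wal_replay_lsn() on that replica.
    required_lsn is the primary WAL location recorded after this user's last write
    (an assumption of this sketch), or None if the user has not written recently.
    """
    # Sorted only to keep the sketch deterministic; a real balancer would spread load.
    candidates = sorted(replica_replay_lsns.items())
    if required_lsn is None:
        # No recent write to read back, so any replica will do.
        return candidates[0][0] if candidates else primary
    needed = parse_lsn(required_lsn)
    for host, replay_lsn in candidates:
        if parse_lsn(replay_lsn) >= needed:
            return host  # this replica has replayed past the user's write
    return primary  # no replica is fresh enough; fall back to the primary


replicas = {"replica-a": "0/3000060", "replica-b": "0/2000000"}
print(choose_host(replicas, "0/3000000"))  # replica-a
print(choose_host(replicas, "0/4000000"))  # primary
```

Falling back to the primary whenever no replica qualifies keeps reads correct, while users who have not written anything recently can be served by any replica.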
This talk explains four key features of our database load balancer in detail:
- How to manage connection pools for a fleet of replicas and one primary database using service discovery, Consul and Patroni
- How to route users’ queries to replica databases (which have some replication lag) without users seeing anomalies or missing data while they make live edits
- How to maximise replica database usage for background job processing, including a sophisticated retry mechanism that delays non-time-critical background jobs until a replica has caught up sufficiently to process that specific job
- How service discovery lets every application server connect to each Postgres server only once, even though each Postgres server is fronted by multiple PgBouncer instances
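The third point, delaying background jobs until a replica has caught up, can be sketched as a small decision function. This is a minimal sketch under stated assumptions: each job is assumed to carry the primary's WAL location (LSN) recorded when it was enqueued, non-time-critical jobs are retried with exponential backoff while no replica has replayed that far, and after a fixed number of attempts the job falls back to the primary. The names `plan_job` and `Decision`, the backoff schedule and the attempt limit are all hypothetical; the abstract does not describe GitLab's actual policy.

```python
from dataclasses import dataclass
from typing import Dict, Optional


def parse_lsn(lsn: str) -> int:
    """Convert a textual Postgres LSN such as '0/3000060' into a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


@dataclass
class Decision:
    action: str  # "replica", "retry" or "primary"
    host: Optional[str] = None
    delay_seconds: float = 0.0


def plan_job(
    required_lsn: str,
    replica_replay_lsns: Dict[str, str],
    attempt: int,
    max_attempts: int = 3,  # hypothetical limit before giving up on replicas
    base_delay: float = 1.0,  # hypothetical first backoff delay in seconds
) -> Decision:
    """Decide where a background job should read, or whether to delay it.

    required_lsn: primary WAL location recorded when the job was enqueued (assumed).
    attempt: how many times this job has already been delayed (0 on first run).
    """
    needed = parse_lsn(required_lsn)
    for host, replay_lsn in sorted(replica_replay_lsns.items()):
        if parse_lsn(replay_lsn) >= needed:
            return Decision("replica", host)
    if attempt < max_attempts:
        # No replica has caught up yet: re-enqueue with exponential backoff.
        return Decision("retry", delay_seconds=base_delay * 2 ** attempt)
    # Out of retries: run on the primary rather than delaying further.
    return Decision("primary", "primary")


replicas = {"replica-a": "0/3000000"}
print(plan_job("0/3000000", replicas, attempt=0))
# Decision(action='replica', host='replica-a', delay_seconds=0.0)
print(plan_job("0/4000000", replicas, attempt=1))
# Decision(action='retry', host=None, delay_seconds=2.0)
print(plan_job("0/4000000", replicas, attempt=3))
# Decision(action='primary', host='primary', delay_seconds=0.0)
```

Bounding the number of retries keeps a persistently lagging replica from delaying jobs indefinitely, at the cost of some extra load on the primary for the jobs that exhaust their retries.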
- Date: 2025 October 17 16:00 +11
- Duration: 40 min
- Room: Oxford 1 + 2
- Conference: PG Down Under 2025
- Language:
- Track: Development
- Difficulty: Hard