Scaling PostgreSQL to power 800 million ChatGPT users

2026.01.24
Service · by 이호민
#PostgreSQL #Scalability #Database #OpenAI #HighAvailability

Key Points

  1. OpenAI has successfully scaled a single Azure PostgreSQL primary instance and nearly 50 read replicas to handle millions of queries per second for 800 million ChatGPT users, proving its viability for massive read-heavy workloads.
  2. This was achieved through rigorous optimizations, including offloading write-heavy workloads to sharded systems, aggressive query tuning, workload isolation, connection pooling with PgBouncer, and a cache locking mechanism to prevent overload.
  3. Future plans involve migrating more challenging write-heavy workloads, implementing cascading replication to scale beyond 50 replicas, and exploring sharded PostgreSQL to ensure continued growth and stability.

OpenAI has successfully scaled PostgreSQL to handle read-heavy workloads for 800 million ChatGPT users, processing millions of queries per second (QPS) with a single primary Azure PostgreSQL flexible server instance and nearly 50 read replicas across multiple global regions. This achievement, detailed in their engineering blog post, challenges previous perceptions of PostgreSQL's scalability for massive read-heavy demands, demonstrating its capability to support high traffic while maintaining low latency and high availability.

The core methodology is a multi-pronged approach to PostgreSQL's inherent limitations and operational challenges. While PostgreSQL excels at read-heavy workloads, its Multi-Version Concurrency Control (MVCC) implementation complicates write-heavy loads, causing write/read amplification, table and index bloat, and autovacuum overhead. To mitigate these, OpenAI combines offloading with rigorous optimization:

  1. Workload Migration and Write Pressure Reduction:
    • Strategic Sharding: Shardable, write-heavy workloads are progressively migrated to horizontally partitioned systems like Azure Cosmos DB. This ensures that new tables and high-volume writes bypass the monolithic PostgreSQL primary.
    • Application-Level Optimization: Application logic is aggressively optimized to minimize unnecessary writes (e.g., fixing redundant writes, implementing lazy writes to smooth traffic spikes) and rate-limiting during large data backfills.
    • Primary Load Minimization: The primary instance is dedicated primarily to writes and transactional reads, with most read traffic offloaded to replicas.
  2. Query Optimization:
    • Complex Query Refactoring: Expensive multi-table joins are actively avoided. If necessary, complex join logic is moved from the database to the application layer.
    • ORM Review: SQL generated by Object-Relational Mapping (ORM) frameworks is carefully reviewed and optimized.
    • Idle Query Management: idle_in_transaction_session_timeout is configured to prevent long-running idle queries from blocking critical database processes like autovacuum.
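The idea of moving join logic out of the database can be sketched as follows: fetch each table with its own simple single-table query, then combine the rows with a hash map in the application. This is a minimal illustration; the field names (`id`, `user_id`, `name`) are assumed for the example, not taken from OpenAI's schema.

```python
def app_side_join(users, orders):
    """Join two independently fetched result sets in the application.

    Each list stands in for the rows of a separate single-table query,
    replacing an expensive multi-table join on the database side.
    """
    by_user = {u["id"]: u for u in users}  # build a hash index once
    joined = []
    for o in orders:
        u = by_user.get(o["user_id"])
        if u is not None:  # inner-join semantics: drop unmatched orders
            joined.append({**o, "user_name": u["name"]})
    return joined
```

The hash-map lookup keeps the application-side join at O(n + m), at the cost of transferring both result sets over the network.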
  3. High Availability and Fault Tolerance:
    • Primary Single Point of Failure (SPOF) Mitigation: The primary runs in High-Availability (HA) mode with a hot standby, enabling rapid failover and minimizing downtime for write operations. Offloading critical reads to replicas ensures read availability even if the primary fails.
    • Read Replica Redundancy: Multiple read replicas with sufficient capacity headroom are deployed in each region to absorb single replica failures without regional outages.
  4. Workload Isolation:
    • Tiered Request Routing: Requests are categorized into low-priority and high-priority tiers and routed to separate PostgreSQL instances. This isolates resource-intensive "noisy neighbor" workloads, preventing them from degrading the performance of critical services. This strategy extends across different products and services.
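Tiered routing can be sketched as a simple lookup from request tier to instance; the endpoints and tier names below are purely illustrative, not OpenAI's actual topology.

```python
# Hypothetical instance endpoints; names are illustrative only.
INSTANCES = {
    "high": "pg-critical.internal:5432",  # latency-sensitive, critical services
    "low": "pg-batch.internal:5432",      # batch jobs and other noisy neighbors
}

def route(request_tier):
    """Send high-priority traffic to its own instance; everything else
    (including unknown tiers) goes to the low-priority instance."""
    return INSTANCES["high" if request_tier == "high" else "low"]
```

Because each tier has its own instance, a surge of expensive low-priority queries cannot starve the critical path.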
  5. Connection Management:
    • Connection Pooling: PgBouncer is deployed as a proxy layer, operating in statement or transaction pooling mode, to efficiently reuse database connections. This significantly reduces active client connections and connection setup latency (from 50ms to 5ms).
    • Co-location: PgBouncer, clients, and replicas are co-located within the same region to minimize network overhead.
    • Timeout Configuration: Careful configuration of idle timeouts prevents connection exhaustion.
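The benefit of pooling can be illustrated with a minimal in-process pool: connections are created once up front and reused, so callers skip per-request setup. This is a toy sketch of the idea behind PgBouncer, not its implementation; `connect_fn` stands in for whatever opens a real database connection.

```python
import queue

class ConnectionPool:
    """Minimal pool: reuse live connections instead of paying setup cost per request."""

    def __init__(self, connect_fn, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(connect_fn())  # pay connection setup once, up front

    def acquire(self, timeout=None):
        # Block until a connection frees up rather than opening a new one,
        # capping the number of active server-side connections.
        return self._idle.get(timeout=timeout)

    def release(self, conn):
        self._idle.put(conn)  # return the connection for reuse
```

A fixed-size pool also acts as a natural concurrency limit on the database, which is part of why poolers reduce connection exhaustion.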
  6. Caching Strategy:
    • Cache Locking/Leasing: To prevent cache-miss storms from overwhelming PostgreSQL, a cache locking/leasing mechanism is implemented. When multiple requests miss the same cache key, only one request fetches the data from PostgreSQL, acquiring a lock, while others wait for the cache to be populated. This dramatically reduces redundant database reads.
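The lease pattern above can be sketched in a few lines: a per-key lock ensures that on a cache miss only one caller runs the backing fetch, while concurrent callers wait and then read the freshly populated entry. This is a minimal single-process illustration, assuming `fetch_fn` is the expensive PostgreSQL read; a production version would use a distributed lock and entry expiry.

```python
import threading

class CacheWithLease:
    """Cache where only one caller fetches a missed key from the backing store."""

    def __init__(self, fetch_fn):
        self._fetch_fn = fetch_fn    # assumed: the expensive database read
        self._cache = {}
        self._locks = {}             # one fetch lock per key
        self._meta = threading.Lock()

    def get(self, key):
        if key in self._cache:
            return self._cache[key]  # cache hit: no locking needed
        with self._meta:
            lock = self._locks.setdefault(key, threading.Lock())
        with lock:                   # only one thread fetches per key
            if key not in self._cache:  # re-check: another thread may have filled it
                self._cache[key] = self._fetch_fn(key)
        return self._cache[key]
```

The double-check inside the lock is what collapses a cache-miss storm into a single database read.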
  7. Replication Scalability:
    • Cascading Replication (Future): To overcome the primary's eventual limit in streaming Write Ahead Log (WAL) data to an ever-increasing number of replicas, OpenAI is collaborating with Azure PostgreSQL on cascading replication, where intermediate replicas relay WAL to downstream replicas. This aims to scale beyond 100 replicas without overwhelming the primary, albeit with increased operational complexity around failover.
  8. Rate Limiting:
    • Multi-layer Protection: Rate limiting is applied across multiple layers—application, connection pooler, proxy, and query—to prevent sudden traffic spikes, expensive query surges, or retry storms from exhausting resources and causing cascading failures.
    • Targeted Load Shedding: The ORM layer supports rate limiting and can fully block specific query digests for rapid recovery from sudden surges of expensive queries.
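A per-query-digest limiter with a blocklist can be sketched with a token bucket; the class and parameter names are illustrative, not OpenAI's ORM API.

```python
import time

class TokenBucket:
    """Token bucket: allow bursts up to `capacity`, refill at `rate` tokens/sec."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

class QueryGate:
    """Per-digest rate limiting plus a blocklist for shedding expensive query shapes."""

    def __init__(self, rate, capacity):
        self.buckets = {}
        self.blocked = set()
        self.rate, self.capacity = rate, capacity

    def block(self, digest):
        self.blocked.add(digest)  # fully shed this query shape for rapid recovery

    def allow(self, digest):
        if digest in self.blocked:
            return False
        bucket = self.buckets.setdefault(digest, TokenBucket(self.rate, self.capacity))
        return bucket.allow()
```

Running one such gate at each layer (application, pooler, proxy) gives the multi-layer protection described above, while `block()` implements the targeted shedding of a single query digest.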
  9. Schema Management:
    • Lightweight Changes: Only lightweight schema changes are permitted (e.g., adding/removing columns that don't trigger full table rewrites).
    • Timeouts and Concurrency: A strict 5-second timeout is enforced on schema changes, and concurrent index creation/dropping is allowed.
    • New Tables in Sharded Systems: New tables for new features are mandated to be in alternative sharded systems, not the existing PostgreSQL deployment.
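The timeout guard on schema changes might be applied as in the sketch below, which wraps a DDL statement with PostgreSQL's standard `statement_timeout` and `lock_timeout` settings. The 5-second value mirrors the post's strict limit; the 2-second `lock_timeout` and the helper itself are illustrative assumptions, not OpenAI's tooling.

```python
def guarded_ddl(ddl):
    """Return the statements to run a schema change under strict timeouts.

    statement_timeout and lock_timeout are standard PostgreSQL settings;
    the caller is assumed to execute each statement in order (note that
    CREATE INDEX CONCURRENTLY cannot run inside a transaction block).
    """
    return [
        "SET statement_timeout = '5s';",  # abort schema changes that run too long
        "SET lock_timeout = '2s';",       # give up fast if the table lock is contended (illustrative value)
        ddl,
        "RESET statement_timeout;",
        "RESET lock_timeout;",
    ]
```

A short `lock_timeout` matters as much as the statement timeout: even a cheap `ALTER TABLE` can queue behind a long reader and block all traffic to the table while it waits for the lock.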

These comprehensive engineering efforts enable OpenAI to achieve five-nines availability and low double-digit millisecond p99 client-side latency for their critical products, demonstrating that PostgreSQL, with careful design and optimization, can sustain extreme production workloads. Future plans include further migration of challenging write-heavy workloads, leveraging cascading replication, and exploring sharded PostgreSQL or alternative distributed systems as demands continue to grow.