
Scaling PostgreSQL on Azure with Citus
Introduction
As data grows, so does the complexity of managing and querying it efficiently. Many companies start with a single-node PostgreSQL setup, which is reliable, battle-tested, and feature-rich. But what happens when your data size outgrows the capacity of a single machine? What if you’re building a SaaS application that needs to support hundreds or thousands of tenants? Or you’re running analytical workloads over billions of rows?
This is where Citus, a PostgreSQL extension that brings distributed database capabilities, becomes invaluable. And when used on Azure—as Azure Cosmos DB for PostgreSQL—it offers powerful scaling options in a fully managed environment.
This blog post provides an in-depth look at Citus on Azure, exploring its architecture, key features, and ideal use cases. We’ll break down how it works, why it matters, and how you can start using it effectively.
What Is Citus?
Citus is an open-source extension that transforms PostgreSQL into a distributed database by enabling data sharding, parallel query execution, and replication. Instead of treating your database as a single monolith, Citus enables you to scale out by distributing data and queries across a cluster of machines.
On Azure, Microsoft provides a fully managed Citus service under the name Azure Cosmos DB for PostgreSQL. This service simplifies operations like scaling, monitoring, backup, high availability, and disaster recovery.
Key characteristics of Citus:
- Horizontal scaling: Split data across multiple nodes (shards).
- Parallel query execution: Leverage worker nodes for concurrent query processing.
- PostgreSQL compatibility: Use familiar SQL, drivers, and tools.
- Flexible architecture: Combine distributed and reference tables for efficiency.
- Managed experience: Benefit from cloud-native integrations on Azure.
Core Architecture of Citus
Citus introduces a coordinator/worker architecture:
- Coordinator Node: The entry point for applications. Handles query planning and result aggregation.
- Worker Nodes: Store actual data shards and perform distributed operations.
When you issue a SQL query, the coordinator breaks it down into tasks and distributes them across the workers for parallel execution. The results are then aggregated and returned.
This architecture allows for:
- Distributed joins: When data is colocated.
- Massive parallel processing: For large aggregations.
- Incremental scalability: Add nodes and rebalance data.
Citus Table Types
1. Distributed Tables
Distributed tables are partitioned across worker nodes using a distribution column. This is typically an identifier like tenant_id
, user_id
, or customer_id
.
SELECT create_distributed_table('orders', 'user_id');
- Citus creates shards based on the distribution key.
- Each shard lives on a worker node.
- Queries involving this key can be routed directly to the correct shard.
Benefits:
- Efficient parallelism.
- Minimal data movement for colocated joins.
- Horizontal scalability.
Considerations:
- The distribution column should be frequently used in queries.
- It should evenly distribute data.
2. Reference Tables
Reference tables are small and immutable tables that are replicated across all worker nodes.
SELECT create_reference_table('countries');
These are ideal for:
- Lookup tables (e.g., countries, currencies, settings).
- Joining with distributed tables without network overhead.
How it Works:
- Any update to a reference table is propagated to all workers.
- Joins with distributed tables are performed locally.
3. Local Tables
Local tables are not distributed and reside only on the coordinator. Use them for:
- Admin or metadata tables.
- Tables not involved in distributed queries.
Distributed SQL Query Execution
Let’s consider a query involving a distributed orders
table and a reference users
table:
SELECT o.id, u.name
FROM orders o
JOIN users u ON o.user_id = u.id
WHERE o.created_at >= now() - interval '7 days';
If both tables are correctly defined (e.g., orders is distributed by user_id
and users is a reference or colocated table), Citus will:
- Identify the relevant shards based on
user_id
. - Route the query to workers.
- Perform local joins on each worker.
- Aggregate and return results via the coordinator.
Citus builds an execution plan depending on:
- Type of query (SELECT, INSERT, UPDATE, etc.)
- Table types involved.
- Whether the data is colocated.
You can inspect the execution plan using:
EXPLAIN ANALYZE SELECT ...
Real-Time Performance at Scale
Citus is designed for speed. Here’s how it achieves high performance:
- Parallel processing: Each shard can be queried independently.
- Colocated joins: When data is partitioned properly, joins are local.
- Pushdown of predicates: Filters and aggregations are executed on the workers.
- Write scaling: Insert and update operations can scale with shard count.
Example: A SaaS platform with 1M tenants can distribute tenants across 100 shards. This makes reporting and querying for a single tenant extremely fast and efficient.
Citus Use Cases
✅ Ideal Scenarios
Scenario | Why Citus Works |
---|---|
SaaS platforms | Tenants can be sharded by ID. |
Real-time analytics | Large aggregations run in parallel. |
Multi-region setups | Replication and failover via Azure. |
Event-driven systems | Time-series data can be distributed. |
JSON/Geo/PostGIS data | Still works thanks to PostgreSQL base. |
🚫 Less Ideal Scenarios
- Heavy use of cross-shard joins.
- OLTP workloads requiring strict ACID across multiple shards.
- Small datasets where overhead outweighs benefits.
Schema Design Tips
- Choose the right distribution key: Must be frequently queried.
- Keep reference tables small: To avoid replication overhead.
- Avoid cross-shard transactions: Use colocated tables when possible.
- Use foreign keys only within local context: Global foreign keys are not supported.
- Use
EXPLAIN
to inspect plans: Optimize performance with query insight.
Azure Integration
Azure provides a fully managed Citus experience with Azure Cosmos DB for PostgreSQL. You get:
- Managed backups and patching.
- Monitoring and insights via Azure Monitor.
- Scalable infrastructure with elastic compute and storage.
- Role-based access control (RBAC).
- Zone-redundant high availability.
- Point-in-time restore (PITR).
Getting Started
- Create a Cosmos DB for PostgreSQL cluster via Azure Portal.
- Configure coordinator and worker node counts.
- Connect using standard PostgreSQL tools (psql, DBeaver, etc.).
- Create distributed/reference tables.
- Start scaling your data.
Query Patterns & Optimization
Parallel Aggregates
SELECT user_id, COUNT(*) FROM events GROUP BY user_id;
Citus executes this in parallel across shards, then aggregates the results.
Time-based Filters
SELECT * FROM metrics WHERE timestamp > now() - interval '1 day';
If metrics is sharded by device_id
, queries can target relevant shards using filters.
Avoiding Cross-Shard Joins
Cross-shard joins require data movement. Prefer colocated joins:
SELECT o.id, p.name
FROM orders o
JOIN products p ON o.product_id = p.id
WHERE o.tenant_id = 123;
Both tables should be distributed by tenant_id
for optimal performance.
Security and Reliability on Azure
Azure provides enterprise-grade security for Cosmos DB for PostgreSQL:
- Encryption at rest and in transit.
- Private Link support.
- Firewall rules and VNETs.
- Audit logging.
- Compliance with standards like ISO, SOC, and HIPAA.
Monitoring and Maintenance
You can monitor your Citus cluster via:
- Azure Monitor: Metrics like CPU, connections, IOPS.
- pg_stat_activity: Inspect running queries.
- Citus internal views:
pg_dist_shard
,pg_dist_node
, etc. - Logs: Capture slow queries, errors, warnings.
Maintenance options:
- Automatic patching.
- Manual upgrades.
- Rolling restarts.
- Shard rebalancing.
Conclusion
Citus brings the power of distributed systems to the world of PostgreSQL. Whether you’re building a modern multi-tenant SaaS, processing billions of rows of data, or needing real-time analytics, Citus provides a compelling option. And when deployed via Azure Cosmos DB for PostgreSQL, it’s easier than ever to scale without sacrificing the PostgreSQL experience you know and love.
Before diving in, spend time on schema design, choosing the right distribution strategy, and understanding the trade-offs of distributed systems. But once you’re set up, the rewards in performance and scalability are substantial.
Further Resources
- Azure Cosmos DB for PostgreSQL Docs
- Citus Data GitHub
- Citus Docs
- Citus Blog
- Distributed PostgreSQL Use Cases