Replication
Copying the same data from one database to one or more replicas (add redundancy).
Types
Primary - Replica
One primary DB that handle all writes (INSERT, UPDATE, DELETE) and one or more replica DBs that handle the reads (SELECT). Replicas get copied data from primary DB.
The Primary DB sends Changelogs (WALs) to the replicas, replicas apply changes to stay up to date.
Why use it?
- if you have a lot of reads, replicas can help distribute the load instead of reading from single db instances.
- if the primary Db crashes, replicas can take over.
Pros:
- Scale Reads.
- High Availability (even when primary DB fails you can promotes one of replica to primary DB).
- Backups can be taken from Replica.
Cons:
- Can’t scale Writes (all writes still goes to primary DB).
- Data Lag (Eventually consistent) -> sometimes Data in replica is not up to date with primary.
- If Primary DB crashes failovers takes time.
Failovers:
- Detection: Monitoring detect a Primary DB failure.
- Promotion: One of the replica is promoted to Primary.
- Reconfiguration: Other replicas point the the new promoted DB.
Tools for Failovers Automation:
- MySQL: MHA and Orchestrator.
- PostgreSQL: Patroni and repmgr .
Active - Active (Primary-Primary)
Multiple Primary DBs each of them can accept both writes and reads, they sync with each other to stay up to date.
Why use it?
- Write intensive Application
- Geographically distributed Application
Pros:
- Scale Writes across locations/regions.
- No SPOF -> if one fails, others still works
- lower latency for global app
Cons:
- Conflicts -> what if 2 or more node update the same row (require conflict resolutions)
Conflict Resolutions: Because multiple nodes can accept write operations, conflicts can occur when the same row is modified concurrently on different primaries.
Conflict Resolution Strategies:
- Last Write Wins (LWW).
- Application Defined (Custom)
- Conflict-Free Replicated Data Types (CRDTs)
Strategy
Synchronous
Transactions on primary DB declared complete only when those changes have been replicated to all replicas, this ensures that all replicas have the latest data.
- Pros:
- Strong data consistency.
- No data loss in case of primary failure.
- Cons:
- Increased latency due to waiting for acks.
- Potential performance bottlenecks. Use when you can’t risk data loss (e.g., banking, inventory) and you’re okay with slightly slower write performance.
Asynchronous
As the name suggest, primary DB can commits transactions without waiting for replicas.
- Pros:
- Lower latency and higher throughput.
- Better performance for write-heavy applications.
- Cons:
- Risk of data loss if the primary fails before replication.
- Replicas may serve stale data. Use when you care more about speed than perfect sync.
Semi-synchronous:
Primary Db waits until changes have been replicated to at least one replica.
Summary
Type (Strategy) | Who handles writes? | Consistency | Pros | Cons | Best Use |
---|---|---|---|---|---|
Primary - Replica (Async) | Only one primary | Eventual | Simple, fast reads | Stale replicas, primary bottleneck | Read-heavy apps |
Primary - Replica (Sync) | Only one primary | Strong | Safe writes | Slower reads, blocking | Critical data |
Primary-Primary | Multiple Primary | Needs conflict resolution | Fast, no SPOF | Complex, conflict risk | Multi-region writes |
Sharding
Splitting database horizontally across servers, where each server holds a subset of data. manage multiple independent databases (shards). App or middleware decide where to read/write.
Why use it?
- When data is too big on 1 server.
- To scale out and reduce load.
- Optimize User geo-loc. Pros:
- Linearly scales with number of shards.
- Each shard is smaller -> better performance
- Independent failure/recovery of shards. Cons:
- Complex queries (e.g., joins) across shards are difficult (or in some cases impossible).
- Application or middleware must route queries.
- Data balancing and shard rebalancing is difficult.
- Transactions across shards are complex (need 2 phase commit)/
Choosing Shard key:
Ideal shard key properties:
Property | Why it Matters |
---|---|
High Cardinality | Avoid putting too much data in one shard |
Even Distribution | Balance load and storage across all shards |
Frequently Queried | So that queries know which shard to go to |
Stable | Shouldn’t change often (don’t use usernames/emails) |
Locality (if needed) | If you want region-aware routing (geo-sharding) |
Problems
- *Celebrity problem (Hotspot):
- Unbalance access to certain key (for example social apps where a single celebrity can have millions of reads per day)
- one way to solve this by allocate each shard to single celebrity (and further partition it if needed)
- Unbalance access to certain key (for example social apps where a single celebrity can have millions of reads per day)
- Resharding Data
- Needed when a single shard no longer hold more data, solution: Consistent Hashing
A celebrity user causes huge amount of writes/reads, all their data goes to 1 shard -> This shard becomes a hotspot.