Solving the Dual Write Problem: Transactional Outbox Pattern with Debezium and MongoDB

14 min read

We run a platform where multiple services write to MongoDB and publish events to Kafka. Around 20 million messages flow through it every day. We needed both writes, the database state and the Kafka event, to either both succeed or both fail. The naive approach is two writes in application code, but no combination of try/catch, transactions, or retries across two systems solves that reliably. The failure window always exists: one write succeeds, the other doesn’t, and the system is silently inconsistent. This is the dual write problem.

We solved it with the transactional outbox pattern. Instead of publishing to Kafka from application code, we write both the domain entity and the intended event into the database within a single atomic transaction. A separate process tails the database’s change stream, picks up new outbox entries, and publishes them to Kafka. This is Change Data Capture (CDC), and Debezium is the tool we use to implement it. The database transaction guarantees consistency; Debezium handles reliable delivery.

The trade-off: we’re adding an infrastructure component (Debezium/Kafka Connect), introducing eventual consistency (events are published asynchronously, not synchronously with the write), and accepting at-least-once delivery semantics that require idempotent consumers. For our case, where data consistency and auditability are non-negotiable, this trade was worth it.

What the Outbox Pattern Guarantees (and What It Doesn’t)

The contract we sign up for has two sides.

What we get:

Atomic persistence of state + event intent. The database transaction guarantees that a domain entity is never written without its corresponding outbox event (and vice versa). The system won’t silently drop events or create phantom state.
At-least-once delivery to Kafka. Debezium tracks its offset in Kafka. If the connector crashes and restarts, it replays from the last committed offset. Events may be delivered more than once, but they won’t be lost.
Ordering per aggregate. Because aggregateid is used as the Kafka partition key, all events for the same business entity arrive in order on the same partition.

What we don’t get:

Exactly-once end-to-end delivery. Debezium provides at-least-once semantics. Connector restarts, rebalances, and snapshot replays can all produce duplicate events. This is by design: at-least-once is the safe default in distributed systems.
Immediate consistency. Events are published asynchronously. There is a window (typically milliseconds to low seconds, longer under load) where the database has state that Kafka consumers haven’t seen yet.
Cross-aggregate ordering. Events for different aggregates may land on different partitions and be consumed in any order. Global ordering requires a single partition, which kills parallelism.

How the Outbox Pattern Works

The core idea: stop writing to two systems. Write to one.

Instead of publishing directly to Kafka, we write both the domain entity and the event we intend to publish into the database, within a single atomic transaction. It’s still two writes, but they share a transaction boundary, and the database guarantees both succeed or both roll back. A separate process, the message relay, reads new entries from the outbox table and publishes them to Kafka.

function placeOrder(order) {
  session = db.startTransaction();
  try {
    outboxEvent = buildOutboxEvent(order);
    db.orders.save(order, session);
    db.outbox.save(outboxEvent, session);
    session.commit();
  } catch (err) {
    session.rollback();
  }
}

The database transaction guarantees atomicity: either both the order and the outbox entry are written, or neither is. The Kafka publish is now decoupled. It happens asynchronously via the message relay, which can retry independently without risking inconsistency.

Outbox Table Design

The outbox collection is append-only. Once an entry is written, it is never mutated. This simplifies reasoning about consistency and makes the collection safe for CDC consumers.

A typical outbox document in MongoDB:

{
  "_id": ObjectId("..."),
  "aggregateid": "order-1",
  "aggregatetype": "orders",
  "type": "OrderPlaced",
  "payload": {
    "orderId": "order-1",
    "customerId": "customer-42",
    "items": [
      { "sku": "WIDGET-001", "qty": 2, "price": 19.99 },
      { "sku": "GADGET-007", "qty": 1, "price": 49.99 }
    ],
    "totalAmount": 89.97,
    "placedAt": "2025-03-02T10:30:00Z"
  }
}

Key fields (note: the lowercase field names aggregateid, aggregatetype follow Debezium’s Outbox Event Router conventions; these are the names the router expects by default):

aggregateid: The business identifier. This becomes the Kafka message key, ensuring all events for the same aggregate land on the same partition (preserving ordering within that partition; cross-aggregate ordering is not guaranteed).
aggregatetype: Determines the destination Kafka topic (e.g., "orders" → topic orders).
payload: The event body. We shape the event for consumers, not for internal storage.

Why Not Something Simpler?

The simpler approaches all break for the same reason: every one of them has a failure window where one write succeeds and the other doesn’t.

Approach 1: Save, Then Publish

function placeOrder(order) {
  db.orders.save(order);
  kafka.publish("orders", order);
}

If db.save succeeds but kafka.publish fails (broker down, network partition, timeout), the database has an order that no downstream consumer will ever see. The event is lost. We can’t roll back a committed write without another round-trip, and even that introduces its own failure window.

Approach 2: Flip the Order

function placeOrder(order) {
  kafka.publish("orders", order);
  db.orders.save(order);  // ← what if this fails?
}

Now if Kafka succeeds but the DB write fails, downstream services process an event for an order that was never persisted. The inconsistency is just mirrored. Flipping the order doesn’t change the fundamental problem.

Approach 3: Wrap in a Database Transaction

function placeOrder(order) {
  session = db.startTransaction();
  try {
    db.orders.save(order, session);
    kafka.publish("orders", order);   // ← success
    session.commit();                 // ← what if this fails?
  } catch (err) {
    session.rollback();
  }
}

Database transactions only guarantee atomicity for database operations. Kafka has its own transaction mechanism, but it doesn’t span beyond Kafka to external systems. If kafka.publish succeeds and then session.commit() fails, we can roll back the database, but the Kafka event is already in the topic. Consumers have it. We can’t un-publish it.

Note on Kafka Transactions: Kafka transactions guarantee atomicity across multiple Kafka topics (e.g., a message lands in topic A and topic B, or neither). They do not extend beyond Kafka to external systems like databases. There is no distributed transaction spanning Kafka and MongoDB.

Approach 4: Retries

function placeOrder(order) {
  db.orders.save(order);  // ✓ committed

  while (attempts < MAX_RETRIES) {
    try {
      kafka.publish("orders", order);
      return;
    } catch (err) {
      attempts++;
    }
  }
}

This handles transient Kafka failures, but only if the process stays alive. If the pod crashes after the DB write but before Kafka publish succeeds, the event exists only in memory. It’s gone. And if retries are exhausted because the Kafka cluster is genuinely unavailable, we’re back to an inconsistent state with no recovery path.

Every approach that involves writing to two systems in application code has a failure window where one write succeeds and the other doesn’t. The outbox pattern eliminates this by never writing to two systems. The database transaction is the single source of truth.

Implementing the Message Relay

The outbox pattern shifts the problem from atomic two-system writes to reliably reading new outbox entries and publishing them to Kafka. We considered two approaches.

Option 1: Polling

A polling service queries the outbox table on a fixed interval, fetches new records, publishes them to Kafka, and marks them as sent.

Pros:

Easy to implement. A query-on-interval loop is well-trodden ground.
We own the implementation end-to-end: query logic, retry strategy, error handling.
No dependency on database internals: we only need the query language.

Cons:

Latency: A 10-second polling interval means events are delivered up to 10 seconds late. For real-time processing, that’s a problem.
Constant database load: The poller hammers the database even when there are no new events. Under high throughput, when our MongoDB cluster is already at 80% CPU, these extra queries add meaningful load.
Scaling is hard: Spinning up multiple poller instances means they all run the same query. We need distributed locking to avoid duplicate publishes, which reintroduces coordination complexity.
State management paradox: The poller needs to track which records it has already published (typically via a status column). But updating that status after a Kafka publish is… another dual write. We’re back to the same problem we were trying to solve.

Option 2: Transaction Log Tailing

Instead of polling a table, we tail the database’s internal transaction log and capture change events directly. This is event-driven. We react to writes as they happen, with no polling interval and no additional queries.

Each database exposes its log differently:

Database	Log Mechanism
MongoDB	Oplog / Change Streams
PostgreSQL	WAL (Write-Ahead Log)
MySQL	Binlog

For MongoDB, Change Streams provide a high-level API over the oplog. We subscribe to changes on a specific collection (or database, or entire cluster), apply filters, and get events pushed to our cursor. Change Streams also provide a resume token so we can restart from exactly where we left off after a crash.

We picked transaction log tailing. It adds no load and doesn’t suffer from the state management paradox that plagues polling. The catch: building a reliable log-tailing service ourselves means handling resume points across multiple application instances, managing state, implementing retries, and dealing with oplog rotation. None of that is trivial to build, test, and maintain.

That’s where Debezium comes in.

Debezium: CDC for the Outbox Pattern

Debezium is an open-source distributed platform for Change Data Capture, built on top of Kafka Connect:

Kafka Connect is the framework. It handles task scheduling, offset management, fault tolerance, and data conversion. It doesn’t know about specific databases.
Debezium provides database-specific source connectors that plug into Kafka Connect. For MongoDB, the Debezium connector uses Change Streams (or direct oplog tailing) to capture changes.

Debezium also ships a purpose-built Outbox Event Router SMT (Single Message Transform) for MongoDB that maps outbox document fields to Kafka message properties (topic routing, partition key, payload extraction) without custom code.

How Debezium Manages State

Debezium stores its state in three internal Kafka topics:

Config topic: Connector configurations and task assignments across Kafka Connect worker nodes.
Offset topic: Tracks the last processed position (e.g., Change Stream resume token) per connector. On restart, the connector resumes from exactly where it left off.
Status topic: Health and lifecycle state of each connector (running, paused, failed).

Because state lives in Kafka (not in the connector process), Debezium is stateful in data, stateless in deployment. We can restart or reschedule connector tasks across nodes without losing our place.

Configuration Deep Dive

The Connector Configuration

Connectors are registered via the Kafka Connect REST API (POST /connectors). The fields that matter for the outbox pattern:

{
  "name": "mongo-outbox-connector",
  "config": {
    "connector.class": "io.debezium.connector.mongodb.MongoDbConnector",
    "tasks.max": "1",
    "collection.include.list": "demo.outbox",
    "capture.mode": "change_streams_update_full",
    "snapshot.mode": "no_data",

    // ── Outbox Event Router ──
    "transforms": "outbox",
    "transforms.outbox.type":
      "io.debezium.connector.mongodb.transforms.outbox.MongoEventRouter",

    // Route to Kafka topic based on the "aggregatetype" field value
    "transforms.outbox.route.by.field": "aggregatetype",
    "transforms.outbox.route.topic.replacement": "${routedByValue}",

    // Map outbox document fields to Kafka message properties
    "transforms.outbox.collection.field.event.id": "_id",
    "transforms.outbox.collection.field.event.key": "aggregateid",
    "transforms.outbox.collection.field.event.payload": "payload",
    "transforms.outbox.collection.expand.json.payload": "true"
  }
}

The remaining config (connection string, topic prefix, converters) is standard Debezium/Kafka Connect boilerplate.

Without the Outbox Event Router, Debezium would emit raw CDC envelopes (with before/after state, source metadata, etc.) to a single topic named after the collection. The router transforms outbox documents into clean domain events and routes them to the correct topic based on aggregatetype.

Three configuration choices matter most:

snapshot.mode: Use "initial" to backfill all existing records when setting up Debezium on an existing collection. Use "no_data" to capture only changes going forward.
tasks.max: This should match the number of MongoDB shards, not replicas. Each task reads the change stream from one shard in parallel. For an unsharded deployment, keep this at 1.
aggregateid as the Kafka key: This ensures all events for the same business entity land on the same Kafka partition, preserving ordering per aggregate. Note that ordering is only guaranteed within a partition; events across different aggregates may be consumed in any order.

Registering the Connector

curl -s -X POST \
  -H "Accept: application/json" \
  -H "Content-Type: application/json" \
  http://localhost:8083/connectors \
  -d @connector.json | jq .

Once registered, insert a document into the outbox collection:

db.getSiblingDB("demo").outbox.insertOne({
  aggregateid:   "order-1",
  aggregatetype: "orders",
  type:          "OrderPlaced",
  payload: {
    orderId:    "order-1",
    customerId: "customer-42",
    items: [
      { sku: "WIDGET-001", qty: 2, price: 19.99 },
      { sku: "GADGET-007", qty: 1, price: 49.99 }
    ],
    totalAmount: 89.97,
    placedAt:    new Date().toISOString()
  }
});

The event appears on the orders Kafka topic (routed by aggregatetype) with order-1 as the partition key (from aggregateid).

Operational Considerations

Retention and TTL

The outbox table doesn’t need to hold data forever. In MongoDB, a TTL index on a timestamp field auto-expires entries. A 7-day retention window is long enough to debug and replay failures, short enough to keep collection size manageable.

Be aware that MongoDB’s TTL deletion is best-effort: a background task runs roughly every 60 seconds and removes up to ~50,000 expired records per pass. It doesn’t guarantee timely cleanup. At high throughput (millions of events per day), this means TTL deletion can fall significantly behind inserts, causing the collection to grow unbounded until the backlog clears.

Strategies to manage this include using time-bucketed collections (one collection per day or week) and dropping entire collections once they expire, which is far cheaper than row-by-row deletion.

TTL + CDC edge case: TTL deletions are real write operations. They appear in the Change Stream. If we tail the outbox collection with Debezium, consumers receive delete events for expired documents. Consumers must either filter on operation type (only process inserts) or handle deletes gracefully. The Outbox Event Router handles this by default, but it matters when building a custom CDC pipeline.

Oplog Rotation

MongoDB’s oplog is a capped collection with a fixed size. Under heavy write load, older oplog entries are overwritten. If Debezium’s connector falls behind (e.g., due to a prolonged outage or high MongoDB load), its resume token may reference an oplog entry that no longer exists. The connector will fail to resume and require re-snapshotting.

Mitigation strategies:

Size the oplog for write throughput and maximum expected downtime.
Monitor the lastEvent timestamp from Debezium’s JMX metrics to detect lag early.
Use snapshot.mode: initial as a recovery mechanism: re-register the connector to re-read the full collection state.

Event Delay and Backpressure

When MongoDB is under heavy load (high CPU, many concurrent operations), Change Stream delivery can lag. This is inherent to the shared oplog. Change Streams are not a dedicated real-time channel; they read from the same oplog that serves replication. Monitor end-to-end latency (outbox insert timestamp vs. Kafka message timestamp) as a key operational metric.

The same applies in reverse: if Kafka is slow to accept writes (broker overload, replication lag), Debezium’s internal buffers grow and connector throughput drops. The connector won’t lose events (offsets only advance after successful Kafka writes), but the growing lag means our system is running further behind real-time. We set up alerts on connector lag and buffer utilization.

Deployment

For production, Debezium Connect can be deployed via:

Docker image (quay.io/debezium/connect): ships with all database connector plugins pre-installed.
Strimzi Operator: for Kubernetes-native deployments with managed Kafka Connect clusters, rolling upgrades, and connector CRD management.

Key monitoring metrics (exposed via JMX on Kafka Connect):

lastEvent: timestamp of the last captured change event.
snapshotCompleted: whether the initial snapshot phase has finished.
Connector status: running, paused, or failed (available via the REST API at /connectors/{name}/status).

What We Still Need to Handle

The outbox pattern and Debezium solve the producer-side consistency problem. But they push certain responsibilities to the consumer side that we can’t ignore.

Idempotent Consumers

Since Debezium delivers at-least-once, consumers will see duplicate events. Common causes: connector restarts, task rebalances, re-snapshotting after oplog rotation, or Kafka consumer group rebalancing. Every consumer must be able to process the same event twice without side effects.

Two approaches:

Event ID deduplication: Use the outbox document’s _id (which maps to the Kafka message’s event ID via the Outbox Event Router) as a deduplication key. Track processed event IDs in the consumer’s own database and skip duplicates. This works cleanly but requires storage.
Idempotent operations by design: When the consumer’s action is naturally idempotent (e.g., setting a field to a value rather than incrementing it), duplicates are harmless without explicit tracking.

Dead-Letter Handling

When a consumer fails to process an event after retries (poison pill messages, schema mismatches, unexpected payloads), we need somewhere for those events to go. A dead-letter topic or collection prevents a single bad event from blocking all downstream processing. This isn’t specific to the outbox pattern, but it matters most in at-least-once systems where we can’t afford to lose the broader event stream while debugging one failure.

Wrapping Up

The dual write problem is not a corner case. It’s a structural issue in any system that writes to a database and publishes to a message broker. The transactional outbox pattern addresses it at the architectural level by collapsing two writes into one atomic database transaction, and decoupling the event publishing into a separate, retryable process. Debezium eliminates the need to build and maintain that relay process ourselves.

Thanks for reading. Have a great day!