
[EDD] Event-Driven Design

Event-Driven Design (EDD) is an architecture in which the system reflects and communicates important business changes through events: services react to what has happened instead of being invoked through direct calls.

EDD is a good fit for systems that need to be distributed, scalable, and resilient.

💡

Assume failures, design for them, and use events as the language of the business. EDD is often used alongside DDD (Domain-Driven Design).


What is an event?

An event represents a domain fact that has already occurred and cannot be changed.

Examples: OrderCreated, PaymentConfirmed, UserRegistered

⚠️

An event is not an intention; it is a statement about the past.
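A minimal sketch of what such an event might look like in TypeScript; the envelope fields (eventId, type, occurredAt, payload) are illustrative, not a standard schema:

```typescript
// Illustrative event envelope: an immutable record of a domain fact.
interface DomainEvent<T> {
  readonly eventId: string;    // unique id, later used for idempotency checks
  readonly type: string;       // past-tense name: the fact that occurred
  readonly occurredAt: string; // ISO-8601 timestamp
  readonly payload: T;         // immutable business data
}

const orderCreated: DomainEvent<{ orderId: string; total: number }> = {
  eventId: "evt-001",
  type: "OrderCreated",
  occurredAt: "2026-01-22T12:00:00Z",
  payload: { orderId: "ord-123", total: 49.9 },
};
```

The past-tense `type` is the point: the event records something that already happened, so consumers never negotiate, they only react.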


Core components in EDD

| Component | Description | Responsibilities | Examples |
| --- | --- | --- | --- |
| Broker | Ensures event delivery | Message persistence; retries; ordering within a partition | Kafka, RabbitMQ |
| Producer | Service that emits the event | Generate the event after a domain change; send the event to the broker | — |
| Consumer | Service that processes the event | Process the event correctly; be idempotent; handle retries; send ACK to the broker | — |

Eventual Consistency

⚠️

In distributed systems like those using EDD, not all parts of the system are consistent at the same time. Each service processes events at its own pace. Over time, all converge to the same state.

Example: when an OrderCreated event is published, the billing service may already have processed it while the inventory service is still catching up, so for a moment the order exists without reserved stock.

States may differ temporarily, but the system eventually becomes consistent.
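The idea can be sketched with two in-memory read models consuming the same event log at different speeds (all names here are illustrative):

```typescript
// Illustrative only: two "services" consume the same event log at different
// paces; their states differ temporarily but converge once both catch up.
type OrderEvent = { type: "OrderCreated"; orderId: string };

const log: OrderEvent[] = [
  { type: "OrderCreated", orderId: "A" },
  { type: "OrderCreated", orderId: "B" },
];

class Projection {
  orders = new Set<string>();
  private offset = 0; // how far into the log this service has processed
  poll(n: number): void {
    for (const e of log.slice(this.offset, this.offset + n)) {
      this.orders.add(e.orderId);
    }
    this.offset = Math.min(this.offset + n, log.length);
  }
}

const fast = new Projection();
const slow = new Projection();

fast.poll(2); // fast service already sees both orders
slow.poll(1); // slow service lags: states differ temporarily
slow.poll(1); // ...later it catches up and both converge
```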


Failures in distributed systems

⚠️

In distributed systems, failures are not an exception, they are the norm.

Common principles:

  • At-most-once: the event may be lost; it is never delivered twice.
  • At-least-once: the event is delivered at least once, but may arrive more than once; the broker and consumer must handle duplicates, so the consumer must be idempotent (the final effect is the same whether the event is delivered once or several times).
  • Exactly-once: the event is received exactly once; this is rare and expensive to guarantee, so it cannot be assumed.

Core Principles

1. Events reflect domain state changes

They are only emitted when something relevant to the business changes.

2. Events are immutable

Once emitted, they do not change. If something changes, a new event is emitted.

3. Idempotency

A consumer must be able to process the same event multiple times without side effects.
This is critical because:

  • Distributed systems fail
  • Retries occur
  • Duplicates may exist

Processing N times must produce the same result as processing once.

Learn more at Idempotency in Distributed Systems.


4. Event ordering

In some flows, order matters.

This is achieved with partitions: within a partition, events are processed in FIFO order.

Example: for events per customer, the partition key would be the customer ID (client_id), so all of a customer's events land in the same partition and are consumed in order.
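Key-based partition selection can be sketched as follows; the hash below is a simple stand-in for the broker's real partitioner (Kafka's default, for instance, uses murmur2):

```typescript
// Sketch of key-based partition selection: hash the key, mod the partition
// count. The hash here is illustrative, not Kafka's actual algorithm.
function partitionFor(key: string, numPartitions: number): number {
  let hash = 0;
  for (const ch of key) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0; // keep it unsigned 32-bit
  }
  return hash % numPartitions;
}

// The same client_id always maps to the same partition, so that client's
// events are consumed in FIFO order relative to each other.
const p1 = partitionFor("client-42", 6);
const p2 = partitionFor("client-42", 6);
```

With a real client library, the same effect is obtained by setting the message key to `client_id` and letting the broker's partitioner do the hashing.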


5. Durability and reliability

Events must not be lost.

Achieved through: persistent brokers, logs, and database persistence.


6. Security

🚫

Events do not expose sensitive data or internal schemas; they follow clear contracts (schema, version).

Authentication applies to producers and consumers, not to events. 👉 It does not occur in Kafka or in the event itself. 👉 It occurs when a microservice exposes or consumes a synchronous API.

Events are not authenticated; they are authorized/secured by infrastructure:

  • Infrastructure security:

    • TLS
    • SASL
    • Kafka ACLs: an ACL (Access Control List) defines which identity can perform which action on which resource.
      • Example: a per-topic ACL is a rule like: “Service X can PRODUCE to topic Y, but not CONSUME from it.”
  • Authorization by:

    • Topic
    • Consumer group
    • Service

Kafka ensures:

  • Who can publish
  • Who can consume
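Such rules can be created with Kafka's `kafka-acls` CLI; the principal, topic, and group names below are illustrative (and the broker must be configured with an authorizer for them to take effect):

```shell
# Allow service-x to produce to the "orders" topic (names are examples)
kafka-acls.sh --bootstrap-server localhost:9092 \
  --add --allow-principal User:service-x \
  --operation Write --topic orders

# Allow service-y to consume from "orders" with its consumer group
kafka-acls.sh --bootstrap-server localhost:9092 \
  --add --allow-principal User:service-y \
  --operation Read --topic orders --group service-y-consumers
```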

7. Observability

👀

It should be possible to:

  • Event Trace: trace an event end-to-end
  • Debug failures
  • Correlate events
💡
  • Events → what happened (business event)
  • Traces/logs → how it happened (flow and processing time)

Example of traces for a CustomerCreated event published by a microservice:

{
  "event": "CustomerCreated",
  "customerId": "123",
  "traceId": "trace-1001",  // the entire creation operation
  "spanId": "span-02",      // this specific step: publishing the event
  "createdAt": "2026-01-22T12:00:00Z"
}
  • traceId connects the event through all microservices.
  • spanId shows exactly which step produced this event.

If you only include traceId without spanId, you can still track the flow, but you won’t be able to measure latency or pinpoint which step generated each event.


Idempotency in depth

As mentioned before, in distributed systems failures are normal, so events may be duplicated due to:

🚫

Service crashes/restarts

🚫

Network failures

🚫

Broker re-sends or producer retries

⚠️

It is the consumer's responsibility to protect against these cases.

Common techniques

    1. Idempotency key

    💡

    Store the event_id (for example in an ingested_events table) and ignore the event if it has already been processed.

    2. Safe modification operations

    💡

    Use append/patch (upsert-style) operations instead of blind inserts, so reprocessing the same event does not create duplicates.
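A minimal sketch of the idempotency-key technique, using an in-memory set in place of a real ingested_events table (in production the check and the write must be atomic, e.g. via a unique constraint on event_id):

```typescript
// In-memory stand-in for the ingested_events table.
const ingestedEvents = new Set<string>();
let balance = 0; // the domain state the event mutates

function processEvent(eventId: string, amount: number): boolean {
  if (ingestedEvents.has(eventId)) {
    return false; // duplicate delivery: already processed, ignore it
  }
  balance += amount;           // the actual side effect
  ingestedEvents.add(eventId); // record the event_id as processed
  return true;
}

processEvent("evt-1", 100); // first delivery: applied
processEvent("evt-1", 100); // retry/duplicate: ignored, balance unchanged
```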


    Event Sourcing

    Instead of storing the state as a snapshot (the final value), every event that caused a state change is stored, and the current state is rebuilt by replaying those events.

    💡

    Characteristics:

    • Events are the source of truth
    • State can be recalculated
    • Full audit trail

    Not always necessary, but fits very well with EDD.
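A small sketch of replaying events to rebuild state (event names and amounts are made up):

```typescript
// The event log is the source of truth; the balance is derived, never stored.
type AccountEvent =
  | { type: "Deposited"; amount: number }
  | { type: "Withdrawn"; amount: number };

const events: AccountEvent[] = [
  { type: "Deposited", amount: 100 },
  { type: "Withdrawn", amount: 30 },
  { type: "Deposited", amount: 50 },
];

// Replaying the log recalculates the current state: 100 - 30 + 50 = 120.
const balance = events.reduce(
  (acc, e) => (e.type === "Deposited" ? acc + e.amount : acc - e.amount),
  0
);
```

The same replay also serves as the audit trail: every intermediate state can be reconstructed by stopping the fold at any point in the log.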


    How to prevent event loss

    On the broker

    • Persistent broker (Kafka / RabbitMQ): events are persisted.
    • Recovery after crashes: events are resent from the broker's last known state.
    • Partitions for ordering: events in a partition are processed in FIFO order.
    • Dead Letter Queue (DLQ): failed events are sent to this queue for later detailed evaluation (even manually).

    What is a DLQ?

    A queue that allows: reprocessing failed events and analyzing errors.
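A sketch of a retry-then-DLQ policy with in-memory queues; the retry count and queue shape are illustrative:

```typescript
// Failed events are retried a few times, then parked in the DLQ for later
// analysis or manual reprocessing.
const dlq: { event: string; error: string }[] = [];
const MAX_RETRIES = 3;

function consume(event: string, handler: (e: string) => void): void {
  for (let attempt = 1; attempt <= MAX_RETRIES; attempt++) {
    try {
      handler(event);
      return; // processed successfully, nothing to park
    } catch (err) {
      if (attempt === MAX_RETRIES) {
        dlq.push({ event, error: String(err) }); // retries exhausted
      }
    }
  }
}

consume("evt-ok", () => {});                            // succeeds, no DLQ entry
consume("evt-bad", () => { throw new Error("boom"); }); // fails 3 times, goes to DLQ
```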


    In the system

    • Persistent logs: Cloudwatch / ELK / Loki, etc.
    • Monitoring: allows visualization and analysis of events in real-time and over time (Prometheus/Grafana).
    • Alerts: make it possible to detect erroneous behavior early, before it spreads.

    On the consumer

    Typical flow:

    1. Receive the event from the broker
    2. Check idempotency: skip the event if its event_id was already processed
    3. Process the event and persist the resulting change
    4. Send ACK to the broker (on failure, retry or route the event to the DLQ)

    On the producer

    The Outbox Pattern applies

    Outbox Pattern

    Prevents losing events when the DB and broker are not synchronized.

    The Outbox Pattern is applied by any service that persists a domain change and, as a consequence, must publish a new event.

    In the flow described here, that actor is the consumer that becomes a producer: it receives an event, persists the change, and then emits a new one.


    The problem it solves

    A common event-driven flow is:

    1. A microservice consumes an event
    2. It persists a domain state change
    3. It publishes a new event to the broker

    If the service crashes between steps 2 and 3:

    • The domain state is already persisted
    • The outgoing event is never published
    • The system becomes inconsistent

    Role of the Outbox-enabled service

    The service:

    • Consumes upstream events
    • Persists domain state changes
    • Produces downstream events

    The Outbox Pattern ensures these responsibilities are handled safely.


    How it works

    1. The service receives an event
    2. A database transaction is started
    3. The domain change is persisted
    4. The outgoing event is written to an outbox table
    5. The transaction is committed

    A separate background worker:

    • Polls the outbox table
    • Publishes pending events to the broker
    • Marks events as SENT

    Typical outbox states

    • PENDING: waiting to be published
    • SENT: successfully published
    • (optional) FAILED: failed after retries
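The steps above can be sketched with in-memory stand-ins for the tables and the broker; in a real implementation, steps 2-5 run inside a single database transaction:

```typescript
// Outbox Pattern sketch: the "transaction" writes the domain change and the
// outbox row together; a worker later publishes pending rows to the broker.
type OutboxRow = { id: string; payload: string; status: "PENDING" | "SENT" };

const ordersTable: string[] = [];    // domain table
const outboxTable: OutboxRow[] = []; // outbox table (same database)
const broker: string[] = [];         // stands in for Kafka/RabbitMQ

// Steps 2-5: both writes succeed together (or neither, in a real DB transaction).
function handleOrderCreated(orderId: string): void {
  ordersTable.push(orderId); // 3. persist the domain change
  outboxTable.push({         // 4. write the outgoing event to the outbox
    id: `evt-${orderId}`,
    payload: `OrderConfirmed:${orderId}`,
    status: "PENDING",
  });
  // 5. commit
}

// Background worker: polls the outbox, publishes, marks rows as SENT.
function outboxWorker(): void {
  for (const row of outboxTable) {
    if (row.status === "PENDING") {
      broker.push(row.payload);
      row.status = "SENT";
    }
  }
}

handleOrderCreated("ord-1");
outboxWorker();
```

Real implementations publish outbox rows either with a polling worker like this one or via change-data-capture tooling (e.g. Debezium) tailing the outbox table.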

    What this pattern provides

    • Guarantees that if the domain state exists, the event will eventually exist
    • Enables safe retries
    • Prevents event loss
    • Decouples domain logic from broker availability

    Relationship with idempotency

    Because delivery is usually at-least-once:

    • Downstream consumers must be idempotent
    • The Outbox Pattern guarantees delivery, not uniqueness

    Both patterns complement each other.

    💡
    In the Outbox Pattern, if the broker goes down, events remain in the outbox table with status PENDING and are retried when the broker comes back. If the publisher crashes right after committing the transaction, the events are already in the outbox, so no events are lost: the worker sends them when the publisher restarts.