NestJS, Kafka, Microservices, Event Sourcing, CQRS

Building Scalable Event-Driven Architecture with NestJS and Kafka

Bao Trong

January 20, 2026

25 min read

Introduction

Event-driven architecture (EDA) has become the backbone of modern distributed systems. Unlike traditional request-response patterns where Service A directly calls Service B and waits for a response, EDA enables services to communicate asynchronously through events. This fundamental shift in how services interact leads to better scalability, loose coupling, and resilience.

In this comprehensive deep dive, we'll build a production-ready event-driven system using NestJS and Apache Kafka, implementing patterns like Event Sourcing and CQRS (Command Query Responsibility Segregation). By the end of this article, you'll understand not just the "how" but also the "why" behind each architectural decision.

Why Event-Driven Architecture?

Before diving into implementation, let's understand the problems that event-driven architecture solves. Traditional monolithic and synchronous microservice architectures face several challenges at scale:

The Problems with Synchronous Communication

Tight Coupling: In a synchronous system, when Service A needs data from Service B, it makes a direct HTTP call. This makes Service A depend on Service B's API contract and on its availability at runtime. If Service B changes its API, Service A must be updated. If Service B is slow, Service A is slow. If Service B is down, Service A fails.

Cascading Failures: Imagine a checkout flow: Order Service → Payment Service → Inventory Service → Notification Service. If Notification Service is down, the entire checkout fails, even though the payment was successful. This is the infamous "distributed monolith" anti-pattern.

Scaling Limitations: In monolithic systems, you must scale the entire application even if only one component is under load. In synchronous microservices, scaling one service often requires scaling its dependencies too.

Temporal Coupling: Both services must be available at the same time. This seems obvious, but in distributed systems, "at the same time" is harder than it sounds due to network partitions, deployments, and failures.

How Event-Driven Architecture Solves These Problems

Event-driven architecture addresses these challenges by introducing a message broker (like Kafka) between services. Instead of Service A calling Service B directly:

Service A publishes an event to the broker: "OrderCreated"
The broker stores and distributes this event
Service B (and C, D, E...) subscribe to this event and react asynchronously

This simple change has profound implications:

Loose Coupling: Services don't know about each other; they only know about events
Resilience: If Service B is down, events queue up and are processed when it recovers
Scalability: Each service scales independently based on its own load
Temporal Decoupling: Services don't need to be available simultaneously
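
To make the publish/subscribe flow concrete, here is a minimal sketch using a toy in-memory broker. `InMemoryBroker` and its methods are hypothetical names for illustration, not part of Kafka or NestJS, and unlike a real broker it delivers synchronously and stores nothing.

```typescript
// A toy in-memory broker: illustrative only, not Kafka's API.
type Handler = (payload: Record<string, unknown>) => void;

class InMemoryBroker {
  private readonly handlers: Record<string, Handler[]> = {};

  subscribe(eventType: string, handler: Handler): void {
    const list = this.handlers[eventType] || [];
    list.push(handler);
    this.handlers[eventType] = list;
  }

  // The publisher never sees who is subscribed; it only gets a delivery count.
  publish(eventType: string, payload: Record<string, unknown>): number {
    const subscribers = this.handlers[eventType] || [];
    subscribers.forEach((handler) => handler(payload));
    return subscribers.length;
  }
}

const broker = new InMemoryBroker();
const inventoryLog: string[] = [];
const notificationLog: string[] = [];

// Two downstream services react to the same event independently.
broker.subscribe('OrderCreated', (p) => {
  inventoryLog.push(`reserve stock for ${String(p.orderId)}`);
});
broker.subscribe('OrderCreated', (p) => {
  notificationLog.push(`email customer about ${String(p.orderId)}`);
});

// Service A publishes once and knows nothing about the consumers.
const deliveredTo = broker.publish('OrderCreated', { orderId: 'order-1' });
```

Adding a third consumer means one more `subscribe` call; the publishing code does not change. What the toy leaves out is durability: Kafka persists the event, so a consumer that is down can catch up later.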

Figure: Event-Driven Architecture Flow

The diagram shows how a single event from Service A can trigger reactions in multiple downstream services through Kafka, without Service A knowing or caring about who consumes its events.

Understanding Apache Kafka

Before we write code, let's understand Kafka's core concepts that make it ideal for event-driven systems:

Topics: Named channels where events are published. Think of them as categories: "order-events", "user-events", "payment-events".

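Partitions: Topics are divided into partitions for parallelism. Events with the same key (e.g., order ID) go to the same partition as long as the partition count stays the same, which preserves their relative order. Kafka guarantees ordering only within a partition, not across the whole topic.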
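
The "same key, same partition" rule can be sketched in a few lines. The hash below is a simplified stand-in chosen for readability; Kafka clients use their own partitioner, so `partitionFor` is a hypothetical helper and the partition numbers it produces will not match a real cluster.

```typescript
// Simplified stand-in for a partitioner. It only demonstrates that a given
// key always maps to the same partition for a fixed partition count.
function partitionFor(key: string, partitionCount: number): number {
  let hash = 0;
  for (let i = 0; i < key.length; i++) {
    hash = (hash * 31 + key.charCodeAt(i)) % 2147483647;
  }
  return hash % partitionCount;
}

// Three partitions, each an append-only list of "key:eventType" entries.
const partitions: string[][] = [[], [], []];

function send(key: string, eventType: string): void {
  partitions[partitionFor(key, partitions.length)].push(`${key}:${eventType}`);
}

send('order-1', 'OrderCreated');
send('order-2', 'OrderCreated');
send('order-1', 'OrderConfirmed');
send('order-1', 'OrderShipped');

// All events for order-1 sit in one partition, in the order they were sent.
const order1Events = partitions[partitionFor('order-1', partitions.length)]
  .filter((entry) => entry.indexOf('order-1:') === 0);
```

Events for `order-2` may land in the same partition or a different one; either way, the relative order of the `order-1` events is preserved.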

Consumer Groups: Multiple instances of a service form a consumer group. Kafka ensures each partition is consumed by only one instance in the group, enabling horizontal scaling.

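Offsets: Each message has an offset (position) in its partition. Consumers track their offset, which lets them resume where they left off after a restart and rewind to replay earlier messages. Offsets alone give at-least-once delivery; exactly-once processing additionally requires Kafka transactions or idempotent handlers.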

Retention: Unlike traditional queues, Kafka retains messages for a configurable period (days/weeks). This enables event replay and rebuilding state.
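
Offsets and retention work together: because reading does not delete a message, a consumer can move its position backwards and read again. The sketch below models one partition as an array whose index is the offset. `OffsetTrackingConsumer` is a hypothetical class for illustration, not a Kafka client.

```typescript
// One partition modelled as an append-only array; the array index is the offset.
const partitionLog: string[] = [
  'OrderCreated',
  'OrderConfirmed',
  'PaymentReceived',
  'OrderShipped',
];

class OffsetTrackingConsumer {
  private readonly log: string[];
  private committedOffset = 0; // next offset to read

  constructor(log: string[]) {
    this.log = log;
  }

  // Read everything from the committed offset, then commit the new position.
  poll(): string[] {
    const batch = this.log.slice(this.committedOffset);
    this.committedOffset = this.log.length;
    return batch;
  }

  // Rewinding is possible because reading does not remove messages.
  seek(offset: number): void {
    this.committedOffset = offset;
  }
}

const consumer = new OffsetTrackingConsumer(partitionLog);
const firstRead = consumer.poll();  // all four events
const nothingNew = consumer.poll(); // empty: the consumer is caught up
consumer.seek(2);
const replayed = consumer.poll();   // PaymentReceived, OrderShipped
```

A traditional queue would have dropped the first two messages after delivery; here they remain available until the retention period expires.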

Setting Up NestJS with Kafka

First, let's set up our NestJS project with Kafka integration:

npm install @nestjs/microservices kafkajs
bash

Kafka Module Configuration

// src/kafka/kafka.module.ts
import { Module } from '@nestjs/common';
import { ClientsModule, Transport } from '@nestjs/microservices';

@Module({
  imports: [
    ClientsModule.register([
      {
        name: 'KAFKA_SERVICE',
        transport: Transport.KAFKA,
        options: {
          client: {
            clientId: 'order-service',
            brokers: ['localhost:9092'],
            ssl: process.env.NODE_ENV === 'production',
            sasl: process.env.NODE_ENV === 'production' ? {
              mechanism: 'scram-sha-256',
              username: process.env.KAFKA_USERNAME,
              password: process.env.KAFKA_PASSWORD,
            } : undefined,
          },
          consumer: {
            groupId: 'order-consumer-group',
            sessionTimeout: 30000,
            heartbeatInterval: 3000,
          },
          producer: {
            allowAutoTopicCreation: false,
            transactionTimeout: 30000,
          },
        },
      },
    ]),
  ],
  exports: [ClientsModule],
})
export class KafkaModule {}
typescript

Implementing Event Sourcing

Event Sourcing is a powerful pattern that fundamentally changes how we think about data storage. Instead of storing the current state of an entity (like "Order status = SHIPPED"), we store every change that happened to it as an immutable sequence of events:

OrderCreated (items: [...], customer: "John")
OrderConfirmed (confirmedAt: "2024-01-20")
PaymentReceived (amount: 150.00)
OrderShipped (trackingNumber: "ABC123")

Why Event Sourcing?

Complete Audit Trail: You have a complete history of every change. This is crucial for financial systems, healthcare, and any domain requiring compliance.

Temporal Queries: You can answer questions like "What was the order status at 3 PM yesterday?" by replaying events up to that point.

Debugging: When something goes wrong, you can replay events to understand exactly what happened and in what order.

Event Replay: You can rebuild read models, fix bugs in projections, or create entirely new views by replaying the event history.

Domain Modeling: Events naturally align with business language. "OrderShipped" is more meaningful than "UPDATE orders SET status = 'shipped'".
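
The temporal-query benefit is easiest to see in code. This is a minimal sketch, assuming events carry ISO 8601 timestamps; the timestamps and the `PAID` status are invented for illustration and do not appear elsewhere in this article.

```typescript
interface TimedEvent {
  eventType: string;
  timestamp: string; // ISO 8601
}

// Event types mirror the example sequence above; the timestamps are made up.
const orderHistory: TimedEvent[] = [
  { eventType: 'OrderCreated', timestamp: '2024-01-20T09:00:00Z' },
  { eventType: 'OrderConfirmed', timestamp: '2024-01-20T10:30:00Z' },
  { eventType: 'PaymentReceived', timestamp: '2024-01-20T14:00:00Z' },
  { eventType: 'OrderShipped', timestamp: '2024-01-20T16:45:00Z' },
];

// Pure reducer: current status plus one event gives the next status.
function applyEvent(status: string, event: TimedEvent): string {
  switch (event.eventType) {
    case 'OrderCreated': return 'PENDING';
    case 'OrderConfirmed': return 'CONFIRMED';
    case 'PaymentReceived': return 'PAID';
    case 'OrderShipped': return 'SHIPPED';
    default: return status;
  }
}

// Replay only the events recorded at or before the requested instant.
function statusAt(events: TimedEvent[], asOf: string): string {
  const cutoff = new Date(asOf).getTime();
  return events
    .filter((event) => new Date(event.timestamp).getTime() <= cutoff)
    .reduce(applyEvent, 'NONE');
}

const statusAt3pm = statusAt(orderHistory, '2024-01-20T15:00:00Z'); // 'PAID'
const statusNow = statusAt(orderHistory, '2024-01-21T00:00:00Z');   // 'SHIPPED'
```

With a state-only table, the 3 PM answer would be gone as soon as the row was updated to `SHIPPED`.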

The Trade-offs

Event Sourcing adds complexity. Consider it when you need:

Complete audit trails (financial, healthcare, legal)
Temporal queries or "time travel"
Complex domains where events capture intent
CQRS (separating read/write models)

Avoid it for simple CRUD applications where the additional complexity isn't justified.

Event Store Implementation

// src/event-store/event-store.service.ts
import { Injectable } from '@nestjs/common';
import { InjectRepository } from '@nestjs/typeorm';
import { Repository } from 'typeorm';
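import { StoredEvent } from './stored-event.entity';

// Thrown when an event does not carry the next expected version for its aggregate
export class ConcurrencyException extends Error {}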

export interface DomainEvent {
  eventType: string;
  aggregateId: string;
  aggregateType: string;
  payload: Record<string, any>;
  metadata: {
    correlationId: string;
    causationId: string;
    userId?: string;
    timestamp: Date;
    version: number;
  };
}

@Injectable()
export class EventStoreService {
  constructor(
    @InjectRepository(StoredEvent)
    private readonly eventRepository: Repository<StoredEvent>,
  ) {}

  async append(event: DomainEvent): Promise<StoredEvent> {
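    // Optimistic concurrency check: the event must carry the next version.
    // This read-then-write is not atomic, so also enforce a unique constraint
    // on (aggregateId, version) in the database to reject concurrent appends.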
    const lastEvent = await this.eventRepository.findOne({
      where: { aggregateId: event.aggregateId },
      order: { version: 'DESC' },
    });

    const expectedVersion = lastEvent ? lastEvent.version + 1 : 1;

    if (event.metadata.version !== expectedVersion) {
      throw new ConcurrencyException(
        `Expected version ${expectedVersion}, got ${event.metadata.version}`
      );
    }

    const storedEvent = this.eventRepository.create({
      eventType: event.eventType,
      aggregateId: event.aggregateId,
      aggregateType: event.aggregateType,
      payload: event.payload,
      metadata: event.metadata,
      version: expectedVersion,
      createdAt: new Date(),
    });

    return this.eventRepository.save(storedEvent);
  }

  async getEvents(
    aggregateId: string,
    fromVersion?: number
  ): Promise<StoredEvent[]> {
    const query = this.eventRepository
      .createQueryBuilder('event')
      .where('event.aggregateId = :aggregateId', { aggregateId })
      .orderBy('event.version', 'ASC');

    if (fromVersion !== undefined) {
      query.andWhere('event.version > :fromVersion', { fromVersion });
    }

    return query.getMany();
  }

  async replayEvents<T>(
    aggregateId: string,
    reducer: (state: T, event: StoredEvent) => T,
    initialState: T,
  ): Promise<T> {
    const events = await this.getEvents(aggregateId);
    return events.reduce(reducer, initialState);
  }
}
typescript
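
To see the version check reject a stale write, here is a minimal in-memory sketch of the same rule. `InMemoryEventStore`, `VersionConflictError`, and the `OrderCancelled` event are hypothetical names used only for this illustration.

```typescript
interface VersionedEvent {
  aggregateId: string;
  eventType: string;
  version: number;
}

class VersionConflictError extends Error {}

class InMemoryEventStore {
  private readonly streams: Record<string, VersionedEvent[]> = {};

  // Same rule as append() above: the event must carry exactly the next version.
  append(event: VersionedEvent): void {
    const stream = this.streams[event.aggregateId] || [];
    const expectedVersion = stream.length + 1;
    if (event.version !== expectedVersion) {
      throw new VersionConflictError(
        `Expected version ${expectedVersion}, got ${event.version}`,
      );
    }
    stream.push(event);
    this.streams[event.aggregateId] = stream;
  }

  count(aggregateId: string): number {
    return (this.streams[aggregateId] || []).length;
  }
}

const store = new InMemoryEventStore();
store.append({ aggregateId: 'order-1', eventType: 'OrderCreated', version: 1 });

// Two writers both loaded the order at version 1 and both try to write version 2.
store.append({ aggregateId: 'order-1', eventType: 'OrderConfirmed', version: 2 });

let conflictMessage = '';
try {
  store.append({ aggregateId: 'order-1', eventType: 'OrderCancelled', version: 2 });
} catch (error) {
  conflictMessage = error instanceof Error ? error.message : String(error);
}
```

The second writer loses and has to reload the aggregate and retry. In a real database the check and the insert are separate statements, so a unique constraint on the aggregate ID and version is what closes the race between them.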

Aggregate Root Pattern

// src/domain/aggregate-root.ts
import { DomainEvent } from '../event-store/event-store.service';

export abstract class AggregateRoot {
  private _uncommittedEvents: DomainEvent[] = [];
  private _version: number = 0;

  get uncommittedEvents(): DomainEvent[] {
    return [...this._uncommittedEvents];
  }

  get version(): number {
    return this._version;
  }

  protected apply(event: DomainEvent): void {
    this.when(event);
    this._uncommittedEvents.push(event);
    this._version++;
  }

  protected abstract when(event: DomainEvent): void;

  public loadFromHistory(events: DomainEvent[]): void {
    events.forEach(event => {
      this.when(event);
      this._version++;
    });
  }

  public clearUncommittedEvents(): void {
    this._uncommittedEvents = [];
  }
}

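// src/domain/order/order.aggregate.ts
export class OrderAggregate extends AggregateRoot {
  // Assigned in when() once the OrderCreated event is applied
  private _id!: string;
  private _status!: OrderStatus;
  private _items: OrderItem[] = [];
  private _totalAmount: number = 0;

  // Exposed so callers such as command handlers can return the new order's ID
  get id(): string {
    return this._id;
  }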

  static create(command: CreateOrderCommand): OrderAggregate {
    const order = new OrderAggregate();
    order.apply({
      eventType: 'OrderCreated',
      aggregateId: command.orderId,
      aggregateType: 'Order',
      payload: {
        customerId: command.customerId,
        items: command.items,
      },
      metadata: {
        correlationId: command.correlationId,
        causationId: command.commandId,
        userId: command.userId,
        timestamp: new Date(),
        version: 1,
      },
    });
    return order;
  }

  confirm(): void {
    if (this._status !== OrderStatus.PENDING) {
      throw new InvalidOperationException('Order must be pending to confirm');
    }

    this.apply({
      eventType: 'OrderConfirmed',
      aggregateId: this._id,
      aggregateType: 'Order',
      payload: { confirmedAt: new Date() },
      metadata: {
        correlationId: generateId(),
        causationId: generateId(),
        timestamp: new Date(),
        version: this.version + 1,
      },
    });
  }

  protected when(event: DomainEvent): void {
    switch (event.eventType) {
      case 'OrderCreated':
        this._id = event.aggregateId;
        this._status = OrderStatus.PENDING;
        this._items = event.payload.items;
        this._totalAmount = this.calculateTotal(event.payload.items);
        break;
      case 'OrderConfirmed':
        this._status = OrderStatus.CONFIRMED;
        break;
      case 'OrderShipped':
        this._status = OrderStatus.SHIPPED;
        break;
    }
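  }

  // Assumes each OrderItem exposes `price` and `quantity`; adjust to the real item shape
  private calculateTotal(items: OrderItem[]): number {
    return items.reduce((sum, item) => sum + item.price * item.quantity, 0);
  }
}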
typescript

CQRS Implementation

CQRS (Command Query Responsibility Segregation) is a pattern that separates read and write operations into different models. This might seem like unnecessary complexity at first, but it solves real problems in complex systems.

Why Separate Reads and Writes?

In traditional architectures, the same model serves both reads and writes. This creates several challenges:

Different Optimization Needs: Writes need to enforce business rules, validate data, and maintain consistency. Reads need to be fast, often joining data from multiple entities. Optimizing for one often hurts the other.

Scaling Asymmetry: Most systems have far more reads than writes (often 100:1 or more). With a single model, you can't scale them independently.

Complex Queries: Read models often need denormalized data (user name + order count + last purchase date). Normalizing for writes means expensive JOINs for reads.

Event Sourcing Compatibility: With Event Sourcing, the write side stores a log of events rather than current state, so there is no table of current rows to query. You need projections (read models) built from events.

How CQRS Works

Command Side (Write Model):

Receives commands (CreateOrder, ConfirmOrder)
Validates business rules
Persists events to the event store
Publishes events to Kafka

Query Side (Read Model):

Subscribes to events
Updates denormalized read models (projections)
Serves fast queries without complex JOINs

The key insight is that the read model is eventually consistent with the write model. After a command is processed, there's a brief delay before the read model reflects the change. For most applications, this delay (typically milliseconds to seconds) is acceptable.
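
The gap between the two models can be shown with a small in-memory sketch. Everything here (`eventLog`, `orderSummaries`, `runProjection`, and the command functions) is a hypothetical stand-in: the array plays the role of the event store and the explicit `runProjection()` call plays the role of the asynchronous consumer.

```typescript
type OrderEvent =
  | { eventType: 'OrderCreated'; aggregateId: string; itemCount: number }
  | { eventType: 'OrderConfirmed'; aggregateId: string };

interface OrderSummary {
  id: string;
  status: string;
  itemCount: number;
}

const eventLog: OrderEvent[] = [];                        // write side: append-only
const orderSummaries: Record<string, OrderSummary> = {};  // read side: denormalized
let projectedUpTo = 0;                                    // projection's position in the log

// Command side: validate, then record what happened.
function createOrder(orderId: string, itemCount: number): void {
  if (itemCount <= 0) {
    throw new Error('An order needs at least one item');
  }
  eventLog.push({ eventType: 'OrderCreated', aggregateId: orderId, itemCount });
}

function confirmOrder(orderId: string): void {
  // The write side validates against its own log, never against the read model.
  const exists = eventLog.some(
    (e) => e.eventType === 'OrderCreated' && e.aggregateId === orderId,
  );
  if (!exists) {
    throw new Error(`Unknown order ${orderId}`);
  }
  eventLog.push({ eventType: 'OrderConfirmed', aggregateId: orderId });
}

// Query side: catch up on events the projection has not seen yet.
function runProjection(): void {
  for (; projectedUpTo < eventLog.length; projectedUpTo++) {
    const event = eventLog[projectedUpTo];
    if (event.eventType === 'OrderCreated') {
      orderSummaries[event.aggregateId] = {
        id: event.aggregateId,
        status: 'PENDING',
        itemCount: event.itemCount,
      };
    } else {
      const summary = orderSummaries[event.aggregateId];
      if (summary) {
        summary.status = 'CONFIRMED';
      }
    }
  }
}

createOrder('order-1', 3);
const visibleBeforeProjection = 'order-1' in orderSummaries; // false
runProjection();
const visibleAfterProjection = 'order-1' in orderSummaries;  // true
confirmOrder('order-1');
runProjection();
```

Between `createOrder` and the first `runProjection` call, the order exists on the write side but a query would not find it. That window is the eventual consistency delay described above.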

Command Side

// src/order/commands/handlers/create-order.handler.ts
import { CommandHandler, ICommandHandler, EventBus } from '@nestjs/cqrs';
import { Inject } from '@nestjs/common';
import { ClientKafka } from '@nestjs/microservices';
import { lastValueFrom } from 'rxjs';

@CommandHandler(CreateOrderCommand)
export class CreateOrderHandler implements ICommandHandler<CreateOrderCommand> {
  constructor(
    private readonly eventStore: EventStoreService,
    private readonly eventBus: EventBus,
    @Inject('KAFKA_SERVICE') private readonly kafkaClient: ClientKafka,
  ) {}

  async execute(command: CreateOrderCommand): Promise<string> {
    // Create aggregate
    const order = OrderAggregate.create(command);

    // Persist events
    for (const event of order.uncommittedEvents) {
      await this.eventStore.append(event);

      // Publish to Kafka for other services. emit() returns an Observable,
      // so convert it to a Promise to actually wait for the send to finish.
      // Storing and publishing are two separate writes: a crash between them
      // leaves the event stored but unpublished.
      await lastValueFrom(
        this.kafkaClient.emit('order.events', {
          key: event.aggregateId,
          value: JSON.stringify(event),
          headers: {
            'event-type': event.eventType,
            'correlation-id': event.metadata.correlationId,
          },
        }),
      );
    }

    order.clearUncommittedEvents();
    return order.id;
  }
}
typescript

Query Side with Projections

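// src/order/projections/order-list.projection.ts
import { EventsHandler, IEventHandler } from '@nestjs/cqrs';
import { InjectRepository } from '@nestjs/typeorm';
import { Repository } from 'typeorm';

// @EventsHandler is a class decorator, so it goes on the class, which implements IEventHandler
@EventsHandler(OrderCreatedEvent, OrderConfirmedEvent, OrderShippedEvent)
export class OrderListProjection implements IEventHandler<DomainEvent> {
  constructor(
    @InjectRepository(OrderReadModel)
    private readonly readRepository: Repository<OrderReadModel>,
  ) {}

  async handle(event: DomainEvent): Promise<void> {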
    switch (event.eventType) {
      case 'OrderCreated':
        await this.readRepository.save({
          id: event.aggregateId,
          customerId: event.payload.customerId,
          status: 'PENDING',
          totalAmount: this.calculateTotal(event.payload.items),
          itemCount: event.payload.items.length,
          createdAt: event.metadata.timestamp,
          updatedAt: event.metadata.timestamp,
        });
        break;

      case 'OrderConfirmed':
        await this.readRepository.update(
          { id: event.aggregateId },
          {
            status: 'CONFIRMED',
            updatedAt: event.metadata.timestamp
          }
        );
        break;
    }
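  }

  // Assumes each item exposes `price` and `quantity`; adjust to the real item shape
  private calculateTotal(items: { price: number; quantity: number }[]): number {
    return items.reduce((sum, item) => sum + item.price * item.quantity, 0);
  }
}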
typescript

Handling Eventual Consistency

In distributed systems, eventual consistency is not just a limitation—it's a fundamental property we must embrace. The CAP theorem tells us we can't have perfect consistency and availability during network partitions. In practice, most systems choose availability and eventual consistency.

What is Eventual Consistency?

When you create an order, the write model is updated immediately. But the read model (the dashboard showing "10 orders today") might take a few hundred milliseconds to reflect the new order. During this window, different parts of your system might show different data.

This is a deliberate trade-off, not a bug. It enables:

Higher availability (services don't wait for each other)
Better performance (no distributed locks)
Simpler scaling (independent services)

Strategies for Handling Eventual Consistency

1. Optimistic UI Updates: Update the UI immediately after a command succeeds, before the read model updates. The user sees instant feedback.

2. Polling/Refresh: For critical operations, poll the read model until it reflects the expected change.

3. Event Subscriptions: Use WebSockets to push updates to the UI when projections update.

4. Compensation: If something goes wrong, publish compensating events to undo the operation.

Saga Pattern for Distributed Transactions

The Saga pattern handles distributed transactions across services. Unlike traditional database transactions (BEGIN, COMMIT, ROLLBACK), sagas use a sequence of local transactions with compensating actions for rollback.

Consider an order fulfillment flow:

Reserve Inventory → If fails, stop
Process Payment → If fails, release inventory
Ship Order → If fails, refund payment and release inventory
Send Notification → If fails, log and retry (non-critical)

Each step is a local transaction. If step 3 fails, we don't "rollback" in the database sense—we execute compensating transactions (refund, release) to undo the previous steps.
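
The compensation logic can be sketched as a small synchronous runner. Note that this is an orchestration-style illustration with hypothetical names (`SagaStep`, `runSaga`); it is not the event-driven NestJS saga from this article, only the rollback rule in isolation.

```typescript
interface SagaStep {
  name: string;
  action: () => void;     // throws on failure
  compensate: () => void; // undoes a completed action
}

// Runs steps in order; on failure, compensates completed steps in reverse.
function runSaga(steps: SagaStep[], trace: string[]): boolean {
  const completed: SagaStep[] = [];
  for (const step of steps) {
    try {
      step.action();
      trace.push(`done: ${step.name}`);
      completed.push(step);
    } catch (error) {
      trace.push(`failed: ${step.name}`);
      for (let i = completed.length - 1; i >= 0; i--) {
        completed[i].compensate();
        trace.push(`compensated: ${completed[i].name}`);
      }
      return false;
    }
  }
  return true;
}

const sagaTrace: string[] = [];
const succeeded = runSaga(
  [
    { name: 'reserve inventory', action: () => {}, compensate: () => {} },
    { name: 'process payment', action: () => {}, compensate: () => {} },
    {
      name: 'ship order',
      action: () => { throw new Error('carrier unavailable'); },
      compensate: () => {},
    },
  ],
  sagaTrace,
);
```

When shipping fails, the trace shows the payment being compensated first and the inventory second, the reverse of the order in which they completed. The failed step itself is not compensated because it never took effect.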

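// src/sagas/order-fulfillment.saga.ts
import { Controller, Injectable } from '@nestjs/common';
import { CommandBus, ICommand, Saga, ofType } from '@nestjs/cqrs';
import { EventPattern } from '@nestjs/microservices';
import { Observable } from 'rxjs';
import { map } from 'rxjs/operators';

@Injectable()
export class OrderFulfillmentSaga {
  @Saga()
  orderCreated = (events$: Observable<any>): Observable<ICommand> => {
    return events$.pipe(
      ofType(OrderCreatedEvent),
      map(event => {
        // Start saga - reserve inventory
        return new ReserveInventoryCommand({
          orderId: event.aggregateId,
          items: event.payload.items,
          correlationId: event.metadata.correlationId,
        });
      }),
    );
  };
}

// @EventPattern handlers are only registered on controllers, so the
// Kafka-driven saga steps live in a controller class. Register the saga
// under `providers` and this class under `controllers` in the module.
@Controller()
export class OrderFulfillmentEventsController {
  constructor(private readonly commandBus: CommandBus) {}

  @EventPattern('inventory.reserved')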
  async onInventoryReserved(data: InventoryReservedEvent): Promise<void> {
    // Continue saga - process payment
    await this.commandBus.execute(
      new ProcessPaymentCommand({
        orderId: data.orderId,
        amount: data.totalAmount,
        correlationId: data.correlationId,
      })
    );
  }

  @EventPattern('payment.failed')
  async onPaymentFailed(data: PaymentFailedEvent): Promise<void> {
    // Compensating transaction - release inventory
    await this.commandBus.execute(
      new ReleaseInventoryCommand({
        orderId: data.orderId,
        reason: 'Payment failed',
        correlationId: data.correlationId,
      })
    );

    // Mark order as failed
    await this.commandBus.execute(
      new FailOrderCommand({
        orderId: data.orderId,
        reason: data.failureReason,
      })
    );
  }
}
typescript

Idempotency Handling

// src/common/decorators/idempotent.decorator.ts
import { CallHandler, ExecutionContext, Injectable, NestInterceptor } from '@nestjs/common';
import { Observable, from, of } from 'rxjs';
import { mergeMap, switchMap } from 'rxjs/operators';
import Redis from 'ioredis';
// @InjectRedis comes from whichever NestJS Redis module the project uses.

// A single interceptor handles both the cache lookup and the cache write.
// A guard cannot short-circuit with a cached body: returning false from
// canActivate makes NestJS throw a ForbiddenException.
@Injectable()
export class IdempotencyInterceptor implements NestInterceptor {
  constructor(@InjectRedis() private readonly redis: Redis) {}

  intercept(context: ExecutionContext, next: CallHandler): Observable<any> {
    const request = context.switchToHttp().getRequest();
    const idempotencyKey = request.headers['idempotency-key'];

    if (!idempotencyKey) {
      return next.handle(); // No idempotency required
    }

    const cacheKey = `idempotency:${idempotencyKey}`;

    return from(this.redis.get(cacheKey)).pipe(
      switchMap((existing) => {
        if (existing) {
          // Replay the cached response instead of running the handler again
          return of(JSON.parse(existing));
        }

        // Run the handler, then store its response before returning it.
        // Two concurrent requests with the same key can both miss the cache;
        // closing that gap needs an atomic reservation of the key.
        return next.handle().pipe(
          mergeMap(async (response) => {
            await this.redis.setex(
              cacheKey,
              86400, // 24 hours
              JSON.stringify(response),
            );
            return response;
          }),
        );
      }),
    );
  }
}
typescript
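
The code above covers HTTP requests. Event handlers need the same protection, because a broker that delivers at least once will occasionally deliver twice. Here is a minimal sketch of an idempotent consumer, assuming each event carries a unique `eventId`; that field and the `IdempotentPaymentHandler` class are hypothetical and not part of the `DomainEvent` interface defined earlier.

```typescript
interface IncomingEvent {
  eventId: string; // assumed unique per event
  orderId: string;
  amount: number;
}

class IdempotentPaymentHandler {
  private readonly processedEventIds: Record<string, true> = {};
  public totalCharged = 0;

  // Returns true when the event was applied, false when it was a duplicate.
  handle(event: IncomingEvent): boolean {
    if (this.processedEventIds[event.eventId]) {
      return false;
    }
    this.totalCharged += event.amount;
    this.processedEventIds[event.eventId] = true;
    return true;
  }
}

const paymentHandler = new IdempotentPaymentHandler();
const paymentEvent: IncomingEvent = { eventId: 'evt-1', orderId: 'order-1', amount: 150 };

const firstDelivery = paymentHandler.handle(paymentEvent); // applied
const redelivery = paymentHandler.handle(paymentEvent);    // skipped as a duplicate
```

After both deliveries `totalCharged` is 150, not 300. In a real service the set of processed IDs would live in durable storage and be updated in the same transaction as the side effect, so a crash cannot separate the two.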

Production Considerations

Dead Letter Queue (DLQ)

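// Handle failed messages
import { Controller, Logger } from '@nestjs/common';
import { Ctx, EventPattern, KafkaContext, Payload } from '@nestjs/microservices';

// @EventPattern handlers must live in a controller to be registered
@Controller()
export class DeadLetterQueueHandler {
  private readonly logger = new Logger(DeadLetterQueueHandler.name);

  // deadLetterRepository and alertService are assumed to be injected through
  // the constructor; their types depend on the application's own setup.

  @EventPattern('order.events.dlq')
  async handleDeadLetter(
    @Payload() message: any,
    @Ctx() context: KafkaContext,
  ): Promise<void> {
    // Header values may be strings or Buffers, so normalize them to strings
    const headers = context.getMessage().headers ?? {};
    const originalTopic = headers['original-topic']?.toString();
    const failureReason = headers['failure-reason']?.toString();
    const retryCount = parseInt(headers['retry-count']?.toString() ?? '0', 10);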

    // Log for investigation
    this.logger.error('Dead letter received', {
      originalTopic,
      failureReason,
      retryCount,
      message,
    });

    // Store in database for manual review
    await this.deadLetterRepository.save({
      originalTopic,
      message: JSON.stringify(message),
      failureReason,
      retryCount,
      createdAt: new Date(),
    });

    // Alert if critical
    if (this.isCriticalEvent(message)) {
      await this.alertService.sendAlert({
        severity: 'high',
        message: `Critical event failed: ${message.eventType}`,
      });
    }
  }
}
typescript
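
The handler above reads from the dead letter topic, but something has to put messages there first. The following is a minimal sketch of that producing side, reusing the header names from the handler. `processWithRetry` and `deadLetterTopic` are hypothetical, the array stands in for a Kafka topic, and there is no backoff between attempts.

```typescript
interface QueuedMessage {
  value: string;
  headers: Record<string, string>;
}

// Stand-in for the 'order.events.dlq' topic.
const deadLetterTopic: QueuedMessage[] = [];

// Try the handler up to maxAttempts times, then park the message on the DLQ.
function processWithRetry(
  message: QueuedMessage,
  originalTopic: string,
  handler: (value: string) => void,
  maxAttempts: number,
): boolean {
  let lastError = 'unknown error';
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      handler(message.value);
      return true;
    } catch (error) {
      lastError = error instanceof Error ? error.message : String(error);
    }
  }
  deadLetterTopic.push({
    value: message.value,
    headers: {
      'original-topic': originalTopic,
      'failure-reason': lastError,
      'retry-count': String(maxAttempts),
    },
  });
  return false;
}

let attempts = 0;
const handled = processWithRetry(
  { value: '{"eventType":"OrderCreated"}', headers: {} },
  'order.events',
  () => {
    attempts++;
    throw new Error('database unavailable');
  },
  3,
);
```

Parking the message instead of retrying forever keeps one bad message from blocking everything behind it in the partition.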

Conclusion

Event-driven architecture with NestJS and Kafka provides a robust foundation for building scalable microservices. Let's summarize what we've learned and when to apply these patterns.

Key Takeaways

Event Sourcing gives you a complete audit trail and enables temporal queries. Use it when you need compliance, debugging capabilities, or when your domain is naturally event-oriented. Avoid it for simple CRUD applications.

CQRS allows independent scaling of read and write workloads. Use it when reads and writes have different performance requirements, or when you need complex read models. It pairs naturally with Event Sourcing.

Sagas coordinate distributed transactions through local transactions and compensating actions. Use them for any workflow that spans services and cannot run inside a single database transaction; they give you eventual consistency rather than atomicity. Design your compensating actions carefully, because they're as important as the happy path.

Idempotency makes at-least-once delivery safe: handling the same event twice has the same effect as handling it once. Always implement idempotency for event handlers. Network failures will cause retries, and without idempotency those retries apply the same change more than once.

When to Use Event-Driven Architecture

Good fit:

High-scale systems with many services
Systems requiring loose coupling and independent deployability
Domains with complex business workflows (e-commerce, finance, logistics)
Systems requiring audit trails and compliance

Not ideal for:

Simple CRUD applications
Systems requiring strong consistency (banking ledgers)
Small teams without operational capacity for distributed systems
Prototypes and MVPs (start simple, evolve later)

Production Checklist

Before going to production, ensure you have:

Monitoring: Track consumer lag, message throughput, and processing errors
Alerting: Alert on consumer group lag exceeding thresholds
Dead Letter Queues: Capture and investigate failed messages
Schema Registry: Version your event schemas to handle evolution
Exactly-Once Semantics: Use Kafka transactions for critical workflows
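
Consumer lag, the metric behind the first two items, is simple arithmetic: how far the group's committed offset trails the end of each partition. Here is a minimal sketch with made-up offsets; `PartitionOffsets`, `totalLag`, and `shouldAlert` are hypothetical names, and in practice the numbers would come from your Kafka monitoring tooling.

```typescript
interface PartitionOffsets {
  partition: number;
  logEndOffset: number;    // offset the next produced message will get
  committedOffset: number; // next offset the consumer group will read
}

// Sum of per-partition lag across the topic.
function totalLag(offsets: PartitionOffsets[]): number {
  return offsets.reduce(
    (sum, p) => sum + Math.max(0, p.logEndOffset - p.committedOffset),
    0,
  );
}

function shouldAlert(offsets: PartitionOffsets[], threshold: number): boolean {
  return totalLag(offsets) > threshold;
}

const snapshot: PartitionOffsets[] = [
  { partition: 0, logEndOffset: 1200, committedOffset: 1200 },
  { partition: 1, logEndOffset: 980, committedOffset: 900 },
  { partition: 2, logEndOffset: 1500, committedOffset: 1100 },
];

const lag = totalLag(snapshot);               // 0 + 80 + 400 = 480
const lagAlert = shouldAlert(snapshot, 250);  // true
```

A single number hides where the problem is: here partition 2 accounts for most of the lag, so per-partition values are worth tracking alongside the total.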

The initial complexity pays off with better scalability, maintainability, and resilience in production systems. Start with a single service and event store, prove the pattern works for your domain, then expand incrementally.

Back to Blog Get in Touch

Introduction

Why Event-Driven Architecture?

The Problems with Synchronous Communication

How Event-Driven Architecture Solves These Problems

Event-driven architecture addresses these challenges by introducing a message broker (like Kafka) between services. Instead of Service A calling Service B directly:

Service A publishes an event to the broker: "OrderCreated"
The broker stores and distributes this event
Service B (and C, D, E...) subscribe to this event and react asynchronously

This simple change has profound implications:

Loose Coupling: Services don't know about each other; they only know about events
Resilience: If Service B is down, events queue up and are processed when it recovers
Scalability: Each service scales independently based on its own load
Temporal Decoupling: Services don't need to be available simultaneously

Event-Driven Architecture Flow

The diagram above shows how a single event from Service A can trigger reactions in multiple downstream services through Kafka, without Service A knowing or caring about who consumes its events.

Understanding Apache Kafka

Before we write code, let's understand Kafka's core concepts that make it ideal for event-driven systems:

Topics: Named channels where events are published. Think of them as categories: "order-events", "user-events", "payment-events".

Partitions: Topics are divided into partitions for parallelism. Events with the same key (e.g., order ID) always go to the same partition, ensuring ordering.

Consumer Groups: Multiple instances of a service form a consumer group. Kafka ensures each partition is consumed by only one instance in the group, enabling horizontal scaling.

Offsets: Each message has an offset (position) in its partition. Consumers track their offset, enabling replay and exactly-once processing.

Retention: Unlike traditional queues, Kafka retains messages for a configurable period (days/weeks). This enables event replay and rebuilding state.

Setting Up NestJS with Kafka

First, let's set up our NestJS project with Kafka integration:

npm install @nestjs/microservices kafkajs
bash

Kafka Module Configuration

// src/kafka/kafka.module.ts
import { Module } from '@nestjs/common';
import { ClientsModule, Transport } from '@nestjs/microservices';

@Module({
  imports: [
    ClientsModule.register([
      {
        name: 'KAFKA_SERVICE',
        transport: Transport.KAFKA,
        options: {
          client: {
            clientId: 'order-service',
            brokers: ['localhost:9092'],
            ssl: process.env.NODE_ENV === 'production',
            sasl: process.env.NODE_ENV === 'production' ? {
              mechanism: 'scram-sha-256',
              username: process.env.KAFKA_USERNAME,
              password: process.env.KAFKA_PASSWORD,
            } : undefined,
          },
          consumer: {
            groupId: 'order-consumer-group',
            sessionTimeout: 30000,
            heartbeatInterval: 3000,
          },
          producer: {
            allowAutoTopicCreation: false,
            transactionTimeout: 30000,
          },
        },
      },
    ]),
  ],
  exports: [ClientsModule],
})
export class KafkaModule {}
typescript

Implementing Event Sourcing

OrderCreated (items: [...], customer: "John")
OrderConfirmed (confirmedAt: "2024-01-20")
PaymentReceived (amount: 150.00)
OrderShipped (trackingNumber: "ABC123")

Why Event Sourcing?

Complete Audit Trail: You have a complete history of every change. This is crucial for financial systems, healthcare, and any domain requiring compliance.

Temporal Queries: You can answer questions like "What was the order status at 3 PM yesterday?" by replaying events up to that point.

Debugging: When something goes wrong, you can replay events to understand exactly what happened and in what order.

Event Replay: You can rebuild read models, fix bugs in projections, or create entirely new views by replaying the event history.

Domain Modeling: Events naturally align with business language. "OrderShipped" is more meaningful than "UPDATE orders SET status = 'shipped'".

The Trade-offs

Event Sourcing adds complexity. Consider it when you need:

Complete audit trails (financial, healthcare, legal)
Temporal queries or "time travel"
Complex domains where events capture intent
CQRS (separating read/write models)

Avoid it for simple CRUD applications where the additional complexity isn't justified.

Event Store Implementation

// src/event-store/event-store.service.ts
import { Injectable } from '@nestjs/common';
import { InjectRepository } from '@nestjs/typeorm';
import { Repository } from 'typeorm';
import { StoredEvent } from './stored-event.entity';

export interface DomainEvent {
  eventType: string;
  aggregateId: string;
  aggregateType: string;
  payload: Record<string, any>;
  metadata: {
    correlationId: string;
    causationId: string;
    userId?: string;
    timestamp: Date;
    version: number;
  };
}

@Injectable()
export class EventStoreService {
  constructor(
    @InjectRepository(StoredEvent)
    private readonly eventRepository: Repository<StoredEvent>,
  ) {}

  async append(event: DomainEvent): Promise<StoredEvent> {
    // Optimistic locking - check version
    const lastEvent = await this.eventRepository.findOne({
      where: { aggregateId: event.aggregateId },
      order: { version: 'DESC' },
    });

    const expectedVersion = lastEvent ? lastEvent.version + 1 : 1;

    if (event.metadata.version !== expectedVersion) {
      throw new ConcurrencyException(
        `Expected version ${expectedVersion}, got ${event.metadata.version}`
      );
    }

    const storedEvent = this.eventRepository.create({
      eventType: event.eventType,
      aggregateId: event.aggregateId,
      aggregateType: event.aggregateType,
      payload: event.payload,
      metadata: event.metadata,
      version: expectedVersion,
      createdAt: new Date(),
    });

    return this.eventRepository.save(storedEvent);
  }

  async getEvents(
    aggregateId: string,
    fromVersion?: number
  ): Promise<StoredEvent[]> {
    const query = this.eventRepository
      .createQueryBuilder('event')
      .where('event.aggregateId = :aggregateId', { aggregateId })
      .orderBy('event.version', 'ASC');

    if (fromVersion) {
      query.andWhere('event.version > :fromVersion', { fromVersion });
    }

    return query.getMany();
  }

  async replayEvents<T>(
    aggregateId: string,
    reducer: (state: T, event: StoredEvent) => T,
    initialState: T,
  ): Promise<T> {
    const events = await this.getEvents(aggregateId);
    return events.reduce(reducer, initialState);
  }
}
typescript

Aggregate Root Pattern

// src/domain/aggregate-root.ts
export abstract class AggregateRoot {
  private _uncommittedEvents: DomainEvent[] = [];
  private _version: number = 0;

  get uncommittedEvents(): DomainEvent[] {
    return [...this._uncommittedEvents];
  }

  get version(): number {
    return this._version;
  }

  protected apply(event: DomainEvent): void {
    this.when(event);
    this._uncommittedEvents.push(event);
    this._version++;
  }

  protected abstract when(event: DomainEvent): void;

  public loadFromHistory(events: DomainEvent[]): void {
    events.forEach(event => {
      this.when(event);
      this._version++;
    });
  }

  public clearUncommittedEvents(): void {
    this._uncommittedEvents = [];
  }
}

// src/domain/order/order.aggregate.ts
export class OrderAggregate extends AggregateRoot {
  private _id: string;
  private _status: OrderStatus;
  private _items: OrderItem[] = [];
  private _totalAmount: number = 0;

  static create(command: CreateOrderCommand): OrderAggregate {
    const order = new OrderAggregate();
    order.apply({
      eventType: 'OrderCreated',
      aggregateId: command.orderId,
      aggregateType: 'Order',
      payload: {
        customerId: command.customerId,
        items: command.items,
      },
      metadata: {
        correlationId: command.correlationId,
        causationId: command.commandId,
        userId: command.userId,
        timestamp: new Date(),
        version: 1,
      },
    });
    return order;
  }

  confirm(): void {
    if (this._status !== OrderStatus.PENDING) {
      throw new InvalidOperationException('Order must be pending to confirm');
    }

    this.apply({
      eventType: 'OrderConfirmed',
      aggregateId: this._id,
      aggregateType: 'Order',
      payload: { confirmedAt: new Date() },
      metadata: {
        correlationId: generateId(),
        causationId: generateId(),
        timestamp: new Date(),
        version: this.version + 1,
      },
    });
  }

  protected when(event: DomainEvent): void {
    switch (event.eventType) {
      case 'OrderCreated':
        this._id = event.aggregateId;
        this._status = OrderStatus.PENDING;
        this._items = event.payload.items;
        this._totalAmount = this.calculateTotal(event.payload.items);
        break;
      case 'OrderConfirmed':
        this._status = OrderStatus.CONFIRMED;
        break;
      case 'OrderShipped':
        this._status = OrderStatus.SHIPPED;
        break;
    }
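  }

  // Assumes each OrderItem exposes `price` and `quantity`; adapt to your item shape
  private calculateTotal(items: OrderItem[]): number {
    return items.reduce((sum, item) => sum + item.price * item.quantity, 0);
  }
}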
typescript
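
The difference between `apply` and `loadFromHistory` is easiest to see side by side. The following is a minimal sketch using a condensed copy of the base class and a hypothetical `SimpleOrder` aggregate; the names and the two-event model are assumptions made only to keep the example self-contained.

```typescript
interface DomainEvent {
  eventType: 'OrderCreated' | 'OrderConfirmed';
  aggregateId: string;
}

// Condensed copy of the AggregateRoot above, kept only to make the sketch runnable
abstract class AggregateRoot {
  private _uncommittedEvents: DomainEvent[] = [];
  private _version = 0;

  get uncommittedEvents(): DomainEvent[] {
    return [...this._uncommittedEvents];
  }

  get version(): number {
    return this._version;
  }

  // New behavior: mutate state AND remember the event for persistence
  protected apply(event: DomainEvent): void {
    this.when(event);
    this._uncommittedEvents.push(event);
    this._version++;
  }

  protected abstract when(event: DomainEvent): void;

  // Rehydration: mutate state only, because these events are already stored
  loadFromHistory(events: DomainEvent[]): void {
    events.forEach((event) => {
      this.when(event);
      this._version++;
    });
  }
}

// Hypothetical, simplified order aggregate
class SimpleOrder extends AggregateRoot {
  status: 'NEW' | 'PENDING' | 'CONFIRMED' = 'NEW';

  create(orderId: string): void {
    this.apply({ eventType: 'OrderCreated', aggregateId: orderId });
  }

  confirm(orderId: string): void {
    if (this.status !== 'PENDING') {
      throw new Error('Order must be pending to confirm');
    }
    this.apply({ eventType: 'OrderConfirmed', aggregateId: orderId });
  }

  protected when(event: DomainEvent): void {
    this.status = event.eventType === 'OrderCreated' ? 'PENDING' : 'CONFIRMED';
  }
}

// Command path: two new events are recorded as uncommitted
const fresh = new SimpleOrder();
fresh.create('order-1');
fresh.confirm('order-1');
const stored = fresh.uncommittedEvents; // what a handler would append to the event store

// Load path: same state and version, but nothing new to persist
const rehydrated = new SimpleOrder();
rehydrated.loadFromHistory(stored);
```

Both instances end up at version 2 with status `CONFIRMED`, but only `fresh` has uncommitted events. That asymmetry is what lets a command handler load an aggregate, run one command, and persist only the events that command produced.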

CQRS Implementation

Why Separate Reads and Writes?

In traditional architectures, the same model serves both reads and writes. This creates several challenges:

Scaling Asymmetry: Most systems have far more reads than writes (often 100:1 or more). With a single model, you can't scale them independently.

Complex Queries: Read models often need denormalized data (user name + order count + last purchase date). Normalizing for writes means expensive JOINs for reads.

Event Sourcing Compatibility: If you're using Event Sourcing, the event store holds a log of changes rather than current state, so there is no table of current orders to query directly. You need projections (read models) built from events.

How CQRS Works

Command Side (Write Model):

Receives commands (CreateOrder, ConfirmOrder)
Validates business rules
Persists events to the event store
Publishes events to Kafka

Query Side (Read Model):

Subscribes to events
Updates denormalized read models (projections)
Serves fast queries without complex JOINs
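
The two sides can be sketched in a few lines of framework-free TypeScript. This is an in-memory illustration under stated assumptions, not the NestJS wiring shown next: `eventLog` stands in for the event store, `subscribers` for Kafka consumers, and all function names are hypothetical. Delivery here is synchronous, so it hides the lag a real broker introduces.

```typescript
type OrderEvent =
  | { eventType: 'OrderCreated'; aggregateId: string; customerId: string }
  | { eventType: 'OrderConfirmed'; aggregateId: string };

// --- Command side (write model) ---
const eventLog: OrderEvent[] = []; // stands in for the event store
const subscribers: Array<(event: OrderEvent) => void> = []; // stands in for Kafka consumers

function persistAndPublish(event: OrderEvent): void {
  eventLog.push(event);
  subscribers.forEach((handle) => handle(event));
}

function createOrder(orderId: string, customerId: string): void {
  // Business rule: an order ID can only be used once
  if (eventLog.some((e) => e.aggregateId === orderId)) {
    throw new Error(`Order ${orderId} already exists`);
  }
  persistAndPublish({ eventType: 'OrderCreated', aggregateId: orderId, customerId });
}

function confirmOrder(orderId: string): void {
  // Business rule: only a pending order (created, nothing else yet) can be confirmed
  const orderEvents = eventLog.filter((e) => e.aggregateId === orderId);
  const isPending =
    orderEvents.length === 1 && orderEvents[0]?.eventType === 'OrderCreated';
  if (!isPending) {
    throw new Error(`Order ${orderId} must be pending to confirm`);
  }
  persistAndPublish({ eventType: 'OrderConfirmed', aggregateId: orderId });
}

// --- Query side (read model) ---
interface OrderSummary {
  id: string;
  customerId: string;
  status: 'PENDING' | 'CONFIRMED';
}

const orderSummaries = new Map<string, OrderSummary>();

// Projection: keeps a denormalized row per order up to date
subscribers.push((event) => {
  if (event.eventType === 'OrderCreated') {
    orderSummaries.set(event.aggregateId, {
      id: event.aggregateId,
      customerId: event.customerId,
      status: 'PENDING',
    });
  } else {
    const existing = orderSummaries.get(event.aggregateId);
    if (existing) existing.status = 'CONFIRMED';
  }
});

// Queries read the projection only; they never replay the event log
function getOrderSummary(orderId: string): OrderSummary | undefined {
  return orderSummaries.get(orderId);
}

createOrder('order-1', 'customer-42');
confirmOrder('order-1');
const summary = getOrderSummary('order-1');
```

Commands return nothing but success or failure, and queries never touch the write model. That separation is what allows each side to be stored, indexed, and scaled differently.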

Command Side

// src/order/commands/handlers/create-order.handler.ts
import { CommandHandler, ICommandHandler, EventBus } from '@nestjs/cqrs';
import { Inject } from '@nestjs/common';
import { ClientKafka } from '@nestjs/microservices';
import { lastValueFrom } from 'rxjs';

@CommandHandler(CreateOrderCommand)
export class CreateOrderHandler implements ICommandHandler<CreateOrderCommand> {
  constructor(
    private readonly eventStore: EventStoreService,
    private readonly eventBus: EventBus,
    @Inject('KAFKA_SERVICE') private readonly kafkaClient: ClientKafka,
  ) {}

  async execute(command: CreateOrderCommand): Promise<string> {
    // Create aggregate
    const order = OrderAggregate.create(command);

    // Persist events
    for (const event of order.uncommittedEvents) {
      await this.eventStore.append(event);

      // Publish to Kafka for other services. emit() returns an Observable,
      // so convert it to a Promise to wait for the send and surface errors
      await lastValueFrom(
        this.kafkaClient.emit('order.events', {
          key: event.aggregateId,
          value: JSON.stringify(event),
          headers: {
            'event-type': event.eventType,
            'correlation-id': event.metadata.correlationId,
          },
        }),
      );
    }

    order.clearUncommittedEvents();
    return order.id;
  }
}
typescript

Query Side with Projections

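// src/order/projections/order-list.projection.ts
import { EventsHandler, IEventHandler } from '@nestjs/cqrs';
import { InjectRepository } from '@nestjs/typeorm';
import { Repository } from 'typeorm';

// @EventsHandler is a class decorator: the class implements IEventHandler
// and handle() is called for each of the listed event types
@EventsHandler(OrderCreatedEvent, OrderConfirmedEvent, OrderShippedEvent)
export class OrderListProjection implements IEventHandler<DomainEvent> {
  constructor(
    @InjectRepository(OrderReadModel)
    private readonly readRepository: Repository<OrderReadModel>,
  ) {}

  async handle(event: DomainEvent): Promise<void> {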
    switch (event.eventType) {
      case 'OrderCreated':
        await this.readRepository.save({
          id: event.aggregateId,
          customerId: event.payload.customerId,
          status: 'PENDING',
          totalAmount: this.calculateTotal(event.payload.items),
          itemCount: event.payload.items.length,
          createdAt: event.metadata.timestamp,
          updatedAt: event.metadata.timestamp,
        });
        break;

      case 'OrderConfirmed':
        await this.readRepository.update(
          { id: event.aggregateId },
          {
            status: 'CONFIRMED',
            updatedAt: event.metadata.timestamp
          }
        );
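        break;

      case 'OrderShipped':
        await this.readRepository.update(
          { id: event.aggregateId },
          {
            status: 'SHIPPED',
            updatedAt: event.metadata.timestamp
          }
        );
        break;
    }
  }

  // Assumes each item exposes `price` and `quantity`; adapt to your item shape
  private calculateTotal(items: Array<{ price: number; quantity: number }>): number {
    return items.reduce((sum, item) => sum + item.price * item.quantity, 0);
  }
}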
typescript

Handling Eventual Consistency

What is Eventual Consistency?

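In an event-driven system, a command and the read models that reflect it are not updated in one transaction. After a command succeeds, projections and downstream services catch up as they consume the resulting events, so for a short window a query can return stale data. Once the events are processed, every view converges on the same state.

This isn't a bug; it's a feature. It enables: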

Higher availability (services don't wait for each other)
Better performance (no distributed locks)
Simpler scaling (independent services)

Strategies for Handling Eventual Consistency

1. Optimistic UI Updates: Update the UI immediately after a command succeeds, before the read model updates. The user sees instant feedback.

2. Polling/Refresh: For critical operations, poll the read model until it reflects the expected change.

3. Event Subscriptions: Use WebSockets to push updates to the UI when projections update.

4. Compensation: If something goes wrong, publish compensating events to undo the operation.
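
As an illustration of strategy 2, here is a minimal polling helper. It is a sketch under stated assumptions: `waitForReadModel`, its options, and the fake read model are hypothetical names, and a real client would call the query API instead of an in-memory function.

```typescript
// Polls a read-model lookup until the predicate holds or the attempts run out
async function waitForReadModel<T>(
  load: () => Promise<T | undefined>,
  isUpToDate: (value: T) => boolean,
  options: { attempts?: number; delayMs?: number } = {},
): Promise<T> {
  const { attempts = 10, delayMs = 100 } = options;

  for (let attempt = 1; attempt <= attempts; attempt++) {
    const value = await load();
    if (value !== undefined && isUpToDate(value)) {
      return value;
    }
    if (attempt < attempts) {
      // Give the projection time to consume the event before asking again
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }

  throw new Error(`Read model not up to date after ${attempts} attempts`);
}

// Fake read model that only reflects the confirmation on the third read
let reads = 0;
const fakeReadModel = async (): Promise<{ status: string } | undefined> => {
  reads++;
  return reads < 3 ? { status: 'PENDING' } : { status: 'CONFIRMED' };
};

// In application code this promise would be awaited right after sending the command
const confirmedOrderPromise = waitForReadModel(
  fakeReadModel,
  (order) => order.status === 'CONFIRMED',
  { delayMs: 1 },
);
```

Bounding the number of attempts matters: if the projection is stuck, the caller gets a clear error instead of polling forever.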

Saga Pattern for Distributed Transactions

Consider an order fulfillment flow:

1. Reserve Inventory → If fails, stop
2. Process Payment → If fails, release inventory
3. Ship Order → If fails, refund payment and release inventory
4. Send Notification → If fails, log and retry (non-critical)

Each step is a local transaction. If step 3 fails, we don't "roll back" in the database sense; we execute compensating transactions (refund, release) to undo the previous steps.
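
The compensation logic can be sketched as a small in-process runner. This is an orchestrated illustration under stated assumptions, separate from the event-driven NestJS saga that follows: `SagaStep` and `runSaga` are hypothetical names, and the steps only write to a log instead of calling real services.

```typescript
interface SagaStep {
  name: string;
  execute: () => Promise<void>;
  // Undo action for a step that already completed; omitted when there is nothing to undo
  compensate?: () => Promise<void>;
}

async function runSaga(
  steps: SagaStep[],
): Promise<{ ok: boolean; failedStep?: string }> {
  const completed: SagaStep[] = [];

  for (const step of steps) {
    try {
      await step.execute();
      completed.push(step);
    } catch {
      // Undo the completed steps in reverse order
      for (const done of [...completed].reverse()) {
        await done.compensate?.();
      }
      return { ok: false, failedStep: step.name };
    }
  }

  return { ok: true };
}

const log: string[] = [];

const steps: SagaStep[] = [
  {
    name: 'Reserve Inventory',
    execute: async () => { log.push('reserve inventory'); },
    compensate: async () => { log.push('release inventory'); },
  },
  {
    name: 'Process Payment',
    execute: async () => { log.push('process payment'); },
    compensate: async () => { log.push('refund payment'); },
  },
  {
    name: 'Ship Order',
    // Simulated failure in step 3
    execute: async () => { throw new Error('carrier unavailable'); },
  },
];

const sagaResult = runSaga(steps);
```

With the simulated shipping failure, the log ends up as reserve inventory, process payment, refund payment, release inventory. The notification step is left out because it is non-critical and would be retried rather than compensated. A production runner would also need to handle a compensation that itself fails, which this sketch does not.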

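// src/sagas/order-fulfillment.saga.ts
import { Injectable } from '@nestjs/common';
import { ICommand, Saga, ofType } from '@nestjs/cqrs';
import { Observable } from 'rxjs';
import { map } from 'rxjs/operators';

@Injectable()
export class OrderFulfillmentSaga {
  @Saga()
  orderCreated = (events$: Observable<any>): Observable<ICommand> => {
    return events$.pipe(
      ofType(OrderCreatedEvent),
      map(event => {
        // Start saga - reserve inventory
        return new ReserveInventoryCommand({
          orderId: event.aggregateId,
          items: event.payload.items,
          correlationId: event.metadata.correlationId,
        });
      }),
    );
  };
}

// src/sagas/order-fulfillment.controller.ts (hypothetical file name)
// Nest registers @EventPattern handlers only on @Controller() classes, while
// @Saga() streams live on providers, so the Kafka-driven steps get their own class.
import { Controller } from '@nestjs/common';
import { CommandBus } from '@nestjs/cqrs';
import { EventPattern } from '@nestjs/microservices';

@Controller()
export class OrderFulfillmentController {
  constructor(private readonly commandBus: CommandBus) {}

  @EventPattern('inventory.reserved')
  async onInventoryReserved(data: InventoryReservedEvent): Promise<void> {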
    // Continue saga - process payment
    await this.commandBus.execute(
      new ProcessPaymentCommand({
        orderId: data.orderId,
        amount: data.totalAmount,
        correlationId: data.correlationId,
      })
    );
  }

  @EventPattern('payment.failed')
  async onPaymentFailed(data: PaymentFailedEvent): Promise<void> {
    // Compensating transaction - release inventory
    await this.commandBus.execute(
      new ReleaseInventoryCommand({
        orderId: data.orderId,
        reason: 'Payment failed',
        correlationId: data.correlationId,
      })
    );

    // Mark order as failed
    await this.commandBus.execute(
      new FailOrderCommand({
        orderId: data.orderId,
        reason: data.failureReason,
      })
    );
  }
}
typescript

Idempotency Handling

// src/common/decorators/idempotent.decorator.ts
import {
  CallHandler,
  ExecutionContext,
  Injectable,
  NestInterceptor,
} from '@nestjs/common';
import { Observable, from, of } from 'rxjs';
import { mergeMap, switchMap } from 'rxjs/operators';
// InjectRedis and Redis come from whichever Redis integration the project uses

// A guard that returns false makes Nest reply with 403 Forbidden, so the cached
// response is replayed from an interceptor instead: an interceptor may skip the
// handler by returning its own stream.
@Injectable()
export class IdempotencyInterceptor implements NestInterceptor {
  constructor(@InjectRedis() private readonly redis: Redis) {}

  intercept(context: ExecutionContext, next: CallHandler): Observable<any> {
    const request = context.switchToHttp().getRequest();
    const idempotencyKey = request.headers['idempotency-key'];

    if (!idempotencyKey) {
      return next.handle(); // No idempotency required
    }

    const cacheKey = `idempotency:${idempotencyKey}`;

    return from(this.redis.get(cacheKey)).pipe(
      switchMap((existing) => {
        if (existing) {
          // Return cached response without running the handler again
          return of(JSON.parse(existing));
        }

        return next.handle().pipe(
          // Store response after successful operation. mergeMap waits for the
          // write, so a Redis failure surfaces instead of being silently dropped
          mergeMap(async (response) => {
            await this.redis.setex(
              cacheKey,
              86400, // 24 hours
              JSON.stringify(response),
            );
            return response;
          }),
        );
      }),
    );
  }
}
typescript

Production Considerations

Dead Letter Queue (DLQ)

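// Handle failed messages
import { Controller, Logger } from '@nestjs/common';
import { Ctx, EventPattern, KafkaContext, Payload } from '@nestjs/microservices';

// @EventPattern handlers are only registered on controllers, not plain providers
@Controller()
export class DeadLetterQueueHandler {
  private readonly logger = new Logger(DeadLetterQueueHandler.name);

  // DeadLetterRepository and AlertService are assumed application-level providers
  constructor(
    private readonly deadLetterRepository: DeadLetterRepository,
    private readonly alertService: AlertService,
  ) {}

  @EventPattern('order.events.dlq')
  async handleDeadLetter(
    @Payload() message: any,
    @Ctx() context: KafkaContext,
  ): Promise<void> {
    // Kafka header values may arrive as Buffers, so normalize them to strings
    const headers = context.getMessage().headers ?? {};
    const originalTopic = headers['original-topic']?.toString();
    const failureReason = headers['failure-reason']?.toString();
    const retryCount = parseInt(headers['retry-count']?.toString() ?? '0', 10);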

    // Log for investigation
    this.logger.error('Dead letter received', {
      originalTopic,
      failureReason,
      retryCount,
      message,
    });

    // Store in database for manual review
    await this.deadLetterRepository.save({
      originalTopic,
      message: JSON.stringify(message),
      failureReason,
      retryCount,
      createdAt: new Date(),
    });

    // Alert if critical
    if (this.isCriticalEvent(message)) {
      await this.alertService.sendAlert({
        severity: 'high',
        message: `Critical event failed: ${message.eventType}`,
      });
    }
  }
}
typescript
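
The handler above reads `original-topic`, `failure-reason`, and `retry-count` headers, so something upstream has to write them. The following is a minimal sketch of that producing side under stated assumptions: `processWithDlq`, the `Send` callback, and the `.dlq` topic suffix convention are hypothetical, and retries run back to back with no backoff.

```typescript
interface OutgoingMessage {
  topic: string;
  value: string;
  headers: Record<string, string>;
}

// Stand-in for a Kafka producer's send call
type Send = (message: OutgoingMessage) => Promise<void>;

async function processWithDlq(
  topic: string,
  value: string,
  handler: (value: string) => Promise<void>,
  send: Send,
  maxRetries = 3,
): Promise<'processed' | 'dead-lettered'> {
  let lastError = 'unknown error';

  // One initial attempt plus maxRetries retries
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      await handler(value);
      return 'processed';
    } catch (error) {
      lastError = error instanceof Error ? error.message : String(error);
    }
  }

  // Retries exhausted: park the message with enough context to investigate it
  await send({
    topic: `${topic}.dlq`,
    value,
    headers: {
      'original-topic': topic,
      'failure-reason': lastError,
      'retry-count': String(maxRetries),
    },
  });
  return 'dead-lettered';
}

// Usage with a handler that always fails and an in-memory producer
const sent: OutgoingMessage[] = [];
let handlerCalls = 0;

const dlqOutcome = processWithDlq(
  'order.events',
  '{"eventType":"OrderCreated"}',
  async () => {
    handlerCalls++;
    throw new Error('Inventory service unavailable');
  },
  async (message) => { sent.push(message); },
);
```

Keeping the original payload untouched and putting the failure context in headers means a message can later be replayed to its original topic without modification.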

Event-driven architecture is ideal for:

High-scale systems with many services
Systems requiring loose coupling and independent deployability
Domains with complex business workflows (e-commerce, finance, logistics)
Systems requiring audit trails and compliance

Not ideal for:

Simple CRUD applications
Systems requiring strong consistency (banking ledgers)
Small teams without operational capacity for distributed systems
Prototypes and MVPs (start simple, evolve later)

Production Considerations

Before going to production, ensure you have:

Monitoring: Track consumer lag, message throughput, and processing errors
Alerting: Alert on consumer group lag exceeding thresholds
Dead Letter Queues: Capture and investigate failed messages
Schema Registry: Version your event schemas to handle evolution
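Exactly-Once Semantics: Use idempotent producers and Kafka transactions for critical workflows, and keep consumers idempotent, since transactions only cover reads and writes within Kafka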
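
For the monitoring and alerting items, consumer lag is the difference between the latest offset in a partition and the offset the consumer group has committed. A minimal sketch of the calculation follows, assuming the offsets have already been fetched (for example from an admin client or a metrics exporter); the type, function names, and threshold are hypothetical.

```typescript
interface PartitionOffsets {
  partition: number;
  logEndOffset: number; // latest offset written to the partition
  committedOffset: number; // last offset committed by the consumer group
}

// Sum of per-partition lag, never counting a partition as negative
function totalLag(partitions: PartitionOffsets[]): number {
  return partitions.reduce(
    (sum, p) => sum + Math.max(0, p.logEndOffset - p.committedOffset),
    0,
  );
}

function shouldAlert(partitions: PartitionOffsets[], threshold: number): boolean {
  return totalLag(partitions) > threshold;
}

const partitionOffsets: PartitionOffsets[] = [
  { partition: 0, logEndOffset: 1500, committedOffset: 1480 }, // lag 20
  { partition: 1, logEndOffset: 900, committedOffset: 900 }, // lag 0
  { partition: 2, logEndOffset: 2000, committedOffset: 1200 }, // lag 800
];

const lag = totalLag(partitionOffsets); // 820
const alert = shouldAlert(partitionOffsets, 500); // true
```

A total hides skew: in this sample one partition accounts for nearly all of the lag, so per-partition values are worth tracking alongside the sum.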