Overview

Led development of a high-throughput crypto trading platform (CoinSwitch Exchange – CSX), including an in-memory, event-sourced matching engine with strict ordering, exactly-once processing, and high availability.

Technologies Used

  • Distributed Systems
  • Event Sourcing
  • CQRS
  • Matching Engines
  • Go
  • Microservices
  • AWS
  • CloudWatch
  • Docker
  • Kubernetes

Context

CoinSwitch was building CSX, a crypto exchange platform with multiple order types, high throughput requirements, and strong correctness guarantees.

The system needed to ensure:

  • no order loss
  • strict sequential processing
  • financial correctness under concurrency
  • scalability across multiple trading pairs
  • operational clarity across many microservices

My Role

  • Technical lead and execution owner
  • Owned the matching engine and coordinated development across 7 microservices
  • Led and mentored a team of ~8 engineers
  • Coordinated with project managers and stakeholders on delivery and rollout
  • Responsible for:
    • architecture decisions
    • code reviews
    • deployment strategy
    • cross-team consistency and discipline

System Architecture Overview

Order Flow

  • API Gateway routed requests to the User Service:
    • user validation and authentication
  • Order request passed to the Order Service:
    • balance checks
    • order validation
  • Validated orders were placed onto a currency-specific input queue
  • Each currency pair had its own dedicated matching engine
  • Matching results were published to an output queue
  • Downstream services consumed updates to reflect order state

An additional AML verification step existed in the flow before final acceptance.

Matching Engine Design

  • In-memory, single-threaded:
    • avoided locking and race conditions
    • guaranteed strict ordering
  • Exactly-once semantics (logical)
  • Event-sourced:
    • every state transition recorded
    • full replayability of order books
  • Currency-partitioned:
    • each trading pair isolated
    • improved scalability and fault isolation

If a matching engine instance failed, another instance could:

  • replay events
  • rebuild state
  • resume processing without order loss

Order Handling Semantics

  • Market & Limit orders supported
  • Partial fills – remaining quantity stayed in the order book
  • Cancellations – removed from the matching engine if still open
  • State mutation was not allowed outside the matching engine
  • All external services treated matching results as immutable facts

CQRS & Microservices Discipline

  • CQRS applied to Orders:
    • write service: order creation & validation
    • read service: order status, history, views
  • Services communicated asynchronously
  • Duplicate processing prevented via order IDs
  • Clear ownership boundaries between services

Observability & Debugging

  • Built a shared logging module used across all services
  • Structured logs with:
    • requestId
    • correlationId
  • End-to-end tracing across microservices using these IDs
  • Debugging via CloudWatch log correlation
  • Designed for production readiness (work completed up to UAT)

Performance & Reliability

  • Benchmarked throughput at ~7,000 orders/sec
  • Designed for:
    • high availability via leader election
    • fast recovery using event replay
  • Focused on correctness first, performance second

Leadership & Execution

  • Enforced:
    • consistent logging and tracing
    • architectural boundaries
    • clear communication standards
  • Ruthlessly prioritized P0 / P1 work
  • Blocked scope creep and unfocused discussions
  • Mentored a team of early-career engineers new to Go
    • intensive PR reviews
    • teaching correctness, not just syntax

What I’d Improve If Rebuilding Today

  • Stronger idempotency guarantees across services
  • Formalized queue abstractions earlier
  • Better domain abstractions for products and pricing
  • Even stricter separation of concerns between orchestration and execution