Overview
Led development of a high-throughput crypto trading platform (CoinSwitch Exchange – CSX), including an in-memory, event-sourced matching engine with strict ordering, exactly-once processing, and high availability.
Technologies Used
- Distributed Systems
- Event Sourcing
- CQRS
- Matching Engines
- Go
- Microservices
- AWS
- CloudWatch
- Docker
- Kubernetes
Context
CoinSwitch was building CSX, a crypto exchange platform with multiple order types, high throughput requirements, and strong correctness guarantees.
The system needed to ensure:
- no order loss
- strict sequential processing
- financial correctness under concurrency
- scalability across multiple trading pairs
- operational clarity across many microservices
My Role
- Technical lead and execution owner
- Owned the matching engine and coordinated development across 7 microservices
- Led and mentored a team of ~8 engineers
- Coordinated with project managers and stakeholders on delivery and rollout
- Responsible for:
  - architecture decisions
  - code reviews
  - deployment strategy
  - cross-team consistency and discipline
System Architecture Overview
Order Flow
- API Gateway routed requests to the User Service
  - user validation and authentication
- Order request passed to the Order Service
  - balance checks
  - order validation
- Validated orders were placed onto a currency-specific input queue
- Each currency pair had its own dedicated matching engine
- Matching results were published to an output queue
- Downstream services consumed updates to reflect order state
An additional AML (anti-money-laundering) verification step sat in the flow before final acceptance.
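The per-pair routing step above can be sketched with one input queue per trading pair; the `Order` fields and the `Router` type here are illustrative stand-ins, not the actual CSX schema:

```go
package main

import "fmt"

// Order is a hypothetical shape for a validated order leaving the Order
// Service; the field names are illustrative, not the real CSX schema.
type Order struct {
	ID   string
	Pair string // e.g. "BTC-USDT"
	Side string // "buy" or "sell"
	Qty  float64
}

// Router keeps one input queue (modeled here as a channel) per currency
// pair, so each matching engine consumes its own strictly ordered stream.
type Router struct {
	queues map[string]chan Order
}

func NewRouter(pairs []string) *Router {
	r := &Router{queues: make(map[string]chan Order)}
	for _, p := range pairs {
		r.queues[p] = make(chan Order, 1024)
	}
	return r
}

// Route places the order on its pair-specific queue, preserving arrival
// order per pair. Unknown pairs are rejected rather than silently dropped.
func (r *Router) Route(o Order) error {
	q, ok := r.queues[o.Pair]
	if !ok {
		return fmt.Errorf("no queue for pair %s", o.Pair)
	}
	q <- o
	return nil
}

func main() {
	r := NewRouter([]string{"BTC-USDT"})
	_ = r.Route(Order{ID: "o1", Pair: "BTC-USDT", Side: "buy", Qty: 0.5})
	fmt.Println(len(r.queues["BTC-USDT"])) // prints 1
}
```

Partitioning by pair is what lets ordering stay a per-queue property rather than a global one.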
Matching Engine Design
- In-memory, single-threaded:
  - avoided locking and race conditions
  - guaranteed strict ordering
- Exactly-once semantics (logical)
- Event-sourced:
  - every state transition recorded
  - full replayability of order books
- Currency-partitioned:
  - each trading pair isolated
  - improved scalability and fault isolation
If a matching engine instance failed, another instance could:
- replay events
- rebuild state
- resume processing without order loss
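A minimal sketch of the single-threaded design, assuming a simplified event shape (the real CSX schema is not shown in this writeup): one goroutine owns all state, consumes events strictly in arrival order, and records every transition in an append-only log.

```go
package main

import "fmt"

// Event is an illustrative state transition; field names are assumptions.
type Event struct {
	Kind    string // "placed", "filled", "cancelled"
	OrderID string
}

// Engine owns its book exclusively. All mutation happens on the one
// goroutine running Run, so no locks are needed and processing order is
// exactly channel order.
type Engine struct {
	In   chan Event
	Log  []Event         // append-only log of every transition
	Open map[string]bool // open orders, keyed by ID
}

func NewEngine() *Engine {
	return &Engine{In: make(chan Event, 256), Open: make(map[string]bool)}
}

// Run processes one event at a time and records each transition before
// moving on, which is what makes later replay possible.
func (e *Engine) Run() {
	for ev := range e.In {
		switch ev.Kind {
		case "placed":
			e.Open[ev.OrderID] = true
		case "filled", "cancelled":
			delete(e.Open, ev.OrderID)
		}
		e.Log = append(e.Log, ev)
	}
}

func main() {
	e := NewEngine()
	go func() {
		e.In <- Event{"placed", "o1"}
		e.In <- Event{"placed", "o2"}
		e.In <- Event{"cancelled", "o1"}
		close(e.In)
	}()
	e.Run()
	fmt.Println(len(e.Open), len(e.Log)) // prints 1 3
}
```

The single-writer loop trades raw parallelism for determinism; scaling comes from running one such engine per currency pair, as described above.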
Order Handling Semantics
- Market & Limit orders supported
- Partial fills: remaining quantity stayed in the order book
- Cancellations: removed from the matching engine if still open
- State mutation was not allowed outside the matching engine
- All external services treated matching results as immutable facts
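The partial-fill rule can be illustrated with a single match step; the integer price ticks and field names here are assumptions made for the sketch, not the production representation:

```go
package main

import "fmt"

// Order uses integer price ticks to avoid float rounding in a matching
// path; the fields are illustrative.
type Order struct {
	ID    string
	Price int64 // price in ticks
	Qty   int64 // remaining quantity
}

// matchAgainst fills an incoming buy order against one resting sell order.
// A partial fill leaves the remainder on whichever side was larger, so an
// unfilled resting remainder stays in the book.
func matchAgainst(incoming, resting *Order) (filled int64) {
	if incoming.Price < resting.Price {
		return 0 // no price cross, nothing fills
	}
	filled = incoming.Qty
	if resting.Qty < filled {
		filled = resting.Qty
	}
	incoming.Qty -= filled
	resting.Qty -= filled
	return filled
}

func main() {
	buy := &Order{ID: "b1", Price: 100, Qty: 5}
	sell := &Order{ID: "s1", Price: 99, Qty: 3}
	fmt.Println(matchAgainst(buy, sell), buy.Qty, sell.Qty) // prints 3 2 0
}
```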
CQRS & Microservices Discipline
- CQRS applied to Orders:
  - write service: order creation & validation
  - read service: order status, history, views
- Services communicated asynchronously
- Duplicate processing prevented via order IDs
- Clear ownership boundaries between services
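Deduplication by order ID can be sketched as follows; this in-memory version is a simplification, since a real implementation would persist seen IDs to survive restarts:

```go
package main

import "fmt"

// Dedup tracks processed order IDs so that a redelivered message becomes a
// no-op, backing the logical exactly-once semantics described above.
type Dedup struct {
	seen map[string]struct{}
}

func NewDedup() *Dedup {
	return &Dedup{seen: make(map[string]struct{})}
}

// FirstTime reports whether this order ID has not been processed yet, and
// marks it as processed.
func (d *Dedup) FirstTime(orderID string) bool {
	if _, ok := d.seen[orderID]; ok {
		return false
	}
	d.seen[orderID] = struct{}{}
	return true
}

func main() {
	d := NewDedup()
	fmt.Println(d.FirstTime("ord-1"), d.FirstTime("ord-1")) // prints true false
}
```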
Observability & Debugging
- Built a shared logging module used across all services
- Structured logs with:
  - requestId
  - correlationId
- End-to-end tracing across microservices using these IDs
- Debugging via CloudWatch log correlation
- Designed for production readiness (work completed up to UAT)
Performance & Reliability
- Benchmarked throughput at roughly 7,000 orders/sec
- Designed for:
  - high availability via leader election
  - fast recovery using event replay
- Focused on correctness first, performance second
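The event-replay recovery path amounts to a fold over the log; the event shape below is illustrative, matching no particular CSX schema:

```go
package main

import "fmt"

// Event is an illustrative state transition from the append-only log.
type Event struct {
	Kind    string // "placed", "filled", "cancelled"
	OrderID string
}

// rebuild replays the event log to reconstruct the set of open orders.
// After a failover, a standby engine would run this before consuming any
// new input, resuming with no order loss.
func rebuild(log []Event) map[string]bool {
	open := make(map[string]bool)
	for _, ev := range log {
		switch ev.Kind {
		case "placed":
			open[ev.OrderID] = true
		case "filled", "cancelled":
			delete(open, ev.OrderID)
		}
	}
	return open
}

func main() {
	log := []Event{{"placed", "a"}, {"placed", "b"}, {"filled", "a"}}
	fmt.Println(len(rebuild(log))) // prints 1: only "b" is still open
}
```

Because the fold is deterministic, any replica replaying the same log arrives at the same book state.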
Leadership & Execution
- Enforced:
  - consistent logging and tracing
  - architectural boundaries
  - clear communication standards
- Ruthlessly prioritized P0 / P1 work
- Blocked scope creep and unfocused discussions
- Mentored a team of early-career engineers new to Go:
  - intensive PR reviews
  - teaching correctness, not just syntax
What I’d Improve If Rebuilding Today
- Stronger idempotency guarantees across services
- Formalized queue abstractions earlier
- Better domain abstractions for products and pricing
- Even stricter separation of concerns between orchestration and execution