Designing resilient e-commerce API integrations
February 5, 2026 · 9 min read · By Jean-Philippe Cormier
If your checkout depends on a synchronous call to an ERP, your checkout is at best as reliable as that ERP. On Black Friday, that's a problem. Here are the patterns I recommend on every integration project.
1. Make every write idempotent
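Every external write should carry an idempotency key: the order ID, a UUID, anything that stays the same across retries. The receiving side uses the key to recognize a repeat and return the original result instead of writing twice. Retries become safe, duplicate orders disappear, and your support team gets their evenings back.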
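Here is a minimal sketch of the receiving side of that contract. `OrderWriter`, `create_order`, and the in-memory dictionary are hypothetical names chosen for illustration; a real service would keep the keys in a durable store with an expiry.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class OrderWriter:
    """Hypothetical stand-in for the system that receives the write."""

    # Maps idempotency key -> response returned for the first write.
    _results: dict = field(default_factory=dict)
    writes_performed: int = 0

    def create_order(self, idempotency_key: str, payload: dict) -> dict:
        # A repeated key returns the stored response instead of writing again.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.writes_performed += 1
        response = {"erp_ref": str(uuid.uuid4()), "order_id": payload["order_id"]}
        self._results[idempotency_key] = response
        return response


writer = OrderWriter()
order = {"order_id": "ORD-1001", "total_cents": 4999}
first = writer.create_order(order["order_id"], order)
retry = writer.create_order(order["order_id"], order)  # e.g. after a client timeout
```

The retry gets back the same response as the first call, and only one write is performed.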
2. Put a queue between you and them
Direct synchronous calls couple your uptime to theirs. A queue (SQS, Pub/Sub, even Redis Streams) lets you accept the order now and deliver it to the ERP when it's healthy. Customers never see the failure.
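To make the shape concrete, here is a rough sketch that uses Python's in-process `queue.Queue` as a stand-in for SQS, Pub/Sub, or Redis Streams. The names `accept_order`, `drain`, and `send_to_erp` are assumptions for illustration, and a real broker would also give you persistence and visibility timeouts.

```python
import queue

# In-process stand-in for a durable message queue.
order_queue: "queue.Queue[dict]" = queue.Queue()


def accept_order(order: dict) -> dict:
    """Checkout path: enqueue and return immediately, without calling the ERP."""
    order_queue.put(order)
    return {"status": "accepted", "order_id": order["order_id"]}


def drain(send_to_erp) -> int:
    """Worker path: deliver queued orders, stopping at the first failure."""
    delivered = 0
    while True:
        try:
            order = order_queue.get_nowait()
        except queue.Empty:
            return delivered
        try:
            send_to_erp(order)
        except ConnectionError:
            # ERP is unhealthy: put the order back and try again on a later run.
            order_queue.put(order)
            return delivered
        delivered += 1
```

Note that this sketch re-enqueues a failed order at the back of the queue, so it does not preserve ordering. If the ERP needs orders in sequence, that is a property to check in the broker you choose.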
3. Retry with exponential backoff and jitter
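When every client retries on the same schedule, the retries arrive in synchronized waves: a thundering herd that keeps the downstream system down. Back off exponentially between attempts, add random jitter so the retries spread out, cap the number of attempts, and route persistent failures to a dead-letter queue with alerting.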
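One way to sketch this is "full jitter", where each delay is drawn uniformly between zero and an exponentially growing ceiling. The function names, the default values, and the list standing in for a dead-letter queue are all assumptions for illustration.

```python
import random
import time


def backoff_delays(max_retries=5, base=0.5, cap=30.0, rng=random.random):
    """Yield one delay per retry, uniform in [0, min(cap, base * 2**attempt))."""
    for attempt in range(max_retries):
        yield rng() * min(cap, base * 2 ** attempt)


def call_with_retries(operation, dead_letter, max_retries=5, sleep=time.sleep):
    """Try once, then retry with jittered backoff; dead-letter when retries run out."""
    try:
        return operation()
    except ConnectionError as exc:
        last_error = exc
    for delay in backoff_delays(max_retries):
        sleep(delay)
        try:
            return operation()
        except ConnectionError as exc:
            last_error = exc
    # Retries exhausted: record the failure for alerting and manual replay.
    dead_letter.append(str(last_error))
    return None
```

The `sleep` and `rng` parameters exist only so the behavior can be tested without waiting; in production code the defaults would be used.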
4. Add circuit breakers around fragile partners
If a tax API is timing out for 90 seconds straight, stop calling it. Fall back to cached rates or a default, alert the team, and resume when health checks pass. A degraded checkout beats a broken one.
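A bare-bones sketch of the idea follows. The `CircuitBreaker` class, its thresholds, and the `fallback` callable are hypothetical. It also swaps in a simplification: instead of a separate health check, it lets one trial call through after a cool-off period and closes the circuit if that call succeeds.

```python
import time


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after  # seconds to wait before a trial call
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, operation, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()  # open: skip the partner entirely
            # Cool-off elapsed: fall through and allow one trial call.
        try:
            result = operation()
        except (TimeoutError, ConnectionError):
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            return fallback()
        # Success: close the circuit and reset the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Usage would look something like `breaker.call(fetch_live_tax_rate, cached_tax_rate)`, where both callables are yours to supply. Alerting when the circuit opens is left out of the sketch.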
5. Instrument everything, then trust the dashboards
- Request rate, error rate, p95/p99 latency per integration
- Queue depth and oldest message age
- Dead-letter queue size with paging alerts
- Business KPIs (orders/min) overlaid on infra KPIs
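For a feel of how two of these numbers are derived, here is a small sketch in plain Python. The helper names are made up for illustration, and in practice a metrics system would compute these for you; the percentile uses the simple nearest-rank method.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of latencies (pct in 1..100)."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def oldest_message_age(enqueued_at, now):
    """Seconds since the oldest queued message was enqueued (0.0 if the queue is empty)."""
    return now - min(enqueued_at) if enqueued_at else 0.0


def error_rate(total_requests, failed_requests):
    """Fraction of requests that failed (0.0 when there was no traffic)."""
    return failed_requests / total_requests if total_requests else 0.0
```

Oldest message age is often the more useful queue signal: depth can look normal while a single stuck message ages for hours.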
When something breaks at 2 a.m. on Cyber Monday, you don't want to be reading code. You want to be reading a dashboard.
Integrations don't fail when you test them. They fail when traffic is 10x and your partner is having a bad day.