The Concurrency Paradox of Email
API gateways are designed for extreme concurrency. A well-architected gateway can easily ingest 100,000 HTTP POST requests per second.
SMTP destination servers (like Gmail or Microsoft 365) are not designed for extreme concurrency from a single sender. If you attempt to open 100,000 simultaneous TCP connections to Gmail, its defenses will likely treat your traffic as abusive, possibly as a Distributed Denial of Service (DDoS) attack, then throttle or block your sending IPs and defer or reject your emails.
The Architectural Challenge: How do you build an API that accepts 1 million emails per minute from clients with minimal latency, while strictly throttling outbound delivery to destination servers to protect sender reputation?
The answer is a deeply decoupled, highly partitioned Queueing Architecture.
Phase 1: Stateless Ingestion
When a client makes a POST /send request, the API Gateway does absolutely no processing related to delivery.
- It validates the Bearer token.
- It validates the JSON schema.
- It writes the payload to an ingestion queue.
- It returns a 202 Accepted to the client.
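The four steps above can be sketched as a single handler function. This is a minimal illustration, not a production gateway: `handle_send`, `VALID_TOKENS`, `REQUIRED_FIELDS`, and the in-process `ingestion_queue` are hypothetical names, and the "schema" check only verifies that a few assumed fields are present.

```python
import json
import queue

# Hypothetical stand-ins: a real deployment would call a token service
# and write to a durable broker instead of an in-process queue.
VALID_TOKENS = {"secret-token"}
REQUIRED_FIELDS = {"to", "subject", "body"}  # assumed schema, for illustration
ingestion_queue = queue.Queue()


def handle_send(headers: dict, raw_body: str) -> tuple[int, str]:
    """Return (status_code, reason) for a POST /send request."""
    # 1. Validate the Bearer token.
    auth = headers.get("Authorization", "")
    prefix = "Bearer "
    if not auth.startswith(prefix) or auth[len(prefix):] not in VALID_TOKENS:
        return 401, "Unauthorized"

    # 2. Validate the JSON schema (here: required fields only).
    try:
        payload = json.loads(raw_body)
    except json.JSONDecodeError:
        return 400, "Bad Request"
    if not isinstance(payload, dict) or not REQUIRED_FIELDS.issubset(payload):
        return 400, "Bad Request"

    # 3. Write the payload to the ingestion queue. No delivery work happens here.
    ingestion_queue.put(payload)

    # 4. Acknowledge immediately; delivery is asynchronous.
    return 202, "Accepted"
```

Because the handler keeps no state between requests, any number of gateway instances can run it in parallel behind a load balancer.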
Phase 2: The Payload Pointer Pattern
If a client sends a marketing blast to 100,000 recipients, pushing 100,000 copies of a 50KB HTML template into a Kafka queue means moving roughly 5GB of redundant data (100,000 × 50KB) across the network and puts severe memory pressure on the brokers.
Instead, we use the Payload Pointer Pattern.
The heavy HTML payload is written exactly once to a fast, temporary object storage layer (utilizing patterns similar to MyStorageAPI's Data Lake architecture).
The queue only receives lightweight metadata messages containing a pointer to the payload:
{
"jobid": "job998877",
"recipient": "user@gmail.com",
"payloadref": "s3://ephemeral-cache/templates/blast123.html"
}
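A rough sketch of the fan-out, assuming an in-memory dictionary as a stand-in for the object storage layer. The names `object_store`, `store_payload`, and `fan_out` are hypothetical; only the message fields and the `s3://ephemeral-cache/templates/` prefix come from the example above.

```python
import uuid

# Stand-in for the temporary object storage layer (assumption for this sketch).
object_store: dict[str, bytes] = {}


def store_payload(html: str) -> str:
    """Write the heavy payload once and return a pointer to it."""
    key = f"templates/{uuid.uuid4().hex}.html"
    object_store[key] = html.encode("utf-8")
    return f"s3://ephemeral-cache/{key}"


def fan_out(job_id: str, html: str, recipients: list[str]) -> list[dict]:
    """Build one lightweight queue message per recipient."""
    payload_ref = store_payload(html)  # stored exactly once per job
    return [
        {"jobid": job_id, "recipient": r, "payloadref": payload_ref}
        for r in recipients
    ]
```

Each queue message stays at a few hundred bytes regardless of template size, so broker load scales with recipient count rather than with recipient count times payload size.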
Phase 3: The Priority and Throttling Queues
Not all emails have the same urgency. A password reset email (Transactional) must arrive in seconds. A weekly newsletter (Bulk) can arrive over the course of an hour.
The ingestion queue fans out the messages into strict priority tiers.
- Tier 1 (Critical): Password resets, 2FA codes. Processed instantly.
- Tier 2 (Transactional): Receipts, shipping notifications.
- Tier 3 (Bulk): Marketing blasts, newsletters.
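One way to picture the tiering is a single heap ordered by tier number, with a sequence counter to preserve arrival order within a tier. This is only a sketch: the category names in `CATEGORY_TO_TIER` and the `TieredQueue` class are assumptions, and a production system would more likely use separate broker topics or queues per tier.

```python
import heapq
import itertools

TIER_CRITICAL, TIER_TRANSACTIONAL, TIER_BULK = 1, 2, 3

# Hypothetical mapping from message category to tier.
CATEGORY_TO_TIER = {
    "password_reset": TIER_CRITICAL,
    "2fa": TIER_CRITICAL,
    "receipt": TIER_TRANSACTIONAL,
    "shipping": TIER_TRANSACTIONAL,
    "marketing": TIER_BULK,
    "newsletter": TIER_BULK,
}


class TieredQueue:
    """Strict priority: a lower tier number is always served first."""

    def __init__(self) -> None:
        self._heap: list = []
        self._seq = itertools.count()  # tie-breaker keeps FIFO order per tier

    def push(self, category: str, message: dict) -> None:
        # Unknown categories fall back to the bulk tier (an assumption).
        tier = CATEGORY_TO_TIER.get(category, TIER_BULK)
        heapq.heappush(self._heap, (tier, next(self._seq), message))

    def pop(self) -> tuple[int, dict]:
        tier, _, message = heapq.heappop(self._heap)
        return tier, message

    def __len__(self) -> int:
        return len(self._heap)
```

Strict priority can starve the bulk tier under sustained transactional load, so real deployments often reserve a share of workers for lower tiers.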
Within each tier, delivery workers also throttle by destination domain. If the system detects that it is processing 50,000 emails destined for @yahoo.com, the worker pool dynamically shapes the traffic. It might limit itself to 50 concurrent connections to Yahoo's MX servers, trickling the emails out slowly enough to stay under Yahoo's rate limits.
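Per-domain connection caps of this kind can be sketched with one semaphore per destination domain. The `DomainThrottle` class, the `DEFAULT_LIMIT` value, and the `send` callback are hypothetical; the limit of 50 for yahoo.com simply mirrors the example figure above and is not a published provider limit.

```python
import asyncio
from collections import defaultdict

DOMAIN_LIMITS = {"yahoo.com": 50}  # example figure from the text
DEFAULT_LIMIT = 10                 # assumed fallback for other domains


class DomainThrottle:
    """Caps the number of concurrent deliveries per destination domain."""

    def __init__(self) -> None:
        self._semaphores: dict[str, asyncio.Semaphore] = {}
        self.in_flight: dict[str, int] = defaultdict(int)
        self.peak: dict[str, int] = defaultdict(int)  # for observability

    def _semaphore(self, domain: str) -> asyncio.Semaphore:
        if domain not in self._semaphores:
            limit = DOMAIN_LIMITS.get(domain, DEFAULT_LIMIT)
            self._semaphores[domain] = asyncio.Semaphore(limit)
        return self._semaphores[domain]

    async def deliver(self, recipient: str, send) -> None:
        """Run `send(recipient)` once a slot for its domain is free."""
        domain = recipient.rsplit("@", 1)[-1].lower()
        async with self._semaphore(domain):
            self.in_flight[domain] += 1
            self.peak[domain] = max(self.peak[domain], self.in_flight[domain])
            try:
                await send(recipient)  # hypothetical SMTP delivery coroutine
            finally:
                self.in_flight[domain] -= 1
```

A fixed cap is the simplest form of shaping. An adaptive version would lower the limit when the destination starts returning deferrals and raise it again as deliveries succeed.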
Phase 4: Exponential Backoff and Dead Letters
If an SMTP server responds with a temporary failure (e.g., 450 4.2.1 Mailbox busy), the delivery worker does not fail the job.
The message is routed to a Retry Queue governed by a backoff schedule with escalating, roughly exponential delays. The system will attempt to deliver the email again after 1 minute, then 5 minutes, then 15 minutes, and so on with progressively longer waits, until the message reaches a maximum age of 72 hours.
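The schedule can be sketched as follows. The first three delays and the 72-hour ceiling come from the text; what happens after the 15-minute step is not specified, so doubling the delay on each further attempt is an assumption of this sketch, as are the names `retry_delay` and `retry_schedule`.

```python
from datetime import timedelta

# Delays stated in the text.
INITIAL_DELAYS = [
    timedelta(minutes=1),
    timedelta(minutes=5),
    timedelta(minutes=15),
]
MAX_AGE = timedelta(hours=72)


def retry_delay(attempt: int) -> timedelta:
    """Delay before retry number `attempt` (1-based)."""
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    if attempt <= len(INITIAL_DELAYS):
        return INITIAL_DELAYS[attempt - 1]
    # Assumption: double the last listed delay for each further attempt.
    extra = attempt - len(INITIAL_DELAYS)
    return INITIAL_DELAYS[-1] * (2 ** extra)


def retry_schedule() -> list[timedelta]:
    """Offsets from the first failure for every retry that fits in MAX_AGE."""
    offsets = []
    elapsed = timedelta(0)
    attempt = 1
    while True:
        elapsed += retry_delay(attempt)
        if elapsed > MAX_AGE:
            break
        offsets.append(elapsed)
        attempt += 1
    return offsets
```

In practice a small random jitter is usually added to each delay so that messages deferred together do not all retry at the same instant.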
If the message is still failing once the maximum threshold is reached, or if the destination server returns a permanent failure (a hard bounce such as 550 User Unknown), the message is routed to a Dead Letter Queue. This triggers an asynchronous edge webhook (as detailed in our Real-Time Deliverability guide) back to the client, informing them of the failure.
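The routing decision described in this phase can be sketched as a small function over the SMTP reply code. SMTP defines 4xx replies as transient failures and 5xx replies as permanent ones; the function name `route_reply` and the returned queue labels are hypothetical.

```python
from datetime import timedelta

MAX_AGE = timedelta(hours=72)  # retry ceiling from the text


def route_reply(code: int, age: timedelta) -> str:
    """Decide where a message goes after an SMTP reply.

    `age` is the time elapsed since the first delivery attempt.
    """
    if 200 <= code <= 299:
        return "delivered"
    if 500 <= code <= 599:
        # Permanent failure (hard bounce): retrying will not help.
        return "dead_letter"
    if 400 <= code <= 499:
        # Transient failure: retry until the message is too old.
        return "retry" if age < MAX_AGE else "dead_letter"
    raise ValueError(f"unexpected SMTP reply code: {code}")
```

For example, `route_reply(450, timedelta(hours=1))` yields `"retry"`, while the same code at 73 hours, or a 550 at any age, yields `"dead_letter"` and would trigger the failure webhook.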
By completely decoupling ingestion from delivery using layered, prioritized queues and payload pointers, the infrastructure can absorb massive traffic spikes from clients while staying within the strict concurrency limits that destination mail servers enforce.