Heavy templates run one at a time per server process to keep latency predictable. If you submit a heavy job while another heavy job is in flight, the API returns 503 with a hint to try again in 30 seconds. Separately, /api/generate is rate limited to 60 requests per user per minute.
What counts as a heavy job
- White Studio (ai-photoshoots).
- Ghost Mannequin.
- Batch flag on any template (e.g. batch ghost mannequin, batch image-creation).
- AI Model with multiple variants.
- Image Creation with multiple variants.
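The rules above can be sketched as a small classifier. This is an illustration only: the request shape, the function name isHeavyJob, and every template identifier other than ai-photoshoots are assumptions made for the example, not the actual API.

```typescript
// Hypothetical request shape; field names are assumptions for this sketch.
interface GenerateRequest {
  template: string;   // e.g. "ai-photoshoots" (White Studio)
  batch?: boolean;    // batch flag on any template
  variants?: number;  // number of variants requested, defaults to 1
}

// Only "ai-photoshoots" appears in this article; the other ids are made up.
const ALWAYS_HEAVY = new Set<string>(["ai-photoshoots", "ghost-mannequin"]);
const HEAVY_WHEN_MULTI_VARIANT = new Set<string>(["ai-model", "image-creation"]);

// Returns true when the request would fall into the heavy queue
// according to the list above.
function isHeavyJob(req: GenerateRequest): boolean {
  if (req.batch === true) return true;               // batch flag on any template
  if (ALWAYS_HEAVY.has(req.template)) return true;   // White Studio, Ghost Mannequin
  const variants = req.variants ?? 1;
  // AI Model and Image Creation are heavy only with multiple variants.
  return HEAVY_WHEN_MULTI_VARIANT.has(req.template) && variants > 1;
}
```

For example, isHeavyJob({ template: "image-creation", variants: 4 }) is true, while the same template with one variant is not heavy.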
What to do when you hit it
- Wait 30 seconds and retry. The previous heavy job almost always finishes in that window.
- Switch to a non-heavy template while waiting (e.g. Image Creation single-variant, Virtual Try-On).
- Reduce variant count on Image Creation / AI Model from 4 to 1 to drop the heavy classification.
- Use the offload worker: some heavy templates are routed to a separate Docker worker fleet (10 parallel containers on the Apiway server). The offload list is configured via the OFFLOAD_TEMPLATES env var. When offloaded, the heavy queue does not gate the job.
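The "wait 30 seconds and retry" advice can be automated on the client. Below is a minimal sketch, assuming the caller supplies its own function that sends the request and resolves with the HTTP status. The names SendRequest, retryDelayMs, and sendWithRetry are hypothetical and not part of any Apiway SDK.

```typescript
// The response is reduced to its status code for this sketch.
type SendRequest = () => Promise<{ status: number }>;
type Sleep = (ms: number) => Promise<void>;

const HEAVY_QUEUE_WAIT_MS = 30000; // the "try again in 30 seconds" hint

// Returns how long to wait before retrying, or null if the status
// does not indicate a busy heavy queue.
function retryDelayMs(status: number): number | null {
  return status === 503 ? HEAVY_QUEUE_WAIT_MS : null;
}

const defaultSleep: Sleep = (ms) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Sends the request up to maxAttempts times in total, waiting between
// attempts while the API keeps answering 503.
async function sendWithRetry(
  send: SendRequest,
  maxAttempts: number = 3,
  sleep: Sleep = defaultSleep
): Promise<{ status: number }> {
  let response = await send();
  for (let attempt = 1; attempt < maxAttempts; attempt++) {
    const delay = retryDelayMs(response.status);
    if (delay === null) return response; // success or a non-retryable error
    await sleep(delay);
    response = await send();
  }
  return response; // last response, possibly still 503
}
```

The sleep function is injectable so the logic can be exercised without real delays. In an application, send would wrap the actual call to /api/generate.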
Rate limit details
- /api/generate: 60 requests / user / minute.
- Instagram connect: 5 requests / user / minute (via checkInstagramConnectRateLimit).
Hitting the rate limit returns 429. Wait one minute and the bucket refills.
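To make the refill behaviour concrete, here is a sketch of a per-user fixed-window limiter. The article does not say which algorithm the server uses, so treat the fixed window as an assumption. FixedWindowLimiter is a hypothetical class and not the implementation behind checkInstagramConnectRateLimit.

```typescript
interface WindowState {
  start: number; // timestamp (ms) when the current window opened
  count: number; // requests counted in the current window
}

class FixedWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly windows = new Map<string, WindowState>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed, false if it should get a 429.
  allow(userId: string, now: number = Date.now()): boolean {
    const state = this.windows.get(userId);
    if (state === undefined || now - state.start >= this.windowMs) {
      // First request, or the window expired: the bucket refills.
      this.windows.set(userId, { start: now, count: 1 });
      return true;
    }
    if (state.count >= this.limit) return false;
    state.count += 1;
    return true;
  }
}

// Limits taken from the list above.
const generateLimiter = new FixedWindowLimiter(60, 60000);
const instagramConnectLimiter = new FixedWindowLimiter(5, 60000);
```

With these settings, the sixth Instagram connect attempt inside one minute is rejected, and the first attempt after the window expires is accepted again.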
About the offload worker
Apiway runs heavy generation through a Docker microservice (10 worker containers in parallel) on a dedicated server (64 GB RAM). The web app communicates with the worker through the generation_jobs Supabase table rather than over HTTP, so scaling out the workers does not require code changes on the Next.js side. See our infrastructure note on uploads and training.
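The table-as-queue pattern can be sketched with an in-memory stand-in for generation_jobs. Everything here is hypothetical: the row columns, the status values, and the JobTable class are assumptions chosen for illustration, and the real schema is not described in this article.

```typescript
// Hypothetical row shape; the real generation_jobs columns may differ.
type JobStatus = "queued" | "processing" | "done";

interface GenerationJob {
  id: number;
  template: string;
  status: JobStatus;
  workerId: string | null;
}

// In-memory stand-in for the table, to show the flow only.
class JobTable {
  private rows: GenerationJob[] = [];
  private nextId = 1;

  // Web app side: insert a row instead of calling the worker over HTTP.
  enqueue(template: string): GenerationJob {
    const job: GenerationJob = {
      id: this.nextId++,
      template,
      status: "queued",
      workerId: null,
    };
    this.rows.push(job);
    return job;
  }

  // Worker side: claim the oldest queued row, or null if none is waiting.
  claim(workerId: string): GenerationJob | null {
    const job = this.rows.find((row) => row.status === "queued");
    if (job === undefined) return null;
    job.status = "processing";
    job.workerId = workerId;
    return job;
  }

  // Worker side: mark a claimed job as finished.
  complete(id: number): void {
    const job = this.rows.find((row) => row.id === id);
    if (job !== undefined) job.status = "done";
  }
}
```

Because producers and workers only share the table, adding more workers means more callers of claim and no change on the enqueue side. Against a real database the claim step has to be atomic so that two workers cannot take the same row.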