Cutting a person + garment cleanly out of a fashion image is not a solved problem in 2026. Generic segmentation models do well enough on photographs and badly on AI-generated images, whose edges and shadows behave subtly differently. Here is how Apiway segments subjects from LLM-generated backgrounds for the pure-white pipeline.
Why segmentation is harder on AI imagery
Real photographs have predictable edge characteristics: chromatic aberration at high-contrast boundaries, sensor noise in uniform regions, depth-of-field falloff. Generic segmentation models train on photographic data and learn those signals as part of the boundary cue.
AI-generated images have different signals. Edges are cleaner, noise is uniform, depth-of-field is approximated rather than physical. Generic segmentation models often miss boundaries on AI imagery in subtle ways — a jacket cuff getting clipped, a hair strand getting fragmented, a bag strap disappearing into the background.
The three-layer decomposition
Apiway's segmentation pass produces three masks per image, not the usual two:
- Foreground: the model and the garment. Mostly hard mask, with alpha-matted edges.
- Cast shadow: the soft shadow under the feet and behind the model. Soft mask, low opacity.
- Background: everything else. Discarded in the recomposite stage.
The shadow as a separate layer is the single most important architectural decision. (More: why we re-composite onto pure white.) Without it, models look like they are floating in space.
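The recomposite step can be sketched in a few lines. This is an illustrative NumPy version, not Apiway's production code: the function name, the blend order, and the idea of encoding shadow opacity directly in the shadow matte are all assumptions.

```python
import numpy as np

def recomposite(fg_rgb, fg_alpha, shadow_alpha, size):
    """Recomposite foreground + cast shadow onto a pure-white canvas.

    fg_rgb:       (H, W, 3) floats in [0, 1], foreground colors
    fg_alpha:     (H, W) floats in [0, 1], soft foreground matte
    shadow_alpha: (H, W) floats in [0, 1], soft cast-shadow matte
                  (low values, so the shadow stays subtle)
    Illustrative sketch only; names and blend order are assumptions.
    """
    white = np.ones((*size, 3))                       # pure-white canvas
    # 1. Darken the canvas wherever the cast shadow falls.
    shaded = white * (1.0 - shadow_alpha[..., None])
    # 2. Composite the foreground over the shaded canvas;
    #    the background mask is simply never used.
    out = fg_alpha[..., None] * fg_rgb + (1.0 - fg_alpha[..., None]) * shaded
    return out
```

Note the background mask never contributes pixels; it exists only so the other two masks are cleanly defined, and it is dropped at this stage.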
Edge cases the model has been trained for
Generic segmentation fails on these; Apiway's fashion-specific training catches them.
- Fine hair strands against the background — alpha matting rather than hard polygon cut.
- Sheer or partially-transparent fabrics where the background should partially show through.
- Long jewelry chains that thin to a single pixel's width against the background.
- Bag straps and belts crossing the body silhouette.
- Glass and reflective accessories (sunglasses, watch crystals).
- Loose drapey fabric where the boundary is genuinely ambiguous.
Alpha matting at the boundary
Hard cutouts look composited. The pipeline runs a learned alpha-matting pass at the foreground-background boundary that lets pixels be partially transparent rather than fully in or out. This is what allows hair to feather into the white background instead of leaving a hard polygon edge.
The alpha pass is calibrated against the original LLM output: if the original had soft, hazy edges (e.g. a backlit shot), the matting preserves more of that softness. If the original had crisp edges (e.g. studio lighting), the matting stays tighter.
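The difference between a hard cut and a matted edge is easiest to see on a one-dimensional slice. The example below is a hypothetical hair strand with made-up alpha values, not output from the actual matting model:

```python
import numpy as np

# Hypothetical 1-D slice across a single hair strand; colors in [0, 1].
hair = 0.1                                    # dark foreground (hair) color
white = 1.0                                   # recomposite target

# Learned matte: boundary pixels get fractional coverage (values illustrative).
soft_alpha = np.array([0.0, 0.3, 1.0, 0.3, 0.0])
# Hard mask: the same matte thresholded to fully-in / fully-out.
hard_alpha = (soft_alpha > 0.5).astype(float)

soft_out = soft_alpha * hair + (1 - soft_alpha) * white
hard_out = hard_alpha * hair + (1 - hard_alpha) * white
# soft_out steps 1.0 -> 0.73 -> 0.1, feathering the strand into the white;
# hard_out jumps 1.0 -> 0.1 in one pixel, the classic cut-out edge.
```

The calibration described above would amount to widening or narrowing the band of fractional-alpha pixels depending on how soft the original image's edges were.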
Why shadow preservation specifically matters
Shadows are the single most important grounding signal for the human visual system. A fashion image where the model has no contact shadow under the feet reads as composited instantly — faster than a viewer can consciously articulate why. Even a small soft shadow at 20% opacity anchors the model to the floor and makes the output feel photographed rather than collaged.
For Amazon-policy compliance specifically, soft shadows are explicitly allowed (the rule is “pure white background”, not “no shadow under the product”). The shadow stays inside the policy envelope while removing the floating-model artifact.
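The arithmetic is simple enough to check directly. Assuming a 20% maximum shadow opacity (the figure above; the constant and function are hypothetical), the canvas outside the shadow footprint stays exactly 255, and inside it never drops below 255 × 0.8 ≈ 204:

```python
SHADOW_OPACITY = 0.20   # illustrative maximum opacity from the text

def shaded_value(shadow_mask):
    """8-bit canvas value under a shadow mask value in [0, 1]."""
    return round(255 * (1 - SHADOW_OPACITY * shadow_mask))

# shaded_value(0.0) -> 255: outside the shadow, the background is pure white.
# shaded_value(1.0) -> 204: the shadow core, still clearly a soft grey.
```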
Performance shape
Segmentation runs as part of the post-processing pipeline. Roughly 0.5–1.0 seconds per image on production GPUs. Cost is absorbed into the per-shot credit price; no separate line item.
Why we built this rather than calling a third-party API
Off-the-shelf segmentation APIs are tuned for photographs and consumer use cases. They often miss the fashion-specific edge cases (jewelry chains, sheer fabric, cast shadows). They also charge per call, which would compound the per-shot cost in a way that breaks the one-credit-equals-one-cent pricing model.
A custom model trained on fashion-specific data delivers better output and zero marginal cost per call. The engineering investment paid back within the first production quarter.
See the segmentation result yourself
Generate any White Studio image. The output is the post-segmentation result — the foreground subject and cast shadow on a pure white canvas. Free accounts ship with 100 one-time credits — enough to test on real garments and edge cases.
