---
title: "What an AI image edit actually changes"
subtitle: "Measuring how much pixel drift you get when you ask an image model to change one thing."
date: "2026-04-18"
tags: ["research", "AI Image", "Deepdive"]
---

Every time I ask an AI to edit a photo of myself, the result doesn't quite look like me. It's hard to say exactly what it is, but there is just something off. Subtle enough that most people wouldn't catch it, but still obvious enough that close friends and family notice it.

This post is about looking into what's causing it, and trying to see if we can find a way to mitigate it.

All edits use Gemini 3.1 Flash Image Preview. I wanted to run the same matrix against OpenAI's gpt-image-2 but didn't have access to it at the time of writing. I'll update when that changes.

## Methodology

The setup is simple. Take a source image. Send it to the model with one prompt. Compute the per-pixel RGB distance between the input and the output. Render that distance as a heatmap (black through blue, cyan, yellow, red). Brighter pixels moved more.
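For reference, here is a minimal sketch of the distance step. The post doesn't pin down which metric "RGB distance" means, so this assumes plain Euclidean distance across the three channels. The function name `rgb_distance` and the tiny synthetic arrays are just for illustration.

```python
import numpy as np

def rgb_distance(before: np.ndarray, after: np.ndarray) -> np.ndarray:
    """Per-pixel Euclidean distance in RGB space (assumed metric).

    Both inputs are H x W x 3 uint8 arrays of the same shape.
    Returns an H x W float array; the maximum possible value is 255 * sqrt(3).
    """
    if before.shape != after.shape:
        raise ValueError("images must have the same shape")
    # Cast to float first so the subtraction cannot wrap around in uint8.
    diff = after.astype(np.float64) - before.astype(np.float64)
    return np.sqrt((diff ** 2).sum(axis=-1))

# Tiny synthetic example: a 2x2 black image where one pixel changes.
before = np.zeros((2, 2, 3), dtype=np.uint8)
after = before.copy()
after[0, 0] = (3, 4, 0)

dist = rgb_distance(before, after)
mean_drift = dist.mean()  # one scalar summarising the whole frame
```

The per-pixel array is what gets rendered as a heatmap. The mean of that array is one way to get a single drift number per edit.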

<!-- component:VisualizationsPreview -->
**[VisualizationsPreview component]**

A `TabbedImage` showing two views of the same edit (remove the bowl of lemons, simple prompt):

- Heatmap: `/research/image-edit-drift/gemini/remove_lemons/simple/heatmap_compare.png` — magnitude of per-pixel change.
- Color diff: `/research/image-edit-drift/gemini/remove_lemons/simple/color_diff.png` — direction of per-pixel change.

Caption: "Two views of the same edit — remove the bowl of lemons, simple prompt. Heatmap shows magnitude of change; colour diff shows direction of change. Switch tabs to flip between them."
<!-- /component:VisualizationsPreview -->

When you look at the figures below, ignore the brightest patch. That's the intended edit, where the lemons used to be or where the new plate of olives went. It's supposed to change. The interesting part is everything else, the parts of the image you didn't ask the model to touch. Those are the parts we'll be looking at today.

A note on scale: every heatmap in this post (except the colour grid further down) is normalised against the same global ceiling, so brightness means the same thing across cells. If one image looks brighter than another, it actually drifted more.
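A rough sketch of what a shared ceiling means in practice, assuming a linear colour ramp through the five stops named earlier. The `ceiling` value, the `to_heatmap` name, and the exact stop colours are my assumptions here, not the values used for the figures.

```python
import numpy as np

# Colour stops for the ramp: black, blue, cyan, yellow, red (assumed RGB values).
STOPS = np.array([
    [0, 0, 0],
    [0, 0, 255],
    [0, 255, 255],
    [255, 255, 0],
    [255, 0, 0],
], dtype=np.float64)

def to_heatmap(dist: np.ndarray, ceiling: float) -> np.ndarray:
    """Map an H x W distance array to an H x W x 3 uint8 heatmap.

    Every image is divided by the same `ceiling`, so a given colour
    means the same absolute distance in every figure.
    """
    t = np.clip(dist / ceiling, 0.0, 1.0)
    positions = np.linspace(0.0, 1.0, len(STOPS))
    # Interpolate each colour channel separately along the ramp.
    channels = [np.interp(t, positions, STOPS[:, c]) for c in range(3)]
    return np.stack(channels, axis=-1).round().astype(np.uint8)

# One row of distances, rendered against a hypothetical ceiling of 10.
dist = np.array([[0.0, 5.0, 10.0, 20.0]])
heat = to_heatmap(dist, ceiling=10.0)
```

Anything above the ceiling clips to red. Normalising a figure to its own maximum instead just means passing `dist.max()` as the ceiling, which is why that figure's brightness can't be compared with the others.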

OK. So what does drift look like on a normal edit?

## The baseline

AI-generated source (a balcony lunch produced by Gemini) and a one-line prompt:

> Remove the bowl of lemons from the table.

Click **Show changes** to overlay the heatmap.

<!-- component:ImageCompare -->
**[ImageCompare component]**

Interactive before/after slider with optional difference-heatmap overlay.

- Before — **Original**: `/research/image-edit-drift/gemini/remove_lemons/bare/original.png`
- After — **Lemons removed**: `/research/image-edit-drift/gemini/remove_lemons/bare/gemini.png`
- Overlay — **Change heatmap**: `/research/image-edit-drift/gemini/remove_lemons/bare/heatmap_compare.png`

Caption: "Drag the divider to compare. Toggle Show changes to overlay the per-pixel diff. The bright patch on the table is the intended edit. Everything else is drift."
<!-- /component:ImageCompare -->

The lemons are gone. At a glance the rest of the photo looks the same. But go fullscreen, pick one spot, and slide back and forth between the two images, and you can see the slight differences. The basil has a slightly more textured look. The brick on the railing has a redder hue. Small enough to miss on a quick scroll, but real once you've seen them. The heatmap shows the full extent: most of the frame has shifted by a small amount, with the brightest collateral along edges and contours such as the basil leaves, the table, the plants and the railing. Interestingly enough, it seems to mostly affect things in the foreground. The background seems to be more stable.

That's the shape of every figure in this post. The intended edit is the bright patch. Everything else is what we're trying to look at.

The obvious next step is to try and tell the model not to change anything else. Maybe being explicit about what to keep makes a difference.

## Does prompt strictness help?

Same removal, same source, three different prompts. The only thing that varies is how strict the prompt is about preserving the rest.

**Bare** - just the action.

> Remove the bowl of lemons from the table.

**Simple** - action plus one extra sentence.

> Remove the bowl of lemons from the table. Don't change anything else.

**Aggressive** - a long sentence naming every object that should stay the same.

> Remove the ceramic bowl of lemons from the table. Do not alter any other part of the image in any way. The wine bottle, wine glasses, bread, cutting board, napkin, table, chairs, railing, basil plant, ivy, buildings, cliffs, sea, sky, sun, lighting, shadows, colors, composition, and camera angle must remain pixel-identical to the original. Only the bowl of lemons should be gone, with the table surface behind it plausibly filled in.

If naming what to preserve helps, the aggressive heatmap should be visibly tighter than the bare one.

<!-- component:PromptStrengthStrip -->
**[PromptStrengthStrip component]**

A `HeatmapStrip` comparing three prompt strengths on the same edit (remove the bowl of lemons):

- **Bare** — action only: `/research/image-edit-drift/gemini/remove_lemons/bare/gemini.png` + heatmap `/research/image-edit-drift/gemini/remove_lemons/bare/heatmap_compare.png`
- **Simple** — + keep the rest: `/research/image-edit-drift/gemini/remove_lemons/simple/gemini.png` + heatmap `/research/image-edit-drift/gemini/remove_lemons/simple/heatmap_compare.png`
- **Aggressive** — full lockdown list: `/research/image-edit-drift/gemini/remove_lemons/aggressive/gemini.png` + heatmap `/research/image-edit-drift/gemini/remove_lemons/aggressive/heatmap_compare.png`

Caption: "Same edit, three prompt strengths. The drift pattern is approximately the same at every strength; listing more objects to preserve does not meaningfully preserve them."
<!-- /component:PromptStrengthStrip -->

It isn't. All three drift by about the same amount, in about the same places. Listing the things you want preserved doesn't change how the model treats them. It treats them the same way whether you mention them or not.

Changing the prompt doesn't seem to have any effect on the amount of drift. What about the type of edit? Would we see more drift for a removal than an addition or change?

## Does the edit type matter?

Does changing what the model has to do affect the amount of drift in the rest of the image?

Three edit types on the same source with the simple prompt:

- **Remove** the bowl of lemons.
- **Add** a small white plate of green olives next to the bread.
- **Replace** the white wine with red wine.

<!-- component:EditTypeStrip -->
**[EditTypeStrip component]**

A `HeatmapStrip` comparing three edit types on the same source image (each cell shows the model output and a per-pixel-difference heatmap):

- **Remove** — bowl of lemons: `/research/image-edit-drift/gemini/remove_lemons/simple/gemini.png` + heatmap `/research/image-edit-drift/gemini/remove_lemons/simple/heatmap_compare.png`
- **Add** — plate of olives: `/research/image-edit-drift/gemini/add_olives/simple/gemini.png` + heatmap `/research/image-edit-drift/gemini/add_olives/simple/heatmap_compare.png`
- **Replace** — white wine → red: `/research/image-edit-drift/gemini/replace_wine/simple/gemini.png` + heatmap `/research/image-edit-drift/gemini/replace_wine/simple/heatmap_compare.png`

Caption: "Same source, same prompt shape, three different edit types. All heatmaps in this post (except the colour grid further down) share a common ceiling, so brightness is comparable both within and across cells."
<!-- /component:EditTypeStrip -->

Same story. We see roughly the same drift across all three. The type of change you ask for doesn't make the model any more or less careful with everything else.

What about the source image? Maybe AI-generated photos are easier to repaint, and a real photo or a flat illustration would hold its ground better.

## Does the source image matter?

Maybe the model is better at editing images it generated itself, and worse at editing real-world images or images from other models. Or maybe it would change less in a simple illustration than in a real photo.

We ran the (almost) same prompt on three more sources: a real photograph (remove the bread basket), a flat-shaded illustration (remove the lemons), and an AI source from a different model (GPT) to rule out a Gemini-on-Gemini effect.

<!-- component:SourceTypeStrip -->
**[SourceTypeStrip component]**

A `HeatmapStrip` comparing pixel drift across four source-image types (same model, same simple prompt):

- **Base** — remove bowl of lemons: `/research/image-edit-drift/gemini/remove_lemons/simple/gemini.png` + heatmap `/research/image-edit-drift/gemini/remove_lemons/simple/heatmap_compare.png`
- **Real photo** — remove bread basket: `/research/image-edit-drift/real/remove_bread/simple/gemini.png` + heatmap `/research/image-edit-drift/real/remove_bread/simple/heatmap_compare.png`
- **Illustration** — remove lemons: `/research/image-edit-drift/drawing/remove_lemons/simple/gemini.png` + heatmap `/research/image-edit-drift/drawing/remove_lemons/simple/heatmap_compare.png`
- **GPT-generated** — remove lemons: `/research/image-edit-drift/gpt/remove_lemons/simple/gemini.png` + heatmap `/research/image-edit-drift/gpt/remove_lemons/simple/heatmap_compare.png`

Caption: "Same model, same simple prompt, four different source types. Heatmaps share the common ceiling: brightness is comparable across cells."
<!-- /component:SourceTypeStrip -->

This is the first axis where we actually see a difference. There's still drift in every cell, but the volume isn't equal.

The illustration drifts the least. Probably because the palette is small and the colours are flat, so there are fewer small details and textures for the model to subtly redraw. Still not zero, just visibly less than the rest.

The real photograph drifts more than baseline. Not by a lot, but still more. Real images might have more fine detail or other artifacts that the model can pick up on and change.

The GPT-generated source drifts the most of the four. Putting Gemini on top of a foreign model's image seems to give it more permission to repaint than it takes on a real photo or its own output.

So source type matters. Illustration is the friendliest input, real photo sits in the middle, and AI from another model is the worst case. None of them are clean, but they aren't equal.

Still, every one of these images has some fine detail that the model could change. What if the source has nothing for it to be uncertain about?

## The colour grid

Six flat-colour squares with no gaps between them and no texture. The simplest possible edit.

> Change the green square to pink. Don't change anything else.

Every pixel here has exactly one defensible value.

<!-- component:ImageCompare -->
**[ImageCompare component]**

Interactive before/after slider with optional difference-heatmap overlay.

- Before — **Original**: `/research/image-edit-drift/simple/change_green_to_pink/original.png`
- After — **Green → pink**: `/research/image-edit-drift/simple/change_green_to_pink/gemini.png`
- Overlay — **Change heatmap**: `/research/image-edit-drift/simple/change_green_to_pink/heatmap.png`

Caption: "Six flat colours, one intended change. This heatmap is normalised to its own maximum — the absolute drift here is much smaller than on the photo edits, so a shared ceiling would render the figure nearly black. Read brightness within this cell only."
<!-- /component:ImageCompare -->

The green square is now pink. The other five are almost identical to the original. There's a slight colour drift in the interiors of each square, but it's small. Most of the visible damage is along the edges. The hard edges have softened. The borders between squares are blurrier than they started.
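If you want to put a number on "mostly along the edges", one option is to split the distance map into boundary pixels and interior pixels and compare the means. This is a sketch under my own assumptions: the `edge_mask` helper is hypothetical, it only works cleanly on flat-colour sources like this grid, and the arrays below are synthetic stand-ins for the real images.

```python
import numpy as np

def edge_mask(image: np.ndarray) -> np.ndarray:
    """Mark pixels that sit directly on a colour boundary in an H x W x 3 image."""
    img = image.astype(np.int32)
    mask = np.zeros(img.shape[:2], dtype=bool)
    # A boundary exists wherever a pixel differs from its right or lower neighbour.
    horiz = (img[:, 1:] != img[:, :-1]).any(axis=-1)
    vert = (img[1:] != img[:-1]).any(axis=-1)
    # Flag the pixels on both sides of each boundary.
    mask[:, 1:] |= horiz
    mask[:, :-1] |= horiz
    mask[1:] |= vert
    mask[:-1] |= vert
    return mask

# Synthetic source: left half black, right half red, with a hard edge between them.
source = np.zeros((4, 6, 3), dtype=np.uint8)
source[:, 3:] = (255, 0, 0)

# Synthetic output: interiors untouched, the hard edge softened into a ramp.
output = source.copy()
output[:, 2] = (64, 0, 0)
output[:, 3] = (191, 0, 0)

dist = np.linalg.norm(output.astype(np.float64) - source.astype(np.float64), axis=-1)
edges = edge_mask(source)
edge_mean = dist[edges].mean()
interior_mean = dist[~edges].mean()
```

On a real photo almost every pixel differs from its neighbours, so this mask would need a threshold or a proper edge detector instead of an exact comparison.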

So even with zero ambiguity, the model still resamples the whole frame. The edges are where it shows up first. Drift isn't the model failing to disambiguate detail. It's what happens when an image goes through a generative model. There's no "leave this alone" mode.

So if every generation changes the whole image slightly, what happens when you chain two of them?

## Does drift compound?

If every call regenerates the whole image, chaining calls should stack the damage. I tested it. Ask for one edit. Then in the same conversation, on top of the previous output, ask for a second unrelated edit.

- **Call 1:** Remove the bowl of lemons.
- **Call 2:** (same conversation) Add a small white plate of green olives next to the bread.

Each cell below shows the output of one call. The heatmap on each cell is *incremental*: call 1's heatmap is drift relative to the original; call 2's is drift relative to call 1's output. We're looking at the dose each call adds, not cumulative damage.

<!-- component:CompoundingStrip -->
**[CompoundingStrip component]**

A `HeatmapStrip` showing two sequential edits in the same conversation. Each cell's heatmap is incremental (each call vs the previous step's output):

- **Call 1** — remove lemons: `/research/image-edit-drift/gemini/remove_lemons/roundtrip/step1_remove_lemons.png` + heatmap `/research/image-edit-drift/gemini/remove_lemons/roundtrip/original_vs_step1/heatmap_compare.png`
- **Call 2** — add olives (same conversation): `/research/image-edit-drift/gemini/remove_lemons/roundtrip/step2_add_olives.png` + heatmap `/research/image-edit-drift/gemini/remove_lemons/roundtrip/step1_vs_step2/heatmap_compare.png`

Caption: "Two sequential edits in the same conversation. Each cell's heatmap is incremental: call 1 vs the original, call 2 vs call 1's output."
<!-- /component:CompoundingStrip -->

Both calls drift across the whole frame in roughly equal volume. Mean per-pixel RGB distance is 10.1 on call 1 and 8.0 on call 2. The second call doesn't get a discount for operating on an already-regenerated image. Every request triggers another full regeneration with its own dose.

So drift compounds. Cumulative drift after both calls is 14.4. That's bigger than either step alone and smaller than the sum of 18.1 (some shifts overlap), but clearly larger than one step. A two-call workflow pays for two full doses, and the net result here is roughly 1.4 times the drift of a single call.
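The "smaller than the sum" part isn't a coincidence. Per pixel, the distance from the original to step 2 can never exceed the distance from the original to step 1 plus the distance from step 1 to step 2 (the triangle inequality), so the cumulative mean is capped by the sum of the two incremental means. With the numbers above that cap is 10.1 + 8.0 = 18.1, and the measured 14.4 sits below it. Here is a small sketch with synthetic data. The independent Gaussian noise is only a stand-in for drift, not a claim about how the model behaves.

```python
import numpy as np

def mean_rgb_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Mean per-pixel Euclidean RGB distance between two H x W x 3 arrays."""
    diff = a.astype(np.float64) - b.astype(np.float64)
    return float(np.sqrt((diff ** 2).sum(axis=-1)).mean())

rng = np.random.default_rng(0)
original = rng.uniform(0, 255, size=(32, 32, 3))

# Two simulated calls, each adding its own independent "dose" of drift.
# Arrays stay in float so no clipping interferes with the arithmetic.
step1 = original + rng.normal(0, 5, original.shape)
step2 = step1 + rng.normal(0, 4, original.shape)

dose1 = mean_rgb_distance(original, step1)       # call 1 vs original
dose2 = mean_rgb_distance(step1, step2)          # call 2 vs call 1
cumulative = mean_rgb_distance(original, step2)  # call 2 vs original

# Upper bound that always holds, by the triangle inequality.
upper_bound = dose1 + dose2
```

How far below the bound the cumulative number lands depends on how much the second call's shifts line up with the first call's.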

This leads us naturally to the next question: how similar is the drift between calls?

## Same prompt, two calls

We ran the same removal twice. Same source, same prompt, same model, same parameters. Then computed the heatmap directly between the two outputs. No original involved. The diff is between Run 1 and Run 2 of the same edit.

<!-- component:ImageCompare -->
**[ImageCompare component]**

Interactive before/after slider with optional difference-heatmap overlay.

- Before — **Run 1**: `/research/image-edit-drift/gemini/remove_lemons/simple/gemini.png`
- After — **Run 2**: `/research/image-edit-drift/gemini/remove_lemons/simple/gemini-v2.png`
- Overlay — **Difference between runs**: `/research/image-edit-drift/gemini/remove_lemons/simple/heatmap_v1_v2_compare.png`

Caption: "Two outputs of the identical prompt. Heatmap is on the same shared ceiling as the rest of the post, so the relative dimness here is meaningful: two runs differ from each other less than either differs from the original."
<!-- /component:ImageCompare -->

The two runs aren't identical, but they're close. Mean per-pixel RGB distance between them is 4.6, about half the distance from original to either run. So the model isn't fully deterministic in its drift, but it's not random either. Two outputs of the same prompt land in the same neighbourhood. You can see the heatmap is noticeably dimmer than any of the original-vs-output comparisons earlier in the post.

The interesting part is *where* they differ. The brightest patch is the area of the actual edit, the part of the table where the bowl used to be. Which makes sense. The model isn't deterministic, and that area is the one place it has to invent pixels that didn't exist in the source. There are many plausible answers for what the table looks like under a removed bowl, and the model picks a different one each time. Outside the edit zone, the two runs are clearly more similar to each other than either is to the original. The drift is real, but it's correlated across calls.
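One way to make "correlated" concrete is to compare the two drift vectors directly: subtract the original from each run and take the cosine similarity of the results. If the two runs drifted independently, that number would sit near zero and the run-to-run distance would come out larger than the original-to-run distance, not smaller. A sketch on synthetic data follows. `drift_cosine` is a hypothetical helper, and the shared-plus-private noise model is an assumption made purely for illustration.

```python
import numpy as np

def drift_cosine(original: np.ndarray, run_a: np.ndarray, run_b: np.ndarray) -> float:
    """Cosine similarity between two runs' drift vectors (run minus original)."""
    da = (run_a.astype(np.float64) - original.astype(np.float64)).ravel()
    db = (run_b.astype(np.float64) - original.astype(np.float64)).ravel()
    denom = np.linalg.norm(da) * np.linalg.norm(db)
    return float(da @ db / denom) if denom else 0.0

rng = np.random.default_rng(1)
original = rng.uniform(0, 255, size=(32, 32, 3))

# Correlated case: both runs share most of their drift, plus a little private noise.
shared = rng.normal(0, 5, original.shape)
run_a = original + shared + rng.normal(0, 2, original.shape)
run_b = original + shared + rng.normal(0, 2, original.shape)
correlated = drift_cosine(original, run_a, run_b)

# Independent case: each run drifts on its own.
run_c = original + rng.normal(0, 5, original.shape)
run_d = original + rng.normal(0, 5, original.shape)
independent = drift_cosine(original, run_c, run_d)
```

To exclude the edit zone from the comparison, mask those pixels out of both drift vectors before taking the dot product.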

That closes the loop. Drift isn't a single bug to be patched out. It's how the model works. Image models don't edit. They generate, conditioned on your input. The output stays close to the input because the conditioning is strong, not because a mask or any other constraint is holding the unchanged pixels in place.

## What I take from this

A few practical things if you're building on top of these models.

Don't chain calls if you can help it. Two calls cost you two full doses of drift, even when the second edit is small. Bundle every change you need into a single prompt.

If you need pixel-exact retention, do it outside the model. Mask the edit region, run the model, composite the new pixels onto the original. The model will not keep its hands off unchanged pixels no matter how hard you ask.
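A minimal sketch of the compositing step, assuming the model output has the same dimensions as the original and that you already have a mask for the edit region. The `composite` name and the synthetic arrays are mine. I haven't run this flow against the images in this post.

```python
import numpy as np

def composite(original: np.ndarray, edited: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Take `edited` pixels where mask is 1 and `original` pixels where it is 0.

    original, edited: H x W x 3 uint8 arrays of the same shape.
    mask: H x W float array in [0, 1]; fractional values blend the two,
    which allows a feathered border around the edit.
    """
    if original.shape != edited.shape or mask.shape != original.shape[:2]:
        raise ValueError("original, edited and mask must share height and width")
    alpha = mask.astype(np.float64)[..., None]  # broadcast over the colour channels
    out = alpha * edited.astype(np.float64) + (1.0 - alpha) * original.astype(np.float64)
    return out.round().astype(np.uint8)

# Synthetic stand-ins: the "edited" image has drifted everywhere (100 -> 103)
# and contains the intended change (200) in the centre.
original = np.full((4, 4, 3), 100, dtype=np.uint8)
edited = np.full((4, 4, 3), 103, dtype=np.uint8)
edited[1:3, 1:3] = 200

mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0

result = composite(original, edited, mask)
```

Outside the mask the result is byte-for-byte the original. The trade-off is at the mask border: a hard mask can leave a visible seam where drifted and original pixels meet, and a feathered mask lets a little drift back in.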

Don't spend time on prompt elaboration. Bare and aggressive give you the same output. A "don't change anything else" sentence is fine to keep around if it makes the scope clearer to you. Just don't think of it as an instruction that will reduce the drift.

Containment is getting better with every model generation. The shape probably isn't going away though, because it's how these models work. At least not with the current paradigm. It would be interesting to test a flow where the image is first masked around the edit zone, and the changes are then merged back into the original image. At least for more localised edits.

It's also why my face never quite comes back the same. The model isn't editing my face. It's drawing a new one that happens to be close to mine. Close, not same.

> One last thing worth flagging: every edit in this post is *localized*. Remove a bowl. Add a plate. Swap a colour. Whole-scene edits like changing the background or the lighting are a different question and not covered here. That's a follow-up post.