At the end of June I wrote a piece on using Flux.1 Kontext (Dev) Multimodal Image Editing. Since then we have seen the release of Qwen Image Edit and Google Nano Banana (officially Gemini 2.5 Flash Image), so I thought it was time to revisit those Flux.1 Kontext images and do a comparison with Qwen Image Edit and Nano Banana.
Getting Started - Qwen Image Edit
Given Qwen Image Edit is based on the same core model as Qwen Image I am expecting similar levels of control and quality. The official documentation calls out:
Semantic and Appearance Editing: Qwen-Image-Edit supports both low-level visual appearance editing (such as adding, removing, or modifying elements, requiring all other regions of the image to remain completely unchanged) and high-level visual semantic editing (such as IP creation, object rotation, and style transfer, allowing overall pixel changes while maintaining semantic consistency).
Precise Text Editing: Qwen-Image-Edit supports bilingual (Chinese and English) text editing, allowing direct addition, deletion, and modification of text in images while preserving the original font, size, and style.
These are similar claims to Flux.1 Kontext, which has scored highly in these areas and on the whole I find very good for image manipulation, so we are starting with a high bar.
The full Qwen Image model is over 40GB in size for the BF16 version (that’s nearly double the size of Flux.1 Kontext) and Qwen Image Edit is the same size. There is also an FP8 scaled version which comes in at 20.4GB, and both of these are available as official ComfyUI versions on Hugging Face.
For those of us with less than 24GB of VRAM the good news is there are quantised versions available. I decided to use the Q8 version, which is reported to be close to the BF16 model in terms of quality but is just under 22GB in size. In general I have been moving to Q8 quantised versions for several models as for my set-up they seem to offer a good balance of quality, performance and memory load. All of the quantised versions for Qwen Image Edit are available on Hugging Face.
With my 16GB RTX4060 Ti OC card (with 64GB of system RAM) I am seeing step times of around 15s, which is nearly double what I get with Flux.1 Kontext. You can use the Lightx2v Qwen Image Lightning LoRA to reduce the number of steps to 4 or 8 from the typical 25+ steps without too much impact on quality. I have not used it here to avoid any influence on the output.
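To put those step times in context, here is a rough back-of-the-envelope sketch of what a full generation costs at different step counts. It assumes the roughly 15s per step quoted above stays the same when the Lightning LoRA is loaded, which is an assumption rather than a measurement, and it ignores model loading, text encoding and VAE decode time. The function name is purely illustrative.

```python
# Approximate per-step time quoted in the text (16GB RTX 4060 Ti, Q8 model).
SECONDS_PER_STEP = 15.0


def sampling_time(steps: int, seconds_per_step: float = SECONDS_PER_STEP) -> float:
    """Return the estimated sampling time in seconds for a given step count."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    return steps * seconds_per_step


if __name__ == "__main__":
    # 25 steps is the "typical" count from the text; 8 and 4 are the
    # reduced counts the Lightning LoRA is said to allow.
    for label, steps in [("typical", 25), ("Lightning 8-step", 8), ("Lightning 4-step", 4)]:
        seconds = sampling_time(steps)
        print(f"{label:>16}: {steps:>2} steps ~ {seconds / 60:.1f} min")
```

On these numbers the step count dominates everything else, which is why the LoRA is tempting on a mid-range card even though it was left out of these tests.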
Getting Started - Nano Banana
If you read any of my other posts you will know that I primarily focus on local generation using open-source models, typically with ComfyUI. I have made an exception with Google Nano Banana simply because of the amount of coverage it is getting and the incredibly positive reviews.
You can use Nano Banana within ComfyUI using the native Google Gemini Image node, which hooks into the Google Gemini API and allows you to generate images if you have credits on your ComfyUI account. For my testing, however, I have used Google AI Studio as I already had credits and I was not enhancing the workflow in ComfyUI.
Whether using the Google API or Google AI Studio, Nano Banana has the typical restrictions of closed online models - you have little control over the output in terms of settings, some content is restricted and, in the Nano Banana case, resolution is limited to 1.04 megapixels, which is lower than Qwen Image Edit and Flux.1 Kontext. There is also a small watermark added to the bottom right corner of each image.
On the flip side, being a cloud-hosted model, generation is fast: in my testing it was around 10-15 seconds per image.
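To make the 1.04 megapixel cap mentioned above more concrete, the sketch below works out the largest width and height that fit inside a pixel budget for a given aspect ratio. It assumes one megapixel means 1,000,000 pixels and simply rounds down; the sizes the service actually returns may differ, and `max_dimensions` is a hypothetical helper, not part of any Google or ComfyUI API.

```python
import math


def max_dimensions(megapixels: float, aspect_w: int, aspect_h: int) -> tuple[int, int]:
    """Largest (width, height) near the given aspect ratio within a pixel budget."""
    if megapixels <= 0 or aspect_w <= 0 or aspect_h <= 0:
        raise ValueError("all arguments must be positive")
    budget = megapixels * 1_000_000  # assumption: 1 MP = 1,000,000 pixels
    ratio = aspect_w / aspect_h
    # Solve width * height = budget with width / height = ratio, then round
    # both down so the result never exceeds the budget. Rounding means the
    # aspect ratio is only approximately preserved.
    width = math.floor(math.sqrt(budget * ratio))
    height = math.floor(math.sqrt(budget / ratio))
    return width, height


if __name__ == "__main__":
    for aw, ah in [(1, 1), (4, 3), (16, 9)]:
        w, h = max_dimensions(1.04, aw, ah)
        print(f"{aw}:{ah} -> {w} x {h} ({w * h / 1_000_000:.2f} MP)")
```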
For testing purposes I am using the same samples as I used with Flux.1 Kontext so I can do a fair comparison across the models.
Simple Changes
The first test is a simple hair colour and style change.




To get the Qwen Image Edit version I did have to do a couple of iterations. The first lesson is that with Qwen Image Edit you have to be very explicit about what to change and, importantly, what not to change. Flux.1 Kontext tends to only change what you tell it to, whereas Qwen Image Edit takes a much freer approach unless you explicitly say not to change something.
The Qwen result is not bad, but there are a couple of points to pick up on. The hair style has changed to long and wavy, but I explicitly called out ‘brown hair’ whereas the output still has blonde highlights and in general is not as dark. This might be fixable by adding ‘without blonde highlights’ to the prompt, given the original did have blonde highlights. The cloud formation behind the woman’s head has also changed slightly, along with some overall image scaling.
Nano Banana has remained very true to the original image. It hasn’t gone as dark on the hair but otherwise the composition is fine. The only slight criticism is that the image is a little softer than Flux.1 Kontext, with perhaps a slight loss of detail.
Qwen Image Edit is OK, especially if you do not compare side by side, but I think Flux.1 Kontext holds truer to the original image. Nano Banana is good but I don’t think it betters Kontext.
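The point about being explicit with Qwen Image Edit, both on what to change and what to leave alone, can be captured in a small prompt-building helper. This is only a sketch: `build_edit_prompt` is a hypothetical function, and the example wording is illustrative rather than the prompt actually used for the image above.

```python
def build_edit_prompt(change: str, keep: list[str]) -> str:
    """Combine the requested change with an explicit list of things to preserve."""
    change = change.strip().rstrip(".")
    if not keep:
        return f"{change}."
    if len(keep) == 1:
        kept = keep[0]
    else:
        # Join as "a, b and c" so the instruction reads naturally.
        kept = ", ".join(keep[:-1]) + " and " + keep[-1]
    return f"{change}. Keep {kept} exactly the same."


if __name__ == "__main__":
    # Illustrative wording only, based on the issues described in the text.
    prompt = build_edit_prompt(
        "Change the woman's hair to long, wavy brown hair without blonde highlights",
        ["her face", "the clouds behind her", "the framing of the image"],
    )
    print(prompt)
```

Keeping the "what not to change" list as data makes it easy to add an item each time an iteration alters something it should not have.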
Complex Object Removal
I made this test particularly difficult in the Flux.1 Kontext post, given that basic object removal is now commonplace. I was impressed with what Kontext managed to achieve, so I have kept the same test.




Top marks to Qwen Image Edit in this test: it certainly matched, if not bettered, Flux.1 Kontext in terms of removal. My only criticism would be that there has been a little loss of detail.
Nano Banana has also done a good job and maintained an image quality of a similar level to Kontext. I would say better than Kontext apart from the addition of several random objects along the front.
Style Transfer
For these tests I have just used a very simple style prompt, with no LoRAs or reference images. The beauty of Qwen is that you could develop the style prompt into quite a detailed reference.



For the watercolour I thought in the original post that Flux.1 Kontext did a poor job. Recently I have been using Flux.1 Krea to create watercolour styles and I think it achieves better results. Qwen Image Edit does a reasonable job but needs the prompt enhancing to give the style more character and feeling. Nano Banana looked surprisingly weak to me; maybe a more detailed prompt was needed.



The Qwen oil painting style is fairly good. It is not quite as vivid or strong as Kontext but the general style is there, so I think it could be improved with some additional prompting. Nano Banana was a disappointment, not really varying from the watercolour style.



The default anime style for Qwen Image and Qwen Image Edit is very classic anime. It can be persuaded to do a broader range but it needs a detailed prompt. All Nano Banana seemed to do was change the eyes!



This one was just to see what a generic realistic painting style would look like. I think Qwen Image Edit does a better job here, as the Kontext image is more of a softer version of the oil painting style. Nano Banana just kept with the basic painting style again.




With the cyberpunk style Qwen Image Edit doesn’t go as full-on as Kontext with the colouring but the image quality is good, although the facial shape and features are changed more than with Kontext. Nano Banana loses the plot and seems to add a rain effect and a few squiggles on the face with some limited colour change.
Consistent Characters
Creating consistent characters without a LoRA is a bit of a holy grail in image generation. Flux.1 Kontext was a massive step forward so it is a big test for Qwen Image Edit and Nano Banana.




With both Qwen and Nano Banana the overall character and scene consistency is good, even if both did miss the fact the prompt called out ‘leaning against the wall’.
Qwen does have the ‘softness’ that seems to creep in when compared to the original, so the image has lost some of its ‘grittiness’ and detail. Nano Banana looks better on this aspect.
All the models supposedly only touch the parts of the image they need to, which really helps with consistency. However, with the Qwen version you can see that the way it has infilled the ground does not match. This is not unique to Qwen as I have seen similar issues with Flux.1 Kontext. Nano Banana has done a slightly better job of the infill.



On this second test the character consistency is good across all of the models. The street setting and ground have lost some of their detail in Qwen and even the composition has changed a bit. This was with a prompt which explicitly called out keeping the background identical.
Nano Banana has made a better effort, certainly on par with Flux.1 Kontext. As with any of these comparisons it is important not to assume one or two poor runs are the norm, but so far I think Flux.1 Kontext has a slight edge over Nano Banana, with Qwen a little behind.
Text Manipulation
Flux.1 Kontext improved the text handling of Flux considerably, but Qwen Image is even better at more complex text so text manipulation should be strong with Qwen Image Edit.




No surprise that Qwen Image Edit updated this simple text test with no issues on the text. What was surprising is that Qwen Image Edit did a poor job on maintaining the font style. The softening/loss of detail that Qwen can suffer with was also noticeably present in the background of this image.
Nano Banana on the other hand did an excellent job, possibly better than Flux.1 Kontext.
Combining Images
Trying to merge two images is not something I do that often, and when I do I find the results generally disappoint, so I am only going to cover it very briefly here. I originally did a test example using the two images below with Flux.1 Kontext so I have used the same images for Qwen Image Edit and Google Nano Banana.


The resulting Flux.1 Kontext combo image is shown below, using the core of the first image and a transplant of the woman in the second image, albeit with some styling and detail changed.
Using the same approach with Qwen Image Edit created the image below. The style of the woman has been considerably changed and more of the second image is incorporated into the combination.
I tried a couple of image blends with Nano Banana but the output was a mess each time. Perhaps with more time and very precise prompts something better could be achieved but Nano Banana seemed to struggle with dealing with the proportions of people from two different images and how to fill in missing detail.
Conclusions
Image editing using diffusion models has come a long way very quickly. Being able to describe complex changes to an image in a text prompt, and have the models produce the output they do, is incredible when you think about what is being achieved.
Flux.1 Kontext led that charge and I think still holds its ground. As much as I like the Qwen Image model, the Qwen Image Edit model doesn’t quite deliver in the same way. It can produce some good output and with its more advanced prompt handling you can detail very specific changes. However, there tends to be too much change in the styling and loss of detailing. Perhaps with more tweaking of settings and prompts this can be improved on.
Nano Banana has garnered lots of praise and I think some of that is justified. In certain cases it produces excellent results but in others, style transfer for example, the results are mixed. I’m not sure though in any of the cases that Nano Banana stood out above Flux.1 Kontext. When you add in the resolution limitation, lack of settings control, the watermarking, content restrictions and cost, then Nano Banana doesn’t look that enticing if you are an experienced ComfyUI user. If though you want a simple to use and fast solution then Nano Banana fills that role very well.
Is there a standout leader? If I had to pick one model then Flux.1 Kontext would be my choice, but I do not think it is as standout any more as the gap is closing and there are cases where Qwen Image Edit may be my preference. Nano Banana in terms of image manipulation probably matches Flux.1 Kontext in most cases but its other limitations stop it from being the leader.
Overall it just means my collection of models continues to grow - there is no one perfect model, yet.




