As a background hobby project I have been slowly digitising and restoring old family photos, primarily scanning them and editing in Adobe Photoshop. This is laborious and, with damaged photos, did not give great results. I started to look at whether there were any ‘AI’ tools which could assist me, and this led to my discovery of ComfyUI, described by its creators as ‘the most powerful open source node-based application for creating images, videos and audio with GenAI’.
A bold claim, but having used it for a few months I can see some justification: I have become hooked on what it is capable of and its ever-expanding feature set.
So what exactly is ComfyUI? It’s an open-source workflow application for image creation (and to some degree video and audio) which can use a range of latent diffusion models coupled with many additional ‘custom nodes’ to perform a variety of operations. It’s a bit of a Swiss Army knife for image generation and manipulation.
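To make ‘node-based’ concrete: under the hood a ComfyUI workflow is just a graph of nodes described in JSON, and the application exposes a small HTTP API for queueing it. Here is a minimal text-to-image sketch, using built-in node types but with placeholder node IDs, checkpoint filename and prompts, and assuming a ComfyUI instance running locally on its default port:

```python
import json
import urllib.request

# A minimal text-to-image workflow in ComfyUI's API (JSON) format.
# The node types are ComfyUI built-ins; the node IDs, checkpoint
# filename and prompt text are placeholders for illustration.
workflow = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "sd_xl_base_1.0.safetensors"}},
    "2": {"class_type": "CLIPTextEncode",   # positive prompt
          "inputs": {"clip": ["1", 1], "text": "a misty mountain valley at dawn"}},
    "3": {"class_type": "CLIPTextEncode",   # negative prompt
          "inputs": {"clip": ["1", 1], "text": "blurry, low quality"}},
    "4": {"class_type": "EmptyLatentImage",
          "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
    "5": {"class_type": "KSampler",
          "inputs": {"model": ["1", 0], "positive": ["2", 0], "negative": ["3", 0],
                     "latent_image": ["4", 0], "seed": 42, "steps": 20, "cfg": 7.0,
                     "sampler_name": "euler", "scheduler": "normal", "denoise": 1.0}},
    "6": {"class_type": "VAEDecode",
          "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
    "7": {"class_type": "SaveImage",
          "inputs": {"images": ["6", 0], "filename_prefix": "example"}},
}

# Queue it on a locally running ComfyUI instance (default port 8188).
req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": workflow}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
urllib.request.urlopen(req)
```

Each node’s inputs either hold literal values or reference another node’s output as [node_id, output_index], which is exactly the wiring you draw between nodes in the graphical editor.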
It isn’t the only workflow application out there for image generation — AUTOMATIC1111 and Forge (a fork of AUTOMATIC1111) are also popular — but ComfyUI looks to be taking the lead, with a lot of developer support and wide diffusion model compatibility. It is not as simple to use as the others, but it is highly configurable and powerful. It currently runs through a browser interface, though a standalone desktop app is now available in an early form.
If you are new to diffusion models and AI image generation, rather than me trying to explain the basics, have a read of Diffusion Models: A Practical Guide. It is from 2022, so although the basics remain true, some aspects of the models have moved on considerably since then.
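That said, the core idea is simple enough to sketch: a model is trained to predict the noise present in a noisy image, and generation runs that prediction repeatedly in reverse, starting from pure noise. The few lines below are a toy illustration only, with a stand-in ‘model’ and made-up tensor shapes, nothing like a real sampler:

```python
import torch

# Purely illustrative stand-in: a real model is a large trained network
# (a U-Net or transformer) that predicts the noise present in its input.
def fake_noise_predictor(x, t):
    return 0.1 * x  # placeholder prediction

def denoise(model, steps=20):
    x = torch.randn(1, 4, 64, 64)      # start from pure random latent noise
    for t in reversed(range(steps)):   # walk the noise level down step by step
        eps = model(x, t)              # model estimates the remaining noise
        x = x - eps / steps            # remove a fraction of it
    return x                           # a "clean" latent, decoded to pixels by a VAE

latent = denoise(fake_noise_predictor)
```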
The big thing with ComfyUI is that, in contrast to most AI image generation tools (apart from AUTOMATIC1111 and Forge), you can run it locally on your own computer. This provides a more flexible and powerful approach than the cloud-based options, and in general you do not have to pay for the images you generate. I’ll talk about different models in a later post, but Flux, for example, comes in several flavours, not all of which are available to download; some require the use of an API to integrate with ComfyUI and therefore do incur generation costs.
The speed of development of latent diffusion models is incredible, and a well-produced image can be hard to tell from a photograph or a human-created image, but they are far from perfect. Some weaknesses stand out in particular. Text in images has been a challenge for many models; Flux has moved this forward, but without explicit instructions (and sometimes even with them) images are prone to random text and characters. Company logos and names are a mixed bag.
Specific locations such as ‘Buckingham Palace’ are also, on the whole, a non-starter, though you could hardly expect an AI model at this point to know millions of locations all over the world. They excel with general locations — countryside, mountains, a city, etc. — not exact ones.
The human body can be a bit mixed. Faces are reaching a quality and accuracy which can be almost indiscernible from the real thing, and hands are very good most of the time, but feet for some reason remain more of a struggle. Additional or missing digits are not unusual, and even extra limbs or weird body joints crop up.
There is also a tendency for models to produce a set of default styles, poses and faces; working out how to prompt and guide the models in different ways to avoid this is part of the learning curve.
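One concrete lever in ComfyUI is prompt attention weighting: wrapping part of a prompt as (term:weight) raises or lowers how strongly the model attends to that phrase. The example prompts and weights below are just illustrative starting points, not recommendations:

```python
# Prompt text exactly as you would type it into a CLIPTextEncode node.
# (term:1.2) boosts attention to a phrase; values below 1.0 reduce it.
generic = "portrait of a woman"
steered = ("(candid street photo:1.2), portrait of a woman, "
           "(off-centre composition:1.1), natural skin texture, overcast light")
```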
After a few months I’m not claiming to be an expert on ComfyUI or diffusion models; I’m just a tech-savvy user who has found a new hobby and learnt along the way. Across the internet there are many detailed guides and forums with highly technical information (some of which I will link to in posts), but I thought I would share my knowledge from the ‘average’ user perspective, for the benefit of others and in the hope of gaining feedback and suggestions.
Before we go too far, though, it is worth pointing out that ComfyUI is not like installing and using Adobe Photoshop: some technical knowledge and patience is required. You will most likely hit occasional problems that require some head scratching and a lot of Googling. This is a space which is evolving very quickly and comes with leading-edge challenges.
I started with photo restoration and was pleased with what I could achieve, far faster than with the manual techniques I had used previously. I quickly expanded to basic image generation, and on to LoRAs, inpainting, outpainting, ControlNets, detailing, upscaling, a little bit of video and model refinement, along with learning about samplers, schedulers, guidance, denoise and a whole host of other factors.
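As a taste of what one of those pieces looks like in practice, here is a sketch of how a LoRA slots into the workflow graph shown earlier: a LoraLoader node (a ComfyUI built-in) sits between the checkpoint loader and everything downstream. The node ID, file name and strength values are placeholders:

```python
# Spliced into the workflow dict from the earlier example, where node "1"
# is the CheckpointLoaderSimple. The LoRA file name is a placeholder.
lora_node = {
    "8": {"class_type": "LoraLoader",
          "inputs": {"model": ["1", 0], "clip": ["1", 1],
                     "lora_name": "my_style.safetensors",
                     "strength_model": 0.8, "strength_clip": 0.8}},
}
# Downstream nodes (the CLIPTextEncode and KSampler nodes) would then take
# their model and CLIP from ["8", 0] and ["8", 1] instead of node "1".
```

The samplers, schedulers, guidance (cfg) and denoise settings mentioned above are the same fields you can see on the KSampler node in the first example.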
More on all this later. First, a post on the basics is required: what hardware do you need, and how do you install ComfyUI?