One of the practical limitations of generating images locally is resolution. Even on capable hardware, generating directly at 2048×2048 or higher consumes large amounts of VRAM and time. The approach used by most experienced ComfyUI users is to generate at a lower resolution and then upscale with a dedicated upscaling model. The results are typically sharper and more detailed than a direct high-resolution generation, and the workflow is more efficient. This guide covers upscaling in ComfyUI, from installing the models to building a workflow that produces clean, detailed enlargements.
Why Upscaling Matters for Local Generation
AI upscalers are not the same as traditional bicubic or Lanczos scaling. Models like RealESRGAN and 4x-UltraSharp were trained to reconstruct realistic details — they add texture, sharpness, and fine structure rather than simply interpolating between existing pixels. The difference on a face, a piece of fabric, or a detailed background is immediately visible. A well-upscaled 512×512 image frequently looks better than a poorly generated 1024×1024 one.
There is also a practical VRAM argument: generating at 512×512 or 768×768 and upscaling to 2048×2048 is typically faster and uses less VRAM than generating at 2048×2048 directly, particularly when running models like SDXL or Flux on consumer GPUs.
Downloading Upscale Models
Several high-quality upscale models are freely available. The most widely used are:
- RealESRGAN x4plus: Good general-purpose upscaler. Handles a wide range of input types and is robust on photorealistic images.
- 4x-UltraSharp: A community favourite for anime and illustrated styles. Produces crisp, detailed results with strong edge definition.
- ESRGAN 4x: The original ESRGAN model. Slightly more aggressive on sharpening than RealESRGAN, which can be desirable or problematic depending on your input.
- 4x-Remacri: A balanced option that works well on both photorealistic and stylised content without over-sharpening.
These models are available on Hugging Face and in the upscalers section of Civitai. Older ESRGAN-based models are usually distributed as .pth files, while newer releases often use .safetensors.
Where to Put Upscale Model Files
Place your downloaded upscale model files in:
ComfyUI/models/upscale_models/
Create the folder if it does not already exist. Restart ComfyUI after adding new files, or use the ComfyUI Manager refresh function to make them available in node dropdowns.
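As a quick check that the files landed in the right place, a small script can create the folder and list what it finds. This is only a sketch: the function name is made up for illustration, the root path you pass in is your own install location, and the extension list simply reflects the two formats mentioned above.

```python
from pathlib import Path

# Extensions mentioned in this guide; adjust if your models use others.
UPSCALE_EXTENSIONS = {".pth", ".safetensors"}


def list_upscale_models(comfyui_root):
    """Create models/upscale_models under comfyui_root if it is missing
    and return the sorted names of the model files inside it."""
    folder = Path(comfyui_root) / "models" / "upscale_models"
    folder.mkdir(parents=True, exist_ok=True)
    return sorted(
        p.name for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in UPSCALE_EXTENSIONS
    )
```

Calling list_upscale_models("ComfyUI") from the directory that contains your install should return the same file names that appear in the Load Upscale Model dropdown after a refresh.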
The ImageUpscaleWithModel Node
The core upscaling node in ComfyUI is ImageUpscaleWithModel. To add it, right-click the canvas, select Add Node, navigate to image > upscaling, and select the node, which the menu lists under its display name, Upscale Image (using Model).
You will also need a Load Upscale Model node (found in loaders) which provides the upscale model to the ImageUpscaleWithModel node. Connect them as follows:
- Add Load Upscale Model and select your upscaler file from the dropdown.
- Add ImageUpscaleWithModel and connect the Load Upscale Model output into its upscale_model input.
- Connect your generated image (the output of a VAEDecode node, or of a Load Image node if you are upscaling an existing file) into the image input.
- Connect the output to a Save Image node.
The upscale factor is determined by the model itself: a 4x model multiplies the width and height by four, a 2x model by two. A 512×512 input through a 4x model produces a 2048×2048 output, which is 16 times as many pixels.
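The same wiring can be written in ComfyUI's API-format JSON, which is useful for scripting. The sketch below builds that graph as a Python dict. The class names (UpscaleModelLoader, LoadImage, ImageUpscaleWithModel, SaveImage) and their input names are assumed to match ComfyUI's built-in nodes, and the file names are placeholders, so compare the result against a workflow exported in API format from your own install.

```python
import json


def build_upscale_workflow(model_name, image_name, prefix="upscaled"):
    """Return an API-format graph: load model and image, upscale, save.

    Links between nodes are [source_node_id, output_index] pairs.
    """
    return {
        "1": {"class_type": "UpscaleModelLoader",
              "inputs": {"model_name": model_name}},
        "2": {"class_type": "LoadImage",
              "inputs": {"image": image_name}},
        "3": {"class_type": "ImageUpscaleWithModel",
              "inputs": {"upscale_model": ["1", 0], "image": ["2", 0]}},
        "4": {"class_type": "SaveImage",
              "inputs": {"images": ["3", 0], "filename_prefix": prefix}},
    }


def upscaled_size(width, height, factor):
    """Output size for a model with a fixed integer scale factor."""
    return width * factor, height * factor


# Placeholder file names: use whatever appears in your node dropdowns.
workflow = build_upscale_workflow("4x-UltraSharp.pth", "example.png")
print(json.dumps(workflow, indent=2))
print(upscaled_size(512, 512, 4))  # (2048, 2048)
```

This version loads an existing image from disk for simplicity; in a full generation workflow the image input would be linked to the VAEDecode node instead.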
Tile-Based Upscaling for Large Outputs
When upscaling to very large resolutions, processing the entire image in one pass can exceed your GPU’s VRAM, especially once a diffusion pass is run over the enlarged result. Tile-based upscaling solves this by splitting the image into overlapping tiles, processing each individually, and then blending them back together so the seams are not visible.
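To make the tiling idea concrete, here is a minimal sketch of how overlapping tile coordinates could be computed. It illustrates the general technique only and is not taken from any particular node's implementation; the function name and the default tile and overlap sizes are illustrative assumptions, and the blending of the overlapping regions is left out.

```python
def tile_boxes(width, height, tile=512, overlap=32):
    """Return (left, top, right, bottom) boxes that cover the image.

    Neighbouring tiles share at least `overlap` pixels. The last tile
    in each row and column is shifted back so it ends at the image edge.
    """
    if overlap >= tile:
        raise ValueError("overlap must be smaller than tile")

    def starts(length):
        if length <= tile:
            return [0]
        step = tile - overlap
        positions = list(range(0, length - tile, step))
        positions.append(length - tile)  # final tile flush with the edge
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]


for box in tile_boxes(768, 768):
    print(box)
```

Each box would be cropped, processed on its own, and pasted back, with the shared border regions blended to hide seams.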
The Ultimate SD Upscale custom node (available via ComfyUI Manager) implements this approach. It is one of the most popular upscaling nodes precisely because it handles large outputs reliably on limited VRAM. Install it via Manager, then add the UltimateSDUpscale node to your workflow. This node requires a KSampler-style setup because it can optionally run an img2img pass over each tile to add fine detail during upscaling.
Combining Upscaling with img2img for Detail Enhancement
The most powerful upscaling workflow in ComfyUI combines a standard upscale model pass with a subsequent img2img (image-to-image) denoising pass. This is the equivalent of a HiRes Fix in AUTOMATIC1111 — it uses the diffusion model to refine and add detail to the upscaled image rather than simply enlarging it.
The workflow looks like this:
- Generate your image at base resolution (e.g. 512×512 or 768×768).
- Pass it through ImageUpscaleWithModel to get a 4x enlarged version.
- Feed the enlarged image into a VAEEncode node to convert it back to latent space.
- Pass that latent into a KSampler with a low denoise value — typically between 0.35 and 0.55. Too high and the model will significantly alter the composition; too low and it will add very little detail.
- Decode and save the output.
This two-stage approach produces significantly more detailed results than upscaling alone, at the cost of additional generation time.
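The steps above can be sketched as an API-format graph. As before, this is an illustration under assumptions: the node class and input names are believed to match ComfyUI's built-in nodes, the function name and file names are placeholders, and the steps, cfg, sampler and scheduler values are arbitrary defaults rather than recommendations. For brevity the sketch loads the base image from disk instead of generating it in the same graph.

```python
def build_refine_workflow(ckpt, upscaler, image, positive, negative="",
                          denoise=0.4, seed=0, steps=20, cfg=7.0):
    """API-format graph: upscale an image with a model, then refine it
    with a low-denoise KSampler pass (img2img)."""
    if not 0.0 < denoise <= 1.0:
        raise ValueError("denoise must be in (0, 1]")
    return {
        "1": {"class_type": "CheckpointLoaderSimple",
              "inputs": {"ckpt_name": ckpt}},
        "2": {"class_type": "UpscaleModelLoader",
              "inputs": {"model_name": upscaler}},
        "3": {"class_type": "LoadImage", "inputs": {"image": image}},
        "4": {"class_type": "ImageUpscaleWithModel",
              "inputs": {"upscale_model": ["2", 0], "image": ["3", 0]}},
        # CheckpointLoaderSimple outputs: 0 = MODEL, 1 = CLIP, 2 = VAE
        "5": {"class_type": "CLIPTextEncode",
              "inputs": {"text": positive, "clip": ["1", 1]}},
        "6": {"class_type": "CLIPTextEncode",
              "inputs": {"text": negative, "clip": ["1", 1]}},
        # Convert the enlarged image back to latent space.
        "7": {"class_type": "VAEEncode",
              "inputs": {"pixels": ["4", 0], "vae": ["1", 2]}},
        "8": {"class_type": "KSampler",
              "inputs": {"model": ["1", 0], "positive": ["5", 0],
                         "negative": ["6", 0], "latent_image": ["7", 0],
                         "seed": seed, "steps": steps, "cfg": cfg,
                         "sampler_name": "euler", "scheduler": "normal",
                         "denoise": denoise}},
        "9": {"class_type": "VAEDecode",
              "inputs": {"samples": ["8", 0], "vae": ["1", 2]}},
        "10": {"class_type": "SaveImage",
               "inputs": {"images": ["9", 0],
                          "filename_prefix": "refined"}},
    }
```

Keeping denoise as a single parameter makes it easy to queue the same graph several times across the 0.35 to 0.55 range and compare how much each value changes the image.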
Memory Considerations
A 4x upscale of a 768×768 image produces a 3072×3072 result. Loading this into VRAM for an img2img pass is demanding. If you encounter CUDA OOM errors during the img2img step, reduce the upscale factor first (try 2x instead of 4x), or use the tile-based approach via Ultimate SD Upscale to process the image in smaller chunks.
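The arithmetic behind that advice can be checked before queueing anything. The helper below is a hypothetical sketch: max_side is an illustrative budget you would tune to your own GPU, not a ComfyUI setting, and the latent size assumes the 8x spatial downscale used by Stable Diffusion-style VAEs such as those of SD 1.5 and SDXL.

```python
def upscale_plan(width, height, factors=(4, 2), max_side=2048):
    """Pick the first factor whose output fits within max_side.

    Returns (factor, out_w, out_h, latent_w, latent_h), or None if no
    factor fits. Latent size assumes an 8x VAE downscale.
    """
    for factor in factors:
        out_w, out_h = width * factor, height * factor
        if max(out_w, out_h) <= max_side:
            return factor, out_w, out_h, out_w // 8, out_h // 8
    return None


print(upscale_plan(768, 768))                 # (2, 1536, 1536, 192, 192)
print(upscale_plan(768, 768, max_side=4096))  # (4, 3072, 3072, 384, 384)
```

With a 2048-pixel budget the 4x result (3072×3072) is rejected and the helper falls back to 2x, which mirrors the manual advice above.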
You can also use CPU offloading. Launch ComfyUI with --lowvram if memory is tight, or process the upscale and img2img steps in separate queue runs rather than a single chained workflow.
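Running the two stages as separate queue runs can also be scripted against a running ComfyUI server. The sketch below assumes the server accepts API-format workflows as JSON on its /prompt endpoint at the default local address 127.0.0.1:8188, as in ComfyUI's bundled API example script; the function names are made up for illustration, so adjust host and port to match your setup.

```python
import json
import urllib.request


def build_prompt_request(workflow, host="127.0.0.1", port=8188):
    """Build (but do not send) a POST request that queues `workflow`
    on a running ComfyUI server via its /prompt endpoint."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}:{port}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def queue_prompt(workflow, **kwargs):
    """Send the request and return the server's JSON reply."""
    request = build_prompt_request(workflow, **kwargs)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

You would call queue_prompt once with the upscale-only graph, wait for the output file, and then call it again with a graph that loads that file for the img2img pass, so the two stages are never held in memory as one chained workflow.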
Recommended Workflow Structure
For reliable, high-quality results, a proven workflow structure is:
- Generate at 768×768 (SD 1.5) or 1024×1024 (SDXL) with your standard text-to-image workflow.
- Pass the output through ImageUpscaleWithModel using 4x-UltraSharp or RealESRGAN x4plus.
- If quality allows, stop here and save. If you want more detail, continue with the img2img step.
- Run a KSampler pass at denoise 0.4 with the same positive prompt, checkpoint, and any LoRAs from the original generation.
- Save the final output.
Choosing the Right Upscale Model for Your Content
Not all upscale models produce the same results on every type of content, and it is worth testing a few before settling on a default. RealESRGAN x4plus is trained on real-world photographs and tends to over-smooth illustrated or anime-style content. 4x-UltraSharp handles both styles better but can introduce sharpening artefacts on very smooth gradients such as skies or skin. 4x-Remacri is a good middle ground for users who generate a mix of styles.
The fastest way to evaluate upscalers is to save a sample output image and run it through each model using a simple ImageUpscaleWithModel workflow without any img2img pass. Compare the results at 100% zoom on a detail-rich area — hair, fabric, and background textures are good test regions. Once you have identified your preferred upscaler for each style, you can integrate it into your standard generation workflows.
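A comparison run like this is easy to script by building one plain upscale graph per model. The sketch below assumes the same built-in node class and input names as ComfyUI's API format; the function name is hypothetical and the model and image file names are placeholders, so substitute the names shown in your own dropdowns.

```python
def comparison_workflows(models, image_name):
    """Return {model_name: API-format graph}, one plain upscale per
    model, each saving under a prefix that identifies the model."""
    workflows = {}
    for model in models:
        # Strip the extension so the saved file name names the model.
        label = model.rsplit(".", 1)[0]
        workflows[model] = {
            "1": {"class_type": "UpscaleModelLoader",
                  "inputs": {"model_name": model}},
            "2": {"class_type": "LoadImage",
                  "inputs": {"image": image_name}},
            "3": {"class_type": "ImageUpscaleWithModel",
                  "inputs": {"upscale_model": ["1", 0],
                             "image": ["2", 0]}},
            "4": {"class_type": "SaveImage",
                  "inputs": {"images": ["3", 0],
                             "filename_prefix": f"compare_{label}"}},
        }
    return workflows


# Placeholder names for illustration only.
candidates = ["RealESRGAN_x4plus.pth", "4x-UltraSharp.pth"]
for name, graph in comparison_workflows(candidates, "sample.png").items():
    print(name, "->", graph["4"]["inputs"]["filename_prefix"])
```

Because every output carries the model name in its file name, the results can be opened side by side and inspected at 100% zoom without keeping track of which run produced which file.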