Image generation (experimental) · Ollama Blog
Key Points
- Ollama now supports image generation on macOS, allowing users to create images from text prompts that save to their current directory, with inline previews available in compatible terminals.
- Two primary models are introduced: Z-Image Turbo, a 6B parameter model excelling in photorealism and bilingual text rendering, and FLUX.2 Klein, available in 4B and 9B sizes, specialized in legible text within images.
- Users can customize image generation through parameters such as output dimensions, iteration steps, random seeds for reproducibility, and negative prompts, with Windows, Linux, and additional features planned for future updates.
Ollama now includes experimental image generation, initially on macOS, with Windows and Linux support planned. Users generate images directly from text prompts using one of two specialized models.
To generate an image, run ollama run x/model-name "your prompt". The generated image is saved to the current working directory. Inline previews are supported in compatible terminals such as Ghostty and iTerm2.
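The command above can also be driven from a script. The following is a minimal sketch, assuming only the behavior described here: the command takes a model and a prompt, and images land in the process's working directory. The helper names build_command and generate are hypothetical, and x/model-name is the placeholder from the text rather than a real model tag.

```python
import subprocess
from pathlib import Path


def build_command(model: str, prompt: str) -> list[str]:
    """Return the argument list for: ollama run <model> "<prompt>"."""
    # Passing a list (not a shell string) avoids quoting problems in the prompt.
    return ["ollama", "run", model, prompt]


def generate(model: str, prompt: str, output_dir: str = ".") -> int:
    """Run the command with output_dir as the working directory.

    Assumption: the image is written to the working directory of the
    ollama process, so setting cwd controls where it is saved.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    result = subprocess.run(build_command(model, prompt), cwd=out)
    return result.returncode
```

Calling generate("x/model-name", "a lighthouse at dusk", output_dir="renders") would then be expected to leave the image in a renders folder, provided ollama is installed and the placeholder is replaced with an actual model name.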
Two distinct text-to-image models are presented:
- Z-Image Turbo: Developed by Alibaba’s Tongyi Lab, this is a 6 billion parameter model specializing in photorealistic image generation and proficient bilingual text rendering. It excels at creating realistic photographs, portraits, and scenes, alongside accurately embedding both English and Chinese text within images. The model is released under an Apache 2.0 license, allowing for open weights and commercial use.
- FLUX.2 Klein: From Black Forest Labs, this model emphasizes speed and adeptly handles readable text within images, making it suitable for applications like UI mockups and designs requiring typography. It is available in two parameter sizes: a 4 billion parameter version, licensed under Apache 2.0 for commercial use, and a 9 billion parameter version, distributed under the FLUX Non-Commercial License v2.1.
The image generation process can be extensively customized through several parameters:
- Image Location: Generated images are saved to the user's active directory, which can be modified by changing the terminal's working directory.
- Image Sizes: Users can control the output dimensions by setting width and height parameters via the /set width and /set height commands. Smaller dimensions result in faster generation times and reduced memory consumption.
- Number of Steps: This parameter, crucial in iterative generative models (often diffusion models), controls the number of denoising steps or iterations the model performs during generation. Fewer steps lead to faster generation but potentially less detail, while an excessive number of steps can introduce artifacts. Ollama automatically defaults to a recommended step count for each model to balance quality and speed.
- Random Seed: A numerical seed can be specified to ensure reproducibility of results. Using the same seed with an identical prompt will yield the same image, facilitating iterative refinement or sharing exact outputs. Different seeds will produce unique images even with the same prompt.
- Negative Prompts: This feature allows users to guide the model by specifying elements or characteristics they wish to exclude from the generated image, refining the output by steering the generative process away from undesirable outcomes.
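Two of the parameters above can be illustrated with a toy example. The sketch below is a stand-in, not Ollama's sampler: it assumes, as is typical for diffusion-style models, that generation starts from pseudo-random noise, so fixing the seed fixes the starting point. It also shows how pixel count shrinks with smaller dimensions. The function names and the 1024x1024 reference size are illustrative assumptions, not documented defaults.

```python
import random


def initial_noise(seed: int, n: int = 4) -> list[float]:
    """Draw n Gaussian samples from a generator seeded with `seed`.

    Toy stand-in for the starting noise of an iterative image model.
    """
    rng = random.Random(seed)  # private generator, independent of global state
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def relative_pixels(width: int, height: int,
                    ref_width: int = 1024, ref_height: int = 1024) -> float:
    """Pixel count of (width x height) as a fraction of a reference size.

    The 1024x1024 reference is an arbitrary assumption for illustration.
    """
    return (width * height) / (ref_width * ref_height)


# Same seed gives the same noise; a different seed gives different noise.
same = initial_noise(42) == initial_noise(42)       # True
different = initial_noise(42) != initial_noise(43)  # True

# Halving both dimensions leaves a quarter of the pixels.
quarter = relative_pixels(512, 512)                 # 0.25
```

Actual generation time and memory use need not scale exactly with pixel count, so the ratio is only a rough guide to how much work a smaller image saves.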
Future developments include extending support to Windows and Linux, integrating additional image generation models, and introducing image editing functionalities.