Image editing in Gemini just got a major upgrade
Key Points
- Google DeepMind has launched the "Nano Banana" image editing model in the Gemini app, which prioritizes maintaining a consistent likeness of people and pets during various transformations.
- This upgrade enables users to perform advanced edits such as changing costumes and locations, blending multiple photos, executing multi-turn edits, and applying styles from one image to another.
- Now available in the Gemini app, all images created or edited with this new capability carry both a visible watermark and an invisible SynthID digital watermark to indicate their AI-generated origin.
On August 26, 2025, Google DeepMind launched a significant upgrade to the Gemini app's native image editing capabilities, powered by a new model named Nano Banana. The upgrade focuses primarily on maintaining a consistent likeness of the subject during image manipulation, addressing the long-standing challenge of preserving identity in edited photographs of people and pets.
The core methodology behind this upgrade draws on advanced generative and image-to-image translation techniques. Likeness preservation implies latent-space representations in which the features that define an individual's identity (e.g., facial structure, or a pet's unique markings) are disentangled from other image attributes such as pose, clothing, or background. When a user requests a modification, such as a costume change, a location alteration, or even a transformation across different time periods, the model reconstructs the image by holding these extracted identity features fixed while generating new content for the altered attributes. As a result, the subject consistently resembles themselves, even under significant stylistic or contextual changes.
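To make the disentangle-then-recombine idea concrete, here is a minimal, hypothetical PyTorch-style sketch of identity-conditioned editing: an identity encoder produces an embedding that is held fixed across edits, while a generator synthesizes a new image from that embedding plus an embedding of the requested edit. The module names (`IdentityEncoder`, `EditGenerator`), shapes, and layer choices are illustrative assumptions, not details of the Nano Banana model.

```python
# Hypothetical sketch of identity-conditioned image editing.
# None of these modules reflect the actual Nano Banana architecture;
# they only illustrate the disentangle-then-recombine idea described above.
import torch
import torch.nn as nn


class IdentityEncoder(nn.Module):
    """Maps an input image to an identity embedding, separate from pose/clothing/background."""

    def __init__(self, embed_dim: int = 512):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, embed_dim),
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        return self.backbone(image)


class EditGenerator(nn.Module):
    """Synthesizes an edited image conditioned on a fixed identity embedding
    and an embedding of the requested edit (e.g., "professional outfit")."""

    def __init__(self, embed_dim: int = 512, out_size: int = 64):
        super().__init__()
        self.out_size = out_size
        self.decoder = nn.Sequential(
            nn.Linear(embed_dim * 2, 1024), nn.ReLU(),
            nn.Linear(1024, 3 * out_size * out_size), nn.Tanh(),
        )

    def forward(self, identity_emb: torch.Tensor, edit_emb: torch.Tensor) -> torch.Tensor:
        z = torch.cat([identity_emb, edit_emb], dim=-1)
        img = self.decoder(z)
        return img.view(-1, 3, self.out_size, self.out_size)


# Usage: the identity embedding is extracted once and reused for every edit,
# which is what keeps the subject looking like themselves across transformations.
encoder, generator = IdentityEncoder(), EditGenerator()
source = torch.rand(1, 3, 64, 64)   # input photo (toy resolution)
identity = encoder(source)          # held fixed across edits
edit = torch.randn(1, 512)          # stand-in for a text-derived edit embedding
edited = generator(identity, edit)  # new scene/attire, same identity
print(edited.shape)                 # torch.Size([1, 3, 64, 64])
```

In a real system the edit embedding would come from a text encoder and the generator would be a far larger diffusion- or transformer-based model, but the conditioning pattern, identity held constant while everything else is regenerated, is the point of the sketch.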
The update introduces several specific functionalities:
- Identity-Consistent Transformation: Users can provide an input image of a person or pet and specify scene or attire modifications (e.g., "give them a '60s beehive haircut" or "place them in a professional outfit"). The underlying model performs conditional image generation, preserving the identity embedding of the subject while synthesizing the new visual elements.
- Multi-Image Blending (Compositional Generation): The system supports the synthesis of new scenes by combining elements from multiple distinct input images. This suggests advanced object detection, segmentation, and neural recomposition capabilities, where semantic components from different sources (e.g., a person from one photo, a pet from another, and a specified background) are seamlessly extracted, aligned, and rendered into a unified, coherent image.
- Multi-Turn Iterative Editing: The model supports an interactive, conversational editing workflow. Users can apply modifications to an image sequentially, with the model retaining context from previous operations. This implies a generative architecture capable of fine-grained, localized modifications that leave unaffected regions intact, potentially leveraging attention mechanisms or explicit regional controls within the latent space. For instance, a user can change a room's wall color, then add furniture, and subsequently adjust other elements, with the system maintaining state and applying changes incrementally (a hedged code sketch of such a session follows this list).
- Cross-Image Style/Texture Transfer: The system can apply specific visual attributes (e.g., color, texture, pattern) from a source image onto an object in a target image, for example, transferring the texture of flower petals onto rain boots. This extends traditional neural style transfer with more granular, object-specific attribute mapping, and suggests a disentanglement of content from textural and pattern attributes that allows them to be recombined.
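One way to picture the multi-turn workflow is as a stateful editing session: each instruction is applied to the latest image, so context accumulates across turns, and extra reference photos can be attached to a turn for blending or texture transfer. The sketch below assumes the publicly documented google-genai Python SDK and a preview model name; neither detail comes from this article, and the wrapper class, prompts, and filenames are invented for illustration, so verify everything against current documentation before relying on it.

```python
# Hedged sketch: a stateful multi-turn editing loop over an image model.
# Assumes the google-genai Python SDK and the preview model name below;
# neither detail comes from this article, so verify against current docs.
from io import BytesIO

from google import genai
from PIL import Image

MODEL = "gemini-2.5-flash-image-preview"  # assumed public model name; placeholder


class EditSession:
    """Keeps the latest edited image so each instruction builds on the last one."""

    def __init__(self, client: genai.Client, image: Image.Image):
        self.client = client
        self.current = image

    def edit(self, instruction: str, *references: Image.Image) -> Image.Image:
        # The current image plus optional reference photos (for blending or
        # texture transfer) are sent together with the text instruction.
        contents = [instruction, self.current, *references]
        response = self.client.models.generate_content(model=MODEL, contents=contents)
        for part in response.candidates[0].content.parts:
            if part.inline_data is not None:  # returned image bytes
                self.current = Image.open(BytesIO(part.inline_data.data))
                return self.current
        raise RuntimeError("Model returned no image part")


# Example multi-turn flow mirroring the room-editing scenario above
# (filenames are hypothetical).
client = genai.Client()  # reads the API key from the environment
session = EditSession(client, Image.open("living_room.jpg"))
session.edit("Repaint the walls a soft sage green, keep everything else unchanged")
session.edit("Add a mid-century armchair near the window")
petals = Image.open("flower_petals.jpg")
session.edit("Apply the petal texture from the second image to the rain boots", petals)
session.current.save("edited_room.png")
```

The design point is simply that state lives on the client side of the loop: because every turn re-submits the latest image, each edit is incremental and the earlier changes are preserved.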
All images generated or edited within the Gemini app incorporate both a visible watermark and an invisible SynthID digital watermark to clearly denote their AI-generated origin.