Ultimate prompting guide for Nano Banana | Google Cloud Blog
Key Points
- Nano Banana models, powered by Gemini 3, are advanced image generation and editing tools that leverage deep reasoning and real-time web search information to deliver precise, high-quality visual results.
- Nano Banana 2 introduces Pro-level features including 4K upscaling, native aspect ratio support, and state-of-the-art multilingual text rendering, with models supporting extensive input token context windows for complex prompts.
- Effective prompting involves detailed narrative descriptions, multimodal inputs, integration with real-time data, and "Creative Director" controls over lighting, camera, and material textures, alongside seamless integration with other generative AI models like Veo and Lyria.
The Nano Banana models, specifically Nano Banana 2 and Nano Banana Pro, are advanced image generation and editing models built upon the Gemini 3 family of models. They leverage deep reasoning capabilities and real-world knowledge, including real-time information from web search, to produce precise, high-quality visual results with reduced trial and error. Nano Banana 2 is powered by Gemini 3.1 Flash Image, while Nano Banana Pro uses Gemini 3 Pro Image.
Key features of Nano Banana 2 include:
- More accurate visuals: Achieved through integration with real-time web search information and images.
- Fast, Pro-level features: Encompasses capabilities such as superior text rendering, multilingual translations, and upscaling functionality to resolutions up to 4K.
- Precision control: Offers native support for various aspect ratios (e.g., 16:9, 9:16, 2:1) and enhances visual fidelity with vibrant lighting and richer textures.
Technical Specifications:
- Context Windows: Gemini 3.1 Flash Image (Nano Banana 2) supports a maximum of 131,072 input tokens, while Gemini 3 Pro Image (Nano Banana Pro) supports 65,536 input tokens. Both models support a maximum of 32,768 output tokens.
- Resolutions: Both models offer built-in generation capabilities for 1K, 2K, and 4K visuals. Gemini 3.1 Flash Image also includes support for 512px (0.5K).
- Aspect Ratios: Both models support 1:1, 3:2, 2:3, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, and 21:9. Gemini 3.1 Flash Image Preview further adds 1:4, 4:1, 1:8, and 8:1 aspect ratios.
- Image Inputs: Users can incorporate up to 14 reference object images in a single prompt. Supported MIME types include image/png, image/jpeg, image/webp, image/heic, and image/heif.
- Document Inputs: Text and PDF files are supported, with a maximum file size of 50 MB for API and Cloud Storage imports, or 7 MB for direct uploads via the Google Cloud console.
- Outputs: Both models generate text and images.
- Model Knowledge Base: The knowledge cutoff date for both models is January 2025.
- Live Data: Both models are powered by real-time information from web search.
- Trust & Safety: All generated images incorporate C2PA Content Credentials and a SynthID watermark for provenance and authenticity.
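The limits above lend themselves to a quick pre-flight check before a request is sent. The following is a minimal sketch, not part of any official SDK: the model keys (`nano-banana-2`, `nano-banana-pro`), the `SPECS` table, and the `validate_request` helper are hypothetical names chosen for illustration, and the values are copied from the specification list above.

```python
# Values copied from the specification list above. The model keys are informal
# labels chosen for this sketch, not official API model IDs.
COMMON_RATIOS = {"1:1", "3:2", "2:3", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9"}

SPECS = {
    "nano-banana-2": {  # Gemini 3.1 Flash Image
        "max_input_tokens": 131_072,
        "resolutions": {"512px", "1K", "2K", "4K"},
        "aspect_ratios": COMMON_RATIOS | {"1:4", "4:1", "1:8", "8:1"},
    },
    "nano-banana-pro": {  # Gemini 3 Pro Image
        "max_input_tokens": 65_536,
        "resolutions": {"1K", "2K", "4K"},
        "aspect_ratios": COMMON_RATIOS,
    },
}
MAX_REFERENCE_IMAGES = 14
IMAGE_MIME_TYPES = {"image/png", "image/jpeg", "image/webp", "image/heic", "image/heif"}


def validate_request(model, aspect_ratio, resolution, image_mime_types=(), input_tokens=0):
    """Return a list of human-readable problems; an empty list means the request fits the limits."""
    spec = SPECS.get(model)
    if spec is None:
        return [f"unknown model: {model}"]

    problems = []
    if aspect_ratio not in spec["aspect_ratios"]:
        problems.append(f"aspect ratio {aspect_ratio} is not listed for {model}")
    if resolution not in spec["resolutions"]:
        problems.append(f"resolution {resolution} is not listed for {model}")
    if len(image_mime_types) > MAX_REFERENCE_IMAGES:
        problems.append(f"{len(image_mime_types)} reference images exceeds {MAX_REFERENCE_IMAGES}")
    unsupported = sorted(set(image_mime_types) - IMAGE_MIME_TYPES)
    if unsupported:
        problems.append(f"unsupported MIME types: {', '.join(unsupported)}")
    if input_tokens > spec["max_input_tokens"]:
        problems.append(f"{input_tokens} input tokens exceeds {spec['max_input_tokens']}")
    return problems
```

Calling `validate_request("nano-banana-pro", "8:1", "512px")` would report two problems, since the extra-wide ratio and the 512px size are listed only for Gemini 3.1 Flash Image.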
Best Practices for Effective Prompting:
Effective prompting emphasizes specificity, positive framing, and iterative refinement. Prompts should commence with a strong verb indicating the desired primary operation. Key guidelines include:
- Be specific: Provide concrete details regarding subject, lighting, and composition.
- Use positive framing: Describe what is desired, not what is undesired.
- Control the camera: Employ photographic and cinematic terminology.
- Iterate: Refine images through conversational follow-up prompts.
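These guidelines can be turned into a rough self-check for a draft prompt. The sketch below is a heuristic only: the `STRONG_VERBS` set, the negative-word pattern, and the 12-word threshold are assumptions made for illustration and do not come from the guide.

```python
import re

# Hypothetical word lists for illustration; adjust them to your own workflow.
STRONG_VERBS = {"generate", "create", "render", "edit", "remove",
                "replace", "transform", "add", "translate"}
NEGATIVE_MARKERS = re.compile(r"\b(no|not|without|don't|never|avoid)\b", re.IGNORECASE)
MIN_WORDS = 12  # arbitrary threshold standing in for "be specific"


def lint_prompt(prompt: str) -> list[str]:
    """Return warnings for a draft prompt based on the guidelines above."""
    warnings = []
    words = prompt.split()
    first = words[0].lower().strip(".,:;") if words else ""
    if first not in STRONG_VERBS:
        warnings.append("start with a strong verb naming the primary operation")
    if NEGATIVE_MARKERS.search(prompt):
        warnings.append("rephrase negatives as a description of what you want")
    if len(words) < MIN_WORDS:
        warnings.append("add concrete details on subject, lighting, and composition")
    return warnings
```

A draft such as "A street with no cars" trips all three checks, while a prompt that opens with "Generate", describes the scene in detail, and states only what should appear passes cleanly.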
Prompting Frameworks:
The guide outlines five core prompting frameworks:
- Image Generation:
- Text-to-Image Generation without References: This method requires narrative descriptions for the scene.
- Formula:
- Example: "A striking fashion model wearing a tailored brown dress, sleek boots, and holding a structured handbag. Posing with a confident, statuesque stance, slightly turned. A seamless, deep cherry red studio backdrop. Medium-full shot, center-framed. Fashion magazine style editorial, shot on medium-format analog film, pronounced grain, high saturation, cinematic lighting effect."
- Multimodal Generation (with References): This involves combining multiple reference images to guide the output, suitable for character consistency or merging objects into new environments.
- Formula:
- Example: "Using the attached napkin sketch as the structure and the attached fabric sample as the texture, transform this into a high-fidelity 3D armchair render. Place it in a sun-drenched, minimalist living room."
- Image Editing: This framework focuses on modifying an existing base image.
- Conversational Editing (without New References): Allows for semantic masking (inpainting) by defining a text-based mask to modify specific parts while retaining others.
- Prompting tip: Explicitly state what should remain unchanged.
- Example: "Remove the man from the photo."
- Composition and Style Transfer (with New References): Involves incorporating new images to alter an existing one. This can include adding elements from an object image to a base image or recreating the content of an image in a different artistic style.
- Real-Time Information from Web Search: Gemini Image models can search the web to generate images based on current information.
- Prompting changes: Instead of fictional descriptions, the model is instructed to retrieve real-world data and visualize it.
- Formula:
- Example: "[Search for current weather and date in San Francisco] + [Analytically, use this data to modify the scene (e.g., if raining, make it look grey and rainy)] + [Visualize this in a miniature city-in-a-cup concept embedded within a realistic, modern smartphone UI. prompted on Tuesday 3rd March]"
- Text Rendering & Localization: Nano Banana 2 and Nano Banana Pro excel at rendering sharp, legible text and support multilingual text generation in over 10 languages.
- Rules for best typographic results:
- Use quotes: Enclose desired words (e.g., "Happy Birthday").
- Choose a font: Describe the style or name (e.g., "bold, white, sans-serif font" or "Century Gothic 12px font").
- Translate and localize: Specify a target language for text output.
- Text-first hack: For text generation, it's recommended to first converse with the model to generate text concepts, then request an image incorporating that text.
- Example: "A high-end, glossy commercial beauty shot of a sleek, minimalist nude-colored face moisturizer jar resting on a warm studio background. The lighting is soft and radiant. Next to the product, render three lines of text with the following exact styling: For the top line, the word 'GLOW' in a flowing, elegant Brush Script font. For the middle line, the text '10% OFF' in a heavy, blocky Impact font. For the bottom line, the text 'Your First Order' in a thin, minimalist Century Gothic font." Then translate the text into Korean and Arabic.
- Prompting like a Creative Director: This approach involves employing studio-quality controls to elevate image generation.
- Design your lighting: Specify illumination techniques (e.g., "three-point softbox setup," "Chiaroscuro lighting," "Golden hour backlighting").
- Choose your camera, lens, and focus: Dictate hardware and photographic terms (e.g., "GoPro," "Fujifilm camera," "low-angle shot with a shallow depth of field (f/1.8)," "wide-angle lens," "macro lens").
- Define the color grading and film stock: Specify desired emotional tones through stylistic choices (e.g., "1980s color film, slightly grainy," "Cinematic color grading with muted teal tones").
- Emphasize materiality and texture: Define physical attributes of objects (e.g., "navy blue tweed," "ornate elven plate armor, etched with silver leaf patterns," "minimalist ceramic coffee mug").
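The search-grounded, text-rendering, and Creative Director frameworks above all amount to assembling a prompt from named parts, which can be sketched with plain string builders. This is a minimal illustration, not an official template: the `Shot` class, its field names, the sentence wording, and the `search_grounded_prompt` helper are hypothetical, and the output is an ordinary string to pass to whichever client you use.

```python
from dataclasses import dataclass, field


@dataclass
class Shot:
    """Hypothetical container for the studio controls described above."""
    subject: str                      # full sentence describing the scene
    lighting: str = ""                # e.g. "three-point softbox setup"
    camera: str = ""                  # e.g. "low-angle shot, shallow depth of field (f/1.8)"
    color_grade: str = ""             # e.g. "1980s color film, slightly grainy"
    materials: str = ""               # e.g. "navy blue tweed"
    text_lines: list = field(default_factory=list)  # (text, font) tuples
    language: str = ""                # optional target language for the text

    def to_prompt(self) -> str:
        parts = [self.subject]
        controls = (("Lighting", self.lighting), ("Camera", self.camera),
                    ("Color grading", self.color_grade), ("Materials", self.materials))
        for label, value in controls:
            if value:
                parts.append(f"{label}: {value}.")
        # Typographic rules: wrap the exact words in quotes and name a font.
        for text, font in self.text_lines:
            parts.append(f'Render the text "{text}" in a {font} font.')
        if self.language:
            parts.append(f"Translate the text into {self.language}.")
        return " ".join(parts)


def search_grounded_prompt(query: str, rule: str, visualization: str) -> str:
    """Mirror the three-part search example: retrieve, reason, then visualize."""
    return (f"Search for {query}. Use this data to {rule}. "
            f"Visualize this as {visualization}.")
```

For example, `Shot(subject="A minimalist ceramic coffee mug on a desk.", lighting="golden hour backlighting", text_lines=[("GLOW", "Brush Script")]).to_prompt()` yields a single prompt string with the lighting and quoted text clauses appended in order. Keeping each control in its own field makes it easy to change one element between conversational iterations.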
Integration with Other Creative Models:
Nano Banana models are designed to work seamlessly with other generative AI tools:
- Nano Banana + Gemini: Gemini 3 can assist in prompt creation and creative direction.
- Nano Banana + Veo: Nano Banana can generate keyframes to direct an animation, with Veo then generating the video between them.
- Nano Banana + Veo + Lyria: Allows for generating visuals, then adding a custom AI soundtrack using Lyria.
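The keyframe workflow with Veo can be outlined as a small pipeline. The sketch below makes no assumptions about the real APIs: `generate_image` and `generate_clip` are hypothetical callables that you would supply as wrappers around your own image and video clients, and `animate_keyframes` is an illustrative name.

```python
from typing import Any, Callable, Sequence


def animate_keyframes(
    keyframe_prompts: Sequence[str],
    generate_image: Callable[[str], Any],       # hypothetical wrapper around an image model
    generate_clip: Callable[[Any, Any], Any],   # hypothetical wrapper around a video model
) -> list:
    """Generate one keyframe per prompt, then one clip between each consecutive pair."""
    frames = [generate_image(prompt) for prompt in keyframe_prompts]
    # zip(frames, frames[1:]) pairs each frame with the one that follows it.
    return [generate_clip(start, end) for start, end in zip(frames, frames[1:])]
```

With three keyframe prompts this produces two clips, one per adjacent pair; a soundtrack step could be added afterwards in the same style.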