Gemini 3.1 Flash-Lite: Built for intelligence at scale
News

The Gemini Team
2026.03.03
by 이호민
#AI · #Gemini · #Google AI · #LLM · #Vertex AI

Key Points

  1. Google has introduced Gemini 3.1 Flash-Lite, a new AI model now available in preview via the Gemini API and Vertex AI, designed for high-volume workloads at significantly lower costs.
  2. The model boasts a 2.5X faster Time to First Answer Token and 45% higher output speed compared to 2.5 Flash, while achieving high quality with an Elo score of 1432 and strong performance on reasoning and multimodal benchmarks.
  3. Gemini 3.1 Flash-Lite is suited for tasks from high-volume translation and content moderation to complex UI generation and simulations, offering developers adaptive intelligence and "thinking levels" control for efficient, real-time applications.

Gemini 3.1 Flash-Lite is presented as the fastest and most cost-efficient model within the Gemini 3 series, engineered for high-volume developer workloads at scale. It is available in preview to developers via the Gemini API in Google AI Studio and to enterprises through Vertex AI.

The core methodology driving Gemini 3.1 Flash-Lite centers on achieving exceptional speed and cost-efficiency without compromising quality, making it suitable for high-frequency, real-time applications. Its pricing is set at $0.25 per 1 million input tokens and $1.50 per 1 million output tokens. Performance benchmarks indicate a substantial improvement over its predecessor, Gemini 2.5 Flash: a 2.5X faster Time to First Answer Token and a 45% increase in output speed, as measured by the Artificial Analysis benchmark. This reduced latency is critical for building responsive, real-time user experiences.
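To make the pricing concrete, per-request cost can be estimated directly from token counts. The sketch below uses the preview rates quoted above; actual billing (caching discounts, tiered rates, etc.) may differ:

```python
# Estimate a Gemini 3.1 Flash-Lite request cost from the preview rates
# quoted above: $0.25 / 1M input tokens, $1.50 / 1M output tokens.
INPUT_RATE_PER_M = 0.25
OUTPUT_RATE_PER_M = 1.50

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for a single request."""
    return (input_tokens / 1_000_000) * INPUT_RATE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_RATE_PER_M

# Example: a translation request with a 2,000-token prompt and a
# 500-token response costs a fraction of a cent.
print(f"${estimate_cost(2_000, 500):.6f}")  # $0.001250
```

At these rates, a million such requests would cost roughly $1,250, which is why the model is positioned for high-volume workloads.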

In terms of quality, Gemini 3.1 Flash-Lite attains an Elo score of 1432 on the Arena.ai Leaderboard. It demonstrates strong performance across reasoning and multimodal understanding benchmarks, achieving 86.9% on GPQA Diamond and 76.8% on MMMU Pro. Notably, it surpasses Gemini 2.5 Flash and other larger Gemini models from prior generations in these quality metrics.

A key technical feature, referred to as "thinking levels," is integrated into AI Studio and Vertex AI. This functionality provides developers with granular control over the model's computational effort for a given task, allowing for optimization between speed, cost, and complexity. This adaptive intelligence mechanism is particularly beneficial for managing high-frequency workloads where cost is a primary consideration, while also enabling the model to tackle more complex tasks requiring deeper reasoning.

Its capabilities span a range of applications, from high-volume, cost-sensitive tasks such as translation and content moderation, to more complex assignments. Examples of sophisticated use cases include:

  • User Interface and Dashboard Generation: Dynamically generating user interfaces and dashboards, such as filling e-commerce wireframes with hundreds of products or creating real-time weather dashboards using live forecasts and historical data.
  • Simulation: Creating dynamic simulations on the fly.
  • Multi-step Task Execution: Developing SaaS agents capable of executing versatile, multi-step tasks for business operations.
  • Content Analysis: Rapidly analyzing and sorting large volumes of content, including images.

Early-access developers and companies such as Latitude, Cartwheel, and Whering are already using Gemini 3.1 Flash-Lite, reporting that its efficiency and reasoning capabilities let it handle complex inputs with precision comparable to larger-tier models while maintaining close adherence to instructions.