Google Develops Next-Gen AI Chip with 10x Performance Boost - Maeil Business Newspaper
Key Points
- Google unveiled Gemini 2.5 Flash, a cost-effective and efficient large language model designed for rapid processing in high-volume scenarios.
- Concurrently, Google introduced Ironwood, its 7th-generation AI chip (TPU) specifically optimized for LLM inference, boasting over 10 times the performance of its predecessor.
- Ironwood features 198GB of HBM and aims to reduce reliance on NVIDIA by providing a specialized inference solution, enabling services like Gemini 2.5 Flash to operate more competitively.
Google has unveiled new artificial intelligence (AI) models and AI semiconductor hardware aimed at enhancing performance and cost-efficiency while reducing its dependency on NVIDIA. At its annual 'Next 2025' event in Las Vegas, Google Cloud introduced Gemini 2.5 Flash, a more accessible and cost-effective variant of its latest large language model (LLM), Gemini 2.5. Gemini 2.5 Flash is designed to automatically adjust its processing time to the complexity of a query, answering simpler requests faster and thereby lowering service costs. This makes it suitable for high-volume scenarios where efficiency is paramount, such as customer service, real-time information processing, and virtual assistants.
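As an illustration of how a developer might trade response latency and cost against answer quality on such high-volume workloads, below is a minimal sketch using the google-genai Python SDK. The model ID and the thinking_budget control are assumptions based on Google's public documentation around the announcement and may differ from the exact API names in current releases.

```python
# Minimal sketch: calling Gemini 2.5 Flash and capping its reasoning effort.
# Model ID and ThinkingConfig/thinking_budget are assumptions; check current docs.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Classify this support ticket as billing, technical, or other: ...",
    # A low (or zero) thinking budget favors latency and cost on simple,
    # high-volume requests; a larger budget allows deeper reasoning.
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget=0),
    ),
)
print(response.text)
```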
Concurrently, Google revealed its 7th-generation AI accelerator, named Ironwood, a Tensor Processing Unit (TPU) optimized specifically for AI inference. Ironwood's design centers on serving LLMs efficiently to a large user base, in particular by supporting Mixture-of-Experts (MoE) architectures and the advanced reasoning capabilities that modern LLMs are increasingly adopting. Google states that Ironwood delivers more than a 10x performance improvement over its predecessor, the TPUv5p released in 2023. The chip is equipped with 198GB of High Bandwidth Memory (HBM), which allows larger models and datasets to be held on-chip; keeping data in HBM reduces the need for frequent data transfers and thereby improves overall performance. Samsung Electronics is noted as the supplier of the HBM to Google, via Broadcom, a co-developer of the TPU. The combination of Ironwood and Gemini 2.5 Flash is expected to deliver competitive inference costs, positioning Google to reduce its reliance on NVIDIA, which currently controls more than 80% of the AI accelerator market and is likewise shifting its focus from AI training to inference.
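To put the memory figure in context, here is a rough back-of-envelope sketch (the precisions and byte counts are generic assumptions, not Google specifications) of how many model weights the reported 198GB of HBM could keep resident on a single chip:

```python
# Illustrative only: weights-only capacity at different precisions, ignoring
# KV cache, activations, and runtime overhead. Not Google-provided figures.
HBM_BYTES = 198 * 10**9  # 198 GB as reported for Ironwood

for name, bytes_per_param in [("fp8 / int8", 1), ("bf16", 2), ("fp32", 4)]:
    params = HBM_BYTES / bytes_per_param
    print(f"{name:>10}: ~{params / 1e9:.0f}B parameters")
# Roughly ~198B parameters at 8-bit, ~99B at bf16, ~50B at fp32.
```

Keeping the working set resident in HBM in this way is what avoids the repeated off-chip data transfers the article refers to.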
Additionally, Google announced the 'Agent2Agent' protocol for inter-agent communication and confirmed support for the open-source Model Context Protocol (MCP). Thomas Kurian, CEO of Google Cloud, stated that over 4 million developers are currently using Gemini.