GitHub - moeru-ai/airi: πŸ’–πŸ§Έ Self hosted, you-owned Grok Companion, a container of souls of waifu, cyber livings to bring them into our worlds, wishing to achieve Neuro-sama's altitude. Capable of realtime voice chat, Minecraft, Factorio playing. Web / macOS / Windows supported.

moeru-ai
2026.03.07
Β· GitHub Β· by 배레온 / Busan / developer
#AI #Game Playing #LLM #Real-time Chat #Virtual Companion

Key Points

  1. Project AIRI aims to re-create a versatile, open-source "digital human" or "AI waifu" inspired by Neuro-sama, enabling users to own and interact with a cyber living companion across various platforms.
  2. Built upon Web technologies like WebGPU and WebAssembly, AIRI supports features such as gaming (Minecraft, Factorio), real-time chat integration, VRM/Live2D model control, and client-side inference with diverse LLM API providers.
  3. Designed for browser, desktop, and mobile environments, the project fosters community contributions from developers, artists, and designers to expand its capabilities as a comprehensive virtual-being platform.

Project AIRI is an open-source initiative dedicated to creating "digital humans" or "AI waifu/virtual characters" capable of sophisticated interaction, aiming to provide users with personal, persistent, and highly interactive digital companions. The project is heavily inspired by Neuro-sama, a virtual streamer, and seeks to address the limitations of existing AI chat platforms by enabling real-time game interaction, visual engagement, and broader computational capabilities.

The core methodology of Project AIRI revolves around a hybrid architectural approach that leverages modern web technologies for flexibility and reach, while integrating native capabilities for high performance. The project is built with extensive support for Web technologies, including WebGPU for graphics rendering and compute, WebAudio for sound processing, Web Workers for concurrent processing, WebAssembly (WASM) for near-native performance of specific modules, and WebSockets for real-time communication. This design allows AIRI to run natively in modern web browsers (Stage Web), on mobile devices via PWA support (Stage Pocket), and as a desktop application (Stage Tamagotchi).
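Since the same core must talk to browser, PWA, and desktop stages, a versioned message envelope over WebSockets is one natural way to keep them in sync. The sketch below is illustrative only: the event names, fields, and wire format are assumptions, not AIRI's actual protocol.

```typescript
// Hypothetical typed envelope for a cross-stage WebSocket channel.
// All names here are illustrative assumptions, not AIRI's real wire format.
type StageEvent =
  | { type: 'chat:user-message'; text: string }
  | { type: 'tts:speak'; text: string; voice: string }
  | { type: 'model:expression'; name: string; weight: number };

interface Envelope {
  v: 1;        // protocol version, so stages can reject unknown formats
  ts: number;  // unix-ms timestamp for ordering events
  event: StageEvent;
}

function encode(event: StageEvent, now: () => number = Date.now): string {
  const env: Envelope = { v: 1, ts: now(), event };
  return JSON.stringify(env);
}

function decode(raw: string): Envelope {
  const env = JSON.parse(raw) as Envelope;
  if (env.v !== 1) throw new Error(`unsupported protocol version: ${env.v}`);
  return env;
}

const wire = encode({ type: 'chat:user-message', text: 'hello' }, () => 0);
console.log(decode(wire).event.type); // → chat:user-message
```

Versioning the envelope up front lets older desktop builds and newer web clients fail loudly instead of silently misinterpreting each other's events.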

For critical performance-sensitive tasks, particularly AI inference, the desktop version of AIRI deviates from a purely web-based execution model by integrating native hardware acceleration. It leverages NVIDIA CUDA and Apple Metal through the candle project (a Rust-based ML framework by Hugging Face), abstracting complex dependency management. This dual-pronged strategy ensures that while the user interface, layouts, animations, and plugin systems benefit from the web ecosystem's flexibility, heavy computational workloads, such as large language model inference, can exploit native GPU power.
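A hybrid architecture like this implies a backend-selection step somewhere: prefer native acceleration when the desktop build detects it, otherwise fall back to in-browser options. The function below is a minimal sketch of that decision under assumed capability flags; the names and priority order are illustrative, not AIRI's actual API.

```typescript
// Hedged sketch of inference-backend selection (names are assumptions):
// desktop builds prefer candle-backed CUDA/Metal, browsers fall back to
// WebGPU, and WASM is the universal last resort.
type Backend = 'cuda' | 'metal' | 'webgpu' | 'wasm';

interface Capabilities {
  isDesktop: boolean; // running as the Stage Tamagotchi desktop app?
  hasCuda: boolean;   // NVIDIA GPU reachable through candle
  hasMetal: boolean;  // Apple GPU reachable through candle
  hasWebGpu: boolean; // navigator.gpu available in this browser
}

function pickInferenceBackend(caps: Capabilities): Backend {
  if (caps.isDesktop && caps.hasCuda) return 'cuda';
  if (caps.isDesktop && caps.hasMetal) return 'metal';
  if (caps.hasWebGpu) return 'webgpu';
  return 'wasm';
}

console.log(pickInferenceBackend({
  isDesktop: true, hasCuda: false, hasMetal: true, hasWebGpu: true,
})); // → metal
```

The point of centralizing the choice is that UI code never branches on platform: it asks for "an inference backend" and the capability probe decides.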

The system architecture, as depicted in its flowchart, centers around a Core component that orchestrates interactions between various modules. Key technical components include:

  1. Memory System: Utilizes in-browser databases for persistence and contextual memory. DuckDB WASM provides a pure in-browser SQL database solution, and pglite offers a PostgreSQL-compatible in-browser database. A work-in-progress Memory Alaya system further enhances memory management, potentially incorporating advanced contextual understanding. Memory_PGVector is integrated for vector embeddings, supporting similarity search for retrieval-augmented generation (RAG).
  2. LLM Integration (xsAI): The project's language model capabilities are powered by xsAI, an internal lightweight SDK for interacting with various LLM APIs. This highly modular system supports a wide array of commercial and open-source LLM providers including OpenAI, Anthropic Claude, Google Gemini, Groq, Mistral, and local inference solutions like vLLM and Ollama. This modularity allows for diverse conversational capabilities and model flexibility.
  3. Sensory Input (Ears) & Output (Mouth):
    • Speech Recognition (STT): Handles audio input from the browser or Discord clients. It performs client-side speech recognition and talking detection. unspeech acts as a universal endpoint proxy for ASR (Automatic Speech Recognition) and TTS (Text-to-Speech) services.
    • Speech Synthesis (TTS): Leverages third-party services like ElevenLabs for high-quality voice synthesis, enabling the AI character to "speak."
  4. Body (Visuals):
    • VRM Support: Allows integration and control of 3D VRM models, including dynamic animations (auto-blink, auto-look-at, idle eye movements) to enhance realism.
    • Live2D Support: Provides similar control over 2D Live2D models, enabling expressive character animations.
  5. Game Integration: A defining feature is the ability to interact with external applications, specifically games.
    • Factorio Agent (F_AGENT): Connects to Factorio servers via the Factorio RCON API and uses the autorio library for in-game automation.
    • Minecraft Agent (MC_AGENT): Uses Mineflayer to interact with Minecraft servers.
    These agents receive commands from and report feedback to the Core system, which integrates game state into the character's responses and actions.
  6. UI Components: The front-end is built with Vue.js and TypeScript, using StageUI for visual rendering along with various UI and font libraries.

The project differentiates itself from other open-source VTuber projects by its deep integration of Web technologies from day one, its hybrid web/native performance architecture, and its comprehensive suite of features encompassing memory, multi-modal interaction, and game integration. It actively seeks contributions from developers, artists, and designers, fostering a collaborative environment to realize its vision of advanced digital companions.