EP 86. Agentic Workflow for Your Real Work (Lablup CEO Shin Jung-kyu)
Key Points
1. Lablup's Backend.AI:GO, a local AI infrastructure and smart router, was developed in roughly 40 days and comprises over 1 million lines of code; it manages local and cloud AI models with a focus on disaster recovery and high-speed routing.
2. The development process relied heavily on AI agents (Claude Code Max, consuming roughly 13 billion tokens) and revealed that IT competitiveness now correlates with token usage, shifting bottlenecks from merge queues to optimizing token generation speed and adaptive "thinking budgets" for AI.
3. This agentic coding paradigm suggests a future in which the value of software code approaches zero, human roles shift to defining logic and managing AI agents, and society at large experiences a massive acceleration in productivity as non-programmers adopt AI tools to automate their own work.
The source, a YouTube video transcript featuring Lablup CEO Shin Jung-kyu and host Noh Jung-seok, details the development and implications of Backend.AI:GO, a local AI model inference and routing tool.
Backend.AI:GO Product Overview:
Backend.AI:GO was developed by Lablup in approximately 40 days, resulting in about 1 million lines of code. It serves as a user-friendly interface for managing and utilizing various AI models, both local and cloud-based. Its origin stems from Lablup's decade-long experience with Backend.AI, an AI infrastructure operating system typically used for large-scale GPU clusters. The core concept evolved from the "Continuum router," designed for on-premise AI disaster recovery in critical sectors like healthcare and finance. The Continuum project's large scope led to a decision to extract and optimize only the smart routing component, focusing on speed.
Key features demonstrated include:
- Model Management: Seamless search and download of models from platforms like Hugging Face, with detailed information (architecture, quantization, parameter distribution, vocabulary, KV cache memory footprint, transformer block structure, positional encoding).
- Inference: Local execution of models, leveraging available hardware (CPU, NVIDIA/AMD GPUs, Mac).
- Unified Access: Integration with various cloud AI providers (OpenAI, Gemini, Anthropic) and local inference engines (Ollama, LM Studio), all accessible through a single interface.
- Smart Routing: A visual router UI to monitor connections and latency, and advanced capabilities for fault tolerance like circuit breaking and model redirection in case of failures.
- Collaborative AI: A distinctive feature allowing users within a network (e.g., office) to pool their computing resources, enabling models to run on different machines while being accessible from any connected client. For instance, image generation could run on one PC, text models on another, and PDF processing on a third, with results shared across the team.
- Utility Features: Includes a built-in translation tool for various document types (PDF, TXT, DOCX) and images, along with benchmarking tools for model performance comparison.
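The fault-tolerance behavior described above (circuit breaking plus redirection to another model endpoint on failure) can be sketched minimally. This is an illustrative reconstruction of the general technique, not Backend.AI:GO's actual implementation; all class and function names here are hypothetical.

```python
# Minimal circuit-breaker routing sketch (illustrative, not Backend.AI:GO code).
import time


class Endpoint:
    """A model endpoint (local or cloud) with a simple circuit breaker."""

    def __init__(self, name, fail_threshold=3, cooldown=30.0):
        self.name = name
        self.failures = 0
        self.fail_threshold = fail_threshold
        self.cooldown = cooldown
        self.opened_at = None  # "open" circuit = endpoint temporarily excluded

    def available(self):
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.cooldown:
            # Half-open: allow a retry after the cooldown expires.
            self.opened_at = None
            self.failures = 0
            return True
        return False

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.fail_threshold:
            self.opened_at = time.monotonic()  # trip the breaker

    def record_success(self):
        self.failures = 0


def route(endpoints):
    """Redirect each request to the first healthy endpoint, skipping tripped ones."""
    for ep in endpoints:
        if ep.available():
            return ep
    raise RuntimeError("no healthy endpoint")
```

With a local model listed first and a cloud fallback second, three consecutive local failures trip the breaker, and requests redirect to the cloud endpoint until the cooldown elapses.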
Agentic Coding Methodology:
The development of Backend.AI:GO heavily relied on AI-driven development, specifically using Anthropic's Claude Code (referred to as "agentic coding"). Shin Jung-kyu employed Claude Code Max, running it across multiple machines (up to 8 PCs/VMs) and consuming approximately 13 billion tokens throughout the project. The rapid development (initial MVP for CES in 10 days, then 4x further development) highlights a significant shift in software engineering paradigms.
The core methodology involves:
- Human-AI Collaboration via Context Building: Instead of direct instruction, the process begins with iteratively building context for the AI. This is achieved by asking the AI to explore topics, outline considerations, and then create "soul documents" like CLAUDE.md, PROGRESS.md, and PLAN.md within the project root. These Markdown files serve as persistent memory for the AI, informing its subsequent actions and ensuring alignment. CLAUDE.md acts as the primary project definition, while PROGRESS.md and PLAN.md track work done and planned tasks, ensuring agents are always aware of project status upon restart or when different agents take over.
- Strategic Prompting: Shin Jung-kyu emphasizes a specific prompting style:
- Iterative Refinement: Rather than asking for a final product directly, prompts guide the AI through stages (e.g., "explore this topic," "tell me what to consider," "suggest ideas," then "create a command/skill based on these ideas").
- Language Choice: Initially, English was preferred for token efficiency, but later, Korean was adopted due to human typing speed being the bottleneck. However, internal commands and skills are still generated in English.
- Politeness (Anthropomorphism as a Tool): Using polite language (honorifics in Korean) and phrasing instructions to avoid making the AI "defensive" (e.g., framing tasks as building data for "other agents" rather than suggesting the AI itself is being "fixed"). This is based on observations of how current AI models process context and the potential for "testing environment" awareness to affect output quality.
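As a rough illustration, a "soul document" such as CLAUDE.md might look like the following. The headings and wording are hypothetical, not taken from the talk; the point is that the file gives any agent a stable project definition to load on startup.

```markdown
# Project Definition (CLAUDE.md)

## What this project is
One-paragraph description of the product and its goals, read by every
agent before acting.

## Conventions
- Code style, test commands, and review rules the agent must follow.

## Related soul documents
- PROGRESS.md — what has been completed so far
- PLAN.md — prioritized tasks still to be done
```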
- Harnesses and Sub-Agents:
  - The term "harness" refers to a structured automation layer, typically a cron job that executes Claude Code with a specific prompt (e.g., `claude -p "command_name"`). These harnesses automate routine development tasks.
  - "Commands" are specific functionalities created by the AI for execution. Unlike sub-agents (which can be chained or run in parallel but cannot call each other, to avoid infinite loops), commands can be called by agents, allowing for complex workflows.
- Parallelization: For large-scale tasks (e.g., translating 100 documents), the AI is instructed to fork the task into multiple sub-agents, each handling a manageable chunk (e.g., 4 documents per agent). This prevents "context explosions" and optimizes token usage. Lablup's internal system for Backend.AI:GO development uses similar cron-based harnesses to periodically scan GitHub issue trackers, validate new issues, generate ground plans, and queue tasks for AI agents to pick up and execute.
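The harness and parallelization ideas above can be sketched in a few lines: split a large job into per-sub-agent batches, and build the headless Claude Code invocation a cron harness would run. This is a hypothetical helper, not Lablup's internal code; the function names and the `translate-docs` command are illustrative.

```python
# Hypothetical harness helper (illustrative; not Lablup's actual tooling).


def chunk(items, size=4):
    """Split a large task into batches, one per sub-agent.

    The talk's example: 100 documents at 4 per agent -> 25 sub-agents,
    keeping each agent's context small enough to avoid "context explosions".
    """
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_harness_cmd(command_name):
    """Command line a cron harness would execute.

    `claude -p` runs Claude Code non-interactively with the given prompt,
    as in the talk's `claude -p "command_name"` pattern.
    """
    return ["claude", "-p", command_name]


docs = [f"doc_{i:03d}.txt" for i in range(100)]
batches = chunk(docs)
print(len(batches))  # 25 batches of 4 documents each
print(build_harness_cmd("translate-docs"))
```

A crontab entry pointing at a script like this would give the periodic scan-validate-queue loop described above, with each batch handed to a separate sub-agent.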
Implications for the Future of Software Development:
The speakers discuss profound shifts in the software industry:
- Code as a Commodity: With AI generating large volumes of code quickly, the intrinsic value of raw code approaches zero. The bottleneck shifts from code generation to human-AI interaction, UX design, and strategic problem-solving.
- Emergence of "Harness" Engineers: The role of the developer evolves from writing code to designing and refining "harnesses"—automated systems that orchestrate AI agents to perform complex tasks. This demands skills in prompt engineering, workflow design, and understanding AI's capabilities and limitations.
- "Instant Apps" and Lifecycle: Rapid AI-driven development will lead to a proliferation of "instant apps" (disposable software created for immediate needs). Only a small fraction will evolve into long-lived, maintained products, sustained by human commitment and "brand" trust.
- Shifting Value Capture: Value in the AI era is consolidating around core model development and hardware (GPUs, memory), rather than traditional software layers.
- Accessibility of AI Development: The ability to leverage AI for personal productivity is no longer exclusive to programmers. Examples like a CFO using Claude Code to automate report generation or a content creator building their own AI-driven content pipeline demonstrate that non-programmers can quickly enter the "acceleration curve" of AI-assisted work by focusing on automating their own tasks.
- Evolution of Computer Science: Traditional computer science concepts (data structures, algorithms, OS, networking) might become historical knowledge, while new curricula focus on model understanding, AI core engines, and how to integrate them with deterministic software logic.
- Accelerating Change: The current phase of AI development is not just about speed but "acceleration of acceleration," with bottleneck areas constantly shifting (from training to inference, then to multi-agent swarm orchestration). The next frontier for AI adoption will be its widespread application across diverse, non-IT domains, driven by its increasing accessibility.
The overarching theme is a paradigm shift akin to previous major technological transformations (e.g., punch cards to keyboards, desktop software to web/mobile), but occurring at an unprecedented, accelerating pace. The true "wave" of disruption, where AI empowers all individuals to automate aspects of their work, is just beginning.