Tax Refund Automation: AI-driven UI Test Automation Diary
Key Points
- Facing the immense complexity and maintenance burden of automating QA for a multi-service tax refund platform, a QA manager experimented with leveraging AI to build and manage the entire E2E test suite.
- The solution involved integrating multiple AI tools, notably Claude Sonnet 4.5 as a versatile "AI team member" handling roles like SDET, Documentation Specialist, and Git Master, to generate code, manage architecture, and streamline operations.
- This AI-driven approach enabled one human to achieve the output of a 4-5 person automation team, effectively shifting the human's focus from writing code to defining problems, setting direction, and ensuring quality.
This paper describes a 5-month experiment (July-November 2025) by Suho Jeong, QA Manager at Toss Income, on implementing AI-driven E2E test automation for a highly complex tax refund service. The service's intricate nature, involving multiple UX flows, authentication methods, dozens of deduction categories, and integration with various external systems (e.g., HomeTax scraping, payment gateways), made manual QA and traditional automation approaches unsustainable. Challenges included:
- Flow Complexity: Flows averaged 15-20 steps, each depending on different internal systems, external APIs, and policy engines: a "long relay race" in which any timing mismatch or external-system instability could cause failure. React-based UIs added animations and overlays that caused "click misses" or "timeouts" for automation tools.
- Frequent and Significant Changes: UI and policies evolved rapidly. Seemingly minor UI changes (e.g., button text, added questions, scraping UI redesign) required extensive re-work for hardcoded tests, affecting selectors, wait strategies, and flow logic.
- Unstable Environment: A/B testing, dynamic scraping tabs leading to "Target closed" errors, and varying domains across environments made robust automation difficult.
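The "Target closed" failures mentioned above typically happen when a test grabs a tab handle after the scraping flow has already opened (or torn down and recreated) the tab. A minimal sketch of one common mitigation, registering the new-page listener before the triggering click, follows; the helper name and selector are illustrative, and small structural interfaces stand in for Playwright's `Page` so the snippet is self-contained:

```typescript
// Structural stand-ins for the Playwright types used below; a real suite
// would import Page/BrowserContext from '@playwright/test' instead.
interface TabLike {
  waitForLoadState(state: string): Promise<void>;
}
interface ContextLike {
  waitForEvent(event: 'page'): Promise<TabLike>;
}
interface PageLike {
  context(): ContextLike;
  click(selector: string): Promise<void>;
}

// Register the listener for the new tab *before* triggering the click that
// opens it, then wait for the tab's initial load before interacting with it.
// Grabbing the handle only after the click is a classic source of
// "Target closed" races when the tab is torn down and recreated.
export async function openScrapingTab(
  page: PageLike,
  triggerSelector: string,
): Promise<TabLike> {
  const [scrapingTab] = await Promise.all([
    page.context().waitForEvent('page'), // resolves when the new tab appears
    page.click(triggerSelector),         // the click that opens the tab
  ]);
  await scrapingTab.waitForLoadState('domcontentloaded');
  return scrapingTab;
}
```

With Playwright proper, `page.context().waitForEvent('page')` resolves to the newly opened `Page`, so subsequent actions target a live handle rather than a stale one.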
To overcome these, the author hypothesized: "What if QA doesn't write code directly?" and "What if AI acts as a true team member, handling 99% of the automation?" The experiment concluded that 3 AI "team members" alongside 1 human could achieve the output of a 4-5 person automation team.
Core Methodology: AI-Driven Test Automation
The central strategy was for AI to generate code while the human focused on problem definition, requirement elucidation, context provision, constraint specification, and reviewing/approving AI outputs. This shifted the human's role from coding to directing and quality assurance.
AI Tools and Personas:
The solution leveraged a combination of AI tools, each assigned specific roles:
- Claude Sonnet 4.5 (Claude Code): Served as the primary development engine, taking on three personas:
  - SDET (Software Development Engineer in Test) Agent: Responsible for test design and architectural decisions.
  - Documentation Specialist: Managed document structuring, generated daily logs/retrospectives based on commits, and maintained internal guides.
  - Git Master: Handled commit messages, PR descriptions, and change summaries, ensuring a clear version-control history.
- Cursor: Acted as an in-IDE pair programmer, providing real-time assistance with type errors and import issues.
- Codex: Utilized for powerful code analysis and comparison tasks, such as identifying differences between test cases.
Detailed Implementation and Technical Innovations:
The paper highlights several key technical contributions enabled by the AI team:
- Page Object Model (POM) Introduction: Early in the project (July), as test cases grew, the SDET Agent proactively suggested and implemented a POM structure. This centralized selectors and actions for UI pages, drastically reducing maintenance effort when UI elements changed. For example, a `RefundApplicationPage` class would encapsulate methods like `clickApplyButton()` and `fillAccountNumber()`, abstracting the underlying Playwright selectors. A UI change then requires an update in only one place, inside the POM.
```typescript
import { Page } from '@playwright/test';

export class RefundApplicationPage {
  constructor(private page: Page) {}

  private selectors = {
    applyButton: 'button:has-text("환급 신청")',
    accountInput: 'input[name="account"]',
  };

  async clickApplyButton() {
    await this.page.click(this.selectors.applyButton);
  }

  async fillAccountNumber(account: string) {
    await this.page.fill(this.selectors.accountInput, account);
  }
}
```
- Dynamic Consent Flow Automation: Facing diverse and frequently changing consent (terms and conditions) flows across different services and entry points, the AI developed a utility to automatically detect and handle these variations. This involved:
- URL-based service type detection.
- Distinguishing new/returning users based on the number of checkboxes.
- Automatic matching of required consent items per service.
- Fallback mechanisms for when an "Agree All" button was absent.

The resulting `clickInitialConsent()` function manages consent changes across four distinct services without breaking tests.
- Robust React UI Interaction (Handling "Visible" vs. "Interactable"): A critical challenge was Playwright's inability to click elements that were visually rendered but not yet interactable due to React's asynchronous rendering and event binding (e.g., `useEffect`). The AI diagnosed this as a gap between the completion of DOM rendering/CSS layout and the binding of event handlers, and proposed and implemented a standardized "Interaction Readiness Strategy":
- Staged Waiting for Interaction Readiness: Instead of arbitrary `sleep` commands, the AI developed `waitForReactInteractionReady` to ensure an element is not only visible but also has its event handlers bound.
```typescript
import { Page } from '@playwright/test';

export async function waitForReactInteractionReady(page: Page, selector: string) {
  // 1) DOMContentLoaded: ensure basic DOM parsing is complete.
  await page.waitForLoadState('domcontentloaded');

  // 2) Element visibility: wait for the element to appear and be visible.
  await page.waitForSelector(selector, { state: 'visible', timeout: 5000 });

  // 3) React hydration + event binding: wait until an 'onclick' handler is
  //    actually attached to the element.
  await page.waitForFunction(
    (sel) => {
      const el = document.querySelector(sel);
      return !!el && typeof (el as HTMLElement).onclick === 'function';
    },
    selector,
    { timeout: 8000 },
  );
}
```
- Safe Click Fallback Strategy: A robust `safeClick` function was designed with a prioritized fallback mechanism to ensure interaction even under unstable conditions, avoiding destructive "force clicks" as a first resort.
```typescript
import { Page } from '@playwright/test';

export async function safeClick(page: Page, selector: string) {
  try {
    // 1. Standard click (preferred and safest)
    await page.click(selector, { timeout: 3000 });
    return;
  } catch (_) {}
  try {
    // 2. Native keyboard interaction: focus the element, then press Enter
    //    (highly stable for focusable elements)
    await page.focus(selector);
    await page.keyboard.press('Enter');
    return;
  } catch (_) {}
  // 3. Last resort: dispatch the click from JavaScript
  await page.$eval(selector, (el: HTMLElement) => el.click());
}
```

This strategy improved click success rates on the previously flaky tests from roughly 70% to 100%, indicating that the AI understood both React's lifecycle and Playwright's interaction model.
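The prioritized-fallback design behind `safeClick` generalizes to any flaky interaction. A small self-contained sketch of the pattern follows; the `withFallbacks` name is my own, not from the article:

```typescript
// Try strategies in priority order, falling through on failure; only the
// failure of the final (last-resort) strategy propagates to the caller.
export async function withFallbacks<T>(
  strategies: Array<() => Promise<T>>,
): Promise<T> {
  if (strategies.length === 0) throw new Error('no strategies provided');
  for (let i = 0; i < strategies.length; i++) {
    try {
      return await strategies[i]();
    } catch (err) {
      if (i === strategies.length - 1) throw err; // last resort also failed
    }
  }
  // Unreachable: the loop either returns or rethrows on the last strategy.
  throw new Error('unreachable');
}
```

`safeClick` then becomes the special case of three click strategies passed in priority order, and the same wrapper could guard scraping retries or consent-flow variations.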
- Human-AI-Test Loop via Internal Messenger: A continuous feedback loop was established where test results (success/failure, execution time, environment, user ID, related branch) and detailed failure diagnostics (error messages, step, EventID, logs, screenshots) were automatically posted to an internal messenger. This enabled immediate human-AI collaboration for debugging, with the human requesting further analysis ("Analyze this case again") and the AI providing revised code, leading to a "conversation-driven" development flow.
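As a concrete illustration of what such a loop might post, here is a hypothetical sketch. The field names mirror the article's description (status, execution time, environment, user ID, branch, failure step, EventID), but the exact schema and the webhook mechanism are assumptions, not the article's actual implementation:

```typescript
// Hypothetical shape of one test-result report; field names mirror the
// article's description, but the schema itself is assumed.
export interface TestReport {
  status: 'passed' | 'failed';
  scenario: string;
  durationMs: number;
  environment: string;
  userId: string;
  branch: string;
  failure?: {            // present only when status === 'failed'
    step: string;
    eventId: string;
    errorMessage: string;
    screenshotUrl?: string;
  };
}

// Render the report as a single messenger message.
export function formatReport(r: TestReport): string {
  const icon = r.status === 'passed' ? 'PASS' : 'FAIL';
  const head =
    `[${icon}] ${r.scenario} (${r.environment}, ${r.durationMs}ms, ` +
    `user=${r.userId}, branch=${r.branch})`;
  if (!r.failure) return head;
  const f = r.failure;
  let body = `${head}\nstep=${f.step} eventId=${f.eventId}\n${f.errorMessage}`;
  if (f.screenshotUrl) body += `\nscreenshot: ${f.screenshotUrl}`;
  return body;
}
```

Posting would then be one HTTP call to the messenger's webhook, e.g. `fetch(webhookUrl, { method: 'POST', body: JSON.stringify({ text: formatReport(report) }) })`.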
Results and Conclusion:
Within 5 months, the team (1 human + 3 AIs) completed 35 robust E2E test scenarios, including complex ones like user withdrawal, HomeTax registration, and various deduction calculations. The AI significantly reduced the time spent on coding (less than 10% for the human) and maintenance. The project progressed from initial pilot tests to a stable operational phase, able to manage high rates of change.
The author's main insight is that AI does not replace QA but amplifies its capabilities. While AI handles the speed of code generation and technical problem-solving, the QA professional's critical role shifts to defining the "direction" for that speed – focusing on problem definition, quality validation, and strategic guidance. This suggests a future where the ability to collaborate effectively with AI, rather than just coding proficiency, becomes a more valuable skill for QA professionals.