
How AI Impacts Skill Formation
Key Points
- A randomized experiment found that AI assistance on coding tasks involving a new Python library significantly impaired skill formation, including conceptual understanding, code reading, and debugging abilities, particularly for novice workers.
- While AI use did not yield significant average efficiency gains, it led to a 17% reduction in evaluation scores; productivity gains appeared only when participants fully delegated coding tasks, which compromised learning.
- The study identified six distinct AI interaction patterns; the three patterns involving cognitive engagement preserved learning outcomes, indicating that careful adoption is necessary to maintain skill development alongside AI-enhanced productivity.
This paper investigates how AI assistance impacts skill formation, particularly for novice workers in software engineering. Motivated by the increasing integration of AI tools in professional domains, the authors hypothesize that while AI may offer productivity gains, it could potentially hinder the development of core skills. The study focuses on the context of learning a new programming library on the job.
Core Research Questions:
- Does AI assistance improve task completion productivity when new skills are required?
- How does using AI assistance affect the development of these new skills?
Methodology:
The study employs a between-subjects randomized controlled experiment.
- Participants: 52 participants (26 per group) were recruited from a third-party crowd-worker platform. Selection criteria included self-reported Python experience (more than one year, weekly use), prior AI coding assistance use, and no prior experience with the Trio library.
- Task: Participants were tasked with learning and applying the Python Trio library, an asynchronous concurrency and I/O library. Two specific tasks were designed:
- Implementing a timer function concurrently with other operations (introduces nurseries, task starting, concurrent function execution).
- Developing a record retrieval function with error handling (introduces error handling and memory channels).
- Experimental Setup: The experiment was conducted on an online interview platform.
- Warm-up Phase: All participants first completed a standard Python coding task (adding a border to strings) to familiarize them with the interface and calibrate their general Python familiarity, without AI assistance.
- Trio Task Phase: Participants had a maximum of 35 minutes to complete the two Trio tasks.
- Treatment Group (AI Assistance): Had access to a chat-based AI assistant (GPT-4o base model) prompted to be an intelligent coding assistant. This assistant had access to the participant's current code and could generate full, correct solutions.
- Control Group (No AI): Completed the tasks independently.
- Evaluation Phase: After the Trio tasks, participants completed a skill assessment quiz and a demographic/experiential survey.
- Evaluation Metrics:
- Productivity: Task completion time.
- Skill Formation: Assessed via a 14-question, 27-point quiz covering 7 core Trio concepts. The quiz specifically measured:
- Debugging: Ability to identify and diagnose code errors.
- Code Reading: Ability to comprehend what code does.
- Conceptual Understanding: Ability to grasp core principles of tools/libraries.
- Code writing was intentionally excluded to avoid syntax-related confounds.
- Pilot Studies and Refinements: Four pilot studies were conducted, leading to key methodological adjustments:
- Non-Compliance: Initial pilots revealed significant participant non-compliance (using AI in control group or for evaluation). This led to switching platforms and implementing screen recording for verification in the main study.
- Local Item Dependence: The quiz design was refined to mitigate participants inferring answers by comparing questions, leading to splitting the quiz across multiple pages.
- Syntax Barriers: Pilot studies showed control-group participants struggled with basic Python syntax (e.g., try/except blocks) unrelated to Trio. For the main study, syntax hints were added to ensure the focus remained on Trio-specific learning.
- Task Count: The number of main tasks was reduced from five to two to ensure all participants could engage with the core concepts within the time limit and to align the evaluation specifically with those tasks.
- Data Collection: Keystrokes, AI interaction transcripts (for the treatment group), and survey responses were collected.
Results of Main Study (Pilot Study D served as a strong leading indicator):
Pilot Study D, with 20 participants, showed significant differences:
- Productivity: The AI group completed tasks faster.
- Skill Formation: The AI group performed significantly worse on the knowledge quiz, indicating reduced learning retention.
The paper indicates that in the main study, using AI assistance to complete tasks resulted in a significant reduction in the evaluation quiz score by 17% (equivalent to two grade points). This decline primarily impacted conceptual understanding, code reading, and debugging abilities. Conversely, no statistically significant acceleration in task completion time was observed on average with AI assistance. The qualitative analysis attributes this lack of average productivity gain to the additional time some participants spent interacting with the AI, with some dedicating over 30% of their task time to composing queries. The study identified six distinct AI interaction patterns, finding that the three patterns involving higher cognitive engagement (e.g., asking for explanations or conceptual questions) preserved learning outcomes even with AI assistance, suggesting that independent problem-solving and cognitive effort are crucial for skill acquisition.
Implications:
The findings suggest that AI-enhanced productivity is not a guaranteed shortcut to competence. While AI can improve immediate task performance, particularly for novices, its uncritical use can impair the formation of the skills needed to understand, debug, and supervise code. This points to a trade-off between immediate efficiency and long-term skill development. Careful adoption of AI assistance in workflows, especially in safety-critical domains, is therefore needed to preserve skill formation and keep humans competent to oversee AI-generated outputs.