Measuring AI Ability to Complete Long Tasks

Thomas Kwa
2025.04.20
arXiv
#AI · #Task Completion · #Benchmark · #AI Capabilities · #AI Safety

Key Points

  1. This paper proposes the "50%-task-completion time horizon" as a new metric, quantifying AI capability by the duration of tasks (as typically completed by humans) that AI models can finish with 50% success.
  2. Across 170 diverse tasks, the study finds that this AI time horizon has been doubling approximately every seven months since 2019, driven primarily by gains in reliability, reasoning, and tool use.
  3. Extrapolating this exponential trend, the research suggests that within about five years, AI systems could automate many software tasks that currently take human professionals a month to complete.

This paper, "Measuring AI Ability to Complete Long Tasks," by Kwa et al. from Model Evaluation & Threat Research (METR), addresses the critical challenge of quantifying AI system capabilities beyond traditional benchmarks, which often suffer from artificiality, adversarial selection, and rapid saturation. The authors propose a novel, intuitive metric: the task completion time horizon, specifically focusing on the 50%-task-completion time horizon. This metric represents the typical time humans take to complete tasks that AI models can successfully accomplish with a 50% probability.

The core methodology involves a three-step process:

  1. Task Suite Creation and Curation: A diverse suite of 170 tasks was compiled, designed to capture skills relevant to research and software engineering.
    • HCAST [8]: 97 diverse software tasks (1 minute to ~30 hours), covering cybersecurity, machine learning, and software engineering. These tasks are realistic, solvable by professionals, and primarily text-based, allowing for automatic scoring (0 to 1, with defined success thresholds for continuous scores).
    • RE-Bench [2]: 7 challenging ML research engineering tasks, each estimated to take approximately 8 hours for a human expert.
    • Software Atomic Actions (SWAA): 66 novel, single-step tasks (1 second to 30 seconds), representing atomic actions in software development (e.g., file selection, code completion, math). These tasks were developed blind to AI performance and are grouped into 5 task families. SWAA was introduced to provide finer resolution for shorter task measurements and to assess pre-2023 models.
Tasks are grouped into "task families" (e.g., "crossword") to ensure diversity by down-weighting families with many tasks.
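The down-weighting of large task families can be sketched in a few lines of Python. This is a minimal illustration consistent with the inverse-square-root weighting the paper describes; the function and variable names are my own, not the paper's code:

```python
from collections import Counter

def family_weights(task_families):
    """Weight each task by 1/sqrt(n_family), so a family with many
    tasks (e.g. "crossword") does not dominate aggregate success rates."""
    counts = Counter(task_families)
    return [counts[f] ** -0.5 for f in task_families]

# Three tasks in one family, one in another: the singleton keeps full weight.
w = family_weights(["crossword", "crossword", "crossword", "file_io"])
```

Each "crossword" task gets weight 3^(-1/2) ≈ 0.577, while the lone "file_io" task keeps weight 1.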

  2. Human Baselining: To establish task difficulty and length, skilled human professionals (average 5 years of relevant experience, many from top universities) performed most tasks.
    • HCAST: 286 successful baselines from ~460 attempts, totaling 2,529 hours across all baselines. Humans worked in the Vivaria environment, with screens/audio recorded to prevent cheating. Task durations were derived from the geometric mean of successful baselines; manual estimates were used for tasks lacking successful baselines.
    • RE-Bench: Task duration fixed at 8 hours, based on the paper's intent for human experts.
    • SWAA: Baselines collected by METR employees using a custom webapp, with precise timing for single-step actions. Each decision-based task was baselined 4 times, and fill-in-the-blank tasks 3 times.
This process yielded human time-to-complete estimates for 148 of the 170 tasks, serving as a proxy for task "length" or "difficulty."
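The geometric-mean aggregation of successful baselines is straightforward to express; a minimal sketch (function name and example durations are my own):

```python
import math

def geometric_mean_minutes(times):
    """Geometric mean of successful human baseline durations.
    Less sensitive than the arithmetic mean to the heavy right
    tail typical of task-completion times."""
    return math.exp(sum(math.log(t) for t in times) / len(times))

# Three successful baselines of 10, 20, and 40 minutes:
geometric_mean_minutes([10, 20, 40])  # (10*20*40)**(1/3) = 20.0
```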

  3. AI Agent Evaluation and Time Horizon Calculation:
    • AI Agent Evaluation: 13 frontier AI models from 2019 (e.g., GPT-2, GPT-3) to 2025 (e.g., Claude 3.7 Sonnet, o1) were evaluated. Most models used the "modular-public" agent scaffold, which provides Python and Bash commands with context management. Some models (o1-preview, o1) used a slightly adapted scaffold because of tool-use and agentic struggles. Each agent/task pair was typically run 8 times to obtain an average success rate. A strong negative correlation was observed between human time-to-complete and AI success rate (R² ≈ 0.83 for an exponential fit of success rate against the logarithm of human time). Early models failed nearly all tasks taking over 1 minute, while recent models completed some tasks exceeding 4 hours of human time.
    • Time Horizon Calculation: Drawing inspiration from psychometric studies and Item Response Theory (IRT), the paper fits a logistic model to relate the human-estimated task duration to the AI agent's success rate. The 50% time horizon is then derived as the task duration at which the model achieves a 50% success probability according to this fitted logistic curve. The paper notes that success rates are weighted by the inverse square root of tasks in a family to reduce bias from large task families.
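The logistic fit and horizon extraction described in step 3 can be sketched as follows. This is a simplified stand-in for the paper's IRT-style procedure: it fits success ≈ sigmoid(a − b·log₂ t) by least squares on the logit scale rather than by maximum likelihood, and all names are my own:

```python
import numpy as np

def fifty_percent_horizon(human_minutes, success_rates, weights=None):
    """Fit p(success) = sigmoid(a - b*log2(t)) on the logit scale,
    then return the task duration t at which predicted success is 50%."""
    x = np.log2(np.asarray(human_minutes, dtype=float))
    # clip so the logit stays finite for 0% / 100% success rates
    p = np.clip(np.asarray(success_rates, dtype=float), 0.01, 0.99)
    y = np.log(p / (1 - p))
    w = np.ones_like(x) if weights is None else np.asarray(weights, dtype=float)
    sw = np.sqrt(w)                               # weighted least squares
    X = np.column_stack([np.ones_like(x), -x])    # model: y = a - b*x
    (a, b), *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return 2.0 ** (a / b)                         # solve a - b*log2(t) = 0
```

The optional `weights` argument is where the inverse-square-root family weights would enter.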

Key Findings:
The 50% time horizon for frontier AI models has grown exponentially from 2019 to 2025, demonstrating a doubling time of approximately seven months. The trend may have accelerated in 2024. This progress is attributed to improved logical reasoning, better tool use capabilities, and enhanced reliability and self-awareness in task execution. The 80% time horizon shows a similar trend but is roughly 5x shorter. Current systems show limitations, performing worse on less structured, "messier" tasks.
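The doubling time itself falls out of a log-linear regression of horizon against release date. A self-contained sketch (illustrative only; the paper's exact regression details may differ):

```python
import math

def doubling_time_months(years, horizons_minutes):
    """Least-squares slope of log2(horizon) vs. release date (in years)
    gives doublings per year; invert to get months per doubling."""
    n = len(years)
    xbar = sum(years) / n
    ybar = sum(math.log2(h) for h in horizons_minutes) / n
    num = sum((x - xbar) * (math.log2(h) - ybar)
              for x, h in zip(years, horizons_minutes))
    den = sum((x - xbar) ** 2 for x in years)
    return 12 * den / num  # months per doubling
```

On synthetic horizons that exactly double every 7 months, this recovers 7.0.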

External Validity Experiments:
To address concerns about the generalizability of these findings to real-world tasks, three supplementary experiments were conducted:

  1. Messiness Factors: HCAST and RE-Bench tasks were scored against 16 "messiness" factors (e.g., resource-limited, novel, dynamic environment). While models performed worse on tasks with higher messiness scores, the trend in AI agent performance over time was consistent across both lower and higher messiness subsets, with no evidence of plateaus.
  2. SWE-bench Verified [7]: Replicating the methodology on SWE-bench Verified, which includes human difficulty annotations, confirmed the exponential trend with an even shorter doubling time. This may be because the maintainer-centric difficulty annotations underestimate the time contractors would need for the easier SWE-bench tasks.
  3. Internal Pull Requests (PRs): Evaluation of AI agent performance on internal PRs revealed significant differences in human completion times (contractors taking 5-18x longer than repo maintainers). When contractor time was used as the task length measure, the AI time horizons derived from the combined SWAA, HCAST, and RE-Bench data were compatible with AI agent performance on internal PRs.

The supplementary experiments provided little evidence that performance trends are meaningfully slower on the more realistic tasks tested, though they highlighted that AI agent time horizons can vary significantly with task domain and the reference human population.

Implications:
Naively extrapolating the observed trend suggests that AI systems could achieve a time horizon of over one month (167 work hours) between late 2028 and early 2031. However, this extrapolation is subject to external validity concerns and potential changes in future growth rates. The paper discusses various factors that could either accelerate or decelerate this trend.
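The naive extrapolation is simple compound-growth arithmetic; a sketch under assumed inputs (the starting horizon and date below are illustrative placeholders, not the paper's fitted values):

```python
import math

def crossing_year(h0_minutes, year0, doubling_months, target_minutes):
    """Year at which an exponentially growing time horizon, starting at
    h0_minutes in year0 and doubling every doubling_months, reaches target."""
    doublings = math.log2(target_minutes / h0_minutes)
    return year0 + doublings * doubling_months / 12.0

# Hypothetical ~1-hour horizon in early 2025, doubling every 7 months,
# reaching one work-month (167 h = 10,020 min):
crossing_year(60, 2025.0, 7, 167 * 60)  # ≈ 2029.3
```

A longer doubling time or later starting point pushes the crossing toward the later end of the paper's late-2028-to-early-2031 window, which is why the authors stress the sensitivity of this extrapolation.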