AI Tracker - Monitor AI Developments

Evidence that AI can already do some weeks-long coding tasks

This directly validates the orchestrator-agent architecture Zac is building. If AI agents can handle weeks-long coding tasks, the Claude-powered orchestrator could be trusted with substantially larger refactoring efforts across the 20+ app ecosystem. Examples include migrating multiple apps to Rails 8 patterns or rolling out Solid Queue across all apps at once. The MirrorCode benchmark results (reimplementing 16k-line codebases) suggest that the rails-expert and test-engineer agents could handle full feature implementations in apps such as territory_game or the prediction_sports tools with fewer human checkpoints. The result also raises the bar for agent reliability infrastructure. Longer autonomous runs widen the surface area for failures that must be detected and recovered from, which makes the app_monitor and task_tracker apps even more critical.
