study: are your engineers on autopilot while using copilot?
Copilot? More Like Autopilot
What is the true cost of the promised productivity gains of AI tools like Microsoft's Copilot? After all, producing more lines of code doesn't necessarily mean you're producing software more effectively.
What does the code churn look like? How much duplication is produced? How many more bugs are created downstream of our fixes (a common whack-a-mole scenario that arises when engineers don't fully understand the greater system they're working on)?
Research indicates a number of changes in expected outcomes in the era of AI copilots. Not all of them are great.
The Not So Great
As code assistants have become more widely available, the percentage of code that gets changed within two weeks of being written has climbed year over year. Today, 70% of new code deployed to a project is refactored within two weeks, while the amount of legacy code refactoring has dropped by nearly 50% over the same period.
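If you want a rough read on your own churn, you can approximate it straight from git history. Below is a minimal sketch (my simplification, not the methodology behind the research above) that flags files edited again within two weeks of a previous change; the 90-day lookback and 14-day window are illustrative assumptions.

```python
"""Rough, file-level churn proxy: flags files that were modified again
within 14 days of an earlier change. A simplified stand-in for the
line-level "rewritten within two weeks" metric discussed above; the
lookback and window are assumptions, not the study's methodology."""
import subprocess
from collections import defaultdict

WINDOW_DAYS = 14          # "churn" = re-edited within this many days
LOOKBACK = "90 days ago"  # how far back to scan (illustrative)

# One unix timestamp per commit, followed by the files it touched.
log = subprocess.run(
    ["git", "log", f"--since={LOOKBACK}", "--name-only", "--pretty=format:%ct"],
    capture_output=True, text=True, check=True,
).stdout

touches = defaultdict(list)  # file path -> timestamps of commits touching it
current_ts = None
for line in log.splitlines():
    line = line.strip()
    if not line:
        continue
    if line.isdigit():
        current_ts = int(line)   # start of a new commit's entry
    elif current_ts is not None:
        touches[line].append(current_ts)

churned = 0
for path, stamps in touches.items():
    stamps.sort()
    # Was any edit followed by another edit within the window?
    if any(b - a <= WINDOW_DAYS * 86400 for a, b in zip(stamps, stamps[1:])):
        churned += 1

print(f"{churned}/{len(touches)} files re-edited within {WINDOW_DAYS} days")
```

Run it from the root of a repo. It's crude and file-level rather than line-level, but it's enough to see whether freshly merged work keeps getting reworked.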
Additionally, instances of duplicated code (5 lines or more) have gone from 0.5% to 6.7% of commits. And, to make things worse, there's a similar trend of duplicating code not just within the same codebase, but within the exact same file, added all in the same commit.
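Same-file duplication is just as easy to spot-check. Here's a tiny sketch that compares every sliding 5-line window in a single file and reports blocks that appear more than once; the 5-line threshold mirrors the figure cited above, and the file path is whatever you point it at. Real clone detectors are far smarter (whitespace- and rename-aware, cross-file), so treat this as an illustration only.

```python
"""Minimal same-file duplicate-block finder: reports any 5-line block
that appears more than once in the given file. An illustration of the
"5+ duplicated lines" idea above, not a production clone detector."""
import sys
from collections import defaultdict

WINDOW = 5  # matches the "5 lines or more" threshold cited above

def find_duplicate_blocks(path: str, window: int = WINDOW):
    lines = [line.strip() for line in open(path, encoding="utf-8")]
    seen = defaultdict(list)  # normalized block text -> 1-indexed start lines
    for i in range(len(lines) - window + 1):
        block = "\n".join(lines[i : i + window])
        if block.strip():               # skip windows that are all blank
            seen[block].append(i + 1)
    return {block: starts for block, starts in seen.items() if len(starts) > 1}

if __name__ == "__main__":
    for block, starts in find_duplicate_blocks(sys.argv[1]).items():
        print(f"Duplicated {WINDOW}-line block starting at lines {starts}:")
        print(block)
        print()
```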
An earlier study showed that the worst of these possible outcomes is a real concern. It “found that developers using Copilot introduced 41% more bugs into their code.” It also showed that while those developers spent less time writing code, they spent more time reviewing and refactoring that code. When all was said and done, they had saved only 1.7 minutes over the sprint cycle time.
Why is this happening? Lack of context. Your copilot doesn't know your codebase, it doesn't know your desired outcomes, and it is merely offering you what it has seen people do when writing similar code. It's the copilot; you are the pilot. I'm certain you wouldn't accept autocomplete for every word in a text to your boss or significant other, so why would you accept it for your code?
A Time for Velocity
Don't get me wrong, velocity is rad. There's a time to care primarily about velocity and not worry as much about technical debt. That's why frameworks like Ruby on Rails were so popular in the early Web 2.0 era: you could launch a product faster than ever before and prove traction cheaper than ever, while punting scalability concerns to a later day, when your startup had hopefully raised another round of funding.
But for all those RoR apps that did gain traction and scale, there was inevitably a day of reckoning, and it was painful. From day one, though, it was a cost we knew we would have to pay. I fear that orgs today don't realize they'll have to pay this cost in the future.
The concerning bit for this velocity narrative, however, is that Google's data (see the graph above) suggests that if their engineers were to increase their AI usage by 25%, overall delivery throughput would drop by an estimated 1.5%. I can't help but believe this is due to the aforementioned churn, duplication, and downstream bug creation. The numbers tell me that folks are on autopilot: rather than using the tool selectively, they're handing control over to these tools and paying the price in hours of refactoring and debugging their way to delivery.
The Path Forward
It doesn't have to be this way. I don't believe this is an indictment of the tools themselves, but rather a warning. Like any tool, AI assistants must be used properly. Software organizations need to evangelize the difference between effectiveness and productivity now more than ever. Don't measure software in terms of lines of code; instead, champion the business value created.
Let your team use these powerful tools, but get them off autopilot while doing so. Get their hands back on the wheel, unless you want your Tesla to end up in a ditch.