There’s a Problem With AI Programming Assistants: They’re Inserting Far More Errors Into Code
We Regret the Error
Proponents of generative AI have claimed that the technology can make human workers more productive, especially when it comes to writing computer code.
But does it really?
A recent study by coding management software company Uplevel, first spotted by IT magazine CIO, indicates that engineers who use GitHub's popular AI programming assistant Copilot don't experience any significant gains in efficiency.
If anything, the study found that using Copilot resulted in 41 percent more errors inadvertently making their way into code.
For the study, Uplevel tracked the performance of 800 developers for three months before they got access to Copilot. After they got Copilot, Uplevel tracked them once again for another three months.
To measure their performance, Uplevel examined how long it took the developers to get a pull request, meaning a request to merge code into a repository, merged, as well as how many pull requests they put through.
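As a rough illustration of those two measures, here is a minimal Python sketch that computes pull request cycle time and throughput. The timestamps are made up for the example, and the function names and the choice of a median are assumptions; Uplevel has not published its exact method, so this is not a reconstruction of it.

```python
from datetime import datetime
from statistics import median

# Hypothetical pull request records as (opened, merged) timestamps.
# These values are invented for illustration, not taken from the study.
pull_requests = [
    (datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 17, 0)),
    (datetime(2024, 1, 2, 9, 0), datetime(2024, 1, 3, 9, 0)),
    (datetime(2024, 1, 4, 9, 0), datetime(2024, 1, 4, 13, 0)),
]


def cycle_time_hours(opened, merged):
    """Hours elapsed between opening a pull request and merging it."""
    return (merged - opened).total_seconds() / 3600


def summarize(prs):
    """Return throughput (merged PR count) and median cycle time in hours."""
    times = [cycle_time_hours(opened, merged) for opened, merged in prs]
    return {"merged_prs": len(times), "median_cycle_hours": median(times)}


summary = summarize(pull_requests)
print(summary)
```

Comparing a summary like this for the three months before Copilot access against the three months after would show whether cycle time or throughput moved.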
Uplevel found that "Copilot neither helped nor hurt the developers in the sample and also did not increase coding speed."
"Our team’s hypothesis was that we thought that PR cycle time would decrease," Uplevel product manager and data analyst Matt Hoffman told CIO. "We thought that they would be able to write more code, and we actually thought that defect rate might go down because you’re using these gen AI tools to help you review your code before you even get it out there."
Spin Cycle
None of this is especially surprising when you realize that GitHub Copilot is built on large language models (LLMs), which are prone to hallucinating false information and spitting out incorrect data.
Another recent study led by University of Texas at San Antonio researchers found that large language models can generate a significant number of "hallucination packages," or code that "recommends or contains a reference" to files or code that doesn't exist.
Tech leaders are starting to worry that using AI-generated code may actually end up creating more work.
"It becomes increasingly more challenging to understand and debug the AI-generated code, and troubleshooting becomes so resource-intensive that it is easier to rewrite the code from scratch than fix it," software development firm Gehtsoft CEO Ivan Gekht told CIO.