How good is AI-assisted code generation? – Computerworld

“In our early experiments, we did a lot of work in Python, JavaScript, and similar languages,” GitHub COO Kyle Daigle said in an earlier interview with Computerworld. “GitHub is mostly a Ruby company, but we also write in Go, and C, and FirGit. And so we expanded our Copilot use cases and used it in different languages. But in general, Copilot can work in the vast majority of languages that are in the public domain.”

Relying only on user instructions based on natural language processing, genAI-assisted code generators can offer software code suggestions ranging from snippets to full functions. And updates can make the tools even better.

Amazon, for example, said updates to its CodeWhisperer tool increased code acceptance rates from about 20% on average to 35% across all languages and use cases.

“Now, with Amazon Q included in CodeWhisperer, developers can query their code and take advantage of Amazon Q’s capabilities to debug, optimize, and translate the code they’re working on,” said Doug Seven, general manager of Amazon CodeWhisperer and director of software development for Amazon Q, in a blog post.

Why is AI-assisted coding so powerful?

One of the more heralded aspects of AI-assisted coding is that users do not need to be knowledgeable in software development. Natural language processing allows even business users to simply write a query and get back the software needed for any number of projects.

For example, users can write a natural language comment that outlines a specific task in English, such as, “Load a file with server-side encryption.” Based on that information, CodeWhisperer recommends one or more code snippets directly in the development platform to accomplish the task, according to an Amazon spokesperson.
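As a rough illustration of this comment-to-code workflow, the sketch below pairs a natural language comment with the kind of Python function an assistant might propose in response. It is not actual CodeWhisperer output. It interprets the task as an upload to Amazon S3, which is an assumption; the function name `upload_encrypted` and its parameters are hypothetical; and it assumes the caller passes in a boto3 S3 client.

```python
# Upload a file with server-side encryption
def upload_encrypted(s3_client, local_path, bucket, key):
    """Upload local_path to bucket/key and ask S3 to encrypt it at rest.

    s3_client is assumed to be a boto3 S3 client, for example the object
    returned by boto3.client("s3"). The name of this function is illustrative.
    """
    s3_client.upload_file(
        local_path,
        bucket,
        key,
        # AES256 requests S3-managed server-side encryption.
        ExtraArgs={"ServerSideEncryption": "AES256"},
    )
```

Passing the client in as an argument keeps the function free of credentials and makes a suggestion like this easy to review and test before it is accepted.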

Many coding tools also come with improved code security scanning and code remediation suggestions. Some even come with “bias” filtering and reference tracking tools, which detect whether a code proposal might be similar to open source training data. The latter are important features in an AI-based coding assistant.

Amazon and other service providers are also experimenting with tools to help non-developers build applications for business purposes. For example, Amazon is testing and prototyping a tool called PartyRock that allows non-developers to work with genAI and LLMs in a sandbox environment.

“You can experiment with making different apps,” Seven said in an interview with Computerworld. “We’re going to see the rise of different tools for different people using generative AI. I think we’re just scratching the surface of where we’re going to see genAI in different places. We’re going to start seeing more and more of these tools.”

Accuracy rates vary

Seven said code acceptance rates for CodeWhisperer are around 30% to 40%, but that doesn’t mean the code it produced was incorrect or had errors. Acceptance rate refers to whether the genAI tool correctly interpreted what the developer asked it to do.

Seven described the process as something like a conversation between a programmer and an AI code generator, where the programmer asks it to produce something and then refines the request with follow-up requests. CodeWhisperer’s ability to produce usable error-free code is “pretty high,” though Seven said Amazon doesn’t disclose internal metrics.

Anecdotally, developers and IT leaders put the ability of popular AI-based code extension tools to correctly generate usable code anywhere between 50% and 80%.

“We had this as a hypothesis. Now we’re starting to see it in real studies,” said Derek Holt, CEO of digital transformation services provider Digital.ai.

According to a Cornell University study last year, there is a big difference between different genAI coding tools. The study found that ChatGPT, GitHub Copilot, and Amazon CodeWhisperer generated correct code 65.2%, 64.3%, and 38.1% of the time, respectively.

Although the study is a year old, accuracy rates for AI-assisted coding tools today are “more or less the same,” according to Burak Yetiştiren, lead author of the paper and a graduate student researcher at UCLA’s Henry Samueli School of Engineering and Applied Science.

A study by GitClear, a developer tool for GitHub and GitLab that provides code analysis and git statistics, examined more than 153 million lines of code from 2020 to 2023. Highlighting key changes in code churn, duplication and age, it explored the impact of AI tools like GitHub Copilot on programming practice.

Among GitClear’s findings was that developers write code 55% faster when using Copilot. When GitClear looked at GitHub’s code quality and maintainability compared to what a human would write, it found that less experienced developers have a greater advantage in AI-assisted programming compared to veteran developers.

GitHub data suggests that younger developers use Copilot about 20% more than more experienced developers, the study found.

GitClear conducted a corresponding survey of 500 developers and asked, “What metrics should you be judged on when you’re actively using AI?” The top three issues they cited were code quality, time required to complete a task, and the number of production incidents.

“When developers are inundated with quick and easy suggestions that will work in the short term, it becomes a constant temptation to add more lines of code without really checking whether the existing system can be improved for reuse,” GitClear’s document states.

More code but more bugs?

Developers produce 45% more code with automation tools, according to Digital.ai’s Holt, but that’s not necessarily a good thing.

“However, the main challenge with AI-assisted programming is that it becomes so easy to generate a lot of code that shouldn’t have been written in the first place,” said Adam Tornhill, founder and CTO of CodeScene, on X/Twitter.

Another wrinkle is that when code isn’t generated by humans, it’s more opaque. As a result, quality challenges arise, including questions about whether code can be effectively tested for errors and security holes.

In a survey last year of software engineers (96% of whom used AI-based coding tools) by developer security platform Snyk, more than half said unsafe AI code suggestions were common.

“That shouldn’t surprise us,” Holt said. “It’s still early days and we’re training these models on all the code in the specific repositories. All you will do is repeat the mistakes made by the programmers who wrote that original code.”

Since much of a developer’s time is spent fixing existing code — not writing new features — the ability to read code and find problems when it wasn’t written by humans becomes another problem, Holt said.

Even with these issues, developers wouldn’t adopt tools like Copilot if they didn’t believe it accelerated their ability to produce code. GitHub’s research found that “developers are 75% more fulfilled when they use Copilot.”

In a study of 450 Accenture developers who used Copilot for six months, 88% of suggested code was retained, build success rates increased by 45%, and every developer surveyed reported that Copilot was useful, according to Microsoft’s Silver.

Issues with churned, moved, and copy/pasted code

GitClear, however, also found that with the increased use of AI-assisted programming, the amount of “Churn,” “Moved” and “Copy/Pasted” code increased significantly.

“Churn” is the percentage of code that is pushed to the repository and then reverted, removed, or updated within two weeks. Churn was relatively rare when developers authored all their own code; only 3% to 4% of code was churned before 2023.
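To make the definition concrete, here is a minimal Python sketch of how a churn percentage could be computed from that description. It is not GitClear's actual methodology; the `churn_rate` function, the `CHURN_WINDOW` constant, and the per-line `(pushed_at, changed_at)` records are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Two-week window taken from the definition of churn in the text.
CHURN_WINDOW = timedelta(weeks=2)


def churn_rate(lines):
    """Return the percentage of pushed lines that churned.

    Each item is a hypothetical (pushed_at, changed_at) pair of datetimes.
    changed_at is None when the line was never reverted, removed, or updated.
    """
    if not lines:
        return 0.0
    churned = sum(
        1
        for pushed_at, changed_at in lines
        if changed_at is not None and changed_at - pushed_at <= CHURN_WINDOW
    )
    return 100.0 * churned / len(lines)


pushed = datetime(2023, 1, 2)
history = [
    (pushed, pushed + timedelta(days=3)),   # changed within two weeks: churn
    (pushed, pushed + timedelta(days=30)),  # changed later: not churn
    (pushed, None),                         # never changed
    (pushed, None),                         # never changed
]
print(churn_rate(history))  # 25.0
```

In this toy history, one of four lines is changed inside the window, so the rate is 25%. By comparison, the pre-2023 figures cited above would correspond to roughly three or four churned lines per hundred pushed.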

But overall code churn jumped 9% the first year Copilot was available in beta — the same year ChatGPT became available.

From 2022 to 2023, the rise of AI assistants was strongly correlated with “bug code” being pushed into the repository. Copilot prevalence — its use in code generation — was 0% in 2021, 5% to 10% in 2022, and 30% in 2023, GitClear found.

“If the current pattern continues in 2024, more than 7% of all code changes will be undone within two weeks, twice as many as in 2021,” GitClear’s report said.

There is perhaps no greater scourge to long-term code maintenance than copy/paste code. This is because code that is simply reused may also contain previous bugs, security holes, or other problems.

“I have no doubt that we will be able to detect problems and we will be able to train models on small amounts of code created only by our best developers,” Holt said. “But right now you’re getting a junior developer, and if you’re not paying attention to what that means for the broader software development lifecycle, you’re going to expose yourself to some risks.”

Amazon’s Seven said one of the strengths of CodeWhisperer and other products is their ability to examine existing code for errors and then suggest changes. “So they’re actually going to give you the code for that change,” Seven said. “The advantage of using Amazon Q [CodeWhisperer] in this context, as a developer, you have a debugging companion.”

This “could be particularly useful in checking for inconsistencies in existing code that developers may not be familiar with. Q is really good at that,” he said.

Another advantage of automated tools is that they can be used in a set-and-forget manner, where a developer or engineer simply explains a task and then the tools complete it on their own—whether they’re developing a new application or debugging an existing one. “In both cases, the accuracy of the code and the quality of the code are really quite high,” Seven said.

What is not in doubt is that software generation tools will continue to improve over time — although there will always be a need for a human in the loop.

“My gut feeling is that there will always be a role for developers, whether it’s reviewing or cataloging or a mix of both,” Holt said. “We’re not even talking about the fact that delivering code isn’t the goal. … The real goal is to deliver great features that users love.

“So in my opinion, I still have a long career ahead of me in software development.”
