Today, Cognition, a newly founded AI startup backed by Peter Thiel’s Founders Fund and tech industry leaders including former Twitter executive Elad Gil and DoorDash co-founder Tony Xu, announced a fully autonomous AI software engineer named “Devin”.
While there are multiple coding assistants out there, including the well-known GitHub Copilot, Devin stands out from the crowd with its ability to manage entire development projects end-to-end, from writing the code to fixing the bugs associated with it to final execution. It is the first offering of its kind and can even take on projects on Upwork, the startup showed.
The announcement of Devin marks a significant shift in the AI-assisted development space, giving engineers a full-fledged AI worker for their projects rather than a copilot that might just write barebones code or suggest snippets.
For now, however, Devin is not publicly available, and the company is opening access only to a select few customers. Bloomberg reporter Ashlee Vance was also given access and wrote about his experience using it.
What exactly can Devin do?
In a blog post published today on the Cognition website, Scott Wu, the company’s founder and CEO and an award-winning competitive programmer, explained that Devin has access to common developer tools, including its own shell, code editor and browser, within a sandboxed compute environment. It uses them to plan and execute complex engineering tasks that require thousands of decisions.
A human user simply types a natural language query into Devin’s chatbot-style interface, and the AI software engineer takes it from there, developing a detailed, step-by-step plan to solve the problem. It then starts the project using its developer tools just as a human would, writing its own code, fixing problems, testing and reporting progress in real-time, allowing the user to monitor everything as it works.
If something doesn’t look right to a human observer, the user can also jump into the chat interface and command the AI to fix it. This, Cognition says, allows engineering teams to delegate some of their projects to AI and focus on more creative tasks that require human intelligence.
In this way, Devin offers a new paradigm that could be a glimpse of how all software development — and computing in general — might be done in the near future: by AI workers overseen by human supervisors/users.
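The workflow described above (plan, execute, retry on failure, report progress, accept corrections from a human) can be pictured as a simple loop. The sketch below is purely illustrative and is not Cognition's implementation, which has not been disclosed. Every name in it (`run_session`, `demo_plan`, `demo_execute`, `demo_feedback`) is hypothetical, and the "tools" are stand-in functions rather than a real shell, editor or browser.

```python
from collections import deque
from typing import Callable, Dict, List, Optional


def run_session(
    task: str,
    make_plan: Callable[[str], List[str]],
    execute_step: Callable[[str], bool],
    get_feedback: Callable[[str, List[str]], Optional[str]],
    max_attempts: int = 3,
) -> List[str]:
    """Hypothetical plan-execute-report loop with human oversight."""
    pending = deque(make_plan(task))  # step-by-step plan for the task
    log: List[str] = []               # progress report the user can watch
    while pending:
        step = pending.popleft()
        for attempt in range(1, max_attempts + 1):
            ok = execute_step(step)
            status = "ok" if ok else "failed"
            log.append(f"{step}: attempt {attempt} {status}")
            if ok:
                break  # step succeeded, stop retrying
        # A human observer may step in and ask for a correction.
        correction = get_feedback(step, log)
        if correction is not None:
            pending.append(correction)
    return log


# --- Stand-in behaviour, for illustration only ---
def demo_plan(task: str) -> List[str]:
    return ["write code", "run tests"]


attempts_seen: Dict[str, int] = {}


def demo_execute(step: str) -> bool:
    attempts_seen[step] = attempts_seen.get(step, 0) + 1
    # Pretend the first test run fails and the second one passes.
    return not (step == "run tests" and attempts_seen[step] == 1)


def demo_feedback(step: str, log: List[str]) -> Optional[str]:
    # The user asks for one extra fix after seeing the test results.
    return "fix typo in README" if step == "run tests" else None


log = run_session("build a small website", demo_plan, demo_execute, demo_feedback)
for line in log:
    print(line)
```

In this toy run the failed test step is retried once, and the user's correction is queued as an extra step at the end of the plan. How a real agent would decide when to retry, replan or ask for help is exactly the part Cognition has not described.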
Capable of handling a wide range of development tasks
According to the demos shared by Wu, Devin is capable of handling a variety of tasks in its current form. These range from common engineering projects, such as building and deploying apps and websites end-to-end or finding and fixing bugs in codebases, to more complex things like fine-tuning a large language model given only a link to a research repository on GitHub, or learning how to use unfamiliar technologies.
In one case, it learned from a blog post how to run code that produces images with hidden messages. In another, it took on an Upwork job to run a computer vision model, writing and debugging the code for it.
On SWE-bench, a benchmark that challenges AI assistants with GitHub issues from real-world open-source projects, the AI software engineer correctly resolved 13.86% of the cases end-to-end, without any human help. In comparison, Claude 2 resolved only 4.80%, while SWE-Llama-13b and GPT-4 resolved 3.97% and 1.74% of the issues, respectively. Those models were also assisted: they were told which file needed to be fixed.
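To put those percentages in proportion, the short calculation below divides Devin's reported rate by each of the other reported rates. The figures are the ones quoted above; the variable names are ours. It is a back-of-the-envelope comparison rather than a like-for-like one, since the other models were told which file to edit and Devin was not.

```python
# Reported SWE-bench resolution rates, in percent (figures quoted in the article).
resolved_pct = {
    "Devin": 13.86,
    "Claude 2": 4.80,
    "SWE-Llama-13b": 3.97,
    "GPT-4": 1.74,
}

devin_rate = resolved_pct["Devin"]

# How many times higher Devin's reported rate is than each other model's.
ratios = {
    name: round(devin_rate / pct, 1)
    for name, pct in resolved_pct.items()
    if name != "Devin"
}

for name, ratio in ratios.items():
    print(f"Devin vs {name}: about {ratio}x")
```

By this measure, Devin's reported rate is roughly three times that of the next-best model in the comparison.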
The underlying technology remains undisclosed
AI in software development is not new. There have been tools in this space for a long time, from the popular GitHub Copilot and StarCoder to Replit, which has several small AI coding models on Hugging Face, and Codeium, which recently raised $65 million in Series B funding at a $500 million valuation.
However, most of these offerings focus on using AI to assist with coding. They can generate barebones code from text prompts, complete it using the relevant IDE context, or retrieve snippets, speeding up a team’s workflow. With Devin, Cognition appears to go a step (or several) further, providing a full-fledged AI worker that manages entire projects.
While the tool has yet to be tested, its ability to work through a software engineering project over many steps while staying on track is its biggest selling point. Cognition did not say exactly how it accomplished this, or whether it uses its own proprietary model or a third-party one, but it notes that the work is the result of its “advances in long-term reasoning and planning.”
Currently, the company is in the process of scaling up capacity and is offering early access to Devin only to selected users. It says that interested parties who want to augment their engineering work can apply via email to gain access. Wider access is expected to open at a later stage.
Cognition also notes on its website that coding is “just the beginning,” which suggests it could apply the same breakthroughs to launch similar AI agents or workers for other disciplines. The company has raised $21 million in funding so far.