Codestral is the first generative AI model for coding from Mistral, the French AI startup valued at $6 billion and backed by Microsoft.
Like other code-generating models, Codestral is intended to help developers write and interact with code. According to a blog post by Mistral, it was trained on more than 80 programming languages, including Python, Java, C++, and JavaScript. Codestral can complete coding functions, write tests, and “fill in” partial code, and it can also answer questions about a codebase in English.
Mistral calls the model “open,” but that is debatable. The startup’s license forbids using Codestral or any of its outputs for commercial activities. There is a carve-out for “development,” yet even that has limits: the license expressly forbids “any internal usage by employees in the context of the company’s business activities.”
Codestral may have been trained in part on copyrighted material. Mistral did not confirm or deny this in the blog post, but it would not be surprising: there is evidence that the startup’s prior training datasets contained copyrighted data.
Besides, Codestral might not be worth the hassle. With 22 billion parameters, the model needs a powerful PC to run. (In essence, parameters define an AI model’s proficiency with a task, such as producing and evaluating text.) And although it outperforms the competition on certain benchmarks (which are, as we all know, unreliable), the margin is hardly a landslide.
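To put the hardware claim in rough numbers, here is a minimal back-of-the-envelope sketch of how much memory the weights of a 22-billion-parameter model would occupy at a few common numeric precisions. It is an illustration, not a figure from Mistral: it counts the weights only and ignores activations, caches, and runtime overhead, and the function and variable names are invented for the example.

```python
# Rough estimate of the memory needed just to hold the weights of a
# 22-billion-parameter model at several common numeric precisions.
# Assumption: weights only; activations, caches, and runtime overhead
# are ignored, so real requirements would be higher.

PARAMETER_COUNT = 22_000_000_000  # "22 billion parameters", per the article

# Bytes used to store one parameter at each precision (standard sizes).
BYTES_PER_PARAMETER = {
    "float32": 4.0,
    "float16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gb(parameter_count: int, bytes_per_parameter: float) -> float:
    """Return the weight footprint in gigabytes (1 GB = 10**9 bytes)."""
    return parameter_count * bytes_per_parameter / 1e9


if __name__ == "__main__":
    for precision, size in BYTES_PER_PARAMETER.items():
        footprint = weight_memory_gb(PARAMETER_COUNT, size)
        print(f"{precision:>8}: {footprint:.0f} GB")
```

Under these assumptions, the weights alone come to roughly 44 GB at 16-bit precision, which is more memory than most consumer graphics cards offer. Lower-precision formats shrink the footprint, usually at some cost in output quality.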
Although Codestral is impractical for most developers and offers only incremental performance gains, it is sure to fuel the debate over whether code-generating models can be relied on as programming assistants.
Developers are certainly adopting generative AI tools for at least some coding tasks. In a June 2023 Stack Overflow survey, 44% of developers said they currently use AI tools in their development process, and 26% said they plan to soon. But these tools have obvious flaws.
A GitClear analysis of more than 150 million lines of code committed to project repos over several years found that generative AI development tools are leading to more mistaken code being pushed into codebases. Security researchers have also warned that such tools can amplify existing bugs and security issues in software projects, and a Purdue University study found that more than half of the answers OpenAI’s ChatGPT gives to programming questions are incorrect.
That won’t deter companies like Mistral from trying to make money from their models and win market share. This morning, Mistral launched a hosted version of Codestral on its conversational AI platform, Le Chat, as well as on its paid API. Mistral says it has also worked to integrate Codestral into app frameworks and development environments including LlamaIndex, LangChain, Continue.dev, and Tabnine.