Microsoft has announced that it is developing MAI-1, a new large language model (LLM) that may be large enough to compete with the largest models on the market today, such as OpenAI's GPT-4 and Google's Gemini.
Microsoft AI CEO Mustafa Suleyman is in charge of the new 500-billion-parameter model, known as MAI-1, according to a report in The Information.
Suleyman co-founded UK AI pioneer DeepMind and the AI startup Inflection AI, and was only recently appointed by Microsoft to head its consumer AI development group.
This is a significant shift for Microsoft, which up until now has primarily depended on OpenAI-developed models to drive its competitive advantage in the generative AI space against key players like Google and AWS.
What is MAI-1's competitive landscape?
If MAI-1 is indeed built with 500 billion parameters, it will rank among the largest models currently available.
For comparison, Grok from Elon Musk's xAI contains 314 billion parameters, while OpenAI's GPT-4 is estimated to have about 1 trillion.
Other major competitors in AI, including Google and Anthropic, have not disclosed how many parameters are in their LLMs.
Since Microsoft has already invested some $10 billion in OpenAI, whose GPT models have dominated the generative AI space so far, it's unclear why the company would need to build another LLM of its own.
Why is Microsoft building MAI-1?
In a post on LinkedIn, Microsoft CTO Kevin Scott sought to play down the news while also seemingly confirming that the company does in fact use a model known as MAI.
“Just to summarize the obvious: we build big supercomputers to train AI models; our partner Open AI uses these supercomputers to train frontier-defining models; and then we both make these models available in products and services so that lots of people can benefit from them. We rather like this arrangement,” he said.
According to Scott, each supercomputer Microsoft builds for OpenAI is significantly larger than the one before it, and each frontier model trained on it is far more powerful than its predecessor.
“We will continue to be on this path–building increasingly powerful supercomputer for Open AI to train the models that will set pace for the whole field – well into the future. There’s no end in sight to the increasing impact that our work together will have,” he said.
Scott added that Microsoft has developed AI models of its own "for years and years and years," and that the company uses them in practically all of its products, services, and internal operations.
“The teams making and operating things on occasion need to do their own custom work, whether that’s training a model from scratch, or fine tuning a model that someone else has built. There will be more of this in the future too. Some of these models have names like Turing, and MAI. Some, like Phi for instance, we even open source,” he said.
Are LLMs the only option?
Not every AI model needs an enormous number of parameters. Microsoft recently unveiled Phi-3, a small language model that it claims can outperform models of the same size and the next size up across a range of language, reasoning, coding, and math benchmarks. For users looking to build generative AI applications, Phi-3 may be a more sensible option.
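For developers exploring that route, small models like Phi-3 can typically be run locally through the Hugging Face transformers library. The sketch below is illustrative only; the repo id microsoft/Phi-3-mini-4k-instruct and the use of trust_remote_code are assumptions, so check the model card for the exact name, requirements, and license.

```python
# Minimal sketch (not Microsoft's internal setup): running a small language
# model such as Phi-3 locally with the Hugging Face transformers library.
# The repo id below is an assumption; verify it against the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",       # place weights on a GPU if one is available
    trust_remote_code=True,  # Phi-3 may ship custom modeling code
)

# Build a chat-style prompt and generate a short completion.
messages = [
    {"role": "user", "content": "Summarize what a small language model is in one sentence."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=80)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

A model of this size can run on a single consumer GPU or even a CPU, which is a large part of the appeal compared with frontier-scale LLMs.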
Constructing a large-parameter LLM is only part of the story; AI firms are also vying for the best data sources on which to train their generative AI systems.
Only this past week, OpenAI and Stack Overflow announced a partnership to leverage the millions of questions and answers developers have posted on the knowledge site to improve ChatGPT’s responses.