The new gold rush is in LLMs, and the shovels are open-source code. Scrappy startups, with teams smaller than a college a cappella group, are cooking up language models that shame the giants of the industry.
Take Mistral: a company of 22 people, founded six months ago, that is raising USD415m at a USD2bn valuation. Mistral announced its first LLM within weeks of being formed, claiming it outperforms Llama 2 on most benchmarks.
X.AI was founded in July. In less than six months, seemingly with a team of barely a couple of dozen and funding of under USD40m (not counting the USD1bn being raised now, of which USD135m has already come in), the Elon Musk startup has launched Grok, which beats GPT3.5 on most benchmarks.
There are other examples: Anthropic built the magnificent Claude 2 with a team of fewer than 50, and Baidu's Ernie was apparently built by a team of fewer than 100.
The barrier to entry in model-making ain't quite as low as shape-sorting toys. But these stories suggest you no longer need an Ivy League PhD, a Silicon Valley juice-cleanse regimen, and the keys to Google's data centers to cook up an impressive next-gen LLM. Companies with tiny teams are building, within weeks, LLMs that beat models that took their pioneers five-plus years and hundreds of millions of dollars to build. This analyst has maintained for months that LLMs have ushered in a patentless world where anyone, anywhere, can build them on an extremely small budget.
The message is clear for a large company from India, Wall Street, Tokyo, or anywhere else: resist the temptation to just buy someone else's LLM off the shelf. Imagine the savings, the control, the bragging rights of building your own LLM! And who knows, you might even unlock some AI superpowers along the way.
LLMs and foundation models are among the greatest innovations of all time, but they are also among the simplest to replicate, fostering extreme democratization. It may appear that the skill set resides only with the handful who worked on the original foundation-model teams, but that pool is growing rapidly as more and more teams quickly build their own products.
Once again, despite the apparent challenges of compute, expertise, data quality, and more, there is ample proof in front of us: models with billions of parameters that beat GPT3.5 or Llama 2 are being built in weeks by teams of fewer than 25. And so far, achieving this likely puts a billion-plus-dollar valuation on offer.
Maybe I should build an LLM and not launch a fund!