A Big Year For Small Models

AI will be smaller, faster, cheaper, and better in 2024

2024 will be a big year for small language models.

People are going to realize that for more than 90% of use cases, you don’t want a generic LLM that can answer any question posed. Generic LLMs overwhelm most users (who don’t know what to ask), have to be large (to cover a vast surface area), and are impossible to debug and secure (that vast surface area needs to be QA’ed).

What we want are LLMs focused on a specific set of tasks. These models are easier for users and product designers to understand. They’re dramatically smaller while maintaining excellent performance in their one niche. This smaller size allows them to run self-contained, at the edge or on a server, faster and cheaper than giant generic models.

Finally: niche, purpose-built models are a great fit for proprietary datasets, allowing companies to create unique, defensible products rather than simply repackaging an LLM API.

Perhaps the best example of this is NVIDIA’s ChipNeMo, an LLM trained to help design new semiconductors. NVIDIA took LLaMA2, trained it on 130,000 proprietary documents, and shipped a 13-billion-parameter model that beats the 70-billion-parameter LLaMA2 in the chip design niche. NVIDIA, naturally, is ahead of the curve. But we’re going to see this trickle down.
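
To put those parameter counts in perspective, here is a back-of-the-envelope sketch of how much memory the weights alone would need at a few common precisions. The function name and the precision table are my own illustrative choices, not anything from NVIDIA, and the estimate ignores activations, caches, and runtime overhead, so real deployments need more.

```python
# Rough weight-memory estimate: parameter count x bytes per parameter.
# Assumption: weights only; activations, KV cache, and runtime overhead are ignored.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billions: float, precision: str = "fp16") -> float:
    """Approximate memory for model weights, in gigabytes (1 GB = 1e9 bytes)."""
    # (billions of params) x (bytes per param) = billions of bytes = GB
    return params_billions * BYTES_PER_PARAM[precision]


if __name__ == "__main__":
    for size in (13, 70):
        row = ", ".join(
            f"{name}: {weight_memory_gb(size, name):.1f} GB" for name in BYTES_PER_PARAM
        )
        print(f"{size}B parameters -> {row}")
```

By this rough math, a 13B model at 16-bit precision needs about 26 GB for weights against about 140 GB for a 70B model, which is the difference between fitting on a single high-memory accelerator and needing several.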

Pick a company and think about the unique data it has to train small models. Then think about where it can deploy those models when they’re cheap to run. (I personally want a cooking assistant trained on Condé Nast’s recipe archive that lives on my counter. What purpose-built LLM do you want?)

Pete Warden highlights the potential for small, niche LLMs to run self-contained on microcontrollers, disconnected from networks, a vision he and I share. But when the size and cost of running these models are low enough for microcontrollers, the benefits will accrue server-side as well. Encapsulating shrunken and specialized models into cloud services is going to unlock all sorts of new use cases and interfaces.

Just watch: small models are gonna be big in 2024.
