The Growing Role of Small, Nimble GenAI Models

SanchaNorris · 02-21-2024

The emergence of generative AI as a top headline was made possible by steady progress over many years. While giant models with billions or even a trillion-plus parameters have drawn most of the attention, smaller models with fewer than 20B parameters have also emerged that could be just as accurate. I hosted a webinar, Small and Nimble — the Fast Path to GenAI, with Gadi Singer, VP of Intel Labs and Director of the Emergent AI Research Lab, and Moshe Berchansky, Senior AI Researcher at Emergent AI Research Labs, Intel, to unpack this trend. In this blog, I present the highlights of the webcast, and I encourage you to watch it in full for the complete discussion. Be sure to watch the multi-modal generative AI demo, which uses a Llama 2 7B model with the book Forrest Gump added as local content through retrieval-augmented generation (RAG), and see the model generate text from images as well as from the book.

The Developer’s Dilemma

When it comes to generative AI, developers have choices, perhaps too many. A small number of giant models serve general, multi-purpose needs, while a giant number of small models offer added efficiency, accuracy, security, and traceability. Anyone building and architecting generative AI solutions needs to weigh the following factors:

Giant vs small, nimble models (smaller by 10x-100x)
Proprietary vs open source models
Retrieval-augmented vs retrieval-centric generation
General-purpose vs targeted, customized models
Cloud-based vs local (on-prem/edge/client) inference

Giant and brawny vs small and nimble

At the moment, “small and nimble” is roughly anything under 20 billion parameters. The size threshold is a moving target that may double in 2024, but it gives a snapshot comparison against 175 billion parameters for ChatGPT 3.5 or a trillion+ for others. Because they are faster to run and easier to adapt continuously than giant models, smaller models are more cost-effective to scale throughout the enterprise.
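To put those parameter counts in perspective, here is a rough back-of-the-envelope sketch of how much memory the model weights alone would occupy. It assumes 16-bit weights (2 bytes per parameter) and ignores activations, caches, and runtime overhead; the function name and the precision choice are illustrative assumptions, not figures from the webcast.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory needed for model weights only, in GB (1 GB = 1e9 bytes).

    Assumption: every parameter is stored at the same precision
    (2 bytes = 16-bit by default). Real deployments also need memory
    for activations and caches, which this estimate leaves out.
    """
    return num_params * bytes_per_param / 1e9


# Parameter counts taken from the sizes mentioned in this post.
for label, params in [("7B", 7e9), ("20B", 20e9), ("175B", 175e9)]:
    print(f"{label}: ~{weight_memory_gb(params):.0f} GB of weights at 16-bit precision")
```

Under these assumptions, a model below the 20B threshold needs roughly 40 GB or less for its weights, while a 175B model needs on the order of 350 GB, which is one reason smaller models are easier to run on a single server or workstation.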

Dolly, Stable Diffusion, StarCoder, DALL·E, and Phi are powerful examples of models at this scale. Microsoft Research’s Phi-2, at 2.7 billion parameters, recently showcased the remarkable progress of so-called “small language models” on benchmarks for common sense, language understanding, and logical reasoning. Such results argue for a significant role for small models, including in mixed implementations alongside larger ones.

Proprietary vs open source

Gadi and Moshe point out how vital open source has been to the rise of small and nimble GenAI models. In February 2023, Meta released LLaMA, which included models of 7 and 13 billion parameters. It was very powerful, and it was introduced as open software. A succession of animal-named models followed rapidly, with Alpaca from Stanford built on LLaMA, then UC Berkeley’s Vicuna, followed by Falcon, Orca, and LLaMA 2, all within a few months.

The rapid, continuous, open evolution of GenAI is far beyond what any single organization could accomplish alone. While GPT continues to be more powerful at a wide variety of tasks, smaller models have caught up on some discrete benchmarks.

Retrieval-augmented vs retrieval-centric

Models without retrieval rely solely on the data they were trained on. The earliest releases of GPT-3 relied entirely on data held within the model’s parametric memory. That approach cannot account for vital newer information, and relying on dated information can compromise business outcomes. Retrieval-augmented generation (RAG) arose to mitigate this shortcoming. A retrieval front end draws on multiple vector stores, so that indexed, fresh data can be retrieved as additional context for the model. Retrieval-centric generation goes a step further, treating the retrieved data, rather than parametric memory, as the model’s primary source of information. In both cases, the input data is more verifiable and up to date, making results more reliable and solutions more valuable.
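The retrieval step can be sketched in a few lines of Python. This is a toy illustration, not the pipeline used in the webcast demo: the "embedding" is a simple bag-of-words count rather than a learned vector, the `VectorStore` class and `build_prompt` helper are hypothetical names, and the sample passages are made up. The resulting prompt would be passed to whatever model you are running.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: bag-of-words token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorStore:
    """Hypothetical in-memory store that indexes passages by their vectors."""

    def __init__(self):
        self.entries = []  # list of (passage, vector) pairs

    def add(self, passage):
        self.entries.append((passage, embed(passage)))

    def search(self, query, k=2):
        """Return the k passages most similar to the query."""
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [passage for passage, _ in ranked[:k]]


def build_prompt(question, store, k=2):
    """Prepend retrieved passages so the model answers from fresh context."""
    context = "\n".join(f"- {p}" for p in store.search(question, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Made-up sample passages, for illustration only.
store = VectorStore()
store.add("The cafeteria menu changes every Monday.")
store.add("The Q3 supply chain report was published on October 2.")
store.add("Patient files are stored in the records system.")

question = "When was the Q3 supply chain report published?"
print(build_prompt(question, store, k=1))
```

Because the answer comes from an indexed passage rather than from the model’s parametric memory, updating the store is enough to keep answers current, and each answer can be traced back to the passage that supplied it.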

General-purpose vs targeted and customized

In discussions with enterprise customers about GenAI, we see an uptick in preference for targeted models customized for specific functionality versus blanket wishes for a general-purpose, all-in-one model. For example, a large healthcare provider asked, “Why would we want the same model to deal with our supply chain as with our patient files?” It’s a fair question, and today’s fine-tuning methods applied to smaller open source models are a powerful alternative.

Cloud-based vs local

No conversation about evolving AI models is complete without factoring in data protection and privacy. Every organization must weigh these considerations carefully before sending data somewhere it is exposed to third parties and beyond the owner’s control. With smaller models, it is easier to keep data local, whether the models run on PCs, private clouds, or elsewhere.

Building on the small, nimble inflection point

Our researchers at Intel Labs are working on a constellation of augmentations for GenAI in the near term, including efficient LLMs and the technologies needed to support them.

Watch the webcast to learn more about this topic and see RAG in action.

Interested in learning more about AI models and generative AI? Sign up for a free trial on Intel Developer Cloud.

Elements within the blog cover photo are Intel Owned AI Generated Graphics
https://firefly.adobe.com/
February 20, 2024