The end of large language model scaling?

OpenAI's strategy pivot, big open source releases, medical AI news and more

Welcome to the third issue of The Inference Times. In this issue we have the end of language model scaling, some big open source releases, a partnership and paper on medical AI and more!

Like what you see? Please forward this to a friend!

Front page:

1) Did OpenAI just announce the end of large language model scaling?

Yes, as it relates to parameter count. But combined with other news, it portends an interesting strategic shift. First, Sam Altman: “I think we’re at the end of the era where it’s gonna be these giant models, and we’ll make them better in other ways.” (source) While models may still grow in the future, the primary axis of quality improvement will no longer be parameter count.

For folks who’ve been following the research, this is a bit of a ‘no, duh’ announcement. The Chinchilla paper found that compute-optimal training follows a scaling law in which a doubling of parameters requires a doubling of training data. The corollary is more salient: a much smaller model can outperform a much larger one given sufficient data and training compute. Given Altman’s statement and what we know about model and data scaling, it’s clear the greatest incremental returns now lie in data augmentation, labeling, loss functions and other efficiencies.
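
To put rough numbers on that corollary, here is a back-of-the-envelope sketch in Python using two widely cited approximations from the scaling-law literature: training compute of roughly 6 × parameters × tokens FLOPs, and a compute-optimal budget of roughly 20 training tokens per parameter. Treat the constants as illustrative assumptions rather than the paper’s exact fits.

```python
# Back-of-the-envelope Chinchilla-style comparison.
# Assumed rules of thumb: training compute C ~= 6 * N * D FLOPs,
# compute-optimal data budget D ~= 20 tokens per parameter.

def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute in FLOPs."""
    return 6 * n_params * n_tokens

def compute_optimal_tokens(n_params: float, tokens_per_param: float = 20) -> float:
    """Rough compute-optimal token budget for a given parameter count."""
    return tokens_per_param * n_params

# Gopher-scale (280B params on 300B tokens) vs. Chinchilla-scale (70B params
# trained to its rough compute-optimal budget of ~1.4T tokens).
big = training_flops(280e9, 300e9)
small = training_flops(70e9, compute_optimal_tokens(70e9))

print(f"280B @ 300B tokens: {big:.2e} FLOPs")   # ~5.0e23
print(f"70B  @ 1.4T tokens: {small:.2e} FLOPs") # ~5.9e23
# Roughly the same compute budget, yet the smaller, longer-trained model
# comes out ahead -- the core Chinchilla observation.
```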

Paired with OpenAI’s previous comments (see our last issue) that there’s no GPT-5 currently in training, we can see another factor at play. Not only is model size no longer an important differentiator, but OpenAI’s strategy will probably pivot toward translational work: commercialization and productization. When GPT-4 already outperforms most doctors, lawyers and programmers on many questions, bigger, badder GPTs aren’t the gating factor. The priority now is new partnerships, integrations, multi-modal capabilities and other features separate from model scaling and training.

The foundation models are more than good enough — now it’s time to invest in capabilities above and beside the models.

2) Multi-modal, open-source models!

Speaking of multi-modal capabilities, it’s refreshing to see open-source projects beating OpenAI to the punch with public releases: MiniGPT-4 and LLaVA.

Imagine submitting a photo of ramen to the model and asking it for a recipe, or asking about the provenance of a famous painting.

It’s not clear why OpenAI is dragging its feet on the public release of multi-modal capabilities, but these releases are pushing the state of the art forward.

Want to know more about their approaches?

Both of these models rely on similar quasi-open language models (LLaMA, or Vicuna, a fine-tune of LLaMA on ChatGPT conversations) and similar image encoders (CLIP and BLIP-2).

The difference is in how each model connects the visual encoder to the language model. Both use a projection layer that translates image features into the language model’s embedding space. MiniGPT-4 takes a two-step approach: start from an instruction-tuned LLaMA, keep it frozen, and optimize only the projection layer. In contrast, LLaVA starts from the already instruction-tuned Vicuna and then fine-tunes the projection layer and language model end-to-end, keeping the visual encoder frozen.
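
For intuition, here is a minimal sketch (in PyTorch-style Python, not either paper’s actual code) of the shared idea: a small projection module maps features from a frozen visual encoder into the language model’s embedding space, so the projected image “tokens” can be fed to the LLM alongside the text prompt. The class name and dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class VisualProjector(nn.Module):
    """Illustrative projection layer: maps frozen image-encoder features
    into the language model's embedding space (dimensions are assumptions)."""

    def __init__(self, vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Linear(vision_dim, llm_dim)

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        # image_features: (batch, num_patches, vision_dim) from a frozen
        # encoder such as CLIP or BLIP-2's Q-Former.
        return self.proj(image_features)  # (batch, num_patches, llm_dim)

# The projected "image tokens" are concatenated with the text prompt's token
# embeddings, and the combined sequence is fed to the language model.
projector = VisualProjector()
fake_image_features = torch.randn(1, 32, 1024)  # stand-in for encoder output
image_tokens = projector(fake_image_features)   # shape: (1, 32, 4096)
```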

If you noticed the ‘quasi-open’ caveat above in describing LLaMA and its ilk and wondered why, here it is: LLaMA (and many similar models) is released under a restrictive, non-commercial license that makes commercial use tricky. Worse, the weights were only released by Facebook on a case-by-case basis and subsequently leaked onto torrent sites. RedPajama is an initiative to replicate the LLaMA training data, model and weights, unencumbered by LLaMA’s license and sketchy provenance. That bodes well for commercialization.

3) Epic and Microsoft partner to improve clinical messaging and data analysis… sometime in the future

Following Microsoft’s announcement of a Dragon partnership for ambient clinical documentation (the AI listens to a doctor visit and drafts the doctor’s note; very cool!), Epic and Microsoft announced two new (if slightly less cool) applications of GPT-4: drafting patient messages and powering natural language queries in Epic’s data exploration tool, SlicerDicer.

Speeding up messaging will be a major boon for physicians and nurses, and the natural language capabilities could prove just as valuable, since most health record data is unstructured text.

As it stands, most medical research doesn’t happen inside SlicerDicer, but via anonymized data exports analyzed in Python. Instead, this will give administrators and clinicians better citizen-data-analyst capabilities in what is otherwise a pretty anemic BI tool.

The big question with all these partnership announcements is when we will actually see the capabilities filter down to clinicians. How long will productization take, and what percentage of hospital groups will actually pay for the Epic builds and Dragon integrations that include these features? (Most hospitals opt for relatively few of Epic’s capabilities.)

Find the article here.

4) Stable Genius

Stability AI, the company behind Stable Diffusion, just dropped a major new open-source LLM family, StableLM, in several sizes: 3B and 7B for now, with 15B and 65B to come. Some nice features: a 4k context window, instruction-tuned variants, and more training tokens than most non-LLaMA models. Some downsides: no benchmarks or model details (yet), fewer training tokens than LLaMA, and no commercial license for the instruction-tuned versions (since they’re trained on GPT-generated responses).

5) No free lunch, says Reddit CEO

Reddit’s founder and CEO is making noises about charging for access to the site’s corpus. Anyone who’s familiar with appending ‘reddit’ to a Google search to sift usable results from the SEO wasteland will appreciate why you might want to train on the Reddit corpus! Quora has already launched its own AI chat product, Poe. I suspect we’ll see more balkanization of training data as sites like Reddit and Stack Overflow pull away from open access.

Other News around the web

  • AI won’t take your job
    Everyone assumes LLMs will decrease employment; this article makes a good case for why that might not happen.

  • Microsoft has been working in secret on an AI chip code-named Athena since 2019
    Microsoft joins Amazon, Google, Tesla and others in designing their own silicon. The effort supposedly includes a team of 300, and OpenAI and Microsoft are already working with prototypes. Article (paywall).

  • Medical AI paper published by Eric Topol and collaborators
    A big paper published in Nature that serves as an overview of sorts on medical AI for people who read Nature but not Twitter. Coins the term Generalist Medical AI (GMAI)… who knows if it will stick. You can find the paper here, but Eric Topol’s breakdown on his own site is honestly superior.

    Serves as a helpful recap of what we all know: Multi-modal AI that has access to EHR data, imaging and unstructured notes will transform every aspect of medicine.

  • CheggMate joins Khanmigo
    Following Khan Academy’s announcement of Khanmigo, Chegg announced CheggMate. Both will provide homework help and adaptive tutoring powered by GPT-4. Sadly, both are gated behind a waitlist.

🔧 Tool Time:

  • First fully-featured browser plugin for ChatGPT

    It remains to be seen whether OpenAI will approve it, but the demo video is promising: drafting and posting a tweet, researching and completing a dinner reservation, and so on. Essentially, this breaks out of the prescribed plugin integrations to give ChatGPT its own browser session to work its magic.

  • Great overview of Autonomous Agents

    If you haven’t experimented with BabyAGI, AutoGPT and others, you owe it to yourself to do so: seeing an AI agent create and complete a to-do list is awesome. This article is a great place to start.

  • AI Functions by Databricks
    A nice feature that lets you drop AI calls right into SQL. It should make it dead easy for data analysts to do things like sentiment analysis without needing to serialize data and process it in Python. Continues the running LLM theme: quick, high-quality results without a data science team.

  • Transformer math

    EleutherAI released a helpful resource for understanding transformer computation and memory usage; a quick taste of the kind of arithmetic it covers is sketched just below this list.
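
As a taste, here is a small Python sketch using two widely quoted rules of thumb for memory footprints; the constants are illustrative assumptions, not EleutherAI’s exact numbers: serving a model in fp16 takes roughly 2 bytes per parameter, while training with Adam and mixed precision takes on the order of 16 to 20 bytes per parameter for weights, gradients and optimizer state (ignoring activations).

```python
# Rough memory rules of thumb (assumed constants, for illustration only):
#   inference (fp16 weights):         ~2 bytes per parameter
#   training (Adam, mixed precision): ~16-20 bytes per parameter
#   (weights + gradients + optimizer state, ignoring activation memory)

def inference_memory_gb(n_params: float, bytes_per_param: float = 2) -> float:
    """Approximate memory needed just to hold the weights for serving."""
    return n_params * bytes_per_param / 1e9

def training_memory_gb(n_params: float, bytes_per_param: float = 18) -> float:
    """Approximate training state, using the midpoint of the 16-20 byte range."""
    return n_params * bytes_per_param / 1e9

for n in (7e9, 13e9, 65e9):
    print(f"{n / 1e9:.0f}B params: "
          f"~{inference_memory_gb(n):.0f} GB to serve in fp16, "
          f"~{training_memory_gb(n):.0f} GB of training state")
```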

🧪 Research:
A little thin today in light of all the research above the fold!