The (GPT) doctor is here: Why you may soon have better tools than your physician
Plus: Pharma bro releases sketchy medical AI, data vs. model (a correction), and much more!
Welcome to the fourth issue of The Inference Times. We explore the consumerization of large language models in healthcare, discuss recent medical AI news, recap how the researchers behind a study I covered last edition schooled me on model vs. data quality, and more.
Like what you see? Please forward this to anyone you know who might enjoy it too!
Front page:
1) The (GPT) doctor is here: Why you may soon have better tools than your physician

Despite announcements from Dragon and Epic covered in previous issues, we’re years (decades?) away from FDA clearance and widespread rollout of large language models accessible to physicians inside electronic health record (EHR) systems.
But patients will be able to request their medical record and pass it into language models to discuss diagnoses, identify treatments and more.
That leads to a striking irony: we may soon have better tools for exploring our own health than physicians have for their patients.
Some potential benefits:
In fast-moving fields like oncology, outcomes can be significantly worse outside cutting-edge research hospitals. Rural patients will be able to submit their health records to language models to get second opinions informed by the latest research.
Looking for a clinical research trial? Pass your health record into a language model with access to every clinical trial. Based on your medical record, the model provides candidate studies you could consider.
Startups like Abridge provide patient navigation tools: record your doctor visits on your phone so you can review and search them later.
A conversation with your actual medical record goes three steps further: a Spanish- or Tamil-speaker would be able to have a conversation in their own language about their diagnosis and treatment plan.
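To make the idea concrete, here’s a minimal sketch of how a patient-facing tool might package a medical record and question for a chat-style model. Everything here is illustrative: the record format, the system prompt, and the `build_messages` helper are assumptions, not any real product’s API.

```python
# Hypothetical sketch: grounding an LLM conversation in a patient's record.
# The prompt wording and message structure are invented for illustration.

def build_messages(record_text: str, question: str, language: str = "English") -> list[dict]:
    """Assemble a chat-style message list that grounds the model in the record."""
    system = (
        "You are a patient-education assistant. Answer ONLY from the medical "
        f"record provided, respond in {language}, and remind the user to "
        "confirm anything important with their physician."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": f"Medical record:\n{record_text}"},
        {"role": "user", "content": question},
    ]

messages = build_messages(
    record_text="2023-03-01: HbA1c 7.9%. Metformin 500 mg started.",
    question="What does my HbA1c result mean?",
    language="Spanish",
)
```

The multilingual benefit falls out almost for free: the same record, with a one-word change to the system prompt, yields an explanation in the patient’s own language.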
Consumerization describes the shift of product and service design from large organizations to individual consumers, and it’s exactly these individual consumers who may see the greatest near-term benefit from large language models.
We’ve seen a few early announcements and projects in this space:
Ask your medical record questions. ScienceIO founder Will Manidis provided an LLM with his medical record and had a conversation about previous visits and diagnoses.
HealthGPT is a project out of Stanford’s Biodesign lab that lets you discuss your Apple Health data on-device. Their examples are limited to exercise and heart rate, which is odd, because Apple’s support for retrieving Health Records gives the project exciting potential to answer questions about a user’s medical record. Congrats to Vishnu Ravi, Varun Shenoy and Paul Schmiedmayer on the project!
2) Pharma Bro jumps on GPT hype train

While we’re in the medical AI vein, Martin Shkreli (yes, that guy) just released Dr. Gupta which, despite the ‘Dr’ in its name, assures us “Dr. Gupta IS NOT a real physician… Dr. Gupta IS NOT intended for medical or clinical uses.”
But the good doctor seems to disagree:

In my testing, the results didn’t differ markedly from ChatGPT’s. I wouldn’t be surprised if they’re leveraging dubious means like ripping off UpToDate content (basically a super-powered WebMD for doctors), but I didn’t see much evidence of this: probably an item for their roadmap!
3) Why models are secondary to data: A correction from my last issue
My last description of the multi-modal text+image models LLaVA and miniGPT-4 wasn’t quite right. After chatting with the researchers behind LLaVA, here’s how they contrasted their work with miniGPT-4 and with my original comparison (paraphrased for readability):
Data vs. model advancement: Their line of work is data-centric, and the difference between the studies is not model-centric (this was my primary miss in the last issue).
“As the differences in models diminish, data quality has a greater impact on results” (this is similar to what I shared in the lede for my last newsletter when I announced the ‘end’ of model scaling).
LLaVA released the multi-modal instruction-following data needed to partially replicate multimodal GPT-4, because high-quality data is all you need (the architecture is actually secondary).
Demonstrable results: LLaVA’s paper provides rigorous quantitative results, including degree of similarity with Visual Chat and GPT-4, SoTA accuracy on ScienceQA, and ablation studies that remove components of the data iteration and model design to identify their relative contributions.
Result quality: LLaVA reproduces some of the visual reasoning examples from the GPT-4 paper and also has strong OCR capabilities. These features are impressive and unique, making LLaVA possibly the closest to multimodal GPT-4 in quality.

LLaVA vs. miniGPT-4
Thank you Li Chunyuan & Liu Haotian!
4) Brain drain: Google Brain and DeepMind merge, become Google DeepMind
Earlier reporting indicated DeepMind wanted more independence, and certainly not a merger. But hey, at least they kept their name!
Tough to say if this helps Google close the gap on commercialization. On one hand, one organization in the company devoted to ML research probably makes more sense than two; on the other, merging organizations and cultures is always fraught.
A bit more analysis with and without paywall. And the announcement.
With higher interest rates, more frugal tech companies and more competition from OpenAI, I suspect we’ll have fewer random acts of papers and relatively more translational and productization work from industrial AI research labs.

5) Toolformer meets AutoGPT
The important Toolformer paper was a breakthrough in demonstrating large language models’ ability to learn to use an API from just a few examples. In a similar vein, ChatGPT’s plugin beta allows the service to use tools via API, but requires that they be listed in a manifest file provided to ChatGPT.
Now the frontier is letting the model figure out how to use an API from the documentation itself, with no manifest and no helpful examples. Daniel Gross (who sold his startup to Apple and led AI & search there for a few years) has done exactly this: give the model some API docs, a terminal and instructions, then let it figure things out on its own. Check out the GitHub project for code and examples.
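The core loop behind this kind of agent is simple to sketch. Below is a toy, self-contained version: `fake_llm` stands in for a real model, and the docs, commands, and stopping rule are all invented for illustration; a real implementation would call a model API and execute commands in an actual shell.

```python
# Toy sketch of the "give the model API docs and a terminal" loop.
# fake_llm is a deterministic stand-in for a real language model.

API_DOCS = 'GET /weather?city=<name> -> {"temp_c": <float>}'

def fake_llm(docs: str, goal: str, history: list[str]) -> str:
    """Pretend model: reads the docs and emits the next shell command, or DONE."""
    if not history:
        return 'curl "https://api.example.com/weather?city=Paris"'
    return "DONE"

def run_agent(goal: str, max_steps: int = 5) -> list[str]:
    """Loop: ask the model for an action, execute it, feed results back."""
    history: list[str] = []
    for _ in range(max_steps):
        action = fake_llm(API_DOCS, goal, history)
        if action == "DONE":
            break
        history.append(action)  # a real agent would run this in the terminal
    return history

commands = run_agent("What is the temperature in Paris?")
```

The interesting part is what’s absent: no manifest, no curated examples; the model has only the documentation text and a feedback loop.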
6) StackOverflow to charge for access
In the wake of the noise Reddit made about charging for access to their data, we now see StackOverflow charging for access to their corpus. StackOverflow is also making noise about OpenAI violating their terms of service.
Judging by how ChatGPT has transformed my coding and ended my use of StackOverflow, I’m guessing they’re looking at massive declines in traffic. The content balkanization is coming…
Around the web
Great thread on OpenAI researcher John Schulman’s presentation on how to ensure language models convey accurate information. Probably the best dive into the challenges of training a language model, how hallucinations arise, and how OpenAI tries to minimize them.
Greg Brockman, OpenAI cofounder, presents on ChatGPT plugins and potential in a TED interview.
Good summary of autonomous agents - what are they, why are they cool and how will they evolve.
Pong-playing brain-in-a-tube DishBrain raises $10m. Organoids are simplified, lab-cultivated organs used extensively in research and pharma; this one can play Pong when hooked up to silicon. Paper. Funding.
ChatGPT already serving as therapist for many people!
And changing how folks do journalism.
NY Times on the AI-generated Drake hit we reported on earlier. The track was pulled for copyright infringement; the article speculates on the future for AI-generated deepfake art.
🔧 Tool Time:
Best language model comparison tool
Vercel Labs has taken the mantle from nat.dev… if you want to compare output from all the major models in a friendly UI, this is the way to do it.
AutoGPT without having to install Python
AutoGPT is now hosted on Hugging Face if you want to give autonomous agents a try!
Companies add AI interfaces
LLMs are dead-easy to bolt on, so we’ll see everyone do so. Atlassian announced Atlassian Intelligence, and Webflow is adding its own AI interface as well.
Impressive voice generation repository: Bark Infinity
An open-source alternative to ChatGPT plugins: Openpm.ai
Cohere publishes Wikipedia embeddings
This is very cool for folks building question answering and research tools. It’s prohibitively expensive to turn all Wikipedia pages into vector embeddings, but Cohere went and did it for us. Announcement. Hugging Face hosting link.
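With embeddings precomputed, the retrieval step for a question-answering tool reduces to nearest-neighbor search. Here’s a minimal sketch of that ranking logic; the tiny hand-made vectors stand in for real passage embeddings like Cohere’s Wikipedia release, and the passages themselves are invented.

```python
# Sketch: ranking precomputed passage embeddings by cosine similarity.
# Toy 3-dimensional vectors stand in for real embedding vectors.
import math

passages = {
    "Paris is the capital of France.": [0.9, 0.1, 0.0],
    "The mitochondria is the powerhouse of the cell.": [0.0, 0.2, 0.9],
}

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_passage(query_vec: list[float]) -> str:
    """Return the stored passage whose embedding is closest to the query."""
    return max(passages, key=lambda p: cosine(query_vec, passages[p]))

best = top_passage([0.8, 0.2, 0.1])  # an embedded query about France
```

In practice you’d embed the user’s question with the same model that produced the corpus vectors, then hand the top passages to an LLM as context; the expensive part (embedding all of Wikipedia) is exactly what Cohere has done for us.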
🧪 Research:
Yesterday’s news in autonomous agents was one AI agent working to complete a goal. Now the frontier is several AI agents working together to complete a goal. Paper and GitHub repo.
ChatGPT product recommendations: Suggests that ChatGPT performs well at generating product recommendations without any fine-tuning. Ironically, these Alibaba researchers used an Amazon dataset.
ChatGPT as search ranker: Finds that ChatGPT outperforms traditional supervised models on information retrieval tasks.
NVIDIA published impressive text-to-video research: Still not production-quality, but noteworthy for the resource efficiency of their approach.
Nerfbusters: 3D diffusion cleans up NeRFs.
AI Wisdom
“We’ll use our extra special proprietary ML model to build a moat.”
— Bojan Tunguz (@tunguz)
6:30 PM • Apr 20, 2023