EU still doesn't like what it sees

Plus: An exciting course, a flawed study and an alternate movie ending

Welcome to the sixth issue of The Inference Times! We explore an exciting (free) AI course, check in on the latest salvo in EU AI regulations, identify some gaps in a medical AI paper and much more.

Like what you see? Please forward this to anyone you know who might enjoy it!

📰 The Inference Times: Front Page

1) The one prompt-engineering guide you might actually want to use

Legendary AI pioneer Andrew Ng just dropped a short course on prompt engineering, and it’s self-recommending.

Prompt engineering may be a short-lived discipline once model responses are excellent out of the box, but for now it’s the missing link between today’s large language models and expert-level responses. Oh, and it’s free.
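
For a flavor of the kind of technique such a course covers, here’s a minimal sketch in Python (the helper and the product-review example are hypothetical, not taken from the course): give the model an explicit role, fence off untrusted input with delimiters and request a concrete output format.

```python
# Minimal prompt-engineering sketch (hypothetical example, not from the course):
# assign a role, wrap untrusted input in delimiters and ask for structured output.

def build_prompt(review_text: str) -> str:
    return (
        "You are a customer-support analyst.\n"
        "Summarize the product review between <review> tags in one sentence, "
        'then output a JSON object with keys "sentiment" and "issues".\n\n'
        f"<review>{review_text}</review>"
    )

prompt = build_prompt("The battery died after two days, but support was helpful.")
print(prompt)  # send this string to whichever chat-completion API you use
```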

2) EU: Open up, OpenAI

As we predicted in our last issue, incognito mode in ChatGPT did not assuage European regulators. Draft legislation in Europe would require AI tools to disclose any copyrighted materials used in their training.

While the legislation wouldn’t require OpenAI to compensate rights-holders, it would make it easier for them to bring suit. One legislator makes this point explicitly: “This opens the door for right holders.”

Complying with deletion requests, whether for private data or copyrighted material, will be challenging with the current incarnation of large language models.

You can read the news on WSJ or avoid the paywall here.

3) GPT-4: Better doctor, worse confabulator… or is it?

A recent study out of Stanford analyzes GPT-4 vs. GPT-3.5 on real-world medical questions.

The headline numbers look great! Compared to GPT-3.5, more physicians agreed with GPT-4 (12%→20%), fewer disagreed (30%→23%) and fewer were unable to assess the response (14%→5%).

The study’s chart shows physician agreement with AI responses to a variety of questions posed to GPT-4 and GPT-3.5.

Interestingly (and contrary to other research), the study found GPT-4 much more likely to hallucinate than GPT-3.5, with the rate jumping from 3% to 14%!

Unfortunately, the study has a couple of major flaws:

  1. The prompts used for GPT-3.5 and GPT-4 were entirely different, which makes the comparisons questionable.

  2. There was no application of prompt engineering or best practices to elicit better responses from either model.

In fact, the study didn’t even ask GPT-4 to act as a doctor, as it did for GPT-3.5, merely asking it to act “as an assistant with medical expertise.”
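
A fairer head-to-head would hold the prompt constant and vary only the model. A minimal sketch of what that looks like (the `chat` helper is a stand-in for whatever chat-completion client you use, and the system prompt is illustrative, not the study’s):

```python
# Matched-prompt comparison: identical system prompt and question, only the
# model changes. `chat` is a hypothetical wrapper around a chat-completion
# client; the wording below is illustrative, not taken from the study.

SYSTEM = "Act as a physician. Answer the patient's question accurately and concisely."

def compare(chat, question: str) -> dict:
    return {
        model: chat(
            model=model,
            messages=[
                {"role": "system", "content": SYSTEM},
                {"role": "user", "content": question},
            ],
        )
        for model in ("gpt-3.5-turbo", "gpt-4")
    }
```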

4) 🧠 AI is all you need

It’s breathtaking to behold how quickly Microsoft is incorporating generative AI across the company. Satya Nadella wants to make sure Wall Street knows it: he said “AI” over 50 times in the company’s last earnings release.

In many ways, generative AI is as disruptive as the shift to mobile, cloud and graphical user interfaces, but there’s one big way it isn’t: Some incumbents are incorporating it at breakneck speed.

In related news, despite sinking billions of dollars into the metaverse (and renaming the company to reflect the goal!), Meta is now also the company that will deliver AI agents to the world.

It’s safe to say the growth in mentions of AI in earnings reports will exceed growth in earnings from AI for the foreseeable future.

5) Vector database Pinecone announces $100M funding

Vector databases are red-hot, with Pinecone announcing a $100M funding round at a $750M valuation (post-money).

As a16z notes, vector databases are the storage layer for LLMs, but is there a reason to believe vector-first databases will disrupt search incumbents that add vector-search features of their own? Or that any particular one of these databases will see outsize returns?

It’ll be interesting to see how this plays out: the search market has relatively few large exits, embedding search isn’t tremendously challenging technically and there are numerous players in the space.
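
To make that point concrete, here’s a minimal brute-force embedding search in a few lines of NumPy. The random vectors stand in for real embeddings from any model; dedicated vector databases mainly add scale, approximate-nearest-neighbor indexing and operational features on top of this core idea.

```python
import numpy as np

# Toy corpus of 10k embeddings (in practice these come from an embedding model).
corpus = np.random.randn(10_000, 384).astype(np.float32)
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)  # normalize once

def search(query_vec: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the k most similar corpus vectors by cosine similarity."""
    q = query_vec / np.linalg.norm(query_vec)
    scores = corpus @ q                      # cosine similarity via dot products
    return np.argsort(-scores)[:k]

query = np.random.randn(384).astype(np.float32)
print(search(query))
```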

6) 📸💬 DeepFloydAI: State-of-the-art text-to-image model

Stability.ai just keeps releasing. Their latest model is a step up from Stable Diffusion XL, released last week, in the following ways: better photorealism 📸, legible text in images 💬 and alternate aspect ratios 🎞️.

This model departs from the latent diffusion approach used by Stable Diffusion, working directly in pixel space rather than in a compressed latent representation. That usually requires more memory and compute, so expect this model to be more costly to run.
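
A rough back-of-the-envelope comparison shows why. The exact numbers are illustrative (Stable Diffusion’s autoencoder downsamples by roughly 8x to a 4-channel latent), but the tensor a pixel-space model has to denoise is dramatically larger:

```python
# Rough size comparison of what the denoiser operates on at each step.
# Illustrative numbers: a 1024x1024 RGB image in pixel space vs. an
# ~8x-downsampled, 4-channel latent as used by latent diffusion.
pixel_values = 1024 * 1024 * 3   # ~3.1M elements in pixel space
latent_values = 128 * 128 * 4    # ~65K elements in latent space
print(pixel_values, latent_values, pixel_values / latent_values)  # ~48x difference
```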

The release also differs by integrating several separate models for different tasks (image generation, upscaling, in-painting, etc.) into one pipeline.

🌎 Around the web

  • AutoRPG auto-generated game design

    Using an autonomous AI (BabyAGI) to generate an entire game level (video).

  • Good news! We won’t have to worry about AI killing us because we’ll just use AI to kill each other

    Palantir demos AI to wage war more efficiently, but they’ll do it in ethical ways? News here.

🔧 Tool Time:

  • How to build AI models that don’t lie

    A great overview from OpenAI’s John Schulman on the challenges of building AI models that don’t hallucinate. Highly recommended for its in-depth tips from a practitioner.

  • 🦙 Lamini: train your LLM as easily as tailoring a prompt

    The project aims to simplify fine-tuning and reinforcement learning so you can create your own large language models. The tool includes a few components: a data-augmentation engine that turns several dozen examples into 50k instruction-following examples, abstracted GPU hosting and more.

  • What you wish Siri would do: An amazing demo of GPT integrated on a phone
    The demo shows a few different options for incorporating large language models on the iPhone via Shortcuts, Siri or OpenAI’s Whisper transcription. The tool pulls your calendar and other relevant personal data into the prompt sent to OpenAI, which unlocks the ability to ask detailed questions about the weather, alternative activities and more.

  • Microsoft Designer generative design tool released
    Previously waitlisted, Microsoft Designer is now available publicly.

🧪 Research:

  • Medical AI answers to patient questions outperform doctors
    If there’s one thing doctors could use help with, it’s answering patient questions. Good news: a recent study found that, in a blinded evaluation, physicians rated AI answers to patient questions as significantly better than doctors’ own in both quality and empathy.

  • REMEDIS: Robust and Efficient Medical Imaging with Self-supervision
    This research from Google involved remixing existing architectures to achieve impressive results in medical imaging tasks. The model uses a mix of supervised and self-supervised learning, significantly reducing the need for human-labeled data and saving thousands of clinician annotation hours.

    Importantly, it performs well on both in-distribution and out-of-distribution data, which means it can handle unseen data and is less sensitive to common issues like resolution differences, noise patterns or operator technique variations that foil some medical-imaging AI systems (including several with FDA clearance!).

    The model also shows versatility in a wide range of medical imaging tasks, from dermatology and chest X-rays to diabetic eye conditions and cancer metastases detection.

    The study didn’t appear to pick weak baselines (a common trick in academic research), but used robust baselines of ImageNet-1K and JFT-300M pre-trained models fine-tuned for medical imaging.

    Takeaway: Overall, the study suggests foundational image models may outperform narrower, domain-specific models in medical AI, similar to how general text models like GPT-4 outperform narrower, domain-specific models even on tasks like search ranking that they were never explicitly trained for.

  • Dancing Queen: AI-generated dance choreography

    Fun research analyzing music and creating realistic dance moves. Research, video.

💡⚡️ AI Wisdom