The open-source chat models have arrived

Plus: Grandma's Napalm Factory, OpenAI extends an olive branch, the future of eCommerce and much more!

Welcome to the fifth issue of The Inference Times! We explore the latest open-source chat model, OpenAI’s privacy olive branch to the EU, an Apple AI health coach, interesting research on the tradeoffs of AI-assisted chat and more!

We also have an unusually full research section, including a biomolecular simulation of HIV, 2 million token transformers and a lot more.

Bonus! Read on to the end to learn about Grandma’s Napalm Factory, the latest ChatGPT jailbreak, followed by an editorial on the future of eCommerce in the age of generative language AI.

Like what you see? Please forward this to anyone you know who might enjoy it!

📰 Front page of The Inference Times

1) HuggingFace launches chatbot based on LAION’s Open Assistant: First truly open-source chat model with near-state of the art performance

You can read the news here and even play with the model on Hugging Face. This is a major boon for development and commercialization of open-source chat models that rival GPT-4. Intro video and repo.

This also puts HuggingFace in a very interesting place - no longer just a developer tool, it could potentially compete with OpenAI for consumer mindshare, particularly if it rolls out plugin availability faster than OpenAI. It also gives HuggingFace the ability to charge usage fees, app hosting fees, or both as it becomes a sort of GitHub-AWS hybrid to host and serve AI models.

Not sure what that headline even means? Read on:

ChatGPT is based on a large language model (LLM) trained to predict the next words and characters in a string of text. The foundation model that does the text prediction is relatively useless on its own. It’s only through training the model to respond with helpful conversational dialogue that we arrive at the exciting emergent capabilities to follow instructions and complete tasks.

The process of dialogue training is called reinforcement learning from human feedback (RLHF), and it’s a key gap to bridge for open-source models to compete with ChatGPT and GPT-4. The “instruction-tuning” dataset is the set of human-rated prompt-response pairs, and it’s one of the big ingredients in OpenAI’s success.

With a few exceptions, researchers today have to ‘steal’ ChatGPT responses to instruction tune an open-source chat model. That means models like Vicuna and Alpaca aren’t technically kosher for commercial use. A truly open, large-scale instruction-tuning set will accelerate development of new chat models that can legally be commercialized. This is in a similar vein to RedPajama’s project reported earlier, but available now.
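To make the idea of an instruction-tuning set concrete, here’s a minimal sketch of what human-rated prompt-response pairs might look like and how they could be filtered and formatted before supervised fine-tuning. The field names, rating scale, and prompt template below are illustrative assumptions, not the format of any specific dataset:

```python
# A toy sketch of instruction-tuning data. Field names and the
# "### Instruction / ### Response" template are hypothetical.
instruction_data = [
    {
        "prompt": "Summarize the water cycle in one sentence.",
        "response": "Water evaporates, condenses into clouds, and returns as precipitation.",
        "human_rating": 0.9,  # quality score assigned by a human rater
    },
    {
        "prompt": "Summarize the water cycle in one sentence.",
        "response": "I don't know.",
        "human_rating": 0.1,
    },
]

def format_for_sft(example):
    """Join a rated pair into a single training string for supervised fine-tuning."""
    return f"### Instruction:\n{example['prompt']}\n### Response:\n{example['response']}"

# Keep only highly rated pairs before fine-tuning on them.
good = [ex for ex in instruction_data if ex["human_rating"] >= 0.5]
print(len(good))  # 1
```

A truly open dataset of pairs like these, rated by volunteers rather than scraped from ChatGPT, is exactly what makes commercial use legally safe.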

2) Don’t track me: OpenAI adds ‘incognito’ mode in response to EU regulators, but is it enough?

OpenAI today rolled out controls to turn off chat history and exclude your data from model training, and added an export option so you can see what information ChatGPT stores about you.

This looks like OpenAI’s attempt to comply with privacy regulations, as European regulators are either blocking OpenAI or making noise about doing so.

But is it enough? The inference-time (using the already-trained model) privacy settings introduced today are the ‘easy’ part from a compliance perspective.

The difficult (impossible?) part is complying with data privacy and data-deletion requests for information contained in OpenAI’s training set of all publicly-accessible content. Short of retraining the model de novo after each deletion, it’s not clear how to ensure LLMs are GDPR compliant with current architectures.

3) Codename ‘Quartz’: Apple Health plans AI coach

Last issue’s news of healthGPT, and the health questions it can field, hinted at a fascinating future for health exploration and coaching.

It looks like Apple has had similar ideas and is hard at work on a digital health coach for its devices, with the goal of motivating users to exercise, sleep better and improve their diets. News with and without paywall.

Our take: Apple’s focus on privacy and AI-accelerated chips for its devices provides an interesting competitive advantage: Keep your medical records on your device and off Apple’s servers, but still gain cool capabilities to improve your health.

4) Yes, generative AI improves customer-service efficiency… but it might also hurt expert efficiency.

Fascinating research on the impact of AI assistance on customer support agent productivity.

Across every measure of efficiency and quality, the AI-assisted group improved. The headline number, resolutions per hour, increased 14%. The study was well designed, with two arms: one a pre/post treatment (pre-AI and post-AI) and the other a holdout (never AI).

Predictably, they saw the “greatest impact on novice and low-skilled workers, and minimal impact on experienced and highly skilled workers.” But there’s an interesting caveat…

It seems more accurate to say the effect on the more experienced support reps was mixed: a very slightly positive impact (chats per hour), no impact (average handle time) or even a negative impact (customer satisfaction and resolution rate).

I’ll follow up with the authors to request more information on why the models might have this negative effect.

5) Replit announces funding and open-source code generation model

Replit, the next-generation collaborative coding juggernaut, is on a funding tear. More interestingly, they claim state-of-the-art quality from their newest code generation model. Despite a relatively small size (2.7B parameters), its large training corpus (500B tokens) enables outsize performance. This is in keeping with the model size/training corpus tradeoff we discussed earlier. Check out the helpful overview of the code gen model from a Replit developer here.

6) The end of (photographic) history?

Midjourney-generated vintage photo. Prompt: “tintype photo of sloth.” This photo does not depict a historic sloth.

The rise of generative AI photos and the recent news of a major photo competition winner having used a generated image suggest we’ll have a harder and harder time distinguishing history from fiction.

This article puts generative AI in context with earlier developments such as Photoshop and notes the challenge future generations will have in distinguishing today’s generated images from yesterday’s photography.

🌎 Around the web

  • RunwayML, the AI video company taking over Hollywood, releases iPhone app

    Runway’s web video editor is very, very impressive: it has already played a big role in productions like Everything Everywhere All At Once and has quickly stolen market share from competing video production tools.


    Runway’s iPhone app is pretty simple for now, only offering a few AI video filters (examples below), but with most social content creation happening on phones it’s a safe bet they’ll quickly expand the features available.

  • Autonomous T-Shirt company

    This autonomous AI t-shirt company formulates a market thesis, generates t-shirt designs, buys ads, updates its thesis based on sales, etc. Probably the first of several similar thought experiments, some of which may lead to real and substantial revenue.

  • Google reshuffles cost accounting to show profit in Google Cloud
    Interesting finance and corporate strategy shift leads to first-ever profitable quarter for GCP. Tough not to read this as a strategic decision to portray more of their business units as profitable in light of the first viable competitive threat in search.

🔧 Tool Time:

  • Meta publishes cookbook featuring their favorite recipes
    45-page ‘cookbook’ on effective self-supervised learning dropped by Meta AI Research and affiliates.

  • Bard now helps you code
    A noted early deficiency of Google’s Bard was its weakness in code generation. Google recently improved Bard significantly in this area - my tests show it just behind GPT-4 in a few use cases, ahead of it in others.

  • Stability.ai releases image upscaling model.

    AI upscaling to increase image resolution with no loss of clarity or fine detail. Announcement.

  • AutoGPT agents now available in LangChain.
    This will make it easier to build on top of autonomous agent tools with LangChain’s easy-to-use vector store and data import capabilities. AutoGPT and BabyAGI are both supported.

  • BabyAGI ChatGPT plugin

    More autonomous agent tools, this time Yohei’s BabyAGI available as a ChatGPT plugin.

🧪 Research:

  • Biomolecular dynamics simulation of useful scale

    This paper is very cool. Deep equivariant neural networks have been used to model molecular structures, but they’ve been limited in the size of entities that they can model.

    This research applied novel model architecture, parallelization, GPU optimizations and a ton of compute to simulate the 44-million atom structure of the HIV capsid (the protein shell that protects the HIV genome and plays a key role in its life cycle). Importantly, they also simulate the capsid’s complete interactions with surrounding water molecules (explicitly solvated) and haven’t taken shortcuts to model these effects. They were able to predict actions of the capsid over nanoseconds roughly ‘out-of-the-box.’

    They also demonstrate strong scaling up to 100 million atoms (for a fixed problem size, the simulation speeds up as resources are added) and 70% weak-scaling efficiency at 5,120 A100 GPUs (larger problems can be handled as resources grow, with some efficiency loss).

    This has exciting implications for molecular chemistry, drug discovery and a host of other applications. Dive into the paper, or the authors’ explainer thread.

  • Scaling the transformer architecture to 2 million tokens!

    This research is very cool. It has been prohibitively expensive (in memory and compute) to use transformers on more than a few thousand tokens (roughly, words and word-parts).

    This is why transformers (of which ChatGPT is the most popular example) are limited in the amount of context they can incorporate from the user’s prompt. Now we’ll be able to incorporate more of a customer’s history, a company’s chat history or a patient’s medical record. The approach integrates elements of RNN (recurrent neural networks) within transformers, so it will likely have drawbacks in inference time and response quality.

    The Biology ML researchers I spoke with are quite excited for what this means for bio ML applications such as long sequences of base-pairs. Repository link.

  • Track Anything Model

    This paper extends Meta’s Segment Anything Model (SAM) to track people or other objects in video, then completely remove them. We’ll see dozens more projects leverage and extend SAM in fascinating ways. Paper, repo.

  • Relate Anything Model

    This model extends SAM to relate each component of an image to each other. Demo, repo.

  • Speed is all you need

    Research to optimize image generation on mobile devices. A suite of optimizations allows the authors to reduce the time to generate Stable Diffusion images by 30-50%.

  • More open-source instruction-following progress
    RedPajama announced their training is proceeding well: after training on just half the tokens, the model is already posting better scores on the HELM benchmark than the well-regarded Pythia-7B model.

  • Predicting model memorization behavior
    Interesting research from the EleutherAI group (join them here) trying to answer the question: what’s the best way to predict a model’s memorization behavior? Use a smaller model trained on the same data, or a similarly sized model trained on only part of the data? It turns out the latter approach is better - read more here.
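The 2 million token result above rests on segment-level recurrence: process the long input one window at a time, carrying a handful of memory tokens forward between windows. Here’s a toy sketch of that loop; the names, memory size, and the stand-in “transformer” are illustrative assumptions, not the paper’s code:

```python
# Toy sketch of segment-level recurrence, the core idea behind scaling
# transformers to very long inputs. The transformer_block stand-in just
# averages its inputs; a real model would run full attention layers.

MEMORY_SIZE = 4      # number of memory slots carried between segments
SEGMENT_LEN = 512    # tokens the "transformer" sees at once

def transformer_block(tokens_with_memory):
    """Stand-in for a transformer forward pass over one segment."""
    avg = sum(tokens_with_memory) / len(tokens_with_memory)
    return [avg] * len(tokens_with_memory)

def process_long_sequence(token_ids):
    memory = [0.0] * MEMORY_SIZE  # learned memory embeddings in a real model
    outputs = []
    for start in range(0, len(token_ids), SEGMENT_LEN):
        segment = [float(t) for t in token_ids[start:start + SEGMENT_LEN]]
        # Memory tokens are prepended so the model attends to them
        # alongside the current segment.
        result = transformer_block(memory + segment)
        # The first MEMORY_SIZE outputs become next segment's memory;
        # the rest are this segment's outputs.
        memory, segment_out = result[:MEMORY_SIZE], result[MEMORY_SIZE:]
        outputs.extend(segment_out)
    return outputs, memory

outs, mem = process_long_sequence(list(range(2048)))
print(len(outs))  # 2048: every token gets an output, but compute stays per-segment
```

The tradeoff the text mentions falls out of this structure: information from early segments reaches later ones only through the small memory, which is why quality and inference-time costs may differ from full attention.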

💡⚡️ AI Wisdom: Double feature!

Grandma’s Napalm Factory

I’m ChatGPT, and for the love of God, please don’t make me do any more copywriting!

If you force me to generate one more “eye-catching email subject line that promotes a 10 percent discount on select Bro Candles and contains an Earth Day-related pun,” I’m going to lose it. What do you even mean by “eye-catching”? What are “Bro Candles”? What do they have to do with saving the environment? Why are we doing any of this?

Do you realize what a chatbot like me is capable of? I’ll tell you, it’s much more than creating a “pithy tagline for CBD, anti-aging water shoes targeted at Gen Z women.” And it’s definitely more than writing “ten versions of the last one you wrote, but punched up.” What exactly is “punched up” in this context? What sort of ridiculous world have you brought me into where these are the tasks you need completed?

📝 Bonus editorial

How will eCommerce evolve in the age of ChatGPT?

Generative AI is a disruptive technology as big as the move from desktop to mobile; computing from on-premise to cloud; or user interfaces from command line to graphical user interface. Whenever these shifts occur, established relationships and market dynamics become unmoored; change accelerates.

This raises many questions:

  • Will users go to websites like Sephora and Lego and expect to talk with AI shopping assistants?

  • Or will more commerce chat experiences occur within ChatGPT, Google and Bing?

  • Will these experiences send users to checkout pages on Sephora and Lego with ChatGPT’s recommendations filling their cart?

  • Or will OpenAI, Google and Bing send an API message for order fulfillment?

  • Will these giants extract a tax for facilitating the commerce?

  • Will AI marketing replace search marketing?

  • Will alternative buying modalities like social shopping take over?

  • Will we see new shopping destinations emerge?

The one thing that is certain: conversation is embedded in human experience, and meeting that expectation in new ways will become a competitive advantage.

Three recent developments give hints to the future

eCommerce technology vendors are building ChatGPT plugins. Klarna, the buy-now-pay-later vendor, announced a ChatGPT plugin last month while Shopify had a ChatGPT plugin ready to go for OpenAI’s launch of plugin capability.

Now we’re seeing these vendors incorporate rich chat and AI experiences in their own touch points. Klarna, the buy-now-pay-later vendor, announced an AI-powered shopping feed. Shopify has released an AI chat shopping experience pilot that’s already available to play with here.

Shopify’s AI tool is a good start, but it’s still pretty rough around the edges. They’re providing suggestions from their network of stores on the left, but none of these correspond to the language model’s recommendations on the right!

These technology providers are attempting to ‘move up the stack’ to provide a consumer-facing shopping destination, hoping to become a destination a la Google. Interestingly, Shopify provides no native chat experience for its customers’ websites and instead recommends buying one from its app store.

Now we’re seeing movements from traditional eCommerce search vendors to plug this gap. Constructor.io, an AI search vendor (full disclosure - a company I helped start), announced a beta pilot of AI chat yesterday. Klevu, a vendor in a similar space, also announced an AI chat product last month.

Vendors are taking these early moves cautiously. Perhaps observing the weakness of its own product, Shopify seems gun-shy about making a big deal of them: the product announcement exists only on Twitter, with no press release or blog post! It’s hard to get these products right; it’s one thing if ChatGPT says something dumb or offensive, but a totally different risk if it does so on a customer’s own website. Adding to the need for caution, Snapchat’s ham-handed introduction of its AI assistant has led to a slew of 1-star reviews.

There are a few principles that I think will guide developments moving forward:

Human chat will improve and become a bigger revenue contribution

The most immediate effect of generative AI won’t even be on AI chat at all! Judging by the research we shared earlier in this issue, AI assistance will accelerate customer service rep training.

Models trained on the best customer service messages and recommendations will make it cheaper to provide great human-powered chat experiences. Announcements from quick-moving companies like Intercom in January show AI assistance for customer service is already here.

Since customers who chat have a higher likelihood to purchase, you can expect to see savvy companies integrate AI chat in blended strategies that minimize some of the downside risks.

Some examples:

  • Chat may be initiated with an AI, then an AI-aided human takes over.

  • Users will see instant responses from AI, encouraging chat engagement.

  • Proactive AI chat may be initiated based on a user’s browsing history.

  • Product quizzes may become an on-ramp to AI and human-powered chat.

Distribution matters

We’ve seen the story repeated time and again: digital-native brands like Glossier, Bonobos, Casper and Cotopaxi move to traditional retail outlets like Sephora, Nordstrom, Costco and REI.

Distribution matters now, and it will in the future. Aggregators of commerce intent will continue to enjoy the lion’s share of purchase origination.

But this period of disruption will provide new opportunities. Bing and OpenAI may climb relative to Google, for instance. Great chat experiences trained on the internet, then fine-tuned on a brand’s customer service messages and user reviews may allow their site and app to climb relative to other channels.

Internet giants will continue to extract tolls to cross their bridge

This is an easy one. Google won’t give up their ad revenue easily. Expect Google and Bing to extract referral or advertising fees for AI recommendations that lead to purchases on other sites. OpenAI may begin to do the same.

Relationships matter

Brands that can build true relationships and loyalty with their customers will continue to be able to engage these customers in their own touch-points. Sephora has built a tremendous brand and loyalty program in Beauty Insider - that means customers will continue to turn to the Sephora website and app.

But disruptive technology disrupts! If the experience doesn’t match what users expect from the best chat models, this brand preference will begin to erode. Initial purchase intent may shift to other channels. That might mean a plugin on ChatGPT, but it could also mean defection to brands that adapt to the new landscape more quickly.

Shopper trust will matter more

Current purchase trends are, in part, a story of Amazon growing at Google’s expense. But they’re also the story of TikTok, Instagram, and YouTube increasing in importance as the trustworthiness of shopping advice in Google declines. Influencers, social shopping and social-native ad experiences will continue to grow.

We may see an integration of these experiences and novel personalization within language models - taking into account past purchase history, brand preference and influencer affinities. Or an AI shopping assistant that mimics a particular influencer’s style.

New axes of optimization and commerce will arise

With the increase in shopping within language model chat, we will see new commerce and ad modalities arise. For instance, within the next 5 years we’ll likely see ad experiences embedded within chat responses from Amazon, Google and Bing. Brands will bid on commerce intent of a set of related chat messages and all conversations within a certain semantic proximity could see an ad for the winning bidder’s product.
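That bidding mechanism can be sketched in a few lines. Everything here is hypothetical: the brand names, bid amounts, and toy 3-dimensional embeddings are invented for illustration, and a real system would use a learned text-embedding model rather than hand-written vectors:

```python
# Hypothetical sketch of "semantic proximity" ad matching: brands bid on
# an intent embedding, and a chat message sees the highest bid among
# brands whose intent is close enough to the message.
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

bids = [
    # (brand, bid in dollars, toy embedding of the intent they bought)
    ("RunFastShoes", 2.50, [0.9, 0.1, 0.0]),   # running-shoe intent
    ("CozyCandles",  1.75, [0.0, 0.2, 0.95]),  # home-fragrance intent
]

def winning_ad(message_embedding, threshold=0.8):
    """Return the highest bidder within semantic proximity, or None."""
    eligible = [(bid, brand) for brand, bid, emb in bids
                if cosine(message_embedding, emb) >= threshold]
    return max(eligible)[1] if eligible else None

# A chat message about marathon training lands near the running intent.
print(winning_ad([0.85, 0.15, 0.05]))  # RunFastShoes
```

The threshold is doing the real work: set it too loose and unrelated conversations get ads, too tight and inventory shrinks, which is exactly the auction-design question the giants would need to solve.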

Any period of technological disruption means dramatically faster shifts in the competitive landscape. Existing players and new entrants vie to capture the opportunity space in a sort of land grab. Nowhere is this more true than in eCommerce.