Breaking Down Token-Based Pricing for Generative AI, Large Language Models (LLMs)

In episode 119 of the AI/Hyperautomation Minute, Toni Witt addresses confusion around token-based pricing and base-level model providers.

This episode is sponsored by “Selling to the New Executive Buying Committee,” an Acceleration Economy Course designed to help vendors, partners, and buyers understand the shifting sands of how mid-market and enterprise CXOs are making purchase decisions to modernize technology.

Highlights

00:33 — There has been some confusion about the token-based pricing schemes of language models and generative AI models. In a recent analysis, Toni evaluated providers based on compute and token costs.

00:55 — Base-level models, meaning large language models (LLMs) and generative AI models like GPT-4 or DALL-E 2, are priced by computational consumption. Toni notes, “The biggest difference, however, is the unit of measurement.”

01:12 — Language models are priced by the token, which is the basic unit of text or code that the LLM uses to process language. What a token actually looks like depends on your tokenization scheme, which is “a fancy algorithm that turns your natural language . . . into these tokens.”
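
To make the idea of a tokenization scheme concrete, here is a minimal sketch of a toy tokenizer. It is not the algorithm any real provider uses (production models rely on learned subword vocabularies); the function name `toy_tokenize` and the fixed four-character piece size are assumptions chosen purely for illustration. It only shows that tokens are often pieces of words rather than whole words.

```python
import re


def toy_tokenize(text: str, max_piece: int = 4) -> list[str]:
    """Toy scheme (hypothetical): split text into words and punctuation,
    then cut each word into pieces of at most `max_piece` characters."""
    tokens = []
    for chunk in re.findall(r"\w+|[^\w\s]", text):
        for i in range(0, len(chunk), max_piece):
            tokens.append(chunk[i:i + max_piece])
    return tokens


sentence = "Tokenization turns text into tokens."
pieces = toy_tokenize(sentence)
print(pieces)
print(f"{len(sentence.split())} words -> {len(pieces)} tokens")
```

Even in this toy scheme, a five-word sentence produces more tokens than words, which is why token counts, not word counts, drive the cost.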

01:45 — One thousand tokens equate to around 750 words in English. Most LLM providers will charge by the token count of the prompt in addition to the completion, or the output. So, the total cost of using the LLM will depend on how long the prompt is and how long the output is.
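
The arithmetic can be sketched in a few lines. This is a rough estimate under stated assumptions: the 1,000-tokens-per-750-words ratio is the rule of thumb from the episode, while the function names and the per-1,000-token prices are hypothetical placeholders, not any provider's actual rates.

```python
def estimate_tokens(word_count: int) -> int:
    """Estimate tokens from an English word count (1,000 tokens per 750 words)."""
    return round(word_count * 1000 / 750)


def request_cost(prompt_tokens: int, completion_tokens: int,
                 prompt_price_per_1k: float, completion_price_per_1k: float) -> float:
    """Total cost of one request: prompt cost plus completion cost."""
    prompt_cost = prompt_tokens / 1000 * prompt_price_per_1k
    completion_cost = completion_tokens / 1000 * completion_price_per_1k
    return prompt_cost + completion_cost


# Hypothetical prices in dollars per 1,000 tokens (placeholders, not real rates).
PROMPT_PRICE = 0.01
COMPLETION_PRICE = 0.03

prompt_tokens = estimate_tokens(750)      # a 750-word prompt, about 1,000 tokens
completion_tokens = estimate_tokens(375)  # a 375-word answer, about 500 tokens
cost = request_cost(prompt_tokens, completion_tokens, PROMPT_PRICE, COMPLETION_PRICE)
print(f"Estimated cost per request: ${cost:.4f}")
```

Because both terms scale with length, trimming either the prompt or the requested output lowers the cost of every call.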

02:08 — For example, Anthropic has two models: Claude Instant and Claude-v1. Because Claude-v1 is the higher-performance model, Anthropic charges more for it than for Claude Instant.

02:43 — There’s a dual pricing factor when it comes to the prompt and the completion “because computation is required to turn your natural language into the vector format of the tokens that the model can actually read,” Toni explains. “Your bill is always going to include these two costs.”

03:08 — The price difference between models, especially language models, is significant. It’s important to spend time with the models in a testing environment to ensure you’re making the right decision. “If you choose one model up in terms of performance, that can easily be 10x your cost.”
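
A quick projection shows how a tenfold price gap plays out at volume. This is a sketch under assumptions: the model names and prices below are hypothetical, and a single blended price per 1,000 tokens is used for simplicity instead of separate prompt and completion rates.

```python
def monthly_cost(requests_per_month: int, prompt_tokens: int,
                 completion_tokens: int, price_per_1k: float) -> float:
    """Projected monthly spend for a fixed request shape and a blended token price."""
    total_tokens = requests_per_month * (prompt_tokens + completion_tokens)
    return total_tokens / 1000 * price_per_1k


# Hypothetical blended prices in dollars per 1,000 tokens; the larger model is 10x.
HYPOTHETICAL_PRICES = {"smaller-model": 0.002, "larger-model": 0.02}

for model, price in HYPOTHETICAL_PRICES.items():
    cost = monthly_cost(requests_per_month=100_000, prompt_tokens=500,
                        completion_tokens=250, price_per_1k=price)
    print(f"{model}: ${cost:,.2f} per month")
```

With these placeholder numbers, the same workload costs about $150 on the smaller model and about $1,500 on the larger one, which is why testing whether the cheaper model is good enough pays off.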

03:36 — Context is the maximum length of your prompt, measured in tokens. Toni uses the two versions of GPT-4 to demonstrate this.
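
A context limit can be checked before a request is sent. The sketch below is illustrative: the limits of 8,192 and 32,768 tokens are the commonly cited sizes of the two GPT-4 versions at the time and should be treated as example values, the dictionary keys are hypothetical labels, and the optional `reserved_for_completion` parameter assumes a provider that counts the output against the same limit.

```python
# Example context limits in tokens (illustrative values, hypothetical labels).
CONTEXT_LIMITS = {"gpt-4-8k": 8_192, "gpt-4-32k": 32_768}


def fits_in_context(prompt_tokens: int, context_limit: int,
                    reserved_for_completion: int = 0) -> bool:
    """Return True if the prompt (plus any reserved output room) fits the limit."""
    return prompt_tokens + reserved_for_completion <= context_limit


prompt_tokens = 10_000  # for example, a long document pasted into the prompt
for name, limit in CONTEXT_LIMITS.items():
    print(name, fits_in_context(prompt_tokens, limit, reserved_for_completion=500))
```

Here the 10,000-token prompt only fits the larger context version, so the use case itself can force the choice of a more expensive model.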

04:18 — The main way to minimize cost is to spend time selecting the right model, which means determining the lowest performance level that is acceptable for your use case. After that, you can use cost management tools, such as token tracking software, and shorten your prompts.
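
Token tracking does not need to be elaborate to be useful. The following is a minimal sketch of the idea, not a description of any specific product: the `TokenTracker` class, its method names, and the feature labels are all hypothetical, and the token counts would in practice come from the usage figures your provider reports for each request.

```python
from collections import defaultdict


class TokenTracker:
    """Hypothetical in-memory tracker of token usage per application feature."""

    def __init__(self) -> None:
        self.usage = defaultdict(lambda: {"prompt": 0, "completion": 0})

    def record(self, feature: str, prompt_tokens: int, completion_tokens: int) -> None:
        """Add one request's token counts to the running totals for a feature."""
        self.usage[feature]["prompt"] += prompt_tokens
        self.usage[feature]["completion"] += completion_tokens

    def total_tokens(self, feature: str) -> int:
        """Prompt plus completion tokens recorded for a feature."""
        counts = self.usage[feature]
        return counts["prompt"] + counts["completion"]

    def heaviest_feature(self) -> str:
        """Feature with the highest total token consumption so far."""
        return max(self.usage, key=self.total_tokens)


tracker = TokenTracker()
tracker.record("support-chat", prompt_tokens=1_200, completion_tokens=300)
tracker.record("support-chat", prompt_tokens=900, completion_tokens=250)
tracker.record("summaries", prompt_tokens=400, completion_tokens=100)
print(tracker.heaviest_feature(), tracker.total_tokens("support-chat"))
```

Seeing which feature consumes the most tokens points to where shorter prompts would save the most.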

04:53 — It’s important to find the right provider for your business. Anthropic emphasizes AI safety research and responsible AI. Cohere has partnered with Oracle to deliver enterprise-grade security and flexibility. OpenAI has top-line models, like GPT-4, but in Toni’s view it’s not as concerned with data privacy.

For more insights, visit the AI ecosystem channel.