At 15c/million tokens, will GPT 4o Mini be the foundation of Agentic Workflows?

Chrisjan Wust
2 min read · Jul 18, 2024

Just 2 hours ago, OpenAI announced GPT 4o Mini. A mouthful, yes. But its pricing is mouthwatering: 33x cheaper than 4o, with a performance difference that won't matter in most cases.

As the founder and CTO of an audio generation company, adorno.ai, I make it my job to understand AI's latest promises and actual potential. I will try my best not to sound like an OpenAI promoter here; I actually dislike the company's principles in general. But this price/performance ratio is bonkers.

Okay, so it’s cheap, but is it any good? The answer is, yes. Yes indeed.

  • Mini consistently beats other small LLMs (with a single shortcoming in MathVista) in reasoning tasks, math, coding and multimodal reasoning.
  • It often achieves scores close to regular 4o. See MATH, HumanEval and MMLU.

All of this while supposedly being close in size to LLaMa 8B. I doubt that actually means the model has anywhere close to as few as 8B parameters; more likely it leans on high-quality sparse parameters, quantization, pruning and that sexy new tokenizer from 4o.

So, with LLMs cheaper than ever and output quality this high, what are the implications for building an app in 2024?

I could list loads, but the most interesting one for me is agentic workflows. An agentic workflow doesn't expect an LLM to give you a final answer straight away. Instead, generating the answer becomes a process of small steps that are hard for an LLM to mess up.

Say you want some code written (see the sketch after this list):

  1. The first “agent” writes the code
  2. The agent is then asked to quality check and possibly improve or fix the code
  3. Another agent writes a unit test
  4. Yet another agent has access to a runtime environment and runs the unit test against the generated code
  5. If the code fails, we head back to step 2, but now with an error log
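
To make that concrete, here's a minimal Python sketch of how those five steps could be wired together with the OpenAI client and GPT 4o Mini. The prompts, helper names (`ask`, `generate_function`) and the three-round retry limit are my own illustration, not anything prescribed by OpenAI, and the "sandbox" is deliberately naive:

```python
# Sketch of the write -> review -> test -> run loop, every agent backed by gpt-4o-mini.
# Prompts and helpers are illustrative only; error handling and sandboxing are minimal.
import subprocess
import sys
import tempfile

from openai import OpenAI  # pip install openai

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def ask(system: str, user: str) -> str:
    """One 'agent' = one system prompt + one call to gpt-4o-mini."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    )
    return response.choices[0].message.content


def generate_function(task: str, max_rounds: int = 3) -> str:
    # 1. The first agent writes the code.
    code = ask("You write a single Python function. Reply with code only.", task)
    error_log = ""
    for _ in range(max_rounds):
        # 2. A second agent quality-checks and fixes the code
        #    (with the error log from step 5 on later rounds).
        code = ask(
            "Review and fix the Python code. Reply with the full corrected code only.",
            f"{code}\n\n# Errors from the last test run, if any:\n{error_log}",
        )
        # 3. Another agent writes a unit test as plain asserts.
        test = ask(
            "Write plain Python asserts that exercise the function below. Code only.",
            code,
        )
        # 4. Run code + test in a separate interpreter (a stand-in for a real sandbox).
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(code + "\n\n" + test)
            path = f.name
        result = subprocess.run([sys.executable, path], capture_output=True, text=True)
        if result.returncode == 0:
            return code  # the test passed
        error_log = result.stderr  # 5. back to step 2, now with an error log
    raise RuntimeError("Agents did not converge on passing code:\n" + error_log)


if __name__ == "__main__":
    print(generate_function("Write a function slugify(title) that lowercases and hyphenates a string."))
```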

There's been a huge rise in papers on LLM-based agents over the last two years, showing clear gains in output quality and in the complexity of tasks that can be handled. There are two obvious problems:

  • Latency: because agents talk to each other, usually sequentially, output generation takes longer.
  • Cost: many more tokens are spent feeding one agent's output into the next one's input, over and over (a rough estimate follows below).
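
On the cost side, a back-of-the-envelope estimate shows why Mini changes the picture. The 15c per million input tokens comes from the headline; the roughly 60c per million output tokens is my recollection of the launch price, so treat it as an assumption, and the per-step token counts are invented round numbers for a five-step run like the one above:

```python
# Back-of-the-envelope cost of one agentic run on gpt-4o-mini.
# Prices: 15c / 1M input tokens (from the headline) and ~60c / 1M output tokens
# (assumed launch price). The per-step token counts below are invented for illustration.
INPUT_PRICE = 0.15 / 1_000_000   # dollars per input token
OUTPUT_PRICE = 0.60 / 1_000_000  # dollars per output token

# (input_tokens, output_tokens) for each of the five steps in the loop above
steps = [
    (300, 600),   # 1. write the code
    (900, 600),   # 2. review / fix
    (600, 400),   # 3. write a unit test
    (0, 0),       # 4. run the test (no LLM call)
    (1500, 600),  # 5. retry with the error log
]

cost = sum(i * INPUT_PRICE + o * OUTPUT_PRICE for i, o in steps)
print(f"~${cost:.4f} per agentic run")  # ~$0.0018
```

Even with a few extra retries, a whole multi-agent run lands at fractions of a cent.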

But these are exactly the things GPT 4o Mini concerns itself with. It’s the cheapest LLM available to date, but punches way above its weight class. I’m actually incorporating it right now into some preprocessing workflows we have at adorno.ai.

Chrisjan Wust

Tech-optimist looking for a place where recent advancements change an industry. One day I’d like to create such a place myself. For now, I’m an ML Engineer.