This Model Beats GPT and Gemini

Hey there,

While everyone's watching tech giants fight over benchmarks, something else is happening:

The alternatives are winning.

Not in headlines. In actual capability. A Chinese model just outperformed GPT-4o at reasoning. Meta released speech recognition for languages OpenAI doesn't even touch. A vision model is reading documents and images better than anything Google has built.

And none of them are asking for your data.

So, let's talk about what's actually happening.

Thinking Models Are Getting Better

Kimi K2 Thinking, a model from Chinese startup Moonshot AI, reportedly cost $4.6 million to train and now outperforms models that cost 300+ times more to build. Not just competing. Outperforming.

Most language models predict the next word. They're sophisticated autocomplete engines. Kimi K2 actually reasons through problems step by step before responding.

The difference shows up in benchmarks. K2 beats several proprietary models at reasoning and coding tasks, the kinds of problems that require actual logic, not just pattern matching.

The team claims it competes with models like GPT-4o. That matters because K2 is designed for autonomous work: content workflows that need multiple steps, research synthesis that connects disparate sources, coding challenges that demand logical progression.

This isn't a demo. It's production-ready.

AI Can Finally Hear (Almost) Everyone

Meta recently dropped Omnilingual ASR, a speech recognition model that supports over 1,600 languages.

OpenAI's Whisper currently handles just 99 languages. That was already impressive. Meta's model supports 500 languages that have never been transcribed by AI before.

Think about what that means. There are roughly 7,000 languages spoken globally. Most AI tools work with maybe 1% of them. With more than 1,600 languages, Omnilingual ASR brings that number to roughly 23%.
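
If you want to check the math yourself, here's a quick back-of-the-envelope sketch in Python. It only uses the numbers quoted above. The function name is just illustrative, and 7,000 is a rough estimate of the world's languages, so treat the output as ballpark figures.

```python
# Figures quoted in the text. The 7,000 total is a rough estimate.
TOTAL_LANGUAGES = 7_000
WHISPER_LANGUAGES = 99
OMNILINGUAL_LANGUAGES = 1_600  # "over 1,600", so this is a lower bound


def coverage_percent(supported: int, total: int = TOTAL_LANGUAGES) -> float:
    """Share of the world's languages covered, as a percentage."""
    return 100 * supported / total


whisper = coverage_percent(WHISPER_LANGUAGES)
omnilingual = coverage_percent(OMNILINGUAL_LANGUAGES)

print(f"Whisper: {whisper:.1f}% of languages")
print(f"Omnilingual ASR: {omnilingual:.1f}% of languages")
```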

Billions of people suddenly have access to speech technology in their native language. Not "coming soon." Right now.

The model has 7 billion parameters, large enough to handle complexity, small enough to be practical. This isn't research. This is infrastructure.

Vision AI Beating Tech Giants

The progress isn't limited to speech. Vision is moving just as fast. Alibaba's Qwen3-VL is now the best OCR model available. It beats Gemini 2.5 Pro and GPT-4o, and it tops text recognition benchmarks.

But here's what makes it matter: it handles the messy stuff.

Blurred text. Tilted images. Complex layouts. Documents that would break other models work fine with Qwen3-VL.

If you're in legal tech, healthcare, logistics, or financial services, or anywhere else you process documents at scale, this changes what's possible. Going from 95% to 99% accuracy cuts the error rate from 5 in 100 to 1 in 100, and that is the difference between a useful tool and a reliable system.

You can finally automate tasks you've kept manual because OCR wasn't good enough. Invoice processing. Form digitization. Archival work. All of it.

The Content Authenticity Problem

Meanwhile, LinkedIn introduced automatic tagging for AI-generated images using the C2PA standard.

Sounds promising until you realize anyone can bypass it by uploading a screenshot of a tagged image.

The intent is good: transparency about AI content helps maintain trust. But the execution shows how hard this problem actually is.

Still, it's progress. As more platforms adopt C2PA, the workarounds become harder to execute and easier to spot. The conversation alone matters.

The Pattern

Put it all together and AI is moving past the "impressive demo" phase. These aren't research papers. They're production tools solving real problems.

Moonshot AI's Kimi K2 Thinking shows reasoning models are here. Meta's Omnilingual ASR shows speech recognition now works for far more than English speakers. Qwen3-VL shows vision AI is reliable enough to trust.

The split happening in AI right now isn't just about capability. It's about access.

On one side: Companies keeping models locked down, charging for every API call, forcing you to send your data through their servers.

On the other side: Open-source models, local inference, transparent architectures. Your data stays yours.

- Aashish
