The newsletter for the technically curious. Updates, tool reviews, and lay of the land from an exited founder turned investor and forever tinkerer.

Hey folks,

I was up late last night in SF at the meet-up I hosted, and it was so awesome to meet and speak to a bunch of you. I’m saving my full SF thoughts for a post next week.

In the meantime, Google released a preview of its first computer-use model [ link ] based on Gemini 2.5, in partnership with Browserbase [ link ]. It’s a good model—it scores decently better than Sonnet 4.5 and much better than OpenAI’s computer use model on benchmarks [ link ].

But benchmarks and evaluations can be misleading, especially if you only go by the official announcement posts. This one is a good example to dig into:

This is a model optimised for browser usage, so it’s not surprising that it does better than the base version of Sonnet 4.5.

OpenAI’s computer use model used in this comparison is 7 months old—a version based on 4o. (side note: I had high expectations for a new computer use model at Dev Day)

The product experience of the model matters. ChatGPT Agent, even with a worse model, feels better because it’s a good product combining a computer-using model, a browser and a terminal.

I don’t mean to say that companies do it out of malice. Finding the latest scores and implementation of a benchmark is hard, and you don’t want to be too nuanced in a marketing post about your launch. But we, as users, need to understand the model cycle and the taste of the dessert being sold to us.

Even with all these factors, the new Gemini model definitely passes the smoke test.

The smoke test is just one way we decide what makes it into every newsletter post and what doesn’t. Shanice wrote about it in detail, and how it has evolved in the last three years, in this post: A day in the life of Ben’s Bites [ link ].

With Retool [ link ], you can turn prompts into full-stack internal tools—connected to your data, hosted in your cloud, and secured by your rules. Build easier, deploy safely, and move from idea to production in minutes.*

🌐 What I’m consuming

Taste is your moat [ link ] with Dylan Field, Figma.

Agents of Scale [ link ] - new podcast by my old boss at Zapier.

A cartoonist’s review of AI art [ link ] by The Oatmeal.

Vibe engineering [ link ] and embracing the parallel coding agent lifestyle [ link ].

Sora, AI Bicycles, and Meta Disruption [ link ].

Two things LLM coding agents are still bad at [ link ]: copy-pasting code and asking questions. This is actually a good critique—especially the first one. I wonder if you could create an app/service for refactoring codebases by teaching an open-source model (like Kimi K2 or GLM 4.6) to use “cut/copy/paste” as tools.

The State of AI report 2025 [ link ] — this one will take me a few days to go through.

⚙️ Tools and demos

Scout Monitoring’s MCP [ link ] [ link ] - AI-native monitoring. It feeds performance issues and slow endpoints directly into your AI coding assistant.*

Google AI Studio [ link ] now lets you use your voice as an input for vibe coding.

ElevenLabs [ link ] launched Agent Workflows and an open source UI library [ link ] for building voice agents.

Grok Imagine [ link ] now uses xAI’s Imagine 0.9 model with audio generation.

Opal [ link ], Google’s experimental product for chaining AI steps together with a visual builder, is now open globally (with MCP support coming soon).

🥣 Dev dish

Playwright [ link ], the browser automation library, has agents now. One to plan tests, another to generate them and a healer to debug and fix failing tests.

Recall [ link ] - Redis-powered persistent memory for Claude (usable as an MCP server).

sora-mcp [ link ] - An MCP server to use Sora video generation APIs.

You can use any open-source model in Factory AI’s Droid [ link ].

Repobench [ link ] - Ranking models for large context reasoning, file editing precision, and instruction adherence for coding tasks. I met Eric yesterday and chatted about how he built this [ link ].

🍦 Afters

Agentic Memory & Context Engineering Hackathon, Oct. 11 [ link ] — Push the boundaries of what you can build with MongoDB’s Atlas Vector Search and Voyage AI.*

Both OpenAI [ link ] and Google [ link ] have expanded their $5/mo AI plans to multiple countries.

n8n has raised $180M [ link ] at a $2.5B valuation, and Cursor is planning [ link ] another round at a $30B valuation.

Enjoy this newsletter? Forward it to a friend.

That’s it for today. Feel free to comment and share your thoughts. 👋

Find me on X [ link ], LinkedIn [ link ], or Instagram [ link ]

Read about me [ link ] and Ben’s Bites

📷 thumbnail creds: @keshavatearth [ link ]

* marks sponsors that make this newsletter possible :)

Wanna partner with us [ link ]? Last few slots left for the rest of the year.


How to see through AI marketing
