The newsletter for the technically curious. Updates, tool reviews, and lay of the land from an exited founder turned investor and forever tinkerer.
Hey folks,
I was up late last night in SF at the meet-up I hosted, and it was so awesome to meet and speak to a bunch of you. I'm saving my full SF thoughts for a post next week.
In the meantime, Google released a preview of its first computer-use model [ link ] based on Gemini 2.5, in partnership with Browserbase [ link ]. It's a good model: it scores decently better than Sonnet 4.5 and much better than OpenAI's computer use model on benchmarks [ link ].
But benchmarks and evaluations can be misleading, especially if you only go by the official announcement posts. This one is a good example to dig into:
This is a model optimised for browser usage, so it's not surprising that it does better than the base version of Sonnet 4.5.
OpenAI's computer use model used in this comparison is 7 months old, a version based on 4o. (Side note: I had high expectations for a new computer use model at Dev Day.)
The product experience of the model matters. ChatGPT Agent, even with a worse model, feels better because it's a good product combining a computer-using model, a browser and a terminal.
I don't mean to say that companies do it out of malice. Finding the latest scores and implementation of a benchmark is hard, and you don't want to be too nuanced in a marketing post about your launch. But we, as users, need to understand the model cycle and the taste of the dessert being sold to us.
Even with all these factors, the new Gemini model definitely passes the smoke test.
The smoke test is just one way we decide what makes it into every newsletter post and what doesn't. Shanice wrote about it in detail, and how it has evolved in the last three years, in this post: A day in the life of Ben's Bites [ link ].
With Retool [ link ], you can turn prompts into full-stack internal tools, connected to your data, hosted in your cloud, and secured by your rules. Build easier, deploy safely, and move from idea to production in minutes.*
What I'm consuming
Taste is your moat [ link ] with Dylan Field, Figma.
Agents of Scale [ link ] - new podcast by my old boss at Zapier.
A cartoonist's review of AI art [ link ] by The Oatmeal.
Vibe engineering [ link ] and embracing the parallel coding agent lifestyle [ link ].
Sora, AI Bicycles, and Meta Disruption [ link ].
Two things LLM coding agents are still bad at [ link ]: copy-pasting code and asking questions. This is actually a good critique, especially the first one. I wonder if you could create an app/service for refactoring codebases by teaching an open-source model (like Kimi K2 or GLM 4.6) to use "cut/copy/paste" as tools.
The State of AI report 2025 [ link ] - this one will take me a few days to go through.
⚙️ Tools and demos
Scout Monitoring's MCP [ link ] [ link ] - AI-native monitoring. It feeds performance issues and slow endpoints directly into your AI coding assistant.*
Google AI Studio [ link ] now lets you use your voice as an input for vibe coding.
ElevenLabs [ link ] launched Agent Workflows and an open source UI library [ link ] for building voice agents.
Grok Imagine [ link ] now uses xAI's Imagine 0.9 model with audio generation.
Opal [ link ], Google's experimental product for chaining AI steps together with a visual builder, is now open globally (with MCP support coming soon).
🥣 Dev dish
Playwright [ link ], the browser automation library, has agents now. One to plan tests, another to generate them and a healer to debug and fix failing tests.
Recall [ link ] - Redis-powered persistent memory for Claude (usable as an MCP server).
sora-mcp [ link ] - An MCP server to use Sora video generation APIs.
You can use any open-source model in Factory AI's Droid [ link ].
Repobench [ link ] - Ranking models for large context reasoning, file editing precision, and instruction adherence for coding tasks. I met Eric yesterday and chatted about how he built this [ link ].
🍦 Afters
Agentic Memory & Context Engineering Hackathon, Oct. 11 [ link ] - Push the boundaries of what you can build with MongoDB's Atlas Vector Search and Voyage AI.*
Both OpenAI [ link ] and Google [ link ] have expanded their $5/mo AI plan to multiple countries.
n8n has raised $180M [ link ] at a $2.5B valuation, and Cursor is planning [ link ] another round at a $30B valuation.
Enjoy this newsletter? Forward it to a friend.
That's it for today. Feel free to comment and share your thoughts.
Find me on X [ link ], LinkedIn [ link ], or Instagram [ link ]
Read about me [ link ] and Ben's Bites
📷 thumbnail creds: @keshavatearth [ link ]
* marks sponsors that make this newsletter possible :)
Wanna partner with us [ link ]? Last few slots left for the rest of the year.