The newsletter for the technically curious. Updates, tool reviews, and lay of the land from an exited founder turned investor and forever tinkerer.
Hey folks,
Claude Sonnet 4.5 is out [ link ] - the best model on programming benchmarks. It’s better than Opus 4.1 across the board and better than GPT-5-codex in most cases. But comparisons with other models don’t give you the full picture here. Sonnet 4.5 is in its own league for agentic coding, computer use, and long-running tasks. It’s got much better vision, and you’ll feel it’s much smarter across the board. And it’s much more aligned (wrt AI safety). Although I’ve watched livestreams where 4.5 didn’t do so well…
I’m loyal to no model, but if you want some stability with “the best AI you can get for $20/mo”, Sonnet 4.5 could hold that title for the next few months. (I’m sure Gemini 3 or OpenAI dev day will soon make me eat my words, haha - if you’re going to dev day, tweet me - I’ll be there!)
This update comes with [ link ] Claude Code v2, a new VS Code plugin for CC, a rebrand of the Claude Code SDK to the Claude Agent SDK [ link ], two new tools in the Claude API, and a research preview called “Imagine with Claude [ link ]”.
Sonnet 4.5 is available in every tool out there, including Factory’s Droid [ link ], Notion AI [ link ] and Figma Make [ link ].
I’d recommend the vibe check from Every [ link ] and Simon Willison [ link ] if you want to read more. Or this quick video of each version of Claude trying to make a clone of Claude.ai [ link ].
Mintlify has a new Agent [ link ] to help you keep your docs up to date with AI. You can share any context (code changes via PRs, Slack threads, links and writing guidelines) and it’ll draft a docs PR with changes.
Before the weekend hit, Meta and OpenAI both released a new content feed in their AI apps. Vibes [ link ] in Meta AI is a feed of AI-generated videos in partnership with Midjourney and Flux. The launch videos and the feed mostly have cute (sometimes absurd) animals dancing, but it’s not hard to imagine where it goes from here.
ChatGPT Pulse [ link ], otoh, is a daily personalised feed of new things that might be interesting to you. It’s proactive (ChatGPT messages you first), curatable (you can tell it what to search for) and limited (ends after a few recommendations every day). It works overnight to search for things based on your memories/recent activity in ChatGPT = compute-intensive = only available in ChatGPT’s Pro plan ($200/mo).
You can now buy things on ChatGPT [ link ]. Instant Checkout, a new feature, allows Etsy and Shopify sellers to let people buy their stuff via ChatGPT in exchange for a cut they’ll pay to OpenAI. OpenAI claims it doesn’t affect ChatGPT’s recommendations. OpenAI’s recent “how people use ChatGPT” report classifies only 2.1% of queries as purchase-related (half as many as programming, at 4.2%).
Gemini updated the 2.5 Flash and 2.5 Flash Lite [ link ] models, primarily making them a lot faster and less hungry for tokens. Browser Use found the new Flash model performs on par with o3 [ link ] on their internal benchmarks (but much faster/cheaper). ps: Gemini also released a new “Gemini Live [ link ]” model and a “robotics [ link ]” model.
Outresearch the competition in minutes. Catch every critical detail, move faster on deals, and delegate complex tasks with Brightwave’s AI research agents. Get unlimited access free for 14 days - faster insights, instant memos, and the edge you need to win. Start your free trial today. [ link ]*
*sponsored
What I’m consuming
Code Mode by Cloudflare [ link ] - LLMs are better at writing TypeScript code to call MCP than at calling MCP directly, making code gen a better way to use MCP.
AI is already writing 90% of my code [ link ] - by the maker of Flask.
Real AI agents [ link ] and real work.
LoRA without regret [ link ] - new blog from Thinking Machines Lab comparing LoRA with full fine-tuning and RL.
Abundant Intelligence [ link ] by Sam Altman.
What I look for in an AI PM at Google Labs - part 1 [ link ], part 2 [ link ], part 3 [ link ].
First course on Cursor Learn [ link ] - A six-part video series on AI foundations.
⚙️ Tools and demos
Scout Monitoring’s MCP [ link ] - Plain-language monitoring. Ask questions like “why is latency spiking?” and get answers right in your coding agent.*
Unified Copilot in Zapier [ link ] - A single agent to create any workflow with access to the full toolkit of Zapier.
Lovable Cloud & AI [ link ] - Lovable now comes with backend support (powered by Supabase) and special attention to adding AI features inside your app.
Excel’s Agent Mode [ link ] - Microsoft now lets Copilot work autonomously in Excel, and it’s better than you’d expect. (how they built it [ link ])
Tembo [ link ] - Background agents that plan, code, and review.
Integrity [ link ] - Bring notes, canvases and AI chats into one connected workspace.
Cell [ link ] - The fastest way for software teams to go AI-native. (read more [ link ])
*sponsored
𼣠Dev dish
exa-code [ link ] - hybrid search over 1B+ docs pages, repos, and Stack Overflow posts indexed to reduce hallucination for coding tasks.
GitHub Copilot CLI [ link ] - GitHub also has a coding agent now, living in your terminal.
Agentic Commerce Protocol [ link ] - An open standard to let agents make purchases.
Shared Payment Tokens [ link ] by Stripe - An API for agentic payments.
How to make a Gemini CLI plugin [ link ] for any IDE of your choice.
On the frontier
Cloudflare is launching a stablecoin for agentic commerce, calling it “Net Dollar [ link ]”.
DeepSeek made an experimental version [ link ] of their base model. It’s 50% cheaper for users and 3x to 10x cheaper [ link ] to serve for inference.
GDPval [ link ] - measuring AI on real-world, economically valuable tasks. Opus 4.1 is the best model, just a few points away from human-like performance.
🍦 Afters
OpenAI is hiring its first research scientists [ link ] for OpenAI for Science - a new program to build an AI-powered platform that accelerates scientific discovery.
Modal (infra for AI developers) raised an $87M Series B [ link ] at a $1.1B valuation.
Paid AI raised a $21M seed [ link ] for monetising & cost tracking for AI agents.
Ben’s Bites x Factory meetup in SF 8th October [ link ] - come meet me IRL and talk about Droids, ask the team questions, plus more free tokens :).
Enjoy this newsletter? Forward it to a friend.
That’s it for today. Feel free to comment and share your thoughts.
Find me on X [ link ], LinkedIn [ link ], or Instagram [ link ]
Read about me [ link ] and Ben’s Bites
📷 thumbnail creds: @keshavatearth [ link ]