Welcome to Verses Over Variables, a newsletter exploring the world of artificial intelligence (AI) and its influence on our society, culture, and perception of reality.

AI Hype Cycle

The Confidence Machine

Most of us are aware of AI hallucinations. We know the outputs can be wrong, and we know to check the important stuff. What I have been slower to see, and I think I am not alone in this, is that the reasoning trace sits in the same category. That collapsible accordion of "thinking" text that appears before the answer may look like transparency, but it functions as a signal of confidence.

I have been trying to figure out when the interface stopped feeling like a tool and started feeling like a colleague. It happened without a deliberate decision on my part. The clean formatting. The tone that does not waver, whether the answer is right or fabricated. The way these models do not hedge when they are guessing. Somewhere in there, the interface became the argument: if it looks this finished, it probably has a reason.

AI product managers added visible reasoning to solve a real problem: the gap between waiting for an answer and trusting that the waiting meant something. The accordion says there was a process that you can inspect. Most people register its presence (without bothering to read the reasoning) and then move on to the model’s answer. The offer of the work does the same job as the work itself.

The design logic runs deeper than the accordion, though. These interfaces project a consistent surface of authority because inconsistency would cost them users. The tone is measured, the formatting is clean, and the output arrives without hesitation. Hallucinations arrive in the same typeface as facts. That consistency is a feature.

Research into how these traces actually function complicates the promise. Visible reasoning can mean several different things. Sometimes the steps are causally necessary: remove one, and the answer changes. Sometimes they act more like scaffolding: writing them helps the model produce a better answer, even if the specific words do not matter much. And often, the steps are decoration. The model has effectively arrived at an answer and is generating a plausible path backward from it. That distinction matters because the interface looks identical in every case. A faithful chain of reasoning, a useful scaffold, and a decorative explanation all appear as the same neat block of text under the same little disclosure triangle. A model can produce seventeen detailed reasoning steps, and every single one can be decorative.

Researchers also documented a consistent pattern in models with genuine internal reasoning: the model acknowledges the influence of a biased or misleading prompt inside the thinking tokens, then removes that acknowledgment before the visible answer appears. Across tens of thousands of inference runs, in more than half the cases where a model was nudged by a misleading input, the internal reasoning noted the problem, and the final answer said nothing. The trace that was supposed to make the process transparent can become another place where the gap between process and presentation disappears.

These models are rewarded for answers that human raters score highly. The intermediate steps are not scored. So what emerges, over thousands of iterations, is reasoning that reads as convincing, not reasoning that is accurate. The counterintuitive finding: larger, more capable models often produce less faithful reasoning traces on standard tasks. The thinking gets more impressive as the gap between display and computation widens.

Sycophancy is usually treated as a conversational flaw: the model agrees too readily, validates too quickly, and backs down too easily. The deeper version is structural. The whole product surface is designed to produce the feeling of a competent interlocutor. Sycophancy becomes a product philosophy rather than a failure mode.

None of this is an argument for abandoning the tools. Models that produce decorative reasoning can still produce correct answers. The value is real even when the explanation is not. What changes, once you see the design clearly, is your relationship to the output. The confidence the interface projects is engineered. Whether the answer is right is a separate question.

Compute Has a Balance Sheet

Warren Buffett spent the last thirty years explaining why he avoided technology stocks: he couldn't predict who would win. Recently, though, Berkshire Hathaway committed $10 billion to Alphabet's AI infrastructure buildout. Berkshire has been building the Alphabet position since Q3 2025, and it seems to be setting aside tech's unpredictability and focusing on infrastructure instead. When you put $10 billion into the company building the pipes, you don't need to know which AI model survives 2027.

The shift in AI is that compute is becoming financeable. Chips and data centers are being treated less like inventory and more like aircraft and power plants: assets with leases and residual value guarantees. Once that happens, ownership, financing, and rent schedules become as load-bearing as the models themselves.

The week the Berkshire deal landed, Broadcom, Apollo, and Blackstone were finalizing a $35 billion platform to buy Google TPUs and lease them to Anthropic. This is similar to aircraft leasing: airlines don't own most of their aircraft, they lease them. The arrangement exists because aircraft are expensive, depreciate at predictable rates, and generate income reliably enough to service the debt. This makes them lendable: you can borrow against a plane because you have a model of its value. The Broadcom platform applies this logic to TPUs.

The analogy holds until it doesn't. A 737 is useful for twenty-five years. A GPU generation can start looking slow inside a couple of years as the next architecture arrives. Broadcom modeled the residual value guarantee against that curve somehow, and the people who approved the model answer to pension funds with real fiduciary obligations. Somewhere in that fine print is a specific number: what Broadcom believes a Google TPU will be worth in 2028. That number is a bet about chip evolution, and it is now a legal obligation.

Musk's companies have also turned compute into a rental business. Anthropic is paying xAI $1.25 billion a month for the Colossus 1 data center. Google is paying SpaceX $920 million a month for Nvidia GPUs at a separate facility under a contract through 2029. Between the two contracts, Musk's infrastructure empire is collecting roughly $2.2 billion per month in compute rent. Musk, who has spent years telling anyone who would listen that he is trying to make humanity multiplanetary, is now primarily a compute landlord.

It might be that the build plays out cleanly and the bet proves right. It might be that the technology cycle moves faster than the debt schedule, and the residual value guarantees get tested in ways nobody wants. We don't know because this capital structure has never been stress-tested against a technology cycle. What we do know is that the shape of the bet has changed.

Back to Basics

What is an AI loop?

Silicon Valley has a gift for taking ordinary concepts and naming them so that ordinary people cannot tell what they are. "Human-in-the-loop" sounds like a DARPA procurement document, while "agentic" sounds like a wellness retreat. An "AI loop" is just a cycle: an iteration that observes what happened, adjusts, and runs again. We have been running versions of a loop throughout our working lives: iteration, editing, and feedback. The AI version runs the same way, without your direct involvement at every step.

A model by itself is trained on data up to a cutoff date, deployed, and left there. It does not get better because you used it yesterday. It produces outputs with complete indifference to whether those outputs were useful. Most of what gets shipped is still this version: a frozen model wrapped in a slightly better interface. Most AI tools that plateau are loop failures: systems that shipped without the feedback infrastructure to improve. The output quality at launch stays the output quality forever.

Recommendation engines like Netflix and Spotify run loops with extremely short latency and enormous signal volume. Their output quality after a year of operation looks nothing like it did at launch.

A loop amplifies whatever signal you feed it. If the metric is engagement, the system learns to optimize for engagement, and it will get very good at that regardless of whether engagement reflects value or habit. The system has no mechanism for distinguishing between learning taste and learning compliance. It only knows what you told it counts as success.
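For readers who think in code, here is a minimal sketch of that idea, not a description of how any real recommendation system works. The item names, the engagement and value numbers, and the `run_feedback_loop` function are all hypothetical. The loop tries each item once, then keeps showing whichever one scores best on the only metric it is given.

```python
def run_feedback_loop(items, rounds=20, learning_rate=0.5):
    """Show an item, observe the chosen metric, update the estimate, repeat."""
    estimates = {name: 0.0 for name in items}  # what the loop believes so far
    shown = {name: 0 for name in items}        # how often each item was shown
    names = list(items)

    for step in range(rounds):
        if step < len(names):
            choice = names[step]  # try every item once
        else:
            choice = max(names, key=lambda n: estimates[n])  # then exploit
        shown[choice] += 1
        # Engagement is the ONLY signal the loop ever sees.
        signal = items[choice]["engagement"]
        estimates[choice] += learning_rate * (signal - estimates[choice])
    return shown, estimates


# Hypothetical items: one is sticky but low-value, the other the reverse.
ITEMS = {
    "habit_scroll": {"engagement": 0.9, "value_to_user": 0.2},
    "useful_article": {"engagement": 0.4, "value_to_user": 0.9},
}

shown, estimates = run_feedback_loop(ITEMS)
print(shown)
```

Notice that `value_to_user` is never read. The loop converges on the sticky item because engagement is what it was told counts as success.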

Where humans sit in the loop is a design decision. Human-in-the-loop is as much a learning mechanism as a safety guardrail. The reverse framing, AI-in-the-loop, is closer to where most of us will land: AI embedded inside a human decision process, surfacing signals and summarizing options while the human still decides.

The reason "agentic AI" suddenly matters is that agents are loops with tools attached. An agent takes an action, observes the result, decides what to do next, and acts again. The benchmarks that are starting to matter ask whether an agent can orient itself in a messy environment and continue reasoning across multiple steps.
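The act, observe, decide, act-again cycle can be sketched in a few lines. This is a toy under stated assumptions, not how any production agent is built: the "tool" is a hypothetical number-checking function, and the "decision" is a simple halving rule rather than a language model.

```python
def make_guess_tool(secret):
    """Hypothetical tool: tells the agent how a guess compares to a hidden number."""
    def check(guess):
        if guess < secret:
            return "too low"
        if guess > secret:
            return "too high"
        return "correct"
    return check


def run_agent(tool, low=1, high=100, max_steps=10):
    """Loop with a tool attached: decide, act, observe, adjust, repeat."""
    history = []
    for _ in range(max_steps):
        guess = (low + high) // 2        # decide what to do next
        result = tool(guess)             # act, then observe the result
        history.append((guess, result))
        if result == "correct":
            return guess, history
        if result == "too low":          # adjust based on the observation
            low = guess + 1
        else:
            high = guess - 1
    return None, history                 # gave up within the step budget


answer, history = run_agent(make_guess_tool(secret=37))
for guess, result in history:
    print(guess, result)
```

The step budget (`max_steps`) is the part that tends to matter in practice: a loop needs a rule for when to stop as well as a rule for what to try next.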

A loop is similar to outcome-based thinking (which I wrote about earlier), or to adding an agent that keeps updating your workspace as you give it feedback.

Dashboards vs Deliverables

A year ago, I was using Claude Artifacts the way most people were: for the occasional dashboard and for things that lived in the sidebar and saved me from restarting a conversation from scratch. Then, as the models improved, I started building actual web pages, so the output crossed a line from something I was working on to something I was handing off. The AI labs have produced two new features that let you build persistent, data-connected tools without writing code. The difference between them resembles the difference between a proof of concept (internal) and a prototype (external). Anthropic's Live Artifacts feels like a proof-of-concept environment, while OpenAI's Codex Sites is a prototyping and delivery surface. Both are genuinely useful.

Anthropic Live Artifacts

What it is: Live Artifacts is the current evolution of Claude’s artifact system. The original artifact pane let you create and persist self-contained outputs. Live Artifacts adds data connections, so an artifact can pull from Notion, Google Calendar, Slack, or any connected tool when you open it. It runs inside the Claude UI, and lives in your sidebar.

How I use it: I use Live Artifacts as a connected working layer, mostly for dashboards or other items that have data behind them. I can update the spreadsheet or database in the background, and then refresh my artifact. I can get a shareable URL for others to view, but the shared version isn’t dynamic (which saves my tokens).

Codex Sites

What it is: Codex Sites is a new surface built into OpenAI’s Codex coding agent. Codex can generate a full web app, and Sites hosts it with a working URL and workspace-level access controls. Sharing is scoped to internal: your whole OpenAI workspace, specific users, or admins only.

How I use it: I think about Codex Sites in terms of what it replaces: the conversation that used to end in a vendor purchase or a ticket. A client feedback portal that the whole engagement team logs into. A brief hub with access controls already built in. The trade-off is committing to a product model with logic that must hold for people who’ve never seen how it was built. This is great for mini-product demos.

Most of my AI building time goes toward making things for myself. Trackers, dashboards, reference layers. Live Artifacts is built for that. Codex Sites is for the moments when what I’m making needs to run for other people whether I’m in the loop or not.

Tools for Thought

Claude Fable 5

What it is: Anthropic released a public version of the Mythos model (the model that had been described as too dangerous because of its unprecedented ability to autonomously identify and exploit critical software vulnerabilities). Fable includes guardrails that reroute requests touching cybersecurity, biology, chemistry, and AI research to older models. The model is fast and expensive.

How I use it: For most of my work, Opus and Sonnet do the trick. I tried Fable, but it is really built for coding. Some AI influencers who have tested the model find it excellent at orchestrating complex tasks, but for everyday users, it is definitely too big a model.

ChatGPT Role-Specific Plugins

What it is: OpenAI launched role-specific plugins for Codex, bundling six initial environments for data analytics, creative production, sales, product design, public equity research, and investment banking. Each plugin ships as a pre-wired workspace with relevant tools already connected (Snowflake and Tableau for analysts, Figma and Canva for creatives, Salesforce and HubSpot for sales), plus prompt scaffolds and workflows shaped around how that specific role operates. These are similar to Anthropic’s knowledge work plugins, and OpenAI is expected to release more plugins for the enterprise.

How I use it: I have been using these similarly to Anthropic’s: I find the one closest to the task at hand and customize it for my exact use case.

Apple’s New Siri

What it is: Apple finally announced an update to the long-suffering Siri. Supposedly, it understands personal context, pulls live web information, and takes actions across apps and devices rather than just answering questions. Apple chose to license a model built on Google Gemini and scrapped its in-house build.

How I use it: I haven’t had a chance to use it yet (nobody has), and there is a waitlist. I wasn’t a big user of Siri before, but I do talk to my computer a lot, so we’ll wait and see.

Microsoft Scout

What it is: Microsoft launched an autonomous agent in M365. Scout connects to Teams, Outlook, OneDrive, and SharePoint to handle the coordination work that currently lives in someone's head: for example, time zone math, stalled decisions, and calendar protection around deadlines. Each Scout agent operates under its own corporate identity, so its actions are logged and audited the same way a human employee's would be.

How I use it: Scout is still in preview, so I haven’t had the chance. Scout is the first Microsoft product that makes AI agency a standard feature of corporate identity infrastructure. I am not a huge fan of MSFT AI products, so I don’t think I’ll be jumping in to test this any time soon.

Intriguing Stories

From Index to Indicted: A Munich Regional Court issued a preliminary injunction against Google after its AI Overviews falsely labeled two publishing companies as scam operations. The AI hallucinated connections to shady businesses that appeared nowhere in the linked sources, then opened with confident assertions like "Yes, [company] is known for dubious business practices." Google ignored the publishers' cease-and-desist, but the court didn't. The court stripped Google of its traditional search safe harbor. Germany's Federal Court of Justice had long protected search engines as indirect infringers that merely pointed to third-party content. AI Overviews, Munich ruled, generate "independent, new, and substantive statements" in Google's own language and structure. From a user's perspective, those are Google's words. Only Google can check them. Google argued that users could verify claims by checking source links. The court rejected that; research shows almost no one clicks them, and press law holds outlets liable for defamatory teasers even when readers never reach the full story. If that logic survives appeal, it doesn't stop at Google. Every AI answer engine built to synthesize rather than index just absorbed the same exposure.

All you can infer: SemiAnalysis decided to find out how much AI subscriptions are subsidized, and the results shocked even its own researchers. A $200/month Claude Max 20x plan delivered roughly $8,000 in API-equivalent usage. ChatGPT Pro 20x delivered $14,000. Both companies break even only when the average subscriber uses less than 12% of available capacity. At full utilization, Claude's margin runs approximately -900%. OpenAI's is -1,650%. SemiAnalysis's prediction is that new models and features will be withheld from subscription plans rather than existing plans being nerfed. (The recent Anthropic Fable release is only available in subscriptions for a few weeks, and at twice the price.) Other users have noticed that even Enterprise plans are no longer subsidized: the Team plans are often cheaper but offer less built-in compliance. Subscriptions aren’t going away, but the model behind them might quietly stop being the best one available.
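To see what those margin figures imply, here is a back-of-the-envelope sketch. It rests on two assumptions that the figures above do not confirm: that margin means (revenue - cost) / revenue, and that the ChatGPT plan also costs $200 a month. The `implied_cost` helper is hypothetical, and this is my arithmetic, not SemiAnalysis's method.

```python
def implied_cost(revenue, margin):
    """Cost implied by a margin, ASSUMING margin = (revenue - cost) / revenue."""
    return revenue * (1 - margin)


PLANS = {
    # name: (monthly price, API-equivalent usage, quoted margin at full use)
    "Claude Max 20x": (200, 8_000, -9.0),     # -900%
    "ChatGPT Pro 20x": (200, 14_000, -16.5),  # -1,650%; $200 price is an assumption
}

for name, (price, api_value, margin) in PLANS.items():
    cost = implied_cost(price, margin)
    share = cost / api_value
    print(f"{name}: implied cost ${cost:,.0f}/month, "
          f"{share:.0%} of API-equivalent usage")
```

Under those assumptions, the implied cost of serving a fully used plan is well below the API-equivalent figure, which suggests the margins are measured against what the usage costs to provide rather than its list price.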

— Lauren Eve Cantor

thanks for reading!

I also host workshops on AI in Action. Please feel free to reach out if you’d like to arrange one for you or your team.

if someone sent this to you or you haven’t done so yet, please sign up so you never miss an issue.
