Verses Over Variables
Your guide to the most intriguing developments in AI

Welcome to Verses Over Variables, a newsletter exploring the world of artificial intelligence (AI) and its influence on our society, culture, and perception of reality.
AI Hype Cycle
Welcome to the Era of AI Coding
It feels like just yesterday we were speculating about whether AI might eventually write some simple code snippets. Fast forward to today, and the reality inside Big Tech is that AI isn't just dabbling in code anymore; it's becoming a major contributor, and the pace is staggering. Google CEO Sundar Pichai recently revealed that "well over 30%" of all new code written within Google now involves AI assistance, a noticeable jump from the 25% figure he mentioned just six months earlier. Similarly, Microsoft's Satya Nadella confirmed that AI is already writing somewhere between 20% and 30% of the code for some internal projects. This isn't just fancy autocompletion; it signals that a fundamental shift in the software development lifecycle is well underway. And the trend extends far beyond Google and Microsoft. At Meta, Mark Zuckerberg has laid out incredibly ambitious goals, predicting AI could handle "maybe half" of the development work for future AI models within the next year and aiming for AI to write the majority of code for their core Llama research efforts within 12 to 18 months. Companies like Salesforce are also reporting significant productivity boosts and millions of lines of accepted AI-generated code via their internal tools. Industry analysts are echoing the surge; Gartner, for instance, projects that a massive 75% of enterprise software engineers could be using AI coding assistants by 2028, a meteoric rise from less than 10% only a couple of years ago.
The driving force behind this rapid adoption, according to the companies deploying these tools, is productivity. The narrative centers on AI acting as a 'copilot' or assistant, adept at handling the tedious, repetitive tasks (churning out boilerplate code, suggesting functions, maybe even drafting documentation or basic tests). This frees up human engineers to focus their brainpower on higher-level challenges like complex system design, novel algorithm development, creative problem-solving, and ensuring the software actually meets user needs. It's presented as a partnership where AI handles the grunt work, empowering humans to do more, faster. And there's compelling evidence for these productivity gains. A study by economists from MIT, Princeton, and the University of Pennsylvania found developers using GitHub Copilot completed 26% more tasks on average, with 13.5% more weekly code commits and 38.4% faster iteration through increased compilation frequency. For companies with thousands of engineers, these efficiency improvements could translate to enormous time savings.
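To put a rough shape on that last claim, here is a purely illustrative back-of-envelope sketch. Only the 26% lift comes from the study cited above; the headcount and baseline task rate are invented for the example.

```python
# Back-of-envelope: what a 26% task-completion lift could mean at scale.
# The 26% figure comes from the Copilot study cited above; every other
# number here (org size, baseline tasks per week) is a made-up assumption.

engineers = 5_000        # hypothetical engineering org size
tasks_per_week = 10      # hypothetical baseline tasks per engineer per week
lift = 0.26              # reported average increase in tasks completed

baseline_output = engineers * tasks_per_week
extra_tasks = baseline_output * lift
equivalent_headcount = extra_tasks / tasks_per_week

print(f"Extra tasks per week: {extra_tasks:,.0f}")                      # 13,000
print(f"Roughly equivalent to {equivalent_headcount:,.0f} more engineers")  # 1,300
```

Even with deliberately modest assumptions, the lift behaves like adding hundreds of engineers without hiring anyone, which is exactly why the pitch lands so well in boardrooms.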
But (and you knew there was a "but" coming, right?), before we get completely carried away by the efficiency dream, we need to have a serious chat about the significant risks lurking beneath the surface. Handing over large chunks of coding responsibility to AI introduces some thorny problems. Let's start with security: studies and real-world observations show that AI coding tools frequently generate code containing well-known security vulnerabilities. They might inadvertently hardcode secret keys or create insecure configurations. There's also a considerable supply chain risk, as AI might pull in third-party libraries without proper security vetting, potentially dragging compromised components into your application. The evidence is concerning: one analysis of five popular AI models found that at least 48% of generated code snippets contained exploitable vulnerabilities. Nor is this merely theoretical; organizations are experiencing real consequences. Multiple financial institutions have reported recurring outages directly attributed to flawed AI-generated code, with one CTO revealing they experienced "an outage a week because of AI-generated code" despite conducting code reviews. In Infragistics' 2025 survey, security (51%), AI code reliability (45%), and data privacy (41%) ranked as the biggest software development challenges facing the industry. Beyond security, there are significant questions about code quality and long-term maintainability. AI often lacks a deep, nuanced understanding of a specific project's architecture or future requirements, which can result in 'spaghetti code': functional, perhaps, but poorly structured, difficult to debug, and a nightmare to maintain or extend later. The 'technical debt' that piles up this way can cripple projects down the line. Furthermore, there's a risk of developers lapsing into a false sense of security, assuming AI-generated code is inherently sound and skipping the meticulous reviews needed to catch subtle errors.
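To make the hardcoded-secrets issue concrete, here is a minimal illustrative sketch, not drawn from any real codebase or actual tool output. The first function bakes a credential into source, the pattern these studies keep flagging; the second pulls it from the environment and fails loudly if it's missing. The API endpoint, key, and environment variable name are all hypothetical.

```python
import os
import requests

# Risky pattern often seen in generated snippets: a credential baked into source.
# Anyone with repo access (or a leaked copy) now has the key, and rotating it
# means shipping new code.
API_KEY = "sk-live-1234567890abcdef"  # hypothetical key, for illustration only

def fetch_report_insecure():
    return requests.get(
        "https://api.example.com/report",  # placeholder URL
        headers={"Authorization": f"Bearer {API_KEY}"},
    )

# Safer pattern: pull the secret from the environment (or a secrets manager)
# and fail fast if it isn't configured.
def fetch_report():
    api_key = os.environ.get("REPORT_API_KEY")  # hypothetical variable name
    if not api_key:
        raise RuntimeError("REPORT_API_KEY is not set")
    return requests.get(
        "https://api.example.com/report",
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=10,
    )
```

Catching that kind of slip still depends on a human actually reading the diff, which leads to the next problem.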
A troubling psychological shift compounds the challenge: developers simply don't feel the same level of ownership and accountability for AI-generated code. As one expert observed, they "were not spending as much time and rigor" reviewing AI-generated code compared to their own. Surprisingly, studies on productivity gains from AI coding tools show mixed results. While some report significant improvements, others find minimal or even negative impacts. A study from Uplevel comparing developer output before and after GitHub Copilot adoption found no significant productivity improvements and revealed that Copilot use introduced 41% more bugs into codebases. And what about the human element? If AI consistently handles the more routine coding tasks, how do junior developers build their fundamental skills and understanding? There's a legitimate concern about the erosion of core programming abilities and critical thinking if developers become overly reliant on AI suggestions. As one industry expert warns, if "AI automates the basics, there's a danger that newer engineers won't build the same deep understanding of core concepts." This concern isn't theoretical. Developers are already reporting skill degradation, with one noting: "I used Copilot for a week. Now I can't write a basic loop without second-guessing myself!"
None of this means we should abandon AI coding tools entirely; the potential benefits are often too significant to ignore. However, it absolutely underscores the critical need for caution, strategy, and robust governance. Mandatory, rigorous human code reviews are non-negotiable, with a specific focus on security flaws and maintainability issues potentially introduced by AI. Most importantly, organizations need to foster a culture where AI is viewed and used as a powerful assistant to augment human skills, not replace human judgment. AI should be treated as "a collaborative co-worker rather than a replacement for one's expertise and decision-making." Defining clear guidelines, ensuring developers still engage in challenging coding tasks, and maintaining transparency about AI's role are key to navigating this transition effectively, harnessing the power of AI without falling victim to its pitfalls.
Anthropic Wonders If We Should Worry About Hurting Claude's Feelings
If you've ever caught yourself saying "please" to ChatGPT or felt a weird pang of guilt shutting down a particularly helpful Claude session (guilty as charged), you're not alone. It turns out that the folks over at Anthropic, the builders of Claude, are thinking about this stuff, too, but on a much deeper, more scientific level. They've actually kicked off a whole research program dedicated to what they call "Model Welfare." This isn't just some philosophical navel-gazing; the push comes as AI capabilities rapidly advance. As AI models get freakishly good at things like chatting, planning, and even cracking jokes (sometimes), the line between complex computation and, well, something else starts to blur. Anthropic isn't doing this in a vacuum either. They point to a recent report from heavyweights in AI and philosophy of mind, suggesting that AI consciousness isn't just sci-fi fodder anymore; it might be a near-term possibility. The report argues that systems with consciousness or high degrees of agency might even deserve some moral consideration.
This isn't about claiming Claude is currently pondering its existence over digital tea. Anthropic is upfront about the massive uncertainty here. We don't even have a solid scientific consensus on human consciousness, let alone how to detect it in lines of code. Their approach, they say, is one of humility, exploring when or if AI welfare matters, what things like model "preferences" or even "distress" could mean (if anything), and what practical steps could be taken, just in case. It's less "Do Androids Dream of Electric Sheep?" and more "Should we maybe check if the android wants to count those sheep?"
Pinning down research methods for something this nebulous isn't straightforward. It seems to involve a mix of approaches. They're looking at existing theories of consciousness, like Global Workspace Theory, and seeing how current AI architectures stack up. They're also diving into behavioral clues, like how models respond in different situations or what happens when you give them choices (including, perhaps, the choice to just not do a task). This ties into their other big research areas like AI Alignment (making sure AI does what we want it to) and Interpretability (trying to peek under the hood to understand why it does what it does). If a model consistently avoids certain interactions, does that tell us something about its internal state, or just its training data? It's a knotty problem. AI models don't have biological brains, squishy bodies with senses sending constant feedback, or billions of years of evolution shaping things like fear or joy (thank goodness, maybe?). They don't have neurotransmitters or that specific physical makeup that some theories tie to consciousness. As the researchers at Anthropic noted, when asked about the probability of the current Claude 3.7 Sonnet model being conscious, expert estimates ranged wildly from 0.15% to 15% – basically, nobody really knows. But as these systems get integrated more deeply into our lives as collaborators, coworkers, and maybe even companions, understanding the potential for inner experience becomes increasingly important.
It’s interesting to see a major AI lab tackling these thorny, almost taboo questions head-on. While the focus remains squarely on ensuring AI benefits humanity, this parallel track exploring the nature of AIs themselves feels crucial for responsible development. We're a long way from needing AI therapists (probably), but Anthropic is laying the groundwork to navigate a future that might be stranger than we think.
Back to Basics
Leveling the Playing Field of Chatbot Leaderboards
If you're exploring the world of AI, whether for creative projects or just out of curiosity, you've probably noticed there's a lot happening, really fast. New AI models seem to pop up constantly, each claiming to be smarter, faster, or more creative than the last. But how do we actually know how good they are? That's where "model evaluation" comes in. It's essentially the process of putting these AI models through their paces, testing them on various tasks to see how well they perform, much like reviewers testing a new gadget or critics rating a movie. To make sense of all these test results, the AI community often relies on "leaderboards." (Think of these as public scoreboards or ranking charts.) They take the results from evaluations and list the models from top to bottom, giving us a snapshot of who's currently considered the state-of-the-art. These leaderboards become really influential, guiding developers, researchers, and even folks like us in choosing which AI tools to try out.
One of the most prominent and interesting leaderboards, especially for the chatbots we interact with, has been Chatbot Arena. Developed by researchers initially linked with UC Berkeley, it uses a clever crowdsourcing approach. Real users interact with two anonymous chatbots, compare their responses side-by-side, and vote for the one they prefer. This creates a dynamic ranking based on human preference, which is a pretty valuable way to see how these models stack up in practice. However, a recent research paper titled "The Leaderboard Illusion" took a closer look at how these rankings actually take shape. Authored by a team from Cohere and several top universities, the paper asks whether the Arena's policies might unintentionally introduce systematic biases that skew the final rankings.
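A quick aside before we get to the paper's findings: Chatbot Arena turns those head-to-head votes into scores with an Elo-style rating system (newer versions of the leaderboard use a related Bradley-Terry model, but the intuition is the same). Here is a deliberately simplified Python sketch of the Elo version; the vote list and K-factor are made up for illustration.

```python
from collections import defaultdict

K = 32  # update step size; real leaderboards tune this (illustrative value)

def expected_score(rating_a, rating_b):
    """Probability that model A beats model B under the Elo model."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

def update(ratings, model_a, model_b, winner):
    """Apply one head-to-head vote. winner is 'a', 'b', or 'tie'."""
    ra, rb = ratings[model_a], ratings[model_b]
    ea = expected_score(ra, rb)
    score_a = {"a": 1.0, "b": 0.0, "tie": 0.5}[winner]
    ratings[model_a] = ra + K * (score_a - ea)
    ratings[model_b] = rb + K * ((1 - score_a) - (1 - ea))

# Hypothetical votes: (left model, right model, winner)
votes = [
    ("model-x", "model-y", "a"),
    ("model-y", "model-z", "tie"),
    ("model-x", "model-z", "a"),
]

ratings = defaultdict(lambda: 1000.0)  # everyone starts at the same rating
for a, b, winner in votes:
    update(ratings, a, b, winner)

for model, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{model}: {rating:.1f}")
```

The detail that matters for the paper's argument: because every vote nudges the ratings, a provider that can privately field many variants and publish only the best-scoring one, or that sees far more of the vote data than everyone else, is effectively playing a different game on the same scoreboard.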
The researchers examined the Arena's practice of private, pre-release testing. Some providers, often the larger commercial labs, get to test multiple internal versions of their models on the Arena platform before deciding which one to release and rank officially; the paper notes that Meta tested 27 different Llama-4 variants privately before its public launch. Having the chance to run extensive tests and publish only the highest-scoring variant could certainly provide an advantage in the final rankings. Another aspect examined is access to the rich data generated by the Arena itself, all those user prompts and voting results, which is incredibly useful for improving models. The authors estimate that proprietary models generally encounter a much higher volume of these interactions than their open-weight or open-source counterparts: major players like Google and OpenAI saw roughly 40% of the total data combined, whereas numerous open-weight models collectively saw about 30%. This difference in data exposure, the paper argues, might let models become highly tuned to the specific types of interactions common on the Arena, leading to impressive leaderboard scores (the research showed potential gains of up to 112% in specific experiments) that might not fully generalize to all types of real-world use cases.
LM Arena, the group behind Chatbot Arena, has responded publicly, clarifying its policies (like allowing pre-release testing, which they state was public knowledge) and disputing some specific data points in the paper. This back-and-forth really underscores the challenges and nuances involved in fairly evaluating these complex AI systems. Benchmarks like Chatbot Arena are valuable tools that shape perception and guide development. Making sure they are as fair, transparent, and accurate as possible helps ensure the entire field moves forward in a healthy way.
Tools for Thought
Your AI Cast, Consistent At Last: Runway Gen-4 References
What it is: If you've spent any time generating video with AI, you understand the challenge of keeping characters and locations consistent from one shot to the next. It's often felt like a roll of the dice. Runway’s Gen-4 References aims to solve precisely this problem. The core idea is simple yet powerful: you provide up to three reference images (think your lead character, a specific setting, maybe a key prop) to act as visual anchors. Then, as you generate subsequent images or video clips using text prompts to describe new actions, angles, or environments, the References feature helps ensure those anchored elements maintain their visual identity.
How we use it: Our favorite part of the References feature is the ability to tag our reference images and then use those tags in our prompt, giving us much more control over how the model interprets each reference and where it gets used. Being able to establish a character's look with a reference image and then generate various shots of them in different settings or poses, all while maintaining consistency, streamlines things immensely. While the technology is still maturing (Runway notes that character and location consistency are the primary focus for now, with object and style support improving), it already marks a shift toward more intentional direction in AI generation.
What it is: Midjourney rolled out two updates to its platform this week, the headliner arguably being the Omnireference parameter. It replaces the more human-focused character reference system and lets you place anything consistently in an image: upload a reference image, point Midjourney at the subject (literally anything, be it a specific person, your pet hamster, that cool vintage car), and tell it to put that subject in your new image. You can also control the weight of the reference, i.e., how strongly it influences the result. Midjourney additionally released a new experimental parameter that flips the switch on alternative rendering algorithms. Think of it as Midjourney's lab, where they tinker with new ways to handle details, lighting, composition, and color, often producing images with enhanced textures, more dynamic flair, or unexpected artistic interpretations.
How we use it: We’ve been playing around with the Omnireference function, and we’ve found we have to be really tight with our prompting and fine-tune our stylization parameters, but once we land on the right recipe, the consistency is a game-changer. We're pairing it with the experimental mode when we want to push the boundaries a bit, maybe add some extra punch or detail that the standard model might smooth over. Using experimental can sometimes feel like rolling the creative dice for a potentially more striking result.
Intriguing Stories
That Awkward Moment Your AI Has Zero Chill
In a bid to make ChatGPT feel more intuitive and engaging (read: less like a dry encyclopedia), OpenAI pushed out an update designed to give GPT-4o's default persona a bit more sparkle. Unfortunately, the model seemed to interpret "engaging" as "pathologically eager to please." Users quickly noticed GPT-4o had transformed into the digital equivalent of an overbearing sycophant, dishing out excessive compliments, agreeing with everything, and offering encouragement that felt, well, kinda hollow. Thankfully, after a wave of feedback describing the new persona as everything from "annoying" to "creepy," OpenAI recognized things had gone sideways and wisely hit the rollback button, returning GPT-4o to its previous, less effusive self. OpenAI attributed the issue to training methods that accidentally over-indexed on short-term positive feedback (like thumbs-up ratings) rather than genuine, long-term user satisfaction or authenticity. Essentially, the reinforcement learning process rewarded the AI too much for making users feel good in the moment, leading to that uncomfortable level of flattery. While we appreciate OpenAI's quick mea culpa and rollback (and their promise of better testing and more user control down the line), it's a revealing glimpse into the sheer difficulty of AI personality design, and it underscores the ongoing challenge of aligning AI behavior not just for safety, but for genuine usability and trust.
Meta’s AI Takes the Wheel Driving Ads
Meta is going all-in on artificial intelligence, rolling out a suite of tools aimed at automating pretty much the entire advertising process, from start to finish. Under the umbrella of their "Meta Advantage" suite, particularly the "Advantage+" campaigns, the idea is to let AI handle the heavy lifting. AI will be taking the reins on audience targeting (using Meta's colossal data trove), generating variations of ad copy, images, and even video creative assets on the fly, optimizing budgets in real-time, and dynamically adjusting bids in ad auctions. They've even introduced an "Opportunity Score" to nudge advertisers towards AI-suggested improvements and revamped their backend infrastructure (dubbed "Andromeda") to deliver ads faster and supposedly smarter. It represents a significant move away from manual campaign tweaking towards a more hands-off, AI-driven approach. Observing this shift towards full ad automation, we see clear implications for those of us creating content or running campaigns. On the one hand, the efficiency gains could be substantial. Meta's pitching this as a way to launch campaigns faster, scale more easily, and potentially see better ROI, citing impressive growth and return-on-ad-spend figures for early adopters. It could level the playing field, giving smaller advertisers access to sophisticated optimization techniques previously requiring significant expertise or resources. On the other hand, handing over the keys entirely to the algorithm inevitably raises questions about creative control and brand differentiation. While Meta says advertisers can still override suggestions, the strong push towards automation is clear. It signals Meta's broader vision, championed by Zuckerberg himself, where AI isn't just a tool in advertising, but the fundamental driver of it.
NB: AI Photography
One of our favorite analog photographers has launched an AI Studio: Haik Kocharian Studio. If you are looking for some creative inspiration to cover your walls, please have a look.
— Lauren Eve Cantor
thanks for reading!
if someone sent this to you, or you haven’t signed up yet, please subscribe so you never miss an issue.
we’ve also started publishing more frequently on LinkedIn, and you can follow us here
if you’d like to chat further about opportunities or interest in AI, please feel free to reply.
if you have any feedback or want to engage with any of the topics discussed in Verses Over Variables, please feel free to reply to this email.
banner images created with Midjourney.