Why Claude keeps cutting you off

The "come back in five hours" joke has become its own genre. Screenshots of the notification. Clock emojis in response. The whole bit, repeated daily across LinkedIn, Slack, and text threads with people who don't usually text about software. I have done the incognito window thing. I have opened ChatGPT and told myself it was strategic. I have also, once, actually gone outside.

Most people who hit the limit didn't misuse Claude. They just never looked at how the system actually spends.

There's a timing piece worth pointing out. In March, Anthropic doubled usage limits for everyone during off-peak hours for two weeks. People built real work into that extra room. Then the promotion ended, normal limits came back, and it felt like we were being throttled, but it was just back to normal.

This how-to explains why you may be hitting your limits sooner than you expect, and how to plan your queries to make better use of them.

Claude’s limit system runs on two separate tracks, and understanding which one you hit determines whether the fix takes thirty seconds or thirty minutes. Usage limits are your rolling conversation budget: they reset on a 5-hour cycle, and many paid plans also have weekly limits. The exact weekly cap structure depends on your plan. This budget responds to complexity: how long your messages are, how long the conversation has been running, what tools you have active, which model you chose, and whether you are generating artifacts. These variables do not behave equally. Model choice alone can mean the difference between coasting through a full afternoon and running out before lunch.

Context window limits are a different animal entirely. That window is Claude’s working memory in a single conversation: within Chat, it is 200K tokens on Pro, Max, and Team plans across all models, and up to 500K on some Enterprise configurations. In Claude Code, Opus 4.7 and Sonnet 4.6 can support a 1M-token context window on paid plans, subject to certain conditions. Hit this ceiling, and no amount of usage budget saves you, because the conversation has simply grown too long to continue. The two tracks are independent. You can have budget remaining and still get cut off by context. You can burn through your usage limit with a handful of very expensive conversations. Knowing which wall you are approaching changes which lever to pull.
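For rough planning, a common back-of-envelope heuristic is that English prose runs about four characters per token. Real tokenizers differ (and code or non-English text deviates further), so treat this sketch as a sanity check, not Anthropic's actual accounting:

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token for English prose.
    Real tokenizers (and code, or non-English text) will deviate."""
    return max(1, len(text) // 4)

def fits_in_window(texts: list[str], window: int = 200_000) -> bool:
    """Check whether a pile of documents plausibly fits a 200K-token window."""
    return sum(estimate_tokens(t) for t in texts) <= window

# A 200K-token window is on the order of 800K characters of prose.
report = "word " * 50_000                    # ~250K characters of filler
print(estimate_tokens(report))               # roughly 62,500 tokens
print(fits_in_window([report] * 5))          # five copies overflow the window
```

One long report fits comfortably; paste in five of them and you have hit the context wall before spending any meaningful usage budget, which is exactly the "budget remaining but cut off anyway" case described above.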

If you use the full Claude ecosystem (Chat, Cowork, Code, and Design), your usage budget is shared across surfaces, so a heavy morning in the terminal can leave you short by afternoon. You can monitor your usage in Settings > Usage.

Before you type anything, model selection is where most people quietly hemorrhage their budget, and they never realize it because the interface defaults to the newest, and often most expensive, option. Haiku is fast, cheap, and underestimated. It handles first drafts, quick questions, and single-task messages with the kind of efficiency that lets you run all day without watching a meter. Sonnet is where real work happens: sustained writing and complex analysis. Opus exists for the genuinely hard problems, the tasks where the model needs to reason, and the quality of that reasoning materially changes the output. Using Opus to generate three headline options is like using a surgical laser to open a bag of chips. It’ll get the job done, but it’s a terrifyingly expensive way to handle a snack.

Features are the other invisible tab-runner. Web search, Research, MCP connectors, and Extended/Adaptive Thinking all add to the cost of every response when they are active, whether or not the task needed them. Adaptive Thinking is the hidden tax on your budget. It often generates massive amounts of internal reasoning that you don’t see, but you’re still paying the “token price” for every word of it. It’s basically a background process that eats your battery while you aren’t looking. Opening Settings at the start of any focused session, turning off non-essential tools and connectors, and re-enabling specific tools only when the task calls for them is 15 seconds of friction that significantly extends a working session. I spent months not doing this because the defaults felt like the defaults.

If you are uploading files, format matters too. Raw PDF uploads can be token-heavy. Extracting the text into a plain Markdown (text) file first can meaningfully cut costs. An extra 90 seconds of prep buys real budget back on any file-intensive session.
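Most of that saving comes from stripping layout debris. A minimal sketch, in pure Python and assuming you have already extracted the raw text with whatever PDF tool you use, that collapses the form feeds, repeated spacing, and vertical gaps extraction tends to leave behind:

```python
import re

def tidy_extracted_text(raw: str) -> str:
    """Normalize raw PDF-extracted text before pasting or uploading.
    Removes form feeds and collapses runs of spaces and blank lines."""
    text = raw.replace("\f", "\n")            # page breaks become newlines
    text = re.sub(r"[ \t]+", " ", text)       # runs of spaces/tabs -> one space
    text = re.sub(r"\n{3,}", "\n\n", text)    # big vertical gaps -> one blank line
    lines = [line.strip() for line in text.split("\n")]
    return "\n".join(lines).strip()

raw = "Title   Page\f\n\n\n\nBody   text\twith    gaps\n\n\n\nmore"
print(tidy_extracted_text(raw))
```

Every character removed here is budget you are not spending on whitespace, and the cleaned text pastes straight into a Markdown file.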

During the conversation, every new message triggers a full context reload, which means Claude re-reads the entire thread to orient itself before responding. Three separate follow-up messages are three reloads. One message with three related requests is one. This sounds minor until you account for the “also can you…” and “actually, one more thing” energy that characterizes most editing sessions, which is charming in a group chat and genuinely costly here. (You can keep the memory by using a Project instead of the Chat, or creating a handoff document before you start a new chat.)
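The arithmetic behind batching is worth seeing once: because every new message re-reads everything before it, total reading grows quadratically with turn count. A toy model, with made-up round numbers rather than real measurements:

```python
def total_tokens_read(turn_sizes: list[int]) -> int:
    """Each turn re-reads the whole thread so far, then adds its own tokens."""
    total, context = 0, 0
    for size in turn_sizes:
        context += size   # the thread grows by this turn's tokens
        total += context  # ...and the model reads all of it to respond
    return total

# Three separate follow-ups of ~500 tokens each:
print(total_tokens_read([500, 500, 500]))   # 500 + 1000 + 1500 = 3000
# One combined message carrying the same three requests:
print(total_tokens_read([1500]))            # 1500
```

Same content, half the reading, and the gap widens the longer the thread already is when you start tacking on "one more thing."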

You can also save some room by stripping your prompts down to their essentials. Claude does not need full sentences, polite framing, or elaborate setup to understand a lot of routine requests, so a compressed style can help on longer sessions: “summarize this,” “fix tone,” “shorter,” “3 options,” “keep headline.” You can even ask Claude to reply in the same compressed style when precision matters more than prose, which reduces the back-and-forth and keeps responses lean.

Editing is the other behavioral shift. When something from an earlier message needs fixing, sending a correction as a new message extends the thread and leaves the old version where the model can still see it. Clicking Edit on the original and regenerating replaces the exchange entirely. The conversation stays clean. The model is not working with its own previous output. When a response gets one section wrong, I ask Claude to redo only that section and return just the output with no commentary.

After a conversation has run for a while, the problem is noise accumulation. Long threads collect obsolete drafts the model is still technically aware of, superseded instructions, and context that was load-bearing in message four but is now just occupying space. Anthropic has a partial answer to this built in: if you have code execution enabled, Claude will automatically compact long conversations by summarizing earlier messages when the context window gets tight. Your full chat history is preserved, but the working memory gets lighter. It handles the mechanical version of the problem. The judgment call, knowing when a conversation has drifted far enough from its original purpose that a clean reset would serve you better than a summary, still belongs to you. Around fifteen to twenty messages in, or when I feel it getting sluggish, I ask for a summary of the key decisions, outputs, and context from the session, copy it, and open a new chat. Pasting the summary forward gives the next session everything it actually needs and a clean window.

Projects are the best place for recurring context. Put your style guides, brand documents, briefing templates, and any file you return to regularly inside the project. Because Claude caches reused project content, you do not have to spend the full context cost every time you work from the same source material.

There is also a Usage page under Settings that shows exactly how much of your session and weekly limits you have consumed. I looked at this page for the first time after being cut off for the fourth time in a single week, which is approximately three cut-offs too many. Checking it a couple of times a week builds an accurate intuition for what different types of work actually cost. The budget buster is almost always the same combination: wrong model for the task, tools running quietly in the background, a conversation that went twenty messages past the point where a reset would have been smarter.

Optimization helps, but sometimes the honest answer is that your workload has outgrown the included allowance. Anthropic now offers extra usage and usage bundles on some paid plans, so if you are consistently hitting the limit even after tightening your workflow, the real fix may be buying more room rather than squeezing harder.

Let me know if you keep hitting the wall, and we can look for some savings.

Tools for Thought

Claude Design

What it is: Claude Design is Anthropic's new AI-native design environment, currently in research preview, that turns a chat conversation into a live canvas. You describe what you want, a first-pass design appears on the right side of the screen, and then you iterate by typing, leaving inline comments, or adjusting auto-generated sliders called Tweaks that control spacing, color, and animation timing. The output can be a pitch deck, a landing page mockup, a product prototype, or a one-pager. It runs on Claude Opus 4.7 and exports to HTML, PDF, PPTX, or hands off directly to Claude Code for anything that needs to become production-grade. It also produces fun Tips while you are waiting for it to finish thinking.

How I use it: I have used Claude Design to build slide decks and interactive prototypes, and the fastest thing I can tell you about the experience is that it works until it doesn't. The first generated pass is relatively fast. You describe your audience, your narrative arc, your general aesthetic direction, and Claude assembles something coherent in a few minutes. Without a design system in place, everything Claude Design generates starts to look like it came from the same mold. This tool eats through usage faster than any other part of the Claude ecosystem I have touched. If you are on a standard Pro plan and expecting to use Claude Design the way you use the chat interface, recalibrate.

Claude in Word

What it is: Claude for Word is an official Anthropic add-in that embeds Claude directly inside Microsoft Word as a sidebar panel. The headline feature is track changes integration: Claude's edits arrive as native Word redlines, individually accept/reject-able, rather than a block of overwritten text. It can read an entire document, answer questions about it with clickable citations that jump to the relevant passage, resolve comment threads by editing the relevant text and replying in the thread, and summarize counterparty redlines by severity. There's also cross-app context with Claude for Excel and PowerPoint, so a financial memo can pull numbers from an open spreadsheet without leaving Word. It requires a paid Claude plan (Pro, Max, Team, or Enterprise) and is currently in beta.

How I use it: I tested it, found it impressive for what it does, and then remembered that I don't live in Word. My entire document workflow runs through Google Docs, so for me, this is a feature I am aware of and not one I reach for. The track changes behavior is genuinely well-designed for anyone whose work involves contract redlines, collaborative editorial passes, or documents that need to move through a formal review chain.

— Lauren Eve Cantor

thanks for reading!

if someone sent this to you or you haven’t done so yet, please sign up so you never miss an issue.

I’ve also started publishing more frequently on LinkedIn, and you can follow me here

if you’d like to chat further about opportunities or interest in AI, or this newsletter, please feel free to reply.

banner images created with Midjourney.
