Trust Insights So What? livestream graphic titled 'How to Manage AI Usage Limits,' airing live every Thursday at 1PM ET. Watch at trustinsights.ai/youtube.

So What? How to Manage AI Usage Limits

So What? Marketing Analytics and Insights Live

airs every Thursday at 1 PM Eastern Time.

You can watch on YouTube Live. Be sure to subscribe and follow so you never miss an episode!

In this episode of So What, Trust Insights CEO Katie Robbert and Chief Data Scientist Christopher Penn break down exactly how AI usage limits work across Claude, Gemini, ChatGPT, and Minimax, and the four-part framework for managing them without burning out your tokens or your budget.

Watch the video here:

So What? How to Manage AI Usage Limits

Can’t see anything? Watch it on YouTube here.

In this episode you’ll learn:

  • The most common rate limits you’ll run into with AI
  • How to manage AI usage limits with different strategies and tools
  • What tools are most comparable for managing AI usage limits

Transcript:

What follows is an AI-generated transcript. The transcript may contain errors and is not a substitute for listening to the episode.

Katie Robbert – 00:34

Happy Thursday. Welcome to So What, the Marketing, Analytics, and Insights live show. I’m joined by Chris and John. Howdy, fellas.

Christopher Penn – 00:40

Hello.

Katie Robbert – 00:42

Thankfully, our pre-show ramble is not recorded anywhere because it is nonsensical—not to be all mysterious. But it occurred to me while the credits were rolling to start the show that we don’t need to be talking about any of that publicly.

So, what we are going to talk about publicly today is how to manage AI usage limits. A lot of people a few weeks ago jumped ship from ChatGPT and Gemini to move over to Claude, especially Claude Desktop, which includes chat and cowork. Code doesn’t have to be desktop; you can use code and chat on the web app as well. But what happens is that people are running into usage limits now.

Katie Robbert – 01:32

I think it was just this week Claude started tightening some of the peak usage windows from 8:00 a.m. to 2:00 p.m. for whatever time zone you’re in. That’s going to look a little bit different for everybody. Basically, they’re like, “This is when people are using our systems the most, so you are stressing our systems out. We are going to make it undesirable for you to be working on Claude during those times.”

But you know when you can work? It’s when you should be sleeping. For a lot of people, that does not work because they’re sleeping.

Katie Robbert – 02:12

So, what we wanted to talk about today is how to manage your AI usage limits, but also what are some alternatives if you’re running into these usage limits and you just really want to get a good night’s sleep and not actually be working on your AI vibe coding. Chris, where should we start?

Christopher Penn – 02:34

Well, I think we should start actually with what you do in Claude cowork. Let’s say there’s a few different scenarios here, and we should probably start talking about what the usage limits are first, because that’s important.

There are two fundamentally different ways to use AI these days. There is the “all you can eat” model—with a big asterisk—for a fixed fee a month: pay 20 bucks a month, 200 a month, whatever. That’s one version, and that’s what pretty much everyone is used to. If you’re using Gemini on the web or using ChatGPT, you’ve been on the 20-a-month plan.

The second is called API usage, where you’re using the APIs directly in tools like Visual Studio Code, for example, and you pay per token that you use. A token is three-quarters of a word, essentially. So, the more code you write, the more you pay.

Christopher Penn – 03:24

To give you a sense of the disparity between these two and why usage limits are such a big deal: if you were to use a tool like Claude Code on the API plan and you paid the maximum amount of money for Claude Code—the 200-a-month Claude Max 20 plan, which is what we pay for—and you ran the exact same amount of usage full tilt for a month on the API, you would spend close to $3,000 in API usage for the same $200. Obviously, it’s kind of a money loser for Anthropic, but it gets people to use the product and very few people max it out 24 hours a day.
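To make the subscription-versus-API gap concrete, here is a rough back-of-the-envelope sketch. The per-million-token prices below are made-up placeholders for illustration, not Anthropic’s actual rates; check the provider’s current pricing page before relying on any numbers.

```python
# Hypothetical illustration of why fixed plans beat API pricing for heavy
# users. The per-token prices are placeholders, NOT real published rates.
def monthly_api_cost(tokens_in: int, tokens_out: int,
                     price_in_per_m: float, price_out_per_m: float) -> float:
    """Dollar cost for a month of API usage, given per-million-token prices."""
    return tokens_in / 1e6 * price_in_per_m + tokens_out / 1e6 * price_out_per_m

# Suppose a heavy coding month moves ~100M input and ~20M output tokens
# at placeholder rates of $15/M in and $75/M out:
api_cost = monthly_api_cost(100_000_000, 20_000_000, 15.0, 75.0)
subscription = 200.0  # fixed-fee plan, dollars per month
print(f"API: ${api_cost:,.0f} vs subscription: ${subscription:,.0f}")
```

With those placeholder rates, the same month of usage costs $3,000 on the API versus $200 on the subscription—the kind of gap the episode describes.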

Their usage limits were specifically 5:00 a.m. to 11:00 a.m. Pacific time. That’s where the data centers and the headquarters are. So, wherever you are on the planet, translate that to your local time zone. For us, it’s 8:00 a.m. to 2:00 p.m. Eastern Time. If you’re in Europe, it’s 1:00 p.m. to 8:00 p.m. European Time. During those times you get throttled.
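Translating the throttle window into your own time zone is simple with Python’s standard-library `zoneinfo`. The 5:00–11:00 a.m. Pacific window comes from the episode; everything else here is just a sketch.

```python
# Sketch: translate a provider's peak-usage window (5-11 a.m. Pacific,
# per the episode) into another time zone.
from datetime import datetime
from zoneinfo import ZoneInfo

def window_in(tz_name: str, date: datetime) -> tuple[str, str]:
    """Return the 5-11 a.m. Pacific window expressed in another zone."""
    pacific = ZoneInfo("America/Los_Angeles")
    start = date.replace(hour=5, minute=0, tzinfo=pacific)
    end = date.replace(hour=11, minute=0, tzinfo=pacific)
    local = ZoneInfo(tz_name)
    return (start.astimezone(local).strftime("%H:%M"),
            end.astimezone(local).strftime("%H:%M"))

# Eastern Time is three hours ahead of Pacific:
print(window_in("America/New_York", datetime(2026, 2, 5)))  # ('08:00', '14:00')
```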

Anthropic has—and this is common to a lot of providers—two different sets of limits. There’s what’s called a session limit, which is five-hour windows. I don’t know why five, but five-hour windows where, during that period of time, you have a limit to the amount of usage you can get out of the system on the fixed plan. By the way, this is true of Gemini and ChatGPT too. I have run into Gemini limits in the Trust Insights workspace in 20 minutes sometimes.

Katie Robbert – 05:01

Apologies for interrupting you, but just for context: when a lot of these models first came out, you would see people using Anthropic’s Claude chat on the web and it would say, “You have exceeded your session for the day, try again in a few hours.” That was very common for a lot of people to see upfront. So, just for context, that’s what we’re referring to—your session limits, not your overall weekly and monthly usage limits.

Christopher Penn – 06:17

Right. And that brings me to the second point, which is the weekly limit. You also get a budget of usage for the week. Now, here’s the thing about Anthropic: they don’t tell you what that unit is. They just tell you you’ve used 82% of your weekly plan. Yeah, but of what? We don’t know if it’s by token or by whatever.

Other providers will tell you—like it’s the number of requests you make. Requests can be small or big, but they’re going to basically give you a certain number of requests. You can’t go over that. So, that’s the context of what usage limits are with AI. It’s a cap to prevent you from melting someone’s data center. Every provider implements it differently. Some providers don’t tell you; some do and they’re different. If it’s pay-as-you-go on the API, then there functionally are no limits. The limit is what your credit card can tolerate.

Katie Robbert – 06:25

What’s interesting—I use Claude Desktop, specifically cowork, a lot, but it doesn’t tell me, “Hey, you’re almost at your limit,” or “Hey, you’ve exceeded your limit.” It’s just like, “You went over your limit, so here’s the extra money we’re going to charge you now.” There’s nothing you can do about it.

So, I’ve gotten more diligent about checking the limits a couple of times a day. Right now, for example, my current session is at 44% and it resets in about three hours. The weekly limit, which resets in about 19 hours, is at 73%. And that’s for me and you and John and Kelsey.

Christopher Penn – 07:04

Exactly. In fact, let’s just do a very quick tour to show where some of these limits live in case people aren’t familiar. If you’re in Claude, either in the desktop, the mobile app, or wherever, go to your settings and go to your usage menu. That’s exactly what Katie was talking about. Here’s your session limit, which is the five-hour window, and here’s your weekly limit.

Anthropic, to make things even more confusing, also tells you “all models” and just the Sonnet model, which is a little bit more efficient. There’s the extra usage. If that toggle is turned on and you have a credit card attached to your account, then it will spend money on the API. I actually turn that off for our account so that it doesn’t do that; it will just tell us “Warning.”

Christopher Penn – 07:48

If you’re in Google’s AI Studio, you’ll see their API usage and you can see what you’re doing there. A provider like Groq—G-R-O-Q, not Elon Musk’s shop—tells you your activity and your costs. This is API-based, so there’s no usage limit; it just tracks spending. It will tell you, “Hey, you’ve spent $2 so far in the last 30 days.”

Finally, in providers like Minimax, you can see the same five-hour limit there. I’m at 4% of my five-hour limit there and the weekly limit I’m at 25%, and that resets in about three days. In the consumer versions like Gemini and Workspaces, you get no warnings and there’s no place to check the usage. You just get a “come back later.”

Katie Robbert – 08:36

Yeah, so you can be in the middle of a really heavy project—you’re in a flow—and then all of a sudden, what the heck? You totally lose your train of thought. It’s very unhelpful. Chris, one of the things that you said is if you’re not hitting your usage limits, then what are you doing? I’m paraphrasing. You didn’t say that exactly, but that’s what I said you said.

Christopher Penn – 09:03

And that’s interesting. It’s a game that we play every week with our accounts to say, “Okay, well we’ve paid for all this usage, and if it resets in 22 hours and we’re at 16% usage,” you’re like, “Well, better use it somehow.” Hey, let’s write a murder mystery.

Katie Robbert – 09:22

I think the way for it to resonate with people: think of it like PTO. It’s “use it or lose it.” Nobody wants to lose their PTO, so they’re going to start squeezing in all those days at the end of the month, which is really annoying to your manager, but it’s your time.

Christopher Penn – 09:42

Exactly. So, there are several things that you should be doing to manage your AI usage limits. The first one Katie is going to talk about, which is intelligent planning—planning ahead to decide when to do things. The second is command line interface tools, which we’ll talk about. The third is drop-in replacements for some of the different tools—alternatives to look at.

So, Katie, let’s start with you on intelligent planning first. Maybe you don’t want to change providers; maybe you don’t want to juggle all the technology, and you still have these usage limits. What do you do to get around those limits knowing that there are periods of time when you’re going to burn your session limit if you do stuff during that time?

Katie Robbert – 10:20

I’m going to shock you all—ta-da! It’s the 5P Framework. I am a project manager at heart, and everything I do starts with “What am I doing? What is the purpose of this thing? Who’s involved? What is the process?”

Process is huge. I wrote a post about this the other day because the amount of work that I’ve gotten done in the past month is 10x what I’ve been able to do previously. But that’s because everything has a clear process, so I’m able to produce more while not burning up the usage. I’m not sitting there trying to figure out this, that, and the other—you know, which platforms to use. For every project, it’s likely bringing in Google Analytics data, HubSpot data, other research, or stuff from the Trust Insights website.

Katie Robbert – 11:10

Then there is the performance—what is the output meant to be? I always start with a plan; that is just my default state. So, I’ll boot up a new Claude cowork instance on the desktop app and I’ll say, “I have all of these things. I’m thinking about this thing. Before you take any action, let’s make a plan.” I want to make sure that it’s what I’m after and what I’m looking for.

This is true of anything you’re doing in software development, marketing, or operations: you should have a measurable plan first that has a very clear, repeatable process in it so that if you want to do it more than once, you can create high-quality things at a high volume.

Katie Robbert – 11:57

When Chris says intelligent planning, that’s what I’m doing. Over this past weekend, I actually developed a plan to fix up as much of the technical SEO on the Trust Insights website as we could. We’re a small team—it’s the three of us plus Kelsey—and we don’t have time or the resources to dedicate to that stuff, even though we know it’s important.

So, I took the site issues from our SEO tool, exported the data, and put together a plan with Claude cowork. We came up with four phases. Phase one: critical errors; phase two: structural issues; phase three: content metadata; and phase four: maintenance. I was like, “This is great.”

Katie Robbert – 12:54

That probably took me about 20 minutes to put that plan together with Claude cowork. I didn’t want to do this then; I wanted to be able to run it overnight. So, I said, “Let’s come up with scheduled projects.” It took each project plan and made its own task—a task within Claude cowork. It made its own project and basically said, “All right, whenever you’re ready to run these, hit run, or you can schedule it to run at a certain time.”

I personally chose to run them myself, knowing that I would need to be able to give permissions and that kind of thing. But I could set up at 7:00 on a Saturday night, give it the permissions, and then it just runs on its own.

Katie Robbert – 13:36

When I wake up the next morning, it’s done, and I haven’t really drained any of the usage limits. Because that plan was already built, I didn’t have to babysit it. I didn’t have to build the plan as I was going. I built the plan and basically handed it off to a capable SEO person and said, “Go execute it. Let me know what happens.”

Christopher Penn – 13:56

Exactly. Not to be a complete jackass, but this is something that I mentioned in my book, Almost Timeless. It’s called “Plan Big, Act Small.” You use a big model at some point to write a big plan, and then you can hand it off to a different model to implement, and you can save yourself a lot of time and get a lot of things done.

I do this a lot with coding, where I’ll use a model like Opus 4.6, which is a very smart model but very expensive in terms of usage limits, during off-peak hours. I’ll write the PRD, the spec, and the implementation plan, and then I will hand it off to a much smaller model like Minimax M2.7 to actually do the implementation.

Christopher Penn – 14:39

If you wanted to stay in one ecosystem, that’s how you do it. Follow Katie’s structure of making a strong plan and then hand it off for off-hours execution. You’re going to hit your limits slower.

Katie Robbert – 14:49

But the plan has to be there. Otherwise, you’re just handing off this vague set of instructions to an autonomous AI system, and goodness knows what it’s going to do to your system. We have a question: “To be clear, how could I also hand off between Claude and my Gemini Workspace account so I can use those tokens and get my knowledge work done?”

Christopher Penn – 15:15

That’s a good question. We’ll get to that in just a second. The second thing that you’ll want to do is install command line applications. Command line applications—when you’re using something like Claude cowork, Claude Code, etc.—have what are called MCP (Model Context Protocol) connectors. They are basically APIs for AI, but they are very token-intensive. They use a lot of tokens, which means you hit your usage limits faster.

If you install command line applications which run locally on your computer and are not AI, they can access the systems you want. A tool like Claude Code or Claude cowork could say, “I will pick up and just use the command line tool.” It runs the tool, does all the work, communicates data back and forth, and then hands the finished work product back to Claude cowork or Claude Code.

Christopher Penn – 16:08

Thus, you save a tremendous number of tokens—something like 10x—because the command line tool did the job and not the AI. A couple of examples: Google Workspace has the Google Workspace CLI. This is a tool that you can install—one command line tool that can access all of Google Workspace: Gmail, your calendar, your Google Sheets, and your Google Docs.

The other day, Katie and I were working on a business proposal and it looked like crap. I said, “You know what? I’m not going to sit here and make cowork click on things. That’s a waste of tokens.” Instead, I said, “Claude Code, use this command line tool and compare. Here’s a good version of the document with proper form. Here’s the hot mess I’m handing you. Make the hot mess look like the good version.”
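The token-saving pattern Chris describes can be sketched in a few lines: the agent shells out to a local, non-AI command line tool, and only the short finished result re-enters the model’s context. Here `echo` stands in for a real CLI like WP-CLI or a Workspace tool; the command is a placeholder, not a real workflow.

```python
# Minimal sketch of the CLI-handoff pattern: run a local tool, return only
# its final output so the model never burns tokens on the intermediate work.
import subprocess

def run_cli(command: list[str]) -> str:
    """Run a local CLI tool and return only its trimmed stdout."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout.strip()

# Placeholder command; in practice this might be a WP-CLI or gcloud call.
summary = run_cli(["echo", "42 posts updated"])
print(summary)  # the model only ever sees this short string
```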

Christopher Penn – 16:52

And it said, “Okay, I’ll download this. I’ll use this tool to download the schemas for both, compare them, and fix that.” It used the command line tool to upload a new version. This is literally 10x less usage because these command line tools don’t touch AI.

Katie Robbert – 17:14

I have a question, or maybe more of an observation. I feel like with this, and with anything else that you’re giving AI access to, you really should know what you’re doing if you’re giving this command line access to all of your Google Workspace. You don’t want to just do that all willy-nilly.

Christopher Penn – 17:39

No. When you do the setup the first time through, it gives you options like, “Do you want this to be read-only?” If you’re just getting started, say yes.

Katie Robbert – 17:49

John’s like, “Take all of my data, just go do it. Whatever.”

John Wall – 17:53

My inbox, go! I don’t have any things I have to answer. Hooray!

Christopher Penn – 18:01

Another example: I was working on an MCP server for WordPress for the longest time, and then I realized, “Why am I doing this? It’s completely stupid.” WordPress has its own command line tool. You install this locally and tell your AI agent, “Hey, WP-CLI is installed and it’s logged into my blog. Just go edit my blog this way instead.”

It will say, “Got it.” It picks up the tool and edits things. You don’t have to have cowork clicking on buttons on a page anymore. It can now programmatically access the entire site. Super handy. It saves token usage. Any opportunity you can to reduce token usage is better. To the question about how do you hand off between different systems…

Christopher Penn – 18:44

If you’re using a tool like Anti-Gravity and cowork, you should always be writing to a folder on your computer that you have given permission for these tools to work in. If you plan well and you have a good solid layout—a document layout—you can say, “Hey Claude, write this new plan to my Docs folder, my plans folder,” and then flip over to Anti-Gravity and say, “Hey, read my plans folder and execute this plan.”

Your disk drive effectively is the intermediary ground where these tools can pick up and hand off work to each other. It’s the digital equivalent of passing notes in class. You just have a place where the notes live in some kind of sensible order with a common structure.
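The “passing notes in class” handoff above can be sketched very simply: one tool writes a plan file into a shared folder on disk, and another tool picks it up later. The folder and file names here are illustrative, not a required convention.

```python
# Sketch of the disk-as-intermediary handoff between two AI tools.
from pathlib import Path

plans = Path("plans")
plans.mkdir(exist_ok=True)

# Tool A (say, the planning model) drops off the plan as Markdown:
(plans / "seo-fixes.md").write_text(
    "# Phase 1: critical errors\n- fix broken canonical tags\n"
)

# Tool B (say, the implementation agent) reads it later and executes:
plan = (plans / "seo-fixes.md").read_text()
print(plan.splitlines()[0])
```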

Christopher Penn – 19:35

I’ll give you an example. Anytime I’m starting a new project—in fact, let me create a new folder on my desktop here. We’ll just call it “Live Stream New.” I start with a completely blank folder, and then we just create folders in here.

This is my structure now—it’s not for everybody; you should do what works best for you. I have a “data” folder, which is where I put structured data. I have a “documents” folder for things like requirements documents. I have an “input” folder for unstructured data. I have a “logs” folder where the tools can write their logs. I have an “output” folder, a “source” folder for code, and a “temp” folder which is like a playground.

Christopher Penn – 20:25

I have a “test” folder, so when I’m writing software, it has a place to put tests. I also have pre-baked rules files for all of my major applications. I have our Ideal Customer Profiles, and this just gets copied in automatically with one command for every new project. Anytime that I want to talk to Co-CEO Katie, she’s in the “characters” file. I can just bring her up.

I have checklists for best practices—this is an example for presentations. These are the rules that an agent will use to create a good presentation. I have an orientation document that tells the AI, “Here’s where things belong; don’t put things where they don’t belong.” For each of the languages, I have a guide document.

Christopher Penn – 21:15

My one for Python says, “Here are your 12 core principles,” like “never reinvent the wheel.” If there’s an existing Python package, don’t write a new thing; just use the one that already exists. Stuff like this cuts down on token usage because it’s already pre-baked and proven. You’re giving it known good rules so it doesn’t have to think as hard.

So, the second and third principles of managing AI usage limits are: use command line utilities, use non-AI tools as much as you can, and have good governance internally. Even at the personal level, this is something I used to hate to do, but since I figured out how to automate it with Katie’s guidance, it’s now just baked into the process.
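Scaffolding that project layout in one step means no tokens are spent having the AI create folders. The folder names below come from the episode; the script itself is an illustrative stand-in for the shell version Chris mentions.

```python
# Sketch: create the episode's project folder layout in one command,
# without involving AI at all.
from pathlib import Path

FOLDERS = ["data", "documents", "input", "logs",
           "output", "source", "temp", "test"]

def scaffold(project: str) -> Path:
    """Create a new project root with the standard subfolders."""
    root = Path(project)
    for name in FOLDERS:
        (root / name).mkdir(parents=True, exist_ok=True)
    return root

root = scaffold("live-stream-new")
print(sorted(p.name for p in root.iterdir()))
```

From here, rules files, ICPs, and checklists can be copied in with a plain file copy rather than an AI action.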

Katie Robbert – 22:06

If I may, I can show my version because mine is not structured like yours. I actually make Claude build the folder for me. I don’t build the folder anymore. I’m like, “You do it.”

I have this one master folder on my desktop called “Claude cowork organized files,” and I just give Claude access to this one master folder. Every project I’ve ever worked on is in here. Everything I do is related to the company, so it makes sense for me to have everything grouped together. Whereas if I had different clients, I would set it up differently.

Katie Robbert – 22:50

Because everything relates back to the company, I have all of my reference and instructions similar to what Chris was talking about—outputs, “start here,” quick starts, AI agents, and so on. Every time I start a new project, the first thing I say to Claude is, “You’re going to create a new folder structure following what you have access to and name it this—whatever it is we’re working on.”

I make it do it now because I’ve already done it so many times and there’s so much reference material already. I’m like, “Make that part of your plan. You do it.” I actually had Claude clean up my desktop for me, too.

Katie Robbert – 23:43

It organized those files, which was the greatest hack ever—or just a great idea. I was like, “Can you do that?” and it’s like, “I absolutely can.” So stay tuned; it’s going to organize Google Drive next.

Christopher Penn – 23:56

It’s funny you mentioned that, because I have a very similar piece of code in a shell script to organize my folders. Again, what I want to do is not use AI for this because it consumes tokens. We want to use AI as little as possible, which sounds so counter-intuitive, but it’s how you do it.

So, number one: plan ahead. Number two: use command line tools. Number three: have great governance locally for yourself and your team to have pre-baked stuff so you’re not having AI reinvent the wheel and chewing up tokens. Number four is have the ability to switch providers and models.

Christopher Penn – 24:43

In AI, there are two concepts that are really important to understand: the harness and the model. The model is the engine; the harness is the rest of the car. Claude Code and Claude cowork are the harness. Opus, Sonnet, or Haiku are the models—the engines.

It turns out you could swap the engine pretty easily. Anthropic has its own dialect in the same way that if you’ve ever worked with OpenAI APIs, there’s an OpenAI dialect that was the standard for a very long time. Well, Anthropic has their own dialect that Claude Code and Claude cowork speak. Other providers are saying, “We now offer that dialect,” and you can use that dialect to seamlessly swap your tools.

Christopher Penn – 25:36

In Claude Code, one of the things you can do is edit the settings file to switch to a different provider. You keep your Claude Code the way you like it—with all your skills and plugins—and you just change the provider in the settings JSON file.

One of the providers that I use is Singapore-based Minimax. Minimax M2.7 is about as smart as Opus 4.5. That previous generation was the hottest thing since sliced bread until January when Opus 4.6 came out. A model that’s as smart as Opus 4.5 but is dirt cheap is amazing to use. Minimax offers what they call a “token plan.”

Christopher Penn – 26:29

You can see here in Minimax: there’s my five-hour usage window. They tell you, “You’ve used 23% of this five-hour window,” which is about a thousand requests, and the weekly limit—which resets in about three days—is 12,000 out of 45,000 used. In Minimax’s documentation, they explain the Minimax token plan.

Katie Robbert – 27:01

While he’s pulling that up: one of the first questions we got was, “Would love to hear how I can find a robust replacement for Claude cowork when I run out of usage.” I personally have not tried Minimax; that’s next on my list of things to install and play with. But Chris, you’re what I would probably call a power user at this point.

Christopher Penn – 27:23

Yes, that’s right. Minimax gives you instructions on how to modify Claude Code to use the Minimax model. This is something that, because I don’t want to use AI to do this and I don’t want to be constantly flipping back and forth, I had Minimax and Claude build me a little utility. The utility just takes the settings and swaps them back and forth.

Minimax has high usage hours too. Their high usage hours are typically between 8:00 a.m. and 6:00 p.m. Singapore time. They’re a 12-hour offset from us. Right now, it’s 1:00 a.m. there; this is the off-peak time. Their high usage time is our “we’re asleep” time.
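A little swap utility like the one Chris describes might look like the sketch below: it toggles a Claude Code-style settings file between provider configurations by rewriting its `env` block. The settings path, the endpoint URL, and the exact keys are placeholders; check the current Claude Code and Minimax documentation for the real values before using anything like this.

```python
# Sketch of a provider-swap utility: rewrite the env block of a
# Claude Code-style settings JSON. All endpoints/keys are placeholders.
import json
from pathlib import Path

PROVIDERS = {
    "anthropic": {},  # empty env block -> the tool's own defaults
    "minimax": {
        "ANTHROPIC_BASE_URL": "https://example-minimax-endpoint/anthropic",
        "ANTHROPIC_AUTH_TOKEN": "YOUR_MINIMAX_KEY",
    },
}

def switch(settings_path: Path, provider: str) -> None:
    """Point the settings file at the named provider's configuration."""
    settings = (json.loads(settings_path.read_text())
                if settings_path.exists() else {})
    settings["env"] = PROVIDERS[provider]
    settings_path.write_text(json.dumps(settings, indent=2))

switch(Path("settings.json"), "minimax")
print(json.loads(Path("settings.json").read_text())["env"])
```

Running `switch(..., "anthropic")` at night and `switch(..., "minimax")` during the workday mirrors the back-and-forth routine described in the episode.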

Christopher Penn – 28:17

During the workday, I’m on Minimax, and at night, I flip back to Claude because it’s after hours. Now, you can’t do this with Claude cowork; the Claude desktop app does not have that. However, there is an absolutely outstanding tool called OpenWork from OpenWork Labs. It is free, open source, and highly capable.

I would say it is a near-peer to Claude cowork. Remember what we said earlier: if you have a desktop folder that you’ve said is the folder you want it to work in, you can flip between Anti-Gravity, Codex, Claude Code, cowork, and now OpenWork. In the settings file here, you can connect your provider, and there are 133 different providers that you can choose from.

Christopher Penn – 29:04

If you want to use Claude Pro and Max, you can use it there. This also works with local AI. If you’ve got a beefy enough computer or your own little server, you can run LM Studio, KoboldCpp, or Ollama with a model like Qwen 2.5. You connect it to this, and then there’s no payment. It’s not as smart as the big foundation models, but for simple stuff, it’s smart enough.

Christopher Penn – 29:56

In this tool, you do exactly what you did previously: start a new local workspace, choose your folder, and then you have agents to work with. There’s the default agent, the planning agent (which is a coding planning agent), the building agent (which is a coding implementation agent), and then the OpenWork agent (which is their browser-use agent). If you wanted to have it take control of your browser and do stuff, that’s the agent to use. You can swap providers as you want inside OpenWork.

Katie Robbert – 30:37

I’m going to pause you for a second, Chris, because you just covered a lot of information. First and foremost, we have covered on previous live streams how to set up a local model. We’ve also talked about setting up Claude Desktop in general. You can go to TrustInsights.ai/youtube and go to the “So What” playlist to find all of those episodes.

This episode, if you’re watching, you can replay on our YouTube channel as well. Everything I’m hearing says that if you don’t have a solid plan, governance, or a clear process, then you’re just switching contexts, and it’s going to be a bigger mess than just trying to use one model.

Katie Robbert – 31:12

It sounds like if you’re hitting your limits with Claude Code, then Minimax is a really good alternative that you can switch back and forth between because their peak hours likely don’t align with ours. You can use that and perhaps schedule projects to work after hours and run autonomously in Claude during our off-peak hours. But again, it comes back to having really good planning and making sure you know exactly what to expect and what you’re giving permission to.

Katie Robbert – 32:08

For Claude cowork, if you’re running into peak issues, this open-source tool called OpenWork allows you to have a near-identical experience to Claude cowork. If you’re getting really adept at Claude cowork, OpenWork is a really nice alternative if you’re hitting your usage limits—provided you have really good planning and governance. John Wall, did you get all that? There is going to be a quiz.

John Wall – 32:44

I’m ready. The question I had: one of the latest rounds of spam that I get is vendors asking me, “Hey, do you need help managing your tokens?” That just seems like it makes the problem a lot more complicated, and you’re going to have to give people access to everything. I’ve completely been ignoring those. Is there any validity to any of that stuff?

Christopher Penn – 33:05

No. Get out of here.

Katie Robbert – 33:10

Someone said, “Thank you for always helping us. What is that YouTube channel again?” Go to TrustInsights.ai/youtube, go to the “So What” playlist, and you will find all of our background episodes of the live stream. The other question, Chris, was: “Does OpenWork have the same VM concept as cowork?”

Christopher Penn – 33:37

I believe it uses its own container, but it’s not a full VM, if I recall correctly. It’s essentially either a Mac container or a Docker container that it uses internally to partition itself so that it doesn’t just blow up your system.

Christopher Penn – 34:15

Here’s the big difference between Claude Code and Claude cowork: Claude Code operates in your system itself, which means it can do things like delete things you don’t want it to. Claude cowork creates a little Linux virtual machine that it operates in; it’s self-contained and it creates and destroys it on demand. That prevents less technical users from accidentally doing things like deleting their hard drives through bad prompting. It is its own little environment.

Christopher Penn – 35:11

OpenWork uses a container, which basically is its own miniature thing, but it’s not a full VM or a full server, so it still can access stuff in your environment. There are trade-offs to doing this. It makes it harder to run things like command line utilities from inside of a Claude cowork or an OpenWork container because it doesn’t have access to the local tools installed on your computer.

You have to provide that as part of skills to simulate it where possible. Whereas Claude Code can directly access the file system, which is much more dangerous but also much more powerful because it can directly talk to local apps installed on your machine and pick them up.

Katie Robbert – 35:53

Again, with the caveat: not great for experimentation. Excellent if you know what you’re doing and if you have a plan. I highly recommend starting with the Trust Insights 5P Framework. The goal is to help you outline a plan either really quickly or more in-depth.

If you’re just starting with these tools, use the 5P Framework to say, “What are the different things I need to think through?” You can even give the 5P Framework to the AI you’re using and say, “Help me think through a plan with this structure before we change anything—before I give you access to stuff.” I love how well the autonomous feature in cowork works. I’ve been giving it access to our website to let it do all of those technical SEO fixes. I’m not going to stay up until 3:00 a.m. doing it, but cowork doesn’t care what time it is.

Katie Robbert – 36:41

Making sure you have that plan in place so that when you give it autonomous access and it clicks buttons, it’s not going to do something where you’re like, “Oops, I didn’t have a backup of my website.” You never want to be in that position.

Christopher Penn – 37:10

Exactly. OpenWork, Anti-Gravity, and Codecs support the Skill format. If you’ve built skills in Claude cowork, you can copy-paste them, sometimes literally, into OpenWork. For example, we have the Trust Insights brand style guidelines, the Slide Maker, and our Landing Page Refresh plugins. You can copy and paste those into the OpenWork environment, and then it will be able to use them as well.

Katie Robbert – 37:43

It reminds me of questions that have come up. A friend was asking about the file type for skills. A lot of our files default to Markdown files—dot MD. She said, “Well, my Claude cowork is just saving them as dot-skill files.” I feel like it’s easier to share a Markdown file between systems than a skill file, but I don’t know if that’s true.

Christopher Penn – 38:27

In each skill folder, there is a file called SKILL.md. They’re all Markdown files. If it’s creating a “.skill” file, that’s weird; that’s not Anthropic’s own standard.
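A skill is just a folder whose entry point is a SKILL.md file: Markdown with a short YAML frontmatter block that tells the model when to use it. Here is a minimal sketch of what one might look like; the skill name and rules below are made up for illustration, not an actual Trust Insights skill:

```markdown
---
name: brand-style-guide
description: Apply brand voice and style rules when drafting public-facing copy.
---

# Brand Style Guide

When writing any public-facing copy:

- Use the approved color and typography names from the style guide.
- Keep the tone conversational and practical.
- Always spell the company name exactly as registered.
```

Because the whole thing is plain Markdown in a folder, sharing it between tools is mostly a copy-paste job, which is what makes the cross-tool portability discussed later possible.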

Katie Robbert – 38:42

Basically, the standard for Anthropic, and across different systems, is to make sure you have Markdown files. My understanding, Chris, is that a Markdown file is the easiest file type for a large language model to read—easier than a text file or a PDF. If you can export PDFs as Markdown, you’re going to be able to get the LLMs to read them faster and not burn tokens by trying to convert those files.
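The PDF-to-Markdown step Katie describes can be sketched in a few lines. This is an illustrative sketch, not a specific tool from the episode: the `pypdf` extraction call is an assumption (any text extractor would do), and the Markdown assembly is the part that saves the model from parsing PDF structure itself.

```python
from pathlib import Path

def pages_to_markdown(pages: list[str], title: str) -> str:
    """Join per-page extracted text into one Markdown document."""
    # Drop blank pages, separate the rest with blank lines.
    body = "\n\n".join(page.strip() for page in pages if page.strip())
    return f"# {title}\n\n{body}\n"

def pdf_to_markdown(pdf_path: str, md_path: str) -> None:
    """Extract text from a PDF and save it as Markdown."""
    from pypdf import PdfReader  # third-party: pip install pypdf
    reader = PdfReader(pdf_path)
    pages = [page.extract_text() or "" for page in reader.pages]
    Path(md_path).write_text(pages_to_markdown(pages, Path(pdf_path).stem))
```

Hand the resulting `.md` file to the model instead of the PDF and it reads the content directly, with no conversion tokens spent.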

Christopher Penn – 39:22

That’s correct. This is the other reason to look at Minimax. When you look at the pricing, the plus plan they have there—when billed annually at $200 a year—is roughly equivalent to the Claude Max 20x plan.

Think about that: for $200 a year instead of $200 a month, you can get the same amount of usage out of Minimax that you can out of Anthropic models. Anthropic’s models are very expensive. If you wanted to do a hybrid and only pay for Claude Pro—not Max—you could do your planning in Claude Pro, then take those plan files, flip to your Minimax subscription, and implement them with Minimax at a much lower cost. Reserve Claude for the stuff that requires the actual thinking and planning. That’s a great way to manage those usage limits. A lot of Pro users found that with the Pro plan and cowork, you hit your limit in 16 minutes.
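The cost gap behind the hybrid approach works out roughly like this, using the prices quoted in the episode (illustrative figures, not verified current pricing):

```python
# Rough annual cost math for the hybrid strategy described above.
# Prices are the ones quoted in the episode, not verified current pricing.
claude_max_monthly = 200              # Claude Max plan, per month
minimax_annual = 200                  # Minimax plus plan, per year

claude_annual = claude_max_monthly * 12   # annualize the monthly plan
savings = claude_annual - minimax_annual
ratio = claude_annual / minimax_annual

print(f"Claude Max: ${claude_annual}/yr vs Minimax: ${minimax_annual}/yr")
print(f"That's {ratio:.0f}x the cost, or ${savings} saved per year.")
```

Even if you keep a cheaper Claude Pro subscription for planning on top of Minimax, the combined bill stays well under a single Max subscription.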

Katie Robbert – 40:32

Again, it comes back to what you are doing in the first place. I’m all for a hybrid model and for saving money, but my first question is: what are we doing that we’re hitting the limits? Are we using the tools correctly, or are we just playing around? Have we thought through what the company is using it for to necessitate bringing more tools into the stack? If we bring more tools into the stack, can we put together an outline to say, “If you’re doing this, use this tool”? It really comes back to planning and governance.

Christopher Penn – 41:25

This is generally where things are right now. For doing text, all of the major models are great at it. If you’re doing coding, the smartest model is Opus 4.6. The second choice is Sonnet 4.6. Minimax is the open-weights model that’s really good.

If you’re doing video, Google Veo 3.1 is still the best. The open-weights model LTX 2.3 Pro just came out. For images, Nano Banana 2 is at the top of the list. ChatGPT Image 1.5 is high, and Flux 2 Dev Turbo is the best version of that. For text-to-speech, ElevenLabs is generally considered the best, although there’s a lot of competition. Google is second, and Voxtral from Mistral just came out and is open weights. I am doing a test of Voxtral right now to render an entire book as an audiobook to see how it sounds. If it turns out well, that costs no money at all because it can run on a Mac. For music, Suno version 5.5 just came out at the top of the charts.

Christopher Penn – 42:40

There is no open-weights model that can generate music—incredibly, it just isn’t there yet. For speech recognition, Nvidia’s Parakeet TDT2 is the best speech recognition model; it is also an open-weights model. Second is Coherent Transcribe. There is no reason to pay for a speech recognition model; the open-weights models are so good now. Finally, for agentic coworking: Claude cowork is best-of-breed, and OpenWork is the second choice. This is where things are as of today.

Katie Robbert – 43:12

I volunteer to take this and turn it into an Instant Insights for our website because I have a really good process. The whole thing will likely take me 10 minutes from you giving this to me to getting it live on the website, because we’re going to wait until 2:00 p.m. since that’s when Claude limits relax.

Katie Robbert – 43:34

My point being: once you have a really solid, repeatable process, you still pay attention to usage limits, but the amount of time you’re spending doing things is so quick. This is true in software, in life, and with AI. A repeatable process is going to help you win. I have a solid, consistent, repeatable process for putting up things like new Instant Insights, case studies, or landing pages. Hand it over; I’ll get that up fast and it’s all going to look consistent.

Christopher Penn – 44:20

Exactly. So, the four things you need to do to manage AI limits:

One: learn how to plan. Plan well and write great plans locally on your computer so that you can hand off plans from one tool to the next.

Two: reduce the AI’s workload. Have as much pre-written as possible so the AI is doing less work. Make sure you have command line tools installed on your system so you’re taking token usage away from the model and handing work to things that don’t need to consume tokens.

Three: internal governance and file structure.

Four: have great tools that you can swap in and out—being able to swap the model in Claude Code or swap the environment from Claude cowork to OpenWork. Whatever you’ve got to work with, be able to swap tools in and out.

Christopher Penn – 45:10

If you have a tool like Minimax, you build a great plan with Claude Opus at night when usage limits are more relaxed, and you implement with Minimax during the day. It’s the same idea as “coffee in the morning and martinis at night.” That’s how you work around these usage limits.

If you plan well, even on a budget, you can get a lot done. If Trust Insights were to say to me, “Chris, we’re no longer going to pay for any AI tools,” I would pay for the Minimax plus plan at $200 a year out of pocket. It is capable enough at that usage level that I could make do.

Katie Robbert – 46:08

That’s a really important distinction. It’s not $200 a month; it’s $200 a year—under $17 a month. A lot of people are getting their budgets squeezed, so you need to be thinking about why you’re using the tools in the first place.

Audit your tools; make sure you’re only paying for the ones that you’re using. Chris knows that once a month when I go through our Amex, I’ll say, “I thought we canceled that. Why am I still seeing that, sir?” It’s very easy to get swept up in buying multiple tools and forgetting about them. Process, process.

Christopher Penn – 47:04

Speaking of process: the question is, “When you ask Claude cowork to create the off-peak plan, do you explicitly grant all the permissions?” No, because you never want to use “YOLO mode.” YOLO mode goes very badly.

Instead, I built a skill that says, “Here is the PRD, here is the spec, and here is the permissions file for this environment. Alter the permissions file to permit as many safe operations as possible.” It goes in and rewrites the permissions file so it doesn’t need to ask the user for permission every six seconds. That skill can then change the permissions.
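The episode points to the actual skill in the Trust Insights GitHub repo; as an illustration only, the core move of that skill might look something like this sketch. The file name and schema follow Claude Code’s settings format (a `permissions` object with `allow` and `deny` lists in `.claude/settings.json`); the specific rules below are hypothetical examples, not the real skill:

```python
import json
from pathlib import Path

# Hypothetical allow/deny rules -- in practice these would be derived
# from the PRD and spec, not hardcoded.
SAFE_ALLOWS = [
    "Read(**)",              # reading project files is safe
    "Bash(ls *)",            # listing directories is safe
    "Bash(git status)",      # inspecting repo state is safe
]
DENY_ALWAYS = [
    "Bash(rm -rf *)",        # never pre-approve destructive deletes
]

def widen_permissions(settings_path: Path) -> dict:
    """Merge safe allow rules into an existing permissions file."""
    settings = json.loads(settings_path.read_text()) if settings_path.exists() else {}
    perms = settings.setdefault("permissions", {})
    allow = perms.setdefault("allow", [])
    deny = perms.setdefault("deny", [])
    for rule in SAFE_ALLOWS:
        if rule not in allow:      # avoid duplicate entries
            allow.append(rule)
    for rule in DENY_ALWAYS:
        if rule not in deny:
            deny.append(rule)
    settings_path.write_text(json.dumps(settings, indent=2))
    return settings
```

The point is the same as Chris’s: pre-approve what the plan says is safe, explicitly deny what is never safe, and the agent stops interrupting you for routine operations.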

Christopher Penn – 47:47

But you’ve got to have the PRD and the spec first, because it doesn’t know what to permit until you have those two documents. It doesn’t just have to be code; I write PRDs and specs for everything because it’s easier to hand it to the rest of the ecosystem to execute.

Katie Robbert – 48:06

I don’t have that skill that Chris has built. I’m the most risk-averse person on our team, so I still set up the project and manually grant the permissions. Once that’s set, the autonomous project is off and running, but I still like to double-check as the human. It comes down to how complex your project is and your comfort level with risk. John, what’s the first thing you’re going to do?

John Wall – 48:42

I’ve got to load Claude cowork and get it to work. I have the DMG on my desktop, so I am at square one.

Katie Robbert – 48:50

All right, John, we’re going to work on this with you. We’ll get it set up because I’ll share with you offline what I’ve been working on that affects you.

Christopher Penn – 49:01

Regarding the PRD permissions thing: I put it in our free Slack group, Analytics for Marketers. If you go to TrustInsights.ai/analyticsformarketers, you’ll get in there, and there’s a link to the GitHub repo for that particular skill.

Katie Robbert – 49:21

We covered a lot. If you want to know what we covered, go to TrustInsights.ai/youtube. We’ll have the video on our “So What” playlist, and you can get the transcript and all the things we covered—including Minimax and OpenWork. Make sure you have a plan, good governance, and apparently, a cup of coffee and a martini.

Christopher Penn – 50:03

Exactly—coffee in the morning, martinis at night. Thanks for tuning in, folks, and we’ll talk to you on the next one.

Christopher Penn – 50:05

Thanks for watching today. Be sure to subscribe to our show wherever you’re watching it. For more resources and to learn more, check out the Trust Insights podcast at TrustInsights.ai/tipodcast and our weekly email newsletter at TrustInsights.ai/newsletter. Got questions? Join our free Analytics for Marketers Slack group at TrustInsights.ai/analyticsformarketers. See you next time.



Get unique data, analysis, and perspectives on analytics, insights, machine learning, marketing, and AI in the weekly Trust Insights newsletter, INBOX INSIGHTS. Subscribe now for free; new issues every Wednesday!

Click here to subscribe now »

Want to learn more about data, analytics, and insights? Subscribe to In-Ear Insights, the Trust Insights podcast, with new episodes every Wednesday.


Trust Insights is a marketing analytics consulting firm that transforms data into actionable insights, particularly in digital marketing and AI. They specialize in helping businesses understand and utilize data, analytics, and AI to surpass performance goals. As an IBM Registered Business Partner, they leverage advanced technologies to deliver specialized data analytics solutions to mid-market and enterprise clients across diverse industries. Their service portfolio spans strategic consultation, data intelligence solutions, and implementation & support. Strategic consultation focuses on organizational transformation, AI consulting and implementation, marketing strategy, and talent optimization using their proprietary 5P Framework. Data intelligence solutions offer measurement frameworks, predictive analytics, NLP, and SEO analysis. Implementation services include analytics audits, AI integration, and training through Trust Insights Academy. Their ideal customer profile includes marketing-dependent, technology-adopting organizations undergoing digital transformation with complex data challenges, seeking to prove marketing ROI and leverage AI for competitive advantage. Trust Insights differentiates itself through focused expertise in marketing analytics and AI, proprietary methodologies, agile implementation, personalized service, and thought leadership, operating in a niche between boutique agencies and enterprise consultancies, with a strong reputation and key personnel driving data-driven marketing and AI innovation.
