This is also a hot topic :) See responsible coding and technical writing with llms for more of the theoretical/ideological side; I strongly recommend reading that one before this one.
These notes are primarily about claude code, and don’t touch on social, economic, or ecological impacts. That’s not because I’m ignoring them; this post is just meant to be about “how to get work done without jamming a stick in your bike’s spokes”. I’ll write on those eventually.
This document will change over time.
useful links
(this section will expand later)
- Anthropic’s Claude Code documentation is very useful.
- Steve Klabnik has a great blog series on getting started with Claude Code
- Nicholas Carlini has a good talk on using LLMs for security research
my quick tips
(this section will expand later)
- The model mostly does what you tell it to do. Conversely, it doesn’t tend to do things you don’t ask it to do.
- You can ask it to do a code review of the code it wrote itself, and you probably should if you’re more “vibing” than not.
- You can ask it to do a security analysis of any code, and you probably should, even if it wrote it 30 seconds ago.
- You can ask it to refactor code that it wrote itself.
- Skills packages like superpowers incorporate much of this.
- Don’t be afraid of rewinding (which can also revert the codebase state). That’s Esc+Esc on Claude Code.
- Prompted “build this” and it’s building the webapp instead of the Android app? You’re wasting tokens and polluting context with unrelated information. Interrupt it (hit Esc), rewind back to your “build this”, edit it to “build this for android”, and re-send.
- Models can get stuck on a bad approach. If the model keeps doing bad things and you keep telling it so over and over, the accumulated context works against good output. Rewind back and prompt better where possible. Anthropic says to clear if you’ve corrected it more than twice; I’d say rewinding can be better if you’re in the middle of a longer feature development session.
useful skills packages
(this section will expand later)
- ed3d-plugins
- superpowers
- untested: learning-opportunities
careful use of context and token budget
but what is a context and a token budget
When you’re using a subscription, you’re given a token budget behind the scenes. This budget changes over time, downward more often than upward; exact numbers have been calculated in the past (likely outdated by now, but you can probably just have your LLM recalculate them). Different models use different amounts of it.
When you’re using the API, you’re spending directly on tokens, and costs are a lot more transparent (here’s Anthropic’s pricing).
(When running models locally, it of course works a bit differently.)
A useful concept to keep in mind is cached tokens (aka input cache, prefix caching, prompt caching). You may need to explicitly enable this if you’re using an unusual setup, but if you’re on a subscription and using an official harness, it’ll already be enabled.
Without diving too much into the technical aspect, let’s look at this just from an API perspective, taking current Sonnet 4.6 pricing as an example:
- You pay $3.00 for 1 MTok of input.
- You pay $3.75 for 1 MTok of cache writes with a 5-minute TTL ($6.00 for a 1-hour TTL).
- You pay $0.30 for 1 MTok of cache reads (which also refresh the cache TTL).
Imagine using an LLM as a chatbot. Every previous message (more or less) needs to be sent to the model as context. So if a new message is appended more often than every 5 minutes for the duration of the chat, you pay 25% more to cache each new message, but 90% less for all the previous ones. This extends pretty directly to agentic coding harnesses too, though there the model can be doing work even when it’s not outputting text visible to you (e.g. thinking, reading a file, writing a change).
So, if you have a 1M-token context that’s 90% used (900k input tokens), the next message you send costs you $0.27+ in input tokens if your cache is live; if it’s stale, it costs $2.70+ (or $3.38+ including the 5-minute cache write).
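The arithmetic above can be sanity-checked with a quick script. The figures are the Sonnet 4.6 API prices quoted above; treat them as a snapshot, since pricing changes over time:

```python
# Sonnet 4.6 API pricing (USD per million input tokens), as quoted above.
INPUT = 3.00           # uncached input
CACHE_WRITE_5M = 3.75  # writing input into the 5-minute cache
CACHE_READ = 0.30      # reading previously cached input (refreshes the TTL)

context_mtok = 0.9  # a 1M context that's 90% used: 900k tokens

print(f"cache live:             ${context_mtok * CACHE_READ:.2f}")      # $0.27
print(f"cache stale:            ${context_mtok * INPUT:.2f}")           # $2.70
print(f"stale + 5m cache write: ${context_mtok * CACHE_WRITE_5M:.2f}")  # $3.38
```

This is input cost only; output tokens (and any thinking) are billed separately on top.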
Hint
On subscriptions, Anthropic currently includes 5 minutes of caching on every input automatically, and doesn’t charge anything extra for cache reads/refreshes. This makes sense for them, as you’d want input caching on both chatbot and agentic uses anyway.
my tips on dealing with context and token budget
- Keep an eye on your context size. You can put it in your status bar, for example.
- On claude code, you can run `/context` to see a breakdown of what’s using up your context. You probably don’t want a massive CLAUDE.md and half the plugin marketplace installed.
- Don’t reuse a session unless relevant. On claude code, you can run `/clear` or `/compact`.
  - Knowing when to reuse a session and when not to is something you get a feel for eventually. I find that initial discovery and figuring out past work can be difficult for models at times; I tend to clear between features and between days.
- If discovery is taking too long at the start of each new session, have a `CLAUDE.md` or `AGENTS.md` that covers the important parts of the project. You can have claude start one with `/init`.
  - See Anthropic’s official best practices for CLAUDE.md.
  - You can just tell it to update the file later where appropriate.
  - CLAUDE.md is loaded into the prompt every time you work on the project, so keep it short, but keep the parts that are important for this project.
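For a sense of scale, a short CLAUDE.md might look something like this (the project, commands, and conventions below are entirely made up for illustration):

```markdown
# MyApp

Kotlin Android app. Modules: `app/` (UI), `core/` (shared logic).

- Build: `./gradlew assembleDebug`
- Test: `./gradlew test`
- Lint before committing: `./gradlew lint`

Conventions: coroutines over raw threads, no wildcard imports.
```

A handful of lines like this covers most of what a fresh session would otherwise spend tokens rediscovering.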
- You can also have it write memories (`~/.claude/projects/<project>/memory/`) or documents (`<actual_project>/docs/`).
  - The former is better for things relevant to you (e.g. where your local android sdk directory is); the latter is better for shared resources (e.g. links to resources, design specs). See Documentation in the other post though :)
  - You can tell it to recall these, or note down in CLAUDE.md that they exist and it’ll search there when appropriate.
- You can also have it write skill files if you don’t want to regularly explain to it how to do a common task within this repo.
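As an illustration, a claude code skill is just a directory containing a `SKILL.md` with a short frontmatter and instructions. The skill name and steps below are a made-up example, not from any real project:

```markdown
---
name: run-on-emulator
description: Build the app, deploy it to the local emulator, and capture a screenshot to verify a change
---

1. Build the debug build using the project's usual build command (see CLAUDE.md).
2. Install it on the running emulator.
3. Take a screenshot and inspect it to confirm the change looks right.
```

Once a skill exists, the model can pull it in when the task matches the description, instead of you re-explaining the procedure every session.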
- Don’t let your cache expire if you’re working actively. This is one of the biggest tips I have.
- See Sandboxing, that gets rid of a lot of permission prompts while not compromising on safety.
- Set up Notifications for permission prompts and idle status.
- Practical examples:
- I had claude repeatedly wanting to sleep for 300 seconds and then check on the progress of a long-running task that was going to take an hour. That would’ve naturally led to cache invalidation and re-processing the entire context 12 times! I told it to be wary of the cache TTL, and it reduced the sleeps down to 240s.
- When having it develop apps for a smartwatch, I gave it access to commands to run the emulator, view its logs, take a screenshot, and send button presses. I also had it write these down in its CLAUDE.md. That way, it didn’t regularly go idle asking me to test a change by entering a menu, taking a screenshot, and providing it back.
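To put rough numbers on the 300s-vs-240s sleep anecdote: with a 5-minute cache TTL, sleeping 300 seconds or longer means every wake-up re-processes the whole context at the uncached rate, while shorter sleeps pay only for cache reads. A simplified sketch (the 100k context size is hypothetical, and real costs also include output tokens and the cache-write surcharge):

```python
# Illustrative numbers: Sonnet 4.6 input pricing, hypothetical 100k-token context.
INPUT = 3.00       # $/MTok, uncached input
CACHE_READ = 0.30  # $/MTok, cache read (also refreshes the 5-minute TTL)
CACHE_TTL_S = 300  # 5-minute cache

context_mtok = 0.1  # 100k tokens of accumulated context (made up)
polls = 12          # checking in on an hour-long task

def polling_cost(sleep_s: int) -> float:
    """Input cost of all wake-ups: cache reads if each sleep stays under the TTL."""
    rate = CACHE_READ if sleep_s < CACHE_TTL_S else INPUT
    return polls * context_mtok * rate

print(f"300s sleeps: ${polling_cost(300):.2f}")  # cache expires every time -> $3.60
print(f"240s sleeps: ${polling_cost(240):.2f}")  # cache reads only -> $0.36
```

Same task, same number of checks, roughly 10x the input cost just from letting the cache lapse between polls.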
- Use subagents. That way, not everything adds onto your main context; instead, your main context gets back summaries from the subagents.
  - No great tips from me here yet; I’ve yet to practice much with these. You can tell claude to use subagents for a given task and it’ll do so. Various skill packages like superpowers also encourage claude to use subagents where appropriate.
- If you’ve seen e.g. `Explore(something)`, that’s one form of subagent it tends to dispatch by default.
- Use a language server plugin (LSP). In claude, there are some good ones in `/plugins`. You can make them apply only per project or per language, so that they don’t take up context in an unrelated project.
  - You’ll want to configure the language server on your system; it may require installing packages and writing config files. LLMs can also help with this.
- Use model context protocol plugins (MCPs) where appropriate. These are plugins that let it have access to tools and information in a more organized manner.
  - I had claude perform amazingly at reversing `.so` files from an android app with just a command-line disassembler, occasionally running ghidra headlessly and some java code to parse the output, but I have no doubt it would’ve taken much less time and effort if I had wired it up to the Ghidra MCP.
- Write MCPs where appropriate.
Here’s Anthropic’s page on “effective context engineering for AI agents”, if it helps. It’s less intended for actual end users, but it has some useful information regardless.