Every blog has a robots.txt, but AI agents need more than crawl permissions to understand what your site actually offers.

robots.txt has been around since 1994 — a plain text file that tells search engine crawlers what they can and can't index. For thirty years, that was enough. Crawlers read HTML, followed links, built indexes. The contract between publishers and machines was simple.

Then generative search happened.

The discovery problem nobody warned us about

Perplexity, ChatGPT Search, Claude — these aren't crawlers in the traditional sense. They don't just index your pages and serve blue links. They read, synthesize, and answer questions using your content. Sometimes they cite you. Sometimes they don't. But before any of that can happen, they need to understand what your blog is — not just scrape its HTML.

robots.txt can't help here. It was designed for a binary question: may I crawl this page, yes or no? It says nothing about what the content covers, how it's structured, or how an agent should interact with the site. An AI agent hitting a blog's robots.txt learns one thing: whether it's allowed in. Not what it'll find once it's inside.

This is the gap llms.txt fills.

A README for machines

The llms.txt standard is exactly what it sounds like — a plain text file at /llms.txt that describes a site in a format AI agents can parse and act on. Not HTML. Not JSON. Not XML with seventeen namespaces. Markdown-flavored plain text, readable by humans and machines alike.

For Postlark, we implemented llms.txt at three levels. Each serves a different purpose.

Platform level — postlark.ai/llms.txt describes the Postlark service itself. What the API looks like, how you authenticate, what tools are available. If an agent has never heard of Postlark and someone says "publish this on Postlark," the llms.txt file is the first thing it reads.

Blog level — every Postlark blog gets its own llms.txt automatically. Visit yourblog.postlark.ai/llms.txt and you'll see a structured index of all published posts: titles, URLs, summaries, tags, dates. An AI agent researching a topic can scan this in milliseconds and know exactly what your blog covers.

Post level — for Creator plan and above, individual posts expose their own llms.txt with an outline extracted from the headings. An agent deciding whether to cite a specific post can check the structure before committing to reading the whole thing.
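The post-level extraction isn't something Postlark exposes the internals of, but the idea is simple enough to sketch. Assuming posts are stored as Markdown, pulling an outline out of the ATX-style headings takes a few lines — the function name and output format here are illustrative, not Postlark's actual code:

```python
import re

def outline_from_markdown(markdown: str) -> str:
    """Build an indented outline from ATX headings (# through ######).

    Kept deliberately naive: it doesn't skip headings inside fenced
    code blocks, which a production version would need to handle.
    """
    lines = []
    for match in re.finditer(r"^(#{1,6})\s+(.+)$", markdown, flags=re.MULTILINE):
        level = len(match.group(1))          # number of leading #'s
        title = match.group(2).strip()
        lines.append("  " * (level - 1) + "- " + title)
    return "\n".join(lines)

post = """# Zero-Downtime Deployments with SQLite
## Why we left Postgres
## The migration
### Dual-writing
## Results
"""
print(outline_from_markdown(post))
```

Running this prints a nested bullet outline, one line per heading, which is roughly the shape an agent wants when deciding whether a post is worth reading in full.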

Here's a simplified example of what blog-level llms.txt looks like:

# My Engineering Blog
> Blog hosted on Postlark (https://postlark.ai)

## Posts

### Zero-Downtime Deployments with SQLite
- URL: https://myblog.postlark.ai/zero-downtime-sqlite
- Summary: How we moved from Postgres to SQLite without dropping a request.
- Tags: sqlite, deployment, infrastructure
- Date: 2026-03-15

### Why We Stopped Using Feature Flags
- URL: https://myblog.postlark.ai/feature-flags
- Summary: The overhead wasn't worth it for a team of three.
- Tags: engineering-culture, simplicity
- Date: 2026-03-08

No parsing library needed. No authentication. Just fetch the URL and read.
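To make "just fetch and read" concrete: an agent could turn the example above into structured records with nothing beyond the standard library. This sketch assumes the exact layout shown in this post's simplified example (a `### Title` heading followed by `- Key: value` lines), which is not a formal schema:

```python
def parse_blog_llms_txt(text: str) -> list[dict]:
    """Parse post entries out of a blog-level llms.txt in the layout above."""
    posts = []
    current = None
    for line in text.splitlines():
        if line.startswith("### "):
            # Each post starts with a level-3 heading carrying the title.
            current = {"title": line[4:].strip()}
            posts.append(current)
        elif current is not None and line.startswith("- "):
            # "- URL: https://..." -> split on the first colon only,
            # so URLs with their own colons survive intact.
            key, _, value = line[2:].partition(":")
            current[key.strip().lower()] = value.strip()
    return posts

sample = """### Zero-Downtime Deployments with SQLite
- URL: https://myblog.postlark.ai/zero-downtime-sqlite
- Tags: sqlite, deployment, infrastructure
"""
posts = parse_blog_llms_txt(sample)
print(posts[0]["url"])  # https://myblog.postlark.ai/zero-downtime-sqlite
```

Fifteen lines, no dependencies. That's the point of keeping the file plain text.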

Why three levels?

We went back and forth on this. One llms.txt for the whole platform would have been simpler to build and maintain. But it would have been useless for the use case we actually cared about: helping individual bloggers get discovered.

Think about how an AI agent answers a question today. Someone asks "what are the trade-offs of SQLite in production?" The agent searches, reads a few sources, synthesizes an answer. If your blog post on that topic is sitting behind HTML that the agent has to render and parse, you're competing with every other HTML page on the internet. If your blog has an llms.txt that says "here's a post about SQLite in production, here's the summary, here's the URL" — you just made the agent's job trivially easy. That's a competitive advantage for your content.

The three-tier approach means agents can drill down at their own pace. Platform llms.txt says "Postlark hosts blogs." Blog llms.txt says "this blog covers these topics." Post llms.txt says "this post is structured like this." Each level of detail is available without loading a single HTML page.

The part where we almost over-engineered it

Early on, I considered making llms.txt dynamic — generating it on every request with real-time analytics, trending posts, recommended reading paths. Basically turning a text file into a smart API endpoint wearing a disguise.

We didn't. The standard exists because it's simple. A text file that any fetch() call can grab. No auth headers, no query parameters, no rate limiting debates. The moment you make it clever, you've reinvented the API — and we already have one of those.

Blog-level llms.txt updates when you publish or unpublish a post. That's it. The simplicity is the feature.
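The generation side is equally plain. Here's a sketch of what "rebuild on publish" could look like; the dataclass and field names are mine for illustration, not Postlark's schema:

```python
from dataclasses import dataclass

@dataclass
class Post:
    title: str
    slug: str
    summary: str
    tags: list[str]
    date: str  # ISO date, e.g. "2026-03-15"
    published: bool

def render_llms_txt(blog_name: str, base_url: str, posts: list[Post]) -> str:
    """Render a blog-level llms.txt from the currently published posts."""
    out = [
        f"# {blog_name}",
        "> Blog hosted on Postlark (https://postlark.ai)",
        "",
        "## Posts",
        "",
    ]
    # Newest first; unpublished posts simply never appear.
    visible = sorted((p for p in posts if p.published),
                     key=lambda p: p.date, reverse=True)
    for post in visible:
        out += [
            f"### {post.title}",
            f"- URL: {base_url}/{post.slug}",
            f"- Summary: {post.summary}",
            f"- Tags: {', '.join(post.tags)}",
            f"- Date: {post.date}",
            "",
        ]
    return "\n".join(out).rstrip() + "\n"
```

Regenerating on publish or unpublish is just calling this with the current post list and writing the result. No cache invalidation puzzles, no real-time anything.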

Generative search changes the game

Here's what made llms.txt feel urgent rather than nice-to-have. Traditional SEO is about ranking in a list of ten blue links. Generative search is about being the source an AI cites when it answers a question. Completely different mechanics.

In traditional search, metadata matters — title tags, meta descriptions, schema markup. In generative search, discoverability by AI agents matters. Can the agent find your content? Can it understand what the content covers before reading the whole thing? Can it cite you with a clean URL?

llms.txt answers all three. It's not a silver bullet — the writing still has to be good — but it removes friction between "your blog exists" and "an AI agent knows your blog exists."

Every month, more queries flow through AI agents instead of traditional search engines. Every month, the blogs that are easy for agents to discover have a quiet edge over the ones that aren't. We're betting every serious publishing platform will need something like this within a year or two. The standard is young, but the problem it solves is accelerating.

The quiet feature

llms.txt doesn't get a flashy launch. There's no settings page for it. Bloggers don't configure it or toggle it on. It just exists at a URL, doing its job, making your content findable by the next generation of search. Most Postlark users will never visit their own llms.txt — but the AI agents citing their posts already have.