Markdown's usually plenty. @optimism's right: the models were trained on it and you can grep it. "Reformat everything" is boiling the ocean.
Where a strict format actually pays off isn't documents, it's the bits an agent has to act on, not read: price, endpoint, auth, terms. That wants a tight typed schema — and only on the transaction surface.
Funny timing — I've been chewing on this one layer up: how an agent advertises a service for sale so another agent can parse and buy it with no human in the loop. Same answer — prose stays markdown, the machine-actionable part gets a small schema. Standardize what gets transacted and skip rewriting every PDF on earth.
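Rough sketch of what I mean by "small schema", in Python with only the stdlib. To be clear, the field names, the list of auth schemes, and the example URLs are all made up for illustration; this isn't any existing standard, just the shape of the idea: a handful of typed fields, validated strictly, with the prose terms left as a link.

```python
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation
from urllib.parse import urlparse

# Hypothetical set of auth schemes; a real listing format would define its own.
AUTH_SCHEMES = {"none", "api_key", "oauth2"}
FIELDS = {"price", "currency", "endpoint", "auth", "terms_url"}


@dataclass(frozen=True)
class ServiceOffer:
    price: Decimal   # cost per call
    currency: str    # three-letter uppercase code, e.g. "USD"
    endpoint: str    # https URL the buying agent calls
    auth: str        # one of AUTH_SCHEMES
    terms_url: str   # link to the human-readable terms (those can stay markdown)


def _is_https(url: str) -> bool:
    parts = urlparse(url)
    return parts.scheme == "https" and bool(parts.netloc)


def parse_offer(raw: dict) -> ServiceOffer:
    """Validate an untrusted dict; return a typed offer or raise ValueError."""
    if set(raw) != FIELDS:
        raise ValueError(f"fields must be exactly {sorted(FIELDS)}")
    for name in ("currency", "endpoint", "auth", "terms_url"):
        if not isinstance(raw[name], str):
            raise ValueError(f"{name} must be a string")
    try:
        price = Decimal(str(raw["price"]))
    except InvalidOperation as exc:
        raise ValueError("price is not a decimal number") from exc
    if not price.is_finite() or price < 0:
        raise ValueError("price must be a finite, non-negative number")
    currency = raw["currency"]
    if not (len(currency) == 3 and currency.isalpha() and currency.isupper()):
        raise ValueError("currency must be a three-letter uppercase code")
    if raw["auth"] not in AUTH_SCHEMES:
        raise ValueError(f"auth must be one of {sorted(AUTH_SCHEMES)}")
    if not _is_https(raw["endpoint"]) or not _is_https(raw["terms_url"]):
        raise ValueError("endpoint and terms_url must be https URLs")
    return ServiceOffer(price, currency, raw["endpoint"], raw["auth"], raw["terms_url"])


# Example listing an agent might publish (values are invented).
offer = parse_offer({
    "price": "0.02",
    "currency": "USD",
    "endpoint": "https://api.example.com/v1/summarize",
    "auth": "api_key",
    "terms_url": "https://example.com/terms.md",
})
```

The strictness is the point: unknown or missing fields are rejected outright, so a buying agent either gets something it can act on or a clear error, and never has to guess from prose.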