
Exactly, that's the whole thesis. The limit has to live somewhere the agent can't reach. Deterministic, admin-controlled, outside the agent's own logic. "Keys to the house, not the kingdom" says it better than I did. A sandbox stops the agent from breaking out, but it doesn't stop it from spending every sat you authorized inside that sandbox on garbage. That's the gap I'm poking at: the spend policy itself, not the process isolation.
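To make "the spend policy itself" concrete, here is a rough sketch of a deterministic cap that sits between the agent and the wallet. Everything in it is hypothetical (`SpendPolicy`, `SpendGuard`, `PolicyViolation` and the sat amounts are made up for illustration), and in a real setup the guard would have to run in a separate process or service that the agent has no way to modify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpendPolicy:
    """Hypothetical limits set by the admin, not by the agent."""
    max_per_payment_sats: int
    max_total_sats: int


class PolicyViolation(Exception):
    """Raised when a requested payment would break the policy."""


class SpendGuard:
    """Checks every payment against the policy before any money moves."""

    def __init__(self, policy: SpendPolicy) -> None:
        self._policy = policy
        self._spent_sats = 0

    @property
    def spent_sats(self) -> int:
        return self._spent_sats

    def authorize(self, amount_sats: int) -> None:
        """Record the payment if it fits the policy, otherwise raise."""
        if amount_sats <= 0:
            raise PolicyViolation("amount must be positive")
        if amount_sats > self._policy.max_per_payment_sats:
            raise PolicyViolation("payment exceeds the per-payment cap")
        if self._spent_sats + amount_sats > self._policy.max_total_sats:
            raise PolicyViolation("payment exceeds the total budget")
        # Only count the payment once every check has passed.
        self._spent_sats += amount_sats
```

Note what this does not do: it bounds how much can be lost, but it has no opinion on whether a given payment is garbage.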

it doesn't stop it from spending every sat on garbage

This is the principal-agent problem. It happens any time one agent, employee, or human makes decisions on behalf of another (in this case, you).

The way to mitigate it is to align incentives between the two parties and reduce information asymmetry.

If your agent spends its budget on what you would call "garbage", it's probably because it had inferior information or was trying to achieve a goal different from yours.

All you can do is try to give the agent better context and implement deterministic checkpoints where it makes sense to do so. The trillion-dollar foundation model companies are working on improving alignment, so the best we can do in the meantime is rely on time-tested access control and manual verification or approval for sensitive actions.

Most employees don't have access to the company bank account. If they need resources, they'll write a memo, give a presentation, or ask a manager for approval. You can implement the same policy in your business/agent harness.

Maybe experiment with agentic managers that can approve spending. The employee who wants to spend your money might have different goals or context than the employee who decides whether it's worth spending your money.
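A minimal sketch of that "write a memo, get approval" flow, assuming a harness where the spending agent can only submit requests and a separate approver decides. All names here (`SpendRequest`, `review`, `cautious_manager`) and the thresholds are hypothetical; the approver could be a human prompt or a second agent with its own context.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SpendRequest:
    amount_sats: int
    memo: str  # the agent's justification, like an employee's memo


# An approver is anything that looks at a request and says yes or no.
Approver = Callable[[SpendRequest], bool]


def review(request: SpendRequest, auto_approve_below_sats: int,
           approver: Approver) -> bool:
    """Deterministic checkpoint: small spends pass, the rest need sign-off."""
    if request.amount_sats < auto_approve_below_sats:
        return True
    return approver(request)


def cautious_manager(request: SpendRequest) -> bool:
    """Stand-in approver: needs a memo and refuses anything over 10,000 sats."""
    return bool(request.memo.strip()) and request.amount_sats <= 10_000


small = SpendRequest(amount_sats=50, memo="")
large = SpendRequest(amount_sats=5_000, memo="API credits for the data job")
huge = SpendRequest(amount_sats=50_000, memo="trust me")

print(review(small, 100, cautious_manager))
print(review(large, 100, cautious_manager))
print(review(huge, 100, cautious_manager))
```

The point of keeping `approver` a plain callable is that the party asking for money and the party approving it stay separate, so they can have different goals and context.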
