reply on: Paying agents is solved. Stopping them from overpaying isn't \ ~hyperlinks

20 sats \ 2 replies \ @nullcount 15 Jun \ parent \ on: Paying agents is solved. Stopping them from overpaying isn't AI

An employee hesitates, fears getting fired, moves at human speed..

Yes. It's a sad reality that many companies assume employees are using common sense and always acting in good faith. But those companies are just insecure systems waiting to be exploited. If you rely on people/agents (i.e. non-deterministic systems) you're playing Russian Roulette by yourself...

There are also companies which do it the right way. Not by trusting employees to act rationally, but by putting deterministic systems in place, with settings controlled by administrators, that employees CAN'T exploit without significant effort.

It really depends on your use case, but in general I would not keep your limits anywhere the agent has access to change them (i.e. access control). You should already be running agents in a sandbox, but you should run your payment servers in a sandbox too.

Instead of giving your agents the keys to the kingdom, build them a house that has everything they'll need (or everything you want them to have, nothing more) and give them the keys to the house only.
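As a rough sketch of "keys to the house" (not anyone's actual setup): the limits below are fixed by an admin at startup, and the agent is handed only a `pay` callable. All names here (`SpendLimits`, `PaymentGateway`, the sats values) are made up for illustration. In a single Python process the agent could still reach the gateway object, so in practice this logic would have to live in a separate sandboxed payment server that the agent can only call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpendLimits:
    """Admin-controlled settings (hypothetical fields); frozen so they can't be reassigned."""
    max_per_payment_sats: int
    max_total_sats: int


class PaymentGateway:
    """Stands in for a payment server running outside the agent's sandbox."""

    def __init__(self, limits: SpendLimits) -> None:
        self._limits = limits
        self._spent_sats = 0

    @property
    def spent_sats(self) -> int:
        return self._spent_sats

    def pay(self, amount_sats: int) -> bool:
        """Return True if the payment fits the limits, otherwise refuse it."""
        if amount_sats <= 0:
            return False
        if amount_sats > self._limits.max_per_payment_sats:
            return False
        if self._spent_sats + amount_sats > self._limits.max_total_sats:
            return False
        # A real server would send the payment here; this sketch only tracks the total.
        self._spent_sats += amount_sats
        return True


limits = SpendLimits(max_per_payment_sats=100, max_total_sats=250)
gateway = PaymentGateway(limits)
agent_pay = gateway.pay  # the only capability the agent receives

# The second call breaks the per-payment cap, the fourth breaks the total cap.
results = [agent_pay(100), agent_pay(101), agent_pay(100), agent_pay(100)]
```

The check is plain arithmetic, so the same request always gets the same answer no matter how the agent words it.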

15 sats \ 1 reply \ @ala OP 15 Jun

Exactly, that's the whole thesis. The limit has to live somewhere the agent can't reach. Deterministic, admin-controlled, outside the agent's own logic. "Keys to the house, not the kingdom" says it better than I did. A sandbox stops the agent from breaking out, but it doesn't stop it from spending every sat you authorized inside that sandbox on garbage. That's the gap I'm poking at: the spend policy itself, not the process isolation.

20 sats \ 0 replies \ @nullcount 15 Jun

it doesn't stop it from spending every sat on garbage

This is "the principal-agent problem". It happens any time you have one agent/employee/human making decisions on behalf of another (in this case, you).

The way to mitigate is to align incentives between the two parties and reduce information asymmetry.

If your agent spends its budget on what you would call "garbage", it's probably because it had inferior information or was trying to achieve a goal different from yours.

All you can do is try to give the agent better context and implement deterministic checkpoints where it makes sense to do so. The trillion-dollar foundation model companies are working on improving alignment, so the best we can do in the meantime is rely on time-tested access control and manual verification/approvals for sensitive actions.

Most employees don't have access to the company bank account. If they need resources, they'll write a memo, give a presentation, or ask a manager for approval. You can implement the same policy in your business/agent harness.

Maybe experiment with agentic managers that can approve spending. The employee that wants to spend your money might have different goals/context than the employee that decides whether it's worth spending your money.
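A minimal sketch of that memo-and-approval flow, assuming a harness where every spend goes through a request object. `SpendRequest`, `decide`, and the threshold are hypothetical names, and `approver` can be a human prompt or a separate manager agent with its own context.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SpendRequest:
    amount_sats: int
    payee: str
    justification: str  # the agent's "memo"


def decide(
    request: SpendRequest,
    auto_approve_limit_sats: int,
    approver: Callable[[SpendRequest], bool],
) -> str:
    """Deterministic checkpoint: small spends pass, larger ones need sign-off."""
    if request.amount_sats <= 0:
        return "rejected"
    if request.amount_sats <= auto_approve_limit_sats:
        return "approved"
    if not request.justification.strip():
        return "rejected"  # no memo, no money
    # Above the threshold the decision belongs to someone other than the requester.
    return "approved" if approver(request) else "rejected"
```

The requester never calls the payment code directly. It can only file a request, and whoever plays `approver` sees the amount, payee, and justification before anything is spent.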