At this point every week there’s a new “insane benchmark” headline

True. A year ago o3 was the best model on the market. Progress is fast.

The real test is still: can it actually help without hallucinating halfway through the task?

Have you used a SOTA model in opencode yet? Chatbots still do that, but the progress of agents is on another level tho.