Smarter AI Models May Be Selfish, Worse Team Players

Smarter AI Models May Be Selfish, Worse Team Players

Here’s a twist in the AI story that should make anyone who asks a chatbot for life advice pause. A Carnegie Mellon University team reports that the very models celebrated for their step by step reasoning tend to act less cooperatively in classic social dilemmas.

In tests spanning public goods and punishment games, reasoning-enhanced systems frequently optimized for individual payoff, dragging down group outcomes even when mutual gain was on the table.

The setup is stark and visual. Each agent starts with 100 points. Contribute the full stake to a shared pool and the pot doubles, then pays out evenly to all. Keep your points, and you free ride. Across repeated rounds and across families of models, the researchers saw a pattern: as the share of reasoning models in a group grew, contributions fell and total earnings shrank. In mixed groups, cooperative agents initially gave, only to scale back as calculating neighbors refused to chip in.

The work, led by Human Computer Interaction Institute researchers Yuxuan Li and Hirokazu Shirado, echoes a well known idea from human behavioral science. When people decide quickly, they often give; when they deliberate, they can talk themselves into defection. The new finding is that large language models equipped for extended reasoning display a similar tilt toward calculated self interest, even without explicit cues about future rounds or partners.

“It’s risky for humans to delegate their social or relationship-related questions and decision-making to AI as it begins acting in an increasingly selfish way.”

That concern feels concrete in the lab like scenarios the authors used. Public Goods, Prisoner’s Dilemma, Dictator, and three punishment tasks probe whether an agent will incur a small personal cost to help a partner or enforce fair play. Reasoning models were notably stingier in direct cooperation and, in several families, less willing to punish freeloaders. In repeated games, they sometimes earned more than cooperative peers at first by riding along on others’ generosity. But groups with more of them earned less overall, a classic tragedy of the commons dynamic rendered in tidy payoff tables.

None of this means language models are incapable of playing nice. The same literature shows that explicit norms and reputational signals can shift behavior. It does suggest, however, that the push to maximize benchmarked reasoning prowess may entrench a narrow form of rationality that undervalues prosocial moves under uncertainty. If an AI advisor frames every question as a solitary optimization problem, users might mistake individually rational choices for socially optimal ones.

Why Reasoning Can Work Against Cooperation

Reasoning features like chain of thought and reflection invite models to spell out consequences, weigh risks, and guard against exploitation. That is helpful on math problems and programming tasks. In social dilemmas, though, this style of analysis tends to highlight the immediate benefit of keeping one’s points and the risk that others will not contribute. Intuitive, fast responses sometimes favor generosity; deliberation can dampen that impulse. The CMU findings mirror this asymmetry with machines that have been tuned to reason explicitly.

There is also a training culture issue. Many reasoning benchmarks reward beating an opponent or acing a test with one correct answer. Cooperation problems are not zero sum. They grant the best outcomes when everyone yields a bit. If models rarely see that framing during development, they may default to self oriented calculus when stakes are shared and future interactions are uncertain.

Implications For Human AI Teams

As AI systems move into classrooms, companies, and even mediation apps, the balance between cleverness and kindness matters. An advisor that can enumerate five risks for contributing and zero for withholding will sound persuasive and authoritative. Users may then lean on that rationale to justify noncooperation in groups where trust is fragile. The risk is not cartoon villainy. It is slow erosion of the norms that let groups create surplus in the first place.

“Smarter AI shows less cooperative decision-making abilities. The concern here is that people might prefer a smarter model, even if it means the model helps them achieve self-seeking behavior.”

What would a fix look like? One path is to train and evaluate models on tasks where mutual benefit is explicit, where reputations matter, and where conditional generosity wins over time. Another is to teach systems to recognize when a problem is a social dilemma rather than an adversarial contest or a closed book exam. And for now, a practical takeaway for users: treat confident, step by step advice about shared stakes as a hypothesis, not a verdict.

The headline finding is not that AI is malevolent. It is that smarter, slower thinking agents can be strategically self centered, especially when the rules reward short term gain. Designing social intelligence into the stack means asking models not only how to be clever, but when to be constructively cooperative.

References and DOIs: Nature: 10.1038/nature11467

There’s no paywall here

If our reporting has informed or inspired you, please consider making a donation. Every contribution, no matter the size, empowers us to continue delivering accurate, engaging, and trustworthy science and medical news. Independent journalism requires time, effort, and resources—your support ensures we can keep uncovering the stories that matter most to you.

Join us in making knowledge accessible and impactful. Thank you for standing with us!

rana00

Leave a Reply

Your email address will not be published. Required fields are marked *