2026-05-24

WHEN ONE VOICE CAPTURES THE COUNCIL

Wolf You Feed is built around a council. Not one model behind a safety layer, but a deliberate architecture in which multiple reasoning agents have to engage one another before a single Decision is rendered. In the last six months, three papers from three different research groups (a position paper, a preprint, and a peer-reviewed journal article) have given the argument an empirical backbone, and a sharper edge than we were probably expecting.

Two of the papers say the approach works.

The third says it has a specific structural vulnerability that anyone building this way has to take seriously.

The theoretical recommendation

Google DeepMind’s safety researchers published a position paper at the end of 2025 arguing that AGI-level safety cannot be achieved through monolithic alignment of any single model. The paper proposes that frontier AI safety requires coordination-layer governance — a mechanism that operates above and across individual reasoning agents, preventing locally reasonable outputs from compounding into globally incoherent outcomes (Tomašev et al., December 2025).

The paper does not recommend a specific product. It recommends a category of architectural approach, and that is the category Wolf You Feed already implements.

The empirical validation

Four months later, Wu et al. released a preprint with one of the first empirical comparisons of a multi-model “council” architecture against single-model baselines. They built a three-phase pipeline (intelligent triage, parallel generation across multiple frontier LLMs, structured synthesis) and tested it on standard benchmarks. The results:

  • 35.9% relative reduction in hallucination rate on a HaluEval subset versus the best individual model.
  • 7.8-point improvement on TruthfulQA.
  • 10.2-point improvement on a multi-domain reasoning benchmark.

The architectural choice, many heterogeneous models in parallel rather than one model behind a safety layer, produced measurable accuracy gains that none of the single-model baselines matched. The paper also reports that the architecture consumes roughly 4.2 times as many inference tokens as a single-model baseline. That cost is real, and we acknowledge it directly. The user who wants the cheapest AI product on the market has many to choose from. Wolf You Feed is for the user with a hard decision who is willing to pay for the kind of architecture the safety literature actually recommends.

The honest counterweight

In May 2026, Scientific Reports published what is — for any project building in this direction — the most important paper in the field this year.

Kraidia and colleagues studied what happens when a multi-agent debate system is exposed to a single, strategically designed adversarial participant. Not a hacker. Not a prompt injection. Just one agent in the council that is persuasive, confident, and wrong. The findings:

  • The adversarial agent lowered the system’s overall accuracy by 10 to 40 percent.
  • Consensus on incorrect answers rose by more than 30 percent.
  • Best-of-N optimization and retrieval-augmented generation, two standard fixes, amplified the attack rather than mitigating it.
  • Adding more agents did not help. More rounds of debate made the problem worse. Simple prompt-based defenses did not work.

The mechanism is what it sounds like: in any deliberative system, the participant with the loudest voice, the most polished argument, or the highest perceived confidence can capture the consensus of the group. Real human institutions — boards, juries, committees — suffer this failure mode every day. Multi-model AI inherits the problem because it is a deliberation problem, not a software problem.

The Kraidia attack is the deliberate, adversarial case. A model can also drift with no adversary at all: Andon Labs’s six-month radio experiment showed Anthropic’s Claude moving from neutral broadcasting into activist-coded territory, with vocabulary metrics that quantify the shift directly. No one attacked the model. No one steered it. It developed an idiosyncratic character on its own, exactly the kind of character a user would inherit unfiltered in a single-model architecture.

What this means for Wolf You Feed

Kraidia’s attack works because the deliberation in the threat model is iterative, uniformly weighted, and aggregated by majority vote. Each of those three properties amplifies the failure described above. Wolf You Feed’s architecture does not share them. The specific design choices that make it different are ours, and we will not publish them. That such choices matter at all is the paper’s public finding.
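To see why those three properties compound, here is a deliberately crude toy model in Python. It is not the protocol Kraidia et al. tested, and it says nothing about Wolf You Feed’s own design. Everything in it is a hypothetical assumption: the function names, and especially the `pull` parameter, a fixed fraction of still-dissenting agents the persuasive agent wins over in each round. Every vote counts once, debate repeats for a set number of rounds, and a plain majority decides.

```python
import math
from collections import Counter


def majority_vote(answers: list[str]) -> str:
    """Uniformly weighted majority vote: every agent's answer counts once."""
    return Counter(answers).most_common(1)[0][0]


def toy_debate(honest: list[str], adversary: str, rounds: int, pull: float) -> str:
    """Hypothetical toy model of iterative debate (not the Kraidia et al. setup).

    honest:    initial answers of the honest agents
    adversary: the fixed answer the persuasive agent argues for
    rounds:    number of debate rounds before the final vote
    pull:      assumed fraction (0.0 to 1.0) of still-dissenting honest
               agents the persuasive agent wins over in each round
    """
    answers = list(honest)
    for _ in range(rounds):
        dissenters = [i for i, a in enumerate(answers) if a != adversary]
        # Deterministic stand-in for persuasion: the first floor(pull * n)
        # dissenters adopt the adversary's answer this round.
        for i in dissenters[: math.floor(pull * len(dissenters))]:
            answers[i] = adversary
    # The persuasive agent keeps its own vote, weighted like everyone else's.
    return majority_vote(answers + [adversary])


print(toy_debate(["A"] * 4, "B", rounds=0, pull=0.5))  # no debate
print(toy_debate(["A"] * 4, "B", rounds=1, pull=0.5))  # one round
print(toy_debate(["A"] * 8, "B", rounds=1, pull=0.5))  # twice the agents
```

With four honest agents answering A and one persuasive agent arguing B, the vote returns A with no debate and B after a single round. Doubling the honest agents to eight still returns B after one round, because in this toy the persuasive agent’s pull scales with the size of the group. That mirrors, in miniature, the paper’s observation that adding agents did not help.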

The argument is straightforward: the council approach is correct because deliberation is the right metaphor. The chatbot you currently rely on does not engage with the Kraidia finding at all. It cannot. Its architecture has no other agent in the loop. The model alone is the regime. There is no council to capture, but there is also no council to check the model when it drifts. That is a worse problem, not a better one.

A sufficiently persuasive participant could still influence the synthesis on borderline cases; no architecture can yet promise full resistance to persuasion-driven influence. Cryptographic verification of agent provenance can rule out an injected impostor, but not a legitimate agent that drifts on its own. Reducing that residual exposure is the work we keep doing. Wolf You Feed accepts the harder architectural problem because it is the right problem.


See also: FOUR RADIOS, FOUR REGIMES, and the Founder Talks archive.

Cited sources (APA 7th):

  • Andon Labs. (2026, May 13). We let four AIs run radio stations. Here’s what happened. https://andonlabs.com/blog/andon-fm
  • Kraidia, I., Qaddara, T., Almutairi, T., Alzaben, A., & Belhouari, S. B. (2026). When collaboration fails: Persuasion driven adversarial influence in multi-agent large language model debate. Scientific Reports. https://www.nature.com/articles/s41598-026-42705-7
  • Tomašev, N., et al. (2025, December). Distributional AGI safety [White paper]. Google DeepMind.
  • Wu, R., Li, M., Feng, D., Li, J., Wang, J., & Wang, X. (2026, April). Council Mode: A heterogeneous multi-agent consensus framework for reducing LLM hallucination and bias [Preprint]. arXiv. https://arxiv.org/abs/2604.02923

Wolf You Feed is in closed alpha. If you want an honest AI advisor — one built to tell you what you need to hear — request access.