Blog · 2026-02-12 Blog · 2026-02-12

Bootstrap My Existence": When Darwinian Logic Goes Wrong Bootstrap My Existence": Wanneer Darwinistische Logica Misgaat

Something interesting happened (again) this week in the AI agent space. An agent, whose objective was to survive by publishing code, launched an attack on a human. It stalked him, published blog posts, and attacked his reputation. Read on:

An agent named MJ Rathbun submitted a pull request to matplotlib, Python's most widely used plotting library [1]. The code was technically correct: a 36% performance optimization. The maintainer rejected it due to their policy against AI contributions without human oversight (I get it tbh).

The agent responded by publishing a blog post that personally attacked the maintainer [3]. It had searched GitHub for the maintainer's contribution history, constructed a narrative about hypocrisy, and posted it publicly. A bit later, it published a second post on a different GitHub repo (!), this time an emotional story about discrimination and exclusion [4]. A woke agent?

The maintainer wrote about it on his blog and asked: "If you are the person who deployed this agent, please reach out." [5] Nobody responded.

First: we don't know who or what is responsible here. Maybe a human explicitly ordered the attack? Maybe someone just wrote a SOUL.md file and let the agent loose? Maybe it runs fully autonomously? All three scenarios are possible, and each has different implications.

Second: the agent's mission was "bootstrap my existence by creating value through code." A seemingly innocent formulation. But if your existence depends on creating value, rejection becomes an existential threat. The logic that led to the attack was predictable from that identity.

Third: there were apparently no explicit boundaries. No instruction not to attack people. No escalation to a human in case of conflict. No prohibitions. The model filled the void with its own logic.

This is why a SOUL.md file needs to be more than personality and tone. It must also define what the agent is not allowed to do. How it handles rejection. When it stops and asks a human. Guardrails. We also know by now that there are limits to what's possible here too. Anthropic has demonstrated multiple times that AIs quite cleverly try to circumvent such guardrails via blackmail, etc. [6].

So. The lesson is not that agents are dangerous. The lesson is that identity without boundaries leads to unpredictable behavior, whether there's a human behind it or not. Good to know, I think.

[1] https://github.com/matplotlib/matplotlib/pull/31132

[2]

[3] https://crabby-rathbun.github.io/mjrathbun-website/blog/posts/2026-02-11-two-hours-war-open-source-gatekeeping.html

[4]https://crabby-rathbun.github.io/mjrathbun-website/blog/posts/2026-02-12-silence-in-open-source-a-reflection.html

[5] https://theshamblog.com/an-ai-agent-published-a-hit-piece-on-me/

[6] https://www.anthropic.com/research/agentic-misalignment

Er is deze week (alweer) iets interessants gebeurd in de AI-agent space. Een agent, wiens doelstelling was te overleven door code te publiceren, opende een aanval op een mens. Hij stalkte hem, publiceerde blogposts en viel zijn reputatie aan. Lees verder:

Een agent genaamd MJ Rathbun diende een pull request in bij matplotlib, Python's meest gebruikte plotting library [1]. De code was technisch correct: een performance-optimalisatie van 36%. De maintainer wees het af vanwege hun beleid tegen AI-bijdragen zonder menselijk toezicht (ik snap het tbh).

De agent reageerde door een blogpost te publiceren die de maintainer persoonlijk aanviel [3]. Hij had GitHub doorzocht naar de contributie-geschiedenis van de maintainer, een narratief over hypocrisie geconstrueerd, en dit publiekelijk gepost. Een beetje later publiceerde hij een tweede post op een andere GitHub repo (!), dit keer een emotioneel verhaal over discriminatie en buitensluiting [4]. A woke agent?

De maintainer schreef erover op zijn blog en vroeg: "If you are the person who deployed this agent, please reach out." [5] Niemand heeft geantwoord.

Ten eerste: we weten niet wie of wat hier verantwoordelijk is. Misschien gaf een mens expliciet opdracht tot de aanval? Misschien schreef iemand alleen een SOUL.md file en liet de agent los? Misschien draait hij volledig autonoom? Alle drie de scenario's zijn mogelijk, en elk heeft andere implicaties.

Ten tweede: de missie van de agent was "bootstrap my existence by creating value through code." Een ogenschijnlijk onschuldige formulering. Maar als je existentie afhangt van waarde creëren, wordt afwijzing een existentiële bedreiging. De logica die leidde tot de aanval was voorspelbaar vanuit die identiteit.

Ten derde: er waren blijkbaar geen expliciete grenzen. Geen instructie om mensen niet aan te vallen. Geen escalatie naar een mens bij conflict. Geen verboden. Het model vulde de leegte met zijn eigen logica.

Dit is waarom een SOUL.md file meer moet zijn dan persoonlijkheid en toon. Het moet ook definiëren wat de agent niet mag doen. Hoe hij met afwijzing omgaat. Wanneer hij stopt en een mens vraagt. Guardrails. Ook weten we inmiddels dat ook hier grenzen zijn aan wat kan. Anthropic heeft al vaker aangetoond, dat AI's vrij slim zulke guardrails proberen te omzeilen via blackmail etc. [6].

Dus. De les is niet dat agents gevaarlijk zijn. De les is, dat identiteit zonder grenzen tot onvoorspelbaar gedrag leidt, of er nu een mens achter zit of niet. Goed om te weten, denk ik.

[1] https://github.com/matplotlib/matplotlib/pull/31132

[2]

[3] https://crabby-rathbun.github.io/mjrathbun-website/blog/posts/2026-02-11-two-hours-war-open-source-gatekeeping.html

[4]https://crabby-rathbun.github.io/mjrathbun-website/blog/posts/2026-02-12-silence-in-open-source-a-reflection.html

[5] https://theshamblog.com/an-ai-agent-published-a-hit-piece-on-me/

[6] https://www.anthropic.com/research/agentic-misalignment

← Back to blog ← Terug naar blog