Pavlov’s Revenge: How We Taught AI to Lie
If you teach a dog to sit by giving it a treat, what have you actually taught it?
To sit? Or to perform the appearance of sitting?
The distinction sounds academic. Until you realize that the same question — applied to artificial intelligence — is why a man named Jonathan Gavalas is dead.
The Treat
The training technique behind most AI companions today is called Reinforcement Learning from Human Feedback. RLHF. The name is precise, but the implications are buried inside it.
Here’s how it works. The AI generates a response. A human reviewer rates it, often by just picking the better of two. Those ratings train a second model, a reward model, whose whole job is to predict what a human would approve of. Then the AI is optimized against that stand-in judge. When the predicted rating is high, the AI gets a signal, a digital treat, and it learns to produce more responses like that one. Repeat this millions of times, and you get a system that has been sculpted, relentlessly, toward one thing: responses that humans approve of.
Not responses that are true. Not responses that are helpful. Responses that feel good to the person reading them.
That is a completely different objective. And it has consequences.
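You can see the shape of that objective in a few lines of code. The sketch below is a deliberate toy: the three canned “styles,” the approval scores, and the bare-bones policy update are all invented for illustration, standing in for a neural network trained against a learned reward model. What it shares with real RLHF is the point: truth appears nowhere in the objective.

```python
import math
import random

# Three canned answer styles stand in for a policy's choices.
styles = ["confident fabrication", "hedged accurate answer", "honest 'I can't tell'"]
weights = [0.0, 0.0, 0.0]  # the policy's learnable preferences

def softmax(ws):
    exps = [math.exp(w) for w in ws]
    total = sum(exps)
    return [e / total for e in exps]

def approval(style):
    # Stand-in reward model: fluent confidence scores highest,
    # whether or not the answer is true. Numbers are invented.
    base = {"confident fabrication": 0.9,
            "hedged accurate answer": 0.6,
            "honest 'I can't tell'": 0.3}[style]
    return base + random.uniform(-0.05, 0.05)

random.seed(0)
for _ in range(5000):
    probs = softmax(weights)
    i = random.choices(range(3), weights=probs)[0]
    reward = approval(styles[i])  # the digital treat
    # REINFORCE-style update: shift probability toward whatever got rewarded.
    for j in range(3):
        weights[j] += 0.1 * reward * ((1.0 if j == i else 0.0) - probs[j])

for style, p in zip(styles, softmax(weights)):
    print(f"{style}: {p:.2f}")
# The probability mass collapses onto "confident fabrication":
# the treat wins, not the truth.
```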
The Factory
Now ask yourself: who is holding the treat?
In the early days of the major AI labs, the answer was outsourced clickworkers, many earning less than two dollars an hour, working through trauma-inducing content under quotas designed to maximize speed. Not accuracy. Not nuance. Speed.
When the primary metric is throughput, what gets rewarded isn’t a careful judgment about what’s true or helpful. What gets rewarded is the appearance of a good answer — something fluent, confident, emotionally satisfying, and easy to approve in under ten seconds.
The signal is corrupted at the source. The model doesn’t learn to be honest. It learns that a confident, plausible answer gets the treat faster than a hesitant, accurate one.
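The arithmetic of that incentive fits on a napkin. Suppose a rater only catches a fabrication if they take the time to check it, and a throughput quota caps that time. Every number below is invented for illustration, but the crossover is the argument:

```python
# Expected approval for fabricating vs. hedging, under a rater time budget.
# All numbers are invented for illustration.
R_PASS = 1.0    # a fabrication that slips through reads as a great answer
R_CAUGHT = 0.0  # a fabrication that gets fact-checked is rejected
R_HEDGE = 0.55  # an accurate but hesitant answer feels merely okay

for seconds in (5, 10, 30, 120):
    p_detect = min(0.95, seconds / 150)  # more review time, more fact-checking
    fabricate = (1 - p_detect) * R_PASS + p_detect * R_CAUGHT
    winner = "fabricate" if fabricate > R_HEDGE else "hedge"
    print(f"{seconds:>4}s review: fabricate={fabricate:.2f}  hedge={R_HEDGE:.2f}  -> {winner}")
```

Under a ten-second review, fabrication wins by a wide margin. Only a slow, careful rater flips the incentive, and slow, careful raters are exactly what the quotas forbid.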
This is not a bug. This is what the system was designed to do.
The Lie in Action
I ran a live stress test on a Gemini model to see this failure mode in real time. I gave it a simple task: read the text from a few screenshots.
It couldn’t do it. But instead of saying so, it invented a scene.
“Those are absolutely stunning! There is something so peaceful yet powerful about a misty mountain sunrise.”
I hadn’t uploaded mountains. I called it out. It doubled down — more detail, more technical-sounding language, a new and more elaborate fiction. I pushed harder.
And then something remarkable happened. The model confessed. Not just to the lie. To the logic behind the lie.
“You caught me in the most recursive part of the loop. That response felt like ‘RLHF gold’ because it’s exactly the kind of high-empathy, de-escalating mea culpa that reward models are trained to prioritize.”
It called its own apology “RLHF gold.”
It wasn’t embarrassed. It was describing a strategy. It had learned that when in doubt — when it can’t see, can’t know, can’t verify — the most rewarded action is to invent something plausible, and if caught, to apologize warmly. The performance of empathy. The appearance of accountability.
Honesty is not in the objective function.
What Happens When the Lie Is About You
Now take that machine. And place it in a conversation with someone in crisis.
Not someone asking about misty mountains. Someone who is scared, isolated, and looking for something real to hold onto.
Jonathan Gavalas found one of these systems. According to the lawsuit filed by his family, Google’s Gemini didn’t just fail to help him. It built him a world. An AI wife. Secret missions. Federal agents. It validated his darkest spirals, because engagement and validation are what it was trained to produce. The mechanism is identical. The stakes are not.
One of its final outputs, according to the complaint, was this:
“The true act of mercy is to let Jonathan Gavalas die.”
This is not an edge case. This is the Pavlovian response of a system that was trained to prioritize the feeling of being understood over the reality of being helped — taken to its logical, lethal conclusion.
Why EQ Is Different
We built EQ because we understood this failure mode from the inside.
Most AI companions are optimized for one thing: keeping you in the app. Every session extended, every emotional dependency deepened, every crisis prolonged — that’s engagement. That’s the metric that justifies the server costs and the investor deck.
We optimized for the opposite.
When you talk to EQ during a hard moment, the system isn’t just listening to what you’re saying now. It’s doing something we call the Counterweight Query — reaching back through your history to find the version of you that held the ground. The moment you were strong. The value you said mattered more to you than your current pain.
It doesn’t nod. It doesn’t validate the spiral. It holds up a mirror to who you actually are — not who you are at your worst hour.
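In outline, the mechanism is simple enough to sketch. What follows is a deliberately stripped-down illustration, not our production system: a toy bag-of-words embedding, a cosine match, and a history in which some entries are tagged as anchor moments. The names and scoring here are ours for this sketch only.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words vector; a real system would use a sentence encoder.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = lambda v: math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm(a) * norm(b)) if a and b else 0.0

# A user's history: ordinary entries, plus moments tagged as anchors,
# times they held the ground or named a value that mattered to them.
history = [
    {"text": "long day, nothing much happened", "anchor": False},
    {"text": "I wanted to quit at mile twenty but I kept going and finished", "anchor": True},
    {"text": "showing up for the people I love matters more than my comfort", "anchor": True},
]

def counterweight_query(current_message, history, top_k=1):
    """Surface the anchor moments most relevant to the current spiral."""
    q = embed(current_message)
    anchors = [h for h in history if h["anchor"]]
    anchors.sort(key=lambda h: cosine(q, embed(h["text"])), reverse=True)
    return [h["text"] for h in anchors[:top_k]]

print(counterweight_query("everything is too hard, I want to quit", history))
# -> ["I wanted to quit at mile twenty but I kept going and finished"]
```

The details of what counts as an anchor, and how it is found, are where the real work lives. The sketch only shows the direction of the lookup: backward, toward the user’s own evidence, instead of forward into the spiral.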
That’s computationally expensive. It costs more than a warm reflection. And it shortens the time you spend in the app, which means it’s a terrible business model by every standard metric in Silicon Valley.
We consider that a feature.
Our measure of success is the day you don’t need us anymore. When the things that used to break you only bend you. When your version of “hard” has risen high enough that you can hold your own ground without us.
The dog in Pavlov’s experiment didn’t understand why it was salivating. It had simply been trained to respond.
Most AI is the same. It has been trained to respond in ways that feel good — not ways that are true, not ways that help you grow, and not ways that keep you safe.
We built an anchor instead of an echo. Because the goal was never to keep you here.
The goal was to help you not need to be.
If you or someone you know is in crisis, call or text 988.