
What 81,000 AI conversations tell us about the future of understanding people

Antoine Ferrère·Co-founder & CEO
[Figure: Network visualization showing conversation data points with highlighted behavioral patterns]

Anthropic published something this week that deserves more attention than the headline numbers. They conducted 81,000 AI-led interviews across 159 countries and 70 languages, making it the largest qualitative study ever run. The insights about AI adoption are rich, but what really interests me is what the study says about the method itself. And the methodology appendix is where it gets interesting.

Their researchers observed that people were far more candid with the AI interviewer than they expected. Respondents shared grief, financial precarity, mental health struggles, relationship failures. Things that human interviewers rarely hear, even in well-conducted research. The Anthropic team attributed this, in part, to the fact that there's little social cost to vulnerability when the person on the other end isn't a person. Removing the social dynamic changes what people are willing to say. It tracks with the smaller sample of respondents on our platform, where the vast majority open up quickly and in surprising ways (as qualitative researchers have also told us).

This isn't a surprise if you've spent time in behavioral science. Social desirability bias is one of the most robust findings in the field. People adjust what they say based on who's listening, how they want to be perceived, what they think the right answer is. An AI interviewer sidesteps most of that. The 97.6% substantive response rate confirms something that those of us working in this space have suspected: the conversational format produces more authentic data than almost anything else available. It turns out that when we have conversations, a lot of the talking we do is to ourselves.

So the infrastructure question is settled. Can AI hold real conversations with real people, at scale, with quality that holds up? The answer is a resounding yes. And it's not just this Anthropic research; other studies published in the last few years point the same way. In late 2024, for example, a Stanford and Google DeepMind study conducted two-hour AI interviews with over a thousand people, then used those transcripts to build individual behavioral replicas that predicted each person's responses with 85% accuracy across social science experiments. The conversation captured enough about who someone is and how they think to build a working model of their behavior.

This is something I often think about, both at the Behavioral AI Institute, where we've been working to establish behavioral AI as a scientific discipline, and at lumenx, where we're building the applied version of these ideas. The relationship between AI and human understanding is becoming deeply intertwined. AI is getting better at reaching people. The open question is whether we're getting better at knowing what to do with what people tell us. And whether we actually have any idea ourselves about why we do the things that we do.

The question of question design.

Anthropic's interview had four core questions, including “if you could wave a magic wand, what would AI do for you?” and “has AI ever taken a step towards that vision for you?” These are good questions for their purpose, which was mapping public sentiment about AI. They produced rich, interesting data about hopes and concerns.

But there's a well-known distinction in behavioral science between the experiencing self and the remembering self. The remembering self is the one that answers reflective questions. It constructs coherent narratives, smooths over contradictions, and tends to give you what people believe about themselves rather than what actually drives their day-to-day behavior. When you ask someone to wave a magic wand, you're talking to the remembering self. You get aspirations, values, identity. You get “I want professional excellence” or “I worry about job displacement.”

The experiencing self is different. It lives in specific moments, specific contexts, specific frictions. It's the self that stopped taking medication because the bottle was in the kitchen and the morning routine changed after working from home. Ask that person why they don't adhere and the remembering self says “I forget sometimes.” The experiencing self, if you know how to reach it, tells you about the kitchen counter, the new schedule, the fact that nobody explained what happens when you stop.

Qualitative research, even very good qualitative research, talks almost exclusively to the remembering self. Open questions, reflective prompts, “tell me about your experience.” You get articulate, considered answers. And then you wonder why the insights don't predict what people actually do. The intention-behavior gap, where stated intentions predict roughly 30% of actual behavior, is partly a consequence of asking the remembering self to speak for the experiencing one.

From conversations to evidence.

What Anthropic's study demonstrates beautifully is that AI conversations can reach people at a depth and scale that wasn't previously possible. What it also illustrates, perhaps unintentionally, is that the quality of what you learn depends entirely on which questions you ask and how you interpret the answers. 81,000 conversations analyzed through classifiers and clustering algorithms produce themes and sentiment distributions. The same conversational infrastructure, paired with question design that's grounded in behavioral science and deliberately targets the experiencing self, produces something quite different: causal models, ranked behavioral drivers, specific friction points that you can actually intervene on. I appreciate that this might not have been Anthropic's goal, but if you sit in an organization, you care about decisions, not insights per se. And if you care about decisions and behaviors, you need to go beyond themes, sentiment, and summaries.
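To make the contrast concrete, here's a minimal sketch of the kind of theme-clustering pipeline the first approach implies. It's not Anthropic's actual pipeline; it assumes the sentence-transformers and scikit-learn libraries and a toy list of transcript excerpts I made up for illustration.

```python
# A minimal sketch of theme clustering over interview excerpts.
# Not Anthropic's actual pipeline; libraries and data are assumptions.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

excerpts = [
    "I hope AI gives me back time for the parts of my job I actually enjoy.",
    "I'm worried my role will be automated before I can retrain.",
    "AI already drafts my reports, which saves me hours every week.",
    "I don't trust it with anything that touches my patients.",
]

# Embed each excerpt into a dense vector space.
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(excerpts)

# Group excerpts into themes. k=2 suits this toy sample; at real scale
# the number of clusters is a modeling choice, not a given.
themes = KMeans(n_clusters=2, random_state=0).fit_predict(embeddings)

for theme, excerpt in sorted(zip(themes, excerpts)):
    print(theme, excerpt)
```

The output is a theme label per quote, which is exactly what this style of analysis gives you: a map of what was said, not why.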

That's the subtle thing we've been working on at lumenx, and it's harder than it looks. The conversation layer matters, but the real difficulty is in the science underneath. Knowing which questions reach the experiencing self. Knowing when a response is a rationalization and where to probe past it. Having an analysis layer that maps what people say onto the behavioral constructs that explain why they do what they do, with every finding traceable back to the moment someone said it.
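To illustrate the traceability idea, here's a hypothetical sketch of what a coded finding might look like as a data structure. The schema and field names are mine, invented for this sketch rather than taken from lumenx; the point is simply that the behavioral construct never detaches from the moment in the transcript that supports it.

```python
# Hypothetical illustration of findings that stay traceable to their
# source. The schema is invented for this sketch, not lumenx's data model.
from dataclasses import dataclass

@dataclass(frozen=True)
class CodedFinding:
    construct: str            # behavioral construct, e.g. "environmental friction"
    evidence_quote: str       # the respondent's own words
    transcript_id: str        # which conversation the quote came from
    turn_index: int           # which turn in that conversation
    is_rationalization: bool  # flagged when the statement reads as post-hoc

finding = CodedFinding(
    construct="environmental friction",
    evidence_quote="The bottle was in the kitchen and my morning routine changed.",
    transcript_id="interview-0042",
    turn_index=17,
    is_rationalization=False,
)
print(finding.construct, "<-", finding.transcript_id, "turn", finding.turn_index)
```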

The future of understanding people at scale is arriving faster than most organizations realize. The conversation format works. What you build on top of it determines whether you end up with a report of what people said, or with evidence you can act on.
