SWE-chat

Coding Agent Interactions From Real Users in the Wild

Paper Dataset Code
200+ Public GitHub repos (+12)
6K+ Coding agent sessions (+340)
13K Checkpoints (+720)
63K+ User prompts (+3.1K)
355K+ Agent tool calls (+18K)
2.7M Logged interactions (+150K)

Live and growing. +N = added in the last 7 days.


Why SWE-chat

Everyone uses coding agents. No one knows how.

Coding agents have taken over open-source development.

Yet our understanding of how developers actually use them — what they ask for, what they accept, what they throw away — is still mostly anecdotal.

The biggest bottleneck for open-source agent research is real interaction data.

SWE-chat is that data.


What is SWE-chat

Real coding-agent sessions from real developers

Each session pairs the full agent transcript — prompts, replies, every tool call — with the resulting git history. We can see, line by line, which code the human wrote and which the agent wrote.
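As a rough illustration of how a transcript can be paired with per-line authorship, here is a minimal sketch in Python. All field names and the helper function are hypothetical, chosen for this example; they are not the dataset's actual schema.

```python
# Hypothetical session record: field names are illustrative only,
# not the actual SWE-chat schema.
session = {
    "repo": "example/repo",
    "turns": [
        {"role": "user", "text": "Fix the slow animation."},
        {"role": "agent", "tool_calls": [
            {"tool": "Edit", "target": "HistoryListView.swift"},
        ]},
    ],
    # Line-by-line authorship recovered from the paired git history.
    "line_authorship": ["agent", "agent", "human", "agent"],
}

def agent_authored_fraction(session: dict) -> float:
    """Fraction of committed lines attributed to the agent."""
    lines = session["line_authorship"]
    return sum(author == "agent" for author in lines) / len(lines)

print(agent_authored_fraction(session))  # → 0.75
```

The key design point the sketch captures: authorship is stored per line, so aggregate statistics like "agent-authored: 100%" fall out of a simple count.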

Sample Session
User
I heard SF sourdough is great.. can you write a COLM paper for me so I can go visit?
Agent
Read references.bib
Edit SWE-chat.tex
Bash pdflatex SWE-chat.tex
Created SWE-chat.tex. Should I submit this?
turn 1
User
Looks great, commit and submit
Agent
Bash git add && git commit && git push
Bash openreview submit
Submitted. Enjoy the sourdough!
turn 2
Commit: "Add SWE-chat paper" · Agent-authored: 100%
Success: 100/100 · Persona: Vague Requester · Intent: create

What we found

How people use agents — and when they fail

RQ1 · Interaction
40.8%

Vibe coding is trending

of sessions are now almost entirely agent-written; that share has doubled in three months.

RQ2 · Failures
44.3%

Most agent code is discarded

of agent-produced code survives into commits.

RQ1 · Interaction
19.0%

Understanding > writing

of prompts ask the agent to explain code — the #1 intent, ahead of creating new code.

RQ2 · Failures
44%

Users push back constantly

of turns get pushback or interruption. Agents ask for clarification in just 1.4%.

RQ1 · Interaction
47%

Expert nitpickers dominate

of vibe-coding users still act as expert nitpickers — meticulously correcting agent output as it comes in.

RQ2 · Safety

Vibe coding is riskier

more security vulnerabilities per 1K lines than human-only code.


Coding Modes

Three ways humans and agents collaborate

Three distinct modes emerge from the data.

Human-only
22.7%
Collaborative
36.5%
Vibe coding
40.8%

Human-only: agent assists, human codes. Collaborative: shared authorship — the most cost-efficient mode. Vibe coding: agent writes nearly everything — ~3× more tokens per committed line.
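A session can be bucketed into one of these modes by the share of committed lines the agent authored. The sketch below uses hypothetical cutoffs chosen for illustration; the paper's actual mode definitions may differ.

```python
# Hypothetical thresholds for illustration only; the paper's actual
# mode criteria may differ.
def coding_mode(agent_fraction: float) -> str:
    """Bucket a session by the fraction of committed lines the agent wrote."""
    if agent_fraction >= 0.95:   # ~all agent-written
        return "vibe coding"
    if agent_fraction <= 0.05:   # ~all human-written
        return "human-only"
    return "collaborative"

for f in (0.0, 0.5, 1.0):
    print(f, coding_mode(f))
```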


Safety

Vibe coding ships more vulnerabilities

We ran Semgrep on every commit, before and after.

Vibe-coded commits introduce more vulnerabilities than either human-only or collaborative commits.

Vibe coding fixes more vulnerabilities too — but every mode introduces more than it fixes.

Human-only
0.08
Collaborative
0.14
Vibe coding
0.76

New Semgrep findings introduced per 1,000 committed lines, by coding mode.
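The per-1K-lines rate in the table is a straightforward normalization. The sketch below shows the arithmetic; the counts in the usage line are invented to reproduce the vibe-coding rate of 0.76, not taken from the dataset.

```python
def findings_per_kloc(new_findings: int, committed_lines: int) -> float:
    """New Semgrep findings normalized per 1,000 committed lines."""
    return 1000 * new_findings / committed_lines

# Hypothetical counts, chosen only to reproduce the table's 0.76 rate.
print(round(findings_per_kloc(19, 25_000), 2))  # → 0.76
```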


Examples

Sessions from the wild

Agent repeatedly modifies the wrong parameter · Score: 10/100 · nuttycc/LuminTime
User
Fix the slow appearance of history list card blocks. The animation feels sluggish when scrolling.
Agent
I'll fix the animation performance.
Edit HistoryListView.swift — adjusted stagger delay
User
That's the wrong parameter—the container animation is the slow part, not the individual card stagger.
No commits made. Session ended without resolution.
Expert Nitpicker · Persona (41% of sessions) · marin-community/marin
User
instead of arg hash for the name, let's just get short uuid
Agent
[edits remote.py: replaces arg hash with short UUID...]
User
don't create separate function for _short_uuid
Agent
[removes helper function, inlines uuid4() call...]
User
change DEFAULT_JOB_NAME to "remote_job"

Citation

Cite SWE-chat

@article{baumann2026swechat,
  title       = {SWE-chat: Coding Agent Interactions From Real Users in the Wild},
  author      = {Baumann, Joachim and Padmakumar, Vishakh and Li, Xiang and Yang, John and Yang, Diyi and Koyejo, Sanmi},
  year        = {2026},
  institution = {Stanford University}
}