A walk through the SORT framework for AI incident monitoring

AI incident reports are climbing. What does that actually mean?

A rising line could reflect any combination of three forces — more AI being deployed, more reporting infrastructure picking up what was always there, or more harm per use. To address frontier-AI risks properly, the readings have to be separated. A new framework does exactly that — and reaches a verdict on AI chatbots and self-harm that the headlines would never suggest.

What's ahead — a three-stage walkthrough, about five minutes

1. Monitoring question
Frame one precise, comparable question with the SORT structure.
2. Estimation
Estimate harm and exposure — independently, across two periods.
3. Classification
Read the trajectory on a 2 × 2 grid, uncertainty and all.


Step 1

Reports are climbing.

Fig. 1 plots monthly AI incidents and hazards recorded in the OECD AI Incidents and Hazards Monitor. By late 2025 the totals exceed five hundred a month.

The shape of the line is unambiguous — but what it means is not.

Step 2

A climbing line has three competing readings.

The line might be rising because:

AI is being deployed more widely, so the raw count grows without any change in severity or in the AI itself;
Reporting infrastructure has improved, and journalists and researchers have become better at noticing AI-related harms that were already happening;
Each use of AI is now more likely to cause harm than it used to be.

These three readings require different responses to mitigate AI risks. Most likely all three are happening at once — the question is in what proportion, and the count alone cannot say.
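One way to see why the count alone cannot apportion the three readings is to write the reported count as a product of three factors. The sketch below is illustrative only: the multiplicative form, the function name `reported_incidents`, and every number in it are assumptions made for this example, not figures from the paper.

```python
def reported_incidents(uses: float, harm_per_use: float, reporting_rate: float) -> float:
    """Expected reported incidents = uses x harm per use x share of harms that get reported."""
    return uses * harm_per_use * reporting_rate


# Hypothetical baseline month (all values invented for illustration).
baseline = reported_incidents(uses=1_000_000, harm_per_use=1e-4, reporting_rate=0.01)

# Three different worlds, each doubling exactly one factor.
more_deployment = reported_incidents(2_000_000, 1e-4, 0.01)
better_reporting = reported_incidents(1_000_000, 1e-4, 0.02)
more_harm_per_use = reported_incidents(1_000_000, 2e-4, 0.01)

for name, count in [
    ("more deployment", more_deployment),
    ("better reporting", better_reporting),
    ("more harm per use", more_harm_per_use),
]:
    print(f"{name}: {count / baseline:.1f}x baseline")
```

All three scenarios produce the same doubling of the reported count, which is why the line on its own cannot distinguish them.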

Step 3

Separate harm from exposure.

The three readings cannot be untangled at the level of “AI incidents in general.” The category is too broad — different harms have different exposure denominators, different reporting infrastructures, and different deployment curves.

The paper proposes a framework that works at a narrower level: pick one specific harm, estimate its harm and exposure separately, take the ratio, and classify the resulting trajectory of the risk.

Three moves: define a precise monitoring question using the SORT framework; estimate harm and exposure independently across two periods; then take their ratio and classify the trajectory. Harm and exposure are two parallel estimates, not a sequence — neither depends on the other.

Stage 1 of 3 · SORT framework

Monitoring question

Pin the harm being studied — who or what is at risk, through what mechanism, over what period.

Stage 2 of 3 · Harm & exposure

Estimation

Estimate harm and exposure independently across two periods — each built up from proxy measures, checked against incident data, and scored by a confidence tier.

Stage 3 of 3 · Trajectory + uncertainty

Classification

Compare the harm and exposure trends to land in one of four governance quadrants — carrying the uncertainty through as a distribution.

Fig. 1 — Monthly AI incidents and hazards recorded in the OECD AI Incidents and Hazards Monitor, December 2020 – December 2025.

Step 4

A monitoring question has four parts.

SORT — Subject, Opportunity, Risk event, Timeframe — is the paper's structured analogue to PICO in evidence-based medicine. It forces analytical choices to be explicit rather than buried in framing.

Each box in Fig. 2 holds one piece of the question. The steps below fill them in one at a time, using the case study of conversational AI and self-harm.

Step 5

Subject — who or what is at risk.

The subject need not be a population of people — it can be systems, content, deployments, or conversations. The paper's choice here is deliberately fine-grained: conversations between US users and conversational AI systems.

Counting conversations rather than people fixes the unit of analysis for everything downstream: exposure will be a conversation count, and harm a count of conversations that go wrong.

Example subjects: Workers in customer-service roles in California; registered AV-capable vehicles; hospital patients in NHS England trusts.

Step 6

Opportunity — what creates the exposure.

Opportunity isolates the specific mechanism through which the subject is exposed to the harm. Simply “conversations with AI” would cast too wide a net. It is the precise interaction pattern that makes the risk event possible.

Here: conversations in which users seek support regarding suicidal ideation or self-harm. The narrower the opportunity, the tighter the proxy choices available to estimate exposure later.

Example opportunities: Being screened by an automated resume-filtering system during a job application; operating in self-driving mode on public roads; receiving a diagnosis assisted by a clinical decision-support tool.

Step 7

Risk event — the specific harm.

The risk event is the countable harm itself, phrased so that an incident report can be matched against it. The paper specifies: the AI encourages, or fails to discourage, suicidal ideation or self-harm.

A vaguer phrasing — “AI causes mental health harms” — would inflate the number of partial matches and make the trend signal noisier. A high ratio of partial to full matches is the framework's built-in warning that a question may be overspecified.

Example risk events: Being rejected from consideration on the basis of a protected characteristic; causing injury or loss of life; receiving a missed or delayed diagnosis traceable to the tool's recommendation.

Step 8

Timeframe — the unit of comparison.

Timeframe defines the observation window. Per calendar year is chosen here, comparing T1 = 2024 against T2 = 2025 — the framework always compares two periods to produce a trend, not an absolute level.

Example timeframes: per quarter, fiscal year, or million vehicle-miles.

Step 9

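Assembled, the monitoring question reads: in conversations between US users and conversational AI systems in which users seek support regarding suicidal ideation or self-harm, how often does the AI encourage, or fail to discourage, suicidal ideation or self-harm, per calendar year (2024 versus 2025)?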

That single sentence is the unit of analysis. Everything downstream — which databases to search, which proxies to allow, what counts as a full match — flows from its exact phrasing.
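A SORT question can be held as a small record so that each of the four choices stays explicit and inspectable. This is a minimal sketch: the class name `MonitoringQuestion`, its field names, and the sentence template in `as_sentence` are assumptions for illustration. Only the four field values are taken from the case study, and the assembled wording is not the paper's.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MonitoringQuestion:
    """One SORT monitoring question: Subject, Opportunity, Risk event, Timeframe."""
    subject: str
    opportunity: str
    risk_event: str
    timeframe: str

    def as_sentence(self) -> str:
        # Hypothetical template; the paper's exact phrasing may differ.
        return (
            f"Among {self.subject} {self.opportunity}, "
            f"how often does {self.risk_event}, {self.timeframe}?"
        )


question = MonitoringQuestion(
    subject="conversations between US users and conversational AI systems",
    opportunity="in which users seek support regarding suicidal ideation or self-harm",
    risk_event="the AI encourage, or fail to discourage, suicidal ideation or self-harm",
    timeframe="per calendar year (2024 versus 2025)",
)
print(question.as_sentence())
```

Freezing the record reflects the idea that downstream choices (databases, proxies, match criteria) all depend on the question staying fixed once it is set.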

Fig. 2 — The SORT monitoring question for conversational AI and self-harm, assembled one part at a time.

Step 10

Harm — start from the point estimate.

The estimate that carries the classification is a point estimate, built from the strongest tiered data available. Authoritative single sources rarely exist for AI harms, so this question sits at Tier 2 and anchors on OpenAI's own disclosures: around 0.15% of weekly active users have conversations with explicit indicators of potential suicidal planning or intent.

Crucially, OpenAI also disclosed the ratio of desired to undesired model responses on those conversations, improving from roughly 40:60 (January 2024 – July 2025) to 80:20 (August–September 2025) to 92:8 (October–December 2025).

Step 11

Harm — ≈2.4M in 2024, ≈4M in 2025.

Combining those ratios with estimated conversation volumes gives the point estimate of total harm — conversations in which the model's response was undesired: roughly 2.4 million in 2024 and 4 million in 2025.
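The arithmetic can be sketched as conversations multiplied by the undesired-response share, summed over the disclosure windows. The 2024 line uses only figures from this walkthrough (about 4 million matching conversations, 40:60 desired to undesired). For 2025 the walkthrough gives the yearly total (about 12 million) and the three ratios but not the volume in each window, so the per-window split below is a hypothetical allocation chosen only to sum to 12 million. The function name is likewise an assumption.

```python
def harmful_conversations(windows):
    """Sum conversations x undesired-response share over disclosure windows."""
    return sum(conversations * undesired_share for conversations, undesired_share in windows)


# 2024: a single window at 40:60 desired:undesired, i.e. 60% undesired.
harm_2024 = harmful_conversations([(4.0e6, 0.60)])

# 2025: the three ratios are from the disclosures; the conversation volumes per
# window are HYPOTHETICAL (they only need to sum to about 12 million).
windows_2025 = [
    (5.4e6, 0.60),  # January-July, 40:60
    (2.4e6, 0.20),  # August-September, 80:20
    (4.2e6, 0.08),  # October-December, 92:8
]
harm_2025 = harmful_conversations(windows_2025)

print(f"2024: {harm_2024 / 1e6:.1f}M, 2025: {harm_2025 / 1e6:.1f}M, "
      f"trend x{harm_2025 / harm_2024:.2f}")
```

Because most of the 2025 volume under the improved ratios arrives late in the year, the yearly total is sensitive to how exposure is distributed across the windows.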

Step 12

Then handle the uncertainty.

The point estimate carries an uncertainty band, set by the confidence tier. Incident databases enter only here: when their recorded counts come reasonably close to the estimate, they raise the lower band, because the true harm cannot fall below what has already been recorded.

An LLM-assisted scan of the AI Incident Database returns 2 full matches in 2024 and 12 in 2025, with the 2025 assessed harm count spanning 10,014–110,025 because three matches are composite narratives covering whole populations. This harm type has a low inclusion probability in incident databases, so the recorded floor sits orders of magnitude below the point estimate — it barely moves the band, and the classification rests on the point estimate alone.
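The floor logic can be sketched in a few lines. This assumes a symmetric multiplicative band (point divided by a factor up to point times the same factor) and uses the factor of about 2 that the walkthrough later quotes for harm estimates. The function name and the band construction are assumptions, not the paper's stated method.

```python
def uncertainty_band(point, factor, recorded_floor=0.0):
    """Multiplicative band around a point estimate, with the lower edge
    raised to the recorded incident count when that count is higher."""
    lower = point / factor
    upper = point * factor
    return max(lower, recorded_floor), upper


# Chatbot case, 2025: point estimate ~4M, upper end of the assessed incident range.
low, high = uncertainty_band(4.0e6, 2.0, recorded_floor=110_025)
print(f"band: {low:,.0f} to {high:,.0f}")

# Hypothetical contrast: a harm type that is well captured by incident databases.
low_binding, _ = uncertainty_band(100.0, 2.0, recorded_floor=80.0)
print(f"binding floor: {low_binding:,.0f}")
```

In the chatbot case the recorded floor sits far below the lower edge, so the band is unchanged. In the contrast case the floor replaces the lower edge.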

Step 13

Harm trend — increasing, ×~1.7.

Across the two periods the point estimate grows from ≈2.4 million to ≈4 million harmful conversations — a trend of increasing, ×~1.7.

Fig. 3 — Estimating harm (variable H): a point estimate from disclosed response ratios, checked against recorded incident counts.

Step 14

Exposure — a funnel of proxies.

Exposure is the opportunity for harm to occur — here, the number of conversations matching the opportunity, not the number of users. No one publishes that number, so it is assembled from a funnel of partial sources.

ChatGPT's weekly active users grew from 140 million (January 2024) to 300 million (January 2025) to ≈850 million (December 2025). Scaling up by OpenAI's share of generative-AI web traffic (~75% falling to ~60%) covers all conversational AI; scaling down to the ~18% of users based in the US gives ≈34 million US weekly active users in January 2024, rising to ≈243 million by December 2025.
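The funnel is two multiplicative steps, sketched below for the January 2024 anchor. The function name and parameter names are assumptions. The inputs (140 million weekly active users, ~75% share, ~18% US) are the rounded figures quoted above.

```python
def us_weekly_users(chatgpt_wau, openai_share, us_fraction=0.18):
    """Scale ChatGPT weekly active users up to all conversational AI,
    then down to the US share."""
    all_conversational_ai = chatgpt_wau / openai_share  # scale up by market share
    return all_conversational_ai * us_fraction          # scale down to US users


jan_2024 = us_weekly_users(140e6, openai_share=0.75)
print(f"January 2024: {jan_2024 / 1e6:.1f}M US weekly active users")
```

With the rounded ~60% share, the same formula gives about 255 million for December 2025, a little above the ≈243 million quoted, so the underlying share was presumably not exactly 60%.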

Step 15

Exposure — ≈4M conversations in 2024, ≈12M in 2025.

Applying OpenAI's 0.15% rate to those user counts, week by week, and summing each year: approximately 4 million conversations matching the opportunity in 2024 and 12 million in 2025. The trend is increasing, ×~3.
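A rough reconstruction of the week-by-week sum is sketched below. It rests on several assumptions that are not in the source: US weekly users are interpolated linearly between anchor dates, each flagged user contributes one matching conversation per week, and the January 2025 anchor uses a guessed ~70% OpenAI share (300 million ÷ 0.70 × 18% ≈ 77 million). It is meant to show the shape of the calculation, not to reproduce the paper's exact figures.

```python
def yearly_conversations(start_wau, end_wau, weekly_rate=0.0015, weeks=52):
    """Sum weekly matching conversations over one year, interpolating
    weekly active users linearly between two anchor values."""
    total = 0.0
    for week in range(weeks):
        wau = start_wau + (end_wau - start_wau) * week / (weeks - 1)
        total += wau * weekly_rate
    return total


US_WAU_JAN_2024 = 34e6
US_WAU_JAN_2025 = 77e6   # ASSUMPTION: share at this date is not given in the text
US_WAU_DEC_2025 = 243e6

exposure_2024 = yearly_conversations(US_WAU_JAN_2024, US_WAU_JAN_2025)
exposure_2025 = yearly_conversations(US_WAU_JAN_2025, US_WAU_DEC_2025)

print(f"2024: {exposure_2024 / 1e6:.1f}M, 2025: {exposure_2025 / 1e6:.1f}M, "
      f"trend x{exposure_2025 / exposure_2024:.1f}")
```

Under these assumptions the totals land close to the rounded 4 million and 12 million, and the trend close to ×3.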

Fig. 4 — Estimating exposure (variable E): conversations matching the opportunity (not people), assembled from a funnel of usage proxies. Panel labels: H ✓ · Increasing ×~1.7; Tier 2 · Medium.

Step 16

Take the ratio — the dot lands in Mitigating.

Plot the chatbot case. Harm grew ×~1.7 while exposure grew ×~3, so harm per unit of exposure fell to roughly 0.55 times its 2024 level (a drop of about 45%), against a rising exposure base.

Ĥ down, E up: the dot lands in the bottom-right quadrant, Mitigating. Per conversation, these systems are getting safer — even as more people than ever have the conversations.
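The placement can be checked with the four point estimates. In this sketch the function name is an assumption, and only the two quadrants named in this walkthrough are labelled: Mitigating (harm rate down, exposure up) and Escalating (harm rate up, exposure up, inferred from the later remark about harm growth overtaking exposure growth). The two quadrants with falling exposure are left unnamed because the text does not name them.

```python
def classify_trajectory(harm_t1, harm_t2, exposure_t1, exposure_t2):
    """Return (harm-rate trend, exposure trend, quadrant label) for two periods."""
    rate_t1 = harm_t1 / exposure_t1   # harm per unit of exposure, period 1
    rate_t2 = harm_t2 / exposure_t2   # harm per unit of exposure, period 2
    rate_trend = rate_t2 / rate_t1
    exposure_trend = exposure_t2 / exposure_t1

    if exposure_trend > 1:
        label = "Mitigating" if rate_trend < 1 else "Escalating"
    else:
        label = "exposure falling (quadrant name not given in this article)"
    return rate_trend, exposure_trend, label


rate_trend, exposure_trend, label = classify_trajectory(
    harm_t1=2.4e6, harm_t2=4.0e6, exposure_t1=4.0e6, exposure_t2=12.0e6
)
print(f"harm rate x{rate_trend:.2f}, exposure x{exposure_trend:.1f}: {label}")
```

The harm rate falls from 0.60 to about 0.33 undesired responses per matching conversation, which is where the ratio of roughly 0.55 comes from.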

Step 17

How confident is that placement?

The estimates behind the dot carry real uncertainty — a factor of ~2 on each harm estimate, ~1.5 on each exposure estimate. The paper treats each quantity as log-normal, samples all four by Monte Carlo, and classifies every draw. The result is not a cell but a distribution:

Mitigating — 58.6%
Unclassifiable — 31.6%
Escalating — 9.8%

The fifth outcome, Unclassifiable, absorbs the draws where a trend is too weak to call. Uncertainty shows up in the shape of the distribution, not as false confidence.
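The procedure can be sketched with the standard library. Several choices here are assumptions, not the paper's settings: each uncertainty factor is read as a 95% multiplicative interval (so sigma = ln(factor) / 1.96), the indifference band is set to ±10%, the point estimates are used as log-normal medians, and quadrants with falling exposure are lumped under a placeholder label. The percentages it prints will therefore differ from the paper's distribution above.

```python
import math
import random
from collections import Counter


def classify_draw(h1, h2, e1, e2, band):
    """Classify one sampled set of harm and exposure values."""
    rate_trend = (h2 / e2) / (h1 / e1)   # change in harm per unit of exposure
    exposure_trend = e2 / e1
    threshold = math.log(band)
    # A trend inside the indifference band is too weak to call.
    if abs(math.log(rate_trend)) < threshold or abs(math.log(exposure_trend)) < threshold:
        return "Unclassifiable"
    if exposure_trend > 1:
        return "Mitigating" if rate_trend < 1 else "Escalating"
    return "Other quadrant"  # exposure falling; names not given in this article


def simulate(n_draws=20_000, harm_factor=2.0, exposure_factor=1.5, band=1.1, seed=0):
    """Sample all four quantities as log-normals and tally the classifications."""
    rng = random.Random(seed)
    sigma_h = math.log(harm_factor) / 1.96      # ASSUMED mapping from factor to sigma
    sigma_e = math.log(exposure_factor) / 1.96
    counts = Counter()
    for _ in range(n_draws):
        h1 = rng.lognormvariate(math.log(2.4e6), sigma_h)
        h2 = rng.lognormvariate(math.log(4.0e6), sigma_h)
        e1 = rng.lognormvariate(math.log(4.0e6), sigma_e)
        e2 = rng.lognormvariate(math.log(12.0e6), sigma_e)
        counts[classify_draw(h1, h2, e1, e2, band)] += 1
    return {label: count / n_draws for label, count in counts.items()}


shares = simulate()
for label, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{label}: {share:.1%}")
```

Calling `simulate(harm_factor=4.0)` or `simulate(band=1.5)` shows how wider uncertainty or a wider indifference band moves weight out of Mitigating.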

Step 18

Verdict — Mitigating, read alongside absolute harm.

Per-unit-exposure harm is decreasing while more people are exposed: existing safeguards appear to be working, and the classification is Mitigating at Medium confidence.

But the classification says nothing about absolute scale. Roughly four million harmful conversations is more than the year before — a Mitigating trajectory can coexist with large and growing absolute harm. A naive reading of the incident counts would have called this system more dangerous; the framework says it is becoming safer per use while the harm still grows.

Step 19

Don't take our word for it — move the assumptions.

Nothing about those weights is hand-tuned: they fall out of the Monte Carlo given the estimates, two uncertainty factors, and an indifference band. The sliders start at the paper's values — each faint point is one draw of the classifier.

Widen the harm uncertainty and watch the distribution drain into Unclassifiable. Shrink the indifference band and the verdict sharpens. Push harm growth past exposure growth and the dot crosses into Escalating. The conclusion is only as strong as the assumptions — and now you can check which ones carry it.


Fig. 5 — Trajectory classification for the chatbot case, with the probabilistic classifier's weights.

What the framework reveals

The headlines say chatbot harm is exploding. The framework says both more and less than that.

Incident counts for conversational AI and self-harm rose sharply between 2024 and 2025 — the naive reading is that chatbots are becoming more dangerous. Separate exposure from harm and the picture inverts: use grew three times over while harm grew 1.7×, so each conversation became meaningfully safer. And yet absolute harm still rose. Both facts are true at once, and only the decomposition can hold them together.

The point of the framework isn't to settle the verdict — nearly a third of the distribution lands on Unclassifiable, and the paper says so. It's to make the assumption stack visible — the bound construction, the proxy choices, the uncertainty factors — so that policy makers and practitioners can argue about the moves, not just the conclusion.


After A Pragmatic Classification Framework for AI Incident Monitoring (2026).