Has X published a trust & safety report or developer documentation confirming the use of AI CSAM classifiers on its platform?

Checked on February 4, 2026
Disclaimer: Factually can make mistakes. Please verify important information or breaking news.

Executive summary

There is no evidence in the provided reporting that X (formerly Twitter) has published a trust & safety report or developer documentation explicitly confirming the use of AI CSAM (child sexual abuse material) classifiers on its platform. The available documents instead describe industry vendors and cloud platforms that build and ship CSAM-detection classifiers (Thorn/Safer, Google Cloud Vertex AI, Stability AI) without naming X [1] [2] [3] [4]. The reporting does show that third-party providers such as Thorn market classifier products to “many of the largest online platforms” and that cloud vendors provide CSAM safety filters, which are plausible integration routes, but none of the supplied sources confirm X’s own public documentation or trust & safety report on this specific capability [2] [5] [3].

1. What the supplied reporting actually covers: vendors and platform filter services, not X’s disclosures

The documents in this dataset focus on Thorn’s Safer product suite and its claims: Thorn says its Safer classifiers detected millions of known and potential CSAM items and that its predictive AI and text classifiers are used by “many of the largest online platforms” [1] [2] [5]. Google Cloud’s Vertex AI documentation separately describes a suspected CSAM safety classifier applied to requests sent to hosted Anthropic models [3]. Stability AI’s transparency report states that it applies hashlists from Thorn’s Safer and other classifiers across subsets of its training data and runs NSFW classifiers at the API level [4]. None of these sources, however, names X or cites an X trust & safety report or developer document confirming X’s use of AI CSAM classifiers [1] [2] [4] [3].

2. Why industry vendor claims aren’t the same as platform documentation

Thorn’s marketing and impact reports describe how Safer’s classifiers are trained, validated with NCMEC data, and offered to platforms as self-hosted or API services, and Thorn explicitly positions itself as a vendor trusted by large platforms [2] [5]. A vendor claim of that kind is not a platform-level policy or developer disclosure; it describes a service that a platform might integrate. The provided materials therefore establish supply-side capability and third-party adoption claims, not a named admission from X in a trust & safety report or developer documentation [2] [5].
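
To make that distinction concrete, the sketch below shows, in rough terms, how a platform might route uploads through a third-party classifier service of the kind Thorn markets. It is a minimal, hypothetical illustration: the endpoint, field names, and threshold are invented for this example and do not describe the Safer API, X’s systems, or any vendor’s actual interface.

# Hypothetical sketch only: the endpoint, response fields, and threshold below
# are invented for illustration and are not Thorn's Safer API or X's tooling.
import requests

CLASSIFIER_URL = "https://classifier.example.com/v1/score"  # hypothetical vendor endpoint
REVIEW_THRESHOLD = 0.80  # hypothetical score at which content is escalated

def score_upload(image_bytes: bytes, api_key: str) -> float:
    """Send an uploaded image to the hypothetical classifier and return a 0-1 risk score."""
    resp = requests.post(
        CLASSIFIER_URL,
        headers={"Authorization": f"Bearer {api_key}"},
        files={"image": ("upload.jpg", image_bytes, "image/jpeg")},
        timeout=10,
    )
    resp.raise_for_status()
    return float(resp.json()["score"])  # response field name is assumed, not documented

def handle_upload(image_bytes: bytes, api_key: str) -> str:
    """Hold high-scoring uploads for trust & safety review; otherwise allow them."""
    if score_upload(image_bytes, api_key) >= REVIEW_THRESHOLD:
        return "escalate_to_trust_and_safety"
    return "allow"

Whether X runs anything along these lines is exactly what the supplied sources do not say.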

3. Public cloud and model hosts are publishing CSAM safety controls—context, not confirmation of X

Google Cloud’s Vertex AI documentation describes a “suspected CSAM” safety classifier that filters images in requests to hosted Anthropic models and distinguishes that classifier from Anthropic’s own Trust & Safety filters, which shows that cloud vendors are adding CSAM classifiers to their product stacks [3]. Stability AI’s report likewise explains that it uses hashlists and NSFW classifiers at the API level, reflecting an industry trend toward in-flight filtering [4]. These disclosures are useful context for how X could implement AI CSAM detection, but the supplied reporting contains no statement from X itself affirming that it has published equivalent trust & safety or developer documentation [3] [4].
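
As context for what “hashlists plus classifiers at the API level” means in practice, here is a minimal, hypothetical sketch of in-flight filtering. The plain SHA-256 hashing, hard-coded hashlist, and stubbed classifier are simplifications for illustration only; real deployments use perceptual hashing and vendor-managed hash databases, and nothing here is drawn from X, Thorn, Google Cloud, or Stability AI documentation.

# Hypothetical sketch: combines a hashlist check for known material with a
# classifier score for novel material. All details are illustrative.
import hashlib

KNOWN_HASHES: set[str] = set()  # stand-in for a vendor-managed hashlist

def classifier_score(image_bytes: bytes) -> float:
    """Stub for a hosted safety classifier; a real system would call a vendor or cloud API."""
    return 0.0  # placeholder value so the sketch runs end to end

def filter_request(image_bytes: bytes, block_threshold: float = 0.9) -> bool:
    """Return True if the request should be blocked before reaching the model or feed."""
    digest = hashlib.sha256(image_bytes).hexdigest()
    if digest in KNOWN_HASHES:
        return True  # matches known material: block and handle per platform policy
    return classifier_score(image_bytes) >= block_threshold  # novel material: classifier decides

The point of the sketch is the shape of the control, not its details: a match against known material short-circuits the request, and a classifier catches what hash matching cannot.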

4. Alternative viewpoints and limits of available reporting

It remains possible that X has published, outside the supplied sources, a trust & safety report or developer documentation mentioning AI CSAM classifiers; the dataset provided simply does not include any X statement or citation to that effect, so this analysis cannot confirm such a disclosure (a limitation of the available reporting). Conversely, the vendor and cloud disclosures in the dataset suggest realistic integration paths, since platforms commonly adopt third-party classifiers or cloud-hosted safety filters, but the presence of vendors in the dataset is not proof that X specifically has published or documented classifier use [2] [3] [4].

5. Bottom line

Based solely on the supplied sources, there is no citation to an X trust & safety report or developer document confirming the use of AI CSAM classifiers on X; the available materials document Thorn/Safer’s classifier products, cloud provider safety classifiers, and vendor claims of broad platform customers, but none explicitly tie those disclosures to X [1] [2] [5] [3] [4].

Want to dive deeper?
Has X publicly described its trust & safety tooling or third‑party partnerships in other filings or transparency reports since 2024?
Which major social platforms have publicly documented the use of Thorn/Safer CSAM classifiers in their trust & safety reports?
How do cloud providers like Google Cloud and Anthropic describe suspected CSAM filters in their developer documentation, and what constraints do they place on customers?