What federal and state datasets exist that track the full pipeline from CSAM identification to prosecution and conviction?
Executive summary
Federal repositories document prosecution, sentencing, and post‑conviction outcomes in detail, but publicly available datasets that follow child sexual abuse material (CSAM) from initial identification (e.g., by tech platforms or the National Center for Missing & Exploited Children, NCMEC) through investigation to prosecution and conviction are piecemeal and dispersed across agencies and state systems [1] [2] [3]. State‑level transparency efforts and advocacy compilations fill gaps for sentencing and statutory differences, yet none of the reviewed sources establishes a single, end‑to‑end CSAM pipeline dataset from identification to final disposition [4] [5] [6].
1. Federal sentencing and conviction datasets — comprehensive for court outcomes but not upstream identification
The U.S. Sentencing Commission maintains annual individual offender datafiles and analytic tools that provide detailed, de‑identified records on federal sentences, offense types, and guideline applications, enabling researchers to track convictions and sentencing outcomes across statutes [3] [7] [8]. The Bureau of Justice Statistics (BJS) publishes the Federal Criminal Case Processing Statistics (FCCPS) tool and runs the Federal Justice Statistics Program, which together offer national figures on prosecutions, convictions, declinations, sentencing, and case‑processing timelines. These figures are valuable for measuring prosecution and conviction rates, but they cover court processing rather than how cases originated [1] [9] [2].
2. State datasets and transparency portals — useful for local disposition data but inconsistent in scope
Several states operate public dashboards and open‑data initiatives that surface arrests, charges, and sentencing at the state level. California’s OpenJustice, for example, is a state DOJ transparency effort that publishes criminal justice datasets useful for local analyses [4]. Independent organizations compile state comparisons of statutes and sentencing enhancements for CSAM specifically, revealing substantial variation in laws and penalties across jurisdictions [5]. Advocacy and research outlets like The Sentencing Project produce standardized state‑level incarceration and sentencing comparisons, but these sources cover criminal justice metrics broadly rather than tracing CSAM cases specifically through the system [6].
3. What exists for “pipeline” measurement — federal case processing programs and sentencing data, not platform identification
The best federal building blocks for a pipeline view are the FBI and U.S. Attorney investigative counts that feed into the Federal Justice Statistics Program and the Sentencing Commission’s individual datafiles; together they provide the prosecution and conviction legs of the pipeline and enable cross‑sectional research on outcomes and sentencing patterns [2] [3] [7]. The Federal Criminal Case Processing Statistics tool aggregates prosecution‑to‑sentencing flows at the federal level, giving researchers measures like prosecutorial declination and median case‑processing times. According to the public dataset descriptions provided, however, the tool does not incorporate pre‑investigative identification events such as platform referrals or NCMEC reports [1] [9].
4. Gaps and implicit agendas — identification data, privacy constraints, and institutional incentives
Public reporting shows a structural gap: datasets that would tie industry detection (platform flags, hash‑matching, NCMEC referrals) into criminal case records are not described in the reviewed federal and state sources, creating a blind spot for anyone seeking an end‑to‑end CSAM evidence trail [5] [1]. Transparency efforts like OpenJustice and the federal datasets prioritize accountability for prosecutions and sentencing, which can reflect an institutional agenda to measure state performance. Privacy, victim‑protection concerns, and investigative integrity likely constrain public release of identification‑stage data, although the provided sources do not detail these constraints [4] [9].
5. Practical research approach given available datasets
Researchers aiming to reconstruct the CSAM pipeline must link multiple public sources: use NCMEC‑adjacent public reporting where available (not in these sources), pair federal prosecution and sentencing datasets from BJS/FCCPS and the U.S. Sentencing Commission to capture legal outcomes, and augment with state open‑data portals and statutory comparisons to understand local charging and sentencing regimes [1] [2] [3] [5] [4]. The reviewed sources support rigorous analysis of prosecution-to-conviction stages but do not, on their own, provide a documented, publicly accessible dataset that traces platform identification through investigation to conviction [3] [9].
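The linking step described above can be sketched in a few lines of Python. This is a minimal illustration, not a documented workflow from any of the agencies: the stage names (`identified`, `referred`, `prosecuted`, `convicted`), the `merge_by_year` and `stage_ratio` helpers, and all numbers are hypothetical placeholders, and the sketch assumes each source has already been reduced to aggregate counts per fiscal year. Stages with no public source stay empty rather than being estimated.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Stage names are hypothetical labels, not fields from any published dataset.
STAGES = ("identified", "referred", "prosecuted", "convicted")

YearlyCounts = Dict[int, Dict[str, int]]  # {fiscal_year: {stage: count}}


@dataclass
class PipelineRow:
    """Aggregate counts for one fiscal year; None marks a stage with no source."""
    fiscal_year: int
    counts: Dict[str, Optional[int]]


def merge_by_year(*sources: YearlyCounts) -> List[PipelineRow]:
    """Align several per-year aggregates into one row per fiscal year."""
    years = sorted({year for src in sources for year in src})
    rows = []
    for year in years:
        counts: Dict[str, Optional[int]] = {stage: None for stage in STAGES}
        for src in sources:
            for stage, value in src.get(year, {}).items():
                if stage not in counts:
                    raise KeyError(f"unknown stage: {stage}")
                counts[stage] = value
        rows.append(PipelineRow(year, counts))
    return rows


def stage_ratio(row: PipelineRow, numerator: str, denominator: str) -> Optional[float]:
    """Ratio of two stage counts, or None when either stage is missing or zero."""
    num, den = row.counts[numerator], row.counts[denominator]
    if num is None or den is None or den == 0:
        return None
    return num / den


# Placeholder numbers for illustration only; they are NOT real statistics.
case_processing: YearlyCounts = {2020: {"referred": 100, "prosecuted": 60}}
sentencing: YearlyCounts = {2020: {"convicted": 45}, 2021: {"convicted": 50}}

rows = merge_by_year(case_processing, sentencing)
for row in rows:
    print(row.fiscal_year, row.counts, stage_ratio(row, "prosecuted", "referred"))
```

Because the inputs are yearly aggregates rather than linked case records, the ratios are cross‑sectional comparisons, not true conversion rates: a case referred in one year may be prosecuted or sentenced in a later one. The `identified` stage stays `None` throughout, which mirrors the gap in the reviewed sources.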
6. Bottom line — solid federal and state building blocks, but no single public end‑to‑end CSAM pipeline dataset in the reviewed reporting
Federal tools (FCCPS, the Federal Justice Statistics Program, and the Sentencing Commission’s datafiles) offer authoritative, reusable data on prosecutions, convictions, and sentencing [1] [2] [3], and state portals and advocacy compilations reveal statutory and sentencing variation [4] [5] [6]. The sources reviewed do not, however, document a unified, public dataset that tracks CSAM from initial identification by platforms to final conviction; any claim that such an end‑to‑end public dataset exists is unsupported by the referenced material [1] [3] [5].