What datasets and methodology did the Deportation Data Project and Cato Institute use to estimate the share of deportees with violent convictions?

Checked on January 24, 2026

Executive summary

The Cato Institute’s estimate that only about 5% of people booked into ICE custody had violent convictions rested on analysis of two streams of ICE data: a nonpublic “leaked” ICE bookings dataset shared with Cato, and publicly released arrest records compiled by the Deportation Data Project, which obtained agency arrest spreadsheets via public‑records requests [1] [2] [3]. Both projects counted bookings or arrests and coded criminal‑history categories (violent, property, other, none), then compared shares across time windows, but each source carries classification and coverage limits that shape the 5% headline [1] [4] [2].

1. What data each group used: leaked and FOIA tracks

Cato’s primary figures came from a dataset of ICE “book‑ins” that Cato says was shared with it by people outside the agency: a nonpublic DHS/ICE extract that listed roughly 44,800 bookings between October 1 and mid‑November and included fields the institute used to tabulate criminal convictions and offense types (violent, property, immigration/traffic, or none) [2] [1]. The Deportation Data Project at UC Berkeley Law and UCLA Law assembled ICE arrest records obtained through public‑records requests and Freedom of Information Act filings; those arrest tallies (covering broader stretches of 2025 and earlier months of the administration) were used to corroborate Cato’s book‑ins analysis and to produce independent counts of arrests with and without convictions or pending charges [1] [4] [3].

2. How convictions and ‘violent’ offenses were defined and coded

Both teams relied on ICE’s internal coding of prior criminal history as recorded in the agency files rather than on independent court records; Cato reported counts of “violent criminal convictions,” “property convictions,” and “no convictions” using the offense labels present in the booking dataset it analyzed [2] [1]. The Deportation Data Project likewise tallied arrests by whether the ICE records obtained through public‑records channels showed convictions, pending charges, or no criminal record [4] [3]. Neither the news accounts nor the summaries examined here describe a re‑audit of court dockets to validate ICE’s classifications, so the estimates reflect ICE’s internal categorization practices [1] [4].
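To make the tallying step concrete, here is a minimal sketch of how counting bookings by the agency’s own labels might look. The records and the column name criminal_history are hypothetical stand‑ins; the actual ICE extracts use their own field names and offense codes, which are not reproduced in the reporting examined here.

```python
import pandas as pd

# Illustrative stand-in for an ICE bookings/arrests extract.
# Column name and category labels are hypothetical, not ICE's actual schema.
bookings = pd.DataFrame({
    "booking_id": [1, 2, 3, 4, 5, 6],
    "criminal_history": [
        "no_conviction", "violent_conviction", "no_conviction",
        "property_conviction", "pending_charges", "no_conviction",
    ],
})

# Tally each category exactly as labeled in the agency file --
# no cross-check against court dockets happens at this step.
category_counts = bookings["criminal_history"].value_counts()
category_shares = bookings["criminal_history"].value_counts(normalize=True)

print(category_counts)
print(category_shares.round(3))
```

The point of the sketch is that the output inherits whatever coding choices ICE made upstream: if a dismissed charge or a pending case is labeled as criminal history in the source file, it is counted that way here too.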

3. Simple arithmetic, time windows and cross‑checks produced the headline share

Cato’s “5% violent convictions” headline is the result of dividing the number of bookings flagged as violent‑conviction cases in its leaked book‑ins file by the total number of recorded bookings for the chosen window (Oct. 1–mid‑Nov.), with similar calculations used for the 73% no‑conviction figure [2] [1]. The Deportation Data Project’s arrest logs—covering different months and a larger span (e.g., Jan. 21–Oct. 15 in some releases)—were presented as consistent corroboration: its FOIA‑obtained arrest counts showed large shares of arrests with no convictions or only pending charges, supporting the pattern Cato observed in the leaked bookings [4] [3].
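The arithmetic itself is a simple share calculation. In the sketch below, only the roughly 44,800 booking total and the 5% and 73% headline shares come from the reporting; the category counts are illustrative placeholders back‑derived to match those shares, not figures from the leaked file.

```python
# Share calculation as described above: flagged bookings divided by total
# bookings in the chosen window (Oct. 1 - mid-Nov.).
total_bookings = 44_800               # approximate total cited for the window
violent_conviction_bookings = 2_240   # illustrative: 2,240 / 44,800 = 5%
no_conviction_bookings = 32_700       # illustrative: 32,700 / 44,800 ~= 73%

violent_share = violent_conviction_bookings / total_bookings
no_conviction_share = no_conviction_bookings / total_bookings

print(f"Violent-conviction share: {violent_share:.1%}")   # -> 5.0%
print(f"No-conviction share: {no_conviction_share:.1%}")  # -> 73.0%
```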

4. Key methodological caveats the reporting highlights

Reporting emphasizes several limits: the counts depend on ICE’s internal labels rather than independent court verification, the datasets cover specific time windows that may not represent longer trends, and ICE’s own practice of treating dismissed charges as “criminal” for removal purposes can blur the distinction between conviction and allegation [5] [1] [4]. Additionally, a critique carried over from prior disputes about Cato’s work in Texas points to risks such as double‑counting or missed identifications when datasets are assembled from administrative sources, an implicit warning that classification and linkage choices can shift reported shares [6].

5. Alternative interpretations and institutional incentives

Supporters of the administration point to DHS claims that many of those arrested have charges or convictions, claims the department pressed in public rebuttals of some coverage; independent analysts noted that ICE’s own public detainee rosters and other agency tallies present different slices of the universe [1] [4]. The Deportation Data Project’s public‑records approach emphasizes replicability, while Cato’s use of a leaked internal extract yields timelier snapshots but depends on the file’s provenance and the agency’s own coding; each choice reflects tradeoffs between access, transparency, and auditability [3] [2].

Want to dive deeper?
How does ICE classify convictions and pending charges in its administrative databases, and how often do those classifications match court records?
What are the methodological differences between ICE’s public detainee roster, Deportation Data Project FOIA extracts, and internal ‘book‑in’ datasets used by researchers?
How have independent audits or court studies validated or contradicted ICE’s internal criminal history coding in recent years?