What datasets exist that track crimes by immigration status and how reliable are they?
Executive summary
There is no single, comprehensive U.S. dataset that tracks crimes by immigration status nationwide. Instead, researchers stitch together state-level administrative records, federal enforcement tables, victimization surveys that now include citizenship questions, and agency-produced “criminal alien” tallies, each with clear strengths and systemic limitations [1] [2] [3] [4]. Even careful academic work stresses that measurement choices (arrests versus convictions versus incarceration), underreporting, and gaps in legal-status data produce divergent pictures and limit confident causal claims [5] [6] [7].
1. The most direct place researchers look: Texas arrest and conviction records
A rare example of administrative data linking immigration status to criminal outcomes comes from Texas, where criminal records have included immigration status. That linkage enabled analyses funded by the National Institute of Justice (NIJ) showing that undocumented people had lower arrest rates than U.S.-born residents for violent, drug, and property crimes during 2012–2018 [1] [8]. The dataset’s chief strength is individual-level administrative linkage that separates documented immigrants, undocumented immigrants, and native-born residents. Its obvious limitation is external validity: Texas is “the only state” routinely recording immigration status in criminal records, so results cannot be automatically generalized to other states [8] [1].
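Comparisons like the one above rest on a simple calculation: events for a group divided by that group's population, scaled to a common base. The sketch below shows that arithmetic only. The function name, the group labels, and every number in it are hypothetical placeholders chosen for illustration; none of them comes from the Texas records or from any cited study.

```python
def rate_per_100k(events: int, population: int) -> float:
    """Return events per 100,000 members of a group."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing to keep the arithmetic exact for whole numbers.
    return events * 100_000 / population


# HYPOTHETICAL counts for illustration only (not real data).
groups = {
    "group_a": {"arrests": 5_000, "population": 1_000_000},
    "group_b": {"arrests": 300, "population": 100_000},
}

rates = {
    name: rate_per_100k(g["arrests"], g["population"])
    for name, g in groups.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} arrests per 100,000")
```

The same function applies whether the numerator is arrests, convictions, or incarcerations, which is one reason studies that choose different numerators can reach different conclusions from the same population.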
2. Federal enforcement tallies: DHS, CBP and the “criminal alien” counts
The Department of Homeland Security publishes annual tables on foreign nationals (detention, removals, encounters, and related enforcement metrics) through its Office of Homeland Security Statistics (OHSS), and Customs and Border Protection issues “criminal alien” statistics based on record checks of apprehended individuals’ prior convictions [2] [4]. These sources are comprehensive for enforcement activity and useful for tracking convictions surfaced during immigration encounters, but they are inherently enforcement-centric: they count people who come into contact with immigration agencies and rely on inter-agency record matches, so they do not represent population-level offending rates [2] [4].
3. Victimization and reporting datasets: NCVS and the problem of silence
The National Crime Victimization Survey (NCVS), which includes citizenship-related variables in recent waves, provides the best nationally representative window on victimization and reporting patterns across nativity and citizenship [3]. Yet multiple studies using NCVS subsets find that immigrant presence correlates with lower police notification. Underreporting, fear of deportation, and mistrust of police can therefore bias police-recorded crime downward for immigrant communities, so survey-based victimization estimates must be interpreted alongside reporting dynamics [6] [3].
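To see why reporting dynamics matter, consider how the same police-recorded count implies different underlying victimization levels depending on the share of incidents reported. The sketch below is a back-of-the-envelope illustration, not a method taken from the NCVS or the cited studies. The function name and both reporting rates are hypothetical assumptions.

```python
def implied_incidents(recorded: float, reporting_rate: float) -> float:
    """Estimate total incidents from a recorded count and an assumed reporting rate.

    reporting_rate is the assumed share of incidents reported to police (0 < rate <= 1).
    """
    if not 0 < reporting_rate <= 1:
        raise ValueError("reporting_rate must be in (0, 1]")
    return recorded / reporting_rate


# HYPOTHETICAL inputs: two communities with identical police-recorded counts
# but different assumed reporting rates.
recorded_count = 400
higher_reporting = implied_incidents(recorded_count, 0.5)
lower_reporting = implied_incidents(recorded_count, 0.25)

print(higher_reporting, lower_reporting)
```

Under these assumed rates, the community that reports less often has twice the implied incidents despite the same recorded count, which is the direction of bias the reporting studies warn about.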
4. Incarceration and census-derived datasets: long-run measures with caveats
Scholars seeking long-term comparisons have used decennial census and correctional-population records to build incarceration-based datasets that work around the lack of immigration markers in arrest data; such work argues that incarceration is a stronger indicator of serious crime because it usually requires conviction [9]. But incarceration counts are also shaped by prosecutorial discretion, differences in bail and sentencing, and questions about foreign conviction records. Critics warn that these factors can distort comparisons of immigrant and native-born involvement in the criminal justice system [9] [10].
5. Aggregate repositories and the researcher’s tool kit
Practitioners assemble data from many sources (TRAC, the DHS Yearbook, BJS, the FBI, USCIS, and state repositories) to approximate immigrant-related crime patterns, and curated lists of authoritative sources are now common reference tools for journalists and analysts [11] [12]. That multiplicity is both a resource and a problem. Useful crosswalks exist, but without a unified national dataset, analysts must harmonize records across jurisdictions and rely on model-based imputation of unauthorized status, both of which introduce uncertainty [12] [5].
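One way that imputation uncertainty shows up is in the denominator: if the size of the unauthorized population is itself an estimate with a plausible range, any rate built on it inherits that range. The sketch below shows the idea under stated assumptions. The function name, the event count, and the low and high population bounds are all hypothetical and do not come from any cited source.

```python
def rate_range_per_100k(
    events: int, pop_low: int, pop_high: int
) -> tuple[float, float]:
    """Return (lowest, highest) rate per 100,000 given bounds on group population."""
    if not 0 < pop_low <= pop_high:
        raise ValueError("require 0 < pop_low <= pop_high")
    # A larger assumed population yields a lower rate, and vice versa.
    lowest = events * 100_000 / pop_high
    highest = events * 100_000 / pop_low
    return lowest, highest


# HYPOTHETICAL inputs: 300 recorded events and an imputed population
# somewhere between 80,000 and 120,000.
low_rate, high_rate = rate_range_per_100k(300, 80_000, 120_000)
print(f"rate per 100,000 lies between {low_rate:.1f} and {high_rate:.1f}")
```

Reporting the range instead of a single point estimate makes the dependence on the imputed population explicit.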
6. Reliability assessment: what these datasets can and cannot tell policymakers and the public
Existing datasets reliably document enforcement interactions and provide defensible snapshots where immigration markers exist (Texas, DHS enforcement tables, NCVS for victimization), but none provides a definitive national rate of crime by immigration status. Common pitfalls include underreporting by immigrant victims, law-enforcement-driven arrest patterns, differing legal definitions, and the need to impute unauthorized status in survey data, all issues flagged by government studies and critics alike [1] [6] [7] [5]. Good practice is to triangulate: combine administrative records where available, population-based victimization surveys, and incarceration-based historical series, while explicitly accounting for reporting bias and geographic limits [9] [3].