How do methodological choices (data sources, definitions of 'undocumented') in 2020–2025 studies affect findings on immigrant crime rates?

Checked on January 27, 2026

Executive summary

Methodological choices systematically shape whether studies from 2020–2025 find that immigrants raise, reduce, or have no effect on crime rates. The most important choices are which data sources researchers use, how they define and estimate "undocumented" populations, the geographic scale of analysis, and which crime categories they model. Findings tend to be most robust when researchers combine multiple estimation strategies, longitudinal designs, and careful measures of immigrant heterogeneity. Weaker or divergent results tend to come from single-source estimates, cross-sectional snapshots, or opaque definitions of legal status [1] [2] [3].

1. Data sources: the engine behind divergent results

Studies that rely on administrative arrest or incarceration data can show different patterns than those built on victimization surveys or population-level estimates, because each dataset captures a different slice of criminal justice processing and reporting. Nationally representative victimization data, for example, reveal different risk profiles than arrest-centered studies, and scholars explicitly warn that collecting citizenship and legal-status data is challenging and contested [4] [2]. Research using comprehensive arrest records, such as the Texas Department of Public Safety (DPS) dataset, can measure arrests precisely, but it must still pair those counts with an accurate denominator: the size or share of the undocumented population. When that denominator is uncertain, rate comparisons shift [2].
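
To make the denominator problem concrete, here is a minimal Python sketch. The arrest count, the three population estimates, and the comparison rate are hypothetical placeholders, not figures from the Texas DPS data or any cited study; the point is only that one fixed arrest count can land above or below a comparison rate depending on which denominator is used.

```python
"""Denominator sensitivity for an arrest rate; all numbers are hypothetical."""


def rate_per_100k(arrests: int, population: float) -> float:
    """Arrests per 100,000 residents of the group."""
    if population <= 0:
        raise ValueError("population must be positive")
    return arrests / population * 100_000


ARRESTS = 4_000            # hypothetical arrest count for one group
COMPARISON_RATE = 260.0    # hypothetical comparison-group rate per 100k
# Hypothetical low / central / high estimates of the group's population
denominators = {"low": 1_400_000, "central": 1_600_000, "high": 1_800_000}

for label, population in denominators.items():
    rate = rate_per_100k(ARRESTS, population)
    side = "above" if rate > COMPARISON_RATE else "below"
    print(f"{label:>7}: {rate:6.1f} per 100k ({side} the comparison rate)")
```

With these made-up inputs, a 12.5% swing in the denominator in either direction moves the rate from about 222 to about 286 per 100,000, which is enough to flip the comparison.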

2. Definitions and estimation of “undocumented”: residuals, triangulation, and limits

Most influential U.S. studies in this period estimate unauthorized populations with residual methods: they subtract the estimated number of authorized immigrants from the total foreign-born population. The Pew Research Center and the Center for Migration Studies (CMS) use this approach for their standard estimates, and it has been validated through triangulation with birth and death records [1] [2]. Widely accepted as it is, the approach yields an estimate, not a headcount. Small changes in assumptions about legal admissions, emigration, or census undercounts can noticeably alter state- or neighborhood-level rates and therefore statistical inferences about crime [1] [5].
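
A simplified residual calculation shows why small assumption changes matter. This Python sketch is an illustration only: real residual estimates apply separate undercount and emigration adjustments to many admission cohorts and population groups, whereas the single-rate parameters, the function name, and all figures below are hypothetical.

```python
"""Simplified residual-method sketch; all inputs are hypothetical."""


def residual_estimate(foreign_born_counted: float,
                      undercount_rate: float,
                      legal_admissions: float,
                      emigration_rate: float) -> float:
    """Unauthorized estimate = undercount-adjusted foreign-born total
    minus legal immigrants assumed to still live in the area."""
    adjusted_total = foreign_born_counted / (1 - undercount_rate)
    legal_remaining = legal_admissions * (1 - emigration_rate)
    return adjusted_total - legal_remaining


BASE = dict(foreign_born_counted=1_000_000, undercount_rate=0.05,
            legal_admissions=900_000, emigration_rate=0.10)

# Each scenario overrides one assumption in BASE.
scenarios = {
    "baseline": {},
    "emigration 15%": {"emigration_rate": 0.15},
    "undercount 10%": {"undercount_rate": 0.10},
}

for name, overrides in scenarios.items():
    estimate = residual_estimate(**{**BASE, **overrides})
    print(f"{name:>15}: {estimate:,.0f}")
```

Here a five-point change in the assumed emigration rate raises the estimate from roughly 243,000 to roughly 288,000 (about 19%), because the residual is a small difference between two large numbers. Any crime rate that uses this estimate as its denominator inherits that swing.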

3. Scale and design: cross-sectional vs. longitudinal and neighborhood heterogeneity

Analyses that exploit longitudinal variation or fixed-effects designs are less susceptible to confounding than cross-sectional snapshots; authors warn that cross-sectional studies are ill-suited to questions about unauthorized immigration because the process unfolds over time [1]. Newer work emphasizes immigrant heterogeneity at the neighborhood level (language diversity, mixes of countries of origin, and nonlinear effects). It shows that disaggregating immigrant groups, rather than folding all foreign-born residents into a single parameter, often changes conclusions about how immigration relates to crime [3].
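
The advantage of fixed-effects designs can be illustrated with a toy panel. In this hypothetical Python sketch, each neighborhood has a stable baseline crime level that is correlated with its immigrant share, standing in for unmeasured, time-invariant local conditions. A pooled regression mistakes that baseline for an immigration effect, while the within (fixed-effects) estimator, which demeans each neighborhood's data, recovers the slope built into the data. The neighborhoods, values, and the assumed true slope of -1.0 are invented; note also that fixed effects remove only confounders that do not change over time.

```python
"""Toy panel: pooled OLS vs. within (fixed-effects) estimator.
All neighborhoods and values are hypothetical."""
from collections import defaultdict


def ols_slope(xs, ys):
    """Slope from a simple OLS regression of ys on xs (with intercept)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def within_slope(panel):
    """Fixed-effects slope: demean x and y inside each unit, then run OLS."""
    by_unit = defaultdict(list)
    for unit, x, y in panel:
        by_unit[unit].append((x, y))
    xs, ys = [], []
    for obs in by_unit.values():
        mx = sum(x for x, _ in obs) / len(obs)
        my = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            xs.append(x - mx)
            ys.append(y - my)
    return ols_slope(xs, ys)


TRUE_SLOPE = -1.0  # assumed within-neighborhood effect (hypothetical)
# unit -> (starting immigrant share in %, stable baseline crime level)
baselines = {"A": (10, 50), "B": (20, 80), "C": (30, 110)}

panel = []
for unit, (share0, alpha) in baselines.items():
    for year in range(3):
        share = share0 + year
        panel.append((unit, share, alpha + TRUE_SLOPE * share))

pooled = ols_slope([x for _, x, _ in panel], [y for _, _, y in panel])
within = within_slope(panel)
print(f"pooled OLS slope:    {pooled:.2f}")
print(f"fixed-effects slope: {within:.2f}")
```

With this data the pooled slope comes out positive (about 1.97), even though crime falls by one unit per point of immigrant share within every neighborhood.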

4. Measurement choices and statistical knobs: transforms, controls, and robustness checks

Statistical choices materially affect outcomes: whether skewed arrest percentages are log-transformed, which control variables are included (age structure, economic conditions), and which crime categories are modeled (violent vs. property). The PNAS authors explicitly tested alternative transforms and ran robustness checks; individual estimates shifted with specification choices, but their conclusions held under reasonable alternatives [2]. International and cross-country econometric papers likewise show that the choice of control variables and estimation method (OLS, IV, the augmented mean group or AMG estimator, spatial models) changes whether immigration appears linked to crime [6] [7].
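
Sensitivity to control variables can be shown with the Frisch-Waugh-Lovell result: in OLS, the coefficient on immigrant share in a regression that also includes a control equals the slope obtained after residualizing both variables on that control. In this hypothetical Python sketch, crime is generated from a young-adult share and an immigrant share with a true immigrant-share coefficient of -0.5. The variable names and values are illustrative and do not reproduce the PNAS specification or any cited model.

```python
"""Frisch-Waugh-Lovell sketch: how adding one control can flip a sign.
All variables and values are hypothetical."""


def mean(v):
    return sum(v) / len(v)


def ols_slope(xs, ys):
    """Slope from a simple OLS regression of ys on xs (with intercept)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def residualize(v, z):
    """Residuals of v after regressing it on z (with intercept)."""
    b = ols_slope(z, v)
    a = mean(v) - b * mean(z)
    return [vi - (a + b * zi) for vi, zi in zip(v, z)]


def controlled_slope(xs, ys, z):
    """Coefficient on xs in ys ~ xs + z, computed via Frisch-Waugh-Lovell."""
    return ols_slope(residualize(xs, z), residualize(ys, z))


young_share = [10, 12, 14, 16, 18]      # hypothetical control (% aged 18-29)
immigrant_share = [9, 13, 13, 17, 18]   # hypothetical, correlated with control
# Hypothetical outcome: +2.0 per point of young_share, -0.5 per point of immigrant_share
crime = [2.0 * z - 0.5 * x for z, x in zip(young_share, immigrant_share)]

bivariate = ols_slope(immigrant_share, crime)
controlled = controlled_slope(immigrant_share, crime, young_share)
print(f"without age control: {bivariate:+.2f}")
print(f"with age control:    {controlled:+.2f}")
```

Omitting the age control yields a positive slope (62/52, about 1.19) because immigrant share and young-adult share move together in this toy data; adding the control recovers -0.5.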

5. Reporting bias, perception gaps, and the politics of source selection

Media and policy debates often cite selective studies, such as think-tank or single-jurisdiction reports, that align with advocacy aims. Peer-reviewed syntheses and migration-policy summaries point to a growing body of evidence that immigrants generally do not raise crime and often have lower crime rates, especially where inclusive policies and established communities are present, but those conclusions rest on particular datasets and methods [8] [9]. Opposing think tanks have published contrary analyses that use different datasets and fewer robustness checks, underscoring how institutional agendas shape which methods and numbers are foregrounded [2] [10].

6. What reliable inference requires (and what remains unresolved)

Stronger inferences come from triangulating multiple population-estimation methods, using longitudinal or instrumental-variable designs to address endogeneity, disaggregating immigrant subgroups, and reporting transparently how results change with model choices. Methodological work by government agencies and researchers funded by the National Institute of Justice (NIJ) stresses the challenges of local-level estimation and calls for mixed methods to improve precision [5] [1]. Where the available reporting or datasets do not cover a claim, this review cannot adjudicate it. Open questions include how large the errors in undocumented-population denominators are at very local scales and how media framing alters perceived links between immigration and crime [5] [10].
