How do Deportation Data Project A-number linkages work and what are their limits?

Checked on January 24, 2026

Executive summary

The Deportation Data Project (DDP) centralizes individual-level U.S. immigration enforcement datasets obtained via FOIA and government releases, and it provides documentation and codebooks to help users navigate them [1] [2]. The team performs structured processing and limited joins across ICE tables (for example, attaching each person's first, last, and longest detention stints) to create analyzable person-level records. It explicitly warns, however, that linking across the many government datasets is incomplete and that inconsistent releases and documentation constrain what can reliably be inferred [3] [4].

1. What the Deportation Data Project collects and why linkage matters

The project reposts individual-level records on arrests (apprehensions), detainers, detentions (book-ins), encounters, and removals obtained through FOIA or government publication, and it frames these datasets as central for journalism, litigation, and research into immigration enforcement [5] [6]. Putting those datasets into one repository makes it possible in principle to follow a single person’s interaction with enforcement—arrest, detention stints, transfers, and potential removal—which is why the project documents variables and provides a codebook to aid linking and interpretation [7] [2].

2. How linkages are implemented inside the repository

The project’s processing pipeline reconstructs person-level files by joining detention “stint” records and then adding back select fields for the first, last, and longest stint (book-in/book-out dates and facility), a methodological choice described in the project’s processing documentation [3]. The repository also provides dataset-specific documentation and codebooks to guide users on which columns to use when trying to match records across tables, and the team publishes notes on how particular fields (for example, departed country versus citizenship country) should be compared to identify third‑country removals [2] [5].
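To make the first/last/longest idea concrete, here is a minimal Python sketch of collapsing stint rows into one summary per person. It is an illustration under stated assumptions, not the project's actual pipeline: the `Stint` fields (`person_id`, `facility`, `book_in`, `book_out`), the `summarize_stints` helper, and the sample rows are all hypothetical, and the sketch assumes every stint has a recorded book-out date.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Stint:
    """One detention stint. All field names are hypothetical."""
    person_id: str   # assumed identifier shared by a person's stints
    facility: str
    book_in: date
    book_out: date   # assumed present; open stints would need extra handling

    @property
    def days(self) -> int:
        return (self.book_out - self.book_in).days


def summarize_stints(stints):
    """Collapse stint rows into one summary dict per person."""
    by_person = defaultdict(list)
    for stint in stints:
        by_person[stint.person_id].append(stint)

    summaries = {}
    for person_id, rows in by_person.items():
        rows.sort(key=lambda s: s.book_in)  # chronological order
        summaries[person_id] = {
            "n_stints": len(rows),
            "first": rows[0],
            "last": rows[-1],
            # On ties, max() keeps the earliest stint of that length.
            "longest": max(rows, key=lambda s: s.days),
        }
    return summaries


# Made-up sample rows, for illustration only.
stints = [
    Stint("P1", "Facility A", date(2025, 1, 5), date(2025, 1, 9)),
    Stint("P1", "Facility B", date(2025, 1, 9), date(2025, 2, 20)),
    Stint("P1", "Facility C", date(2025, 2, 20), date(2025, 2, 22)),
    Stint("P2", "Facility A", date(2025, 3, 1), date(2025, 3, 4)),
]
summary = summarize_stints(stints)
print(summary["P1"]["longest"].facility, summary["P1"]["longest"].days)
```

Keeping only three stints per person makes the file compact, but it drops any intermediate transfers, so a summary like this cannot by itself reconstruct a full detention history.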

3. What these linkages enable in practice

When joins succeed, researchers can trace patterns such as lengths of detention, transfers between facilities, field-office level arrest trends, and who is more likely to be released versus removed—patterns already reported using the project’s data, including office- and facility-level analyses and work showing who is targeted in enforcement [8] [6]. The centralized data have also enabled external tools and dashboards to make the information more accessible to nontechnical audiences, expanding the pool of analysts who can work with the records [9].

4. Core limits: missing fields, inconsistent releases, and incomplete linking

Despite that value, the project and external commentators flag three structural limits: infrequent and inconsistent government data releases, insufficient cross-dataset linking in the source records, and variable documentation from the agencies. Each undermines confidence in automated person-level matching and in the completeness of time series [4] [10]. The project itself cautions users about the reliability of some tables (for example, the removals table has carried reliability warnings) and documents the trade-offs in how it reconstructs stints when full linkage is impossible [5] [3]. Independent repositories and analysts using the DDP data emphasize that results are preliminary and should be interpreted with caution, underscoring that imperfect linking can produce undercounts, double‑counts, or misattributed outcomes [11] [10].
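One way to see how undercounts and double-counts arise is to audit a join key before merging two tables. The following Python sketch is illustrative only: `audit_link`, the key lists, and the identifiers in them are hypothetical, and it assumes keys are non-missing strings. It reports keys present on only one side (a source of undercounts or orphaned outcomes) and repeated keys (which multiply rows in a join and can double-count people).

```python
from collections import Counter


def audit_link(left_keys, right_keys):
    """Report how cleanly two tables would join on a shared key."""
    left, right = Counter(left_keys), Counter(right_keys)
    shared = left.keys() & right.keys()
    return {
        # Keys with no partner: these records silently drop out of an inner join.
        "left_only": sorted(left.keys() - right.keys()),
        "right_only": sorted(right.keys() - left.keys()),
        # Repeated keys: each repeat multiplies rows in the joined output.
        "dup_left": sorted(k for k, n in left.items() if n > 1),
        "dup_right": sorted(k for k, n in right.items() if n > 1),
        "matched_keys": len(shared),
        # Number of rows an inner join on this key would produce.
        "joined_rows": sum(left[k] * right[k] for k in shared),
    }


# Made-up keys, for illustration only.
detention_ids = ["P1", "P2", "P3", "P3"]
removal_ids = ["P1", "P3", "P4"]
report = audit_link(detention_ids, removal_ids)
print(report)
```

When `joined_rows` exceeds `matched_keys`, the join has fanned out, and counting rows would overstate the number of people. Nonempty `left_only` or `right_only` lists mark records whose outcomes cannot be traced through this key.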

5. Institutional and political context that shapes linkage limits

The need to file FOIA requests for many datasets, and the project's role in rehosting government records, reflect deeper opacity in enforcement data flows. Critics of ICE's data practices argue that the agency's uneven transparency and its surveillance practices complicate accountability and analysis, a context that the DDP cannot by itself remedy [2] [12]. At the same time, advocates and scholars celebrate the DDP as a corrective tool that raises the bar for transparency, even while acknowledging the project's dependence on what agencies choose, or are forced, to release [13] [4].

6. Bottom line: practical use and prudent caveats

Linkages in the Deportation Data Project work by reassembling government records into person-centered files using documented joins and by guiding analysts with codebooks and processing notes, enabling valuable reporting and research into arrests, detention, transfers, and removals [3] [2]. However, persistent gaps—sporadic releases, limited native linking in source files, and known reliability issues in some tables—mean any person-level conclusions derived from the repository require careful validation, transparent methodological notes, and humility about what the available data can and cannot prove [4] [5].
