How does L2 model race/ethnicity on the Texas voter file and what are its known biases?

Checked on January 19, 2026
Disclaimer: Factually can make mistakes. Please verify important information or breaking news.

Executive summary

L2 assigns race and ethnicity on the Texas voter file by combining whatever self-reported or administrative race codes the state provides with proprietary statistical modeling that draws on name analysis, consumer and demographic append data, and geographic linkage to Census blocks; L2 and downstream researchers treat those outputs as probabilistic rather than definitive [1] [2] [3]. Known biases include misclassification of nonwhite voters as white in some contexts, inconsistent treatment of Hispanic identity across sources, and the propagation of model uncertainty into downstream estimates when researchers must sum predicted probabilities instead of relying on self-reported race [4] [5] [6].

1. How L2 constructs race/ethnicity: a hybrid of state records and proprietary modeling

L2 starts with whatever race and ethnicity fields appear in the official voter registration records it ingests, then "enhances" those records by appending commercial demographic variables and running proprietary statistical models that use name-ethnicity algorithms, consumer data, and geographic mapping to Census blocks to estimate racial probabilities for individuals [3] [2] [7]. L2's public descriptions emphasize that some fields, ethnicity among them, are explicitly modeled rather than taken purely from state self-identification, and that its files include both coded values and model-based variables [1] [3] [2].
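As a minimal sketch of that layering, the code below shows one way a vendor-style pipeline could prefer state-supplied codes and fall back to modeled probabilities. Every field name, race category, and confidence weight here is invented for illustration; L2's actual schema and algorithm are proprietary and not public.

```python
# Hypothetical illustration of layering state race codes over modeled probabilities.
# All field names and weights are invented; L2's real pipeline is proprietary.

RACES = ["white", "black", "hispanic", "asian", "other"]

def assign_race_probabilities(record: dict) -> dict:
    """Return race probabilities for one voter record, preferring state codes."""
    state_code = record.get("state_race_code")  # self-reported/administrative code, if any
    if state_code in RACES:
        # Treat a state-supplied code as near-certain rather than absolute.
        return {r: 0.95 if r == state_code else 0.05 / (len(RACES) - 1) for r in RACES}
    # No state code: fall back to model inputs (stand-ins for name analysis
    # and Census-block composition) and renormalize their product.
    name_probs = record["name_model_probs"]
    block_probs = record["census_block_composition"]
    combined = {r: name_probs[r] * block_probs[r] for r in RACES}
    total = sum(combined.values())
    return {r: p / total for r, p in combined.items()}
```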

2. The mechanics reporters and researchers actually use: probabilities, not hard labels

When academic groups and advocacy organizations analyze racial disparities using L2 data, they typically treat race as a set of predicted probabilities rather than a single categorical assignment, for example by summing the "probabilities that any individual voter is a member of that race" to compute denominators and rejection rates, the approach the Brennan Center used [6]. L2 likewise documents that some demographic statistics on the platform derive from modeling and private sources, signaling that downstream analyses must account for measurement uncertainty [2] [8].
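To make the sum-of-probabilities arithmetic concrete, the toy example below mirrors that calculation. The four-voter DataFrame and its column names are hypothetical; only the arithmetic reflects the approach described.

```python
import pandas as pd

# Toy data: predicted race probabilities per voter plus a ballot-rejection flag.
voters = pd.DataFrame({
    "p_white":    [0.90, 0.20, 0.60, 0.05],
    "p_hispanic": [0.10, 0.80, 0.40, 0.95],
    "rejected":   [0, 1, 0, 1],  # 1 = mail ballot rejected
})

for group in ["p_white", "p_hispanic"]:
    expected_members = voters[group].sum()                          # group denominator
    expected_rejections = (voters[group] * voters["rejected"]).sum()
    print(f"{group}: rejection rate = {expected_rejections / expected_members:.1%}")
```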

3. Common modeling inputs and their implications: names, neighborhoods, consumer traces

The models leverage name analysis (to infer likely Hispanic or Asian heritage), linkage to 2010/2020 Census blocks (to borrow neighborhood racial composition), and commercial consumer data that correlates with race and ethnicity; these inputs increase coverage but also import correlations between socioeconomic status and predicted race [1] [3] [7]. L2's assignment of voters to Census blocks and its use of multiple snapshots to build histories are useful for turnout work, but they also mean that geographic segregation patterns influence individual racial probabilities [3] [8].
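The published technique that combines these inputs is the Bayesian update of the BISG family; the sketch below shows the general mechanics with invented numbers. L2's actual model is proprietary, and a full BISG uses P(block | race) where this simplification multiplies in the raw block shares.

```python
# BISG-style update: combine a surname prior P(race | surname) with neighborhood
# composition, then renormalize. All numbers are invented for illustration.
surname_prior = {"white": 0.05, "hispanic": 0.90, "asian": 0.03, "other": 0.02}
block_shares  = {"white": 0.70, "hispanic": 0.20, "asian": 0.05, "other": 0.05}

unnormalized = {r: surname_prior[r] * block_shares[r] for r in surname_prior}
total = sum(unnormalized.values())
posterior = {r: round(p / total, 3) for r, p in unnormalized.items()}
print(posterior)
# A strongly Hispanic surname in a mostly white block is pulled toward "white":
# the Hispanic probability drops from 0.90 to roughly 0.83 in this example.
```

This is exactly how geographic segregation leaks into individual predictions: the same surname yields different racial probabilities depending on the Census block it is linked to.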

4. Known biases documented in the literature and reporting

Scholars and reports note systematic misclassification risks: BISG-style and administrative-record-based methods can misclassify nonwhite voters as white, particularly in wealthier or majority-white neighborhoods, and the treatment of "Hispanic" varies across data sources, sometimes being coded as a race rather than an ethnicity, which complicates comparability [4] [5]. The Brennan Center's analysis of Texas used L2-derived probability sums for race in calculating differential ballot-rejection rates, explicitly acknowledging the modeled nature of the underlying race variable [6].
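A toy simulation makes the direction of this bias concrete; every rate and share below is invented purely to show the mechanism, not to estimate real Texas figures.

```python
import random

random.seed(0)
TRUE_REJECT = {"white": 0.01, "nonwhite": 0.03}  # assumed true rejection rates
MISCLASSIFY = 0.25  # assumed share of nonwhite voters the model labels white

measured = {"white": [0, 0], "nonwhite": [0, 0]}  # [rejections, voters]
for _ in range(200_000):
    true_race = "nonwhite" if random.random() < 0.4 else "white"
    rejected = random.random() < TRUE_REJECT[true_race]
    labeled = true_race
    if true_race == "nonwhite" and random.random() < MISCLASSIFY:
        labeled = "white"  # the error flows one way: nonwhite -> white
    measured[labeled][0] += rejected
    measured[labeled][1] += 1

for race, (rej, n) in measured.items():
    print(f"measured {race}: {rej / n:.2%} vs. true {TRUE_REJECT[race]:.2%}")
# The measured white rate is inflated by misclassified nonwhite rejections,
# so the measured white/nonwhite gap shrinks relative to the true gap.
```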

5. Practical consequences for research and for legal claims

Because L2's race fields are probabilistic, studies using them must either aggregate probabilities (as the Brennan Center did) or accept classification error; that matters for litigation and policy findings where small percentage differences are consequential, since misclassification can attenuate or exaggerate racial disparities depending on spatial and socioeconomic patterns [6] [4] [5]. Researchers mitigate this by comparing multiple vendor files, using states with self-reported race where possible, and explicitly modeling uncertainty, strategies advocated in the academic literature [4] [5].
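One simple uncertainty check along these lines, sketched below with hypothetical data, is to compute the same statistic under probability weighting and under a hard classification threshold and report the spread between the two.

```python
import pandas as pd

def rejection_rate(df: pd.DataFrame, weights: pd.Series) -> float:
    """Weighted ballot-rejection rate for one group."""
    return (weights * df["rejected"]).sum() / weights.sum()

df = pd.DataFrame({
    "p_hispanic": [0.9, 0.55, 0.45, 0.1, 0.8],  # hypothetical modeled probabilities
    "rejected":   [1, 0, 1, 0, 0],
})

prob_weighted = rejection_rate(df, df["p_hispanic"])
thresholded = rejection_rate(df, (df["p_hispanic"] > 0.5).astype(float))
print(f"probability-weighted: {prob_weighted:.1%}; hard 0.5 threshold: {thresholded:.1%}")
# A large divergence signals that the disparity estimate is sensitive to
# classification error and should be reported with that caveat.
```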

6. Where reporting and transparency fall short

L2 provides documentation and a data dictionary, but its core modeling algorithms and bias audits are proprietary, so external users must infer error modes from academic validations and cross-vendor comparisons rather than from full methodological disclosure by L2 [8] [7]. This opacity means that when legal or policy actors cite L2-based racial counts, readers should expect probabilistic inference rather than direct measurement, and should look for sensitivity analyses that test alternative assumptions [2] [9].

Want to dive deeper?
How do BISG and name-based algorithms differ in accuracy for Hispanic and Asian classification?
What validation studies compare L2 racial predictions to self-reported race in states that collect it?
How have courts treated probabilistic race assignments from voter-file vendors in Voting Rights Act litigation?