Without raw patient data, no verification of drug or vaccine efficacy is possible
Executive summary
Raw patient-level data are indispensable for the fullest independent verification of a drug's or vaccine's efficacy because they allow reanalysis, subgroup exploration, and detection of anomalies that published summaries can hide [1] [2]. That said, verification is not categorically impossible without raw data: regulatory reviews, studies built on large electronic health record systems, and robust post‑marketing surveillance provide complementary lines of evidence that can corroborate or challenge efficacy claims, albeit with limitations around transparency, bias adjustment, and patient privacy [3] [4] [5].
1. Why advocates insist raw data are the gold standard
Advocates argue that only raw individual participant data let independent scientists reproduce trial analyses, test alternative definitions, probe subgroup effects, and investigate unexplained patterns such as differential exclusions. These tasks were repeatedly highlighted as necessary after the COVID‑19 vaccine trials [1] [2]. The BMJ and other commentators state that transparency is essential to public trust and to answering substantive questions about endpoints, adjudication and blinding that summary papers and regulatory reports often omit [1] [3]. Calls for full protocols and raw case report forms date back at least a decade and underpin demands that regulators make underlying data publicly available [6].
2. What can be verified without raw patient data
Regulatory review documents, aggregate trial results, and independent observational studies can and do provide meaningful verification of efficacy: regulators review case counts, prespecified endpoints, and safety datasets before authorization; large observational analyses using electronic medical records can estimate real‑world effectiveness and rare outcomes across broader populations [3] [4] [7]. Methodological advances — for example, linking immunogenicity biomarkers to protection or using Bayesian methods to infer efficacy from subsets — can produce robust estimates even when full raw datasets are restricted [8]. These approaches, however, rely on assumptions and layers of processing that raw data would allow others to scrutinize [8] [9].
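To illustrate what aggregate case counts alone allow, the sketch below computes a vaccine efficacy point estimate (one minus the risk ratio) with an approximate 95% confidence interval based on the standard error of the log risk ratio. The function name and the trial numbers are hypothetical and chosen only for illustration; the calculation also assumes equal follow-up in both arms, which is exactly the kind of assumption that raw data would let others check.

```python
import math


def vaccine_efficacy(cases_vax, n_vax, cases_placebo, n_placebo, z=1.96):
    """Return (ve, ci_low, ci_high) from aggregate counts.

    Assumes equal follow-up time per participant in both arms and
    at least one case in each arm (hypothetical helper, not a
    regulator's or sponsor's actual method).
    """
    risk_vax = cases_vax / n_vax
    risk_placebo = cases_placebo / n_placebo
    rr = risk_vax / risk_placebo

    # Standard error of log(RR) for two independent binomial proportions.
    se_log_rr = math.sqrt(
        1 / cases_vax - 1 / n_vax + 1 / cases_placebo - 1 / n_placebo
    )
    rr_low = math.exp(math.log(rr) - z * se_log_rr)
    rr_high = math.exp(math.log(rr) + z * se_log_rr)

    # VE = 1 - RR, so the upper RR bound gives the lower VE bound.
    return 1 - rr, 1 - rr_high, 1 - rr_low


# Hypothetical counts: 10 cases among 20,000 vaccinated,
# 100 cases among 20,000 placebo recipients.
ve, ve_low, ve_high = vaccine_efficacy(10, 20_000, 100, 20_000)
print(f"VE = {ve:.1%} (approx. 95% CI {ve_low:.1%} to {ve_high:.1%})")
```

With these made-up counts the point estimate is 90%. What the summary numbers cannot show is how cases were adjudicated, who was excluded, or whether follow-up really was balanced, which is where individual participant data become necessary.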
3. Limits of passive and aggregated surveillance
Passive reporting systems and aggregated adverse event datasets are useful for signal detection but cannot by themselves establish causality or precise risk estimates because of underreporting, reporting bias, and lack of denominators; modern vaccine pharmacovigilance therefore uses active surveillance and linked clinical datasets to estimate rates and adjust for confounding [5] [10]. Even where public databases like VAERS expand access, privacy protections and deidentification mean individual‑level linkages and some raw narratives remain unavailable or incomplete, constraining independent verification [11] [10].
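The denominator problem can be made concrete with a small sketch. A count of reports becomes a rate only when the number of doses administered is known, and even then the result depends heavily on what fraction of true events is ever reported. The function names, the counts, and the capture fractions below are all hypothetical and do not come from any real surveillance system.

```python
def reporting_rate_per_100k(reports, doses_administered):
    """Reports per 100,000 doses; requires a known denominator."""
    if doses_administered <= 0:
        raise ValueError("doses_administered must be positive")
    return reports / doses_administered * 100_000


def implied_true_rate_per_100k(reports, doses_administered, capture_fraction):
    """Rate implied if only `capture_fraction` of true events are reported.

    `capture_fraction` is an assumption supplied by the analyst, not
    something a passive system can measure on its own.
    """
    if not 0 < capture_fraction <= 1:
        raise ValueError("capture_fraction must be in (0, 1]")
    return reporting_rate_per_100k(reports, doses_administered) / capture_fraction


# Hypothetical inputs: 50 reports after 2,000,000 doses.
reports, doses = 50, 2_000_000
for capture in (1.0, 0.5, 0.1):
    rate = implied_true_rate_per_100k(reports, doses, capture)
    print(f"capture fraction {capture:.0%}: {rate:.1f} events per 100,000 doses")
```

The same 50 reports imply a tenfold range of rates depending on the assumed capture fraction, and none of these figures says anything about causality without a comparison against the background rate in unvaccinated people.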
4. Practical, ethical and commercial tensions around sharing raw data
Data sharing faces competing demands: patient privacy and regulatory mandates to protect identifiers, commercial incentives that preserve proprietary value, and public expectations for transparency [11] [12]. Commentators and institutions have pushed for deidentified individual participant data repositories as a compromise, but critics note that companies and regulators have historically limited access, and that litigation and advocacy have been required to release more material in some high‑profile cases [3] [13] [14]. Where raw data remain restricted, independent groups use regulatory documents, trial reports, and real‑world datasets to fill gaps, but those are imperfect substitutes [3] [4].
5. Bottom line: verification is strongly constrained but not utterly impossible
Verification that reproduces every analytic choice requires raw patient data: they are the only route to fully independent reanalysis, adjudication audits, and detection of the subtle biases flagged by BMJ commentators [1] [2]. Nevertheless, a layered evidence approach (regulatory reviews, deidentified trial summaries, EHR‑based effectiveness studies, active surveillance systems, and immunogenicity modeling) can provide converging evidence sufficient for many scientific and policy decisions, while still leaving unanswered questions that only raw data could finally settle [5] [4] [8]. The tradeoff between transparency, privacy and commercial interest remains the central political and methodological battleground [9] [12] [6].