How were penis length percentiles measured and standardized in the 2015 Veale study?

Checked on January 10, 2026

Executive summary

The 2015 Veale et al. paper pooled penile measurements from published studies in which clinicians took the measurements using a standardized procedure. The authors combined those data into weighted means and pooled standard deviations, then simulated a large, normally distributed dataset (20,000 observations) to produce nomograms and percentile lines for flaccid, stretched and erect length and girth [1] [2]. They explicitly limited included samples to those measured by health professionals, excluded clinical abnormalities and small samples, and warned that heterogeneity limits precision, particularly given the variability of flaccid-stretched measures and the small number of erect measurements taken in clinical settings [1] [3] [4].

1. Study selection and inclusion rules that set the foundation

Veale and colleagues began with a systematic search of the literature. They included only studies in which penis size was measured by a health professional using a described, standard procedure and in which each sample had at least 50 participants. They excluded studies of men with congenital or acquired penile abnormalities, prior surgery or erectile dysfunction, as well as studies of men who had self-selected because of small‑penis complaints, in an attempt to limit bias in the pooled dataset [1] [2] [5]. The final data synthesis drew on up to 15,521 men from roughly 17 studies identified in the review, which is the source of the frequently cited sample size of “up to 15,521” [5] [3].

2. How raw measurements were handled and harmonized across studies

Because the original studies used different measurement states (flaccid, flaccid-stretched, erect) and slightly varying protocols, Veale et al. extracted the reported means and standard deviations for each measurement type. They then calculated a weighted mean and a pooled standard deviation across studies for each measurement category, an approach that gives larger studies proportionally more influence while combining disparate samples into a single summary estimate [2] [6]. The team also reported ratios between dimensions and correlations (for example, stronger correlations of stretched and erect length with height) to describe inter-relationships in the pooled data [6] [4].
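The pooling step can be illustrated with a minimal Python sketch. The sources summarized here do not state the exact pooled-SD formula the authors used, so the version below assumes the standard within-group formula (variances weighted by n - 1). The function names and the three study rows are hypothetical placeholders, not values from the paper.

```python
import math

def weighted_mean(means, ns):
    """Sample-size-weighted mean: larger studies count proportionally more."""
    return sum(m * n for m, n in zip(means, ns)) / sum(ns)

def pooled_sd(sds, ns):
    """Pooled SD, assuming the usual within-group formula:
    sqrt( sum((n_i - 1) * s_i^2) / sum(n_i - 1) ).
    This is an assumption; the paper's exact formula is not given here."""
    numerator = sum((n - 1) * s ** 2 for s, n in zip(sds, ns))
    denominator = sum(n - 1 for n in ns)
    return math.sqrt(numerator / denominator)

# Hypothetical per-study summaries (n, mean in cm, SD in cm), illustration only.
studies = [(100, 13.0, 1.5), (300, 13.4, 1.7), (50, 12.8, 1.6)]
ns = [n for n, _, _ in studies]
means = [m for _, m, _ in studies]
sds = [s for _, _, s in studies]

print(f"weighted mean = {weighted_mean(means, ns):.2f} cm")
print(f"pooled SD     = {pooled_sd(sds, ns):.2f} cm")
```

Note that this within-group pooled SD ignores differences between study means, which is one way between-study heterogeneity can be understated in a pooled summary.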

3. Simulation of a normal distribution to generate percentiles and nomograms

Rather than publishing raw pooled percentiles directly from heterogeneous samples, Veale et al. used the weighted mean and pooled standard deviation to simulate 20,000 observations from a normal distribution for each measurement category. Those simulated distributions were then used to draw nomogram curves and percentile cutoffs (e.g., 2.5th, 50th, 97.5th) that clinicians can use to map an individual measurement to a percentile [2]. The simulation assumes that true population variation in each measurement is approximately normal after pooling. That assumption simplifies the construction of smooth percentile lines, but the resulting percentiles are only as reliable as the pooled mean and standard deviation are representative [2].
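A rough Python sketch of this simulation step follows. It is not the authors' code: the mean and SD passed in are hypothetical placeholders, the seed and the nearest-rank percentile rule are arbitrary choices, and `simulate_percentiles` and `percentile_of` are illustrative names.

```python
import random
from statistics import NormalDist

def simulate_percentiles(mean, sd, n=20_000, percentiles=(2.5, 50, 97.5), seed=0):
    """Draw n values from Normal(mean, sd) and read off empirical percentiles."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    draws = sorted(rng.gauss(mean, sd) for _ in range(n))
    cutoffs = {}
    for p in percentiles:
        idx = round(p / 100 * (n - 1))  # nearest-rank index into sorted draws
        cutoffs[p] = draws[idx]
    return cutoffs

def percentile_of(value, mean, sd):
    """Map one measurement to its percentile under the same normal assumption."""
    return 100 * NormalDist(mean, sd).cdf(value)

# Hypothetical pooled summary (cm), for illustration only.
POOLED_MEAN, POOLED_SD = 13.0, 1.6

for p, cutoff in simulate_percentiles(POOLED_MEAN, POOLED_SD).items():
    print(f"{p}th percentile: about {cutoff:.1f} cm")
print(f"14.0 cm falls near percentile {percentile_of(14.0, POOLED_MEAN, POOLED_SD):.0f}")
```

With 20,000 draws the empirical cutoffs sit very close to the analytic normal quantiles, so the simulation adds smoothness but no information beyond the pooled mean and SD. This is why the percentiles inherit any bias in those two numbers.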

4. Limitations flagged by the authors and subsequent methodological critiques

Veale and co‑authors acknowledged important limitations: relatively few erect measurements were taken in clinical settings, the greatest between-study variability was seen with flaccid‑stretched length, and heterogeneity remained across studies despite the eligibility criteria [3] [4]. Later methodological work highlighted interobserver variability and the tendency of flaccid measures to underestimate erect size. That work argued that erect measurements are preferable for clinical decisions and that observer differences can materially affect estimates, points that temper confidence in pooled nomograms built from mixed methods [7] [8] [9].

5. The practical and implicit aims shaping the analytic choices

The paper’s explicit aim was to create clinician-friendly nomograms to reassure or counsel men worried about size and to provide a reference for research, a framing repeatedly emphasized in the press release and the article. That clinical counseling goal helps explain the conservative inclusion criteria and the use of smooth simulated percentiles rather than raw heterogeneous distributions, but it also creates an implicit incentive to present clear “normal ranges” despite measurement variability [5] [1]. Where the underlying studies differ in protocol, the pooling and simulation still produce usable percentiles, but at the cost of smoothing over methodological heterogeneity that can matter clinically [6] [9].
