An Updated Survey on Statistical Thresholding and Sample Size of fMRI Studies

Seuils statistiques et tailles des échantillons des études en IRMf: une nouvelle recension de la littérature


Yeung, A. W. K. (2018). An Updated Survey on Statistical Thresholding and Sample Size of fMRI Studies. Frontiers in Human Neuroscience12, 16.

DOI: 10.3389/fnhum.2018.00016


Background: Since the early 2010s, the neuroimaging field has paid more attention to the issue of false positives. Several journals have issued guidelines regarding statistical thresholds. Three papers have reported the statistical analysis of the thresholds used in fMRI literature, but they were published at least 3 years ago and surveyed papers published during 2007–2012. This study revisited this topic to evaluate the changes in this field.

Methods: The PubMed database was searched to identify the task-based (not resting-state) fMRI papers published in 2017 and record their sample sizes, inferential methods (e.g., voxelwise or clusterwise), theoretical methods (e.g., parametric or non-parametric), significance level, cluster-defining primary threshold (CDT), volume of analysis (whole brain or region of interest) and software used.

Results: The majority (95.6%) of the 388 analyzed articles reported statistics corrected for multiple comparisons. A large proportion (69.6%) of the 388 articles reported main results by clusterwise inference. The analyzed articles mostly used software Statistical Parametric Mapping (SPM), Analysis of Functional NeuroImages (AFNI), or FMRIB Software Library (FSL) to conduct statistical analysis. There were 70.9%, 37.6%, and 23.1% of SPM, AFNI, and FSL studies, respectively, that used a CDT of p ≤ 0.001. The statistical sample size across the articles ranged between 7 and 1,299 with a median of 33. Sample size did not significantly correlate with the level of statistical threshold.

Conclusion: There were still around 53% (142/270) studies using clusterwise inference that chose a more liberal CDT than p = 0.001 (n = 121) or did not report their CDT (n = 21), down from around 61% reported by Woo et al. (2014). For FSL studies, it seemed that the CDT practice had no improvement since the survey by Woo et al. (2014). A few studies chose unconventional CDT such as p = 0.0125 or 0.004. Such practice might create an impression that the threshold alterations were attempted to show “desired” clusters. The median sample size used in the analyzed articles was similar to those reported in previous surveys. In conclusion, there seemed to be no change in the statistical practice compared to the early 2010s.