[*AoI = "Articles of Interest" is a feature of TRN where we report abstracts of recent research related to replication and research integrity.]
ABSTRACT (taken from *the article*)
"A scientific discovery in empirical research, e.g., establishing a causal relationship between two variables, is typically based on rejecting a statistical null hypothesis of no relationship.
[*EIR = Econometrics in Replications, a feature of TRN that highlights useful econometrics procedures for re-analysing existing research. The material for this blog is motivated by a recent blog at TRN, "The problem isn't just the p-value, it's also the point-null hypothesis!" by Jae Kim and Andrew Robinson] In a recent blog, Jae Kim and Andrew Robinson highlight key points from their recent paper, "Interval-Based Hypothesis Testing and Its Applications to Economics and Finance" (Econometrics, 2019).
In Frequentist statistical inference, the *p*-value is used as a measure of how incompatible the data are with the null hypothesis. When the null hypothesis is fixed at a point, the test statistic reports a distance from the sample statistic to this point. A low (high) p-value means that this distance is large (small) relative to the sampling variability.
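As a rough illustration of that last point, here is a minimal sketch (not from the original post; the data and the null value mu_0 = 0 are invented) of a one-sample t-test against a point null, where the p-value reflects how far the sample mean sits from the null value in standard-error units:

```python
# Minimal sketch: a point-null p-value as a scaled distance from the null value.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = rng.normal(loc=0.3, scale=1.0, size=100)   # hypothetical sample

mu_0 = 0.0                                     # point null: population mean is 0
se = x.std(ddof=1) / np.sqrt(len(x))           # sampling variability (standard error)
t_stat = (x.mean() - mu_0) / se                # distance from the null, in SE units
p_value = 2 * stats.t.sf(abs(t_stat), df=len(x) - 1)  # two-sided p-value

print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
# A large |t| (big distance relative to sampling variability) gives a small p-value.
```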
This blog is based on the homonymous paper by Norbert Hirschauer, Sven Grüner, Oliver Mußhoff, and Claudia Becker in the Journal of Economics and Statistics. It is motivated by prevalent inferential errors and the intensifying debate on p-values, as expressed, for example, in the activities of the American Statistical Association, including its p-value symposium in 2017 and the March 2019 special issue on Statistical inference in the 21st century: A world beyond p < 0.05.
Replication researchers cite inflated effect sizes as a major cause of replication failure. It turns out this is an inevitable consequence of significance testing. The reason is simple: the p-value you get from a study depends on the observed effect size, with larger observed effects yielding smaller p-values; the true effect size plays no role.
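A small simulation makes the point concrete. The sketch below is not from the blog; the true effect, sample size, and number of simulated studies are invented for illustration. It shows that among the studies that happen to cross p < 0.05, the average estimated effect sits well above the true effect:

```python
# Minimal sketch: selecting on statistical significance inflates effect estimates.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_effect, n, n_studies = 0.2, 50, 10_000    # hypothetical values

estimates, pvals = [], []
for _ in range(n_studies):
    sample = rng.normal(loc=true_effect, scale=1.0, size=n)
    t, p = stats.ttest_1samp(sample, popmean=0.0)
    estimates.append(sample.mean())
    pvals.append(p)

estimates, pvals = np.array(estimates), np.array(pvals)
sig = pvals < 0.05
print(f"true effect:                  {true_effect:.2f}")
print(f"mean estimate, all studies:   {estimates.mean():.2f}")
print(f"mean estimate, p < 0.05 only: {estimates[sig].mean():.2f}")  # noticeably inflated
```

Because only unusually large observed effects clear the significance threshold when power is modest, conditioning on significance exaggerates the effect sizes that end up being reported.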
"Replicability of findings is at the heart of any empirical science" (Asendorpf, Conner, De Fruyt, et al., 2013, p. 108). The idea that scientific results should be reliably demonstrable under controlled circumstances has a special status in science. In contrast to our high expectations for replicability, unfortunately, recent reports suggest that only about 36% (Open Science Collaboration, 2015) to 62% (Camerer, Dreber, Holzmeister, et al., 2018) of published findings can be successfully replicated.
[This blog is based on the paper, "A Primer on the 'Reproducibility Crisis' and Ways to Fix It" by the author] A standard research scenario is the following: A researcher is interested in knowing whether there is a relationship between two variables, x and y. She estimates the model y = μ0 + μ1x + ε, ε ~ N(0, σ²).
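A minimal sketch of that scenario (not from the paper; the parameter values and sample size are invented) simulates the model and tests the null hypothesis that the slope μ1 is zero:

```python
# Minimal sketch: simulate y = mu_0 + mu_1 * x + eps and test H0: mu_1 = 0.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, mu_0, mu_1, sigma = 200, 1.0, 0.5, 2.0      # hypothetical parameters

x = rng.uniform(0, 10, size=n)
eps = rng.normal(0, sigma, size=n)             # eps ~ N(0, sigma^2)
y = mu_0 + mu_1 * x + eps

fit = stats.linregress(x, y)                   # OLS regression of y on x
print(f"estimated slope = {fit.slope:.3f}, p-value for H0: mu_1 = 0 is {fit.pvalue:.3g}")
```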
[NOTE: This is a repost of a blog that Prasanna Parasurama published at the blogsite Towards Data Science.] "The confidence intervals of the two groups overlap, hence the difference is not statistically significant." The statement above is wrong. Overlapping confidence intervals/error bars say nothing about statistical significance. Yet a lot of people make the mistake of inferring a lack of statistical significance from overlapping intervals.
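A quick numerical sketch (not from Parasurama's post; the group summaries are invented) shows two groups whose 95% confidence intervals overlap even though the test of the difference between them is comfortably significant:

```python
# Minimal sketch: overlapping 95% CIs with a statistically significant difference.
import numpy as np
from scipy import stats

# Two hypothetical groups: means and standard errors (all values invented)
m1, se1 = 10.0, 0.60
m2, se2 = 12.0, 0.60

# The individual 95% CIs overlap: roughly (8.82, 11.18) vs (10.82, 13.18)
ci1 = (m1 - 1.96 * se1, m1 + 1.96 * se1)
ci2 = (m2 - 1.96 * se2, m2 + 1.96 * se2)
print("CI group 1:", ci1, "CI group 2:", ci2)

# ...but the test of the difference uses the SE of the difference,
# which is smaller than the sum of the two SEs
se_diff = np.sqrt(se1**2 + se2**2)             # ~0.85, not 1.20
z = (m2 - m1) / se_diff                        # ~2.36
p = 2 * stats.norm.sf(abs(z))                  # ~0.018
print(f"z = {z:.2f}, p = {p:.4f}")             # significant despite the overlap
```

Requiring the two individual intervals not to overlap amounts to demanding a gap of roughly 1.96·(SE1 + SE2), which is a stricter criterion than the usual 1.96·√(SE1² + SE2²) applied to the difference.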
[Note: This blog is based on our articles "Blinding Us to the Obvious? The Effect of Statistical Training on the Evaluation of Evidence" (Management Science, 2016) and "Statistical Significance and the Dichotomization of Evidence" (Journal of the American Statistical Association, 2017).] Introduction: The null hypothesis significance testing (NHST) paradigm is the dominant statistical paradigm in the biomedical and social sciences.
[The following is an adaptation of (and in large parts identical to) a **recent blog post** by Anne Scheel that appeared on The 100% CI.] Many, probably most, empirical scientists use frequentist statistics to decide whether a hypothesis should be rejected or accepted, in particular null hypothesis significance testing (NHST).