What’s the Difference Between Statistical Significance and Clinical Significance?
There are two types of significance used to interpret research studies – statistical significance and clinical significance. They are not the same thing. One answers the question, “Are the statistical results due to random chance?” and the other answers the question, “So what? Will the results matter to our patients?”
Significance means “the quality of being important.” Importance is a value judgment, right? Though value judgments are considered subjective, there are elements of the “thing” being appreciated that help one assign the characteristic of significance. In research, we attribute importance or worth to research findings according to accepted, albeit sometimes arbitrary, conventions.
Significant, in terms of statistics, is defined as “probably caused by something other than mere chance.” Researchers proclaim a study finding to be “statistically significant” or not, depending on whether the p-value from their research result is less than the a priori alpha level set before the study commenced.
The alpha level is (supposed to be) formed or conceived beforehand (i.e., a priori). “The term [a priori] usually describes lines of reasoning or arguments that proceed from the general to the particular… a priori knowledge is knowledge that comes from the power of reasoning based on self-evident truths” (https://www.merriam-webster.com/dictionary/a%20priori).
Statistical significance has to do with the likelihood that a research result is true (i.e., a real effect of the intervention) and not merely a matter of chance.
Clinical significance is a subjective interpretation of a research result as practical or meaningful for the patient and thus likely to affect provider behavior.
(Oh, just an FYI — other than identifying the probability of a research result as a function of chance, p-values don’t tell you very much. But more on why in a future post!)
The Effect of Sample Size on Statistical Significance
Statistical significance is a function of sample size. If you have a large enough sample size, almost anything can be found to be statistically significant!
Research costs time, effort, and money. Researchers are trying to show relationships between or among their variables of interest – but to do that they need an adequate sample of representative subjects.
If the sample size is not large enough, random error is increased, and the results may not show significant differences (even if the intervention really works better than the standard of care, or SOC) because there are not enough subjects to show that difference. A small sample size is a major reason for making this Type II error.
You can also have studies that have too much power. The problem with very large sample sizes is that very small, statistically significant differences between the research groups can be found. These statistically significant results may not necessarily be clinically significant, though.
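To make this concrete, here is a toy sketch (not from any of the cited texts, and using a simplified normal approximation) showing how the very same tiny difference flips from nonsignificant to statistically significant purely because the sample grows:

```python
from statistics import NormalDist
import math

def two_sample_z_p(diff, sd, n_per_group):
    """Two-sided p-value for a difference in group means,
    using the normal approximation (a teaching simplification)."""
    se = sd * math.sqrt(2 / n_per_group)      # standard error of the difference
    z = diff / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# A trivial 0.5mmHg difference (SD = 10) is nowhere near significant with
# 50 subjects per group, but crosses p < 0.05 with 10,000 per group.
small_study = two_sample_z_p(0.5, 10, 50)
huge_study = two_sample_z_p(0.5, 10, 10_000)
print(small_study > 0.05)   # True
print(huge_study < 0.05)    # True
```

The difference itself never changed; only the sample size did.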
A Priori Sample Size Estimation: Researchers should do a power analysis before they conduct their study to determine how many subjects to enroll. Power analysis, precision estimation, or sample size calculations are based on a number of factors, which we won’t go into in this post, but include the type of study being conducted (cross-sectional, survey, case-control, clinical trial, etc.), significance level, power, effect size, and standard deviation or event rate (Hayat, 2013; Heavey, 2015; Malone, Nicholl, & Coyne, 2016). Studies with large samples are expensive to conduct, so funding, available resources, the available population, time, and other factors will also be considered by the researcher when deciding on the final sample size.
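As a rough illustration of the idea (a simplified normal-approximation formula for comparing two means, not a substitute for the full power analysis the sources above describe), a per-group sample size could be sketched like this:

```python
from statistics import NormalDist
import math

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate subjects needed per group for a two-sided,
    two-sample comparison of means (normal approximation).
    effect_size is Cohen's d: difference in means / SD."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for the alpha level
    z_beta = z.inv_cdf(power)            # value corresponding to desired power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)

# A medium effect (d = 0.5) at alpha 0.05 and 80% power:
print(n_per_group(0.5))   # 63 per group
```

Notice that a larger expected effect size shrinks the required sample, while demanding more power or a stricter alpha grows it.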
“Statistical power is the probability of making a correct decision, namely, that of rejecting the null hypothesis when it is actually false” (Hayat, 2013, p. 944).
The larger the sample size, the greater the power of the study to find a difference if one really exists.
Research Sticking Points
Short break before we go on to clarify some commonly misunderstood points. I’m labeling these tougher or complicated research concepts as research sticking points. Here are a few related to our topic.
Research Sticking Point 1: Many students don’t quite get how researchers determine whether their research results are statistically significant or not. Basically, the researchers conduct their experiment and enter their data into a statistical program, like IBM’s SPSS Statistics. When they are satisfied that the data entered is complete and accurate, they run the statistical tests required to answer their research hypotheses (e.g., Chi-square, correlation, ANOVA). They then compare the p-value they get from their test results to their a priori alpha level. If the p-value is less than their alpha level (i.e., p < 0.05) then they declare their result statistically significant. If the resulting p-value is greater than the alpha level, the result is not statistically significant.
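In code form, the decision rule in this sticking point is just a comparison (the alpha and p-values below are hypothetical, not from any particular study):

```python
def is_significant(p_value, alpha=0.05):
    """Declare significance only when the test's p-value falls
    below the a priori alpha level."""
    return p_value < alpha

print(is_significant(0.03))   # True  (0.03 < 0.05)
print(is_significant(0.07))   # False (0.07 >= 0.05)
```

The statistical program does the heavy lifting of producing the p-value; this final comparison is all "statistically significant or not" really means.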
Research Sticking Point 2: The null hypothesis is always assumed to be true. In most studies, the researcher is trying to reject or disprove the null hypothesis (H0) that their variables of interest have no relationship or are no different than the SOC. By rejecting or disproving the null, they are able to accept the alternative hypothesis (H1) and declare the result statistically significant.
Research Sticking Point 3: Whenever you read a study remember that there will always be differences between the groups because we are dealing with living beings, right?
For example, the experimental group lost an average of 9.8 pounds (lb) in the diet trial and the control group lost 9.1 pounds. Or the average diastolic blood pressure (DBP) of the experimental group taking the new blood pressure drug was 85mmHg versus 91mmHg in the control group. Are there differences between the groups? Well, yes, clearly!
But the key question is whether that 0.7 lb difference in weight or 6mmHg difference in DBP between these groups was statistically significant or not – being able to say your intervention is [statistically] significantly better than the standard of care (SOC) carries more weight than saying “there was a trend towards better outcomes in the experimental group.” Obviously, in the second statement, those researchers did not find statistically significant differences in outcomes between their groups (or they would have said it!). The most common reason for nonsignificant findings is not having a large enough sample size to find a difference if one really exists!
See my recommendations for good stat books for nurses at the end of the post!
Clinical significance is sometimes called clinical importance or practical importance. I don’t think I’ve ever seen an “official” or gold standard definition of clinical significance because the criteria will change depending on the disease, condition, or patient population.
This term is generally accepted as the practical significance of the research – how meaningful the results would be to the patient. For example, will you change your prescribing behavior or treatment of certain patient populations as a result of the research findings? Is the reduction in pain/blood pressure/heart rate/etc., enough to promote positive patient outcomes?
Determining clinical significance is important to healthcare providers who prescribe patient care treatments (e.g., physicians, advanced practice nurses, physician assistants). Research findings with very large or very small treatment effect sizes are easier to interpret than those in the middle. Most providers would change their practice based on large, statistically significant treatment effects or continue with the status quo for small or trivial treatment effects, even if the effect was statistically significant.
You can ascertain clinical significance of a treatment from reported risk measures. The magnitude of the risk or benefit would indicate how harmful or how effective the treatment would be in real life. This information would be important for your decision-making process.
Confidence intervals are starting to be reported over p-values in the nursing and medical literature because they are more helpful for clinical decision making than p-values. The confidence interval signifies a range wherein the true population parameter lies with 90%, 95%, or 99% confidence. We can use the range of the reported confidence interval to help us determine clinical significance.
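As a sketch of the idea (normal approximation, with a made-up point estimate and standard error rather than values from a real trial), a confidence interval can be built like this:

```python
from statistics import NormalDist

def confidence_interval(estimate, standard_error, confidence=0.95):
    """Normal-approximation confidence interval around a point estimate."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)   # e.g., ~1.96 for 95%
    return (estimate - z * standard_error, estimate + z * standard_error)

# Hypothetical: a 6mmHg mean DBP reduction with a standard error of 1.5
low, high = confidence_interval(6, 1.5)
print(round(low, 1), round(high, 1))   # 3.1 8.9
```

A reader can then judge clinical significance from the whole range: if even the low end of the interval (about 3mmHg here) would matter to patients, the finding looks clinically meaningful; if the interval dips down to trivial values, it may not.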
Relative risks and the minimal clinically important difference (MCID) or minimal important difference (MID) are other methods to determine clinical importance for patients. Again, the MCID/MID would change for the condition in question and/or with the specific patient! Another parameter that helps providers make this determination is the Number Needed to Treat or NNT.
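A minimal sketch of two of these risk measures, using hypothetical event rates (the definitions are standard, but the numbers are invented for illustration):

```python
import math

def relative_risk(treated_event_rate, control_event_rate):
    """Risk of the event in the treated group relative to control."""
    return treated_event_rate / control_event_rate

def nnt(control_event_rate, treated_event_rate):
    """Number needed to treat = 1 / absolute risk reduction,
    rounded up to a whole patient."""
    arr = control_event_rate - treated_event_rate
    return math.ceil(1 / arr)

# Hypothetical: 4% stroke rate on the SOC drug vs. 2% on the new drug
print(relative_risk(0.02, 0.04))   # 0.5  (half the risk)
print(nnt(0.04, 0.02))             # 50 patients treated to prevent 1 stroke
```

Whether an NNT of 50 is "worth it" is exactly the kind of subjective, condition-dependent judgment that clinical significance involves.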
I’ll discuss these statistical measures (risks, CI, MCID/MID, NNT) in more detail in future posts.
Let’s Put It All Together
Let’s say you are studying a new blood pressure drug to lower the risk of stroke and you design a randomized controlled trial (RCT) against the standard of care (SOC) drug. You have access to many patients and randomly select a HUGE representative sample of adults with hypertension. (Some drug studies enroll thousands of people!)
You conduct the study, enter the data into a statistical computer program, run your statistics, and get a result.
You had set your alpha level at 0.001 because this is a drug study and you want to be very sure that if the research findings are statistically significant, the probability of a false-positive result due to chance is less than 1 in 1000. (Note that the alpha level and the p-value are not the same thing: alpha is the cutoff you set in advance; the p-value is what the data give you.)
Comprehension Check: An alpha level set at 0.001 means that the researchers will only claim that the intervention is statistically significant if the p-value of the research result is less than 0.001. That means there is less than a 1/1000 chance that a result as big as the one the researchers got would merely be a function of chance. So if a result this large would arise by chance only once in a thousand runs of the study, the researchers can feel pretty secure that their result is real, right?
You are happy because you find that the new drug lowers blood pressure and the result is statistically significant: the statistic has a p-value = 0.0001. You can reject the null hypothesis (that the two drugs were alike) because 0.0001 is less than 0.001. You declare that the new drug significantly lowers diastolic blood pressure in hypertensive adults. YAY!
Now that we know the research finding is statistically significant, is it clinically significant? Well, that depends. Let’s say the difference between the groups was 6mmHg. Is that a large enough difference for you to prescribe the new drug? Is reducing the DBP by 6mmHg enough to decrease one’s risk of stroke? This is where providers would have to know the research literature around these variables to have a clear answer.
What if the difference was only 2mmHg? Again, with a large enough sample, that small difference in DBP may be statistically significant. But would you switch a patient to this drug just to reduce their DBP by an average of 2mmHg? According to the literature, it turns out reducing SBP by 10mmHg or DBP by 5mmHg reduces stroke risk by 20%-33% (Ettehad et al., 2016; Takagi & Umemoto, 2013) – so the 6mmHg drop in DBP would be clinically significant, but the DBP reduction of 2mmHg would not.
Statistical significance tells us how likely a research result is a chance finding based on the researcher’s predetermined significance level. Many factors impact statistical power.
Very small differences between the groups tested can be found to be statistically significant if you have a very large sample. The researchers can say “YAY!” our intervention made a difference – because they are able to reject the null hypothesis of no difference.
Research findings may not be important enough to fundamentally change a provider’s prescribing practice or treatment choice, even if found to be statistically significant. Clinical significance tells us how effective or meaningful the research finding might be to patients.
Don’t forget that the determination of clinical significance is a subjective decision. The decision will depend on which disease process or condition is being studied, how many people are affected by the condition, etc. For some conditions, very small changes could have a large impact on symptomatology, disease burden, or quality of life.
What questions do you have after reading this post? Let me know in the comments or email me at email@example.com!
My Recommendations for Statistics Books for Nurses
Harris, M., & Taylor, G. (2014). Medical statistics made easy (3rd ed.). Banbury, Oxford, UK: Scion Publishing LTD.
Heavey, E. (2015). Statistics for nursing: A practical approach (2nd ed.). Burlington, MA: Jones & Bartlett Learning.
Grove, S. K., & Cipher, D. J. (2017). Statistics for nursing research: A workbook for evidence-based practice (2nd ed.). St. Louis, MO: Elsevier.
Kim, M., & Mallory, C. (2014). Statistics for evidence-based practice in nursing. Burlington, MA: Jones & Bartlett Learning.
References
Ettehad, D., Emdin, C. A., Kiran, A., Anderson, S. G., Callender, T., Emberson, J., … Rahimi, K. (2016). Blood pressure lowering for prevention of cardiovascular disease and death: A systematic review and meta-analysis. Lancet, 387(10022), 957-67. doi: 10.1016/S0140-6736(15)01225-8
Hayat, M. J. (2013). Understanding sample size determination in nursing research. Western Journal of Nursing Research, 35(7), 943-956. doi:10.1177/0193945913482052
Heavey, E. (2015). Statistics for nursing: A practical approach (2nd ed.). Burlington, MA: Jones & Bartlett Learning.
Malone, H. E., Nicholl, H., & Coyne, I. (2016). Fundamentals of estimating sample size. Nurse Researcher, 23(5), 21–25. https://doi.org/10.7748/nr.23.5.21.s5
Takagi, H., & Umemoto T. (2013). The lower, the better? Fractional polynomials meta-regression of blood pressure reduction on stroke risk. High Blood Pressure Cardiovascular Prevention, 20(3), 135-138. doi: 10.1007/s40292-013-0016-1.