What Do Confidence Intervals Really Tell You?
P-values and Confidence Intervals
In a previous post, I talked about p-values – what they tell you and what they don’t tell you. Quick recap, since confidence intervals are closely related to p-values: p-values ONLY tell you whether a research result is statistically significant or not, based on the a priori alpha level. P-values do NOT tell you how precise or reliable the finding is, how big the effect size is, or how practical or important the finding will be for changing nursing or medical practice.
Comprehension Check: P-values help the researcher decide whether the research result is statistically significant. P-values ONLY tell you Yes or No – is the p-value computed by the statistics software from the research data less than or greater than the alpha level set in advance by the researcher (AKA the threshold level)? Do you reject the null hypothesis or not?
The p-value is the probability of seeing a result at least as extreme as the one observed if the null hypothesis were true. It is not the probability that the results are wrong.
And just because a p-value is greater than the threshold value (so not statistically significant) doesn’t mean there is no evidence for the experimental intervention – it may just mean that the sample size wasn’t large enough to find evidence of a difference, if one really exists!
For all the fuss that’s made about p-values, they don’t sound all that helpful in making decisions about clinical care, huh? Without more information, you don’t know whether the calculated result, called the point estimate, is close to the true population value. And if another study is conducted on the same research question, the p-value for that point estimate may be entirely different (Krzywinski & Altman, 2013).
There is another statistical measure that can give you more information from which to make clinical decisions. It is possible to use the research sample to calculate a range within which the population value is likely to fall: this is called the confidence interval (CI).
The CI gets around the all-or-none principle behind p-values and can be calculated for almost any test. You probably have already started to see confidence intervals reported in the nursing literature.
What is a Confidence Interval?
Confidence intervals (CI) are calculated from the research data and define the range of values around the point estimate within which the population value is likely to fall. The confidence level is chosen a priori and can be 90%, 95%, 99%, or whatever level of confidence the researcher wants, for the purpose of decreasing uncertainty about the research findings. We want to know how reliable a specific result is for the whole population of patients affected by the condition. The most common choice is the 95% CI.
The confidence interval is defined as the range of plausible numerical values in which we can be confident (to a computed probability, such as 90% [90% CI] or 95% [95% CI]) that the population value being estimated will be found.
Remember the normal curve? The normal curve, also called a bell curve or probability curve, depicts the natural distribution of probabilities. The normal curve has certain characteristics: the total area under the curve is 100%; the curve is symmetric, so the mean value lies directly in the middle of the curve; and the mean, median, and mode are all equal. One standard deviation (SD) covers about 34% on either side of the mean – so plus or minus 1 SD covers about 68% of the curve. Plus or minus 2 SDs covers about 95% of the curve, and about 99.7% of the data fall within 3 SDs of the mean.
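If you want to check those percentages yourself, here is a minimal Python sketch. The function name coverage_within is my own, made up for illustration; the only thing it relies on is the error function in Python's standard math module, which gives the area under a normal curve within a given number of SDs of the mean.

```python
import math


def coverage_within(k_sd: float) -> float:
    """Share of a normal distribution lying within k_sd SDs of the mean."""
    # For a normal curve, P(|Z| <= k) equals erf(k / sqrt(2)).
    return math.erf(k_sd / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within plus or minus {k} SD: {coverage_within(k):.1%}")
```

Notice that 2 SDs covers slightly more than 95%. The cutoff that gives exactly 95% is about 1.96 SDs, which is why you will often see 1.96 in formulas for a 95% CI.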
Where to Find Confidence Intervals in a Research Report
After the researcher enters their study data they run their statistical tests and get the results. The observed result or point estimate is the quantitative “answer” to the researcher’s question. Let me explain with an example.
If the researcher was looking to see which wound dressing provides faster healing – they would likely be looking at how many days it took a wound to heal. The observed value would be the average number of days for the wound to heal using the experimental dressing and the standard of care (SOC) or control dressing. The statistical test chosen will provide the researcher with information about whether the observed difference, in this case between healing time of the experimental intervention and of the control group, was statistically significant or not.
The calculated point estimate is the size or magnitude of the result from the study sample and will be found in the Results section of a journal article. The research result may be reported as an effect size with a mean and standard deviation, a Chi-square statistic, or a ratio (odds ratio, mortality ratio, rate ratio, risk ratio).
The point estimate (X) will be listed first followed by the CI which will identify the confidence range chosen by the researcher (90%, 95%, 99%). The format is X (95% CI, Lower limit-Upper limit). The values at each end of the interval are called the confidence limits. All the values between the confidence limits make up the confidence interval.
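To make the X (95% CI, Lower limit-Upper limit) format concrete, here is a rough Python sketch that calculates a CI for a mean and prints it in that format. The healing times are invented for illustration and do not come from any study, and mean_ci is a hypothetical helper name. I am also assuming the normal approximation (the 1.96 multiplier); for a sample this small, statistics software would typically use the t-distribution and report a slightly wider interval.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_ci(values, confidence=0.95):
    """Return (point estimate, lower limit, upper limit) for a sample mean.

    Assumption: normal approximation. Small samples usually call for the
    t-distribution instead, which widens the interval a little.
    """
    point = mean(values)
    standard_error = stdev(values) / math.sqrt(len(values))
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return point, point - z * standard_error, point + z * standard_error


# Hypothetical days-to-heal for 10 wounds (made-up numbers).
healing_days = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]

point, lower, upper = mean_ci(healing_days)
print(f"{point:.1f} (95% CI, {lower:.1f}-{upper:.1f})")
```

The two numbers after the comma are the confidence limits, and everything between them is the confidence interval.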
You will also see confidence intervals visually depicted in Forest plots for a meta-analysis. The Forest plot offers a big picture view of the results of specific studies in a meta-analysis and provides a summary statistic to help you get an “answer” to the question studied. More on meta-analyses and Forest plots in future posts.
The idea is that you read the research result and make a determination about how reliable that result is based on the CI. You want to know how likely it is that the TRUE population value falls within the CI reported.
How to Interpret Confidence Intervals
How Precise is the Result?
To interpret how precise the data are, look at the width of the CI.
- Narrow CIs indicate that the results are very precise and more credible than wide CIs.
- Confidence intervals indicate the strength of evidence; the smaller the confidence interval the better.
A wide CI means that the results are not very reliable (i.e., they may be all over the board!). Wide confidence intervals indicate less precise estimates of effect and are usually the result of an underpowered study — that is, that the sample size was probably too small. The recommendation would be to repeat the study with a larger sample.
The larger the sample size, the closer the sample estimate tends to be to the true population mean, and the narrower the CI.
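Here is a small Python sketch of why a bigger sample narrows the CI. It uses the usual normal-approximation formula for a 95% CI around a mean: the mean plus or minus 1.96 times the SD divided by the square root of the sample size. The SD of 20 and the three sample sizes are assumptions I picked for illustration, and ci_half_width is a made-up helper name.

```python
import math

Z_95 = 1.96  # multiplier for a 95% CI under the normal approximation


def ci_half_width(sd: float, n: int, z: float = Z_95) -> float:
    """Distance from the point estimate to either confidence limit."""
    return z * sd / math.sqrt(n)


assumed_sd = 20.0  # hypothetical SD of the outcome being measured

for n in (25, 100, 400):
    half_width = ci_half_width(assumed_sd, n)
    print(f"n = {n:>3}: mean plus or minus {half_width:.2f}")
```

Because the sample size sits under a square root, you need four times as many participants to cut the width of the CI in half.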
Deciding whether a CI is narrow or wide is a bit subjective, I admit, especially for nurses new to reading and interpreting research studies. For some results, a span of 5 may be considered wide and for others, a 5-point span may be considered narrow. All I can say is that your decision will be based on what is being studied: blood pressure or blood sugar? reading level or IQ? etc.
Is the Result Clinically Significant?
The lower and upper confidence limits also need to be interpreted separately. The lower (or numerically smaller) limit shows how small the effect might be in the population; the upper limit shows how large the effect might be. I’ll show you how these are interpreted in the examples here and at the end of the post.
You can answer the question about clinical significance by evaluating the lower confidence limit. Ask yourself whether the lower limit is either partially or completely within the area of clinical indifference (Clarke, 2012). In other words, if the true population parameter is near or at the lower limit, will that make a difference in patient outcomes?
Example: Let’s say we are reading about a weight loss study. The average weight loss in the experimental group was 10 pounds in a month with a 95% CI of 2-18 pounds. You would interpret this finding as: the experimental treatment caused an average of 10 pounds of weight loss per month in the sample, but the true weight loss in the population could be as little as 2 pounds a month to as much as 18 pounds a month.
Now you ask yourself if a weight loss of 2 pounds a month (which may be the true effect in the population) is clinically significant? Is that enough to change a patient’s diet or to prescribe a diet pill for? No.
By the way, what do you think about the width of that CI? 10 (95% CI, 2-18). Is this narrow or wide? I’ll give you a minute. I’ll tell you what I think at the end of this post.
Is the Result Statistically Significant?
We figured out the clinical significance of the reported value, but wait! If there are no p-values, how do we know if the result is statistically significant? No worries! It turns out you can determine the statistical significance of the results by examining the reported CI — without needing a p-value statistic.
The key is to see if the CI contains a value that, if it was the true population parameter, would indicate no difference or no association between the groups tested. This decision point is called the line of no difference.
To make this determination, look at the span of the CI. Ask yourself: Does the CI contain the line of no difference?
For results reported as a MEAN DIFFERENCE, if the CI contains zero (0), the result is not significant. Why? Because the CI gives you a range of values that could be the population parameter. If 0 is within the interval, then it is possible that 0 difference could be the REAL population parameter. A mean difference of zero is no difference between the groups, right? If group A lost 10 pounds and group B lost 10 pounds, the difference is 0 – so the experimental intervention did not make a difference.
For results reported as an ASSOCIATION RATIO, if the CI contains 1, the result is not significant. Why? Because, again, the CI gives you a range of values that could be the population parameter. If 1 is within the interval, then it is possible that 1 could be the REAL population parameter. A ratio of one means there is no difference between the groups, right? If group A has a risk of heart failure of 25% and group B has a risk of heart failure of 25%, the ratio of 25:25 = 1 – so the experimental intervention did not make a difference.
For a 95% CI, when the confidence interval contains the line of no difference (LND), the result is not statistically significant at the 5% level (p > 0.05): the data are compatible with no real change in the outcome variable due to the independent variable.
- A 90% CI that contains the LND = not statistically significant at the 10% level (p > 0.10)
- A 99% CI that contains the LND = not statistically significant at the 1% level (p > 0.01)
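The line-of-no-difference check is simple enough to sketch in a few lines of Python. The function names and the measure labels ("difference" and "ratio") are my own, invented for illustration. The two ratio intervals are the odds ratio CIs from the asthma study discussed later in this post; the mean-difference interval is made up.

```python
def line_of_no_difference(measure: str) -> float:
    """0 for mean differences, 1 for ratios (odds, risk, rate)."""
    if measure == "difference":
        return 0.0
    if measure == "ratio":
        return 1.0
    raise ValueError(f"unknown measure: {measure!r}")


def is_statistically_significant(lower: float, upper: float, measure: str) -> bool:
    """True when the CI does NOT contain the line of no difference."""
    lnd = line_of_no_difference(measure)
    return not (lower <= lnd <= upper)


# Odds ratio CIs reported in the asthma study.
print(is_statistically_significant(1.04, 1.12, "ratio"))  # child anxiety
print(is_statistically_significant(1.00, 1.30, "ratio"))  # community violence

# Hypothetical mean difference in pounds lost, with a CI that spans 0.
print(is_statistically_significant(-1.0, 5.0, "difference"))
```

Only the first interval stays clear of its line of no difference, so only the first result would be called statistically significant at the 5% level.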
Nurses are Honest: Gallup Poll Example
Think of the results of a survey or a poll being reported. When the reporters state that the poll or survey has a margin of error of plus or minus 3, 4, or 5 percentage points – that’s a CI of the percentage of people who answered a certain way.
For example, let’s take the famous Gallup poll on how Americans view honesty and ethical standards of people from different professional fields.
Since professional nurses have to critique research for validity, I want to share what Gallup reports for their survey methodology and you can decide how strong you think the methods were. I’ve highlighted some key points about the methods – which look pretty good to me.
“Results for this Gallup poll are based on telephone interviews conducted Dec. 7-11, 2016, with a random sample of 1,028 adults, aged 18 and older, living in all 50 U.S. states and the District of Columbia. For results based on the total sample of national adults, the margin of sampling error is ±4 percentage points at the 95% confidence level. All reported margins of sampling error include computed design effects for weighting.
Each sample of national adults includes a minimum quota of 60% cellphone respondents and 40% landline respondents, with additional minimum quotas by time zone within region. Landline and cellular telephone numbers are selected using random-digit-dial methods.” (Gallup, 2016, Survey Methods section).
At the time I’m writing this post, there are 326 million people in the US. Roughly 76% of that total are over the age of 18, about 248 million. Of course, the Gallup organization could not survey every adult in the US, so they used a representative random sample of adults in the US. The total sample was 1,028 adults from all 50 states and the District of Columbia. (We’ll talk about why this sample size can represent the adult population of the US in a future post!)
Results of this Gallup Poll: Nurses ranked the highest among a list of 22 professions, with 84% of those polled saying nurses had high to very high honesty and ethical standards. So how would you use the 95% CI to interpret the meaning of this result?
- If 84% of respondents rated nurses high or very high in this poll, that means about 864 of the 1,028 people surveyed did so.
- The margin of error was ±4 percentage points and reflects survey accuracy. “Survey accuracy numbers tell us how much we should believe the survey results as an indicator of how [the] group of interest feels” (Van Bennekom, n.d.). So the 95% CI around this result is 80-88%. We interpret this CI to mean that we can be 95% confident that, if the whole adult population of the US was surveyed, anywhere from 80% to 88% of the American public would say nurses are highly honest and ethical. Even the lower confidence limit of 80% is the vast majority of Americans!
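The arithmetic behind those two bullets can be sketched in Python. This only works with the figures Gallup reported (84%, a margin of ±4 percentage points, and 1,028 respondents); it does not recompute the margin of error, which Gallup says includes design effects for weighting. The helper name interval_from_margin is hypothetical.

```python
def interval_from_margin(percent: float, margin: float) -> tuple[float, float]:
    """Turn a reported percentage and its margin of error into CI limits."""
    return percent - margin, percent + margin


sample_size = 1028  # adults interviewed, as reported by Gallup
percent_high = 84   # % rating nurses high or very high
margin = 4          # reported margin of sampling error, in percentage points

lower, upper = interval_from_margin(percent_high, margin)
respondents = round(sample_size * percent_high / 100)

print(f"About {respondents} of {sample_size} respondents")
print(f"{percent_high}% (95% CI, {lower}-{upper}%)")
```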
Clinical Research Example
In a study “to test models of risk factors for asthma prevalence and severity (frequency of attacks)” in South African children, 6,002 children were interviewed. Some of the results were as follows: “child anxiety (odds ratio [OR] 1.08; 95% confidence interval [CI] 1.04 – 1.12) and community violence (OR 1.14; 95% CI 1.00 – 1.30) were associated with increased odds of having asthma” (Yakubovich, Cluver, & Gie, 2016, p. 404).
The researchers wanted to know which risk factors contributed to the prevalence of asthma and the severity of asthma in these children. A confidence interval is a spread around the point estimate (the best estimate of the data) within which the TRUE effect size or value for the population is likely to lie; they chose a 95% confidence level. (Note that I just gave you the results for asthma prevalence. You can read the article for more details.)
The first variable was child anxiety. The statistic was reported as an odds ratio [OR] 1.08, 95% confidence interval [CI] 1.04 – 1.12.
- The OR was 1.08 = In this study, the presence of anxiety increased the odds of the onset of asthma in South African children by 8%. Now let’s see if that increase is statistically and clinically significant.
- The CI of 1.04-1.12 is very narrow – so it is very precise. It indicates that the effect of anxiety was large enough that the sample size for this study was appropriate to find this difference.
- The confidence limits: The interval of 1.04-1.12 is interpreted as: child anxiety can increase the odds of asthma onset by as little as 4% to as much as 12%.
- Statistical significance: Because the point estimate is an odds ratio – the line of no difference would be 1. Examine the CI to determine whether it contains the number 1. Look at the figure above. The CI is 1.04-1.12. The interval does NOT contain 1, so this finding would be considered statistically significant.
- Clinical significance – look at the lower limit (LL) to make this determination. According to the research study, there are 50 million asthmatic children under 15 years old in Sub-Saharan Africa, and South Africa has the fourth highest asthma mortality rate in the world. A rise in asthma onset because of child anxiety by 4% (an additional 2 million children) would be considered a clinically significant effect, don’t you agree?
Interpretation: Statistically Significant and Clinically Significant result
The second variable was community violence. The statistic was reported as an odds ratio [OR] of 1.14, 95% confidence interval [CI] 1.00 – 1.30.
- The OR was 1.14 = The social risk factor of community violence increased the odds of the onset of asthma in South African children by 14% in this sample. Now let’s see if that increase is statistically and clinically significant.
- The CI of 1.00-1.30 is wide – so it is NOT precise. It indicates that the effect of community violence is either not a factor in asthma prevalence or too small to be detected by the sample size in this study. For this finding, the effect of community violence may be so small that the sample needed to be bigger to find that small effect if community violence really does impact the asthma rate.
- The confidence limits: The interval of 1.00-1.30 is interpreted as community violence can increase the odds of asthma onset in South African children by as little as 0% to as much as 30%. Because we are 95% confident that the true population value lies within the CI – it is possible that the true population parameter is 1, or that there is no difference in the prevalence of asthma due to community violence.
- Statistical significance: Because the point estimate is an odds ratio – the line of no difference would be 1. Examine the CI to determine whether it contains the number 1. Look at the figure above. The CI is 1.00-1.30. The interval DOES contain 1, so this result would NOT be considered statistically significant.
- Clinical significance: according to the research study, there are 50 million asthmatic children under 15 years old in Sub-Saharan Africa, and South Africa has the fourth highest asthma mortality rate in the world. Additionally, South Africa has high rates of violence, poverty, and psychosocial problems. Social risk factors can increase life stress, thereby increasing anxiety and childhood asthma rates. Because the CI contains 1, it is possible that the true population parameter is 1 and that there is no effect of community violence on asthma onset, so we cannot call this a clinically significant finding. Put another way, the result is not statistically significant at the 5% level, so this study cannot rule out that community violence makes no real difference to asthma onset. The researchers can only recommend that more research be conducted, including research on the impact of social factors on asthma prevention and treatment.
Interpretation: Not a statistically significant or clinically significant result.
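If turning an odds ratio into a “percent increase in odds” feels like a leap, this short Python sketch shows the conversion I used for both variables: subtract 1 from the ratio and multiply by 100. The numbers are the ORs and 95% CIs reported by Yakubovich, Cluver, and Gie (2016); the function and variable names are my own.

```python
def percent_change_in_odds(odds_ratio: float) -> float:
    """Convert an odds ratio into a percent change in the odds."""
    return (odds_ratio - 1) * 100


# (point estimate, lower limit, upper limit) as reported in the study
reported = {
    "child anxiety": (1.08, 1.04, 1.12),
    "community violence": (1.14, 1.00, 1.30),
}

for risk_factor, (point, lower, upper) in reported.items():
    print(
        f"{risk_factor}: odds up {percent_change_in_odds(point):.0f}% "
        f"(as little as {percent_change_in_odds(lower):.0f}%, "
        f"as much as {percent_change_in_odds(upper):.0f}%)"
    )
```

Keep in mind this is a change in the odds of having asthma, which is not quite the same thing as a change in the percentage of children who have it.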
P.S. Answer to CI question about weight loss: The CI of 2-18 for the weight loss example is a wide interval, and the study should be repeated with a larger sample size to narrow the CI and estimate the population mean more precisely.
What questions do you have about confidence intervals? Let me know in the comments or email me: firstname.lastname@example.org
Clarke, J. (2012). What is a CI? Evidence-Based Nursing, 15(3), 66. doi: 10.1136/ebnurs-2012-100802
Gallup. (2016, December 19). Americans rate healthcare providers high on honesty, ethics. Retrieved from http://www.gallup.com/poll/200057/americans-rate-healthcare-providers-high-honesty-ethics.aspx?g_source=Social%20Issues&g_medium=lead&g_campaign=tiles
Krzywinski, M., & Altman, N. (2013). Importance of being uncertain. Nature Methods, 10(9), 809-810. doi:10.1038/nmeth.2613
Van Bennekom, F. C. (n.d.). Survey statistical accuracy defined. Retrieved from https://greatbrook.com/survey-statistical-accuracy/
Yakubovich, A. R., Cluver, L. D., & Gie, R. (2016). Socioeconomic factors associated with asthma prevalence and severity among children living in low-income South African communities [Abstract]. The South African Medical Journal, 106(4), 404-412. DOI:10.7196/SAMJ.2016.v106i4.10168 http://www.samj.org.za/index.php/samj/article/view/10168
My Recommendations for Statistics Books for Nurses
Grove, S. K., & Cipher, D. J. (2017). Statistics for nursing research: A workbook for evidence-based practice (2nd ed.). St. Louis, MO: Elsevier.
Harris, M., & Taylor, G. (2014). Medical statistics made easy (3rd ed.). Banbury, Oxford, UK: Scion Publishing LTD.
Heavey, E. (2015). Statistics for nursing: A practical approach (2nd ed.). Burlington, MA: Jones & Bartlett Learning.
Katz, D. L., Elmore, J. G., Wild, D. M. G., & Lucan, S. C. (2014). Jekel’s epidemiology, biostatistics, preventive medicine, and public health (4th ed.). Philadelphia, PA: Elsevier Saunders.
Kim, M., & Mallory, C. (2014). Statistics for evidence-based practice in nursing. Burlington, MA: Jones & Bartlett Learning.