A Family Physician's Look at
Journal Article Validity

We live in an age of Evidence Based Medicine, and naturally all of us want to practice medicine using investigations and treatments that are based upon the best evidence available. But how do we evaluate the evidence presented to us? How do we know if the evidence presented is of high quality, and is in fact valid?

The content of this article is directed at family physicians, not researchers or statisticians. For those of you who would like a more in-depth review of the topic, references are provided below. This article relies heavily on information gathered from the Bandolier website and the University of Alberta website.

The articles that we are presented with generally fall into one of a few categories. They may be "superiority" studies, in which one treatment is compared with a placebo or another accepted treatment, or "equivalency" studies, in which the goal is to see whether one treatment is equivalent to an already accepted treatment. We will also see systematic reviews or meta-analyses, which combine data or results from a number of studies to reach a conclusion about treatment efficacy.

There are a number of criteria that should be considered when evaluating the quality of these studies. These will be described in the next few paragraphs.

Common criteria are as follows:

  • Relevance: Is the question being researched of clinical interest?
  • Randomization: Patients must be randomized to minimize bias. A randomization method is appropriate if it gives each patient the same chance of receiving each treatment and prevents the investigator from predicting which treatment comes next. Examples of appropriate randomization include coin tossing and computer-generated randomization. Allocating patients to treatment groups by date of birth, date of admission, or alternation is not considered appropriate.
  • Blinding: When neither the investigator nor the patient can identify the treatment being assessed, the trial is considered double blinded.
  • Withdrawals and drop-outs: Patients who were included in the study but did not complete the observation period, or who were not included in the analysis, must be described, and the number of withdrawals and the reasons for them must be stated.
  • Size of the group: The group must be large enough to support meaningful conclusions. In general, larger groups give more precise results, but a very large group can detect differences that are statistically significant yet too small to matter clinically, and these can be distracting.
  • Outcomes: Outcomes must be clearly defined, and where composite outcomes are reported, the individual components must be stated. The outcomes reported must be those specified in the original protocol.
  • Baselines: Baseline measurements must be reported for all treatment groups so that changes following an intervention can be measured.
  • Data analysis: The appropriate statistical tests must have been used.
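To make the randomization criterion concrete, here is a minimal Python sketch contrasting computer-generated (simple) randomization with alternation. The function names are hypothetical, and this is only an illustration: real trials typically use concealed allocation systems, often with blocking or stratification, which this sketch does not attempt.

```python
import random

def computer_randomization(n_patients, seed=None):
    """Simple randomization: each patient independently has a 50% chance
    of receiving treatment A or B, like tossing a fair coin."""
    rng = random.Random(seed)
    return [rng.choice(["A", "B"]) for _ in range(n_patients)]

def alternation(n_patients):
    """Inappropriate method: A, B, A, B, ... Anyone who knows the last
    allocation can predict the next one."""
    return ["A" if i % 2 == 0 else "B" for i in range(n_patients)]

# The fixed seed is only for reproducibility of this illustration.
print(computer_randomization(8, seed=42))  # a random-looking mix of "A" and "B"
print(alternation(6))                      # ['A', 'B', 'A', 'B', 'A', 'B']
```

With alternation, knowing one allocation reveals the next, which is exactly what an appropriate method must prevent.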
When looking at "equivalency" studies there are a few other criteria to consider.
  • The active control must have been previously shown to be effective.
  • Patients and outcomes should be similar to those in the previous studies.
  • The regimens must be applied in identical fashion. For example, the best dose of A should not be compared to an ineffective dose of B.
  • The equivalence margin should be pre-specified: there should be a prior definition, with justification, of how large a difference counts as a real difference.
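The article does not say how a result is checked against the equivalence margin. One common approach, assumed here rather than taken from the article, is to declare equivalence only if the whole confidence interval for the difference between treatments lies inside the pre-specified margin. A minimal Python sketch with hypothetical numbers; the function name is illustrative:

```python
def within_equivalence_margin(ci_lower, ci_upper, margin):
    """Return True if the whole confidence interval for the difference
    between two treatments lies inside (-margin, +margin).
    The margin must be chosen before the trial starts."""
    if margin <= 0:
        raise ValueError("margin must be a positive number")
    return -margin < ci_lower and ci_upper < margin

# Hypothetical: difference in cure rates (new drug minus active control),
# with a pre-specified margin of 5 percentage points.
print(within_equivalence_margin(-0.03, 0.02, margin=0.05))  # True
print(within_equivalence_margin(-0.07, 0.01, margin=0.05))  # False: lower limit beyond -5%
```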

Often the evidence is reported to us in systematic reviews, where a number of articles are analyzed with respect to a particular question. The same principles that apply to epidemiological surveys apply to these overviews. Look specifically to see whether:

  • The research methods used to find the evidence on the primary question were clearly stated.
  • The search for the evidence was reasonably comprehensive.
  • The criteria for deciding which articles to include in the overview were reported.
  • Bias in the selection of articles was avoided.
  • The criteria used for assessing the validity of the included studies were reported.
  • The validity of all the studies referred to in the text was assessed using appropriate criteria.
  • The methods used to combine the findings of the relevant studies were reported.
  • The findings of the relevant studies were combined appropriately relative to the primary question posed in the overview.
  • The conclusions made by the authors are supported by the data and/or analysis reported in the overview.

An excellent review of this subject, entitled "On Quality and Validity", can be found on the Bandolier website; it includes examples for each type of analysis.

Although statistical concepts are about as exciting as watching paint dry for most family physicians, there are a few that you should know about when reviewing an article.

These include:

P values:
The P value is the probability of obtaining a result at least as extreme as the one observed if there were in fact no real difference between the treatments (that is, if chance alone were at work). Standard scientific practice usually deems a P value of less than 1 in 20 (P<0.05) "statistically significant" and a P value of less than 1 in 100 (P<0.01) "statistically highly significant." A P value just above 0.05 (for example, P=0.07) is sometimes described as a "trend", but it is not statistically significant.
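To show where a P value comes from, here is a minimal Python sketch using a pooled two-proportion z-test on hypothetical trial counts. The test is one of several that could be used (chosen here only for simplicity), it relies on a normal approximation that is unsuitable for very small samples, and the function name is illustrative.

```python
import math

def two_proportion_p_value(events_a, n_a, events_b, n_b):
    """Two-sided P value from a pooled two-proportion z-test."""
    p_a = events_a / n_a
    p_b = events_b / n_b
    # Pooled event rate, assuming no real difference between groups
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided tail area of the standard normal distribution
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical trial: 100/1000 events on treatment vs 130/1000 on control
p = two_proportion_p_value(100, 1000, 130, 1000)
print(f"P = {p:.3f}")
```

For these made-up numbers P comes out at roughly 0.035, so the difference would be called statistically significant (P<0.05) but not highly significant (P<0.01).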

Intention to treat (ITT) analysis: A method of data analysis in a randomized clinical trial in which each patient's outcome is analyzed according to the group to which the patient was randomized, even if the patient never received the assigned treatment. Because it mirrors what happens in practice, ITT measures effectiveness (how well a treatment works in real use) rather than efficacy (how well it works under ideal conditions). This is a complex concept, and there are many definitions of what constitutes ITT.
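A minimal Python sketch of the simplest form of ITT, contrasted with a per-protocol analysis (a term the article does not use: it keeps only patients who actually received their assigned treatment). The eight-patient trial and all names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Patient:
    randomized_to: str  # arm assigned by randomization
    received: str       # arm actually received ("none" if never treated)
    had_event: bool     # did the outcome (e.g. a heart attack) occur?

def itt_event_rate(patients, arm):
    """Intention to treat: analyze everyone in the arm they were randomized to."""
    group = [p for p in patients if p.randomized_to == arm]
    return sum(p.had_event for p in group) / len(group)

def per_protocol_event_rate(patients, arm):
    """Per-protocol: keep only patients who actually received their assigned arm."""
    group = [p for p in patients if p.randomized_to == arm and p.received == arm]
    return sum(p.had_event for p in group) / len(group)

# Hypothetical trial; one treatment patient never took the drug and had an event.
trial = [
    Patient("treatment", "treatment", False),
    Patient("treatment", "treatment", False),
    Patient("treatment", "none", True),
    Patient("treatment", "treatment", True),
    Patient("control", "control", True),
    Patient("control", "control", False),
    Patient("control", "control", True),
    Patient("control", "control", False),
]

print(itt_event_rate(trial, "treatment"))           # 0.5
print(per_protocol_event_rate(trial, "treatment"))  # 0.3333333333333333
print(itt_event_rate(trial, "control"))             # 0.5
```

Under ITT the two arms look identical (0.5 vs 0.5); dropping the patient who never took the drug makes the treatment look better, which is the kind of distortion ITT is meant to avoid.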

Absolute risk reduction (ARR) is the difference in event rate between the control group (control event rate, CER) and the treated group (experimental event rate, EER): ARR = CER - EER. Depending on the outcome, it can describe a reduction in the risk of a bad event (death or cardiovascular events in trials of statins, for instance) or an increase in a good one (pain relief in trials of analgesics, for instance), in which case the difference is taken as EER - CER.

Relative risk (RR) is the ratio of the risk in the treated group (EER) to the risk in the control group (CER): RR = EER/CER. It is used in randomised trials and cohort studies.

Relative risk reduction (RRR) is the extent to which a treatment reduces a risk compared with patients not receiving the treatment of interest. It is the difference between the CER and the EER divided by the CER, usually expressed as a percentage: RRR = (CER - EER)/CER.
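The three formulas above (ARR, RR and RRR) can be computed together from the raw counts of a two-arm trial. A minimal Python sketch; the trial numbers and the function name are hypothetical.

```python
def risk_measures(events_control, n_control, events_treated, n_treated):
    cer = events_control / n_control   # control event rate
    eer = events_treated / n_treated   # experimental (treated) event rate
    return {
        "CER": cer,
        "EER": eer,
        "ARR": cer - eer,           # absolute risk reduction
        "RR": eer / cer,            # relative risk
        "RRR": (cer - eer) / cer,   # relative risk reduction (= 1 - RR)
    }

# Hypothetical trial: 200 of 1000 control patients vs 150 of 1000 treated patients have an event
m = risk_measures(200, 1000, 150, 1000)
print({k: round(v, 3) for k, v in m.items()})
# {'CER': 0.2, 'EER': 0.15, 'ARR': 0.05, 'RR': 0.75, 'RRR': 0.25}
```

Note that RRR = 1 - RR, so a relative risk of 0.75 is the same result as a 25% relative risk reduction.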

For example, in "MRC/BHF Heart Protection Study of cholesterol-lowering with simvastatin in 5963 people with diabetes: a randomised placebo-controlled trial" (Lancet. 2003 Jun 14;361(9374):2005-16), 601 people in the simvastatin group and 748 in the placebo group suffered a major vascular event. Using the event counts in place of the event rates, which is reasonable only when the two groups are of similar size, this represents an RRR of (748 - 601)/748 = 20%.
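The Heart Protection Study arithmetic can be checked in a few lines of Python. The article gives only the event counts, so the arm sizes below (3000 each, then 2500 and 3500) are hypothetical; they show why the count shortcut works for equal-sized groups and can mislead otherwise.

```python
def rrr_from_rates(events_control, n_control, events_treated, n_treated):
    cer = events_control / n_control
    eer = events_treated / n_treated
    return (cer - eer) / cer

# The shortcut in the text: event counts used in place of event rates
shortcut = (748 - 601) / 748
print(f"{shortcut:.1%}")  # 19.7%

# Hypothetical arm sizes (not from the article):
equal_arms = rrr_from_rates(748, 3000, 601, 3000)    # same as the shortcut
unequal_arms = rrr_from_rates(748, 2500, 601, 3500)  # shortcut would mislead here
print(f"{equal_arms:.1%}, {unequal_arms:.1%}")
```

With equal arms the result matches the 20% quoted above; with the made-up unequal arms the true RRR would be about 43%.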

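Relative risk reduction on its own can overstate a treatment's effect, because it ignores the baseline risk: an RRR of 50% may mean a fall in risk from 20% to 10% or from 2% to 1%, and the second is a much smaller absolute benefit.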
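A minimal Python sketch of that point, using hypothetical event rates: the same relative effect applied to a high-risk and a low-risk population gives the same RRR but very different ARRs.

```python
def describe(cer, eer):
    """Return the RRR and ARR for a control and a treated event rate."""
    return (cer - eer) / cer, cer - eer

# Hypothetical: the same treatment given to a high-risk and a low-risk population
for label, cer, eer in [("high risk", 0.20, 0.10), ("low risk", 0.02, 0.01)]:
    rrr, arr = describe(cer, eer)
    print(f"{label}: RRR {rrr:.0%}, ARR {arr:.0%}")
# high risk: RRR 50%, ARR 10%
# low risk: RRR 50%, ARR 1%
```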

Number needed to treat (NNT) is the number of patients who must be treated for one additional patient to experience the outcome of interest; for example, the number of patients who need to be treated to prevent one adverse outcome. It is the inverse of the ARR: NNT = 1/ARR.
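A minimal Python sketch of NNT = 1/ARR, with hypothetical trial counts. By convention NNTs are usually rounded up to the next whole patient; the sketch uses exact fractions so that floating-point error cannot push the rounding the wrong way. The function name is illustrative.

```python
import math
from fractions import Fraction

def number_needed_to_treat(events_control, n_control, events_treated, n_treated):
    """NNT = 1 / ARR, computed exactly from event counts."""
    cer = Fraction(events_control, n_control)
    eer = Fraction(events_treated, n_treated)
    arr = cer - eer
    if arr <= 0:
        raise ValueError("treatment shows no absolute risk reduction; NNT is undefined here")
    return 1 / arr

# Hypothetical trial: 20% of control patients vs 15% of treated patients have an event
nnt = number_needed_to_treat(200, 1000, 150, 1000)
print(float(nnt), math.ceil(nnt))  # 20.0 20
```

An ARR of 5 percentage points therefore means treating 20 patients to prevent one event.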

The ideal NNT is 1, where everyone improves with treatment and no one improves with control. The higher the NNT, the less effective the treatment. But whether an NNT is worthwhile depends on the clinical context. For instance, an NNT of about 1 might be seen when treating sensitive bacterial infections with an antibiotic, while an NNT of 2-5 might indicate an effective analgesic for acute pain. An NNT of 40 or more might still be useful for preventive treatments, such as using a statin to reduce the risk of heart attack.

NNT should always be given with the control treatment, the intervention used, the intensity (dose) and duration of the intervention, the outcome, and the period over which observations were made for that outcome.

Odds ratios: Many studies report their results as odds ratios, or as a reduction in odds ratios. Odds ratios are also commonly used in epidemiological studies to describe the likely harm an exposure might cause.

The odds of an event are calculated as the number of events divided by the number of non-events. For example, on average 51 boys are born in every 100 births, so the odds of any randomly chosen delivery being that of a boy are:

51 (number of boys) / 49 (number of girls) = about 1.04
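The birth example can be reproduced in Python, together with the conversions between probability and odds. A minimal sketch; the function names are illustrative.

```python
def odds_from_counts(events, non_events):
    """Odds = number of events / number of non-events."""
    return events / non_events

def odds_from_probability(p):
    """Convert a probability (0 <= p < 1) to odds."""
    return p / (1 - p)

def probability_from_odds(odds):
    """Convert odds back to a probability."""
    return odds / (1 + odds)

print(round(odds_from_counts(51, 49), 2))        # 1.04 (odds of a boy)
print(round(odds_from_probability(0.51), 2))     # 1.04 (same thing, from the 51% probability)
print(round(probability_from_odds(51 / 49), 2))  # 0.51
```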

The odds ratio is the odds of having the target disorder in the experimental group divided by the odds of having it in the control group (in cohort studies or systematic reviews). In case-control studies, it is the odds of having been exposed among subjects with the target disorder divided by the odds of having been exposed among control subjects without the target disorder.
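A minimal Python sketch comparing the odds ratio with the relative risk for hypothetical trial counts. It illustrates a general property: when the outcome is rare, the odds ratio is close to the relative risk, but when the outcome is common, the odds ratio lies further from 1 and can make an effect look larger than the relative risk suggests.

```python
def odds_ratio(events_treated, n_treated, events_control, n_control):
    """Odds of the event in the treated group divided by odds in the control group."""
    odds_treated = events_treated / (n_treated - events_treated)
    odds_control = events_control / (n_control - events_control)
    return odds_treated / odds_control

def relative_risk(events_treated, n_treated, events_control, n_control):
    return (events_treated / n_treated) / (events_control / n_control)

# Hypothetical: common outcome (15% vs 20%) and rare outcome (1.5% vs 2%)
for label, args in [("common", (150, 1000, 200, 1000)), ("rare", (15, 1000, 20, 1000))]:
    print(f"{label}: OR {odds_ratio(*args):.3f}, RR {relative_risk(*args):.3f}")
# common: OR 0.706, RR 0.750
# rare: OR 0.746, RR 0.750
```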

Confidence interval (CI) is the range of numerical values in which we can be confident (to a computed probability, such as 90 or 95%) that the population value being estimated will be found.

Confidence intervals indicate the strength of evidence. Where confidence intervals are wide, they indicate less precise estimates of effect.

The larger the trial's sample size, the larger the number of outcome events, and the greater our confidence that the true relative risk reduction is close to the value reported. The confidence interval therefore narrows, and "precision" increases.
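A minimal Python sketch of how sample size affects the width of a confidence interval, using the same hypothetical event rates (20% vs 15%) in a small and a large trial. It uses the simple Wald interval for a difference in proportions, which is one method among several and can be inaccurate for small samples or event rates near 0 or 1.

```python
import math

def arr_confidence_interval(events_control, n_control, events_treated, n_treated, z=1.96):
    """Approximate (Wald) confidence interval for the absolute risk reduction.
    z=1.96 gives roughly a 95% interval."""
    cer = events_control / n_control
    eer = events_treated / n_treated
    arr = cer - eer
    se = math.sqrt(cer * (1 - cer) / n_control + eer * (1 - eer) / n_treated)
    return arr - z * se, arr + z * se

small = arr_confidence_interval(40, 200, 30, 200)      # 200 patients per arm
large = arr_confidence_interval(400, 2000, 300, 2000)  # 2000 patients per arm
for label, (lo, hi) in [("200 per arm", small), ("2000 per arm", large)]:
    print(f"{label}: ARR 95% CI {lo:.1%} to {hi:.1%}")
```

Both trials estimate the same 5% ARR, but the smaller trial's interval includes zero while the larger trial's does not; ten times the patients gives an interval a little over three times narrower.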

In a "positive finding" study the lower boundary of the confidence interval, or lower confidence limit, should still remain important or clinically significant if the results are to be accepted. In a "negative finding" study, the upper boundary of the confidence interval should not be clinically significant if you are to confidently accept this result.
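The rule for reading confidence intervals in positive and negative studies can be written as a small Python function. A minimal sketch: the threshold for the smallest clinically important benefit is a hypothetical value the reader would choose, and the function name is illustrative.

```python
def interpret_benefit_ci(ci_lower, ci_upper, smallest_important_benefit):
    """Read a CI for a treatment benefit (e.g. an ARR) against a
    pre-chosen smallest clinically important benefit."""
    if ci_lower >= smallest_important_benefit:
        return "positive: even the lower limit is clinically important"
    if ci_upper < smallest_important_benefit:
        return "negative: even the upper limit falls short of a clinically important benefit"
    return "inconclusive: the interval includes both important and unimportant effects"

# Hypothetical threshold: an ARR of 2 percentage points is the smallest benefit worth having
print(interpret_benefit_ci(0.03, 0.07, 0.02))
print(interpret_benefit_ci(-0.01, 0.015, 0.02))
print(interpret_benefit_ci(0.01, 0.08, 0.02))
```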

The University of Alberta has developed a convenient checklist that covers the important points to consider when evaluating an article. Definitions of the statistical concepts, and demonstrations of how to calculate them, are linked to the list. The checklist is given below; the original can be found online at http://www.med.ualberta.ca/ebm/therapy.htm

Are the results valid?

  • Was the assignment of patients to treatment randomized?
  • Were all patients who entered the trial properly accounted for and attributed at its conclusion?
    • Was follow-up complete?
    • Were patients analyzed in the groups to which they were randomized?
    • Intention to treat analysis?
  • Were patients, their clinicians and study personnel 'blind' to treatment?
  • Were the groups similar at the start of the trial?
    • Baseline prognostic factors (demographics, co-morbidity, disease severity, other known confounders) balanced?
    • If different, were these adjusted for?
  • Aside from the experimental intervention, were the groups treated equally?
    • Co-intervention?
    • Contamination?
    • Compliance?

What are the results?

  • How large is the treatment effect?
    • Absolute risk reduction?
    • Relative risk reduction?
  • Did the study have a sufficiently large sample size?
  • How precise is the estimate of the treatment effect?
    • Confidence intervals?

Will the results help me with patient care?
  • Can the results be applied to my patients?
    • Patients similar for demographics, severity, co-morbidity and other prognostic factors?
    • Compelling reason why they should not be applied?
  • Were all clinically relevant outcomes considered?
    • Are substitute endpoints valid?
  • Are the benefits worth the harms and costs?
    • NNT for different outcomes?

These are complex concepts, and for many physicians reading about them will be more effective at inducing sleep than taking a sleeping pill (odds ratio of 9, 95% CI 1-10). This has been an attempt to give you "just the berries".

Fortunately Bandolier, the University of Alberta, McMaster University, and the CMAJ have developed websites and published articles that help to make the water a little less murky. Have a look at these sites for a more complete analysis of the topic.

- John Hickey

Thanks to Dr. Denis Chouquette, Director of the Department of Rheumatology at the University of Montreal, Montreal, Quebec, Canada, and Pamela McLean-Veysey, BSc (Pharm), Drug Evaluation Pharmacist, QEII Health Sciences Centre, Halifax, Nova Scotia, Canada for reviewing the draft copy of this article.

You can search for abstracts of the above references by following this link: PubMed

