BACKGROUND

This article is written in the form of a literature review for the journal Sportscience. A few of the requirements for form and content are unique to Sportscience, but most are common to all good scientific journals. You can therefore use this article to help you write a review for any journal. You can also use this article to structure a literature review for a thesis, but check with your supervisor for any special requirements.

This article exists in slightly modified form as a template for a Sportscience review article. If you intend to submit a review to Sportscience, you should download the template from the Information for Authors page at the Sportscience site.

Whether you are writing a review for Sportscience, another journal, or a thesis, you should read my guidelines on scientific writing (Hopkins, 1999a). Here are the main points from that article: Avoid technical terms. Avoid abbreviations. Use simple sentences. Avoid common errors of punctuation and grammar. Use the first person (I, we) rather than the passive voice. Link your ideas into a sensible sequence without repetitions or discontinuities. Get feedback on your article from colleagues.

In this Background section, make the topic interesting by explaining it in plain language and by relating it to actual or potential practical applications. Explain any scientific principles underlying the topic. Define and justify the scope of the review: why you are limiting it to certain sports, why you are including studies of non-athletes and non-human species, and so on.

LITERATURE

In this short section you should list how many of each kind of publication you summarized (for example, 31 original investigations, one monograph, five reviews, four popular articles, one manuscript), and how you found them (for example, a search of the sport-science database SportDiscus). Be specific about any database search you performed. Include the key words you used, and the ways you refined your search if necessary.
For example: "A search for overtrain* produced 774 references, which reduced to 559 when we limited the search to intermediate or advanced levels (not le=basic). Further restricting the search to psych* or mood produced 75 references. We read 47 of these as full papers. Of the 41 papers cited in this review, we were able to obtain the following only in abstract form: Jones et al. (1979) and Smith and Brown (1987)."

Describe and justify briefly any papers or areas that you decided not to include.

FINDINGS

This section is the most important part of your review. Do not give a summary paper-by-paper; instead, deal with themes and draw together results from several papers for each theme. I have identified four themes for this section: assessing the quality of published work; interpreting effects; points of grammar and style; and a few remarks about tables and figures. These themes are dealt with under subheadings. I encourage you to use such subheadings, which will make it easier for you to write the review and easier for others to read it.

Quality of Published Work

Look critically at any published work. The fact that something has been published does not mean the findings are automatically trustworthy.

Some research designs are better than others (see Hopkins, 1998a). The most trustworthy conclusions are those reached in double-blind randomized controlled trials with a representative sample of sufficient size to detect the smallest worthwhile effects. The weakest findings are those from case studies. In between are cross-sectional studies, which are usually plagued by the problem of interpreting cause and effect in the relationship between variables.

How subjects were sampled is an important issue. You can be confident about generalizing results to a population only if the sample was selected randomly from the population and there was a low proportion of refusals and dropouts (<30%). Be wary of generalizing results from novice athletes to elites.
Something that enhances performance in young or untrained individuals may not work so well in highly trained athletes, who may have less headroom for improvement.

There are big differences in the way data can be collected. At one extreme are qualitative methods, in which the researcher interviews subjects without using formal psychometric instruments (questionnaires). At the other extreme are quantitative methods, in which biological or behavioral variables are measured with instruments or techniques of known validity and reliability. In the middle are techniques with uncertain precision and questionnaires with open-ended responses. Qualitative assessment is time-consuming, so samples are usually small and non-representative, which in turn limits the conclusions that can be made about effects in a population. The conclusions may also be biased by the prejudices of the researcher-interviewer. Quantitative data collection is more objective, but for some projects it could miss important issues that would surface in an interview. A combination of qualitative methods for pilot work and quantitative methods for a larger study should therefore produce valuable conclusions, depending, of course, on the design.

You will probably find that your topic has been dealt with to some extent in earlier reviews. Cite the reviews and indicate the extent to which you have based your review on them. Make sure you look at the key original papers cited in any earlier reviews, to judge for yourself whether the conclusions of the reviewers are justified.

Reviews, like original research, vary in quality. Problems with reviews include poor organization of the material and lack of critical thought. Some of the better reviews attempt to pull together the results of many papers using the statistical technique of meta-analysis.
The outcomes in such reviews are usually expressed as relative risk, variance explained, or effect size, terms that you will have to understand and interpret in your review if you meet them. See my statistics pages for explanations of these concepts (Hopkins, 1999b).

Interpreting Effects

You cannot assess quantitative research without a good understanding of the terms effects, confidence limits of effects, and statistical significance of effects. An effect is simply an observed relationship between variables in a sample of subjects. An effect is also known as an outcome.

Confidence limits and statistical significance are involved in generalizing from the observed value of an effect to the true value of the effect. The true value of the effect is the average value of the effect in the whole population, or the value of the effect you would get if you sampled the whole population. The confidence limits of an effect define the likely range of the true value of the effect: in short, how big or positive and how small or negative the effect could be. An effect is statistically significant if the likely range of the true value of the effect is unlikely to include the zero or null effect. Roughly speaking, statistically significant effects are unlikely to be zero, but such a rough interpretation is misleading: in sport and exercise science, the true value of an effect is never exactly zero.

Statistical significance is notoriously difficult to understand, whereas confidence limits are at once simpler and more informative. Confidence limits are appearing more frequently in publications, but most authors still use statistical significance. As a reviewer you therefore have to come to terms with statistical significance. Here are a few suggestions on how to cope.

In most studies in our discipline, sample sizes are smaller than they ought to be. So if a result is statistically significant, it will probably have widely separated confidence limits.
Check to make sure the observed value of the effect is substantial (whatever that means--more about that in a moment). If it is, then you can conclude safely that the true value of the effect is likely to be substantial. If the observed effect is not substantial--a rare occurrence for a statistically significant effect, because it means the sample size was too large--you can actually conclude that the true value of the effect is likely to be trivial, even though it was statistically significant!

Problems of interpretation arise when researchers get a statistically non-significant effect. If the sample size is too small--as in almost all studies in sport and exercise science--you can get a statistically non-significant effect even when there is a substantial effect in the population. Authors of small-scale studies who do not understand this point will interpret a statistically non-significant effect incorrectly as evidence for no effect. So whenever you see a result that is not statistically significant, ignore what the author concludes and look at the size of the effect in question: if the effect is nearly zero and the sample size is reasonable, chances are there is indeed no worthwhile relationship in the population; if the effect is large, there may well be a substantial relationship in the population. But in either case, a bigger sample is required to be sure about what is going on. Sometimes the research may already have been done: for example, moderate but non-significant effects in several studies probably add up to a moderate real effect, if the designs are trustworthy.

How big is a moderate effect anyway? And what about large effects, small effects, and trivial effects? Make sure you look closely at the effects and interpret their magnitudes, regardless of whether they are statistically significant; the authors often don't. There are two approaches: statistical and practical.
In the statistical approach, effects or outcomes are expressed as statistics that are independent of the units of measurement of the original variables. These statistics are the same ones referred to in the previous subsection: relative risk, variance explained, and effect size. Statisticians have come up with rules of thumb for deciding whether the magnitude of the effect is to be considered trivial, small, moderate, or large. For example, Cohen (1988) claims that an effect size of 0.2, a variance explained of 1% (equivalent to a correlation coefficient of 0.1), and a relative risk of 1.2 are the smallest effects worth detecting. I have extended Cohen's scale to effects of any magnitude, and I have made adjustments to his scale (Hopkins, 1998b).

In the practical approach, you look at the size of the effect and try to decide whether, for example, it would make any difference to an athlete's position in a competition. For many events, a difference in performance of 1% or even less would be considered worthwhile. This approach is the better one for most studies of athletes.

Whether you use the statistical or the practical approach, you must apply it to the confidence limits as well as the observed effect. Why? Because you want to describe how big or how small the effect could be in reality, not just how big or small it was in the sample that was studied. If the researchers do not report confidence limits, you can calculate them from the p value. I have devised a spreadsheet for this purpose (Hopkins, 1998c).
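
To make the last step concrete, here is a minimal Python sketch of one way to recover approximate confidence limits from an observed effect and its two-sided p value. It is not a reproduction of the spreadsheet cited above. The function name confidence_limits_from_p is hypothetical, and the calculation assumes that the sampling distribution of the effect is normal; for the small samples typical of sport and exercise science, a calculation based on the t distribution would give somewhat wider limits.

```python
from statistics import NormalDist


def confidence_limits_from_p(effect, p_value, level=0.95):
    """Approximate confidence limits for an effect from its two-sided p value.

    Assumption (not from the article): the sampling distribution of the
    effect is normal, so the p value maps directly onto a z score.
    """
    if not 0.0 < p_value < 1.0:
        raise ValueError("p_value must lie strictly between 0 and 1")
    if not 0.0 < level < 1.0:
        raise ValueError("level must lie strictly between 0 and 1")

    std_normal = NormalDist()
    # z score that corresponds to the reported two-sided p value
    z_observed = std_normal.inv_cdf(1.0 - p_value / 2.0)
    # standard error implied by the observed effect and its z score
    standard_error = abs(effect) / z_observed
    # z score for the requested confidence level (about 1.96 for 95%)
    z_critical = std_normal.inv_cdf(1.0 - (1.0 - level) / 2.0)
    margin = z_critical * standard_error
    return effect - margin, effect + margin


if __name__ == "__main__":
    # Hypothetical example: an observed effect of 2.0 units with p = 0.20
    lower, upper = confidence_limits_from_p(effect=2.0, p_value=0.20)
    print(f"95% confidence limits: {lower:.2f} to {upper:.2f}")
```

In this hypothetical example the limits run from about -1 to about 5 units, so the true effect could be slightly negative or substantially positive. Judge the magnitude of both limits, not just the observed value, before you draw a conclusion.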