A multiple baseline design is used in medical, psychological, and biological research. The multiple baseline design was first reported in 1960 as used in basic operant research. It was applied in the late 1960s to human experiments in response to practical and ethical issues that arose in withdrawing apparently successful treatments from human subjects. In this design, two or more (often three) behaviors, people, or settings are plotted on a staggered graph: a change is made to the first but not the others, then to the second but not the third, and so on. Differential changes in each behavior, person, or setting help to strengthen what is essentially an AB design, with its problematic competing hypotheses.
Because treatment is started at different times, changes are more plausibly attributable to the treatment than to a chance factor. By gathering data from many subjects (instances), inferences can be made about the likelihood that the measured trait generalizes to a larger population. In multiple baseline designs, the experimenter starts by measuring a trait of interest, then applies a treatment before measuring that trait again. Treatment does not begin until a stable baseline has been recorded, and does not finish until measures regain stability.[1] If a significant change occurs across all participants, the experimenter may infer that the treatment is effective.
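The staggered procedure described above can be illustrated with a minimal Python sketch. The subject names, session values, and treatment start points are invented for illustration only, and the comparison of phase means is a simplifying assumption rather than a prescribed analysis; published studies usually rely on visual inspection of the graph or on more formal statistics.

```python
from statistics import mean


def phase_means(series, treatment_start):
    """Return (baseline mean, treatment mean) for one series.

    `treatment_start` is the index of the first treated session.
    """
    baseline = series[:treatment_start]
    treatment = series[treatment_start:]
    return mean(baseline), mean(treatment)


def staggered_summary(observations, treatment_starts):
    """Summarize the baseline-to-treatment change for each subject."""
    summary = {}
    for name, series in observations.items():
        base, treat = phase_means(series, treatment_starts[name])
        summary[name] = {
            "baseline": base,
            "treatment": treat,
            "change": treat - base,
        }
    return summary


# Hypothetical data: three subjects, ten sessions each.
observations = {
    "subject_a": [2, 3, 2, 7, 8, 7, 8, 8, 7, 8],
    "subject_b": [3, 2, 3, 3, 2, 8, 7, 8, 8, 7],
    "subject_c": [2, 2, 3, 2, 3, 2, 3, 8, 9, 8],
}

# Treatment begins at a different session for each subject (the stagger).
treatment_starts = {"subject_a": 3, "subject_b": 5, "subject_c": 7}

summary = staggered_summary(observations, treatment_starts)
for name, row in summary.items():
    print(name, row)
```

In this invented data set each subject's measurements rise only after that subject's own treatment begins, which is the pattern that makes a shared outside event a less plausible explanation.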
Multiple baseline experiments are most commonly used in cases where the dependent variable is not expected to return to normal after the treatment has been applied, or when medical reasons forbid the withdrawal of a treatment. They often employ particular methods of recruiting participants. Multiple baseline designs are associated with potential confounds introduced by experimenter bias, which must be addressed to preserve objectivity. In particular, researchers are advised to develop all test schedules and data collection limits beforehand.
Although multiple baseline designs may employ any method of recruitment, they are often associated with "ex post facto" recruitment. This is because multiple baselines can provide data regarding the consensus of a treatment response. Such data often cannot be gathered from ABA (reversal) designs for ethical or learning reasons. Experimenters are advised not to remove cases that do not exactly fit their criteria, as this may introduce sampling bias and threaten validity.[1] Studies that use ex post facto recruitment are not considered true experiments, because the experimenter has limited experimental or randomized control over the trait. In particular, a control group may have to be selected from a separate, discrete population. This research design is thus considered a quasi-experimental design.
Multiple baseline studies are often categorized as either concurrent or nonconcurrent.[1] [2] Concurrent designs are the traditional approach to multiple baseline studies, where baseline measurements of all participants start at (roughly) the same moment in real time. This strategy is advantageous because it moderates several threats to validity, history effects in particular.[3] Concurrent multiple baseline designs are also useful for saving time, since all participants are processed at once. The ability to retrieve complete data sets within well-defined time constraints is a valuable asset when planning research.
Nonconcurrent multiple baseline studies apply treatment to several individuals at delayed intervals. This has the advantage of greater flexibility in recruitment of participants and testing location. For this reason, perhaps, nonconcurrent multiple baseline experiments are recommended for research in an educational setting.[2] It is recommended that the experimenter select time frames beforehand to avoid experimenter bias,[1] but even when methods are used to improve validity, inferences may be weakened.[4] Currently, there is debate as to whether nonconcurrent studies face a real threat from history effects.[4] [5] It is generally agreed, however, that concurrent testing is more stable.
Although multiple baseline experimental designs compensate for many of the issues inherent in ex post facto recruitment, a trait gathered by this method may not be open to experimental manipulation. Thus these studies are prevented from inferring causation if there are no phases to demonstrate reversibility. However, if such phases are included (as is the standard of experimentation), they can successfully demonstrate causation.
A priori (beforehand) specification of the hypothesis, time frames, and data limits helps control threats due to experimenter bias.[1] For the same reason, researchers should avoid removing participants based on merit. Multiple probe designs may be useful in identifying extraneous factors which may be influencing the results. Lastly, experimenters should avoid gathering data only during sessions. If in-session data are gathered, the date should be noted with each measurement in order to provide an accurate timeline for potential reviewers. Such data may represent unnatural behaviour or states of mind, and must be considered carefully during interpretation.[4]
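As one possible way to follow the advice on dating measurements, the sketch below stores each observation with its date and a flag marking whether it was gathered in session. The `Measurement` record, its field names, and the example values are all hypothetical; they are assumptions made for illustration, not part of any standard protocol.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Measurement:
    """One observation, tagged with its date and collection context."""
    subject: str
    taken_on: date
    value: float
    in_session: bool  # True if gathered during a treatment session


def timeline(measurements):
    """Return the measurements in chronological order for reviewers."""
    return sorted(measurements, key=lambda m: m.taken_on)


def in_session_share(measurements):
    """Fraction of measurements gathered in session (0.0 if there are none)."""
    if not measurements:
        return 0.0
    return sum(m.in_session for m in measurements) / len(measurements)


# Hypothetical records, entered out of order on purpose.
records = [
    Measurement("subject_a", date(2020, 3, 9), 7.0, True),
    Measurement("subject_a", date(2020, 3, 2), 2.0, False),
    Measurement("subject_a", date(2020, 3, 16), 8.0, True),
    Measurement("subject_a", date(2020, 3, 12), 6.0, False),
]

ordered = timeline(records)
share = in_session_share(records)
print([m.taken_on.isoformat() for m in ordered], share)
```

Keeping the flag alongside the date lets a reviewer see how much of the record rests on in-session data, which may need more cautious interpretation.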