
Evidence-Based Communication Assessment and Intervention

ISSN: 1748-9539 (Print) 1748-9547 (Online)

Effect size estimation for combined single-case 
experimental designs 

Mariola Moeyaert, Diana Akhmedjanova, John Ferron, S. Natasha Beretvas & 
Wim Van den Noortgate 

To cite this article: Mariola Moeyaert, Diana Akhmedjanova, John Ferron, S. Natasha Beretvas 
& Wim Van den Noortgate (2020): Effect size estimation for combined single-case experimental 
designs, Evidence-Based Communication Assessment and Intervention 

Published online: 30 Apr 2020.

EBP Advancement Corner

Effect size estimation for combined single-case 
experimental designs 

Mariola Moeyaert¹, Diana Akhmedjanova¹, John Ferron², S. Natasha Beretvas³
& Wim Van den Noortgate⁴

¹Department of Educational and Counseling Psychology, University at Albany-SUNY, Albany, NY, USA;
²Department of Educational Measurement and Research, University of South Florida, Tampa, FL, USA;
³Department of Educational Psychology, University of Texas, Austin, TX, USA; ⁴Faculty of Psychology and
Educational Sciences & Imec-itec, KU Leuven, Leuven, Belgium


The methodology of single-case experimental designs (SCED) has been expanding its efforts toward rigorous 
design tactics to address a variety of research questions related to intervention effectiveness. Effect size 
indicators appropriate to quantify the magnitude and the direction of interventions have been recommended 
and intensively studied for the major SCED design tactics, such as reversal designs, multiple-baseline designs 
across participants, and alternating treatment designs. In order to address complex and more sophisticated 
research questions, two or more different single-case design tactics can be merged (i.e., “combined SCEDs”). 
The two most common combined SCEDs are (a) a combination of a multiple-baseline design across 
participants with an embedded ABAB reversal design, and (b) a combination of a multiple-baseline design 
across participants with an embedded alternating treatment design. While these combined designs have the 
potential to address complex research questions and demonstrate functional relations, the development and 
use of proper effect size indicators lag behind and remain unexplored. Therefore, this study probes into the 
quantitative analysis of combined SCEDs using regression-based effect size estimates and two-level hierarch- 
ical linear modeling. This study is the first demonstration of effect size estimation for combined designs. 

Keywords: Combined designs; effect size; hierarchical linear modeling; regression models; single- 
case experimental design. 

Single-case experimental designs (SCEDs) are 
rigorous experimental designs that have been 
applied in a variety of fields (e.g., biomedical 
research, language and speech therapy, beha- 
vior modification, school psychology, counsel- 
ing psychology, physical therapy, special 
education, and neuropsychological rehabilita- 
tion) to evaluate the efficacy and effectiveness 
of interventions (Kennedy, 2005; Kratochwill 
et al., 2014; Moeyaert, Ferron, et al., 2014). In 
SCEDs, a case (one unit [e.g., participant], or 

For correspondence: Mariola Moeyaert, School of Education, 
Department of Educational and Counseling Psychology, Division 
of Educational Psychology & Methodology, The University at 
Albany - SUNY, 1400 Washington Ave, Albany, NY 12222. 

an aggregate unit such as a class) is measured 
repeatedly across time during conditions (e.g., 
baseline and intervention condition or multi- 
ple intervention conditions). Data from differ- 
ent conditions are compared to evaluate the 
efficacy or effectiveness of one or multiple 
interventions. The basic question examined 
using SCEDs is whether there is evidence for 
a functional relation between the systematic 
manipulation of an independent variable (i.e., 
the conditions) and its consistent effect on 
a dependent variable (i.e., the target behavior) 
(Kratochwill et al., 2010; Kratochwill & Levin, 
2014; J. Ledford et al., 2018). 

Valid and reliable structured visual ana- 
lysis techniques (J. Ferron & Jones, 2006; 

© 2020 Informa UK Limited, trading as Taylor & Francis Group 


Kratochwill et al., 2010) have been devel- 
oped for interpreting SCED results and are 
widespread. Visual analysis has a rich his- 
tory and is strongly embedded in the field of 
SCEDs. It is considered to be a valid 
approach for identifying “weak”, “moder- 
ate”, or “strong” evidence for a causal rela- 
tionship between an independent and 
dependent variables by evaluating data 
using six steps described by Kratochwill 
et al. (2010). Following the technical doc- 
umentation of the What Works Clearinghouse 
(WWC) Standards for Design and Analysis of 
SCEDs (Kratochwill et al., 2010), the field 
is now moving toward estimating effect size 
indicators to supplement and support the 
visual analysis results. Efforts have been 
made to develop effect size estimates for 
“single” SCEDs such as the alternating 
treatment design, multiple-baseline design, 
and ABAB reversal design (e.g., Lenz, 2013; 
Maggin et al., 2011; Manolov & Solanas, 
2013; Moeyaert, Ugille, Ferron, Beretvas, 
et al., 2014; Moeyaert, Ugille, Ferron, 
Onghena, et al., 2014; Parker, Vannest, & 
Davis, 2011; Parker et al., 2014; Shadish 
et al., 2008, 2014; Swaminathan et al., 
2010; Wolery et al., 2010). However, the 
formulation of these effect size indicators 
for “combined” SCEDs is not yet fully 
developed. This study is timely, especially 
given the potential of these types of designs 
to answer rich research questions and to 
make internally and externally more valid 
inferences about the efficacy or effective- 
ness of an intervention. 

Combined single-case designs 

Shadish and Sullivan (2011) conducted 
a review of SCED studies published in 
2008 to review their design and data char- 
acteristics. Their search resulted in 809 
unique SCED studies, 73.1% of which con- 
sisted of “single” designs: 54.3% were 
Multiple-Baseline Designs (MBD) across 

participants; 8.2% represented Withdrawal 
and Reversal Designs (WRD, such as ABAB 
reversal designs); 8.0% were Alternating 
Treatment Designs (ATDs); and 2.6% were 
Changing Criterion Designs (CC). The 
authors found that a proportion of SCEDs 
(26.9%) do not use a “single” design, but 
rather a design that combines characteris- 
tics of two or more “single” SCED designs — 
so-called “combined SCEDs” (J. Ledford & 
Gast, 2018). Specifically, the combination 
of MBD + WRD appeared to be the most 
popular one (12.0%), followed by the com- 
bination of MBD + ATD (9.9%). 

Combined or combination SCEDs (J. 
Ledford & Gast, 2018) offer three major 
advantages compared to single SCEDs. 
First, they allow assessment of multiple 
research questions. For example, Trottier 
et al. (2011) looked at the functional rela- 
tion between peer-tutoring interventions 
and the number of spontaneous appropriate 
communicative acts generated by students 
with autism spectrum disorder (ASD) as the 
main focus of their study. The use of 
a combined SCED let the researchers exam- 
ine whether normally developing peers 
could independently teach children with 
ASD to use speech-generating devices or 
whether the typically developing peers had 
to first be taught how to instruct the chil- 
dren with ASD. As a result, this combined 
design study allowed the researchers to 
evaluate two different interventions simul- 
taneously: (a) teaching typically developing 
peers to give timely prompts to children 
with ASD to use the device; and (b) letting 
typically developing peers teach children 
with ASD to use the device (Trottier et al., 
2011). Additionally, the two interventions 
were alternated for each child, and the 
interventions were staggered across participants (n = 2), resulting in an MBD + ATD combined design.

Second, a combined SCED allows for 
more evaluations of the effectiveness of 

the treatment as more replications are pre- 
sent. For example, the MBD + WRD com- 
bined design allows for replication of 
a treatment effect after removing and rein- 
troducing the treatment within 
a participant as well as across participants, 
taking into account different start times 
for the treatment. In the case of the MBD + ATD combined design, the replication of alternating treatments can be seen both within each participant and across participants at different points in time. Replication is a central theme in SCED studies (Kratochwill et al., 2010) because it enhances the external validity of the resulting conclusions: the effect is documented at additional points in time, with more replications within one case.

Third, due to the dynamic nature of com- 
bined designs, they grant an opportunity to 
modify pure SCEDs by adding design ele- 
ments in the middle of the study. For 
instance, Kelley et al. (2002) initially used 
an MBD to investigate the effectiveness of 
competing reinforcement schedules on 
functional communication (Figure 1). 
However, the data demonstrated problems: the disruptive behaviors for two out of the three participants were not decreasing. As a result, the authors slightly changed the condition from Functional Communication Training (FCT) without extinction to FCT with extinction, ensuring treatment fidelity for all the other steps in the study. In this way, the introduction of the ABAB reversal allowed the study to continue and provided an opportunity to address the core research questions.
The analysis of the majority of the com- 
bined design studies typically relies on visual 
analyses and non-overlap indices to identify 
and make inferences about the intervention 
effects (Chung & Cannella-Malone, 2010; 


Jason & Frasure, 1979; Matson & Keyes, 
1990; Trottier et al., 2011). For example, 
Lindberg et al. (1999) used an MBD + WRD 
combined design study to evaluate the effects 
of manipulation and reinforcement on self- 
injurious behaviors of two participants, solely 
relying on visual analysis. Another combined 
SCED study, MBD + ATD (Trottier et al., 
2011), reported the results of the effective- 
ness of peer-tutoring on the use of speech- 
generating devices for students with autism 
in social game routines using visual analysis 
and the Percentage of Non-Overlapping Data index (PND; Schlosser et al., 2008; Scruggs et al., 1987). Relying on visual analysis and
non-overlap indices is unfortunate because 
the opportunity is lost to precisely address 
additional questions through quantitative 
summaries (e.g., What is the magnitude of 
the intervention effect? To what extent is 
the intervention immediately effective? To 
what extent does the intervention remain 
effective over time? Are all the participants 
benefiting equally from the intervention?). 
While visual analysis and non-overlap indices 
provide an initial indication of effectiveness 
of an intervention, effect size indices are 
needed to provide additional information 
through quantitative synthesis. Effect size 
indicators can be used to quantify the magni- 
tude of intervention effectiveness at multiple 
points in time both for each participant and 
across participants. In addition, effect size 
estimates are supplemented with a standard 
error that reflects precision for the individual 
estimate and which can be used as a weight 
for quantitative summaries or analyses (i.e., 
multilevel meta-analysis; Moeyaert, 2019). 
Therefore, in this article, we are breaking 
new ground by applying the effect size logic 
to quantify intervention effectiveness for 
combined SCEDs. The effect size estimates 
will provide a more comprehensive picture 
regarding intervention effects by taking into 
account the design complexity of combined 
SCEDs, and they can be used in meta- 


[Figure 1 shows baseline (BL), FCT without Extinction, and FCT with Extinction phases for three participants, with responses per minute (aggression or disruption; independent communication responses) plotted across sessions.]

Figure 1. An example of modifying the multiple baseline design by adding a phase change reversal. Frequency of target behaviors for three participants. Adapted from “The Effects of Competing Reinforcement Schedules on the Acquisition of Functional Communication,” by M. E. Kelley, D. C. Lerman, and C. M. Van Camp, 2002, Journal of Applied Behavior Analysis, 35(1), p. 62.

analyses to assess generalizability across interventions and outcome variables. Previous research has focused on the coding schemes and synthesis of results for each of the “single” SCEDs, including the simple AB phase design, the MBD across participants, WRD (ABAB), and ATDs (Moeyaert, Ugille, Ferron, Onghena, et al., 2014; Shadish, Kyse, et al., 2013).
Researchers have not investigated (1) cod- 
ing and effect size estimation for combined 
SCEDs, and (2) meta-analysis of studies 
involving combined SCEDs. Due to the 
lack of methodology to quantify combined 
SCEDs, these studies tend to be simplified 
or excluded from meta-analyses, which 
contributes to biased effect size estimates 
and/or publication bias (e.g., Kokina & 
Kern, 2010; Wang et al., 2013). Therefore, 
we focus on how to quantify treatment 
effects for combined designs. Thus, the pur- 
pose of this study is to illustrate effect size 
estimation for combined designs using real 
data. In particular, we will focus on the MBD + WRD combined designs (45.97%) and the MBD + ATD combined designs (37.91%) as they are the two most popular classes of combined SCEDs, together accounting for 83.38% of the combined SCEDs (Shadish & Sullivan, 2011).

We identified combined design studies and 
then randomly selected one MBD + WRD 
and one MBD + ATD study. Combined 
SCEDs were identified by examining primary 
studies from four meta-analyses of SCEDs 
(Heyvaert et al., 2014; Kokina & Kern, 2010; 
Moeyaert et al., 2019; Shogren et al., 2004) 
and 20 primary studies that evaluated reading 
fluency interventions. These meta-analyses 
and primary SCED studies were chosen 
because the first author had access to raw 
data. The meta-analysis of Heyvaert et al. 
(2014) included 59 studies of which 11 studies 
(i.e., 18.64%) were combined SCEDs. The 
review by Kokina and Kern (2010) consisted 
of 18 SCEDs of which only four (i.e., 22.22%) 
were combined SCEDs. The peer-tutoring 
meta-analysis by Moeyaert et al. (2019) 
included 65 studies and contained nine combined SCEDs (i.e., 13.85%). The last meta-analysis (Shogren et al., 2004) had 13 SCED studies, and two of them (15.38%) were combined SCEDs. Finally, of the 20 primary studies that examined reading fluency interventions, seven (i.e., 35%) were combined SCEDs. Thus, a substantial

proportion of reviewed studies was combined 
SCEDs, a finding that is consistent with the 
review of Shadish and Sullivan (2011). The 
full list of the 33 combined design studies 
from the meta-analyses that we reviewed is 
available from the first author upon request. 
Of these combined designs, the combinations MBD + WRD (i.e., 58.82%, 20 studies) and MBD + ATD (i.e., 23.52%, eight studies) were the most popular. This also supports the results of Shadish and Sullivan (2011) and our decision to focus on these two classes of combined SCEDs in this study.
One study per combined SCED type was 
randomly selected from the set to demonstrate 
the coding of the design matrix and estimation 
of the effect sizes. The design matrix gives an 
overview of the overall data structure and 
includes all variables (e.g., participant identi- 
fier, the dependent variable, the independent 
variables) together with scores assigned to 
these variables. All variables needed to esti- 
mate the effect sizes of interest should be 
reflected in the design matrix. For more infor- 
mation about the design matrix for SCEDs, see 
Moeyaert, Ugille, Ferron, Beretvas et al. 
(2014). However, other studies from the selec- 
tion could also have been chosen. Raw data for 
the dependent variable in SCEDs are tradition- 
ally graphically displayed as can be seen in 
Figure 2 (MBD + WRD) and Figure 3 (MBD 
+ ATD). As a result, researchers can retrieve 
raw data from the graphical displays in pri- 
mary studies. We used WebPlotDigitizer 
(Rohatgi, 2011) to retrieve raw data. The raw 
data represent the measures of the dependent 
variable over time. The dependent variable 
(i.e., targeted behavior) together with other 
variables (i.e., phase and time indicators) that 


[Figure 2 shows the percentage of intervals with problem behavior across sessions for Scott, Bob, and Maria, with alternating no-choice and choice phases staggered across participants.]

Figure 2. An example of the mixed design: MBD + PCR. Percentage of intervals with problem behaviors for three participants. Adapted from “The Effects of Choice-making on the Problem Behaviors of High School Students with Intellectual Disabilities,” by S. Seybert, G. Dunlap, and J. Ferro, 1996, Journal of Behavioral Education, 6(1), p. 58.

are needed to conduct the statistical analysis are part of the design matrix. The design matrix needed for effect size estimation of the combined designs is displayed in Tables 1 and 4 and will be discussed later. For more information about the data retrieval process, see Moeyaert, Maggin, et al. (2016). The raw data from Figures 2 and 3 can be found in the

[Figure 3 shows the percentage of intervals with challenging behavior across sessions for each participant (e.g., Anna), with a no pre-session access baseline followed by alternating pre-session access conditions (5-min, 10-min, and 15-min alone, blocking, and preferred-activity conditions).]

Figure 3. An example of the mixed design: MBD + ATD. Percentage intervals with challenging behavior for three participants. Adapted from “The Effects of Presession Manipulations on Automatically Maintained Challenging Behavior and Task Responding,” by Y.-C. Chung, and H. I. Cannella-Malone, 2010, Behavior Modification, 34(6), p. 493.

supplement to this article (together with the 
SAS codes that can be used for the analyses) to 

facilitate replication of the analyses demon- 
strated in this study, using the same data sets. 


Table 1. Design matrix for Case 1 (i.e., Scott) — Seybert 
et al. (1996) 

Case   Session  Outcome  A1B1  B1A2  A2B2

…
Scott  20       22.43    1     1     1
Scott  21       16.24    1     1     1
Scott  22       20.29    1     1     1





Effect sizes are used as a complement to 
visual analysis in primary studies and can 

be used for between-study comparison of 
treatment effects and for meta-analytic pur- 
poses. Visual analysis has been well docu- 
mented by Kratochwill et al. (2010), 
whereas the focus of the current study is 
on the quantitative summary of combined 
SCEDs. The analyses in the empirical illustration sections are performed using SAS software, Version 9.4 (© SAS Institute Inc.). SAS codes are available in the supplement to this article.

Multiple-baseline design — Withdrawal or 
reversal design 

To demonstrate the effect size estimation for 
the first class of combined SCEDs, we 
selected the study of Seybert et al. (1996). 
Seybert et al. (1996) investigated the differ- 
ences in problem and on-task behaviors in 
choice and no-choice conditions of three 
independent participants with intellectual 
disabilities. In the choice condition, partici- 
pants were given a choice of the domestic 
task to do. In contrast, in the no-choice con- 
dition, participants were assigned to do 

Table 2. Results of ordinary least squares (OLS) analysis and empirical Bayes (EB) analysis per case

Case   Parameter  OLS estimate (SE)  EB estimate (standard error of prediction)
Scott  β_0        61.31 (6.90)       57.74 (11.87)
       β_1        −24.28 (9.34)      −19.30 (−)
       β_2        20.73 (8.91)       18.02 (10.35)
       β_3        −40.01 (9.34)      −37.38 (15.50)
Bob    β_0        38.20 (4.99)       36.37 (11.77)
       β_1        −22.47 (8.39)      −19.30 (−)
       β_2        1.31 (9.55)        2.11 (10.32)
Maria  β_0        16.53 (2.97)       18.90 (11.77)
       β_1        −12.82 (6.64)      −19.30 (−)
       β_2        26.99 (8.40)       29.85 (10.44)
       β_3        −10.90 (7.97)      −10.98 (15.50)


Table 3. Results of two-level analysis across participants 

Parameter Estimate (SE) t Pp 
Fixed Effects 
Baseline level Al 8 37.67 (11.74) 3.21 .082 
Change in level Al — B] 6, —19.30 (4.59) —4.2) <.001 
Change in level B1 — A2 6, 16.66 (10.26) 1.62 227 
Change in level A2 — B2 63 —24.18 (15.76) —1.53 367 
Random Effects Estimate (SE) Zz Pp 
Baseline level Al Gi, 391.51 (406.93) 0.96 .168 
Change in level Al — B] a, 0 (/) / / 
Change in level B1 — A2 6, 236.77 (291.24) 0.81 .208 
Change in level A2 — B2 a. 414.59 (701.37) 0.59 277 
Within-case variance oa 207.75 (36.40) 5.71 <.0001 
Table 4. Design matrix for Case 1 – data retrieved from Chung and Cannella-Malone (2010)

Case  Session  Outcome   Treatment1  Treatment2
1     1        0.27933   0           0
1     2        29.88827  0           0
1     3        39.38547  0           0
1     4        24.86034  0           0
1     5        22.90503  0           0
1     6        19.55307  0           0
1     7        23.18436  0           0
1     8        46.64804  0           0
…
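For the MBD + ATD design matrix above, each treatment in the alternation gets its own dummy variable. A minimal sketch of this coding in Python (the eight baseline sessions match Table 4; the strict odd/even alternation afterward is an assumption for illustration, not the actual Chung and Cannella-Malone schedule):

```python
import numpy as np

# Sessions for one case: baseline first, then two alternating treatments.
sessions = np.arange(1, 19)
baseline = sessions <= 8                 # Table 4 shows 8 baseline sessions

# Assumed alternation for illustration: Treatment 1 on odd post-baseline
# sessions, Treatment 2 on even ones; each gets its own dummy column.
treatment1 = ((~baseline) & (sessions % 2 == 1)).astype(int)
treatment2 = ((~baseline) & (sessions % 2 == 0)).astype(int)

design = np.column_stack([sessions, treatment1, treatment2])
print(design)   # baseline rows show 0/0; each later row flags one treatment
```

Stacking such matrices for several participants, each with its own staggered baseline length, yields the full MBD + ATD design matrix.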
a certain domestic task. The outcome variable reflected the percentage of problem behaviors and task engagement in the choice versus no-choice conditions. The data were recorded using 15-s partial interval recording; that is, only the last five seconds were recorded per 15-s interval. Data points per participant ranged from n = 22 (Scott) to n = 29 (Maria). Seybert et al. (1996) used the combination of the MBD + WRD to investigate the effectiveness of choice-making on problem behavior. A graphical display is given in Figure 2. Seybert et al. (1996) claimed that the MBD + WRD allowed them to provide further evidence for the changes in the treatment


phase as a result of manipulating the inde- 
pendent variable — choice versus no-choice 
conditions. The inter-rater observer percent 
agreement ranged from 81% to 99% for 
occurrence and nonoccurrence of problem 
behaviors. Seybert et al. (1996) analyzed 
the data using visual analysis techniques, 
and the results were reported as percentages 
of intervals with problem behaviors. This 
combined SCED has the potential to demon- 
strate a functional relation between the 
choice-making condition and problem beha- 
vior as the effectiveness of the treatment can 
be evaluated at three or more different 
points in time. In addition, most of the 
phases included at least five measurements 
(one choice and one no-choice condition for 
Maria included only four measurements). 
The MBD embedded in the combined design 
meets the WWC design standards as it 
includes at least three potential demonstra- 
tions of treatment effectiveness across at 
least three different points in time. The 
WRD embedded in the combined design 
meets basic replication standards for Scott and Maria, whereas this is not the case for Bob: there appears to be a non-effect for the withdrawal of the treatment. In addition, the WRD for Bob does not meet the WWC design standards, as there are only two potential demonstrations of treatment effectiveness. According to Gast et al. (2018), this prohibits the conclusion that a functional relation is present for Bob. Notwithstanding this non-effect and lack of experimental control for Bob, effect size estimation for this combined design can still be meaningful.
Researchers might be interested in quantify- 
ing the size of the effect, and this quantifica- 
tion can be used to confirm the results based 
on the visual analysis. This effect size esti- 
mate can be used afterward for meta-analy- 
tic purposes. We focused on estimating 
regression-based effect size estimates for the 

occurrence of problem behaviors in choice- 
making conditions for three participants 
with intellectual disabilities. The statistical 
model and empirical illustration are dis- 
cussed in the following sections. 

Statistical model Step 1: single-level 
analysis. The single-level analysis can also 
be called an individual analysis as it involves 
a case-by-case evaluation of treatment 
effectiveness. Here, we are interested in 
demonstrating the effectiveness of 
a treatment at different points in time 
within participants. In the simplest scenario, the results are an estimate of
change in levels between baseline and 
treatment phases for each participant 
separately. In other words: “Is there evidence 
for change in level between adjacent phases?” In 
this particular scenario, the design matrix 
contains dummy-coded variables indicating 
the specific phase to which a measurement 
belongs (see Table 1). We chose the 
following notation to distinguish between 
the consecutive phases: A1 and A2 indicate, respectively, the first and the second baseline phase, and B1 and B2 denote the first and the second treatment
phase. For the ABAB phase design, three dummy variables, A1B1, B1A2, and A2B2, are coded as suggested by Moeyaert, Ugille, Ferron, Beretvas, et al. (2014) and Shadish, Kyse, et al. (2013). A1B1 = 1 for all the measurement occasions after the first baseline phase, B1A2 = 1 for all the measurement occasions after the first treatment phase, and A2B2 = 1 during the last treatment phase (see Table 1). In
order to predict the outcome score at the 
ith measurement occasion, the following 
multiple regression equation can be used 
and parameters can be estimated using 
Ordinary Least Squares (i.e., OLS): 

Y_i = β_0 + β_1 A1B1_i + β_2 B1A2_i + β_3 A2B2_i + e_i, with e_i ~ N(0, σ²_e)   (1)

When all three dummy-coded variables equal zero (i.e., A1B1 = B1A2 = A2B2 = 0), the indicated phase is the first baseline phase (β_0). Each dummy variable represents the change from an earlier phase to its adjacent phase. Thus, for example, B1A2 refers to the change in level from B1 to A2 (i.e., the difference in level between Treatment 1 and Baseline 2). An extension here could be to investigate whether there are changes in linear (Moeyaert, Ugille, Ferron, Beretvas, et al., 2014) or non-linear trends (Hembry et al., 2015) or changes in the variance of scores between adjacent phases (Baek & Ferron, 2013).
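To make the coding concrete, the model in Equation (1) can be estimated with ordinary least squares in a few lines of Python. This is a minimal sketch with fabricated, noiseless data (four phases of five sessions with assumed phase means), not the Seybert et al. (1996) data:

```python
import numpy as np

# Fabricated, noiseless ABAB series: four phases (A1, B1, A2, B2) of five
# sessions with assumed phase means 60, 40, 55, and 20.
phase = np.repeat([0, 1, 2, 3], 5)
y = np.repeat([60.0, 40.0, 55.0, 20.0], 5)

# Cumulative dummy coding from Equation (1): each dummy switches on at a
# phase change and stays on, so each coefficient is a change in level.
A1B1 = (phase >= 1).astype(float)   # 1 from the first treatment phase on
B1A2 = (phase >= 2).astype(float)   # 1 from the second baseline phase on
A2B2 = (phase >= 3).astype(float)   # 1 during the last treatment phase
X = np.column_stack([np.ones_like(y), A1B1, B1A2, A2B2])

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)   # ≈ [60., -20., 15., -35.]: baseline level plus three changes
```

Because the dummies are cumulative, β_2 is the B1-to-A2 change rather than a contrast against the first baseline, matching the interpretation in the text.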

Statistical model Step 2: two-level 
analysis. The two-level analysis involves an 
aggregate estimate of the treatment 
effectiveness across participants. Here, we are 
investigating the replication of the treatment 
effect across participants (within the same 
study), in addition to the replication of the 
treatment effect within participants. As 
a consequence, more generalized conclusions 
can be made, which strengthens the external 
validity of the inferences. In addition, 
variability in effectiveness of the treatment 
between participants can be quantified. One 
way to perform this analysis is to conduct 
a two-level analysis, which takes the 
hierarchical nature of the data into account; 
namely, measurements are nested within each 
of multiple cases. 

The coefficients from the first level, β_0j, β_1j, β_2j, and β_3j, can be modeled as varying at the second (participant) level. By fitting this multilevel model, overall average changes in level from one phase to another can be obtained, in addition to how individual participants deviate from that overall change. The level 1 and level 2 equations are presented in Equations (2) and (3):


Level 1:

Y_ij = β_0j + β_1j A1B1_ij + β_2j B1A2_ij + β_3j A2B2_ij + e_ij, with e_ij ~ N(0, σ²_e)   (2)

Level 2:

β_0j = θ_00 + u_0j
β_1j = θ_10 + u_1j
β_2j = θ_20 + u_2j
β_3j = θ_30 + u_3j

with (u_0j, u_1j, u_2j, u_3j)′ ~ N(0, Σ_u), where the diagonal of Σ_u contains the variances σ²_u0, σ²_u1, σ²_u2, and σ²_u3, and the off-diagonal elements contain the corresponding covariances (e.g., σ_u0u1).   (3)
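The two-level structure in Equations (2) and (3) can be illustrated by simulation. The sketch below (Python with numpy; all population values are invented for illustration) draws case-specific coefficients from the level-2 model, generates level-1 data, and recovers the fixed effects by averaging case-wise OLS estimates, which for balanced data approximates the multilevel fixed-effect estimates:

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented population values: fixed effects (theta_00 ... theta_30),
# between-case SDs, and the within-case SD.
theta = np.array([40.0, -20.0, 15.0, -25.0])
tau = np.array([5.0, 2.0, 2.0, 3.0])
sigma_e = 4.0

# Balanced ABAB series per case, cumulative dummies as in Equation (1).
phase = np.repeat([0, 1, 2, 3], 6)
X = np.column_stack([np.ones(phase.size),
                     phase >= 1, phase >= 2, phase >= 3]).astype(float)

# Level 2 draws case-specific coefficients; level 1 generates the series;
# each case is then estimated by OLS and the estimates are averaged.
betas = []
for _ in range(200):
    beta_j = theta + rng.normal(0.0, tau)                   # Equation (3)
    y = X @ beta_j + rng.normal(0.0, sigma_e, phase.size)   # Equation (2)
    betas.append(np.linalg.lstsq(X, y, rcond=None)[0])
theta_hat = np.asarray(betas).mean(axis=0)
print(np.round(theta_hat, 1))   # close to [40., -20., 15., -25.]
```

A full multilevel analysis (e.g., SAS PROC MIXED, as used in the article's supplement) additionally estimates the between-case variances in Σ_u and yields the standard errors reported in Table 3.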

The first line in Equation (3) indicates that the baseline level for participant j is modeled as a function of an average baseline level, θ_00, plus a random deviation from this mean, u_0j. The subsequent equations describe the average change in level between A1 and B1 (θ_10), between B1 and A2 (θ_20), and between A2 and B2 (θ_30), respectively. The variability in baseline level (i.e., σ²_u0) and the variability in changes in levels (i.e., σ²_u1, σ²_u2, and σ²_u3) are captured by estimating the covariance matrix of the random effects.


Empirical illustration. We use the Seybert 
et al. (1996) study for the empirical 
illustration of the single-level (individual) 
and two-level (average) effect size estimates 
for the MBD + WRD design. Seybert et al. 
(1996) investigated the effects of choice- 
making on the problem behaviors of three 
high school students with intellectual 
disabilities. In this example, we are looking 


only at the outcome variable of occurrence and nonoccurrence of problem behaviors within choice and no-choice conditions. The start of the intervention was staggered across the three participants, and two baseline conditions (i.e., no-choice, denoted as A1 and A2 in Figure 4) are interrupted by treatment conditions (i.e., choice, denoted as B1 and B2 in Figure 4). Participant 2 (i.e., Bob) has no second treatment phase as the

[Figure 4 plots the observed percentage of intervals with problem behavior for each participant, with horizontal lines marking the estimated levels per phase (e.g., θ_00 = 37.67 for the overall baseline level, θ_20 = 16.66 for the B1-to-A2 change, and β_0 = 16.53 for Maria's baseline level).]

Figure 4. Estimated parameters for each participant across phases. Note: The lines indicate case-specific and study-specific estimates.

problem behavior remained low when the 
treatment was removed (phase A2). The 
graphical presentation of the data is given 
in Figure 2. The coding of the design 
matrix for participant 1 (i.e., Scott) in 
accordance with the mathematical model 
presented in Equation (1) can be found in 
Table 1 (the same coding is applied for the 
other cases). The SAS code to run the 
analyses is available as a supplement to 
this article. 

The output of the single-level analysis is 
presented in Table 2, and the visual pre- 
sentation of the estimated parameters is 
provided in Figure 4. From the single- 
level analysis, we can conclude that there 
is a demonstration of treatment effective- 
ness at three different points in time for 
Case 1 (i.e., Scott). When the choice-making intervention is introduced, we see a significant drop in problem behavior [β_1 = −24.28, t(25) = −2.60, p = .018, and β_3 = −40.01, t(25) = −4.28, p = .032]. When the choice-making intervention is removed, we see a significant increase in problem behavior [β_2 = 20.73, t(25) = 2.33, p = .032]. For Case 2 (i.e., Bob) and Case 3 (i.e., Maria), there was only one demonstration of significant treatment effectiveness [Case 2: β_1 = −22.47, t(20) = −2.68, p = .015, and Case 3: β_2 = 26.99, t(25) = 3.21, p = .004]. According to the WWC design standards (Kratochwill et al., 2010), the choice-making intervention was only effective for Scott, as three demonstrations of treatment effectiveness at three different points in time are required to demonstrate a causal relationship between the introduction of the treatment and the change in outcome score.

The two-level analysis was conducted to estimate the overall baseline level and changes in level between subsequent phases across the three cases, in addition to between-case variability in these estimates. The two-level analysis enhances the generalizability of treatment effectiveness beyond the cases under investigation. For didactic purposes (allowing visual presentation of the estimated coefficients, Figure 4), a small dataset with only three cases is used. In order to run a two-level analysis and obtain generalizable estimates, it is suggested to use a larger dataset, including more than three cases. The results indicate that the choice-making intervention succeeded in reducing the problem behavior, and large effect size estimates were obtained for the change in level between A1 and B1 and between A2 and B2 [θ10 = −19.30, t(66) = −4.21, p < .001; θ30 = −24.18, t(1) = −1.53, p = .367]. However, only one estimate (θ10) is statistically significant (p < .05).

An additional advantage of using the 
two-level analysis is that the between- 
case variance in treatment effect estimates 
can be estimated. Most variability was 
found in the estimate of the between- 
case variance for the change in level 
between A2 and B2 (Table 3, random 
effects). The results of the single-level 
and two-level analyses are visually pre- 
sented in Figure 4. 

Another advantage of using the two-level 
analysis is that empirical Bayes estimates of 
the case-specific parameters can be obtained. 
The empirical Bayes estimate can be viewed as 
a fully Bayesian approach that uses informa- 
tion of the full dataset to build prior distribu- 
tions (Shadish, Rindskopf, et al., 2013). 
Therefore, the empirical Bayes estimates are 
shrunken toward the mean (the overall aver- 
age fixed effects). These case-specific estimates 
are improved estimates compared to the sin- 
gle-level ordinary least squares estimates 
because information from the entire dataset 
is used (in other words, the empirical Bayes 
estimate is “borrowing strength” from all 


available study evidence). For an introduction 
to empirical Bayes estimates, see Casella 
(1985). Instead of running three separate sin- 
gle-level analyses, one two-level hierarchical 
linear model can be run, providing both the 
effect size estimates across cases and case-spe- 
cific estimates. The results of the case-specific 
estimates based on the empirical Bayes esti- 
mates are displayed in Table 2 and closely 
match the results of the single-level ordinary 
least squares analyses. 
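The shrinkage logic behind the empirical Bayes estimates can be sketched numerically. In the hypothetical example below (all effect values, standard errors, and the between-case variance are made up for illustration), each case's OLS estimate is pulled toward the overall average in proportion to how imprecise it is:

```python
# Numerical sketch of empirical Bayes shrinkage ("borrowing strength").
# All values are hypothetical: three case-specific OLS effect estimates,
# their standard errors, and an assumed between-case variance tau2.

ols_effects = {"case1": -24.3, "case2": -22.5, "case3": -27.0}
se = {"case1": 9.3, "case2": 8.4, "case3": 8.4}  # case-specific standard errors
tau2 = 4.0                                        # assumed between-case variance

grand_mean = sum(ols_effects.values()) / len(ols_effects)

eb = {}
for case, est in ols_effects.items():
    weight = tau2 / (tau2 + se[case] ** 2)          # reliability of the case estimate
    eb[case] = weight * est + (1 - weight) * grand_mean
    print(f"{case}: OLS = {est:6.2f}  empirical Bayes = {eb[case]:6.2f}")
```

Because the assumed between-case variance is small relative to the case-level uncertainty, all three shrunken estimates land close to the overall average, mirroring how the empirical Bayes estimates in Table 2 sit between each case's OLS estimate and the fixed-effect mean.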

Multiple-baseline design — Alternating 
treatment design 

In Alternating Treatment Designs (ATDs), 
two or more treatments (possibly following 
a baseline phase) are rapidly alternated 
(Barlow & Hayes, 1979; Barlow et al., 
2009), or treatment sessions are alternated 
with no treatment sessions. Most of the 
ATDs are characterized by a baseline phase 
and two or more treatments, which are 
alternated during the treatment phase. In 
this scenario, the researcher is interested in 
the differential effect between the two treat- 
ment effects (i.e., the relative effectiveness of 
two or more interventions; Horner & Odom, 
2014). Other ATDs are characterized by an alternation of two or more treatments, or by an alternation of two or more treatments with baseline sessions. In this latter scenario, a pure baseline comparison is not possible unless the alternation is preceded or followed by a phase including only baseline measures (Zimmerman et al., 2019). If the baseline sessions are alternated with treatment sessions from the beginning, it is unknown how the participants perform without being introduced to the treatment (which could be a confounding factor). In addition, multitreatment interference can occur, as it can be the case that multiple treatments are effective because they are given in an alternated fashion (one treatment might strengthen the effectiveness of the other treatment and vice versa).
Zimmerman et al. (2019) indicate that pos- 
sible multitreatment interference can be 
detected with the inclusion of an initial base- 
line and visual analysis that compares the 
initial baseline level to the baseline observa- 
tions that are part of the alternating 
sequence. Similarly, a phase for a specific 
treatment can be included so that the obser- 
vations within the treatment phase can be 
compared to the treatment observations that 
are part of the alternating sequence. 

To demonstrate a functional relation 
between the independent and dependent 
variables, the data from different treatments 
should not overlap. In addition, the ATD 
study should include at least four data 
points of comparison in each of the treat- 
ments and at least five repetitions of the alternating sequence to meet the standards of
What Works Clearinghouse (Horner & Odom, 
2014; Kratochwill et al., 2010). 

This combined SCED combines the unique strengths of ATDs with MBDs (i.e., the external validity needed to draw more generalized conclusions about treatment effects). That is, the combination of ATDs with MBDs uses the rapid comparison of two or more conditions (ATDs), while the start of the intervention phase is staggered across participants (MBD). In this way, the combination of ATD + MBD allows researchers to identify the treatment that has the larger effect with higher degrees of internal and external validity.
Another possibility of the ATDs is that 
researchers may choose to continue only 
the treatments with the strongest effects in 
the final phases of the study (Kratochwill 
et al., 2010). 

Statistical model Step 1: single-level analysis. Similar to the single-level (i.e., case-specific) analysis for the MBD + WRD, a case-by-case intervention effectiveness evaluation can be performed for the MBD + ATD. More specifically, the following research question is of interest: “Is there a change in level for Treatment 1 and Treatment 2, respectively?” The effect sizes of interest can be obtained by introducing dummy variables for each treatment. The dummy-coded variables, Treatmentmi, indicate the treatment phase. For instance, Treatmentmi equals one if the score belongs to treatment phase m at moment i, and zero otherwise. If all the Treatmentmi's are zero, then the measurement occasion belongs to the baseline phase. For two treatments, the following regression equation can be used (using treatment indicators Treatment1i and Treatment2i):

Yi = β0 + β1Treatment1i + β2Treatment2i + ei, with ei ~ N(0, σ²e)   (4)

β0 indicates the baseline level, β1 refers to the change in level between the baseline and Treatment 1, and β2 refers to the change in level between the baseline and Treatment 2. The difference between β1 and β2 refers to the differential effect (e.g., “Is one of the treatments relatively more effective?”). Equation (4) can be extended by modeling linear or non-linear trends (Hembry et al., 2015; Moeyaert, Ugille, Ferron, Beretvas, et al., 2014), or by adding more dummy variables in case more than two treatments are examined.
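As a sketch of this coding, the hypothetical example below builds the Equation (4) design matrix for one case of an ATD with a baseline phase followed by two rapidly alternated treatments, fits it by OLS, and reports the differential effect (all scores and phase lengths are invented for illustration):

```python
# Illustrative sketch of the Equation (4) coding for one ATD case:
# two dummy variables mark which treatment (if either) was in effect
# at each measurement occasion, and OLS recovers the baseline level
# and the two changes in level. Data are hypothetical.

def ols(X, y):
    """Solve the normal equations (X'X)b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    A = [row[:] + [b] for row, b in zip(XtX, Xty)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

baseline = [50, 52, 49, 51, 48]
t1_scores = [20, 18, 22, 19, 21]   # sessions under Treatment 1
t2_scores = [35, 36, 34, 33, 37]   # sessions under Treatment 2

# Baseline phase first, then the two treatments rapidly alternated.
conditions = ["B"] * 5
y = baseline[:]
for a, b in zip(t1_scores, t2_scores):
    conditions += ["T1", "T2"]
    y += [a, b]

# Equation (4) dummies: Treatment1 = 1 only in T1 sessions, Treatment2 = 1 only in T2.
codes = {"B": (0, 0), "T1": (1, 0), "T2": (0, 1)}
X = [[1] + list(codes[c]) for c in conditions]

b0, b1, b2 = ols(X, y)
print(f"baseline level         b0 = {b0:6.2f}")
print(f"change, Treatment 1    b1 = {b1:6.2f}")
print(f"change, Treatment 2    b2 = {b2:6.2f}")
print(f"differential effect b1-b2 = {b1 - b2:6.2f}")
```

Here β1 and β2 are the two changes in level relative to the shared baseline, and their difference is the differential effect discussed above.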

Statistical model Step 2: two-level analysis. This step is similar to Step 2 described for the MBD + WRD design, where coefficients from the first level can be modeled as varying at the second level:

Level 1: Yij = β0j + β1jTreatment1ij + β2jTreatment2ij + eij, with eij ~ N(0, σ²e)   (5)

Level 2:

β0j = θ00 + u0j
β1j = θ10 + u1j
β2j = θ20 + u2j

with (u0j, u1j, u2j)′ assumed multivariate normal with mean zero and covariance matrix

| σ²u0    σu0u1   σu0u2 |
| σu1u0   σ²u1    σu1u2 |   (6)
| σu2u0   σu2u1   σ²u2  |

This two-level analysis allows for making more generalized conclusions, as overall average estimates across cases are obtained (the θs in Equation (6)). As noted before, case-specific estimates are available by requesting the empirical Bayes estimates. By estimating the variance/covariance matrix, the between-case variance in baseline level (σ²u0) and in the treatment effect estimates (σ²u1 and σ²u2) can be obtained.
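Estimating Equations (5) and (6) properly requires mixed-model software (e.g., the SAS code in the supplement). As a conceptual stand-in only, the sketch below uses hypothetical data, computes a crude case-specific Treatment 1 effect for each case (the difference between that case's treatment and baseline phase means), and then summarizes the overall average and between-case variability that the level-2 model formalizes:

```python
import statistics

# Conceptual two-step stand-in for the two-level model of Equations (5)-(6):
# per-case effects first, then their mean (overall average change in level)
# and sample variance (between-case variability). All data are hypothetical;
# a real two-level analysis would be run in mixed-model software.

cases = {
    "Lilly":  {"baseline": [50, 52, 49, 51, 48], "treatment1": [20, 18, 22, 19, 21]},
    "Anna":   {"baseline": [60, 58, 61, 59, 62], "treatment1": [35, 33, 36, 34, 32]},
    "Kellie": {"baseline": [45, 47, 44, 46, 43], "treatment1": [30, 28, 31, 29, 27]},
}

# Crude case-specific change in level: treatment mean minus baseline mean.
effects = {c: statistics.mean(d["treatment1"]) - statistics.mean(d["baseline"])
           for c, d in cases.items()}

theta_10 = statistics.mean(effects.values())     # overall average change in level
tau2_u1 = statistics.variance(effects.values())  # between-case variance in the effect

print("case-specific effects:", effects)
print(f"overall average effect (theta10)    = {theta_10:.2f}")
print(f"between-case variance (sigma^2_u1)  = {tau2_u1:.2f}")
```

The actual multilevel estimation differs (it weights cases by precision and estimates the full covariance matrix), but the two quantities printed here correspond conceptually to θ10 and σ²u1 in Equation (6).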


The study of Chung and Cannella-Malone (2010) will be used for the empirical demonstration. This study used an ATD that is characterized by a baseline phase followed by an alternating phase in which baseline and treatment sessions are alternated. In addition, the ATD is repeated across multiple independent participants, and the start of the randomization phase is staggered across the participants (MBD). The purpose of the Chung and Cannella-Malone study was to examine the separate and combined effects of motivating operations for three participants with multiple disabilities across four pre-session conditions: (1) attention, (2) response blocking, (3) attention with response blocking, and (4) non-interaction. The dependent variable was stereotypic behavior, which was measured using 10-s partial interval recording. Inter-observer agreement data were collected for the pre-session (39% of data) and treatment (40% of data) conditions, with agreement reaching 98% and 99%, respectively.

The graphical display of the data can be found in Figure 3 (i.e., copied from the original study) and Figure 5 (i.e., a recreated graph, using the data retrieved with WebPlotDigitizer; Rohatgi).

[Figure 5 appears here: percentage of intervals with challenging behavior for Lilly, Anna, and Kellie across the baseline and alternating pre-session access/no pre-session access conditions, with the estimated parameters annotated on each panel.]

Figure 5. Estimated parameters for the single-level analysis and two-level analysis. The line during the baseline indicates 
the overall average baseline level estimate; the lines during the intervention indicate the estimated challenging behavior 
during the pre-session access intervention and the challenging behavior during the no pre-session access intervention. 

For this empirical demonstration, we will 
analyze the problem behavior for the three 
participants of the study of Chung and 
Cannella-Malone (2010). During the treat- 
ment, participants did two tasks: Task A and 
Task B, which were individualized to the 
needs and skills of the participating stu- 
dents. Students did the tasks in two condi- 
tions as shown in Figure 3: (1) pre-session 
access condition that was identified in the 
functional analysis part of the study and (2) 
no pre-session access. Because of the indi- 
vidual needs in the Chung and Cannella- 
Malone (2010) study, the treatment phases 
are participant-specific. This is commonly 
the case using SCEDs as one of the 
strengths of this design is to adjust the treat- 
ment according to the participant's needs. 
As a consequence, the baseline versus treat- 
ment comparison for the three participants 
is not completely the same (i.e., Lilly: base- 
line -— 5 min blocking; Anna: baseline —- 
10 min alone and Kellie: baseline - 5 min 
blocking). Therefore, strictly speaking, no 
experimental conclusions can be drawn 
from this combined design (Ledford & Gast, 2018). However, the treatment phases
can be treated as subcategories of the same 


treatment and as a consequence it is still 
meaningful to investigate generalization of 
the effect across the three participants. In 
the original study, the data were visually 
analyzed, and the results were reported as 
percentages of intervals with problem beha- 
vior. Chung and Cannella-Malone (2010) 
reported that the intervention was success- 
ful for two out of the three participants, 
whose problem _ behaviors noticeably 
decreased. The results of the intervention 
for the third participant were contradictory 
(i.e., the intervention condition identified 
as successful in the previous experiment 
failed to decrease problem behaviors). 
Although the interventions were successful for only two out of the three participants, it is still worth estimating the size of the intervention effect to complement this finding. The coding of the design
matrix for Case 1 (i.e., Lilly) in accordance 
with the mathematical model presented in 
Equation (4) can be found in Table 4. The 
SAS codes to run the analyses are available 
as a supplement to this article. 

The output of the single-level analysis is 
presented in Table 5. From the case by case 
analysis, we can conclude that there is 

Table 5. Results of the ordinary least squares (OLS) analysis and empirical Bayes analysis per participant

Case     Parameter   OLS estimate (SE)   Empirical Bayes estimate (SE of prediction)
Lilly    β0          25.84 (3.34)        26.69 (11.68)
         β1          −25.73 (5.39)       −30.46 (1.07)
         β2          −25.67 (5.39)       −21.60 (6.49)
Anna     β0          48.40 (4.52)        43.16 (58)
         β1          −38.53 (5.48)       −30.75 (1.07)
         β2          −20.36 (5.39)       −15.11 (6.01)
Kellie   β0          62.10 (5.81)        65.03 (11.56)
         β1          −25.93 (7.79)       −30.44 (1.07)
         β2          −4.02 (7.79)        −8.59 (6.03)


a demonstration of treatment effectiveness for both interventions at two different points in time for Case 1 (i.e., Lilly) and for Case 2 (i.e., Anna) at the .05 significance level. When both pre-session and no pre-session access are introduced, we see a significant drop in problem behavior for Lilly [Case 1: β1 = −25.73, t(15) = −4.77, p = .0002 and β2 = −25.67, t(15) = −4.76, p = .0003], and Anna [Case 2: β1 = −38.53, t(41) = −7.03, p < .0001 and β2 = −20.36, t(41) = −3.78, p = .0005]. For Kellie (Case 3), there was only one demonstration of treatment effectiveness [β1 = −25.93, t(39) = −3.33, p = .0019].

The two-level analysis was conducted to generalize treatment effectiveness beyond individual cases. Again, for didactic purposes, a small dataset with only three cases is used. In order to run a two-level analysis and obtain generalizable estimates, it is recommended to use a larger dataset. The results indicate that both the pre-session access and no pre-session access interventions succeeded in reducing the problem behaviors, as negative estimates were obtained for the change in level between the baseline and Treatment 1 and between the baseline and Treatment 2 [θ10 = −30.55, t(61) = −7.44, p = .012; θ20 = −15.10, t(61) = −2.41, p = .152]. However, only the estimate of the effect of Treatment 1 is statistically significant (p < .05). As can be seen in Table 6, the between-case variance in the treatment effects was large for Treatment 2 [σ²u2 = 66.37, Z = 0.54, p = .293], and the within-case residual variance is statistically significant [σ²e = 251.33, Z = 6.86, p < .0001].

Table 6. Results of the two-level analysis across participants

Fixed effects                           Estimate (SE)      t       p
Baseline level θ00                      44.96 (11.67)      3.85    .057
Change in level Treatment 1 θ10         −30.55 (4.10)     −7.44    .012
Change in level Treatment 2 θ20         −15.10 (6.26)     −2.41    .152

Random effects                          Estimate (SE)      Z       p
Baseline level σ²u0                     381.27 (399.56)    0.95    .17
Change in level Treatment 1 σ²u1        1.16 (43.95)       0.03    .489
Change in level Treatment 2 σ²u2        66.37 (122.25)     0.54    .293
Within-case variance σ²e                251.33 (36.64)     6.86    <.0001

The visual presentation of the single-level 
analysis and two-level analysis is given in 
Figure 5. 

As mentioned earlier, an extra advantage of using the two-level model is that case-specific estimates are obtained in addition to the overall average estimates across cases. The results of the case-specific estimates based on the empirical Bayes estimates are displayed in Table 5 and closely resemble the results of the single-level analyses.

Previous research in the field of SCEDs 
solely focused on estimating intervention 
effectiveness using data from “single” 
SCEDs. This study expands on this and 
introduces an analysis technique suitable 
to estimate treatment effectiveness for 
more complex SCEDs, namely “combined 
SCEDs”. This study is the first study to 
demonstrate how applied researchers can 

use an extension of established methodol- 
ogy to come up with an effect size estimate 
appropriate for combined designs. The pro- 
posed technique is generic and not limited 
to combined designs. For instance, by 
excluding predictors in the two-level mod- 
els, the technique can be used to quantify 
treatment effects across single SCEDs. 
Combined SCEDs are combinations of sin- 
gle SCEDs, and are frequently used as they 
are more internally and externally valid 
and can answer richer research questions. 
The two most popular combined designs are 
discussed in detail, namely the MBD + 
WRD and MBD + ATD. For these combined 
designs, we discuss (a) the mathematical 
models appropriate for the quantitative 
analysis, (b) the coding of the design 
matrix, (c) the statistical software to per- 
form the analysis, (d) the interpretation of 
the output tables, and (e) the visual presen- 
tation of the obtained coefficients. We 
demonstrate the process using data from 
previously published studies. The purpose 
is to assist single-case researchers in drawing valid and reliable inferences regarding the treatment effectiveness for complex, combined SCEDs.

The single- and two-level hierarchical lin- 
ear modeling (HLM) techniques are sug- 
gested. The two-level HLM is appropriate 
as both participant-specific and overall average study-specific estimates are obtained simultaneously (instead of run-
ning separate single-level analyses for each 
case), which leads to drawing more gener- 
alized inferences. Empirical Bayes estimates 
of the participant-specific treatment effects 
are more precisely estimated compared to 
the OLS (single-level) estimates, but they 
are biased toward the average effect. By 
ignoring the hierarchical structure of the 
data (i.e., measurements are nested within 
cases, and cases are nested within study), 
biased standard errors are obtained (the 
standard errors are too small due to 


ignoring the dependency), and, conse- 
quently, the analysis is prone to Type 
I errors. The two-level HLM provides
regression-based effect size estimates and 
their standard errors. Therefore, they can 
be used afterward for meta-analytic pur- 
poses. A third level can be added to the 
model, and overall average treatment effec- 
tiveness can be estimated across studies. In 
addition, the variability in treatment effec- 
tiveness between studies can be explored. If 
a large amount of variability is identified, 
moderators can be added to the model. 
Another advantage of summarizing treat- 
ment effects across studies is the increased 
power to identify true treatment effects. 

Limitations and future research directions 

The HLM model introduced in this study is 
the most basic model, which ignores, for 
instance, data trend and autocorrelation, 
and is only appropriate for continuous out- 
comes. In addition, use of conventional 
HLM requires assumptions about multivari- 
ate normality that need to be met in order 
to make valid inferences (Raudenbush & 
Bryk, 2002). This was beyond the scope of 
this study as the focus was on the logic of 
modeling combined design SCEDs, which is 
already a complexity. However, use of the 
HLM is flexible, and other complexities can 
be introduced into the model. For instance, 
in case a researcher is studying a target 
behavior or skill in which a trend is 
expected, the introduced models can be 
extended by including a time indicator vari- 
able in the treatment phase. This results in 
two effect size estimators of interest: (1) 
change in level of the dependent variable 
when introducing the treatment and (2) the 
trend during the treatment phase. Two-level hierarchical linear modeling including a linear time trend is discussed in detail in Moeyaert, Ugille, Ferron, Beretvas, et al. (2014).
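The design-matrix coding implied by this trend extension can be sketched as follows, for a hypothetical case with five baseline and five treatment sessions (the session counts and column names are invented for illustration):

```python
# Sketch of the design-matrix coding for the trend extension described above:
# each row holds the intercept, the treatment dummy (change in level), and a
# time-in-treatment counter (trend during the treatment phase). The session
# layout is hypothetical.

n_baseline, n_treatment = 5, 5
rows = []
for i in range(n_baseline + n_treatment):
    in_tx = 1 if i >= n_baseline else 0
    time_in_tx = (i - n_baseline + 1) * in_tx  # 0 in baseline; 1, 2, ... in treatment
    rows.append((1, in_tx, time_in_tx))

for session, row in enumerate(rows, start=1):
    print(f"session {session:2d}: intercept={row[0]} "
          f"treatment={row[1]} time_in_treatment={row[2]}")
```

Fitting a regression to this matrix yields the two effect size estimators named above: the treatment dummy's coefficient captures the immediate change in level, and the time-in-treatment coefficient captures the trend during the treatment phase.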


Another complexity relates particularly 
to the MBD + ATD design. In ATDs, the 
effectiveness of two (or more) treatments 
is compared with a common baseline 
phase, which introduces dependency. The 
model can be further extended by exploring 
options to model this dependency (by, for 
instance, estimating the covariance or using 
a more complex estimation technique if 
more cases within a study are included, 
specifically robust variance estimation; 
Hedges et al., 2010). Last, when using 
HLM, caution needs to be exercised when 
interpreting the between-case variance esti- 
mates as severely biased estimates can be 
obtained (Moeyaert et al., 2013). The lim- 
itations discussed here are not specific to HLM of combined SCEDs, but apply to using HLM in general as an analysis technique for the quantitative integration of SCED data.

In addition, the results of the two studies 
discussed in this article should be interpreted 
with caution because in both of them there 
was a lack of experimental control. In Seybert 
et al. (1996), the withdrawal and reversal 
design embedded in the combined design did 
not meet the basic replication standards for 
one of the participants. In addition, there was 
a non-effect for the withdrawal of the treat- 
ment for that same_ participant. As 
a consequence, to meet the WWC design stan- 
dards to demonstrate experimental control, 
there is an additional basic replication needed 
for one of the participants of the Seybert et al. 
(1996) study. Similarly, in Chung and 
Cannella-Malone (2010), the treatment to 
reduce problem behaviors was effective for 
two out of three participants. In addition, the 
effectiveness of the treatment was investigated 
across slightly different treatment phases. In 
order to meet the WWC design standards, the 
treatment phases across the participants 
should be identical and there should be three 
demonstrations of the effectiveness of the 
treatment at three different points in time. 

Effect size estimation for these combined 
designs is still informative as it quantifies the 
magnitude of treatment effect. This quantifi- 
cation provides an overall summary of the 
study findings (and variability between parti- 
cipants in treatment effectiveness) and can be 
used for meta-analysis purposes afterward. 
However, we encourage applied SCED researchers to design combined SCEDs that meet the WWC design standards for experi-
mental control. In order to demonstrate our 
methodology, we were limited to published 
combined designs. The examples included 
are typical for the field and are solely used to 
demonstrate the analysis technique. 

In terms of future research directions, the 
suggested models can be extended by adding 
case characteristics (gender, age, race, etc.) to 
investigate their moderating effect on the 
treatment effectiveness. However, recent 
research related to power indicates that at 
least 12 cases are needed, or 7 cases in combi- 
nation with at least 40 measurement occa- 
sions, to be able to include case 
characteristics in the analyses (Moeyaert 
et al., 2017). This, of course, depends on the 
particular predictors and the value of the true 
treatment effect. Simulation studies can be 
performed in order to investigate the power 
for a particular set of design conditions. Again, 
this is beyond the scope of this paper. Other 
ways of coding the design matrix are also pos- 
sible depending on the specific research ques- 
tions and structure of the data being analyzed. 

To further enhance the internal validity, 
single-case researchers might consider 
introducing randomization when develop- 
ing the combined SCED design. As dis- 
cussed in depth by J. R. Ledford et al. 
(2018), several forms of randomization can 
be incorporated in the design. First, the start and the withdrawal of the intervention can be randomized. In this scenario, it is
recommended that the randomization does 
not start until baseline stability is estab- 
lished. Second, the order of the conditions 

can be randomized, which is typically done 
in ATDs. Unrestricted randomization is not 
recommended to avoid conditions not 
representing ATDs (i.e., all baseline condi- 
tions could be chosen first) or to avoid that 
a certain randomized pattern is consistently 
chosen (i.e., treatment 1 is always adminis- 
tered after treatment 2). A third randomiza- 
tion form is the random assignment of 
participants to intervention start points. 
This is relevant for multiple-baseline 
designs across participants. Incorporating 
randomization in the design allows for use 
of randomization tests to make conclusions 
related to treatment effectiveness. The 
advantage of such tests is that the sampling 
distribution is built based upon the rando- 
mization patterns and as a consequence, no 
parametric assumptions are made and 
needed (for more details about randomiza- 
tion, see J. M. Ferron & Levin, 2014; 
Heyvaert et al., 2017). Inclusion of rando- 
mization has the potential to reduce the risk 
of biased effect size estimates. 
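A minimal sketch of such a start-point randomization test, on invented data, illustrates the logic: the test statistic is recomputed under every admissible start point, and the p value is the share of that distribution at least as extreme as the observed value:

```python
# Sketch of a start-point randomization test (hypothetical data): the
# intervention start is assumed to have been drawn at random from a set of
# acceptable points, the mean difference is recomputed under every possible
# start point, and the p value is the proportion of that distribution at
# least as extreme as the observed statistic.

y = [42, 40, 43, 41, 40, 18, 16, 19, 15, 17, 16, 18]  # intervention began at session 6
possible_starts = range(4, 10)   # acceptable random start points (0-based indices)
observed_start = 5               # 0-based index of the first treatment session

def stat(start):
    baseline, treatment = y[:start], y[start:]
    return sum(treatment) / len(treatment) - sum(baseline) / len(baseline)

observed = stat(observed_start)
distribution = [stat(s) for s in possible_starts]
p_value = sum(1 for d in distribution if d <= observed) / len(distribution)

print(f"observed mean difference = {observed:.2f}")
print(f"randomization p value    = {p_value:.3f}")
```

Because the reference distribution is generated by the randomization actually used in the design, no parametric assumptions about the data are needed; with only six admissible start points, however, the smallest attainable p value is 1/6, which illustrates why an adequate number of randomization options matters.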

In order to increase the external validity of 
treatment effectiveness and contribute to evi- 
dence-based decisions in research, practice 
and policy, multiple SCED studies can be sum- 
marized. Previous research demonstrates how 
the multilevel meta-analytic framework can 
be used to combine single SCEDs (Moeyaert, 
2018; Moeyaert, Ugille, Ferron, Onghena, 
et al., 2014). Therefore, future research is 
needed to demonstrate how pure and com- 
bined SCEDs can be combined using the mul- 
tilevel meta-analytic approach. Similarly, a follow-up study can be conducted to evaluate the consequences of ignoring the
complex nature of combined designs. 


This study is the first study introducing 
and demonstrating a promising methodo- 
logical framework for effect size estimation 


for combined SCEDs. The two-level hier- 
archical model is recommended as it has 
the possibility to include variables to 
account for the combined design complex- 
ity. In this study, the logic of modeling the 
combined SCED study is introduced, 
empirical illustrations are given, analysis 
output is discussed and SAS code is sup- 
plemented. Single-case researchers are 
given the tools (and are encouraged) to 
modify and/or further extend the models. 
The proposed method of coding and esti- 
mating effect sizes for combined SCEDs 
can be a useful technique to inform 
researchers and practitioners about the 
effectiveness of interventions. 


No potential conflict of interest was 
reported by the author(s). 


This research was supported by the Institute of 
Education Sciences, U.S. Department of 
Education, through grants [R305D150007 and 
R305D190022]. The content is solely the responsibility of the authors and does not necessarily represent the official views of the Institute of Education Sciences or the U.S. Department of Education.


Baek, E., & Ferron, J. M. (2013). Multilevel models for 
multiple-baseline data: Modeling across participant 
variation in autocorrelation and residual variance. 
Behavior Research Methods, 45(1), 65-74.

Barlow, D. H., & Hayes, S. C. (1979). Alternating treat- 
ments design: One strategy for comparing the effects 
of two treatments in a single subject. Journal of 
Applied Behavior Analysis, 12(2), 199-210.

Barlow, D. H., Nock, M., & Hersen, M. (2009). Single case experimental designs: Strategies for studying behavior change. Pearson/Allyn and Bacon.


Casella, G. (1985). An introduction to empirical Bayes 
analysis. The American Statistician, 39(2), 83-87.

Chung, Y., & Cannella-Malone, H. I. (2010). The effects of presession manipulations on automatically maintained challenging behavior and task responding. Behavior Modification, 34(6), 479-502. doi: 10.1177/0145445510378380

Ferron, J., & Jones, P. K. (2006). Tests for the visual 
analysis of response-guided multiple-baseline data. 
The Journal of Experimental Education, 75(1), 66-81. 
doi: 10.3200/JEXE.75.1.66-81 

Ferron, J. M., & Levin, J. R. (2014). Single-case per- 
mutation and randomization statistical tests: Present 
status, promising new developments. In 
T. R. Kratochwill & J. R. Levin (Eds.), Single-case 
intervention research: Methodological and _ statistical 
advances (pp. 153-183). American Psychological 

Gast, D. L., Lloyd, B. P., & Ledford, J. R. (2018). 
Multiple baseline and multiple probe designs. In 
J. R. Ledford & D. L. Gast (Eds.), Single case research 
methodology (3rd ed., pp. 239-281). Routledge. 

Hedges, L. V., Tipton, E., & Johnson, M. C. (2010). Robust variance estimation in meta-regression with dependent effect size estimates. Research Synthesis Methods, 1(1), 39-65.

Hembry, I., Bunuan, R., Beretvas, S. N., Ferron, J., & 
Van den Noortgate, W. (2015). Estimation of 
a nonlinear intervention phase trajectory for 
multiple-baseline design data. The Journal of 
Experimental Education, 83(4), 514-546.

Heyvaert, M., Saenen, L., Maes, B., & Onghena, P. 
(2014). Systematic review of restraint interventions 
for challenging behaviour among persons with 
intellectual disabilities: Focus on effectiveness in 
single-case experiments. Journal of Applied Research 
in Intellectual Disabilities, 27(6), 493-590. doi: 10.1111/jar.12094

Heyvaert, M., Moeyaert, M., Verkempynck, P., Van den 
Noortgate, W., Vervloet, M., Ugille, M., & Onghena, P. 
(2017). Testing the intervention effect in single-case 
experiments: A Monte Carlo simulation study. The 
Journal of Experimental Education, 85(2), 175-196.

Horner, R. H., & Odom, S. L. (2014). Constructing 
single-case research designs: Logic and options. In 
T. Kratochwill & J. Levin (Eds.), Single-case interven- 
tion research: Methodological and statistical advances (pp. 
27-51). American Psychological Association. 

Jason, L. A., & Frasure, S. (1979). Increasing 
peer-tutoring behaviors in the third grade classroom 
[Paper presentation]. Annual Convention of the 
American Psychological Association, New York. 

Kelley, M. E., Lerman, D. C., & Van Camp, C. M. (2002). 
The effects of competing reinforcement schedules on 
the acquisition of functional communication. Journal 
of Applied Behavior Analysis, 35(1), 59-63. doi: 10.1901/ 

Kennedy, C. (2005). Single-case designs for educational 
research (Vol. 1). Pearson/A & B. 

Kokina, A., & Kern, L. (2010). Social story™ interven- 
tions for students with autism spectrum disorders: 
A meta-analysis. Journal of Autism and Developmental 
Disorders, 40(7), 812-826. doi: 10.1007/s10803-009- 

Kratochwill, T. R., Hitchcock, J., Horner, R. H., 
Levin, J. R., Odom, S. L., Rindskopf, D. M., & 
Shadish, W. R. (2010). Single-case designs technical 
documentation. What Works Clearinghouse. https:// 

Kratochwill, T. R., & Levin, J. R. (2014). Enhancing 
the scientific credibility of single-case intervention 
research: Randomization to the rescue. In 
T. R. Kratochwill & J. R. Levin (Eds.), Single-case 
intervention research: Statistical and methodological 
advances (pp. 53-91). American Psychological 

Kratochwill, T. R., Levin, J. R., Horner, R. H., & 
Swoboda, C. M. (2014). Visual analysis of single- 
case intervention research: Conceptual and metho- 
dological issues. In T. R. Kratochwill & J. R. Levin 
(Eds.), Single-case intervention research: Statistical and 
methodological advances (pp. 91-125). American 
Psychological Association. 

Ledford, J., & Gast, D. L. (2018). Combination and 
other designs. In J. R. Ledford & D. L. Gast (Eds.), 
Single case research methodology (3rd ed., pp. 
239-281). Routledge. 

Ledford, J., Lane, J., & Severini, K. (2018). Systematic 
use of visual analysis for assessing outcomes in sin- 
gle case design studies. Brain Impairment, 19(1), 
4-17. doi: 10.1017/BrImp.2017.16 

Ledford, J. R., Lane, J. D., & Tate, R. (2018). 
Evaluating quality and rigor in single case research. 
In J. R. Ledford & D. L. Gast (Eds.), Single case 
research methodology (3rd ed. pp. 365-392). 

Lenz, A. S. (2013). Calculating effect size in single-case 
research: A comparison of nonoverlap methods. 
Measurement and Evaluation in Counseling and 
Development,  46(1), 64-73. doi: 10.1177/ 

Lindberg, J. S., Iwata, B. A., & Kahng, S. W. (1999). 
On the relation between object manipulation and 
stereotypic self-injurious behavior. Journal of Applied 
Behavior Analysis, 32(1), 51-62. doi: 10.1901/ 

Maggin, D. M., Swaminathan, H., Rogers, H. J., 
O'Keeffe, B. V., Sugai, G., & Horner, R. H. (2011). 
A generalized least squares regression approach for 
computing effect sizes in single-case research: 
Application examples. Journal of School Psychology, 
49(3), 301-321. doi: 10.1016/j.jsp.2011.03.004 

Manolov, R., & Solanas, A. (2013). A comparison of 
mean phase difference and generalized least squares 
for analyzing single-case data. Journal of School 
Psychology, 51(2), 201-215. doi: 10.1016/j. 

Matson, J. L., & Keyes, J. B. (1990). A comparison of 
DRO to movement suppression time-out and DRO 
with two self-injurious and aggressive mentally 
retarded adults. Research in Developmental Disabilities, 
11(1), 111-120. doi: 10.1016/0891-4222(90)90008-V 

Moeyaert, M., Ugille, M., Ferron, J., Beretvas, S., & Van 
den Noortgate, W. (2013). Three-level analysis of 
single-case experimental data: Empirical validation. 
Journal of Experimental Education, 82(1), 1-21. doi: 

Moeyaert, M., Ugille, M., Ferron, J., Beretvas, S., & 
Van den Noortgate, W. (2014). The influence of the 
design matrix on treatment effect estimates in the 
quantitative analyses of single-case experimental 
design research. Behavior Modification, 38(5), 
665-704. doi: 10.1177/0145445514535243 

Moeyaert, M., Ugille, M., Ferron, J., Onghena, P., 
Heyvaert, M., & Van den Noortgate, W. (2014). 
Estimating intervention effects across different 
types of single-subject experimental designs: 
Empirical illustration. School Psychology Quarterly, 25 
(1), 191-211. doi: 10.1037/spq0000068 

Moeyaert, M., Ferron, J., Beretvas, S. N., & Van den 
Noortgate, W. (2014). From a single-level analysis 
to a multilevel analysis of single-case experimental 
designs. Journal of School Psychology, 52(2), 191-211. 
doi: 10.1016/j.jsp.2013.11.003 

Moeyaert, M., Maggin, D. M., & Verkuilen, J. (2016). 
Reliability and validity of extracting data from image 
files in contexts of single-case experimental design 
studies. Behavior Modification, 40(6), 874-900. doi: 
10.1177/0145445516645763 

Moeyaert, M., Akhmedjanova, D., & Bogin, D. (2017). 
The power to test moderator effects in multilevel modeling 
of single-case data [Manuscript in preparation]. 

Moeyaert, M., Zimmerman, K., & Ledford, J. (2018). 
Analysis and meta-analysis of single-case experi- 
mental data. In J. Ledford & D. Gast (Eds.), Single- 
case methodology: Applications in special education and 
behavioral sciences. Routledge. 

Moeyaert, M. (2019). Quantitative synthesis of 
research evidence: Multilevel meta-analysis. 
Behavioral Disorders, 44(4), 241-256. doi: 10.1177/ 

Moeyaert, M., Klingbeil, D., Rodabaugh, E., & 
Turan, M. (2019). Multilevel meta-analysis of 
peer-tutoring interventions to increase academic 
performance and social interactions for people with 
special needs. Remedial and Special Education. doi: 

Parker, R. I., Vannest, K. J., & Davis, J. L. (2011). 
Effect size in single-case research: A review of nine 
nonoverlap techniques. Behavior Modification, 35(4), 
303-322. doi: 10.1177/0145445511399147 

Parker, R. I., Vannest, K. J., & Davis, J. L. (2014). A simple 
method to control positive baseline trend within data 
nonoverlap. The Journal of Special Education, 48(2), 79- 
91. doi: 10.1177/0022466912456430 

Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical 
linear models: Applications and data analysis methods 
(Vol. 1). Sage. 

Rohatgi, A. (2011). WebPlotDigitizer. https://automeris. 

Schlosser, R. W., Lee, D. L., & Wendt, O. (2008). 
Application of the percentage of non-overlapping data 
in systematic reviews and meta-analyses: A systematic 
review of reporting characteristics. Evidence-based 
Communication Assessment and Intervention, 2(3), 
163-187. doi: 10.1080/17489530802505412 

Scruggs, T. E., Mastropieri, M. A., & Casto, G. (1987). The 
quantitative synthesis of single subject research: 
Methodology and validation. Remedial and Special 
Education, 8(2), 24-33. doi: 10.1177/ 

Seybert, S., Dunlap, G., & Ferro, J. (1996). The effects 
of choice-making on the problem behaviors of 
high-school students with intellectual disabilities. 
Journal of Behavioral Education, 6(1), 49-65. doi: 

Shadish, W. R., Rindskopf, D. M., & Hedges, L. V. (2008). 
The state of the science in the meta-analysis of 
single-case experimental designs. Evidence-based 
Communication Assessment and Intervention, 2(3), 
188-196. doi: 10.1080/17489530802581603 

Shadish, W. R., & Sullivan, K. J. (2011). Characteristics of 
single-case designs used to assess intervention effects in 
2008. Behavior Research Methods, 43(4), 971-980. doi: 

Shadish, W. R., Kyse, E. N., & Rindskopf, D. M. (2013). 
Analyzing data from single-case designs using multi- 
level models: New applications and some agenda 
items for future research. Psychological Methods, 18 
(3), 385-405. doi: 10.1037/a0032964 

Shadish, W. R., Rindskopf, D. M., Hedges, L. V., & 
Sullivan, K. J. (2013). Bayesian estimates of autocorre- 
lations in single-case designs. Behavior Research Methods, 
45(3), 813-821. doi: 10.3758/s13428-012-0282-1 

Shadish, W. R., Zuur, A. F., & Sullivan, K. J. (2014). 
Using generalized additive (mixed) models to analyze 
single case designs. Journal of School Psychology, 52(2), 
149-178. doi: 10.1016/j.jsp.2013.11.004 

Shogren, K. A., Faggella-Luby, M. N., Bae, S. J., & 
Wehmeyer, M. L. (2004). The effect of choice-making 
as an intervention for problem behavior: A 
meta-analysis. Journal of Positive Behavior Interventions, 
6(4), 228-237. doi: 10.1177/10983007040060040401 

Swaminathan, H., Horner, R. H., Sugai, G., 
Smolkowski, K., Hedges, L., & Spaulding, S. A. 
(2010). Application of generalized least squares regres- 
sion to measure effect size in single-case research: 
A technical report. Unpublished technical report, 
Institute for Education Sciences. 

Trottier, N., Kamp, L., & Mirenda, P. (2011). Effects of 
peer-mediated instruction to teach use of 
speech-generating devices to students with autism in 
social game routines. Augmentative and Alternative 
Communication, 27(1), 26-39. doi: 10.3109/ 

Wang, S. Y., Parrila, R., & Cui, Y. (2013). Meta-analysis of social 
skills interventions of single-case research for individuals 
with autism spectrum disorders: Results from three-level 
HLM. Journal of Autism and Developmental Disorders, 43(7), 
1701-1716. doi: 10.1007/s10803-012-1726-2 

Wolery, M., Busick, M., Reichow, B., & Barton, E. E. 
(2010). Comparison of overlap methods for quantita- 
tively synthesizing single-subject data. The Journal of 
Special Education, 44(1), 18-28. doi: 10.1177/ 

Zimmerman, K. N., Ledford, J. R., & Severini, K. E. 
(2019). Brief Report: The effects of a weighted blan- 
ket on engagement for a student with ASD. Focus on 
Autism and Other Developmental Disabilities, 34(1), 
15-19. doi: 10.1177/1088357618794911