# Full text of "ERIC ED605516: Effect Size Estimation for Combined Single-Case Experimental Designs"


Evidence-Based Communication Assessment and Intervention. ISSN: 1748-9539 (Print), 1748-9547 (Online). Journal homepage: https://www.tandfonline.com/loi/tebc20

Effect size estimation for combined single-case experimental designs

Mariola Moeyaert, Diana Akhmedjanova, John Ferron, S. Natasha Beretvas & Wim Van den Noortgate

To cite this article: Mariola Moeyaert, Diana Akhmedjanova, John Ferron, S. Natasha Beretvas & Wim Van den Noortgate (2020): Effect size estimation for combined single-case experimental designs, Evidence-Based Communication Assessment and Intervention. https://doi.org/10.1080/17489539.2020.1747146. Published online: 30 Apr 2020.

Affiliations: Department of Educational and Counseling Psychology, University at Albany-SUNY, Albany, NY, USA; Department of Educational Measurement and Research, University of South Florida, Tampa, FL, USA; Department of Educational Psychology, University of Texas, Austin, TX, USA; Faculty of Psychology and Educational Sciences & Imec-itec, KU Leuven, Leuven, Belgium.

EBP Advancement Corner

Abstract

The methodology of single-case experimental designs (SCED) has been expanding its efforts toward rigorous design tactics to address a variety of research questions related to intervention effectiveness.
Effect size indicators appropriate to quantify the magnitude and the direction of interventions have been recommended and intensively studied for the major SCED design tactics, such as reversal designs, multiple-baseline designs across participants, and alternating treatment designs. In order to address complex and more sophisticated research questions, two or more different single-case design tactics can be merged (i.e., "combined SCEDs"). The two most common combined SCEDs are (a) a combination of a multiple-baseline design across participants with an embedded ABAB reversal design, and (b) a combination of a multiple-baseline design across participants with an embedded alternating treatment design. While these combined designs have the potential to address complex research questions and demonstrate functional relations, the development and use of proper effect size indicators lag behind and remain unexplored. Therefore, this study probes into the quantitative analysis of combined SCEDs using regression-based effect size estimates and two-level hierarchical linear modeling. This study is the first demonstration of effect size estimation for combined designs.

Keywords: Combined designs; effect size; hierarchical linear modeling; regression models; single-case experimental design.

Single-case experimental designs (SCEDs) are rigorous experimental designs that have been applied in a variety of fields (e.g., biomedical research, language and speech therapy, behavior modification, school psychology, counseling psychology, physical therapy, special education, and neuropsychological rehabilitation) to evaluate the efficacy and effectiveness of interventions (Kennedy, 2005; Kratochwill et al., 2014; Moeyaert, Ferron, et al., 2014).
In SCEDs, a case (one unit [e.g., participant], or an aggregate unit such as a class) is measured repeatedly across time during conditions (e.g., baseline and intervention condition or multiple intervention conditions). Data from different conditions are compared to evaluate the efficacy or effectiveness of one or multiple interventions. The basic question examined using SCEDs is whether there is evidence for a functional relation between the systematic manipulation of an independent variable (i.e., the conditions) and its consistent effect on a dependent variable (i.e., the target behavior) (Kratochwill et al., 2010; Kratochwill & Levin, 2014; J. Ledford et al., 2018). Valid and reliable structured visual analysis techniques (J. Ferron & Jones, 2006; Kratochwill et al., 2010) have been developed for interpreting SCED results and are widespread. Visual analysis has a rich history and is strongly embedded in the field of SCEDs. It is considered to be a valid approach for identifying "weak", "moderate", or "strong" evidence for a causal relationship between an independent and a dependent variable by evaluating data using six steps described by Kratochwill et al. (2010). Following the technical documentation of the What Works Clearinghouse (WWC) Standards for Design and Analysis of SCEDs (Kratochwill et al., 2010), the field is now moving toward estimating effect size indicators to supplement and support the visual analysis results.

For correspondence: Mariola Moeyaert, School of Education, Department of Educational and Counseling Psychology, Division of Educational Psychology & Methodology, The University at Albany - SUNY, 1400 Washington Ave, Albany, NY 12222. E-mail: mmoeyaert@albany.edu

© 2020 Informa UK Limited, trading as Taylor & Francis Group
Efforts have been made to develop effect size estimates for "single" SCEDs such as the alternating treatment design, multiple-baseline design, and ABAB reversal design (e.g., Lenz, 2013; Maggin et al., 2011; Manolov & Solanas, 2013; Moeyaert, Ugille, Ferron, Beretvas, et al., 2014; Moeyaert, Ugille, Ferron, Onghena, et al., 2014; Parker, Vannest, & Davis, 2011; Parker et al., 2014; Shadish et al., 2008, 2014; Swaminathan et al., 2010; Wolery et al., 2010). However, the formulation of these effect size indicators for "combined" SCEDs is not yet fully developed. This study is timely, especially given the potential of these types of designs to answer rich research questions and to make internally and externally more valid inferences about the efficacy or effectiveness of an intervention.

Combined single-case designs

Shadish and Sullivan (2011) conducted a review of SCED studies published in 2008 to review their design and data characteristics. Their search resulted in 809 unique SCED studies, 73.1% of which consisted of "single" designs: 54.3% were Multiple-Baseline Designs (MBD) across participants; 8.2% represented Withdrawal and Reversal Designs (WRD, such as ABAB reversal designs); 8.0% were Alternating Treatment Designs (ATDs); and 2.6% were Changing Criterion Designs (CC). The authors found that a proportion of SCEDs (26.9%) do not use a "single" design, but rather a design that combines characteristics of two or more "single" SCED designs, so-called "combined SCEDs" (J. Ledford & Gast, 2018). Specifically, the combination of MBD + WRD appeared to be the most popular one (12.0%), followed by the combination of MBD + ATD (9.9%). Combined or combination SCEDs (J. Ledford & Gast, 2018) offer three major advantages compared to single SCEDs. First, they allow assessment of multiple research questions. For example, Trottier et al.
(2011) looked at the functional relation between peer-tutoring interventions and the number of spontaneous appropriate communicative acts generated by students with autism spectrum disorder (ASD) as the main focus of their study. The use of a combined SCED let the researchers examine whether typically developing peers could independently teach children with ASD to use speech-generating devices or whether the typically developing peers had to first be taught how to instruct the children with ASD. As a result, this combined design study allowed the researchers to evaluate two different interventions simultaneously: (a) teaching typically developing peers to give timely prompts to children with ASD to use the device; and (b) letting typically developing peers teach children with ASD to use the device (Trottier et al., 2011). Additionally, the two interventions were alternated for each child, and the interventions were staggered across participants (n = 2), resulting in an MBD + ATD combined design. Second, a combined SCED allows for more evaluations of the effectiveness of the treatment as more replications are present. For example, the MBD + WRD combined design allows for replication of a treatment effect after removing and reintroducing the treatment within a participant as well as across participants, taking into account different start times for the treatment. In case of the MBD + ATD combined design, the replication of alternating treatments can be seen both within each participant and across participants at different points in time. The replication effects can be identified both within and across participants. Replication is a central theme in SCED studies (Kratochwill et al., 2010) because it enhances the external validity of the resulting conclusions. Indeed, there is additional documentation of the effect at more points in time and more replications within one case.
Third, due to the dynamic nature of combined designs, they grant an opportunity to modify pure SCEDs by adding design elements in the middle of the study. For instance, Kelley et al. (2002) initially used an MBD to investigate the effectiveness of competing reinforcement schedules on functional communication (Figure 1). However, the data demonstrated problems. The disruptive behaviors for two out of the three participants were not decreasing; as a result, the authors slightly changed the condition from Functional Communication Training (FCT) without extinction to FCT with extinction, ensuring treatment fidelity for all the other steps in the study. In this way, the introduction of the ABAB reversal allowed the study to continue and provided an opportunity to address the core research question. The analysis of the majority of the combined design studies typically relies on visual analyses and non-overlap indices to identify and make inferences about the intervention effects (Chung & Cannella-Malone, 2010; Jason & Frasure, 1979; Matson & Keyes, 1990; Trottier et al., 2011). For example, Lindberg et al. (1999) used an MBD + WRD combined design study to evaluate the effects of manipulation and reinforcement on self-injurious behaviors of two participants, solely relying on visual analysis. Another combined SCED study, MBD + ATD (Trottier et al., 2011), reported the results of the effectiveness of peer-tutoring on the use of speech-generating devices for students with autism in social game routines using visual analysis and the Percentage of Non-Overlapping Data index (PND; Schlosser et al., 2008; Scruggs et al., 1987). Relying on visual analysis and non-overlap indices is unfortunate because the opportunity is lost to precisely address additional questions through quantitative summaries (e.g., What is the magnitude of the intervention effect? To what extent is the intervention immediately effective?
To what extent does the intervention remain effective over time? Are all the participants benefiting equally from the intervention?). While visual analysis and non-overlap indices provide an initial indication of effectiveness of an intervention, effect size indices are needed to provide additional information through quantitative synthesis. Effect size indicators can be used to quantify the magnitude of intervention effectiveness at multiple points in time both for each participant and across participants. In addition, effect size estimates are supplemented with a standard error that reflects precision for the individual estimate and which can be used as a weight for quantitative summaries or analyses (i.e., multilevel meta-analysis; Moeyaert, 2019). Therefore, in this article, we are breaking new ground by applying the effect size logic to quantify intervention effectiveness for combined SCEDs. The effect size estimates will provide a more comprehensive picture regarding intervention effects by taking into account the design complexity of combined SCEDs, and they can be used in meta-analyses to assess generalizability across interventions and outcome variables.

Figure 1. An example of modifying the multiple baseline design by adding a phase change reversal. Frequency of target behaviors for three participants. Adapted from "The Effects of Competing Reinforcement Schedules on the Acquisition of Functional Communication," by M. E. Kelley, D. C. Lerman, and C. M. Van Camp, 2002, Journal of Applied Behavior Analysis, 35(1), p. 62. (Figure panels not reproduced in this text version.)
Previous research has focused on the coding schemes and synthesis of results for each of the "single" SCEDs, including the simple AB phase design, the MBD across participants, WRD (ABAB), and ATDs (Moeyaert, Ugille, Ferron, Onghena, et al., 2014; Shadish, Kyse, et al., 2013). Researchers have not investigated (1) coding and effect size estimation for combined SCEDs, and (2) meta-analysis of studies involving combined SCEDs. Due to the lack of methodology to quantify combined SCEDs, these studies tend to be simplified or excluded from meta-analyses, which contributes to biased effect size estimates and/or publication bias (e.g., Kokina & Kern, 2010; Wang et al., 2013). Therefore, we focus on how to quantify treatment effects for combined designs. Thus, the purpose of this study is to illustrate effect size estimation for combined designs using real data. In particular, we will focus on the MBD + WRD combined designs (45.97%) and the MBD + ATD combined designs (37.91%) as they are the two most popular classes, together accounting for 83.38% of the combined SCEDs (Shadish & Sullivan, 2011).

METHOD

We identified combined design studies and then randomly selected one MBD + WRD and one MBD + ATD study. Combined SCEDs were identified by examining primary studies from four meta-analyses of SCEDs (Heyvaert et al., 2014; Kokina & Kern, 2010; Moeyaert et al., 2019; Shogren et al., 2004) and 20 primary studies that evaluated reading fluency interventions. These meta-analyses and primary SCED studies were chosen because the first author had access to raw data. The meta-analysis of Heyvaert et al. (2014) included 59 studies of which 11 studies (i.e., 18.64%) were combined SCEDs. The review by Kokina and Kern (2010) consisted of 18 SCEDs of which only four (i.e., 22.22%) were combined SCEDs. The peer-tutoring meta-analysis by Moeyaert et al. (2019) included 65 studies and contained nine combined SCEDs (i.e., 13.85%).
The last meta-analysis (Shogren et al., 2004) had 13 SCED studies and two of them (15.38%) were combined SCEDs. Finally, of the 20 primary studies that examined reading fluency interventions, seven (i.e., 35%) were combined SCEDs. Thus, a substantial proportion of reviewed studies was combined SCEDs, a finding that is consistent with the review of Shadish and Sullivan (2011). The full list of the 33 combined design studies from the meta-analyses that we reviewed is available from the first author upon request. Of these combined designs, the combinations MBD + WRD (i.e., 58.82%, 20 studies) and MBD + ATD (i.e., 23.52%, eight studies) were the most popular. This also supports the results from the study of Shadish and Sullivan (2011) and our decision to focus on these two classes of combined SCEDs in this study. One study per combined SCED type was randomly selected from the set to demonstrate the coding of the design matrix and estimation of the effect sizes. The design matrix gives an overview of the overall data structure and includes all variables (e.g., participant identifier, the dependent variable, the independent variables) together with scores assigned to these variables. All variables needed to estimate the effect sizes of interest should be reflected in the design matrix. For more information about the design matrix for SCEDs, see Moeyaert, Ugille, Ferron, Beretvas, et al. (2014). However, other studies from the selection could also have been chosen. Raw data for the dependent variable in SCEDs are traditionally graphically displayed, as can be seen in Figure 2 (MBD + WRD) and Figure 3 (MBD + ATD). As a result, researchers can retrieve raw data from the graphical displays in primary studies. We used WebPlotDigitizer (Rohatgi, 2011) to retrieve raw data. The raw data represent the measures of the dependent variable over time.
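To make the structure of such a design matrix concrete, here is a minimal sketch in Python (not the authors' SAS code); the case label, phase lengths, and field names are hypothetical, and the outcome column retrieved from the graphs is omitted:

```python
# Hypothetical MBD + WRD case: one row per session, with dummy-coded
# phase indicators. Phase lengths are illustrative, not taken from a study.
phases = [("A1", 5), ("B1", 6), ("A2", 5), ("B2", 6)]  # (phase label, length)

design_matrix = []
session = 0
for phase_index, (label, length) in enumerate(phases):
    for _ in range(length):
        session += 1
        design_matrix.append({
            "case": "Case 1",
            "session": session,
            "phase": label,
            # Cumulative dummies: each switches on at a phase change and
            # stays on for the rest of the series.
            "A1B1": int(phase_index >= 1),
            "B1A2": int(phase_index >= 2),
            "A2B2": int(phase_index >= 3),
        })

print(design_matrix[0])   # first baseline session: all dummies 0
print(design_matrix[-1])  # final treatment session: all dummies 1
```

Coding the dummies cumulatively (each switches on at a phase change and stays on) is what later lets regression coefficients be read as changes in level between adjacent phases.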
The dependent variable (i.e., targeted behavior) together with other variables (i.e., phase and time indicators) that are needed to conduct the statistical analysis are part of the design matrix. The design matrix needed for effect size estimation of the combined designs is displayed in Tables 1 and 4 and will be discussed later. For more information about the data retrieval process, see Moeyaert, Maggin, et al. (2016). The raw data from Figures 2 and 3 can be found in the supplement to this article (together with the SAS codes that can be used for the analyses) to facilitate replication of the analyses demonstrated in this study, using the same data sets.

Figure 2. An example of the mixed design: MBD + PCR. Percentage of intervals with problem behaviors for three participants. Adapted from "The Effects of Choice-making on the Problem Behaviors of High School Students with Intellectual Disabilities," by S. Seybert, G. Dunlap, and J. Ferro, 1996, Journal of Behavior Education, 6(1), p. 58. (Figure panels not reproduced in this text version.)

Figure 3. An example of the mixed design: MBD + ATD. Percentage intervals with challenging behavior for three participants. Adapted from "The Effects of Presession Manipulations on Automatically Maintained Challenging Behavior and Task Responding," by Y.-C. Chung and H. I. Cannella-Malone, 2010, Behavior Modification, 34(6), p. 493. (Figure panels not reproduced in this text version.)

Table 1. Design matrix for Case 1 (i.e., Scott) — Seybert et al. (1996). One row per session (Sessions 1-22) with columns Case, Session, Outcome, A1B1, B1A2, and A2B2; the dummy variables are 0 in the first baseline phase and switch to 1 at each subsequent phase change, following the coding described under "Statistical model" below. (The numeric body of this table did not survive text extraction; the raw data are available in the supplement.)

RESULTS

Effect sizes are used as a complement to visual analysis in primary studies and can be used for between-study comparison of treatment effects and for meta-analytic purposes. Visual analysis has been well documented by Kratochwill et al. (2010), whereas the focus of the current study is on the quantitative summary of combined SCEDs. The analyses in the empirical illustration sections are performed using SAS software, Version 9.4 (© SAS Institute Inc.). SAS codes are available in the supplement to this article.

Multiple-baseline design — Withdrawal or reversal design

To demonstrate the effect size estimation for the first class of combined SCEDs, we selected the study of Seybert et al. (1996). Seybert et al. (1996) investigated the differences in problem and on-task behaviors in choice and no-choice conditions of three independent participants with intellectual disabilities. In the choice condition, participants were given a choice of the domestic task to do. In contrast, in the no-choice condition, participants were assigned to do a certain domestic task. The outcome variable reflected the percentage of problem behaviors and task engagement in the choice versus no-choice conditions. The data were recorded using 15-s partial interval recording: that is, only the last five seconds were recorded per each 15-s interval. Data points per participant ranged from n = 22 (Scott) to n = 29 (Maria). Seybert et al. (1996) used the combination of the MBD + WRD to investigate the effectiveness of choice-making on problem behavior. A graphical display is given in Figure 2. Seybert et al. (1996) claimed that the MBD + WRD allowed them to provide further evidence for the changes in the treatment phase as a result of manipulating the independent variable (choice versus no-choice conditions). The inter-rater observer percent agreement ranged from 81% to 99% for occurrence and nonoccurrence of problem behaviors. Seybert et al. (1996) analyzed the data using visual analysis techniques, and the results were reported as percentages of intervals with problem behaviors. This combined SCED has the potential to demonstrate a functional relation between the choice-making condition and problem behavior as the effectiveness of the treatment can be evaluated at three or more different points in time. In addition, most of the phases included at least five measurements (one choice and one no-choice condition for Maria included only four measurements). The MBD embedded in the combined design meets the WWC design standards as it includes at least three potential demonstrations of treatment effectiveness across at least three different points in time. The WRD embedded in the combined design meets basic replication standards for Scott and Maria, whereas this is not the case for Bob: there appears to be a non-effect for the withdrawal of the treatment. In addition, the WRD for Bob does not meet the WWC design standards as there are only two potential demonstrations of treatment effectiveness. According to Gast et al. (2018), this prohibits the conclusion that a functional relation is present for Bob. Notwithstanding this non-effect and lack of experimental control for Bob, effect size estimation for this combined design can still be meaningful. Researchers might be interested in quantifying the size of the effect, and this quantification can be used to confirm the results based on the visual analysis. This effect size estimate can be used afterward for meta-analytic purposes.

Table 2. Results of ordinary least squares (OLS) analysis and empirical Bayes analysis per participant.

Case    Parameter   OLS estimate (SE)   Empirical Bayes estimate (standard error of prediction)
Scott   β01         61.31 (6.90)        57.74 (11.87)
Scott   β11         −24.28 (9.34)       −19.30 (−)
Scott   β21         20.73 (8.91)        18.02 (10.35)
Scott   β31         −40.01 (9.34)       −37.38 (15.50)
Bob     β02         38.20 (4.99)        36.37 (11.77)
Bob     β12         −22.47 (8.39)       −19.30 (−)
Bob     β22         1.31 (9.55)         2.11 (10.32)
Maria   β03         16.53 (2.97)        18.90 (11.77)
Maria   β13         −12.82 (6.64)       −19.30 (−)
Maria   β23         26.99 (8.40)        29.85 (10.44)
Maria   β33         −10.90 (7.97)       −10.98 (15.50)

Table 3. Results of two-level analysis across participants.

Fixed effects                Parameter   Estimate (SE)     t       p
Baseline level A1            θ00         37.67 (11.74)     3.21    .082
Change in level A1 - B1      θ10         −19.30 (4.59)     −4.21   <.001
Change in level B1 - A2      θ20         16.66 (10.26)     1.62    .227
Change in level A2 - B2      θ30         −24.18 (15.76)    −1.53   .367

Random effects               Parameter   Estimate (SE)     z       p
Baseline level A1            σ²u0        391.51 (406.93)   0.96    .168
Change in level A1 - B1      σ²u1        0 (/)             /       /
Change in level B1 - A2      σ²u2        236.77 (291.24)   0.81    .208
Change in level A2 - B2      σ²u3        414.59 (701.37)   0.59    .277
Within-case variance         σ²e         207.75 (36.40)    5.71    <.0001

Table 4. Design matrix for Case 1 — data retrieved from Chung and Cannella-Malone (2010). One row per session with columns Case, Session, Outcome, Treatment1, and Treatment2; both treatment dummies equal 0 during the baseline sessions (Sessions 1-8, with outcomes 0.27933, 29.88827, 39.38547, 24.86034, 22.90503, 19.55307, 23.18436, and 46.64804) and thereafter indicate which of the two alternated treatments was delivered in each session. (The remaining rows of this table did not survive text extraction; the raw data are available in the supplement.)
We focused on estimating regression-based effect size estimates for the occurrence of problem behaviors in choice-making conditions for three participants with intellectual disabilities. The statistical model and empirical illustration are discussed in the following sections.

Statistical model

Step 1: single-level analysis. The single-level analysis can also be called an individual analysis as it involves a case-by-case evaluation of treatment effectiveness. Here, we are interested in demonstrating the effectiveness of a treatment at different points in time within participants. In the simplest scenario, the results are an estimate of change in levels between baseline and treatment phases for each participant separately. In other words: "Is there evidence for change in level between adjacent phases?" In this particular scenario, the design matrix contains dummy-coded variables indicating the specific phase to which a measurement belongs (see Table 1). We chose the following notation to distinguish between the consecutive phases: A1 and A2 indicate, respectively, the first and the second baseline phase, and B1 and B2 denote the first and the second treatment phase. For the ABAB phase design, three dummy variables, A1B1, B1A2, and A2B2, are coded as suggested by Moeyaert, Ugille, Ferron, Beretvas, et al. (2014) and Shadish, Kyse, et al. (2013): A1B1 = 1 for all the measurement occasions after the first baseline phase, B1A2 = 1 for all the measurement occasions after the first treatment phase, and A2B2 = 1 during the last treatment phase (see Table 1). In order to predict the outcome score at the ith measurement occasion, the following multiple regression equation can be used and parameters can be estimated using ordinary least squares (OLS):

Y_i = β0 + β1·A1B1_i + β2·B1A2_i + β3·A2B2_i + e_i, with e_i ~ N(0, σ²e)   (1)

When all three dummy-coded variables equal zero (i.e., A1B1 = B1A2 = A2B2 = 0), the indicated phase is the first baseline phase (β0).
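As an illustration of Equation (1) and the cumulative dummy coding, the following sketch fits the model by ordinary least squares on a small artificial ABAB series (hypothetical values, not the Seybert et al. (1996) data; the authors' actual analyses use the SAS code in the supplement):

```python
import numpy as np

# Artificial ABAB series: five observations per phase, noiseless so the
# coefficients can be read off exactly. Values are hypothetical.
y = np.concatenate([np.full(5, 60.0),   # A1 (first baseline)
                    np.full(5, 36.0),   # B1 (first treatment)
                    np.full(5, 56.0),   # A2 (withdrawal)
                    np.full(5, 16.0)])  # B2 (second treatment)
phase = np.repeat([0, 1, 2, 3], 5)

# Cumulative dummies: each switches on at a phase transition and stays on.
A1B1 = (phase >= 1).astype(float)
B1A2 = (phase >= 2).astype(float)
A2B2 = (phase >= 3).astype(float)

X = np.column_stack([np.ones_like(y), A1B1, B1A2, A2B2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # baseline level 60, then changes in level: -24, +20, -40
```

Because the dummies accumulate, β1, β2, and β3 are differences between adjacent phase levels (60 to 36 to 56 to 16), not differences from the first baseline.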
Each dummy variable represents the change from an earlier to its adjacent phase. Thus, for example, B1A2 refers to the change in level from B1 to A2 (i.e., the difference in level between Treatment 1 and Baseline 2). An extension here could be to investigate whether there are changes in linear (Moeyaert, Ugille, Ferron, Beretvas, et al., 2014) or non-linear trends (Hembry et al., 2015) or changes in variance of scores between adjacent phases (Baek & Ferron, 2013).

Step 2: two-level analysis. The two-level analysis involves an aggregate estimate of the treatment effectiveness across participants. Here, we are investigating the replication of the treatment effect across participants (within the same study), in addition to the replication of the treatment effect within participants. As a consequence, more generalized conclusions can be made, which strengthens the external validity of the inferences. In addition, variability in effectiveness of the treatment between participants can be quantified. One way to perform this analysis is to conduct a two-level analysis, which takes the hierarchical nature of the data into account; namely, measurements are nested within each of multiple cases. The coefficients from the first level, β0j, β1j, β2j, and β3j, can be modeled as varying at the second (participant) level. By fitting this multilevel model, overall average changes in level from one phase to another can be obtained in addition to how individual participants deviate from that overall change.
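The two-level idea can be sketched numerically before it is written out formally. In this toy, noiseless illustration (hypothetical numbers, not the reported estimates), each case's coefficient vector is an overall fixed effect plus a case-specific deviation; per-case OLS recovers the case coefficients, and averaging them recovers the fixed effects:

```python
import numpy as np

# Overall fixed effects (theta00..theta30) and case deviations u_j; the
# deviations sum to zero across the three hypothetical cases.
theta = np.array([40.0, -20.0, 15.0, -25.0])
u = np.array([[5, -2, 3, -4],
              [-5, 2, -3, 4],
              [0, 0, 0, 0]], dtype=float)

phase = np.repeat([0, 1, 2, 3], 5)
X = np.column_stack([np.ones(20),
                     (phase >= 1), (phase >= 2), (phase >= 3)]).astype(float)

per_case = []
for uj in u:                         # level 1: y_ij generated from beta_j
    beta_j = theta + uj              # level 2: beta_j = theta + u_j
    y = X @ beta_j                   # noiseless, so OLS recovers beta_j exactly
    est, *_ = np.linalg.lstsq(X, y, rcond=None)
    per_case.append(est)

average = np.mean(per_case, axis=0)
print(average)                       # recovers theta
```

In practice the deviations are treated as random effects and the model is fitted in one step (e.g., with restricted maximum likelihood), which also yields between-case variance estimates such as those in Table 3.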
The level 1 and level 2 equations are presented in Equations (2) and (3):

Level 1:
Y_ij = β0j + β1j·A1B1_ij + β2j·B1A2_ij + β3j·A2B2_ij + e_ij, with e_ij ~ N(0, σ²e)   (2)

Level 2:
β0j = θ00 + u0j
β1j = θ10 + u1j
β2j = θ20 + u2j
β3j = θ30 + u3j   (3)

with the case-level residuals (u0j, u1j, u2j, u3j)' assumed multivariate normal with mean zero and covariance matrix

Σu = [ σ²u0    σu0u1   σu0u2   σu0u3
       σu1u0   σ²u1    σu1u2   σu1u3
       σu2u0   σu2u1   σ²u2    σu2u3
       σu3u0   σu3u1   σu3u2   σ²u3 ]

The first line in Equation (3) indicates that the baseline level for participant j is modeled as a function of an average baseline level, θ00, plus a random deviation from this mean, u0j. The subsequent equations describe the average change in level between A1 and B1 (θ10), the change in level between B1 and A2 (θ20), and the change in level between A2 and B2 (θ30), respectively. The variability in baseline level (i.e., σ²u0) and the variability in changes in levels (i.e., σ²u1, σ²u2, and σ²u3) are captured by estimating the variance/covariance matrix.

Empirical illustration. We use the Seybert et al. (1996) study for the empirical illustration of the single-level (individual) and two-level (average) effect size estimates for the MBD + WRD design. Seybert et al. (1996) investigated the effects of choice-making on the problem behaviors of three high school students with intellectual disabilities. In this example, we are looking only at the outcome variable of occurrence and nonoccurrence of problem behaviors within choice and no-choice conditions. The start of the intervention was staggered across the three participants, and two baseline conditions (i.e., no-choice, denoted as A1 and A2 in Figure 4) are interrupted by treatment conditions (i.e., choice, denoted as B1 and B2 in Figure 4). Participant 2 (i.e., Bob) has no second treatment phase as the

Figure 4.
Estimated parameters for each participant across phases. Note: The lines indicate case-specific and study- specific estimates. problem behavior remained low when the treatment was removed (phase A2). The graphical presentation of the data is given in Figure 2. The coding of the design matrix for participant 1 (i.e., Scott) in accordance with the mathematical model presented in Equation (1) can be found in Table 1 (the same coding is applied for the other cases). The SAS code to run the analyses is available as a supplement to this article. The output of the single-level analysis is presented in Table 2, and the visual pre- sentation of the estimated parameters is provided in Figure 4. From the single- level analysis, we can conclude that there is a demonstration of treatment effective- ness at three different points in time for Case 1 (i.e., Scott). When the choice-mak- ing intervention is introduced, we see a_ significant drop in problem behavior [B1, = —24.28, t(25) = —2.60, p= .018 and B3, = —40.01, t(25) = —4.28, p = .032]. When the choice-making intervention is removed, we see a significant increase in problem behavior [8,, = 20.73, t(25) = 2.33,p = .032;]. For Case 2 (i.e., Bob) and Case 3 (i.e., Maria), there was only one demonstration of significant treatment effectiveness [Case 2: B,) = —22.47, t(20) = —2.68, p=.015, and Case 3: B53 = 26.99, t(25) = 3.21p = .004]. According to the WWC design’ standards (Kratochwill et al., 2010), the choice-making interven- tion was only effective for Scott as three demonstrations of treatment effectiveness at three different points in time are required to demonstrate a causal relation- ship between the introduction of the treat- ment and the change in outcome score. The two-level analysis was conducted to estimate the overall baseline level and changes in level between subsequent phases across the three cases in addition EBP ADVANCEMENT CORNER 13 to between-case variability in these esti- mates. 
The two-level analysis enhances the generalizability of treatment effectiveness beyond the cases under investigation. For didactic purposes (allowing visual presentation of the estimated coefficients, Figure 4), a small dataset with only three cases is used. In order to run a two-level analysis and obtain generalizable estimates, it is suggested to use a larger dataset including more than three cases. The results indicate that the choice-making intervention succeeded in reducing the problem behavior, and large effect size estimates were obtained for the change in level between A1 and B1 and between A2 and B2 [θ10 = −19.30, t(66) = −4.21, p < .001; θ30 = −24.18, t(1) = −1.53, p = .367]. However, only one estimate (θ10) is statistically significant (p < .05). An additional advantage of using the two-level analysis is that the between-case variance in the treatment effect estimates can be estimated. Most variability was found in the estimate of the between-case variance for the change in level between A2 and B2 (Table 3, random effects). The results of the single-level and two-level analyses are visually presented in Figure 4. Another advantage of using the two-level analysis is that empirical Bayes estimates of the case-specific parameters can be obtained. Empirical Bayes estimation can be viewed as an approximation to a fully Bayesian approach that uses information from the full dataset to build the prior distributions (Shadish, Rindskopf, et al., 2013). Therefore, the empirical Bayes estimates are shrunken toward the mean (the overall average fixed effects). These case-specific estimates are improved estimates compared to the single-level ordinary least squares estimates because information from the entire dataset is used (in other words, the empirical Bayes estimate is "borrowing strength" from all available study evidence). For an introduction to empirical Bayes estimates, see Casella (1985).
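The shrinkage behind empirical Bayes estimation can be illustrated with a few hypothetical numbers. The weights below are the standard precision-based reliability weights; none of the values come from the Seybert et al. (1996) data:

```python
# Illustrative sketch of empirical Bayes shrinkage (not the authors' code):
# each case-specific estimate is pulled toward the overall fixed effect,
# with more shrinkage when the case estimate is noisy relative to the
# between-case variance.
theta = -19.3                  # hypothetical overall average treatment effect
tau2 = 25.0                    # hypothetical between-case variance
ols = [-24.3, -22.5, -10.0]    # hypothetical case-specific OLS estimates
v = [36.0, 49.0, 100.0]        # hypothetical sampling variances (SE^2)

eb = []
for b_j, v_j in zip(ols, v):
    lam = tau2 / (tau2 + v_j)          # reliability weight in [0, 1]
    eb.append(lam * b_j + (1 - lam) * theta)

for b_j, b_eb in zip(ols, eb):
    print(f"OLS {b_j:7.2f} -> EB {b_eb:7.2f}")
```

The noisiest case (sampling variance 100) is shrunken the most, which is the "borrowing strength" behavior described above.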
Instead of running three separate single-level analyses, one two-level hierarchical linear model can be run, providing both the effect size estimates across cases and the case-specific estimates. The results of the case-specific estimates based on the empirical Bayes estimates are displayed in Table 2 and closely match the results of the single-level ordinary least squares analyses.

Multiple-baseline design – Alternating treatment design

In Alternating Treatment Designs (ATDs), two or more treatments (possibly following a baseline phase) are rapidly alternated (Barlow & Hayes, 1979; Barlow et al., 2009), or treatment sessions are alternated with no-treatment sessions. Most ATDs are characterized by a baseline phase and two or more treatments, which are alternated during the treatment phase. In this scenario, the researcher is interested in the differential effect between the two treatments (i.e., the relative effectiveness of two or more interventions; Horner & Odom, 2014). Other ATDs are characterized by an alternation of two or more treatments, or by an alternation of two or more treatments with baseline sessions. In this latter scenario, a pure baseline comparison is not possible unless the alternation is preceded or followed by a phase including only baseline measures (Zimmerman et al., 2019). If the baseline sessions are alternated with treatment sessions from the beginning, it is unknown how the participants perform without being introduced to the treatment (which could be a confounding factor). In addition, multitreatment interference can occur, as it can be the case that multiple treatments are effective because they are given in an alternated fashion (one treatment might strengthen the effectiveness of the other treatment and vice versa). Zimmerman et al.
(2019) indicate that possible multitreatment interference can be detected with the inclusion of an initial baseline and a visual analysis that compares the initial baseline level to the baseline observations that are part of the alternating sequence. Similarly, a phase for a specific treatment can be included so that the observations within the treatment phase can be compared to the treatment observations that are part of the alternating sequence. To demonstrate a functional relation between the independent and dependent variables, the data from the different treatments should not overlap. In addition, the ATD study should include at least four data points of comparison in each of the treatments and at least five repetitions of the alternating sequence to meet the standards of the What Works Clearinghouse (Horner & Odom, 2014; Kratochwill et al., 2010).

This combined SCED combines the unique strengths of ATDs with those of MBDs (i.e., external validity, allowing more generalized treatment effects). That is, the combination of ATDs with MBDs uses the rapid comparison of two or more conditions (ATD), while the start of the intervention phase is staggered across participants (MBD). In this way, the ATD + MBD combination allows identifying the treatment that has the larger effect, with higher degrees of internal and external validity. Another possibility of ATDs is that researchers may choose to continue only the treatments with the strongest effects in the final phases of the study (Kratochwill et al., 2010).

Statistical model Step 1: single-level analysis. Similar to the single-level (i.e., case-specific) analysis for the MBD + WRD, a case-by-case intervention effectiveness evaluation can be performed for the MBD + ATD. More specifically, the following research question is of interest: "Is there a change in level for Treatment 1 and Treatment 2, respectively?" The effect sizes of interest can be obtained by introducing dummy variables for each treatment.
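This dummy coding can be sketched as follows (illustrative Python with fabricated data; the article's supplement uses SAS). The two treatment indicators recover the baseline-to-treatment changes, and their difference gives the differential effect:

```python
import numpy as np

# Hypothetical alternating-treatment data for one case (illustrative only).
# Condition labels: "BL" = baseline; "T1"/"T2" = the two alternated treatments.
cond = ["BL"]*5 + ["T1", "T2", "T2", "T1", "T1", "T2", "T1", "T2"]
y = np.array([50, 52, 49, 51, 48,
              30, 40, 38, 28, 31, 41, 29, 39], float)

t1 = np.array([c == "T1" for c in cond], float)  # Treatment 1 dummy
t2 = np.array([c == "T2" for c in cond], float)  # Treatment 2 dummy
X = np.column_stack([np.ones_like(y), t1, t2])

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2 = beta
print(f"baseline level      b0 = {b0:.2f}")
print(f"baseline -> T1      b1 = {b1:.2f}")
print(f"baseline -> T2      b2 = {b2:.2f}")
print(f"differential effect b1 - b2 = {b1 - b2:.2f}")
```

With an intercept and one dummy per treatment, b0 is the baseline mean and b1 and b2 are the shifts of each treatment's mean from baseline, mirroring the interpretation of Equation (4) below.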
The dummy-coded variables, Treatment_mi's, indicate the treatment phase. For instance, Treatment_mi equals one if the score at moment i belongs to treatment phase m, and zero otherwise. If all the Treatment_mi's are zero, then the measurement occasion belongs to the baseline phase. For two treatments, the following regression equation can be used (using treatment indicators Treatment1_i and Treatment2_i):

Y_i = β0 + β1 Treatment1_i + β2 Treatment2_i + e_i, with e_i ~ N(0, σ²_e)   (4)

β0 indicates the baseline level, β1 refers to the change in level between the baseline and Treatment 1, and β2 refers to the change in level between the baseline and Treatment 2. The difference between β1 and β2 refers to the differential effect (e.g., "Is one of the treatments relatively more effective?"). Equation (4) can be extended by modeling linear or non-linear trends (Hembry et al., 2015; Moeyaert, Ugille, Ferron, Beretvas, et al., 2014), or by adding more dummy variables in case more than two treatments are examined.

Statistical model Step 2: two-level analysis. This step is similar to Step 2 described for the MBD + WRD design, where the coefficients from the first level can be modeled as varying at the second level:

Level 1:

Y_ij = β0j + β1j Treatment1_ij + β2j Treatment2_ij + e_ij, with e_ij ~ N(0, σ²_e)   (5)

Level 2:

β0j = θ00 + u0j
β1j = θ10 + u1j
β2j = θ20 + u2j

with (u0j, u1j, u2j)′ ~ N(0, Σu), where the diagonal of Σu contains the between-case variances σ²_u0, σ²_u1, and σ²_u2, and the off-diagonal elements contain their covariances.   (6)

This two-level analysis allows for making more generalized conclusions, as overall average estimates across cases are obtained (the θs in Equation (6)). As noted before, case-specific estimates are available by requesting the empirical Bayes estimates. By estimating the variance/covariance matrix, the between-case variance in the baseline level (σ²_u0) and in the treatment effect estimates (σ²_u1 and σ²_u2) can be obtained.

EMPIRICAL ILLUSTRATION

The study of Chung and Cannella-Malone (2010) will be used for the empirical demonstration.
This study used an ATD that is characterized by a baseline phase followed by an alternating phase in which baseline and treatment sessions are alternated. In addition, the ATD is repeated across multiple independent participants, and the start of the randomization phase is staggered across the participants (MBD). The purpose of the Chung and Cannella-Malone study was to examine the separate and combined effects of motivating operations for three participants with multiple disabilities in four pre-session conditions: (1) attention, (2) response blocking, (3) attention with response blocking, and (4) non-interaction. The dependent variable was stereotypic behavior, which was measured using 10-s partial interval recording. Interobserver agreement data were calculated for the pre-session (39% of the data) and treatment (40% of the data) conditions, with the agreement reaching 98% and 99%, respectively. The graphical display of the data can be found in Figure 3 (i.e., copied from the original study) and Figure 5 (i.e., a recreated graph, using the data retrieved with WebPlotDigitizer; Rohatgi, 2011).

[Figure 5 appears here: for each participant (Lilly, Anna, and Kellie), the percentage of intervals with challenging behavior is plotted across sessions, with the estimated parameters for the pre-session access and no pre-session access conditions overlaid.]

Figure 5. Estimated parameters for the single-level analysis and two-level analysis.
The line during the baseline indicates the overall average baseline level estimate; the lines during the intervention indicate the estimated challenging behavior during the pre-session access intervention and the challenging behavior during the no pre-session access intervention.

For this empirical demonstration, we will analyze the problem behavior of the three participants of the Chung and Cannella-Malone (2010) study. During the treatment, participants did two tasks, Task A and Task B, which were individualized to the needs and skills of the participating students. Students did the tasks in two conditions, as shown in Figure 3: (1) a pre-session access condition that was identified in the functional analysis part of the study, and (2) no pre-session access. Because of the individual needs in the Chung and Cannella-Malone (2010) study, the treatment phases are participant-specific. This is commonly the case in SCEDs, as one of the strengths of this design is the ability to adjust the treatment according to the participant's needs. As a consequence, the baseline versus treatment comparison is not completely the same for the three participants (i.e., Lilly: baseline versus 5-min blocking; Anna: baseline versus 10-min alone; and Kellie: baseline versus 5-min blocking). Therefore, strictly speaking, no experimental conclusions can be drawn from this combined design (Ledford & Gast, 2018). However, the treatment phases can be treated as subcategories of the same treatment, and as a consequence it is still meaningful to investigate the generalization of the effect across the three participants. In the original study, the data were visually analyzed, and the results were reported as percentages of intervals with problem behavior. Chung and Cannella-Malone (2010) reported that the intervention was successful for two out of the three participants, whose problem behaviors noticeably decreased.
The results of the intervention for the third participant were contradictory (i.e., the intervention condition identified as successful in the previous experiment failed to decrease problem behaviors). Although the intervention was successful for only two out of the three participants, it is still worth estimating the size of the intervention effect to complement this finding. The coding of the design matrix for Case 1 (i.e., Lilly) in accordance with the mathematical model presented in Equation (4) can be found in Table 4. The SAS code to run the analyses is available as a supplement to this article. The output of the single-level analysis is presented in Table 5.

Table 5. Results of ordinary least squares analysis and empirical Bayes analysis per participant

Case    Parameter   OLS estimate (SE)   Empirical Bayes estimate (SE of prediction)
Lilly   β0          25.84 (3.34)        26.69 (11.68)
        β1          −25.73 (5.39)       −30.46 (1.07)
        β2          −25.67 (5.39)       −21.60 (6.49)
Anna    β0          48.40 (4.52)        43.16 (11.58)
        β1          −38.53 (5.48)       −30.75 (1.07)
        β2          −20.36 (5.39)       −15.11 (6.01)
Kellie  β0          62.10 (5.81)        65.03 (11.56)
        β1          −25.93 (7.79)       −30.44 (1.07)
        β2          −4.02 (7.79)        −8.59 (6.03)

From the case-by-case analysis, we can conclude that there is a demonstration of treatment effectiveness for both interventions at two different points in time for Case 1 (i.e., Lilly) and for Case 2 (i.e., Anna) at the .05 significance level. When both the pre-session access and no pre-session access conditions are introduced, we see a significant drop in problem behavior for Lilly [Case 1: β1 = −25.73, t(15) = −4.77, p = .0002 and β2 = −25.67, t(15) = −4.76, p = .0003] and Anna [Case 2: β1 = −38.53, t(41) = −7.03, p < .0001 and β2 = −20.36, t(41) = −3.78, p = .0005]. For Kellie (Case 3), there was only one demonstration of treatment effectiveness [β1 = −25.93, t(39) = −3.33, p = .0019].

The two-level analysis was conducted to generalize treatment effectiveness beyond the individual cases. Again, for didactic purposes, a small dataset with only three cases is used. In order to run a two-level analysis and obtain generalizable estimates, it is recommended to use a larger dataset. The results indicate that both the pre-session access and no pre-session access interventions succeeded in reducing the problem behaviors, as negative estimates were obtained for the change in level between the baseline and Treatment 1 and between the baseline and Treatment 2 [θ10 = −30.55, t(61) = −7.44, p = .012; θ20 = −15.10, t(61) = −2.41, p = .152]. However, only the estimate of the effect of Treatment 1 is statistically significant (p < .05). As can be seen in Table 6, the between-case variance in the treatment effects was large for Treatment 2 [σ²_u2 = 66.37, Z = 0.54, p = .293], and the within-case residual variance is statistically significant [σ²_e = 251.33, Z = 6.86, p < .0001].

Table 6. Results of two-level analysis across participants

Fixed effects                         Estimate (SE)     t       p
Baseline level θ00                    44.96 (11.67)     3.85    .057
Change in level Treatment 1 θ10       −30.55 (4.10)     −7.44   .012
Change in level Treatment 2 θ20       −15.10 (6.26)     −2.41   .152

Random effects                        Estimate (SE)     Z       p
Baseline level σ²_u0                  381.27 (399.56)   0.95    .17
Change in level Treatment 1 σ²_u1     1.16 (43.95)      0.03    .489
Change in level Treatment 2 σ²_u2     66.37 (122.25)    0.54    .293
Within-case variance σ²_e             251.33 (36.64)    6.86    <.0001

The visual presentation of the single-level analysis and two-level analysis is given in Figure 5. As mentioned earlier, an extra advantage of using the two-level model is that case-specific estimates are obtained in addition to the overall average estimates across cases. The results of the case-specific estimates based on the empirical Bayes estimates are displayed in Table 5 and closely resemble the results of the single-level analyses.

DISCUSSION

Previous research in the field of SCEDs solely focused on estimating intervention effectiveness using data from "single" SCEDs.
This study expands on this and introduces an analysis technique suitable to estimate treatment effectiveness for more complex SCEDs, namely "combined SCEDs". This study is the first to demonstrate how applied researchers can use an extension of established methodology to obtain an effect size estimate appropriate for combined designs. The proposed technique is generic and not limited to combined designs. For instance, by excluding predictors in the two-level models, the technique can be used to quantify treatment effects across single SCEDs. Combined SCEDs are combinations of single SCEDs, and they are frequently used as they are more internally and externally valid and can answer richer research questions. The two most popular combined designs are discussed in detail, namely the MBD + WRD and the MBD + ATD. For these combined designs, we discuss (a) the mathematical models appropriate for the quantitative analysis, (b) the coding of the design matrix, (c) the statistical software to perform the analysis, (d) the interpretation of the output tables, and (e) the visual presentation of the obtained coefficients. We demonstrate the process using data from previously published studies. The purpose is to assist single-case researchers in drawing valid and reliable inferences regarding treatment effectiveness for complex designs.

The single- and two-level hierarchical linear modeling (HLM) techniques are suggested. The two-level HLM is appropriate as both participant-specific and overall average study-specific estimates are obtained simultaneously (instead of running separate single-level analyses for each case), which leads to drawing more generalized inferences. Empirical Bayes estimates of the participant-specific treatment effects are more precise than the OLS (single-level) estimates, but they are biased toward the average effect.
By ignoring the hierarchical structure of the data (i.e., measurements are nested within cases, and cases are nested within studies), biased standard errors are obtained (the standard errors are too small due to ignoring the dependency), and, consequently, the analysis is prone to Type I errors. The two-level HLM provides regression-based effect size estimates and their standard errors. Therefore, they can be used afterward for meta-analytic purposes. A third level can be added to the model, and the overall average treatment effectiveness can be estimated across studies. In addition, the variability in treatment effectiveness between studies can be explored. If a large amount of variability is identified, moderators can be added to the model. Another advantage of summarizing treatment effects across studies is the increased power to identify true treatment effects.

Limitations and future research directions

The HLM model introduced in this study is the most basic model, which ignores, for instance, data trend and autocorrelation, and is only appropriate for continuous outcomes. In addition, use of conventional HLM requires assumptions about multivariate normality that need to be met in order to make valid inferences (Raudenbush & Bryk, 2002). This was beyond the scope of this study, as the focus was on the logic of modeling combined-design SCEDs, which is already a complexity. However, the HLM is flexible, and other complexities can be introduced into the model. For instance, in case a researcher is studying a target behavior or skill in which a trend is expected, the introduced models can be extended by including a time indicator variable in the treatment phase. This results in two effect size estimators of interest: (1) the change in level of the dependent variable when introducing the treatment and (2) the trend during the treatment phase.
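Such a trend extension can be sketched as follows (illustrative Python with fabricated data): adding a time-in-treatment predictor yields both an immediate change in level and a slope during the treatment phase:

```python
import numpy as np

# Sketch of the trend extension (illustrative, hypothetical data): a
# time-in-treatment predictor lets the model return both the immediate
# level change (b1) and the slope during the treatment phase (b2).
n_base, n_treat = 6, 10
y = np.array([50, 51, 49, 50, 52, 48,                         # flat baseline
              40, 38, 37, 35, 34, 32, 31, 29, 28, 26], float)  # improving trend

d = np.array([0]*n_base + [1]*n_treat, float)                  # treatment dummy
t_in = np.array([0]*n_base + list(range(n_treat)), float)      # 0,1,2,... in treatment
X = np.column_stack([np.ones_like(y), d, d * t_in])

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2 = beta
print(f"baseline level          b0 = {b0:.2f}")
print(f"immediate level change  b1 = {b1:.2f}")
print(f"slope during treatment  b2 = {b2:.2f}")
```

Centering the time variable at the first treatment occasion makes b1 the change in level at the moment the treatment is introduced, with b2 capturing any further improvement (or decay) per session.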
Two-level hierarchical linear modeling including a linear time trend is discussed in detail in Moeyaert, Ugille, Ferron, Beretvas, et al. (2014).

Another complexity relates particularly to the MBD + ATD design. In ATDs, the effectiveness of two (or more) treatments is compared with a common baseline phase, which introduces dependency. The model can be further extended by exploring options to model this dependency (by, for instance, estimating the covariance, or by using a more complex estimation technique if more cases within a study are included, specifically robust variance estimation; Hedges et al., 2010). Last, when using HLM, caution needs to be exercised when interpreting the between-case variance estimates, as severely biased estimates can be obtained (Moeyaert et al., 2013). The limitations discussed here are not specific to the HLM of combined SCEDs, but apply to using HLM in general as an analysis technique for the quantitative integration of SCED data.

In addition, the results of the two studies discussed in this article should be interpreted with caution because in both of them there was a lack of experimental control. In Seybert et al. (1996), the withdrawal and reversal design embedded in the combined design did not meet the basic replication standards for one of the participants. In addition, there was a non-effect for the withdrawal of the treatment for that same participant. As a consequence, to meet the WWC design standards to demonstrate experimental control, an additional basic replication is needed for one of the participants of the Seybert et al. (1996) study. Similarly, in Chung and Cannella-Malone (2010), the treatment to reduce problem behaviors was effective for two out of three participants. In addition, the effectiveness of the treatment was investigated across slightly different treatment phases.
In order to meet the WWC design standards, the treatment phases across the participants should be identical, and there should be three demonstrations of the effectiveness of the treatment at three different points in time. Effect size estimation for these combined designs is still informative, as it quantifies the magnitude of the treatment effect. This quantification provides an overall summary of the study findings (and of the variability between participants in treatment effectiveness) and can be used for meta-analysis purposes afterward. However, we encourage applied SCED researchers to design combined SCEDs that meet the WWC design standards for experimental control. In order to demonstrate our methodology, we were limited to published combined designs. The examples included are typical for the field and are solely used to demonstrate the analysis technique.

In terms of future research directions, the suggested models can be extended by adding case characteristics (gender, age, race, etc.) to investigate their moderating effect on treatment effectiveness. However, recent research related to power indicates that at least 12 cases are needed, or 7 cases in combination with at least 40 measurement occasions, to be able to include case characteristics in the analyses (Moeyaert et al., 2017). This, of course, depends on the particular predictors and the value of the true treatment effect. Simulation studies can be performed in order to investigate the power for a particular set of design conditions. Again, this is beyond the scope of this paper. Other ways of coding the design matrix are also possible, depending on the specific research questions and the structure of the data being analyzed.

To further enhance the internal validity, single-case researchers might consider introducing randomization when developing the combined SCED design. As discussed in depth by J. R. Ledford et al. (2018), several forms of randomization can be incorporated in the design.
First, the start and the withdrawal of the intervention can be randomized. In this scenario, it is recommended that the randomization does not start until baseline stability is established. Second, the order of the conditions can be randomized, which is typically done in ATDs. Unrestricted randomization is not recommended, as it could yield sequences that do not represent an ATD (e.g., all baseline conditions selected first) or sequences in which a certain randomized pattern consistently recurs (e.g., treatment 1 always administered after treatment 2). A third randomization form is the random assignment of participants to intervention start points. This is relevant for multiple-baseline designs across participants. Incorporating randomization in the design allows for the use of randomization tests to draw conclusions related to treatment effectiveness. The advantage of such tests is that the sampling distribution is built upon the randomization patterns, and as a consequence no parametric assumptions are needed (for more details about randomization, see J. M. Ferron & Levin, 2014; Heyvaert et al., 2017). Inclusion of randomization has the potential to reduce the risk of biased effect size estimates.

In order to increase the external validity of treatment effectiveness and contribute to evidence-based decisions in research, practice, and policy, multiple SCED studies can be summarized. Previous research demonstrates how the multilevel meta-analytic framework can be used to combine single SCEDs (Moeyaert, 2018; Moeyaert, Ugille, Ferron, Onghena, et al., 2014). Therefore, future research is needed to demonstrate how pure and combined SCEDs can be combined using the multilevel meta-analytic approach. Similarly, a follow-up study can be conducted to evaluate the consequences of ignoring the complex nature of combined designs.
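The start-point randomization test mentioned above can be sketched as follows. This is a minimal illustration with simulated data; the permissible start-point sets and the test statistic are hypothetical choices, and complete procedures are given in J. M. Ferron and Levin (2014):

```python
import itertools
import numpy as np

# Minimal sketch of a randomization test for an MBD across participants
# (illustrative only). Each case is randomly assigned one of several
# permissible intervention start points; the observed statistic is compared
# to its distribution over all possible start-point assignments.
rng = np.random.default_rng(1)
starts_obs = [4, 6, 8]                         # staggered observed start points
possible = [[3, 4, 5], [5, 6, 7], [7, 8, 9]]   # permissible starts per case
data = [rng.normal(50, 1, 12) for _ in range(3)]
for y, s in zip(data, starts_obs):
    y[s:] -= 20                                # inject a drop at the start point

def stat(starts):
    # mean (treatment - baseline) difference across cases
    return np.mean([y[s:].mean() - y[:s].mean() for y, s in zip(data, starts)])

observed = stat(starts_obs)
null = [stat(s) for s in itertools.product(*possible)]  # 27 assignments
p = np.mean([t <= observed for t in null])              # one-sided p value
print(f"observed = {observed:.2f}, p = {p:.3f}")
```

Because the reference distribution is built from the design's own randomization scheme, no normality or independence assumptions about the errors are needed, which is the advantage noted above.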
CONCLUSIONS

This study is the first to introduce and demonstrate a promising methodological framework for effect size estimation for combined SCEDs. The two-level hierarchical model is recommended, as it can include variables to account for the complexity of the combined design. In this study, the logic of modeling a combined SCED study is introduced, empirical illustrations are given, the analysis output is discussed, and SAS code is provided as a supplement. Single-case researchers are given the tools (and are encouraged) to modify and/or further extend the models. The proposed method of coding and estimating effect sizes for combined SCEDs can be a useful technique to inform researchers and practitioners about the effectiveness of interventions.

DISCLOSURE STATEMENT

No potential conflict of interest was reported by the author(s).

Funding

This research was supported by the Institute of Education Sciences, U.S. Department of Education, through grants [R305D150007 and R305D190022]. The content is solely the responsibility of the authors and does not necessarily represent the official views of the Institute of Education Sciences or the U.S. Department of Education.

REFERENCES

Baek, E., & Ferron, J. M. (2013). Multilevel models for multiple-baseline data: Modeling across participant variation in autocorrelation and residual variance. Behavior Research Methods, 45(1), 65-74. doi: 10.3758/s13428-012-0231-z
Barlow, D. H., & Hayes, S. C. (1979). Alternating treatments design: One strategy for comparing the effects of two treatments in a single subject. Journal of Applied Behavior Analysis, 12(2), 199-210. doi: 10.1901/jaba.1979.12-199
Barlow, D. H., Nock, M., & Hersen, M. (2009). Single case experimental designs: Strategies for studying behavior change. Pearson/Allyn and Bacon.
Casella, G. (1985). An introduction to empirical Bayes analysis. The American Statistician, 39(2), 83-87.
doi: 10.2307/2682801
Chung, Y., & Cannella-Malone, H. I. (2010). The effects of presession manipulations on automatically maintained challenging behavior and task responding. Behavior Modification, 34(6), 479-502. doi: 10.1177/0145445510378380
Ferron, J., & Jones, P. K. (2006). Tests for the visual analysis of response-guided multiple-baseline data. The Journal of Experimental Education, 75(1), 66-81. doi: 10.3200/JEXE.75.1.66-81
Ferron, J. M., & Levin, J. R. (2014). Single-case permutation and randomization statistical tests: Present status, promising new developments. In T. R. Kratochwill & J. R. Levin (Eds.), Single-case intervention research: Methodological and statistical advances (pp. 153-183). American Psychological Association.
Gast, D. L., Lloyd, B. P., & Ledford, J. R. (2018). Multiple baseline and multiple probe designs. In J. R. Ledford & D. L. Gast (Eds.), Single case research methodology (3rd ed., pp. 239-281). Routledge.
Hedges, L. V., Tipton, E., & Johnson, M. C. (2010). Robust variance estimation in meta-regression with dependent effect size estimates. Research Synthesis Methods, 1(1), 39-65.
Hembry, I., Bunuan, R., Beretvas, S. N., Ferron, J., & Van den Noortgate, W. (2015). Estimation of a nonlinear intervention phase trajectory for multiple-baseline design data. The Journal of Experimental Education, 83(4), 514-546. doi: 10.1080/00220973.2014.907231
Heyvaert, M., Saenen, L., Maes, B., & Onghena, P. (2014). Systematic review of restraint interventions for challenging behaviour among persons with intellectual disabilities: Focus on effectiveness in single-case experiments. Journal of Applied Research in Intellectual Disabilities, 27(6), 493-590. doi: 10.1111/jar.12094
Heyvaert, M., Moeyaert, M., Verkempynck, P., Van den Noortgate, W., Vervloet, M., Ugille, M., & Onghena, P. (2017). Testing the intervention effect in single-case experiments: A Monte Carlo simulation study. The Journal of Experimental Education, 85(2), 175-196.
doi: 10.1080/00220973.2015.1123667
Horner, R. H., & Odom, S. L. (2014). Constructing single-case research designs: Logic and options. In T. Kratochwill & J. Levin (Eds.), Single-case intervention research: Methodological and statistical advances (pp. 27-51). American Psychological Association.
Jason, L. A., & Frasure, S. (1979). Increasing peer-tutoring behaviors in the third grade classroom [Paper presentation]. Annual Convention of the American Psychological Association, New York.
Kelley, M. E., Lerman, D. C., & Van Camp, C. M. (2002). The effects of competing reinforcement schedules on the acquisition of functional communication. Journal of Applied Behavior Analysis, 35(1), 59-63. doi: 10.1901/jaba.2002.35-59
Kennedy, C. (2005). Single-case designs for educational research (Vol. 1). Pearson/A & B.
Kokina, A., & Kern, L. (2010). Social Story™ interventions for students with autism spectrum disorders: A meta-analysis. Journal of Autism and Developmental Disorders, 40(7), 812-826. doi: 10.1007/s10803-009-0931-0
Kratochwill, T. R., Hitchcock, J., Horner, R. H., Levin, J. R., Odom, S. L., Rindskopf, D. M., & Shadish, W. R. (2010). Single-case designs technical documentation. What Works Clearinghouse. https://ies.ed.gov/ncee/wwc/Docs/ReferenceResources/wwc_scd.pdf
Kratochwill, T. R., & Levin, J. R. (2014). Enhancing the scientific credibility of single-case intervention research: Randomization to the rescue. In T. R. Kratochwill & J. R. Levin (Eds.), Single-case intervention research: Statistical and methodological advances (pp. 53-91). American Psychological Association.
Kratochwill, T. R., Levin, J. R., Horner, R. H., & Swoboda, C. M. (2014). Visual analysis of single-case intervention research: Conceptual and methodological issues. In T. R. Kratochwill & J. R. Levin (Eds.), Single-case intervention research: Statistical and methodological advances (pp. 91-125). American Psychological Association.
Ledford, J., & Gast, D. L. (2018).
Combination and other designs. In J. R. Ledford & D. L. Gast (Eds.), Single case research methodology (3rd ed., pp. 239-281). Routledge.
Ledford, J., Lane, J., & Severini, K. (2018). Systematic use of visual analysis for assessing outcomes in single case design studies. Brain Impairment, 19(1), 4-17. doi: 10.1017/BrImp.2017.16
Ledford, J. R., Lane, J. D., & Tate, R. (2018). Evaluating quality and rigor in single case research. In J. R. Ledford & D. L. Gast (Eds.), Single case research methodology (3rd ed., pp. 365-392). Routledge.
Lenz, A. S. (2013). Calculating effect size in single-case research: A comparison of nonoverlap methods. Measurement and Evaluation in Counseling and Development, 46(1), 64-73. doi: 10.1177/0748175612456401
Lindberg, J. S., Iwata, B. A., & Kahng, S. W. (1999). On the relation between object manipulation and stereotypic self-injurious behavior. Journal of Applied Behavior Analysis, 32(1), 51-62. doi: 10.1901/jaba.1999.32-51
Maggin, D. M., Swaminathan, H., Rogers, H. J., O'Keeffe, B. V., Sugai, G., & Horner, R. H. (2011). A generalized least squares regression approach for computing effect sizes in single-case research: Application examples. Journal of School Psychology, 49(3), 301-321. doi: 10.1016/j.jsp.2011.03.004
Manolov, R., & Solanas, A. (2013). A comparison of mean phase difference and generalized least squares for analyzing single-case data. Journal of School Psychology, 51(2), 201-215. doi: 10.1016/j.jsp.2012.12.005
Matson, J. L., & Keyes, J. B. (1990). A comparison of DRO to movement suppression time-out and DRO with two self-injurious and aggressive mentally retarded adults. Research in Developmental Disabilities, 11(1), 111-120. doi: 10.1016/0891-4222(90)90008-V
Moeyaert, M., Ugille, M., Ferron, J., Beretvas, S., & Van den Noortgate, W. (2013). Three-level analysis of single-case experimental data: Empirical validation. Journal of Experimental Education, 82(1), 1-21.
doi: 10.1080/00220973.2012.745470 Moeyaert, M., Ugille, M., Ferron, J., Beretvas, S., & Van den Noortgate, W. (2014). The influence of the design matrix on treatment effect estimates in the quantitative analyses of single-case experimental design research. Behavior Modification, 38(5), 665-704. doi: 10.1177/01454455 14535243 Moeyaert, M., Ugille, M., Ferron, J., Onghena, P., Heyvaert, M., & Van den Noortgate, W. (2014). Estimating intervention effects across different types of single-subject experimental designs: Empirical illustration. School Psychology Quarterly, 25 (1), 191-211. doi: 10.1037/spq0000068 Moeyaert, M., Ferron, J., Beretvas, S. N., & Van den Noortgate, W. (2014). From a single-level analysis to a multilevel analysis of single-case experimental designs. Journal of School Psychology, 52(2), 191-211. doi: 10.1016/j.jsp.2013.11.003 Moeyaert, M., Maggin, D. M., & Verkuilen, J. (2016). Reliability and validity of extracting data from image files in contexts of single-case experimental design studies. Behavior Modification, 40(6), 874-900. doi: 10.1177/01454455 16645763 Moeyaert, M., Akhmedjanova, D., & Bogin, D. (2017). The power to test moderator effects in multilevel modeling of single-case data [Manuscript in preparation]. Moeyaert, M., Zimmerman, K., & Ledford, J. (2018). Analysis and meta-analysis of single-case experi- mental data. In J. Ledford & D. Gast (Eds.), Single- case methodology: applications in special education and behavioral sciences. New York: Routledge. Moeyaert, M. (2019). Quantitative synthesis of research evidence: Multilevel meta-analysis. Behavioral Disorders, 44(4), 241-256. doi: 10.1177/ 0198742918806926 EBP ADVANCEMENT CORNER 23 Moeyaert, M., Klingbeil, D., Rodabaugh, E., & Turan, M. (2019). Multilevel meta-analysis of peer-tutoring interventions to increase academic performance and social interactions for people with special needs. Remedial and Special Education. doi: 10.1177/0741932519855079 Parker, R. I., Vannest, K. 
J., & Davis, J. L. (2011). Effect size in single-case research: A review of nine nonoverlap techniques. Behavior Modification, 35(4), 303-322. doi: 10.1177/0145445511399147 Parker, R. I., Vannest, K. J., & Davis, J. L. (2014). A simple method to control positive baseline trend within data nonoverlap. The Journal of Special Education, 48(2), 79- 91. doi: 10.1177/0022466912456430 Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods (Vol. 1). Sage. Rohatgi, A. (2011). WebPlotDigitizer. https://automeris. io/WebPlotDigitizer Schlosser, R. W., Lee, D. L., & Wendt, O. (2008). Application of the percentage of non-overlapping data in systematic reviews and meta-analyses: A systematic review of reporting characteristics. Evidence-based Communication Assessment and _ Intervention, 2(3), 163-187. doi: 10.1080/17489530802505412 Scruggs, T. E., Mastropieri, M. A., & Casto. (1987). The quantitative synthesis of single subject research: Methodology and _ validation. Remedial © Special Education, 8(2), 24-33. doi: 10.1177/ 074193258700800206 Seybert, S., Dunlap, G., & Ferro, J. (1996). The effects of choice-making on the problem behaviors of high-school students with intellectual disabilities. Journal of Behavior Education, 6(1), 49-65. doi: 10.1007/BF02110477 Shadish, W. R., Rindskopf, D. M., & Hedges, L. V. (2008). The state of the science in the meta-analysis of single-case experimental designs. Evidence-based Communication Assessment and Intervention, 2(3), 188-196. doi: 10.1080/17489530802581603 Shadish, W. R., & Sullivan, K. J. (2011). Characteristics of single-case designs used to assess intervention effects in 2008. Behavior Research Methods, 43(4), 971-980. doi: 10.3758/s13428-011-0111-y Shadish, W. R., Kyse, E. N., & Rindskopf, D. M. (2013). Analyzing data from single-case designs using multi- level models: New applications and some agenda items for future research. Psychological Methods, 18 (3), 385-405. 
doi: 10.1037/a0032964 Shadish, W. R., Rindskopf, D. M., Hedges, L. V., & Sullivan, K. J. (2013). Bayesian estimates of autocorre- lations in single-case designs. Behavior Research Methods, 45(3), 813-821. doi: 10.3758/s13428-012-0282-1 Shadish, W. R., Zuur, A. F., & Sullivan, K. J. (2014). Using generalized additive (mixed) models to analyze 24 EBP ADVANCEMENT CORNER single case designs. Journal of School Psychology, 52(2), 149-178. doi: 10.1016/j.jsp.2013.11.004 Shogren, K. A., Faggella-Luby, M. N., Bae, S. J., & Wehmeyer, M. L. (2004). The effect of choice-making as an intervention for problem behavior: A meta-analysis. Journal of Positive Behavior Interventions, 6(4), 228-237. doi: 10.1177/10983007040060040401 Swaminathan, H., Horner, R. H., Sugai, G., Smolkowski, K., Hedges, L., & Spaulding, S. A. (2010). Application of generalized least squares regres- sion to measure effect size in single-case research: A technical report. Unpublished technical report, Institute for Education Sciences. Trottier, N., Kamp, L., & Mirenda, P. (2011). Effects of peer-mediated instruction to teach use of speech-generating devices to students with autism in social game routines. Augmentative and Alternative Communication, 27(1), 26-39. doi: 10.3109/ 07434618.2010.546810 Wang, S. Y., Parrila, R., & Cui, Y. (2013). Meta-analysis of social skills interventions of single- case research for individuals with autism spectrum disorders: Results from three-level HLM. Journal of Autism and Developmental Disorders, 43(7), 1701-1716. doi: 10.1007/s10803-012-1726-2 Wolery, M., Busick, M., Reichow, B., & Barton, E. E. (2010). Comparison of overlap methods for quantita- tively synthesizing single-subject data. The Journal of Special Education, 44(1), 18-28. doi: 10.1177/ 0022466908328009 Zimmerman, K. N., Ledford, J. R., & Severini, K. E. (2019). Brief Report: The effects of a weighted blan- ket on engagement for a student with ASD. 
Focus on Autism and Other Developmental Disabilities, 34(1), 15-19. doi: 10.1177/1088357618794911