“All learning, whether done in school or elsewhere, requires time.”

Benjamin Bloom (1974, p. 682).

In the education sciences, there is a rich tapestry of theories (see Hattie et al., 2020; Murphy & Alexander, 2000; Urhahne & Wijnia, 2023). Many of these theories can be characterized as what Davis et al. (2007) describe as “simple theories.” Davis et al. define simple theories as theories that encompass a “few constructs and related propositions” (p. 482), each of which has empirical or analytic backing but often lacks a certain level of necessary detail to make strong predictions from case to case. It is important to emphasize that “simple theories” should not be misconstrued as lacking value. Rather, this terminology emphasizes that these theories are usually described in writing (not mathematical equations) or depicted in flow-chart-style graphics. And though these theories are “in all likelihood correct” (Davis et al., 2007, p. 482), there is often room for increasing their precision, linking constructs more comprehensively, and refining their theoretical logic.

Compared to simple theory, formal modeling and simulation methods aim to elucidate all relationships and variable distributions using mathematical equations, usually instantiated in a programming language. This approach is also largely in line with what has been called “model-centric” science. Here, the emphasis is less on collecting individual results and more on conceptualizing the goal of research as building progressively better models of the phenomena of interest (Devezer & Buzbas, 2023). While there are some notable contributions in this direction within the educational sciences (e.g., Doroudi et al., 2019; Frank & Liu, 2018; Hull et al., 1940), such modeling traditions are more frequently observed in neighboring disciplines like cognitive science (e.g., Hintzman, 1991; Navarro, 2021).

The potential upsides of the formal modeling approach have long been recognized. As Bjork (1973) notes, there are four main reasons for a theorist to use mathematical models: (a) mathematical models are “more readily falsifiable”; (b) they require theoretical precision; (c) they illuminate the consequences of different model assumptions; and (d) they improve our data analysis by generating rich data (pp. 428–429). Bjork also argues that mathematical models might be more practically useful, noting applications in intelligent tutoring and medical decision-making. Current proponents of mathematical models generally offer similar reasons for their use (e.g., Devezer & Buzbas, 2023; Navarro, 2021; van Rooij & Blokpoel, 2020).

Practically, simple or verbal theories allow certain details to remain implicit in the mind of the researcher. In the context of simple theory, indicating that an intervention enhances student outcomes might suffice without specifying the underlying behavioral or cognitive mechanism (see Yan & Schuetze, 2023). However, when employing formal computational models, there is a need to enumerate the exact relationships linking input and output factors. Navarro (2021) points out one of the strengths of formal modeling, stating that it “tells us which measurement problems we need to solve” (p. 710). What’s more, the transparency inherent in formal modeling encourages constructive dialogues. If a researcher disagrees with a model’s assumptions, they can implement, simulate, and compare alternative modeling assumptions (Devezer & Buzbas, 2023).

Building on these recent developments in formal theory, the present paper introduces the Computational Model of School Achievement (CoMSA), which aims to act as a unifying model across cognitive, metacognitive, and motivational theory. Specifically, the CoMSA is a formal (mathematical) model of the relationship between learning over time and exam achievement. The CoMSA simulates the trajectories of individual students over time, allowing for the analysis of learning as both a process and a product. It also allows for the analysis of student achievement at both the individual-student and group-classroom levels. More broadly, this model introduces an alternative, computational approach to developing theory that is not commonly seen in educational psychology. The use of computational modeling allows the CoMSA to predict complex interactions between time-on-task, motivation, learning strategies, and achievement that cannot be easily captured in a simple verbal model. This improves educational researchers’ ability to make nuanced predictions concerning where, when, and why a student may or may not benefit from a particular cognitive, motivational, or metacognitive intervention.

Theoretical Background

The present formal mathematical theory of goal-oriented exam achievement builds upon the foundational works of Bloom (1968, 1976), Carroll (1962, 1963, 1977, 1989), Glaser (1982), and Gettinger (1984a, 1984b), on what have been collectively known as “models of school learning” (Harnischfeger & Wiley, 1978). These theoretical and sometimes computational models attempt to relate teacher and student characteristics (aptitudes, motivations, and instructional quality) to learning outcomes. Similarities can also be seen with the performance-resource functions analyzed by Norman and Bobrow (1975), as both approaches focus on the tradeoff between invested effort and achievement. Additionally, the CoMSA shares theoretical affinities with the work of Dumas and colleagues on dynamic measurement (Dumas & McNeish, 2017; Dumas et al., 2020), as both approaches focus on modeling educational achievement over time. Modeling approaches put forth by Ackerman and colleagues also have focused on the curvilinear relationship between time investment and achievement, but in the context of skill-learning, automaticity, and reaction times (see Ackerman, 1987; Kanfer & Ackerman, 1989).

Nevertheless, of all these models, Carroll’s (1962, 1963) model of school learning is the closest predecessor to the CoMSA. Carroll’s model takes five variables as input: aptitude, ability to understand instruction, perseverance (motivation), opportunity (how much time is available), and quality of instruction. Carroll’s model emphasized the idea that different students need different amounts of time to reach satisfactory performance on the same task. Some students will need more time, and some will need less time—even under ideal instructional conditions. However, the quality of instruction can also vary, with poorer quality instruction leading to slower uptake. Carroll also notes that students have a certain level of motivation (time willing to spend studying) and a certain amount of time available to them to study.

Carroll (1963) formalizes the relationship between these five variables through the following equation (n.p.):

$$\mathrm{Degree\, of\, Learning}= f\left(\frac{\mathrm{time\, actually\, spent}}{\mathrm{time\, needed}}\right)$$

The numerator is determined by computing the minimum value of the following variables: opportunity, perseverance, and time needed to learn (which is modulated up/down by instructional quality). This means that if motivation or time available to learn is less than the time needed to learn, the degree of learning will be lower than the student’s potential. If perseverance and time available are greater than the time needed to learn, then potential will be reached. In other words, the degree of learning is a fraction with the actual time invested on the top and the time necessary (potential) on the bottom.
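To make this concrete, here is a minimal sketch of Carroll’s ratio in code. Carroll leaves the function f unspecified; the sketch simply treats the (capped) ratio itself as the degree of learning, and the variable names are illustrative rather than Carroll’s own.

```python
def degree_of_learning(time_needed, opportunity, perseverance, instructional_quality=1.0):
    """Sketch of Carroll's (1963) model of school learning.

    time_needed           : hours needed under ideal instruction
    opportunity           : hours actually available to the student
    perseverance          : hours the student is willing to spend
    instructional_quality : 1.0 = ideal; lower values inflate the time needed
    """
    # Poorer instruction increases the time this particular student needs.
    adjusted_time_needed = time_needed / instructional_quality

    # Time actually spent is capped by opportunity, perseverance,
    # and the (adjusted) time needed itself; extra study adds nothing here.
    time_spent = min(opportunity, perseverance, adjusted_time_needed)

    # Degree of learning is the ratio of time spent to time needed, at most 1.0.
    return time_spent / adjusted_time_needed


# A student who needs 10 h but is only willing to study 6 h (with 8 h available)
# reaches 60% of their potential degree of learning.
print(degree_of_learning(time_needed=10, opportunity=8, perseverance=6))
```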

Though Carroll’s model provides a strong conceptual foundation for understanding learning, it is notoriously complex to understand (Harnischfeger & Wiley, 1978). Additionally, it does not incorporate educational psychological theorizing developed since the 1960s, leaving room for improvement. For example, Carroll’s model of school learning assumes a linear relationship between time spent and educational achievement (until the ceiling is reached), but Son and Sethi (2010) argue that curvilinear learning curves are more likely than not. Nor does his model take into account individual and contextual differences in floor performance and pre-requisite knowledge.

Another aspect that is missing from Carroll’s model is the metacognitive monitoring aspect of self-regulated learning. The outcome measure of Carroll’s (1963) model is “degree of learning,” but learning is rarely achieved independent of a goal (see Ariel & Dunlosky, 2013; Deci & Ryan, 2000). School students are almost always tasked with learning for a specific exam, assignment, or other situation in which their knowledge will be applied. But metacognitive monitoring of progress towards these goals is imperfect. Heuristics are often used to determine if complex material has been adequately learned (Undorf, 2020). This means that learners have an imperfect understanding of their knowledge state. Accordingly, the CoMSA attempts to account for the (in)accuracy of learners’ metacognitive judgments.

Purpose and Function of the CoMSA

The essential purpose of the CoMSA is to predict exam achievement based on student and contextual characteristics, bringing together motivational, cognitive, and metacognitive research. The CoMSA incorporates many of the same foundational ideas as Carroll’s (1962) and Bloom’s (1976) models but refines them and refocuses them on the notion of the learning curve, as opposed to the learning ratio that defines Carroll’s model of school learning. In addition to offering a more intuitive graphical orientation to the model, this approach better allows for the disentangling of prior knowledge, learning rate, and floor/ceiling effects. The present work also extends previous models of school learning to incorporate newer theorizing about the impact of metacognitive monitoring and agenda-based regulation as they relate to school achievement. For ease of interpretation, the CoMSA uses test grades as the outcome of interest; this outcome metric also allows for easier linkage to empirical results and individual goals operationalized as norms of study. Because performance-related goals are often driving student behavior (Elliot et al., 2011), it is crucial to include them in the model as key inputs. Lastly, the computational resources available now, 60 years later, allow us to conduct broader simulations and analyses of the implications of this family of models.

The CoMSA offers two primary modes of interaction, graphical and mathematical. The graphical mode visualizes educational interventions as manipulations to the learning curve (see the progression of figures throughout this paper for examples). Here, we might ask, “how would the learning curve change if we modify the learner’s strategy use, motivational beliefs, or metacognitive control?” Such visualization exercises can serve as beneficial tools for planning and conceptualizing modifications to the learning environment.

In addition to its visual component, the CoMSA can also be instantiated as a formal mathematical model, manipulated, and evaluated through computer simulation. In this computational modeling approach, assumptions are explicitly encoded in computer code, enabling modification, theory testing, and novel predictions (van Rooij & Blokpoel, 2020). Furthermore, data can be simulated from formal models to better understand dynamics over time that would be difficult to understand using conventional methods. The CoMSA can simulate both student-level and group/classroom-level data. Here, classroom-level data is calculated by aggregating over data simulated at the individual level.

The following analyses do not directly test the ability of the CoMSA to model individual-level data. Rather, we assume that the S-shaped curve is generally acceptable and analyze the implications that follow from this assumption. This assumption is justified by a strong modeling tradition of analyzing learning curves using logistic or S-shaped functions (Koedinger et al., 2023; Murre, 2014) and by the broader item response theory approach (Cai et al., 2016). These studies all show success using S-shaped or logistic models to model student achievement. In the following set of analyses, the CoMSA is utilized to simulate and analyze classroom data from two meta-analytic sources. The first analysis investigates the correlation between time-on-task and academic achievement (Godwin et al., 2021). The second analysis examines the relationship between prior knowledge and knowledge gains (Simonsmeier et al., 2022). The subsequent pages detail the reasoning behind the structure and assumptions of the CoMSA.

Model Structure

The central unit of analysis in the CoMSA is the learning curve. Following a convention that dates at least back to Thorndike (1916), the learning curve maps the relationship between time spent studying and task performance. In this case, the task of interest is an examination. See Fig. 1 for an example. By manipulating this curve, we can think about different learners and contexts, with varying levels of test difficulty, motivation, and learning rates.

Fig. 1. Phases of the logistic learning curve

Note that the learning curve depicted in Fig. 1 should be understood as relating the number of hours spent studying to anticipated achievement for an individual student in a specific exam context. For example, “If student A with characteristics X and Y in situation Z were to study 4 h, they would be expected to achieve a grade of 30%, but if the same student were to study 6 h, they might be expected to obtain a grade of 80% on the exam.” It also should be understood that the metric on the X-axis is measured in hours of engagement. Spending time simply sitting with the material while procrastinating on one’s phone would not increment the student along their learning curve (see Carroll, 1963; Karweit & Slavin, 1981). In other words, the learning curve shown in Fig. 1 attempts to answer, all things being equal, what the trajectory of this student’s exam grade would look like in this context as a function of time-on-task.

In the present modeling exercise, the learning curve is assumed to take on the shape of a logistic curve. In line with previous work (e.g., Ackerman, 1987; Nijenkamp et al., 2016, 2022; Son & Sethi, 2010), the logistic curve is a natural choice for the present modeling exercise as it fits within the traditional 0–100 percent scale of grading systems and can flexibly capture a wide array of learning situations with its three parameters: intercept, slope, and inflection point. With these three parameters, logistic curves can be made to look S-shaped, like straight lines, or even power curves if they are properly shifted to the left or right.
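To make the parameterization concrete, the following sketch expresses one common way of writing such a curve in code. The parameter names are chosen to mirror the slope (Logis a), intercept/floor (Logis c), and inflection point (Logis d) labels used later in the paper; the exact functional form used in the CoMSA implementation may differ.

```python
import numpy as np

def learning_curve(hours, slope, floor, inflection, ceiling=100.0):
    """Anticipated exam grade (0-100) after `hours` of engaged study.

    slope      : learning rate (steeper = faster learning)   -> "Logis a"
    floor      : lowest attainable grade                     -> "Logis c"
    inflection : hours of study at which learning is fastest -> "Logis d"
    """
    return floor + (ceiling - floor) / (1.0 + np.exp(-slope * (hours - inflection)))

hours = np.linspace(0, 10, 101)
# A fast learner with a high floor vs. a slower learner starting near zero.
fast_learner = learning_curve(hours, slope=1.5, floor=25, inflection=3)
slow_learner = learning_curve(hours, slope=0.6, floor=5, inflection=6)
```

Note that, under this illustrative form, the predicted grade at zero hours depends on both the floor and the inflection point, which is the interplay discussed in the prior knowledge section below.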

Here, readers more familiar with the cognitive psychology literature may ask whether it would be more accurate to model learning as an exponential or power function. Indeed, a variety of different mathematical equations have been used to model learning curves (Stepanov & Abramson, 2008), but for the present purposes, the logistic curve strikes a useful balance of parsimony and analytic power. As Murre (2014) shows, the emphasis on power or exponential learning curves may be an artifact of cognitive psychology’s focus on simple word-pair learning experiments. Rather, complex tasks, such as learning to juggle (Qiao, 2021) or learning a foreign language (Daller et al., 2013), are most likely to show an S-shaped learning curve (Son & Sethi, 2010). Even the learning of nonsense syllables can exhibit an S-shaped curve when the list length is sufficiently large, as shown by Hull et al.’s (1940) data from the learning of 205 nonsense syllables over multiple presentations (see Figs. 26 and 27 of Hull et al., 1940).

Conceptually, as shown in Fig. 1, the flat initial part of the learning curve may be thought of as the time needed to learn prerequisite material that is not being tested but is necessary to understand the tested material. In other words, there may be a warm-up period before improvements are seen. Following the warm-up period, there is a period of increased learning, where the slope steepens. And finally—similar to the assumption of Karweit and Slavin (1981)—there is a period of mastery where learning may still occur but may not be captured by in-class exams—which are not generally psychometrically validated for large ranges of knowledge (Slepkov et al., 2021).

Research on item response theory (Cai et al., 2016) and the additive factors model (and related models such as performance factors analysis; Koedinger et al., 2023; Pavlik et al., 2021) has shown that logistic curves are able to accurately model the acquisition of knowledge by individual students in real-world educational contexts. In this way, the student-level modeling assumptions have been well validated in many empirical papers (Pelánek, 2017). But less is known about how these logistic dynamics over time modulate the relationship between inputs (motivation, learning strategies, and interventions) and outputs (achievement, time spent) at the aggregate level, which is typically how educational interventions are evaluated (e.g., through t-tests and correlations across groups of students). We return to these open questions with the analyses presented later in this paper.

Modeling Individual and Contextual Differences with the CoMSA

The flexibility of the logistic learning curve allows for the modeling of learning strategy, metacognitive, and motivational interventions in populations that vary in their baseline cognitive, contextual, and motivational variables.

Modeling Differences in Learning Rate

Using the logistic learning curve as a base, we can then think about learners as they vary in terms of learning contexts, strategies, motivation, and metacognition. Differences in learning strategies and contexts can be thought of as differences in the shape of the learning curve, in terms of slope, intercept, and inflection point. For example, a medical student studying for a comprehensive licensing exam might need to study for hundreds of hours to achieve a passing score. This context would be represented by a much shallower learning curve than, say, a fifth-grade student studying for a ten-word vocabulary quiz, which may only require an hour of study to master. The vocabulary quiz might look more similar to the red curve in Fig. 2, while the medical student’s licensing exam might look like an even more extreme version of the blue curve.

Fig. 2. Three learning curves showing different learning rates. Note: the dashed line indicates performance when the number of hours spent studying is equal to eight. The vertical separation between the curves is not stable over the amount of time spent studying, meaning that shifting the curve’s slope would have heterogeneous effects dependent on the learner’s number of hours spent studying.

It is critical to remember that the slope, like all parameters of the learning curve, is not directly mappable to any one contextual or psychological variable. As with Lewin’s (1936) equation, the assumption here is that learning over time is a function of both the student and their environment (and this function is not necessarily a simple or additive combination of a student and their context). Even within a classroom of medical students all studying for the same exam, some students may learn more efficiently than others due to individual differences (McDermott & Zerr, 2019; Zerr et al., 2018) or differences in study strategies (Carpenter et al., 2022; Carroll, 1977). Conversely, the same student may learn more or less quickly in different learning contexts (see Fig. 3).

Fig. 3. Illustration of individual student learning over time. Note: here, ten learners have been simulated from the CoMSA with mean slope = 1 (SD = 0.2), mean inflection point = 4 (SD = 2), mean intercept = 0.47 (alpha = 3.5, beta = 4), norm of study = 100 (constant, no metacognitive error or bias), and mean maximum willingness to study = 5 h (SD = 1.5). Each student exhibits their own trajectory, with some students requiring more or less practice to reach the same level of proficiency.

While the following sets of figures contained in this paper isolate transformations to the learning curve to ensure proper communication, in real-world contexts, these transformations could always happen concurrently with one another. In other words, one classroom may have learners with a high slope on average, but also low motivation and high metacognitive monitoring accuracy, while another classroom may show low learning slopes, low motivation, and low metacognitive monitoring accuracy. None of these variables is necessarily assumed to correlate with one another under the present simulation, but this assumption can be relaxed as needed. Indeed, analyses presented later in this paper suggest that, at the very least, prior knowledge and motivation should be assumed to be somewhat correlated within classrooms.

Modeling Differences in Prior Knowledge

Learning curves do not always need to start at zero. Differences in interest, teacher effectiveness, or previous experiences may lead to students starting with different levels of prior knowledge. In the context of the CoMSA, prior knowledge is operationalized as the predicted output of the model when time spent equals zero. Two inputs to the model shift this prior knowledge value: the floor (intercept) and the inflection point of the learning curve.

Shifting the Floor

The floor or intercept of the learning curve (Logis c) can be modulated up and down by individual and contextual factors (see Fig. 4). Examination policies that reduce the ability to score below a certain threshold are one simple way of influencing the floor of the learning curve. For example, it is difficult to score below 25% on traditional four-choice multiple-choice exams. Similarly, some teachers adopt classroom policies of assigning 50 percent as the lowest possible score on assignments, rather than zero. Additionally, the floor may be impacted by teacher or instructional effectiveness. The more effective in-class learning is, the more knowledge a student may start out with when they begin self-regulated studying.

Fig. 4. Two approaches to modeling prior knowledge. Note: on the left are two curves with the same inflection point, but different starting floor performance. On the right are two curves with the same floor performance, but different inflection points.

An extreme example of shifting the floor of the learning curve would be to consider giving a college student an examination designed for a middle school student. The college student will likely start at a relatively high proficiency for simple algebra quizzes, meaning that the floor of their learning curve should be higher than that of a middle school student facing novel mathematics problems. Altogether, it is critical to remember that the floor of the learning curve varies from student to student and from context to context. Regardless of the upstream influences, the floor of the learning curve most directly reflects the grade a student would achieve if they were not to study at all, with one caveat: the relationship between the floor and prior knowledge also depends on the inflection point. Shifting the inflection point shifts the entire learning curve to the left or right. Thus, changes to the inflection point can lead to modifications of the relationship between the floor of the curve and the student’s predicted achievement when time spent equals zero.

Shifting the Inflection Point

The inflection point of the learning curve determines the left versus right shift of the learning curve. In terms of educational theory, this component of the CoMSA relates to the minimum amount of time a student needs to study before they see improvement in their capabilities as measured by the classroom examination of interest. For example, classroom exams may require students to know some level of prerequisite knowledge before they can begin to learn the content being tested in the class. Bloom (1976) calls these necessary prerequisite skills “cognitive entry behaviors,” because they are necessary to enter the learning phase of subsequent skills. If all the cognitive entry behaviors have been mastered, a student’s inflection point will be closer to zero. The more prerequisite behaviors that need to be mastered before learning the focal material, the further the curve is shifted to the right (away from zero).

As a more concrete example, for a native speaker of Ukrainian, a Ukrainian vocabulary test may have an early inflection point, requiring very little additional background knowledge. On the other hand, for someone learning Ukrainian as a second language, the same vocabulary test may have a much later inflection point. If the new learner does not know how Cyrillic letters match to their respective phonemes, they will have to gain a basic understanding of this knowledge before they will be able to easily acquire new vocabulary. In other words, the more prerequisite knowledge required to do well on an examination, the further the curve will be shifted to the right.

Modeling Differences in Motivation

In the model, motivational variables determine how far each student progresses along their learning curve, but this is not to say that all motivational interventions only affect time-on-task. They may also have other behavioral effects on students’ slope or metacognitive monitoring biases. Motivational variables are instantiated in two ways in the CoMSA. Students can be limited either by their goals (in terms of achievement) or by the amount of time they are willing to spend studying. As with Carroll’s (1962, 1963) model, there is a circuit breaker mechanism, meaning that once either of the two criteria is met, a student will disengage from studying.

The first criterion is called the norm of study. Under this assumption, students are thought to be working towards individualized performance goals (e.g., one student may be seeking an 80% on an exam, while another will only be happy with a 95%). The norm of study criterion derives from models of study time allocation, going back to at least the work of Le Ny et al. (1972) and adopted in subsequent theorizing by Carver and Scheier (1982, 1990), in Thiede and Dunlosky’s (1999) discrepancy reduction model, and in Ackerman’s (2014) diminishing criterion model. In these models, differences in motivation are almost entirely represented by differences in norms of study.

However, each student also has a second criterion related to the maximum time they are willing to spend studying (see Fig. 5). In this way, students are self-regulated towards their goals—but only to an extent. They direct their efforts towards achieving a certain grade until it becomes too costly to obtain time-wise (or the time has run out). Intuitively, the performance goal circuit breaker is necessary because it seems unlikely students would continue studying even after they know themselves to have mastered the material. The maximum time spent criterion—what Carroll called “perseverance”—is necessary because it is commonplace to observe students who not only spend much time studying, but also underperform the grades they purport to seek (Carroll, 1962, 1963).

Fig. 5. Modeling differences in motivation on the learning curve. Note: motivation determines how long students will progress upon their learning curves. Highly motivated students will be willing to spend more time studying and thus are expected to achieve higher grades, all else equal. Here, the norm of study is not depicted (i.e., assumed to equal 100%).

An interesting implication of the CoMSA is that not every hour of study time is equal. As shown in Fig. 1, if we assume a logistic curve relating time spent to achievement, some hours of study time will be more productive than others (Son & Sethi, 2010). Depending on the length of the warm-up period (see the earlier section on the inflection point), where the curve is relatively flat, the most efficient hours of study will sometimes come early in the study session. In other cases, the most beneficial hours of study will require many hours of studying prerequisites before they can be reached. This has important ramifications for educational interventions that seek to increase the amount of time dedicated to education (i.e., motivational interventions), as some students will benefit greatly from spending an additional hour studying, while others are not anticipated to get much benefit at all from an additional hour of study (e.g., those at the ceiling or floor of the test).
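This point can be made concrete by reading the payoff of one more hour directly off the curve: the marginal benefit is simply the difference between adjacent points on the learning curve. Below is a minimal sketch using the illustrative logistic form from above (all parameter values are hypothetical):

```python
import numpy as np

def learning_curve(h, slope, floor, inflection, ceiling=100.0):
    # Logistic curve relating hours of engaged study to anticipated grade.
    return floor + (ceiling - floor) / (1.0 + np.exp(-slope * (h - inflection)))

def marginal_gain(hours_studied, **curve):
    """Expected grade-point payoff of one additional hour of study."""
    return learning_curve(hours_studied + 1, **curve) - learning_curve(hours_studied, **curve)

# The same extra hour is worth very different amounts at different points
# on the curve (roughly 1, 13.5, 13.5, and 1 grade points here).
for h in [0, 3, 6, 9]:
    print(h, round(marginal_gain(h, slope=1.0, floor=10, inflection=5), 1))
```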

Modeling Differences in Metacognition

Under the current model, motivation and cognition do not exist in isolation. Rather, in line with existing theorizing, metacognition acts as a “bridge between decision making and memory” and also between “learning and motivation” (Nelson & Narens, 1994, p. 1). Metacognitive factors most likely affect the CoMSA in a multifaceted way, influencing both students’ perceptions of their achievement and their study behavior. Here, the impacts of metacognitive monitoring, beliefs, and control are discussed as they relate to the CoMSA.

Metacognitive Monitoring

As shown in Fig. 6, the model assumes that metacognitive monitoring faculties determine how accurately students are able to anticipate their exam performance. In the CoMSA, metacognitive monitoring is modulated by two different factors: bias, which is also known as “calibration” in the metacognition literature, and precision (“resolution” or “sensitivity”; Fleming & Lau, 2014). Here, these terms refer to the metacognitive monitoring of the entire test, that is, the students’ judgments about their mastery of the entire corpus of tested content. However, students also make metacognitive judgments of learning concerning individual items, which will have more multifaceted effects on the model through several different inputs. For example, better monitoring at the item level could lead to faster acquisition via targeted study of materials that are more likely to be tested (see Schuetze & Yan, 2022). The bias and precision parameters defined in the following paragraphs represent just two of the potential knobs through which monitoring abilities could affect the model.

Fig. 6. Modeling differences in metacognitive monitoring on the learning curve. Note: gray dots represent metacognitive judgments, while the black curve represents the ground truth. Bias refers to the average error of the judgments, while precision refers to the deviation of the judgments around the curve.

Bias refers to the average distance between the students’ metacognitive judgment of their anticipated test performance and their true state of knowledge. In other words, it represents their average over- or underconfidence concerning their upcoming test. If bias is high, it may mean that a student will study too long or stop studying too early relative to their goals. An overconfident student will overestimate their current state of knowledge, assess that their goal has been achieved, and thus discontinue studying. Conversely, an underconfident or perfectionist student may continue studying far past the time necessary to master the material. Though such perfectionistic striving can be adaptive and actually lead to increased achievement outcomes (Rice et al., 2013; Stoeber & Rambow, 2007), this increased performance in absolute terms may come at the cost of using study time relatively inefficiently by studying beyond the time necessary to achieve the student’s norm of study.

As opposed to bias, precision refers to the relative consistency of these judgments—how noisy they are: for example, whether a student always makes judgments relative to the ground truth in a similarly underconfident, accurate, or overconfident manner. Consistency of metacognition is instantiated as the error variance around the judgment of learning relative to the ground truth (see Fig. 6). Practically, low precision (high error variance) might mean that a student has perfectly calibrated metacognitive monitoring on average, but each individual judgment does not afford much information about the state of their knowledge.
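To make the bias and precision distinction concrete, the sketch below treats a judgment as the true curve value plus an additive offset (bias) and Gaussian noise (precision). This is an illustrative assumption; the CoMSA’s exact noise model may differ.

```python
import numpy as np

rng = np.random.default_rng(1)

def judged_performance(true_grade, bias, precision_sd):
    """One metacognitive judgment of anticipated test performance.

    bias         : mean over- (+) or under- (-) confidence, in grade points
    precision_sd : SD of the judgment noise; a larger SD means lower precision
    """
    judgment = true_grade + bias + rng.normal(0.0, precision_sd)
    return float(np.clip(judgment, 0.0, 100.0))  # keep on the 0-100 grade scale

# An overconfident but consistent student vs. an unbiased but noisy one.
print(judged_performance(true_grade=60, bias=10, precision_sd=2))
print(judged_performance(true_grade=60, bias=0, precision_sd=15))
```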

Metacognitive Beliefs and Control

In addition to metacognitive monitoring, researchers often are interested in the effects of metacognitive knowledge, beliefs, and control strategies on self-regulated learning. These facets of metacognition may be best instantiated in the CoMSA as changes to the slope of the learning curve. For example, students need to use metacognitive knowledge to adapt their choice of study strategies to the learning context. If a student does not know about retrieval practice, they may not use it for this reason, but even if they know about the benefits of retrieval practice, other metacognitive and motivational beliefs may lead them to fail to adopt this beneficial strategy (Rea et al., 2022). Although there may be a metacognitive knowledge component to the choice of study strategy, the downstream effect would be to change the slope of the learning curve, as retrieval practice is more beneficial per unit of time spent studying than more passive controls such as rereading. A similar analysis can be applied to metacognitive control at the item level (which is distinct from the test-level bias and precision parameters discussed above). Knowing which items to prioritize is one way of increasing academic achievement (Schuetze & Yan, 2022; Son & Kornell, 2008), but ultimately, this too will manifest as an increased slope.

How Do the Components of the Model Come Together?

As shown in Fig. 7, there are three general steps to go from inputs to the CoMSA to predicted achievement outcomes. First, each student’s learning curve is established. Establishing the learning curve means determining the slope, intercept, and inflection point of the curve. Steeper slopes indicate faster learning than shallower slopes. Higher intercepts indicate either that the exam has a functional floor on how low a student can achieve (e.g., due to guessing on a multiple-choice examination), or that the student is starting out with some existing level of prior knowledge. The inflection point determines how much prerequisite knowledge must be learned before improvement on the tested material will be reflected in the student’s grade.

Fig. 7. CoMSA and simulation overview

In the second stage, after establishing the functional form of the learning curve, the curve is modulated with regard to the student’s metacognitive bias and precision. This modulation of the curve is most important when it comes to the third stage, which determines how long the student is willing to study. Under the present model, students discontinue studying for one of two reasons. Either (1) they have assessed that they have reached their goal in terms of anticipated achievement (norm of study). These assessments are where metacognitive monitoring comes into play: students who are over- or underconfident will assess their competence inaccurately relative to their true performance. Or (2) they have run out of willingness to spend any more time (or they simply do not have any more time). Together, the norm and time constraints act as circuit breakers. Students stop studying whenever the first of the two constraints is tripped. Once the shape of the student’s learning curve and their time spent studying have been established, a grade can be predicted from the CoMSA.
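Put together, the three stages can be sketched as a simple hour-by-hour loop. The sketch below reuses the illustrative logistic form from earlier and steps time forward in small increments; the function and parameter names are hypothetical, and the released simulation code may differ in its details.

```python
import numpy as np

def learning_curve(h, slope, floor, inflection, ceiling=100.0):
    # Stage 1: the student's learning curve (illustrative logistic form).
    return floor + (ceiling - floor) / (1.0 + np.exp(-slope * (h - inflection)))

def simulate_student(slope, floor, inflection, norm_of_study, max_hours,
                     bias, precision_sd, step=0.25, seed=0):
    """Return (hours_studied, final_grade) for one simulated student."""
    rng = np.random.default_rng(seed)
    hours = 0.0
    while hours < max_hours:                      # circuit breaker 2: willingness/time
        true_grade = learning_curve(hours, slope, floor, inflection)
        # Stage 2: the student's (possibly biased, noisy) judgment of that grade.
        judged_grade = true_grade + bias + rng.normal(0.0, precision_sd)
        if judged_grade >= norm_of_study:         # circuit breaker 1: goal judged reached
            break
        hours = min(hours + step, max_hours)      # Stage 3: otherwise, keep studying
    return hours, learning_curve(hours, slope, floor, inflection)

# An overconfident student aiming for 85% who is willing to study up to 8 h.
hours, grade = simulate_student(slope=1.0, floor=20, inflection=4,
                                norm_of_study=85, max_hours=8,
                                bias=5, precision_sd=3)
```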

Note that these three steps are laid out in a strict sequence for the purpose of making the present modeling and simulation study tractable. In real-world educational settings, when students are actively engaging in self-regulated learning, there are undoubtedly contingencies between each of the steps. For example, metacognitive information is known to alter learners’ study choices in real time (Thiede & Dunlosky, 1999). In this case, step 2 would lead into step 1. Theoretically, the CoMSA can be extended to deal with such complexities, but for the sake of parsimony in the present study, all inputs to the model are assumed to be set once at the beginning of the study session(s) and do not change over the course of the learning period.

How Does This Model Integrate Contemporary Work on Psychological Constructs?

It would be understandable for the approach outlined in this paper to be somewhat counterintuitive to those with deep backgrounds in the contemporary theory and practices of educational psychology. As mentioned in the introduction, contemporary educational psychology theory often centers on verbal theories, describing how psychological constructs relate to educational achievement. For example, dominant theories of motivation tend to focus on the links between a belief or construct (e.g., expectancies, values, and self-efficacy) and an outcome of interest (e.g., achievement, persistence in school, or well-being). Though this theoretical approach is common (see Hattie et al., 2020), it is not infallible and potentially crowds out alternative modes of theorizing in educational psychology (see Kaplan, 2023; Murayama, 2023).

For this reason, there have been few references to these sorts of constructs in the previous pages. This is not an oversight. Rather, the CoMSA acts as an interface between traditional construct-focused educational psychology and the achievement outcomes of interest (see Fig. 8). In other words, the CoMSA models the downstream behavioral effects of cognition, motivation, and metacognition. Most contemporary research focuses on the beliefs/motivations/cognitions → outcome link without specifying exactly how intermediate processes or learner behaviors change (see Yan & Schuetze, 2023). The CoMSA focuses on the behavior → outcome link but does not make theoretical commitments regarding how these changes to the shape of the learning curve or motivation of the learner occur. Rather, the model asks, “assuming these changes do occur, what should researchers expect to happen?”

Fig. 8. Linking the CoMSA to existing psychological theory

A common area of research known as utility-value interventions can act as a case study for understanding where the CoMSA fits with contemporary motivation scholarship. Utility-value interventions encourage students to see the information they are learning in school as more valuable to their long-term success. For example, Harackiewicz et al. (2016) engaged students in a task that involved writing an essay that emphasized the personal relevance of their introductory biology coursework. The theory here is that students in the utility-value condition will see more value in their schoolwork and thus achieve better academic outcomes.

Accordingly, Harackiewicz et al. found that the intervention increased students’ self-reported value for their coursework and their grades, with particular benefits for first-generation underrepresented minority students. The intuitive link to the CoMSA would be to hypothesize that the increase in the value of biology to these students leads them to increase their norm of study. If a student finds a course more valuable to their long-term goals, they might be less likely to accept lower performance (i.e., adopt a higher norm of study or more stringent achievement-focused goals). Where perhaps a B was an acceptable course grade before, a student may now only disengage from studying when they assess their anticipated performance as reflecting an A- or A.

It could also be that a student would set aside more time for a higher-valued course. This increase in time would be represented on the CoMSA by shifting students’ willingness to study further along the curve to the right (see Fig. 5). If someone values a course more, they may be more likely to prioritize this course over other courses, extracurricular activities, or outside hobbies. But then, we can also use the CoMSA to ask “how effective would adding an additional increment of time to this student’s study time be?” The effectiveness depends on a host of factors, but primarily on where the student currently stands in relation to their learning curve. If the student has already mastered the material (Fig. 1), they may not benefit from an additional hour of study at all. If the student is too early in their learning journey—for example, they do not have the basics of biology or chemistry nailed down but are being tested on advanced biochemical concepts—an additional hour of study may not be enough to show any progress on the classroom exam of interest (despite the fact that the student is improving their biology knowledge more generally).

Of course, it is also possible that a value intervention affects both willingness to study (maximum study time) and norm of study concurrently. Or perhaps the intervention works on an entirely different aspect of the CoMSA. For example, higher value may lead students to think more deeply and make connections between the biology content and prior knowledge. Or perhaps, as Xu et al. (2021) found, a growth mindset intervention reduces cognitive load, thus enabling students to better learn from a text on the Doppler effect. In these cases, increased cognitive engagement might be better instantiated as a shift to the slope of the learning curve, because the students benefiting from the motivational interventions are using their time more efficiently. Similarly, one might expect a retrieval practice intervention to shift the slope upwards (increase learning rate) and reduce metacognitive bias, because retrieval practice has been shown to improve learning rate and metacognitive accuracy over restudy controls (Schuetze et al., 2019). These interventions represent just a selection of many construct- or strategy-focused theories in educational psychology, but a similar analysis of the interleaving effect (Brunmair & Richter, 2019), expectancy-value (Wigfield & Eccles, 2000), or self-determination theories (Deci & Ryan, 2000) could be facilitated through the use of the CoMSA as a theoretical model for anticipating and understanding where and when students would benefit from various interventions.

Simulation Methods

The mathematical model defined in the previous sections formed the basis of the present simulation. Overall, 22 model input parameters were manipulated in the simulation. Fourteen of these parameters define the baseline characteristics of the simulation. A further seven parameters have intervention (Δ, read “delta”) counterparts. For the present paper, these intervention parameters are not of interest, as we are focused on understanding the patterns of outcomes in the control data. However, these intervention parameters were used for analyses described further in Schuetze (2023). See the Appendix for a full listing and description of the 22 inputs to the simulation.

Simulation Inputs

Through a period of iteration and consultation with subject matter experts, the simulation ranges for the 22 parameters were defined. Possible values were selected with the overall imperative to err on the side of expansiveness (i.e., testing wider ranges of possible values rather than more limited ranges). The values for each parameter were sampled from a uniform distribution of equally spaced values. For example, the slope variable could take one of 26 possible values: 0.25, 0.30, 0.35, 0.40, … 1.45, 1.50. Uniform distributions were used as a sort of uninformative prior, to explore a wide range of equally probable values.
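As an illustration, the slope grid described above can be generated and sampled as follows (a sketch; the grids for the remaining parameters, listed in the Appendix, would be constructed the same way):

```python
import numpy as np

rng = np.random.default_rng(2024)

# Equally spaced candidate values for the slope parameter: 0.25, 0.30, ..., 1.50.
slope_levels = np.round(np.arange(0.25, 1.50 + 1e-9, 0.05), 2)
assert len(slope_levels) == 26

# Each candidate simulation run draws one level uniformly at random from the grid.
slope_draws = rng.choice(slope_levels, size=200_000, replace=True)
```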

Data Generation

Generating Reasonable Classroom-Level Data

Naively, one might use grid expansion to enumerate all possible parameter combinations and evaluate the effects of every possible intervention in the stimulus space (see Appendix Table 4). If we multiply out each possible combination of levels of the input parameters (e.g., 26 possible levels for the slope × 32 levels for the maximum willingness to study, and so forth), there are 6.73 × 10²⁸ possible unique combinations. This number was not computationally tractable (in terms of both generating and analyzing the data). Given the computationally expensive number of possible combinations, a final sample size of 100,000 combinations of inputs was chosen. Because not all combinations of inputs are valid (e.g., those with negative standard deviations), in the first step, 200,000 potential combinations were sampled from the overall possibility space.

These 200,000 combinations were then filtered in three steps to ensure reasonable and computable combinations, arriving at the final database of 100,000 simulation runs. In the first step, runs were filtered for being uncalculable. For example, in some runs the metacognitive SD was negative after the intervention was applied; since standard deviations cannot be negative, these runs were removed before the simulation was run (this criterion removed 8% of the 200,000). Second, during the simulation, the code checks whether the control and treatment curves cross over one another. These runs were also filtered out so as to focus only on positive interventions to the system (this criterion removed 37% of the runs). This second filtering step did not appreciably change the results of the simulation, save for analyses concerning the slope (Logis a) parameter, where the filtering mechanism resulted in more parsimonious results. For ease of interpretation, the final database was limited to a random sample of 100,000 of the remaining simulation runs (this third criterion removed 4.5% of runs). Repeated simulation indicated that the results of the following analyses stabilized after as few as 10,000 simulation runs, indicating that the final sample size of 100,000 was more than sufficient to avoid overfitting.

Generating Student-Level Data

After generating 100,000 different simulated classroom contexts, 5000 data points were generated within each classroom. The N of 5000 students was chosen to balance sampling error with computational burden. These groups of 5000 students can be thought of as counterfactual classrooms. Within these counterfactual classrooms, each student’s learning curve input parameters and motivational characteristics were sampled from the relevant distribution for each parameter (see Table 1).

Table 1 Inputs to the CoMSA

For example, for each of the 100,000 simulation runs, a mean and standard deviation were set for the value of willingness to study. Then, for each of the 5000 students within each simulation run, a value for willingness to study was sampled dependent on the mean and standard deviation set for that particular simulation run (this structure resembles that of a multilevel model). This means that within each classroom, there will be variation in the maximum amount of time each student will spend studying, but generally, students from the same classroom will be more similar to one another than to students in a separate classroom with a different underlying set of mean and standard deviation parameters for willingness to study.
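In code, this two-level sampling scheme might look like the sketch below, which assumes a normal distribution truncated at zero for willingness to study; the distributions actually used for each input are listed in Table 1 and the Appendix.

```python
import numpy as np

rng = np.random.default_rng(7)

def sample_classroom(mean_willingness, sd_willingness, n_students=5000):
    """Draw each student's maximum willingness to study (in hours) for one run."""
    hours = rng.normal(mean_willingness, sd_willingness, size=n_students)
    return np.clip(hours, 0.0, None)  # willingness to study cannot be negative

# Two simulated classrooms drawn from different classroom-level parameters.
classroom_a = sample_classroom(mean_willingness=5.0, sd_willingness=1.5)
classroom_b = sample_classroom(mean_willingness=3.0, sd_willingness=0.5)
```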

All other inputs to the simulation, including the logistic parameters slope (Logis a), intercept (Logis c), and inflection point (Logis d), were also drawn from distributions whose parameters (means, SDs, etc.) were manipulated between simulated classrooms (see the Appendix for more detail). In summary, students within each simulation run (classroom) were drawn from the same underlying distributions but had differing motivational and learning curve characteristics depending on their draw.

Simulation Outputs

Finally, the performance of students within each simulation run, in terms of achievement and time spent studying, was calculated. The database can be queried, filtered, plotted, and analyzed to better understand the effects of different individual and contextual factors on the CoMSA. The full database is available at the following OSF project: https://osf.io/z8n3a/. This database includes N = 184,032 simulation runs, representing over 1.8 billion individual simulated students. This number includes runs rejected due to the second and third criteria (see the simulation input section above). The results of the present project were derived through an iterative model building and refining process and thus were not pre-registered.

Results

For the following proof-of-concept analyses, data from two different meta-analyses is used to compare the results from the simulation of the CoMSA to empirical data from a wide variety of educational contexts. One of the advantages of the current modeling approach is that it allows for the simulation of both individual students and groups of students (i.e., classrooms). In both examples, we simulate classroom-level descriptive statistics by simulating individual students and then aggregating over this student-level data.

The Relationship Between Time-on-Task and Achievement

An open question in educational psychology relates to the importance of time spent on task for predicting achievement. The relationship between time-on-task and achievement is a classic question of educational psychology and one that has produced seemingly inconsistent findings over years of study (Godwin et al., 2021; Vu et al., 2022). In 1981, Karweit and Slavin hypothesized “that the choice of model linking time and learning may be implicated in the inconsistent findings for time-on-task” (p. 171). The following results suggest that the CoMSA could be one such model. Here, data summarized by Godwin et al. (2021) is used as an empirical benchmark against which data from the CoMSA simulation can be compared.

Time-on-Task Analysis

The correlation between time spent and achievement was computed for each of the 100,000 simulation runs. This data shows a mean correlation between time spent and achievement of r = 0.42. However, this correlation is context-dependent and is not stable across all of the simulation runs. The 5th percentile r equaled − 0.06, and the 95th percentile r equaled 0.83. The minimum r was − 0.67, while the maximum was r = 0.97 (see Fig. 9). Given that the present simulation assumes a strong causal link between time spent and achievement at the level of the individual student, one might expect it to overestimate the relationship between these two variables. This was not the case.

Fig. 9. Density plots of simulated and empirical correlations between time spent and achievement. Note: dashed lines indicate means of respective distributions. The simulation captures the same general trend of the empirical correlation coefficients (mostly positive, left skewed, with a thin negative tail below zero) without any fitting procedure. Density plots can be interpreted as histograms scaled such that the area of the distribution is equal to one.

In fact, the r of 0.42 in the present simulation is slightly lower than the figure reported by Cury et al. (2008), who, in a laboratory investigation, found a raw correlation of r = 0.49 between time-on-task during a practice period and achievement as measured by performance on a decoding task from an IQ subtest. Perhaps more educationally relevant, Godwin et al. (2021) report a summary table of 17 studies examining the relationship between time-on-task and performance in K-12 classroom environments. The estimates of the individual studies ranged from − 0.23 to 0.78, with a mean of approximately r = 0.31. As shown by Fig. 9, the overall distributions of the empirical and simulated data showed clear resemblance, with the Godwin et al. (2021) data showing slightly smaller correlations on average.

Interim Discussion

The present model suggests that the correlation between time spent and achievement will be highly dependent not only on the noise in the data (the more noise, the less likely a correlation is to be high), but also on the features of the learning curve, such as slope, prior knowledge, and intercept. If the slope is high, all else equal, we might expect to see a stronger relationship between time spent and achievement. As for the instances where there is a negative correlation (approximately 8.5% of the simulation data), this could be explained by situations where the norm of study constraint causes already high-performing students not to study much at all, while those with lower prior knowledge starting points may need to invest more time to get to a lower end result. The existence of a negative correlation does not necessarily mean that adding an additional hour of study time is actually leading to worse performance. Rather, the need for study time is confounded with prior knowledge, creating an illusory negative relationship between time-on-task and achievement. These findings help unify data from the cognitive psychology literature, which clearly supports the time-on-task hypothesis (e.g., Zerr et al., 2018), and real-world educational data (Godwin et al., 2021) that previously seemed to contradict the notion that more time-on-task creates higher achievement.

Relationship Between Prior Knowledge and Knowledge Gains

Another open question in educational psychology concerns the seemingly inconsistent relationship between prior knowledge and knowledge gains in academic settings. In the following analyses, the distribution of effect sizes generated through the simulation of the CoMSA is compared to the data collected in a meta-analysis conducted by Simonsmeier et al. (2022). In this meta-analysis, Simonsmeier and colleagues were interested in assessing the “knowledge is power” hypothesis, that is, the hypothesis that higher domain-specific prior knowledge is one of the best predictors of future knowledge acquisition. To better understand the relationship between prior knowledge and knowledge acquisition, Simonsmeier et al. collected meta-analytic data from a wide variety of studies from all domains of education (though the majority of studies came from K-12 populations). Essentially, their analyses included any study where at least a pre- and post-test correlation could be calculated using a semi-objective measure of domain-specific knowledge. They did not include studies where knowledge was measured via self-report Likert scales or broad indicators such as IQ scores or metacognitive knowledge.

In terms of primary outcomes of interest, Simonsmeier et al. calculated the correlations of prior knowledge with final test performance (rP), with absolute knowledge gains [rAG = cor(pre-test, post-test − pre-test)], and with normalized knowledge gains [rNG = cor(pre-test, (post-test − pre-test) / (scale max − pre-test))]. In essence, Simonsmeier et al. sought to understand whether prior knowledge was correlated with an increased rate of knowledge acquisition (i.e., a positive feedback loop or “rich gets richer” effect). Critically, they found that there was a high pre-post test correlation (rP = 0.53), but a small negative correlation between prior knowledge and normalized knowledge gains (rNG = − 0.06). The CoMSA allows for the calculation of these quantities across the dataset and thus the comparison of the predicted correlations with the true correlations found by Simonsmeier et al. This meta-analysis is particularly suited to this task because the three outcome measures give relatively independent information about the classroom environment. Furthermore, Simonsmeier et al. analyzed studies where the primary research question did not relate to the issues of prior knowledge and post-test performance. Thus, there are potentially fewer worries about publication bias in this sample of studies.
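Given simulated pre- and post-test scores for a classroom, these three correlations can be computed directly from the definitions above; a minimal sketch (assuming a 0–100 grade scale, so the scale maximum is 100):

```python
import numpy as np

def knowledge_gain_correlations(pre, post, scale_max=100.0):
    """Return (r_P, r_AG, r_NG) as defined in Simonsmeier et al. (2022)."""
    pre, post = np.asarray(pre, dtype=float), np.asarray(post, dtype=float)
    absolute_gain = post - pre
    # Normalized gain divides by the room left to improve (undefined if pre == scale_max).
    normalized_gain = absolute_gain / (scale_max - pre)
    r_p = np.corrcoef(pre, post)[0, 1]
    r_ag = np.corrcoef(pre, absolute_gain)[0, 1]
    r_ng = np.corrcoef(pre, normalized_gain)[0, 1]
    return r_p, r_ag, r_ng
```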

After personal communication with the first author of this study, a subset of the data reported by Simonsmeier et al. (2022) was received, composed of the data from all the studies where all three correlations could be computed (rP, rAG, and rNG). This dataset consisted of 270 correlation triplets. The average sample size was 139, ranging between N = 8 and N = 1508. These three correlations all give unique insight into the dynamics of knowledge acquisition between pre- and post-test. In the words of Simonsmeier et al., “rP cannot be inferred from rNG or rAG (without additional information) and vice versa” (p. 46).

Thus, the present data provide at least three relatively independent data points for each of these studies that either can or cannot be reproduced by the CoMSA. Although no guarantees about the representativeness of these studies can be made, these data points represent real learning situations and the type of data that can result from student learning between pre- and post-tests. One other interesting, yet counterintuitive, aspect of gain scores is that they are not easily understood in terms of conventional statistical modeling (Ackerman, 1987). As Clifton and Clifton (2019) show, a gain score’s correlation with a pretest is inevitable due to regression to the mean and the fact that a gain score is calculated from the pretest score. In other words, because the pre-test factors into the calculation of both sides of the correlation rAG, the metric is partially a pre-test score’s correlation with itself. Clifton and Clifton write that “… the change score is always correlated with the baseline score […] and with the post score […]. This is purely a statistical artefact, and it exists regardless of whether the treatment is effective or not” (p. 3). In other words, even when there is no causal relationship between pre-test and post-test, the correlation between pre-test and gains may appear negative (and this does not even account for any further complexity introduced by ceiling effects). For this reason, it may be best to simulate a data-generating mechanism at the individual student level and then compare the resulting gain scores to empirical data, rather than to interpret positive correlations between pre-test scores and gains in empirical samples as evidence for the rich-gets-richer effect.
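This artifact is easy to reproduce in simulation: even when true gains are generated completely independently of pre-test knowledge, measurement error alone pushes the pre-test/gain correlation negative. A minimal sketch with arbitrary illustrative numbers:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000

true_knowledge = rng.normal(50, 10, n)      # latent pre-test knowledge
true_gain = rng.normal(10, 5, n)            # gains generated independently of pre-test

pre = true_knowledge + rng.normal(0, 8, n)  # noisy pre-test measurement
post = true_knowledge + true_gain + rng.normal(0, 8, n)

# Despite no causal link between prior knowledge and gains,
# cor(pre, post - pre) comes out clearly negative (about -0.4 here).
print(np.corrcoef(pre, post - pre)[0, 1])
```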

Rich-Gets-Richer Analysis

As shown in Table 2 and Fig. 10, the data before matching had varying levels of similarity to the Simonsmeier data, with between 7 and 15% error. While the rP data looked relatively similar to the Simonsmeier data before matching, there were some discrepancies between the simulation and Simonsmeier's data for rAG and rNG. In particular, the unmatched simulated data from the CoMSA were more likely to produce negative values of rNG and rAG than were found in the empirical data. To better understand the source of these discrepancies, a matching process was undertaken to find better-fitting inputs to the simulation. In other words, the following analysis attempts to answer two questions: (1) Can the CoMSA produce realistic patterns of data? And (2) of the inputs to the CoMSA in the current simulation, which are most likely to produce realistic data patterns?

Table 2 Mean correlations for simulated and empirical data and chi-squared tests of the equality of correlation matrices
Fig. 10

Comparison of correlation coefficients pre- and post-matching. Note: rP = correlation between pre- and post-test. rAG = correlation between pretest and absolute gains. rNG = correlation between pretest and normalized gains. The goal of the matching procedure was for the red distribution (matched data) to resemble the black line indicating Simonsmeier's data

In this matching process, each row from the Simonsmeier data was matched one-by-one to the closest simulation run in terms of rP, rAG, and rNG, such that the root mean squared error between each empirical study's three correlations and those of the matched simulation run was minimized. This approach was relatively successful in finding 270 matching pairs of empirical and simulated data, as evidenced by the descriptive statistics presented in Table 2.
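A minimal sketch of this matching step is given below, assuming the empirical and simulated correlation triplets are held in pandas DataFrames with columns named r_p, r_ag, and r_ng (names chosen for illustration, not the paper's code). Note that this sketch matches with replacement, which may or may not mirror the exact procedure used:

```python
# Sketch of nearest-neighbor matching: for each empirical study, pick the simulation
# run that minimizes the RMSE over the three correlations (r_p, r_ag, r_ng).
import numpy as np
import pandas as pd

def match_runs(empirical: pd.DataFrame, simulated: pd.DataFrame) -> pd.DataFrame:
    cols = ["r_p", "r_ag", "r_ng"]
    sim = simulated[cols].to_numpy(dtype=float)
    matches = []
    for _, row in empirical.iterrows():
        target = row[cols].to_numpy(dtype=float)
        rmse = np.sqrt(((sim - target) ** 2).mean(axis=1))   # distance to every simulation run
        matches.append(simulated.iloc[int(rmse.argmin())])   # keep the closest run (with replacement)
    return pd.DataFrame(matches).reset_index(drop=True)
```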

The last column of Table 2 presents a series of Jennrich (1970) tests of the equality of correlation matrices. Perhaps somewhat confusingly, these tests are applied to the correlation matrix computed from the outcome variables, which are themselves correlations. The resulting chi-squared values from these tests can be understood as a sort of model fit index. The results shown in the right-most column of Table 2 indicate the following: Before the matching process, the correlation matrices among the three variables of interest (rP, rAG, and rNG) differed significantly between the simulated and empirical data, indicating relatively poor model fit. After the matching process, the matched data fit significantly better than the full simulation data and were no longer significantly different from the Simonsmeier data. Thus, the matching process resulted in a model that mimicked both the means and the correlations of the Simonsmeier data.
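For readers unfamiliar with this test, the sketch below re-derives the Jennrich (1970) statistic in Python, following the standard formulation (the same one implemented by the psych package's cortest.jennrich function in R). It is offered as an illustration of the computation, not as the code used in the present analyses:

```python
# Sketch of the Jennrich (1970) chi-squared test for the equality of two correlation
# matrices R1 and R2 (p x p), estimated from independent samples of size n1 and n2.
import numpy as np
from scipy.stats import chi2

def jennrich_test(R1: np.ndarray, R2: np.ndarray, n1: int, n2: int):
    p = R1.shape[0]
    R_bar = (n1 * R1 + n2 * R2) / (n1 + n2)       # pooled correlation matrix
    R_inv = np.linalg.inv(R_bar)
    c = n1 * n2 / (n1 + n2)
    Z = np.sqrt(c) * R_inv @ (R1 - R2)
    S = np.eye(p) + R_bar * R_inv                 # S = I + elementwise product of R_bar and its inverse
    stat = 0.5 * np.trace(Z @ Z) - np.diag(Z) @ np.linalg.solve(S, np.diag(Z))
    df = p * (p - 1) / 2                          # number of off-diagonal (unique) correlations
    return stat, chi2.sf(stat, df)                # chi-squared statistic and p-value
```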

Table 3 Comparison of simulation inputs between matched and unmatched simulation rows

This was further confirmed by visual inspection of the density plots comparing the empirical versus matched data. After matching, the distributions looked relatively similar for all three correlational metrics, with only 1–2% error, except for a few regions in the extremes. Specifically, the empirical data tend to be more extreme than would be anticipated by the simulated data. This seems at least partially attributable to sampling error in the empirical data that does not exist in the simulated data (since the simulated data are estimated with N = 5000 students, they are much more precise than the average study contained in the Simonsmeier meta-analysis).
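The role of sampling error can be illustrated with a short sketch (all numbers are illustrative assumptions, not values from the CoMSA runs): even when a simulated classroom of 5000 students has a stable correlation, re-sampling it at typical study sizes yields a much wider spread of observed correlations.

```python
# Sketch: a large simulated classroom with a moderate "true" pre-post correlation,
# re-sampled at the average empirical study size (N = 139 in the present subset).
import numpy as np

rng = np.random.default_rng(7)
true_r, n_classroom, n_study = 0.5, 5000, 139

cov = [[1, true_r], [true_r, 1]]
classroom = rng.multivariate_normal([0, 0], cov, size=n_classroom)  # columns: pre, post

subsample_rs = []
for _ in range(1000):
    idx = rng.choice(n_classroom, size=n_study, replace=False)
    subsample_rs.append(np.corrcoef(classroom[idx, 0], classroom[idx, 1])[0, 1])

print(f"classroom r = {np.corrcoef(classroom[:, 0], classroom[:, 1])[0, 1]:.2f}")
print(f"study-sized r, 2.5th to 97.5th percentile: "
      f"{np.percentile(subsample_rs, 2.5):.2f} to {np.percentile(subsample_rs, 97.5):.2f}")
```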

After building the matched dataset, it was of interest to determine if the matching procedure favored some of the input values of the CoMSA over others. If some values of the inputs to the CoMSA were more common in the matched data than in the non-matched data, this would suggest that these values for the CoMSA might be more reasonable than the relatively flat input distributions used to run the simulation. In other words, can we use empirical meta-analytic data to better understand what the inputs to the CoMSA might look like in real education settings?

To test whether the matched simulation rows differed from the overall inputs to the simulation, t-tests of means and Kolmogorov–Smirnov tests of distributional equality between the matched and unmatched data were conducted. As summarized in Table 3, the largest difference between the matched and unmatched data was found in the distribution of the learning curve intercept input (Logis c). All but four variables showed significant differences between the matched and unmatched data at the p < 0.001 level. These four variables were the performance criterion, metacognitive precision, floor-time correlation, and criterion-time correlation.
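The sketch below shows the general shape of this comparison, assuming the simulation runs are stored in a pandas DataFrame with a boolean matched column and one column per CoMSA input; the column names (and the use of Welch's t-test) are assumptions for illustration, not the paper's code.

```python
# Sketch: compare each simulation input between matched and unmatched runs using a
# t-test of means and a Kolmogorov-Smirnov test of distributional equality.
import pandas as pd
from scipy.stats import ttest_ind, ks_2samp

def compare_inputs(runs: pd.DataFrame, input_cols: list[str]) -> pd.DataFrame:
    matched = runs[runs["matched"]]
    unmatched = runs[~runs["matched"]]
    rows = []
    for col in input_cols:
        t_stat, t_p = ttest_ind(matched[col], unmatched[col], equal_var=False)  # Welch's t-test
        ks_stat, ks_p = ks_2samp(matched[col], unmatched[col])
        rows.append({"input": col, "t": t_stat, "p_t": t_p, "D": ks_stat, "p_ks": ks_p})
    return pd.DataFrame(rows)
```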

What does it mean when there is no significant difference between the matched and unmatched rows? The lack of a significant difference indicates that the subject-matter-expert-derived inputs to the simulation were already "hitting the mark," rather than needing to be shifted during the matching process. For example, the current simulation tests a range of correlations between the learning curve intercept and motivation (willingness to study; i.e., the floor-time correlation) from r = 0 to r = 0.85. The nonsignificant result simply means that the mean of these intercept-time correlations in the matched data did not significantly differ from 0.43, the mean of this correlation in the full set of simulation data.

Given that the mean may not be the only parameter of interest in understanding the differences between the matched and unmatched distributions, and significance testing may be a somewhat blunt instrument for the present purposes, it may be best to visually inspect each of the relevant distributions, which are shown in Fig. 12. Overall, we see that the matched simulations were more likely to come from simulations with low starting floor performance (Logis c), relatively high criterion-time correlations, and relatively low baseline metacognitive bias (underconfidence). We also see that the baseline time limit tends towards the lower end of the input distribution, suggesting that these data come from studies in which students put at least a moderate amount of learning effort in between the pre- and post-tests.

Interim Discussion

The above analyses show that the CoMSA could reproduce the data collected in the Simonsmeier et al. (2022) meta-analysis. However, caution should be taken, as in-depth analysis would have to be undertaken to determine if there are competing theories that would better explain this data. A prerequisite for such an analysis would be for competing theories to also be computationally formalized (which would be a positive step for the literature). What can be inferred from the present analyses is as follows: After the matching process, the data resulting from the CoMSA is statistically indistinguishable from the empirical data collected by Simonsmeier, while also being generated by a relatively plausible model of learning over time.

The matching process indicated that the empirical data were best matched by simulated learning situations in which relatively high amounts of learning were occurring. For example, the matched data almost uniformly showed low starting prior knowledge (Logis c), relatively high slopes (Logis a), and relatively high norms of study (most between 75 and 95 percent). This makes conceptual sense, as one might expect teachers to scaffold content such that they are engaging students at the edge of their zone of proximal development (Vygotsky, 1978). Theoretically, this is where students are expected to feel the most motivated and where the most learning should occur, as opposed to scenarios outside the zone of proximal development, where learning is too hard or too easy and there is little payoff (Puntambekar, 2022). Furthermore, the simulation lends support to the notion that there is a positive correlation between time-on-task and one's starting point. Indeed, the mean matched correlation between the intercept of the learning curve and motivation is r = 0.41 and is significantly greater than zero. This constraint is in line with work on self-efficacy theory that finds reciprocal relationships between achievement and self-efficacy (Schunk & DiBenedetto, 2020, 2021). In other words, the present study adds evidence for the "big-fish-little-pond" relationship between within-class achievement ranking and motivation, which has recently received strong support from a national census analysis of entire grades of Luxembourgish elementary students by van der Westhuizen et al. (2023).

General Discussion

The purpose of the present paper was twofold. The first purpose was to introduce the CoMSA, a novel computational model of school achievement that provides a framework through which researchers can understand student achievement and where it may be best to intervene in order to improve outcomes. The second purpose was to show, as a proof of concept, how formal modeling approaches, as embodied by the CoMSA, can be used to test and expand upon psychological theory and potentially explain two open questions in the educational psychology literature. In this section, these analyses are discussed first, and the paper then concludes with the broader aims and limitations of the present modeling approach.

Proof-of-Concept Analyses

The proof-of-concept analyses showed that two open research questions, both with heterogeneous findings in the literature, could be modeled as resulting from populations of students defined by varying distributions of learning curve shapes and motivation. In the first proof-of-concept analysis, the simulation data were compared to the highly variable empirical results concerning the relationship between time-on-task and academic achievement (the so-called time-on-task hypothesis). Godwin et al. (2021) summarize this variation in correlations across studies, from r = −0.23 to 0.78, indicating that "the relationship between time and learning remains elusive as prior research has obtained mixed findings" (p. 503). Although a single value may indeed be elusive, the variation reported by Godwin et al. is in line with the fundamental predictions of the CoMSA and may not be as much of a conundrum as earlier reviews have suggested (e.g., Godwin et al., 2021; Karweit, 1984; Vu et al., 2022). Put simply, the pattern of data summarized by Godwin et al. (2021) can arise even when time-on-task is assumed to always be a positive influence on learning.
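This point can be illustrated with a deliberately simple toy simulation (not the CoMSA itself, and with purely illustrative numbers): every student's achievement increases monotonically with time spent studying, yet the between-student correlation between time-on-task and achievement comes out negative because students with shallower learning curves study longer.

```python
# Toy illustration: time always helps each individual, but the classroom-level
# correlation between time-on-task and achievement can still be negative when
# time and learning rate are negatively related (struggling students study more).
import numpy as np

rng = np.random.default_rng(42)
n = 1000

rate = rng.uniform(0.2, 1.0, n)                 # individual learning rates
time = 10 - 6 * rate + rng.normal(0, 1, n)      # slower learners put in more time
time = np.clip(time, 0.5, None)

# Achievement is strictly increasing in time for every student (saturating curve)
achievement = 100 * (1 - np.exp(-rate * time))

print(f"cor(time, achievement) = {np.corrcoef(time, achievement)[0, 1]:.2f}")  # negative
```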

In the second question of interest, we looked at how well the data generated from the CoMSA matched the results of Simonsmeier et al.'s (2022) meta-analysis concerning the relationship between prior knowledge and post-test performance. Here, the CoMSA was used to calculate potential correlations of prior knowledge with post-test performance, absolute gains, and normalized gains. Using a matching process, we determined that the CoMSA could recreate the patterns of correlations seen in the empirical data summarized by Simonsmeier et al. (2022) within 2–4% error, provided that the correct ranges of inputs were used.

Broader Implications of the CoMSA and the Formal Modeling Approach Generally

The CoMSA represents one of many potential paths forward through computational modeling within the domain of educational psychology. The primary strength of this model lies in its ability to simulate the interactions between motivation, cognitive processes, and achievement over time. With further refinement and software infrastructure, such counterfactual simulation tools could be invaluable for both researchers and educators. Furthermore, CoMSA distills and builds upon the fundamental notion—embodied by Carroll’s (1962, 1963) model of school learning—that there are two general approaches to improving learning: either a student can study better or they can study longer (Bloom, 1976). Here, we may slightly modify this maxim to say that either you can shift the learning curve (i.e., engage in learning strategy or instructional interventions), or you can shift your position along the learning curve (i.e., engage in motivational interventions). In this way, hopefully, the CoMSA can facilitate the synthesis of different research threads within educational psychology, providing an overarching framework through which new data can be analyzed and findings interpreted.
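To illustrate the modified maxim, the sketch below uses a generic saturating learning curve; the functional form and parameter values are illustrative assumptions, not the CoMSA's actual equations. A strategy or instructional intervention steepens the curve (study better), whereas a motivational intervention moves the student further along it (study longer).

```python
# Illustrative sketch of "shift the curve" vs. "shift your position along the curve",
# using a generic saturating learning curve (not the CoMSA's functional form).
import numpy as np

def learning_curve(t, rate=0.3, floor=0.1):
    """Proportion of material mastered after t hours of study."""
    return floor + (1 - floor) * (1 - np.exp(-rate * t))

baseline   = learning_curve(t=5)                 # current strategy, current time
study_long = learning_curve(t=8)                 # motivational intervention: more time on the same curve
study_well = learning_curve(t=5, rate=0.45)      # strategy intervention: a steeper curve, same time

print(f"baseline: {baseline:.2f}, more time: {study_long:.2f}, better strategy: {study_well:.2f}")
```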

More broadly, formal computational models offer several advantages over traditional verbal models for studying educational phenomena. Perhaps counterintuitively, one of the main benefits of using formal models is that they make the assumptions and shortcomings of our models more clear. Though it is painful to see where our models fail, this increased clarity can help us pinpoint areas for improvement. Ultimately, embracing a model-centric framework in educational psychology can optimize study design and hypothesis testing, in addition to supporting the development of integrative theory and fostering the practical application of education research (Bjork, 1973; Devezer & Buzbas, 2023).

Limitations, Future Directions, and Concluding Thoughts

Though the CoMSA may be a useful tool, there are undoubtedly limitations to the present modeling exercise. Learning is a dynamic, chaotic, and unruly process, which the CoMSA can only attempt to approximate through differences in learning curves, time spent on task, and metacognitive biases. That said, the economy is a dynamic, chaotic, and unruly process, too, and economists have gotten substantial mileage out of shifting supply, demand, and labor curves. Humphrey (1992) calls supply and demand curves "undoubtedly the simplest and most frequently used tool of microeconomic analysis" (p. 3). Educational psychologists may find that learning curves can serve a similar role as one of the simplest tools for analyzing academic achievement.

As van Rooij and Blokpoel (2020) note, computational modeling is an iterative art, much like sculpting. Formal models represent the creators’ own biases and understandings of a complex subject matter. For this reason, it would be inappropriate to consider the present model to be set in stone. Rather—to paraphrase Gerald Bullett (1950)—this model is offered up not as a final contribution to the scholarship of the subject, but as a means of the first (or maybe second; Carroll, 1962) approach to it. The mark of a strong model, then, will be to see its assumptions validated and flaws ameliorated by future scientists (see Devezer & Buzbas, 2023).

While the current analysis primarily concentrates on classroom-level descriptive statistics, future research could extend this model's capabilities by exploring its capacity to track individual learning trajectories and students' responses to interventions. Furthermore, for the sake of parsimony, the current simulation only concerns itself with self-regulated learning in the pursuit of a single goal, and this goal is assumed not to change over time. Future studies could relax this assumption by considering how learners react in real time to progress or setbacks. Perhaps a learner finds that the material is much harder than they anticipated (i.e., the slope of the learning curve is very shallow); they may then decide to lower their norm of study in response (see Ackerman, 2014).

Additional insights could also be garnered through the simultaneous modeling of multiple learning curves with shared motivational resources (i.e., to model a student having to decide between studying for multiple classes). One might also consider extending the CoMSA model from the test level to the item level. In this case, researchers could examine how students respond when they realize they understand item A better than item B. Historically, these types of inquiries have been central to many self-regulated learning models (e.g., Thiede & Dunlosky, 1999). There is room for progress to be achieved by computationally linking the CoMSA to these more fine-grained models of self-regulated learning and study-time allocation.

Altogether, the present simulation results represent just a subset of potential uses of the CoMSA. Beyond these proof-of-concept analyses, the CoMSA also provides a mathematical and graphical interface through which researchers, teachers, and policy makers can consider how achievement is manifested through motivational and cognitive factors. Furthermore, the CoMSA unites often disparate research on motivation, cognition, and metacognition into a single computational model. The CoMSA grounds existing theories from educational psychology in the concrete behavior changes that can alter academic achievement, making nuanced predictions about the interplay between prior knowledge, motivation, learning strategies, and academic outcomes.