It is well established that measurable expertise differences exist between experts and novices in the way they solve a variety of visual problems in their domains, including interpreting data visualisations in sciences, completing surgery or making a diagnosis in medicine, winning a game of chess, analysing search actions in sports and many other domains (Abernethy & Russell, 1987; Gegenfurtner et al., 2011; Hainke & Pfeiffer, 2021; Harsh et al., 2019; Jaarsma et al., 2014; Prümper et al., 1992; Puppe et al., 2021; Riggs et al., 2018; Van Meeuwen et al., 2014; Wolff et al., 2016). Yet, the methods by which experts scaffold novice learning, especially in one-on-one contexts like apprenticeships, are still understudied (Jarodzka et al., 2020). The aim of this scoping review is to identify and organise these methods. We focus on tasks that require the interpretation of (complex) visualisations for problem-solving, and thus on visual expertise.

When visual experts scaffold novices in one-on-one interaction, a unique dynamic emerges. Since many experts practice within their field in addition to teaching, they may possess little (if any) didactic knowledge about how to scaffold novices to allow them to solve domain-specific visual problems (Bromme et al., 2005). This lack of didactic knowledge is exacerbated by three phenomena that are characteristic of expertise: encapsulation, jargon usage, and lack of codifiability of tacit/implicit knowledge (Meyer, 2008; Rikers et al., 2000; Zimmermann & Jucks, 2018). Through repeated application, the knowledge an expert possesses becomes efficiently organised, a process known as encapsulation (Rikers et al., 2000). Lower-level concepts become nested within higher-order concepts and may become difficult to access for the purpose of instructing novices. Expertise is often accompanied by (meta)linguistic resources like jargon, which can be difficult for novices to understand (Carr, 2010). Lastly, experts often acquire tacit/implicit knowledge, a knowledge type which is largely non-codifiable, in other words, organised in a way that makes it difficult or impossible to express through verbal or written means (see the concept of ‘meaning barrier’; Bromme & Jucks, 2018; Bromme et al., 2005; de Jong & Ferguson-Hessler, 1996; Meyer, 2008). Furthermore, in instruction designed for novices, complex visual elements are disassembled and simplified to facilitate learning. However, for experts who are comfortable working with complex (encapsulated) visual elements, such breakdowns may remove context or add unnecessary information, to the point where the teaching task becomes unnecessarily demanding for the expert. This is known as the expertise-reversal effect, which further exacerbates the disparity between experts and novices (Hinds et al., 2001; Kalyuga, 2009).

Understanding how to bridge this disparity may improve the quality of visual expertise scaffolding in face-to-face expert-novice dyads in different domains. The present review investigates the existing body of literature to understand what methods experts use to scaffold novices during visual problem-solving tasks to overcome this disparity. We review research in which visual experts use scaffolding behaviours to convey their knowledge regarding visual tasks to novices.

The Function of Scaffolding

The process by which novices gain visual expertise can be scaffolded by visual experts. Scaffolding is an instructional method where an instructor provides support to a learner to reduce the complexity of a task, thereby enabling the learner to complete it. Reducing complexity allows for cognitive resources such as working memory capacity to be allocated more effectively, improving learning. Scaffolding was first introduced in 1975 (Bruner, 1975; Wood et al., 1976). The concept is closely linked to works of Vygotsky, Luria, and Bernstein, particularly the zone of proximal development or ZPD (Shvarts & Bakker, 2019; Vygotsky, 1978). Scaffolding has been adopted as a general instructional method aimed at providing access to the ZPD. Van de Pol and colleagues (2010) isolated three components of scaffolding: contingency, fading, and transfer of responsibility. In their terms, contingency refers to the concept that scaffolding employed by the expert must be contingent, that is, dependent on and responsive to the learner’s needs and responses; fading refers to the gradual reduction in scaffolding techniques as the learner becomes more capable; and finally, transfer of responsibility occurs when the learner has successfully mastered the task at hand.

Although these components help to describe the timeline of scaffolding, they do not provide concrete methods by which experts can scaffold novices, i.e. scaffolding behaviours (Applebee & Langer, 1983). Van de Pol and colleagues (2010) describe how the existing number of scaffolding behaviours is too vast to report. Differing categorisations of scaffolding can be found in various articles (e.g. Anghileri, 2006; Collins et al., 1989; Hsin & Wu, 2011; Pentimonti & Justice, 2010; Sharpe, 2008; Thompson, 2009). Understandably, Mermelshtine (2017) suggested the current approach to studying scaffolding is “less than unified”, due to discrepancies in the methodology of previous research.

In the context of visual expertise, scaffolding behaviours by experts can help novices to grasp complex visual information and tasks. This is relevant given that one-on-one interactions remain the dominant instructional method in many learning environments, e.g. through apprenticeships. However, a clear overview and method of categorisation for scaffolding behaviours in visual contexts are lacking. This scoping review aims to understand the range of scaffolding behaviours present in expert-novice interaction research in visual domains and to create a unifying framework that allows scaffolding research with diverse methods to be integrated.

A Scaffolding Framework Based on Cognitive Load Theory

To create such a framework, our approach starts with the underlying cognitive system where learning takes place: memory. We posit that the core function of scaffolding is to overcome the limitations of the novices’ working memory, as described by cognitive memory models (Atkinson & Shiffrin, 1968; Miller, 1956; Peterson & Peterson, 1959). By using cognitive models of working memory and cognitive load theory (Paas et al., 2003a, 2003b; Sweller, 1988; Sweller et al., 1998, 2019) as the foundation for our framework, we can not only organise scaffolding behaviours into coherent categories but also connect scaffolding categories to underlying cognitive processes. This will allow for a more fundamental understanding of the role of scaffolding in teaching visual problem-solving in face-to-face settings. Additionally, it will enable us to elucidate the relationships between various scaffolding behaviours, including how they may complement each other in practical application.

We centre cognitive load theory (CLT) in our approach (Choi et al., 2014; Paas et al., 2003a, 2003b; Sweller, 1988; Sweller et al., 1998, 2019; Van Merriënboer & Sweller, 2010). Our choice to centre CLT stems from the connection CLT has with both expertise development and instructional design (Ayres & Paas, 2012; Blayney et al., 2015; Paas et al., 2003a). Additionally, centring CLT allows us to implement broad ‘umbrella’ categories, which encompass various scaffolding behaviours. To explain how CLT is used to categorise scaffolding behaviours, we first provide an overview of memory. Various memory models are integrated in Fig. 1 to provide a cognitive basis for the present research (Atkinson & Shiffrin, 1968; Baddeley & Hitch, 1974; Mayer, 2002; Paas et al., 2003b; Tulving, 1972; Van Merriënboer & Sweller, 2010).

Fig. 1

Working memory model which provides a cognitive basis for expertise development and effective scaffolding. Note. Interpreted from the multi-store model of memory by Atkinson and Shiffrin in green (1968), the long-term memory model by Tulving in red (1972), the working memory model by Baddeley and Hitch in yellow (1974), which influenced the cognitive theory of multimedia learning by Mayer in blue (Mayer, 2002) and cognitive load theory in purple (Paas et al., 2003b; Van Merriënboer & Sweller, 2010). Grey sections indicate overlap between two or more models

In Fig. 1, we start with sensory input (1) through verbal and nonverbal behaviour displayed by the expert, such as pointing or asking questions. This information enters the novice’s sensory memory (2) through their eyes and ears (Mayer, 2002) but is immediately lost if no attention is paid (Atkinson & Shiffrin, 1968). When attention is directed to this sensory input (Atkinson & Shiffrin, 1968), words and images are selected and brought into the working memory (3) (Mayer, 2002). Here, sounds and images are organised into coherent verbal and pictorial models by building connections between elements, a process based on the phonological loop and visuospatial sketchpad of Baddeley and Hitch’s model (1974; Mayer, 2002). Visual and auditory information is processed separately; this is known as the dual channels principle (Mayer, 2002). The working memory (3) has a limited capacity of 7 ± 2 elements, of which 2–4 can be actively processed (Miller, 1956; Peterson & Peterson, 1959). This information must be rehearsed every 20 s, or else it is lost (Atkinson & Shiffrin, 1968). The auditory and visual channels are both limited in their capacity, but the capacities of both channels are independent (Mayer, 2002). From the working memory (3), processed and attended information, whether visual or auditory in nature, can be encoded into the long-term memory (4), where knowledge is stored (Atkinson & Shiffrin, 1968; Mayer, 2002). This stored knowledge in the long-term memory can be subdivided into procedural and declarative knowledge (Tulving, 1972). Procedural knowledge, or knowing ‘how’, encompasses knowledge about how to complete tasks or actions, such as knowing the steps to do the laundry or play tennis. This includes many automated schemas, which are stored information that can bypass the working memory and influence behaviour directly (Van Merriënboer & Sweller, 2010). Declarative knowledge includes semantic knowledge, which is factual information such as knowing one’s name or date of birth, and episodic knowledge, which is knowledge about specific events that occurred at a timepoint in the past (Tulving, 1972). These knowledge types can be retrieved into the working memory. This process of retrieval strengthens the knowledge in the long-term memory, reducing the likelihood of information decay. When information is retrieved into the working memory, it can be integrated with visual/auditory input and used to drive behaviour (5); and as memory is an active (re)constructive process, the knowledge can be updated and re-encoded into the long-term memory.

Although the sensory memory (2) and long-term memory (4) are to our knowledge infinite in their capacity, the working memory (3) can hold limited information for a limited time. This creates a bottleneck in the process of knowledge formation. When too much information enters the working memory, this leads to cognitive overload, causing loss of information (Paas et al., 2003b; Van Merriënboer & Sweller, 2010). Scaffolding behaviours can be employed to help overcome the working memory bottleneck, as they can help the novice to make optimal use of the limited capacity that is available. The working memory bottleneck can be addressed at one of two stages: either ‘before’ working memory, by modulating perceptual/sensory information before or as it enters the working memory, or ‘after’ working memory, through the modulation of information retrieval from long-term memory.

We propose that scaffolding behaviours may be categorised into two conceptual groups based on their function relative to the cognitive load bottleneck. These groups are cueing and chunking.

Cueing and Chunking

Cueing (by some authors referred to as signaling; Mayer, 2002) is a method employed to direct attention during the learning process by utilising specific learning materials (Alpizar et al., 2020). For information to become consciously accessible in the working memory, it must first be attended to (Baars & Franklin, 2003). Cueing behaviours are verbal (auditory) and nonverbal (visual) behaviours made by experts with the purpose of directing attention, such as pointing to a specific part of a diagram. Such cueing behaviours encourage the novice to reduce the information they attend to (i.e. scaffolding ‘before’ the working memory bottleneck).

One trait common to experts with high domain-specific visual expertise is information reduction, also known as attention focusing or attentional weighting (Brams et al., 2019; Goldstone, 1998; Haider & Frensch, 1996; Sheridan & Reingold, 2014). This concept suggests experts can immediately and exclusively target relevant information when presented with a task. This can even occur to the point of inattentional blindness, a phenomenon where visually salient information is not consciously perceived due to attention being directed elsewhere (Drew et al., 2013). Which information is of relevance is determined by the prior knowledge experts have, often in the form of task familiarity, which reduces the amount of (irrelevant) information entering the working memory. As this is a skill specific to experts, novices do not yet (sufficiently) possess this ability. Cueing behaviours allow experts to reduce information for novices, reducing the information entering the novices’ working memory, and in turn reducing the novices’ cognitive load. Relative to the cognitive theory of multimedia learning (CTML), effective cueing assists with the processes of selecting words and images, the bridge between sensory memory and working memory, depicted in Fig. 1 between sections 2 and 3 (Mayer, 2002). Failure to effectively select information to attend to leads to loss of information (forgetting).

Chunking behaviours, on the other hand, are verbal (auditory) and nonverbal (visual) behaviours made by experts that encourage novices to construct, enrich and retrieve cognitive schemas (compensation ‘after’ the working memory bottleneck; see the connection between sections 3 and 4 in Fig. 1). According to CLT, expertise is stored in the form of domain-relevant cognitive schemas (Sweller, 1988). Schemas are created by grouping together or ‘chunking’ relevant information into a coherent mental representation. Once constructed, schemas are strengthened each time they are retrieved (i.e. practised) and can be updated or ‘enriched’ if additional information is added and rehearsed upon subsequent retrievals. Chunking, also referred to as unitisation or pattern detection/recognition in visual domains, grants experts the ability to recognise patterns in tasks due to repeated exposure to similar tasks in the past (Gobet & Simon, 1998; Goldstone, 1998; van Meeuwen et al., 2014). In CTML, chunking behaviours can support the process of organisation of words and images into verbal and pictorial representations—chunks—which can then be integrated with prior knowledge. Chunking behaviours from experts can also stimulate the retrieval of prior knowledge into the working memory, which can be connected with new information (‘enriched’).

Much of the knowledge represented by schemas or ‘chunks’ is not explicit. Non-explicit knowledge is often expressed through behavioural skills, such as playing the piano, which can become automatic over time. This process is called automatisation (see section 4, and the arrow from section 4 to section 5, in Fig. 1): through extensive use of the same schemas or ‘chunks’, knowledge comes to bypass the working memory entirely and influence behaviour directly (Van Merriënboer & Sweller, 2010). This further reduces cognitive load in working memory; when schemas, particularly behaviours, become automated, they no longer need to pass through the working memory. This way, no space is taken up by the activated schema in the working memory, and cognitive resources are free for other uses, such as processing new visual/auditory input.
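The bottleneck and the two proposed ways of working around it can be illustrated with a deliberately simplified sketch. It is a toy illustration only, not a cognitive model: the fixed capacity of 7 (the midpoint of the 7 ± 2 estimate cited above), the feature names, and the schema names are all assumptions made for the example. Cueing is represented as filtering elements before they enter the store, and chunking as replacing groups of elements with a single schema unit.

```python
from collections import deque

CAPACITY = 7  # assumed midpoint of the 7 ± 2 estimate; illustrative only


def hold_in_working_memory(elements, capacity=CAPACITY):
    """Return (held, lost) for a fixed-size store that displaces its oldest element."""
    store = deque(maxlen=capacity)
    lost = []
    for element in elements:
        if len(store) == capacity:
            lost.append(store[0])  # the oldest element is about to be displaced
        store.append(element)
    return list(store), lost


def cue(elements, relevant):
    """Cueing ('before' the bottleneck): pass on only the attended elements."""
    return [e for e in elements if e in relevant]


def chunk(elements, schemas):
    """Chunking ('after' the bottleneck): replace each complete group with one unit."""
    remaining = list(elements)
    units = []
    for name, members in schemas.items():
        if all(m in remaining for m in members):
            remaining = [e for e in remaining if e not in members]
            units.append(name)
    return units + remaining


# Hypothetical visual task with twelve features, more than the store can hold.
features = [f"f{i}" for i in range(1, 13)]
schemas = {"pattern_A": ["f1", "f2", "f3", "f4"], "pattern_B": ["f5", "f6", "f7"]}

unscaffolded = hold_in_working_memory(features)
cued = hold_in_working_memory(cue(features, {"f2", "f5", "f9"}))
chunked = hold_in_working_memory(chunk(features, schemas))

for label, (held, lost) in [("none", unscaffolded), ("cueing", cued), ("chunking", chunked)]:
    print(f"{label}: held {len(held)} elements, lost {len(lost)}")
```

In this sketch, both routes avoid overload, but for different reasons: cueing lowers the number of elements that enter the store, whereas chunking lowers the number of units needed to represent the same elements.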

In this paper, we propose the use of cueing and chunking as categories, in which scaffolding behaviours used by experts to instruct novices can be organised based on their cognitive function. Previous research has designated similar categorisation to attention cueing, a subset of scaffolding behaviours (de Koning et al., 2009). Our categorisation aims not only to unify a wider scope of research but also to stimulate more specificity in the analysis of scaffolding behaviours in existing and future research. For example, ‘questioning’ by the expert could be categorised as either cueing or chunking depending on the content. As visual problem-solving is often taught in one-on-one environments such as apprenticeships, we aim to investigate scaffolding behaviours used by experts (and novices) in one-on-one scenarios in visual domains.

Methods of Expression: Verbal and Nonverbal Scaffolding

Given the dual channels principle (see sections 1 and 2 in Fig. 1; Mayer, 2002), it is important to discuss scaffolding that affects both the auditory channel, verbal scaffolding, and the visual channel, nonverbal scaffolding. Scaffolding behaviours vary not only in their effect on working memory processing, but also in the manner they are expressed. Explicit knowledge may be conveyed via verbal means, and tacit knowledge (such as automated behaviours) may be conveyed through nonverbal means (Davies, 2015). The verbal or nonverbal nature of scaffolding behaviours influences how novices approach tasks, a visual example of which can be found in the case of cabinet makers: “[…] communication occurs […] through drawing. Drawings are done on any surface and all the time […] Eventually the novices also start to draw and if they want to ask or explain something they will draw rather than speak” (Gamble, 2001).

It is also possible for verbal and nonverbal scaffolding behaviours to occur simultaneously, such as an expert who narrates their actions. The CTML (Mayer, 2002), which encompasses the dual channels principle and the modality effect (Ginns, 2005; Mayer, 2002; Mousavi et al., 1995), suggests that within the working memory, separate systems process verbal and visual information from different sensory modalities separately. When congruent information is presented simultaneously through different sensory modalities, such as through verbal and nonverbal means, a synergistic effect takes place that improves the learner’s retention of that information relative to when information is only presented through one sensory modality or another. The modality effect refers to the specific case of improved learning with simultaneous graphical information represented visually and textual information presented auditorily, as occurs in narration (Ginns, 2005; Mousavi et al., 1995). We anticipate the use of verbal and nonverbal scaffolding, both separately and together, to be present in the literature.

Aims and Research Questions

Within this scoping review, the aim is to clarify and categorise verbal and nonverbal scaffolding behaviours utilised in the expert-novice dynamic for expertise communication within visual problem-solving in different visual domains. This will inform future research into the underlying scaffolding behaviours common in the expert-novice dynamic. The research questions are as follows:

1. Which scaffolding behaviours can be identified in research into the expert-novice dynamic in the context of visual expertise development?

2. What is the frequency with which cueing and chunking are manifested through both verbal and nonverbal means in scaffolding behaviours, as analysed within the context of our framework?

3. What (if any) further categories can be defined within our framework?

Materials and Methods

Protocol

The University of South Australia’s Joanna Briggs Institute (JBI) protocol was followed for the present scoping review (Peters et al., 2020). The JBI endorses the PRISMA statement; thus, the PRISMA flow diagram for reporting search results was used (JBISRIR Endorses PRISMA Statement, 2023).

Information Sources and Search

An initial set of 200 papers was explored to identify domains with visuospatial tasks. Keywords associated with domain-specific tasks were gathered from this set, and the final search term was constructed from them. To improve sensitivity, the search consisted of two components. The first component searched thesaurus terms and terms in the title and/or abstract for apprenticeship or mentor and visual information or visual problem-solving, combined with terms for the visual information-rich domains that were gathered. In the second component, major thesaurus terms or words in the title combining apprenticeship or mentorship with the specific domains were searched. The search term was constructed and refined in collaboration between the first three authors. Only publications in the English language were considered. Conference abstracts were excluded. Because visual expertise and scaffolding go by various names in various domains, we constructed a search term that encompasses not only keywords found in the initial explosion search but also related domain-specific keywords within various disciplines. The full search term can be found in Appendix 1. The final search was completed on July 1, 2022, in the databases Embase and Web of Science, following the recommendation of Bramer and colleagues (2017) (Table 1).

Table 1 Overview of search records per platform

Selection of Sources of Evidence

Rayyan was used to complete literature screening (Ouzzani et al., 2016). The inclusion and exclusion criteria used during screening can be found in Appendix 2.

Extracting Scaffolding Behaviours in Excerpts

Because the nature and methods of different papers isolated within the search were varied, the extraction of excerpts containing scaffolding behaviours also varied. The goal was to isolate each interaction between an expert and novice (or the description thereof) in the results section of each article. Each excerpt aimed to encompass the interaction event itself reported as close to the original dataset as possible. For example, direct descriptions of statistical trends, direct quotes, or researcher observations were included. In some cases, excerpts contained multiple (inter)actions from participants. Excerpts were categorised on two levels of deductive categories: method of expression (verbal and/or nonverbal) and cognitive strategy (cueing and/or chunking). For the definitions of cueing and chunking, tree diagrams were created to provide both an overarching conceptual definition as well as practical examples that are more representative of the content of excerpts (Figs. 2 and 3). Remaining definitions can be found in the coding scheme for excerpts in Appendix 3.

Fig. 2

Cueing: from conceptual to concrete

Fig. 3

Chunking: from conceptual to concrete

Results

Literature Characteristics

Preliminary screening of records, based on the criteria in Appendix 2, reduced the results from 6533 publications to 164 publications for potential inclusion (Fig. 4). The full texts of the 164 remaining publications were independently screened using the same criteria by three raters: rater 1 (the first author), rater 2 and rater 3 (two additional researchers, of whom rater 2 was naive to the project). Conflicts were resolved through discussion. A total of 18 articles were identified as fit for excerpt extraction by all three researchers. These articles are summarised in Table 2.

Fig. 4

PRISMA flow chart specifying record identification and removal. Note. Record screening was completed by the first author using criteria in Appendix 2. Full report screening (n = 164) was completed by the first author and two additional researchers. This diagram was generated with the PRISMA flow diagram tool (Haddaway et al., 2022)

Table 2 Overview of included articles and researcher notes

Inductive Category: Active vs. Passive

After familiarisation with the data, an additional inductive category was added to the existing deductive categories: active or passive scaffolding. This category was derived from the concept of ‘indirect scaffolding’ from Tambaum (2017) and Tambaum & Normak (2018), defined in Tambaum & Normak (2018) as “the turns in which the tutor is available for the learner but only observes the learner acting independently and does not interfere even when the learner faces a problem, remains thinking, tries to make or makes a wrong move” (p. 236). It allows for examination of purposeful absence demonstrated in excerpts, e.g. taking a step back to allow the novice to take on responsibility. Because this category describes the absence of behaviour, excerpts that were categorised as ‘passive’ were not further categorised into verbal and/or nonverbal or cueing and/or chunking.
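The resulting coding structure can be sketched as a small data type. This is a minimal illustration, assuming one record per excerpt; the class and field names are hypothetical and are not taken from the coding scheme in Appendix 3. The rule that passive excerpts receive no further codes follows the description above, while the rule that active excerpts carry at least one code on each deductive level is an assumption made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Expression(Enum):
    VERBAL = "verbal"
    NONVERBAL = "nonverbal"


class Strategy(Enum):
    CUEING = "cueing"
    CHUNKING = "chunking"


@dataclass(frozen=True)
class CodedExcerpt:
    """One coded excerpt (hypothetical structure for illustration)."""

    article: str
    active: bool
    expression: frozenset = frozenset()  # verbal and/or nonverbal
    strategy: frozenset = frozenset()    # cueing and/or chunking

    def __post_init__(self):
        # Passive excerpts describe the absence of behaviour, so they are not coded further.
        if not self.active and (self.expression or self.strategy):
            raise ValueError("passive excerpts are not coded further")
        # Assumption: every active excerpt carries at least one code on each level.
        if self.active and not (self.expression and self.strategy):
            raise ValueError("active excerpts need an expression and a strategy code")

    def label(self):
        """Return a combined label such as 'active verbal cueing'."""
        if not self.active:
            return "passive"
        expression = " + ".join(sorted(e.value for e in self.expression))
        strategy = " + ".join(sorted(s.value for s in self.strategy))
        return f"active {expression} {strategy}"


example = CodedExcerpt(
    article="Tambaum (2017)",
    active=True,
    expression=frozenset({Expression.VERBAL}),
    strategy=frozenset({Strategy.CUEING}),
)
print(example.label())
```

Using sets for the two deductive levels allows a single excerpt to be coded as, for example, both verbal and nonverbal, which matches the "and/or" wording of the categories.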

Excerpt Extraction and Coding

In total, 212 excerpts were examined. Of these 212 excerpts, 164 were categorised as fit for inclusion in the final analysis using the coding scheme in Appendix 3. The distribution of excerpts per article can be found in Table 3, and the categorisation of all excerpts can be found in Table 4.

Table 3 Excerpts from each article per domain, inclusion, and representation in results
Table 4 Overview of excerpt categorisation

Inter-rater Reliability

To test the clarity and applicability of the themes, a 20% sample of the 212 original excerpts was selected through stratified random sampling (Iliyasu & Etikan, 2021). This method ensured the number of excerpts chosen from each article remained proportionate; however, within each article, the excerpts were chosen at random. Excerpts were independently rated by two raters. Cohen’s kappa was determined for each theme and for total exclusion. Reliability was calculated both for each individual category and for the combinations of categories that are reported in the results section. This information is shown in Table 5.
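A proportionate stratified sample of this kind can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: the function name, the rounding of each article's allocation to the nearest whole excerpt, and the example counts are all hypothetical.

```python
import random


def stratified_sample(excerpts_by_article, fraction=0.2, seed=0):
    """Draw the same fraction of excerpts from every article, at random within each."""
    rng = random.Random(seed)  # fixed seed so the draw can be repeated
    sample = {}
    for article, excerpts in excerpts_by_article.items():
        # Assumption: allocations are rounded to the nearest whole excerpt.
        k = round(fraction * len(excerpts))
        sample[article] = rng.sample(excerpts, k)
    return sample


# Hypothetical excerpt identifiers for three articles.
excerpts_by_article = {
    "article_A": [f"A{i}" for i in range(10)],
    "article_B": [f"B{i}" for i in range(20)],
    "article_C": [f"C{i}" for i in range(5)],
}

for article, chosen in stratified_sample(excerpts_by_article).items():
    print(article, len(chosen), "of", len(excerpts_by_article[article]))
```

Sampling within each article separately keeps articles that contribute many excerpts from dominating the reliability sample.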

Table 5 Inter-rater reliability for 20% of excerpts
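For reference, Cohen's kappa compares the observed agreement between two raters, p_o, with the agreement expected by chance, p_e, as kappa = (p_o - p_e) / (1 - p_e). The sketch below computes it for a single theme, assuming each rater's codes are recorded as one present/absent judgement per excerpt; the function name and the example ratings are hypothetical and are not data from this review.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equally long sequences of categorical codes."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same, non-empty set of excerpts")
    n = len(rater_a)
    # Observed agreement: proportion of excerpts given the same code by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions, summed over codes.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[code] * counts_b[code] for code in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical code throughout")
    return (observed - expected) / (1 - expected)


# Hypothetical codes for one theme (True = theme present) on ten excerpts.
rater_1 = [True, True, False, False, True, False, True, True, False, False]
rater_2 = [True, True, False, False, True, False, True, False, False, True]
print(round(cohens_kappa(rater_1, rater_2), 2))
```

In this made-up example the raters agree on 8 of 10 excerpts (p_o = 0.8) and each marks the theme present in half of them (p_e = 0.5), so kappa is 0.6.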

Active Verbal Cueing

Within expert-novice interactions, 59.76% of the analysed excerpts described actions characterised by cueing. Four observed findings in these active verbal cueing excerpts are described below.

Finding 1: Experts Provide Concrete Information to Reduce Uncertainty for Novices

Scaffolding in the form of active verbal cueing often manifests in experts providing parts of information to allow novices to complete a thought or action themselves. This restricts the amount of information novices need to attend to. Tambaum (2017) describes a variety of direct scaffolding behaviours. In order from most frequently to least frequently occurring in the researcher’s dataset, these active verbal cues are hinting, redirecting the learner, making fill-in-the-blank requests, highlighting critical features, and finally ‘splicing in’ (also described as completing the learner’s reasoning). The most frequently occurring verbal cues serve to direct the attention of the learner to a specific visual area without completely giving away the solution; they narrow down the set of possible correct next options, such that novices still need to make a choice, but have fewer variables to juggle. Another example of narrowing down choices was observed by Clement & Steinberg (2002, p. 413), where a physics tutor asks, “Will the green change to something new or not?” By posing a closed question with a yes or no answer, the tutor limits the number of answers to two options, allowing the novice to consider those two options in depth. Such cues reduce the uncertainty for novices, often in contexts where a decision or answer is necessary.

Finding 2: Active Verbal Cues from Experts Prompt Nonverbal Action from Novices

In many cases, when an active verbal cue is given by an expert, it serves to prompt a nonverbal action from a novice: put simply, “do this”. Two articles, Ong et al. (2016) and Jaarsma et al. (2018), explicitly identified active verbal cues that describe where to look. Ong and colleagues (2016, p. 593) describe how “Both [the novice trainee and the expert surgeon] agreed that [expert surgeon] emphasised landmarks”. Jaarsma et al. (2018) similarly describe verbal ‘search tactics’ with the examples: “This is our main area of interest” and “We need to look for a desmoplastic reaction”. However, landmarks and ‘where to look’ occurred more frequently in excerpts of combined verbal and nonverbal cueing than in excerpts of verbal cueing alone. Instead, verbal instruction was more often used to prompt novices to take nonverbal action. As many visual tasks require multimodal information, often tactile information, it makes sense to start with an active verbal cue to induce a sensory experience before providing contextual or explanatory information. This is demonstrated by expert adze knappers (stone tool makers) in Indonesia: “Experts also engage in more general exposition about proper stance, technique, and even attitude. Observed examples include apprentices’ being told to work more slowly, […] and to strike the high or projecting parts of an edge first” (Stout, 2002, p. 703). Findings from Tambaum & Normak (2018) show that commands are often given in the first half of the ‘turn’, suggesting there may be a temporal component to the prevalence of active verbal cueing. This is reflected by an attending surgeon’s instruction, observed by Brady and colleagues (2021, p. 1803), “Pull back. Good. You got some very mild resistance when you pulled back, and that’s perfect. If you’re really having to pull—if you’re really having to apply pressure or if the patient starts coughing, that means to stop”. Here the initial active verbal cue is ‘pull back’, followed by an explanation of the tactile sensory input of ‘mild resistance’ and ‘really having to pull’. A similar verbal structure is found in Sutkin et al. (2018, p. 538): “R4 is preparing to sew peritoneum […] she must twist her wrist to approach the tissue. A4: ‘Other way. Other way. Other way.’ […] A4: ‘That way you're sewing towards yourself’”. The context provided by multisensory input is crucial to provide novices with a reference point for further instruction; this is reflected in a bakery apprentice’s observation:

At the beginning, when I was learning it, the master came up to me and said all the time: ‘touch the dough, try to feel it. This is the way it should feel every time. In the end, it was up to me to do it. Then he came over to feel the dough and inspected it: ‘it’s a bit too stiff, it needs to be a bit softer’ and so on. (Nielsen, 2008, p. 255)

Furthermore, if such sensory information is absent, it may impede the progress of the novice, as demonstrated in the failures of experts explaining computer interactions to novices:

The number of failed attempts was relatively big because during the first half of the session the tutors did only present declarative knowledge, e.g. what to do (take the cursor there). By the end of the session, both tutors of beginners redirected their focus from the display to the learner and phrased procedural knowledge e.g. how to do (how the mouse should be handled to get the cursor there). (Tambaum & Normak, 2018, p. 242)

Here, the link between the moving cursor (visual information) and physically moving the mouse (tactile information) is not effectively established, and it is not until verbal cues about the novice’s nonverbal behaviour are introduced that the outcome is improved.

This finding is particularly interesting in the context of Jaarsma et al. (2018), where researchers asked histopathology experts to explain how to examine a microscope slide to novices. Here, researchers manipulated who was in control of panning and zooming the microscope slide. When novices controlled the manipulation, experts made significantly more statements regarding navigation and processes (including the tactile movement of the slides) than when experts themselves were in charge. If using active verbal cues to prompt nonverbal action by novices is a useful strategy for experts, one way to elicit more of this behaviour from experts could be to ensure novices have tactile control over the stimulus.

Finding 3: Experts Reassure by Prompting Novices to Continue

Although most active verbal cues referred to a tangible behaviour or element, verbal cues also served to scaffold novices when taking conceptual steps. This was often done by providing reassurance. Providing reassurance is a key component of scaffolding, as novices completing a task by themselves are often faced with the dilemma ‘have I chosen the right action?’. Reassurance scaffolds novices by removing this uncertainty: the novice can ignore meta-dialogues about whether they have taken the right steps, which reduces the information they attend to and lets them focus on the procedural steps relevant to the task.

Often, reassurance comes in the form of what Tambaum (2017) refers to as ‘pumping’, which the author describes as “stimulating the learner to go further without specific instructions, e.g. by asking ‘What else?’ or ‘What did we do last time in the same situation?’” (p. 104). In other words, experts provide a nonspecific verbal cue that scaffolds the meta-dialogue of ‘have I made the right decision?’ and allows novices to continue with reasoning steps or actionable steps. In Tambaum (2017), this was found to be the second most frequently used scaffolding behaviour, making up 42 of the 208 observed scaffolding behaviours in this study. Examples of this behaviour can also be found in Brady et al. (2021, p. 1804), where the attending physician (labelled as A-) echoes the fellow’s (labelled as F-) words to provide reassurance: “F6: ‘So now I come out?’ A6: ‘You come out.’ F6: ‘Until I’m here.’ A6: ‘Yeah.’ F6: ‘Okay.’ A6: ‘Now?’ F6: ‘Now I have....’ A6: ‘Have him take a deep breath and open.’ F6: ‘Can you take a deep breath, Sir?’”. In this scenario, the attending physician confirms the fellow’s actions until the fellow trails off; once this occurs, the attending physician fills in the blanks with an active verbal cue to prompt behaviour (as described in the previous section). Reassurance is also shown in Clement & Steinberg’s (2002, p. 404) observation of physics tutoring: “[novice speaking] ‘You’re never going to be completely empty of the charge. You’re always going to have some charge. Whatever metal, or whatever you have, there’s always gonna be some amount there.’ [expert] ‘You’re on a good track…’” Here the expert uses an active verbal cue to prompt the novice to continue their train of thought.

Although most reassurance stimulates novices to keep acting, reassurance can also serve to stimulate novices to continue thinking and self-correct:

F: ‘Should we extend this incision? Do you guys think?’ A does not say ‘no,’ but instead answers F’s question with a question: ‘You think so?’ F: ‘If not, can I replace this with a grasper?’ A, agreeing with F’s decision to not extend the incision: ‘Yes. That’s what I would do’. (Sutkin et al., 2015a, p. 248)

This form of feedback allows novices to retain independence while they course-correct, and in addition, it allows experts to steer novices’ behaviour (a form of cueing) without the stress of the confrontation that comes with negative feedback/criticism. Such stress is a factor that may limit experts in their expression of feedback. They may feel comfortable with verbal reassurance, but not with criticism, potentially restricting the amount of instruction and cueing they provide:

I don’t know how to give critical feedback. That’s what I’m bad at... If I notice that she [the nursing student] doesn’t seem to care much about the patient, for example. Then it’s hard for me to tell her. I’m not that kind of person. [...]

Well, the students are often young and nervous about their first placement. They can be scared and anxious. You want to be a little kind, you know—you need to be careful in how you articulate the feedback. (Froiland et al., 2022, p. 901)

Finding 4: Active Verbal Cueing Is a One-Way Phenomenon

Cues from novices to experts are seldom described in the collected excerpts for active verbal cueing, except for Deken and colleagues (2012), who describe how novice designers collaborate on visual design tasks with experts. Here, novice-driven ‘information seeking’ is described, where novices explicitly request information or pose questions. Even in this research, novice information seeking is scarce, making up only 8% of observations. Within this 8%, four sub-processes are described by Deken and colleagues (2012, p. 212): “Firstly, the novices queried the expert with the aim of improving their problem understanding (36.2%), and obtaining organisational information (32.4%). To a certain extent, the novice inquired about past solutions (17%) and about future meetings between the expert and their team (13.7%)”. The information requested by novices is limited in scope and often clarifies information that has already been established. Novices may rarely engage in active verbal cueing with experts because they do not yet have the contextual knowledge to know which questions to ask. In addition, Deken and colleagues (2012, p. 210) report that the already scarce information-seeking behaviour from novices decreases as the project progresses: “the more the design is defined, the less time the novices spent on explicitly querying the expert’s knowledge”.

Active Nonverbal Cueing

Of the 164 excerpts, 20 (12.20%) were categorised as active nonverbal cueing.

Finding 5: The Use of Isolated Nonverbal Cues Requires Some Novice Proficiency

Based on the prevalence of the terms ‘modelling’ and ‘demonstration’ in previous scaffolding literature, it was expected that many of the active nonverbal cueing excerpts would consist of such examples. However, only one reference to demonstration was found, in Nielsen (2008, p. 254): “With us, it is very much like the old apprentice principle. You are shown how to do things, then you try, and if you mess it up, you are shown once more […]”.

Instead, the majority of excerpts containing only active nonverbal cues came from Ong et al. (2016), Sutkin et al. (2015b), and Sutkin et al. (2018), all three of which are in medical domains. Here, the novices (surgical trainees, residents) already have years of theoretical experience and opportunities to practise in simulated medical settings. In these excerpts, it is apparent that many processes are already familiar to both the experts and the novices, as the novices know what to do with very little prompting:

The attending surgeon initiates an action that immediately results in a corollary action by the surgical trainee [....] Attending physician (A) grabs the left leg stirrup, squeezes the handle, and lowers the left stirrup and leg. Resident (R) immediately performs the same motion on the right, lowering the right stirrup and leg. No words are exchanged about the legs being lowered. (Sutkin et al., 2015b, p. 254)

It appears that some novice proficiency is required for active nonverbal cues, without verbal support, to be effective.

Such nonverbal cues streamline the process and allow for more efficiency in the completion of visual tasks. Leff and colleagues (2015) describe how nonverbal cues from experts in the form of gaze overlays allow novices to fixate more quickly on the right target:

[...] whilst operating under gaze guidance, trainee fixations appear to be more localised to the nodule to be biopsied. GL was significantly shorter in Collaborative Gaze Control mode[...]. This suggests that gaze assistance manifests as more rapid fixation on the appropriate target nodule to be biopsied. (Leff et al., 2015, p. 6)

However, in many scenarios, effective nonverbal cueing requires a level of awareness from the novice to look for and read nonverbal cues from experts. Ong and colleagues (2016) provide an instance of a novice not picking up on nonverbal cues from the expert surgeon:

SA tried to give subtle message of independence to TA by moving from 1st to 2nd assistant position midway during the case. TA missed this movement because she was concentrating on performing the operation. When pointed out during the interview, TA recognised what it meant. (Ong et al., 2016, p. 593)

For such cues to be effective, novices must not only have a proficient understanding of the processes in the visual task but also be receptive to subtle communication from experts, which may not be possible if their attentional resources are spent on the task at hand.

Active Combined Verbal-Nonverbal Cueing

Of the 164 excerpts, 28 (17.07%) were found to contain a combination of verbal and nonverbal cueing.

Finding 6: Combined Verbal and Nonverbal Cueing Helps Draw Attention to Landmarks

As mentioned previously, drawing attention to landmarks and ‘where to look’ was sometimes found in excerpts with verbal cues only, but most often found in excerpts with combined verbal and nonverbal cues. Examples of this are found in Sutkin and colleagues’ (2015b) research into nonverbal actions:

The attending surgeon points toward the surgical field or directly touches the surgical field. F: ‘It’s like (unintelligible) on the side. Broad ligament.’ A: ‘Yeah. All that.’ A moves his left hand with index finger extended and hand in a relaxed fist from outside the field toward the pelvic sidewall and withdraws back to outside the field. His left index finger moves toward the structure he is describing. A: ‘Fibroid over there.’ F extends a hand toward the same structure: ‘I feel it’. (Sutkin et al., 2015b, p. 254)

A similar description is found in the verbal counterpart of this paper: “Attending surgeon (A) is using a laparoscopic pointer to manipulate the tissues while describing the anatomy to the resident (R). A: ‘That’s the end of it. That’s the end of that appendix right there’” (Sutkin et al., 2015a, p. 248). In other domains, landmarks are also used to pose questions:

[expert] ‘Which way is the current moving through this [bottom] bulb?’ [novice] ‘It should be coming that way’ (from above the bottom bulb). ‘I mean it should be coming from where it can’t be coming from.’ [expert] ‘Did it come from over here?’ (from somewhere above the capacitor). [novice] ‘No. It came from the capacitor. So it must come from one end of the capacitor?’ [expert] ‘Absolutely’. (Clement & Steinberg, 2002, p. 401)

Finding 7: Combined Verbal and Nonverbal Cueing Clarifies Tool Use

In many visual domains, novices must develop expertise not only in what they are looking at but also in how to manipulate it through tool use (Kramer et al., 2019). The development of expertise with dentistry tools provides an example:

[Novice] holds a scaling instrument over her mouth model and engages scaling strokes. [Expert] lightly grasps the protruding part of handle in her right hand and lays her left hand over the top of [novice]’s right. Together, [expert] and [novice] slowly perform several scaling strokes, while [expert] issues the directive, ‘Be more deliberate with each stroke.’ Note how [expert] uses molding to shape [novice]’s hands into scaling configuration and motion while directing her to recognize key kinesthetic features of the movement, as she says, ‘Feel how we pause, relax, stabilize, roll’ (here she is referring to coded segments that compose a scaling stroke). (Weddle & Hollan, 2010, p. 129)

Here, the instruction consists of a combination of nonverbal cueing (moving the tool) and verbal cueing (describing the kinesthetic features). Similar cueing is seen in the surgical domain:

As [expert] begins to throw the first suture, [expert] turns to [novice] and says: ‘Push down on the mesh.’ [expert] then hands [novice] a pair of pickups and says: ‘(unintelligible) these (unintelligible) push down on mesh.’ [novice] immediately inserts the pickups into the field to flatten the mesh. (Sutkin et al., 2018, p. 538)

This example can be seen as an extension of finding 2; an active verbal cue is used to prompt nonverbal action from the novice, and this cue is then supported by the nonverbal cue of providing a tool with which to execute the prompt. A similar format is seen in the exchange in the previous work from Sutkin and colleagues (2015b):

A is instructing F which instrument to use to support the ovarian tissue. F has her hand on the handle of the instrument. A: ‘You can hold the ovary open with this one.’ A grasps the stem of the instrument and shifts it to the right. (Sutkin et al., 2015b, p. 254)

Active Verbal Chunking

A total of 40 excerpts (24.39%) contained active verbal chunking, making it the second most frequently identified scaffolding category combination in the present research, after active verbal cueing. Verbal chunking often means employing abstract statements, which encourage the novice to construct, enrich, and retrieve cognitive schemas (compensation ‘after’ the working memory bottleneck). This is contrasted with the more concrete nature of cueing statements. Hinds and colleagues (2001) compare how beginning instructors and expert instructors teach novices. They found that expert instructors used more abstract statements and fewer concrete statements than beginning instructors. This suggests that those with more expertise may be more likely to employ chunking behaviours in their interactions with novices.

Finding 8: Experts Introduce Procedure and Planning Over Time

This eighth finding builds on finding 3, which suggests that experts can scaffold novices through reassurance, by removing doubt and uncertainty about the order of steps within processes. Later, however, explanations of these processes and their ‘order of operations’ become increasingly present, which allows novices to build them into increasingly detailed schemata. Tambaum & Normak (2018) describe expert-novice interactions as ‘turns’ and report that experts teaching computer skills were more likely to provide explanations in the second half of the ‘turn’, i.e. after the initial step or action has been completed. Although timelines are not always explicit in the included articles, temporal information can be inferred from excerpts, for example:

… [My preceptor asked me] ‘so a patient is being discharged, what do you do?’ … I was so nervous … But then I [answered] and she’s like ‘well done, you know how to do it. Just go and do that now’. (McSharry & Lathlean, 2017, p. 78)

Here, although the novice is nervous and potentially uncertain, they have had enough experience with the process to describe the next step in the process.

When the next step is clear, experts can move on to other steps later in the process, developing the novice’s script schemas:

The attending surgeon discusses with the surgical trainee a plan for more than one step beyond the current one, often in the context of what is currently happening. A is discussing the next 2 steps necessary to excise the cyst from the ovary: ‘All right, let’s do irrigation, and let’s do bipolar’. (Sutkin et al., 2015a, p. 248)

Experts may introduce explanations of why they are taking the steps, instead of just what the steps are:

[...] several experts told me that for large adzes it is particularly important to start work on the bit (si) first, then the butt (bumyok), and finally the middle section (ting kwiribkun). This is because premature thinning of the middle makes the piece more likely to break as the bit and the butt are shaped. It was also emphasized that special attention needs to be paid to shaping the dorsal ridge (amyok), because it is one of the most difficult areas to flake and often gives apprentices trouble. (Stout, 2002, p. 703)

Furthermore, when the next steps in the current process are clear, experts start to introduce dialogues about other eventualities:

Some preceptors believed that asking students ‘what if’ questions helped them to think about what they might do in other situations. One preceptor shared; ‘Sometimes you give them scenarios ‘what if…?’ Okay, the obs [vital signs] are all fine but what if, what would you do, what would be your first [action]…?’ (McSharry & Lathlean, 2017, p. 79)

Finding 9: Experts Introduce Analogies to Relate New Information to Old Information

Another use of active verbal chunking is the use of analogies by experts. Analogies are often introduced through literary devices like metaphors and similes, and they serve to connect new information with existing, often ‘everyday’ information, that novices already have access to. Weddle & Hollan (2010, p. 140) observe how the expert dentist first instructs the student to ‘make the fulcrum do the work for you’, after which the expert and novice each introduce a simile: “[expert] ‘But it’s like…’ [novice] ‘Yes, it’s like a lever.’ [expert] ‘Yeah, there you go, yeah, like a see-saw.’ [novice] ‘And you get more power from a lever..’” The authors note that it is interesting the expert uses the ‘see-saw’ simile, which is less formal than the ‘lever’ simile introduced by the novice. Similarly, Sutkin and colleagues (2015a, p. 249) observe how an expert surgeon uses a metaphor: “The attending surgeon uses a verbal analogy to describe a surgical step. ‘That’s what I tell everyone in the operating room: the pelvis is a bowl and you’re shaking hands to open up your spaces’”. Lastly, in the physics domain, a tutor uses an analogy throughout a practice problem:

‘I would like you to think about an automobile tire. What happens if we put a nail into it?’ [novice] ‘Then you’re going to allow an escape for the air.’ [expert] ‘Why does the air escape?’ [novice] ‘Because you’ve got the great pressurized air inside of your tire’. (Clement & Steinberg, 2002, p. 403)

Active Nonverbal and Verbal Chunking

No excerpts were coded as nonverbal chunking, and two (1.22%) as combined verbal-nonverbal chunking. These excerpts were categorised as chunking as they highlight script schemas, which are schemas about the order of operations of various procedures.

Finding 10: Experts Use Combined Verbal-Nonverbal Chunking to Demonstrate Entire Procedures, Which Become the Novice’s Responsibility Over Time

An interesting finding within these excerpts is that two articles describe the transfer of control from expert to novice over time, in other words, ‘fading’. This includes McSharry & Lathlean (2017, p. 78), “…after a few times of watching they give you the opportunity to do it yourself with their supervision … They talk you through exactly what you’re going to do,…” and similarly:

Demonstrations relating to technical skills, such as performing blood pressure measurements, blood sugar measurements, wound care and different aspects of personal hygiene care were observed across the two dyads. However, the demonstrations were mainly observed at the start of the placement and became less apparent further on. One participant explained: ‘In the beginning, they [the students] observe me, how I perform technical skills and hygiene care. The first three or four days we go together, and gradually they take over themselves’. (Froiland et al., 2022, p. 901)

Passive Behaviours—Choosing Not to Act

An inductive finding from the excerpts was passive scaffolding behaviours, which is the choice of experts to not provide instruction in situations where instruction may be expected or prompted by the novice. As previously described, this form of scaffolding is defined as ‘indirect’ scaffolding by Tambaum (2017) and Tambaum & Normak (2018). This relates to ‘fading’, reducing support over time, and connects to the working memory bottleneck. As the intrinsic load of the task decreases for the novice due to repeated exposure, more working memory resources become available for germane cognitive processes associated with schema development (Sweller et al., 1998). Experts may increase germane load for novices by allowing them to integrate new information with existing schemata. An example of this is allowing novices to grapple with meta-dialogues (the opposite of what is described in finding 3). However, for inaction to be effective, the expert must be aware of the novice’s working memory bottleneck, and know they are able to handle the increased cognitive load, as demonstrated in findings 11 and 12.

Finding 11: Experts Choose Not to Provide Cueing or Chunking to Let Novices Take on Responsibility

Passive scaffolding behaviours are instances of experts taking a step back and providing space to novices to take on responsibility, present in 13 excerpts (7.93%). The phrases ‘stand back’ or ‘step back’ are present in excerpts from Brady et al. (2021) and McSharry & Lathlean (2017). In Brady et al., an attending physician describes when they determine it is appropriate to step back:

You can sense it when you’re there with them . . . . If they know the anatomy, if they can guide themselves around, yeah I’ll give them more and more. Yeah I remember a few people where even in their first three months I’d just stand back and let them go . . . . I just have a feel for it now. (Brady et al., 2021, p. 1802)

Knowing when to step back can be difficult in scenarios where expert-novice interactions occur over a shorter time span, or when multiple interactions are necessary, as in some nursing education programmes:

You’d work with [a preceptor] a lot and they’d have trust in you then … They wouldn’t be standing over you the whole time but they would always be showing you new things … When you’ve so many different ones [preceptors] you don’t build that relationship… (McSharry & Lathlean, 2017, p. 76)

Time is a vital ingredient to the passive approach, as allowing novices to see how mistakes play out can help them to build schemas about what to do and what to avoid, as demonstrated in this bakery apprentice’s experience:

Interviewer: Which is the best way? Apprentice: The bakery’s, I believe. Interviewer: The bakery’s – why? Apprentice: You have the opportunity to work with things yourself, instead of over there (the ‘practise bakery’ at the school). If anyone does anything wrong, the teacher is there immediately, they don’t do that in the bakery. If you do something wrong, she [the master in the bakery] will tell me so at the end: ‘you shouldn’t do it like that’ and you can see the result when it has been baked. (Nielsen, 2008, p. 257)

Although this passive approach to scaffolding can be helpful, knowing when to step back can be a challenge for experts. A common approach found in Tambaum (2017) is for experts to remain passive until the novice is visibly stuck: “When a given task involved typing, tutors always used indirect scaffolding and intervened only when a learner encountered a problem (e.g. could not find a letter key)” (p. 105). Similarly, analysis by Tambaum & Normak (2018) found that passive scaffolding was often used in the second part of a turn, whereas direct scaffolding was much more common in the first part. Such passive scaffolding was prevalent in Tambaum’s (2017) research, being used almost as often as forms of direct scaffolding:

The average proportion of direct scaffolding tactics used by the tutor constituted 8% of all moves (max. 22%, min. 4%). If indirect scaffolding is also taken into account, the average use of scaffolding tactics increases to 14% of moves (max. 29%, min. 6%). (p. 105)

This research also demonstrated that age plays a role in the prevalence of passive scaffolding, as “older tutors tended to use indirect scaffolding more often than their younger counterparts” (p. 107).

Finding 12: When Experts Are Too Passive, Novices May Feel ‘Lost’

There are benefits to choosing to take a step back; however, some excerpts demonstrate how a lack of scaffolding behaviours can lead to poor results. Although our results show that active verbal cueing is often the ‘first step’ for novices, observations also describe how direct cues can fail when the cues are contingent on conceptual understanding:

Similarly, to the tendency of the tutors of beginners to explain what needs to be done and skipping the explanation how to do it, the tutors of advanced learners gave feedback to failed attempts by commenting on the change that took place on the screen (what went wrong – the arrow moved away from the icon), but not by giving feedback to the learner’s incorrect activity (why it went wrong – e.g. your hand did not stay in place while you clicked on the mouse). (Tambaum & Normak, 2018, p. 242)

Here, the experts cued ineffective actions (“what” went wrong) but did not support schemata development through chunking (“why” it went wrong), and this lack of conceptual understanding impedes the novices’ progress. McSharry & Lathlean (2017, p. 78) also describe this lack of conceptual support in nursing education, where actions are taken without any verbal support: “[..] some preceptors in this study did not see it as their role to verbalise their practice in terms of what they were doing and their rationale for care in these instance students reported ‘feeling lost’”.

Striking a Balance Between Cueing and Chunking

Although both cueing and chunking behaviours can be highly effective in isolation, they are generally used simultaneously by experts through narration of actions, which is in line with our previous description of the modality effect (Ginns, 2005; Mousavi et al., 1995). Almost all included articles contained excerpts that demonstrated both cueing and chunking actions with verbal and nonverbal modalities. It is important to find a balance between these behaviours, as in some cases, they will only be effective in conjunction. For example, for active verbal cues to be effective, they should be specific and accompanied by an explanation:

Many preceptors expected students to ‘show initiative’, ‘ask questions’, ‘keep busy’ and ‘get the work done’ with little direction. Some students described how they would ‘busy themselves’, carrying out tasks within the routine, they often were not aware of the holistic plan of care of the patient. (McSharry & Lathlean, 2017, p. 76)

Novices often need direct and specific cues, as they ‘don’t know what they don’t know.’ Particularly in early phases of learning, direct verbal cues to prompt novice behaviour are necessary, as novices do not yet have enough knowledge to know what to ask or where they can take initiative. Froiland et al. (2022) describe how there are often differences in expert mentoring styles, which lead to differences in output:

Conversations between the RN mentors and their assigned students were often characterised by the mentor’s instructions and explanations. In dyad 1, the RN mentor mainly provided the student with her own reflections on task priorities, technical skills and the clinical observations she conducted, rather than promoting reflective dialogues that included the nursing student’s thoughts and perspective. (Froiland et al., 2022, p. 901)

This excerpt describes a mentor who predominantly engages in cueing, prompting immediate next steps in the process and sharing information relevant for the task at hand (with only prioritisation as a potential example of chunking). The same article describes another mentor who instead engages in more chunking through reflection:

In comparison, the RN mentor in dyad 2 began by introducing and emphasising professional reflections with her assigned students. Observations from this dyad indicated that these students gradually went from being more passive and uncertain about how to reflect to initiating professional reflections themselves, thus developing independence throughout their placement period. (Froiland et al., 2022, p. 901)

It appears that the effectiveness of scaffolding behaviours may vary over time. Cueing is often described as taking place near the start of the novice’s learning process, where information reduction is a crucial step (Haider & Frensch, 1996). These cues often prompt direct actions from novices or provide necessary information for the novice to start or continue a task. A certain level of familiarity, which can only be gained through repetition, is required before chunking starts to become beneficial.

Discussion

In this scoping review, we present a framework for categorising scaffolding behaviours based on their function relative to the working memory bottleneck and their method of expression. When the working memory is overloaded by information, i.e. cognitive load is too high, information will be lost from the working memory. We identify two categories of scaffolding behaviours that can be used to reduce cognitive load. These are cueing, reducing the amount of incoming information by modulating which information is attended to, and chunking, connecting information to existing knowledge to reduce how much ‘space’ it takes up in the working memory. By using this framework, we were able to integrate findings from 18 papers describing scaffolding behaviours between experts and novices from various visual domains. This integration allowed us to compare findings with diverging methodologies.

In the present paper, we aimed to answer the following research questions: (1) Which scaffolding behaviours can be identified in research into the expert-novice dynamic in the context of visual expertise development? (2) With what frequency are cueing and chunking manifested through verbal and nonverbal means in scaffolding behaviours, as analysed within the context of our framework? (3) What (if any) further categories can be defined within our framework? To answer RQ1, analysis using our framework revealed that scaffolding behaviours can be identified, with almost all categories being identified in the present analysis. Regarding RQ3, the additional distinction between passive and active scaffolding was identified. Lastly, to address RQ2, the identified behaviours predominantly fell into the categories of active verbal cueing (30.49%) and active verbal chunking (24.39%). Numerous nonverbal and combined verbal-nonverbal active cues were also found (12.20% and 17.07%, respectively). On the other hand, we found few instances of combined verbal/nonverbal active chunking (1.22%), no instances of nonverbal chunking, and few instances (7.93%) of passive scaffolding. It is noteworthy that for both cueing and chunking categories, most excerpts were verbal or combined verbal/nonverbal, and few to no excerpts were nonverbal only.

It is possible that verbal scaffolding is simply more common in expert-novice interactions, for various reasons. Verbal language shared by experts and novices provides a broad common ground for communication. Experts may be more comfortable verbally describing what they see, whereas figuring out which nonverbal actions best support novice learning could be more difficult (or simply take more time and effort to plan). Referring back to Fig. 1, many memory models describe a dual-channel approach where visual information is processed in parallel with auditory information. Given our focus is visual problem-solving, where the task itself takes up space in the visual channel, providing information through the parallel auditory channel may be inherently more efficient than providing additional visual information through gestures. Furthermore, as described in Finding 5, the use of isolated nonverbal cues requires some novice proficiency, in particular integration of prior knowledge into the working memory. Experts may rely on verbal scaffolding in scenarios where the novice’s prior knowledge is low or unknown.

However, it is also possible that the prevalence of verbal scaffolding is methodological (see Footnote 1). Moreover, for nonverbal interactions to be captured by researchers, the researchers themselves must be familiar enough with the domain to disentangle meaningful nonverbal actions from trivial ones, which is a more difficult pursuit than comprehension of jargon in verbal communication. Present literature may thus be skewed towards reporting verbal communication; or alternatively, existing research may study domains where such communication is predominantly accessible through speech. A similar argument can be made for the presence of fewer instances of passive scaffolding relative to active scaffolding; this is further described in our limitations.

Conceptually, our framework can unite expert-novice scaffolding research into tangible categories based on underlying cognitive processes as outlined in Fig. 1. It provides both breadth in the content it encompasses and depth in the function of scaffolding behaviours. Some other frameworks were referenced by articles included in the present review; in particular, we found that the theory of cognitive apprenticeship by Collins and colleagues was often referenced in articles from the medical domain (Collins et al., 1989). Cognitive apprenticeship is widely considered effective as a cycle for creating instruction, particularly because it can make tacit processes explicit, thus helping novices overcome the hurdles of interaction with experts. However, our framework provides a level of specificity and a connection to underlying cognitive processes that is not present in the cognitive apprenticeship model. For example, modelling under cognitive apprenticeship theory can consist of nonverbal cueing, nonverbal chunking, combined verbal/nonverbal cueing, and combined verbal/nonverbal chunking. Our framework distinguishes between these, and each of these categories connects to a specific aspect of the working memory bottleneck through cognitive load theory.

Literature included in the present review spanned a variety of visual domains. Medical domains were dominant in our sample, with 10 out of 18 papers falling into this category. The remaining papers were almost all from unique domains; the only non-medical domain represented by more than one paper was physics (Clement & Steinberg, 2002; Hinds et al., 2001). Given that Embase is primarily geared towards biomedical and pharmacological research, some skew towards the medical domain was expected. To counteract this, we investigated domain-specific keywords for other visual domains in our initial explosion search and included these in our final search term in both Embase and Web of Science. However, the initial explosion search revealed a large amount of recently published research into medical education. Since most learning in medical placements such as residencies consists of one-on-one expert-novice learning, it is understandable that this domain would be highly represented in our identified literature.

Limitations

Our small sample of 18 papers left many domains unrepresented. Given the broad scope of our search term, this may suggest that the specific method of reporting scaffolding we looked for during the screening process might not be commonly used in other visual domains. The skew towards the medical domain carries an implication: although our goal is to draw conclusions that apply generally across domains, the insights derived from our analysis may predominantly reflect the nature of scaffolding interactions within the medical context, rather than other visual areas.

Under the assumption that the cognitive architecture of memory and expertise development is the same in experts and novices in various visual domains, our findings should still be applicable to other visual domains, despite this skew towards medicine. However, scaffolding interactions may look different in various visual domains due to external factors such as time pressure. Scaffolding via cueing takes less time to apply than chunking, often because chunking requires integration with prior knowledge (see Fig. 1). For example, when contrasting Clement & Steinberg (2002) with articles describing surgery (Brady et al., 2021; Sutkin et al., 2015a, 2015b), reassurance through cueing occurs in shorter phrases in the time-pressured setting of surgery. This is shown by the use of “yeah” and “now” in Brady and colleagues (2021, p. 1804), and in Sutkin and colleagues (2015a, p. 248), “A: ‘Cut. Go. With that.’ F wordlessly activates the bovie and incises the tissue. A: ‘Yeah’”. On the other hand, Clement & Steinberg’s (2002) observations show that the physics expert uses longer phrases, which often lend themselves to chunking, such as “Can you tell me why air comes out of the tire?” (p. 405) and “Now I’m going to talk about something completely different, which is going to seem to be unconnected…” (p. 402). At present, it appears that the body of literature explicitly describing scaffolding behaviours related to visual expertise in one-on-one scenarios is limited in domains outside of medicine. We hope that when more such research becomes available, a similar analysis can be done and this limitation can be addressed.

Furthermore, although our inter-rater reliability is moderate, there is room for improvement. The lowest inter-rater reliability score comes from the inductive category for ‘passive’ scaffolding (κ = 0.55). In observational research, it can be difficult for a researcher as an observer to determine whether a lack of action is intentional, and if so, what the instructional action would have been had the expert chosen to act. These are inferences that cannot be made within the observational framework where interference is not possible. Our inter-rater reliability value for passive categorisation may reflect the need for inferences to be made when determining the intention of an observed action. In terms of empirical research, investigation methods such as stimulated recall with the aid of video recordings may be helpful in clarifying passive intent in experts.
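
To make the agreement statistic concrete, the following is a minimal sketch of how a chance-corrected agreement score can be computed for two raters. It assumes that κ denotes Cohen's kappa, which is an assumption on our part and may differ from the statistic actually used in the review. The function name `cohens_kappa` and the rating lists are hypothetical and purely illustrative; they are not data from the reviewed articles.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion
    of agreement and p_e is the agreement expected by chance given each
    rater's label frequencies.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")

    n = len(rater_a)

    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of the raters' marginal proportions per label.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used one identical label throughout; agreement is total.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical codes for ten excerpts (illustrative only, not review data).
rater_1 = ["active", "active", "passive", "active", "passive",
           "active", "active", "passive", "active", "active"]
rater_2 = ["active", "active", "passive", "active", "active",
           "active", "passive", "passive", "active", "active"]

kappa = cohens_kappa(rater_1, rater_2)
print(f"kappa = {kappa:.2f}")
```

In this illustrative case the raters agree on 8 of 10 excerpts, yet the chance-corrected score is considerably lower than 0.80 because both raters label most excerpts as active. This may help explain why a sparsely used category such as passive scaffolding can yield a modest κ even when raw agreement appears high.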

Lastly, given the largely observational nature of the discussed studies, our review cannot provide explicit insight into the benefits of scaffolding behaviours on task performance, and whether (or how) novices’ attentional and cognitive processes are directly affected by the scaffolding behaviours of the experts. We encourage future empirical research to explore this link with causal research designs.

Conclusions

Ultimately, our framework and the findings that have been extracted using this framework can shed light on expert-novice scaffolding in various types of research. Our 12 findings are drawn from the reviewed articles. However, due to the small sample of 18 papers, there may be more patterns in cueing and chunking behaviours that have not yet been identified, and further research into scaffolding by visual experts is therefore encouraged. Given that much of the reviewed material is observational, we can draw conclusions about which scaffolding behaviours are commonly used, but further empirical research is necessary to draw sound conclusions about which behaviours are most effective for communicating visual expertise. We recommend that future empirical research use this framework as the foundation for both further observational research into expert-novice interactions and interventional research to investigate which scaffolding behaviours are most effective in the process of expertise conveyance during visual problem-solving.

The presence of human experts remains critical for both developing novices’ skills and stimulating their interest in a domain (Feeley et al., 2022). It is important to note that the present review incorporated only face-to-face expert-novice interactions; whether our findings hold true in hybrid and online settings is yet to be determined. We hope to encourage more research into expert-novice scaffolding, in visual domains or otherwise, to help improve our collective understanding of expert-novice learning.